The advent of Large Language Models (LLMs) has ushered in a new era of artificial intelligence, demonstrating remarkable capabilities in understanding and generating human-like text.1 Models such as OpenAI's ChatGPT and Google's Gemini have rapidly permeated various aspects of our digital lives, showcasing proficiency in tasks ranging from content creation to answering complex queries.1 These sophisticated AI systems, built upon transformer-based architectures, excel at processing vast amounts of textual data, identifying intricate language patterns, and producing coherent and contextually relevant responses.4 Their ability to learn from massive datasets has enabled them to achieve impressive feats in natural language processing, making them seemingly versatile tools for a multitude of applications.6
In the realm of decision-making, the Analytic Hierarchy Process (AHP) stands as a well-established and mathematically rigorous methodology.8 Developed by Thomas Saaty, AHP provides a structured framework for tackling complex decisions involving multiple criteria and alternatives.10 This method assists decision-makers in identifying the most logical choice by systematically breaking down the problem into a hierarchical structure, quantifying subjective preferences through pairwise comparisons, and then synthesizing these preferences to arrive at a prioritized ranking of the alternatives.8 AHP's foundation in mathematical principles, including matrix algebra and eigenvalue calculations, ensures a transparent and defensible decision-making process.13 Given the increasing capabilities of LLMs in handling complex tasks, a pertinent question arises: can these language-centric models effectively recreate and fully explain the intricacies of a mathematically driven process like AHP, particularly in practical scenarios such as selecting a leader for an organization or purchasing a family car? This blog post will delve into the core mathematical principles of AHP, examine the strengths and limitations of current LLMs, and explore the reasons why a significant gap exists between the capabilities of LLMs and the demands of accurately performing and explaining AHP analyses.
Unpacking the Mathematical Elegance of AHP
At its core, the Analytic Hierarchy Process is built upon a set of fundamental principles that guide decision-makers through a structured analytical process.8 The first of these principles is the hierarchical structuring of the decision problem.9 AHP requires that the decision problem be decomposed into a hierarchy, typically starting with the overall goal at the top, followed by the criteria and sub-criteria that influence the decision, and culminating in the alternatives being considered at the bottom level.10 For instance, in the example of choosing a leader, the overarching goal might be "Select the best leader for the organization." This goal would then be broken down into criteria such as "Vision," "Communication Skills," and "Experience." Finally, the different candidates for the leadership position would be listed as the alternatives at the lowest level of the hierarchy.9 Similarly, when buying a family car, the goal might be "Purchase the ideal family vehicle," with criteria like "Safety," "Fuel Efficiency," "Price," and "Comfort," and various car models as the alternatives.12 This structured approach allows for a systematic analysis of the decision by breaking it down into smaller, more manageable components.9
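To make this hierarchical decomposition concrete, the sketch below encodes the two running examples as simple three-level structures. This is only an illustrative representation written in Python (AHP itself does not prescribe any particular data format), with names taken from the examples used throughout this post.

```python
# A minimal, illustrative encoding of an AHP hierarchy:
# goal at the top, criteria in the middle, alternatives at the bottom.
leader_hierarchy = {
    "goal": "Select the best leader for the organization",
    "criteria": ["Vision", "Communication Skills", "Experience"],
    "alternatives": ["Alice", "Bob", "Carol"],
}

# The family-car decision follows the same three-level pattern.
car_hierarchy = {
    "goal": "Purchase the ideal family vehicle",
    "criteria": ["Safety", "Fuel Efficiency", "Price", "Comfort"],
    "alternatives": ["Sedan X", "SUV Y", "Minivan Z"],
}
```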
The second key principle of AHP involves pairwise comparisons.12 To assess the relative importance of the criteria and the preference of the alternatives with respect to each criterion, decision-makers engage in a series of pairwise comparisons.10 For criteria, they might compare "Vision" against "Communication Skills," asking which is more important for a leader and by how much.16 For each criterion, they would then compare the alternatives, for example, asking how much more preferable Car A is to Car B in terms of safety.10 These comparisons are quantified using Saaty's nine-point scale, which ranges from 1 (equal importance/preference) to 9 (extreme importance/preference), with intermediate values for nuanced judgments.11 If one element is deemed less important or preferred, the reciprocal values (1/2 to 1/9) of the scale are used.10 This method allows decision-makers to express the intensity of their preferences in a structured way.12
The results of these pairwise comparisons are then organized into a reciprocal pairwise comparison matrix.17 For 'n' criteria, an 'n x n' matrix is constructed. The diagonal elements of this matrix are always 1, as they represent the comparison of an element with itself.13 The off-diagonal elements (i, j) represent the ratio of the importance of element 'i' over element 'j' based on the pairwise judgment. A crucial aspect of this matrix is its reciprocal nature: if element (i, j) has a value 'x', then element (j, i) will have the value 1/x.16 This reciprocal relationship enforces a basic coherence in the judgments, although it does not by itself guarantee full consistency, which is why AHP checks consistency separately (see below). Only the upper or lower triangular part of the matrix needs to be filled directly, as the other part can be automatically derived from the reciprocals.13 Similar matrices are created for comparing the alternatives with respect to each of the identified criteria.10
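As a small illustration of how such a matrix is assembled, the sketch below fills in a 3 x 3 criteria matrix from upper-triangular Saaty-scale judgments, deriving the unit diagonal and the reciprocals automatically. The judgment values are the same ones used in the worked leader-selection prompt later in this post; numpy is assumed to be available.

```python
import numpy as np

# Upper-triangular Saaty-scale judgments for (Vision, Communication Skills, Experience):
# Vision vs. Communication Skills = 3, Vision vs. Experience = 5,
# Communication Skills vs. Experience = 2.
judgments = {(0, 1): 3.0, (0, 2): 5.0, (1, 2): 2.0}

n = 3
A = np.ones((n, n))            # diagonal entries are 1 (an element compared with itself)
for (i, j), value in judgments.items():
    A[i, j] = value            # judged intensity of element i over element j
    A[j, i] = 1.0 / value      # the mirrored entry is the reciprocal

print(A)
# (rounded) rows: [1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]
```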
The next stage of AHP involves a mathematical synthesis of the information captured in the comparison matrices.10 This synthesis primarily relies on eigenvector calculation.17 From each pairwise comparison matrix, the relative weights or priorities of the elements being compared are derived by calculating the principal eigenvector of the matrix.13 An eigenvector of a square matrix is a non-zero vector that, when multiplied by that matrix, yields a scalar multiple of itself. This scalar is known as the eigenvalue.19 In AHP, the normalized principal eigenvector (the eigenvector corresponding to the largest eigenvalue) provides the priority weights of the criteria or the relative scores of the alternatives with respect to a specific criterion.14 The elements of this eigenvector, when normalized to sum to 1, represent the relative importance or preference of each element in the comparison set.13 For instance, the eigenvector derived from the criteria comparison matrix will give the relative weights of each criterion in the overall decision.10 While eigenvector calculation is the mathematically rigorous method, approximation techniques like column normalization (dividing each entry by its column sum and then averaging across each row) or taking the geometric mean of each row followed by normalization can provide good estimates, especially when the level of inconsistency in the judgments is low.20
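To ground these ideas, here is a minimal numpy sketch that derives the criteria weights for the hypothetical 3 x 3 matrix built above, both exactly (principal eigenvector) and via the geometric-mean approximation. The rounded weights in the comments are what this particular matrix yields.

```python
import numpy as np

A = np.array([[1,   3,   5],
              [1/3, 1,   2],
              [1/5, 1/2, 1]])

# Exact method: take the eigenvector belonging to the largest eigenvalue
# and normalize it so the weights sum to 1.
eigenvalues, eigenvectors = np.linalg.eig(A)
principal = eigenvectors[:, np.argmax(eigenvalues.real)].real
weights_exact = principal / principal.sum()

# Approximation: geometric mean of each row, then normalize.
row_geomean = A.prod(axis=1) ** (1.0 / A.shape[0])
weights_approx = row_geomean / row_geomean.sum()

print(weights_exact)   # roughly [0.65, 0.23, 0.12]
print(weights_approx)  # nearly identical here, because the judgments are almost consistent
```

For a nearly consistent matrix like this one, the approximation and the exact eigenvector agree to two or three decimal places; the gap widens as the judgments become less consistent.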
Once the priority weights of the criteria and the scores of the alternatives on each criterion are determined, these values are aggregated to obtain an overall priority score for each alternative.10 This aggregation is typically performed using a weighted sum model.10 For each alternative, its score on each criterion is multiplied by the weight of that criterion, and these products are then summed across all the criteria. This process yields a final numerical score for each alternative, allowing for a direct comparison and ranking.13 The alternative with the highest overall score is considered the most preferred choice based on the AHP analysis.10
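The aggregation step itself is just a matrix-vector product, as the short sketch below shows. The criteria weights are the ones derived above; the per-criterion scores of the three candidates are hypothetical numbers invented purely for illustration (in a real analysis each column would come from its own comparison matrix).

```python
import numpy as np

alternatives = ["Alice", "Bob", "Carol"]

# Criteria weights for (Vision, Communication Skills, Experience), from the criteria matrix.
weights = np.array([0.65, 0.23, 0.12])

# Hypothetical local scores (rows = alternatives, columns = criteria); each column sums to 1.
scores = np.array([
    [0.50, 0.30, 0.20],   # Alice
    [0.30, 0.50, 0.30],   # Bob
    [0.20, 0.20, 0.50],   # Carol
])

overall = scores @ weights    # weighted sum across the criteria
for name, value in sorted(zip(alternatives, overall), key=lambda pair: -pair[1]):
    print(f"{name}: {value:.3f}")   # Alice: 0.418, Bob: 0.346, Carol: 0.236
```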
Finally, AHP incorporates a crucial step to assess the reliability of the input judgments through the calculation of the consistency ratio (CR).21 Human judgments are not always perfectly consistent, and AHP provides a mechanism to measure this inconsistency.13 The CR is calculated by first determining the consistency index (CI) using the principal eigenvalue (λ_max) of the comparison matrix and the number of elements being compared (n): CI = (λ_max - n) / (n - 1).21 The CR is then obtained by comparing the CI with the random index (RI), which is the average CI of randomly generated reciprocal matrices of the same size.22 The formula for the consistency ratio is CR = CI / RI.22 A CR of 0.10 or less is generally considered acceptable, indicating a reasonable level of consistency in the decision-maker's judgments.21 A CR exceeding this threshold suggests that the judgments might be too inconsistent and should be reviewed.22
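The consistency check translates directly into a few lines of code. The sketch below applies the CI and CR formulas to the same hypothetical matrix; the random index (RI) values are the commonly tabulated averages attributed to Saaty, and they vary slightly between sources.

```python
import numpy as np

# Commonly tabulated random index (RI) values for matrices of size 3-10.
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(A: np.ndarray) -> float:
    n = A.shape[0]
    lambda_max = np.linalg.eigvals(A).real.max()
    ci = (lambda_max - n) / (n - 1)     # consistency index
    return ci / RI[n]                   # consistency ratio

A = np.array([[1,   3,   5],
              [1/3, 1,   2],
              [1/5, 1/2, 1]])

cr = consistency_ratio(A)
print(f"CR = {cr:.3f}")                              # well below 0.10 for this matrix
print("acceptable" if cr <= 0.10 else "revise the judgments")
```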
The mathematical rigor inherent in AHP, particularly the eigenvector calculation and the consistency ratio, is what distinguishes it as a robust and reliable decision-making method.8 This framework allows for the systematic decomposition of complex problems, the quantification of subjective preferences, and an assessment of the consistency of those preferences, ultimately leading to more defensible and logical decision outcomes.11
LLMs Under the Hood: A Look at Their Strengths and Inherent Limitations
Large Language Models have revolutionized the field of artificial intelligence with their ability to process and generate human-like text.5 These models, including OpenAI's GPT series, Google's Gemini family, and xAI's Grok (along with products built on top of them, such as Perplexity AI), are primarily based on the transformer architecture.1 This architecture utilizes self-attention mechanisms that allow the model to weigh the importance of different parts of an input sequence when making predictions, enabling it to capture intricate contextual relationships within the text.2 The transformer's ability to process sequences in parallel has been a key factor in the efficiency and scalability of modern LLMs.4
A crucial aspect of LLM development is pre-training.1 These models are trained on massive datasets comprising text and code from a wide range of sources, including books, articles, and websites.4 This extensive training allows LLMs to learn the underlying patterns of language, develop a broad understanding of grammar and semantics, and accumulate a vast amount of general knowledge.2 Following pre-training, LLMs can undergo fine-tuning on smaller, task-specific datasets to optimize their performance for particular applications, such as question answering, text summarization, or code generation.1 This fine-tuning process helps to align the model's outputs with desired formats and improve accuracy for specific use cases.2
While the foundational architecture is similar across major LLM providers, each platform has its own unique nuances and focuses. OpenAI's GPT models have been at the forefront of scaling language model capabilities, consistently pushing the boundaries with larger and more sophisticated models.7 Google's Gemini models emphasize multimodal capabilities, allowing them to process and generate information across different modalities like text, images, audio, and video.26 Perplexity AI distinguishes itself through its Retrieval-Augmented Generation (RAG) approach, which integrates real-time web search and source citations into its responses, often leveraging powerful base models like GPT-4 and Anthropic's Claude.28 xAI's Grok, known for its access to real-time data from the X platform and its "sense of humor," also boasts advanced reasoning capabilities in its latest iterations like Grok 3.30
The strengths of LLMs in natural language processing and reasoning are undeniable.3 They excel at tasks such as generating coherent and contextually relevant text, summarizing lengthy documents, translating languages with increasing accuracy, and analyzing the sentiment expressed in text.3 LLMs can also answer a wide range of questions, engage in seemingly natural conversations, and even perform basic deductive reasoning by identifying patterns and following logical structures present in their training data.6 These capabilities have made them invaluable tools across various industries for tasks like customer service, content creation, and information retrieval.3
However, despite their impressive linguistic abilities, current LLMs face inherent limitations, particularly when it comes to precise numerical computation and handling structured analytical tasks that rely heavily on mathematical rigor.35 One significant challenge is their struggle with multi-step calculations and maintaining logical consistency in numerical reasoning.37 Unlike humans or traditional computational systems, LLMs process information based on statistical probabilities of word sequences, which can lead to inaccuracies when dealing with interdependent numerical variables or complex mathematical operations.39 They also exhibit sensitivity to numerical changes and problem complexity.36 Even minor alterations in numerical values or an increase in the number of steps required to solve a problem can significantly impact their performance and accuracy.41
A fundamental limitation lies in their reliance on pattern matching over true logical reasoning, especially in mathematical contexts.36 LLMs learn from the vast amounts of text they are trained on, and their ability to solve problems often stems from recognizing patterns similar to those they have encountered before, rather than applying fundamental mathematical principles or axioms.42 This can lead to difficulties when faced with novel problems or situations that deviate from familiar patterns.40 Furthermore, LLMs are prone to "hallucinations," generating incorrect or fabricated information with a high degree of confidence.38 This is particularly concerning in areas requiring precise factual knowledge or calculation, as LLMs might produce plausible-sounding but mathematically incorrect results.44 Finally, current LLMs have a limited ability to perform symbolic reasoning and handle abstract mathematical concepts.39 Tasks that require manipulating mathematical symbols, understanding algebraic relationships, or grasping the principles of linear algebra, such as eigenvector calculation, pose a significant challenge for their text-centric architecture.42
The AHP Bottleneck: Why LLMs Struggle with Replication and Explanation
The inherent limitations of current LLMs become particularly apparent when attempting to apply them to the Analytic Hierarchy Process. Several core aspects of AHP present significant challenges for these language-based models, hindering their ability to accurately perform and explain the methodology.
One major bottleneck is the difficulty LLMs face in handling matrix operations.19 While they can certainly understand and generate text that describes a pairwise comparison matrix, performing the actual mathematical operations required in AHP, such as matrix multiplication, normalization, and especially the crucial eigenvector calculation, is not a native capability of their architecture.17 Constructing the initial pairwise comparison matrix itself can also be problematic. It requires not just understanding the user's preferences but also ensuring the reciprocal relationships between the paired elements are correctly represented, a task that demands a logical application of the Saaty scale rather than just pattern recognition in text.13
The accurate performance of eigenvector calculations represents another significant hurdle.14 Deriving the priority weights of criteria and the scores of alternatives in AHP relies heavily on extracting the principal eigenvector from the pairwise comparison matrices.18 This process involves iterative numerical methods or direct algebraic solutions that demand a high degree of precision and a fundamental understanding of linear algebra concepts.19 LLMs, without relying on external computational tools, typically struggle with such precise numerical computations and abstract mathematical concepts.42 While approximation methods for eigenvector estimation exist, their reliability is contingent upon the consistency of the input judgments, and LLMs may lack the capacity to determine when these approximations are sufficient or when a more rigorous calculation is necessary.20
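To make concrete what such an iterative method looks like, the sketch below implements the classical power iteration commonly used to extract the principal eigenvector; it is an illustration of the technique under the assumptions noted in the comments, not a description of any particular AHP software package.

```python
import numpy as np

def principal_eigenvector(A: np.ndarray, tol: float = 1e-9, max_iter: int = 1000):
    """Power iteration: repeatedly multiply a trial vector by A and renormalize
    until it stops changing; the fixed point is the principal eigenvector."""
    w = np.full(A.shape[0], 1.0 / A.shape[0])   # start from a uniform guess
    for _ in range(max_iter):
        w_next = A @ w
        w_next /= w_next.sum()                  # keep the weights summing to 1
        if np.abs(w_next - w).max() < tol:
            break
        w = w_next
    lambda_max = (A @ w_next).sum() / w_next.sum()   # eigenvalue estimate at convergence
    return w_next, lambda_max

A = np.array([[1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]])
weights, lambda_max = principal_eigenvector(A)
print(weights, lambda_max)   # roughly [0.65, 0.23, 0.12] and lambda_max close to 3
```

Every pass through that loop is an exact multi-step numerical computation, which is precisely the kind of procedure a purely text-based model cannot carry out reliably without delegating to an external tool.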
The nuanced concept of the consistency ratio in AHP also presents a challenge for LLMs.21 Calculating the CR requires not only determining the principal eigenvalue of the comparison matrix (another numerical computation) but also understanding the statistical basis for the acceptable threshold of inconsistency.22 Interpreting whether a calculated CR value (e.g., below 0.10) indicates acceptable consistency requires a grasp of the underlying mathematical relationships within the comparison matrix and the implications of inconsistencies for the validity of the AHP results.21 While an LLM might be instructed to calculate the CR if provided with the formula and access to a calculator tool, explaining the theoretical significance and the acceptable range necessitates a deeper understanding of AHP's mathematical foundations that goes beyond pattern recognition in text.23
Finally, current LLMs often exhibit limitations in explaining the rationale behind the numerical outputs of AHP in a mathematically rigorous way.8 When asked to describe the results of an AHP analysis, an LLM might provide a high-level overview of the process and the final ranking of alternatives.13 However, it typically falls short in offering a detailed, step-by-step, and mathematically accurate explanation of how each numerical value – the weights of the criteria, the scores of the alternatives, and the consistency ratio – was precisely derived.8 Their explanations are more likely to be based on patterns observed in text describing AHP rather than a direct application of the mathematical formulas and procedures involved.36
To illustrate these limitations, consider the example of choosing a leader for an organization. An LLM might be able to list potential criteria like "vision," "communication skills," and "experience" and even generate some qualitative comparisons between candidates based on these criteria.10 However, it would likely struggle to construct a truly consistent pairwise comparison matrix that accurately reflects the relative importance of these subjective criteria. Furthermore, accurately calculating the eigenvector from this matrix to determine the precise weight of each criterion and then explaining the mathematical derivation of these weights would be a significant challenge.13 The final overall score for each candidate, derived from aggregating their performance on each criterion weighted by the criterion's importance, would likely lack the necessary mathematical justification if generated solely by the LLM without external computational aid.
Similarly, in the scenario of buying a family car, an LLM might identify relevant criteria such as "safety," "fuel efficiency," and "price" and even provide some textual comparisons between different car models.12 However, when it comes to performing the pairwise comparisons between the cars based on these criteria and then accurately calculating the priority scores for each car with respect to each criterion, the LLM's limitations in numerical reasoning would likely lead to inaccuracies.13 Explaining the mathematical derivation of the final ranking of the cars and the consistency of the preferences expressed in the pairwise comparisons would also likely lack the required precision, potentially resulting in a plausible-sounding explanation that is not mathematically sound or consistent with the principles of AHP.
A Comparative Diagnosis: LLM Capabilities in Multi-Criteria Decision Analysis
To gain a clearer understanding of how different LLM platforms fare in the context of AHP, it's useful to compare their statistical capabilities for Multi-Criteria Decision Analysis, paying particular attention to their mathematical accuracy.
OpenAI's GPT Models (e.g., GPT-4o, o3, o1): These models demonstrate significant strengths in understanding and following complex instructions, which could be beneficial in guiding them through the steps of AHP.3 Notably, their ability to leverage tools like the Code Interpreter provides a potential pathway for performing the mathematical calculations inherent in AHP, such as matrix operations and eigenvector extraction.47 When the problem scope is constrained and domain-specific, GPT models have shown high accuracy in some analytical tasks.50 However, they possess inherent limitations in numerical accuracy and multi-step mathematical reasoning if relying solely on the model without tool use, which could lead to errors in AHP calculations.36 There is also the potential for them to "hallucinate" numerical results, and they might struggle with the abstract linear algebra concepts involved in eigenvector calculation even with detailed instructions.49
Google's Gemini (e.g., Gemini 2.5 Pro, 1.5): Gemini models exhibit strong reasoning abilities in benchmark tests, which could be advantageous for understanding the logic behind AHP.26 Their multimodal capabilities offer the potential to represent AHP hierarchies visually, and their large context windows could allow for the processing of complex AHP scenarios.52 Furthermore, their performance in mathematical reasoning is continually improving.55 However, like other LLMs, the accuracy of Gemini in performing the precise numerical calculations required for eigenvector extraction and consistency ratio in AHP might still be a challenge.56 Their performance in free-form analytical tasks, as opposed to structured benchmark settings, could also be less reliable.
Perplexity AI: As a search and answer engine that often leverages powerful LLMs like GPT-4 and Claude, Perplexity AI excels at information retrieval and providing answers with source citations.28 Its "Deep Research" feature could be particularly useful for gathering information about the criteria and alternatives in AHP examples.58 However, Perplexity AI itself lacks the inherent capability to perform the complex numerical computations of AHP.29 Any mathematical operations would depend on the underlying LLM it utilizes, inheriting the strengths and weaknesses of that model in numerical accuracy.61
xAI's Grok (e.g., Grok 3, Grok 1.5): Grok models, especially the latest Grok 3, have demonstrated strong performance in mathematics and reasoning benchmarks.31 Grok's real-time data access could be beneficial for obtaining up-to-date information relevant to AHP criteria and alternatives, and Grok 3's "Think" mode emphasizes reasoning and accuracy.64 This suggests that Grok might have a higher potential for handling the mathematical aspects of AHP compared to some other LLMs.66 However, as a relatively newer model, detailed evaluations of its ability to perform all the specific steps of AHP, including eigenvector calculation and consistency ratio, are still needed.67
| LLM Provider | Strengths in MCDA/AHP | Weaknesses in MCDA/AHP | Mathematical Accuracy for AHP |
|---|---|---|---|
| OpenAI (GPT Models) | Strong instruction following, potential for tool use (Code Interpreter) for calculations, high accuracy in constrained analytical tasks. | Limitations in inherent numerical accuracy, potential for hallucinations, may struggle with abstract linear algebra concepts. | Relies on external tools for accurate calculations, accuracy without tools is limited. |
| Google (Gemini) | Strong reasoning abilities, multimodal capabilities for visualization, large context windows, improving mathematical reasoning. | Accuracy in complex numerical calculations (eigenvector, consistency ratio) may still be a challenge. | Improving, but needs further evaluation for precise AHP calculations. |
| Perplexity AI | Excellent information retrieval with source citations, "Deep Research" for gathering information. | Lacks inherent computational capabilities for AHP, accuracy depends on the base LLM. | Limited, relies on the mathematical capabilities of the underlying LLM (GPT-4, Claude, etc.). |
| xAI (Grok) | Strong performance in math and reasoning benchmarks (Grok 3), real-time data access, "Think" mode for accuracy. | Relatively newer model, detailed evaluation for all AHP steps needed. | Promising, particularly Grok 3, but requires further assessment in the specific context of AHP. |
Mastering the Prompt: Strategies for Eliciting Better AHP Results from LLMs
While LLMs may not inherently possess the mathematical capabilities to fully execute AHP, strategic prompt engineering can significantly improve the quality and accuracy of their responses related to this methodology. Here are some key strategies to consider when crafting prompts for LLMs in the context of AHP:
Clearly Define the Task: Explicitly state that you want the LLM to perform or explain a specific step of the AHP process. For example, instead of a vague request like "Analyze this decision," ask "Explain how to construct a pairwise comparison matrix for the following criteria..." or "Using the following pairwise comparison matrix, calculate the priority weights of the criteria.".47
Break Down the Process: Since LLMs can struggle with multi-step reasoning, break down the AHP process into smaller, sequential steps in your prompt. Guide the LLM through each stage, from defining the hierarchy to calculating the consistency ratio. This can help the model stay focused and reduce the chances of errors.37
Specify the Output Format: Clearly instruct the LLM on how you want the output to be formatted. For example, if you're asking for a pairwise comparison matrix, specify that it should be a square matrix with criteria listed in rows and columns, and the values should represent the comparison ratios using Saaty's scale.16 If you need priority weights, ask for a numbered list or a table with the criterion name and its corresponding weight.13
Emphasize Mathematical Accuracy: Explicitly instruct the LLM to prioritize mathematical accuracy in its calculations and explanations. You can even ask it to double-check its work or to show its steps if it's capable of performing calculations (especially if it can use tools). Phrases like "Ensure the mathematical calculations are precise" or "Provide a mathematically accurate explanation" can be helpful.49
Provide Context and Examples: Offer clear context about the decision problem you're working on, including the goal, criteria, and alternatives. Providing a small example of a pairwise comparison or a sample matrix can also help the LLM understand the task better and follow the desired format.16
Encourage Step-by-Step Explanation: Ask the LLM to explain its reasoning and the steps it took to arrive at a particular answer. This can help you understand the model's approach and identify any potential errors in its logic or calculations. Requesting explanations for each step of the eigenvector calculation or the consistency ratio determination can be particularly useful.37
Prompt for Consistency Checks: If the LLM is generating pairwise comparisons, you can prompt it to consider the consistency of its judgments. For example, you can ask, "If criterion A is much more important than B, and B is slightly more important than C, ensure that A is significantly more important than C in your comparisons".21
Iterative Refinement: Be prepared to refine your prompts based on the LLM's initial responses. If the output is not what you expected, try rephrasing your prompt, providing more specific instructions, or breaking down the task further.25
Consider Using Tools (if available): If the LLM has access to tools like a code interpreter or a calculator, instruct it to use these tools for the mathematical calculations involved in AHP. This can significantly improve the accuracy of the numerical results.47
Request Citations (if applicable): If you're asking the LLM to explain AHP concepts or best practices, prompt it to cite its sources if it has access to a knowledge base or can browse the web (like Perplexity AI). This can help you verify the information it provides.28
By employing these prompt engineering strategies, users can guide LLMs to provide more structured, accurate, and insightful responses related to the Analytic Hierarchy Process, even if the models cannot fully execute the mathematical computations natively.
Best-in-Class Prompts: Practical Examples for AHP with LLMs
To further illustrate how to effectively prompt LLMs for AHP-related tasks, here are some best-in-class prompt examples for the "Choosing a leader for an organization" and "Buying a family car" scenarios mentioned in the Wikipedia article:
Example 1: Choosing a Leader for an Organization
Prompt 1 (Focus on explaining pairwise comparison matrix construction):
You are an expert in the Analytic Hierarchy Process (AHP). Our organization needs to choose a new leader, and we have identified three key criteria: Vision, Communication Skills, and Experience. We will be comparing three candidates: Alice, Bob, and Carol.
Explain step-by-step how to construct a pairwise comparison matrix for these three criteria using Saaty's nine-point scale. Provide specific examples of the questions we should ask to compare each pair of criteria (Vision vs. Communication Skills, Vision vs. Experience, Communication Skills vs. Experience) and how the answers to these questions translate into numerical values in the matrix.
Format your answer clearly, starting with an explanation of the matrix structure, followed by the example questions and their corresponding matrix entries. Use formatting like bold text and bullet points to enhance readability. 🚀
Prompt 2 (Focus on calculating priority weights using an approximation method):
You are an expert in the Analytic Hierarchy Process (AHP). We have performed pairwise comparisons for the three criteria for choosing a leader (Vision, Communication Skills, and Experience) and obtained the following pairwise comparison matrix:
|                      | Vision | Communication Skills | Experience |
|----------------------|--------|----------------------|------------|
| Vision               | 1      | 3                    | 5          |
| Communication Skills | 1/3    | 1                    | 2          |
| Experience           | 1/5    | 1/2                  | 1          |
Explain how to calculate the approximate priority weights for each of these criteria using the column normalization method. Show each step of the calculation clearly. What do these weights tell us about the relative importance of Vision, Communication Skills, and Experience in our leader selection process? 🤔
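As an independent check on whatever the LLM returns for Prompt 2, the column-normalization arithmetic for this specific matrix can be verified in a few lines of numpy (the weights in the comment are rounded):

```python
import numpy as np

A = np.array([[1,   3,   5],
              [1/3, 1,   2],
              [1/5, 1/2, 1]])

col_normalized = A / A.sum(axis=0)       # divide each entry by its column sum
weights = col_normalized.mean(axis=1)    # then average across each row

for name, w in zip(["Vision", "Communication Skills", "Experience"], weights):
    print(f"{name}: {w:.3f}")
# Vision: 0.648, Communication Skills: 0.230, Experience: 0.122
```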
Example 2: Buying a Family Car
Prompt 1 (Focus on explaining the overall AHP process):
You are an expert in the Analytic Hierarchy Process (AHP). We are trying to decide which of three family cars to buy: Sedan X, SUV Y, or Minivan Z. We have identified four important criteria: Safety, Fuel Efficiency, Price, and Comfort.
Explain the complete Analytic Hierarchy Process we should follow to make this decision. Detail each step, including:
1. Structuring the hierarchy.
2. Conducting pairwise comparisons for the criteria and the cars with respect to each criterion using Saaty's nine-point scale. Provide example questions for each type of comparison.
3. Calculating the priority weights of the criteria and the scores of the cars.
4. Combining these weights and scores to rank the cars.
5. Assessing the consistency of our judgments using the consistency ratio.
Use clear and concise language, and feel free to use emoticons to make the explanation engaging. 🚗💨
Prompt 2 (Focus on explaining eigenvector calculation conceptually):
You are an expert in the Analytic Hierarchy Process (AHP). We have constructed a pairwise comparison matrix for the criteria for buying a family car (Safety, Fuel Efficiency, Price, and Comfort). Now we need to derive the priority weights from this matrix.
Explain the concept of eigenvector calculation in the context of AHP in simple, layman's terms. Why is the principal eigenvector important, and what does it represent in our analysis? You don't need to perform the actual calculation, just explain the underlying idea and its significance in determining the weights of our car-buying criteria. 💡
These prompts are designed to be clear, specific, and engaging, guiding the LLM to focus on particular aspects of the AHP process relevant to the given examples. The use of role-playing ("You are an expert..."), step-by-step instructions, specified output formats, and emphasis on clarity and accuracy aims to elicit more informative and useful responses from the LLM.
The Horizon of AI in AHP: Looking Towards the Future
Looking ahead, the advancement of LLMs holds both promise and challenges for their application in AHP analysis. Current limitations in mathematical accuracy and deep logical reasoning mean that fully automating or accurately explaining the intricacies of AHP remains a significant hurdle.35 However, the rapid pace of development in the field suggests that future iterations of LLMs may overcome some of these obstacles.64
One potential avenue for improvement lies in the development of Large Numerical Models (LNMs) that are specifically designed for high-precision numerical computation and mathematical reasoning.46 The integration of such LNMs with existing LLMs could create a collaborative system where the LLM handles the understanding and interpretation of the decision problem, as well as the communication of results in natural language, while the LNM tackles the complex mathematical calculations of AHP with greater accuracy and reliability.46
Another interesting concept on the horizon is that of "LLM-as-Judge".70 This framework explores using LLMs to evaluate or assess information based on predefined criteria. In the context of AHP, an LLM could potentially be trained or prompted to act as a "judge" to evaluate the consistency of pairwise comparison judgments provided by a user.72 By comparing the user's judgments against expected patterns or even generating its own ideal judgments based on the problem context, the LLM could provide feedback on the level of consistency and suggest areas for review.70 While not directly performing the AHP calculations, this "LLM-as-Judge" approach could significantly enhance the accuracy and reliability of AHP analyses conducted by human decision-makers by helping them refine their subjective inputs.
Furthermore, the continuous improvement in LLMs' reasoning capabilities, as seen in models like Grok 3 with its "Think" mode, suggests that future models might be better equipped to understand the logical relationships and mathematical principles underlying AHP.31 With more extensive training on mathematical data and the development of more sophisticated reasoning architectures, LLMs could potentially move beyond mere pattern matching and develop a more robust understanding of mathematical concepts.42
However, it's important to remain grounded in the current realities. While LLMs can be valuable tools for understanding and explaining the conceptual framework of AHP, and even for assisting with certain aspects like information gathering and prompt generation, their direct application to performing the core mathematical calculations of AHP with high accuracy remains limited.44 For the foreseeable future, relying on dedicated software or computational tools will likely be necessary for the accurate execution of the mathematical steps in AHP. The true potential of LLMs in this domain might lie in their ability to augment human decision-makers by providing guidance, explanations, and consistency checks, rather than fully replacing the need for mathematical computation.
Conclusion: Navigating the Future of Decision Analysis with AI
In conclusion, while Large Language Models have made remarkable strides in natural language processing and reasoning, their current capabilities fall short of being able to fully recreate or accurately explain the Analytic Hierarchy Process, particularly concerning the mathematical rigor required for tasks like eigenvector calculation and consistency ratio analysis. The text-centric architecture of LLMs, coupled with their limitations in precise numerical computation and abstract mathematical understanding, presents a significant hurdle in directly applying them to the mathematically elegant framework of AHP.
However, this does not negate the potential for LLMs to play a valuable role in the broader context of decision analysis using AHP. Their strengths in understanding and explaining complex concepts, coupled with their ability to process and generate human-like text, make them useful tools for guiding users through the AHP methodology, assisting with the definition of hierarchies and criteria, and even providing feedback on the consistency of judgments. Strategic prompt engineering can further enhance the quality and accuracy of LLM responses related to AHP.
Looking towards the future, the advent of specialized Large Numerical Models and the concept of "LLM-as-Judge" offer promising avenues for improving the integration of AI with AHP. Collaborative systems where LLMs and LNMs work in tandem could potentially bridge the gap between natural language understanding and mathematical accuracy. Furthermore, LLMs acting as "judges" could help human decision-makers refine their judgments and improve the reliability of AHP analyses.
Ultimately, the intersection of AI and decision analysis is an evolving landscape. While current LLMs may not be able to replace the need for mathematical computation in AHP, their ability to augment human intelligence and provide valuable support throughout the decision-making process suggests a promising future for this intriguing intersection. As AI continues to advance, we can anticipate even more sophisticated ways in which these powerful language models can be leveraged to enhance and improve how we approach complex decisions.
Works cited
Understanding LLM Architecture: How Large Language Models Work - KX, accessed April 3, 2025, https://kx.com/glossary/llm-architecture/
What is LLM? - Large Language Models Explained - AWS, accessed April 3, 2025, https://aws.amazon.com/what-is/large-language-model/
Large language models: The basics and their applications - Moveworks, accessed April 3, 2025, https://www.moveworks.com/us/en/resources/blog/large-language-models-strengths-and-weaknesses
Understanding the LLM Architecture - Metaschool, accessed April 3, 2025, https://metaschool.so/articles/llm-architecture
Exploring Architectures and Capabilities of Foundational LLMs - Coralogix, accessed April 3, 2025, https://coralogix.com/ai-blog/exploring-architectures-and-capabilities-of-foundational-llms/
What Are Large Language Models (LLMs)? - IBM, accessed April 3, 2025, https://www.ibm.com/think/topics/large-language-models
How ChatGPT and our foundation models are developed - OpenAI Help Center, accessed April 3, 2025, https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-foundation-models-are-developed
The Analytic Hierarchy Process: A Mathematical Model for Decision Making Problems - Open Works - The College of Wooster, accessed April 3, 2025, https://openworks.wooster.edu/cgi/viewcontent.cgi?article=7071&context=independentstudy
Analytic hierarchy process - Wikipedia, accessed April 3, 2025, https://en.wikipedia.org/wiki/Analytic_hierarchy_process
What is the Analytic Hierarchy Process (AHP)? - 1000minds, accessed April 3, 2025, https://www.1000minds.com/decision-making/analytic-hierarchy-process-ahp
Make Effective Decisions. Comprehensive Guide to Analytic Hierarchy Process (AHP), accessed April 3, 2025, https://www.6sigma.us/six-sigma-in-focus/analytic-hierarchy-process-ahp/
What to Do? Let's Think It Through! Using the Analytic Hierarchy Process to Make Decisions, accessed April 3, 2025, https://kids.frontiersin.org/articles/10.3389/frym.2020.00078
Analytic Hierarchy Process (What is AHP) Pair-wise Comparison (What is pair-wise comparison?) - Informatika, accessed April 3, 2025, https://informatika.stei.itb.ac.id/~rinaldi.munir/AljabarGeometri/2017-2018/AHPTutorial.pdf
Decision-making with the AHP: Why is the principal eigenvector necessary, accessed April 3, 2025, https://www.stat.uchicago.edu/~lekheng/meetings/mathofranking/ref/saaty1.pdf
Shared decision-making – transferring research into practice: the Analytic Hierarchy Process (AHP) - PMC, accessed April 3, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC2650240/
The AHP Pairwise Process. How the Analytic Hierarchy Process… | by Bill Adams | DLProdTeam | Medium, accessed April 3, 2025, https://medium.com/dlprodteam/the-ahp-pairwise-process-c639eadcbd0e
Analytic hierarchy process as module for productivity evaluation and decision-making of the operation theater - PMC, accessed April 3, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC4759970/
CALCULATING THE AHP PRIORITY VECTOR - DTIC, accessed April 3, 2025, https://apps.dtic.mil/sti/pdfs/AD1088926.pdf
Math of AHP - Welcome to Comparion Help Center!, accessed April 3, 2025, https://comparion.knowledgeowl.com/help/math-of-ahp
AHP Calculation Methods - SpiceLogic, accessed April 3, 2025, https://www.spicelogic.com/docs/ahpsoftware/intro/ahp-calculation-methods-396
Understanding Consistency Index, Consistency Ratio, Transitivity Rule in Analytic Hierarchy Process - YouTube, accessed April 3, 2025, https://www.youtube.com/watch?v=Ks0hw9Hmv8k
Consistency Improvement in the Analytic Hierarchy Process - MDPI, accessed April 3, 2025, https://www.mdpi.com/2227-7390/12/6/828
Consistency ratio and Transitivity Rule. - SpiceLogic, accessed April 3, 2025, https://www.spicelogic.com/docs/ahpsoftware/intro/ahp-consistency-ratio-transitivity-rule-388
OpenAI GPT Models - Lei Mao's Log Book, accessed April 3, 2025, https://leimao.github.io/article/OpenAI-GPT-Models/
OpenAI — Understand Foundational Concepts of ChatGPT and cool stuff you can explore!, accessed April 3, 2025, https://medium.com/@amol-wagh/open-ai-understand-foundational-concepts-of-chatgpt-and-cool-stuff-you-can-explore-a7a77baf0ee3
What is Google Gemini? | IBM, accessed April 3, 2025, https://www.ibm.com/think/topics/google-gemini
Gemini: A New Multimodal AI Model of Google - Comet.ml, accessed April 3, 2025, https://www.comet.com/site/blog/gemini-a-new-multimodal-ai-model-of-google/
What Model Does Perplexity AI Use: An In-Depth Exploration | Flyrank, accessed April 3, 2025, https://www.flyrank.com/blogs/ai-insights/what-model-does-perplexity-ai-use-an-in-depth-exploration
Perplexity AI: A Deep Dive - Reflections, accessed April 3, 2025, https://annjose.com/post/perplexity-ai/
Grok (chatbot) - Wikipedia, accessed April 3, 2025, https://en.wikipedia.org/wiki/Grok_(chatbot)
Grok 3 Beta — The Age of Reasoning Agents | xAI, accessed April 3, 2025, https://x.ai/news/grok-3
The Strengths and Limitations of Large Language Models, accessed April 3, 2025, https://newsletter.ericbrown.com/p/strengths-and-limitations-of-large-language-models
The Strengths and Limitations of Large Language Models in Reasoning, Planning, and Code Integration | by Jacob Grow | Medium, accessed April 3, 2025, https://medium.com/@Gbgrow/the-strengths-and-limitations-of-large-language-models-in-reasoning-planning-and-code-41b7a190240c
The Strengths, Weaknesses and Dangers of LLMs - MinIO Blog, accessed April 3, 2025, https://blog.min.io/the-strengths-weaknesses-and-dangers-of-llms/
Limitations of LLM Reasoning - DZone, accessed April 3, 2025, https://dzone.com/articles/llm-reasoning-limitations
A Quick Summary Of Several Limitations In The Current Evaluations Of Llms' Mathematical Reasoning Capabilities - Murat Durmus (CEO @AISOMA_AG), accessed April 3, 2025, https://murat-durmus.medium.com/a-quick-summary-of-several-limitations-in-the-current-evaluations-of-llms-mathematical-reasoning-52b2eb357043
Solving Numerical Reasoning in AI: Insights from PROCESSBENCH and Applications in Conversational Agents | by Anna Alexandra Grigoryan, accessed April 3, 2025, https://thegrigorian.medium.com/solving-numerical-reasoning-in-ai-dc072f1f8479
What Are the Limitations of Large Language Models (LLMs)? - PromptDrive.ai, accessed April 3, 2025, https://promptdrive.ai/llm-limitations/
Mathematical revolution! LLMs break down barriers and tackle mathematical challenges (Part 2) - Telefónica Tech, accessed April 3, 2025, https://telefonicatech.com/en/blog/llm-mathematical-challenges
GSM-Symbolic: Analyzing LLM Limitations in Mathematical Reasoning and Potential Solutions - Gretel.ai, accessed April 3, 2025, https://gretel.ai/blog/gsm-symbolic-analyzing-llm-limitations-in-mathematical-reasoning
Apple study highlights limitations of LLMs | MobiHealthNews, accessed April 3, 2025, https://www.mobihealthnews.com/news/apple-study-highlights-limitations-llms
[D] How does LLM solves new math problems? : r/MachineLearning - Reddit, accessed April 3, 2025, https://www.reddit.com/r/MachineLearning/comments/1ihsftt/d_how_does_llm_solves_new_math_problems/
Understanding LLMs and overcoming their limitations | Lumenalta, accessed April 3, 2025, https://lumenalta.com/insights/understanding-llms-overcoming-limitations
How Large Language Models Perform in Quantitative Management Problem-Solving - arXiv, accessed April 3, 2025, https://www.arxiv.org/pdf/2502.16556
Calibrated Decision-Making through Large Language Model-Assisted Retrieval - arXiv, accessed April 3, 2025, https://arxiv.org/html/2411.08891v1
Why AI Needs Large Numerical Models (LNMs) for Mathematical Mastery - AI Blog, accessed April 3, 2025, https://www.artificial-intelligence.blog/ai-news/why-ai-needs-large-numerical-models-lnms-for-mathematical-mastery
Enhancing Multi-Criteria Decision Analysis with AI: Integrating Analytic Hierarchy Process and GPT-4 for Automated Decision Support - arXiv, accessed April 3, 2025, https://arxiv.org/pdf/2402.07404
(PDF) Enhancing Multi-Criteria Decision Analysis with AI: Integrating Analytic Hierarchy Process and GPT-4 for Automated Decision Support - ResearchGate, accessed April 3, 2025, https://www.researchgate.net/publication/378157475_Enhancing_Multi-Criteria_Decision_Analysis_with_AI_Integrating_Analytic_Hierarchy_Process_and_GPT-4_for_Automated_Decision_Support
Does ChatGPT make mistakes more when we use numbers? - Prompting, accessed April 3, 2025, https://community.openai.com/t/does-chatgpt-make-mistakes-more-when-we-use-numbers/580132
Assessing The Accuracy and Reliability Of LLMs In Classroom Analytics Exercises: Insights From Three Case Studies - ResearchGate, accessed April 3, 2025, https://www.researchgate.net/publication/386548980_Assessing_The_Accuracy_and_Reliability_Of_LLMs_In_Classroom_Analytics_Exercises_Insights_From_Three_Case_Studies
OpenAI Research Finds That Even Its Best Models Give Wrong Answers a Wild Proportion of the Time : r/Futurology - Reddit, accessed April 3, 2025, https://www.reddit.com/r/Futurology/comments/1gn2mmo/openai_research_finds_that_even_its_best_models/
Google's Multimodal AI Gemini - A Technical Deep Dive - Unite.AI, accessed April 3, 2025, https://www.unite.ai/googles-multimodal-ai-gemini-a-technical-deep-dive/
Our next-generation model: Gemini 1.5 - Google Blog, accessed April 3, 2025, https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/
Google's Gemini 2.5 Pro model tops LMArena by close to 40 points - R&D Magazine, accessed April 3, 2025, https://www.rdworldonline.com/googles-gemini-2-5-pro-model-tops-lmarena-by-40-points-outperforms-competitors-in-scientific-reasoning/
Gemini Pro - Google DeepMind, accessed April 3, 2025, https://deepmind.google/technologies/gemini/pro/
Artificial intelligence in healthcare education: evaluating the accuracy of ChatGPT, Copilot, and Google Gemini in cardiovascular pharmacology - Frontiers, accessed April 3, 2025, https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1495378/full
An Introduction to RAG Models - Perplexity, accessed April 3, 2025, https://www.perplexity.ai/page/an-introduction-to-rag-models-jBULt6_mSB2yAV8b17WLDA
Introducing Perplexity Deep Research, accessed April 3, 2025, https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research
Perplexity AI Enhances Research Capabilities with Deep Research - AI-Pro.org, accessed April 3, 2025, https://ai-pro.org/learn-ai/articles/perplexity-ai-elevates-fact-finding-capabilities-with-deep-research/
What is Perplexity's default language model, and how does it compare to Pro options?, accessed April 3, 2025, https://www.perplexity.ai/hub/technical-faq/what-model-does-perplexity-use-and-what-is-the-perplexity-model
Why do tools like Perplexity struggle to calculate accurate stats from different sources when the exact number is not posted online? : r/LocalLLaMA - Reddit, accessed April 3, 2025, https://www.reddit.com/r/LocalLLaMA/comments/1i34642/why_do_tools_like_perplexity_struggle_to/
Perplexity AI: The Game-Changer in Conversational AI and Web Search, accessed April 3, 2025, https://originality.ai/blog/perplexity-ai-statistics
Grok-3 - Most Advanced AI Model from xAI - OpenCV, accessed April 3, 2025, https://opencv.org/blog/grok-3/
Grok 3 AI Is Here: Is Elon Musk's xAI 'Smartest AI on Earth' a Disruptor or Just Another AI?, accessed April 3, 2025, https://www.fluid.ai/blog/grok-3-is-here-is-xais-smartest-ai
Elon Musk Released Grok 3 : xAI New Reasoning Model - MPG ONE, accessed April 3, 2025, https://mpgone.com/elon-musk-released-grok-3-xai-new-reasoning-model/
Grok 3 Review: I Tested 100+ Prompts and Here's the Truth (2025) - Writesonic Blog, accessed April 3, 2025, https://writesonic.com/blog/grok-3-review
Exploring the latest features of Grok 3: xAI's chatbot - Ultralytics, accessed April 3, 2025, https://www.ultralytics.com/blog/exploring-the-latest-features-of-grok-3-xais-chatbot
How Grok 3 compares to ChatGPT, DeepSeek and other AI rivals - Mashable, accessed April 3, 2025, https://mashable.com/article/grok-3-versus-chatgpt-deepseek-ai-rivals-comparison
Accurate and safe LLM numerical calculations using Interpreter and LLMTool, accessed April 3, 2025, https://jschrier.github.io/blog/2024/01/09/Accurate-and-safe-LLM-numerical-calculations-using-Interpreter-and-LLMTools.html
AHP-Powered LLM Reasoning for Multi-Criteria Evaluation of Open-Ended Responses - ACL Anthology, accessed April 3, 2025, https://aclanthology.org/2024.findings-emnlp.101.pdf
AHP-Powered LLM Reasoning for Multi-Criteria Evaluation of Open-Ended Responses, accessed April 3, 2025, https://www.researchgate.net/publication/384598854_AHP-Powered_LLM_Reasoning_for_Multi-Criteria_Evaluation_of_Open-Ended_Responses
Enhancing Adversarial Robustness of LLMs with Analytic Hierarchy Process - OpenReview, accessed April 3, 2025, https://openreview.net/forum?id=DMUGTMWrKZ&referrer=%5Bthe%20profile%20of%20Jiahao%20Zhao%5D(%2Fprofile%3Fid%3D~Jiahao_Zhao1)
Enhancing Adversarial Robustness of LLMs with Analytic Hierarchy Process - OpenReview, accessed April 3, 2025, https://openreview.net/pdf?id=DMUGTMWrKZ
Is LLM Good at Math? | Restackio, accessed April 3, 2025, https://www.restack.io/p/using-llms-in-software-development-answer-llm-good-at-math-cat-ai
[2501.13936] Evaluating Computational Accuracy of Large Language Models in Numerical Reasoning Tasks for Healthcare Applications - arXiv, accessed April 3, 2025, https://arxiv.org/abs/2501.13936