Paper Content:
Page 1:
AGENTIC RETRIEVAL -AUGMENTED GENERATION : A S URVEY ON
AGENTIC RAG
Aditi Singh
Department of Computer Science
Cleveland State University
Cleveland, OH, USA
a.singh22@csuohio.eduAbul Ehtesham
The Davey Tree Expert Company
Kent, OH, USA
abul.ehtesham@davey.comSaket Kumar
The MathWorks Inc
Natick, MA, USA
saketk@mathworks.com
Tala Talaei Khoei
Khoury College of Computer Science
Roux Institute at Northeastern University
Portland, ME, USA
t.talaeikhoei@northeastern.edu
ABSTRACT
Large Language Models (LLMs) have revolutionized artificial intelligence (AI) by enabling human-
like text generation and natural language understanding. However, their reliance on static training
data limits their ability to respond to dynamic, real-time queries, resulting in outdated or inaccurate
outputs. Retrieval-Augmented Generation (RAG) has emerged as a solution, enhancing LLMs by
integrating real-time data retrieval to provide contextually relevant and up-to-date responses. Despite
its promise, traditional RAG systems are constrained by static workflows and lack the adaptability
required for multi-step reasoning and complex task management.
Agentic Retrieval-Augmented Generation (Agentic RAG) transcends these limitations by embedding
autonomous AI agents into the RAG pipeline. These agents leverage agentic design patterns reflec-
tion, planning, tool use, and multi-agent collaboration to dynamically manage retrieval strategies,
iteratively refine contextual understanding, and adapt workflows through clearly defined operational
structures ranging from sequential steps to adaptive collaboration. This integration enables Agentic
RAG systems to deliver unparalleled flexibility, scalability, and context-awareness across diverse
applications.
This survey provides a comprehensive exploration of Agentic RAG, beginning with its foundational
principles and the evolution of RAG paradigms. It presents a detailed taxonomy of Agentic RAG archi-
tectures, highlights key applications in industries such as healthcare, finance, and education, and exam-
ines practical implementation strategies. Additionally, it addresses challenges in scaling these systems,
ensuring ethical decision-making, and optimizing performance for real-world applications, while
providing detailed insights into frameworks and tools for implementing Agentic RAG1. The GitHub
link for this survey is available at: https://github.com/asinghcsu/AgenticRAG-Survey .
Keywords Large Language Models (LLMs) ·Artificial Intelligence (AI) ·Natural Language Understanding ·
Retrieval-Augmented Generation (RAG) ·Agentic RAG ·Autonomous AI Agents ·Reflection ·Planning ·Tool
Use ·Multi-Agent Collaboration ·Agentic Patterns ·Contextual Understanding ·Dynamic Adaptability ·Scalability ·
Real-Time Data Retrieval ·Taxonomy of Agentic RAG ·Healthcare Applications ·Finance Applications ·Educational
Applications ·Ethical AI Decision-Making ·Performance Optimization ·Multi-Step Reasoning
1GitHub link : https://github.com/asinghcsu/AgenticRAG-SurveyarXiv:2501.09136v3 [cs.AI] 4 Feb 2025
Page 2:
1 Introduction
Large Language Models (LLMs) [ 1,2] [3], such as OpenAI’s GPT-4, Google’s PaLM, and Meta’s LLaMA, have signifi-
cantly transformed artificial intelligence (AI) with their ability to generate human-like text and perform complex natural
language processing tasks. These models have driven innovation across diverse domains, including conversational
agents [ 4], automated content creation, and real-time translation. Recent advancements have extended their capabilities
to multimodal tasks, such as text-to-image and text-to-video generation [ 5], enabling the creation and editing of videos
and images from detailed prompts [6], which broadens the potential applications of generative AI.
Despite these advancements, LLMs face significant limitations due to their reliance on static pre-training data. This
reliance often results in outdated information, hallucinated responses [ 7], and an inability to adapt to dynamic, real-world
scenarios. These challenges emphasize the need for systems that can integrate real-time data and dynamically refine
responses to maintain contextual relevance and accuracy.
Retrieval-Augmented Generation (RAG) [ 8,9] emerged as a promising solution to these challenges. By combining
the generative capabilities of LLMs with external retrieval mechanisms [ 10], RAG systems enhance the relevance and
timeliness of responses. These systems retrieve real-time information from sources such as knowledge bases [ 11], APIs,
or the web, effectively bridging the gap between static training data and the demands of dynamic applications. However,
traditional RAG workflows remain limited by their linear and static design, which restricts their ability to perform
complex multi-step reasoning, integrate deep contextual understanding, and iteratively refine responses.
The evolution of agents [12] has significantly enhanced the capabilities of AI systems. Modern agents, including
LLM-powered and mobile agents [13], are intelligent entities capable of perceiving, reasoning, and autonomously
executing tasks. These agents leverage agentic patterns, such as reflection [14], planning [15], tool use, and multi-agent
collaboration [16], to enhance decision-making and adaptability.
Furthermore, these agents employ agentic workflow patterns [ 12,13], such as prompt chaining, routing, parallelization,
orchestrator-worker models, and evaluator-optimizer , to structure and optimize task execution. By integrating these
patterns, Agentic RAG systems can efficiently manage dynamic workflows and address complex problem-solving
scenarios. The convergence of RAG and agentic intelligence has given rise to Agentic Retrieval-Augmented Generation
(Agentic RAG) [ 14], a paradigm that integrates agents into the RAG pipeline. Agentic RAG enables dynamic retrieval
strategies, contextual understanding, and iterative refinement [ 15], allowing for adaptive and efficient information
processing. Unlike traditional RAG, Agentic RAG employs autonomous agents to orchestrate retrieval, filter relevant
information, and refine responses, excelling in scenarios requiring precision and adaptability. The overview of Agentic
RAG is in figure 1.
This survey explores the foundational principles, taxonomy, and applications of Agentic RAG. It provides a comprehen-
sive overview of RAG paradigms, such as Naïve RAG, Modular RAG, and Graph RAG [ 16], alongside their evolution
into Agentic RAG systems. Key contributions include a detailed taxonomy of Agentic RAG frameworks, applications
across domains such as healthcare [ 17,18], finance, and education [ 19], and insights into implementation strategies,
benchmarks, and ethical considerations.
The structure of this paper is as follows: Section 2 introduces RAG and its evolution, highlighting the limitations of
traditional approaches. Section 3 elaborates on the principles of agentic intelligence and agentic patterns. Section 4
elaborates agentic workflow patterns. Section 5 provides a taxonomy of Agentic RAG systems, including single-agent,
multi-agent, and graph-based frameworks. Section 6 provides comparative analysis of Agentic RAG frameworks.
Section 7 examines applications of Agentic RAG, while Section 8 discusses implementation tools and frameworks.
Section 9 focuses on benchmarks and dataset, and Section 10 concludes with future directions for Agentic RAG systems.
2 Foundations of Retrieval-Augmented Generation
2.1 Overview of Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) represents a significant advancement in the field of artificial intelligence,
combining the generative capabilities of Large Language Models (LLMs) with real-time data retrieval. While LLMs
have demonstrated remarkable capabilities in natural language processing, their reliance on static pre-trained data
often results in outdated or incomplete responses. RAG addresses this limitation by dynamically retrieving relevant
information from external sources and incorporating it into the generative process, enabling contextually accurate and
up-to-date outputs.
2
Page 3:
Figure 1: An Overview of Agentic RAG
2.2 Core Components of RAG
The architecture of RAG systems integrates three primary components (Figure2):
•Retrieveal : Responsible for querying external data sources such as knowledge bases, APIs, or vector databases.
Advanced retrievers leverage dense vector search and transformer-based models to improve retrieval precision
and semantic relevance.
•Augmentation : Processes retrieved data, extracting and summarizing the most relevant information to align
with the query context.
•Generation : Combines retrieved information with the LLM’s pre-trained knowledge to generate coherent,
contextually appropriate responses.
2.3 Evolution of RAG Paradigms
The field of Retrieval-Augmented Generation (RAG) has evolved significantly to address the increasing complexity of
real-world applications, where contextual accuracy, scalability, and multi-step reasoning are critical. What began as
simple keyword-based retrieval has transitioned into sophisticated, modular, and adaptive systems capable of integrating
diverse data sources and autonomous decision-making processes. This evolution underscores the growing need for
RAG systems to handle complex queries efficiently and effectively.
This section examines the progression of RAG paradigms, presenting key stages of development—Naïve RAG,
Advanced RAG, Modular RAG, Graph RAG, and Agentic RAG alongside their defining characteristics, strengths, and
3
Page 4:
Figure 2: Core Components of RAG
limitations. By understanding the evolution of these paradigms, readers can appreciate the advancements made in
retrieval and generative capabilities and their application in various domains
2.3.1 Naïve RAG
Naïve RAG [ 20] represents the foundational implementation of retrieval-augmented generation. Figure 3 illustrates the
simple retrieve-read workflow of Naive RAG, focusing on keyword-based retrieval and static datasets.. These systems
rely on simple keyword-based retrieval techniques, such as TF-IDF and BM25, to fetch documents from static datasets.
The retrieved documents are then used to augment the language model’s generative capabilities.
Figure 3: An Overview of Naive RAG.
Naïve RAG is characterized by its simplicity and ease of implementation, making it suitable for tasks involving
fact-based queries with minimal contextual complexity. However, it suffers from several limitations:
•Lack of Contextual Awareness : Retrieved documents often fail to capture the semantic nuances of the query
due to reliance on lexical matching rather than semantic understanding.
•Fragmented Outputs : The absence of advanced preprocessing or contextual integration often leads to
disjointed or overly generic responses.
•Scalability Issues : Keyword-based retrieval techniques struggle with large datasets, often failing to identify
the most relevant information.
Despite these limitations, Naïve RAG systems provided a critical proof-of-concept for integrating retrieval with
generation, laying the foundation for more sophisticated paradigms.
2.3.2 Advanced RAG
Advanced RAG [ 20] systems build upon the limitations of Naïve RAG by incorporating semantic understanding and
enhanced retrieval techniques. Figure 4 highlights the semantic enhancements in retrieval and the iterative, context-
aware pipeline of Advanced RAG. These systems leverage dense retrieval models, such as Dense Passage Retrieval
(DPR), and neural ranking algorithms to improve retrieval precision.
Key features of Advanced RAG include:
4
Page 5:
Figure 4: Overview of Advanced RAG
•Dense Vector Search : Queries and documents are represented in high-dimensional vector spaces, enabling
better semantic alignment between the user query and retrieved documents.
•Contextual Re-Ranking : Neural models re-rank retrieved documents to prioritize the most contextually
relevant information.
•Iterative Retrieval : Advanced RAG introduces multi-hop retrieval mechanisms, enabling reasoning across
multiple documents for complex queries.
These advancements make Advanced RAG suitable for applications requiring high precision and nuanced understanding,
such as research synthesis and personalized recommendations. However, challenges such as computational overhead
and limited scalability persist, particularly when dealing with large datasets or multi-step queries.
2.3.3 Modular RAG
Modular RAG [ 20] represents the latest evolution in RAG paradigms, emphasizing flexibility and customization.
These systems decompose the retrieval and generation pipeline into independent, reusable components, enabling
domain-specific optimization and task adaptability. Figure 5 demonstrates the modular architecture, showcasing hybrid
retrieval strategies, composable pipelines, and external tool integration.
Key innovations in Modular RAG include:
•Hybrid Retrieval Strategies : Combining sparse retrieval methods (e.g., a sparse encoder-BM25) with dense
retrieval techniques [ 21] (e.g., DPR - Dense Passage Retrieval ) to maximize accuracy across diverse query
types.
•Tool Integration : Incorporating external APIs, databases, or computational tools to handle specialized tasks,
such as real-time data analysis or domain-specific computations.
•Composable Pipelines : Modular RAG enables retrievers, generators, and other components to be replaced,
enhanced, or reconfigured independently, allowing high adaptability to specific use cases.
For instance, a Modular RAG system designed for financial analytics might retrieve live stock prices via APIs, analyze
historical trends using dense retrieval, and generate actionable investment insights through a tailored language model.
This modularity and customization make Modular RAG ideal for complex, multi-domain tasks, offering both scalability
and precision.
5
Page 6:
Figure 5: Overview of Modular RAG
2.3.4 Graph RAG
Graph RAG [ 16] extends traditional Retrieval-Augmented Generation systems by integrating graph-based data structures
as illustrated in Figure 6. These systems leverage the relationships and hierarchies within graph data to enhance multi-
hop reasoning and contextual enrichment. By incorporating graph-based retrieval, Graph RAG enables richer and more
accurate generative outputs, particularly for tasks requiring relational understanding.
Graph RAG is characterized by its ability to:
•Node Connectivity : Captures and reasons over relationships between entities.
•Hierarchical Knowledge Management : Handles structured and unstructured data through graph-based
hierarchies.
•Context Enrichment : Adds relational understanding by leveraging graph-based pathways.
However, Graph RAG has some limitations:
•Limited Scalability : The reliance on graph structures can restrict scalability, especially with extensive data
sources.
•Data Dependency : High-quality graph data is essential for meaningful outputs, limiting its applicability in
unstructured or poorly annotated datasets.
•Complexity of Integration : Integrating graph data with unstructured retrieval systems increases design and
implementation complexity.
Graph RAG is well-suited for applications such as healthcare diagnostics, legal research, and other domains where
reasoning over structured relationships is crucial.
2.3.5 Agentic RAG
Agentic RAG represents a paradigm shift by introducing autonomous agents capable of dynamic decision-making
and workflow optimization. Unlike static systems, Agentic RAG employs iterative refinement and adaptive retrieval
strategies to address complex, real-time, and multi-domain queries. This paradigm leverages the modularity of retrieval
and generation processes while introducing agent-based autonomy.
6
Page 7:
Figure 6: Overview of Graph RAG
Key characteristics of Agentic RAG include:
•Autonomous Decision-Making : Agents independently evaluate and manage retrieval strategies based on
query complexity.
•Iterative Refinement : Incorporates feedback loops to improve retrieval accuracy and response relevance.
•Workflow Optimization : Dynamically orchestrates tasks, enabling efficiency in real-time applications.
Despite its advancements, Agentic RAG faces some challenges:
•Coordination Complexity : Managing interactions between agents requires sophisticated orchestration
mechanisms.
•Computational Overhead : The use of multiple agents increases resource requirements for complex work-
flows.
•Scalability Limitations : While scalable, the dynamic nature of the system can strain computational resources
for high query volumes.
Agentic RAG excels in domains like customer support, financial analytics, and adaptive learning platforms, where
dynamic adaptability and contextual precision are paramount.
2.4 Challenges and Limitations of Traditional RAG Systems
Traditional Retrieval-Augmented Generation (RAG) systems have significantly expanded the capabilities of Large
Language Models (LLMs) by integrating real-time data retrieval. However, these systems still face critical challenges
that hinder their effectiveness in complex, real-world applications. The most notable limitations revolve around
contextual integration ,multi-step reasoning , and scalability and latency issues .
2.4.1 Contextual Integration
Even when RAG systems successfully retrieve relevant information, they often struggle to seamlessly incorporate it
into generated responses. The static nature of retrieval pipelines and limited contextual awareness lead to fragmented,
inconsistent, or overly generic outputs.
Example: A query such as, "What are the latest advancements in Alzheimer’s research and their implications for
early-stage treatment?" might yield relevant research papers and medical guidelines. However, traditional RAG
systems often fail to synthesize these findings into a coherent explanation that connects the new treatments to specific
patient scenarios. Similarly, for a query like, "What are the best sustainable practices for small-scale agriculture in
arid regions?" , traditional systems might retrieve documents on general agricultural methods but overlook critical
sustainability practices tailored to arid environments.
7
Page 8:
Table 1: Comparative Analysis of RAG Paradigms
Paradigm Key Features Strengths
Naïve RAG•Keyword-based retrieval (e.g.,
TF-IDF, BM25)• Simple and easy to implement
• Suitable for fact-based queries
Advanced RAG•Dense retrieval models (e.g.,
DPR)
•Neural ranking and re-ranking
• Multi-hop retrieval• High precision retrieval
• Improved contextual relevance
Modular RAG•Hybrid retrieval (sparse and
dense)
• Tool and API integration
•Composable, domain-specific
pipelines•High flexibility and customization
• Suitable for diverse applications
• Scalable
Graph RAG•Integration of graph-based
structures
• Multi-hop reasoning
•Contextual enrichment via
nodes• Relational reasoning capabilities
• Mitigates hallucinations
• Ideal for structured data tasks
Agentic RAG• Autonomous agents
• Dynamic decision-making
•Iterative refinement and work-
flow optimization• Adaptable to real-time changes
• Scalable for multi-domain tasks
• High accuracy
2.4.2 Multi-Step Reasoning
Many real-world queries require iterative or multi-hop reasoning—retrieving and synthesizing information across
multiple steps. Traditional RAG systems are often ill-equipped to refine retrieval based on intermediate insights or user
feedback, resulting in incomplete or disjointed responses.
Example: A complex query like, "What lessons from renewable energy policies in Europe can be applied to developing
nations, and what are the potential economic impacts?" demands the orchestration of multiple types of information,
including policy data, contextualization for developing regions, and economic analysis. Traditional RAG systems
typically fail to connect these disparate elements into a cohesive response.
2.4.3 Scalability and Latency Issues
As the volume of external data sources grows, querying and ranking large datasets becomes increasingly computationally
intensive. This results in significant latency, which undermines the system’s ability to provide timely responses in
real-time applications.
Example: In time-sensitive settings such as financial analytics orlive customer support , delays caused by querying
multiple databases or processing large document sets can hinder the system’s overall utility. For example, a delay in
retrieving market trends during high-frequency trading could result in missed opportunities.
2.5 Agentic RAG: A Paradigm Shift
Traditional RAG systems, with their static workflows and limited adaptability, often struggle to handle dynamic, multi-
step reasoning and complex real-world tasks. These limitations have spurred the integration of agentic intelligence,
8
Page 9:
resulting in Agentic RAG. By incorporating autonomous agents capable of dynamic decision-making, iterative reasoning,
and adaptive retrieval strategies, Agentic RAG builds on the modularity of earlier paradigms while overcoming their
inherent constraints. This evolution enables more complex, multi-domain tasks to be addressed with enhanced precision
and contextual understanding, positioning Agentic RAG as a cornerstone for next-generation AI applications. In
particular, Agentic RAG systems reduce latency through optimized workflows and refine outputs iteratively, tackling
the very challenges that have historically hindered traditional RAG’s scalability and effectiveness.
3 Core Principles and Background of Agentic Intelligence
Agentic Intelligence forms the foundation of Agentic Retrieval-Augmented Generation (RAG) systems, enabling them
to transcend the static and reactive nature of traditional RAG. By integrating autonomous agents capable of dynamic
decision-making, iterative reasoning, and collaborative workflows, Agentic RAG systems exhibit enhanced adaptability
and precision. This section explores the core principles underpinning agentic intelligence.
Components of an AI Agent. In essence, an AI agent comprises (Figure. 7):
•LLM (with defined Role and Task): Serves as the agent’s primary reasoning engine and dialogue interface.
It interprets user queries, generates responses, and maintains coherence.
•Memory (Short-Term and Long-Term): Captures context and relevant data across interactions. Short-term
memory [ 22] tracks immediate conversation state, while long-term memory [ 22]stores accumulated knowledge
and agent experiences.
•Planning (Reflection & Self-Critique): Guides the agent’s iterative reasoning process through reflection,
query routing, or self-critique[23], ensuring that complex tasks are broken down effectively [24].
•Tools Vector Search, Web Search, APIs, etc.): Expands the agent’s capabilities beyond text generation,
enabling access to external resources, real-time data, or specialized computations.
Figure 7: An Overview of AI Agents
Agentic Patterns [ 25,26] provide structured methodologies that guide the behavior of agents in Agentic Retrieval-
Augmented Generation (RAG) systems. These patterns enable agents to dynamically adapt, plan, and collaborate,
ensuring that the system can handle complex, real-world tasks with precision and scalability. Four key patterns underpin
agentic workflows:
3.1 Reflection
Reflection is a foundational design pattern in agentic workflows, enabling agents to iteratively evaluate and refine their
outputs. By incorporating self-feedback mechanisms, agents can identify and address errors, inconsistencies, and areas
for improvement, enhancing performance across tasks like code generation, text production, and question answering (
9
Page 10:
as shown in Figure 8). In practical use, Reflection involves prompting an agent to critique its outputs for correctness,
style, and efficiency, then incorporating this feedback into subsequent iterations. External tools, such as unit tests or
web searches, can further enhance this process by validating results and highlighting gaps.
In multi-agent systems, Reflection can involve distinct roles, such as one agent generating outputs while another
critiques them, fostering collaborative improvement. For instance, in legal research, agents can iteratively refine
responses by re-evaluating retrieved case law, ensuring accuracy and comprehensiveness. Reflection has demonstrated
significant performance improvements in studies like Self-Refine [27], Reflexion [28], and CRITIC [23].
Figure 8: An Overview of Agentic Self- Reflection
3.2 Planning
Planning [ 24] is a key design pattern in agentic workflows that enables agents to autonomously decompose complex tasks
into smaller, manageable subtasks. This capability is essential for multi-hop reasoning and iterative problem-solving in
dynamic and uncertain scenarios as shown in Figure 9a.
By leveraging planning, agents can dynamically determine the sequence of steps needed to accomplish a larger objective.
This adaptability allows agents to handle tasks that cannot be predefined, ensuring flexibility in decision-making.
While powerful, Planning can produce less predictable outcomes compared to deterministic workflows like Reflection.
Planning is particularly suited for tasks that require dynamic adaptation, where predefined workflows are insufficient.
As the technology matures, its potential to drive innovative applications across domains will continue to grow.
3.3 Tool Use
Tool Use enables agents to extend their capabilities by interacting with external tools, APIs, or computational resources
as illustrated in 9b. This pattern allows agents to gather information, perform computations, and manipulate data beyond
their pre-trained knowledge. By dynamically integrating tools into workflows, agents can adapt to complex tasks and
provide more accurate and contextually relevant outputs.
Modern agentic workflows incorporate tool use for a variety of applications, including information retrieval, computa-
tional reasoning, and interfacing with external systems. The implementation of this pattern has evolved significantly
with advancements like GPT-4’s function calling capabilities and systems capable of managing access to numerous
tools. These developments facilitate sophisticated workflows where agents autonomously select and execute the most
relevant tools for a given task.
While tool use significantly enhances agentic workflows, challenges remain in optimizing the selection of tools,
particularly in contexts with a large number of available options. Techniques inspired by retrieval-augmented generation
(RAG), such as heuristic-based selection, have been proposed to address this issue.
3.4 Multi-Agent
Multi-agent collaboration [ 29] is a key design pattern in agentic workflows that enables task specialization and parallel
processing. Agents communicate and share intermediate results, ensuring the overall workflow remains efficient and
coherent. By distributing subtasks among specialized agents, this pattern improves the scalability and adaptability
10
Page 11:
(a) An Overview of Agentic Planning
(b) An Overview of Tool Use
Figure 9: Overview of Agentic Planning and Tool Use
of complex workflows. Multi-agent systems allow developers to decompose intricate tasks into smaller, manageable
subtasks assigned to different agents. This approach not only enhances task performance but also provides a robust
framework for managing complex interactions. Each agent operates with its own memory and workflow, which can
include the use of tools, reflection, or planning, enabling dynamic and collaborative problem-solving (see Figure 10).
While multi-agent collaboration offers significant potential, it is a less predictable design pattern compared to more
mature workflows like Reflection and Tool Use. Nevertheless, emerging frameworks such as AutoGen, Crew AI, and
LangGraph are providing new avenues for implementing effective multi-agent solutions.
Figure 10: An Overview of MultiAgent
These design patterns form the foundation for the success of Agentic RAG systems. By structuring workflows—from
simple, sequential steps to more adaptive, collaborative processes—these patterns enable systems to dynamically
adapt their retrieval and generative strategies to the diverse and ever-changing demands of real-world environments.
Leveraging these patterns, agents are capable of handling iterative, context-aware tasks that significantly exceed the
capabilities of traditional RAG systems.
4 Agentic Workflow Patterns: Adaptive Strategies for Dynamic Collaboration
Agentic workflow patterns, [ 12,13] structure LLM-based applications to optimize performance, accuracy, and efficiency.
Different approaches are suitable depending on task complexity and processing requirements.
11
Page 12:
4.1 Prompt Chaining: Enhancing Accuracy Through Sequential Processing
Prompt chaining [ 12,13] decomposes a complex task into multiple steps, where each step builds upon the previous
one. This structured approach improves accuracy by simplifying each subtask before moving forward. However, it may
increase latency due to sequential processing.
Figure 11: Illustration of Prompt Chaining Workflow
When to Use: This workflow is most effective when a task can be broken down into fixed subtasks, each contributing
to the final output. It is particularly useful in scenarios where step-by-step reasoning enhances accuracy.
Example Applications:
• Generating marketing content in one language and then translating it into another while preserving nuances.
•Structuring document creation by first generating an outline, verifying its completeness, and then developing
the full text.
4.2 Routing:Directing Inputs to Specialized Processes
Routing [ 12,13] involves classifying an input and directing it to an appropriate specialized prompt or process. This
method ensures distinct queries or tasks are handled separately, improving efficiency and response quality.
Figure 12: Illustration Routing Workflow
When to Use: Ideal for scenarios where different types of input require distinct handling strategies, ensuring optimized
performance for each category.
Example Applications:
•Directing customer service queries into categories such as technical support, refund requests, or general
inquiries.
12
Page 13:
•Assigning simple queries to smaller models for cost efficiency, while complex requests go to advanced models.
4.3 Parallelization: Speeding Up Processing Through Concurrent Execution
Parallelization [ 12,13] divides a task into independent processes that run simultaneously, reducing latency and
improving throughput. It can be categorized into sectioning (independent subtasks) and voting (multiple outputs for
accuracy).
Figure 13: Illustration of Parallelization Workflow
When to Use: Useful when tasks can be executed independently to enhance speed or when multiple outputs improve
confidence.
Example Applications:
•Sectioning: Splitting tasks like content moderation, where one model screens input while another generates a
response.
•Voting: Using multiple models to cross-check code for vulnerabilities or analyze content moderation decisions.
4.4 Orchestrator-Workers: Dynamic Task Delegation
This workflow [ 12,13] features a central orchestrator model that dynamically breaks tasks into subtasks, assigns them
to specialized worker models, and compiles the results. Unlike parallelization, it adapts to varying input complexity.
Figure 14: Illustration of Orchestrator-Workers Workflow
When to Use: Best suited for tasks requiring dynamic decomposition and real-time adaptation, where subtasks are not
predefined.
13
Page 14:
Example Applications:
• Automatically modifying multiple files in a codebase based on the nature of requested changes.
• Conducting real-time research by gathering and synthesizing relevant information from multiple sources.
4.5 Evaluator-Optimizer: Refining Output Through Iteration
The evaluator-optimizer [ 12,13] workflow iteratively improves content by generating an initial output and refining it
based on feedback from an evaluation model.
Figure 15: Illustration of Evaluator-Optimizer Workflow
When to Use: Effective when iterative refinement significantly enhances response quality, especially when clear
evaluation criteria exist.
Example Applications:
• Improving literary translations through multiple evaluation and refinement cycles.
• Conducting multi-round research queries where additional iterations refine search results.
5 Taxonomy of Agentic RAG Systems
Agentic Retrieval-Augmented Generation (RAG) systems can be categorized into distinct architectural frameworks
based on their complexity and design principles. These include single-agent architectures, multi-agent systems, and hi-
erarchical agentic architectures. Each framework is tailored to address specific challenges and optimize performance for
diverse applications. This section provides a detailed taxonomy of these architectures, highlighting their characteristics,
strengths, and limitations.
5.1 Single-Agent Agentic RAG: Router
ASingle-Agent Agentic RAG: [30] serves as a centralized decision-making system where a single agent manages the
retrieval, routing, and integration of information (as shown in Figure. 16). This architecture simplifies the system by
consolidating these tasks into one unified agent, making it particularly effective for setups with a limited number of
tools or data sources.
Workflow
1.Query Submission and Evaluation: The process begins when a user submits a query. A coordinating
agent (or master retrieval agent) receives the query and analyzes it to determine the most suitable sources of
information.
2.Knowledge Source Selection: Based on the query’s type, the coordinating agent chooses from a variety of
retrieval options:
14
Page 15:
•Structured Databases: For queries requiring tabular data access, the system may use a Text-to-SQL
engine that interacts with databases like PostgreSQL or MySQL.
•Semantic Search: When dealing with unstructured information, it retrieves relevant documents (e.g.,
PDFs, books, organizational records) using vector-based retrieval.
•Web Search: For real-time or broad contextual information, the system leverages a web search tool to
access the latest online data.
•Recommendation Systems: For personalized or contextual queries, the system taps into recommendation
engines that provide tailored suggestions.
3.Data Integration and LLM Synthesis: Once the relevant data is retrieved from the chosen sources, it is
passed to a Large Language Model (LLM) . The LLM synthesizes the gathered information, integrating insights
from multiple sources into a coherent and contextually relevant response.
4.Output Generation: Finally, the system delivers a comprehensive, user-facing answer that addresses the
original query. This response is presented in an actionable, concise format and may optionally include
references or citations to the sources used.
Figure 16: An Overview of Single Agentic RAG
Key Features and Advantages.
•Centralized Simplicity: A single agent handles all retrieval and routing tasks, making the architecture
straightforward to design, implement, and maintain.
•Efficiency & Resource Optimization: With fewer agents and simpler coordination, the system demands
fewer computational resources and can handle queries more quickly.
•Dynamic Routing: The agent evaluates each query in real-time, selecting the most appropriate knowledge
source (e.g., structured DB, semantic search, web search).
•Versatility Across Tools: Supports a variety of data sources and external APIs, enabling both structured and
unstructured workflows.
•Ideal for Simpler Systems: Suited for applications with well-defined tasks or limited integration requirements
(e.g., document retrieval, SQL-based workflows).
15
Page 16:
Use Case: Customer Support
Prompt: Can you tell me the delivery status of my order?
System Process (Single-Agent Workflow):
1.Query Submission and Evaluation:
• The user submits the query, which is received by the coordinating agent.
•The coordinating agent analyzes the query and determines the most appropriate sources of
information.
2.Knowledge Source Selection:
• Retrieves tracking details from the order management database.
• Fetches real-time updates from the shipping provider’s API.
•Optionally conducts a web search to identify local conditions affecting delivery, such as weather
or logistical delays.
3.Data Integration and LLM Synthesis:
•The relevant data is passed to the LLM, which synthesizes the information into a coherent
response.
4.Output Generation:
•The system generates an actionable and concise response, providing live tracking updates and
potential alternatives.
Response:
Integrated Response: “Your package is currently in transit and expected to arrive tomorrow evening. The live
tracking from UPS indicates it is at the regional distribution center.”
5.2 Multi-Agent Agentic RAG Systems:
Multi-Agent RAG [30] represents a modular and scalable evolution of single-agent architectures, designed to handle
complex workflows and diverse query types by leveraging multiple specialized agents (as shown in Figure 17). Instead
of relying on a single agent to manage all tasks—reasoning, retrieval, and response generation—this system distributes
responsibilities across multiple agents, each optimized for a specific role or data source.
Workflow
1.Query Submission : The process begins with a user query, which is received by a coordinator agent or master
retrieval agent. This agent acts as the central orchestrator, delegating the query to specialized retrieval agents
based on the query’s requirements.
2.Specialized Retrieval Agents : The query is distributed among multiple retrieval agents, each focusing on a
specific type of data source or task. Examples include:
•Agent 1 : Handles structured queries, such as interacting with SQL-based databases like PostgreSQL or
MySQL.
•Agent 2 : Manages semantic searches for retrieving unstructured data from sources like PDFs, books, or
internal records.
•Agent 3 : Focuses on retrieving real-time public information from web searches or APIs.
•Agent 4 : Specializes in recommendation systems, delivering context-aware suggestions based on user
behavior or profiles.
3.Tool Access and Data Retrieval : Each agent routes the query to the appropriate tools or data sources within
its domain, such as:
•Vector Search : For semantic relevance.
•Text-to-SQL : For structured data.
•Web Search : For real-time public information.
•APIs : For accessing external services or proprietary systems.
The retrieval process is executed in parallel, allowing for efficient processing of diverse query types.
16
Page 17:
Figure 17: An Overview of Multi-Agent Agentic RAG Systems
4.Data Integration and LLM Synthesis : Once retrieval is complete, the data from all agents is passed to a
Large Language Model (LLM) . The LLM synthesizes the retrieved information into a coherent and contextually
relevant response, integrating insights from multiple sources seamlessly.
5.Output Generation : The system generates a comprehensive response, which is delivered back to the user in
an actionable and concise format.
Key Features and Advantages.
•Modularity : Each agent operates independently, allowing for seamless addition or removal of agents based on
system requirements.
•Scalability : Parallel processing by multiple agents enables the system to handle high query volumes efficiently.
•Task Specialization : Each agent is optimized for a specific type of query or data source, improving accuracy
and retrieval relevance.
•Efficiency : By distributing tasks across specialized agents, the system minimizes bottlenecks and enhances
performance for complex workflows.
•Versatility : Suitable for applications spanning multiple domains, including research, analytics, decision-
making, and customer support.
Challenges
•Coordination Complexity : Managing inter-agent communication and task delegation requires sophisticated
orchestration mechanisms.
•Computational Overhead : Parallel processing of multiple agents can increase resource usage.
•Data Integration : Synthesizing outputs from diverse sources into a cohesive response is non-trivial and
requires advanced LLM capabilities.
17
Page 18:
Use Case: Multi-Domain Research Assistant
Prompt: What are the economic and environmental impacts of renewable energy adoption in Europe?
System Process (Multi-Agent Workflow):
•Agent 1: Retrieves statistical data from economic databases using SQL-based queries.
•Agent 2: Searches for relevant academic papers using semantic search tools.
•Agent 3: Performs a web search for recent news and policy updates on renewable energy.
•Agent 4: Consults a recommendation system to suggest related content, such as reports or expert
commentary.
Response:
Integrated Response: “Adopting renewable energy in Europe has led to a 20% reduction in greenhouse gas
emissions over the past decade, according to EU policy reports. Economically, renewable energy investments
have generated approximately 1.2 million jobs, with significant growth in solar and wind sectors. Recent
academic studies also highlight potential trade-offs in grid stability and energy storage costs.”
5.3 Hierarchical Agentic RAG Systems
Hierarchical Agentic RAG: [14] systems employ a structured, multi-tiered approach to information retrieval and
processing, enhancing both efficiency and strategic decision-making as shown in Figure 18. Agents are organized in
a hierarchy, with higher-level agents overseeing and directing lower-level agents. This structure enables multi-level
decision-making, ensuring that queries are handled by the most appropriate resources.
Figure 18: An illustration of Hierarchical Agentic RAG
Workflow
1.Query Reception : A user submits a query, received by a top-tier agent responsible for initial assessment and
delegation.
2.Strategic Decision-Making : The top-tier agent evaluates the query’s complexity and decides which subor-
dinate agents or data sources to prioritize. Certain databases, APIs, or retrieval tools may be deemed more
reliable or relevant based on the query’s domain.
3.Delegation to Subordinate Agents : The top-tier agent assigns tasks to lower-level agents specialized in
particular retrieval methods (e.g., SQL databases, web search, or proprietary systems). These agents execute
their assigned tasks independently.
18
Page 19:
4.Aggregation and Synthesis : The results from subordinate agents are collected and integrated by the higher-
level agent, which synthesizes the information into a coherent response.
5.Response Delivery : The final, synthesized answer is returned to the user, ensuring that the response is both
comprehensive and contextually relevant.
Key Features and Advantages.
•Strategic Prioritization : Top-tier agents can prioritize data sources or tasks based on query complexity,
reliability, or context.
•Scalability : Distributing tasks across multiple agent tiers enables handling of highly complex or multi-faceted
queries.
•Enhanced Decision-Making : Higher-level agents apply strategic oversight to improve overall accuracy and
coherence of responses.
Challenges
•Coordination Complexity : Maintaining robust inter-agent communication across multiple levels can increase
orchestration overhead.
•Resource Allocation : Efficiently distributing tasks among tiers to avoid bottlenecks is non-trivial.
Use Case: Financial Analysis System
Prompt: What are the best investment options given the current market trends in renewable energy?
System Process (Hierarchical Agentic Workflow):
1.Top-Tier Agent : Assesses the query’s complexity and prioritizes reliable financial databases and
economic indicators over less validated data sources.
2.Mid-Level Agent : Retrieves real-time market data (e.g., stock prices, sector performance) from
proprietary APIs and structured SQL databases.
3.Lower-Level Agent(s) : Conducts web searches for recent policy announcements and consults recom-
mendation systems that track expert opinions and news analytics.
4.Aggregation and Synthesis : The top-tier agent compiles the results, integrating quantitative data with
policy insights.
Response:
Integrated Response: “Based on current market data, renewable energy stocks have shown a 15% growth over
the past quarter, driven by supportive government policies and heightened investor interest. Analysts suggest
that wind and solar sectors, in particular, may experience continued momentum, while emerging technologies
like green hydrogen present moderate risk but potentially high returns.”
5.4 Agentic Corrective RAG
Corrective RAG : introduces mechanisms to self-correct retrieval results, enhancing document utilization and improving
response generation quality as demonstrated in Figure 19. By embedding intelligent agents into the workflow, Corrective
RAG [ 31] [32] ensures iterative refinement of context documents and responses, minimizing errors and maximizing
relevance.
Key Idea of Corrective RAG: The core principle of Corrective RAG lies in its ability to evaluate retrieved documents
dynamically, perform corrective actions, and refine queries to enhance the quality of generated responses. Corrective
RAG adjusts its approach as follows:
•Document Relevance Evaluation: Retrieved documents are assessed for relevance by the Relevance Evalua-
tion Agent . Documents below the relevance threshold trigger corrective steps.
•Query Refinement and Augmentation: Queries are refined by the Query Refinement Agent , which leverages
semantic understanding to optimize retrieval for better results.
19
Page 20:
Figure 19: Overview of Agentic Corrective RAG
•Dynamic Retrieval from External Sources: When context is insufficient, the External Knowledge Retrieval
Agent performs web searches or accesses alternative data sources to supplement the retrieved documents.
•Response Synthesis: All validated and refined information is passed to the Response Synthesis Agent for final
response generation.
Workflow: The Corrective RAG system is built on five key agents:
1.Context Retrieval Agent: Responsible for retrieving initial context documents from a vector database.
2.Relevance Evaluation Agent: Assesses the retrieved documents for relevance and flags any irrelevant or
ambiguous documents for corrective actions.
3.Query Refinement Agent: Rewrites queries to improve retrieval, leveraging semantic understanding to
optimize results.
4.External Knowledge Retrieval Agent: Performs web searches or accesses alternative data sources when the
context documents are insufficient.
5.Response Synthesis Agent: Synthesizes all validated information into a coherent and accurate response.
Key Features and Advantages:
•Iterative Correction: Ensures high response accuracy by dynamically identifying and correcting irrelevant or
ambiguous retrieval results.
•Dynamic Adaptability: Incorporates real-time web searches and query refinement for enhanced retrieval
precision.
•Agentic Modularity: Each agent performs specialized tasks, ensuring efficient and scalable operation.
•Factuality Assurance: By validating all retrieved and generated content, Corrective RAG minimizes the risk
of hallucination or misinformation.
20
Page 21:
Use Case: Academic Research Assistant
Prompt: What are the latest findings in generative AI research?
System Process (Corrective RAG Workflow):
1.Query Submission: A user submits the query to the system.
2.Context Retrieval:
•TheContext Retrieval Agent retrieves initial documents from a database of published papers on
generative AI.
• The retrieved documents are passed to the next step for evaluation.
3.Relevance Evaluation:
• The Relevance Evaluation Agent assesses the documents for alignment with the query.
•Documents are classified into relevant, ambiguous, or irrelevant categories. Irrelevant documents
are flagged for corrective actions.
4.Corrective Actions (if needed):
• The Query Refinement Agent rewrites the query to improve specificity and relevance.
•TheExternal Knowledge Retrieval Agent performs web searches to fetch additional papers and
reports from external sources.
5.Response Synthesis:
•TheResponse Synthesis Agent integrates validated documents into a coherent and comprehensive
summary.
Response:
Integrated Response: “Recent findings in generative AI highlight advancements in diffusion models, reinforce-
ment learning for text-to-video tasks, and optimization techniques for large-scale model training. For more
details, refer to studies published in NeurIPS 2024 and AAAI 2025.”
5.5 Adaptive Agentic RAG
Adaptive Retrieval-Augmented Generation (Adaptive RAG) [33] enhances the flexibility and efficiency of large
language models (LLMs) by dynamically adjusting query handling strategies based on the complexity of the incoming
query. Unlike static retrieval workflows, Adaptive RAG [ 34] employs a classifier to assess query complexity and
determine the most appropriate approach, ranging from single-step retrieval to multi-step reasoning, or even bypassing
retrieval altogether for straightforward queries as illustrated in Figure 20.
Figure 20: An Overview of Adaptive Agentic RAG
Key Idea of Adaptive RAG The core principle of Adaptive RAG lies in its ability to dynamically tailor retrieval
strategies based on the complexity of the query. Adaptive RAG adjusts its approach as follows:
21
Page 22:
•Straightforward Queries: For fact-based questions that require no additional retrieval (e.g., "What is the
boiling point of water?" ), the system directly generates an answer using pre-existing knowledge.
•Simple Queries: For moderately complex tasks requiring minimal context (e.g., "What is the status of my
latest electricity bill?" ), the system performs a single-step retrieval to fetch the relevant details.
•Complex Queries: For multi-layered queries requiring iterative reasoning (e.g., "How has the population of
City X changed over the past decade, and what are the contributing factors?" ), the system employs multi-step
retrieval, progressively refining intermediate results to provide a comprehensive answer.
Workflow: The Adaptive RAG system is built on three primary components:
1.Classifier Role:
• A smaller language model analyzes the query to predict its complexity.
•The classifier is trained using automatically labeled datasets, derived from past model outcomes and
query patterns.
2.Dynamic Strategy Selection:
•For straightforward queries, the system avoids unnecessary retrieval, directly leveraging the LLM for
response generation.
• For simple queries, it employs a single-step retrieval process to fetch relevant context.
•For complex queries, it activates multi-step retrieval to ensure iterative refinement and enhanced reasoning.
3.LLM Integration:
• The LLM synthesizes retrieved information into a coherent response.
• Iterative interactions between the LLM and the classifier enable refinement for complex queries.
Key Features and Advantages
•Dynamic Adaptability: Adjusts retrieval strategies based on query complexity, optimizing both computational
efficiency and response accuracy.
•Resource Efficiency: Minimizes unnecessary overhead for simple queries while ensuring thorough processing
for complex ones.
•Enhanced Accuracy: Iterative refinement ensures that complex queries are resolved with high precision.
•Flexibility: Can be extended to incorporate additional pathways, such as domain-specific tools or external
APIs.
22
Page 23:
Use Case: Customer Support Assistant
Prompt: Why is my package delayed, and what alternatives do I have?
System Process (Adaptive RAG Workflow):
1.Query Classification:
•The classifier analyzes the query and determines it to be complex, requiring multi-step reasoning.
2.Dynamic Strategy Selection:
• The system activates a multi-step retrieval process based on the complexity classification.
3.Multi-Step Retrieval:
• Retrieves tracking details from the order database.
• Fetches real-time status updates from the shipping provider API.
• Conducts a web search for external factors such as weather conditions or local disruptions.
4.Response Synthesis:
•The LLM integrates all retrieved information, synthesizing a comprehensive and actionable
response.
Response:
Integrated Response: “Your package is delayed due to severe weather conditions in your region. It is currently
at the local distribution center and will be delivered in 2 days. Alternatively, you may opt for a local pickup
from the facility.”
5.6 Graph-Based Agentic RAG
5.6.1 Agent-G: Agentic Framework for Graph RAG
Agent-G [8]: introduces a novel agentic architecture that integrates graph knowledge bases with unstructured document
retrieval. By combining structured and unstructured data sources, this framework enhances retrieval-augmented
generation (RAG) systems with improved reasoning and retrieval accuracy. It employs modular retriever banks,
dynamic agent interaction, and feedback loops to ensure high-quality outputs as shown in Figure 21.
Figure 21: An Overview of Agent-G: Agentic Framework for Graph RAG [8]
23
Page 24:
Key Idea of Agent-G The core principle of Agent-G lies in its ability to dynamically assign retrieval tasks to
specialized agents, leveraging both graph knowledge bases and textual documents. Agent-G adjusts its retrieval strategy
as follows:
•Graph Knowledge Bases: Structured data is used to extract relationships, hierarchies, and connections (e.g.,
disease-to-symptom mappings in healthcare).
•Unstructured Documents: Traditional text retrieval systems provide contextual information to complement
graph data.
•Critic Module: Evaluates the relevance and quality of retrieved information, ensuring alignment with the
query.
•Feedback Loops: Refines retrieval and synthesis through iterative validation and re-querying.
Workflow: The Agent-G system is built on four primary components:
1.Retriever Bank:
• A modular set of agents specializes in retrieving graph-based or unstructured data.
• Agents dynamically select relevant sources based on the query’s requirements.
2.Critic Module:
• Validates retrieved data for relevance and quality.
• Flags low-confidence results for re-retrieval or refinement.
3.Dynamic Agent Interaction:
• Task-specific agents collaborate to integrate diverse data types.
• Ensures cohesive retrieval and synthesis across graph and text sources.
4.LLM Integration:
• Synthesizes validated data into a coherent response.
• Iterative feedback from the critic ensures alignment with the query’s intent.
Key Features and Advantages
•Enhanced Reasoning: Combines structured relationships from graphs with contextual information from
unstructured documents.
•Dynamic Adaptability: Adjusts retrieval strategies dynamically based on query requirements.
•Improved Accuracy: Critic module reduces the risk of irrelevant or low-quality data in responses.
•Scalable Modularity: Supports the addition of new agents for specialized tasks, enhancing scalability.
24
Page 25:
Use Case: Healthcare Diagnostics
Prompt: What are the common symptoms of Type 2 Diabetes, and how are they related to heart disease?
System Process (Agent-G Workflow):
1.Query Reception and Assignment: The system receives the query and identifies the need for both
graph-structured and unstructured data to answer the question comprehensively.
2.Graph Retriever:
•Extracts relationships between Type 2 Diabetes and heart disease from a medical knowledge
graph.
•Identifies shared risk factors such as obesity and high blood pressure by exploring graph hierar-
chies and relationships.
3.Document Retriever:
•Retrieves descriptions of Type 2 Diabetes symptoms (e.g., increased thirst, frequent urination,
fatigue) from medical literature.
• Adds contextual information to complement the graph-based insights.
4.Critic Module:
• Evaluates the relevance and quality of the retrieved graph data and document data.
• Flags low-confidence results for refinement or re-querying.
5.Response Synthesis: The LLM integrates validated data from the Graph Retriever and Document
Retriever into a coherent response, ensuring alignment with the query’s intent.
Response:
Integrated Response: “Type 2 Diabetes symptoms include increased thirst, frequent urination, and fatigue.
Studies show a 50% correlation between diabetes and heart disease, primarily through shared risk factors such
as obesity and high blood pressure.”
5.6.2 GeAR: Graph-Enhanced Agent for Retrieval-Augmented Generation
GeAR [35]: introduces an agentic framework that enhances traditional Retrieval-Augmented Generation (RAG) systems
by incorporating graph-based retrieval mechanisms. By leveraging graph expansion techniques and an agent-based
architecture, GeAR addresses challenges in multi-hop retrieval scenarios, improving the system’s ability to handle
complex queries as shown in Figure 22.
Key Idea of GeAR GeAR advances RAG performance through two primary innovations:
•Graph Expansion: Enhances conventional base retrievers (e.g., BM25) by expanding the retrieval process to
include graph-structured data, enabling the system to capture complex relationships and dependencies between
entities.
•Agent Framework: Incorporates an agent-based architecture that utilizes graph expansion to manage retrieval
tasks more effectively, allowing for dynamic and autonomous decision-making in the retrieval process.
Workflow: The GeAR system operates through the following components:
1.Graph Expansion Module:
•Integrates graph-based data into the retrieval process, allowing the system to consider relationships
between entities during retrieval.
•Enhances the base retriever’s ability to handle multi-hop queries by expanding the search space to include
connected entities.
2.Agent-Based Retrieval:
•Employs an agent framework to manage the retrieval process, enabling dynamic selection and combination
of retrieval strategies based on the query’s complexity.
•Agents can autonomously decide to utilize graph-expanded retrieval paths to improve the relevance and
accuracy of retrieved information.
25
Page 26:
3.LLM Integration:
•Combines the retrieved information, enriched by graph expansion, with the capabilities of a Large
Language Model (LLM) to generate coherent and contextually relevant responses.
•The integration ensures that the generative process is informed by both unstructured documents and
structured graph data.
Figure 22: An Overview of GeAR: Graph-Enhanced Agent for Retrieval-Augmented Generation[35]
Key Features and Advantages
•Enhanced Multi-Hop Retrieval: GeAR’s graph expansion allows the system to handle complex queries that
require reasoning over multiple interconnected pieces of information.
•Agentic Decision-Making: The agent framework enables dynamic and autonomous selection of retrieval
strategies, improving efficiency and relevance.
•Improved Accuracy: By incorporating structured graph data, GeAR enhances the precision of retrieved
information, leading to more accurate and contextually appropriate responses.
•Scalability: The modular nature of the agent framework allows for the integration of additional retrieval
strategies and data sources as needed.
26
Page 27:
Use Case: Multi-Hop Question Answering
Prompt: Which author influenced the mentor of J.K. Rowling?
System Process (GeAR Workflow):
1.Top-Tier Agent : Evaluates the query’s multi-hop nature and determines that a combination of graph
expansion and document retrieval is necessary to answer the question.
2.Graph Expansion Module :
• Identifies that J.K. Rowling’s mentor is a key entity in the query.
•Traces the literary influences on that mentor by exploring graph-structured data on literary
relationships.
3.Agent-Based Retrieval :
•An agent autonomously selects the graph-expanded retrieval path to gather relevant information
about the mentor’s influences.
•Integrates additional context by querying textual data sources for unstructured details about the
mentor and their influences.
4.Response Synthesis : Combines insights from the graph and document retrieval processes using the
LLM to generate a response that accurately reflects the complex relationships in the query.
Response:
Integrated Response: “J.K. Rowling’s mentor, [Mentor Name], was heavily influenced by [Author Name],
known for their [notable works or genre]. This connection highlights the layered relationships in literary history,
where influential ideas often pass through multiple generations of authors.”
5.7 Agentic Document Workflows in Agentic RAG
Agentic Document Workflows (ADW) [36] extend traditional Retrieval-Augmented Generation (RAG) paradigms by
enabling end-to-end knowledge work automation. These workflows orchestrate complex document-centric processes,
integrating document parsing, retrieval, reasoning, and structured outputs with intelligent agents (see Figure 23). ADW
systems address limitations of Intelligent Document Processing (IDP) and RAG by maintaining state, coordinating
multi-step workflows, and applying domain-specific logic to documents.
Workflow
1.Document Parsing and Information Structuring:
•Documents are parsed using enterprise-grade tools (e.g., LlamaParse) to extract relevant data fields such
as invoice numbers, dates, vendor information, line items, and payment terms.
• Structured data is organized for downstream processing.
2.State Maintenance Across Processes:
•The system maintains state about document context, ensuring consistency and relevance across multi-step
workflows.
• Tracks the progression of the document through various processing stages.
3.Knowledge Retrieval:
• Relevant references are retrieved from external knowledge bases (e.g., LlamaCloud) or vector indexes.
• Retrieves real-time, domain-specific guidelines for enhanced decision-making.
4.Agentic Orchestration:
•Intelligent agents apply business rules, perform multi-hop reasoning, and generate actionable recommen-
dations.
• Orchestrates components such as parsers, retrievers, and external APIs for seamless integration.
5.Actionable Output Generation:
• Outputs are presented in structured formats, tailored to specific use cases.
• Recommendations and extracted insights are synthesized into concise and actionable reports.
27
Page 28:
Figure 23: An Overview of Agentic Document Workflows (ADW)
[36]
Use Case: Invoice Payments Workflow
Prompt: Generate a payment recommendation report based on the submitted invoice and associated vendor
contract terms.
System Process (ADW Workflow):
1.Parse the invoice to extract key details such as invoice number, date, vendor information, line items,
and payment terms.
2.Retrieve the corresponding vendor contract to verify payment terms and identify any applicable
discounts or compliance requirements.
3.Generate a payment recommendation report that includes original amount due, potential early payment
discounts, budget impact analysis, and strategic payment actions.
Response: Integrated Response: "Invoice INV-2025-045 for $15,000.00 has been processed. An early payment
discount of 2% is available if paid by 2025-04-10, reducing the amount due to $14,700.00. A bulk order discount
of 5% was applied as the subtotal exceeded $10,000.00. It is recommended to approve early payment to save
2% and ensure timely fund allocation for upcoming project phases."
Key Features and Advantages
•State Maintenance: Tracks document context and workflow stage, ensuring consistency across processes.
•Multi-Step Orchestration: Handles complex workflows involving multiple components and external tools.
•Domain-Specific Intelligence: Applies tailored business rules and guidelines for precise recommendations.
•Scalability: Supports large-scale document processing with modular and dynamic agent integration.
•Enhanced Productivity: Automates repetitive tasks while augmenting human expertise in decision-making.
28
Page 29:
6 Comparative Analysis of Agentic RAG Frameworks
Table 2 provides a comprehensive comparative analysis of the three architectural frameworks: Traditional RAG, Agentic
RAG, and Agentic Document Workflows (ADW). This analysis highlights their respective strengths, weaknesses, and
best-fit scenarios, offering valuable insights into their applicability across diverse use cases.
Table 2: Comparative Analysis: Traditional RAG vs Agentic RAG vs Agentic Document Workflows (ADW)
Feature Traditional RAG Agentic RAG Agentic Document
Workflows (ADW)
Focus Isolated retrieval and
generation tasksMulti-agent
collaboration and
reasoningDocument-centric
end-to-end workflows
Context Maintenance Limited Enabled through
memory modulesMaintains state across
multi-step workflows
Dynamic Adaptability Minimal High Tailored to document
workflows
Workflow
OrchestrationAbsent Orchestrates multi-agent
tasksIntegrates multi-step
document processing
Use of External
Tools/APIsBasic integration (e.g.,
retrieval tools)Extends via tools like
APIs and knowledge
basesDeeply integrates business
rules and domain-specific
tools
Scalability Limited to small
datasets or queriesScalable for multi-agent
systemsScales for multi-domain
enterprise workflows
Complex Reasoning Basic (e.g., simple
Q&A)Multi-step reasoning
with agentsStructured reasoning across
documents
Primary Applications QA systems, knowledge
retrievalMulti-domain
knowledge and
reasoningContract review, invoice
processing, claims analysis
Strengths Simplicity, quick setup High accuracy,
collaborative reasoningEnd-to-end automation,
domain-specific intelligence
Challenges Poor contextual
understandingCoordination
complexityResource overhead, domain
standardization
The comparative analysis underscores the evolutionary trajectory from Traditional RAG to Agentic RAG and further to
Agentic Document Workflows (ADW). While Traditional RAG offers simplicity and ease of deployment for basic tasks,
Agentic RAG introduces enhanced reasoning and scalability through multi-agent collaboration. ADW builds upon these
advancements by providing robust, document-centric workflows that facilitate end-to-end automation and integration
with domain-specific processes. Understanding the strengths and limitations of each framework is crucial for selecting
the most appropriate architecture to meet specific application requirements and operational demands.
7 Applications of Agentic RAG
Agentic Retrieval-Augmented Generation (RAG) systems have demonstrated transformative potential across a variety
of domains. By combining real-time data retrieval, generative capabilities, and autonomous decision-making, these
systems address complex, dynamic, and multi-modal challenges. This section explores the key applications of Agentic
RAG, providing detailed insights into how these systems are shaping industries such as customer support, healthcare,
finance, education, legal workflows, and creative industries.
7.1 Customer Support and Virtual Assistants
Agentic RAG systems are revolutionizing customer support by enabling real-time, context-aware query resolution.
Traditional chatbots and virtual assistants often rely on static knowledge bases, leading to generic or outdated responses.
29
Page 30:
By contrast, Agentic RAG systems dynamically retrieve the most relevant information, adapt to the user’s context, and
generate personalized responses.
Use Case: Twitch Ad Sales Enhancement [37]
For instance, Twitch leveraged an agentic workflow with RAG on Amazon Bedrock to streamline ad sales. The system
dynamically retrieved advertiser data, historical campaign performance, and audience demographics to generate detailed
ad proposals, significantly boosting operational efficiency.
Key Benefits:
•Improved Response Quality : Personalized and context-aware replies enhance user engagement.
•Operational Efficiency : Reduces the workload on human support agents by automating complex queries.
•Real-Time Adaptability : Dynamically integrates evolving data, such as live service outages or pricing
updates.
7.2 Healthcare and Personalized Medicine
In healthcare, the integration of patient-specific data with the latest medical research is critical for informed decision-
making. Agentic RAG systems enable this by retrieving real-time clinical guidelines, medical literature, and patient
history to assist clinicians in diagnostics and treatment planning.
Use Case: Patient Case Summary [38]
Agentic RAG systems have been applied in generating patient case summaries. For example, by integrating electronic
health records (EHR) and up-to-date medical literature, the system generates comprehensive summaries for clinicians
to make faster and more informed decisions.
Key Benefits:
•Personalized Care : Tailors recommendations to individual patient needs.
•Time Efficiency : Streamlines the retrieval of relevant research, saving valuable time for healthcare providers.
•Accuracy : Ensures recommendations are based on the latest evidence and patient-specific parameters.
7.3 Legal and Contract Analysis
Agentic RAG systems are redefining how legal workflows are conducted, offering tools for rapid document analysis and
decision-making.
Use Case: Contract Review [39]
A legal agentic RAG system can analyze contracts, extract critical clauses, and identify potential risks. By combining
semantic search capabilities with legal knowledge graphs, it automates the tedious process of contract review, ensuring
compliance and mitigating risks.
Key Benefits:
•Risk Identification : Automatically flags clauses that deviate from standard terms.
•Efficiency : Reduces the time spent on contract review processes.
•Scalability : Handles large volumes of contracts simultaneously.
7.4 Finance and Risk Analysis
Agentic RAG systems are transforming the finance industry by providing real-time insights for investment decisions,
market analysis, and risk management. These systems integrate live data streams, historical trends, and predictive
modeling to generate actionable outputs.
Use Case: Auto Insurance Claims Processing [40]
In auto insurance, Agentic RAG can automate claim processing. For example, by retrieving policy details and combining
them with accident data, it generates claim recommendations while ensuring compliance with regulatory requirements.
Key Benefits:
•Real-Time Analytics : Delivers insights based on live market data.
30
Page 31:
•Risk Mitigation : Identifies potential risks using predictive analysis and multi-step reasoning.
•Enhanced Decision-Making : Combines historical and live data for comprehensive strategies.
7.5 Education and Personalized Learning
Education is another domain where Agentic RAG systems are making significant strides. These systems enable adaptive
learning by generating explanations, study materials, and feedback tailored to the learner’s progress and preferences.
Use Case: Research Paper Generation [41]
In higher education, Agentic RAG has been used to assist researchers by synthesizing key findings from multiple
sources. For instance, a researcher querying, “What are the latest advancements in quantum computing?” receives a
concise summary enriched with references, enhancing the quality and efficiency of their work.
Key Benefits:
•Tailored Learning Paths : Adapts content to individual student needs and performance levels.
•Engaging Interactions : Provides interactive explanations and personalized feedback.
•Scalability : Supports large-scale deployments for diverse educational environments.
7.6 Graph-Enhanced Applications in Multimodal Workflows
Graph-Enhanced Agentic RAG (GEAR) combines graph structures with retrieval mechanisms, making it particularly
effective in multimodal workflows where interconnected data sources are essential.
Use Case: Market Survey Generation
GEAR enables the synthesis of text, images, and videos for marketing campaigns. For example, querying, “What
are the emerging trends in eco-friendly products?” generates a detailed report enriched with customer preferences,
competitor analysis, and multimedia content.
Key Benefits:
•Multi-Modal Capabilities : Integrates text, image, and video data for comprehensive outputs.
•Enhanced Creativity : Generates innovative ideas and solutions for marketing and entertainment.
•Dynamic Adaptability : Adapts to evolving market trends and customer needs.
The applications of Agentic RAG systems span a wide range of industries, showcasing their versatility and transformative
potential. From personalized customer support to adaptive education and graph-enhanced multimodal workflows, these
systems address complex, dynamic, and knowledge-intensive challenges. By integrating retrieval, generation, and
agentic intelligence, Agentic RAG systems are paving the way for next-generation AI applications.
8 Tools and Frameworks for Agentic RAG
Agentic Retrieval-Augmented Generation (RAG) systems represent a significant evolution in combining retrieval,
generation, and agentic intelligence. These systems extend the capabilities of traditional RAG by integrating decision-
making, query reformulation, and adaptive workflows. The following tools and frameworks provide robust support for
developing Agentic RAG systems, addressing the complex requirements of real-world applications.
Key Tools and Frameworks:
•LangChain and LangGraph: LangChain [ 42] provides modular components for building RAG pipelines,
seamlessly integrating retrievers, generators, and external tools. LangGraph complements this by introducing
graph-based workflows that support loops, state persistence, and human-in-the-loop interactions, enabling
sophisticated orchestration and self-correction mechanisms in agentic systems.
•LlamaIndex: LlamaIndex’s [ 43] Agentic Document Workflows (ADW) enable end-to-end automation of
document processing, retrieval, and structured reasoning. It introduces a meta-agent architecture where
sub-agents manage smaller document sets, coordinating through a top-level agent for tasks such as compliance
analysis and contextual understanding.
•Hugging Face Transformers and Qdrant: Hugging Face [ 44] offers pre-trained models for embedding and
generation tasks, while Qdrant [ 45] enhances retrieval workflows with adaptive vector search capabilities,
allowing agents to optimize performance by dynamically switching between sparse and dense vector methods.
31
Page 32:
•CrewAI and AutoGen: These frameworks emphasize multi-agent architectures. CrewAI [ 46] supports
hierarchical and sequential processes, robust memory systems, and tool integrations. AG2 [ 47] (formerly
knows as AutoGen [ 48,49]) excels in multi-agent collaboration with advanced support for code generation,
tool execution, and decision-making.
•OpenAI Swarm Framework: An educational framework designed for ergonomic, lightweight multi-agent
orchestration [50], emphasizing agent autonomy and structured collaboration.
•Agentic RAG with Vertex AI: Developed by Google, Vertex AI [ 51] integrates seamlessly with Agentic
Retrieval-Augmented Generation (RAG), providing a platform to build, deploy, and scale machine learning
models while leveraging advanced AI capabilities for robust, contextually aware retrieval and decision-making
workflows.
•Semantic Kernel: Semantic Kernel [ 52,53] is an open-source SDK by Microsoft that integrates large language
models (LLMs) into applications. It supports agentic patterns, enabling the creation of autonomous AI agents
for natural language understanding, task automation, and decision-making. It has been used in scenarios like
ServiceNow’s P1 incident management to facilitate real-time collaboration, automate task execution, and
retrieve contextual information seamlessly
•Amazon Bedrock for Agentic RAG: Amazon Bedrock [ 37] provides a robust platform for implementing
Agentic Retrieval-Augmented Generation (RAG) workflows.
•IBM Watson and Agentic RAG: IBM’s watsonx.ai [ 54] supports building Agentic RAG systems, exemplified
by using the Granite-3-8B-Instruct model to answer complex queries by integrating external information and
enhancing response accuracy.
•Neo4j and Vector Databases: Neo4j, a prominent open-source graph database, excels in handling complex
relationships and semantic queries. Alongside Neo4j, vector databases like Weaviate, Pinecone, Milvus, and
Qdrant provide efficient similarity search and retrieval capabilities, forming the backbone of high-performance
Agentic Retrieval-Augmented Generation (RAG) workflows.
9 Benchmarks and Datasets
Current benchmarks and datasets provide valuable insights into evaluating Retrieval-Augmented Generation (RAG)
systems, including those with agentic and graph-based enhancements. While some are explicitly designed for RAG,
others are adapted to test retrieval, reasoning, and generation capabilities in diverse scenarios. Datasets are crucial for
testing the retrieval, reasoning, and generation components of RAG systems. Table 3 discusses some key datasets based
on the dowstream task for RAG Evaluation.
Benchmarks play a critical role in standardizing the evaluation of RAG systems by providing structured tasks and
metrics. The following benchmarks are particularly relevant:
•BEIR (Benchmarking Information Retrieval): A versatile benchmark designed for evaluating embedding
models on a variety of information retrieval tasks, encompassing 17 datasets across diverse domains like
bioinformatics, finance, and question answering [55].
•MS MARCO (Microsoft Machine Reading Comprehension): Focused on passage ranking and question
answering, this benchmark is widely used for dense retrieval tasks in RAG systems [56].
•TREC (Text REtrieval Conference, Deep Learning Track): Provides datasets for passage and document
retrieval, emphasizing the quality of ranking models in retrieval pipelines [57].
•MuSiQue (Multihop Sequential Questioning): A benchmark for multihop reasoning across multiple
documents, emphasizing the importance of retrieving and synthesizing information from disconnected contexts
[58].
•2WikiMultihopQA: A dataset designed for multihop QA tasks over two Wikipedia articles, focusing on the
ability to connect knowledge across multiple sources [59].
•AgentG (Agentic RAG for Knowledge Fusion): Tailored for agentic RAG tasks, this benchmark assesses
dynamic information synthesis across multiple knowledge bases [8].
•HotpotQA: A multi-hop QA benchmark requiring retrieval and reasoning over interconnected contexts, ideal
for evaluating complex RAG workflows[60].
•RAGBench: A large-scale, explainable benchmark featuring 100,000 examples across industry domains, with
a TRACe evaluation framework for actionable RAG metrics [61].
32
Page 33:
•BERGEN (Benchmarking Retrieval-Augmented Generation): A library for systematically benchmarking
RAG systems with standardized experiments [62].
•FlashRAG Toolkit: Implements 12 RAG methods and includes 32 benchmark datasets to support efficient
and standardized RAG evaluation [63].
•GNN-RAG: This benchmark evaluates graph-based RAG systems on tasks like node-level and edge-level
predictions, focusing on retrieval quality and reasoning performance in Knowledge Graph Question Answering
(KGQA) [64].
Table 3: Downstream Tasks and Datasets for RAG Evaluation (Adapted from [20]
Category Task Type Datasets and References
QASingle-hop QA Natural Questions (NQ) [ 65], TriviaQA [ 66], SQuAD [ 67],
Web Questions (WebQ) [ 68], PopQA [ 69], MS MARCO
[56]
Multi-hop QA HotpotQA [60], 2WikiMultiHopQA [59], MuSiQue [58]
Long-form QA ELI5 [ 70], NarrativeQA (NQA) [ 71], ASQA [ 72], QM-
Sum [73]
Domain-specific QA Qasper [ 74], COVID-QA [ 75], CMB/MMCU Medical
[76]
Multi-choice QA QuALITY [ 77], ARC (No reference available), Common-
senseQA [78]
Graph-based QAGraph QA GraphQA [79]
Event Argument Extraction WikiEvent [80], RAMS [81]
DialogOpen-domain Dialog Wizard of Wikipedia (WoW) [82]
Personalized Dialog KBP [83], DuleMon [84]
Task-oriented Dialog CamRest [85]
Recommendation Personalized Content Amazon Datasets (Toys, Sports, Beauty) [86]
ReasoningCommonsense Reasoning HellaSwag [87], CommonsenseQA [78]
CoT Reasoning CoT Reasoning [88]
Complex Reasoning CSQA [89]
OthersLanguage Understanding MMLU (No reference available), WikiText-103 [65]
Fact Checking/Verification FEVER [90], PubHealth [91]
Strategy QA StrategyQA [92]
SummarizationText Summarization WikiASP [93], XSum [94]
Long-form Summarization NarrativeQA (NQA) [71], QMSum [73]
Text Generation Biography Biography Dataset (No reference available)
Text ClassificationSentiment Analysis SST-2 [95]
General Classification VioLens[96], TREC [57]
Code Search Programming Search CodeSearchNet [97]
RobustnessRetrieval Robustness NoMIRACL [98]
Language Modeling Robustness WikiText-103 [99]
Math Math Reasoning GSM8K [100]
Machine Translation Translation Tasks JRC-Acquis [101]
10 Conclusion
Agentic Retrieval-Augmented Generation (RAG) represents a transformative advancement in artificial intelligence,
addressing the limitations of traditional RAG systems through the integration of autonomous agents. By leveraging
33
Page 34:
agentic intelligence, these systems introduce capabilities such as dynamic decision-making, iterative reasoning, and
collaborative workflows, enabling them to tackle complex, real-world tasks with enhanced precision and adaptability.
This survey explored the evolution of RAG systems, from their initial implementations to advanced paradigms like
Modular RAG, highlighting the contributions and limitations of each. The integration of agents into the RAG pipeline
has emerged as a pivotal development, resulting in Agentic RAG systems that overcome static workflows and limited
contextual adaptability. Applications across healthcare, finance, education, and creative industries demonstrate the
transformative potential of these systems, showcasing their ability to deliver personalized, real-time, and context-aware
solutions.
Despite their promise, Agentic RAG systems face challenges that require further research and innovation. Coordination
complexity in multi-agent architectures, scalability, and latency issues, as well as ethical considerations, must be
addressed to ensure robust and responsible deployment. Additionally, the lack of specialized benchmarks and datasets
tailored to evaluate agentic capabilities poses a significant hurdle. Developing evaluation methodologies that capture
the unique aspects of Agentic RAG, such as multi-agent collaboration and dynamic adaptability, will be crucial for
advancing the field.
Looking ahead, the convergence of retrieval-augmented generation and agentic intelligence has the potential to redefine
AI’s role in dynamic and complex environments. By addressing these challenges and exploring future directions,
researchers and practitioners can unlock the full potential of Agentic RAG systems, paving the way for transformative
applications across industries and domains. As AI systems continue to evolve, Agentic RAG stands as a cornerstone for
creating adaptive, context-aware, and impactful solutions that meet the demands of a rapidly changing world.
References
[1]Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and
Jianfeng Gao. Large language models: A survey, 2024.
[2]Aditi Singh. Exploring language models: A comprehensive survey and analysis. In 2023 International Con-
ference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication
Engineering (RMKMATE) , pages 1–4, 2023.
[3]Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang,
Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren,
Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language
models, 2024.
[4]Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, and Chaoning Zhang. A complete survey on llm-based ai
chatbots, 2024.
[5]Aditi Singh. A survey of ai text-to-image and ai text-to-video generators. In 2023 4th International Conference
on Artificial Intelligence, Robotics and Control (AIRC) , pages 32–36, 2023.
[6]Aditi Singh, Abul Ehtesham, Gaurav Kumar Gupta, Nikhil Kumar Chatta, Saket Kumar, and Tala Talaei Khoei.
Exploring prompt engineering: A systematic review with swot analysis, 2024.
[7]Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua
Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. A survey on hallucination in large language models: Principles,
taxonomy, challenges, and open questions. ACM Transactions on Information Systems , November 2024.
[8]Meng-Chieh Lee, Qi Zhu, Costas Mavromatis, Zhen Han, Soji Adeshina, Vassilis N. Ioannidis, Huzefa Rangwala,
and Christos Faloutsos. Agent-g: An agentic framework for graph retrieval augmented generation, 2024.
[9]Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao
Zhang, Jie Jiang, and Bin Cui. Retrieval-augmented generation for ai-generated content: A survey, 2024.
[10] Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan,
and Graham Neubig. Active retrieval augmented generation, 2023.
[11] Yikun Han, Chunjiang Liu, and Pengfei Wang. A comprehensive survey on vector database: Storage and retrieval
technique, challenge, 2023.
[12] Anthropic. Building effective agents, 2024. https://www.anthropic.com/research/
building-effective-agents . Accessed: February 2, 2025.
[13] LangChain. Langgraph workflows tutorial, 2025. https://langchain-ai.github.io/langgraph/
tutorials/workflows/ . Accessed: February 2, 2025.
34
Page 35:
[14] Chidaksh Ravuru, Sagar Srinivas Sakhinana, and Venkataramana Runkana. Agentic retrieval-augmented
generation for time series analysis, 2024.
[15] Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A survey, 2023.
[16] Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang.
Graph retrieval-augmented generation: A survey, 2024.
[17] Aditi Singh, Abul Ehtesham, Saifuddin Mahmud, and Jong-Hoon Kim. Revolutionizing mental health care
through langchain: A journey with a large language model. In 2024 IEEE 14th Annual Computing and
Communication Workshop and Conference (CCWC) , pages 0073–0078, 2024.
[18] Gaurav Kumar Gupta, Aditi Singh, Sijo Valayakkad Manikandan, and Abul Ehtesham. Digital diagnostics: The
potential of large language models in recognizing symptoms of common illnesses. AI, 6(1), 2025.
[19] Aditi Singh, Abul Ehtesham, Saket Kumar, Gaurav Kumar Gupta, and Tala Talaei Khoei. Encouraging responsible
use of generative ai in education: A reward-based learning approach. In Tim Schlippe, Eric C. K. Cheng, and
Tianchong Wang, editors, Artificial Intelligence in Education Technologies: New Development and Innovative
Practices , pages 404–413, Singapore, 2025. Springer Nature Singapore.
[20] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and
Haofen Wang. Retrieval-augmented generation for large language models: A survey, 2024.
[21] Vladimir Karpukhin, Barlas O ˘guz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen
tau Yih. Dense passage retrieval for open-domain question answering, 2020.
[22] Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, and Ji-Rong
Wen. A survey on the memory mechanism of large language model based agents, 2024.
[23] Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, and Weizhu Chen. Critic: Large
language models can self-correct with tool-interactive critiquing, 2024.
[24] Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang,
and Enhong Chen. Understanding the planning of llm agents: A survey, 2024.
[25] Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Enhancing ai systems with agentic workflows
patterns in large language model. In 2024 IEEE World AI IoT Congress (AIIoT) , pages 527–532, 2024.
[26] DeepLearning.AI. How agents can improve llm performance. https://www.deeplearning.ai/the-batch/
how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io , 2024. Ac-
cessed: 2025-01-13.
[27] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha
Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann,
Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. Self-refine: Iterative refinement with self-feedback, 2023.
[28] Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao.
Reflexion: Language agents with verbal reinforcement learning, 2023.
[29] Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V . Chawla, Olaf Wiest, and
Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges, 2024.
[30] Weaviate Blog. What is agentic rag? https://weaviate.io/blog/what-is-agentic-rag#:~:text=is%
20Agentic%20RAG%3F-,%E2%80%8B,of%20the%20non%2Dagentic%20pipeline. Accessed: 2025-01-14.
[31] Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, and Zhen-Hua Ling. Corrective retrieval augmented generation, 2024.
[32] LangGraph CRAG Tutorial. Langgraph crag: Contextualized retrieval-augmented generation tutorial. https:
//langchain-ai.github.io/langgraph/tutorials/rag/langgraph_crag/ . Accessed: 2025-01-14.
[33] Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C. Park. Adaptive-rag: Learning to adapt
retrieval-augmented large language models through question complexity, 2024.
[34] LangGraph Adaptive RAG Tutorial. Langgraph adaptive rag: Adaptive retrieval-augmented generation tu-
torial. https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag/ .
Accessed: 2025-01-14.
[35] Zhili Shen, Chenxin Diao, Pavlos V ougiouklis, Pascual Merita, Shriram Piramanayagam, Damien Graux, Dandan
Tu, Zeren Jiang, Ruofei Lai, Yang Ren, and Jeff Z. Pan. Gear: Graph-enhanced agent for retrieval-augmented
generation, 2024.
[36] LlamaIndex. Introducing agentic document workflows. https://www.llamaindex.ai/blog/
introducing-agentic-document-workflows , 2025. Accessed: 2025-01-13.
35
Page 36:
[37] AWS Machine Learning Blog. How twitch used agentic workflow with rag on amazon
bedrock to supercharge ad sales. https://aws.amazon.com/blogs/machine-learning/
how-twitch-used-agentic-workflow-with-rag-on-amazon-bedrock-to-supercharge-ad-sales/ ,
2025. Accessed: 2025-01-13.
[38] LlamaCloud Demo Repository. Patient case summary workflow using llamacloud. https:
//github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/
patient_case_summary/patient_case_summary.ipynb , 2025. Accessed: 2025-01-13.
[39] LlamaCloud Demo Repository. Contract review workflow using llamacloud. https://github.com/
run-llama/llamacloud-demo/blob/main/examples/document_workflows/contract_review/
contract_review.ipynb , 2025. Accessed: 2025-01-13.
[40] LlamaCloud Demo Repository. Auto insurance claims workflow using llamacloud. https:
//github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/auto_
insurance_claims/auto_insurance_claims.ipynb , 2025. Accessed: 2025-01-13.
[41] LlamaCloud Demo Repository. Research paper report generation workflow using llamacloud.
https://github.com/run-llama/llamacloud-demo/blob/main/examples/report_generation/
research_paper_report_generation.ipynb , 2025. Accessed: 2025-01-13.
[42] LangGraph Agentic RAG Tutorial. Langgraph agentic rag: Nodes and edges tutorial. https://langchain-ai.
github.io/langgraph/tutorials/rag/langgraph_agentic_rag/#nodes-and-edges . Accessed:
2025-01-14.
[43] LlamaIndex Blog. Agentic rag with llamaindex. https://www.llamaindex.ai/blog/
agentic-rag-with-llamaindex-2721b8a49ff6 . Accessed: 2025-01-14.
[44] Hugging Face Cookbook. Agentic rag: Turbocharge your retrieval-augmented generation with query reformula-
tion and self-query. https://huggingface.co/learn/cookbook/en/agent_rag . Accessed: 2025-01-14.
[45] Qdrant Blog. Agentic rag: Combining rag with agents for enhanced information retrieval. https://qdrant.
tech/articles/agentic-rag/ . Accessed: 2025-01-14.
[46] crewAI Inc. crewai: A github repository for ai projects. https://github.com/crewAIInc/crewAI , 2025.
Accessed: 2025-01-15.
[47] AG2AI Contributors. Ag2: A github repository for advanced generative ai research. https://github.com/
ag2ai/ag2 , 2025. Accessed: 2025-01-15.
[48] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun
Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. Autogen: Enabling
next-gen llm applications via multi-agent conversation framework. 2023.
[49] Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, and Qingyun Wu. Training
language model agents without modifying language models. ICML’24 , 2024.
[50] OpenAI. Swarm: Lightweight multi-agent orchestration framework. https://github.com/openai/swarm .
Accessed: 2025-01-14.
[51] LlamaIndex Documentation. Agentic rag using vertex ai. https://docs.llamaindex.ai/en/stable/
examples/agent/agentic_rag_using_vertex_ai/ . Accessed: 2025-01-14.
[52] Microsoft. Semantic kernel overview, 2025. https://learn.microsoft.com/en-us/semantic-kernel/
overview/ . Accessed: February 2, 2025.
[53] Microsoft. Semantic kernel github repository, 2025. https://github.com/microsoft/semantic-kernel .
Accessed: February 2, 2025.
[54] IBM Granite Community. Agentic rag: Ai agents with ibm granite models. https://github.com/
ibm-granite-community/granite-snack-cookbook/blob/main/recipes/AI-Agents/Agentic_
RAG.ipynb . Accessed: 2025-01-14.
[55] Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. Beir: A heterogenous
benchmark for zero-shot evaluation of information retrieval models, 2021.
[56] Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew
McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong
Wang. Ms marco: A human generated machine reading comprehension dataset, 2018.
[57] Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin, Ellen M. V oorhees, and Ian Soboroff.
Overview of the trec 2022 deep learning track. In Text REtrieval Conference (TREC) . NIST, TREC, March 2023.
36
Page 37:
[58] Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. Musique: Multihop questions
via single-hop question composition, 2022.
[59] Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa. Constructing a multi-hop qa dataset
for comprehensive evaluation of reasoning steps, 2020.
[60] Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christo-
pher D. Manning. Hotpotqa: A dataset for diverse, explainable multi-hop question answering, 2018.
[61] Robert Friel, Masha Belyi, and Atindriyo Sanyal. Ragbench: Explainable benchmark for retrieval-augmented
generation systems, 2024.
[62] David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina, and Stéphane
Clinchant. Bergen: A benchmarking library for retrieval-augmented generation, 2024.
[63] Jiajie Jin, Yutao Zhu, Xinyu Yang, Chenghao Zhang, and Zhicheng Dou. Flashrag: A modular toolkit for efficient
retrieval-augmented generation research, 2024.
[64] Costas Mavromatis and George Karypis. Gnn-rag: Graph neural retrieval for large language model reasoning,
2024.
[65] Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle
Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei
Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: A benchmark for question
answering research. Transactions of the Association for Computational Linguistics , 7:452–466, 2019.
[66] Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. Triviaqa: A large scale distantly supervised
challenge dataset for reading comprehension, 2017.
[67] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100,000+ questions for machine
comprehension of text, 2016.
[68] Jonathan Berant, Andrew K. Chou, Roy Frostig, and Percy Liang. Semantic parsing on freebase from question-
answer pairs. In Conference on Empirical Methods in Natural Language Processing , 2013.
[69] Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, and Hannaneh Hajishirzi. When not to
trust language models: Investigating effectiveness of parametric and non-parametric memories. In Anna Rogers,
Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers) , pages 9802–9822, Toronto, Canada, July 2023. Association
for Computational Linguistics.
[70] Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. Eli5: Long form
question answering, 2019.
[71] Tomáš Ko ˇciský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward
Grefenstette. The narrativeqa reading comprehension challenge. 2017.
[72] Ivan Stelmakh, Yi Luan, Bhuwan Dhingra, and Ming-Wei Chang. Asqa: Factoid questions meet long-form
answers, 2023.
[73] Ming Zhong, Da Yin, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli
Celikyilmaz, Yang Liu, Xipeng Qiu, and Dragomir Radev. QMSum: A new benchmark for query-based
multi-domain meeting summarization. pages 5905–5921, June 2021.
[74] Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, and Matt Gardner. A dataset of information-
seeking questions and answers anchored in research papers. In Kristina Toutanova, Anna Rumshisky, Luke
Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao
Zhou, editors, Proceedings of the 2021 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies , pages 4599–4610, Online, June 2021. Association
for Computational Linguistics.
[75] Timo Möller, Anthony Reina, Raghavan Jayakumar, and Malte Pietsch. COVID-QA: A question answering
dataset for COVID-19. In ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID) ,
2020.
[76] Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang,
Jianquan Li, Xiang Wan, Benyou Wang, and Haizhou Li. Cmb: A comprehensive medical benchmark in chinese,
2024.
[77] Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh
Padmakumar, Johnny Ma, Jana Thompson, He He, and Samuel R. Bowman. Quality: Question answering with
long input texts, yes!, 2022.
37
Page 38:
[78] Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. CommonsenseQA: A question answering
challenge targeting commonsense knowledge. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , pages 4149–4158, Minneapolis,
Minnesota, June 2019. Association for Computational Linguistics.
[79] Xiaoxin He, Yijun Tian, Yifei Sun, Nitesh V . Chawla, Thomas Laurent, Yann LeCun, Xavier Bresson, and Bryan
Hooi. G-retriever: Retrieval-augmented generation for textual graph understanding and question answering,
2024.
[80] Sha Li, Heng Ji, and Jiawei Han. Document-level event argument extraction by conditional generation, 2021.
[81] Seth Ebner, Patrick Xia, Ryan Culkin, Kyle Rawlins, and Benjamin Van Durme. Multi-sentence argument
linking, 2020.
[82] Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. Wizard of wikipedia:
Knowledge-powered conversational agents, 2019.
[83] Hongru Wang, Minda Hu, Yang Deng, Rui Wang, Fei Mi, Weichao Wang, Yasheng Wang, Wai-Chung Kwan,
Irwin King, and Kam-Fai Wong. Large language models as source planner for personalized knowledge-grounded
dialogue, 2023.
[84] Xinchao Xu, Zhibin Gou, Wenquan Wu, Zheng-Yu Niu, Hua Wu, Haifeng Wang, and Shihang Wang. Long time
no see! open-domain conversation with long-term persona memory, 2022.
[85] Tsung-Hsien Wen, Milica Gaši ´c, Nikola Mrkši ´c, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David
Vandyke, and Steve Young. Conditional generation and snapshot learning in neural dialogue systems. In
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing , pages 2153–2162,
Austin, Texas, November 2016. Association for Computational Linguistics.
[86] Ruining He and Julian McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class
collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web , WWW ’16, page
507–517, Republic and Canton of Geneva, CHE, 2016. International World Wide Web Conferences Steering
Committee.
[87] Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. HellaSwag: Can a machine really
finish your sentence? In Anna Korhonen, David Traum, and Lluís Màrquez, editors, Proceedings of the 57th
Annual Meeting of the Association for Computational Linguistics , pages 4791–4800, Florence, Italy, July 2019.
Association for Computational Linguistics.
[88] Seungone Kim, Se June Joo, Doyoung Kim, Joel Jang, Seonghyeon Ye, Jamin Shin, and Minjoon Seo. The
cot collection: Improving zero-shot and few-shot learning of language models via chain-of-thought fine-tuning,
2023.
[89] Amrita Saha, Vardaan Pahuja, Mitesh M. Khapra, Karthik Sankaranarayanan, and Sarath Chandar. Complex
sequential question answering: Towards learning to converse over linked question answer pairs with a knowledge
graph. 2018.
[90] James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. FEVER: a large-scale dataset for
fact extraction and VERification. In Marilyn Walker, Heng Ji, and Amanda Stent, editors, Proceedings of the 2018
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies, Volume 1 (Long Papers) , pages 809–819, New Orleans, Louisiana, June 2018. Association for
Computational Linguistics.
[91] Neema Kotonya and Francesca Toni. Explainable automated fact-checking for public health claims, 2020.
[92] Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. Did aristotle use a laptop?
a question answering benchmark with implicit reasoning strategies, 2021.
[93] Hiroaki Hayashi, Prashant Budania, Peng Wang, Chris Ackerson, Raj Neervannan, and Graham Neubig. Wikiasp:
A dataset for multi-domain aspect-based summarization, 2020.
[94] Shashi Narayan, Shay B. Cohen, and Mirella Lapata. Don’t give me the details, just the summary! topic-aware
convolutional neural networks for extreme summarization, 2018.
[95] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher
Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In David Yarowsky,
Timothy Baldwin, Anna Korhonen, Karen Livescu, and Steven Bethard, editors, Proceedings of the 2013
Conference on Empirical Methods in Natural Language Processing , pages 1631–1642, Seattle, Washington,
USA, October 2013. Association for Computational Linguistics.
38
Page 39:
[96] Sourav Saha, Jahedul Alam Junaed, Maryam Saleki, Arnab Sen Sharma, Mohammad Rashidujjaman Rifat,
Mohamed Rahouti, Syed Ishtiaque Ahmed, Nabeel Mohammed, and Mohammad Ruhul Amin. Vio-lens: A novel
dataset of annotated social network posts leading to different forms of communal violence and its evaluation. In
Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Farig Sadeque, and Ruhul Amin, editors, Proceedings
of the First Workshop on Bangla Language Processing (BLP-2023) , pages 72–84, Singapore, December 2023.
Association for Computational Linguistics.
[97] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. Codesearchnet
challenge: Evaluating the state of semantic code search, 2020.
[98] Nandan Thakur, Luiz Bonifacio, Xinyu Zhang, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo,
Xiaoguang Li, Qun Liu, Boxing Chen, Mehdi Rezagholizadeh, and Jimmy Lin. "knowing when you don’t know":
A multilingual relevance assessment dataset for robust retrieval-augmented generation, 2024.
[99] Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models, 2016.
[100] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert,
Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to
solve math word problems, 2021.
[101] Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaž Erjavec, Dan Tufi¸ s, and Dániel Varga.
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In Nicoletta Calzolari, Khalid
Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, and Daniel Tapias, editors, Proceedings of
the Fifth International Conference on Language Resources and Evaluation (LREC‘06) , Genoa, Italy, May 2006.
European Language Resources Association (ELRA).
39