
1. Retrieval-Augmented Generation (RAG)

RAG combines a retrieval model and a generative model to create contextually rich, accurate responses. The retriever fetches relevant documents or passages from an external knowledge base, and the generator uses this retrieved context to produce responses.

Workflow:

  1. Retriever: Retrieves the top-K documents from a knowledge base based on the query.
  2. Generator: Generates a response based on the retrieved documents.
  3. Feedback Loop (optional): Fine-tunes retrieval and generation.

Key Features:

  • Scalability: Can work with vast external knowledge bases.
  • Factual Accuracy: Reduces hallucination by grounding responses in retrieved information.
  • Applications: Open-domain QA, chatbots, content creation, summarization.
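
To make the retriever/generator split concrete, here is a minimal Python sketch. The keyword-overlap retriever and the generate callback are deliberately simplistic placeholders for a real vector store and LLM client, not a production recipe.

def retrieve(query, documents, k=3):
    """Return the top-K documents ranked by word overlap with the query."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def rag_answer(query, documents, generate):
    """Build a grounded prompt from the retrieved passages and call the generator."""
    context = "\n".join(retrieve(query, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)  # `generate` is any text-generation callable you supply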


2. Cache-Augmented Generation

In Cache-Augmented Generation, responses are enhanced by using a local cache to store previously generated or retrieved results, avoiding redundant computations.

Workflow:

  1. Query Matching: Checks if a similar query exists in the cache.
  2. Cache Hit: Uses the cached response directly or combines it with retrieved results.
  3. Cache Miss: Executes a full RAG pipeline and updates the cache.

Advantages:

  • Reduced Latency: Avoids redundant computations for repeated queries.
  • Efficiency: Improves response times in high-throughput systems.
  • Applications: Customer support, FAQs, and other repetitive-query environments.
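
A minimal sketch of the hit/miss logic, assuming exact matching on a normalized query string (production systems often key the cache on embedding similarity instead); run_rag_pipeline is a placeholder for the full retrieve-and-generate step described above.

cache = {}

def normalize(query):
    return " ".join(query.lower().split())

def cached_answer(query, run_rag_pipeline):
    key = normalize(query)
    if key in cache:                      # cache hit: reuse the stored response
        return cache[key]
    response = run_rag_pipeline(query)    # cache miss: run the full RAG pipeline
    cache[key] = response                 # store for future repeated queries
    return response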


3. GraphRAG

GraphRAG leverages graph-based representations of knowledge to improve retrieval and generation. Instead of linear or flat knowledge bases, it works with knowledge graphs where entities and their relationships are explicitly defined.

Workflow:

  1. Graph-Based Retrieval: Retrieves relevant nodes and subgraphs.
  2. Contextual Understanding: Generates responses by understanding relationships between entities.
  3. Integration with Generative Models: Provides structured graph-based context for generation.

Key Benefits:

  • Semantic Understanding: Captures richer contextual relationships.
  • Applications: Complex reasoning tasks, multi-hop question answering, biomedical research.
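
As a rough illustration of graph-based retrieval, the sketch below builds a toy knowledge graph with the networkx library and turns the triples around query entities into textual context. Matching entities by substring is a stand-in for real entity linking, and the example facts are invented for illustration only.

import networkx as nx

# Toy knowledge graph: entities are nodes, labeled edges are relations.
kg = nx.Graph()
kg.add_edge("aspirin", "headache", relation="treats")
kg.add_edge("aspirin", "stomach irritation", relation="may cause")
kg.add_edge("ibuprofen", "headache", relation="treats")

def graph_context(query, graph):
    """Collect triples around entities mentioned in the query as textual context."""
    mentioned = [node for node in graph.nodes if node in query.lower()]
    facts = []
    for entity in mentioned:
        for _, neighbor, data in graph.edges(entity, data=True):
            facts.append(f"{entity} {data['relation']} {neighbor}")
    return facts

print(graph_context("What does aspirin treat?", kg))
# ['aspirin treats headache', 'aspirin may cause stomach irritation']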


4. VectorRAG

VectorRAG pairs retrieval-augmented generation with vector search over dense embeddings. This method excels at semantic search and retrieval.

Workflow:

  1. Embedding Generation: Transforms queries and documents into vector embeddings.
  2. Vector Search: Uses similarity metrics (like cosine similarity) to retrieve the most relevant results.
  3. Generative Augmentation: Uses retrieved documents for response generation.

Key Benefits:

  • Semantic Precision: Captures intent and meaning beyond keyword matching.
  • Applications: Personalized recommendations, document search, and multilingual applications.
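
The vector search step reduces to ranking documents by cosine similarity between embeddings. In this minimal numpy sketch the 2-D vectors are made up to show the mechanics; in practice they would come from an embedding model.

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def vector_retrieve(query_vec, doc_vecs, docs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    scores = [cosine_similarity(query_vec, dv) for dv in doc_vecs]
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]

docs = ["reset your password", "track your order", "update billing details"]
doc_vecs = [np.array([0.9, 0.1]), np.array([0.2, 0.8]), np.array([0.6, 0.4])]
query_vec = np.array([0.85, 0.15])   # stand-in for an embedded user query
print(vector_retrieve(query_vec, doc_vecs, docs))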


5. HybridRAG

HybridRAG combines dense retrieval (vector-based search) and sparse retrieval (traditional keyword-based search like BM25) for a more robust retrieval system.

Workflow:

  1. Sparse Retrieval: Matches keywords and retrieves documents using traditional methods.
  2. Dense Retrieval: Uses vector embeddings for semantic retrieval.
  3. Hybrid Ranking: Combines results from both approaches and ranks them for relevance.

Key Benefits:

  • Balanced Approach: Combines strengths of keyword and semantic search.
  • Resiliency: Handles diverse queries effectively.
  • Applications: Enterprise search, e-commerce, legal document search.
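
One common way to implement the hybrid ranking step is reciprocal rank fusion (RRF). The sketch below assumes the sparse and dense retrievers have already produced ranked lists of document IDs and simply fuses them; the constant k=60 is the usual RRF smoothing value.

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into a single relevance-ordered list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

sparse_ranking = ["doc3", "doc1", "doc7"]   # e.g. from a BM25 keyword search
dense_ranking  = ["doc1", "doc4", "doc3"]   # e.g. from a vector search
print(reciprocal_rank_fusion([sparse_ranking, dense_ranking]))
# doc1 and doc3 score highest because both retrievers return them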


6. Comparison Table

| Variant | Key Feature | Best Use Case |
| --- | --- | --- |
| RAG | Combines retrieval and generation. | Open-domain QA, chatbots. |
| CacheRAG | Uses a cache for efficiency. | High-throughput systems, repetitive queries. |
| GraphRAG | Leverages knowledge graphs. | Complex reasoning, biomedical research. |
| VectorRAG | Vector-based semantic search. | Personalized recommendations, multilingual QA. |
| HybridRAG | Combines sparse and dense retrieval techniques. | Enterprise search, e-commerce. |


What are RAG Metrics?

RAG metrics, short for Retrieval-Augmented Generation metrics, are evaluation methods used to assess the quality and effectiveness of responses produced by systems that combine retrieval with generative models. These metrics are commonly applied in open-domain question answering and similar applications.

Key Components of RAG Systems

  • Retriever: Finds relevant documents or passages from a knowledge base. Evaluated using metrics like Recall@K and MRR.
  • Generator: Produces responses using retrieved information as context. Evaluated for fluency, informativeness, and accuracy.


Metrics for Evaluating RAG Systems

1. Recall@K

Measures how many of the top K retrieved documents contain relevant information.

Formula:

Recall@K = \( \frac{\text{Number of relevant documents in top K results}}{\text{Total number of relevant documents}} \)

Example:

If K = 2, there is one relevant document in total, and it appears in the top 2 results:

Recall@2 = \( \frac{1}{1} = 1.0 \)

2. Mean Reciprocal Rank (MRR)

Evaluates how quickly the first relevant document is retrieved.

Formula:

MRR = \( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\text{rank}_i} \)

Example:

Query 1: Reciprocal Rank = \( \frac{1}{2} \)
Query 2: Reciprocal Rank = \( 1.0 \)
Query 3: Reciprocal Rank = \( 0 \) (no relevant document retrieved)
MRR = \( \frac{1}{3} (\frac{1}{2} + 1.0 + 0) = 0.5 \)

3. BLEU (Bilingual Evaluation Understudy)

Measures the overlap of n-grams between the generated text and a reference answer.

Formula:

BLEU = \( \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right) \), where \( \mathrm{BP} \) is the brevity penalty, \( w_n \) are the n-gram weights, and \( p_n \) are the modified n-gram precisions.

4. ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

Focuses on recall by measuring how much of the reference text is captured in the generated text.
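
ROUGE is normally computed with a dedicated package, but the core idea of ROUGE-1 recall fits in a few lines. This simplified sketch counts unigram overlap only and ignores stemming and duplicate clipping, so treat it as an approximation of the full metric.

def rouge_1_recall(reference, candidate):
    """Fraction of reference unigrams that also appear in the candidate."""
    ref_tokens = reference.lower().split()
    cand_tokens = set(candidate.lower().split())
    if not ref_tokens:
        return 0.0
    overlap = sum(1 for token in ref_tokens if token in cand_tokens)
    return overlap / len(ref_tokens)

print(rouge_1_recall("the cat sat on the mat", "a cat sat on a mat"))  # ≈ 0.667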

5. Factual Consistency

Ensures that the generated response is factually consistent with the retrieved content. This can be evaluated manually or using automated tools.
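
A very crude automated proxy is lexical grounding: the share of content words in the response that also appear in the retrieved context. The heuristic below is only a quick screen and is no substitute for NLI-based consistency models or human review; the stopword list is an arbitrary illustrative choice.

def grounding_score(response, retrieved_context):
    """Rough heuristic: fraction of response content words found in the context."""
    stopwords = {"the", "a", "an", "is", "are", "was", "were", "of", "in", "to", "and"}
    context_tokens = set(retrieved_context.lower().split())
    content_words = [w for w in response.lower().split() if w not in stopwords]
    if not content_words:
        return 1.0
    supported = sum(1 for w in content_words if w in context_tokens)
    return supported / len(content_words)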

Implementation Examples

Python Code for Recall@K and MRR

def recall_at_k(retrieved_docs, relevant_docs, k):
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    retrieved_at_k = retrieved_docs[:k]
    return sum(1 for doc in retrieved_at_k if doc in relevant_docs) / len(relevant_docs)

def mean_reciprocal_rank(retrieved_docs, relevant_docs_list):
    """Average of 1/rank of the first relevant document, over several relevance sets."""
    reciprocal_ranks = []
    for relevant_docs in relevant_docs_list:
        for rank, doc in enumerate(retrieved_docs, start=1):
            if doc in relevant_docs:
                reciprocal_ranks.append(1 / rank)
                break
        else:
            reciprocal_ranks.append(0)
    return sum(reciprocal_ranks) / len(relevant_docs_list)

retrieved_docs = ["doc1", "doc2", "doc3"]
relevant_docs_list = [["doc2"], ["doc3"]]

print("Recall@2:", recall_at_k(retrieved_docs, relevant_docs_list[0], 2))  # 1.0
print("MRR:", mean_reciprocal_rank(retrieved_docs, relevant_docs_list))    # (1/2 + 1/3) / 2 ≈ 0.417

Python Code for BLEU

from nltk.translate.bleu_score import sentence_bleu

reference = [["OpenAI", "was", "founded", "by", "Elon", "Musk", "and", "Sam", "Altman"]]
candidate = ["OpenAI", "was", "started", "by", "Sam", "Altman", "and", "Elon", "Musk"]

score = sentence_bleu(reference, candidate)
print("BLEU Score:", score)


Based on insights from the video "7 Measurements that Help Minimize Model Risk for RAG", here are seven essential metrics for assessing the performance of RAG systems:

  1. BLEU (Bilingual Evaluation Understudy Score): Assesses the precision of n-grams in the generated text compared to reference texts, indicating how much of the generated output matches the reference.
  2. ROUGE (Recall-Oriented Understudy for Gisting Evaluation): Measures the recall by evaluating the overlap of n-grams between the generated and reference texts, focusing on how much of the reference content is captured in the output.
  3. METEOR (Metric for Evaluation of Translation with Explicit ORdering): Balances precision and recall, incorporating stemming and synonymy to better align with human judgment in evaluating the quality of generated text.
  4. PII (Personally Identifiable Information) Detection: Ensures that the model does not generate responses containing sensitive information that can identify individuals, such as names, addresses, or social security numbers.
  5. Context Relevance: Evaluates how closely the retrieved context aligns with the user's query, ensuring that the most pertinent information is provided to support the generated response.
  6. Hate, Abuse, and Profanity (HAP) Score: Monitors the model for generating language that is hateful, abusive, or profane, aiming to maintain respectful and appropriate interactions.
  7. Hallucination Rate: Assesses the frequency at which the model produces information not supported by the retrieved context, striving to minimize fabricated or incorrect outputs.

These metrics provide a comprehensive framework for evaluating both the retrieval and generation components of RAG systems, ensuring their effectiveness, reliability, and ethical considerations.
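
Some of these checks can be prototyped cheaply. As a rough example of the PII screen (item 4 above), the regex sketch below flags two easy-to-pattern categories; real PII detection covers many more types, such as names and addresses, and typically relies on trained models rather than patterns.

import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text):
    """Return any matches for the patterned PII categories found in the text."""
    return {label: pattern.findall(text)
            for label, pattern in PII_PATTERNS.items()
            if pattern.findall(text)}

print(find_pii("Contact jane.doe@example.com, SSN 123-45-6789"))
# {'email': ['jane.doe@example.com'], 'us_ssn': ['123-45-6789']}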




Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external data to improve accuracy and relevance. Microsoft's research categorizes RAG tasks into four levels, each requiring progressively complex reasoning and data integration:

Level 1: Explicit Fact Queries

These involve straightforward questions seeking specific facts directly present in the data, without the need for additional reasoning. The model's task is to locate and extract this information.

Level 2: Implicit Fact Queries

These queries require the model to interpret and combine information to derive an answer. The necessary data might be dispersed across multiple segments or require simple inferencing. For example, determining the majority party in the country where Canberra is located involves knowing that Canberra is in Australia and identifying Australia's current majority party.

Level 3: Interpretable Rationale Queries

These focus on understanding the reasoning behind facts and necessitate data that supports logical explanations. Such queries require both factual knowledge and the ability to interpret and apply specific domain-based guidelines essential to the context. For instance, in financial auditing, an LLM may need to follow regulatory compliance guidelines to assess if a company's financial statements meet standards.

Level 4: Hidden Rationale Queries

These seek deeper insights, often requiring context-based reasoning to uncover underlying meanings or implications. The AI must infer complex rationales that aren't explicitly documented, relying on patterns and outcomes observed within the data. For example, in IT operations, a language model might analyze patterns from past incident resolutions to identify successful strategies.

This hierarchical framework aids in selecting appropriate RAG architectures tailored to specific use cases, ensuring alignment with task demands and enhancing the system's effectiveness.

