Advanced RAG Techniques using LlamaIndex-01-Pre-Retrieval

Rajesh K
Published in Level Up Coding
4 min read · Feb 29, 2024

Introduction

The objective of this article is to organize and discuss the fundamental advanced RAG techniques, with specific examples drawn from their implementation in LlamaIndex. In the initial segment of this series, we will address shortcomings in basic RAG approaches and outline the overarching methodology of advanced RAG. A thorough examination of the pre-retrieval phase will follow, encompassing a variety of techniques utilized during the retrieval process.

Naive RAG Flow

Naive RAG follows a conventional process consisting of indexing, retrieval, and generation. It is also characterized as a “Retrieve-and-Generate” framework:

Indexing
  • Cleanses, extracts, and standardizes raw data into plain-text chunks
  • A chosen embedding model vectorizes the chunks to enable similarity comparisons during retrieval
  • The vectorized chunks are indexed as key-value pairs, allowing scalable search

Retrieval
  • Employs the same embedding model to vectorize the user query
  • Computes similarity scores between the query vector and the indexed chunks
  • Retrieves the top-K most similar chunks to augment the context for addressing the request

Generation
  • Synthesizes the question and the retrieved chunks into a prompt
  • A language model formulates a response to this prompt

Naive RAG

Naive RAG poses significant challenges, including low precision, low recall, and hallucinations.

Some of the key challenges in Naive RAG are summarised below:

  • Hallucination risk during generation, where the model produces responses not grounded in the provided context
  • Irrelevant context and potential toxicity or bias in generated responses
  • Difficulty effectively integrating retrieved information into the current generation task, resulting in disjointed or incoherent outputs
  • Repetition and redundancy when retrieved passages contain similar information
  • Challenges discerning the relevance and importance of different retrieved passages
  • Inconsistencies in writing style and tone when augmenting with multiple passages
  • Overreliance on augmented information, with generation models simply reiterating content rather than synthesizing new information

Advanced RAG techniques help resolve some of these issues. We will delve into the various techniques used to overcome them.

Advanced RAG Flow

Advanced RAG

In Advanced RAG, the retrieval process goes through a pre-retrieval and a post-retrieval phase.

Pre-Retrieval Phase

The goal of the pre-retrieval phase is to improve the quality of the content being indexed. This is achieved using the following strategies:

Query Routing

  • Directs queries to the optimal index or data source to return more relevant results

Routers can be used for the following use cases and more:

  • Selecting the right data source
  • Deciding whether to do summarization (e.g. using summary index query engine) or semantic search (e.g. using vector index query engine)
  • Deciding whether to “try” out a bunch of choices at once and combine the results (using multi-routing capabilities).

In LlamaIndex, routers can be used on their own (as “selector modules”), or as a query engine or retriever (e.g. on top of other query engines/retrievers).

A simple example of using a router module as part of a query engine is given below.

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import PydanticSingleSelector
from llama_index.core.tools import QueryEngineTool


# list_query_engine and vector_query_engine are assumed to have been built
# earlier, e.g. from a SummaryIndex and a VectorStoreIndex over the same data
list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description="Useful for summarization questions related to the data source",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific context related to the data source",
)

# The selector reads the tool descriptions and routes each query to one tool
query_engine = RouterQueryEngine(
    selector=PydanticSingleSelector.from_defaults(),
    query_engine_tools=[
        list_tool,
        vector_tool,
    ],
)
query_engine.query("<query>")

More examples of using RouterQueryEngine can be found in the LlamaIndex documentation.

Query Transformations

Query transformations are components designed to convert one query into another. These transformations can be single-step, where the conversion occurs before the query is executed against an index, or multi-step, where the query is transformed, executed against an index, and the response retrieved; subsequent queries are then processed sequentially, each undergoing transformation and execution in turn.

User queries can undergo various transformations before being processed by a RAG query engine, agent, or pipeline. These transformations include:

  • Query-Rewriting: Rewriting the query in different ways while keeping the tools constant.
  • Sub-Questions: Breaking down queries into multiple sub-questions across different tools based on their metadata.
  • HyDE (Hypothetical Document Embeddings): HyDE is a method in which an LLM generates a hypothetical document or answer from the natural language query. The embedding of this generated content is then used for the retrieval lookup instead of the embedding of the original query.
  • Multi-Step Query Decomposition: Multi-step query transformations build on single-step transformations to allow more complex query processing. Starting from a complex initial query, the query is transformed and executed against an index, and the response retrieved. Based on this response and previous ones, along with the original query, additional questions can be posed against the index. In this approach, the model employs a self-ask technique, generating and answering follow-up questions to enrich its understanding before providing a final response. By integrating information gathered across these retrieval steps, the model can offer more comprehensive and insightful answers.

Do refer to this notebook to explore these query transformation techniques.

Continuing Ahead

Future installments will explore post-retrieval procedures and other advanced topics in further detail.

References:

https://docs.llamaindex.ai/en/stable/

https://arxiv.org/abs/2312.10997
