Skip to content

HariniS1018/robust-rag

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Robust RAG System

This repository contains the architecture and the implementation details for a Robust RAG (Retrieval-Augmented Generation) system designed to provide highly reliable and contextually relevant responses to user queries.

Table of Contents

  • Introduction
  • Architecture Overview
  • System Components
    • Storage Layer
    • Data Ingestion
    • User Interface (UI)
    • Server
    • Retriever
    • Context Filtering and Scoring
    • Response Generation (LLM)
  • How It Works (End-to-End Flow)

Introduction

The Robust RAG System aims to overcome common limitations of standard RAG implementations by incorporating advanced context filtering and scoring mechanisms. This ensures that the Large Language Model (LLM) receives the most relevant and reliable contextual information, leading to more accurate and trustworthy responses for users.

Architecture Overview

The system is designed with modularity and scalability in mind. It handles document ingestion, intelligent retrieval of relevant contexts based on user prompts, a sophisticated multi-stage context filtering process, and reliable response generation using an LLM. The core flow involves:

  1. Ingesting documents into various storage solutions and a vector database (Chroma).
  2. A user submitting a query/prompt via the User Interface.
  3. The Server coordinating the retrieval of initial relevant contexts.
  4. A multi-step context refinement process (Isolate Active Context, Keyword Extraction, Context Scoring, Filter Most Relevant Context).
  5. Two stages of LLM generation: one using keywords for an initial reliable response, and another using the most relevant context for the final reliable response.
  6. Delivering the final response back to the user.

System Components

Storage Layer

  • Google Cloud Storage / AWS S3 Bucket: External cloud storage solutions for long-term storage of raw documents.
  • Chroma: A vector database (likely used for storing embeddings of documents for efficient semantic search/retrieval).

Data Ingestion

  • Documents: Raw input data (e.g., text, PDFs) that is processed and stored.
  • Prompt: Input from the user via the UI, which is a prompt for querying existing data.

User Interface (UI)

  • User Interface: The front-end application where users interact with the system, submit documents/prompts, and receive the Final Response.

Server

  • Server: The central orchestrator of the system. It handles user Requests from the UI, communicates with the Retriever and LLM components, and sends Responses back to the UI. It also provides Status Messages during document ingestion.

Retriever

  • Retriever: Responsible for fetching Relevant Contexts from Storage based on the Prompt from the Server.

Context Filtering and Scoring

This is a critical multi-stage process designed to refine the initially retrieved contexts into the most reliable and relevant information for the LLM.

  • Isolate Active Context: Takes Relevant Contexts & Prompt and identifies Potential Contexts that are most likely to contain the answer.
  • Keyword Extraction: Extracts Keywords from the Potential Contexts.
  • Score Contexts: Evaluates the Potential Contexts and assigns a Score to each, resulting in Contexts & its Score.
  • Filter Most Relevant Context: Selects the Most Relevant Context based on the scores, ensuring the highest quality information is passed to the LLM.

Response Generation (LLM)

Two stages of LLM interaction are utilized to enhance response reliability:

  • Generate Reliable Response using Keywords: An LLM (e.g., GPT) generates a Reliable Response based on the Filtered Keywords. This acts as an initial pass or a safeguard.
  • Generate Reliable Response using Context: Again the same LLM instance generates the Reliable Response using the Most Relevant Context. This is expected to be the more comprehensive and accurate response.

How It Works (End-to-End Flow)

  1. Document Ingestion: Documents are uploaded to cloud storage (Google Cloud Storage/AWS S3/Chroma) and then processed into the Storage layer, likely involving chunking and embedding.
  2. User Query: A user submits a Prompt through the User Interface.
  3. Request Handling: The User Interface sends the Request to the Server.
  4. Initial Retrieval: The Server forwards the Prompt to the Retriever, which fetches Relevant Contexts from Storage.
  5. Context Refinement: The Relevant Contexts are passed through a series of filtering steps:
    • Isolate Active Context identifies Potential Contexts.
    • Keyword Extraction pulls Filtered Keywords from these contexts.
    • Score Contexts scores the Potential Contexts.
    • Filter Most Relevant Context selects the Most Relevant Context.
  6. LLM Response Generation:
    • An LLM generates a Reliable Response using Filtered Keywords or using the Most Relevant Context.
  7. Response Delivery: The final Reliable Response is sent from the Server back to the User Interface as the Final Response.

About

This repository contains the architecture and the implementation details for a Robust RAG (Retrieval-Augmented Generation) system designed to provide highly reliable and contextually relevant responses to user queries.

Topics

Resources

Stars

Watchers

Forks

Contributors