RushyanthN/Advanced-Modular-RAG-with-Agentic-Routing-and-Self-Correction


Finance RAG - Advanced RAG Pipeline for SEC 10-K Filings

An advanced Retrieval-Augmented Generation (RAG) system that answers questions about Apple's and Tesla's SEC 10-K annual filings by combining semantic routing, query expansion, cross-encoder reranking, and confidence-aware generation.

RAG Architecture

This pipeline implements six advanced RAG techniques:

  1. Parent-Child Chunking (Dense X) -- Small child chunks (512 characters) are used for precise retrieval, and their larger parent chunks (2,048 characters) supply richer context
  2. RAG Fusion -- Generates multiple variations of the query and merges their result lists with Reciprocal Rank Fusion (RRF)
  3. Stepback Prompting -- Generates a more abstract version of the query to retrieve broader background context
  4. Semantic Routing -- Classifies each query (financial metrics, risk factors, comparisons, etc.) to choose a retrieval strategy suited to it
  5. Cross-Encoder Reranking -- Rescores the initially retrieved candidates with the BGE reranker for more precise relevance ranking
  6. Adaptive Generation -- Adjusts the response strategy (high, medium, or low confidence) based on retrieval quality
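
The parent-child scheme in technique 1 can be sketched with plain fixed-width character windows. This is a minimal illustration under stated assumptions, not the repository's code: it assumes non-overlapping splits with no awareness of sentence or section boundaries, and the names `Child`, `build_parent_child` and `expand_to_parents` are hypothetical. In this sketch, retrieval matches against the small child texts, and the matched children are swapped for their parents before generation.

```python
from dataclasses import dataclass

# Sizes taken from the README; the splitting strategy itself is an assumption.
PARENT_SIZE = 2048  # characters per parent chunk
CHILD_SIZE = 512    # characters per child chunk


@dataclass
class Child:
    parent_id: int  # index of the parent chunk this child was cut from
    text: str


def split_fixed(text: str, size: int) -> list[str]:
    """Split text into consecutive, non-overlapping windows of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_parent_child(text: str) -> tuple[list[str], list[Child]]:
    """Cut the document into parents, then cut each parent into children
    that remember which parent they came from (hypothetical helper)."""
    parents = split_fixed(text, PARENT_SIZE)
    children = [
        Child(parent_id=pid, text=chunk)
        for pid, parent in enumerate(parents)
        for chunk in split_fixed(parent, CHILD_SIZE)
    ]
    return parents, children


def expand_to_parents(hits: list[Child], parents: list[str]) -> list[str]:
    """Replace matched children with their parents, keeping first-hit order
    and dropping duplicates so each parent reaches the LLM only once."""
    seen: set[int] = set()
    context: list[str] = []
    for child in hits:
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            context.append(parents[child.parent_id])
    return context
```

Keeping the child-to-parent link as a plain index keeps the sketch small; a vector store would more likely carry the parent ID as metadata on each embedded child.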
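
The merge step of RAG Fusion (technique 2) can be sketched with the standard Reciprocal Rank Fusion formula: every document earns 1 / (k + rank) from each ranked list it appears in, and the contributions are summed. The constant k = 60 is the value proposed with the original RRF method and is an assumption here, since the README does not say which k the pipeline uses; document IDs are plain strings for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document gets 1 / (k + rank) from every list it appears in
    (ranks start at 1); the summed scores are sorted from high to low.
    k = 60 is an assumed default, not a value taken from this repository.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, if three query variants return ["a", "b", "c"], ["b", "a", "d"] and ["b", "c", "a"], document "b" comes out on top because it ranks first in two lists, followed by "a", "c" and "d". Because only ranks are used, RRF needs no calibration between the similarity scores of different queries.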
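
The three confidence tiers of Adaptive Generation (technique 6) could be derived from the reranker output roughly as follows. This is a hedged sketch: it assumes the cross-encoder scores have already been squashed into [0, 1] (for example with a sigmoid), and the thresholds, the instruction texts, and the names `assess_confidence` and `build_instruction` are all hypothetical, because the README does not publish how retrieval quality is graded.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical cut-offs on a reranker score in [0, 1]; not from the repository.
HIGH_THRESHOLD = 0.7
MEDIUM_THRESHOLD = 0.3

# Hypothetical instructions that change how the LLM is asked to answer.
INSTRUCTIONS = {
    Confidence.HIGH: "Answer directly and cite the filing excerpts.",
    Confidence.MEDIUM: "Answer, but flag any part the excerpts do not fully support.",
    Confidence.LOW: "Say that the retrieved filings do not clearly answer the question.",
}


def assess_confidence(rerank_scores: list[float]) -> Confidence:
    """Grade retrieval quality by the best reranked passage's score."""
    best = max(rerank_scores, default=0.0)
    if best >= HIGH_THRESHOLD:
        return Confidence.HIGH
    if best >= MEDIUM_THRESHOLD:
        return Confidence.MEDIUM
    return Confidence.LOW


def build_instruction(rerank_scores: list[float]) -> str:
    """Pick the answer-style instruction that matches the confidence tier."""
    return INSTRUCTIONS[assess_confidence(rerank_scores)]
```

Grading on the single best score is the simplest choice; averaging the top few scores would penalize cases where only one passage is relevant.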

Tech Stack

Component    Technology
LLM          Google Gemini 2.0 Flash
Embeddings   BAAI/bge-base-en-v1.5
Reranker     BAAI/bge-reranker-base
Vector DB    ChromaDB

Data Coverage

  • Apple 10-K filings: 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025
  • Tesla 10-K filings: 2020, 2021, 2023, 2024, 2025
  • 2,016 pages and 1,735 table documents indexed

Example Queries

  • "What was Apple's revenue in 2023 compared to 2020?"
  • "What are Tesla's main risk factors?"
  • "Compare Apple's and Tesla's profit margins"
  • "How has Apple's R&D spending changed over the years?"
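
The routing step applied to queries like these can be illustrated with a toy router. In this sketch, word-count vectors stand in for real embeddings (the pipeline itself would more plausibly embed queries with BAAI/bge-base-en-v1.5 or use an LLM classifier), so it runs without a model. The route names echo the categories listed under Semantic Routing, but the route descriptions, the `general` fallback, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical route descriptions; a real router would embed richer exemplars.
ROUTES = {
    "financial_metrics": "revenue net income profit margin earnings sales spending expenses growth r&d",
    "risk_factors": "risk risks factors uncertainty threat competition regulation supply",
    "comparison": "compare compared comparison versus difference between",
}


def bag(text: str) -> Counter:
    """Lower-case word counts; '&' is kept so that 'R&D' stays one token."""
    return Counter(re.findall(r"[a-z&]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(query: str) -> str:
    """Return the most similar route, or 'general' if nothing overlaps."""
    q = bag(query)
    scores = {name: cosine(q, bag(desc)) for name, desc in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```

With these descriptions, the four example queries route to comparison, risk_factors, comparison and financial_metrics respectively. The first lands in comparison only because of the word "compared", which shows how brittle lexical matching is next to dense embeddings.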

About

Hybrid Multi-Stage Retrieval-Augmented Generation with Semantic Routing, Query Expansion, Cross-Encoder Reranking, and Adaptive Generation
