
Vectorless DeepSeek Harry Potter

Vectorless RAG: Reasoning-based Retrieval

Reasoning-native RAG  ◦  No Vector DB  ◦  No Chunking  ◦  Human-like Tree Search

🔥 Features:

  • Sleek Minimal UI: A completely overhauled frontend focused on readability and "Wizarding World" aesthetics.
  • Auto-Scrolling Reasoning: Real-time "Thinking" blocks that scroll intelligently as the model reasons through the text.
  • DeepSeek-R1 Integration: Powered by Featherless AI for state-of-the-art reasoning over 100% grounded context.

📝 Concepts:

  • Knowledge Hierarchy: I've transformed 'Harry Potter and the Philosopher's Stone' into a semantic tree structure that preserves the narrative flow of every chapter.
  • Zero-Vector Retrieval: I achieved 100% accuracy on complex plot queries without a single embedding call.

📑 Introduction

Are you tired of "vibe-based" retrieval where your RAG system returns random snippets that only look like the answer? Traditional vector RAG relies on semantic similarity, but for professional long-form content or complex narratives, similarity ≠ relevance.

Inspired by human experts, this project implements a vectorless, reasoning-based RAG system. It builds a hierarchical tree index from the book and uses an LLM to reason over that index to find the exact pages or chapters needed.

🎯 Core Features

  • No Vector DB: Uses the document's natural structure and LLM reasoning instead of opaque vector math.
  • No Chunking: Chapters are kept whole, preserving context and "connecting the dots" that vector systems miss.
  • Human-like Retrieval: The model "browses" the library just like you would—starting with the Table of Contents and zooming into the right chapter.
  • Perfect Traceability: Every answer includes the exact chapter and reasoning path taken to find it.

🏗️ The Architecture

Instead of calculating mathematical distances in a high-dimensional space, we perform a Semantic Tree Search.

```mermaid
graph LR
    Q[User Question] --> R[Router LLM]
    R -- Search Summaries --> T[Tree Index]
    T -- Identify Chapters --> C[Grounded Context]
    C -- Reasoning Pass --> A[Final Answer]
```

🌲 Page-Index Tree Structure

This project transforms lengthy PDFs into a semantic tree structure, optimized for LLM consumption. Below is an example of how the Philosopher's Stone is indexed:

```json
{
  "id": "chapter_001",
  "type": "chapter",
  "title": "CHAPTER ONE: The Boy Who Lived",
  "page_start": 11,
  "page_end": 22,
  "summary": "Introduction of the Dursleys and the arrival of Harry Potter at Privet Drive...",
  "full_text": "Mr and Mrs Dursley, of number four, Privet Drive, were proud to say..."
}
```
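Nodes of this shape are straightforward to work with in code. The sketch below (an illustrative helper, not the repository's API) mirrors the JSON fields in a dataclass and exposes a summary-only view for the routing pass:

```python
from dataclasses import dataclass

@dataclass
class ChapterNode:
    """One node of the page-index tree, mirroring the JSON shape above."""
    id: str
    type: str
    title: str
    page_start: int
    page_end: int
    summary: str
    full_text: str

    @classmethod
    def from_dict(cls, raw: dict) -> "ChapterNode":
        # Field names match the JSON keys, so the dict unpacks directly.
        return cls(**raw)

    def toc_entry(self) -> str:
        """Compact line shown to the router LLM; deliberately omits full_text."""
        return (
            f"{self.id} (pp. {self.page_start}-{self.page_end}): "
            f"{self.title} -- {self.summary}"
        )
```

Keeping `full_text` out of `toc_entry` is the whole trick: the routing prompt stays small no matter how long the book is.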

⚙️ Usage

1. Install Dependencies

```shell
# Frontend
cd frontend && npm install

# Backend
cd ../backend && pip install -r requirements.txt
```

2. Configure Environment

Create a .env file in the backend/ directory with your provider keys (Featherless or any OpenRouter-compatible endpoint):

```
OPENROUTER_API_KEY=your_key_here
OPENROUTER_MODEL=deepseek-ai/DeepSeek-R1-0528
```
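If you prefer not to add a dependency for this, a .env file of simple KEY=VALUE lines can be read with the standard library alone. This is a minimal sketch (the parsing rules here are an assumption; a library like python-dotenv handles more edge cases):

```python
def load_env(path: str = ".env") -> dict[str, str]:
    """Tiny .env parser: KEY=VALUE lines; blank lines and '#' comments ignored."""
    config = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first '=' only, so values may contain '='.
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    return config

# Typical startup: os.environ.update(load_env("backend/.env"))
```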

3. Launch

```shell
# Run Backend (from the backend/ directory)
python -m uvicorn app.main:app --port 8000

# Run Frontend (from the frontend/ directory)
npm run dev
```

⭐ Support

This project demonstrates that you don't always need a Vector Database to build a powerful RAG system. Sometimes, a well-structured tree and a smart reasoning model are all you need to find the magic.

Leave a star 🌟 if you find this architecture useful!

Connect

LinkedIn GitHub

