RAG Grounded QA System

This project is a Retrieval-Augmented Generation (RAG) system that answers nutrition-related questions using grounded context from a source document. It uses Python for chunking and embedding generation, Supabase as the vector store, and a Next.js app as the interface for asking questions.

Features

  • End-to-end RAG pipeline
  • Python-based chunking and embedding generation
  • Supabase Vector Store for semantic search
  • LLM answers restricted to retrieved document context
  • Next.js frontend and API routes
  • Clean, modular structure that is easy to extend

Folder Structure

NUTRITION_RAG/
│
├── rag-chat/                   # Next.js frontend and API routes
│   ├── src/
│   ├── public/
│   ├── .env
│   ├── .env.local
│   ├── next-env.d.ts
│   ├── next.config.ts
│   ├── package.json
│   └── postcss.config.mjs
│
├── human-nutrition-text.pdf    # Nutrition document used for ingestion
│
├── ingest.py                   # Chunking, embeddings, Supabase upload
├── test_embeddings.py          # Simple embedding pipeline test
├── requirements.txt            # Python dependencies
│
└── README.md                   # Project overview

How It Works

1. Ingestion and Embeddings (Python)

The ingest.py script:

  1. Loads the nutrition PDF
  2. Splits the text into meaningful chunks
  3. Generates embeddings using a sentence transformer model
  4. Uploads vectors and metadata to a Supabase table
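The chunking step above can be sketched roughly as follows. This is a minimal illustration, not the actual ingest.py: the chunk size, overlap, and word-based splitting are assumptions, and the embedding/upload calls are shown only as hedged comments.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks.

    The sizes here are illustrative; the real script may split on
    sentences, pages, or tokens instead of words.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

# The embedding and upload steps would then look something like
# (model and table names are hypothetical):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   vectors = model.encode(chunks).tolist()
#   supabase.table("documents").insert(rows_with_vectors).execute()
```

The overlap keeps context that straddles a chunk boundary retrievable from both neighboring chunks.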

You can use test_embeddings.py to quickly verify that the embedding pipeline and the Supabase connection are working.

2. Storage in Supabase

Supabase stores:

  • Chunk text
  • Embedding vectors
  • Any metadata such as page number or section
  • Timestamps

This makes it easy to search semantically over your document using vector similarity.
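Vector similarity here typically means cosine similarity between the question embedding and each stored chunk embedding (Supabase/pgvector computes this server-side; the sketch below just shows the math):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: dot product
    divided by the product of the vector norms. Ranges from -1 to 1,
    with higher values meaning more semantically similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In practice the database index ranks all chunk vectors by this score and returns the top matches.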

3. Retrieval Flow

When a user asks a question in the Next.js app:

  1. The Next.js API route embeds the question and queries Supabase
  2. Supabase returns the most relevant chunks by vector similarity
  3. These chunks are passed as context to the LLM
  4. The LLM generates an answer grounded in the retrieved text
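The retrieval flow above can be sketched as a small function. This is a hedged illustration: the embedding function and Supabase call are injected as parameters so no live services are needed, and the RPC name "match_documents" is a common Supabase convention, not necessarily what this project uses.

```python
def retrieve_chunks(question: str, embed, rpc_call, top_k: int = 5) -> list[str]:
    """Embed the question, then fetch the most similar stored chunks.

    `embed` maps a string to an embedding vector; `rpc_call` stands in
    for a Supabase RPC invocation (name and parameters are assumptions).
    """
    query_vector = embed(question)
    rows = rpc_call(
        "match_documents",
        {"query_embedding": query_vector, "match_count": top_k},
    )
    # Each returned row is assumed to carry the chunk text under "content".
    return [row["content"] for row in rows]
```

In the real app this logic lives in a Next.js API route, but the shape of the flow is the same.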

4. Grounded Answer Generation

The prompt is designed so that the LLM uses only the retrieved context from the document and clearly states when the answer is not present. This reduces hallucinations and keeps answers faithful to the source.
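A grounded prompt of this kind can be assembled like the sketch below. The exact wording this project uses may differ; the pattern is what matters: inline the retrieved chunks, restrict the model to them, and give it an explicit fallback phrase.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt that restricts the model to the retrieved context.

    The instruction wording is illustrative, not this project's actual
    prompt template.
    """
    context = "\n\n".join(chunks)
    return (
        "Answer the question using ONLY the context below. "
        'If the answer is not in the context, say "I don\'t know based on the document."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The explicit "I don't know" escape hatch is what lets the model decline gracefully instead of inventing an answer.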

Setup

Python pipeline

pip install -r requirements.txt
python ingest.py

Create a .env file in the project root and add your Supabase values:

SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_service_role_or_anon_key
SUPABASE_TABLE=your_table_name

You can optionally use .env.local for values that only apply on your machine and should not be committed.
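A script like ingest.py would read these values from the environment, typically after loading the .env file with python-dotenv. The helper below is a hedged sketch of that step, assuming exactly the variable names listed above:

```python
import os

def load_supabase_config() -> dict:
    """Read the Supabase settings from the environment.

    Assumes the variable names shown in this README; the real script may
    call `dotenv.load_dotenv()` first to populate them from .env.
    """
    required = ["SUPABASE_URL", "SUPABASE_KEY", "SUPABASE_TABLE"]
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in required}
```

Failing fast on missing variables gives a clearer error than a connection failure deep inside the upload step.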

Next.js app

cd rag-chat
npm install
npm run dev

Add the same Supabase and model-related environment variables in rag-chat/.env or rag-chat/.env.local as needed.

Example Query

Question: "What vitamins are mentioned in the document, and what benefits do they provide?"

The system retrieves the relevant chunks from Supabase and generates an answer using only that context.

Tech Stack

  • Python
  • Sentence Transformers
  • Supabase Vector Store
  • Next.js with React and TypeScript
  • OpenAI or compatible LLM provider

Possible Next Steps

  • Add a simple chat-style UI for follow-up questions
  • Support multiple PDFs and sources
  • Highlight citations in the answer by linking back to chunks
  • Add evaluation scripts to measure retrieval quality
