This project is a Retrieval-Augmented Generation (RAG) system that answers nutrition-related questions using grounded context from a source document. It uses Python for chunking and embedding generation, Supabase as the vector store, and a Next.js app as the interface for asking questions.
- End-to-end RAG pipeline
- Python-based chunking and embedding generation
- Supabase Vector Store for semantic search
- LLM answers restricted to retrieved document context
- Next.js frontend and API routes
- Clean, modular structure that is easy to extend
```
NUTRITION_RAG/
│
├── rag-chat/                   # Next.js frontend and API routes
│   ├── src/
│   ├── public/
│   ├── .env
│   ├── .env.local
│   ├── next-env.d.ts
│   ├── next.config.ts
│   ├── package.json
│   └── postcss.config.mjs
│
├── human-nutrition-text.pdf    # Nutrition document used for ingestion
│
├── ingest.py                   # Chunking, embeddings, Supabase upload
├── test_embeddings.py          # Simple embedding pipeline test
├── requirements.txt            # Python dependencies
│
└── README.md                   # Project overview
```
The ingest.py script (see the sketch after this list):
- Loads the nutrition PDF
- Splits the text into meaningful chunks
- Generates embeddings using a sentence transformer model
- Uploads vectors and metadata to a Supabase table
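A minimal sketch of that flow, assuming pypdf for PDF loading, a SentenceTransformer model, and a simple character-window chunker; the actual ingest.py may use different chunk sizes, model, and column names:

```python
# Illustrative sketch of the ingest flow; chunk size, model, and column
# names are assumptions, not necessarily what ingest.py actually uses.
import os
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]

reader = PdfReader("human-nutrition-text.pdf")
for page_number, page in enumerate(reader.pages, start=1):
    for chunk in chunk_text(page.extract_text() or ""):
        supabase.table(os.environ["SUPABASE_TABLE"]).insert(
            {
                "content": chunk,
                "embedding": model.encode(chunk).tolist(),
                "page": page_number,
            }
        ).execute()
```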
You can use test_embeddings.py to quickly verify that the embedding pipeline and the Supabase connection are working.
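A rough equivalent of such a check (the actual test_embeddings.py may test different things) looks like this:

```python
# Quick sanity check in the spirit of test_embeddings.py (details assumed).
import os
from sentence_transformers import SentenceTransformer
from supabase import create_client

vector = SentenceTransformer("all-MiniLM-L6-v2").encode("What is vitamin C?")
print(f"embedding dimension: {len(vector)}")  # 384 for all-MiniLM-L6-v2

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
rows = supabase.table(os.environ["SUPABASE_TABLE"]).select("id").limit(1).execute()
print(f"Supabase reachable, rows fetched: {len(rows.data)}")  # assumes an "id" column
```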
Supabase stores:
- Chunk text
- Embedding vectors
- Metadata such as page number or section
- Timestamps
This makes it easy to search semantically over your document using vector similarity.
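For example, assuming a pgvector search function has been created in Postgres (often named match_documents in Supabase examples; the function name, arguments, and column names here are assumptions), a similarity query looks like this:

```python
# Illustrative similarity search; "match_documents" and its parameters are
# assumed, following the common Supabase pgvector pattern.
import os
from sentence_transformers import SentenceTransformer
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
model = SentenceTransformer("all-MiniLM-L6-v2")  # must match the ingest model

query_embedding = model.encode("Which foods are high in iron?").tolist()
result = supabase.rpc(
    "match_documents",
    {"query_embedding": query_embedding, "match_count": 5},
).execute()
for row in result.data:
    print(row["content"][:80])  # assumes chunk text lives in a "content" column
```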
When a user asks a question in the Next.js app:
- The question is embedded with the same model used at ingestion
- The Next.js API route queries Supabase with the question embedding
- Supabase returns the most relevant chunks
- These chunks are passed as context to the LLM
- The LLM generates an answer that stays grounded in the retrieved text
The prompt instructs the LLM to use only the retrieved context from the document and to say clearly when the answer is not present. This helps reduce hallucinations and keeps answers faithful to the source. The sketch below shows the shape of this retrieve-then-generate step.
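The production route lives in rag-chat and is written in TypeScript; this Python sketch only illustrates the same flow, and the model names, prompt wording, and match_documents function are assumptions:

```python
# Python illustration of the retrieve-then-generate flow; the real logic
# lives in the Next.js API route. Model names and prompt text are assumed.
import os
from openai import OpenAI
from sentence_transformers import SentenceTransformer
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # must match ingestion
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "What are the benefits of vitamin D?"
matches = supabase.rpc(
    "match_documents",  # assumed pgvector search function, as above
    {"query_embedding": embedder.encode(question).tolist(), "match_count": 5},
).execute()
context = "\n\n".join(row["content"] for row in matches.data)

response = llm.chat.completions.create(
    model="gpt-4o-mini",  # any OpenAI-compatible model
    messages=[
        {
            "role": "system",
            "content": "Answer using only the provided context. If the "
            "answer is not in the context, say the document does not cover it.",
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```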
```
pip install -r requirements.txt
python ingest.py
```

Create a .env file in the project root and add your Supabase values:

```
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_service_role_or_anon_key
SUPABASE_TABLE=your_table_name
```

You can optionally use .env.local for values that only apply on your machine and should not be committed.
```
cd rag-chat
npm install
npm run dev
```

Add the same Supabase and model-related environment variables in rag-chat/.env or rag-chat/.env.local as needed.
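As an illustration only (the variable names below are assumptions; use whatever names the code in rag-chat/src actually reads), rag-chat/.env.local might look like:

```
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_key
OPENAI_API_KEY=your_llm_provider_key
```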
Question: "What vitamins are mentioned in the document, and what benefits do they provide?"
The system retrieves the relevant chunks from Supabase and generates an answer using only that context.
- Python
- Sentence Transformers
- Supabase Vector Store
- Next.js with React and TypeScript
- OpenAI or compatible LLM provider
- Add a simple chat-style UI for follow-up questions
- Support multiple PDFs and sources
- Highlight citations in the answer by linking back to chunks
- Add evaluation scripts to measure retrieval quality