A lightweight Retrieval-Augmented Generation (RAG) system built with FastAPI and LLaMA 3 via the Groq API.
- 📄 Upload PDF & TXT documents
- 🔍 Semantic search (cosine similarity)
- 🤖 LLM-powered answers (LLaMA 3 via Groq)
- ⚡ FastAPI backend
- 🌐 Simple web UI
- Python
- FastAPI
- Groq API
- HTML/CSS
pip install -r requirements.txt
uvicorn main:app --reload
Open: http://localhost:8000
rag_project/
│
├── main.py ← FastAPI server (the "waiter")
├── rag_engine.py ← All RAG logic (the "kitchen")
├── requirements.txt ← Python packages to install
├── .env ← Your API key goes here
│
├── static/
│ └── index.html ← Chat UI (opens in browser)
│
└── documents/
└── sample_handbook.txt ← Sample document to test with
File → Open Folder → select rag_project
Terminal → New Terminal (or press Ctrl+`)
python -m venv venv
Activate it:
- Windows: venv\Scripts\activate
- Mac/Linux: source venv/bin/activate
You'll see (venv) appear in the terminal. Good.
pip install -r requirements.txt
Open .env and replace your-api-key-here with your real key:
GROQ_API_KEY=gsk-...
Get a key at: https://console.groq.com
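At startup the server needs to read this key from .env. Projects typically use the python-dotenv package for this; as a dependency-free sketch (the `load_env` helper here is illustrative, not the project's actual code), the parsing looks like:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: read KEY=VALUE lines into os.environ.
    Stand-in for python-dotenv, for illustration only."""
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blanks and comments; split on the first '='
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env present; rely on the real environment

load_env()
api_key = os.environ.get("GROQ_API_KEY")
```

If the key is missing, the Groq client will fail at request time, so it is worth checking `api_key` is set before starting the server.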
uvicorn main:app --reload
You'll see:
INFO: Uvicorn running on http://127.0.0.1:8000
Open your browser at: http://localhost:8000
The --reload flag means the server restarts automatically
whenever you save a file. Great for development.
- Click "Click to upload" in the sidebar
- Select documents/sample_handbook.txt (or any .txt/.pdf)
- Click "Upload & Index" — watch the terminal as it chunks and embeds
- Type a question like: "What is the vacation policy?"
- See the retrieved chunks + generated answer
| Method | URL | What it does |
|---|---|---|
| GET | / | Opens the chat UI |
| POST | /upload | Upload + index a document |
| POST | /ask | Ask a question, get an answer |
| GET | /status | See what's indexed |
| DELETE | /reset | Clear all indexed docs |
You can also test the API directly at: http://localhost:8000/docs (FastAPI gives you a free interactive API explorer)
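You can also call the endpoints from a script. A quick standard-library sketch for POSTing to /ask — the request field name `question` and the response shape are assumptions here; check http://localhost:8000/docs for the real schema:

```python
import json
from urllib import request

def build_ask_request(question: str, base_url: str = "http://localhost:8000") -> request.Request:
    """Build a POST request for the /ask endpoint.
    Body field name 'question' is assumed — verify against /docs."""
    payload = json.dumps({"question": question}).encode()
    return request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question: str, base_url: str = "http://localhost:8000") -> dict:
    """Send the question and return the parsed JSON answer."""
    with request.urlopen(build_ask_request(question, base_url)) as resp:
        return json.loads(resp.read())
```

With the server running, `ask("What is the vacation policy?")` returns the JSON response as a dict.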
Your document
↓
[Load] → Read the raw text
↓
[Chunk] → Split into ~400 char pieces with overlap
↓
[Embed] → Convert each chunk to a vector (list of numbers)
↓
[Store] → Keep vectors in memory
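The [Chunk] step above can be sketched in a few lines of Python. The function name and the 50-character overlap are illustrative, not necessarily what rag_engine.py uses:

```python
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into ~size-char pieces; each chunk shares `overlap`
    chars with the previous one so sentences aren't cut off context."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # advance, keeping `overlap` chars shared
    return chunks
```

For a 1000-character document this yields three chunks (starting at offsets 0, 350, and 700), and the tail of each chunk repeats at the head of the next.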
--- When you ask a question ---
Your question
↓
[Embed question] → Same embedding model
↓
[Similarity search] → Find chunks with closest vectors
↓
[Augment prompt] → question + top 3 chunks
↓
[Generate] → LLM (LLaMA 3 via Groq) reads context and answers
↓
Answer!
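The question-time flow above can be sketched end to end. Note that `embed` here is a toy bag-of-letters stand-in (the real project uses a proper embedding model), and all function names are illustrative:

```python
import math

def embed(text: str) -> list[float]:
    """Toy embedding: 26-dim letter-frequency vector.
    Real code would call a sentence-transformer or an embeddings API."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """[Similarity search]: rank chunks by cosine similarity to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """[Augment prompt]: glue the top chunks and the question into one prompt."""
    context = "\n\n".join(top_k(question, chunks))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The resulting string from `build_prompt` is what gets sent to the LLM in the [Generate] step.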