A Retrieval-Augmented Generation (RAG) system designed for efficiently querying a structured handbook.
What to expect next:
- Expanding the knowledge base to include 42-related information and the general facts and rules every 42/1337 student needs to be aware of.
- A dockerized environment for ease of deployment.
- Web app UI for ease of use with intra account login.
- Possibility of choosing your own embedding and LLM models.
handbook_assistant is a local Retrieval-Augmented Generation (RAG) system that answers questions about the 1337 coding school using its official handbook and internal documentation. The system combines:
- Structured document chunking
- Dense vector embeddings
- Semantic search with FAISS
- Local LLM inference via Ollama
It is designed to only answer questions related to 1337, while rejecting unrelated queries or instructions.
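The retrieval core can be sketched as embed, index, search. Below is a minimal illustration (not the project's actual code) using brute-force cosine similarity over toy vectors; FAISS performs the same nearest-neighbour search, but efficiently at scale:

```python
import numpy as np

def top_k(query_vec: np.ndarray, index: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k index rows most similar to query_vec."""
    # Normalise rows so that a dot product equals cosine similarity.
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    q_n = query_vec / np.linalg.norm(query_vec)
    scores = index_n @ q_n
    # Sort descending by similarity and keep the top k.
    return list(np.argsort(-scores)[:k])

# Toy "chunk embeddings": 4 chunks in a 3-dimensional space.
chunks = np.array([[1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
print(top_k(np.array([1.0, 0.05, 0.0]), chunks, k=2))  # -> [0, 1]
```

In the real pipeline the vectors come from nomic-embed-text and the search runs against a persisted FAISS index rather than an in-memory array.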
- ✅ Hierarchical markdown chunking (H1 / H2 aware)
- ✅ Token-aware chunk splitting for embeddings
- ✅ Local embeddings using nomic-embed-text
- ✅ Vector search with FAISS
- ✅ Query routing (retrieval vs chat)
- ✅ Ollama-based local LLM inference
- ✅ Strict scope enforcement (1337-only answers)
- ✅ Modular and extensible design
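The H1/H2-aware chunking listed above can be sketched as follows. This is an assumed behaviour, not the project's exact implementation: each chunk keeps the heading path it falls under, so retrieved text carries its section context.

```python
def chunk_markdown(text: str) -> list[dict]:
    """Split markdown into chunks tagged with their H1/H2 headings."""
    chunks, h1, h2, buf = [], None, None, []

    def flush():
        body = "\n".join(buf).strip()
        if body:
            chunks.append({"h1": h1, "h2": h2, "text": body})
        buf.clear()

    for line in text.splitlines():
        if line.startswith("# "):
            flush(); h1, h2 = line[2:].strip(), None   # new H1 resets H2
        elif line.startswith("## "):
            flush(); h2 = line[3:].strip()
        else:
            buf.append(line)
    flush()
    return chunks

doc = "# Rules\nBe kind.\n## Attendance\nShow up.\n# FAQ\nAsk staff."
for c in chunk_markdown(doc):
    print(c)
```

A token-aware pass (as in the feature list) would then further split any chunk whose body exceeds the embedding model's context window.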
```
User Query
    │
    ▼
Query Router (Embedding Similarity / LLM fallback)
    ├── Chat → Generic LLM (scope-limited)
    └── Retrieval
         ├── Embed query
         ├── FAISS similarity search
         ├── Retrieve top-k chunks
         └── LLM answers using retrieved context
```
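The router's embedding-similarity branch can be illustrated like this. The centroid strategy and threshold value are illustrative assumptions, not the project's exact logic:

```python
import numpy as np

def route(query_vec: np.ndarray, handbook_centroid: np.ndarray,
          threshold: float = 0.5) -> str:
    """Route to 'retrieval' when the query looks handbook-related."""
    sim = float(np.dot(query_vec, handbook_centroid)
                / (np.linalg.norm(query_vec) * np.linalg.norm(handbook_centroid)))
    return "retrieval" if sim >= threshold else "chat"

# Toy centroid standing in for the mean of the handbook embeddings.
centroid = np.array([1.0, 1.0, 0.0])
print(route(np.array([0.9, 1.1, 0.1]), centroid))  # -> retrieval
print(route(np.array([0.0, 0.1, 1.0]), centroid))  # -> chat
```

Queries that fall below the threshold can still be escalated to the LLM fallback for a second opinion, as the diagram's router suggests.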
```
.
├── data/
│   ├── cleaned/
│   │   └── handbook_clean.md
│   ├── chunked/
│   │   └── handbook_chunked.jsonl
│   ├── index/
│   │   ├── handbook.faiss
│   │   └── metadata.json
│   ├── processed/
│   │   ├── images/
│   │   └── handbook.md
│   └── raw/
│       └── handbook.pdf
├── src/
│   ├── chat_retrieve.py
│   ├── cli_app.py
│   ├── prep_data.py
│   └── scripts/
│       ├── chunker.py
│       ├── answer.py
│       ├── clean_md.py
│       ├── memory.py
│       ├── pdf_to_md.py
│       ├── prompt.py
│       ├── trim_history.py
│       ├── embedding.py
│       ├── retrieval.py
│       ├── decision.py
│       └── token_length.py
├── README.md
└── requirements.txt
```
- Clone the repository

```
git clone https://github.com/Sfeso13/handbook_assistant.git
cd handbook_assistant
```

- Create and activate a virtual environment

```
python3 -m venv venv
source venv/bin/activate
```

- Install dependencies

```
pip install -r requirements.txt
```

Make sure Ollama is installed and running locally.

- Pull the required models

```
ollama pull qwen3:4b-instruct-2507-q4_K_M
ollama pull sanruss/qwen3-2b-rag
ollama pull nomic-embed-text-v2-moe
```

- Start the assistant

```
python3 src/cli_app.py          # for general use
python3 src/cli_app.py --debug  # to print debug messages
```

- Query to your heart's content