A Retrieval-Augmented Generation (RAG) system designed for efficiently querying a structured handbook.
What to expect next:
- Expanding the knowledge base to include 42-related information and the general facts and rules every 42/1337 student needs to be aware of.
- A dockerized environment for ease of deployment.
- Web app UI for ease of use with intra account login.
- Possibility of choosing your own embedding and LLM models.
handbook_assistant is a local Retrieval-Augmented Generation (RAG) system that answers questions about the 1337 coding school using its official handbook and internal documentation. The system combines:
- Structured document chunking
- Dense vector embeddings
- Semantic search with FAISS
- Local LLM inference via Ollama
It is designed to only answer questions related to 1337, while rejecting unrelated queries or instructions.
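The retrieval core can be sketched as embed, index, search. Below is a minimal illustration (not the project's actual code) using brute-force cosine similarity over toy vectors; FAISS performs the same nearest-neighbour search, but efficiently at scale:

```python
import numpy as np

def top_k(query_vec: np.ndarray, index: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k index rows most similar to query_vec."""
    # Normalise rows so that a dot product equals cosine similarity.
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    q_n = query_vec / np.linalg.norm(query_vec)
    scores = index_n @ q_n
    # Sort descending by similarity and keep the top k.
    return list(np.argsort(-scores)[:k])

# Toy "chunk embeddings": 4 chunks in a 3-dimensional space.
chunks = np.array([[1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
print(top_k(np.array([1.0, 0.05, 0.0]), chunks, k=2))  # -> [0, 1]
```

In the real pipeline the vectors come from nomic-embed-text and the search runs against a persisted FAISS index rather than an in-memory array.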
- ✅ Hierarchical markdown chunking (H1 / H2 aware)
- ✅ Token-aware chunk splitting for embeddings
- ✅ Local embeddings using nomic-embed-text
- ✅ Vector search with FAISS
- ✅ Query routing (retrieval vs chat)
- ✅ Ollama-based local LLM inference
- ✅ Strict scope enforcement (1337-only answers)
- ✅ Modular and extensible design
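The H1/H2-aware chunking listed above can be sketched as follows. This is an assumed behaviour, not the project's exact implementation: each chunk keeps the heading path it falls under, so retrieved text carries its section context.

```python
def chunk_markdown(text: str) -> list[dict]:
    """Split markdown into chunks tagged with their H1/H2 headings."""
    chunks, h1, h2, buf = [], None, None, []

    def flush():
        body = "\n".join(buf).strip()
        if body:
            chunks.append({"h1": h1, "h2": h2, "text": body})
        buf.clear()

    for line in text.splitlines():
        if line.startswith("# "):
            flush(); h1, h2 = line[2:].strip(), None   # new H1 resets H2
        elif line.startswith("## "):
            flush(); h2 = line[3:].strip()
        else:
            buf.append(line)
    flush()
    return chunks

doc = "# Rules\nBe kind.\n## Attendance\nShow up.\n# FAQ\nAsk staff."
for c in chunk_markdown(doc):
    print(c)
```

A token-aware pass (as in the feature list) would then further split any chunk whose body exceeds the embedding model's context window.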
```
User Query
    │
    ▼
Query Router (Embedding Similarity / LLM fallback)
    ├── Chat → Generic LLM (scope-limited)
    └── Retrieval
         ├── Embed query
         ├── FAISS similarity search
         ├── Retrieve top-k chunks
         └── LLM answers using retrieved context
```
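The router's embedding-similarity branch can be illustrated like this. The centroid strategy and threshold value are illustrative assumptions, not the project's exact logic:

```python
import numpy as np

def route(query_vec: np.ndarray, handbook_centroid: np.ndarray,
          threshold: float = 0.5) -> str:
    """Route to 'retrieval' when the query looks handbook-related."""
    sim = float(np.dot(query_vec, handbook_centroid)
                / (np.linalg.norm(query_vec) * np.linalg.norm(handbook_centroid)))
    return "retrieval" if sim >= threshold else "chat"

# Toy centroid standing in for the mean of the handbook embeddings.
centroid = np.array([1.0, 1.0, 0.0])
print(route(np.array([0.9, 1.1, 0.1]), centroid))  # -> retrieval
print(route(np.array([0.0, 0.1, 1.0]), centroid))  # -> chat
```

Queries that fall below the threshold can still be escalated to the LLM fallback for a second opinion, as the diagram's router suggests.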
```
.
├── data/
│   ├── cleaned/
│   │   └── handbook_clean.md
│   ├── chunked/
│   │   └── handbook_chunked.jsonl
│   ├── index/
│   │   ├── handbook.faiss
│   │   └── metadata.json
│   ├── processed/
│   │   ├── images/
│   │   └── handbook.md
│   └── raw/
│       └── handbook.pdf
├── src/
│   ├── chat_retrieve.py
│   ├── cli_app.py
│   ├── prep_data.py
│   └── scripts/
│       ├── chunker.py
│       ├── answer.py
│       ├── clean_md.py
│       ├── memory.py
│       ├── pdf_to_md.py
│       ├── prompt.py
│       ├── trim_history.py
│       ├── embedding.py
│       ├── retrieval.py
│       ├── decision.py
│       └── token_length.py
├── README.md
└── requirements.txt
```
- Clone the repository

```
git clone https://github.com/Sfeso13/handbook_assistant.git
cd handbook_assistant
```

- Create and activate a virtual environment

```
python3 -m venv venv
source venv/bin/activate
```

- Install dependencies

```
pip install -r requirements.txt
```

Make sure Ollama is installed and running locally.

- Pull the required models

```
ollama pull qwen3:4b-instruct-2507-q4_K_M
ollama pull sanruss/qwen3-2b-rag
ollama pull nomic-embed-text-v2-moe
```

- Start the assistant

```
python3 src/cli_app.py          # for general use
python3 src/cli_app.py --debug  # to print debug messages
```

- Query to your heart's content