🤖 handbook_assistant

A Retrieval-Augmented Generation (RAG) system designed for efficiently querying a structured handbook.

What to expect next:

  • Expanding the knowledge base to include 42-related information and the general facts and rules any 42/1337 student needs to be aware of.
  • A dockerized environment for ease of deployment.
  • Web app UI for ease of use with intra account login.
  • Possibility of choosing your own embedding and LLM models.

🎬 About

handbook_assistant is a local Retrieval-Augmented Generation (RAG) system built to answer questions related to 1337 coding school using its official handbook and internal documentation. The system combines:

  • Structured document chunking
  • Dense vector embeddings
  • Semantic search with FAISS
  • Local LLM inference via Ollama

It maintains conversational memory across turns and is designed to answer only questions related to 1337, rejecting unrelated queries or instructions.


🎛️ Features

  • ✅ Hierarchical markdown chunking (H1 / H2 aware)
  • ✅ Token-aware chunk splitting for embeddings
  • ✅ Local embeddings using nomic-embed-text
  • ✅ Vector search with FAISS
  • ✅ Query routing (retrieval vs chat)
  • ✅ Ollama-based local LLM inference
  • ✅ Strict scope enforcement (1337-only answers)
  • ✅ Modular and extensible design
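The first two features above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the repo's chunker.py: the real pipeline would count tokens with the embedding model's tokenizer, while this sketch approximates tokens with whitespace-split words.

```python
import re

def chunk_markdown(text, max_tokens=200):
    """Split markdown into chunks along H1/H2 boundaries, then split
    oversized chunks by a rough token budget (whitespace words here)."""
    chunks, heading_path, buf = [], [], []

    def flush():
        if buf:
            body = "\n".join(buf).strip()
            if body:
                chunks.append({"headings": list(heading_path), "text": body})
            buf.clear()

    for line in text.splitlines():
        m = re.match(r"^(#{1,2})\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            # Keep the ancestor headings so each chunk stays self-describing.
            heading_path[:] = heading_path[: level - 1]
            heading_path.append(m.group(2).strip())
        else:
            buf.append(line)
    flush()

    # Token-aware pass: split any chunk whose word count exceeds the budget.
    out = []
    for c in chunks:
        words = c["text"].split()
        for i in range(0, len(words), max_tokens):
            out.append({"headings": c["headings"],
                        "text": " ".join(words[i:i + max_tokens])})
    return out
```

Carrying the H1/H2 path on every chunk is what makes the chunking "hierarchical": a retrieved chunk arrives with enough context for the LLM to know which handbook section it came from.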

👾 System Architecture

User Query
    │
    ▼
Query Router (Embedding Similarity / LLM fallback)
    ├── Chat → Generic LLM (scope-limited)
    └── Retrieval
          ├── Embed query
          ├── FAISS similarity search
          ├── Retrieve top-k chunks
          └── LLM answers using retrieved context
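The routing and retrieval steps in the diagram can be sketched as follows. Toy vectors and brute-force cosine similarity stand in for the real system's nomic-embed-text embeddings and FAISS index; the 0.45 threshold and function names are illustrative assumptions, not values from the repo.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(query_vec, anchor_vecs, threshold=0.45):
    """Route a query: if it is close enough to any 'retrieval anchor'
    (embeddings of example handbook questions), take the retrieval
    path; otherwise fall back to scope-limited chat."""
    best = max((cosine(query_vec, a) for a in anchor_vecs), default=0.0)
    return "retrieval" if best >= threshold else "chat"

def top_k(query_vec, chunk_vecs, k=3):
    """Brute-force top-k similarity search; the real system delegates
    this step to a FAISS index over the chunk embeddings."""
    scored = sorted(enumerate(chunk_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]
```

The indices returned by `top_k` map back to chunk metadata, and the matching chunk texts are then packed into the LLM prompt as context.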


🗃️ Project Structure

.
├── data/
│   ├── cleaned/
│   │   └── handbook_clean.md
│   ├── chunked/
│   │   └── handbook_chunked.jsonl
│   ├── index/
│   │   ├── handbook.faiss
│   │   └── metadata.json
│   ├── processed/
│   │   ├── images/
│   │   └── handbook.md
│   └── raw/
│       └── handbook.pdf
├── src/
│   ├── chat_retrieve.py
│   ├── cli_app.py
│   ├── prep_data.py
│   └── scripts/
│       ├── chunker.py
│       ├── answer.py
│       ├── clean_md.py
│       ├── memory.py
│       ├── pdf_to_md.py
│       ├── prompt.py
│       ├── trim_history.py
│       ├── embedding.py
│       ├── retrieval.py
│       ├── decision.py
│       └── token_length.py
├── README.md
└── requirements.txt
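The chunking stage's output (data/chunked/handbook_chunked.jsonl above) is JSON Lines: one JSON object per line, one line per chunk. The field names below are assumptions for illustration, not the repo's actual schema:

```python
import io
import json

# Hypothetical chunk records in the style of handbook_chunked.jsonl.
records = [
    {"id": 0, "headings": ["Handbook", "Rules"], "text": "Attendance is mandatory."},
    {"id": 1, "headings": ["Handbook", "Exams"], "text": "Exams run every Friday."},
]

# Writing: one JSON object per line (an in-memory buffer stands in for the file).
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Reading it back line by line, as the indexing/retrieval steps would.
loaded = [json.loads(line) for line in buf.getvalue().splitlines()]
```

JSONL suits this pipeline because each chunk can be streamed, embedded, and indexed independently, and its position in the file lines up naturally with its row in the FAISS index.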
  

⚙️ Setup

  1. Clone the repository
git clone https://github.com/Sfeso13/handbook_assistant.git
cd handbook_assistant
  2. Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt

Make sure Ollama is installed and running locally.

  4. Pull required models
ollama pull qwen3:4b-instruct-2507-q4_K_M
ollama pull sanruss/qwen3-2b-rag
ollama pull nomic-embed-text-v2-moe

🧮 Usage

  1. Start the assistant
python3 src/cli_app.py            # for general use
python3 src/cli_app.py --debug    # for printing debug messages
  2. Query to your heart's content
