A private document RAG (Retrieval-Augmented Generation) system that ingests PDFs and exposes a search tool via an MCP server. Retrieval combines vector search (pgvector) and BM25 keyword search with cross-encoder reranking, and answers are generated by an AWS Bedrock LLM.
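For intuition about the keyword leg of retrieval: BM25 ranks documents by term frequency and inverse document frequency with document-length normalization. The toy scorer below is illustrative only — the project's actual BM25 index lives in the Redis docstore behind a retriever, not in hand-rolled code.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with Okapi BM25.

    Toy illustration of the keyword-search leg of the hybrid retriever;
    k1 controls term-frequency saturation, b controls length normalization.
    """
    docs = [doc.lower().split() for doc in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)  # average document length
    n = len(docs)
    df = Counter()  # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores

corpus = ["the cat sat on the mat", "dogs chase the cat", "a dog barks"]
print(bm25_scores("cat", corpus))  # shorter matching docs score higher
```

Note that the shorter of the two matching documents wins despite identical term counts — that is the length normalization at work.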
```mermaid
flowchart LR
    subgraph Ingestion
        direction TB
        A[📄 data/ PDFs] --> B[Docling<br/>PDF Parser]
        B --> C[HybridChunker<br/>BAAI/bge-m3 tokenizer]
        C --> D[HuggingFace Embeddings<br/>BAAI/bge-m3 · 1024-dim]
        D --> E[(pgvector<br/>PostgreSQL)]
        C --> F[(Redis<br/>BM25 Docstore)]
    end
    subgraph Query["Query — mcp_server.py"]
        direction TB
        G[search_knowledge<br/>tool call] --> H[Vector Retriever<br/>pgvector]
        G --> I[BM25 Retriever<br/>Redis]
        H & I --> J[QueryFusionRetriever<br/>relative_score fusion]
        J --> K[Cross-encoder Reranker<br/>BAAI/bge-reranker-large]
        K --> L[BedrockConverse LLM]
        L --> M[Answer + Sources]
    end
    E --> H
    F --> I
```
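Relative-score fusion merges the two result lists by first min-max normalizing each retriever's scores (so cosine similarities and BM25 scores become comparable), then summing per document. A minimal pure-Python sketch of the idea — the real logic lives in LlamaIndex's `QueryFusionRetriever`, which additionally supports per-retriever weighting:

```python
def relative_score_fusion(results_per_retriever, top_k=5):
    """Merge ranked lists from several retrievers.

    Each input is a dict mapping doc_id -> raw score. Scores are min-max
    normalized per retriever, then summed per document, so documents that
    rank well under both retrievers rise to the top.
    """
    fused = {}
    for scores in results_per_retriever:
        lo, hi = min(scores.values()), max(scores.values())
        spread = (hi - lo) or 1.0  # avoid division by zero on uniform scores
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + (score - lo) / spread
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Illustrative scores: cosine similarities vs. raw BM25 — different scales.
vector_hits = {"doc_a": 0.91, "doc_b": 0.82, "doc_c": 0.40}
bm25_hits = {"doc_b": 12.3, "doc_d": 9.1, "doc_a": 2.2}
print(relative_score_fusion([vector_hits, bm25_hits], top_k=3))
# doc_b leads: it scores well under both retrievers
```

The fused candidates are then passed to the cross-encoder reranker, which rescores each (query, chunk) pair before the top results reach the LLM.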
- Python 3.11+
- uv
- Docker & Docker Compose
- AWS credentials with Bedrock access
`ingestion/config.py` is the source of truth for supported environment variables and defaults. Copy `.env.example` to `.env` and update credentials/endpoints for your environment.
```
DATABASE_URL=postgresql://chat-app:admin@localhost:5432/chat_app
BEDROCK_API_KEY=<your-aws-bearer-token>
AWS_REGION=eu-central-1
DATA_DIR=./data
```

- Install Python dependencies:

  ```bash
  uv sync
  ```

- Start backing services:

  ```bash
  docker compose up pgvector redis -d
  ```

- Enqueue ingestion jobs:

  ```bash
  uv run python ingest.py
  ```

- Start one or more workers:

  ```bash
  uv run python worker.py
  ```

- Query locally (optional):

  ```bash
  uv run python ask.py "your question"
  ```

- Run the MCP server:

  ```bash
  uv run python mcp_server.py
  ```

The server starts on http://localhost:8000 and exposes:
| Tool | Description |
|---|---|
| `search_knowledge` | Searches the knowledge base and returns an answer with source file citations |
Build images:

```bash
docker compose build
```

Run the full stack:

```bash
docker compose up -d
```

Run only the ingestion infrastructure + workers:

```bash
docker compose up -d pgvector redis worker
```

Scale the worker count:

```bash
docker compose up -d --scale worker=3 worker
```

Run ingestion from inside the worker container (optional):

```bash
docker compose exec worker bash
uv run python ingest.py
```

- Add dependencies with `uv add <package>` and commit both `pyproject.toml` and `uv.lock`.
- Rebuild images after dependency changes with `docker compose build`.
- When adding or renaming settings, update both `ingestion/config.py` and `.env.example`.
```
.
├── data/                  # PDF documents to ingest
├── ingestion/
│   ├── config.py          # Pydantic settings (loaded from .env)
│   ├── pipeline.py        # Docling parsing, embedding, pgvector + Redis ingestion
│   ├── queue.py           # Redis/RQ enqueueing for ingestion jobs
│   └── tasks.py           # Worker task wrappers around ingestion functions
├── query/
│   └── engine.py          # Hybrid retriever + reranker + Bedrock LLM query engine
├── ingest.py              # Ingestion queue producer entry point
├── worker.py              # RQ worker entry point
├── ask.py                 # Local interactive query CLI
├── mcp_server.py          # FastMCP server exposing search_knowledge tool
├── Dockerfile
└── docker-compose.yml
```