Self-hosted, hybrid-search knowledge base exposed as an MCP server.
Hybrid search beats either method alone for small-to-mid corpora:
- FTS5 is fast and precise. Exact identifiers, named entities, and rare-token queries return what you expect.
- Sentence-transformer embeddings catch paraphrase and concept proximity. "How do I tune recall?" finds the document that talks about thresholds and similarity scores even if it never uses those words.
For a personal or small-team knowledge base of a few thousand documents, hybrid SQLite + local embeddings is enough. No hosted vector store, no API costs, no data leaving your VPS.
Three layers:
- SQLite + FTS5. Documents, annotations, and tags. FTS5 virtual tables kept in sync via AFTER INSERT/UPDATE/DELETE triggers.
- Local embeddings. `paraphrase-multilingual-MiniLM-L12-v2` (384d, multilingual). Embeddings stored as `float32` BLOBs, loaded into RAM at server start. Lazy-load fallback if `sentence-transformers` is absent.
- MCP tool surface. Four tools, three read-only and one mutating. Every call logged to `query_log` for usage analytics.
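
As a rough illustration of the first layer, the sketch below keeps an external-content FTS5 table in sync with its base table through the three triggers. The table and column names (`documents`, `documents_fts`, `title`, `body`) and the helper functions are assumptions made for the example, not the template's actual schema, and it presumes your Python build of SQLite ships with FTS5.

```python
import sqlite3

# Hypothetical schema: the real template's tables and columns may differ.
SCHEMA = """
CREATE TABLE documents (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body  TEXT NOT NULL
);

-- External-content FTS5 table: the index points back at documents by rowid.
CREATE VIRTUAL TABLE documents_fts USING fts5(
    title, body, content='documents', content_rowid='id'
);

CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
    INSERT INTO documents_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;

CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
END;

-- An update is modelled as "remove the old row, index the new one".
CREATE TRIGGER documents_au AFTER UPDATE ON documents BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
    INSERT INTO documents_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
"""


def open_db(path=":memory:"):
    """Open a database and create the tables and triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def fts_ids(conn, query):
    """Return matching document ids, best FTS5 rank first."""
    rows = conn.execute(
        "SELECT rowid FROM documents_fts "
        "WHERE documents_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]
```

With the triggers in place, application code only ever writes to `documents`; the index follows along without a separate reindex step.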
The hybrid contract: FTS5 matches first, then semantic matches with cosine similarity above 0.3 fill in. No double-counting (semantic results that already appeared in FTS are skipped).
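
The contract can be sketched in a few lines. This is a stdlib-only illustration, not the template's code: `to_blob`, `from_blob`, `cosine`, and `hybrid_merge` are hypothetical names, the `limit` parameter is an assumption, and a real server would more likely use NumPy over the in-memory vectors. Only the 0.3 threshold, the FTS-first ordering, the skip-duplicates rule, and the `float32` BLOB storage come from the description above.

```python
import math
from array import array

SIMILARITY_FLOOR = 0.3  # cosine threshold from the hybrid contract


def to_blob(vector):
    """Pack a vector as float32 bytes, suitable for a SQLite BLOB column."""
    return array("f", vector).tobytes()


def from_blob(blob):
    """Unpack float32 bytes back into a sequence of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_merge(fts_ids, query_vec, embeddings, limit=10):
    """FTS5 hits first, then semantic hits above the floor, no duplicates.

    fts_ids:    document ids already ranked by FTS5
    embeddings: mapping of document id -> float32 BLOB
    """
    results = list(fts_ids[:limit])
    seen = set(results)
    scored = []
    for doc_id, blob in embeddings.items():
        if doc_id in seen:
            continue  # already returned by FTS5: skip, do not double-count
        score = cosine(query_vec, from_blob(blob))
        if score > SIMILARITY_FLOOR:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    results.extend(doc_id for _, doc_id in scored)
    return results[:limit]
```

Putting FTS5 hits first means an exact identifier match is never outranked by a loosely related semantic neighbour; the similarity floor keeps the semantic tail from filling up with noise.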
- `search_documents(query, tag?, source?)` (read-only): hybrid search across documents, with optional filters
- `get_document(id)` (read-only): single document with all annotations and tags
- `search_annotations(query, dimension?)` (read-only): hybrid search scoped to annotations
- `add_annotation(document_id, dimension, text)` (mutating): append a new annotation to an existing document
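
To make the one mutating tool and the per-call logging concrete, here is a minimal sketch of what `add_annotation` might look like against plain `sqlite3`. The table layouts, the `log_call` helper, and the columns of `query_log` are assumptions for illustration; the template's real schema and its MCP wiring are not shown here.

```python
import json
import sqlite3
import time


def open_db(path=":memory:"):
    """Create a hypothetical minimal schema for the example."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS annotations (
            id          INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL REFERENCES documents(id),
            dimension   TEXT NOT NULL,
            text        TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS query_log (
            id        INTEGER PRIMARY KEY,
            tool      TEXT NOT NULL,
            arguments TEXT NOT NULL,
            called_at REAL NOT NULL
        );
    """)
    return conn


def log_call(conn, tool, **arguments):
    """Record one tool invocation for later usage analytics."""
    conn.execute(
        "INSERT INTO query_log(tool, arguments, called_at) VALUES (?, ?, ?)",
        (tool, json.dumps(arguments, sort_keys=True), time.time()),
    )


def add_annotation(conn, document_id, dimension, text):
    """Append an annotation to an existing document; return its id."""
    log_call(conn, "add_annotation", document_id=document_id, dimension=dimension)
    exists = conn.execute(
        "SELECT 1 FROM documents WHERE id = ?", (document_id,)
    ).fetchone()
    if exists is None:
        conn.commit()  # keep the log entry even though the call fails
        raise ValueError(f"document {document_id} does not exist")
    cur = conn.execute(
        "INSERT INTO annotations(document_id, dimension, text) VALUES (?, ?, ?)",
        (document_id, dimension, text),
    )
    conn.commit()
    return cur.lastrowid
```

Logging before validation is a design choice in this sketch: failed calls are often the most interesting rows when you review how the tools are being used.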
```bash
git clone https://github.com/danielkliem/mcp-knowledge-base-template.git
cd mcp-knowledge-base-template
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python seed.py    # populate ./kb.db with 8 example documents
python embed.py   # add embeddings (downloads model on first run, ~120 MB)
python server.py  # start the MCP server (HTTP at 0.0.0.0:8000)
```

Set `MCP_KB_TRANSPORT=stdio` to run over stdio for local Claude Desktop use:

```bash
MCP_KB_TRANSPORT=stdio python server.py
```

Environment variables:
| Variable | Default | Purpose |
|---|---|---|
| `MCP_KB_DB` | `./kb.db` | SQLite database path |
| `MCP_KB_TRANSPORT` | `http` | `http` or `stdio` |
| `MCP_KB_HOST` | `0.0.0.0` | HTTP bind host |
| `MCP_KB_PORT` | `8000` | HTTP bind port |
| `MCP_KB_API_TOKEN` | (unset) | Optional bearer token for HTTP transport |
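
If you fork the template and want the same variables in your own entry point, one way to read them is sketched below. The `Settings` dataclass and `load_settings` function are hypothetical names; only the variable names and defaults are taken from the table above, and rejecting unknown transports is an assumption about sensible behaviour rather than something the template is known to do.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_path: str
    transport: str
    host: str
    port: int
    api_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the MCP_KB_* variables, falling back to the documented defaults."""
    transport = env.get("MCP_KB_TRANSPORT", "http")
    if transport not in ("http", "stdio"):
        raise ValueError(f"MCP_KB_TRANSPORT must be 'http' or 'stdio', got {transport!r}")
    return Settings(
        db_path=env.get("MCP_KB_DB", "./kb.db"),
        transport=transport,
        host=env.get("MCP_KB_HOST", "0.0.0.0"),
        port=int(env.get("MCP_KB_PORT", "8000")),
        # Treat an empty string the same as unset.
        api_token=env.get("MCP_KB_API_TOKEN") or None,
    )
```

Passing the environment in as a mapping keeps the function easy to test: `load_settings({"MCP_KB_PORT": "9000"})` works without touching the real process environment.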
See deploy/DEPLOY.md for an end-to-end VPS setup
behind Caddy with token-in-URL auth. The deploy/ folder also ships a
sample Caddyfile.example and systemd.service.example.
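
For orientation, a token check on the HTTP side could look roughly like the following. Everything here is an assumption made for the sketch: the function name, the `token` query parameter, the exact-case `Authorization` header lookup, and the rule that an unset `MCP_KB_API_TOKEN` disables the check. Consult `deploy/DEPLOY.md` for how the template actually wires authentication.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(
    headers: Mapping[str, str],
    query: Mapping[str, str],
    expected_token: Optional[str],
) -> bool:
    """Accept a bearer header or a token in the URL query string."""
    if not expected_token:
        return True  # assumption: no token configured means no auth
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        presented = auth[len("Bearer "):]
    else:
        presented = query.get("token", "")  # hypothetical parameter name
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

Tokens carried in a URL tend to end up in proxy and access logs, so it is worth checking what your reverse proxy records before relying on that form.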
This is a template. The entity model (documents, annotations,
tags) is intentionally generic; the search machinery is the
substance. Fork it, replace the schema with your own domain, and keep
the search layer.
Not included by design: write-side document ingestion (this is a read-mostly query layer), versioning, multi-tenant access control, distributed embedding storage, GPU inference. If you need any of these, this template is the wrong starting point.
MIT. See LICENSE.