mcp-knowledge-base-template

Self-hosted, hybrid-search knowledge base exposed as an MCP server.

Why

Hybrid search beats either method alone for small-to-mid corpora:

FTS5 is fast and precise. Exact identifiers, named entities, and rare-token queries return what you expect.
Sentence-transformer embeddings catch paraphrase and concept proximity. "How do I tune recall?" finds the document that talks about thresholds and similarity scores even if it never uses those words.

For a personal or small-team knowledge base of a few thousand documents, hybrid SQLite + local embeddings is enough. No hosted vector store, no API costs, no data leaving your VPS.

Architecture

Three layers:

SQLite + FTS5. Documents, annotations, and tags. FTS5 virtual tables kept in sync via AFTER INSERT/UPDATE/DELETE triggers.
Local embeddings. paraphrase-multilingual-MiniLM-L12-v2 (384d, multilingual). Embeddings stored as float32 BLOBs, loaded into RAM at server start. Lazy-load fallback if sentence-transformers is absent.
MCP tool surface. Four tools, three read-only and one mutating. Every call logged to query_log for usage analytics.
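The trigger-based sync in the first layer can be sketched as follows. This is a minimal illustration, not the template's actual schema.sql: the `documents` columns, the `documents_fts` table name, the trigger names, and the `open_db` and `fts_search` helpers are all assumptions. It uses the external-content pattern described in the SQLite FTS5 documentation and needs a SQLite build with FTS5 enabled.

```python
import sqlite3

# Hypothetical minimal schema; the template's real schema.sql may differ.
SCHEMA = """
CREATE TABLE documents (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body  TEXT NOT NULL
);

-- External-content FTS5 table: the index points back at documents by rowid.
CREATE VIRTUAL TABLE documents_fts USING fts5(
    title, body, content='documents', content_rowid='id'
);

CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
    INSERT INTO documents_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;

CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
END;

CREATE TRIGGER documents_au AFTER UPDATE ON documents BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
    INSERT INTO documents_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
"""


def open_db(path=":memory:"):
    """Open a database and create the tables and triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def fts_search(conn, query):
    """Return matching document ids, best FTS5 rank first."""
    rows = conn.execute(
        "SELECT rowid FROM documents_fts "
        "WHERE documents_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]
```

With the triggers in place, application code only writes to `documents`; the index follows along without a separate reindexing step.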

The hybrid contract: FTS5 matches first, then semantic matches with cosine similarity above 0.3 fill in. No double-counting (semantic results that already appeared in FTS are skipped).
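A rough sketch of that contract, under stated assumptions: the function names (`hybrid_merge`, `cosine`, `blob_to_vector`, `vector_to_blob`) are hypothetical, embeddings are passed in as a plain dict of id to float32 BLOB, and the BLOBs use native byte order. The server itself may well use NumPy and a different data layout; only the 0.3 threshold, the FTS-first ordering, and the skip-if-already-seen rule come from the description above.

```python
import math
from array import array

SIMILARITY_THRESHOLD = 0.3  # cutoff stated in the hybrid contract


def vector_to_blob(values):
    """Pack floats into a float32 BLOB (native byte order, an assumption)."""
    return array("f", values).tobytes()


def blob_to_vector(blob):
    """Decode a float32 BLOB back into a list of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_merge(fts_ids, query_vec, embeddings, threshold=SIMILARITY_THRESHOLD):
    """Merge FTS hits with semantic hits.

    fts_ids:    document ids in FTS rank order.
    query_vec:  embedding of the query as a list of floats.
    embeddings: dict mapping document id to a float32 BLOB.
    """
    results = list(fts_ids)      # FTS matches always come first
    seen = set(results)
    scored = []
    for doc_id, blob in embeddings.items():
        if doc_id in seen:       # no double-counting
            continue
        score = cosine(query_vec, blob_to_vector(blob))
        if score > threshold:
            scored.append((score, doc_id))
    scored.sort(reverse=True)    # most similar semantic hits first
    results.extend(doc_id for _, doc_id in scored)
    return results
```

Document 3 below is an FTS hit, so it stays in front even though its embedding is orthogonal to the query:

```python
embeddings = {
    1: vector_to_blob([1.0, 0.0]),
    2: vector_to_blob([0.9, 0.1]),
    3: vector_to_blob([0.0, 1.0]),
    4: vector_to_blob([0.5, 0.5]),
}
print(hybrid_merge([3], [1.0, 0.0], embeddings))  # [3, 1, 2, 4]
```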

Tools

search_documents(query, tag?, source?) (read-only): hybrid search across documents, with optional filters
get_document(id) (read-only): single document with all annotations and tags
search_annotations(query, dimension?) (read-only): hybrid search scoped to annotations
add_annotation(document_id, dimension, text) (mutating): append a new annotation to an existing document
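To make the mutating tool concrete, here is a plain-Python sketch of what `add_annotation` might do against SQLite, including the `query_log` entry mentioned under Architecture. The table layouts, the `log_call` helper, and the error handling are assumptions for illustration; the MCP registration itself is left out, and the real server.py may differ.

```python
import json
import sqlite3
import time

# Hypothetical tables; the template's schema.sql may name things differently.
SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE annotations (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    dimension   TEXT NOT NULL,
    text        TEXT NOT NULL
);
CREATE TABLE query_log (
    id        INTEGER PRIMARY KEY,
    tool      TEXT NOT NULL,
    arguments TEXT NOT NULL,
    called_at REAL NOT NULL
);
"""


def log_call(conn, tool, arguments):
    """Record one tool call for usage analytics."""
    conn.execute(
        "INSERT INTO query_log (tool, arguments, called_at) VALUES (?, ?, ?)",
        (tool, json.dumps(arguments, sort_keys=True), time.time()),
    )


def add_annotation(conn, document_id, dimension, text):
    """Append an annotation to an existing document and return its id."""
    log_call(conn, "add_annotation",
             {"document_id": document_id, "dimension": dimension, "text": text})
    exists = conn.execute(
        "SELECT 1 FROM documents WHERE id = ?", (document_id,)
    ).fetchone()
    if exists is None:
        conn.commit()  # keep the log entry even though the call fails
        raise ValueError(f"document {document_id} does not exist")
    cur = conn.execute(
        "INSERT INTO annotations (document_id, dimension, text) VALUES (?, ?, ?)",
        (document_id, dimension, text),
    )
    conn.commit()
    return cur.lastrowid
```

Logging before validation means failed calls are counted too, which is one reasonable reading of "every call logged".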

Quick start

git clone https://github.com/danielkliem/mcp-knowledge-base-template.git
cd mcp-knowledge-base-template
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

python seed.py            # populate ./kb.db with 8 example documents
python embed.py           # add embeddings (downloads model on first run, ~120 MB)
python server.py          # start the MCP server (HTTP at 0.0.0.0:8000)

Set MCP_KB_TRANSPORT=stdio to run over stdio for local Claude Desktop use:

MCP_KB_TRANSPORT=stdio python server.py

Configuration

Environment variables:

Variable	Default	Purpose
`MCP_KB_DB`	`./kb.db`	SQLite database path
`MCP_KB_TRANSPORT`	`http`	`http` or `stdio`
`MCP_KB_HOST`	`0.0.0.0`	HTTP bind host
`MCP_KB_PORT`	`8000`	HTTP bind port
`MCP_KB_API_TOKEN`	(unset)	Optional bearer token for HTTP transport
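One way these variables could be read at startup, shown as a sketch: the `Settings` dataclass and `load_settings` function are hypothetical names, and rejecting unknown transport values is an assumption rather than documented behavior. The variable names and defaults are taken from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_path: str
    transport: str
    host: str
    port: int
    api_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, applying the defaults."""
    transport = env.get("MCP_KB_TRANSPORT", "http")
    if transport not in ("http", "stdio"):
        # Assumption: fail fast on values the table does not list.
        raise ValueError(f"unsupported MCP_KB_TRANSPORT: {transport!r}")
    return Settings(
        db_path=env.get("MCP_KB_DB", "./kb.db"),
        transport=transport,
        host=env.get("MCP_KB_HOST", "0.0.0.0"),
        port=int(env.get("MCP_KB_PORT", "8000")),
        # Treat an empty token the same as an unset one.
        api_token=env.get("MCP_KB_API_TOKEN") or None,
    )
```

Passing the environment as an argument keeps the function easy to test, for example `load_settings({"MCP_KB_PORT": "9000"})`.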

Deployment

See deploy/DEPLOY.md for an end-to-end VPS setup behind Caddy with token-in-URL auth. The deploy/ folder also ships a sample Caddyfile.example and systemd.service.example.

Scope and non-goals

This is a template. The entity model (documents, annotations, tags) is intentionally generic; the search machinery is the substance. Fork it, replace the schema with your domain, and keep the search layer.

Not included by design: write-side document ingestion (this is a read-mostly query layer), versioning, multi-tenant access control, distributed embedding storage, GPU inference. If you need any of these, this template is the wrong starting point.

License

MIT. See LICENSE.
