Self-organizing Markdown knowledge store for AI agents.
Agents write facts with ## field headers. Rust parses, indexes, and auto-organizes them into a directory tree. No SQL. No embeddings. No schema. The filesystem is the database.
knowledge append pitfalls '## tool
fastapi
## severity
high
## source
Uvicorn timeout causes 504 on slow async endpoints
## fix
Set timeout_keep_alive=300'
knowledge search pitfalls 'tool:fastapi timeout'
knowledge read pitfalls
The Rust binary stores it as pitfalls/fastapi/high/uvicorn-timeout-causes-504-on.md and auto-splits directories when they grow too large. The agent never sees the filesystem; it just reads and searches the logical category.
cargo install knowledge-db
git clone https://github.com/workswithagents/knowledge-db
cd knowledge-db
cargo build --release
# Binary at ./target/release/knowledge
If the Rust binary isn't available, use the Python fallback script. It's feature-complete and drop-in compatible:
# Make executable and add to PATH, or run directly:
python3 knowledge-py append pitfalls '## tool\nfastapi\n...'
python3 knowledge-py read pitfalls
python3 knowledge-py search pitfalls 'fastapi timeout'
python3 knowledge-py stats
The Python fallback uses the same store directory and file format. Agents built on the Python fallback will seamlessly upgrade to the Rust binary later.
| Category | What goes in it |
|---|---|
| pitfalls | Bugs, gotchas, lessons learned |
| fixes | Solutions, workarounds, patches |
| workflows | Procedures, patterns, recipes |
| facts | General observations, preferences |
| user | User profile data, preferences |
You can create arbitrary categories — just use the name. Categories are logical: an agent reads pitfalls and gets all pitfalls assembled as one document, regardless of how many files or directories are behind it.
Every entry uses ## field headers followed by values:
## tool
fastapi
## severity
high
## source
What happened, what was observed
## fix
How to fix or work around it
Pitfalls:
| Field | Meaning |
|---|---|
| tool | Framework/library involved |
| severity | high / medium / low |
| source | What happened |
| fix | How to resolve it |
Fixes:
| Field | Meaning |
|---|---|
| tool | Framework/library |
| problem | What was fixed |
| solution | How it was fixed |
Facts:
| Field | Meaning |
|---|---|
| topic | Subject area |
| detail | The fact itself |
Workflows:
| Field | Meaning |
|---|---|
| task | What this workflow does |
| steps | Ordered procedure |
- A line starting with ## begins a new field
- Everything until the next ## line belongs to the current field
- Keys are lowercased; values preserve original casing
- Duplicate field names: last one wins
- Lines before the first ## heading are ignored
- Empty fields (heading with no content) are skipped
- Values can be multi-line
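The parsing rules above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the code in parser.rs: the function name `parse_entry` is hypothetical, and treating every line that starts with `##` as a field header is an assumption.

```python
def parse_entry(text: str) -> dict[str, str]:
    """Parse '## field' headers into a {field: value} dict (illustrative sketch)."""
    fields: dict[str, str] = {}
    key = None
    buf: list[str] = []

    def flush() -> None:
        # Store the pending field, skipping it when the value is empty.
        if key is not None:
            value = "\n".join(buf).strip()
            if value:
                fields[key] = value  # duplicate names: the last one wins

    for line in text.splitlines():
        if line.startswith("##"):
            flush()
            key = line[2:].strip().lower()  # keys are lowercased
            buf = []
        elif key is not None:
            buf.append(line)  # multi-line values keep their original casing
        # lines before the first heading fall through and are ignored
    flush()
    return fields
```

Calling `parse_entry("## Tool\nFastAPI")` yields a dict with the lowercased key `tool` and the value `FastAPI` unchanged.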
knowledge append <category> '<markdown content>'
# From stdin:
echo '## tool\ndocker\n\n## source\nContainer leak' | knowledge append pitfalls
# From file:
knowledge append pitfalls "$(cat entry.md)"
Returns the relative path where the entry was stored.
The fields tool, severity, domain, and type affect directory nesting: entries with those fields are organized into subdirectories (e.g. pitfalls/fastapi/high/timeout-bug.md).
knowledge read <category>
knowledge read pitfalls
Walks the directory tree and assembles all .md files into a single output. Entries are separated by ---.
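As a rough picture of what a read does, the sketch below walks a category directory and joins every .md file with a `---` separator. It is an assumption-laden illustration rather than the Rust implementation: the name `read_category`, the sorted traversal order, and the exact separator whitespace are all hypothetical.

```python
from pathlib import Path


def read_category(store: Path, category: str, subpath: str = "") -> str:
    """Assemble every .md file under a category (or sub-path) into one document."""
    root = store / category / subpath
    # Sorted order is an assumption; the real tool may order entries differently.
    files = sorted(p for p in root.rglob("*.md") if p.is_file())
    return "\n---\n".join(p.read_text(encoding="utf-8").strip() for p in files)
```

Passing a `subpath` such as `"fastapi"` narrows the walk to that subdirectory, which mirrors the scoped read form of the command.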
Sub-path read for scoped access:
knowledge read pitfalls fastapi
# Only reads pitfalls/fastapi/...
knowledge search <category> '<query>'
# Plain tokens — content match:
knowledge search pitfalls 'fastapi timeout'
# Field filters:
knowledge search pitfalls 'tool:fastapi'
# Mixed — field filter + content:
knowledge search pitfalls 'tool:fastapi timeout'
# Severity filter:
knowledge search pitfalls 'severity:high'
Syntax:
- word (plain token): matches content, case-insensitive
- field:value (field filter): exact match on the field's value, case-insensitive
- Multiple tokens are combined with AND logic
Results ranked by match count, returned as assembled markdown.
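The query semantics can be illustrated with a small in-memory sketch. It assumes entries are already parsed into field dicts keyed by path, and it interprets "match count" as the number of token occurrences plus one per satisfied field filter; that scoring, and the function name `search`, are assumptions rather than the tool's documented behavior.

```python
def search(entries: dict[str, dict[str, str]], query: str) -> list[str]:
    """Return paths of entries matching every token, best match first."""
    tokens = query.lower().split()
    scored = []
    for path, fields in entries.items():
        text = "\n".join(fields.values()).lower()
        score = 0
        for tok in tokens:
            name, sep, value = tok.partition(":")
            if sep:
                # field:value filter: exact, case-insensitive match on one field
                if fields.get(name, "").lower() != value:
                    break
                score += 1
            else:
                # plain token: case-insensitive content match
                hits = text.count(tok)
                if hits == 0:
                    break
                score += hits
        else:
            # reached only when no token failed, i.e. AND logic
            scored.append((score, path))
    # Highest score first; ties broken by path for stable output.
    return [p for _, p in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

The `for ... else` construct keeps the AND logic compact: an entry is scored only when the inner loop finishes without a `break`.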
knowledge stats # All categories
knowledge stats pitfalls # Single category breakdown
Shows entry counts, field distribution, and top field values per category.
knowledge aliases # Show current aliases
knowledge aliases --generate # Rebuild aliases.json from store
What it does: when you run knowledge search facts "wwa", the alias engine expands wwa → project:wwa automatically. No special syntax needed. Aliases are auto-generated from the ## domain, ## project, and ## type fields on every append. Manual entries added to aliases.json survive auto-regeneration.
# These all work without knowing the field:value syntax:
knowledge search facts "wwa" # → project:wwa
knowledge search facts "deploy" # → domain:workflow
knowledge search facts "bastion colors" # → project:bastion + domain:design
knowledge search facts "comms publish" # → project:comms + content match
knowledge dedup --dry-run # Show what would merge
knowledge dedup --threshold 0.75 # Custom similarity (default 0.85)
knowledge dedup # Execute merge
Finds entries with high word-overlap similarity. Merges duplicates by adding ## merged_from to the survivor.
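The exact similarity metric is not spelled out here beyond "word-overlap". One plausible reading, shown below purely as an assumption, is Jaccard similarity over lowercase word sets compared against the threshold; `word_overlap` and `find_duplicates` are hypothetical names.

```python
import re


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' word sets (assumed metric)."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def find_duplicates(entries: dict[str, str], threshold: float = 0.85) -> list[tuple[str, str]]:
    """Return path pairs whose contents meet or exceed the similarity threshold."""
    paths = sorted(entries)
    return [
        (p, q)
        for i, p in enumerate(paths)
        for q in paths[i + 1:]
        if word_overlap(entries[p], entries[q]) >= threshold
    ]
```

The pairwise comparison is quadratic in the number of entries, which is acceptable for a sketch but would need bucketing or an index for large stores.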
knowledge prune --dry-run # Show what would delete
knowledge prune --days 30 # Remove entries older than 30 days
knowledge prune --days 90 # Default: 90 days
knowledge mount /tmp/knowledge-fs
knowledge unmount /tmp/knowledge-fs
Creates a read-optimized directory tree:
- pitfalls.md: assembled markdown of all pitfalls
- pitfalls/tool/severity/entry.md: individual entry stubs
Useful for tools that expect a filesystem (grep, editors, backup scripts).
cargo install knowledge-db --features watch
knowledge daemon --mount /tmp/knowledge-fs
Watches the store directory and auto-rebuilds the virtual filesystem on changes. Useful for dashboards and live editors.
If you're building an AI agent that needs persistent knowledge, expose these three tools:
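import shlex
import subprocess
from typing import Optional

# Minimal helper (not part of knowledge-db): run a shell command, return stdout.
def run(cmd: str) -> str:
    return subprocess.run(cmd, shell=True, check=True, capture_output=True, text=True).stdout

# Tool: knowledge_write
def knowledge_write(category: str, content: str) -> str:
    """Append a markdown entry to the knowledge store."""
    return run(f"knowledge append {shlex.quote(category)} {shlex.quote(content)}")

# Tool: knowledge_read
def knowledge_read(category: str, path: Optional[str] = None) -> str:
    """Read all entries in a category, optionally scoped to a sub-path."""
    if path:
        return run(f"knowledge read {shlex.quote(category)} {shlex.quote(path)}")
    return run(f"knowledge read {shlex.quote(category)}")

# Tool: knowledge_search
def knowledge_search(category: str, query: str) -> str:
    """Search entries by tokens and field filters."""
    return run(f"knowledge search {shlex.quote(category)} {shlex.quote(query)}")
Include this in your agent's system prompt: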
## Knowledge store usage
You have a persistent knowledge store at ~/.hermes/knowledge/.
Use it to remember facts, pitfalls, fixes, and workflows across sessions.
When to write:
- You discover a bug, tricky edge case, or gotcha → knowledge_write("pitfalls", ...)
- You find a solution or workaround → knowledge_write("fixes", ...)
- You learn a multi-step procedure → knowledge_write("workflows", ...)
- You observe a fact or user preference → knowledge_write("facts", ...)
Entry format uses ## field headers:
## tool\nfastapi\n\n## severity\nhigh\n\n## source\nDescription\n\n## fix\nSolution
Always search the store before troubleshooting: check if this problem has been seen before.
If you use Hermes, the knowledge tool is built-in. Configure your agent to use knowledge-db as the backend:
# config.yaml
knowledge:
  backend: knowledge-db
  store_path: ~/.hermes/knowledge
The Hermes knowledge tool maps directly to knowledge-db commands: action=write maps to append, action=search maps to search, and action=read maps to read.
For agents not using Hermes, wrap the CLI:
# Write
echo '## tool
docker
## severity
medium
## source
Container memory leak under load
## fix
Set --memory limit in docker run' | knowledge append pitfalls
# Search
knowledge search pitfalls 'docker memory leak'
# Read
knowledge read pitfalls
# 1. Write a test entry
knowledge append test '## topic\ntesting\n\n## detail\nIntegration works'
# 2. Read it back
knowledge read test
# 3. Search it
knowledge search test 'integration'
# 4. Check stats
knowledge stats test
# 5. Clean up (delete the test category directory)
rm -rf ~/.hermes/knowledge/test
Each append does this:
- Parse ## field headers from the markdown
- Resolve directory path from key fields (tool, severity, domain, type)
- Generate filename slug from title → source → fix → first long-enough value
- Acquire file lock, write .md file, update flat-text index, release lock
- Check if parent directory exceeds 50 files → auto-split if needed
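The path-resolution and slug steps above might look roughly like the following. Several details are guesses: the five-word slug limit is inferred from the example filename uvicorn-timeout-causes-504-on.md, the `min_len` cutoff for a "long-enough" value is hypothetical, and the nesting order simply follows the order in which the key fields are listed.

```python
import re

KEY_FIELDS = ("tool", "severity", "domain", "type")   # fields that create subdirectories
SLUG_SOURCES = ("title", "source", "fix")             # preferred slug sources, in order


def slugify(text: str, max_words: int = 5) -> str:
    """Lowercase, keep alphanumeric words, join the first few with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words[:max_words])


def entry_path(category: str, fields: dict[str, str], min_len: int = 8) -> str:
    """Build a relative path such as pitfalls/fastapi/high/<slug>.md."""
    dirs = [slugify(fields[k]) for k in KEY_FIELDS if k in fields]
    candidates = [fields[k] for k in SLUG_SOURCES if k in fields]
    # Fallback: the first value that is long enough (threshold is an assumption).
    candidates += [v for v in fields.values() if len(v) >= min_len]
    name = slugify(candidates[0]) if candidates else "entry"
    return "/".join([category, *dirs, name]) + ".md"
```

The sketch leaves out locking, index updates, and collision handling for identical slugs, all of which a real implementation would need.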
When a directory hits 50+ .md files, the engine:
- Scans all entries in the directory
- Picks the best field to split on (cardinality ratio 0.05–0.3 — produces 5–50 subdirectories)
- Creates subdirectories per distinct field value
- Moves files into their subdirectories
- Rebuilds the index
The agent never notices. knowledge read pitfalls works identically before and after splitting.
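To make the split heuristic concrete, here is a sketch of choosing the field to split on. It assumes that "cardinality ratio" means distinct values divided by the number of entries, and that the "best" field is the in-range one closest to the middle of the 0.05 to 0.3 window; both the definition and the tie-break are assumptions, and `pick_split_field` is a hypothetical name.

```python
from collections import defaultdict
from typing import Optional


def pick_split_field(entries: list[dict[str, str]], lo: float = 0.05, hi: float = 0.3) -> Optional[str]:
    """Pick a field whose distinct-value ratio falls inside [lo, hi], or None."""
    values = defaultdict(set)
    for fields in entries:
        for key, value in fields.items():
            values[key].add(value.lower())
    total = len(entries)
    ratios = {k: len(v) / total for k, v in values.items()}
    in_range = {k: r for k, r in ratios.items() if lo <= r <= hi}
    if not in_range:
        return None
    mid = (lo + hi) / 2
    # Closest to the midpoint wins; field name breaks ties deterministically.
    return min(in_range, key=lambda k: (abs(in_range[k] - mid), k))
```

A field with a unique value per entry (ratio 1.0) is rejected because splitting on it would create one directory per file.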
A flat-text index file at .index maps relative paths to field:value pairs. Rebuilt on startup if missing. Format:
pitfalls/fastapi/high/uvicorn-timeout.md → tool: fastapi, severity: high, source: Uvicorn timeout causes 504
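A line in that format can be read back with a few string splits. The sketch below assumes the separators are exactly " → ", ", " and ": " as in the example line; how the real index escapes values that themselves contain commas is not stated, so this naive split would mis-parse them. `parse_index_line` is a hypothetical name.

```python
def parse_index_line(line: str) -> tuple[str, dict[str, str]]:
    """Split 'path → key: value, key: value' into (path, fields)."""
    path, _, rest = line.partition(" → ")
    fields: dict[str, str] = {}
    for pair in rest.split(", "):
        key, sep, value = pair.partition(": ")
        if sep:  # ignore fragments without a 'key: value' shape
            fields[key.strip()] = value.strip()
    return path.strip(), fields
```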
File locking via flock (fs2 crate). Multiple agents writing simultaneously won't corrupt each other's entries. Lock contention returns an error — agents should retry.
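Since contended writes fail rather than block, callers may want a small retry wrapper. The sketch below retries a callable with exponential backoff; the name `with_retry`, the attempt count, the delays, and the exception types treated as retryable are all assumptions, because the exact error the CLI reports on contention is not specified here.

```python
import subprocess
import time


def with_retry(action, attempts: int = 5, base_delay: float = 0.05,
               retry_on: tuple = (OSError, subprocess.CalledProcessError)):
    """Call action(), retrying with exponential backoff on the given errors."""
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
```

An agent could wrap a write as `with_retry(lambda: knowledge_write("pitfalls", entry))`, assuming a `knowledge_write` helper that raises when the command exits with an error.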
knowledge append → markdown file → directory tree
knowledge search → flat-text index → assembled markdown
knowledge read → tree walk → assembled markdown
| Component | Lines | Purpose |
|---|---|---|
| parser.rs | 127 | ## field → value extraction |
| store.rs | 882 | Index, append, search, read, dedup, prune, split |
| main.rs | 493 | CLI, stats, mount, daemon |
| knowledge-py | 180 | Standalone Python fallback |
- Binary size: ~1.4 MB release build
- Dependencies: slug, fs2, serde, clap, chrono (+ notify for watch feature)
- Runtime: zero runtime deps, no daemon required, no SQL, no embeddings
At 500 files, rg beats SQLite. At 5,000 files, the auto-split engine keeps directories small. The filesystem IS the index — path components encode structure, grep handles search, ls handles listing.
Benefits:
- Git-friendly: the store is just markdown files — version control works naturally
- No migration: add a new field, old entries just lack it; no ALTER TABLE
- No database-level lock: locking is per file
- Transparent: ls, grep, cat all work directly on the store
- Zero setup: no daemon, no connection string, no migrations
If you need JOINs, aggregates, or complex queries — add a SQLite read cache. But you probably don't.
# Clone
git clone https://github.com/workswithagents/knowledge-db
cd knowledge-db
# Build
cargo build
# Run tests
cargo test
# Release build
cargo build --release
# Run locally
cargo run -- append test '## topic\ntesting\n\n## detail\nhello'
# With watch feature
cargo run --features watch -- daemon
cargo publish
MIT