knowledge-db

Self-organizing Markdown knowledge store for AI agents.

Agents write facts with ## field headers. Rust parses, indexes, and auto-organizes them into a directory tree. No SQL. No embeddings. No schema. The filesystem is the database.

knowledge append pitfalls '## tool
fastapi

## severity
high

## source
Uvicorn timeout causes 504 on slow async endpoints

## fix
Set timeout_keep_alive=300'

knowledge search pitfalls 'tool:fastapi timeout'
knowledge read pitfalls

The Rust binary stores this entry as pitfalls/fastapi/high/uvicorn-timeout-causes-504-on.md and auto-splits directories when they grow too large. The agent never sees the filesystem; it just reads and searches the logical category.

Install

From crates.io

cargo install knowledge-db

From source

git clone https://github.com/workswithagents/knowledge-db
cd knowledge-db
cargo build --release
# Binary at ./target/release/knowledge

Python fallback

If the Rust binary isn't available, use the Python fallback script. It's feature-complete and drop-in compatible:

# Make executable and add to PATH, or run directly:
python3 knowledge-py append pitfalls $'## tool\nfastapi\n...'
python3 knowledge-py read pitfalls
python3 knowledge-py search pitfalls 'fastapi timeout'
python3 knowledge-py stats

The Python fallback uses the same store directory and file format, so agents built on it can switch to the Rust binary later without migrating their data.

Categories (logical files)

Category	What goes in it
`pitfalls`	Bugs, gotchas, lessons learned
`fixes`	Solutions, workarounds, patches
`workflows`	Procedures, patterns, recipes
`facts`	General observations, preferences
`user`	User profile data, preferences

You can create arbitrary categories just by using a new name. Categories are logical: an agent reads pitfalls and gets all pitfalls assembled as one document, regardless of how many files or directories are behind it.

Field format

Every entry uses ## field headers followed by values:

## tool
fastapi

## severity
high

## source
What happened, what was observed

## fix
How to fix or work around it

Common fields by category

Pitfalls:

Field	Meaning
`tool`	Framework/library involved
`severity`	high / medium / low
`source`	What happened
`fix`	How to resolve it

Fixes:

Field	Meaning
`tool`	Framework/library
`problem`	What was fixed
`solution`	How it was fixed

Facts:

Field	Meaning
`topic`	Subject area
`detail`	The fact itself

Workflows:

Field	Meaning
`task`	What this workflow does
`steps`	Ordered procedure

Rules

Each line starting with ## begins a new field; the rest of that line is the field name
Everything until the next ## line belongs to the current field
Keys are lowercased; values preserve original casing
Duplicate field names: last one wins
Lines before the first ## heading are ignored
Empty fields (heading with no content) are skipped
Values can be multi-line
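The rules above can be approximated in a few lines of Python. This is a minimal sketch of the documented behaviour, not a port of parser.rs; the function name parse_fields is hypothetical, and trimming surrounding whitespace from each value is an assumption.

```python
def parse_fields(markdown: str) -> dict[str, str]:
    """Parse '## field' headers into {key: value}, following the rules listed above."""
    fields: dict[str, str] = {}
    key = None            # current field name; None until the first '##' line
    lines: list[str] = []

    def flush() -> None:
        # Store the finished field; empty fields are skipped, later duplicates overwrite.
        if key is not None:
            value = "\n".join(lines).strip()
            if value:
                fields[key] = value

    for line in markdown.splitlines():
        if line.startswith("##"):
            flush()
            key = line[2:].strip().lower()  # keys are lowercased, values keep their casing
            lines = []
        elif key is not None:               # text before the first heading is ignored
            lines.append(line)
    flush()
    return fields
```

For example, parse_fields("## tool\nfastapi\n\n## severity\nhigh") returns {'tool': 'fastapi', 'severity': 'high'}.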

Commands

`append` — Write an entry

knowledge append <category> '<markdown content>'

# From stdin:
printf '## tool\ndocker\n\n## source\nContainer leak\n' | knowledge append pitfalls

# From file:
knowledge append pitfalls "$(cat entry.md)"

Prints the relative path where the entry was stored.

The tool, severity, domain, and type fields control directory nesting: entries that set them are filed into matching subdirectories (e.g. pitfalls/fastapi/high/timeout-bug.md).
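A minimal sketch of how the nesting could be derived from those fields. The fixed order tool, severity, domain, type, and the way values are sanitized into path components, are assumptions for illustration; the README names the fields but not their precedence, and the helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed nesting order; the README lists these fields but not their precedence.
KEY_FIELDS = ("tool", "severity", "domain", "type")


def path_component(value: str) -> str:
    """Lowercase the value and collapse non-alphanumeric runs into '-' (assumed sanitizing)."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def entry_dir(category: str, fields: dict[str, str]) -> PurePosixPath:
    """Nest category/<tool>/<severity>/<domain>/<type>, skipping fields the entry lacks."""
    parts = [category]
    for key in KEY_FIELDS:
        if fields.get(key):
            parts.append(path_component(fields[key]))
    return PurePosixPath(*parts)
```

With this sketch, entry_dir("pitfalls", {"tool": "fastapi", "severity": "high"}) gives pitfalls/fastapi/high, matching the example path, while an entry with none of the key fields stays at the category root.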

`read` — Assemble a category

knowledge read <category>
knowledge read pitfalls

Walks the directory tree and assembles all .md files into a single output. Entries separated by ---.

Sub-path read for scoped access:

knowledge read pitfalls fastapi
# Only reads pitfalls/fastapi/...
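The tree walk behind read, including the scoped sub-path form, can be sketched as follows. This assumes entries are visited in sorted path order and joined by a --- line with blank lines around it; the real ordering and separator whitespace may differ. The store root is passed in explicitly because this README does not state the binary's default location, and assemble is a hypothetical name.

```python
from pathlib import Path


def assemble(store: Path, category: str, sub_path: str = "") -> str:
    """Concatenate every .md file under store/category[/sub_path], separated by '---'."""
    root = store / category / sub_path if sub_path else store / category
    entries = [p.read_text(encoding="utf-8").strip() for p in sorted(root.rglob("*.md"))]
    return "\n\n---\n\n".join(entries)
```

Because the walk is recursive, the output is the same whether a category holds one flat directory or a deeply split tree, which is why auto-splitting stays invisible to readers.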

`search` — Find entries

knowledge search <category> '<query>'

# Plain tokens — content match:
knowledge search pitfalls 'fastapi timeout'

# Field filters:
knowledge search pitfalls 'tool:fastapi'

# Mixed — field filter + content:
knowledge search pitfalls 'tool:fastapi timeout'

# Severity filter:
knowledge search pitfalls 'severity:high'

Syntax:

word — matches content (case-insensitive)
field:value — exact field value match (case-insensitive)
Multiple tokens combined with AND logic

Results are ranked by match count and returned as assembled markdown.
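A minimal sketch of these query semantics over already-parsed entries: tokens are ANDed, field:value must equal the field's value case-insensitively, and plain words must appear in the content. Since AND means every result matches every token, this sketch interprets "match count" as the total number of occurrences of the plain words; that scoring rule, the Entry type, and the search function are assumptions rather than the binary's actual ranking.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    fields: dict[str, str]  # parsed '## field' values
    content: str            # full markdown text of the entry


def search(entries: list[Entry], query: str) -> list[str]:
    """Return paths of entries matching every token, most word occurrences first."""
    filters, words = [], []
    for token in query.lower().split():
        key, sep, value = token.partition(":")
        if sep and key and value:
            filters.append((key, value))  # field:value filter
        else:
            words.append(token)           # plain content word
    scored = []
    for e in entries:
        fields = {k.lower(): v.lower() for k, v in e.fields.items()}
        text = e.content.lower()
        if all(fields.get(k) == v for k, v in filters) and all(w in text for w in words):
            score = sum(text.count(w) for w in words)  # assumed ranking signal
            scored.append((-score, e.path))
    return [path for _, path in sorted(scored)]
```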

`stats` — Store statistics

knowledge stats              # All categories
knowledge stats pitfalls     # Single category breakdown

Shows entry counts, field distribution, and top field values per category.
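A sketch of what such a per-category breakdown could compute from parsed entries. The README does not show the command's output format, so the returned shape, the top-3 cutoff, and the function name are assumptions.

```python
from collections import Counter


def category_stats(entries: list[dict[str, str]], top: int = 3) -> dict[str, object]:
    """Count entries, how many entries set each field, and the most common values per field."""
    field_counts = Counter(key for e in entries for key in e)
    top_values = {
        key: Counter(e[key] for e in entries if key in e).most_common(top)
        for key in field_counts
    }
    return {"entries": len(entries), "fields": dict(field_counts), "top_values": top_values}
```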

`aliases` — Query alias expansion

knowledge aliases                # Show current aliases
knowledge aliases --generate     # Rebuild aliases.json from store

What it does: when you run knowledge search facts "wwa", the alias engine automatically expands wwa to project:wwa, so no special syntax is needed. Aliases are auto-generated from the ## domain, ## project, and ## type fields on every append. Manual entries added to aliases.json survive auto-regeneration.

# These all work without knowing the field:value syntax:
knowledge search facts "wwa"           # → project:wwa
knowledge search facts "deploy"        # → domain:workflow
knowledge search facts "bastion colors" # → project:bastion + domain:design
knowledge search facts "comms publish"  # → project:comms + content match
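A sketch of how the expansion could work, assuming aliases.json is a flat JSON object mapping a lowercase word to a field:value filter. The README does not document the file's schema, so that format and both function names are hypothetical. Words without an alias pass through as plain content matches, as in the "comms publish" example.

```python
import json


def load_aliases(path: str) -> dict[str, str]:
    """Load a hypothetical flat {"word": "field:value"} mapping from aliases.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def expand_query(query: str, aliases: dict[str, str]) -> str:
    """Swap each aliased word for its field:value filter; other tokens pass through."""
    expanded = []
    for token in query.split():
        if ":" in token:  # already an explicit field:value filter
            expanded.append(token)
        else:
            expanded.append(aliases.get(token.lower(), token))
    return " ".join(expanded)
```

With aliases {"bastion": "project:bastion", "colors": "domain:design"}, expand_query("bastion colors", aliases) returns "project:bastion domain:design".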

`dedup` — Find duplicates

knowledge dedup --dry-run                # Show what would be merged
knowledge dedup --threshold 0.75         # Custom similarity (default 0.85)
knowledge dedup                          # Execute merge

Finds entries with high word-overlap similarity and merges the duplicates, adding a ## merged_from field to the surviving entry.
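The README specifies only "word-overlap similarity" with a default threshold of 0.85. One common measure of word overlap is Jaccard similarity over lowercase word sets; the sketch below uses it as an assumed stand-in, not necessarily the metric store.rs implements, and it only finds candidate pairs (the merge step is left out).

```python
import re
from itertools import combinations


def word_set(text: str) -> set[str]:
    """Lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by total distinct words; 0.0 if both texts are empty."""
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def find_duplicates(entries: dict[str, str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return (path_a, path_b, score) for every pair of entries at or above threshold."""
    pairs = []
    for (path_a, text_a), (path_b, text_b) in combinations(sorted(entries.items()), 2):
        score = jaccard(text_a, text_b)
        if score >= threshold:
            pairs.append((path_a, path_b, score))
    return pairs
```

Pairwise comparison is quadratic in the number of entries, which stays cheap at the per-directory sizes the auto-split engine maintains.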

`prune` — Remove old entries

knowledge prune --dry-run                # Show what would be deleted
knowledge prune --days 30                # Remove entries older than 30 days
knowledge prune --days 90                # Default: 90 days
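A dry-run sketch of age-based pruning. The README does not say how an entry's age is measured; this sketch assumes file modification time, and stale_entries is a hypothetical name.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86_400


def stale_entries(root: Path, days: int = 90, now: Optional[float] = None) -> list[Path]:
    """Dry run: list .md files under root last modified more than `days` days ago."""
    current = time.time() if now is None else now
    cutoff = current - days * SECONDS_PER_DAY
    return sorted(p for p in root.rglob("*.md") if p.stat().st_mtime < cutoff)
```

Turning the dry run into a real prune would mean calling unlink() on each returned path.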

`mount` — Virtual filesystem

knowledge mount /tmp/knowledge-fs
knowledge unmount /tmp/knowledge-fs

Creates a read-optimized directory tree:

pitfalls.md — assembled markdown of all pitfalls
pitfalls/tool/severity/entry.md — individual entry stubs

Useful for tools that expect a filesystem (grep, editors, backup scripts).

`daemon` (optional, requires `watch` feature)

cargo install knowledge-db --features watch
knowledge daemon --mount /tmp/knowledge-fs

Watches the store directory and auto-rebuilds the virtual filesystem on changes. Useful for dashboards and live editors.

Agent setup guide

For AI agent tool definitions

If you're building an AI agent that needs persistent knowledge, expose these three tools:

import subprocess
from typing import Optional

# Helper: runs the knowledge CLI with an argument list (no shell, so no quoting needed)
def run(args: list[str]) -> str:
    result = subprocess.run(["knowledge", *args], capture_output=True, text=True, check=True)
    return result.stdout

# Tool: knowledge_write
def knowledge_write(category: str, content: str) -> str:
    """Append a markdown entry to the knowledge store."""
    return run(["append", category, content])

# Tool: knowledge_read
def knowledge_read(category: str, path: Optional[str] = None) -> str:
    """Read all entries in a category, optionally scoped to a sub-path."""
    args = ["read", category]
    if path:
        args.append(path)
    return run(args)

# Tool: knowledge_search
def knowledge_search(category: str, query: str) -> str:
    """Search entries by tokens and field filters."""
    return run(["search", category, query])

System prompt instructions

Include this in your agent's system prompt:

## Knowledge store usage

You have a persistent knowledge store at ~/.hermes/knowledge/.
Use it to remember facts, pitfalls, fixes, and workflows across sessions.

When to write:
- You discover a bug, tricky edge case, or gotcha → knowledge_write("pitfalls", ...)
- You find a solution or workaround → knowledge_write("fixes", ...)
- You learn a multi-step procedure → knowledge_write("workflows", ...)
- You observe a fact or user preference → knowledge_write("facts", ...)

Entry format uses ## field headers:
  ## tool\nfastapi\n\n## severity\nhigh\n\n## source\nDescription\n\n## fix\nSolution

Always search the store before troubleshooting: check if this problem has been seen before.

Hermes agent integration

If you use Hermes, the knowledge tool is built-in. Configure your agent to use knowledge-db as the backend:

# config.yaml
knowledge:
  backend: knowledge-db
  store_path: ~/.hermes/knowledge

The Hermes knowledge tool maps directly to knowledge-db commands: action=write maps to append, action=search to search, and action=read to read.

Standalone agent integration

For agents not using Hermes, wrap the CLI:

# Write
echo '## tool
docker
## severity
medium
## source
Container memory leak under load
## fix
Set --memory limit in docker run' | knowledge append pitfalls

# Search
knowledge search pitfalls 'docker memory leak'

# Read
knowledge read pitfalls

Testing your integration

# 1. Write a test entry
knowledge append test $'## topic\ntesting\n\n## detail\nIntegration works'

# 2. Read it back
knowledge read test

# 3. Search it
knowledge search test 'integration'

# 4. Check stats
knowledge stats test

# 5. Clean up (delete the test category directory)
rm -rf ~/.hermes/knowledge/test

How it works

Entry storage

Each append does this:

Parse ## field headers from the markdown
Resolve directory path from key fields (tool, severity, domain, type)
Generate a filename slug from the first available of title, source, or fix, falling back to the first sufficiently long value
Acquire file lock, write .md file, update flat-text index, release lock
Check if parent directory exceeds 50 files → auto-split if needed
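A sketch of the slug fallback chain from step 3. The five-word cap is inferred from the example path uvicorn-timeout-causes-504-on.md at the top of this README and is not confirmed; the MIN_LEN threshold for "long enough", the "entry" default, and the regex slugify standing in for the slug crate are all hypothetical.

```python
import re

MIN_LEN = 10    # hypothetical threshold for a "long-enough" fallback value
MAX_WORDS = 5   # inferred from the README example, not confirmed


def slugify(text: str) -> str:
    """Lowercase and collapse non-alphanumeric runs into '-' (stand-in for the slug crate)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def entry_slug(fields: dict[str, str]) -> str:
    """Pick title, then source, then fix, then the first long-enough value; slugify its first words."""
    preferred = [fields.get(k) for k in ("title", "source", "fix")]
    fallback = [v for v in fields.values() if len(v) >= MIN_LEN]
    text = next((v for v in preferred + fallback if v), "entry")
    return slugify(" ".join(text.split()[:MAX_WORDS]))
```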

Auto-split engine

When a directory hits 50+ .md files, the engine:

Scans all entries in the directory
Picks the best field to split on: one whose cardinality ratio (distinct values ÷ entries) falls between 0.05 and 0.3, which produces 5–50 subdirectories
Creates subdirectories per distinct field value
Moves files into their subdirectories
Rebuilds the index

The agent never notices. knowledge read pitfalls works identically before and after splitting.
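The field-selection step can be sketched as follows. Only the 0.05 to 0.3 band comes from this README; treating cardinality ratio as distinct values divided by entry count, considering only fields every entry has, and breaking ties toward the middle of the band are assumptions, and pick_split_field is a hypothetical name.

```python
from typing import Optional

LOW, HIGH = 0.05, 0.3  # cardinality band quoted in the README


def pick_split_field(entries: list[dict[str, str]]) -> Optional[str]:
    """Return the field whose distinct-value ratio falls in [LOW, HIGH], or None."""
    if not entries:
        return None
    shared = set.intersection(*(set(e) for e in entries))  # assumed: fields every entry has
    target = (LOW + HIGH) / 2
    best, best_gap = None, None
    for field in sorted(shared):
        ratio = len({e[field].lower() for e in entries}) / len(entries)
        if LOW <= ratio <= HIGH:
            gap = abs(ratio - target)  # assumed tie-break: closest to mid-band wins
            if best_gap is None or gap < best_gap:
                best, best_gap = field, gap
    return best
```

A field that was already used for nesting (every entry in pitfalls/fastapi has tool: fastapi) has a ratio near zero, and a unique field such as source has a ratio of 1.0, so both fall outside the band automatically.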

Index

A flat-text index file at .index maps relative paths to field:value pairs. It is rebuilt on startup if missing. Format:

pitfalls/fastapi/high/uvicorn-timeout.md → tool: fastapi, severity: high, source: Uvicorn timeout causes 504
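Assuming each index line has exactly the shape shown (path, an arrow, then comma-separated field: value pairs), it could be parsed as below. This is a sketch: the real .index may escape commas or multi-line values in ways the example does not show, and a value that itself contains ", " would be split incorrectly here. parse_index_line is a hypothetical name.

```python
def parse_index_line(line: str) -> tuple[str, dict[str, str]]:
    """Split 'path → key: value, key: value' into (path, {key: value})."""
    path, _, rest = line.partition(" → ")
    fields = {}
    for pair in rest.split(", "):
        key, sep, value = pair.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return path.strip(), fields
```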

Concurrency

File locking uses flock (via the fs2 crate), so multiple agents writing simultaneously won't corrupt each other's entries. Lock contention returns an error, so agents should retry.
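Because contention surfaces as an error rather than a wait, callers need their own retry loop. A minimal sketch, assuming only that the CLI exits non-zero on failure; it cannot distinguish a lock error from other failures, so it retries every non-zero exit. The attempt count and backoff delays are arbitrary choices, and run_with_retry is a hypothetical name.

```python
import subprocess
import time


def run_with_retry(args: list[str], attempts: int = 5, base_delay: float = 0.1) -> str:
    """Run a command, retrying any non-zero exit with exponential backoff; return stdout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        result = subprocess.run(args, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 0.1 s, 0.2 s, 0.4 s, ...
    raise RuntimeError(f"{args[0]} failed after {attempts} attempts: {result.stderr.strip()}")
```

An agent would call it as run_with_retry(["knowledge", "append", "pitfalls", entry_text]).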

Architecture

knowledge append → markdown file → directory tree
knowledge search → flat-text index → assembled markdown
knowledge read   → tree walk → assembled markdown

Component	Lines	Purpose
`parser.rs`	127	`## field` → value extraction
`store.rs`	882	Index, append, search, read, dedup, prune, split
`main.rs`	493	CLI, stats, mount, daemon
`knowledge-py`	180	Standalone Python fallback

Binary size: ~1.4 MB release build
Dependencies: slug, fs2, serde, clap, chrono (+ notify for watch feature)
Runtime: zero runtime deps, no daemon required, no SQL, no embeddings

Why not SQLite?

At 500 files, rg beats SQLite. At 5,000 files, the auto-split engine keeps directories small. The filesystem IS the index — path components encode structure, grep handles search, ls handles listing.

Benefits:

Git-friendly: the store is just markdown files — version control works naturally
No migration: add a new field, old entries just lack it; no ALTER TABLE
Less lock contention: locking is per-file, not database-level
Transparent: ls, grep, cat all work directly on the store
Zero setup: no daemon, no connection string, no migrations

If you need JOINs, aggregates, or complex queries — add a SQLite read cache. But you probably don't.

Development

# Clone
git clone https://github.com/workswithagents/knowledge-db
cd knowledge-db

# Build
cargo build

# Run tests
cargo test

# Release build
cargo build --release

# Run locally
cargo run -- append test $'## topic\ntesting\n\n## detail\nhello'

# With watch feature
cargo run --features watch -- daemon

Publishing to crates.io

cargo publish

License

MIT
