Autonomous Knowledge Base Architect for Claude Code
Librarian is a Claude Code agent (teammate) that manages an Obsidian Vault as a structured knowledge base with a three-tier memory architecture. It ingests raw data, filters garbage, performs semantic merges, and maintains ClickHouse indexes — all autonomously.
| Tier | Index | Storage | Details |
|---|---|---|---|
| L0 | Embedding Index | TEI + ClickHouse | Cosine similarity, milliseconds |
| L1 | Summary Index | ClickHouse | YAML frontmatter: title, summary, tags, domain |
| L2 | Full Articles | Obsidian Vault | Markdown files with YAML frontmatter |
- L0: vector embeddings for associative retrieval (semantic search)
- L1: structured metadata in ClickHouse (`kb.articles` table, Bloom filters on tags)
- L2: full markdown articles in an Obsidian vault, organized by domain
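The L0 tier ranks articles by cosine similarity between embeddings. The following is a minimal sketch of that ranking in plain Python, assuming embeddings are already available as lists of floats. The `top_k` helper and the toy index are hypothetical illustrations; in the setup described here the vectors would come from TEI and the search would run in ClickHouse.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """Return the k (path, score) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), path) for path, vec in index.items()]
    scored.sort(reverse=True)
    return [(path, score) for score, path in scored[:k]]


# Hypothetical toy index: vault path -> embedding (real embeddings have far more dimensions)
index = {
    "infra/clickhouse/schema.md": [0.9, 0.1, 0.0],
    "ai/agents/memory.md": [0.1, 0.9, 0.2],
    "science/notes.md": [0.0, 0.2, 0.9],
}

best_path, best_score = top_k([1.0, 0.0, 0.0], index, k=1)[0]
```

A linear scan like this is only practical for small vaults; it is meant to show what the similarity score measures, not how the index is stored.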
raw data → _inbox/ → Chunker (>50KB) → file_parse (Zero-Value filter + synthesis)
→ Template Selection (documentation) → /kb search for duplicates → Merge / Create / Drop
→ /git_commit → /kbupd (sync L1 index)
- Inbox: Drop raw files (memory dumps, logs, notes, requirement docs) into `_inbox/`
- Chunker (Haiku subagent): Splits files >50KB into semantic chunks, quarantines toxic content
- file_parse: Evaluates content value (the Zero-Value Protocol drops garbage), extracts engineering essence, generates YAML frontmatter
- Template Selection: If content is project/product documentation (requirements, specs, architecture, market analysis, quality standards), applies modular template structure: `templates/base.md` + relevant addons from `templates/addons/`. Multiple addons combine for complex docs (e.g., PRD + FRD + TRD). Empty fields → `[TBD]`, never fabricated.
- Dedup: Searches existing KB via the `/kb` skill (ClickHouse Bloom filters)
- Merge Logic: Semantic merge into an existing article, or create a new one. Lossless fact retention: never drop details for readability
- Persist: Git commit + ClickHouse L1 index sync
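The routing decisions in the steps above can be sketched as two small functions. This is an illustration only: in the actual pipeline these judgments are made by the agents, and the `Candidate` fields, the function names, and the choice of 1 KB = 1024 bytes are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# The ">50KB" rule from the pipeline; treating 1 KB as 1024 bytes is an assumption.
CHUNK_THRESHOLD_BYTES = 50 * 1024


@dataclass
class Candidate:
    text: str
    is_zero_value: bool             # hypothetical: verdict of the Zero-Value filter
    duplicate_path: Optional[str]   # hypothetical: existing article found by the dedup search


def needs_chunking(raw: bytes) -> bool:
    """Files strictly larger than the threshold go to the Chunker first."""
    return len(raw) > CHUNK_THRESHOLD_BYTES


def decide(candidate: Candidate) -> str:
    """Map a parsed candidate to one of the three outcomes: drop, merge, or create."""
    if candidate.is_zero_value:
        return "drop"
    if candidate.duplicate_path is not None:
        return "merge"
    return "create"
```

The order of the checks mirrors the pipeline: worthless content is dropped before any dedup lookup, so garbage never triggers a merge.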
The main agent. Owns the vault, runs the pipeline, makes merge/create/drop decisions.
# .claude/agents/librarian.md
name: librarian
model: claude-sonnet-4-6
permissionMode: bypassPermissions

Subagent spawned by Librarian for large files. Splits semantically, quarantines junk, returns a file listing, then exits.
# .claude/agents/chunker.md
name: chunker
model: claude-haiku-4-6
permissionMode: bypassPermissions

| Skill | Description |
|---|---|
| file_parse | Zero-Value filter + adaptive extraction. Drops garbage, synthesizes content into structured markdown with YAML frontmatter. Domain-aware: infra → architecture decisions; AI → mechanism analysis; science → causal chains. |
| kbupd | Parses YAML frontmatter from .md files and UPSERTs into ClickHouse kb.articles table. |
| git_commit | Commits vault state to git for audit trail. |
| kb | Queries ClickHouse for existing articles by domain, tags, or search. Returns L1 summaries and L2 vault paths. |
| librarian | Orchestrator skill — spawns the librarian agent into a team with persistent tasks. |
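As an illustration of the first half of what `kbupd` does, here is a minimal sketch of reading frontmatter from a markdown file. It is not the project's `sync_l1_index.py`. It assumes flat `key: value` lines whose values are JSON-compatible (double-quoted strings and bracketed lists), so it uses only the standard library; a real implementation would more likely use a full YAML parser. The `parse_frontmatter` name and the sample article are hypothetical.

```python
import json


def parse_frontmatter(markdown: str) -> dict:
    """Extract flat key/value frontmatter delimited by '---' lines."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not key: value pairs
        value = value.strip()
        try:
            meta[key.strip()] = json.loads(value)
        except json.JSONDecodeError:
            meta[key.strip()] = value  # keep unquoted values as raw strings
    return {}  # unterminated block: treat as missing


article = """---
domain: "infra"
title: "L0/L1 Knowledge Base Schema"
tags_en: ["database", "schema", "architecture"]
source: user
---
Body text.
"""

meta = parse_frontmatter(article)
```

The resulting dictionary is what an upsert step would map onto table columns; the ClickHouse side is omitted because the table definition is not shown here.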
Every article in the vault has YAML frontmatter:
---
domain: "infra"
category: "clickhouse"
title: "L0/L1 Knowledge Base Schema"
title_ru: "Схема L0/L1 базы знаний"
summary: "DDL for kb.articles table using ReplacingMergeTree with Bloom filters."
tags_en: ["database", "schema", "architecture"]
tags_ru: ["база_данных", "схема", "архитектура"]
source: "user"
---

- Sequential only: never spawn multiple librarians in parallel (race condition on dedup)
- Zero-Value Protocol — ruthless garbage filter before any processing
- Semantic merge, not concatenation — new facts integrated into existing document structure
- Lossless fact retention — isolated notes section for facts that don't fit the main structure
- Chunker quarantine — toxic/jailbreak content isolated, never enters KB
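One way to guard the sequential-only rule is an exclusive lock file that a run must acquire before touching the vault. The sketch below is an assumption about how such a guard could look, not something the project ships; the `single_librarian` name and the lock path are hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def single_librarian(lock_path):
    """Hold an exclusive lock file for the duration of one librarian run."""
    # O_CREAT | O_EXCL makes creation fail with FileExistsError
    # if another run already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


# Usage (hypothetical path):
# with single_librarian("/path/to/vault/.librarian.lock"):
#     ...run the pipeline...
```

If a run crashes hard enough to skip the cleanup, the stale lock file has to be removed by hand; the recorded process ID helps confirm that the owner is gone.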
- Claude Code with agent support
- Obsidian vault (or any markdown file structure)
- ClickHouse for L0/L1 indexes (optional — works without it as pure file-based KB)
- TEI for L0 embeddings (optional)
- Copy agent definitions to `.claude/agents/`
- Copy skills to `.claude/skills/` (or your skills directory)
- Adapt hardcoded paths in the following files (see table below)
- Set up the ClickHouse `kb.articles` table (see Schema section)
- Create an `_inbox/` directory in your vault
- Drop files into `_inbox/` and spawn the librarian
These files contain paths specific to the reference setup. Update them to match your environment:
| File | What to change |
|---|---|
| `agents/librarian.md` | `_inbox/` path, `_quarantine/` path, canonical folders root |
| `agents/chunker.md` | `_quarantine/` path, temp storage path |
| `skills/git_commit/SKILL.md` | Vault root path (for `cd` and `git add`) |
| `skills/kbupd/SKILL.md` | Path to `sync_l1_index.py` script |
| `skills/kb/SKILL.md` | ClickHouse connection details, `kbcli` path |
| `skills/librarian/SKILL.md` | Team name (if you want a different one) |
Document templates adapted from req-docs by @alenazaharovaux — a collection of requirement document templates for AI coding agents. Licensed under MIT.
MIT