Not a note-taking app. Not a bookmark manager. A brain.
A local-first knowledge intelligence system that turns saved content into structured insights, principles, predictions, and searchable memory. Feed it anything — YouTube, blogs, PDFs, podcasts — and it extracts every actionable insight, discovers connections across domains, and continuously gets smarter.
Built for founders, researchers, writers, and anyone who thinks for a living.
```
INGEST           EXTRACT            LEARN              USE
any URL     →    insights via   →   connections,   →   search, decide,
or file          Claude + Zod       compression,       predict, act
                                    gap detection
```
Every source you feed it becomes structured knowledge. The system finds patterns you wouldn't — cross-domain transfers, contradictions, gaps. Insights compress into principles. Principles inform decisions. Decisions track outcomes. The loop closes.
```
Source:    Paul Graham — "How to Present to Investors" (2,742 words)

    ↓ npm run ingest

Insight:   "Narrow descriptions beat vague ones in investor pitches — as a
            description approaches 'could be anything,' its information
            content approaches zero."

    ↓ npm run learn (connection discovery + compression)

Principle: "Early-stage business models are almost certainly wrong — pitch
            time spent on monetization displaces discussion of the problem
            and product where founders actually have insight."

    ↓ npm run search "how should I pitch?"

Result:    Retrieves the principle + 5 supporting insights from 4 different
           sources, with confidence levels and source attribution.
```
One essay becomes searchable knowledge that connects to everything else you've ingested.
Prerequisites: Node.js 20+, npm.
```sh
git clone https://github.com/gorajing/zuhn.git && cd zuhn
npm install
```

The repo includes a live reference corpus (~11k insights) as both proof-of-concept and working example. If you want your own clean knowledge base instead of the bundled reference corpus:
```sh
rm -rf knowledge-base
node --import tsx scripts/init.ts
```

This creates an empty knowledge base with a bundled sample source — a short essay you can extract from immediately, no network required.
Open Claude Code in the zuhn directory and say:
```
extract insights from SRC-000000-DEMO
```
Claude reads the sample source, writes structured JSON, and runs extract.ts — your first insights land in the knowledge base. Extraction works with Claude Code or any LLM that can produce schema-valid JSON (see CLAUDE.md for the format).
You can delete the demo source after your first successful extraction.
```sh
npm run search "spacing effect"
```

Three steps: init → extract → search. Everything else builds on top.
Note: If `npm run` commands fail with tsx-related errors in restrictive environments (Docker, sandboxed shells), use `node --import tsx scripts/<name>.ts` as a fallback.
| Feature | LLM (Claude recommended) | Ollama | Network |
|---|---|---|---|
| Create KB and extract insights | Yes | — | — |
| Keyword search | — | — | — |
| Semantic (hybrid) search | — | Yes | — |
| Ingest URLs (YouTube, blogs, PDFs) | — | — | Yes |
| Learning mechanisms (`npm run learn`) | — | Yes | — |
| Decision briefs (MCP) | — | — | — |
| Autonomous daemon | Yes | Yes | Yes |
Zuhn defaults to keyword search (SQLite FTS5). For hybrid keyword + semantic search, install Ollama:
```sh
ollama serve                  # Start the server
ollama pull nomic-embed-text  # Download the embedding model (768 dims)
npm run embed                 # Embed your existing insights
```

Without Ollama, everything works — search returns keyword-only results.
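Hybrid search merges the keyword ranking (FTS5) with the semantic ranking (embeddings) via reciprocal rank fusion, the RRF scheme mentioned in `scripts/lib/vector-search.ts`. The sketch below shows the fusion idea only; the function name, IDs, and constant are illustrative, not Zuhn's API.

```typescript
// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank) per
// document; k (conventionally 60) damps the advantage of top positions.
function rrfMerge(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, idx) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + idx + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Keyword results and semantic results fused into one ranking:
const fused = rrfMerge([
  ["INS-12", "INS-07", "INS-33"], // keyword (FTS5) order
  ["INS-07", "INS-33", "INS-12"], // semantic (embedding) order
]);
console.log(fused[0]); // INS-07: ranked high in both lists
```

The appeal of RRF is that it needs no score normalization: BM25 scores and cosine similarities live on different scales, but ranks are always comparable.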
Feed Zuhn any URL — YouTube, blogs, Reddit, PDFs:
```sh
npm run ingest https://youtu.be/your-video-id
npm run ingest https://example.com/interesting-article
npm run ingest path/to/local/file.pdf
```

YouTube requires yt-dlp. PDF ingestion is local (no network after download).
The daemon watches an inbox, extracts insights, red-teams beliefs, and scouts evidence:
```sh
npm run daemon:start
```

Requires Claude Code + Ollama running. See inbox setup for configuration.
- Ingests any URL or file — YouTube, blogs, Reddit, PDFs, audio recordings
- Extracts discrete, actionable insights (Claude reasons, TypeScript validates via Zod)
- Learns — 9 automated mechanisms discover connections, cluster topics, detect gaps, transfer principles across domains, flag contradictions
- Compresses knowledge upward: insights → principles → mental models
- Searches via hybrid keyword + semantic search (SQLite FTS5 + Ollama embeddings)
- Predicts & Decides — testable claims with deadlines, decisions with outcome tracking
- Surfaces ambient context — concise decision briefs appear reflexively during decision-shaped conversations (via MCP)
- Tracks chronology — append-only `meta/log.md` records every ingestion, compression, prediction, and resolution
- Runs autonomously — daemon processes inbox, extracts insights, red-teams beliefs, scouts evidence while you sleep
```
Level 5: MENTAL MODELS      — Transferable frameworks across domains
Level 4: PRINCIPLES         — Synthesized rules backed by evidence
Level 3: INSIGHTS           — Individual knowledge cards
Level 2: PROCESSED SOURCES  — Summarized, tagged original content
Level 1: RAW INTAKE         — Original content as received
```
Each level compresses the one below. ~800 lines of context for an expert-level answer whether you have 100 or 10,000 insights.
```sh
# Ingestion
npm run ingest <url-or-path>        # Any content type
npm run ingest-channel <url>        # Batch YouTube channel (--top N)

# Search
npm run search "query"              # Keyword search
npm run search "query" -- --hybrid  # Keyword + semantic

# Pipeline
npm run post-ingest                 # Full pipeline: health → reindex → embed → learn → views
npm run health                      # Validate everything
npm run learn                       # Run 9 learning mechanisms

# Knowledge Management
npm run compress                    # Find topics ready for compression
npm run mindmap                     # Interactive visualization
npm run resurface                   # Daily digest of insights to review
npm run archive                     # Intelligent forgetting (--dry-run first)

# Empirical Engine
npx tsx scripts/predict.ts --file <json>                 # Create testable predictions
npx tsx scripts/decide.ts --file <json>                  # Log decisions with insight links
npx tsx scripts/resolve.ts --id <ID> --status <STATUS>   # Track outcomes

# Decision Briefs
npm run brief "should I raise VC?"                   # Full markdown brief (CLI default)
npm run brief -- --mode concise "hire or automate"   # Compact ~300-token summary
# MCP default is concise — agents invoke reflexively on decision-shaped prompts

# Session
npm run wake                        # Morning briefing
npm run sleep                       # Save session state

# Autonomous
npm run daemon:start                # Background: inbox + scouts + red team
npm run autoknowledge               # Self-improving extraction loop

# Platform
npm run mcp                         # MCP server (12 tools, any compatible MCP client)
npm run test                        # Vitest test suite
```

| Mechanism | What It Does |
|---|---|
| Connection Discovery | Finds semantically similar insights, populates bidirectional links |
| Link Prediction | Common-neighbor algorithm finds connections embeddings miss |
| Emergence Detection | Flags topics for compression, sorted by surprise score |
| Confidence Propagation | Increases confidence when independent sources corroborate |
| Semantic Clustering | Louvain community detection discovers cross-topic clusters |
| Gap Detection | Finds sparse areas adjacent to dense knowledge |
| Cross-Domain Transfer | Finds principles that apply to other domains (zero-tag-overlap surprise) |
| Tension Detection | Flags contradictory insights for resolution |
| Empirical Propagation | Cascades confidence when predictions confirm or falsify |
| Cross-Domain Synthesis | Finds structurally parallel principles across different domains |
All mechanisms run automatically via `npm run learn`.
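As an illustration of the table above, the common-neighbor heuristic behind link prediction fits in a few lines of TypeScript. The graph shape, function name, and threshold are assumptions for the sketch, not Zuhn's internals.

```typescript
// Common-neighbor link prediction: two nodes that share several linked
// neighbors, but have no direct link, are candidate connections.
type Graph = Map<string, Set<string>>;

function predictLinks(g: Graph, minShared = 2): [string, string][] {
  const nodes = [...g.keys()];
  const out: [string, string][] = [];
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const [a, b] = [nodes[i], nodes[j]];
      if (g.get(a)!.has(b)) continue; // already linked: nothing to predict
      const shared = [...g.get(a)!].filter((n) => g.get(b)!.has(n)).length;
      if (shared >= minShared) out.push([a, b]);
    }
  }
  return out;
}

// Undirected toy graph, stored symmetrically:
const g: Graph = new Map([
  ["A", new Set(["C", "D"])],
  ["B", new Set(["C", "D"])],
  ["C", new Set(["A", "B"])],
  ["D", new Set(["A", "B"])],
]);
console.log(predictLinks(g)); // both unlinked pairs (A-B and C-D) share two neighbors
```

This is why the table says link prediction "finds connections embeddings miss": it uses only graph topology, so it can propose pairs whose texts are not semantically close.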
```
knowledge-base/                    ← Source of truth (markdown + YAML frontmatter)
├── domains/{domain}/{topic}/*.md  ← Insight files (Zod-validated)
├── principles/{domain}/*.md       ← Synthesized rules from insights
├── mental-models/*.md             ← Transferable frameworks
├── tensions/*.md                  ← Tracked contradictions
├── decisions/*.md                 ← Decision records with outcomes
├── predictions/*.md               ← Testable claims with deadlines
├── sources/{type}/*.md            ← Where insights came from
├── views/mindmap.html             ← Interactive zoomable mindmap
├── meta/flags.md                  ← Learning flags (COMPRESS/DISCOVER/GAP/TRANSFER)
├── meta/log.md                    ← Append-only chronological log of events
└── db/brain.db                    ← SQLite + FTS5 + sqlite-vec

scripts/                           ← 75+ TypeScript scripts
├── lib/learning.ts                ← 9 learning mechanisms
├── lib/vector-search.ts           ← Hybrid search (RRF ranking)
├── lib/embeddings.ts              ← Ollama embedding client
└── mcp-server.ts                  ← MCP server (12 tools)

skills/                            ← 17 portable SKILL.md files (any LLM agent)
benchmarks/                        ← Epistemic CI/CD (quality metrics + regression gates)
```
Dual-Graph Architecture — Fast (vector similarity, rebuilt every run) + Slow (LLM-classified semantic relationships, built async). Relationship types: SUPPORTS · CONTRADICTS · EXTENDS · TRANSFERS_TO · REFINES · CHALLENGES
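A sketch of how the slow graph's typed edges might be modeled in TypeScript. The union mirrors the six relationship types listed above; the interface and field names are assumptions for illustration, not Zuhn's actual types.

```typescript
// The six slow-graph relationship types as a discriminated string union.
type RelationType =
  | "SUPPORTS" | "CONTRADICTS" | "EXTENDS"
  | "TRANSFERS_TO" | "REFINES" | "CHALLENGES";

interface TypedEdge {
  from: string;          // source insight/principle ID
  to: string;            // target insight/principle ID
  type: RelationType;
  classifiedBy: "llm";   // slow-graph edges are LLM-classified, built async
}

const edge: TypedEdge = {
  from: "INS-000042",
  to: "PRI-000003",
  type: "SUPPORTS",
  classifiedBy: "llm",
};
console.log(`${edge.from} ${edge.type} ${edge.to}`);
```

The fast graph needs no such labels: it is plain vector similarity and can be rebuilt cheaply on every run, while typed edges are slower to classify and accumulate asynchronously.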
- Local-first — Ollama for embeddings, Whisper for transcription, SQLite for search. Zero cloud dependencies.
- The file system is the database — Markdown + YAML frontmatter, git-tracked, human-readable
- Claude reasons, TypeScript validates — No LLM in the data path. Zod schemas enforce structure.
- Compression over accumulation — Insights → principles → mental models
- Empiricism over epistemology — Predictions and decisions close the loop with real-world outcomes
- The system learns, not just stores — 9 automated mechanisms discover structure in knowledge
| Component | Technology |
|---|---|
| Knowledge files | Markdown + YAML frontmatter |
| Validation | Zod |
| Database | SQLite + FTS5 + sqlite-vec |
| Embeddings | Ollama (nomic-embed-text, 768 dims) |
| Transcription | Whisper (local) |
| Scripts | TypeScript (tsx) |
| Tests | Vitest |
| Graph analysis | graphology + Louvain |
| Reasoning | Claude (in conversation) |
Detailed architecture documents:
- Dual-Graph Architecture — Fast/Slow memory, typed relationships, surprise-gated compression
- Brain Engine Architecture — Original design document
- Universal Ingestion Pipeline — Multi-format content fetching
- Learning Mechanisms 4-6 — Clustering, gap detection, cross-domain transfer
- Resolve Pipeline — Empirical engine with confidence cascade
- Autonomous Metabolism — Daemon, inbox, red team, scouts
ISC — see LICENSE for details.
Built by Jin Choi + Claude.