Twin is a local-first knowledge OS with semantic search, RAG, and agent execution. It ingests Markdown notes, PDFs, and URLs into a local vector store and lets you query them with natural language or run multi-step agents that reason across your knowledge base — supporting five LLM providers with an encrypted local keychain.
Details on how to use or build the project can be found at HOW_TO_USE.md
# Ingest anything — Markdown, PDF, or URL
$ twin ingest ./notes
Done. Ingested 47 files (312 chunks). Skipped 0 unchanged.
$ twin ingest research.pdf
Done. Ingested research.pdf (18 chunks).
$ twin ingest https://example.com/article
Done. Ingested URL (6 chunks).
$ twin query "What did I write about the Rust ownership model?"
┌───┬───────┬───────────────────────┬──────────────────────────────────┐
│ # │ Score │ Source │ Text │
├───┼───────┼───────────────────────┼──────────────────────────────────┤
│ 1 │ 0.91 │ rust.md › Ownership │ "Ownership in Rust is the..." │
└───┴───────┴───────────────────────┴──────────────────────────────────┘
$ twin rag "What is the Rust ownership model?"
Rust's ownership model gives each value a single owner. When the owner
goes out of scope, the value is dropped automatically...
Sources:
• rust.md Ownership
• systems.md Memory Safety
1 call · 1,240 tokens · ~$0.003
# Multi-step agent with knowledge base search and vault write-back
$ twin agent "Summarize everything I know about async Rust"
iter 0 → search_knowledge_base [source: rust.md > Async] The async keyword...
iter 1 → search_knowledge_base [source: tokio.md > Runtime] Tokio provides...
Based on your notes, async Rust centers on the Future trait...
Tool calls made: 2
2 calls · 890 tokens · ~$0.005
$ twin config set-key
Providers: anthropic, openai, gemini, openrouter
Provider: anthropic
API key for anthropic: ****
$ twin config set-provider openai
✓ Active provider set to openai.
$ twin usage
┌────────────┬───────────┬───────┬──────────────┬───────────────────┬──────────┐
│ Date │ Provider │ Calls │ Prompt tokens │ Completion tokens │ Est. cost│
├────────────┼───────────┼───────┼──────────────┼───────────────────┼──────────┤
│ 2026-06-01 │ anthropic │ 14 │ 18,430 │ 3,210 │ $0.0421 │
└────────────┴───────────┴───────┴──────────────┴───────────────────┴──────────┘
# Watch an Obsidian vault for changes and re-ingest on save
$ twin watch ~/vault
Watching ~/vault for .md changes. Log: ~/.twin/watcher.log (Ctrl-C to stop)
Markdown / PDF / URL
│
▼
ingestion/ Format routing:
parser.py • .md/.txt → Obsidian-aware chunker (wikilinks, tags, frontmatter)
pdf.py • .pdf → pymupdf page extractor
url.py • URL → trafilatura web extractor
obsidian.py All formats produce the same _Chunk shape.
│
▼
embedder.py nomic-embed-text-v1.5 (768-dim). Applies task prefixes:
search_document: for ingestion, search_query: for queries.
│
▼
vector.py LanceDB persistent store. ANN search. Stores link_targets
metadata.py and tags for Obsidian notes. SHA-256 hash registry (SQLite)
for idempotent ingestion.
│
▼
retriever.py Search orchestration, ranking, Rich-formatted output.
│
▼
rag/pipeline.py Retrieve → format context with source attribution →
stream LLM synthesis → grounded answer + sources.
│
▼
agent/runtime.py Multi-step tool-using loop. LLM decides when to search
agent/tools.py the KB or write a note to the vault. Streams final answer.
All tool calls logged. Usage tracked per session.
│
▼
llm/ Five provider implementations behind one async interface:
anthropic.py Claude (default)
openai.py GPT-4o and variants
gemini.py Gemini 2.0 Flash and variants
ollama.py Local models — no API key, no cost
openrouter.py Unified access to 100+ models
config_manager.py AES-256-GCM encrypted keychain (~/.twin/keychain.enc).
Key derived from username + machine ID via PBKDF2 — non-portable.
usage.py JSONL token and cost log (~/.twin/usage.jsonl).
Session summaries printed at end of each rag/agent call.
| Concern | Choice |
|---|---|
| Language | Python 3.11+, Rust (chunking hot path) |
| Embeddings | sentence-transformers (nomic-embed-text-v1.5) |
| Vector store | LanceDB |
| Metadata store | SQLite via SQLModel |
| LLM providers | Anthropic, OpenAI, Google Gemini, Ollama, OpenRouter |
| CLI | Typer |
| Terminal output | Rich |
| Encryption | PyCA cryptography (AES-256-GCM + PBKDF2) |
| PDF extraction | pymupdf |
| Web extraction | trafilatura |
| Filesystem watch | watchdog |
| HTTP client | httpx |
| Testing | pytest |
| Dependency management | uv |
| Python-Rust bindings | PyO3 / maturin |
twin/
config.py Provider enum, ModelInfo, AppConfig (TWIN_* env vars)
config_manager.py AES-256-GCM keychain + config.json read/write
usage.py UsageRecord, UsageLogger, format_session_summary
cli.py Typer CLI — all commands
ingestion/
parser.py Markdown chunking (Rust extension)
embedder.py sentence-transformers wrapper, prefix handling
pdf.py pymupdf-based PDF parser
url.py trafilatura-based URL ingester
obsidian.py Wikilink/tag/frontmatter parser + VaultWatcher
storage/
vector.py LanceDB schema, ANN search, link_targets/tags fields
metadata.py SQLite document registry, frontmatter_json field
query/
retriever.py Search orchestration, ranking, Rich output
llm/
base.py LLMProvider ABC, ToolDefinition, ToolCall, LLMResponse
anthropic.py Async Claude
openai.py Async OpenAI
gemini.py Google Gemini via google-genai SDK
ollama.py Local Ollama via httpx
openrouter.py OpenRouter (unified multi-provider access)
rag/
pipeline.py query() + query_stream(), session usage tracking
context.py Chunk formatting with source attribution
prompts.py System prompt definitions
agent/
runtime.py execute() + execute_stream(), session usage tracking
tools.py search_knowledge_base + VaultWriter + ToolDispatcher
log.py AgentLog: chronological event log, JSON-serializable
twin_core/
Cargo.toml
src/
lib.rs PyO3 bindings
chunker.rs Heading-aware chunking logic
tokens.rs Token counting (word-based)
Zero LangChain, LlamaIndex, or similar. Every component is a thin wrapper around its underlying library — LanceDB, sentence-transformers, SQLite, provider SDKs.
Why: Abstractions obscure what's happening during retrieval, make debugging harder, and add dependencies with frequent breaking changes. Every retrieval failure in Twin is traceable: query → embedding → ANN search → ranking → formatting. No framework magic in the path.
Choice: nomic-ai/nomic-embed-text-v1.5 (768 dimensions)
Benchmarked against the MTEB leaderboard. Ranks top 5 for retrieval tasks among locally-runnable models. The model requires task-specific prefixes — search_document: for ingestion, search_query: for queries — which Twin applies explicitly because the distinction is measurable in retrieval quality.
llm/base.py defines an abstract LLMProvider with four methods: complete(), stream(), estimate_cost(), and list_models(). All are async. Five concrete implementations ship with Phase 2: Anthropic, OpenAI, Gemini, Ollama, and OpenRouter.
Provider resolution order: --provider flag → config.json → TWIN_PROVIDER env var → Anthropic.
The runtime always appends messages in Anthropic content-block format; non-Anthropic providers convert internally in their complete() method. The agent runtime and RAG pipeline have zero provider-specific code.
API keys are stored encrypted in ~/.twin/keychain.enc using AES-256-GCM. The encryption key is derived from username:machine_id via PBKDF2-SHA256 (480,000 iterations) — intentionally non-portable. Keys are never printed, logged, or returned anywhere in the codebase.
Resolution order: keychain → environment variable → descriptive error with onboarding instructions.
Running ingest twice on unchanged content produces no changes. SHA-256 hashes of file content (or URL content) are stored in the SQLite registry. On re-ingest: hash match → skip; hash changed → delete old chunks, insert new ones.
All .md files go through the Obsidian-aware parser. Non-vault Markdown simply yields empty link_targets and tags. The parser:
- Extracts
[[Note Name]]and[[Note Name|Alias]]→link_targets(note names, deduplicated) - Extracts
#tagsand#nested/childfrom the body (not YAML frontmatter) - Strips
![[embed.png]]from chunk text - Converts wikilinks to plain text for embedding
- Preserves full YAML frontmatter as structured metadata in SQLite
write_vault_note enforces that agent output never escapes <vault>/Agents/. The path is sanitized (slashes and control characters replaced), then a relative_to boundary check is applied as defense-in-depth. The constraint is at the path level, not a convention.
Chunking and token counting live in twin_core/ — a Rust crate exposed via PyO3 bindings. The boundary is clean: Rust receives plain strings, returns text + offsets. It does not touch LanceDB, SQLite, the filesystem, or any Python objects. Same pattern as Hugging Face tokenizers.
| Parameter | Value | Reason |
|---|---|---|
| Max chunk tokens | 512 | Precision vs. context trade-off — smaller is sharper |
| Overlap tokens | 64 | Prevents context loss at chunk boundaries |
| Primary split | Markdown headings | Semantic units, not arbitrary length |
| Secondary split | Paragraph breaks | Natural prose boundaries |