Merged
20 commits
9403363
feat: add vecstore module with sqlite-vec integration
devwhodevs Mar 25, 2026
5f7a89e
feat: integrate sqlite-vec into Store with transaction helpers
devwhodevs Mar 25, 2026
3407644
refactor: replace HNSW semantic lane with sqlite-vec in search
devwhodevs Mar 25, 2026
e8a5e5f
refactor: replace HNSW rebuild with sqlite-vec inserts in indexer
devwhodevs Mar 25, 2026
424189c
refactor: remove hnsw_rs dependency, delete hnsw.rs — vectors now in …
devwhodevs Mar 25, 2026
6e0986d
feat: auto-migrate existing BLOB vectors to sqlite-vec on startup
devwhodevs Mar 25, 2026
478a72f
feat: add tag registry with fuzzy resolution
devwhodevs Mar 25, 2026
8062a0e
feat: add link discovery module for auto-wikilinks
devwhodevs Mar 25, 2026
c3129fc
feat: add folder placement module with type rules and semantic centroids
devwhodevs Mar 25, 2026
843d15b
feat: add write pipeline orchestrator with create, append, update, an…
devwhodevs Mar 25, 2026
7efd2d4
fix: resolve clippy warnings in writer, links, and placement modules
devwhodevs Mar 25, 2026
5a5aafc
feat: add create, append, update_metadata, and move_note MCP write tools
devwhodevs Mar 25, 2026
f8271cd
feat: add write CLI subcommands (create, append)
devwhodevs Mar 25, 2026
cebd3bf
feat: add crash recovery — cleanup orphan .tmp files on startup
devwhodevs Mar 25, 2026
66c9118
test: add write pipeline integration tests
devwhodevs Mar 25, 2026
4b80ebe
chore: v0.6.0 — write pipeline, sqlite-vec migration, tombstone removal
devwhodevs Mar 25, 2026
7f7ca0e
fix: update stored mtime after rename to prevent false conflict detec…
devwhodevs Mar 25, 2026
7d4ed8c
feat: add archive/unarchive for soft-delete with index exclusion
devwhodevs Mar 25, 2026
0befe42
style: apply cargo fmt to archive/unarchive code
devwhodevs Mar 25, 2026
03c7566
feat: complete v0.6 spec coverage — content analysis, suggested_folde…
devwhodevs Mar 25, 2026
38 changes: 24 additions & 14 deletions CLAUDE.md
@@ -4,7 +4,7 @@ Local hybrid search CLI for Obsidian vaults. Rust, MIT licensed.

## Architecture

Single binary with 14 modules behind a lib crate:
Single binary with 19 modules behind a lib crate:

- `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`
- `chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. `split_oversized_chunks()` handles token-aware secondary splitting with overlap
@@ -14,46 +14,56 @@ Single binary with 14 modules behind a lib crate:
- `fts.rs` — FTS5 full-text search support. Re-exports `FtsResult` from store. BM25-ranked keyword search
- `fusion.rs` — Reciprocal Rank Fusion (RRF) engine. Merges semantic + FTS5 + graph results. Supports lane weighting, `--explain` output with per-lane detail
- `context.rs` — context engine. Six functions: `read` (full note content + metadata), `list` (filtered note listing), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal`
- `serve.rs` — MCP stdio server via rmcp SDK. Exposes 7 read-only tools (search, read, list, vault_map, who, project, context). EngraphServer struct with Arc+Mutex wrapping for async handlers. Loads all resources at startup.
- `vecstore.rs` — sqlite-vec virtual table integration. Manages the `vec_chunks` vec0 table for vector storage and KNN search. Handles insert, delete, and search operations against the virtual table
- `tags.rs` — tag registry module. Maintains a `tag_registry` table tracking known tags with source attribution. Supports fuzzy matching for tag suggestions during note creation
- `links.rs` — link discovery module. Scans note content for potential wikilink targets using fuzzy basename matching and heading detection. Suggests links that could be added to improve vault connectivity
- `placement.rs` — folder placement engine. Uses folder centroids (average embeddings per folder) to suggest the best folder for new notes. Falls back to inbox when confidence is low
- `writer.rs` — write pipeline orchestrator. 5-step pipeline: resolve tags (fuzzy match + register new), discover links, place in folder, atomic file write (temp + rename), and index update. Supports create, append, update_metadata, and move_note operations with mtime-based conflict detection and crash recovery via temp file cleanup
- `serve.rs` — MCP stdio server via rmcp SDK. Exposes 11 tools: 7 read (search, read, list, vault_map, who, project, context) + 4 write (create, append, update_metadata, move_note). EngraphServer struct with Arc+Mutex wrapping for async handlers. Loads all resources at startup
- `graph.rs` — vault graph agent. Extracts wikilink targets, expands search results by following graph connections 1-2 hops. Relevance filtering via FTS5 term check and shared tags
- `profile.rs` — vault profile detection. Auto-detects PARA/Folders/Flat structure, vault type (Obsidian/Logseq/Plain), wikilinks, frontmatter, tags. Writes/loads `vault.toml`
- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`. Handles incremental diffing via content hashes
- `hnsw.rs` — thin wrapper around `hnsw_rs`. **Important:** `hnsw_rs` does not support inserting after `load_hnsw()`. The index is rebuilt from vectors stored in SQLite on every index run
- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding (Rayon for parallel chunking, serial embedding since `Embedder` is not `Send`), serial writes to store + HNSW + FTS5, and vault graph edge building (wikilinks + people detection)
- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`. `vec_chunks` virtual table (sqlite-vec) for KNN search. Handles incremental diffing via content hashes
- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding (Rayon for parallel chunking, serial embedding since `Embedder` is not `Send`), serial writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation
- `search.rs` — hybrid search orchestrator. Runs semantic (sqlite-vec KNN), keyword (FTS5 BM25), and graph expansion lanes, then fuses via RRF
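
The "atomic file write (temp + rename)" step listed for `writer.rs`, together with the orphan `.tmp` cleanup from the crash-recovery commit, can be sketched with the standard library alone. This is an illustrative sketch, not the code in `writer.rs`: the function names `atomic_write` and `cleanup_orphan_tmp` and the `<name>.tmp` naming scheme are assumptions.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: the temp file lives next to the target as `<name>.tmp`,
/// so the final rename stays on the same filesystem.
fn tmp_path(target: &Path) -> PathBuf {
    let mut name = target.file_name().unwrap_or_default().to_os_string();
    name.push(".tmp");
    target.with_file_name(name)
}

/// Write `contents` to a temp file, flush it to disk, then rename it over `target`.
/// Readers see either the old file or the complete new one, never a partial write.
pub fn atomic_write(target: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = tmp_path(target);
    {
        let mut file = File::create(&tmp)?;
        file.write_all(contents)?;
        file.sync_all()?; // make sure the bytes are durable before the rename
    }
    fs::rename(&tmp, target)
}

/// Remove leftover `*.tmp` files in `dir` (not recursive); returns how many were removed.
pub fn cleanup_orphan_tmp(dir: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && path.extension().map_or(false, |ext| ext == "tmp") {
            fs::remove_file(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}
```

A crash between the write and the rename leaves only a `.tmp` file behind, which is why a cleanup pass at startup is enough to recover.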

`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index`, `search` (with `--explain`), `status`, `clear`, `init`, `configure`, `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `serve` (MCP stdio server).
`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index`, `search` (with `--explain`), `status`, `clear`, `init`, `configure`, `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move), `serve` (MCP stdio server).
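
The folder placement described for `placement.rs` (average embeddings per folder, inbox fallback when confidence is low) might look roughly like the sketch below. The function names, the cosine-similarity scoring, and the `min_confidence` parameter are assumptions for illustration; the actual threshold and scoring in `placement.rs` may differ.

```rust
/// Average a folder's note embeddings into one centroid vector.
/// Returns `None` for an empty folder. Assumes all vectors share one dimension.
pub fn centroid(vectors: &[Vec<f32>]) -> Option<Vec<f32>> {
    let first = vectors.first()?;
    let mut sum = vec![0.0f32; first.len()];
    for v in vectors {
        for (acc, x) in sum.iter_mut().zip(v) {
            *acc += *x;
        }
    }
    let n = vectors.len() as f32;
    Some(sum.into_iter().map(|x| x / n).collect())
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Pick the folder whose centroid is most similar to the note embedding.
/// Falls back to `inbox` when the best similarity is below `min_confidence`
/// (a hypothetical threshold) or when no centroids exist.
pub fn suggest_folder<'a>(
    note: &[f32],
    centroids: &'a [(&'a str, Vec<f32>)],
    min_confidence: f32,
    inbox: &'a str,
) -> &'a str {
    let best = centroids
        .iter()
        .map(|(folder, c)| (*folder, cosine(note, c)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
    match best {
        Some((folder, similarity)) if similarity >= min_confidence => folder,
        _ => inbox,
    }
}
```

Precomputing one centroid per folder keeps placement to a handful of dot products per new note, which fits the centroid computation that `indexer.rs` performs at index time.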

## Key patterns

- **3-lane hybrid search:** Queries run through three lanes — semantic (HNSW embeddings), keyword (FTS5 BM25), and graph (wikilink expansion). Results are fused via Reciprocal Rank Fusion (RRF) with configurable lane weights (semantic 1.0, FTS 1.0, graph 0.8)
- **3-lane hybrid search:** Queries run through three lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), and graph (wikilink expansion). Results are fused via Reciprocal Rank Fusion (RRF) with configurable lane weights (semantic 1.0, FTS 1.0, graph 0.8)
- **Vault graph:** `edges` table stores bidirectional wikilink edges and mention edges. Built during indexing after all files are written. People detection scans for person name/alias mentions using notes from the configured People folder
- **Graph agent:** Expands seed results by following wikilinks 1-2 hops. Decay: 0. for 1-hop, 0. for 2-hop. Relevance filter: must contain query term (FTS5) or share tags with seed. Multi-parent merge takes highest score
- **Graph agent:** Expands seed results by following wikilinks 1-2 hops. Decay: 0.8x for 1-hop, 0.5x for 2-hop. Relevance filter: must contain query term (FTS5) or share tags with seed. Multi-parent merge takes highest score
- **Smart chunking:** Break-point scoring algorithm assigns scores to potential split points (headings 50-100, code fences 80, thematic breaks 60, blank lines 20). Code fence protection prevents splitting inside code blocks
- **Incremental indexing:** `diff_vault()` compares file content hashes in SQLite against disk. Changed files have their old chunks and edges deleted, then are re-processed. FTS5 entries cleaned up alongside vector entries
- **HNSW rebuild on every run:** Vectors stored as BLOBs. Full HNSW index rebuilt from `store.get_all_vectors()` after SQLite update (hnsw_rs limitation)
- **Incremental indexing:** `diff_vault()` compares file content hashes in SQLite against disk. Changed files have their old chunks, vectors, and edges deleted, then are re-processed. FTS5 and sqlite-vec entries cleaned up alongside store entries
- **sqlite-vec for vector search:** Vectors stored in a `vec_chunks` virtual table (vec0). KNN search via `vec_distance_cosine()`. Real deletes — no tombstone filtering needed during search
- **Write pipeline:** 5-step process for creating/modifying notes: (1) resolve tags via fuzzy matching against tag registry, (2) discover potential wikilinks via basename matching, (3) suggest folder placement via centroid similarity, (4) atomic file write (temp + rename for crash safety), (5) immediate index update (embed + insert into sqlite-vec + FTS5 + edges)
- **Docids:** Each file gets a deterministic 6-char hex ID. Displayed in search results
- **Vault profiles:** `engraph init` auto-detects vault structure and writes `vault.toml`
- **Pluggable models:** `ModelBackend` trait enables future model swapping
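
As a rough illustration of the weighted RRF fusion named in the 3-lane bullet, the sketch below scores each document as the sum of `weight / (k + rank)` over the lanes it appears in. The `Lane` struct and `rrf_fuse` function are hypothetical names, and `k` is left as a parameter because the constant used in `fusion.rs` is not stated here (60 is a common choice for RRF).

```rust
use std::collections::HashMap;

/// One lane's results, best hit first, plus the lane weight.
pub struct Lane<'a> {
    pub weight: f64,
    pub ranked_ids: Vec<&'a str>,
}

/// Weighted Reciprocal Rank Fusion.
/// score(doc) = sum over lanes of weight / (k + rank), with rank starting at 1.
pub fn rrf_fuse(lanes: &[Lane<'_>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for lane in lanes {
        for (i, id) in lane.ranked_ids.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += lane.weight / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With the lane weights listed above (semantic 1.0, FTS 1.0, graph 0.8), a document that appears in all three lanes outranks one that tops a single lane, which is the behaviour rank fusion is meant to provide.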

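The graph agent's decay and multi-parent merge can be sketched as follows, using the 0.8x and 0.5x factors from the updated bullet. The names `expand_seeds` and `offer` and the adjacency-map representation are assumptions, and the relevance filter (FTS5 term check or shared tags) is deliberately left out to keep the sketch short.

```rust
use std::collections::HashMap;

pub const ONE_HOP_DECAY: f64 = 0.8;
pub const TWO_HOP_DECAY: f64 = 0.5;

/// Multi-parent merge: keep the highest score seen for a node.
fn offer(expanded: &mut HashMap<String, f64>, id: &str, score: f64) {
    let entry = expanded.entry(id.to_string()).or_insert(score);
    if score > *entry {
        *entry = score;
    }
}

/// Follow wikilink edges one and two hops out from each seed result.
/// `edges` maps a note id to the ids it links to (hypothetical representation).
/// Seeds themselves are not returned, since they are already in the result set.
pub fn expand_seeds(
    seeds: &[(&str, f64)],
    edges: &HashMap<&str, Vec<&str>>,
) -> HashMap<String, f64> {
    let mut expanded: HashMap<String, f64> = HashMap::new();
    for &(seed, seed_score) in seeds {
        if let Some(first_hop) = edges.get(seed) {
            for &n1 in first_hop {
                offer(&mut expanded, n1, seed_score * ONE_HOP_DECAY);
                if let Some(second_hop) = edges.get(n1) {
                    for &n2 in second_hop {
                        offer(&mut expanded, n2, seed_score * TWO_HOP_DECAY);
                    }
                }
            }
        }
    }
    for (seed, _) in seeds {
        expanded.remove(*seed);
    }
    expanded
}
```

A node reachable from several seeds, or at both one and two hops, keeps only its best score, matching the "takes highest score" rule.
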
## Data directory

`~/.engraph/` — hardcoded via `Config::data_dir()`. Contains `engraph.db` (SQLite with FTS5 + edges), `hnsw/` (index files), `models/` (ONNX model + tokenizer), `vault.toml` (vault profile), `config.toml` (user config).
`~/.engraph/` — hardcoded via `Config::data_dir()`. Contains `engraph.db` (SQLite with FTS5 + sqlite-vec + edges), `models/` (ONNX model + tokenizer), `vault.toml` (vault profile), `config.toml` (user config).

Single vault only. Re-indexing a different vault path triggers a confirmation prompt.

## Dependencies to be aware of

- `ort` (2.0.0-rc.12) — ONNX Runtime Rust bindings. Pre-release API. Does not provide prebuilt binaries for all targets
- `hnsw_rs` (0.3) — pure Rust HNSW. `Box::leak` in `load()`. Read-only after load
- `sqlite-vec` (0.1.8-alpha.1) — SQLite extension for vector search. Provides vec0 virtual tables with KNN via `vec_distance_cosine()`
- `zerocopy` (0.7) — zero-copy serialization for vector data passed to sqlite-vec
- `strsim` (0.11) — string similarity for fuzzy tag matching in the write pipeline
- `time` (0.3) — date/time handling for frontmatter timestamps
- `tokenizers` (0.22) — HuggingFace tokenizer. Needs `fancy-regex` feature
- `ignore` (0.4) — vault walking with `.gitignore` support
- `rusqlite` (0.32) — bundled SQLite with FTS5 support
- `rmcp` (1.2) — MCP server SDK for stdio transport

## Testing

- Unit tests in each module (`cargo test --lib`) — 146 tests, no network required
- Unit tests in each module (`cargo test --lib`) — 190 tests, no network required
- 1 ignored smoke test (`test_embed_smoke`) — downloads ONNX model, verifies embedding
- Integration tests (`cargo test --test integration -- --ignored`) — 8 tests, require model download
- Integration tests (`cargo test --test integration -- --ignored`) — require model download

## CI/CD
