🔥 the fastest and most compact embedded vector database in the world 🌍
Built by Egoist Machines, Inc. - efficient full-stack infrastructure for reliable AI systems.
LodeDB is great for local RAG; it's extremely fast, exact by default, in-process, and on-disk. We're the best drop-in durable memory backend for LangChain, LlamaIndex, and mem0: the most compact on disk, the fastest per single query, GPU-accelerated for batched search, and durable in about a millisecond per write. Point any of them at LodeDB instead of its default store. Over 17.5k documents, per framework default:
| vs the framework's default store | LangChain InMemoryVectorStore | LlamaIndex SimpleVectorStore | mem0 Qdrant |
|---|---|---|---|
| On-disk footprint | 13.6× smaller (15 vs 199 MB) | 9.9× smaller (15 vs 145 MB) | 5.7× smaller (12 vs 70 MB) |
| Single-query p50 (CPU) | ~600× faster (0.45 vs 272 ms) | ~620× faster (0.44 vs 272 ms) | ~46× faster (0.59 vs 27 ms) |
| Batched retrieval, 64 (GPU) | ~2,880× (11,049 vs ~4 qps) | ~3,050× (11,297 vs ~4 qps) | ~139× (5,084 vs 36 qps) |
| Durable add of one memory | ~26,000× faster (0.26 ms vs 6.9 s) | ~57,000× faster (0.26 ms vs 14.8 s) | 0.28 vs 0.44 ms (both sub-ms) |
Among embedded stores, LodeDB has the smallest footprint and the fastest single-query and batched search, and its durable add leads the fastest lazy-append stores (sqlite-vec, qdrant) too:
| embedded stores | durable add p50 | single-query p50 | batch-64/query | memory footprint |
|---|---|---|---|---|
| LodeDB | 0.26 ms | 0.45 ms | 0.09 ms | 15 MB |
| sqlite-vec | 0.42 ms | 26.8 ms | 24.7 ms | 96 MB |
| qdrant | 0.48 ms | 13.9 ms | 14.2 ms | 81 MB |
| pgvector | 2.29 ms | 35.1 ms | 37.0 ms | 48 MB |
| lancedb | 3.36 ms | 10.6 ms | 10.3 ms | 35 MB |
| chroma | 5.92 ms | 3.35 ms | 3.26 ms | 144 MB |
All numbers are reported as the mean of 3 independent runs on an L40S server. Full benchmark, all backends (FAISS, Chroma, Qdrant, LanceDB, sqlite-vec, pgvector), and method.
Like what you see? Point the coding assistant in your project at egoistmachines.com/lodedb/install-agent and it will migrate your existing store onto the LodeDB backend.
Most embedded vector databases stop at the CPU. LodeDB runs the same on-disk index on the GPU when you have one: batched search hits 24k queries/sec on an A10 and 53k qps on an L40S, with recall matching the CPU scan. It also persists changed rows incrementally, so a commit stays sub-millisecond even at 1M vectors.
- GPU-resident batch search: a float32 copy of the index lives on the GPU, scored with a cuBLAS GEMM plus an on-device top-k ([gpu], Linux/CUDA). How it works.
- O(changed) persistence: commits only the rows that changed, 173× to 1,308× faster than a full rewrite. How it works.
- Compact storage: the MIT TurboVec core packs vectors into 2/4-bit codes and scans them with SIMD CPU kernels; retained document text is stored zstd-compressed (on by default, set at create time with compression=).
- In-process, on-disk (.tvim/.tvd/.jsd): no daemon, no account, no API key.
- Safe concurrency: one writer and many lock-free readers per path; every commit is crash-atomic and rolls back to the last committed state on failure, never a torn store. How it works.
- Private by default: text, ids, and vectors stay local; telemetry is metrics-only (counts, bytes, latency), never raw payloads.
- Local embeddings: ONNX Runtime by default (lower per-query latency), with a PyTorch sentence-transformers fallback; runs on CPU, CUDA, or MPS. Pick with embedding_runtime=. On an NVIDIA GPU install onnxruntime-gpu (the default wheel is CPU-only); LodeDB warns if embedding silently falls back to the CPU. Running on the GPU.
- Multimodal: index images and text in one shared CLIP space (model="clip") for cross-modal search, or bring your own vectors from any model. How it works.
- Batteries included: a lodedb CLI, a loopback/private-network dev server, an MCP server, LangChain, LlamaIndex, and mem0 adapters (VectorStores, plus a LlamaIndex PropertyGraphStore), and a one-line PrivateGPT vector-store provider built on the LlamaIndex adapter.
- Swift / iOS bindings: a native Swift package for macOS and iOS over the same Rust core, with on-device vector, text, and hybrid search, durable storage, metadata filters, late-interaction (MaxSim), and an agent-memory facade. Swift guide, published as the swift-lodedb SwiftPM package.
- Migrate onto LodeDB: lodedb migrate moves an existing LangChain, LlamaIndex, or mem0 store, or a direct provider such as pgvector, onto a local LodeDB path along a plan-first, non-destructive inspect/plan/dry-run/run/validate path. Migration guide.
🏢 Enterprise

The LodeDB core is Apache-2.0 and free to use. Enterprise licensing is available for commercial support, managed and at-scale serving, and on-prem / BYOC deployment. Contact sales@egoistmachines.com.
pip install "lodedb[embeddings]" # vector store + built-in text embedding (ONNX, no PyTorch)

Prebuilt wheels cover Linux, macOS (Apple Silicon and Intel), and Windows on Python 3.11+, and
bundle the TurboVec (Rust) core, so there's nothing to compile. Confirm the install with lodedb doctor.
Bringing your own vectors or embedding model? The base install carries no embedding runtime. It's
a dependency-light vector store (open_vector_store / add_vectors / search_by_vector,
or pass your own embedder=):
pip install lodedb

Optional extras:
pip install "lodedb[embeddings,torch]" # + PyTorch fallback, CLIP, Apple MPS
pip install "lodedb[gpu]" # GPU-resident scan (Linux/CUDA)
pip install "lodedb[image]" # image + text (CLIP) embedding (model="clip")
pip install "lodedb[mcp,langchain,llama-index,mem0]" # MCP server + LangChain/LlamaIndex/mem0 adapters
pip install "lodedb[onnx-export]" # export ONNX for a custom model (Optimum); presets need no export
pip install "lodedb[all]" # everything above

Using LodeDB as memory for a coding assistant? After installing the mcp,embeddings extras,
register its server in one step (details under Use as an MCP server):
lodedb mcp install --client claude-code # or: claude-desktop | cursor | lm-studio | codex | all

Windows: NVIDIA GPU embeddings
On Windows, PyPI serves the CPU-only PyTorch build by default, so installing the PyTorch tier
(pip install "lodedb[torch]", and uv) leaves embeddings on the CPU even on a CUDA machine,
and no package metadata can override which torch wheel pip resolves. lodedb doctor detects this
and prints the fix; lodedb doctor --fix reinstalls the CUDA build for you:
lodedb doctor # flags a CPU-only PyTorch on Windows and prints the command
lodedb doctor --fix # reinstalls the CUDA build so embeddings use your NVIDIA GPU

Or reinstall manually, picking the index for your CUDA version (cu121, cu124, ...) from the
PyTorch install guide:
pip install torch --force-reinstall --no-deps --index-url https://download.pytorch.org/whl/cu121
uv pip install torch --reinstall --index-url https://download.pytorch.org/whl/cu121 # with uv

This is Windows-only: the default Linux PyPI wheel already bundles CUDA, and macOS uses CPU or MPS.
Build from source (contributors, or a platform without a wheel)
Needs a Rust toolchain and a CBLAS provider (Accelerate on macOS, libopenblas-dev on
Linux). uv builds and bundles the core for you:
git clone https://github.com/Egoist-Machines/LodeDB && cd LodeDB
uv sync # builds + bundles the TurboVec core via maturin
uv sync --extra embeddings --extra torch # + built-in text embedding (ONNX + PyTorch)
uv sync --extra mcp --extra langchain --extra llama-index --extra mem0 # + MCP server, adapters
uv sync --extra gpu # + GPU-resident scan (Linux/CUDA)

Run with uv run (e.g. uv run lodedb doctor).
from lodedb import LodeDB

with LodeDB(path="./data", model="minilm") as db:  # "minilm" (fast) | "bge" (quality) | "clip" (image+text)
    fox = db.add("the quick brown fox jumps", metadata={"topic": "animals"})
    db.add("a lazy dog sleeps all day", metadata={"topic": "animals"})

    for score, doc_id, meta in db.search("fox", k=5):
        print(score, doc_id, meta)

    for hits in db.search_many(["fox", "dog"], k=5):  # batched; the GPU can serve this
        print([(h.score, h.id, h.metadata) for h in hits])

    # filter by metadata: exact match, plus $gt/$gte/$lt/$lte/$in/$nin/$exists and $and/$or/$not
    db.search("fox", k=5, filter={"topic": "animals"})  # bare scalar = exact
    db.search("fox", k=5, filter={"$or": [{"topic": "animals"}, {"year": {"$gte": 2020}}]})

    # hybrid search: vector recall plus exact lexical matches the embedding misses
    db.add("turbine tripped, fault code E1234 overnight", metadata={"topic": "ops"})
    for score, doc_id, meta in db.search("E1234", k=5, mode="hybrid"):  # exact code the vector misses
        print(score, doc_id, meta)

    print(db.get(fox))  # "the quick brown fox jumps" (text retained by default)
# leaving the block persists a durable .tvim/.tvd/.jsd snapshot and releases the store

Reopen with LodeDB(path="./data"); no migration step. Original text is kept in a
.tvtext sidecar for db.get; pass store_text=False to keep none. Presets are minilm
(384-dim), bge (768-dim), and clip (512-dim, image+text), with weights pulled from Hugging
Face on first use. More in examples/.
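The filter grammar named in the Quickstart comment (a bare scalar for exact match, $gt/$gte/$lt/$lte/$in/$nin/$exists, and $and/$or/$not) can be read as a small predicate over a metadata dict. The sketch below is an illustration only: `matches` is a hypothetical helper, not a LodeDB function, and the operator semantics (in particular how a missing key compares) are assumptions modelled on common document-store conventions.

```python
from collections.abc import Mapping
from typing import Any

# Comparison operators listed in the Quickstart comment; semantics are assumed here.
_COMPARATORS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$gte": lambda value, arg: value is not None and value >= arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$lte": lambda value, arg: value is not None and value <= arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
}


def matches(metadata: Mapping[str, Any], flt: Mapping[str, Any]) -> bool:
    """Return True when `metadata` satisfies every clause of `flt` (hypothetical helper)."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$not":
            if matches(metadata, cond):
                return False
        elif isinstance(cond, Mapping):
            # Operator form, e.g. {"year": {"$gte": 2020}}
            for op, arg in cond.items():
                if op == "$exists":
                    if (key in metadata) != bool(arg):
                        return False
                elif not _COMPARATORS[op](metadata.get(key), arg):
                    return False
        elif metadata.get(key) != cond:
            # Bare scalar = exact match
            return False
    return True


doc = {"topic": "animals", "year": 2019}
print(matches(doc, {"topic": "animals"}))
print(matches(doc, {"$or": [{"topic": "ops"}, {"year": {"$gte": 2020}}]}))
```

Reading the grammar this way makes the top-level behaviour explicit: sibling keys in one filter dict are combined with AND, which is why `$or` needs its own list of sub-filters.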
Need to read a store another process is writing to? Open it read-only. It takes no writer lock, so it never blocks on (or is blocked by) the writer:
reader = LodeDB.open_readonly("./data") # or LodeDB(path="./data", read_only=True)
reader.search("fox", k=5) # reads a committed snapshot
reader.add("nope") # raises ReadOnlyError
reader.refresh() # overlay the current WAL tail (see appended records)
reader.applied_lsn() # highest LSN visible; >= an Appender's returned LSN == read-your-writes

The read-only handle is a stable snapshot of the last committed generation until
you call refresh(), which folds in whatever another process (or an Appender)
has written since, without taking a lock or checkpointing. For read-your-writes,
compare applied_lsn() to the LSN an append returned: the record is visible once
applied_lsn() >= that_lsn.
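The read-your-writes check above amounts to a small polling loop. The sketch below assumes only the two reader methods named in this section, refresh() and applied_lsn(); `wait_until_visible` and `FakeReader` are hypothetical names introduced for illustration, and the fake stands in for a real read-only handle so the snippet runs on its own.

```python
import time


def wait_until_visible(reader, lsn: int, timeout: float = 1.0, poll: float = 0.01) -> bool:
    """Poll refresh() until applied_lsn() >= lsn, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        reader.refresh()                 # overlay the current WAL tail
        if reader.applied_lsn() >= lsn:  # the appended record is now visible
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


class FakeReader:
    """Stand-in for a read-only handle: each refresh() reveals one more LSN."""

    def __init__(self, latest: int) -> None:
        self._latest = latest   # highest LSN that will ever become visible
        self._applied = 0

    def refresh(self) -> None:
        self._applied = min(self._applied + 1, self._latest)

    def applied_lsn(self) -> int:
        return self._applied


print(wait_until_visible(FakeReader(latest=3), lsn=3))
```

A bounded timeout keeps a caller from spinning forever when the appended record never arrives, for example because the append itself failed.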
Pick the mode by who owns the embeddings. The Quickstart above uses the recommended default; the other is for when you already compute vectors yourself.
- Text-in (recommended for RAG). LodeDB(path, model="minilm") owns the embedder: you add and search text, and LodeDB embeds it, retains it, and can run hybrid BM25 + vector search. This is what most applications want.
- Vector-in (bring your own vectors). LodeDB.open_vector_store(path, vector_dim=...) has no embedder: you add_vectors and search_by_vector with vectors computed elsewhere (any model or hosted API). Use it when you own the embedding step or need a model LodeDB does not bundle.
| | Text-in (model=) | Vector-in (open_vector_store) |
|---|---|---|
| Add | add / add_many (text) | add_vectors / add_vectors_many (vectors) |
| Search | search / search_many (text) | search_by_vector / search_many_by_vector |
| Hybrid / lexical BM25 | yes (mode="hybrid") | no (no text to rank) |
| Raw-text retrieval (get) | yes (store_text=True, default) | no (metadata still returned) |
| Embedding runtime | bundled (ONNX / PyTorch) | none (you bring vectors) |
| Calling text verbs on it | works | raises VectorOnlyIndexError |
For per-tenant isolation, open one text-in LodeDB per tenant at its own path (optionally sharing a
single loaded model with embedder=). GPU setup, the performance knobs, the model-alias table, and
operational gotchas live in Deployment and performance.
Hybrid retrieval is the default. Vector search alone misses exact tokens the embedding does not
capture: error codes (E1234), serial numbers (ABC-123), dates (2024-01-15). By default
LodeDB runs a lexical BM25 ranker alongside the vector scan and fuses the two ranked lists with
Reciprocal Rank Fusion, so a document whose body carries the code is recovered even when the
embedding ranks it nowhere near the top. The default resolves to hybrid whenever a text source is
available (the out-of-the-box configuration) and falls back to a plain vector scan otherwise, so
it never raises on a vector-only store.
db.add("the turbine tripped and reported fault code E1234 overnight", metadata={"unit": "t3"})
db.search("E1234", k=5) # default: hybrid (BM25 + RRF) when text is retained
db.search("E1234", k=5, mode="vector") # vector scan alone: may miss the exact code
db.search("E1234", k=5, mode="lexical") # BM25 ranking alone, no vector scan

Prerequisites
mode="hybrid" and mode="lexical" build a BM25 index over your text, so they need a text
source enabled when you open the database. Both text sources are on by default, so the hybrid
default works out of the box; mode="vector" needs nothing and is the automatic fallback when no
text source is present.
| Mode | Enable | Source of the BM25 index |
|---|---|---|
| "hybrid" (default), "lexical" | store_text=True (on by default) | rebuilt in memory from the retained raw text |
| "hybrid" (default), "lexical" | index_text=True (on by default, follows store_text) | a durable on-disk postings store, no raw text required |
| "vector" (fallback) | nothing | not used |
Both sources are on by default, so hybrid and lexical work out of the box: store_text=True
retains the raw text and index_text defaults to match it, persisting the lexical postings.
Either source alone is enough, and index_text decouples from store_text when set explicitly.
With neither source enabled (store_text=False, index_text=False), the default resolves to a
plain vector scan; an explicit mode="hybrid"/"lexical" then raises a clear, actionable
error rather than silently degrading.
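The resolution rules above fit in a few lines. This is a sketch of the described behaviour, not LodeDB's code: `resolve_mode` is a hypothetical function, and ValueError stands in for whatever error type the library actually raises.

```python
from typing import Optional


def resolve_mode(requested: Optional[str], store_text: bool = True,
                 index_text: Optional[bool] = None) -> str:
    """Pick the effective search mode; `requested=None` means "use the default"."""
    if index_text is None:
        index_text = store_text  # index_text follows store_text unless set explicitly
    has_text_source = store_text or index_text

    if requested is None:
        # Default: hybrid when a text source exists, plain vector scan otherwise.
        return "hybrid" if has_text_source else "vector"
    if requested not in ("hybrid", "lexical", "vector"):
        raise ValueError(f"unknown mode: {requested!r}")
    if requested in ("hybrid", "lexical") and not has_text_source:
        # Explicit request that cannot be served: fail loudly instead of degrading.
        raise ValueError(f"mode={requested!r} needs store_text=True or index_text=True")
    return requested


print(resolve_mode(None))                                   # out-of-the-box configuration
print(resolve_mode(None, store_text=False))                 # no text source at all
print(resolve_mode("lexical", store_text=False, index_text=True))
```

The asymmetry is deliberate in the description above: an implicit default may fall back quietly, while an explicit mode that cannot be honoured is an error.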
How it works
A filter constrains both rankers, so mode="hybrid" with a filter returns the true top-k of
the matching subset. The vector half of a hybrid query runs on the same scan as mode="vector",
including the GPU-resident batch scan that serves search_many; only the BM25 ranking and the
fusion run on the CPU, and the vector kernel and on-disk format are untouched. The serving BM25
index lives in memory and is maintained incrementally: a small mutation folds just the changed
chunks into the existing index, so a single add never forces a full re-tokenization.
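For readers unfamiliar with Reciprocal Rank Fusion, the fusion step can be sketched as follows. This is the textbook formula, not LodeDB's implementation: `rrf_fuse` is a hypothetical name, and the constant 60 is the value commonly used in the literature, assumed here because the source does not state one.

```python
def rrf_fuse(rankings, k=5, rrf_k=60):
    """Fuse ranked id lists: each list contributes 1 / (rrf_k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:k]


vector_ranking = ["d1", "d2", "d3", "d4"]   # the vector scan never surfaces d9
lexical_ranking = ["d9", "d1"]              # BM25 finds the exact token in d9
print(rrf_fuse([vector_ranking, lexical_ranking], k=3))
```

Because only ranks enter the formula, the two rankers never need comparable score scales, which is what lets a BM25 list and a cosine-similarity list be combined directly.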
Durable lexical index (`index_text=True`)
index_text defaults to match store_text (on by default), so each document's per-chunk terms
are captured at add time into a dedicated .tvlex sidecar (a base plus a .lxd delta journal,
committed O(changed) per write). Hybrid and lexical search then survive a reopen rebuilt straight
from the persisted terms, with no re-tokenization. Pass index_text=False to skip persistence:
with store_text=True the index is instead rebuilt in memory from the retained raw text on the
first hybrid query after opening. Set it explicitly to decouple the two, e.g. index_text=True, store_text=False for a durable lexical index that retains no raw text at all. The .tvlex
sidecar holds payload-derived terms only and, like the raw-text sidecar, never reaches the
redacted artifacts or telemetry. The tokenizer lowercases and splits on punctuation but keeps
code-like tokens whole, so ABC-123 and 2024-01-15 stay findable as single tokens. Reopen with
the same effective index_text value you wrote with.
db = LodeDB(path="./data", index_text=True, store_text=False) # durable lexical index, no raw text
db.add("the turbine tripped and reported fault code E1234 overnight")
db.close()
reopened = LodeDB(path="./data", index_text=True, store_text=False)
reopened.search("E1234", k=5, mode="hybrid") # works after reopen, rebuilt from persisted terms

Approximate search (`ann="cluster"`)
LodeDB scans exactly by default: every query compares against every vector, so recall is 100%.
For large corpora where that full scan is the bottleneck, pass ann="cluster" at create time to
opt into IVF-style cluster pruning. The query scores cluster centroids, scans only the nearest
nprobe clusters, and the exact TurboVec scan re-scores those candidates. Returned scores are
therefore exact, but the result set is approximate: a true neighbor sitting in an unprobed
cluster can be missed, so recall drops below 100%. Exact scan stays the default and the
authority, and probing every cluster reproduces the exact top-k.
db = LodeDB(path="./data", ann="cluster") # opt in; exact is the default
db = LodeDB(path="./data", ann="cluster", ann_clusters=256, ann_nprobe=16) # optional tuning
db.add("the turbine tripped and reported fault code E1234 overnight")
db.search("turbine fault", k=5) # nearest clusters, exact re-score

ann_clusters (partitions) defaults to about sqrt(n) and ann_nprobe (clusters probed per
query) to about sqrt(clusters). ANN is a create-time choice persisted with the index and works
for text and bring-your-own-vector indexes alike; keep the exact default for small-to-mid corpora.
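To make the pruning idea concrete, here is a pure-Python sketch of IVF-style search under simplifying assumptions: centroids are evenly spaced corpus rows rather than trained k-means centres, vectors are plain lists, and all function names are hypothetical. It illustrates the technique and is not the TurboVec scan.

```python
import heapq
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def build_clusters(vectors, n_clusters):
    """Assign every row to its nearest centroid (centroids are sampled corpus rows)."""
    step = max(1, len(vectors) // n_clusters)
    centroids = [vectors[i] for i in range(0, len(vectors), step)][:n_clusters]
    members = [[] for _ in centroids]
    for row, vec in enumerate(vectors):
        nearest = max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))
        members[nearest].append(row)
    return centroids, members


def exact_search(vectors, query, k):
    """Score every row: the reference result with full recall."""
    return heapq.nlargest(k, ((dot(vec, query), row) for row, vec in enumerate(vectors)))


def cluster_search(vectors, centroids, members, query, k, nprobe):
    """Score centroids, keep the nprobe nearest clusters, exactly re-score their rows."""
    probed = heapq.nlargest(nprobe, range(len(centroids)),
                            key=lambda c: dot(query, centroids[c]))
    candidates = (row for c in probed for row in members[c])
    return heapq.nlargest(k, ((dot(vectors[row], query), row) for row in candidates))


rng = random.Random(0)
corpus = [normalize([rng.gauss(0, 1) for _ in range(8)]) for _ in range(400)]
n_clusters = round(math.sqrt(len(corpus)))        # about sqrt(n)
nprobe = round(math.sqrt(n_clusters))             # about sqrt(clusters)
centroids, members = build_clusters(corpus, n_clusters)
query = normalize([rng.gauss(0, 1) for _ in range(8)])

exact = exact_search(corpus, query, k=5)
pruned = cluster_search(corpus, centroids, members, query, k=5, nprobe=nprobe)
full = cluster_search(corpus, centroids, members, query, k=5, nprobe=len(centroids))
print(len(set(exact) & set(pruned)), "of 5 exact hits found with pruning")
print(full == exact)
```

Every score the pruned search returns is an exact score, and only the candidate set is approximate, which matches the trade-off described above.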
The storage and scan are modality-agnostic: TurboVec stores any normalized float32 vector, so an image, audio, or video embedding is indexed and scanned exactly like a text one. There are two ways to use that.
Bring your own vectors. Open a vector-only index at your dimension and pass the embeddings you already computed with any model (CLIP, SigLIP, ImageBind, an audio or video encoder, a hosted API). No embedding model is bundled on this path:
db = LodeDB.open_vector_store("./media", vector_dim=512)
db.add_vectors(image_vector, id="img-001", metadata={"path": "photos/img-001.jpg"})
db.search_by_vector(query_vector, k=10)

Or use the built-in clip preset for image and text in one shared space, so a text
query retrieves images and an image query retrieves images and text. It runs on the
sentence-transformers stack plus Pillow for decoding, both pulled by pip install 'lodedb[image]':
db = LodeDB("./gallery", model="clip") # downloads clip-ViT-B-32 on first use
db.add_image("photos/beach.jpg", metadata={"path": "photos/beach.jpg"})
db.search("a beach at sunset", k=5) # text -> image, cross-modal
db.search_by_image("photos/beach.jpg", k=5) # image -> image

The raw image is never stored; keep it on disk and put its path in metadata. Keep one
embedding model per index (scores are only comparable within one space); the model
identity is pinned and re-enforced on reopen. To hold several encoders side by side, use
LodeCollection named spaces, and pass embedder= to drive an index with your own
model. See docs/multimodal.md.
For visual-document RAG, ColPali / ColQwen style models encode a page as a set of
patch vectors and rank with MaxSim (sum over query tokens of the best patch match),
rather than pooling to one vector. LodeLateInteractionIndex runs this on the
bring-your-own-vectors path with no engine change: each document is one row holding
its whole patch matrix, and an unfiltered query is answered by an exact resident
scan (the corpus scored in one GEMM plus a segmented max) that returns the true
top-k in a few milliseconds on thousands of pages; filtered queries score the
matching subset exhaustively and over-budget corpora stream from disk (both exact).
from lodedb import LodeLateInteractionIndex
idx = LodeLateInteractionIndex("./pages", dim=128) # bring your own encoder
idx.add_document("report-p1", page_patches, metadata={"file": "report.pdf"})
hits = idx.search(query_tokens, k=5) # [(score, doc_id, metadata), ...]

The encoder stays bring-your-own (ColPali / ColQwen weights are multi-GB). Patch
matrices are stored at storage="float32" (default, fastest query and bit-exact),
"float16" (near-exact, half the size), or "int8" (~4x smaller); the choice
persists with the index. See docs/late-interaction.md.
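The MaxSim score itself is short enough to write out. The sketch below computes it naively in pure Python for tiny hand-made vectors; `maxsim` and `rank_documents` are hypothetical names, and the resident GEMM-plus-segmented-max scan described above is a far faster way to obtain the same quantity.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, doc_patches):
    """Sum, over query tokens, of the best-matching patch similarity."""
    return sum(max(dot(q, p) for p in doc_patches) for q in query_tokens)


def rank_documents(query_tokens, docs, k):
    """Score every document exhaustively and return the top-k as (score, doc_id)."""
    scored = [(maxsim(query_tokens, patches), doc_id) for doc_id, patches in docs.items()]
    return sorted(scored, key=lambda hit: (-hit[0], hit[1]))[:k]


docs = {
    "page-a": [[1.0, 0.0], [0.0, 1.0]],   # covers both query directions
    "page-b": [[1.0, 0.0], [1.0, 0.0]],   # covers only the first
}
query = [[1.0, 0.0], [0.0, 1.0]]
print(rank_documents(query, docs, k=2))
```

Unlike a pooled single-vector score, each query token is free to match a different patch, which is why a page that covers both query directions outranks one that repeats a single direction.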
With the [gpu] extra on a CUDA host, LodeDB keeps a reconstructed float32 copy of the
compact index resident on the GPU and scores a batched search_many with a cuBLAS GEMM plus
an on-device top-k. It is opt-in and lazy: single queries, non-CUDA hosts, and GPU-memory
rejection fall back to the CPU scan, which stays the source of truth.
Both scans stream the corpus once per batch and amortize that read across the queries, so
per-query throughput climbs with batch size; the GPU pulls away as the batch grows and its
parallelism dominates. End-to-end through the public arrays API
(search_many_by_vector_arrays, scores and ids) on a 4-bit index (d=1536, 100K):
| query batch | A10 GPU (q/s) | L40S GPU (q/s) |
|---|---|---|
| 16 | 8,413 | 12,469 |
| 64 | 19,030 | 33,040 |
| 256 | 23,669 | 49,927 |
| 1024 | 24,359 | 52,940 |
These are the arrays fast path; the default hits API (per-hit result objects) runs somewhat lower, about 18.4k / 35.6k q/s at batch 1024. A single query stays on the CPU (the GPU batch path engages at batch >= 4) and is one exact scan over the whole corpus, so its cost is bound by memory bandwidth rather than the API; batching amortizes that corpus read, which is what the throughput above exploits.
The curves below are the GPU and CPU scans through that same default hits API, so the gap is like-for-like:
A one-line reproduction is in benchmarks/vector_search_native.
Recall matches the CPU scan: the GPU scores the same 4-bit reconstruction, and LodeDB's parity tests hold its document recall within 0.002 of the CPU scan across batch sizes, so moving a query to the GPU changes its latency, not its results.
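One way to express such a parity check is to treat the CPU scan's top-k as the reference and measure how much of it the GPU path returns. The sketch below assumes that definition, which the source does not spell out; the function names and the id lists are made up for illustration.

```python
def recall_at_k(reference_ids, candidate_ids):
    """Fraction of the reference top-k that also appears in the candidate top-k."""
    reference = set(reference_ids)
    if not reference:
        return 1.0
    return len(reference & set(candidate_ids)) / len(reference)


def mean_recall(reference_batches, candidate_batches):
    """Average recall over a batch of queries."""
    pairs = list(zip(reference_batches, candidate_batches))
    return sum(recall_at_k(ref, cand) for ref, cand in pairs) / len(pairs)


cpu_hits = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]   # illustrative ids only
gpu_hits = [["a", "b", "c", "d"], ["e", "f", "h", "x"]]
print(mean_recall(cpu_hits, gpu_hits))
```

A parity test in this style would assert that the value stays above a threshold for every batch size exercised.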
Other in-process vector databases stay CPU-bound. Alibaba's zvec reports about 8.4k q/s (VectorDBBench, 16-vCPU CPU, Cohere 768-dim); read it as the CPU-class baseline that the GPU-resident path clears.
Scope. GPU search is Linux/CUDA-only and opt-in ([gpu]). macOS scans on the CPU by
default; a first-class opt-in MPS exact scan exists (LODEDB_MPS_DIRECT_TURBOVEC) but NEON
stays the default. On the measured M1 it was slower than NEON at every batch size; newer Apple
GPUs should be re-measured before any default change. See docs/benchmarks.md and
docs/architecture.md.
Most embedded indexes rewrite the whole file on every change (O(N)). LodeDB writes only the rows that changed (O(changed)), so a 1,000-row commit stays sub-millisecond at any size:
| corpus | full rewrite | delta export | speedup |
|---|---|---|---|
| 100K | 42.4 ms | 0.25 ms | 173× |
| 500K | 190.4 ms | 0.24 ms | 782× |
| 1M | 404.9 ms | 0.31 ms | 1,308× |
The GPU path makes reads fast; the delta makes writes cheap. The on-disk format stays a plain snapshot that replays on reopen.
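The difference between a full rewrite and an O(changed) commit can be shown with a toy journal. This sketch is an assumption-laden illustration, not the .tvd format: `DeltaJournal` is a hypothetical class, records are JSON strings held in a list rather than files, and deletes are left out.

```python
import json


class DeltaJournal:
    """Toy row store whose commit() journals only rows touched since the last commit."""

    def __init__(self):
        self.rows = {}       # row_id -> encoded row (current state)
        self.dirty = set()   # row ids changed since the last commit
        self.journal = []    # serialized delta records (stands in for an on-disk log)

    def upsert(self, row_id, value):
        self.rows[row_id] = value
        self.dirty.add(row_id)

    def commit(self):
        """Append one delta record holding only the dirty rows; return its size in bytes."""
        record = json.dumps({row_id: self.rows[row_id] for row_id in sorted(self.dirty)})
        self.journal.append(record)
        self.dirty.clear()
        return len(record)

    @staticmethod
    def replay(journal):
        """Rebuild the current state by applying every delta in order."""
        rows = {}
        for record in journal:
            rows.update(json.loads(record))
        return rows


store = DeltaJournal()
for i in range(1000):
    store.upsert(f"row-{i}", [float(i), 0.5])
first_commit = store.commit()            # initial load: every row is dirty

store.upsert("row-7", [1.0, 2.0])
second_commit = store.commit()           # one changed row: a tiny record
print(first_commit, second_commit)
```

The cost of the second commit depends on the number of changed rows, not on the 1,000 rows already stored, which is the property the table above measures at much larger scale.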
The raw-text store (store_text=True, the default) is journaled the same way: an incremental commit
appends a small .txd text delta instead of rewriting the whole document_id -> text map, so
retaining text keeps commits O(changed) too. Isolated, the per-commit text write drops
from a full-map rewrite (~57 ms at 20K docs, ~244 ms at 80K) to a flat ~0.7 ms regardless of
corpus size.
And the rest of an incremental add() is O(changed) too: a single-doc update no longer rebuilds
the whole index layout or rewrites the full text map on the commit path, so write latency stays
flat as the corpus grows instead of climbing with it.
All artifacts are metrics-only (counts, bytes, latency), never payloads. Full methodology and the complete figure set are in docs/benchmarks.md; each benchmarks/ folder has a README and a one-line reproduction command.
Local is the common case. On an Apple M1 (MiniLM, 20K docs) the CPU scan is ~0.25 ms p50, and end-to-end single-query latency is 5.7 ms p50.
lodedb doctor # capability report: embedding / GPU / TurboVec backend
lodedb index ... # build / add to an on-disk index
lodedb query ... # search
lodedb serve # dev server (127.0.0.1 by default; private LAN only, no auth)
lodedb mcp # stdio MCP server for agent memory
lodedb benchmark # local, metrics-only benchmark

LodeDB ships a Model Context Protocol server, so an agent
can use a local on-disk database as long-term memory or a RAG store. It runs over stdio, adds
no storage logic of its own, and your data stays on the machine. The server embeds text to add
and search, so install the MCP and embedding extras together, then point your host at lodedb mcp:
pip install "lodedb[mcp,embeddings]"
# for coding assistant:
lodedb mcp install --client claude-code # or: claude-desktop | cursor | lm-studio | codex | all

It exposes lodedb_add, lodedb_search, lodedb_remove, and lodedb_stats, plus
lodedb_get when text is available. lodedb_search returns each hit's stored text alongside
the score, id, and metadata, so a model can rank and answer in a single call rather than
chaining a follow-up lookup. It runs hybrid search (BM25 lexical + vector,
fused with RRF) by default when text is retained, so exact tokens like error codes and serials
surface next to semantic matches; with no text retained it falls back to a vector scan. Start
the server with --exclude-text to return metrics only (this also withdraws lodedb_get), or
--no-store-text to keep no text on disk at all. lodedb_stats is always metrics-only and raw
query text never leaves the process.
lodedb mcp install writes the correct entry to a client's config for you, so you do not have to
find the file or hand-write the JSON/TOML:
lodedb mcp install --client claude-code # or: claude-desktop | cursor | lm-studio | codex | all
lodedb mcp install --client cursor --path ./data --model bge

It resolves the launch command for your environment, so command and args are correct even when
lodedb is not on PATH (it falls back to the uv run --project ... form, then an absolute path to
the entry point), and it resolves --path to an absolute path so the server opens the right
directory wherever the client starts it. The edit is idempotent (an existing lodedb entry is
updated, never duplicated) and never touches other servers in the file. It passes through the same
options as lodedb mcp (--path, --model, --device, --exclude-text, --no-store-text);
--dry-run prints the entry and target file without writing, and lodedb mcp uninstall --client <client> removes it again. Override the config location with --config <path> (Claude Desktop and
LM Studio paths differ per OS), and use --project <dir> to write Cursor's project-level
.cursor/mcp.json. For Claude Code it runs claude mcp add; for the others it edits the config file
directly.
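The idempotent edit described above can be pictured as an upsert into the config's mcpServers map. The sketch below illustrates that idea for the JSON-based clients; `upsert_server` is a hypothetical helper, not the installer's code, and it operates on a string rather than a real config file.

```python
import json


def upsert_server(config_text: str, name: str, command: str, args: list[str]) -> str:
    """Insert or replace one entry under "mcpServers", leaving other servers untouched."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}  # replace, never append
    return json.dumps(config, indent=2, sort_keys=True)


existing = json.dumps({"mcpServers": {"other": {"command": "other-server", "args": []}}})
once = upsert_server(existing, "lodedb", "lodedb", ["mcp", "--path", "/abs/data"])
twice = upsert_server(once, "lodedb", "lodedb", ["mcp", "--path", "/abs/data"])
print(once == twice)
```

Keying the entry by server name is what makes a second run a no-op and keeps unrelated servers in the same file intact.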
Register by hand (Claude Code, Claude Desktop, Cursor, LM Studio, Codex)
The lodedb command must be on the host's PATH; if you installed into a virtual environment
(including a uv project) where it isn't, use the uv run form at the bottom.
Claude Code, Claude Desktop, Cursor, LM Studio: add the stdio entry to the host's MCP
config (claude_desktop_config.json, .cursor/mcp.json, or LM Studio's mcp.json), or run
claude mcp add lodedb -- lodedb mcp --path ./data:
{ "mcpServers": { "lodedb": { "command": "lodedb", "args": ["mcp", "--path", "./data"] } } }

Codex: add to ~/.codex/config.toml:
[mcp_servers.lodedb]
command = "lodedb"
args = ["mcp", "--path", "./data"]

From a virtual environment (uv), when lodedb is not on PATH:
{ "mcpServers": { "lodedb": { "command": "uv",
"args": ["run", "--project", "/path/to/LodeDB", "lodedb", "mcp", "--path", "/path/to/data"] } } }

See examples/mcp_config.json for a copy-paste starting point.
- Single writer, many readers, per path. One handle holds the path open for writing at a time (an exclusive OS advisory lock); a second writer waits for it to close, then fails fast (ConcurrentWriterError) after LODEDB_PERSIST_LOCK_TIMEOUT (default 30s). Read-only handles (LodeDB.open_readonly(path) or read_only=True; used by lodedb query/get) take no lock, so they read one consistent committed snapshot while a writer is open. They just don't auto-see the writer's in-flight changes (no live cross-process refresh). Within one process the engine serializes operations under an in-process lock, so the threaded lodedb serve safely shares one handle.
- Crash-atomic commits. A commit spans several files, but it is sealed by atomically swapping one <key>.commit.json root pointer over generation-addressed artifacts, so a crash mid-commit rolls back to the last committed generation on reopen (never a torn, half-applied store) and readers always load one consistent generation.
- Durability is fast by default. Commits are atomic but not fsync'd. Pass durability="fsync" (or --durability fsync / LODEDB_DURABILITY=fsync) to fsync each file and its directory on commit for power-loss durability, at some commit-throughput cost.
- WAL commit by default for low-latency durable writes. Each add/remove appends one framed record to a <key>.wal log and a full generation is checkpointed periodically, so a durable single add costs roughly an order of magnitude less than publishing a whole generation per write, into the sqlite-vec/qdrant range (see the comparison up top). The WAL is replayed crash-atomically on reopen (a half-written trailing record is discarded), every writable open folds it into a clean committed generation, and close()/persist() checkpoint it. WAL is single-writer: a concurrent open_readonly reader still loads a consistent committed generation, but the last checkpointed one, not the writer's in-flight WAL. Pass commit_mode="generation" (or LODEDB_COMMIT_MODE=generation) for the classic path that publishes a crash-atomic, MVCC-readable generation on every write; pick it when many out-of-process readers must see each write the instant it commits. Note <key>.wal is payload-bearing before a checkpoint (raw text under store_text=True, otherwise embedding deltas plus, with index_text=True, lexical tokens), so treat it as sensitively as the data it indexes; persist()/close() checkpoint and truncate it, and generation mode keeps no WAL. See the payload boundary docs.
- Concurrent multi-writer append (WAL mode). Beyond the single exclusive writer, many processes can append to one path at once through a shared-lock appender. Each takes a shared lock (the exclusive writer's lock still excludes them), logs self-contained vector-in records to <key>.wal ordered by a durable, process-shared sequence allocator, and the next writable open folds them into a clean generation. Appends are durable once acknowledged under durability="fsync" (the default fast is atomic but not fsynced, like the writer's own adds) and become queryable after the next writable open folds them in, or immediately in a read-only handle that calls refresh() to overlay the WAL tail (whose applied_lsn() then gives read-your-writes against an append's returned LSN). On Windows the shared lock degrades to an exclusive hold, so appenders serialize there rather than coexisting. A record is a precomputed vector plus metadata (with an optional caption, e.g. for an image, retained only when the appender opts into store_text, which is off by default, so no raw text reaches the WAL); with an embedder configured, the appender also ingests full text (chunked by the core, embedded in the binding layer, then logged as a post-embedding record) so text writes are multi-producer too, without a captured base generation. It requires WAL commit mode. Exposed as the native CoreAppender, over the C ABI, in Python (lodedb.Appender: append/append_many for vectors, append_text/append_text_many for text), and in Swift (LodeAppender).
- Running checkpointer (WAL mode). So appended records become durable without an application re-opening a writer, a running checkpointer folds the WAL into fresh generations continuously. It holds a crash-reclaimable lease (a sentinel distinct from the writer lock, so it elects one checkpointer at a time) and takes the exclusive writer lock only for the brief window of each fold, so appenders keep logging between folds. Drive its checkpoint() on a loop or timer; each fold advances the committed generation, so a read-only handle's refresh() (or a fresh open) sees the appended records shortly after they are logged, with no writable open in the loop. A dead lease-holder's lease is reclaimable, so a fresh checkpointer takes over after a crash. Exposed as the native CoreCheckpointer, over the C ABI, in Python (lodedb.Checkpointer), and in Swift (LodeCheckpointer).
- Local filesystems only. The OS advisory lock is unreliable on NFS/SMB.
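The "framed record" and "half-written trailing record is discarded" behaviour can be illustrated with a minimal log format. The framing below (a little-endian length plus a CRC32 of the payload) is an assumption chosen for the sketch, since the source does not describe the on-disk layout of <key>.wal, and `frame` and `replay` are hypothetical names.

```python
import struct
import zlib

# Assumed frame header: payload length and CRC32 of the payload, both uint32.
_HEADER = struct.Struct("<II")


def frame(payload: bytes) -> bytes:
    """Wrap one record so a reader can detect where it ends and whether it is intact."""
    return _HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes):
    """Return (records, valid_length), stopping at the first torn or corrupt frame."""
    records = []
    offset = 0
    while offset + _HEADER.size <= len(log):
        length, crc = _HEADER.unpack_from(log, offset)
        start = offset + _HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn tail: keep everything before it, discard the rest
        records.append(payload)
        offset = start + length
    return records, offset


log = frame(b"add:doc-1") + frame(b"remove:doc-0")
torn = log + frame(b"add:doc-2")[:-3]      # simulate a crash mid-append
records, valid_length = replay(torn)
print(records, valid_length == len(log))
```

Truncating the file to the returned valid length before appending again would keep a torn tail from shadowing later records.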
LodeDB has a native Swift binding for macOS and iOS over the same Rust core (no Python
runtime, no network, on device). It exposes durable open/persist, vector/text/hybrid
search, the full metadata-filter grammar, batched search, late-interaction (MaxSim),
a concurrent WAL appender (LodeAppender) for multi-process ingest,
on-device embedders (Apple NLEmbedding out of the box, or an ONNX parity path), and a
LodeMemory save/recall/forget facade for agent memory. The .tvim format is
byte-compatible, so an index built here loads on a phone. See
swift/LodeDBCore/README.md and the agent contract in
docs/swift-agent-contract.md.
- Exact scan by default; opt-in ANN. Exact scan is the default and the authority (full recall). An opt-in IVF-style cluster-prune index (ann="cluster") trades a little recall for speed on large corpora by scanning only the nearest clusters and exactly re-scoring the candidates. Built for small-to-mid corpora, not billion-scale.
- GPU-resident scan is Linux/CUDA-only and opt-in ([gpu]). macOS has a first-class, opt-in Metal (MPS) exact scan (LODEDB_MPS_DIRECT_TURBOVEC=auto); NEON is the default and was faster on the measured M1, so the MPS scan stays off by default until newer Apple GPUs are re-measured.
- Single queries run on the CPU; the GPU serves batched search_many.
- Hybrid search needs a lexical source and serves from memory. mode="hybrid"/"lexical" need either store_text=True (the index built from raw text) or index_text=True (a durable .tvlex postings store that survives reopens without raw text). The serving index is held in memory and maintained incrementally across mutations.
- Single exclusive writer per path. One full writer at a time (many concurrent readers), with no live cross-process refresh, on local filesystems only. Concurrent append is the exception: in WAL mode many processes can log vector-in records at once via a shared-lock appender, folded in by the next writer. See Concurrency & durability.
- Model weights download from Hugging Face on first use, then cache locally.
The compact core is the upstream MIT TurboVec
project (© Ryan Codrai), vendored under third_party/turbovec/
with its license preserved. LodeDB's lifecycle patches (encoded-row export/import,
upsert_with_ids, calibration) are Apache-2.0. See NOTICE.
Apache-2.0 (LICENSE). The bundled TurboVec core is MIT (NOTICE,
third_party/turbovec/LICENSE). "LodeDB" and
"Egoist Machines" are trademarks; Apache-2.0 grants no
trademark rights (§6).
Enterprise licensing and commercial support are available from Egoist Machines, Inc.: contact sales@egoistmachines.com.
PRs welcome; see CONTRIBUTING.md. Report security issues privately
per SECURITY.md, not in public issues. Other bugs and requests go to the
issue tracker.
