587x faster semantic search for markdown files
QMD-Fast is an optimized fork of QMD that replaces heavy LLM inference with lightweight transformer models, cutting average query latency from about 63 s to about 108 ms in the benchmark below.
| Metric | Original QMD | QMD-Fast | Speedup |
|---|---|---|---|
| Avg Query Latency | 63,194 ms | 108 ms | 587x |
| P50 Query Latency | 4,347 ms | 107 ms | 40x |
| Index Time | 421 ms | 358 ms | 1.2x |
Benchmark: 100 markdown files, 5 semantic queries, Apple M1 Max
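
To make the table easier to read, here is a minimal sketch of how an average, a P50, and a speedup ratio can be computed from latency samples. The helper names and the sample latencies are hypothetical and are not the benchmark's raw data. An average far above the median, as in the Original QMD column, usually means a small number of very slow queries dominate the mean.

```typescript
// Hypothetical helpers; not part of QMD or QMD-Fast.
function mean(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// P50 (median): middle value of the sorted samples,
// or the average of the two middle values for an even count.
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function speedup(baselineMs: number, optimizedMs: number): number {
  return baselineMs / optimizedMs;
}

// Illustrative samples only: one slow outlier pulls the mean far above the median.
const illustrativeMs = [4000, 4200, 4300, 4500, 300000];
console.log(mean(illustrativeMs)); // 63400
console.log(p50(illustrativeMs)); // 4300

// Ratios recomputed from the rounded values in the table above.
console.log(speedup(63194, 108).toFixed(1)); // "585.1"
console.log(speedup(4347, 107).toFixed(1)); // "40.6"
console.log(speedup(421, 358).toFixed(1)); // "1.2"
```

Dividing the rounded table values gives roughly 585x for average latency, so the published 587x was presumably computed from unrounded measurements.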
| Component | Original QMD | QMD-Fast |
|---|---|---|
| Embedding | embeddinggemma-300M (node-llama-cpp) | BGE-base-en-v1.5 (@xenova/transformers) |
| Reranking | qwen3-reranker-0.6B (node-llama-cpp) | ms-marco-MiniLM-L-6-v2 (@xenova/transformers) |
| Query Expansion | 1.7B LLM | Dictionary-based (0ms) |
| Total Model Params | 2.6B | 132M |
# Clone the repository
git clone https://github.com/anthropics/qmd-fast.git
cd qmd-fast
# Install dependencies
bun install

QMD-Fast is a drop-in replacement for QMD with the same CLI:
# Index a directory
bun src/qmd.ts collection add ./docs --name mydocs --mask "*.md"
# Create embeddings
bun src/qmd.ts embed
# Search (fast!)
bun src/qmd.ts query "authentication and login"

- src/llm-fast.ts - Optimized LLM implementation using @xenova/transformers
- src/store.ts - Modified to use llm-fast
- src/qmd.ts - Modified to use llm-fast
Instead of running a 300M parameter GGUF model through node-llama-cpp (CPU inference), QMD-Fast uses:
- Model: Xenova/bge-base-en-v1.5 (110M params)
- Runtime: ONNX via @xenova/transformers
- Result: ~15ms vs ~10,000ms per embedding
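
Once documents and queries are embedded, search reduces to comparing vectors. The sketch below assumes ranking by cosine similarity, which this README does not confirm for QMD-Fast. The `IndexedDoc` type, the `topK` helper, and the toy 3-dimensional vectors are hypothetical stand-ins for the 768-dimensional embeddings that BGE-base produces.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape of an indexed document.
interface IndexedDoc {
  path: string;
  embedding: number[];
}

// Score every document against the query and keep the k best.
function topK(queryEmbedding: number[], docs: IndexedDoc[], k: number) {
  return docs
    .map((d) => ({ path: d.path, score: cosineSimilarity(queryEmbedding, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors for illustration only.
const indexedDocs: IndexedDoc[] = [
  { path: "auth.md", embedding: [0.9, 0.1, 0.0] },
  { path: "billing.md", embedding: [0.0, 0.2, 0.9] },
  { path: "login.md", embedding: [0.8, 0.3, 0.1] },
];
const hits = topK([1, 0, 0], indexedDocs, 2);
console.log(hits.map((h) => h.path)); // ["auth.md", "login.md"]
```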
Instead of a 600M parameter LLM for scoring:
- Model: Xenova/ms-marco-MiniLM-L-6-v2 (22M params)
- Architecture: Cross-encoder (purpose-built for reranking)
- Result: ~80ms vs ~5,000ms per rerank batch
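
A cross-encoder scores each (query, passage) pair jointly rather than comparing precomputed vectors, so reranking amounts to scoring the top candidates and re-sorting them. The following is a rough sketch of that control flow only. `PairScorer`, `Candidate`, and `rerank` are hypothetical names, and `overlapScorer` is a toy word-overlap stand-in so the example runs without downloading a model; it is not a cross-encoder.

```typescript
// Scores a query and one passage together. In a real setup this would wrap
// the cross-encoder model (and would likely be async); here it is pluggable.
type PairScorer = (query: string, passage: string) => number;

// Hypothetical shape of a retrieved candidate.
interface Candidate {
  path: string;
  text: string;
}

// Score every candidate against the query, sort descending, keep the top N.
function rerank(query: string, candidates: Candidate[], score: PairScorer, topN: number) {
  return candidates
    .map((c) => ({ ...c, score: score(query, c.text) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

// Toy stand-in scorer: fraction of query words that appear in the passage.
const overlapScorer: PairScorer = (query, passage) => {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const passageWords = new Set(passage.toLowerCase().split(/\W+/));
  const matched = words.filter((w) => passageWords.has(w)).length;
  return words.length === 0 ? 0 : matched / words.length;
};

const reranked = rerank(
  "authentication and login",
  [
    { path: "billing.md", text: "Invoices and payment methods" },
    { path: "auth.md", text: "Login flow and authentication tokens" },
    { path: "intro.md", text: "Welcome and overview" },
  ],
  overlapScorer,
  2,
);
console.log(reranked[0].path); // "auth.md"
```

Because each pair requires its own model pass, rerankers are normally applied only to a short candidate list, which is why a small purpose-built model matters for latency.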
Instead of a 1.7B parameter LLM generating query variations:
- Method: Dictionary-based synonym lookup
- Result: 0ms vs ~3,000ms per query
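
As an illustration of dictionary-based expansion, here is a minimal sketch that swaps one query term at a time for a known synonym. The synonym table, the `expandQuery` name, and the `maxVariants` parameter are assumptions made for this example; the dictionary QMD-Fast actually ships is not shown in this README.

```typescript
// Hypothetical synonym table for illustration only.
const SYNONYMS = new Map<string, string[]>([
  ["login", ["sign in", "signin"]],
  ["authentication", ["auth"]],
  ["delete", ["remove"]],
]);

// Returns the normalized original query followed by variants in which
// exactly one term is replaced by a synonym, capped at maxVariants entries.
function expandQuery(query: string, maxVariants = 4): string[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const variants = new Set<string>([terms.join(" ")]);
  terms.forEach((term, i) => {
    for (const synonym of SYNONYMS.get(term) ?? []) {
      const replaced = [...terms];
      replaced[i] = synonym;
      variants.add(replaced.join(" "));
    }
  });
  return Array.from(variants).slice(0, maxVariants);
}

console.log(expandQuery("authentication and login"));
// ["authentication and login", "auth and login",
//  "authentication and sign in", "authentication and signin"]
```

A lookup like this costs a few map reads per term, which is why its latency is effectively zero next to an LLM call. The trade-off is that it only covers terms present in the dictionary.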
- Bun >= 1.0.0
- macOS, Linux, or Windows
- Original QMD by Tobi Lütke
- Optimizations by K2 Team
- Models by Hugging Face community
MIT (same as original QMD)