gemini2026/qmd-fast

QMD-Fast

587x faster semantic search for markdown files

QMD-Fast is an optimized fork of QMD that replaces heavy LLM inference with lightweight transformer models for dramatically improved query performance.

Performance

| Metric | Original QMD | QMD-Fast | Speedup |
|---|---|---|---|
| Avg Query Latency | 63,194 ms | 108 ms | 587x |
| P50 Query Latency | 4,347 ms | 107 ms | 40x |
| Index Time | 421 ms | 358 ms | 1.2x |

Benchmark: 100 markdown files, 5 semantic queries, Apple M1 Max
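
The gap between the average speedup (587x) and the P50 speedup (40x) suggests that a small number of very slow queries dominate the original average. The sketch below shows how a mean and a nearest-rank P50 can be computed from latency samples. The sample values are hypothetical and are not the benchmark's data; the function names are illustrative only.

```typescript
// Arithmetic mean of latency samples (ms).
function meanLatency(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentileLatency(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical samples: four quick queries and one slow outlier.
const exampleLatenciesMs = [107, 5000, 100, 110, 105];

console.log(meanLatency(exampleLatenciesMs));           // 1084.4
console.log(percentileLatency(exampleLatenciesMs, 50)); // 107
```

A single outlier pulls the mean roughly ten times above the median here, which is why reporting both figures gives a fairer picture than the average alone.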

What Changed

| Component | Original QMD | QMD-Fast |
|---|---|---|
| Embedding | embeddinggemma-300M (node-llama-cpp) | BGE-base-en-v1.5 (@xenova/transformers) |
| Reranking | qwen3-reranker-0.6B (node-llama-cpp) | ms-marco-MiniLM-L-6-v2 (@xenova/transformers) |
| Query Expansion | 1.7B LLM | Dictionary-based (0ms) |
| Total Model Params | 2.6B | 132M |

Installation

# Clone the repository
git clone https://github.com/anthropics/qmd-fast.git
cd qmd-fast

# Install dependencies
bun install

Usage

QMD-Fast is a drop-in replacement for QMD with the same CLI:

# Index a directory
bun src/qmd.ts collection add ./docs --name mydocs --mask "*.md"

# Create embeddings
bun src/qmd.ts embed

# Search (fast!)
bun src/qmd.ts query "authentication and login"

Key Files

  • src/llm-fast.ts - Optimized LLM implementation using @xenova/transformers
  • src/store.ts - Modified to use llm-fast
  • src/qmd.ts - Modified to use llm-fast

How It Works

Embeddings

Instead of running a 300M-parameter GGUF model through node-llama-cpp (CPU inference), QMD-Fast uses:

  • Model: Xenova/bge-base-en-v1.5 (110M params)
  • Runtime: ONNX via @xenova/transformers
  • Result: ~15ms vs ~10,000ms per embedding
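
Once the query and each document chunk have been embedded, candidates are typically ranked by vector similarity. The following is a minimal sketch of that step using cosine similarity. It assumes the vectors have already been produced by the embedding model; `EmbeddedDoc`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical names and are not taken from `src/llm-fast.ts` or `src/store.ts`.

```typescript
// Hypothetical shape for an indexed chunk; not QMD-Fast's actual schema.
interface EmbeddedDoc {
  path: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every document against the query vector and keep the topK best.
function rankBySimilarity(queryVector: number[], docs: EmbeddedDoc[], topK = 10) {
  return docs
    .map((doc) => ({ path: doc.path, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

If the embedding step already returns unit-length vectors, the similarity reduces to a plain dot product, which is a common reason to normalize at embedding time.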

Reranking

Instead of scoring candidates with a 600M-parameter LLM, QMD-Fast uses:

  • Model: Xenova/ms-marco-MiniLM-L-6-v2 (22M params)
  • Architecture: Cross-encoder (purpose-built for reranking)
  • Result: ~80ms vs ~5,000ms per rerank batch
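
A cross-encoder reads the query and one candidate passage together and emits a relevance logit for that pair. The sketch below covers only the post-processing step: turning a batch of logits into scores and reordering the candidates. The model call itself is omitted, so `logits` stands in for its output; `Candidate` and `rerankByLogits` are hypothetical names, not QMD-Fast's API.

```typescript
// Hypothetical candidate shape produced by the first-stage vector search.
interface Candidate {
  path: string;
  text: string;
}

// Map a raw logit to a relevance score in (0, 1).
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

// Pair each candidate with its logit, convert to a score, sort best-first.
// logits[i] is assumed to be the cross-encoder output for (query, candidates[i]).
function rerankByLogits(candidates: Candidate[], logits: number[]) {
  if (candidates.length !== logits.length) {
    throw new Error("expected one logit per candidate");
  }
  return candidates
    .map((candidate, i) => ({ ...candidate, score: sigmoid(logits[i]) }))
    .sort((a, b) => b.score - a.score);
}
```

Because the sigmoid is monotonic, sorting by raw logits gives the same order. The conversion only matters when a score threshold or a blend with the first-stage similarity is applied.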

Query Expansion

Instead of having a 1.7B-parameter LLM generate query variations, QMD-Fast uses:

  • Method: Dictionary-based synonym lookup
  • Result: 0ms vs ~3,000ms per query
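
A dictionary-based expansion can be sketched as a lookup that swaps one query term at a time for each of its listed synonyms. The synonym table and the `expandQuery` function below are hypothetical; the README does not show the dictionary QMD-Fast ships with or how it combines the variants.

```typescript
// Hypothetical synonym table for illustration only.
const EXAMPLE_SYNONYMS = new Map<string, string[]>([
  ["authentication", ["auth", "sign-in"]],
  ["login", ["sign-in", "logon"]],
]);

// Return the normalized query followed by every variant obtained by
// replacing a single term with one of its synonyms (duplicates removed).
function expandQuery(
  query: string,
  synonyms: Map<string, string[]> = EXAMPLE_SYNONYMS,
): string[] {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  const variants = new Set<string>([tokens.join(" ")]);
  tokens.forEach((token, i) => {
    for (const alternative of synonyms.get(token) ?? []) {
      const copy = tokens.slice();
      copy[i] = alternative;
      variants.add(copy.join(" "));
    }
  });
  return Array.from(variants);
}

console.log(expandQuery("authentication and login"));
```

The lookup is a handful of map reads per query term, so its cost is negligible next to model inference. The trade-off is that terms missing from the dictionary are never expanded.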

Requirements

  • Bun >= 1.0.0
  • macOS, Linux, or Windows

Credits

  • Original QMD by Tobi Lütke
  • Optimizations by K2 Team
  • Models by Hugging Face community

License

MIT (same as original QMD)
