minisearch is a small Rust search engine crate and CLI for indexing local text content with an inverted index, BM25-style scoring, phrase, proximity, and fuzzy matching, metadata filters, highlighted snippets, and on-disk persistence.

- Recursive directory indexing for `.txt` and `.md` files by default
- Custom indexing options for file extensions and maximum file size
- Lowercased alphanumeric tokenization with positional postings
- BM25-style ranking for term queries
- Phrase search with quoted queries like `"distributed systems"`
- Proximity search with slop syntax like `"distributed systems"~3`
- Fuzzy search with typo-tolerant term syntax like `serch~1`
- Metadata filters like `ext:rs`, `path:guides/`, and `title:search`
- Highlighted snippets with `[[...]]` markers around matched text
- Required and excluded terms or phrases via `+term`, `-term`, `+"phrase"`, and `-"phrase"`
- Search-time filters for path prefixes and minimum score thresholds
- Simple save/load support for persisting an index to disk
- Lightweight stats helpers for vocabulary inspection and top terms
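
To make "lowercased alphanumeric tokenization with positional postings" concrete, here is a minimal std-only sketch. It is not the crate's code: `tokenize`, `index_document`, and the `Postings` alias are hypothetical names, and treating every non-alphanumeric character as a separator is an assumption.

```rust
use std::collections::HashMap;

/// Split text into lowercased alphanumeric tokens.
/// Assumption: any non-alphanumeric character acts as a separator.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// term -> document id -> token positions within that document.
type Postings = HashMap<String, HashMap<usize, Vec<usize>>>;

/// Record the position of every token of `text` under `doc_id`.
fn index_document(postings: &mut Postings, doc_id: usize, text: &str) {
    for (position, term) in tokenize(text).into_iter().enumerate() {
        postings
            .entry(term)
            .or_default()
            .entry(doc_id)
            .or_default()
            .push(position);
    }
}

fn main() {
    let mut postings = Postings::new();
    index_document(&mut postings, 0, "Rust search: fast Rust!");
    index_document(&mut postings, 1, "Search engines");

    // Positions of "rust" inside document 0.
    println!("{:?}", postings["rust"][&0]);
    // Number of documents that contain "search".
    println!("{}", postings["search"].len());
}
```

Storing positions rather than bare counts is what makes phrase and proximity queries possible: a phrase matches when the positions of consecutive terms line up.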

To add the crate to a project:

```sh
cargo add minisearch
```

To use the CLI locally:

```sh
cargo run -- <command>
```

The repository also includes a small Python wrapper in `python/minisearch` that calls the Rust backend through a C ABI exposed by this crate.

Build the shared library:

```sh
cargo build --release --features python-bindings
```

Run the Python example from the repo root:

```sh
python3 python/examples/basic.py
```

Or use it directly:

```python
from minisearch import SearchEngine, SearchOptions

engine = SearchEngine()
engine.add_document("guides/rust.txt", "Rust phrase search and BM25 ranking.")
results = engine.search("rust", SearchOptions(top_k=5))

for result in results:
    print(result.path, result.score)
```

If the shared library lives outside `target/release` or `target/debug`, set `MINISEARCH_LIBRARY` to the full path before importing `minisearch`.

```rust
use minisearch::{SearchEngine, SearchOptions};

fn main() {
    let mut engine = SearchEngine::new();
    engine.add_document(
        "guides/project.txt",
        "A mini search engine in Rust with BM25 ranking and phrase search.",
    );
    engine.add_document(
        "notes/distributed.txt",
        "This document talks about distributed systems and indexing.",
    );

    let results = engine.search_with_options(
        "path:guides/ ext:txt rust serch~1 +\"phrase search\"",
        &SearchOptions::new(10).with_path_prefix("guides/"),
    );

    for result in results {
        println!(
            "{} -> {:.3} [{}]",
            result.path,
            result.score,
            result.matched_terms.join(", ")
        );
        if let Some(snippet) = result.snippet {
            println!("snippet: {snippet}");
        }
    }
}
```

| Syntax | Meaning | Example |
|---|---|---|
| `rust bm25` | Optional terms ranked by BM25 | `rust bm25` |
| `+rust` | Required term | `+rust search` |
| `-java` | Excluded term | `rust -java` |
| `serch~1` | Fuzzy term match within edit distance 1 | `serch~1` |
| `ext:rs` | Required extension filter | `ext:rs` |
| `path:guides/` | Required path-prefix filter | `path:guides/` |
| `title:search` | Required title-term filter | `title:search` |
| `"phrase search"` | Phrase boost / phrase-only search | `"phrase search"` |
| `"distributed systems"~3` | Ordered proximity search with up to 3 extra tokens between terms | `"distributed systems"~3` |
| `+"phrase search"` | Required phrase | `rust +"phrase search"` |
| `-"toy example"` | Excluded phrase | `rust -"toy example"` |

Notes:

- Optional terms contribute score when they appear.
- Required terms and required phrases must match for a document to be returned.
- Excluded terms and phrases remove a document from the result set.
- Fuzzy terms use `term~N` and match indexed terms within edit distance `N`.
- Metadata filters default to required; use `-ext:md`, `-path:notes/`, or `-title:generated` to exclude.
- Phrase-only queries work even when no standalone terms are present.
- Proximity phrases preserve term order and allow up to `N` extra intervening tokens.
- Search results include an optional highlighted snippet built from the original stored document text.
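
The fuzzy and proximity rules in the notes above can be illustrated with a std-only sketch. It is not the crate's implementation: `edit_distance` assumes plain Levenshtein distance, and `matches_within_slop` assumes the slop counts the total number of extra tokens between the first and last phrase term. Both function names are hypothetical.

```rust
/// Levenshtein edit distance between two strings, computed over chars.
/// Assumption: the fuzzy operator uses this metric.
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// True when the phrase terms occur in order with at most `slop` extra
/// tokens between the first and the last term.
/// `positions[i]` holds the sorted token positions of the i-th phrase term.
fn matches_within_slop(positions: &[Vec<usize>], slop: usize) -> bool {
    let (first, rest) = match positions.split_first() {
        Some(parts) => parts,
        None => return false,
    };
    first.iter().any(|&start| {
        let mut last = start;
        for term_positions in rest {
            // Earliest occurrence of the next term after the previous one.
            match term_positions.iter().find(|&&p| p > last) {
                Some(&p) => last = p,
                None => return false,
            }
        }
        // An exact phrase spans rest.len() steps; anything beyond is slop.
        last - start <= rest.len() + slop
    })
}

fn main() {
    // `serch~1` would accept the indexed term "search".
    println!("{}", edit_distance("serch", "search"));

    // "distributed and fault tolerant systems": distributed=0, systems=4,
    // so three tokens sit between the two phrase terms.
    let positions = vec![vec![0], vec![4]];
    println!("{}", matches_within_slop(&positions, 3));
    println!("{}", matches_within_slop(&positions, 2));
}
```

Picking the earliest following occurrence of each term minimizes the span for a given start position, so a single left-to-right pass per start is enough.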

Index a directory with custom extensions and a file size limit:

```rust
use minisearch::{IndexOptions, SearchEngine};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let options = IndexOptions::default()
        .with_extensions(["md", "txt", "rs"])
        .with_max_file_size_bytes(250_000);

    let engine = SearchEngine::build_from_directory_with_options("src", &options)?;
    println!("Indexed {} documents", engine.document_count());
    Ok(())
}
```

Apply search-time path and score filters:

```rust
use minisearch::{SearchEngine, SearchOptions};

fn main() {
    let mut engine = SearchEngine::new();
    engine.add_document("guides/rust.md", "rust search engine rust phrase search");
    engine.add_document("notes/rust.md", "rust notes");

    let options = SearchOptions::new(5)
        .with_path_prefix("guides/")
        .with_min_score(1.0);

    for result in engine.search_with_options("rust", &options) {
        println!("{} -> {:.3}", result.path, result.score);
    }
}
```

Inspect vocabulary statistics:

```rust
use minisearch::SearchEngine;

fn main() {
    let mut engine = SearchEngine::new();
    engine.add_document("guide.txt", "rust rust search");
    engine.add_document("notes.txt", "rust indexing");

    println!("document frequency: {}", engine.document_frequency("rust"));
    println!("term frequency in doc 0: {}", engine.term_frequency(0, "rust"));

    for stat in engine.top_terms(3) {
        println!(
            "{} -> total {}, docs {}",
            stat.term, stat.total_frequency, stat.document_frequency
        );
    }
}
```

Save an index to disk and load it back:

```rust
use minisearch::SearchEngine;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut engine = SearchEngine::new();
    engine.add_document("guide.txt", "rust search engine rust bm25");
    engine.save_to_path("sample.idx")?;

    let loaded = SearchEngine::load_from_path("sample.idx")?;
    println!("loaded {} documents", loaded.document_count());
    Ok(())
}
```

CLI usage:

```text
minisearch index <docs_dir> <index_file> [--ext=txt,md,rs] [--max-bytes=1048576]
minisearch search <index_file> <query> [top_k] [--path-prefix=guides/] [--min-score=1.0]
minisearch stats <index_file> [top_terms]
minisearch demo
```

For example:

```sh
cargo run -- index docs search.idx --ext=txt,md,rs --max-bytes=100000
cargo run -- search search.idx 'rust +"phrase search"' 5 --path-prefix=guides/
cargo run -- search search.idx 'bm25' --min-score=1.0
cargo run -- stats search.idx 10
cargo run -- demo
```

Run any example with `cargo run --example <name>`.

- `basic`: in-memory indexing plus filtered search
- `custom_indexing`: directory indexing with custom extensions and file size limits
- `filtered_search`: search-time path and score filters
- `persistence`: save/load and vocabulary statistics
- `query_syntax`: inspect parsed queries and required/excluded phrases

Key types:

- `SearchEngine`: the main in-memory index
- `SearchOptions`: search-time filters like `top_k`, `path_prefix`, and `min_score`
- `IndexOptions`: directory indexing controls for extensions and max file size
- `SearchResult`: a matched document with score and matched terms
  - `snippet` contains a highlighted excerpt using `[[...]]` markers when source content is available.
- `TermStat`: aggregated term statistics for reporting
- `ParsedQuery` / `PhraseQuery` / `FuzzyTermQuery` / `MetadataFilter`: parsed query structures if you want to inspect or cache queries

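
As an illustration of the `[[...]]` snippet markers, the following std-only sketch wraps matching tokens in a piece of text. It is only a guess at the behavior, not the crate's snippet builder: the `highlight` function is hypothetical, it expects already-lowercased terms, and it marks whole tokens without selecting an excerpt window.

```rust
/// Wrap every token of `text` that equals one of `terms` (compared in
/// lowercase) in `[[...]]` markers, leaving all other characters untouched.
/// Assumption: `terms` are already lowercased, as indexed terms would be.
fn highlight(text: &str, terms: &[&str]) -> String {
    let mut out = String::with_capacity(text.len());
    let mut token = String::new();
    // A trailing sentinel space flushes the final token.
    for c in text.chars().chain(std::iter::once(' ')) {
        if c.is_alphanumeric() {
            token.push(c);
            continue;
        }
        if !token.is_empty() {
            let lowered = token.to_lowercase();
            if terms.iter().any(|t| *t == lowered) {
                out.push_str("[[");
                out.push_str(&token);
                out.push_str("]]");
            } else {
                out.push_str(&token);
            }
            token.clear();
        }
        out.push(c);
    }
    out.pop(); // drop the sentinel
    out
}

fn main() {
    let text = "Rust phrase search and BM25 ranking.";
    println!("{}", highlight(text, &["rust", "bm25"]));
}
```

Matching on lowercased tokens while copying the original characters keeps the source casing intact inside the markers.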
Indexes are stored in a plain-text format that begins with the MSE3 header and records:
- average document length
- document metadata including extension, title, and modified timestamp
- original content for snippet generation
- positional postings for each term
Older MSE1 and MSE2 indexes still load. Legacy indexes derive missing metadata from the stored path/content, and MSE1 results still lack snippets because those files never stored original content.
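
A loader that supports several format versions typically branches on the header first. The sketch below shows that idea with std only. The `IndexFormat` type and its methods are hypothetical, the header is assumed to be a literal prefix of the file contents, and the rest of the file layout is deliberately not modeled.

```rust
/// On-disk format versions named in the text above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum IndexFormat {
    Mse1,
    Mse2,
    Mse3,
}

impl IndexFormat {
    /// Assumption: the header tag is the first four characters of the file.
    fn detect(contents: &str) -> Option<Self> {
        if contents.starts_with("MSE3") {
            Some(Self::Mse3)
        } else if contents.starts_with("MSE2") {
            Some(Self::Mse2)
        } else if contents.starts_with("MSE1") {
            Some(Self::Mse1)
        } else {
            None
        }
    }

    /// MSE1 files never stored original content, so they cannot yield
    /// snippets. Treating MSE2 as snippet-capable is an inference from the
    /// description above, not a documented guarantee.
    fn supports_snippets(self) -> bool {
        self != Self::Mse1
    }
}

fn main() {
    for sample in ["MSE3\n...", "MSE1\n...", "not an index"] {
        match IndexFormat::detect(sample) {
            Some(format) => {
                println!("{:?}: snippets = {}", format, format.supports_snippets())
            }
            None => println!("unrecognized header"),
        }
    }
}
```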