minisearch

minisearch is a small Rust search engine crate and CLI for indexing local text content. It provides an inverted index with BM25-style scoring; phrase, proximity, and fuzzy matching; metadata filters; highlighted snippets; and on-disk persistence.

Features

Recursive directory indexing for .txt and .md files by default
Custom indexing options for file extensions and maximum file size
Lowercased alphanumeric tokenization with positional postings
BM25-style ranking for term queries
Phrase search with quoted queries like "distributed systems"
Proximity search with slop syntax like "distributed systems"~3
Fuzzy search with typo-tolerant term syntax like serch~1
Metadata filters like ext:rs, path:guides/, and title:search
Highlighted snippets with [[...]] markers around matched text
Required and excluded terms or phrases via +term, -term, +"phrase", and -"phrase"
Search-time filters for path prefixes and minimum score thresholds
Simple save/load support for persisting an index to disk
Lightweight stats helpers for vocabulary inspection and top terms
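
To make "lowercased alphanumeric tokenization with positional postings" concrete, here is a minimal standalone sketch using only the standard library. It is not the crate's implementation: the names tokenize, Postings, and index_document are hypothetical, and the real tokenizer may treat edge cases differently.

```rust
use std::collections::BTreeMap;

/// Split text on every non-alphanumeric character and lowercase the pieces.
/// (Assumption: the crate's own tokenizer may handle edge cases differently.)
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|piece| !piece.is_empty())
        .map(|piece| piece.to_lowercase())
        .collect()
}

/// Hypothetical postings layout: term -> document id -> token positions.
type Postings = BTreeMap<String, BTreeMap<usize, Vec<usize>>>;

/// Record the position of every token of `text` under `doc_id`.
fn index_document(postings: &mut Postings, doc_id: usize, text: &str) {
    for (position, term) in tokenize(text).into_iter().enumerate() {
        postings
            .entry(term)
            .or_default()
            .entry(doc_id)
            .or_default()
            .push(position);
    }
}
```

Keeping positions, rather than only counts, is what makes phrase and proximity queries answerable from the index alone.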

Install

cargo add minisearch

To use the CLI locally:

cargo run -- <command>

Python Bindings

The repository also includes a small Python wrapper in python/minisearch that calls the Rust backend through a C ABI exposed by this crate.

Build the shared library:

cargo build --release --features python-bindings

Run the Python example from the repo root:

python3 python/examples/basic.py

Or use it directly:

from minisearch import SearchEngine, SearchOptions

engine = SearchEngine()
engine.add_document("guides/rust.txt", "Rust phrase search and BM25 ranking.")
results = engine.search("rust", SearchOptions(top_k=5))

for result in results:
    print(result.path, result.score)

If the shared library lives outside target/release or target/debug, set MINISEARCH_LIBRARY to the full path before importing minisearch.

Quick Start

use minisearch::{SearchEngine, SearchOptions};

fn main() {
    let mut engine = SearchEngine::new();
    engine.add_document(
        "guides/project.txt",
        "A mini search engine in Rust with BM25 ranking and phrase search.",
    );
    engine.add_document(
        "notes/distributed.txt",
        "This document talks about distributed systems and indexing.",
    );

    let results = engine.search_with_options(
        "path:guides/ ext:txt rust serch~1 +\"phrase search\"",
        &SearchOptions::new(10).with_path_prefix("guides/"),
    );

    for result in results {
        println!(
            "{} -> {:.3} [{}]",
            result.path,
            result.score,
            result.matched_terms.join(", ")
        );
        if let Some(snippet) = result.snippet {
            println!("snippet: {snippet}");
        }
    }
}

Query Syntax

Syntax	Meaning	Example
`rust bm25`	Optional terms ranked by BM25	`rust bm25`
`+rust`	Required term	`+rust search`
`-java`	Excluded term	`rust -java`
`serch~1`	Fuzzy term match within edit distance 1	`serch~1`
`ext:rs`	Required extension filter	`ext:rs`
`path:guides/`	Required path-prefix filter	`path:guides/`
`title:search`	Required title-term filter	`title:search`
`"phrase search"`	Phrase boost / phrase-only search	`"phrase search"`
`"distributed systems"~3`	Ordered proximity search with up to 3 extra tokens between terms	`"distributed systems"~3`
`+"phrase search"`	Required phrase	`rust +"phrase search"`
`-"toy example"`	Excluded phrase	`rust -"toy example"`
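
As a rough illustration of how the syntax in the table could be split into clauses, the sketch below implements a small hand-written lexer. The types Occur and Clause and the functions parse_query and classify are hypothetical; they are not the crate's ParsedQuery structures, and the crate's parser may resolve ambiguous input differently.

```rust
/// How a clause participates in matching (hypothetical, not the crate's type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Occur {
    Optional,
    Required,
    Excluded,
}

/// One parsed piece of a query string (hypothetical, not the crate's type).
#[derive(Debug, PartialEq)]
enum Clause {
    Term { occur: Occur, text: String },
    Fuzzy { occur: Occur, text: String, max_edits: usize },
    Phrase { occur: Occur, words: Vec<String>, slop: usize },
    Filter { exclude: bool, field: String, value: String },
}

fn parse_query(input: &str) -> Vec<Clause> {
    let chars: Vec<char> = input.chars().collect();
    let mut clauses = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i].is_whitespace() {
            i += 1;
            continue;
        }
        // Optional +/- prefix marks a clause as required or excluded.
        let occur = match chars[i] {
            '+' => { i += 1; Occur::Required }
            '-' => { i += 1; Occur::Excluded }
            _ => Occur::Optional,
        };
        if i < chars.len() && chars[i] == '"' {
            // Quoted phrase: read up to the closing quote (or end of input).
            i += 1;
            let start = i;
            while i < chars.len() && chars[i] != '"' { i += 1; }
            let body: String = chars[start..i].iter().collect();
            if i < chars.len() { i += 1; }
            // Optional ~N suffix gives the proximity slop.
            let mut slop: usize = 0;
            if i < chars.len() && chars[i] == '~' {
                i += 1;
                let digits_start = i;
                while i < chars.len() && chars[i].is_ascii_digit() { i += 1; }
                let digits: String = chars[digits_start..i].iter().collect();
                slop = digits.parse::<usize>().unwrap_or(0);
            }
            let words = body.split_whitespace().map(|w| w.to_lowercase()).collect();
            clauses.push(Clause::Phrase { occur, words, slop });
        } else {
            // Bare token: read up to the next whitespace.
            let start = i;
            while i < chars.len() && !chars[i].is_whitespace() { i += 1; }
            let token: String = chars[start..i].iter().collect();
            if !token.is_empty() {
                clauses.push(classify(occur, &token));
            }
        }
    }
    clauses
}

/// Decide whether a bare token is a metadata filter, a fuzzy term, or a plain term.
fn classify(occur: Occur, token: &str) -> Clause {
    if let Some((field, value)) = token.split_once(':') {
        if matches!(field, "ext" | "path" | "title") && !value.is_empty() {
            return Clause::Filter {
                exclude: occur == Occur::Excluded,
                field: field.to_string(),
                value: value.to_string(),
            };
        }
    }
    if let Some((text, edits)) = token.rsplit_once('~') {
        if let Ok(max_edits) = edits.parse::<usize>() {
            if !text.is_empty() {
                return Clause::Fuzzy { occur, text: text.to_lowercase(), max_edits };
            }
        }
    }
    Clause::Term { occur, text: token.to_lowercase() }
}
```

Calling parse_query on the Quick Start query string yields two filters, one optional term, one fuzzy term, and one required phrase, in that order.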

Notes:

Optional terms contribute score when they appear.
Required terms and required phrases must match for a document to be returned.
Excluded terms and phrases remove a document from the result set.
Fuzzy terms use term~N and match indexed terms within edit distance N.
Metadata filters are required by default; prefix one with - to exclude matching documents instead, as in -ext:md, -path:notes/, or -title:generated.
Phrase-only queries work even when no standalone terms are present.
Proximity phrases preserve term order and allow up to N extra intervening tokens.
Search results include an optional highlighted snippet built from the original stored document text.
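
The fuzzy and proximity rules in these notes can be sketched as two small helpers. This is an illustration under assumptions, not the crate's code: edit_distance uses plain Levenshtein distance (the crate does not state which edit-distance variant it uses), and ordered_within_slop counts the slop as the total number of extra tokens across the whole phrase. Both function names are hypothetical.

```rust
/// Levenshtein edit distance, computed with two rolling rows.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i-1 chars of `a` and first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let substitution = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            cur[j] = substitution.min(prev[j] + 1).min(cur[j - 1] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Ordered proximity check for one document.
/// `positions[k]` holds the sorted token positions of the k-th phrase word.
/// Returns true if the words occur in order with at most `slop` extra tokens
/// in total between the first and last word (an assumed interpretation).
fn ordered_within_slop(positions: &[Vec<usize>], slop: usize) -> bool {
    let (first, rest) = match positions.split_first() {
        Some(parts) => parts,
        None => return false,
    };
    first.iter().any(|&start| {
        let mut last = start;
        for word_positions in rest {
            // Earliest occurrence of the next word strictly after the previous one.
            match word_positions.iter().find(|&&p| p > last) {
                Some(&p) => last = p,
                None => return false,
            }
        }
        // Tokens spanned, minus the phrase words themselves, are the extras.
        last - start - rest.len() <= slop
    })
}
```

With slop 0 the second helper reduces to an exact phrase match, which is why a single positional index can serve both phrase and proximity queries.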

Library API

Build an Index from a Directory

use minisearch::{IndexOptions, SearchEngine};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let options = IndexOptions::default()
        .with_extensions(["md", "txt", "rs"])
        .with_max_file_size_bytes(250_000);

    let engine = SearchEngine::build_from_directory_with_options("src", &options)?;
    println!("Indexed {} documents", engine.document_count());
    Ok(())
}

Filter Search Results

use minisearch::{SearchEngine, SearchOptions};

fn main() {
    let mut engine = SearchEngine::new();
    engine.add_document("guides/rust.md", "rust search engine rust phrase search");
    engine.add_document("notes/rust.md", "rust notes");

    let options = SearchOptions::new(5)
        .with_path_prefix("guides/")
        .with_min_score(1.0);

    for result in engine.search_with_options("rust", &options) {
        println!("{} -> {:.3}", result.path, result.score);
    }
}

Inspect the Vocabulary

use minisearch::SearchEngine;

fn main() {
    let mut engine = SearchEngine::new();
    engine.add_document("guide.txt", "rust rust search");
    engine.add_document("notes.txt", "rust indexing");

    println!("document frequency: {}", engine.document_frequency("rust"));
    println!("term frequency in doc 0: {}", engine.term_frequency(0, "rust"));

    for stat in engine.top_terms(3) {
        println!(
            "{} -> total {}, docs {}",
            stat.term, stat.total_frequency, stat.document_frequency
        );
    }
}
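
The document frequency and term frequency reported above are the main inputs to BM25-style ranking. The sketch below shows the textbook Okapi BM25 formula for a single term; the constants K1 = 1.2 and B = 0.75 are common defaults and are assumptions here, since the crate's exact variant and parameters are not documented in this README. The function name bm25_term_score is hypothetical.

```rust
/// Okapi BM25 contribution of one term to one document.
/// tf: occurrences of the term in the document
/// df: number of documents containing the term
/// doc_count: total number of documents
/// doc_len / avg_doc_len: document length and average length, in tokens
fn bm25_term_score(tf: f64, df: f64, doc_count: f64, doc_len: f64, avg_doc_len: f64) -> f64 {
    const K1: f64 = 1.2; // term-frequency saturation (assumed default)
    const B: f64 = 0.75; // strength of length normalization (assumed default)
    if tf == 0.0 || avg_doc_len == 0.0 {
        return 0.0;
    }
    // The "+ 1.0" inside the logarithm keeps the idf non-negative.
    let idf = (1.0 + (doc_count - df + 0.5) / (df + 0.5)).ln();
    let length_norm = 1.0 - B + B * (doc_len / avg_doc_len);
    idf * (tf * (K1 + 1.0)) / (tf + K1 * length_norm)
}
```

A document's score for a multi-term query is the sum of this value over the query terms it contains. Rarer terms (lower df) and repeated terms (higher tf) raise the score, while longer-than-average documents are penalized.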

Save and Reload an Index

use minisearch::SearchEngine;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut engine = SearchEngine::new();
    engine.add_document("guide.txt", "rust search engine rust bm25");
    engine.save_to_path("sample.idx")?;

    let loaded = SearchEngine::load_from_path("sample.idx")?;
    println!("loaded {} documents", loaded.document_count());
    Ok(())
}

CLI

Commands

minisearch index <docs_dir> <index_file> [--ext=txt,md,rs] [--max-bytes=1048576]
minisearch search <index_file> <query> [top_k] [--path-prefix=guides/] [--min-score=1.0]
minisearch stats <index_file> [top_terms]
minisearch demo

Examples

cargo run -- index docs search.idx --ext=txt,md,rs --max-bytes=100000
cargo run -- search search.idx 'rust +"phrase search"' 5 --path-prefix=guides/
cargo run -- search search.idx 'bm25' --min-score=1.0
cargo run -- stats search.idx 10
cargo run -- demo

Included Examples

Run any example with cargo run --example <name>.

basic: in-memory indexing plus filtered search
custom_indexing: directory indexing with custom extensions and file size limits
filtered_search: search-time path and score filters
persistence: save/load and vocabulary statistics
query_syntax: inspect parsed queries and required/excluded phrases

Public Types

SearchEngine: the main in-memory index
SearchOptions: search-time filters like top_k, path_prefix, and min_score
IndexOptions: directory indexing controls for extensions and max file size
SearchResult: a matched document with score and matched terms; its snippet field contains a highlighted excerpt using [[...]] markers when source content is available
TermStat: aggregated term statistics for reporting
ParsedQuery / PhraseQuery / FuzzyTermQuery / MetadataFilter: parsed query structures if you want to inspect or cache queries
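
To show what the [[...]] markers in a snippet look like, here is a minimal highlighting sketch. It wraps whole words that match a set of lowercased terms and leaves everything else untouched. The functions highlight and flush_word are hypothetical, and the sketch skips the excerpt windowing that a real snippet builder would need; it is not the crate's snippet code.

```rust
/// Wrap every word of `text` whose lowercased form is in `terms` with [[...]].
/// `terms` is expected to be lowercased already.
fn highlight(text: &str, terms: &[&str]) -> String {
    let mut out = String::with_capacity(text.len());
    let mut word = String::new();
    for c in text.chars() {
        if c.is_alphanumeric() {
            word.push(c);
        } else {
            // A non-alphanumeric character ends the current word.
            flush_word(&mut out, &mut word, terms);
            out.push(c);
        }
    }
    flush_word(&mut out, &mut word, terms);
    out
}

/// Append the pending word to `out`, marked if it matches, then reset it.
fn flush_word(out: &mut String, word: &mut String, terms: &[&str]) {
    if word.is_empty() {
        return;
    }
    if terms.contains(&word.to_lowercase().as_str()) {
        out.push_str("[[");
        out.push_str(word.as_str());
        out.push_str("]]");
    } else {
        out.push_str(word.as_str());
    }
    word.clear();
}
```

Because the original casing and punctuation are preserved, this approach needs the original document text, which is consistent with snippets being available only when source content is stored.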

Persistence Format

Indexes are stored in a plain-text format that begins with the MSE3 header and records:

average document length
document metadata including extension, title, and modified timestamp
original content for snippet generation
positional postings for each term

Older MSE1 and MSE2 indexes still load. Legacy indexes derive missing metadata from the stored path/content, and MSE1 results still lack snippets because those files never stored original content.
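
The version handling described above can be pictured as a small dispatch on the header. The sketch below assumes the header token sits at the very start of the first line, which this README does not spell out; IndexVersion, detect_version, and can_build_snippets are hypothetical names and not part of the crate's API.

```rust
/// Index format versions named in this README (hypothetical enum).
#[derive(Debug, PartialEq, Clone, Copy)]
enum IndexVersion {
    Mse1,
    Mse2,
    Mse3,
}

/// Guess the format version from the first line of an index file.
/// Assumption: the header token is the prefix of the first line.
fn detect_version(contents: &str) -> Option<IndexVersion> {
    let first_line = contents.lines().next()?;
    if first_line.starts_with("MSE3") {
        Some(IndexVersion::Mse3)
    } else if first_line.starts_with("MSE2") {
        Some(IndexVersion::Mse2)
    } else if first_line.starts_with("MSE1") {
        Some(IndexVersion::Mse1)
    } else {
        None
    }
}

/// MSE1 files never stored original content, so snippets cannot be built from them.
fn can_build_snippets(version: IndexVersion) -> bool {
    version != IndexVersion::Mse1
}
```

Branching on the version once at load time keeps the rest of the loader free of per-format special cases.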
