From ad6e812d2f803f3a404f6497357c80972383cd7b Mon Sep 17 00:00:00 2001
From: Joel Dierkes
Date: Mon, 22 Dec 2025 11:54:16 +0100
Subject: [PATCH] chore: update the README
---
README.md | 167 +++++++++++++++++++++++++-----------------------------
1 file changed, 76 insertions(+), 91 deletions(-)
diff --git a/README.md b/README.md
index a65d288..167a87f 100644
--- a/README.md
+++ b/README.md
@@ -18,23 +18,6 @@
-## Why mgrep?
-- Natural-language search that feels as immediate as `grep`.
-- Semantic, multilingual & multimodal (audio, video support coming soon!)
-- Web search built-in — query the web alongside your local files with `--web`.
-- Smooth background indexing via `mgrep watch`, designed to detect and keep up-to-date everything that matters inside any git repository.
-- Friendly device-login flow and first-class coding agent integrations.
-- Built for agents and humans alike, and **designed to be a helpful tool**, not a restrictive harness: quiet output, thoughtful defaults, and escape hatches everywhere.
-- Reduces the token usage of your agent by 2x while maintaining superior performance
-
-```bash
-# index once
-mgrep watch
-
-# then ask your repo things in natural language
-mgrep "where do we set up auth?"
-```
-
## Quick Start
1. **Install**
@@ -69,51 +52,30 @@ mgrep "where do we set up auth?"
```
Searches default to the current working directory unless you pass a path.
-**Today, `mgrep` works great on:** code, text, PDFs, images.
+**Today, `mgrep` works great on:** code, text, PDFs, images.
**Coming soon:** audio & video.
-## Using it with Coding Agents
-
-`mgrep` supports assisted installation commands for many agents:
-- `mgrep install-claude-code` for Claude Code
-- `mgrep install-opencode` for OpenCode
-- `mgrep install-codex` for Codex
-- `mgrep install-droid` for Factory Droid
-
-These commands sign you in (if needed) and add Mixedbread `mgrep` support to the
-agent. After that you only have to start the agent in your project folder, thats
-it.
-
-### More Agents Coming Soon
-
-More agents (Cursor, Windsurf, etc.) are on the way—this section will grow as soon as each integration lands.
-
-## Making your agent smarter
-
-We plugged `mgrep` into Claude Code and ran a benchmark of 50 QA tasks to evaluate the economics of `mgrep` against `grep`.
-
-
-
-In our 50-task benchmark, `mgrep`+Claude Code used ~2x fewer tokens than grep-based workflows at similar or better judged quality.
-
-`mgrep` finds the relevant snippets in a few semantic queries first, and the model spends its capacity on reasoning instead of scanning through irrelevant code from endless `grep` attempts. You can [Try it yourself](http://demo.mgrep.mixedbread.com).
-
-*Note: Win Rate (%) was calculated by using an LLM as a judge.*
+## Why mgrep?
-## Why we built mgrep
+- Natural-language search that feels as immediate as `grep`.
+- Semantic, multilingual, and multimodal (audio and video support coming soon).
+- Web search built-in — query the web alongside your local files with `--web`.
+- Smooth background indexing via `mgrep watch`, which detects changes and keeps everything that matters inside a git repository up to date.
+- Friendly device-login flow and first-class coding agent integrations.
+- Built for agents and humans alike, and **designed to be a helpful tool**, not a restrictive harness: quiet output, thoughtful defaults, and escape hatches everywhere.
+- Cuts your agent's token usage by roughly 2x at similar or better judged quality.
`grep` is an amazing tool. It's lightweight, compatible with just about every machine on the planet, and will reliably surface any potential match within any target folder.
But grep is **from 1973**, and it carries the limitations of its era: you need exact patterns and it slows down considerably in the cases where you need it most, on large codebases.
-Worst of all, if you're looking for deeply-buried critical business logic, you cannot describe it: you have to be able to accurately guess what kind of naming patterns would have been used by the previous generations of engineers at your workplace for `grep` to find it. This will often result in watching a coding agent desperately try hundreds of patterns, filling its token window, and your upcoming invoice, with thousands of tokens.
-
-But it doesn't have to be this way. Everything else in our toolkit is increasingly tailored to understand us, and so should our search tools. `mgrep` is our way to bring `grep` to 2025, integrating all of the advances in semantic understanding and code-search, without sacrificing anything that has made `grep` such a useful tool.
+Worst of all, if you're looking for deeply buried, critical business logic, you cannot describe it: you have to guess the naming patterns that previous generations of engineers at your workplace used before `grep` can find it. The result is often a coding agent desperately trying hundreds of patterns, filling its context window, and your upcoming invoice, with thousands of tokens.
-Under the hood, `mgrep` is powered by [Mixedbread Search](https://www.mixedbread.com/blog/mixedbread-search), our full-featured search solution. It combines state-of-the-art semantic retrieval models with context-aware parsing and optimized inference methods to provide you with a natural language companion to `grep`. We believe both tools belong in your toolkit: use `grep` for exact matches, `mgrep` for semantic understanding and intent.
+But it doesn't have to be this way. Everything else in our toolkit is increasingly tailored to understand us, and so should our search tools. `mgrep` is our way to bring `grep` to 2025, integrating all of the advances in semantic understanding and code-search, without sacrificing anything that has made `grep` such a useful tool.
+Under the hood, `mgrep` is powered by [Mixedbread Search](https://www.mixedbread.com/blog/mixedbread-search), our full-featured search solution. It combines state-of-the-art semantic retrieval models with context-aware parsing and optimized inference methods to provide you with a natural language companion to `grep`.
-## When to use what
+### When to use what
We designed `mgrep` to complement `grep`, not replace it. The best code search combines `mgrep` with `grep`.
@@ -122,23 +84,35 @@ We designed `mgrep` to complement `grep`, not replace it. The best code search c
| **Exact Matches** | **Intent Search** |
| Symbol tracing, Refactoring, Regex | Code exploration, Feature discovery, Onboarding |
-## Web Search
+## Benchmarks
-`mgrep` can also search the web alongside your local files. This is useful when
-you need to find documentation, tutorials, or answers to programming questions
-without leaving your terminal.
+We plugged `mgrep` into Claude Code and ran a benchmark of 50 QA tasks to evaluate the economics of `mgrep` against `grep`.
-```bash
-# Search the web and get a summarized answer
-mgrep --web --answer "How do I integrate a JavaScript runtime into Deno?"
+
-# Get the urls of the search
-mgrep --web "best practices for error handling in TypeScript"
-```
+In our 50-task benchmark, `mgrep`+Claude Code used ~2x fewer tokens than grep-based workflows at similar or better judged quality.
-Web search queries the `mixedbread/web` store in addition to your local store, merging results based on relevance. Use `--answer` (or `-a`) to get a concise summary instead of raw results.
+`mgrep` finds the relevant snippets in a few semantic queries first, so the model spends its capacity on reasoning instead of scanning through irrelevant code from endless `grep` attempts. You can [try it yourself](http://demo.mgrep.mixedbread.com).
+
+*Note: Win Rate (%) was calculated by using an LLM as a judge.*
+
+## Using it with Coding Agents
+
+`mgrep` supports assisted installation commands for many agents:
+- `mgrep install-claude-code` for Claude Code
+- `mgrep install-opencode` for OpenCode
+- `mgrep install-codex` for Codex
+- `mgrep install-droid` for Factory Droid
+
+These commands sign you in (if needed) and add Mixedbread `mgrep` support to the
+agent. After that, you only have to start the agent in your project folder, and
+that's it.
+
+### More Agents Coming Soon
+
+More agents (Cursor, Windsurf, etc.) are on the way; this section will grow as each integration lands.
-## Commands at a Glance
+## Command Reference
| Command | Purpose |
| --- | --- |
@@ -168,7 +142,7 @@ directory for a pattern.
| `--max-file-count ` | Maximum number of files to upload (overrides config) |
All search options can also be configured via environment variables (see
-[Environment Variables](#environment-variables) section below).
+[Configuration](#configuration) section below).
**Examples:**
```bash
@@ -176,9 +150,24 @@ mgrep "What code parsers are available?" # search in the current directory
mgrep "How are chunks defined?" src/models # search in the src/models directory
mgrep -m 10 "What is the maximum number of concurrent workers in the code parser?" # limit the number of results to 10
mgrep -a "What code parsers are available?" # generate an answer to the question based on the results
-mgrep --web --answer "How do I integrate a JavaScript runtime into Deno?" # search the web and get a summarized answer
```
+#### Web Search
+
+`mgrep` can also search the web alongside your local files. This is useful when
+you need to find documentation, tutorials, or answers to programming questions
+without leaving your terminal.
+
+```bash
+# Search the web and get a summarized answer
+mgrep --web --answer "How do I integrate a JavaScript runtime into Deno?"
+
+# Get the URLs of the search results
+mgrep --web "best practices for error handling in TypeScript"
+```
+
+Web search queries the `mixedbread/web` store in addition to your local store, merging results based on relevance. Use `--answer` (or `-a`) to get a concise summary instead of raw results.
+
### mgrep watch
`mgrep watch` is used to index the current repository and keep the Mixedbread
@@ -201,15 +190,6 @@ mgrep watch --max-file-size 1048576 # limit uploads to files under 1MB
mgrep watch --max-file-count 5000 # limit uploads to directories with 5000 files or fewer
```
-## Mixedbread under the hood
-
-- Every file is pushed into a Mixedbread Store using the same SDK your apps get.
-- Searches request top-k matches with Mixedbread reranking enabled by default
- for tighter relevance (can be disabled with `--no-rerank` or
- `MGREP_RERANK=0`).
-- Results include relative paths plus contextual hints (line ranges for text, page numbers for PDFs, etc.) for a skim-friendly experience.
-- Because stores are cloud-backed, agents and teammates can query the same corpus without re-uploading.
-
## Configuration
mgrep can be configured via config files, environment variables, or CLI flags.
@@ -233,25 +213,14 @@ maxFileCount: 5000
4. Global config file (`~/.config/mgrep/config.yaml`)
5. Default values
-### Configuration Tips
-
-- `--store ` lets you isolate workspaces (per repo, per team, per experiment). Stores are created on demand if they do not exist yet.
-- Ignore rules come straight from git, so temp files, build outputs, and vendored deps stay out of your embeddings.
-- `watch` reports progress (`processed / uploaded`) as it scans; leave it running in a terminal tab to keep your store fresh.
-- `search` accepts most `grep`-style switches, and politely ignores anything it cannot support, so existing muscle memory still works.
-
-## Environment Variables
-
-All search options can be configured via environment variables, which is
-especially useful for CI/CD pipelines or when you want to set defaults for all
-searches.
+### Environment Variables
-### Authentication & Store
+#### Authentication & Store
- `MXBAI_API_KEY`: Set this to authenticate without browser login (ideal for CI/CD)
- `MXBAI_STORE`: Override the default store name (default: `mgrep`)
-### Search Options
+#### Search Options
- `MGREP_MAX_COUNT`: Maximum number of results to return (default: `10`)
- `MGREP_CONTENT`: Show content of the results (set to `1` or `true` to enable)
@@ -261,10 +230,10 @@ searches.
- `MGREP_DRY_RUN`: Enable dry run mode (set to `1` or `true` to enable)
- `MGREP_RERANK`: Enable reranking of search results (set to `0` or `false` to disable, default: enabled)
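
The boolean-style variables above are documented as accepting `1`/`true` to enable (and, for `MGREP_RERANK`, `0`/`false` to disable). The following is a minimal sketch of that convention for use in a wrapper script. The `is_enabled` helper is hypothetical and not part of `mgrep`, and how `mgrep` itself treats any other value is not documented here, so the sketch simply treats everything else as disabled.

```bash
# Hypothetical helper (not part of mgrep): succeed when a boolean-style
# value is "1" or "true". $2 is the fallback used when $1 is empty.
is_enabled() {
  case "${1:-$2}" in
    1|true) return 0 ;;
    *)      return 1 ;;   # assumption: every other value counts as disabled
  esac
}

# Reranking is documented as enabled by default, dry run as opt-in.
if is_enabled "${MGREP_RERANK:-}" 1; then echo "rerank: on"; else echo "rerank: off"; fi
if is_enabled "${MGREP_DRY_RUN:-}" 0; then echo "dry run: on"; else echo "dry run: off"; fi
```

Passing the default as a second argument keeps the "enabled unless disabled" and "disabled unless enabled" cases in one function.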
-### Sync Options
+#### Sync Options
-- `MGREP_MAX_FILE_SIZE`: Maximum file size in bytes to upload (default: `10485760` / 10MB)
-- `MGREP_MAX_FILE_COUNT`: Maximum number of files to upload (default: `10000`)
+- `MGREP_MAX_FILE_SIZE`: Maximum file size in bytes to upload (default: `1048576` / 1MB)
+- `MGREP_MAX_FILE_COUNT`: Maximum number of files to upload (default: `1000`)
**Examples:**
```bash
@@ -289,6 +258,22 @@ mgrep "search query"
Note: Command-line options always override environment variables.
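
As a rough illustration of that precedence for a single option, the sketch below resolves the result limit from the `-m` flag first, then `MGREP_MAX_COUNT`, then the documented default of `10`. The `resolve_max_count` function is hypothetical and not part of `mgrep`; it only mirrors the documented order and leaves out config files.

```bash
# Hypothetical helper (not part of mgrep).
# $1: value given on the command line via -m (empty when the flag is absent).
# Falls back to MGREP_MAX_COUNT, then to the documented default of 10.
resolve_max_count() {
  printf '%s\n' "${1:-${MGREP_MAX_COUNT:-10}}"
}

MGREP_MAX_COUNT=25
resolve_max_count ""   # no flag given: the environment variable applies
resolve_max_count 5    # flag given: it overrides the environment variable
```

Nested `${var:-fallback}` expansions read left to right in the same order as the precedence list, which makes the fallback chain easy to audit.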
+### Configuration Tips
+
+- `--store ` lets you isolate workspaces (per repo, per team, per experiment). Stores are created on demand if they do not exist yet.
+- Ignore rules come straight from git, so temp files, build outputs, and vendored deps stay out of your embeddings.
+- `watch` reports progress (`processed / uploaded`) as it scans; leave it running in a terminal tab to keep your store fresh.
+- `search` accepts most `grep`-style switches, and politely ignores anything it cannot support, so existing muscle memory still works.
+
+## How It Works
+
+- Every file is pushed into a Mixedbread Store using the same SDK your apps get.
+- Searches request top-k matches with Mixedbread reranking enabled by default
+ for tighter relevance (can be disabled with `--no-rerank` or
+ `MGREP_RERANK=0`).
+- Results include relative paths plus contextual hints (line ranges for text, page numbers for PDFs, etc.) for a skim-friendly experience.
+- Because stores are cloud-backed, agents and teammates can query the same corpus without re-uploading.
+
## Development
```bash