167 changes: 76 additions & 91 deletions README.md
@@ -18,23 +18,6 @@
</p>
</div>

## Why mgrep?
- Natural-language search that feels as immediate as `grep`.
- Semantic, multilingual & multimodal (audio, video support coming soon!)
- Web search built-in — query the web alongside your local files with `--web`.
- Smooth background indexing via `mgrep watch`, designed to detect and keep up-to-date everything that matters inside any git repository.
- Friendly device-login flow and first-class coding agent integrations.
- Built for agents and humans alike, and **designed to be a helpful tool**, not a restrictive harness: quiet output, thoughtful defaults, and escape hatches everywhere.
- Cuts your agent's token usage by roughly 2x in our 50-task benchmark, at similar or better judged quality.

```bash
# index once
mgrep watch

# then ask your repo things in natural language
mgrep "where do we set up auth?"
```

## Quick Start

1. **Install**
@@ -69,51 +52,30 @@ mgrep "where do we set up auth?"
```
Searches default to the current working directory unless you pass a path.

**Today, `mgrep` works great on:** code, text, PDFs, images.
**Today, `mgrep` works great on:** code, text, PDFs, images.
**Coming soon:** audio & video.

## Using it with Coding Agents

`mgrep` supports assisted installation commands for many agents:
- `mgrep install-claude-code` for Claude Code
- `mgrep install-opencode` for OpenCode
- `mgrep install-codex` for Codex
- `mgrep install-droid` for Factory Droid

These commands sign you in (if needed) and add Mixedbread `mgrep` support to the
agent. After that, just start the agent in your project folder, and that's it.

### More Agents Coming Soon

More agents (Cursor, Windsurf, etc.) are on the way—this section will grow as soon as each integration lands.

## Making your agent smarter

We plugged `mgrep` into Claude Code and ran a benchmark of 50 QA tasks to evaluate the economics of `mgrep` against `grep`.

![mgrep benchmark](public/bench.jpg)

In our 50-task benchmark, `mgrep`+Claude Code used ~2x fewer tokens than grep-based workflows at similar or better judged quality.

`mgrep` finds the relevant snippets in a few semantic queries first, so the model spends its capacity on reasoning instead of scanning through irrelevant code from endless `grep` attempts. You can [try it yourself](http://demo.mgrep.mixedbread.com).

*Note: Win Rate (%) was calculated by using an LLM as a judge.*
## Why mgrep?

## Why we built mgrep
- Natural-language search that feels as immediate as `grep`.
- Semantic, multilingual & multimodal (audio, video support coming soon!)
- Web search built-in — query the web alongside your local files with `--web`.
- Smooth background indexing via `mgrep watch`, designed to detect and keep up-to-date everything that matters inside any git repository.
- Friendly device-login flow and first-class coding agent integrations.
- Built for agents and humans alike, and **designed to be a helpful tool**, not a restrictive harness: quiet output, thoughtful defaults, and escape hatches everywhere.
- Cuts your agent's token usage by roughly 2x in our 50-task benchmark, at similar or better judged quality.

`grep` is an amazing tool. It's lightweight, compatible with just about every machine on the planet, and will reliably surface any potential match within any target folder.

But grep is **from 1973**, and it carries the limitations of its era: you need exact patterns and it slows down considerably in the cases where you need it most, on large codebases.

Worst of all, if you're looking for deeply-buried critical business logic, you cannot describe it: you have to be able to accurately guess what kind of naming patterns would have been used by the previous generations of engineers at your workplace for `grep` to find it. This will often result in watching a coding agent desperately try hundreds of patterns, filling its token window, and your upcoming invoice, with thousands of tokens.

But it doesn't have to be this way. Everything else in our toolkit is increasingly tailored to understand us, and so should our search tools. `mgrep` is our way to bring `grep` to 2025, integrating all of the advances in semantic understanding and code-search, without sacrificing anything that has made `grep` such a useful tool.
Worst of all, if you're looking for deeply-buried critical business logic, you cannot describe it: you have to be able to accurately guess what kind of naming patterns would have been used by the previous generations of engineers at your workplace for `grep` to find it. This will often result in watching a coding agent desperately try hundreds of patterns, filling its token window, and your upcoming invoice, with thousands of tokens.

Under the hood, `mgrep` is powered by [Mixedbread Search](https://www.mixedbread.com/blog/mixedbread-search), our full-featured search solution. It combines state-of-the-art semantic retrieval models with context-aware parsing and optimized inference methods to provide you with a natural language companion to `grep`. We believe both tools belong in your toolkit: use `grep` for exact matches, `mgrep` for semantic understanding and intent.
But it doesn't have to be this way. Everything else in our toolkit is increasingly tailored to understand us, and so should our search tools. `mgrep` is our way to bring `grep` to 2025, integrating all of the advances in semantic understanding and code-search, without sacrificing anything that has made `grep` such a useful tool.

Under the hood, `mgrep` is powered by [Mixedbread Search](https://www.mixedbread.com/blog/mixedbread-search), our full-featured search solution. It combines state-of-the-art semantic retrieval models with context-aware parsing and optimized inference methods to provide you with a natural language companion to `grep`.

## When to use what
### When to use what

We designed `mgrep` to complement `grep`, not replace it. The best code search combines `mgrep` with `grep`.

@@ -122,23 +84,35 @@ We designed `mgrep` to complement `grep`, not replace it. The best code search c
| **Exact Matches** | **Intent Search** |
| Symbol tracing, Refactoring, Regex | Code exploration, Feature discovery, Onboarding |
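The split above can be sketched as a tiny dispatcher. This is a hypothetical heuristic, not part of `mgrep`: the `pick_search_tool` name and the "contains a space" rule are assumptions made purely for illustration.

```bash
# Hypothetical heuristic (not part of mgrep): treat a query containing spaces
# as a natural-language question, and anything else as an exact pattern.
pick_search_tool() {
  case "$1" in
    *" "*) echo "mgrep" ;;
    *)     echo "grep" ;;
  esac
}

pick_search_tool "parseChunk"               # prints: grep
pick_search_tool "How are chunks defined?"  # prints: mgrep
```

A real workflow would likely be less mechanical: an exact symbol lookup goes to `grep`, while a question about intent goes to `mgrep`.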

## Web Search
## Benchmarks

`mgrep` can also search the web alongside your local files. This is useful when
you need to find documentation, tutorials, or answers to programming questions
without leaving your terminal.
We plugged `mgrep` into Claude Code and ran a benchmark of 50 QA tasks to evaluate the economics of `mgrep` against `grep`.

```bash
# Search the web and get a summarized answer
mgrep --web --answer "How do I integrate a JavaScript runtime into Deno?"
![mgrep benchmark](public/bench.jpg)

# Get the URLs of the search results
mgrep --web "best practices for error handling in TypeScript"
```
In our 50-task benchmark, `mgrep`+Claude Code used ~2x fewer tokens than grep-based workflows at similar or better judged quality.

Web search queries the `mixedbread/web` store in addition to your local store, merging results based on relevance. Use `--answer` (or `-a`) to get a concise summary instead of raw results.
`mgrep` finds the relevant snippets in a few semantic queries first, so the model spends its capacity on reasoning instead of scanning through irrelevant code from endless `grep` attempts. You can [try it yourself](http://demo.mgrep.mixedbread.com).

*Note: Win Rate (%) was calculated by using an LLM as a judge.*

## Using it with Coding Agents

`mgrep` supports assisted installation commands for many agents:
- `mgrep install-claude-code` for Claude Code
- `mgrep install-opencode` for OpenCode
- `mgrep install-codex` for Codex
- `mgrep install-droid` for Factory Droid

These commands sign you in (if needed) and add Mixedbread `mgrep` support to the
agent. After that, just start the agent in your project folder, and that's it.

### More Agents Coming Soon

More agents (Cursor, Windsurf, etc.) are on the way—this section will grow as soon as each integration lands.

## Commands at a Glance
## Command Reference

| Command | Purpose |
| --- | --- |
@@ -168,17 +142,32 @@ directory for a pattern.
| `--max-file-count <count>` | Maximum number of files to upload (overrides config) |

All search options can also be configured via environment variables (see
[Environment Variables](#environment-variables) section below).
[Configuration](#configuration) section below).

**Examples:**
```bash
mgrep "What code parsers are available?" # search in the current directory
mgrep "How are chunks defined?" src/models # search in the src/models directory
mgrep -m 10 "What is the maximum number of concurrent workers in the code parser?" # limit the number of results to 10
mgrep -a "What code parsers are available?" # generate an answer to the question based on the results
mgrep --web --answer "How do I integrate a JavaScript runtime into Deno?" # search the web and get a summarized answer
```

#### Web Search

`mgrep` can also search the web alongside your local files. This is useful when
you need to find documentation, tutorials, or answers to programming questions
without leaving your terminal.

```bash
# Search the web and get a summarized answer
mgrep --web --answer "How do I integrate a JavaScript runtime into Deno?"

# Get the URLs of the search results
mgrep --web "best practices for error handling in TypeScript"
```

Web search queries the `mixedbread/web` store in addition to your local store, merging results based on relevance. Use `--answer` (or `-a`) to get a concise summary instead of raw results.

### mgrep watch

`mgrep watch` is used to index the current repository and keep the Mixedbread
@@ -201,15 +190,6 @@ mgrep watch --max-file-size 1048576 # limit uploads to files under 1MB
mgrep watch --max-file-count 5000 # limit uploads to directories with 5000 files or fewer
```
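Because `--max-file-size` takes a value in bytes, it can help to compute the number rather than type it by hand. A minimal sketch, assuming 1 MB means 1048576 bytes as in the example above; the `mb_to_bytes` helper is hypothetical and not part of `mgrep`.

```bash
# Hypothetical helper (not part of mgrep): convert whole megabytes to bytes,
# using 1 MB = 1048576 bytes as in the example above.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

limit="$(mb_to_bytes 5)"
echo "$limit"

# With mgrep installed, the computed value can be passed to the documented flag:
#   mgrep watch --max-file-size "$limit"
```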

## Mixedbread under the hood

- Every file is pushed into a Mixedbread Store using the same SDK your apps get.
- Searches request top-k matches with Mixedbread reranking enabled by default
for tighter relevance (can be disabled with `--no-rerank` or
`MGREP_RERANK=0`).
- Results include relative paths plus contextual hints (line ranges for text, page numbers for PDFs, etc.) for a skim-friendly experience.
- Because stores are cloud-backed, agents and teammates can query the same corpus without re-uploading.

## Configuration

mgrep can be configured via config files, environment variables, or CLI flags.
@@ -233,25 +213,14 @@ maxFileCount: 5000
4. Global config file (`~/.config/mgrep/config.yaml`)
5. Default values

### Configuration Tips

- `--store <name>` lets you isolate workspaces (per repo, per team, per experiment). Stores are created on demand if they do not exist yet.
- Ignore rules come straight from git, so temp files, build outputs, and vendored deps stay out of your embeddings.
- `watch` reports progress (`processed / uploaded`) as it scans; leave it running in a terminal tab to keep your store fresh.
- `search` accepts most `grep`-style switches, and politely ignores anything it cannot support, so existing muscle memory still works.

## Environment Variables

All search options can be configured via environment variables, which is
especially useful for CI/CD pipelines or when you want to set defaults for all
searches.
### Environment Variables

### Authentication & Store
#### Authentication & Store

- `MXBAI_API_KEY`: Set this to authenticate without browser login (ideal for CI/CD)
- `MXBAI_STORE`: Override the default store name (default: `mgrep`)
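For CI/CD use, a script might check for the key before running any search. The sketch below is an assumption about how a pipeline could be wired, not something `mgrep` ships: `require_api_key` is a hypothetical helper name.

```bash
# Hypothetical CI guard (not part of mgrep): fail fast when the key is missing
# rather than relying on the browser login flow.
require_api_key() {
  if [ -z "${MXBAI_API_KEY:-}" ]; then
    echo "MXBAI_API_KEY is not set" >&2
    return 1
  fi
}

# In a CI script this would come before any search, for example:
#   require_api_key && mgrep "where do we set up auth?"
```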

### Search Options
#### Search Options

- `MGREP_MAX_COUNT`: Maximum number of results to return (default: `10`)
- `MGREP_CONTENT`: Show content of the results (set to `1` or `true` to enable)
@@ -261,10 +230,10 @@ searches.
- `MGREP_DRY_RUN`: Enable dry run mode (set to `1` or `true` to enable)
- `MGREP_RERANK`: Enable reranking of search results (set to `0` or `false` to disable, default: enabled)

### Sync Options
#### Sync Options

- `MGREP_MAX_FILE_SIZE`: Maximum file size in bytes to upload (default: `10485760` / 10MB)
- `MGREP_MAX_FILE_COUNT`: Maximum number of files to upload (default: `10000`)
- `MGREP_MAX_FILE_SIZE`: Maximum file size in bytes to upload (default: 1MB)
- `MGREP_MAX_FILE_COUNT`: Maximum number of files to upload (default: `1000`)

**Examples:**
```bash
@@ -289,6 +258,22 @@ mgrep "search query"

Note: Command-line options always override environment variables.
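The override rule can be pictured with a small shell function. This is an illustration of the documented behavior only, not `mgrep`'s actual implementation: `resolve_max_count` is a hypothetical name, and the fallback of `10` is the documented default for `MGREP_MAX_COUNT`.

```bash
# Hypothetical illustration of the documented rule, not mgrep's actual code:
# a CLI value wins over MGREP_MAX_COUNT, which wins over the default of 10.
resolve_max_count() {
  cli_value="${1:-}"
  if [ -n "$cli_value" ]; then
    echo "$cli_value"
  elif [ -n "${MGREP_MAX_COUNT:-}" ]; then
    echo "$MGREP_MAX_COUNT"
  else
    echo 10
  fi
}
```

Under this rule, running `MGREP_MAX_COUNT=5 mgrep -m 20 "search query"` would be expected to use the limit of 20 from the flag.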

### Configuration Tips

- `--store <name>` lets you isolate workspaces (per repo, per team, per experiment). Stores are created on demand if they do not exist yet.
- Ignore rules come straight from git, so temp files, build outputs, and vendored deps stay out of your embeddings.
- `watch` reports progress (`processed / uploaded`) as it scans; leave it running in a terminal tab to keep your store fresh.
- `search` accepts most `grep`-style switches, and politely ignores anything it cannot support, so existing muscle memory still works.

## How It Works

- Every file is pushed into a Mixedbread Store using the same SDK your apps get.
- Searches request top-k matches with Mixedbread reranking enabled by default
for tighter relevance (can be disabled with `--no-rerank` or
`MGREP_RERANK=0`).
- Results include relative paths plus contextual hints (line ranges for text, page numbers for PDFs, etc.) for a skim-friendly experience.
- Because stores are cloud-backed, agents and teammates can query the same corpus without re-uploading.

## Development

```bash