diff --git a/README.md b/README.md
index 9f5c5a8..045e0ec 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,17 @@
 
 A search engine for Lean 4 declarations. This project provides tools and resources for exploring the Lean 4 ecosystem.
 
-**For full documentation, please visit: [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs)**
+The current indexed projects include:
+
+* Batteries
+* CSLib
+* FLT (Fermat's Last Theorem)
+* FormalConjectures
+* Init
+* Lean
+* Mathlib
+* PhysLean
+* Std
 
 ## Installation
 
@@ -46,19 +56,20 @@ lean-explore data fetch
 lean-explore mcp serve --backend local
 ```
 
-The current indexed projects include:
+## Documentation
 
-* Batteries
-* CSLib
-* FLT (Fermat's Last Theorem)
-* FormalConjectures
-* Init
-* Lean
-* Mathlib
-* PhysLean
-* Std
+Full docs live in the [`docs/`](docs/README.md) folder, or at [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs).
 
-This code is distributed under an Apache License (see [LICENSE](LICENSE)).
+| Page | Description |
+|---|---|
+| [Getting Started](docs/getting-started.md) | Install and run your first search. |
+| [CLI Reference](docs/cli.md) | Every `lean-explore` command and flag. |
+| [MCP Server](docs/mcp-server.md) | Wire LeanExplore into Claude, Cursor, or any MCP client. |
+| [API Client](docs/api-client.md) | Use `ApiClient` from Python. |
+| [Local Search Backend](docs/local-backend.md) | How hybrid BM25 + FAISS + reranking works. |
+| [Configuration](docs/configuration.md) | Environment variables and data layout. |
+| [Data Models](docs/data-models.md) | `SearchResult`, `SearchResponse`, and related types. |
+| [Extraction Pipeline](docs/extraction-pipeline.md) | Rebuild the dataset from Lean source (contributors). |
 
 ## Contributing
 
@@ -82,3 +93,7 @@ Justin Asher. (2025). *LeanExplore: A search engine for Lean 4 declarations*. [h
   url = {https://arxiv.org/abs/2506.11085}
 }
 ```
+
+## License
+
+This code is distributed under an Apache License (see [LICENSE](LICENSE)).
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..e9386e5
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,44 @@
+# LeanExplore Documentation
+
+LeanExplore is a search engine for Lean 4 declarations (theorems, definitions,
+lemmas, instances, and more) from Mathlib and other major Lean packages. It
+provides hybrid retrieval that matches both by declaration name (e.g.,
+`List.map`, `Nat.Prime`) and by informal natural-language meaning (e.g.,
+"continuous function on a compact set").
+
+This documentation covers how to install, configure, and use LeanExplore from
+the CLI, as a Python library, and as an MCP server for AI assistants.
+
+## Contents
+
+| Page | Description |
+|---|---|
+| [Getting Started](./getting-started.md) | Install the package, fetch data, and run your first search. |
+| [CLI Reference](./cli.md) | Every `lean-explore` command, flag, and option, with examples. |
+| [MCP Server](./mcp-server.md) | Run the MCP server and connect it to Claude, Cursor, and other MCP clients. Full reference for every tool exposed. |
+| [API Client](./api-client.md) | Use `ApiClient` to query the remote LeanExplore API from Python. |
+| [Local Search Backend](./local-backend.md) | How the on-device BM25 + FAISS + cross-encoder pipeline works. |
+| [Configuration](./configuration.md) | Environment variables, cache paths, and data layout on disk. |
+| [Data Models](./data-models.md) | `SearchResult`, `SearchResponse`, and related types returned from search. |
+| [Extraction Pipeline](./extraction-pipeline.md) | Rebuild the dataset from Lean source (contributor-focused). |
+
+## Which backend should I use?
+
+LeanExplore has two backends and you pick one per task:
+
+| | **Remote API** | **Local backend** |
+|---|---|---|
+| Install | `pip install lean-explore` | `pip install lean-explore[local]` |
+| Requires | API key | ~1 GB of data + a few GB of model weights |
+| Network | Required per query | Only for initial data fetch |
+| Use when | You want zero setup | You want offline, private, or tunable search |
+
+If you don't know yet, start with the remote API. It is the fastest path to a
+working search. You can switch to local later without changing your code.
+
+## Links
+
+- Website: <https://www.leanexplore.com>
+- Repository: <https://github.com/justincasher/lean-explore>
+- Paper: <https://arxiv.org/abs/2506.11085>
+- Contributing: [../CONTRIBUTING.md](../CONTRIBUTING.md)
diff --git a/docs/api-client.md b/docs/api-client.md
new file mode 100644
index 0000000..ad596d9
--- /dev/null
+++ b/docs/api-client.md
@@ -0,0 +1,136 @@
+# API Client
+
+`lean_explore.api.ApiClient` is an async HTTP client for the hosted
+LeanExplore API. It ships with the base package: no PyTorch, no local
+indices, no data download required.
+
+## Install and authenticate
+
+```bash
+pip install lean-explore
+```
+
+Get an API key from <https://www.leanexplore.com> and set it as an
+environment variable:
+
+```bash
+export LEANEXPLORE_API_KEY="your-key-here"
+```
+
+Or pass it explicitly to the client constructor.
+
+## Quick start
+
+```python
+import asyncio
+from lean_explore.api import ApiClient
+
+async def main():
+    client = ApiClient()  # reads LEANEXPLORE_API_KEY
+
+    response = await client.search("prime number divisibility", limit=5)
+    for result in response.results:
+        print(f"{result.name}  ({result.module})")
+
+asyncio.run(main())
+```
+
+The `search` and `get_by_id` methods are both `async`, so call them from
+inside an async function or via `asyncio.run`.
+
+## `ApiClient`
+
+```python
+ApiClient(api_key: str | None = None, timeout: float = 10.0)
+```
+
+| Parameter | Default | Description |
+|---|---|---|
+| `api_key` | `None` | API key. Falls back to `LEANEXPLORE_API_KEY` env var. Raises `ValueError` if neither is provided. |
+| `timeout` | `10.0` | HTTP timeout in seconds for every request. |
+
+The client hits `https://www.leanexplore.com/api/v2` by default.
+
+### `search()`
+
+```python
+async def search(
+    query: str,
+    limit: int = 20,
+    rerank_top: int | None = None,
+    packages: list[str] | None = None,
+) -> SearchResponse
+```
+
+| Parameter | Default | Description |
+|---|---|---|
+| `query` | *required* | Declaration name or natural-language text. |
+| `limit` | `20` | Maximum results to return. |
+| `rerank_top` | `None` | Ignored by the API backend (reranking is server-side). Accepted for interface parity with the local `Service`. |
+| `packages` | `None` | Filter by package, e.g., `["Mathlib", "Std"]`. |
+
+Returns a [`SearchResponse`](./data-models.md#searchresponse) with `query`,
+`results`, `count`, and `processing_time_ms`.
+
+Raises:
+
+- `httpx.HTTPStatusError`: the API returned a non-2xx status.
+- `httpx.RequestError`: network, DNS, or timeout failure.
+
+### `get_by_id()`
+
+```python
+async def get_by_id(declaration_id: int) -> SearchResult | None
+```
+
+Fetches a single declaration by its numeric id (ids come from `search`
+results). Returns a [`SearchResult`](./data-models.md#searchresult) on hit,
+or `None` on a 404. Other non-2xx statuses raise
+`httpx.HTTPStatusError`.
+
+## Full example
+
+```python
+import asyncio
+from lean_explore.api import ApiClient
+
+async def main():
+    client = ApiClient(api_key="sk-...", timeout=15.0)
+
+    response = await client.search(
+        query="continuous function on a compact set",
+        limit=10,
+        packages=["Mathlib"],
+    )
+    print(f"{response.count} results in {response.processing_time_ms} ms")
+
+    # Grab the first result's full source
+    if response.results:
+        first = response.results[0]
+        print(f"\n{first.name}")
+        print(first.source_link)
+        print(first.source_text)
+
+        # Round-trip by id
+        same = await client.get_by_id(first.id)
+        assert same is not None and same.id == first.id
+
+asyncio.run(main())
+```
+
+## Notes
+
+- The client creates a fresh `httpx.AsyncClient` per call. If you make many
+  rapid requests, consider wrapping them in `asyncio.gather(...)`; they run
+  concurrently without extra setup.
+- Package names follow the casing shown in results (e.g., `Mathlib`, `Std`,
+  `PhysLean`).
+- For offline use or custom tuning, see [Local Search Backend](./local-backend.md)
+  and use `lean_explore.search.Service` instead.
+
+## See also
+
+- [Data Models](./data-models.md): field reference for `SearchResult` and
+  `SearchResponse`.
+- [Configuration](./configuration.md): environment variables including
+  `LEANEXPLORE_API_KEY`.
diff --git a/docs/cli.md b/docs/cli.md
new file mode 100644
index 0000000..f511f0e
--- /dev/null
+++ b/docs/cli.md
@@ -0,0 +1,164 @@
+# CLI Reference
+
+The `lean-explore` command is installed alongside the Python package. It is
+organized as a small command tree:
+
+```
+lean-explore
+├── search               # Query Lean declarations
+├── mcp
+│   └── serve            # Run the MCP server (api or local backend)
+└── data
+    ├── fetch            # Download the data toolchain
+    └── clean            # Delete all cached data
+```
+
+Run `lean-explore --help` or `lean-explore <command> --help` for built-in
+help at any time.
+
+## `lean-explore search`
+
+Search the Lean Explore index and print results to your terminal.
+
+```bash
+lean-explore search QUERY [OPTIONS]
+```
+
+### Arguments
+
+| Argument | Description |
+|---|---|
+| `QUERY` | The search query. Can be a declaration name (`List.map`, `Nat.Prime`) or natural-language text (`"continuous function on a compact set"`). Required. |
+
+### Options
+
+| Flag | Default | Description |
+|---|---|---|
+| `--limit`, `-n` | `5` | Number of results to display. |
+| `--package`, `-p` | (all) | Filter by package. Repeatable: `-p Mathlib -p Std`. |
+
+### Requirements
+
+`lean-explore search` uses the remote API. You must have `LEANEXPLORE_API_KEY`
+set in your environment:
+
+```bash
+export LEANEXPLORE_API_KEY="your-key-here"
+```
+
+### Examples
+
+```bash
+lean-explore search "List.map"
+lean-explore search "prime number divisibility" --limit 10
+lean-explore search "fundamental theorem of calculus" -p Mathlib
+lean-explore search "continuous" -p Mathlib -p Std -n 20
+```
+
+## `lean-explore mcp serve`
+
+Launch the Model Context Protocol server so an MCP client (Claude, Cursor,
+etc.) can call LeanExplore's search tools. The server speaks MCP over stdio,
+so you normally do not run it manually; your MCP client launches it. Running
+it directly is mostly useful for debugging.
+
+```bash
+lean-explore mcp serve [OPTIONS]
+```
+
+### Options
+
+| Flag | Default | Description |
+|---|---|---|
+| `--backend`, `-b` | `api` | Backend to use: `api` or `local`. |
+| `--api-key` | (none) | API key for the `api` backend. Overrides `LEANEXPLORE_API_KEY`. |
+
+### Backends
+
+- **`api`**: Delegates every query to the hosted LeanExplore API. Requires an
+  API key (via env var or `--api-key`).
+- **`local`**: Runs the full hybrid search pipeline on-device. Requires
+  `pip install lean-explore[local]` and `lean-explore data fetch`.
+
+### Examples
+
+```bash
+# Remote API (most users)
+lean-explore mcp serve --backend api
+
+# Remote API with an inline key
+lean-explore mcp serve --backend api --api-key sk-...
+
+# Local, fully offline backend
+lean-explore mcp serve --backend local
+```
+
+For MCP tool schemas and client configuration, see
+[MCP Server](./mcp-server.md).
+
+## `lean-explore data fetch`
+
+Download the prebuilt data toolchain used by the local search backend. This
+pulls the SQLite database, the FAISS semantic index, and the BM25 indices from
+remote storage into `~/.lean_explore/cache/<version>/`.
+
+```bash
+lean-explore data fetch [OPTIONS]
+```
+
+### Options
+
+| Flag | Default | Description |
+|---|---|---|
+| `--version`, `-v` | latest | Specific version to install (e.g., `20260127_103630`). |
+
+### Behavior
+
+1. Fetches the current version tag (or uses the one you passed).
+2. Downloads all required files to `~/.lean_explore/cache/<version>/`:
+   - `lean_explore.db`
+   - `informalization_faiss.index` and `informalization_faiss_ids_map.json`
+   - `bm25_ids_map.json`
+   - `bm25_name_raw/` and `bm25_name_spaced/` directories
+3. Writes `~/.lean_explore/active_version` so future commands resolve the
+   correct cache directory.
+4. Removes any older cached versions to reclaim disk space.
+
+Already-present files are skipped, so re-running `fetch` after an interrupted
+download will resume rather than start over.
+
+### Examples
+
+```bash
+# Install the latest version
+lean-explore data fetch
+
+# Pin a specific version
+lean-explore data fetch --version 20260127_103630
+```
+
+## `lean-explore data clean`
+
+Remove every cached data toolchain and clear the active-version pointer.
+Useful if you want to reset state or free disk.
+
+```bash
+lean-explore data clean
+```
+
+This command prompts for confirmation before deleting anything. It does not
+touch downloaded model weights; those live under `~/.cache/huggingface/`.
+
+## Exit codes
+
+All CLI commands follow standard conventions:
+
+- `0`: success
+- non-zero: an error occurred (missing API key, failed download, etc.).
+  An error message is printed to stderr.
+
+## See also
+
+- [Getting Started](./getting-started.md)
+- [MCP Server](./mcp-server.md)
+- [Configuration](./configuration.md)
diff --git a/docs/configuration.md b/docs/configuration.md
new file mode 100644
index 0000000..2acf6f7
--- /dev/null
+++ b/docs/configuration.md
@@ -0,0 +1,143 @@
+# Configuration
+
+This page lists every environment variable LeanExplore reads, where files
+live on disk, and how to override defaults. Configuration is centralized in
+`src/lean_explore/config.py`.
+
+## Environment variables
+
+### Authentication
+
+| Variable | Default | Used by |
+|---|---|---|
+| `LEANEXPLORE_API_KEY` | (required for API use) | `ApiClient`, `lean-explore search`, `lean-explore mcp serve --backend api` |
+
+### Paths
+
+| Variable | Default | Purpose |
+|---|---|---|
+| `LEAN_EXPLORE_CACHE_DIR` | `~/.lean_explore/cache` | Where `lean-explore data fetch` downloads data. Read by the local backend. |
+| `LEAN_EXPLORE_DATA_DIR` | `<repo-root>/data` | Local extraction output directory. Only relevant if you run the extraction pipeline yourself. |
+| `LEAN_EXPLORE_PACKAGES_ROOT` | `<repo-root>/lean` | Root for per-package Lean workspaces. Only relevant for extraction. |
+| `LEAN_EXPLORE_VERSION` | contents of `~/.lean_explore/active_version`, or `v4.24.0` | Pin a specific cached version. |
+
+### Local backend tuning
+
+| Variable | Default | Purpose |
+|---|---|---|
+| `LEAN_EXPLORE_EMBEDDING_BATCH_SIZE` | `8` | Batch size used when embedding the query with the sentence-transformer. Raise on GPUs with more VRAM. |
+| `LEAN_EXPLORE_RERANKER_BATCH_SIZE` | `16` on CUDA, `32` on CPU | Batch size for the cross-encoder reranker. |
+
+### Extraction pipeline (only for contributors rebuilding the index)
+
+| Variable | Default | Purpose |
+|---|---|---|
+| `OPENROUTER_API_KEY` | (none) | Used by the informalization step to call OpenRouter. |
+
+## On-disk layout
+
+### Cache directory
+
+Populated by `lean-explore data fetch`:
+
+```
+~/.lean_explore/
+├── active_version                       # Text file: currently active version
+└── cache/
+    └── <VERSION>/                       # e.g. 20260127_103630
+        ├── lean_explore.db              # SQLite database
+        ├── informalization_faiss.index  # FAISS semantic index
+        ├── informalization_faiss_ids_map.json
+        ├── bm25_ids_map.json
+        ├── bm25_name_raw/               # BM25 index (raw tokenization)
+        │   ├── data.csc.index.npy
+        │   ├── indices.csc.index.npy
+        │   ├── indptr.csc.index.npy
+        │   ├── nonoccurrence_array.index.npy
+        │   ├── params.index.json
+        │   └── vocab.index.json
+        └── bm25_name_spaced/            # BM25 index (spaced tokenization)
+            └── (same six files)
+```
+
+Change this location with `LEAN_EXPLORE_CACHE_DIR`.
+
+### Model cache
+
+First-run local-backend models are cached under `~/.cache/huggingface/` by
+`sentence-transformers` and `transformers`. This is controlled by
+Hugging Face, not LeanExplore. See `HF_HOME` / `TRANSFORMERS_CACHE` if you
+need to move it.
+
+## Programmatic access
+
+All of the above resolves through the `Config` class:
+
+```python
+from lean_explore.config import Config
+
+Config.CACHE_DIRECTORY      # ~/.lean_explore/cache
+Config.ACTIVE_VERSION       # current version string
+Config.ACTIVE_CACHE_PATH    # ~/.lean_explore/cache/<version>
+Config.DATABASE_PATH        # .../lean_explore.db
+Config.DATABASE_URL         # sqlite+aiosqlite:///.../lean_explore.db
+Config.FAISS_INDEX_PATH
+Config.FAISS_IDS_MAP_PATH
+Config.BM25_RAW_PATH
+Config.BM25_SPACED_PATH
+Config.BM25_IDS_MAP_PATH
+Config.API_BASE_URL         # https://www.leanexplore.com/api/v2
+Config.R2_ASSETS_BASE_URL   # data download base URL
+```
+
+`ACTIVE_VERSION` is resolved, in order:
+
+1. `LEAN_EXPLORE_VERSION` environment variable, if set.
+2. Contents of `~/.lean_explore/active_version` (written by `data fetch`).
+3. Fallback: `v4.24.0`.
+
+## Common recipes
+
+**Move the cache to a shared drive.**
+
+```bash
+export LEAN_EXPLORE_CACHE_DIR=/mnt/shared/lean-explore-cache
+lean-explore data fetch
+```
+
+**Pin a specific data version across a team.**
+
+```bash
+export LEAN_EXPLORE_VERSION=20260127_103630
+```
+
+**Tune for a beefier GPU.**
+
+```bash
+export LEAN_EXPLORE_EMBEDDING_BATCH_SIZE=64
+export LEAN_EXPLORE_RERANKER_BATCH_SIZE=128
+lean-explore mcp serve --backend local
+```
+
+**Run the MCP server under a non-default config in Claude Desktop.**
+
+```json
+{
+  "mcpServers": {
+    "lean-explore": {
+      "command": "lean-explore",
+      "args": ["mcp", "serve", "--backend", "local"],
+      "env": {
+        "LEAN_EXPLORE_CACHE_DIR": "/Users/me/lean-cache",
+        "LEAN_EXPLORE_EMBEDDING_BATCH_SIZE": "32"
+      }
+    }
+  }
+}
+```
+
+## See also
+
+- [Getting Started](./getting-started.md)
+- [Local Search Backend](./local-backend.md)
+- [MCP Server](./mcp-server.md)
diff --git a/docs/data-models.md b/docs/data-models.md
new file mode 100644
index 0000000..9d7d340
--- /dev/null
+++ b/docs/data-models.md
@@ -0,0 +1,101 @@
+# Data Models
+
+All public search results are Pydantic v2 models, defined in
+`lean_explore.models`. Both the remote `ApiClient` and the local `Service`
+return the same types, so code written against one works against the other.
+
+## Imports
+
+```python
+from lean_explore.models import (
+    SearchResult,
+    SearchResponse,
+    SearchResultSummary,
+    SearchSummaryResponse,
+    extract_bold_description,
+)
+```
+
+## `SearchResult`
+
+A single Lean declaration returned from a search.
+
+| Field | Type | Description |
+|---|---|---|
+| `id` | `int` | Primary key identifier. |
+| `name` | `str` | Fully qualified Lean name (e.g., `Nat.add`). |
+| `module` | `str` | Module path (e.g., `Mathlib.Data.List.Basic`). |
+| `docstring` | `str \| None` | Doc comment from the Lean source. May be `None`. |
+| `source_text` | `str` | Full Lean source code for the declaration. |
+| `source_link` | `str` | GitHub URL to the source. |
+| `dependencies` | `str \| None` | JSON-encoded array of declaration names this depends on. Parse with `json.loads` if you need the list. |
+| `informalization` | `str \| None` | AI-generated natural-language description. Typically starts with a bold header: `**Title.** Rest...`. |
+
+Example:
+
+```python
+for result in response.results:
+    print(result.name)
+    print("  module:", result.module)
+    print("  link:  ", result.source_link)
+    if result.docstring:
+        print("  doc:   ", result.docstring.splitlines()[0])
+```
+
+## `SearchResponse`
+
+Envelope returned by `ApiClient.search()` and `Service.search()`.
+
+| Field | Type | Description |
+|---|---|---|
+| `query` | `str` | The original query string. |
+| `results` | `list[SearchResult]` | Hits, ordered best-first. |
+| `count` | `int` | `len(results)`. |
+| `processing_time_ms` | `int \| None` | Server/engine latency in milliseconds. |
+
+## `SearchResultSummary`
+
+Slim form used by the MCP `search_summary` tool. Contains only enough to let
+a caller decide which ids to drill into.
+
+| Field | Type | Description |
+|---|---|---|
+| `id` | `int` | Declaration id (feed into per-field getters). |
+| `name` | `str` | Fully qualified Lean name. |
+| `description` | `str \| None` | The bold header extracted from the informalization. |
+
+## `SearchSummaryResponse`
+
+Envelope for slim results.
+
+| Field | Type | Description |
+|---|---|---|
+| `query` | `str` | The original query string. |
+| `results` | `list[SearchResultSummary]` | Slim hits, ordered best-first. |
+| `count` | `int` | `len(results)`. |
+| `processing_time_ms` | `int \| None` | Latency in milliseconds. |
+
+## `extract_bold_description`
+
+```python
+extract_bold_description(informalization: str | None) -> str | None
+```
+
+Pulls the `**Bold Title.**` prefix out of an informalization. Returns `None`
+if the input is `None` or does not start with a bold header. This is the
+helper used to build `SearchResultSummary.description`.
+
+## ORM model (advanced)
+
+If you are working directly against the SQLite database (e.g., in the
+extraction pipeline), `lean_explore.models.Declaration` is the SQLAlchemy
+ORM model that backs the `declarations` table. Most users should stick to
+the Pydantic models above.
+
+## See also
+
+- [API Client](./api-client.md): returns `SearchResponse` / `SearchResult`.
+- [Local Search Backend](./local-backend.md): same return types via
+  `Service`.
+- [MCP Server](./mcp-server.md): tool return payloads serialize these
+  models to JSON.
diff --git a/docs/extraction-pipeline.md b/docs/extraction-pipeline.md
new file mode 100644
index 0000000..ca022d0
--- /dev/null
+++ b/docs/extraction-pipeline.md
@@ -0,0 +1,182 @@
+# Extraction Pipeline
+
+The extraction pipeline is how LeanExplore *builds* the dataset that the
+search engine queries. For everyday use you do not run it; `lean-explore
+data fetch` downloads a prebuilt version. This page is for contributors who
+want to regenerate the data (e.g., against a newer Mathlib) or run a custom
+build locally.
+
+The pipeline lives in `lean_explore.extract` and is invoked as a Python
+module:
+
+```bash
+python -m lean_explore.extract [OPTIONS]
+```
+
+## What the pipeline does
+
+Four stages, run in order:
+
+1. **doc-gen4** (optional, `--run-doc-gen4`): runs Lake + doc-gen4 against
+   each package workspace in `lean/<package>/` to produce structured
+   declaration data (BMP files). This is the slow step: it builds Lean
+   packages from source.
+2. **Parse** (`--parse-docs`): reads the doc-gen4 output and writes one
+   row per declaration into a fresh SQLite database under a new
+   `YYYYMMDD_HHMMSS` directory in `LEAN_EXPLORE_DATA_DIR`.
+3. **Informalize** (`--informalize`): calls an LLM (through OpenRouter) to
+   generate a natural-language description for each declaration.
+4. **Embed** (`--embeddings`): runs each informalization through the
+   sentence-transformer to produce vector embeddings.
+5. **Index** (`--index`): builds the FAISS semantic index and the two BM25
+   indices over the declaration names.
+
+Running with no flags runs all stages. Passing any stage flag switches to
+"only run the stages I asked for" mode, which is how you resume after a
+failure, or iterate on a single step.
+
+## Prerequisites
+
+### Python install
+
+```bash
+pip install lean-explore[extract]
+```
+
+This adds PyTorch, `sentence-transformers`, and `networkx` on top of the
+base dependencies.
+
+### Lean toolchain
+
+You only need this if you plan to run `--run-doc-gen4`. Install
+[`elan`](https://github.com/leanprover/elan) (the Lean version manager),
+which provides `lake`. The pipeline fetches the correct Lean toolchain per
+package from GitHub.
+
+### Environment variables
+
+| Variable | Required for | Purpose |
+|---|---|---|
+| `OPENROUTER_API_KEY` | informalize stage | LLM calls go through [OpenRouter](https://openrouter.ai). |
+| `LEAN_EXPLORE_DATA_DIR` | optional | Where extractions are written. Default: `<repo-root>/data`. |
+| `LEAN_EXPLORE_PACKAGES_ROOT` | optional | Root for per-package Lean workspaces. Default: `<repo-root>/lean`. |
+| `LEAN_EXPLORE_EMBEDDING_BATCH_SIZE` | optional | Overrides the embedding batch size. |
+
+See [Configuration](./configuration.md) for the full list.
+
+## Usage
+
+### Full rebuild from scratch
+
+```bash
+export OPENROUTER_API_KEY=sk-or-...
+python -m lean_explore.extract --run-doc-gen4 --fresh
+```
+
+- `--run-doc-gen4` runs the Lake/doc-gen4 step for every registered package.
+- `--fresh` clears cached Lake dependencies, forcing the latest compatible
+  versions to be resolved. Use this for nightly-style updates.
+
+Expect this to take a while (hours on a cold machine), dominated by the
+Lake build.
+
+### Re-run a single stage on the latest extraction
+
+Each stage flag accepts `--<stage>/--no-<stage>`. After a fresh `parse-docs`
+creates a new timestamped directory, subsequent stages target the most
+recent extraction automatically.
+
+```bash
+# Just rebuild embeddings + indices against the latest extraction
+python -m lean_explore.extract --no-parse-docs --no-informalize --embeddings --index
+
+# Only rebuild the FAISS/BM25 indices
+python -m lean_explore.extract --index
+```
+
+### Smoke-test on a small slice
+
+```bash
+python -m lean_explore.extract \
+  --informalize-limit 50 \
+  --embedding-limit 50
+```
+
+## Selected flags
+
+Run `python -m lean_explore.extract --help` for the full list. The most
+commonly used flags:
+
+| Flag | Default | Purpose |
+|---|---|---|
+| `--run-doc-gen4` | off | Regenerate Lean documentation with Lake + doc-gen4. |
+| `--fresh` | off | Clear Lake caches before building. |
+| `--parse-docs / --no-parse-docs` | on* | Parse doc-gen4 output into the database. |
+| `--informalize / --no-informalize` | on* | Generate natural-language descriptions. |
+| `--embeddings / --no-embeddings` | on* | Generate embeddings. |
+| `--index / --no-index` | on* | Build FAISS + BM25 indices. |
+| `--informalize-model` | `google/gemini-3-flash-preview` | OpenRouter model id. |
+| `--informalize-max-concurrent` | `10` | Parallel informalization requests. |
+| `--informalize-limit` | none | Cap informalization count (for testing). |
+| `--embedding-model` | `Qwen/Qwen3-Embedding-0.6B` | Sentence-transformer model. |
+| `--embedding-batch-size` | `250` | Lower this if you hit OOM. |
+| `--embedding-max-seq-length` | `512` | Lower this to reduce memory. |
+| `--embedding-server-url` | none | Delegate embeddings to a separate process (e.g., `http://localhost:5001`) to keep GPU memory free elsewhere. |
+| `--verbose` | off | Verbose logging. |
+
+*When you pass any stage flag explicitly, stages you did not mention default
+to `off`.
+
+## Output layout
+
+Every run writes to a new timestamped directory under
+`LEAN_EXPLORE_DATA_DIR`:
+
+```
+<LEAN_EXPLORE_DATA_DIR>/
+└── 20260127_103630/
+    ├── lean_explore.db
+    ├── informalization_faiss.index
+    ├── informalization_faiss_ids_map.json
+    ├── bm25_ids_map.json
+    ├── bm25_name_raw/
+    └── bm25_name_spaced/
+```
+
+This is the same layout that `lean-explore data fetch` populates into the
+cache directory. To make the local backend use your extraction instead of a
+downloaded toolchain, construct a `SearchEngine` with `use_local_data=True`:
+
+```python
+from lean_explore.search import SearchEngine, Service
+
+engine = SearchEngine(use_local_data=True)
+service = Service(engine=engine)
+```
+
+`SearchEngine` with `use_local_data=True` resolves the latest complete
+extraction directory automatically.
+
+## Packages extracted
+
+The registry lives in `src/lean_explore/extract/package_registry.py`:
+
+- `mathlib`: also supplies `Batteries`, `Init`, `Lean`, `Std` from its
+  transitive dependencies.
+- `physlean`
+- `flt`
+- `formal-conjectures`
+- `cslib`
+
+Each package entry specifies its git URL, version strategy (`LATEST` or
+`TAGGED`), and dependencies. Adding a new package means adding a
+`PackageConfig` to that registry and creating a corresponding workspace
+under `lean/<name>/`.
+
+## See also
+
+- [Configuration](./configuration.md): environment variables and cache/data
+  paths.
+- [Local Search Backend](./local-backend.md): how to consume your
+  extraction output with `SearchEngine`.
+- [CONTRIBUTING.md](../CONTRIBUTING.md): general contributor guide.
diff --git a/docs/getting-started.md b/docs/getting-started.md
new file mode 100644
index 0000000..28f3827
--- /dev/null
+++ b/docs/getting-started.md
@@ -0,0 +1,123 @@
+# Getting Started
+
+This guide takes you from zero to your first search in under five minutes.
+LeanExplore has two paths:
+
+1. **Remote API**: a small pure-Python install that talks to the hosted
+   search service. Best for getting started.
+2. **Local backend**: runs the full search pipeline on your machine. Larger
+   install, no network calls after the initial data fetch.
+
+## Requirements
+
+- Python 3.10 or newer (3.11, 3.12 also supported)
+- `pip` (or `uv`, `poetry`, etc.)
+- For the local backend: roughly 1 GB of disk for data files and a few GB for
+  the first-run model download (embedding + reranker models from Hugging Face)
+
+## Option 1: Remote API (recommended to start)
+
+### 1. Install
+
+```bash
+pip install lean-explore
+```
+
+This installs the CLI, the `ApiClient`, and the MCP server, roughly 50 MB
+of pure-Python and C-extension dependencies. No PyTorch.
+
+### 2. Get an API key
+
+Sign up and generate a key at <https://www.leanexplore.com>. Then export it:
+
+```bash
+export LEANEXPLORE_API_KEY="your-key-here"
+```
+
+You can also add it to your shell profile (`~/.zshrc`, `~/.bashrc`) so it
+persists between sessions.
+
+### 3. Run a search
+
+```bash
+lean-explore search "prime number divisibility"
+lean-explore search "List.map" --limit 10
+lean-explore search "fundamental theorem of arithmetic" --package Mathlib
+```
+
+The first argument is the query. It can be a Lean declaration name, a partial
+name, or a natural-language description. The search engine handles both at
+once; you don't need to pick a mode.
+
+### 4. (Optional) Run the MCP server
+
+If you want to give Claude, Cursor, or another MCP client access to
+LeanExplore:
+
+```bash
+lean-explore mcp serve --backend api
+```
+
+See [MCP Server](./mcp-server.md) for client configuration.
+
+## Option 2: Local backend
+
+Use this when you need offline search, want more control over parameters, or
+want to avoid per-query network latency.
+
+### 1. Install with the `[local]` extra
+
+```bash
+pip install lean-explore[local]
+```
+
+This adds PyTorch and `sentence-transformers`. Expect roughly 1–2 GB of
+Python dependencies.
+
+### 2. Fetch the data
+
+```bash
+lean-explore data fetch
+```
+
+This downloads the latest prebuilt database, FAISS index, and BM25 indices to
+`~/.lean_explore/cache/<version>/`. The total download is on the order of
+1 GB.
+
+### 3. Run the MCP server with the local backend
+
+```bash
+lean-explore mcp serve --backend local
+```
+
+The first run will also download the embedding and reranker models from
+Hugging Face (`Qwen/Qwen3-Embedding-0.6B` and `Qwen/Qwen3-Reranker-0.6B`),
+caching them under `~/.cache/huggingface/`. Subsequent runs are fast.
+
+The `lean-explore search` CLI currently uses the remote API. For programmatic
+local search, see [Local Search Backend](./local-backend.md) and use
+`SearchEngine` / `Service` directly.
+
+## What's indexed
+
+LeanExplore currently covers:
+
+- Batteries
+- CSLib
+- FLT (Fermat's Last Theorem)
+- FormalConjectures
+- Init
+- Lean (core)
+- Mathlib
+- PhysLean
+- Std
+
+You can filter any search to a subset with `--package` (CLI) or the
+`packages=[...]` argument (API/MCP).
+
+## Next steps
+
+- [CLI Reference](./cli.md): every command and flag.
+- [MCP Server](./mcp-server.md): wire LeanExplore into Claude or Cursor.
+- [API Client](./api-client.md): use `ApiClient` from Python.
+- [Configuration](./configuration.md): environment variables and data layout.
diff --git a/docs/local-backend.md b/docs/local-backend.md
new file mode 100644
index 0000000..f8c9387
--- /dev/null
+++ b/docs/local-backend.md
@@ -0,0 +1,158 @@
+# Local Search Backend
+
+The local backend runs LeanExplore's full hybrid retrieval pipeline on your
+machine. Once the data is fetched, no network calls are needed; queries
+execute entirely locally.
+
+This page explains:
+
+- [When to use it](#when-to-use-it)
+- [Installation and data setup](#installation-and-data-setup)
+- [How hybrid retrieval works](#how-hybrid-retrieval-works)
+- [Using it from Python](#using-it-from-python)
+
+## When to use it
+
+- **Offline use**: no network per query after the initial data fetch.
+- **Privacy**: queries never leave your machine.
+- **Tuning**: you can adjust candidate pool sizes and rerank depth.
+- **Integration**: embed `SearchEngine` or `Service` directly in your
+  application or data pipeline.
+
+For quick exploration or low-setup use, prefer the
+[remote API client](./api-client.md) instead.
+
+## Installation and data setup
+
+```bash
+pip install lean-explore[local]
+lean-explore data fetch
+```
+
+This adds PyTorch and `sentence-transformers`, then downloads the prebuilt
+database and indices into `~/.lean_explore/cache/<version>/`. The first
+search will additionally download the embedding model
+(`Qwen/Qwen3-Embedding-0.6B`) and reranker model
+(`Qwen/Qwen3-Reranker-0.6B`) from Hugging Face, cached under
+`~/.cache/huggingface/`.
+
+GPU is optional but used automatically when available.
+
+## How hybrid retrieval works
+
+A query flows through five stages:
+
+1. **BM25 lexical search on names.** Two BM25 indices score the query against
+   declaration names: one using raw tokenization (entire dotted name as a
+   single token), one using spaced tokenization (splitting on dots,
+   underscores, and camelCase). This catches both exact and fuzzy name
+   matches (e.g., `list.map` → `List.map`).
+2. **FAISS semantic search on informalizations.** Each declaration has an
+   AI-generated natural-language description (its *informalization*). The
+   query is embedded with the Qwen embedding model and the top FAISS
+   neighbours are retrieved. This catches "what does this thing mean" style
+   queries.
+3. **Reciprocal Rank Fusion.** The candidate lists are merged by summing
+   `1/rank` across the BM25 and FAISS results, producing a single fused
+   ranking.
+4. **Dependency boosting.** Candidates that are depended on by top-ranked
+   results get a small score boost, surfacing foundational lemmas.
+5. **Cross-encoder reranking.** The top N candidates (default 25–50) are
+   rescored pairwise with the Qwen reranker, which looks at the query and
+   candidate text together. The final top results are returned.
+
+Auto-generated declarations (e.g., `.mk` constructors) are filtered from the
+final list.
+
+Switches available on every search call:
+
+- `limit`: how many results to return (default 20 to 50).
+- `rerank_top`: how deep to rerank. Larger is slower but more precise.
+  Setting it to `0` or `None` skips reranking entirely.
+- `packages`: restrict to a subset of packages.
+
+## Using it from Python
+
+Two layers are available:
+
+- **`Service`**: the convenient wrapper. Returns a `SearchResponse` with
+  timing metadata.
+- **`SearchEngine`**: the lower-level engine. Returns a plain
+  `list[SearchResult]` and exposes all retrieval knobs.
+
+### `Service`
+
+```python
+import asyncio
+from lean_explore.search import Service
+
+async def main():
+    service = Service()  # spins up a default SearchEngine
+
+    response = await service.search(
+        query="continuous function on a compact set",
+        limit=10,
+        rerank_top=50,
+        packages=["Mathlib"],
+    )
+    for result in response.results:
+        print(result.name, "-", result.module)
+
+    # Look up one declaration by id
+    first = response.results[0]
+    same = await service.get_by_id(first.id)
+
+asyncio.run(main())
+```
+
+### `SearchEngine`
+
+```python
+import asyncio
+from lean_explore.search import SearchEngine
+
+async def main():
+    engine = SearchEngine()
+
+    results = await engine.search(
+        query="List.map",
+        limit=25,
+        faiss_k=1000,     # how many FAISS candidates to pull
+        bm25_k=1000,      # how many BM25 candidates to pull
+        rerank_top=25,    # depth of cross-encoder reranking
+        packages=["Mathlib", "Std"],
+    )
+    for result in results:
+        print(result.name)
+
+asyncio.run(main())
+```
+
+Key constructor options (all optional; sensible defaults):
+
+| Parameter | Default | Purpose |
+|---|---|---|
+| `db_url` | configured URL | SQLAlchemy database URL. |
+| `embedding_model_name` | `Qwen/Qwen3-Embedding-0.6B` | Sentence-transformer model for queries. |
+| `reranker_model_name` | `Qwen/Qwen3-Reranker-0.6B` | Cross-encoder used for final reranking. |
+| `use_local_data` | `False` | `True` reads from `DATA_DIRECTORY` (extraction output). `False` reads from `CACHE_DIRECTORY` (downloaded data). |
+
+For the full set of parameters, see the class docstring in
+`src/lean_explore/search/engine.py`.
+
+## Tuning tips
+
+- **Small queries, fast responses**: set `rerank_top=0` to skip the
+  cross-encoder. You lose some precision but gain significant speed.
+- **Recall-heavy queries**: bump `faiss_k` and `bm25_k` to widen the
+  candidate pool before reranking.
+- **Memory-constrained machines**: lower `LEAN_EXPLORE_EMBEDDING_BATCH_SIZE`
+  and `LEAN_EXPLORE_RERANKER_BATCH_SIZE`. See [Configuration](./configuration.md).
+
+## See also
+
+- [Data Models](./data-models.md) for the shape of `SearchResult`.
+- [Configuration](./configuration.md) for env vars that control batch sizes
+  and cache paths.
+- [MCP Server](./mcp-server.md): the local backend powers
+  `lean-explore mcp serve --backend local`.
diff --git a/docs/mcp-server.md b/docs/mcp-server.md
new file mode 100644
index 0000000..799350f
--- /dev/null
+++ b/docs/mcp-server.md
@@ -0,0 +1,183 @@
+# MCP Server
+
+LeanExplore ships with a [Model Context Protocol](https://modelcontextprotocol.io)
+server that exposes Lean search as a set of tools. Any MCP-compatible client
+(Claude, Cursor, Zed, Continue, etc.) can launch it and let its model search
+Mathlib and other Lean packages during a conversation.
+
+This page covers:
+
+- [Running the server](#running-the-server)
+- [Connecting from MCP clients](#connecting-from-mcp-clients)
+- [The tools](#the-tools) and their schemas
+- [Recommended agent workflow](#recommended-agent-workflow)
+
+## Running the server
+
+The server speaks MCP over stdio; your client launches it as a subprocess.
+You rarely invoke it directly except for debugging.
+
+### Remote API backend (default)
+
+```bash
+lean-explore mcp serve --backend api
+```
+
+Requires `LEANEXPLORE_API_KEY` in the environment, or pass `--api-key`:
+
+```bash
+lean-explore mcp serve --backend api --api-key sk-...
+```
+
+### Local backend
+
+```bash
+lean-explore mcp serve --backend local
+```
+
+Requires `pip install lean-explore[local]` and a prior
+`lean-explore data fetch`. The first run also downloads the embedding and
+reranker models from Hugging Face.
+
+If required data files are missing, the server exits immediately with a
+message pointing at `~/.lean_explore/cache/<version>/`.
+
+## Connecting from MCP clients
+
+### Claude Desktop / Claude Code
+
+Add an entry to your MCP settings file (location depends on client; Claude
+Desktop uses `~/Library/Application Support/Claude/claude_desktop_config.json`
+on macOS):
+
+```json
+{
+  "mcpServers": {
+    "lean-explore": {
+      "command": "lean-explore",
+      "args": ["mcp", "serve", "--backend", "api"],
+      "env": {
+        "LEANEXPLORE_API_KEY": "your-key-here"
+      }
+    }
+  }
+}
+```
+
+For the local backend, swap `"api"` for `"local"` and drop the `env` block:
+
+```json
+{
+  "mcpServers": {
+    "lean-explore": {
+      "command": "lean-explore",
+      "args": ["mcp", "serve", "--backend", "local"]
+    }
+  }
+}
+```
+
+### Cursor, Zed, Continue, and others
+
+Any client that accepts a command + args will work. Point it at the
+`lean-explore` executable with the same arguments shown above.
+
+### Troubleshooting
+
+- **"API key required"**: set `LEANEXPLORE_API_KEY` in the `env` block (for
+  MCP clients that support it) or pass `--api-key` in `args`.
+- **"Essential data files for the local backend are missing"**: run
+  `lean-explore data fetch` first.
+- **Tools do not appear in the client**: check the client's MCP logs. The
+  server logs to stderr; increasing verbosity with
+  `python -m lean_explore.mcp.server --backend local --log-level DEBUG`
+  helps diagnose startup issues.
+
+## The tools
+
+The server registers eight tools. IDs returned from `search` or
+`search_summary` can be passed to the per-field getters to fetch exactly the
+field you need, which keeps token usage low.
+
+### `search_summary`: the preferred starting point
+
+Returns only `id`, `name`, and a short description per hit. Use this first,
+then fetch details for the handful of entries you care about.
+
+**Parameters**
+
+| Name | Type | Default | Description |
+|---|---|---|---|
+| `query` | string | *required* | Declaration name or natural-language text. |
+| `limit` | int | `10` | Maximum results to return. |
+| `rerank_top` | int or null | `50` | Candidates to rerank with the cross-encoder. Set `0` or `null` to skip. Only used by the local backend. |
+| `packages` | list of strings | `null` | Filter by package name (e.g., `["Mathlib", "Std"]`). |
+
+**Returns**
+
+```json
+{
+  "query": "prime number divisibility",
+  "results": [
+    {"id": 12345, "name": "Nat.Prime.dvd_mul", "description": "Divisibility of a product by a prime"}
+  ],
+  "count": 1,
+  "processing_time_ms": 84
+}
+```
+
+### `search`: full results
+
+Same parameters as `search_summary`, but each result includes every field:
+`id`, `name`, `module`, `docstring`, `source_text`, `source_link`,
+`dependencies`, and `informalization`. Use this when you genuinely need all
+fields at once. For large `limit` values, prefer `search_summary` plus the
+per-field getters.
+
+### Per-field getters
+
+All six take a single parameter `declaration_id` (integer) and return `null`
+if the id does not exist. The id comes from a prior `search` or
+`search_summary` result.
+
+| Tool | Returns |
+|---|---|
+| `get_source_code` | `{id, name, source_text}`: the Lean source. |
+| `get_source_link` | `{id, name, source_link}`: GitHub URL to the source. |
+| `get_docstring` | `{id, name, docstring}`: the doc comment (may be null). |
+| `get_description` | `{id, name, informalization}`: AI-generated natural-language description. |
+| `get_module` | `{id, name, module}`: module path (e.g., `Mathlib.Data.List.Basic`). |
+| `get_dependencies` | `{id, name, dependencies}`: JSON array of declaration names this depends on. |
+
+## Recommended agent workflow
+
+The server's built-in instructions ask models to follow this pattern:
+
+1. **Browse with `search_summary`** for low token cost.
+2. **Fetch only what you need** with the per-field getters (`get_source_code`,
+   `get_docstring`, `get_description`, `get_module`, `get_dependencies`,
+   `get_source_link`).
+3. **Use `search`** only when you genuinely need all fields for every result
+   at once.
+
+This keeps context short while still giving the model access to full source
+code and dependency chains when it matters.
+
+## Query styles
+
+Both styles work through the same endpoints; the hybrid retriever decides
+internally which signal applies:
+
+- **By name**: `List.map`, `Nat.Prime`, `CategoryTheory.Functor.map`.
+- **By meaning**: `"continuous function on a compact set"`, `"sum of a
+  geometric series"`, `"a group homomorphism preserving multiplication"`.
+
+You do not need to specify which mode you want.
+
+## See also
+
+- [CLI Reference](./cli.md) for `lean-explore mcp serve` options.
+- [Local Search Backend](./local-backend.md) for how hybrid retrieval works
+  under the hood.
+- [Configuration](./configuration.md) for environment variables and cache
+  paths.