From 770d405b1297995989ec1df70cb2316be078aa0d Mon Sep 17 00:00:00 2001 From: justincasher Date: Sun, 19 Apr 2026 10:40:21 -0400 Subject: [PATCH 1/7] Add user-facing documentation Creates a docs/ folder with getting-started, CLI reference, MCP server, API client, local backend, configuration, and data models pages, and links them from the README so contributors and users can find them without relying solely on the hosted docs site. --- README.md | 12 ++- docs/README.md | 48 +++++++++++ docs/api-client.md | 136 +++++++++++++++++++++++++++++ docs/cli.md | 164 +++++++++++++++++++++++++++++++++++ docs/configuration.md | 143 +++++++++++++++++++++++++++++++ docs/data-models.md | 101 ++++++++++++++++++++++ docs/getting-started.md | 123 +++++++++++++++++++++++++++ docs/local-backend.md | 158 ++++++++++++++++++++++++++++++++++ docs/mcp-server.md | 183 ++++++++++++++++++++++++++++++++++++++++ 9 files changed, 1067 insertions(+), 1 deletion(-) create mode 100644 docs/README.md create mode 100644 docs/api-client.md create mode 100644 docs/cli.md create mode 100644 docs/configuration.md create mode 100644 docs/data-models.md create mode 100644 docs/getting-started.md create mode 100644 docs/local-backend.md create mode 100644 docs/mcp-server.md diff --git a/README.md b/README.md index 9f5c5a8..9f568a1 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,17 @@ A search engine for Lean 4 declarations. This project provides tools and resources for exploring the Lean 4 ecosystem. -**For full documentation, please visit: [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs)** +**Documentation:** see the [`docs/`](docs/README.md) folder in this repo, or visit [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs). + +## Documentation + +- [Getting Started](docs/getting-started.md) — install and run your first search. +- [CLI Reference](docs/cli.md) — every `lean-explore` command and flag. +- [MCP Server](docs/mcp-server.md) — wire LeanExplore into Claude, Cursor, or any MCP client. +- [API Client](docs/api-client.md) — use `ApiClient` from Python. +- [Local Search Backend](docs/local-backend.md) — how hybrid BM25 + FAISS + reranking works. +- [Configuration](docs/configuration.md) — environment variables and data layout. +- [Data Models](docs/data-models.md) — `SearchResult`, `SearchResponse`, and related types. ## Installation diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..938dbe0 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,48 @@ +# LeanExplore Documentation + +LeanExplore is a search engine for Lean 4 declarations — theorems, definitions, +lemmas, instances, and more — from Mathlib and other major Lean packages. It +provides hybrid retrieval that matches both by declaration name (e.g., +`List.map`, `Nat.Prime`) and by informal natural-language meaning (e.g., +"continuous function on a compact set"). + +This documentation covers how to install, configure, and use LeanExplore from +the CLI, as a Python library, and as an MCP server for AI assistants. + +## Contents + +- **[Getting Started](./getting-started.md)** — Install the package, fetch + data, and run your first search. +- **[CLI Reference](./cli.md)** — Every `lean-explore` command, flag, and + option, with examples. +- **[MCP Server](./mcp-server.md)** — Run the MCP server and connect it to + Claude, Cursor, and other MCP clients. Full reference for every tool exposed. +- **[API Client](./api-client.md)** — Use `ApiClient` to query the remote + LeanExplore API from Python. +- **[Local Search Backend](./local-backend.md)** — How the on-device + BM25 + FAISS + cross-encoder pipeline works. +- **[Configuration](./configuration.md)** — Environment variables, cache + paths, and data layout on disk. +- **[Data Models](./data-models.md)** — `SearchResult`, `SearchResponse`, + and related types returned from search. + +## Which backend should I use? + +LeanExplore has two backends and you pick one per task: + +| | **Remote API** | **Local backend** | +|---|---|---| +| Install | `pip install lean-explore` | `pip install lean-explore[local]` | +| Requires | API key | ~1 GB of data + a few GB of model weights | +| Network | Required per query | Only for initial data fetch | +| Use when | You want zero setup | You want offline, private, or tunable search | + +If you don't know yet, start with the remote API — it's the fastest path to a +working search. You can switch to local later without changing your code. + +## Links + +- Website: +- Repository: +- Paper: +- Contributing: [../CONTRIBUTING.md](../CONTRIBUTING.md) diff --git a/docs/api-client.md b/docs/api-client.md new file mode 100644 index 0000000..92a27a6 --- /dev/null +++ b/docs/api-client.md @@ -0,0 +1,136 @@ +# API Client + +`lean_explore.api.ApiClient` is an async HTTP client for the hosted +LeanExplore API. It ships with the base package — no PyTorch, no local +indices, no data download required. + +## Install and authenticate + +```bash +pip install lean-explore +``` + +Get an API key from and set it as an +environment variable: + +```bash +export LEANEXPLORE_API_KEY="your-key-here" +``` + +Or pass it explicitly to the client constructor. + +## Quick start + +```python +import asyncio +from lean_explore.api import ApiClient + +async def main(): + client = ApiClient() # reads LEANEXPLORE_API_KEY + + response = await client.search("prime number divisibility", limit=5) + for result in response.results: + print(f"{result.name} ({result.module})") + +asyncio.run(main()) +``` + +The `search` and `get_by_id` methods are both `async`, so call them from +inside an async function or via `asyncio.run`. + +## `ApiClient` + +```python +ApiClient(api_key: str | None = None, timeout: float = 10.0) +``` + +| Parameter | Default | Description | +|---|---|---| +| `api_key` | `None` | API key. Falls back to `LEANEXPLORE_API_KEY` env var. Raises `ValueError` if neither is provided. | +| `timeout` | `10.0` | HTTP timeout in seconds for every request. | + +The client hits `https://www.leanexplore.com/api/v2` by default. + +### `search()` + +```python +async def search( + query: str, + limit: int = 20, + rerank_top: int | None = None, + packages: list[str] | None = None, +) -> SearchResponse +``` + +| Parameter | Default | Description | +|---|---|---| +| `query` | *required* | Declaration name or natural-language text. | +| `limit` | `20` | Maximum results to return. | +| `rerank_top` | `None` | Ignored by the API backend (reranking is server-side). Accepted for interface parity with the local `Service`. | +| `packages` | `None` | Filter by package, e.g., `["Mathlib", "Std"]`. | + +Returns a [`SearchResponse`](./data-models.md#searchresponse) with `query`, +`results`, `count`, and `processing_time_ms`. + +Raises: + +- `httpx.HTTPStatusError` — the API returned a non-2xx status. +- `httpx.RequestError` — network, DNS, or timeout failure. + +### `get_by_id()` + +```python +async def get_by_id(declaration_id: int) -> SearchResult | None +``` + +Fetches a single declaration by its numeric id (ids come from `search` +results). Returns a [`SearchResult`](./data-models.md#searchresult) on hit, +or `None` on a 404. Other non-2xx statuses raise +`httpx.HTTPStatusError`. + +## Full example + +```python +import asyncio +from lean_explore.api import ApiClient + +async def main(): + client = ApiClient(api_key="sk-...", timeout=15.0) + + response = await client.search( + query="continuous function on a compact set", + limit=10, + packages=["Mathlib"], + ) + print(f"{response.count} results in {response.processing_time_ms} ms") + + # Grab the first result's full source + if response.results: + first = response.results[0] + print(f"\n{first.name}") + print(first.source_link) + print(first.source_text) + + # Round-trip by id + same = await client.get_by_id(first.id) + assert same is not None and same.id == first.id + +asyncio.run(main()) +``` + +## Notes + +- The client creates a fresh `httpx.AsyncClient` per call. If you make many + rapid requests, consider wrapping them in `asyncio.gather(...)` — they run + concurrently without extra setup. +- Package names follow the casing shown in results (e.g., `Mathlib`, `Std`, + `PhysLean`). +- For offline use or custom tuning, see [Local Search Backend](./local-backend.md) + and use `lean_explore.search.Service` instead. + +## See also + +- [Data Models](./data-models.md) — field reference for `SearchResult` and + `SearchResponse`. +- [Configuration](./configuration.md) — environment variables including + `LEANEXPLORE_API_KEY`. diff --git a/docs/cli.md b/docs/cli.md new file mode 100644 index 0000000..03de9f7 --- /dev/null +++ b/docs/cli.md @@ -0,0 +1,164 @@ +# CLI Reference + +The `lean-explore` command is installed alongside the Python package. It is +organized as a small command tree: + +``` +lean-explore +├── search # Query Lean declarations +├── mcp +│ └── serve # Run the MCP server (api or local backend) +└── data + ├── fetch # Download the data toolchain + └── clean # Delete all cached data +``` + +Run `lean-explore --help` or `lean-explore --help` for built-in +help at any time. + +## `lean-explore search` + +Search the Lean Explore index and print results to your terminal. + +```bash +lean-explore search QUERY [OPTIONS] +``` + +### Arguments + +| Argument | Description | +|---|---| +| `QUERY` | The search query. Can be a declaration name (`List.map`, `Nat.Prime`) or natural-language text (`"continuous function on a compact set"`). Required. | + +### Options + +| Flag | Default | Description | +|---|---|---| +| `--limit`, `-n` | `5` | Number of results to display. | +| `--package`, `-p` | (all) | Filter by package. Repeatable: `-p Mathlib -p Std`. | + +### Requirements + +`lean-explore search` uses the remote API. You must have `LEANEXPLORE_API_KEY` +set in your environment: + +```bash +export LEANEXPLORE_API_KEY="your-key-here" +``` + +### Examples + +```bash +lean-explore search "List.map" +lean-explore search "prime number divisibility" --limit 10 +lean-explore search "fundamental theorem of calculus" -p Mathlib +lean-explore search "continuous" -p Mathlib -p Std -n 20 +``` + +## `lean-explore mcp serve` + +Launch the Model Context Protocol server so an MCP client (Claude, Cursor, +etc.) can call LeanExplore's search tools. The server speaks MCP over stdio, +so you normally do not run it manually — your MCP client launches it. Running +it directly is mostly useful for debugging. + +```bash +lean-explore mcp serve [OPTIONS] +``` + +### Options + +| Flag | Default | Description | +|---|---|---| +| `--backend`, `-b` | `api` | Backend to use: `api` or `local`. | +| `--api-key` | — | API key for the `api` backend. Overrides `LEANEXPLORE_API_KEY`. | + +### Backends + +- **`api`** — Delegates every query to the hosted LeanExplore API. Requires an + API key (via env var or `--api-key`). +- **`local`** — Runs the full hybrid search pipeline on-device. Requires + `pip install lean-explore[local]` and `lean-explore data fetch`. + +### Examples + +```bash +# Remote API (most users) +lean-explore mcp serve --backend api + +# Remote API with an inline key +lean-explore mcp serve --backend api --api-key sk-... + +# Local, fully offline backend +lean-explore mcp serve --backend local +``` + +For MCP tool schemas and client configuration, see +[MCP Server](./mcp-server.md). + +## `lean-explore data fetch` + +Download the prebuilt data toolchain used by the local search backend. This +pulls the SQLite database, the FAISS semantic index, and the BM25 indices from +remote storage into `~/.lean_explore/cache//`. + +```bash +lean-explore data fetch [OPTIONS] +``` + +### Options + +| Flag | Default | Description | +|---|---|---| +| `--version`, `-v` | latest | Specific version to install (e.g., `20260127_103630`). | + +### Behavior + +1. Fetches the current version tag (or uses the one you passed). +2. Downloads all required files to `~/.lean_explore/cache//`: + - `lean_explore.db` + - `informalization_faiss.index` and `informalization_faiss_ids_map.json` + - `bm25_ids_map.json` + - `bm25_name_raw/` and `bm25_name_spaced/` directories +3. Writes `~/.lean_explore/active_version` so future commands resolve the + correct cache directory. +4. Removes any older cached versions to reclaim disk space. + +Already-present files are skipped, so re-running `fetch` after an interrupted +download will resume rather than start over. + +### Examples + +```bash +# Install the latest version +lean-explore data fetch + +# Pin a specific version +lean-explore data fetch --version 20260127_103630 +``` + +## `lean-explore data clean` + +Remove every cached data toolchain and clear the active-version pointer. +Useful if you want to reset state or free disk. + +```bash +lean-explore data clean +``` + +This command prompts for confirmation before deleting anything. It does not +touch downloaded model weights — those live under `~/.cache/huggingface/`. + +## Exit codes + +All CLI commands follow standard conventions: + +- `0` — success +- non-zero — an error occurred (missing API key, failed download, etc.). + An error message is printed to stderr. + +## See also + +- [Getting Started](./getting-started.md) +- [MCP Server](./mcp-server.md) +- [Configuration](./configuration.md) diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 0000000..ce6e992 --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,143 @@ +# Configuration + +This page lists every environment variable LeanExplore reads, where files +live on disk, and how to override defaults. Configuration is centralized in +`src/lean_explore/config.py`. + +## Environment variables + +### Authentication + +| Variable | Default | Used by | +|---|---|---| +| `LEANEXPLORE_API_KEY` | — (required for API use) | `ApiClient`, `lean-explore search`, `lean-explore mcp serve --backend api` | + +### Paths + +| Variable | Default | Purpose | +|---|---|---| +| `LEAN_EXPLORE_CACHE_DIR` | `~/.lean_explore/cache` | Where `lean-explore data fetch` downloads data. Read by the local backend. | +| `LEAN_EXPLORE_DATA_DIR` | `/data` | Local extraction output directory. Only relevant if you run the extraction pipeline yourself. | +| `LEAN_EXPLORE_PACKAGES_ROOT` | `/lean` | Root for per-package Lean workspaces. Only relevant for extraction. | +| `LEAN_EXPLORE_VERSION` | contents of `~/.lean_explore/active_version`, or `v4.24.0` | Pin a specific cached version. | + +### Local backend tuning + +| Variable | Default | Purpose | +|---|---|---| +| `LEAN_EXPLORE_EMBEDDING_BATCH_SIZE` | `8` | Batch size used when embedding the query with the sentence-transformer. Raise on GPUs with more VRAM. | +| `LEAN_EXPLORE_RERANKER_BATCH_SIZE` | `16` on CUDA, `32` on CPU | Batch size for the cross-encoder reranker. | + +### Extraction pipeline (only for contributors rebuilding the index) + +| Variable | Default | Purpose | +|---|---|---| +| `OPENROUTER_API_KEY` | — | Used by the informalization step to call OpenRouter. | + +## On-disk layout + +### Cache directory + +Populated by `lean-explore data fetch`: + +``` +~/.lean_explore/ +├── active_version # Text file: currently active version +└── cache/ + └── / # e.g. 20260127_103630 + ├── lean_explore.db # SQLite database + ├── informalization_faiss.index # FAISS semantic index + ├── informalization_faiss_ids_map.json + ├── bm25_ids_map.json + ├── bm25_name_raw/ # BM25 index (raw tokenization) + │ ├── data.csc.index.npy + │ ├── indices.csc.index.npy + │ ├── indptr.csc.index.npy + │ ├── nonoccurrence_array.index.npy + │ ├── params.index.json + │ └── vocab.index.json + └── bm25_name_spaced/ # BM25 index (spaced tokenization) + └── (same six files) +``` + +Change this location with `LEAN_EXPLORE_CACHE_DIR`. + +### Model cache + +First-run local-backend models are cached under `~/.cache/huggingface/` by +`sentence-transformers` and `transformers`. This is controlled by +Hugging Face, not LeanExplore — see `HF_HOME` / `TRANSFORMERS_CACHE` if you +need to move it. + +## Programmatic access + +All of the above resolves through the `Config` class: + +```python +from lean_explore.config import Config + +Config.CACHE_DIRECTORY # ~/.lean_explore/cache +Config.ACTIVE_VERSION # current version string +Config.ACTIVE_CACHE_PATH # ~/.lean_explore/cache/ +Config.DATABASE_PATH # .../lean_explore.db +Config.DATABASE_URL # sqlite+aiosqlite:///.../lean_explore.db +Config.FAISS_INDEX_PATH +Config.FAISS_IDS_MAP_PATH +Config.BM25_RAW_PATH +Config.BM25_SPACED_PATH +Config.BM25_IDS_MAP_PATH +Config.API_BASE_URL # https://www.leanexplore.com/api/v2 +Config.R2_ASSETS_BASE_URL # data download base URL +``` + +`ACTIVE_VERSION` is resolved, in order: + +1. `LEAN_EXPLORE_VERSION` environment variable, if set. +2. Contents of `~/.lean_explore/active_version` (written by `data fetch`). +3. Fallback: `v4.24.0`. + +## Common recipes + +**Move the cache to a shared drive.** + +```bash +export LEAN_EXPLORE_CACHE_DIR=/mnt/shared/lean-explore-cache +lean-explore data fetch +``` + +**Pin a specific data version across a team.** + +```bash +export LEAN_EXPLORE_VERSION=20260127_103630 +``` + +**Tune for a beefier GPU.** + +```bash +export LEAN_EXPLORE_EMBEDDING_BATCH_SIZE=64 +export LEAN_EXPLORE_RERANKER_BATCH_SIZE=128 +lean-explore mcp serve --backend local +``` + +**Run the MCP server under a non-default config in Claude Desktop.** + +```json +{ + "mcpServers": { + "lean-explore": { + "command": "lean-explore", + "args": ["mcp", "serve", "--backend", "local"], + "env": { + "LEAN_EXPLORE_CACHE_DIR": "/Users/me/lean-cache", + "LEAN_EXPLORE_EMBEDDING_BATCH_SIZE": "32" + } + } + } +} +``` + +## See also + +- [Getting Started](./getting-started.md) +- [Local Search Backend](./local-backend.md) +- [MCP Server](./mcp-server.md) diff --git a/docs/data-models.md b/docs/data-models.md new file mode 100644 index 0000000..24856ac --- /dev/null +++ b/docs/data-models.md @@ -0,0 +1,101 @@ +# Data Models + +All public search results are Pydantic v2 models, defined in +`lean_explore.models`. Both the remote `ApiClient` and the local `Service` +return the same types, so code written against one works against the other. + +## Imports + +```python +from lean_explore.models import ( + SearchResult, + SearchResponse, + SearchResultSummary, + SearchSummaryResponse, + extract_bold_description, +) +``` + +## `SearchResult` + +A single Lean declaration returned from a search. + +| Field | Type | Description | +|---|---|---| +| `id` | `int` | Primary key identifier. | +| `name` | `str` | Fully qualified Lean name (e.g., `Nat.add`). | +| `module` | `str` | Module path (e.g., `Mathlib.Data.List.Basic`). | +| `docstring` | `str \| None` | Doc comment from the Lean source. May be `None`. | +| `source_text` | `str` | Full Lean source code for the declaration. | +| `source_link` | `str` | GitHub URL to the source. | +| `dependencies` | `str \| None` | JSON-encoded array of declaration names this depends on. Parse with `json.loads` if you need the list. | +| `informalization` | `str \| None` | AI-generated natural-language description. Typically starts with a bold header: `**Title.** Rest...`. | + +Example: + +```python +for result in response.results: + print(result.name) + print(" module:", result.module) + print(" link: ", result.source_link) + if result.docstring: + print(" doc: ", result.docstring.splitlines()[0]) +``` + +## `SearchResponse` + +Envelope returned by `ApiClient.search()` and `Service.search()`. + +| Field | Type | Description | +|---|---|---| +| `query` | `str` | The original query string. | +| `results` | `list[SearchResult]` | Hits, ordered best-first. | +| `count` | `int` | `len(results)`. | +| `processing_time_ms` | `int \| None` | Server/engine latency in milliseconds. | + +## `SearchResultSummary` + +Slim form used by the MCP `search_summary` tool. Contains only enough to let +a caller decide which ids to drill into. + +| Field | Type | Description | +|---|---|---| +| `id` | `int` | Declaration id (feed into per-field getters). | +| `name` | `str` | Fully qualified Lean name. | +| `description` | `str \| None` | The bold header extracted from the informalization. | + +## `SearchSummaryResponse` + +Envelope for slim results. + +| Field | Type | Description | +|---|---|---| +| `query` | `str` | The original query string. | +| `results` | `list[SearchResultSummary]` | Slim hits, ordered best-first. | +| `count` | `int` | `len(results)`. | +| `processing_time_ms` | `int \| None` | Latency in milliseconds. | + +## `extract_bold_description` + +```python +extract_bold_description(informalization: str | None) -> str | None +``` + +Pulls the `**Bold Title.**` prefix out of an informalization. Returns `None` +if the input is `None` or does not start with a bold header. This is the +helper used to build `SearchResultSummary.description`. + +## ORM model (advanced) + +If you are working directly against the SQLite database (e.g., in the +extraction pipeline), `lean_explore.models.Declaration` is the SQLAlchemy +ORM model that backs the `declarations` table. Most users should stick to +the Pydantic models above. + +## See also + +- [API Client](./api-client.md) — returns `SearchResponse` / `SearchResult`. +- [Local Search Backend](./local-backend.md) — same return types via + `Service`. +- [MCP Server](./mcp-server.md) — tool return payloads serialize these + models to JSON. diff --git a/docs/getting-started.md b/docs/getting-started.md new file mode 100644 index 0000000..71fed18 --- /dev/null +++ b/docs/getting-started.md @@ -0,0 +1,123 @@ +# Getting Started + +This guide takes you from zero to your first search in under five minutes. +LeanExplore has two paths: + +1. **Remote API** — a small pure-Python install that talks to the hosted + search service. Best for getting started. +2. **Local backend** — runs the full search pipeline on your machine. Larger + install, no network calls after the initial data fetch. + +## Requirements + +- Python 3.10 or newer (3.11, 3.12 also supported) +- `pip` (or `uv`, `poetry`, etc.) +- For the local backend: roughly 1 GB of disk for data files and a few GB for + the first-run model download (embedding + reranker models from Hugging Face) + +## Option 1: Remote API (recommended to start) + +### 1. Install + +```bash +pip install lean-explore +``` + +This installs the CLI, the `ApiClient`, and the MCP server — about 50 MB of +pure-Python and C-extension dependencies. No PyTorch. + +### 2. Get an API key + +Sign up and generate a key at . Then export it: + +```bash +export LEANEXPLORE_API_KEY="your-key-here" +``` + +You can also add it to your shell profile (`~/.zshrc`, `~/.bashrc`) so it +persists between sessions. + +### 3. Run a search + +```bash +lean-explore search "prime number divisibility" +lean-explore search "List.map" --limit 10 +lean-explore search "fundamental theorem of arithmetic" --package Mathlib +``` + +The first argument is the query — it can be a Lean declaration name, a partial +name, or a natural-language description. The search engine handles both at +once; you don't need to pick a mode. + +### 4. (Optional) Run the MCP server + +If you want to give Claude, Cursor, or another MCP client access to +LeanExplore: + +```bash +lean-explore mcp serve --backend api +``` + +See [MCP Server](./mcp-server.md) for client configuration. + +## Option 2: Local backend + +Use this when you need offline search, want more control over parameters, or +want to avoid per-query network latency. + +### 1. Install with the `[local]` extra + +```bash +pip install lean-explore[local] +``` + +This adds PyTorch and `sentence-transformers`. Expect roughly 1–2 GB of +Python dependencies. + +### 2. Fetch the data + +```bash +lean-explore data fetch +``` + +This downloads the latest prebuilt database, FAISS index, and BM25 indices to +`~/.lean_explore/cache//`. The total download is on the order of +1 GB. + +### 3. Run the MCP server with the local backend + +```bash +lean-explore mcp serve --backend local +``` + +The first run will also download the embedding and reranker models from +Hugging Face (`Qwen/Qwen3-Embedding-0.6B` and `Qwen/Qwen3-Reranker-0.6B`), +caching them under `~/.cache/huggingface/`. Subsequent runs are fast. + +The `lean-explore search` CLI currently uses the remote API — for programmatic +local search, see [Local Search Backend](./local-backend.md) and use +`SearchEngine` / `Service` directly. + +## What's indexed + +LeanExplore currently covers: + +- Batteries +- CSLib +- FLT (Fermat's Last Theorem) +- FormalConjectures +- Init +- Lean (core) +- Mathlib +- PhysLean +- Std + +You can filter any search to a subset with `--package` (CLI) or the +`packages=[...]` argument (API/MCP). + +## Next steps + +- [CLI Reference](./cli.md) — every command and flag. +- [MCP Server](./mcp-server.md) — wire LeanExplore into Claude or Cursor. +- [API Client](./api-client.md) — use `ApiClient` from Python. +- [Configuration](./configuration.md) — environment variables and data layout. diff --git a/docs/local-backend.md b/docs/local-backend.md new file mode 100644 index 0000000..ece07af --- /dev/null +++ b/docs/local-backend.md @@ -0,0 +1,158 @@ +# Local Search Backend + +The local backend runs LeanExplore's full hybrid retrieval pipeline on your +machine. Once the data is fetched, no network calls are needed — queries +execute entirely locally. + +This page explains: + +- [When to use it](#when-to-use-it) +- [Installation and data setup](#installation-and-data-setup) +- [How hybrid retrieval works](#how-hybrid-retrieval-works) +- [Using it from Python](#using-it-from-python) + +## When to use it + +- **Offline use** — no network per query after the initial data fetch. +- **Privacy** — queries never leave your machine. +- **Tuning** — you can adjust candidate pool sizes and rerank depth. +- **Integration** — embed `SearchEngine` or `Service` directly in your + application or data pipeline. + +For quick exploration or low-setup use, prefer the +[remote API client](./api-client.md) instead. + +## Installation and data setup + +```bash +pip install lean-explore[local] +lean-explore data fetch +``` + +This adds PyTorch and `sentence-transformers`, then downloads the prebuilt +database and indices into `~/.lean_explore/cache//`. The first +search will additionally download the embedding model +(`Qwen/Qwen3-Embedding-0.6B`) and reranker model +(`Qwen/Qwen3-Reranker-0.6B`) from Hugging Face, cached under +`~/.cache/huggingface/`. + +GPU is optional but used automatically when available. + +## How hybrid retrieval works + +A query flows through five stages: + +1. **BM25 lexical search on names.** Two BM25 indices score the query against + declaration names — one using raw tokenization (entire dotted name as a + single token), one using spaced tokenization (splitting on dots, + underscores, and camelCase). This catches both exact and fuzzy name + matches (e.g., `list.map` → `List.map`). +2. **FAISS semantic search on informalizations.** Each declaration has an + AI-generated natural-language description (its *informalization*). The + query is embedded with the Qwen embedding model and the top FAISS + neighbours are retrieved. This catches "what does this thing mean" style + queries. +3. **Reciprocal Rank Fusion.** The candidate lists are merged by summing + `1/rank` across the BM25 and FAISS results, producing a single fused + ranking. +4. **Dependency boosting.** Candidates that are depended on by top-ranked + results get a small score boost, surfacing foundational lemmas. +5. **Cross-encoder reranking.** The top N candidates (default 25–50) are + rescored pairwise with the Qwen reranker, which looks at the query and + candidate text together. The final top results are returned. + +Auto-generated declarations (e.g., `.mk` constructors) are filtered from the +final list. + +Switches available on every search call: + +- `limit` — how many results to return (default 20–50). +- `rerank_top` — how deep to rerank. Larger is slower but more precise. + Setting it to `0` or `None` skips reranking entirely. +- `packages` — restrict to a subset of packages. + +## Using it from Python + +Two layers are available: + +- **`Service`** — the convenient wrapper. Returns a `SearchResponse` with + timing metadata. +- **`SearchEngine`** — the lower-level engine. Returns a plain + `list[SearchResult]` and exposes all retrieval knobs. + +### `Service` + +```python +import asyncio +from lean_explore.search import Service + +async def main(): + service = Service() # spins up a default SearchEngine + + response = await service.search( + query="continuous function on a compact set", + limit=10, + rerank_top=50, + packages=["Mathlib"], + ) + for result in response.results: + print(result.name, "-", result.module) + + # Look up one declaration by id + first = response.results[0] + same = await service.get_by_id(first.id) + +asyncio.run(main()) +``` + +### `SearchEngine` + +```python +import asyncio +from lean_explore.search import SearchEngine + +async def main(): + engine = SearchEngine() + + results = await engine.search( + query="List.map", + limit=25, + faiss_k=1000, # how many FAISS candidates to pull + bm25_k=1000, # how many BM25 candidates to pull + rerank_top=25, # depth of cross-encoder reranking + packages=["Mathlib", "Std"], + ) + for result in results: + print(result.name) + +asyncio.run(main()) +``` + +Key constructor options (all optional; sensible defaults): + +| Parameter | Default | Purpose | +|---|---|---| +| `db_url` | configured URL | SQLAlchemy database URL. | +| `embedding_model_name` | `Qwen/Qwen3-Embedding-0.6B` | Sentence-transformer model for queries. | +| `reranker_model_name` | `Qwen/Qwen3-Reranker-0.6B` | Cross-encoder used for final reranking. | +| `use_local_data` | `False` | `True` reads from `DATA_DIRECTORY` (extraction output). `False` reads from `CACHE_DIRECTORY` (downloaded data). | + +For the full set of parameters, see the class docstring in +`src/lean_explore/search/engine.py`. + +## Tuning tips + +- **Small queries, fast responses** — set `rerank_top=0` to skip the + cross-encoder. You lose some precision but gain significant speed. +- **Recall-heavy queries** — bump `faiss_k` and `bm25_k` to widen the + candidate pool before reranking. +- **Memory-constrained machines** — lower `LEAN_EXPLORE_EMBEDDING_BATCH_SIZE` + and `LEAN_EXPLORE_RERANKER_BATCH_SIZE`. See [Configuration](./configuration.md). + +## See also + +- [Data Models](./data-models.md) for the shape of `SearchResult`. +- [Configuration](./configuration.md) for env vars that control batch sizes + and cache paths. +- [MCP Server](./mcp-server.md) — the local backend powers + `lean-explore mcp serve --backend local`. diff --git a/docs/mcp-server.md b/docs/mcp-server.md new file mode 100644 index 0000000..d1cb04a --- /dev/null +++ b/docs/mcp-server.md @@ -0,0 +1,183 @@ +# MCP Server + +LeanExplore ships with a [Model Context Protocol](https://modelcontextprotocol.io) +server that exposes Lean search as a set of tools. Any MCP-compatible client +(Claude, Cursor, Zed, Continue, etc.) can launch it and let its model search +Mathlib and other Lean packages during a conversation. + +This page covers: + +- [Running the server](#running-the-server) +- [Connecting from MCP clients](#connecting-from-mcp-clients) +- [The tools](#the-tools) and their schemas +- [Recommended agent workflow](#recommended-agent-workflow) + +## Running the server + +The server speaks MCP over stdio — your client launches it as a subprocess. +You rarely invoke it directly except for debugging. + +### Remote API backend (default) + +```bash +lean-explore mcp serve --backend api +``` + +Requires `LEANEXPLORE_API_KEY` in the environment, or pass `--api-key`: + +```bash +lean-explore mcp serve --backend api --api-key sk-... +``` + +### Local backend + +```bash +lean-explore mcp serve --backend local +``` + +Requires `pip install lean-explore[local]` and a prior +`lean-explore data fetch`. The first run also downloads the embedding and +reranker models from Hugging Face. + +If required data files are missing, the server exits immediately with a +message pointing at `~/.lean_explore/cache//`. + +## Connecting from MCP clients + +### Claude Desktop / Claude Code + +Add an entry to your MCP settings file (location depends on client; Claude +Desktop uses `~/Library/Application Support/Claude/claude_desktop_config.json` +on macOS): + +```json +{ + "mcpServers": { + "lean-explore": { + "command": "lean-explore", + "args": ["mcp", "serve", "--backend", "api"], + "env": { + "LEANEXPLORE_API_KEY": "your-key-here" + } + } + } +} +``` + +For the local backend, swap `"api"` for `"local"` and drop the `env` block: + +```json +{ + "mcpServers": { + "lean-explore": { + "command": "lean-explore", + "args": ["mcp", "serve", "--backend", "local"] + } + } +} +``` + +### Cursor, Zed, Continue, and others + +Any client that accepts a command + args will work. Point it at the +`lean-explore` executable with the same arguments shown above. + +### Troubleshooting + +- **"API key required"** — set `LEANEXPLORE_API_KEY` in the `env` block (for + MCP clients that support it) or pass `--api-key` in `args`. +- **"Essential data files for the local backend are missing"** — run + `lean-explore data fetch` first. +- **Tools do not appear in the client** — check the client's MCP logs. The + server logs to stderr; increasing verbosity with + `python -m lean_explore.mcp.server --backend local --log-level DEBUG` + helps diagnose startup issues. + +## The tools + +The server registers eight tools. IDs returned from `search` or +`search_summary` can be passed to the per-field getters to fetch exactly the +field you need, which keeps token usage low. + +### `search_summary` — the preferred starting point + +Returns only `id`, `name`, and a short description per hit. Use this first, +then fetch details for the handful of entries you care about. + +**Parameters** + +| Name | Type | Default | Description | +|---|---|---|---| +| `query` | string | *required* | Declaration name or natural-language text. | +| `limit` | int | `10` | Maximum results to return. | +| `rerank_top` | int or null | `50` | Candidates to rerank with the cross-encoder. Set `0` or `null` to skip. Only used by the local backend. | +| `packages` | list of strings | `null` | Filter by package name (e.g., `["Mathlib", "Std"]`). | + +**Returns** + +```json +{ + "query": "prime number divisibility", + "results": [ + {"id": 12345, "name": "Nat.Prime.dvd_mul", "description": "Divisibility of a product by a prime"} + ], + "count": 1, + "processing_time_ms": 84 +} +``` + +### `search` — full results + +Same parameters as `search_summary`, but each result includes every field: +`id`, `name`, `module`, `docstring`, `source_text`, `source_link`, +`dependencies`, and `informalization`. Use this when you genuinely need all +fields at once. For large `limit` values, prefer `search_summary` plus the +per-field getters. + +### Per-field getters + +All six take a single parameter `declaration_id` (integer) and return `null` +if the id does not exist. The id comes from a prior `search` or +`search_summary` result. + +| Tool | Returns | +|---|---| +| `get_source_code` | `{id, name, source_text}` — the Lean source. | +| `get_source_link` | `{id, name, source_link}` — GitHub URL to the source. | +| `get_docstring` | `{id, name, docstring}` — the doc comment (may be null). | +| `get_description` | `{id, name, informalization}` — AI-generated natural-language description. | +| `get_module` | `{id, name, module}` — module path (e.g., `Mathlib.Data.List.Basic`). | +| `get_dependencies` | `{id, name, dependencies}` — JSON array of declaration names this depends on. | + +## Recommended agent workflow + +The server's built-in instructions ask models to follow this pattern: + +1. **Browse with `search_summary`** for low token cost. +2. **Fetch only what you need** with the per-field getters (`get_source_code`, + `get_docstring`, `get_description`, `get_module`, `get_dependencies`, + `get_source_link`). +3. **Use `search`** only when you genuinely need all fields for every result + at once. + +This keeps context short while still giving the model access to full source +code and dependency chains when it matters. + +## Query styles + +Both styles work through the same endpoints — the hybrid retriever decides +internally which signal applies: + +- **By name** — `List.map`, `Nat.Prime`, `CategoryTheory.Functor.map`. +- **By meaning** — `"continuous function on a compact set"`, `"sum of a + geometric series"`, `"a group homomorphism preserving multiplication"`. + +You do not need to specify which mode you want. + +## See also + +- [CLI Reference](./cli.md) for `lean-explore mcp serve` options. +- [Local Search Backend](./local-backend.md) for how hybrid retrieval works + under the hood. +- [Configuration](./configuration.md) for environment variables and cache + paths. From 4e8a7643141953715a40425ae34cf19412ea8c7d Mon Sep 17 00:00:00 2001 From: justincasher Date: Sun, 19 Apr 2026 10:41:45 -0400 Subject: [PATCH 2/7] Format documentation index as a table Switches the docs list in the root README and in docs/README.md from bullets to a two-column table so the page and its description line up visually. --- README.md | 16 +++++++++------- docs/README.md | 23 +++++++++-------------- 2 files changed, 18 insertions(+), 21 deletions(-) diff --git a/README.md b/README.md index 9f568a1..2780f29 100644 --- a/README.md +++ b/README.md @@ -27,13 +27,15 @@ A search engine for Lean 4 declarations. This project provides tools and resourc ## Documentation -- [Getting Started](docs/getting-started.md) — install and run your first search. -- [CLI Reference](docs/cli.md) — every `lean-explore` command and flag. -- [MCP Server](docs/mcp-server.md) — wire LeanExplore into Claude, Cursor, or any MCP client. -- [API Client](docs/api-client.md) — use `ApiClient` from Python. -- [Local Search Backend](docs/local-backend.md) — how hybrid BM25 + FAISS + reranking works. -- [Configuration](docs/configuration.md) — environment variables and data layout. -- [Data Models](docs/data-models.md) — `SearchResult`, `SearchResponse`, and related types. +| Page | Description | +|---|---| +| [Getting Started](docs/getting-started.md) | Install and run your first search. | +| [CLI Reference](docs/cli.md) | Every `lean-explore` command and flag. | +| [MCP Server](docs/mcp-server.md) | Wire LeanExplore into Claude, Cursor, or any MCP client. | +| [API Client](docs/api-client.md) | Use `ApiClient` from Python. | +| [Local Search Backend](docs/local-backend.md) | How hybrid BM25 + FAISS + reranking works. | +| [Configuration](docs/configuration.md) | Environment variables and data layout. | +| [Data Models](docs/data-models.md) | `SearchResult`, `SearchResponse`, and related types. | ## Installation diff --git a/docs/README.md b/docs/README.md index 938dbe0..dfaf4bb 100644 --- a/docs/README.md +++ b/docs/README.md @@ -11,20 +11,15 @@ the CLI, as a Python library, and as an MCP server for AI assistants. ## Contents -- **[Getting Started](./getting-started.md)** — Install the package, fetch - data, and run your first search. -- **[CLI Reference](./cli.md)** — Every `lean-explore` command, flag, and - option, with examples. -- **[MCP Server](./mcp-server.md)** — Run the MCP server and connect it to - Claude, Cursor, and other MCP clients. Full reference for every tool exposed. -- **[API Client](./api-client.md)** — Use `ApiClient` to query the remote - LeanExplore API from Python. -- **[Local Search Backend](./local-backend.md)** — How the on-device - BM25 + FAISS + cross-encoder pipeline works. -- **[Configuration](./configuration.md)** — Environment variables, cache - paths, and data layout on disk. -- **[Data Models](./data-models.md)** — `SearchResult`, `SearchResponse`, - and related types returned from search. +| Page | Description | +|---|---| +| [Getting Started](./getting-started.md) | Install the package, fetch data, and run your first search. | +| [CLI Reference](./cli.md) | Every `lean-explore` command, flag, and option, with examples. | +| [MCP Server](./mcp-server.md) | Run the MCP server and connect it to Claude, Cursor, and other MCP clients. Full reference for every tool exposed. | +| [API Client](./api-client.md) | Use `ApiClient` to query the remote LeanExplore API from Python. | +| [Local Search Backend](./local-backend.md) | How the on-device BM25 + FAISS + cross-encoder pipeline works. | +| [Configuration](./configuration.md) | Environment variables, cache paths, and data layout on disk. | +| [Data Models](./data-models.md) | `SearchResult`, `SearchResponse`, and related types returned from search. | ## Which backend should I use? From 9b65f33aa639adf817cb2fa2e8564b92f3ad1b68 Mon Sep 17 00:00:00 2001 From: justincasher Date: Sun, 19 Apr 2026 10:43:45 -0400 Subject: [PATCH 3/7] Add extraction pipeline documentation Covers prerequisites (elan/lake, OPENROUTER_API_KEY, the [extract] extra), the four stages (doc-gen4 -> parse -> informalize -> embed -> index), common invocations for full rebuilds and single-stage reruns, the main flags, and the timestamped output layout under LEAN_EXPLORE_DATA_DIR. --- README.md | 1 + docs/README.md | 1 + docs/extraction-pipeline.md | 182 ++++++++++++++++++++++++++++++++++++ 3 files changed, 184 insertions(+) create mode 100644 docs/extraction-pipeline.md diff --git a/README.md b/README.md index 2780f29..7d2a14e 100644 --- a/README.md +++ b/README.md @@ -36,6 +36,7 @@ A search engine for Lean 4 declarations. This project provides tools and resourc | [Local Search Backend](docs/local-backend.md) | How hybrid BM25 + FAISS + reranking works. | | [Configuration](docs/configuration.md) | Environment variables and data layout. | | [Data Models](docs/data-models.md) | `SearchResult`, `SearchResponse`, and related types. | +| [Extraction Pipeline](docs/extraction-pipeline.md) | Rebuild the dataset from Lean source (contributors). | ## Installation diff --git a/docs/README.md b/docs/README.md index dfaf4bb..dbe0db6 100644 --- a/docs/README.md +++ b/docs/README.md @@ -20,6 +20,7 @@ the CLI, as a Python library, and as an MCP server for AI assistants. | [Local Search Backend](./local-backend.md) | How the on-device BM25 + FAISS + cross-encoder pipeline works. | | [Configuration](./configuration.md) | Environment variables, cache paths, and data layout on disk. | | [Data Models](./data-models.md) | `SearchResult`, `SearchResponse`, and related types returned from search. | +| [Extraction Pipeline](./extraction-pipeline.md) | Rebuild the dataset from Lean source — contributor-focused. | ## Which backend should I use? diff --git a/docs/extraction-pipeline.md b/docs/extraction-pipeline.md new file mode 100644 index 0000000..07203e4 --- /dev/null +++ b/docs/extraction-pipeline.md @@ -0,0 +1,182 @@ +# Extraction Pipeline + +The extraction pipeline is how LeanExplore *builds* the dataset that the +search engine queries. For everyday use you do not run it — `lean-explore +data fetch` downloads a prebuilt version. This page is for contributors who +want to regenerate the data (e.g., against a newer Mathlib) or run a custom +build locally. + +The pipeline lives in `lean_explore.extract` and is invoked as a Python +module: + +```bash +python -m lean_explore.extract [OPTIONS] +``` + +## What the pipeline does + +Four stages, run in order: + +1. **doc-gen4** (optional, `--run-doc-gen4`) — runs Lake + doc-gen4 against + each package workspace in `lean//` to produce structured + declaration data (BMP files). This is the slow step: it builds Lean + packages from source. +2. **Parse** (`--parse-docs`) — reads the doc-gen4 output and writes one + row per declaration into a fresh SQLite database under a new + `YYYYMMDD_HHMMSS` directory in `LEAN_EXPLORE_DATA_DIR`. +3. **Informalize** (`--informalize`) — calls an LLM (through OpenRouter) to + generate a natural-language description for each declaration. +4. **Embed** (`--embeddings`) — runs each informalization through the + sentence-transformer to produce vector embeddings. +5. **Index** (`--index`) — builds the FAISS semantic index and the two BM25 + indices over the declaration names. + +Running with no flags runs all stages. Passing any stage flag switches to +"only run the stages I asked for" mode — which is how you resume after a +failure, or iterate on a single step. + +## Prerequisites + +### Python install + +```bash +pip install lean-explore[extract] +``` + +This adds PyTorch, `sentence-transformers`, and `networkx` on top of the +base dependencies. + +### Lean toolchain + +You only need this if you plan to run `--run-doc-gen4`. Install +[`elan`](https://github.com/leanprover/elan) (the Lean version manager), +which provides `lake`. The pipeline fetches the correct Lean toolchain per +package from GitHub. + +### Environment variables + +| Variable | Required for | Purpose | +|---|---|---| +| `OPENROUTER_API_KEY` | informalize stage | LLM calls go through [OpenRouter](https://openrouter.ai). | +| `LEAN_EXPLORE_DATA_DIR` | optional | Where extractions are written. Default: `/data`. | +| `LEAN_EXPLORE_PACKAGES_ROOT` | optional | Root for per-package Lean workspaces. Default: `/lean`. | +| `LEAN_EXPLORE_EMBEDDING_BATCH_SIZE` | optional | Overrides the embedding batch size. | + +See [Configuration](./configuration.md) for the full list. + +## Usage + +### Full rebuild from scratch + +```bash +export OPENROUTER_API_KEY=sk-or-... +python -m lean_explore.extract --run-doc-gen4 --fresh +``` + +- `--run-doc-gen4` runs the Lake/doc-gen4 step for every registered package. +- `--fresh` clears cached Lake dependencies, forcing the latest compatible + versions to be resolved. Use this for nightly-style updates. + +Expect this to take a while — hours on a cold machine — dominated by the +Lake build. + +### Re-run a single stage on the latest extraction + +Each stage flag accepts `--/--no-`. After a fresh `parse-docs` +creates a new timestamped directory, subsequent stages target the most +recent extraction automatically. + +```bash +# Just rebuild embeddings + indices against the latest extraction +python -m lean_explore.extract --no-parse-docs --no-informalize --embeddings --index + +# Only rebuild the FAISS/BM25 indices +python -m lean_explore.extract --index +``` + +### Smoke-test on a small slice + +```bash +python -m lean_explore.extract \ + --informalize-limit 50 \ + --embedding-limit 50 +``` + +## Selected flags + +Run `python -m lean_explore.extract --help` for the full list. The most +commonly used flags: + +| Flag | Default | Purpose | +|---|---|---| +| `--run-doc-gen4` | off | Regenerate Lean documentation with Lake + doc-gen4. | +| `--fresh` | off | Clear Lake caches before building. | +| `--parse-docs / --no-parse-docs` | on* | Parse doc-gen4 output into the database. | +| `--informalize / --no-informalize` | on* | Generate natural-language descriptions. | +| `--embeddings / --no-embeddings` | on* | Generate embeddings. | +| `--index / --no-index` | on* | Build FAISS + BM25 indices. | +| `--informalize-model` | `google/gemini-3-flash-preview` | OpenRouter model id. | +| `--informalize-max-concurrent` | `10` | Parallel informalization requests. | +| `--informalize-limit` | none | Cap informalization count (for testing). | +| `--embedding-model` | `Qwen/Qwen3-Embedding-0.6B` | Sentence-transformer model. | +| `--embedding-batch-size` | `250` | Lower this if you hit OOM. | +| `--embedding-max-seq-length` | `512` | Lower this to reduce memory. | +| `--embedding-server-url` | none | Delegate embeddings to a separate process (e.g., `http://localhost:5001`) to keep GPU memory free elsewhere. | +| `--verbose` | off | Verbose logging. | + +*When you pass any stage flag explicitly, stages you did not mention default +to `off`. + +## Output layout + +Every run writes to a new timestamped directory under +`LEAN_EXPLORE_DATA_DIR`: + +``` +/ +└── 20260127_103630/ + ├── lean_explore.db + ├── informalization_faiss.index + ├── informalization_faiss_ids_map.json + ├── bm25_ids_map.json + ├── bm25_name_raw/ + └── bm25_name_spaced/ +``` + +This is the same layout that `lean-explore data fetch` populates into the +cache directory. To make the local backend use your extraction instead of a +downloaded toolchain, construct a `SearchEngine` with `use_local_data=True`: + +```python +from lean_explore.search import SearchEngine, Service + +engine = SearchEngine(use_local_data=True) +service = Service(engine=engine) +``` + +`SearchEngine` with `use_local_data=True` resolves the latest complete +extraction directory automatically. + +## Packages extracted + +The registry lives in `src/lean_explore/extract/package_registry.py`: + +- `mathlib` — also supplies `Batteries`, `Init`, `Lean`, `Std` from its + transitive dependencies. +- `physlean` +- `flt` +- `formal-conjectures` +- `cslib` + +Each package entry specifies its git URL, version strategy (`LATEST` or +`TAGGED`), and dependencies. Adding a new package means adding a +`PackageConfig` to that registry and creating a corresponding workspace +under `lean//`. + +## See also + +- [Configuration](./configuration.md) — environment variables and cache/data + paths. +- [Local Search Backend](./local-backend.md) — how to consume your + extraction output with `SearchEngine`. +- [CONTRIBUTING.md](../CONTRIBUTING.md) — general contributor guide. From 849eedeccec54ae1d3dd0077b6d20afae06f71d4 Mon Sep 17 00:00:00 2001 From: justincasher Date: Sun, 19 Apr 2026 10:50:23 -0400 Subject: [PATCH 4/7] Replace em dashes across the docs Rewrites every em dash in the documentation, using colons for definitional lists, parentheses for asides, and semicolons or sentence breaks where the dash was separating independent clauses. --- docs/README.md | 8 ++++---- docs/api-client.md | 12 ++++++------ docs/cli.md | 14 +++++++------- docs/configuration.md | 6 +++--- docs/data-models.md | 6 +++--- docs/extraction-pipeline.md | 24 ++++++++++++------------ docs/getting-started.md | 20 ++++++++++---------- docs/local-backend.md | 30 +++++++++++++++--------------- docs/mcp-server.md | 30 +++++++++++++++--------------- 9 files changed, 75 insertions(+), 75 deletions(-) diff --git a/docs/README.md b/docs/README.md index dbe0db6..e9386e5 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,7 +1,7 @@ # LeanExplore Documentation -LeanExplore is a search engine for Lean 4 declarations — theorems, definitions, -lemmas, instances, and more — from Mathlib and other major Lean packages. It +LeanExplore is a search engine for Lean 4 declarations (theorems, definitions, +lemmas, instances, and more) from Mathlib and other major Lean packages. It provides hybrid retrieval that matches both by declaration name (e.g., `List.map`, `Nat.Prime`) and by informal natural-language meaning (e.g., "continuous function on a compact set"). @@ -20,7 +20,7 @@ the CLI, as a Python library, and as an MCP server for AI assistants. | [Local Search Backend](./local-backend.md) | How the on-device BM25 + FAISS + cross-encoder pipeline works. | | [Configuration](./configuration.md) | Environment variables, cache paths, and data layout on disk. | | [Data Models](./data-models.md) | `SearchResult`, `SearchResponse`, and related types returned from search. | -| [Extraction Pipeline](./extraction-pipeline.md) | Rebuild the dataset from Lean source — contributor-focused. | +| [Extraction Pipeline](./extraction-pipeline.md) | Rebuild the dataset from Lean source (contributor-focused). | ## Which backend should I use? @@ -33,7 +33,7 @@ LeanExplore has two backends and you pick one per task: | Network | Required per query | Only for initial data fetch | | Use when | You want zero setup | You want offline, private, or tunable search | -If you don't know yet, start with the remote API — it's the fastest path to a +If you don't know yet, start with the remote API. It is the fastest path to a working search. You can switch to local later without changing your code. ## Links diff --git a/docs/api-client.md b/docs/api-client.md index 92a27a6..ad596d9 100644 --- a/docs/api-client.md +++ b/docs/api-client.md @@ -1,7 +1,7 @@ # API Client `lean_explore.api.ApiClient` is an async HTTP client for the hosted -LeanExplore API. It ships with the base package — no PyTorch, no local +LeanExplore API. It ships with the base package: no PyTorch, no local indices, no data download required. ## Install and authenticate @@ -74,8 +74,8 @@ Returns a [`SearchResponse`](./data-models.md#searchresponse) with `query`, Raises: -- `httpx.HTTPStatusError` — the API returned a non-2xx status. -- `httpx.RequestError` — network, DNS, or timeout failure. +- `httpx.HTTPStatusError`: the API returned a non-2xx status. +- `httpx.RequestError`: network, DNS, or timeout failure. ### `get_by_id()` @@ -121,7 +121,7 @@ asyncio.run(main()) ## Notes - The client creates a fresh `httpx.AsyncClient` per call. If you make many - rapid requests, consider wrapping them in `asyncio.gather(...)` — they run + rapid requests, consider wrapping them in `asyncio.gather(...)`; they run concurrently without extra setup. - Package names follow the casing shown in results (e.g., `Mathlib`, `Std`, `PhysLean`). @@ -130,7 +130,7 @@ asyncio.run(main()) ## See also -- [Data Models](./data-models.md) — field reference for `SearchResult` and +- [Data Models](./data-models.md): field reference for `SearchResult` and `SearchResponse`. -- [Configuration](./configuration.md) — environment variables including +- [Configuration](./configuration.md): environment variables including `LEANEXPLORE_API_KEY`. diff --git a/docs/cli.md b/docs/cli.md index 03de9f7..f511f0e 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -59,7 +59,7 @@ lean-explore search "continuous" -p Mathlib -p Std -n 20 Launch the Model Context Protocol server so an MCP client (Claude, Cursor, etc.) can call LeanExplore's search tools. The server speaks MCP over stdio, -so you normally do not run it manually — your MCP client launches it. Running +so you normally do not run it manually; your MCP client launches it. Running it directly is mostly useful for debugging. ```bash @@ -71,13 +71,13 @@ lean-explore mcp serve [OPTIONS] | Flag | Default | Description | |---|---|---| | `--backend`, `-b` | `api` | Backend to use: `api` or `local`. | -| `--api-key` | — | API key for the `api` backend. Overrides `LEANEXPLORE_API_KEY`. | +| `--api-key` | (none) | API key for the `api` backend. Overrides `LEANEXPLORE_API_KEY`. | ### Backends -- **`api`** — Delegates every query to the hosted LeanExplore API. Requires an +- **`api`**: Delegates every query to the hosted LeanExplore API. Requires an API key (via env var or `--api-key`). -- **`local`** — Runs the full hybrid search pipeline on-device. Requires +- **`local`**: Runs the full hybrid search pipeline on-device. Requires `pip install lean-explore[local]` and `lean-explore data fetch`. ### Examples @@ -147,14 +147,14 @@ lean-explore data clean ``` This command prompts for confirmation before deleting anything. It does not -touch downloaded model weights — those live under `~/.cache/huggingface/`. +touch downloaded model weights; those live under `~/.cache/huggingface/`. ## Exit codes All CLI commands follow standard conventions: -- `0` — success -- non-zero — an error occurred (missing API key, failed download, etc.). +- `0`: success +- non-zero: an error occurred (missing API key, failed download, etc.). An error message is printed to stderr. ## See also diff --git a/docs/configuration.md b/docs/configuration.md index ce6e992..2acf6f7 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -10,7 +10,7 @@ live on disk, and how to override defaults. Configuration is centralized in | Variable | Default | Used by | |---|---|---| -| `LEANEXPLORE_API_KEY` | — (required for API use) | `ApiClient`, `lean-explore search`, `lean-explore mcp serve --backend api` | +| `LEANEXPLORE_API_KEY` | (required for API use) | `ApiClient`, `lean-explore search`, `lean-explore mcp serve --backend api` | ### Paths @@ -32,7 +32,7 @@ live on disk, and how to override defaults. Configuration is centralized in | Variable | Default | Purpose | |---|---|---| -| `OPENROUTER_API_KEY` | — | Used by the informalization step to call OpenRouter. | +| `OPENROUTER_API_KEY` | (none) | Used by the informalization step to call OpenRouter. | ## On-disk layout @@ -66,7 +66,7 @@ Change this location with `LEAN_EXPLORE_CACHE_DIR`. First-run local-backend models are cached under `~/.cache/huggingface/` by `sentence-transformers` and `transformers`. This is controlled by -Hugging Face, not LeanExplore — see `HF_HOME` / `TRANSFORMERS_CACHE` if you +Hugging Face, not LeanExplore. See `HF_HOME` / `TRANSFORMERS_CACHE` if you need to move it. ## Programmatic access diff --git a/docs/data-models.md b/docs/data-models.md index 24856ac..9d7d340 100644 --- a/docs/data-models.md +++ b/docs/data-models.md @@ -94,8 +94,8 @@ the Pydantic models above. ## See also -- [API Client](./api-client.md) — returns `SearchResponse` / `SearchResult`. -- [Local Search Backend](./local-backend.md) — same return types via +- [API Client](./api-client.md): returns `SearchResponse` / `SearchResult`. +- [Local Search Backend](./local-backend.md): same return types via `Service`. -- [MCP Server](./mcp-server.md) — tool return payloads serialize these +- [MCP Server](./mcp-server.md): tool return payloads serialize these models to JSON. diff --git a/docs/extraction-pipeline.md b/docs/extraction-pipeline.md index 07203e4..ca022d0 100644 --- a/docs/extraction-pipeline.md +++ b/docs/extraction-pipeline.md @@ -1,7 +1,7 @@ # Extraction Pipeline The extraction pipeline is how LeanExplore *builds* the dataset that the -search engine queries. For everyday use you do not run it — `lean-explore +search engine queries. For everyday use you do not run it; `lean-explore data fetch` downloads a prebuilt version. This page is for contributors who want to regenerate the data (e.g., against a newer Mathlib) or run a custom build locally. @@ -17,22 +17,22 @@ python -m lean_explore.extract [OPTIONS] Four stages, run in order: -1. **doc-gen4** (optional, `--run-doc-gen4`) — runs Lake + doc-gen4 against +1. **doc-gen4** (optional, `--run-doc-gen4`): runs Lake + doc-gen4 against each package workspace in `lean//` to produce structured declaration data (BMP files). This is the slow step: it builds Lean packages from source. -2. **Parse** (`--parse-docs`) — reads the doc-gen4 output and writes one +2. **Parse** (`--parse-docs`): reads the doc-gen4 output and writes one row per declaration into a fresh SQLite database under a new `YYYYMMDD_HHMMSS` directory in `LEAN_EXPLORE_DATA_DIR`. -3. **Informalize** (`--informalize`) — calls an LLM (through OpenRouter) to +3. **Informalize** (`--informalize`): calls an LLM (through OpenRouter) to generate a natural-language description for each declaration. -4. **Embed** (`--embeddings`) — runs each informalization through the +4. **Embed** (`--embeddings`): runs each informalization through the sentence-transformer to produce vector embeddings. -5. **Index** (`--index`) — builds the FAISS semantic index and the two BM25 +5. **Index** (`--index`): builds the FAISS semantic index and the two BM25 indices over the declaration names. Running with no flags runs all stages. Passing any stage flag switches to -"only run the stages I asked for" mode — which is how you resume after a +"only run the stages I asked for" mode, which is how you resume after a failure, or iterate on a single step. ## Prerequisites @@ -77,7 +77,7 @@ python -m lean_explore.extract --run-doc-gen4 --fresh - `--fresh` clears cached Lake dependencies, forcing the latest compatible versions to be resolved. Use this for nightly-style updates. -Expect this to take a while — hours on a cold machine — dominated by the +Expect this to take a while (hours on a cold machine), dominated by the Lake build. ### Re-run a single stage on the latest extraction @@ -161,7 +161,7 @@ extraction directory automatically. The registry lives in `src/lean_explore/extract/package_registry.py`: -- `mathlib` — also supplies `Batteries`, `Init`, `Lean`, `Std` from its +- `mathlib`: also supplies `Batteries`, `Init`, `Lean`, `Std` from its transitive dependencies. - `physlean` - `flt` @@ -175,8 +175,8 @@ under `lean//`. ## See also -- [Configuration](./configuration.md) — environment variables and cache/data +- [Configuration](./configuration.md): environment variables and cache/data paths. -- [Local Search Backend](./local-backend.md) — how to consume your +- [Local Search Backend](./local-backend.md): how to consume your extraction output with `SearchEngine`. -- [CONTRIBUTING.md](../CONTRIBUTING.md) — general contributor guide. +- [CONTRIBUTING.md](../CONTRIBUTING.md): general contributor guide. diff --git a/docs/getting-started.md b/docs/getting-started.md index 71fed18..28f3827 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -3,9 +3,9 @@ This guide takes you from zero to your first search in under five minutes. LeanExplore has two paths: -1. **Remote API** — a small pure-Python install that talks to the hosted +1. **Remote API**: a small pure-Python install that talks to the hosted search service. Best for getting started. -2. **Local backend** — runs the full search pipeline on your machine. Larger +2. **Local backend**: runs the full search pipeline on your machine. Larger install, no network calls after the initial data fetch. ## Requirements @@ -23,8 +23,8 @@ LeanExplore has two paths: pip install lean-explore ``` -This installs the CLI, the `ApiClient`, and the MCP server — about 50 MB of -pure-Python and C-extension dependencies. No PyTorch. +This installs the CLI, the `ApiClient`, and the MCP server, roughly 50 MB +of pure-Python and C-extension dependencies. No PyTorch. ### 2. Get an API key @@ -45,7 +45,7 @@ lean-explore search "List.map" --limit 10 lean-explore search "fundamental theorem of arithmetic" --package Mathlib ``` -The first argument is the query — it can be a Lean declaration name, a partial +The first argument is the query. It can be a Lean declaration name, a partial name, or a natural-language description. The search engine handles both at once; you don't need to pick a mode. @@ -94,7 +94,7 @@ The first run will also download the embedding and reranker models from Hugging Face (`Qwen/Qwen3-Embedding-0.6B` and `Qwen/Qwen3-Reranker-0.6B`), caching them under `~/.cache/huggingface/`. Subsequent runs are fast. -The `lean-explore search` CLI currently uses the remote API — for programmatic +The `lean-explore search` CLI currently uses the remote API. For programmatic local search, see [Local Search Backend](./local-backend.md) and use `SearchEngine` / `Service` directly. @@ -117,7 +117,7 @@ You can filter any search to a subset with `--package` (CLI) or the ## Next steps -- [CLI Reference](./cli.md) — every command and flag. -- [MCP Server](./mcp-server.md) — wire LeanExplore into Claude or Cursor. -- [API Client](./api-client.md) — use `ApiClient` from Python. -- [Configuration](./configuration.md) — environment variables and data layout. +- [CLI Reference](./cli.md): every command and flag. +- [MCP Server](./mcp-server.md): wire LeanExplore into Claude or Cursor. +- [API Client](./api-client.md): use `ApiClient` from Python. +- [Configuration](./configuration.md): environment variables and data layout. diff --git a/docs/local-backend.md b/docs/local-backend.md index ece07af..f8c9387 100644 --- a/docs/local-backend.md +++ b/docs/local-backend.md @@ -1,7 +1,7 @@ # Local Search Backend The local backend runs LeanExplore's full hybrid retrieval pipeline on your -machine. Once the data is fetched, no network calls are needed — queries +machine. Once the data is fetched, no network calls are needed; queries execute entirely locally. This page explains: @@ -13,10 +13,10 @@ This page explains: ## When to use it -- **Offline use** — no network per query after the initial data fetch. -- **Privacy** — queries never leave your machine. -- **Tuning** — you can adjust candidate pool sizes and rerank depth. -- **Integration** — embed `SearchEngine` or `Service` directly in your +- **Offline use**: no network per query after the initial data fetch. +- **Privacy**: queries never leave your machine. +- **Tuning**: you can adjust candidate pool sizes and rerank depth. +- **Integration**: embed `SearchEngine` or `Service` directly in your application or data pipeline. For quick exploration or low-setup use, prefer the @@ -43,7 +43,7 @@ GPU is optional but used automatically when available. A query flows through five stages: 1. **BM25 lexical search on names.** Two BM25 indices score the query against - declaration names — one using raw tokenization (entire dotted name as a + declaration names: one using raw tokenization (entire dotted name as a single token), one using spaced tokenization (splitting on dots, underscores, and camelCase). This catches both exact and fuzzy name matches (e.g., `list.map` → `List.map`). @@ -66,18 +66,18 @@ final list. Switches available on every search call: -- `limit` — how many results to return (default 20–50). -- `rerank_top` — how deep to rerank. Larger is slower but more precise. +- `limit`: how many results to return (default 20 to 50). +- `rerank_top`: how deep to rerank. Larger is slower but more precise. Setting it to `0` or `None` skips reranking entirely. -- `packages` — restrict to a subset of packages. +- `packages`: restrict to a subset of packages. ## Using it from Python Two layers are available: -- **`Service`** — the convenient wrapper. Returns a `SearchResponse` with +- **`Service`**: the convenient wrapper. Returns a `SearchResponse` with timing metadata. -- **`SearchEngine`** — the lower-level engine. Returns a plain +- **`SearchEngine`**: the lower-level engine. Returns a plain `list[SearchResult]` and exposes all retrieval knobs. ### `Service` @@ -142,11 +142,11 @@ For the full set of parameters, see the class docstring in ## Tuning tips -- **Small queries, fast responses** — set `rerank_top=0` to skip the +- **Small queries, fast responses**: set `rerank_top=0` to skip the cross-encoder. You lose some precision but gain significant speed. -- **Recall-heavy queries** — bump `faiss_k` and `bm25_k` to widen the +- **Recall-heavy queries**: bump `faiss_k` and `bm25_k` to widen the candidate pool before reranking. -- **Memory-constrained machines** — lower `LEAN_EXPLORE_EMBEDDING_BATCH_SIZE` +- **Memory-constrained machines**: lower `LEAN_EXPLORE_EMBEDDING_BATCH_SIZE` and `LEAN_EXPLORE_RERANKER_BATCH_SIZE`. See [Configuration](./configuration.md). ## See also @@ -154,5 +154,5 @@ For the full set of parameters, see the class docstring in - [Data Models](./data-models.md) for the shape of `SearchResult`. - [Configuration](./configuration.md) for env vars that control batch sizes and cache paths. -- [MCP Server](./mcp-server.md) — the local backend powers +- [MCP Server](./mcp-server.md): the local backend powers `lean-explore mcp serve --backend local`. diff --git a/docs/mcp-server.md b/docs/mcp-server.md index d1cb04a..799350f 100644 --- a/docs/mcp-server.md +++ b/docs/mcp-server.md @@ -14,7 +14,7 @@ This page covers: ## Running the server -The server speaks MCP over stdio — your client launches it as a subprocess. +The server speaks MCP over stdio; your client launches it as a subprocess. You rarely invoke it directly except for debugging. ### Remote API backend (default) @@ -84,11 +84,11 @@ Any client that accepts a command + args will work. Point it at the ### Troubleshooting -- **"API key required"** — set `LEANEXPLORE_API_KEY` in the `env` block (for +- **"API key required"**: set `LEANEXPLORE_API_KEY` in the `env` block (for MCP clients that support it) or pass `--api-key` in `args`. -- **"Essential data files for the local backend are missing"** — run +- **"Essential data files for the local backend are missing"**: run `lean-explore data fetch` first. -- **Tools do not appear in the client** — check the client's MCP logs. The +- **Tools do not appear in the client**: check the client's MCP logs. The server logs to stderr; increasing verbosity with `python -m lean_explore.mcp.server --backend local --log-level DEBUG` helps diagnose startup issues. @@ -99,7 +99,7 @@ The server registers eight tools. IDs returned from `search` or `search_summary` can be passed to the per-field getters to fetch exactly the field you need, which keeps token usage low. -### `search_summary` — the preferred starting point +### `search_summary`: the preferred starting point Returns only `id`, `name`, and a short description per hit. Use this first, then fetch details for the handful of entries you care about. @@ -126,7 +126,7 @@ then fetch details for the handful of entries you care about. } ``` -### `search` — full results +### `search`: full results Same parameters as `search_summary`, but each result includes every field: `id`, `name`, `module`, `docstring`, `source_text`, `source_link`, @@ -142,12 +142,12 @@ if the id does not exist. The id comes from a prior `search` or | Tool | Returns | |---|---| -| `get_source_code` | `{id, name, source_text}` — the Lean source. | -| `get_source_link` | `{id, name, source_link}` — GitHub URL to the source. | -| `get_docstring` | `{id, name, docstring}` — the doc comment (may be null). | -| `get_description` | `{id, name, informalization}` — AI-generated natural-language description. | -| `get_module` | `{id, name, module}` — module path (e.g., `Mathlib.Data.List.Basic`). | -| `get_dependencies` | `{id, name, dependencies}` — JSON array of declaration names this depends on. | +| `get_source_code` | `{id, name, source_text}`: the Lean source. | +| `get_source_link` | `{id, name, source_link}`: GitHub URL to the source. | +| `get_docstring` | `{id, name, docstring}`: the doc comment (may be null). | +| `get_description` | `{id, name, informalization}`: AI-generated natural-language description. | +| `get_module` | `{id, name, module}`: module path (e.g., `Mathlib.Data.List.Basic`). | +| `get_dependencies` | `{id, name, dependencies}`: JSON array of declaration names this depends on. | ## Recommended agent workflow @@ -165,11 +165,11 @@ code and dependency chains when it matters. ## Query styles -Both styles work through the same endpoints — the hybrid retriever decides +Both styles work through the same endpoints; the hybrid retriever decides internally which signal applies: -- **By name** — `List.map`, `Nat.Prime`, `CategoryTheory.Functor.map`. -- **By meaning** — `"continuous function on a compact set"`, `"sum of a +- **By name**: `List.map`, `Nat.Prime`, `CategoryTheory.Functor.map`. +- **By meaning**: `"continuous function on a compact set"`, `"sum of a geometric series"`, `"a group homomorphism preserving multiplication"`. You do not need to specify which mode you want. From bf6a6c2e54c249f115380aa8003f73f6ab8f4c7d Mon Sep 17 00:00:00 2001 From: justincasher Date: Sun, 19 Apr 2026 10:51:19 -0400 Subject: [PATCH 5/7] Move documentation section below installation Readers need install instructions first; the docs table now follows installation so it lands in a more natural spot for people skimming the README top-to-bottom. --- README.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index 7d2a14e..dbfe472 100644 --- a/README.md +++ b/README.md @@ -23,21 +23,6 @@ A search engine for Lean 4 declarations. This project provides tools and resources for exploring the Lean 4 ecosystem. -**Documentation:** see the [`docs/`](docs/README.md) folder in this repo, or visit [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs). - -## Documentation - -| Page | Description | -|---|---| -| [Getting Started](docs/getting-started.md) | Install and run your first search. | -| [CLI Reference](docs/cli.md) | Every `lean-explore` command and flag. | -| [MCP Server](docs/mcp-server.md) | Wire LeanExplore into Claude, Cursor, or any MCP client. | -| [API Client](docs/api-client.md) | Use `ApiClient` from Python. | -| [Local Search Backend](docs/local-backend.md) | How hybrid BM25 + FAISS + reranking works. | -| [Configuration](docs/configuration.md) | Environment variables and data layout. | -| [Data Models](docs/data-models.md) | `SearchResult`, `SearchResponse`, and related types. | -| [Extraction Pipeline](docs/extraction-pipeline.md) | Rebuild the dataset from Lean source (contributors). | - ## Installation The base package connects to the remote API and does not require heavy ML dependencies: @@ -73,6 +58,21 @@ The current indexed projects include: This code is distributed under an Apache License (see [LICENSE](LICENSE)). +## Documentation + +Full docs live in the [`docs/`](docs/README.md) folder, or at [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs). + +| Page | Description | +|---|---| +| [Getting Started](docs/getting-started.md) | Install and run your first search. | +| [CLI Reference](docs/cli.md) | Every `lean-explore` command and flag. | +| [MCP Server](docs/mcp-server.md) | Wire LeanExplore into Claude, Cursor, or any MCP client. | +| [API Client](docs/api-client.md) | Use `ApiClient` from Python. | +| [Local Search Backend](docs/local-backend.md) | How hybrid BM25 + FAISS + reranking works. | +| [Configuration](docs/configuration.md) | Environment variables and data layout. | +| [Data Models](docs/data-models.md) | `SearchResult`, `SearchResponse`, and related types. | +| [Extraction Pipeline](docs/extraction-pipeline.md) | Rebuild the dataset from Lean source (contributors). | + ## Contributing Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on code style, testing, and development setup. From 40a9ad3f70047930af3042ffe1290da2c312b759 Mon Sep 17 00:00:00 2001 From: justincasher Date: Sun, 19 Apr 2026 10:52:07 -0400 Subject: [PATCH 6/7] Move license notice to the end of the README Promotes the license line into its own section at the bottom so the flow reads installation -> documentation -> contributing -> cite -> license. --- README.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index dbfe472..744ec70 100644 --- a/README.md +++ b/README.md @@ -56,8 +56,6 @@ The current indexed projects include: * PhysLean * Std -This code is distributed under an Apache License (see [LICENSE](LICENSE)). - ## Documentation Full docs live in the [`docs/`](docs/README.md) folder, or at [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs). @@ -95,3 +93,7 @@ Justin Asher. (2025). *LeanExplore: A search engine for Lean 4 declarations*. [h url = {https://arxiv.org/abs/2506.11085} } ``` + +## License + +This code is distributed under an Apache License (see [LICENSE](LICENSE)). From df4b113943e9304510b5a6f615bb1d862f57e4fc Mon Sep 17 00:00:00 2001 From: justincasher Date: Sun, 19 Apr 2026 10:52:35 -0400 Subject: [PATCH 7/7] Move indexed-projects list into the intro Surfaces the list of packages LeanExplore covers immediately after the one-line description, so readers see the scope before hitting install instructions. --- README.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 744ec70..045e0ec 100644 --- a/README.md +++ b/README.md @@ -23,6 +23,18 @@ A search engine for Lean 4 declarations. This project provides tools and resources for exploring the Lean 4 ecosystem. +The current indexed projects include: + +* Batteries +* CSLib +* FLT (Fermat's Last Theorem) +* FormalConjectures +* Init +* Lean +* Mathlib +* PhysLean +* Std + ## Installation The base package connects to the remote API and does not require heavy ML dependencies: @@ -44,18 +56,6 @@ lean-explore data fetch lean-explore mcp serve --backend local ``` -The current indexed projects include: - -* Batteries -* CSLib -* FLT (Fermat's Last Theorem) -* FormalConjectures -* Init -* Lean -* Mathlib -* PhysLean -* Std - ## Documentation Full docs live in the [`docs/`](docs/README.md) folder, or at [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs).