Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 27 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,17 @@

A search engine for Lean 4 declarations. This project provides tools and resources for exploring the Lean 4 ecosystem.

**For full documentation, please visit: [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs)**
The current indexed projects include:

* Batteries
* CSLib
* FLT (Fermat's Last Theorem)
* FormalConjectures
* Init
* Lean
* Mathlib
* PhysLean
* Std

## Installation

Expand All @@ -46,19 +56,20 @@ lean-explore data fetch
lean-explore mcp serve --backend local
```

The current indexed projects include:
## Documentation

* Batteries
* CSLib
* FLT (Fermat's Last Theorem)
* FormalConjectures
* Init
* Lean
* Mathlib
* PhysLean
* Std
Full docs live in the [`docs/`](docs/README.md) folder, or at [https://www.leanexplore.com/docs](https://www.leanexplore.com/docs).

This code is distributed under an Apache License (see [LICENSE](LICENSE)).
| Page | Description |
|---|---|
| [Getting Started](docs/getting-started.md) | Install and run your first search. |
| [CLI Reference](docs/cli.md) | Every `lean-explore` command and flag. |
| [MCP Server](docs/mcp-server.md) | Wire LeanExplore into Claude, Cursor, or any MCP client. |
| [API Client](docs/api-client.md) | Use `ApiClient` from Python. |
| [Local Search Backend](docs/local-backend.md) | How hybrid BM25 + FAISS + reranking works. |
| [Configuration](docs/configuration.md) | Environment variables and data layout. |
| [Data Models](docs/data-models.md) | `SearchResult`, `SearchResponse`, and related types. |
| [Extraction Pipeline](docs/extraction-pipeline.md) | Rebuild the dataset from Lean source (contributors). |

## Contributing

Expand All @@ -82,3 +93,7 @@ Justin Asher. (2025). *LeanExplore: A search engine for Lean 4 declarations*. [h
url = {https://arxiv.org/abs/2506.11085}
}
```

## License

This code is distributed under an Apache License (see [LICENSE](LICENSE)).
44 changes: 44 additions & 0 deletions docs/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# LeanExplore Documentation

LeanExplore is a search engine for Lean 4 declarations (theorems, definitions,
lemmas, instances, and more) from Mathlib and other major Lean packages. It
provides hybrid retrieval that matches both by declaration name (e.g.,
`List.map`, `Nat.Prime`) and by informal natural-language meaning (e.g.,
"continuous function on a compact set").

This documentation covers how to install, configure, and use LeanExplore from
the CLI, as a Python library, and as an MCP server for AI assistants.

## Contents

| Page | Description |
|---|---|
| [Getting Started](./getting-started.md) | Install the package, fetch data, and run your first search. |
| [CLI Reference](./cli.md) | Every `lean-explore` command, flag, and option, with examples. |
| [MCP Server](./mcp-server.md) | Run the MCP server and connect it to Claude, Cursor, and other MCP clients. Full reference for every tool exposed. |
| [API Client](./api-client.md) | Use `ApiClient` to query the remote LeanExplore API from Python. |
| [Local Search Backend](./local-backend.md) | How the on-device BM25 + FAISS + cross-encoder pipeline works. |
| [Configuration](./configuration.md) | Environment variables, cache paths, and data layout on disk. |
| [Data Models](./data-models.md) | `SearchResult`, `SearchResponse`, and related types returned from search. |
| [Extraction Pipeline](./extraction-pipeline.md) | Rebuild the dataset from Lean source (contributor-focused). |

## Which backend should I use?

LeanExplore has two backends and you pick one per task:

| | **Remote API** | **Local backend** |
|---|---|---|
| Install | `pip install lean-explore` | `pip install lean-explore[local]` |
| Requires | API key | ~1 GB of data + a few GB of model weights |
| Network | Required per query | Only for initial data fetch |
| Use when | You want zero setup | You want offline, private, or tunable search |

If you don't know yet, start with the remote API. It is the fastest path to a
working search. You can switch to local later without changing your code.

## Links

- Website: <https://www.leanexplore.com>
- Repository: <https://github.com/justincasher/lean-explore>
- Paper: <https://arxiv.org/abs/2506.11085>
- Contributing: [../CONTRIBUTING.md](../CONTRIBUTING.md)
136 changes: 136 additions & 0 deletions docs/api-client.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
# API Client

`lean_explore.api.ApiClient` is an async HTTP client for the hosted
LeanExplore API. It ships with the base package: no PyTorch, no local
indices, no data download required.

## Install and authenticate

```bash
pip install lean-explore
```

Get an API key from <https://www.leanexplore.com> and set it as an
environment variable:

```bash
export LEANEXPLORE_API_KEY="your-key-here"
```

Or pass it explicitly to the client constructor.

## Quick start

```python
import asyncio
from lean_explore.api import ApiClient

async def main():
client = ApiClient() # reads LEANEXPLORE_API_KEY

response = await client.search("prime number divisibility", limit=5)
for result in response.results:
print(f"{result.name} ({result.module})")

asyncio.run(main())
```

The `search` and `get_by_id` methods are both `async`, so call them from
inside an async function or via `asyncio.run`.

## `ApiClient`

```python
ApiClient(api_key: str | None = None, timeout: float = 10.0)
```

| Parameter | Default | Description |
|---|---|---|
| `api_key` | `None` | API key. Falls back to `LEANEXPLORE_API_KEY` env var. Raises `ValueError` if neither is provided. |
| `timeout` | `10.0` | HTTP timeout in seconds for every request. |

The client hits `https://www.leanexplore.com/api/v2` by default.

### `search()`

```python
async def search(
query: str,
limit: int = 20,
rerank_top: int | None = None,
packages: list[str] | None = None,
) -> SearchResponse
```

| Parameter | Default | Description |
|---|---|---|
| `query` | *required* | Declaration name or natural-language text. |
| `limit` | `20` | Maximum results to return. |
| `rerank_top` | `None` | Ignored by the API backend (reranking is server-side). Accepted for interface parity with the local `Service`. |
| `packages` | `None` | Filter by package, e.g., `["Mathlib", "Std"]`. |

Returns a [`SearchResponse`](./data-models.md#searchresponse) with `query`,
`results`, `count`, and `processing_time_ms`.

Raises:

- `httpx.HTTPStatusError`: the API returned a non-2xx status.
- `httpx.RequestError`: network, DNS, or timeout failure.

### `get_by_id()`

```python
async def get_by_id(declaration_id: int) -> SearchResult | None
```

Fetches a single declaration by its numeric id (ids come from `search`
results). Returns a [`SearchResult`](./data-models.md#searchresult) on hit,
or `None` on a 404. Other non-2xx statuses raise
`httpx.HTTPStatusError`.

## Full example

```python
import asyncio
from lean_explore.api import ApiClient

async def main():
client = ApiClient(api_key="sk-...", timeout=15.0)

response = await client.search(
query="continuous function on a compact set",
limit=10,
packages=["Mathlib"],
)
print(f"{response.count} results in {response.processing_time_ms} ms")

# Grab the first result's full source
if response.results:
first = response.results[0]
print(f"\n{first.name}")
print(first.source_link)
print(first.source_text)

# Round-trip by id
same = await client.get_by_id(first.id)
assert same is not None and same.id == first.id

asyncio.run(main())
```

## Notes

- The client creates a fresh `httpx.AsyncClient` per call. If you make many
rapid requests, consider wrapping them in `asyncio.gather(...)`; they run
concurrently without extra setup.
- Package names follow the casing shown in results (e.g., `Mathlib`, `Std`,
`PhysLean`).
- For offline use or custom tuning, see [Local Search Backend](./local-backend.md)
and use `lean_explore.search.Service` instead.

## See also

- [Data Models](./data-models.md): field reference for `SearchResult` and
`SearchResponse`.
- [Configuration](./configuration.md): environment variables including
`LEANEXPLORE_API_KEY`.
Loading
Loading