crosshair-lean

Python has property-based testing. Lean 4 has a tactic-based theorem prover. crosshair-lean connects the two: it takes property-based tests written with Hypothesis and proves them correct by symbolic execution, with Lean 4 discharging the proof obligations.

The project is a fork of CrossHair, which ordinarily uses Z3 as its constraint solver. We replace Z3 with lean.py, a Python binding to Lean 4's kernel. Where CrossHair was a bug-finder, crosshair-lean is a prover.

Quick example

from hypothesis import given, settings, HealthCheck
from hypothesis import strategies as st

@settings(
    backend="crosshair",
    max_examples=50,
    deadline=None,
    suppress_health_check=list(HealthCheck),
)
@given(x=st.integers(), y=st.integers())
def test_addition_commutative(x, y):
    assert x + y == y + x

test_addition_commutative()  # proved, not tested

CrossHair's Hypothesis backend creates symbolic integers for x and y, executes the function body symbolically, and hands the resulting proof obligation to Lean 4. If Lean's tactic engine can discharge it, the property is proved for all integers. If not, crosshair-lean reports the unresolved Lean goal so you can see exactly what remains.

Installation

# Install lean.py (Lean 4 bindings for Python)
pip install "lean_py @ git+https://github.com/BasisResearch/lean.py"

# Install crosshair-lean
pip install -e ".[dev]"

The C extension _crosshair_tracers must be compiled. On macOS:

LDSHARED="cc -bundle -undefined dynamic_lookup" python build_tracers.py

Quickstart

$ python demo.py --goals

  PROVED  addition is commutative
          [discharged]
            ⊢ ∀ (int_01 int_02 : Int), ¬¬(int_01 + int_02 = int_02 + int_01)
  PROVED  addition is associative
          [discharged]
            ⊢ ∀ (int_01 int_02 int_03 : Int), ¬¬(int_01 + int_02 + int_03 = int_01 + (int_02 + int_03))
...

Two ways to prove things

1. Hypothesis backend (for property-based tests)

Decorate any @given test with @settings(backend="crosshair"). The test function runs under symbolic execution instead of random testing.

@settings(backend="crosshair", max_examples=50, deadline=None,
          suppress_health_check=list(HealthCheck))
@given(x=st.integers(), y=st.integers(), z=st.integers())
def test_distributive(x, y, z):
    assert x * (y + z) == x * y + x * z

test_distributive()  # proved

2. `prove_function` (for docstring contracts)

Functions with pre: and post: contracts in their docstrings can be verified directly.

from crosshair.prove import prove_function

def clamp(x: int, lo: int, hi: int) -> int:
    """
    pre: lo <= hi
    post: _ >= lo
    post: _ <= hi
    """
    return max(lo, min(x, hi))

result = prove_function(clamp)
result.show()
# Proved over 3 path(s).

When a proof fails, result.lean_goals contains the Lean goal strings that could not be discharged. You can inspect them, or pass additional tactics:

result = prove_function(fn, extra_tactics=["ring", "nlinarith", "linarith"])

Specs for external functions

Real code calls library functions whose source CrossHair cannot symbolically execute. The @spec decorator and register_spec provide postcondition contracts that CrossHair uses in place of the function body during symbolic execution.

from specs import spec, register_spec

@spec(post=[
    lambda ret, x, lo, hi: (lo > x) | (x > hi) | (ret == x),
    lambda ret, lo: ret >= lo,
    lambda ret, hi: ret <= hi,
])
def clamp(x: int, lo: int, hi: int) -> int:
    return max(lo, min(x, hi))

Specs compose transitively. If foo calls bar, and bar has a spec, CrossHair uses bar's postconditions when symbolically executing foo. You can spec library functions you cannot modify with register_spec:

import math
register_spec(math.gcd, post=[
    lambda ret, a, b: ret >= 0,
    lambda ret, a, b: (ret == 0) | (a % ret == 0),
])

What gets proved

The verification suite (verify_realpbts.py) proves 98 properties across four phases:

64 baseline PBTs from the Benchify RealPBT dataset, scanned and verified by scan_realpbt.py
23 Hypothesis-backend tests using specs for external functions (comparison, arithmetic, clamping, intervals, numpy)
11 prove_function tests with docstring contracts (sorting networks, split invariants, constrained values)
Phase 3 retries failed goals with extra Lean tactics (nlinarith, ring, linarith, positivity, norm_num)

Run the full suite:

python verify_realpbts.py            # all 98
python verify_realpbts.py --quick    # skip baseline, run 34 new tests
python verify_realpbts.py --verbose  # show Lean goals on failure

The 64 baseline files live in realpbt_proved/ and can be executed individually:

python realpbt_proved/fengzhongzhu1621_xTool__test_addition_is_commutative.py

Scanning the RealPBT dataset

scan_realpbt.py processes the full RealPBT dataset (45,536 Python @given PBTs from 872 repositories). It injects the crosshair backend, runs each test in a subprocess with a 15-second timeout, and sorts results into realpbt_proved/ and realpbt_unproved/.

python scan_realpbt.py              # full scan (default 4 workers)
python scan_realpbt.py --workers 8  # more parallelism

Of the 45k PBTs, roughly 7,000 are standalone functions that compile and run. The remainder are class methods (37k) or require pytest fixtures. The 64 that currently prove are properties over integers, strings, lists, and basic arithmetic where Lean's tactics suffice.

Architecture

The key changes from upstream CrossHair:

File	Role
`crosshair/statespace.py`	`InstrumentedSolver` subclasses `lean_py.z3.Solver`. Captures Lean goals on `unknown`, tries extra tactics.
`crosshair/prove.py`	`prove_function()` API. Returns `ProofResult` with `proved`, `goals`, `lean_goals`.
`crosshair/z3util.py`	Complete rewrite for lean.py's API (replaces z3py utilities).
`crosshair/core.py`	`NeedsSpecification` handling. Does not stop at first `REFUTED` in proving mode.
`specs.py`	`@spec` decorator and `register_spec` for external function contracts.
`crosshair/libimpl/builtinslib.py`	Graceful fallback when lean.py cannot extract model values.

All import z3 statements have been replaced with import lean_py.z3 as z3. The solver layer treats unknown as "possibly satisfiable" (only unsat means a proven contradiction), and find_model_value() raises NeedsSpecification since lean.py has no model extraction.

How a proof works, step by step

Take the simplest possible property-based test:

@given(x=st.integers(), y=st.integers())
def test_addition_commutative(x, y):
    assert x + y == y + x

Three layers cooperate to prove this. The first is CrossHair's symbolic executor, which replaces concrete Python values with symbolic proxies. The second is the lean.py solver API, which builds constraint expressions. The third is Lean 4's tactic engine, which discharges the actual proof obligation.

Layer 1: Symbolic proxies (CrossHair)

When Hypothesis calls test_addition_commutative(x, y) through the crosshair backend, x and y are not Python int values. They are SymbolicBoundedInt objects, created by proxy_for_type(int, "x") in crosshair/core.py. Each proxy wraps a lean.py expression:

x = SymbolicBoundedInt(smtvar="x", typ=int)
# x.var  →  lean_py.z3.Const("x", IntSort())

Every arithmetic operation on a proxy returns another proxy. x + y calls SymbolicNumberAble.__add__, which dispatches to numeric_binop(ops.add, x, y) in builtinslib.py. The handler calls apply_smt(ops.add, x.var, y.var), producing a new lean.py expression Add(x, y), and wraps it in a fresh SymbolicInt:

x + y
# → SymbolicInt(var=Add(Const("x", IntSort), Const("y", IntSort)))

The same happens for y + x. Then == dispatches to numeric_binop(ops.eq, ...), which builds an equality expression and wraps it in a SymbolicBool:

(x + y) == (y + x)
# → SymbolicBool(var=Eq(Add(x, y), Add(y, x)))

No solver has been called yet. All of this is Python-level bookkeeping: building a tree of lean.py expressions that mirrors the structure of the Python code.

Layer 2: The solver query (lean.py z3 API)

The assert statement forces the SymbolicBool to become a concrete True or False. Python calls SymbolicBool.__bool__(), which calls choose_possible(expr) on the current StateSpace. CrossHair now needs to decide: can this expression be false?

choose_possible asks two questions via solver_is_sat:

Is ¬(x + y == y + x) satisfiable? (Can the assertion fail?)
Is x + y == y + x satisfiable? (Can the assertion hold?)

Each question calls InstrumentedSolver.check(), which forwards to lean_py.z3.Solver.check(). The solver accumulates all path constraints seen so far (bounds on x and y, prior branch decisions) and checks the conjunction.

Internally, lean.py translates the z3-style expression tree into a Lean 4 expression. It builds a conjunction of all accumulated assertions plus the new query, negates it, and asks Lean's tactic engine to prove the negation.

Layer 3: Lean 4 tactics

For the query "is ¬(x + y == y + x) satisfiable?", lean.py constructs the Lean goal:

⊢ ∀ (x y : Int), ¬¬(x + y = y + x)

Lean's tactic engine tries a sequence of built-in tactics (omega, simp, ring, linarith). The omega tactic handles linear integer arithmetic and discharges this goal immediately. lean.py returns unsat to CrossHair: no, the assertion cannot fail.

InstrumentedSolver captures the discharged goal in proved_goals, so you can inspect it with --goals:

  PROVED  addition is commutative
          [discharged]
            ⊢ ∀ (x y : Int), ¬¬(x + y = y + x)

CrossHair records that this path is forced (the assertion must be true), adds the constraint x + y == y + x to the solver state, and continues exploring. When all paths have been explored with no assertion violations, the test is proved.

What about string properties?

Not all proofs reach Lean. Consider:

@given(x=st.text(), y=st.text())
def test_startswith_after_concat(x, y):
    assert (x + y).startswith(x)

CrossHair's symbolic string implementation handles concatenation and startswith directly. The proxy for x + y knows its prefix is x by construction, so startswith(x) resolves to True at the symbolic execution layer. The solver is only called for path feasibility (are these string lengths valid?), and those queries return unknown since the paths are reachable. No proof obligation ever reaches Lean, but the property is still proved by symbolic execution alone.

The --goals flag on demo.py shows this distinction. Integer arithmetic tests show discharged Lean goals. String tests show nothing, because the proof happened one layer up.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
crosshair		crosshair
realpbt_proved		realpbt_proved
realpbt_unproved		realpbt_unproved
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
benchmark.py		benchmark.py
build_tracers.py		build_tracers.py
demo.py		demo.py
patch_provider.py		patch_provider.py
scan_realpbt.py		scan_realpbt.py
setup.cfg		setup.cfg
setup.py		setup.py
setup_env.sh		setup_env.sh
specs.py		specs.py
test_prove.py		test_prove.py
verify_realpbts.py		verify_realpbts.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

crosshair-lean

Quick example

Installation

Quickstart

Two ways to prove things

1. Hypothesis backend (for property-based tests)

2. `prove_function` (for docstring contracts)

Specs for external functions

What gets proved

Scanning the RealPBT dataset

Architecture

How a proof works, step by step

Layer 1: Symbolic proxies (CrossHair)

Layer 2: The solver query (lean.py z3 API)

Layer 3: Lean 4 tactics

What about string properties?

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

crosshair-lean

Quick example

Installation

Quickstart

Two ways to prove things

1. Hypothesis backend (for property-based tests)

2. prove_function (for docstring contracts)

Specs for external functions

What gets proved

Scanning the RealPBT dataset

Architecture

How a proof works, step by step

Layer 1: Symbolic proxies (CrossHair)

Layer 2: The solver query (lean.py z3 API)

Layer 3: Lean 4 tactics

What about string properties?

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

2. `prove_function` (for docstring contracts)

Packages