Python has property-based testing. Lean 4 has a tactic-based theorem prover. crosshair-lean connects the two: it takes property-based tests written with Hypothesis and proves them correct by symbolic execution, with Lean 4 discharging the proof obligations.
The project is a fork of CrossHair, which ordinarily uses Z3 as its constraint solver. We replace Z3 with lean.py, a Python binding to Lean 4's kernel. Where CrossHair was a bug-finder, crosshair-lean is a prover.
from hypothesis import given, settings, HealthCheck
from hypothesis import strategies as st
@settings(
backend="crosshair",
max_examples=50,
deadline=None,
suppress_health_check=list(HealthCheck),
)
@given(x=st.integers(), y=st.integers())
def test_addition_commutative(x, y):
assert x + y == y + x
test_addition_commutative() # proved, not testedCrossHair's Hypothesis backend creates symbolic integers for x and
y, executes the function body symbolically, and hands the resulting
proof obligation to Lean 4. If Lean's tactic engine can discharge it,
the property is proved for all integers. If not, crosshair-lean
reports the unresolved Lean goal so you can see exactly what remains.
# Install lean.py (Lean 4 bindings for Python)
pip install "lean_py @ git+https://github.com/BasisResearch/lean.py"
# Install crosshair-lean
pip install -e ".[dev]"The C extension _crosshair_tracers must be compiled. On macOS:
LDSHARED="cc -bundle -undefined dynamic_lookup" python build_tracers.py$ python demo.py --goals
PROVED addition is commutative
[discharged]
⊢ ∀ (int_01 int_02 : Int), ¬¬(int_01 + int_02 = int_02 + int_01)
PROVED addition is associative
[discharged]
⊢ ∀ (int_01 int_02 int_03 : Int), ¬¬(int_01 + int_02 + int_03 = int_01 + (int_02 + int_03))
...Decorate any @given test with @settings(backend="crosshair"). The test function runs under symbolic execution instead of random testing.
@settings(backend="crosshair", max_examples=50, deadline=None,
suppress_health_check=list(HealthCheck))
@given(x=st.integers(), y=st.integers(), z=st.integers())
def test_distributive(x, y, z):
assert x * (y + z) == x * y + x * z
test_distributive() # provedFunctions with pre: and post: contracts in their docstrings can be verified directly.
from crosshair.prove import prove_function
def clamp(x: int, lo: int, hi: int) -> int:
"""
pre: lo <= hi
post: _ >= lo
post: _ <= hi
"""
return max(lo, min(x, hi))
result = prove_function(clamp)
result.show()
# Proved over 3 path(s).When a proof fails, result.lean_goals contains the Lean goal strings that could not be discharged. You can inspect them, or pass additional tactics:
result = prove_function(fn, extra_tactics=["ring", "nlinarith", "linarith"])Real code calls library functions whose source CrossHair cannot symbolically execute. The @spec decorator and register_spec provide postcondition contracts that CrossHair uses in place of the function body during symbolic execution.
from specs import spec, register_spec
@spec(post=[
lambda ret, x, lo, hi: (lo > x) | (x > hi) | (ret == x),
lambda ret, lo: ret >= lo,
lambda ret, hi: ret <= hi,
])
def clamp(x: int, lo: int, hi: int) -> int:
return max(lo, min(x, hi))Specs compose transitively. If foo calls bar, and bar has a spec, CrossHair uses bar's postconditions when symbolically executing foo. You can spec library functions you cannot modify with register_spec:
import math
register_spec(math.gcd, post=[
lambda ret, a, b: ret >= 0,
lambda ret, a, b: (ret == 0) | (a % ret == 0),
])The verification suite (verify_realpbts.py) proves 98 properties across four phases:
- 64 baseline PBTs from the Benchify RealPBT dataset, scanned and verified by
scan_realpbt.py - 23 Hypothesis-backend tests using specs for external functions (comparison, arithmetic, clamping, intervals, numpy)
- 11
prove_functiontests with docstring contracts (sorting networks, split invariants, constrained values) - Phase 3 retries failed goals with extra Lean tactics (
nlinarith,ring,linarith,positivity,norm_num)
Run the full suite:
python verify_realpbts.py # all 98
python verify_realpbts.py --quick # skip baseline, run 34 new tests
python verify_realpbts.py --verbose # show Lean goals on failureThe 64 baseline files live in realpbt_proved/ and can be executed individually:
python realpbt_proved/fengzhongzhu1621_xTool__test_addition_is_commutative.pyscan_realpbt.py processes the full RealPBT dataset (45,536 Python @given PBTs from 872 repositories). It injects the crosshair backend, runs each test in a subprocess with a 15-second timeout, and sorts results into realpbt_proved/ and realpbt_unproved/.
python scan_realpbt.py # full scan (default 4 workers)
python scan_realpbt.py --workers 8 # more parallelismOf the 45k PBTs, roughly 7,000 are standalone functions that compile and run. The remainder are class methods (37k) or require pytest fixtures. The 64 that currently prove are properties over integers, strings, lists, and basic arithmetic where Lean's tactics suffice.
The key changes from upstream CrossHair:
| File | Role |
|---|---|
crosshair/statespace.py |
InstrumentedSolver subclasses lean_py.z3.Solver. Captures Lean goals on unknown, tries extra tactics. |
crosshair/prove.py |
prove_function() API. Returns ProofResult with proved, goals, lean_goals. |
crosshair/z3util.py |
Complete rewrite for lean.py's API (replaces z3py utilities). |
crosshair/core.py |
NeedsSpecification handling. Does not stop at first REFUTED in proving mode. |
specs.py |
@spec decorator and register_spec for external function contracts. |
crosshair/libimpl/builtinslib.py |
Graceful fallback when lean.py cannot extract model values. |
All import z3 statements have been replaced with import lean_py.z3 as z3. The solver layer treats unknown as "possibly satisfiable" (only unsat means a proven contradiction), and find_model_value() raises NeedsSpecification since lean.py has no model extraction.
Take the simplest possible property-based test:
@given(x=st.integers(), y=st.integers())
def test_addition_commutative(x, y):
assert x + y == y + xThree layers cooperate to prove this. The first is CrossHair's symbolic executor, which replaces concrete Python values with symbolic proxies. The second is the lean.py solver API, which builds constraint expressions. The third is Lean 4's tactic engine, which discharges the actual proof obligation.
When Hypothesis calls test_addition_commutative(x, y) through the
crosshair backend, x and y are not Python int values. They are
SymbolicBoundedInt objects, created by proxy_for_type(int, "x") in
crosshair/core.py. Each proxy wraps a lean.py expression:
x = SymbolicBoundedInt(smtvar="x", typ=int)
# x.var → lean_py.z3.Const("x", IntSort())Every arithmetic operation on a proxy returns another proxy. x + y
calls SymbolicNumberAble.__add__, which dispatches to
numeric_binop(ops.add, x, y) in builtinslib.py. The handler calls
apply_smt(ops.add, x.var, y.var), producing a new lean.py expression
Add(x, y), and wraps it in a fresh SymbolicInt:
x + y
# → SymbolicInt(var=Add(Const("x", IntSort), Const("y", IntSort)))The same happens for y + x. Then == dispatches to
numeric_binop(ops.eq, ...), which builds an equality expression and
wraps it in a SymbolicBool:
(x + y) == (y + x)
# → SymbolicBool(var=Eq(Add(x, y), Add(y, x)))No solver has been called yet. All of this is Python-level bookkeeping: building a tree of lean.py expressions that mirrors the structure of the Python code.
The assert statement forces the SymbolicBool to become a concrete
True or False. Python calls SymbolicBool.__bool__(), which calls
choose_possible(expr) on the current StateSpace. CrossHair now
needs to decide: can this expression be false?
choose_possible asks two questions via solver_is_sat:
- Is
¬(x + y == y + x)satisfiable? (Can the assertion fail?) - Is
x + y == y + xsatisfiable? (Can the assertion hold?)
Each question calls InstrumentedSolver.check(), which forwards to
lean_py.z3.Solver.check(). The solver accumulates all path
constraints seen so far (bounds on x and y, prior branch
decisions) and checks the conjunction.
Internally, lean.py translates the z3-style expression tree into a Lean 4 expression. It builds a conjunction of all accumulated assertions plus the new query, negates it, and asks Lean's tactic engine to prove the negation.
For the query "is ¬(x + y == y + x) satisfiable?", lean.py constructs the Lean goal:
⊢ ∀ (x y : Int), ¬¬(x + y = y + x)Lean's tactic engine tries a sequence of built-in tactics (omega,
simp, ring, linarith). The omega tactic handles linear integer
arithmetic and discharges this goal immediately. lean.py returns
unsat to CrossHair: no, the assertion cannot fail.
InstrumentedSolver captures the discharged goal in proved_goals,
so you can inspect it with --goals:
PROVED addition is commutative
[discharged]
⊢ ∀ (x y : Int), ¬¬(x + y = y + x)
CrossHair records that this path is forced (the assertion must be
true), adds the constraint x + y == y + x to the solver state, and
continues exploring. When all paths have been explored with no
assertion violations, the test is proved.
Not all proofs reach Lean. Consider:
@given(x=st.text(), y=st.text())
def test_startswith_after_concat(x, y):
assert (x + y).startswith(x)CrossHair's symbolic string implementation handles concatenation and startswith directly. The proxy for x + y knows its prefix is x by construction, so startswith(x) resolves to True at the symbolic execution layer. The solver is only called for path feasibility (are these string lengths valid?), and those queries return unknown since the paths are reachable. No proof obligation ever reaches Lean, but the property is still proved by symbolic execution alone.
The --goals flag on demo.py shows this distinction. Integer arithmetic tests show discharged Lean goals. String tests show nothing, because the proof happened one layer up.