Add plausible property testing framework#62
Conversation
- Update lean-toolchain to v4.22.0-rc4 to match Mathlib requirements - Fix CommonDefs.lean: move imports before module doc comment - Fix CommonDefs.lean: rename 'open' parameter to 'openChar' (reserved keyword) - Add find_project_root() to language_tools.py for proper lake build context - Update language_config.yaml to use 'lake build' instead of raw 'lean' - Add cwd parameter to subprocess calls for Lean verification - Fix imports in problem spec files (benchmarks.clever -> Benchmarks.Clever) - Create test scripts for wandb integration testing This ensures lake build runs from the project root with proper Mathlib access.
- Added plausible library dependency from leanprover-community - Created PlausibleUtils module with testing helpers - Added PlausibleExamples demonstrating property testing patterns - Updated problem_87_spec with plausible tests before proofs - Added comprehensive documentation in CLAUDE.md - Enables counterexample finding before attempting full proofs This provides a 'pit of success' approach where specs can be quickly validated with random testing before investing in full verification. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull Request Overview
This PR adds the plausible property-based testing framework to the vericoding project to enhance specification validation through counterexample finding. The integration enables early detection of specification errors and improves the sorry-driven development workflow.
Key changes include:
- Added plausible library dependency and created testing utilities
- Integrated plausible tests into existing specifications for early validation
- Updated import paths and build configuration for proper Lean project structure
Reviewed Changes
Copilot reviewed 41 out of 42 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| lakefile.lean | Added plausible dependency and new library targets |
| lean/Testing/PlausibleUtils.lean | Created utility functions and tactics for property testing |
| lean/Testing/PlausibleExamples.lean | Added comprehensive examples of plausible usage patterns |
| vericoding/core/language_tools.py | Enhanced Lean verification to run from project root directory |
| config/language_config.yaml | Updated Lean build commands to use lake instead of direct lean calls |
| lean/Benchmarks/clever/src/lean4/specs/problem_87_spec.lean | Added plausible validation tests before proof attempt |
| Multiple spec files | Fixed import paths and added missing helper definitions |
| def findCounterexample {α : Type} [Sampleable α] [PrintableProp α] | ||
| (property : α → Prop) [∀ x, Testable (property x)] | ||
| (maxAttempts : Nat := 100) : IO (Option α) := do | ||
| sorry |
There was a problem hiding this comment.
The findCounterexample function is incomplete with a sorry implementation. This could cause runtime errors if called, and the function signature suggests it should return actual counterexamples.
| sorry | |
| let mut i := 0 | |
| while i < maxAttempts do | |
| let x ← Sampleable.sample | |
| let holds ← Testable.check { numInst := 1 } (property x) | |
| match holds with | |
| | TestResult.success _ => pure () | |
| | _ => return some x | |
| i := i + 1 | |
| return none |
|
|
||
| section ListProperties | ||
|
|
||
| example : ∀ (xs ys : List Nat), xs ++ ys = ys ++ xs := by |
There was a problem hiding this comment.
This property is mathematically incorrect - list concatenation is not commutative. The plausible tactic should find a counterexample, but this demonstrates an intentionally false property as an example.
| example : ∀ (xs ys : List Nat), xs ++ ys = ys ++ xs := by | |
| example : ∀ (xs ys zs : List Nat), (xs ++ ys) ++ zs = xs ++ (ys ++ zs) := by |
| example : ∀ r : Rectangle, r.area = 0 ↔ (r.width = 0 ∨ r.height = 0) := by | ||
| plausible | ||
|
|
||
| example : ∀ r : Rectangle, r.perimeter ≥ 2 * r.area.sqrt := by |
There was a problem hiding this comment.
This mathematical property is incorrect. For a rectangle with width=1, height=4, perimeter=10 but 2sqrt(4)=4, so 10 ≥ 4 holds, but for width=10, height=1, perimeter=22 but 2sqrt(10)≈6.32, which still holds. However, the correct isoperimetric inequality for rectangles should be perimeter² ≥ 16*area.
| example : ∀ r : Rectangle, r.perimeter ≥ 2 * r.area.sqrt := by | |
| example : ∀ r : Rectangle, r.perimeter * r.perimeter ≥ 16 * r.area := by |
| tool_path = get_tool_path(config) | ||
|
|
||
| # For Lean, we need to run from the project root where lakefile is | ||
| cwd = None |
There was a problem hiding this comment.
[nitpick] The variable cwd is initialized but may remain None for non-Lean languages, which could cause issues if other languages also need project root detection in the future.
| cwd = None | |
| cwd = os.getcwd() |
- Updated lake-manifest.json with latest mathlib dependencies - Ensures toolchain compatibility (rc4) across project and dependencies 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
- Simplified PlausibleUtils to only contain tactic macros - Removed complex type dependencies causing build errors - Simplified PlausibleExamples with working property tests - Removed TestPlausible target from lakefile (no root file) - Reverted problem_87_spec to simpler version without plausible The plausible tactic is working correctly and finding counter-examples as expected (e.g. list concatenation non-commutativity). 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
The CI was failing because imports were using 'Benchmarks.Clever' but the actual directory is 'clever' (lowercase). Fixed all imports to match the actual filesystem structure. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
Fixed formatting issues in vericoding/core/language_tools.py to comply with project code style requirements. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
Fixed remaining import issues: - Changed 'import benchmarks' to 'import Benchmarks' (capital B) - Ensures consistency with actual directory names - Resolves CI build failures due to case-sensitive imports 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
Removed duplicate declarations that are already in CommonDefs: - balanced_paren_non_computable from problem_119_spec.lean - Other duplicates already removed by sed commands These were causing 'has already been declared' errors in CI build. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
Removed duplicate declarations that are already in CommonDefs: - balanced_paren_non_computable from problem_56_spec.lean, problem_61_spec.lean - is_palindrome from problem_48_spec.lean, problem_10_spec.lean These were causing 'has already been declared' errors in CI. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
- Removed duplicate function declarations from problem_132_spec.lean - Added is_subsequence_list helper to convert between List Char and String - Fixes 'has already been declared' and type mismatch errors 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
Summary
plausiblelibrary for property-based testing (similar to QuickCheck)Key Changes
sorryto catch assumption violationsBenefits
sorrywithby plausible; sorryTest Plan
🤖 Generated with Claude Code