Skip to content

Add plausible property testing framework#62

Open
alok wants to merge 13 commits into
mainfrom
add-plausible-testing
Open

Add plausible property testing framework#62
alok wants to merge 13 commits into
mainfrom
add-plausible-testing

Conversation

@alok
Copy link
Copy Markdown
Contributor

@alok alok commented Aug 12, 2025

Summary

  • Added the plausible library for property-based testing (similar to QuickCheck)
  • Created utilities and examples for using plausible testing
  • Integrated plausible checks into existing specs

Key Changes

  1. Added plausible dependency - Using leanprover-community/plausible for counterexample finding
  2. PlausibleUtils module - Helper functions and tactics for common test patterns
  3. PlausibleExamples module - Demonstrations of property testing with lists, matrices, and custom types
  4. Updated specs - Added plausible tests to problem_87_spec before sorry to catch assumption violations
  5. Documentation - Comprehensive guide in CLAUDE.md for using plausible testing

Benefits

  • Early error detection: Find counterexamples quickly before attempting full proofs
  • Improved sorry-driven development: Replace bare sorry with by plausible; sorry
  • Pit of success: Models can validate assumptions before investing in verification
  • Regression testing: Ensure spec changes don't break existing properties

Test Plan

  • Add plausible library dependency
  • Create testing utilities
  • Write example property tests
  • Update existing specs
  • Run full build to verify compilation (in progress - large codebase)

🤖 Generated with Claude Code

alok and others added 3 commits August 7, 2025 09:11
- Update lean-toolchain to v4.22.0-rc4 to match Mathlib requirements
- Fix CommonDefs.lean: move imports before module doc comment
- Fix CommonDefs.lean: rename 'open' parameter to 'openChar' (reserved keyword)
- Add find_project_root() to language_tools.py for proper lake build context
- Update language_config.yaml to use 'lake build' instead of raw 'lean'
- Add cwd parameter to subprocess calls for Lean verification
- Fix imports in problem spec files (benchmarks.clever -> Benchmarks.Clever)
- Create test scripts for wandb integration testing

This ensures lake build runs from the project root with proper Mathlib access.
- Added plausible library dependency from leanprover-community
- Created PlausibleUtils module with testing helpers
- Added PlausibleExamples demonstrating property testing patterns
- Updated problem_87_spec with plausible tests before proofs
- Added comprehensive documentation in CLAUDE.md
- Enables counterexample finding before attempting full proofs

This provides a 'pit of success' approach where specs can be quickly
validated with random testing before investing in full verification.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings August 12, 2025 03:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds the plausible property-based testing framework to the vericoding project to enhance specification validation through counterexample finding. The integration enables early detection of specification errors and improves the sorry-driven development workflow.

Key changes include:

  • Added plausible library dependency and created testing utilities
  • Integrated plausible tests into existing specifications for early validation
  • Updated import paths and build configuration for proper Lean project structure

Reviewed Changes

Copilot reviewed 41 out of 42 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
lakefile.lean Added plausible dependency and new library targets
lean/Testing/PlausibleUtils.lean Created utility functions and tactics for property testing
lean/Testing/PlausibleExamples.lean Added comprehensive examples of plausible usage patterns
vericoding/core/language_tools.py Enhanced Lean verification to run from project root directory
config/language_config.yaml Updated Lean build commands to use lake instead of direct lean calls
lean/Benchmarks/clever/src/lean4/specs/problem_87_spec.lean Added plausible validation tests before proof attempt
Multiple spec files Fixed import paths and added missing helper definitions

Comment thread lean/Testing/PlausibleUtils.lean Outdated
def findCounterexample {α : Type} [Sampleable α] [PrintableProp α]
(property : α → Prop) [∀ x, Testable (property x)]
(maxAttempts : Nat := 100) : IO (Option α) := do
sorry
Copy link

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The findCounterexample function is incomplete with a sorry implementation. This could cause runtime errors if called, and the function signature suggests it should return actual counterexamples.

Suggested change
sorry
let mut i := 0
while i < maxAttempts do
let x ← Sampleable.sample
let holds ← Testable.check { numInst := 1 } (property x)
match holds with
| TestResult.success _ => pure ()
| _ => return some x
i := i + 1
return none

Copilot uses AI. Check for mistakes.

section ListProperties

example : ∀ (xs ys : List Nat), xs ++ ys = ys ++ xs := by
Copy link

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property is mathematically incorrect - list concatenation is not commutative. The plausible tactic should find a counterexample, but this demonstrates an intentionally false property as an example.

Suggested change
example : ∀ (xs ys : List Nat), xs ++ ys = ys ++ xs := by
example : ∀ (xs ys zs : List Nat), (xs ++ ys) ++ zs = xs ++ (ys ++ zs) := by

Copilot uses AI. Check for mistakes.
Comment thread lean/Testing/PlausibleExamples.lean Outdated
example : ∀ r : Rectangle, r.area = 0 ↔ (r.width = 0 ∨ r.height = 0) := by
plausible

example : ∀ r : Rectangle, r.perimeter ≥ 2 * r.area.sqrt := by
Copy link

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mathematical property is incorrect. For a rectangle with width=1, height=4, perimeter=10 but 2sqrt(4)=4, so 10 ≥ 4 holds, but for width=10, height=1, perimeter=22 but 2sqrt(10)≈6.32, which still holds. However, the correct isoperimetric inequality for rectangles should be perimeter² ≥ 16*area.

Suggested change
example : ∀ r : Rectangle, r.perimeter 2 * r.area.sqrt := by
example : ∀ r : Rectangle, r.perimeter * r.perimeter ≥ 16 * r.area := by

Copilot uses AI. Check for mistakes.
tool_path = get_tool_path(config)

# For Lean, we need to run from the project root where lakefile is
cwd = None
Copy link

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The variable cwd is initialized but may remain None for non-Lean languages, which could cause issues if other languages also need project root detection in the future.

Suggested change
cwd = None
cwd = os.getcwd()

Copilot uses AI. Check for mistakes.
alok and others added 10 commits August 11, 2025 20:26
- Updated lake-manifest.json with latest mathlib dependencies
- Ensures toolchain compatibility (rc4) across project and dependencies

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
- Simplified PlausibleUtils to only contain tactic macros
- Removed complex type dependencies causing build errors
- Simplified PlausibleExamples with working property tests
- Removed TestPlausible target from lakefile (no root file)
- Reverted problem_87_spec to simpler version without plausible

The plausible tactic is working correctly and finding counter-examples
as expected (e.g. list concatenation non-commutativity).

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
The CI was failing because imports were using 'Benchmarks.Clever'
but the actual directory is 'clever' (lowercase). Fixed all imports
to match the actual filesystem structure.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Fixed formatting issues in vericoding/core/language_tools.py
to comply with project code style requirements.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Fixed remaining import issues:
- Changed 'import benchmarks' to 'import Benchmarks' (capital B)
- Ensures consistency with actual directory names
- Resolves CI build failures due to case-sensitive imports

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Removed duplicate declarations that are already in CommonDefs:
- balanced_paren_non_computable from problem_119_spec.lean
- Other duplicates already removed by sed commands

These were causing 'has already been declared' errors in CI build.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Removed duplicate declarations that are already in CommonDefs:
- balanced_paren_non_computable from problem_56_spec.lean, problem_61_spec.lean
- is_palindrome from problem_48_spec.lean, problem_10_spec.lean

These were causing 'has already been declared' errors in CI.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
- Removed duplicate function declarations from problem_132_spec.lean
- Added is_subsequence_list helper to convert between List Char and String
- Fixes 'has already been declared' and type mismatch errors

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants