Add plausible property testing framework by alok · Pull Request #62 · Beneficial-AI-Foundation/vericoding

alok · 2025-08-12T03:12:28Z

Summary

Added the plausible library for property-based testing (similar to QuickCheck)
Created utilities and examples for using plausible testing
Integrated plausible checks into existing specs

Key Changes

Added plausible dependency - Using leanprover-community/plausible for counterexample finding
PlausibleUtils module - Helper functions and tactics for common test patterns
PlausibleExamples module - Demonstrations of property testing with lists, matrices, and custom types
Updated specs - Added plausible tests to problem_87_spec before sorry to catch assumption violations
Documentation - Comprehensive guide in CLAUDE.md for using plausible testing

Benefits

Early error detection: Find counterexamples quickly before attempting full proofs
Improved sorry-driven development: Replace bare sorry with by plausible; sorry
Pit of success: Models can validate assumptions before investing in verification
Regression testing: Ensure spec changes don't break existing properties

Test Plan

Add plausible library dependency
Create testing utilities
Write example property tests
Update existing specs
Run full build to verify compilation (in progress - large codebase)

🤖 Generated with Claude Code

- Update lean-toolchain to v4.22.0-rc4 to match Mathlib requirements - Fix CommonDefs.lean: move imports before module doc comment - Fix CommonDefs.lean: rename 'open' parameter to 'openChar' (reserved keyword) - Add find_project_root() to language_tools.py for proper lake build context - Update language_config.yaml to use 'lake build' instead of raw 'lean' - Add cwd parameter to subprocess calls for Lean verification - Fix imports in problem spec files (benchmarks.clever -> Benchmarks.Clever) - Create test scripts for wandb integration testing This ensures lake build runs from the project root with proper Mathlib access.

- Added plausible library dependency from leanprover-community - Created PlausibleUtils module with testing helpers - Added PlausibleExamples demonstrating property testing patterns - Updated problem_87_spec with plausible tests before proofs - Added comprehensive documentation in CLAUDE.md - Enables counterexample finding before attempting full proofs This provides a 'pit of success' approach where specs can be quickly validated with random testing before investing in full verification. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

Copilot

Pull Request Overview

This PR adds the plausible property-based testing framework to the vericoding project to enhance specification validation through counterexample finding. The integration enables early detection of specification errors and improves the sorry-driven development workflow.

Key changes include:

Added plausible library dependency and created testing utilities
Integrated plausible tests into existing specifications for early validation
Updated import paths and build configuration for proper Lean project structure

Reviewed Changes

Copilot reviewed 41 out of 42 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
lakefile.lean	Added plausible dependency and new library targets
lean/Testing/PlausibleUtils.lean	Created utility functions and tactics for property testing
lean/Testing/PlausibleExamples.lean	Added comprehensive examples of plausible usage patterns
vericoding/core/language_tools.py	Enhanced Lean verification to run from project root directory
config/language_config.yaml	Updated Lean build commands to use lake instead of direct lean calls
lean/Benchmarks/clever/src/lean4/specs/problem_87_spec.lean	Added plausible validation tests before proof attempt
Multiple spec files	Fixed import paths and added missing helper definitions

Copilot · 2025-08-12T03:13:32Z

+def findCounterexample {α : Type} [Sampleable α] [PrintableProp α]
+    (property : α → Prop) [∀ x, Testable (property x)]
+    (maxAttempts : Nat := 100) : IO (Option α) := do
+  sorry


The findCounterexample function is incomplete with a sorry implementation. This could cause runtime errors if called, and the function signature suggests it should return actual counterexamples.

Suggested change

sorry

let mut i := 0

while i < maxAttempts do

let x ← Sampleable.sample

let holds ← Testable.check { numInst := 1 } (property x)

match holds with

| TestResult.success _ => pure ()

| _ => return some x

i := i + 1

return none

Copilot · 2025-08-12T03:13:32Z

+
+section ListProperties
+
+example : ∀ (xs ys : List Nat), xs ++ ys = ys ++ xs := by


This property is mathematically incorrect - list concatenation is not commutative. The plausible tactic should find a counterexample, but this demonstrates an intentionally false property as an example.

Suggested change

example : ∀ (xs ys : List Nat), xs ++ ys = ys ++ xs := by

example : ∀ (xs ys zs : List Nat), (xs ++ ys) ++ zs = xs ++ (ys ++ zs) := by

Copilot · 2025-08-12T03:13:32Z

+example : ∀ r : Rectangle, r.area = 0 ↔ (r.width = 0 ∨ r.height = 0) := by
+  plausible
+
+example : ∀ r : Rectangle, r.perimeter ≥ 2 * r.area.sqrt := by


This mathematical property is incorrect. For a rectangle with width=1, height=4, perimeter=10 but 2sqrt(4)=4, so 10 ≥ 4 holds, but for width=10, height=1, perimeter=22 but 2sqrt(10)≈6.32, which still holds. However, the correct isoperimetric inequality for rectangles should be perimeter² ≥ 16*area.

Suggested change

example : ∀ r : Rectangle, r.perimeter ≥ 2 * r.area.sqrt := by

example : ∀ r : Rectangle, r.perimeter * r.perimeter ≥ 16 * r.area := by

Copilot · 2025-08-12T03:13:33Z

    tool_path = get_tool_path(config)
+
+    # For Lean, we need to run from the project root where lakefile is
+    cwd = None


[nitpick] The variable cwd is initialized but may remain None for non-Lean languages, which could cause issues if other languages also need project root detection in the future.

Suggested change

cwd = None

cwd = os.getcwd()

- Updated lake-manifest.json with latest mathlib dependencies - Ensures toolchain compatibility (rc4) across project and dependencies 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

- Simplified PlausibleUtils to only contain tactic macros - Removed complex type dependencies causing build errors - Simplified PlausibleExamples with working property tests - Removed TestPlausible target from lakefile (no root file) - Reverted problem_87_spec to simpler version without plausible The plausible tactic is working correctly and finding counter-examples as expected (e.g. list concatenation non-commutativity). 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

The CI was failing because imports were using 'Benchmarks.Clever' but the actual directory is 'clever' (lowercase). Fixed all imports to match the actual filesystem structure. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

Fixed formatting issues in vericoding/core/language_tools.py to comply with project code style requirements. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

Fixed remaining import issues: - Changed 'import benchmarks' to 'import Benchmarks' (capital B) - Ensures consistency with actual directory names - Resolves CI build failures due to case-sensitive imports 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

Removed duplicate declarations that are already in CommonDefs: - balanced_paren_non_computable from problem_119_spec.lean - Other duplicates already removed by sed commands These were causing 'has already been declared' errors in CI build. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

Removed duplicate declarations that are already in CommonDefs: - balanced_paren_non_computable from problem_56_spec.lean, problem_61_spec.lean - is_palindrome from problem_48_spec.lean, problem_10_spec.lean These were causing 'has already been declared' errors in CI. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

- Removed duplicate function declarations from problem_132_spec.lean - Added is_subsequence_list helper to convert between List Char and String - Fixes 'has already been declared' and type mismatch errors 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

alok and others added 3 commits August 7, 2025 09:11

chore: Initial wandb test cleanup

a6a9e36

Copilot AI review requested due to automatic review settings August 12, 2025 03:12

Copilot AI reviewed Aug 12, 2025

View reviewed changes

alok and others added 10 commits August 11, 2025 20:26

fix: Update mathlib to latest master version for CI

7a29cea

- Updated lake-manifest.json with latest mathlib dependencies - Ensures toolchain compatibility (rc4) across project and dependencies 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

fix: Apply ruff formatting to language_tools.py

b80c0b5

Fixed formatting issues in vericoding/core/language_tools.py to comply with project code style requirements. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

fix: Update toolchain to rc3 to match mathlib

25950d8

fix: Update to latest mathlib with rc4 toolchain

3e3b1ad

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add plausible property testing framework#62

Add plausible property testing framework#62
alok wants to merge 13 commits into
mainfrom
add-plausible-testing

alok commented Aug 12, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Aug 12, 2025

Uh oh!

Copilot AI Aug 12, 2025

Uh oh!

Copilot AI Aug 12, 2025

Uh oh!

Copilot AI Aug 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-  sorry
+  let mut i := 0
+  while i < maxAttempts do
+    let x ← Sampleable.sample
+    let holds ← Testable.check { numInst := 1 } (property x)
+    match holds with
+    | TestResult.success _ => pure ()
+    | _ => return some x
+    i := i + 1
+  return none


		section ListProperties

		example : ∀ (xs ys : List Nat), xs ++ ys = ys ++ xs := by

	example : ∀ (xs ys : List Nat), xs ++ ys = ys ++ xs := by
	example : ∀ (xs ys zs : List Nat), (xs ++ ys) ++ zs = xs ++ (ys ++ zs) := by

	example : ∀ r : Rectangle, r.perimeter ≥ 2 * r.area.sqrt := by
	example : ∀ r : Rectangle, r.perimeter * r.perimeter ≥ 16 * r.area := by

Conversation

alok commented Aug 12, 2025

Summary

Key Changes

Benefits

Test Plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants