Fix flaky Enzyme test_forward/test_reverse tolerance (RNG-dependent vs fastpower approximation)#58
Draft
ChrisRackauckas-Claude wants to merge 1 commit into
Conversation
…roximation The Enzyme `@easy_rule` returns the exact `^` derivative, but EnzymeTestUtils `test_forward`/`test_reverse` compare it against finite differences of the *approximate* `fastpower` primal. The measured gap is therefore `fastpower`'s own primal approximation error (~1e-3 relative, the same envelope asserted in test/fast_pow_tests.jl), which sat right on top of the previous atol=1e-4, rtol=1e-3. Whether the lane passed depended on the random perturbation drawn from the global RNG: an analytic sweep over the tangent grid the test samples shows ~4.4% of draws exceed the old tolerance, so the lane went red intermittently (green 8 days ago, red on the v1.3.2 run, green locally). Seed the RNG (Random.Xoshiro(0)) for reproducibility and raise the tolerance to atol=1e-3, rtol=1e-2, consistent with fastpower's documented accuracy. The new tolerance has zero failures across all ~6.5M tangent draws in the grid, while a genuinely wrong rule would still be off by O(1) relative and is not masked. Add Random to the test extras/targets and [compat]. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Please ignore until reviewed by @ChrisRackauckas.
Problem
tests / Enzyme (julia 1)onmainwent red on the v1.3.2 run (was green 8 days earlier, with identical test source). The failure:Root cause
FastPower's Enzyme
@easy_rulereturns the exact^derivative (y*fastpower(x,y-1),Ω*log(x)). EnzymeTestUtils'test_forward/test_reversecompare that rule against finite differences of the deliberately-approximatefastpowerprimal. So the measured gap is exactlyfastpower's own primal approximation error (~1e-3 relative — the same envelope asserted intest/fast_pow_tests.jl), which sat right on top of the oldatol=1e-4, rtol=1e-3.Whether the lane passed depended on the random perturbation
test_forwarddrew from the global RNG. An analytic sweep over the tangent grid the test samples (tangents in-9:0.01:9, central FD-5, at x=1.0, y=0.5) shows:The CI failing value
0.15524589105604497is reproduced exactly at tangentdx=0.3105(= the exact-^derivative0.5·dx), with the FD-of-fastpowerreference at0.1555— i.e. it isfastpower's primal error, not a wrong rule.Fix
Random.Xoshiro(0)) so the randomized test is reproducible.atol=1e-3, rtol=1e-2, consistent withfastpower's documented accuracy. This has zero failures across all ~6.5M tangent draws in the grid, while a genuinely wrong rule would still be off by O(1) relative and is not masked.Randomto the test[extras]/[targets]and aRandom = "1"[compat]entry.This is the principled fix, not a blanket tolerance loosening: the rule's true error is zero; the only thing being measured is the approximation built into
fastpoweritself.Verification (run locally)
Deps resolved match CI: Enzyme 0.13.164, EnzymeTestUtils 0.2.8, FiniteDifferences 0.12.34.
test_forwardwithrng=Xoshiro(16)(first tangent dx=0.31): old tolerance → 6 pass / 1 fail (matches CI); new tolerance → 7 pass / 0 fail.Pkg.test, julia 1.11:enzyme_forward_tests 52/52,enzyme_reverse_tests 36/36, tests passed.Pkg.test, julia lts (1.10):52/52,36/36, tests passed.Note on the other red lanes in the same run
tests / Core (julia 1)andtests / Core (julia lts)were red in the same run but are not code failures: both ran onself-hosted-4vcpu-8gb(smcsd) runners squatting on theubuntu-latestlabel; the "Run tests" step emitted zero log output and never recorded a conclusion (runner OOM/lost-communication while precompiling the Mooncake+Enzyme+ReverseDiff stack in 8 GB). Locally the Core group passes cleanly (fast_log2 1200/1200, fast_pow 5/5, other_ad_engines 4/4, all AD-engine derivative comparisons rel=0.0) on both julia 1.11 and lts. That is a runner-capacity infra issue, out of scope for this PR.🤖 Generated with Claude Code