Fix flaky Enzyme test_forward/test_reverse tolerance (RNG-dependent vs fastpower approximation)#58

Draft
ChrisRackauckas-Claude wants to merge 1 commit into
SciML:mainfrom
ChrisRackauckas-Claude:fix-enzyme-tolerance-rng-flake

@ChrisRackauckas-Claude

Contributor

Please ignore until reviewed by @ChrisRackauckas.

Problem

tests / Enzyme (julia 1) on main went red on the v1.3.2 run (was green 8 days earlier, with identical test source). The failure:

test_forward: fastpower with return activity Duplicated on (::Float64, Duplicated), (::Float64, Const): Test Failed
  Expression: isapprox(x, y; kwargs...)
   Evaluated: isapprox(0.155, 0.15524589105604497; atol = 0.0001, rtol = 0.001)

Root cause

FastPower's Enzyme @easy_rule returns the exact ^ derivative (y*fastpower(x,y-1), Ω*log(x)). EnzymeTestUtils' test_forward/test_reverse compare that rule against finite differences of the deliberately-approximate fastpower primal. So the measured gap is exactly fastpower's own primal approximation error (~1e-3 relative — the same envelope asserted in test/fast_pow_tests.jl), which sat right on top of the old atol=1e-4, rtol=1e-3.
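
To see how close the failing comparison sits to the threshold, the check from the CI log can be replayed with plain `Base.isapprox`. This is a minimal sketch that uses only the two numbers and the tolerances quoted in the failure message; the helper name `threshold` is made up for illustration and is not part of EnzymeTestUtils.

```julia
# Values copied from the CI failure message.
x = 0.155
y = 0.15524589105604497

gap = abs(x - y)

# For finite scalars, Base.isapprox accepts when
# gap <= max(atol, rtol * max(|x|, |y|)).
# `threshold` is a hypothetical helper that exposes that bound.
threshold(atol, rtol) = max(atol, rtol * max(abs(x), abs(y)))

old_ok = isapprox(x, y; atol = 1.0e-4, rtol = 1.0e-3)
new_ok = isapprox(x, y; atol = 1.0e-3, rtol = 1.0e-2)

println("gap           = ", gap)
println("old threshold = ", threshold(1.0e-4, 1.0e-3), " -> passes: ", old_ok)
println("new threshold = ", threshold(1.0e-3, 1.0e-2), " -> passes: ", new_ok)
```

With the old tolerance the relative term dominates and the gap lands just above it; with the new tolerance the same gap sits well inside the bound.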

Whether the lane passed depended on the random perturbation test_forward drew from the global RNG. An analytic sweep over the tangent grid the test samples (tangents in -9:0.01:9, central FD-5, at x=1.0, y=0.5) shows:

| config | worst abs gap | worst rel gap | fail @ old tol (1e-4, 1e-3) | fail @ new tol (1e-3, 1e-2) |
| --- | --- | --- | --- | --- |
| Tx=Dup, Ty=Const | 1.17e-3 | 2.4% | 144080/3241800 (4.44%) | 0 |
| Tx=Dup, Ty=Dup | 2.04e-3 | | 111058/3243600 (3.42%) | 0 |
| Tx=Const, Ty=Dup | 6e-14 | | 0 | 0 |

The CI failing value 0.15524589105604497 is reproduced exactly at tangent dx=0.3105 (= the exact-^ derivative 0.5·dx), with the FD-of-fastpower reference at 0.1555 — i.e. it is fastpower's primal error, not a wrong rule.

Fix

  • Seed the RNG (Random.Xoshiro(0)) so the randomized test is reproducible.
  • Raise the tolerance to atol=1e-3, rtol=1e-2, consistent with fastpower's documented accuracy. This has zero failures across all ~6.5M tangent draws in the grid, while a genuinely wrong rule would still be off by O(1) relative and is not masked.
  • Add Random to the test [extras]/[targets] and a Random = "1" [compat] entry.
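
The effect of seeding can be illustrated with the `Random` standard library alone. This is only a sketch: `draw_tangent` is a hypothetical stand-in for the tangent draw of a randomized test, using the `-9:0.01:9` grid described in the sweep, and it is not the code changed in this PR. It assumes Julia 1.7 or newer, where `Random.Xoshiro` is available.

```julia
using Random

# Hypothetical stand-in for a randomized test's tangent draw.
draw_tangent(rng::AbstractRNG) = rand(rng, -9:0.01:9)

# Two generators built from the same seed yield the same sequence,
# so a test that receives a seeded RNG sees the same tangents on every run.
rng_a = Random.Xoshiro(0)
rng_b = Random.Xoshiro(0)
tangents_a = [draw_tangent(rng_a) for _ in 1:5]
tangents_b = [draw_tangent(rng_b) for _ in 1:5]

println("same tangents on both runs: ", tangents_a == tangents_b)
```

Passing an explicitly constructed generator, rather than relying on the global RNG, keeps the draws independent of whatever other code consumed random numbers earlier in the test session.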

This is the principled fix, not a blanket tolerance loosening: the rule's true error is zero; the only thing being measured is the approximation built into fastpower itself.
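
As a rough sanity check on the claim that the looser tolerance does not mask a wrong rule, the sketch below compares two deliberately broken derivatives against the exact one at `x = 1.0`, `y = 0.5`. Both buggy variants are invented for illustration and do not correspond to any code in the repository; the tangent `dx = 0.31` is the value quoted in the verification notes.

```julia
# Exact derivative of x^y with respect to x is y * x^(y - 1).
x, y, dx = 1.0, 0.5, 0.31
reference = y * x^(y - 1) * dx            # 0.155

# Hypothetical buggy rules, for illustration only.
missing_factor = x^(y - 1) * dx           # forgot the leading y
wrong_sign = -y * x^(y - 1) * dx          # sign error

# Both are off by O(1) relative, far outside atol=1e-3, rtol=1e-2.
caught_missing_factor = !isapprox(missing_factor, reference; atol = 1.0e-3, rtol = 1.0e-2)
caught_wrong_sign = !isapprox(wrong_sign, reference; atol = 1.0e-3, rtol = 1.0e-2)

println("missing factor caught: ", caught_missing_factor)
println("wrong sign caught:     ", caught_wrong_sign)
```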

Verification (run locally)

Deps resolved match CI: Enzyme 0.13.164, EnzymeTestUtils 0.2.8, FiniteDifferences 0.12.34.

  • Reproduced the failure through the real test_forward with rng=Xoshiro(16) (first tangent dx=0.31): old tolerance → 6 pass / 1 fail (matches CI); new tolerance → 7 pass / 0 fail.
  • Fixed Enzyme group via Pkg.test, julia 1.11: enzyme_forward_tests 52/52, enzyme_reverse_tests 36/36, tests passed.
  • Fixed Enzyme group via Pkg.test, julia lts (1.10): 52/52, 36/36, tests passed.
  • Seeded forward test is deterministic: 52/52 across 3 repeats.
  • Runic: clean (no diff) on both edited files.

Note on the other red lanes in the same run

tests / Core (julia 1) and tests / Core (julia lts) were red in the same run but are not code failures: both ran on self-hosted-4vcpu-8gb (smcsd) runners squatting on the ubuntu-latest label; the "Run tests" step emitted zero log output and never recorded a conclusion (runner OOM/lost-communication while precompiling the Mooncake+Enzyme+ReverseDiff stack in 8 GB). Locally the Core group passes cleanly (fast_log2 1200/1200, fast_pow 5/5, other_ad_engines 4/4, all AD-engine derivative comparisons rel=0.0) on both julia 1.11 and lts. That is a runner-capacity infra issue, out of scope for this PR.

🤖 Generated with Claude Code

…roximation

The Enzyme `@easy_rule` returns the exact `^` derivative, but EnzymeTestUtils
`test_forward`/`test_reverse` compare it against finite differences of the
*approximate* `fastpower` primal. The measured gap is therefore `fastpower`'s
own primal approximation error (~1e-3 relative, the same envelope asserted in
test/fast_pow_tests.jl), which sat right on top of the previous atol=1e-4,
rtol=1e-3. Whether the lane passed depended on the random perturbation drawn
from the global RNG: an analytic sweep over the tangent grid the test samples
shows ~4.4% of draws exceed the old tolerance, so the lane went red
intermittently (green 8 days ago, red on the v1.3.2 run, green locally).

Seed the RNG (Random.Xoshiro(0)) for reproducibility and raise the tolerance to
atol=1e-3, rtol=1e-2, consistent with fastpower's documented accuracy. The new
tolerance has zero failures across all ~6.5M tangent draws in the grid, while a
genuinely wrong rule would still be off by O(1) relative and is not masked.
Add Random to the test extras/targets and [compat].

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>