Fix master CI: expv zero-input NaN, JET-on-1.12 QA, GPU-in-All by ChrisRackauckas-Claude · Pull Request #229 · SciML/ExponentialUtilities.jl

ChrisRackauckas-Claude · 2026-06-19T09:18:07Z

Fixes three independent failures on the master grouped-tests CI.

1. Core: `NaN == 0.0` at `basictests.jl:307` (zero-input expv)

The real `expv!(w, t::Real, Ks)` method was missing the `iszero(beta)` guard that the complex method already has. For a zero input vector, `firststep!` skips initializing the Krylov basis `V` (it only fills `V[:,1]` when `beta != 0`), so the final `lmul!(beta, mul!(w, @view(V[:,1:m]), expHe))` computes `0 * <uninitialized memory>`, which is NaN whenever the garbage in `V` makes the product non-finite. That explains why the failure was flaky (heap-dependent: green on some OS/runs, NaN on others). Added the same early-return guard so `expv` of a zero vector is exactly zero.

Verified locally: full GROUP=Core Pkg.test passes on Julia 1.10 and 1.12 (it reliably produced NaN on 1.10 before).
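
The failure mode can be illustrated without the package itself. The following is a minimal sketch, not the package's code: `scaled_combination!` is a hypothetical stand-in for the final scaling step, and a NaN-filled matrix stands in for an uninitialized Krylov basis that happens to hold non-finite values.

```julia
using LinearAlgebra

# Hypothetical stand-in for the last step of expv!: w = beta * (V * coeffs).
# With `guard = true` it returns early for beta == 0, mirroring the fix.
function scaled_combination!(w, beta, V, coeffs; guard::Bool)
    if guard && iszero(beta)
        return fill!(w, zero(eltype(w)))   # exact zero, V is never read
    end
    mul!(w, V, coeffs)                     # reads V, garbage included
    return lmul!(beta, w)                  # 0 * NaN is NaN under IEEE 754
end

V = fill(NaN, 4, 2)       # simulates uninitialized memory that holds NaN
coeffs = [1.0, 0.0]

w_unguarded = scaled_combination!(zeros(4), 0.0, V, coeffs; guard = false)
w_guarded   = scaled_combination!(zeros(4), 0.0, V, coeffs; guard = true)

println(all(isnan, w_unguarded))   # the unguarded path propagates NaN
println(all(iszero, w_guarded))    # the guarded path is exactly zero
```

Because the guard returns before `V` is read, the result no longer depends on whatever the allocator left in that memory.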

2. QA: 6 JET failures on the `1` (= Julia 1.12) channel

`lts` (1.10) was green; only `1` (1.12) failed. On 1.12, JET traces into LinearAlgebra/Base internals (`norm(::Vector)` → `norm_recursive_check` → `iterate(::Nothing)`, and the broadcast `unalias`/`copyto_unaliased!` path over `Adjoint{T, Union{}}`) and reports abstract-interpretation artifacts there that this package does not control. Scoped the QA `report_call`s to `target_modules = (ExponentialUtilities,)` (the standard JET-as-package-QA configuration), which keeps full coverage of this package's own code.

That scoping surfaced two genuine `may be undefined` findings, which are fixed here so the scoped analysis is clean (not silenced):

- `si` in `exponential!` (`exp_baseexp.jl`): conditionally assigned inside `if s > 0` and used inside a separate `if s > 0`; now initialized to 0 unconditionally.
- `order` / `kest` in `kiops` (`kiops.jl`): carried across loop iterations via the `orderold`/`kestold` "reuse" flags but only conditionally assigned; now seeded with their first-iteration defaults.

Verified locally: QA passes 17/17 on Julia 1.10 and 1.12.
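
The two `may be undefined` patterns can be reduced to a few lines. This is a hypothetical sketch of the shape of the problem only: the function names, the arithmetic, and the default value `1` are invented for illustration and are not taken from `exponential!` or `kiops`.

```julia
# Before: `si` is assigned in one branch and read in another. The two
# conditions agree at runtime, but a static analysis treats them
# separately, so the read is reported as possibly undefined.
function count_before(s::Int)
    if s > 0
        si = 2s
    end
    total = 1
    if s > 0
        total += si
    end
    return total
end

# After: `si` is defined on every path, so the read is always valid.
function count_after(s::Int)
    si = 0
    if s > 0
        si = 2s
    end
    total = 1
    if s > 0
        total += si
    end
    return total
end

# Loop-carried variant: seed `order` with a first-iteration default
# instead of assigning it only when the reuse flag is off.
function last_order(reuse_flags::Vector{Bool})
    order = 1                      # hypothetical first-iteration default
    for (i, reuse) in enumerate(reuse_flags)
        if !reuse
            order = i + 1          # recomputed only when not reusing
        end
    end
    return order
end

println(count_before(3) == count_after(3))
println(last_order([true, false, true]))
```

Both versions of the first function return the same values at runtime; the difference is only that the second gives the analysis a definition it can see on every path.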

3. Core (windows): "CUDA driver not functional"

On Windows, the Core job runs the `run_tests` "All" aggregate, which pulled in the GPU group, and `using CUDA` errored on the non-GPU runner. Marked the GPU group `in_all = false` so it only ever runs under an explicit `GROUP=GPU` on the self-hosted CUDA runner. Verified locally: `GROUP=All` now runs only `Core/basictests.jl`, never `GPU/gputests.jl`.
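
The effect of the flag can be sketched as a small selection function. The group table and `files_for` below are hypothetical; the real `run_tests` data structure is not shown in this PR, and only the group names, file paths, and the `in_all` flag come from the description above.

```julia
# Hypothetical group table modelled on the description above.
groups = [
    (name = "Core", file = "Core/basictests.jl", in_all = true),
    (name = "GPU",  file = "GPU/gputests.jl",   in_all = false),
]

# Select the test files for a GROUP value. "All" honours the in_all flag;
# any other value selects the named group regardless of the flag.
function files_for(group::AbstractString, groups)
    if group == "All"
        return [g.file for g in groups if g.in_all]
    end
    return [g.file for g in groups if g.name == group]
end

println(files_for("All", groups))   # GPU group is excluded from the aggregate
println(files_for("GPU", groups))   # still reachable by explicit request
```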

Not addressed (reported separately)

- Core (julia pre, macos-latest): Static Arrays tolerance failure at `basictests.jl:265` (`expv(t,A,b) ≈ exp(t*A)*b`). On Linux with Julia 1.13-rc1 the worst relative error is 1.25e-15; the macOS-pre failure shows ~1e-7. This is a macOS/1.13-rc-specific accuracy difference that I could not reproduce or correctly fix on Linux, and I will not loosen the tolerance without being able to prove the macOS deviation is benign.
- GPU (self-hosted): requires CUDA hardware (infra), out of scope here.

Please ignore until reviewed by @ChrisRackauckas.

Three independent master-CI failures on the grouped-tests workflow:

1. Core (NaN == 0.0 at basictests.jl:307, flaky across OS/version). The real `expv!(w, t::Real, Ks)` method lacked the `iszero(beta)` guard that the complex method already has. For a zero input vector `firststep!` skips initializing the Krylov basis V (it only fills it when beta != 0), so `lmul!(beta, mul!(w, V, expHe))` computes `0 * <uninitialized memory>`, which is NaN whenever V holds garbage. Add the same early-return guard, making expv of a zero vector exactly zero (matching the complex method). Verified: full Core suite now passes on Julia 1.10 and 1.12 (was reliably NaN on 1.10).

2. QA (6 JET failures on the Julia "1" = 1.12 channel; lts/1.10 was green). On 1.12 JET traces into LinearAlgebra/Base internals (`norm(::Vector)` -> `norm_recursive_check` -> `iterate(::Nothing)`, and the broadcast `unalias`/`copyto_unaliased!` path over `Adjoint{T, Union{}}`) and reports artifacts there that this package does not control. Scope the QA `report_call`s to `target_modules = (ExponentialUtilities,)` (the standard JET-as-QA configuration), which keeps full coverage of this package's own code. That scoping surfaced two genuine `may be undefined` findings, fixed here so the scoped analysis is clean: `si` in `exponential!` and `order`/`kest` in `kiops` are now unconditionally initialized before use. Verified: QA passes 17/17 on Julia 1.10 and 1.12.

3. Core (windows, all versions: "CUDA driver not functional"). On Windows the Core job runs the run_tests "All" aggregate, which pulled in the GPU group and `using CUDA` errored on the non-GPU runner. Mark the GPU group `in_all = false` so it only runs under an explicit GROUP=GPU on the self-hosted CUDA runner. Verified locally: GROUP=All now runs only Core/basictests.jl, never GPU/gputests.jl.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The "Static Arrays" testset compared `expv(t, A, b)` against `exp(t * A) * b`, where `exp(t * A)` is StaticArrays' SMatrix matrix exponential. That reference uses an unbalanced scaling-and-squaring Padé path which loses ~7-9 digits for the larger non-normal N=8 cases on macOS + Julia prerelease (relerr ~1e-7..1e-5), tripping the default-tolerance isapprox in "Core (julia pre, macos-latest)". Verified against a 512-bit BigFloat ground truth that the `expv` output is correct to ~1e-16 on both macOS and Linux; it was the StaticArrays `exp` reference, not `expv`, that drifted. Switching the reference to the dense LAPACK `exp`, which is balanced and accurate on every platform, keeps this a machine-precision assertion that still catches real `expv` regressions (no tolerance loosening).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ChrisRackauckas-Claude · 2026-06-23T01:59:06Z

Resolved the last red — Core (julia pre, macos-latest) failing at test/basictests.jl:265 in the "Static Arrays" testset.

Root cause: not an expv bug. The assertion was expv(t, A, b) ≈ exp(t * A) * b, where exp(t * A) dispatches to StaticArrays.jl's own SMatrix matrix exponential. I reconstructed the two exact failing matrices (N=8, t=1.0 and N=8, t=10.0; RNG seed 0) and computed a 512-bit BigFloat ground truth:

| Quantity | relerr vs BigFloat truth (N=8, t=1.0 / N=8, t=10.0) |
|---|---|
| macOS `expv` output (under test) | 3.2e-16 / 7.7e-16 (correct) |
| macOS `exp(t*A)` StaticArrays reference | 4.0e-7 / 7.5e-6 (wrong) |
| Linux `expv` | 3.2e-16 / 7.3e-16 |
| Linux `exp(t*A)` StaticArrays reference | 1.9e-16 / 7.3e-16 |

So expv is machine-accurate on both platforms. It was the reference exp(t*A) (StaticArrays' unbalanced scaling-and-squaring Padé path — the source even notes "omitted: matrix balancing") that drifted ~7-9 digits on macOS + Julia 1.13-rc1. The test was comparing a correct value against a platform-fragile reference that is less accurate than the thing under test.
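
A relative error against a high-precision ground truth can be computed along the following lines. This is a sketch under stated assumptions, not the script used for the numbers above: `exp_bigfloat` and `relerr` are hypothetical helpers, the Taylor-series approach is chosen here for simplicity, and the 2×2 matrix is an illustrative example rather than one of the failing N=8 cases.

```julia
using LinearAlgebra

# Hypothetical helper: matrix exponential in BigFloat via scaling and
# squaring around a plain Taylor series. Slow, but simple to audit.
function exp_bigfloat(A::AbstractMatrix; bits::Int = 512, terms::Int = 200)
    setprecision(BigFloat, bits) do
        B = BigFloat.(A)
        n = size(B, 1)
        nrm = Float64(opnorm(B, 1))
        s = nrm > 0.5 ? ceil(Int, log2(nrm)) + 1 : 0   # scale so the 1-norm is <= 0.5
        B = B / BigFloat(2)^s
        term = Matrix{BigFloat}(I, n, n)
        total = copy(term)
        for k in 1:terms
            term = term * B / k      # k-th Taylor term B^k / k!
            total += term
        end
        for _ in 1:s
            total = total * total    # undo the scaling by repeated squaring
        end
        total
    end
end

# Relative error of a lower-precision result against the ground truth.
relerr(approx, truth) = Float64(norm(approx - truth) / norm(truth))

A = [-1.0 4.0; 0.0 -2.0]     # small non-normal example, chosen for illustration
truth = exp_bigfloat(A)
println(relerr(exp(A), truth))
```

The same `relerr` call can be applied to any candidate result, which is how a reference and the value under test can be judged independently instead of against each other.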

Default tolerances for context. The SMatrix expv extension targets eps(T)/2 ≈ 1.1e-16 (default_tolerance), and the Krylov expv path's happy-breakdown tol is 1e-7; neither is the issue here. The test gate is the default isapprox (rtol ≈ 1.49e-8). The macOS error of ~1e-7..1e-5 is far above any plausible expv FP floor (I confirmed across 400 seeds and forced mo/s/break-tol perturbations that faithful expv stays ≤5e-14), which is what pointed at the reference, not expv.
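
The tolerance arithmetic can be checked directly. The sketch below uses scalar stand-ins for "reference vs result" with the relative errors reported above; `passes` is a hypothetical helper, and the real test compares vectors rather than scalars.

```julia
rtol_default = sqrt(eps(Float64))   # default rtol of isapprox for Float64 inputs
half_eps     = eps(Float64) / 2     # the eps(T)/2 target quoted above

# Does a value with the given relative error pass a default-tolerance isapprox?
passes(relative_error) = isapprox(1.0, 1.0 + relative_error)

println(rtol_default)
println(half_eps)
println(passes(3.2e-16))   # expv-level error: within tolerance
println(passes(4.0e-7))    # macOS StaticArrays reference error: outside tolerance
```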

Fix (no tolerance loosening): compare expv against the dense LAPACK exponential exp(t * Matrix(A)) * Vector(b), which is balanced and accurate on every platform. This keeps a machine-precision (default-tolerance) assertion that still catches real expv regressions.

Verified locally (Pkg.test GROUP=Core, full basictests.jl):

- Julia 1.13.0-rc1 (= CI pre): 329 pass, 1 broken (pre-existing `@test_broken`)
- Julia 1.12.6: 329 pass, 1 broken
- Julia 1.10.11 (lts): 329 pass, 1 broken

The "Static Arrays" testset itself: 12 pass / 12 on rc.

Ignore until reviewed by @ChrisRackauckas.

ChrisRackauckas and others added 2 commits June 19, 2026 05:17

ChrisRackauckas-Claude mentioned this pull request Jun 25, 2026

QA: run_qa v1.6 form + ExplicitImports #232

Draft

ChrisRackauckas-Claude wants to merge 2 commits into SciML:master from ChrisRackauckas-Claude:fix-master-ci-1.12-nan-jet-gpu
