Fix master CI: expv zero-input NaN, JET-on-1.12 QA, GPU-in-All #229
ChrisRackauckas-Claude wants to merge 2 commits into
Three independent master-CI failures on the grouped-tests workflow:
1. Core (NaN == 0.0 at basictests.jl:307, flaky across OS/version).
The real `expv!(w, t::Real, Ks)` method lacked the `iszero(beta)`
guard that the complex method already has. For a zero input vector
`firststep!` skips initializing the Krylov basis V (it only fills it
when beta != 0), so `lmul!(beta, mul!(w, V, expHe))` computes
`0 * <uninitialized memory>`, which is NaN whenever the garbage in V
makes that product non-finite (NaN or Inf).
Add the same early-return guard, making expv of a zero vector exactly
zero (matching the complex method). Verified: full Core suite now
passes on Julia 1.10 and 1.12 (was reliably NaN on 1.10).
2. QA (6 JET failures on the Julia "1" = 1.12 channel; lts/1.10 was
green). On 1.12 JET traces into LinearAlgebra/Base internals
(`norm(::Vector)` -> `norm_recursive_check` -> `iterate(::Nothing)`,
and the broadcast `unalias`/`copyto_unaliased!` path over
`Adjoint{T, Union{}}`) and reports artifacts there that this package
does not control. Scope the QA `report_call`s to
`target_modules = (ExponentialUtilities,)` — the standard JET-as-QA
configuration — which keeps full coverage of this package's own code.
That scoping surfaced two genuine `may be undefined` findings, fixed
here so the scoped analysis is clean: `si` in `exponential!` and
`order`/`kest` in `kiops` are now unconditionally initialized before
use. Verified: QA passes 17/17 on Julia 1.10 and 1.12.
3. Core (windows, all versions: "CUDA driver not functional"). On
Windows the Core job runs the run_tests "All" aggregate, which pulled
in the GPU group and `using CUDA` errored on the non-GPU runner. Mark
the GPU group `in_all = false` so it only runs under an explicit
GROUP=GPU on the self-hosted CUDA runner. Verified locally: GROUP=All
now runs only Core/basictests.jl, never GPU/gputests.jl.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
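
The zero-input guard from item 1 can be illustrated with a minimal sketch. `project_back!` and its `guard` keyword are hypothetical names used only for illustration, not the package's actual `expv!` code, and `NaN` is written into `V` explicitly as a stand-in for uninitialized memory so the demonstration is deterministic.

```julia
using LinearAlgebra

# Hypothetical stand-in for the last step of a Krylov expv: w = beta * (V * expHe).
# With `guard = true`, a zero `beta` returns before `V` is ever read.
function project_back!(w, V, expHe, beta; guard::Bool = true)
    if guard && iszero(beta)
        return fill!(w, zero(eltype(w)))   # exact zero, independent of V's contents
    end
    return lmul!(beta, mul!(w, V, expHe))  # reads V, so garbage in V propagates
end

n, m = 4, 3
V = fill(NaN, n, m)   # assumption: NaN plays the role of uninitialized memory
expHe = ones(m)

w_unguarded = project_back!(zeros(n), V, expHe, 0.0; guard = false)
w_guarded   = project_back!(zeros(n), V, expHe, 0.0; guard = true)
```

Without the guard, `0.0 * NaN` stays `NaN` under IEEE arithmetic; with the guard the result does not depend on what `V` holds.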
The "Static Arrays" testset compared `expv(t, A, b)` against `exp(t * A) * b`, where `exp(t * A)` is StaticArrays' SMatrix matrix exponential. That reference uses an unbalanced scaling-and-squaring Padé path which loses ~7-9 digits for the larger non-normal N=8 cases on macOS + Julia prerelease (relerr ~1e-7..1e-5), tripping the default-tolerance isapprox in "Core (julia pre, macos-latest)".

Verified against a 512-bit BigFloat ground truth that the `expv` output is correct to ~1e-16 on both platforms; it was the StaticArrays `exp` reference, not `expv`, that drifted. Switching the reference to the dense LAPACK `exp`, which is balanced and accurate on every platform, keeps this a machine-precision assertion that still catches real `expv` regressions (no tolerance loosening).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
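
A high-precision ground truth of the kind mentioned above could be built along these lines. This is a sketch under stated assumptions, not the script used for the verification: `expv_reference` and `relerr` are hypothetical helpers, the reference is a plain Taylor series evaluated in BigFloat, and the 2×2 test matrix is chosen only because its exact answer is known.

```julia
using LinearAlgebra

# Hypothetical helper: exp(A) * b from the Taylor series sum_k A^k b / k!,
# evaluated in BigFloat at the requested precision.
function expv_reference(A, b; bits::Int = 512, terms::Int = 200)
    setprecision(BigFloat, bits) do
        Ab = BigFloat.(A)
        term = BigFloat.(b)        # current series term, starts at A^0 b / 0!
        acc = copy(term)
        for k in 1:terms
            term = (Ab * term) / k # A^k b / k! from the previous term
            acc += term
        end
        acc
    end
end

# Relative error of a Float64 result against the BigFloat reference.
relerr(x, ref) = Float64(norm(x - ref) / norm(ref))

A = [0.0 1.0; -1.0 0.0]   # exp(A) * [1, 0] == [cos(1), -sin(1)]
b = [1.0, 0.0]
ref = expv_reference(A, b)
err = relerr(exp(A) * b, ref)   # dense LAPACK-backed exp as the candidate
```

The same `relerr` can be applied to any candidate (an `expv` result or a different `exp` implementation) to see which side of a failing comparison has drifted.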
Resolved the last red — Root cause: not an
So Default tolerances for context. The SMatrix Fix (no tolerance loosening): compare Verified locally (
The "Static Arrays" testset itself: Ignore until reviewed by @ChrisRackauckas.
Fixes three independent failures on the master grouped-tests CI.
1. Core: `NaN == 0.0` at `basictests.jl:307` (zero-input expv)

The real `expv!(w, t::Real, Ks)` method was missing the `iszero(beta)` guard the complex method already has. For a zero input vector, `firststep!` skips initializing the Krylov basis `V` (it only fills `V[:,1]` when `beta != 0`), so the final `lmul!(beta, mul!(w, @view(V[:,1:m]), expHe))` computes `0 * <uninitialized memory>`, which is `NaN` whenever `V` holds garbage. That explains why the failure was flaky (heap-dependent: green on some OS/runs, `NaN` on others). Added the same early-return guard so `expv` of a zero vector is exactly zero.

Verified locally: full `GROUP=Core` `Pkg.test` passes on Julia 1.10 and 1.12 (it reliably produced `NaN` on 1.10 before).

2. QA: 6 JET failures on the `1` (= Julia 1.12) channel

`lts` (1.10) was green; only `1` (1.12) failed. On 1.12, JET traces into `LinearAlgebra`/`Base` internals (`norm(::Vector)` → `norm_recursive_check` → `iterate(::Nothing)`, and the broadcast `unalias`/`copyto_unaliased!` path over `Adjoint{T, Union{}}`) and reports abstract-interpretation artifacts there that this package does not control. Scoped the QA `report_call`s to `target_modules = (ExponentialUtilities,)` (the standard JET-as-package-QA configuration), which keeps full coverage of this package's own code.

That scoping surfaced two genuine `may be undefined` findings, which are fixed here so the scoped analysis is clean (not silenced):

- `si` in `exponential!` (`exp_baseexp.jl`): conditionally assigned inside `if s > 0`, used inside a separate `if s > 0`; now initialized to `0` unconditionally.
- `order`/`kest` in `kiops` (`kiops.jl`): carried across loop iterations via the `orderold`/`kestold` "reuse" flags but only conditionally assigned; now seeded with their first-iteration defaults.

Verified locally: QA passes 17/17 on Julia 1.10 and 1.12.

3. Core (windows): "CUDA driver not functional"

On Windows the Core job runs the `run_tests` "All" aggregate, which pulled in the `GPU` group, and `using CUDA` errored on the non-GPU runner. Marked the `GPU` group `in_all = false` so it only ever runs under an explicit `GROUP=GPU` on the self-hosted CUDA runner. Verified locally: `GROUP=All` now runs only `Core/basictests.jl`, never `GPU/gputests.jl`.

Not addressed (reported separately)

`Static Arrays` tolerance failure at `basictests.jl:265` (`expv(t,A,b) ≈ exp(t*A)*b`). On linux Julia 1.13-rc1 the worst relative error is `1.25e-15`; the macOS-pre failure shows `~1e-7`. This is a macOS/1.13-rc-specific accuracy difference I could not reproduce or correctly fix on linux, and I will not loosen the tolerance without being able to prove the macOS deviation is benign.

Please ignore until reviewed by @ChrisRackauckas.
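
For readers unfamiliar with the `may be undefined` pattern fixed in item 2, here is a minimal scalar analogue. `scaled_exp` is a hypothetical function written for illustration only; it is not the package's `exponential!`, and it merely mirrors the shape of the `si` fix (assignment in one `if s > 0`, use in a separate one).

```julia
# Hypothetical scalar analogue of scaling and squaring:
# exp(x) == exp(x / 2^si)^(2^si).
function scaled_exp(x::Float64, s::Real)
    si = 0                 # initialized on every path, so the later use is always defined
    if s > 0
        si = ceil(Int, s)  # number of halvings
        x /= 2^si
    end
    y = exp(x)
    if s > 0               # separate branch: static analysis cannot tie it to the first one
        for _ in 1:si
            y *= y         # undo one halving by squaring
        end
    end
    return y
end

approx = scaled_exp(3.0, 2)   # halves twice, then squares twice
plain  = scaled_exp(1.0, 0)   # no scaling branch taken
```

Without the `si = 0` line the function behaves the same at runtime, since both branches test the same condition, but an analyzer has to assume the second branch can run without the first, which is presumably why the finding was reported.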