Support for CUDA toolkit 13.3 by maleadt · Pull Request #3155 · JuliaGPU/CUDA.jl

maleadt · 2026-05-27T08:02:32Z

No description provided.

CUDA 13.0 removed CUFFT_INCOMPLETE_PARAMETER_LIST, CUFFT_PARSE_ERROR and CUFFT_LICENSE_ERROR from cufftResult. Since the bindings are regenerated against 13.3, those names no longer exist, so description() threw an UndefVarError for any error code that fell through to them. Drop the dead branches and add descriptions for the new codes (CUFFT_MISSING_DEPENDENCY, CUFFT_NVRTC_FAILURE, CUFFT_NVJITLINK_FAILURE, CUFFT_NVSHMEM_FAILURE).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
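The failure mode described above can be reproduced without a GPU. The following is a minimal sketch, assuming a hypothetical `DemoResult` enum and hypothetical `describe_stale`/`describe_fixed` functions; none of these names come from CUDA.jl. It relies only on the fact that Julia resolves global names when a branch is evaluated, so a comparison against a missing constant fails only for codes that reach it.

```julia
# Hypothetical stand-in for a generated result enum; names and values are illustrative.
@enum DemoResult begin
    DEMO_SUCCESS = 0
    DEMO_INVALID_PLAN = 1
    DEMO_NVRTC_FAILURE = 2
end

# Stale lookup: DEMO_PARSE_ERROR is intentionally never defined, mimicking a
# constant that disappeared when the bindings were regenerated.
function describe_stale(code::DemoResult)
    if code == DEMO_SUCCESS
        return "the operation completed successfully"
    elseif code == DEMO_INVALID_PLAN
        return "an invalid plan handle was passed"
    elseif code == DEMO_PARSE_ERROR   # UndefVarError once this branch is evaluated
        return "a parse error occurred"
    else
        return "no description for this error code"
    end
end

# Repaired lookup: the dead branch is gone and the new code has a description.
function describe_fixed(code::DemoResult)
    if code == DEMO_SUCCESS
        return "the operation completed successfully"
    elseif code == DEMO_INVALID_PLAN
        return "an invalid plan handle was passed"
    elseif code == DEMO_NVRTC_FAILURE
        return "runtime compilation failed"
    else
        return "no description for this error code"
    end
end

# Returns true when calling f(code) raises an UndefVarError.
function throws_undef(f, code)
    try
        f(code)
        return false
    catch err
        return err isa UndefVarError
    end
end
```

Codes matched before the stale branch still work in `describe_stale`, which is why this kind of breakage can go unnoticed until a later code is actually returned.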

cuEventElapsedTime_v2 (CUDA 12.8+) supersedes the now-deprecated v1 entry point with improved accuracy and argument validation. Branch on driver_version() so we call it on new enough drivers and keep the v1 fallback otherwise. Covered by the existing "events" testset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
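The version branch described above might look roughly like the sketch below. The two `elapsed_*` functions are hypothetical placeholders for the real driver entry points, and the driver version is passed in as a keyword argument instead of being queried, so the snippet runs without a GPU.

```julia
# Hypothetical placeholders for the v1 and v2 entry points. Each returns a tag
# naming the path taken plus the elapsed time, so the dispatch is observable.
elapsed_v1(start_ms, stop_ms) = (:v1, stop_ms - start_ms)
elapsed_v2(start_ms, stop_ms) = (:v2, stop_ms - start_ms)

# Pick the newer entry point when the driver is recent enough, otherwise fall back.
# `driver` stands in for whatever the real driver version query would return.
function elapsed_ms(start_ms, stop_ms; driver::VersionNumber)
    if driver >= v"12.8"
        return elapsed_v2(start_ms, stop_ms)
    else
        return elapsed_v1(start_ms, stop_ms)
    end
end
```

For example, `elapsed_ms(1.0, 3.5; driver = v"13.0")` takes the `:v2` path, while `driver = v"12.6"` falls back to `:v1`.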

cuSPARSE 12.8.1 (CUDA 13.3) added the generic SpGEAM API for C = αA + βB, replacing the type-specific csrgeam2 routines. Prefer it when available and keep csrgeam2 as the fallback for older versions. Also fix the generated SpGEAM bindings: the device workspace was typed as a host Ptr{Cvoid} (it must be CuPtr{Cvoid}), and the alpha/beta scalars are now PtrOrCuPtr{Cvoid} to match the other generic APIs. Fixed in res/wrap too. Covered by the existing geam tests in interfaces/mul.jl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
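As a CPU reference for what a geam operation computes, the sketch below evaluates C = αA + βB with the SparseArrays standard library. It does not call cuSPARSE and the matrices are made up for illustration; it only pins down the expected result that either GPU code path should reproduce.

```julia
using SparseArrays

# Two small sparse matrices with different sparsity patterns.
A = sparse([1, 2, 2], [1, 1, 2], [1.0, 2.0, 3.0], 2, 2)   # dense form: [1 0; 2 3]
B = sparse([1, 2], [2, 2], [4.0, 5.0], 2, 2)              # dense form: [0 4; 0 5]

α, β = 2.0, -1.0

# The geam operation: a scaled sum whose sparsity pattern is the union of both inputs.
C = α * A + β * B
```

Here `Matrix(C)` equals `[2.0 -4.0; 4.0 1.0]`, and `C` stays a `SparseMatrixCSC`.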

cuSPARSE 12.8.1 (CUDA 13.3) added native CSC support to the triangular solve APIs. Use it instead of modelling a CSC matrix as its transposed CSR on new enough versions; the workaround couldn't represent transa = 'C', so the adjoint of a complex CSC matrix now works too. Relax the corresponding test skips accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
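Why the transposed-CSR workaround cannot express the adjoint can be checked on the CPU. The sketch below reads the CSC arrays of a small complex matrix as if they were CSR arrays; `csr_to_dense` is a hypothetical helper written for this illustration, not part of CUDA.jl or SparseArrays.

```julia
using LinearAlgebra, SparseArrays

# Expand CSR arrays (row pointers, column indices, values) into a dense matrix.
function csr_to_dense(rowptr, colval, nzval, nrows, ncols)
    D = zeros(eltype(nzval), nrows, ncols)
    for r in 1:nrows, k in rowptr[r]:(rowptr[r + 1] - 1)
        D[r, colval[k]] = nzval[k]
    end
    return D
end

# A complex lower-triangular matrix stored in CSC (Julia's native sparse format).
A = sparse(ComplexF64[2 0; 1im 3])
Ad = Matrix(A)

# Reinterpreting the CSC arrays as CSR yields the transpose of A.
R = csr_to_dense(A.colptr, rowvals(A), nonzeros(A), size(A, 2), size(A, 1))

# How each operation on A maps onto the reinterpreted matrix R:
plain_ok     = Ad == transpose(R)              # op 'N' on A is op 'T' on R
transpose_ok = transpose(Ad) == R              # op 'T' on A is op 'N' on R
adjoint_is_conj = adjoint(Ad) == conj.(R)      # op 'C' on A needs conj(R), untransposed

# conj(R) is not R, transpose(R) or adjoint(R), so no op flag on R expresses it.
adjoint_unreachable = all(M -> M != adjoint(Ad), (R, transpose(R), adjoint(R)))
```

For real matrices the conjugate is a no-op, which is why only the complex adjoint case was affected.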

CUDA added tensor-core emulation of higher precisions: BF16x9 reproduces full FP32 accuracy (cuBLAS 12.9+) and the Ozaki fixed-point scheme emulates FP64 (cuBLAS 13.1+, i.e. CUDA 13.0 Update 2). Expose them through the existing `math_mode!`/`math_precision` mechanism: under FAST_MATH, a `:BFloat16x9` precision selects FP32 emulation and `:FixedPoint` selects FP64 emulation. The math mode is applied to the cuBLAS handle (covering plain GEMMs) and the matching compute types are returned from gemmExComputeType (covering gemmEx!); the handle now also re-applies when the precision alone changes. Version gates use the cuBLAS library version, which does not track the toolkit version (CUDA 13.0u2 ships cuBLAS 13.1.0, CUDA 13.3 ships 13.5.1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
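The selection rules in this commit message can be summarised as a small decision function. The sketch below is an illustration only: `select_emulation` and its return symbols are hypothetical, and the real implementation works on cuBLAS math modes and compute types instead of symbols. The version thresholds and precision names are the ones stated above.

```julia
# Decide which emulation (if any) applies for a math mode, a precision symbol,
# and a cuBLAS library version. Return values are illustrative labels.
function select_emulation(mode::Symbol, precision::Symbol, cublas::VersionNumber)
    mode === :FAST_MATH || return :none          # emulation is only considered under FAST_MATH
    if precision === :BFloat16x9 && cublas >= v"12.9"
        return :fp32_emulation                   # BF16x9 reproduces FP32 accuracy
    elseif precision === :FixedPoint && cublas >= v"13.1"
        return :fp64_emulation                   # Ozaki fixed-point scheme emulates FP64
    end
    return :none                                 # library too old, or another precision
end
```

Because the gate is the cuBLAS version, a call such as `select_emulation(:FAST_MATH, :FixedPoint, v"13.1.0")` (the cuBLAS shipped with CUDA 13.0 Update 2) selects FP64 emulation, whereas the same request against cuBLAS 12.9 does not.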

github-actions

CUDA.jl Benchmarks

Benchmark suite	Current: `cb21ca6`	Previous: `6a129e0`	Ratio
`array/accumulate/Float32/1d`	`99934` ns	`100053` ns	`1.00`
`array/accumulate/Float32/dims=1`	`75686` ns	`75787` ns	`1.00`
`array/accumulate/Float32/dims=1L`	`1587057` ns	`1577853` ns	`1.01`
`array/accumulate/Float32/dims=2`	`141448` ns	`141062` ns	`1.00`
`array/accumulate/Float32/dims=2L`	`653854` ns	`653707` ns	`1.00`
`array/accumulate/Int64/1d`	`117378` ns	`116750` ns	`1.01`
`array/accumulate/Int64/dims=1`	`79900` ns	`78711` ns	`1.02`
`array/accumulate/Int64/dims=1L`	`1699383` ns	`1683110` ns	`1.01`
`array/accumulate/Int64/dims=2`	`152175` ns	`151060` ns	`1.01`
`array/accumulate/Int64/dims=2L`	`960126` ns	`959381` ns	`1.00`
`array/broadcast`	`18881` ns	`19982` ns	`0.94`
`array/construct`	`1207.4` ns	`1194.4` ns	`1.01`
`array/copy`	`16473` ns	`16735` ns	`0.98`
`array/copyto!/cpu_to_gpu`	`214155` ns	`212740` ns	`1.01`
`array/copyto!/gpu_to_cpu`	`281669` ns	`279146` ns	`1.01`
`array/copyto!/gpu_to_gpu`	`10387` ns	`10253` ns	`1.01`
`array/iteration/findall/bool`	`132571` ns	`131244` ns	`1.01`
`array/iteration/findall/int`	`146336` ns	`145039` ns	`1.01`
`array/iteration/findfirst/bool`	`69618` ns	`79765` ns	`0.87`
`array/iteration/findfirst/int`	`71515` ns	`81388` ns	`0.88`
`array/iteration/findmin/1d`	`67419` ns	`66315` ns	`1.02`
`array/iteration/findmin/2d`	`101824` ns	`101872` ns	`1.00`
`array/iteration/logical`	`192079` ns	`190195` ns	`1.01`
`array/iteration/scalar`	`65603` ns	`64845` ns	`1.01`
`array/permutedims/2d`	`49721` ns	`49329` ns	`1.01`
`array/permutedims/3d`	`51360` ns	`49995` ns	`1.03`
`array/permutedims/4d`	`50755` ns	`49456` ns	`1.03`
`array/random/rand/Float32`	`11928` ns	`11887` ns	`1.00`
`array/random/rand/Int64`	`23844` ns	`23761` ns	`1.00`
`array/random/rand!/Float32`	`8021.666666666667` ns	`8603.666666666666` ns	`0.93`
`array/random/rand!/Int64`	`17867` ns	`20714` ns	`0.86`
`array/random/randn/Float32`	`36246` ns	`35746` ns	`1.01`
`array/random/randn!/Float32`	`24199` ns	`24698` ns	`0.98`
`array/reductions/mapreduce/Float32/1d`	`33444` ns	`33262` ns	`1.01`
`array/reductions/mapreduce/Float32/dims=1`	`37924` ns	`38065` ns	`1.00`
`array/reductions/mapreduce/Float32/dims=1L`	`50252` ns	`50303` ns	`1.00`
`array/reductions/mapreduce/Float32/dims=2`	`55904` ns	`55700` ns	`1.00`
`array/reductions/mapreduce/Float32/dims=2L`	`67201` ns	`66967` ns	`1.00`
`array/reductions/mapreduce/Int64/1d`	`40065` ns	`39371` ns	`1.02`
`array/reductions/mapreduce/Int64/dims=1`	`40927` ns	`41064` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=1L`	`86328` ns	`86505` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=2`	`57984` ns	`57928` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=2L`	`83490` ns	`82783` ns	`1.01`
`array/reductions/reduce/Float32/1d`	`33479` ns	`33213` ns	`1.01`
`array/reductions/reduce/Float32/dims=1`	`38394` ns	`38126` ns	`1.01`
`array/reductions/reduce/Float32/dims=1L`	`50618` ns	`50464` ns	`1.00`
`array/reductions/reduce/Float32/dims=2`	`55745` ns	`55536` ns	`1.00`
`array/reductions/reduce/Float32/dims=2L`	`67728` ns	`67415` ns	`1.00`
`array/reductions/reduce/Int64/1d`	`40179` ns	`40070` ns	`1.00`
`array/reductions/reduce/Int64/dims=1`	`40669` ns	`40780` ns	`1.00`
`array/reductions/reduce/Int64/dims=1L`	`86396` ns	`86708` ns	`1.00`
`array/reductions/reduce/Int64/dims=2`	`57868` ns	`57749` ns	`1.00`
`array/reductions/reduce/Int64/dims=2L`	`82658` ns	`82436` ns	`1.00`
`array/reverse/1d`	`16862` ns	`16956` ns	`0.99`
`array/reverse/1dL`	`67699` ns	`67986` ns	`1.00`
`array/reverse/1dL_inplace`	`65292` ns	`65223` ns	`1.00`
`array/reverse/1d_inplace`	`8280.333333333334` ns	`8237.333333333334` ns	`1.01`
`array/reverse/2d`	`20334` ns	`20361` ns	`1.00`
`array/reverse/2dL`	`71899` ns	`72160` ns	`1.00`
`array/reverse/2dL_inplace`	`65109` ns	`65120` ns	`1.00`
`array/reverse/2d_inplace`	`9713` ns	`9782` ns	`0.99`
`array/sorting/1d`	`2713130` ns	`2707240` ns	`1.00`
`array/sorting/2d`	`1062830` ns	`1063955` ns	`1.00`
`array/sorting/by`	`3269686` ns	`3281778` ns	`1.00`
`cuda/synchronization/context/auto`	`1133.5` ns	`1132.3` ns	`1.00`
`cuda/synchronization/context/blocking`	`952.304347826087` ns	`923.0333333333333` ns	`1.03`
`cuda/synchronization/context/nonblocking`	`6086.2` ns	`6078` ns	`1.00`
`cuda/synchronization/stream/auto`	`986` ns	`974.4` ns	`1.01`
`cuda/synchronization/stream/blocking`	`830.2051282051282` ns	`783.4722222222222` ns	`1.06`
`cuda/synchronization/stream/nonblocking`	`5974.4` ns	`6004.333333333333` ns	`1.00`
`integration/byval/reference`	`143284` ns	`143190` ns	`1.00`
`integration/byval/slices=1`	`145461` ns	`145031` ns	`1.00`
`integration/byval/slices=2`	`283970` ns	`283718` ns	`1.00`
`integration/byval/slices=3`	`422516` ns	`422133` ns	`1.00`
`integration/cudadevrt`	`101724` ns	`101714` ns	`1.00`
`integration/volumerhs`	`8882117` ns	`9906116` ns	`0.90`
`kernel/indexing`	`13006` ns	`12607` ns	`1.03`
`kernel/indexing_checked`	`13648` ns	`13261` ns	`1.03`
`kernel/launch`	`2065.5555555555557` ns	`2122.5555555555557` ns	`0.97`
`kernel/occupancy`	`728.5496183206106` ns	`718.696` ns	`1.01`
`kernel/rand`	`13903` ns	`14142` ns	`0.98`
`latency/import`	`3863319483` ns	`3854591463` ns	`1.00`
`latency/precompile`	`4627375059` ns	`4627171718` ns	`1.00`
`latency/ttfp`	`4503227168` ns	`4491759222` ns	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

codecov · 2026-05-28T11:25:55Z

Codecov Report

❌ Patch coverage is 11.11111% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.32%. Comparing base (2fe75d6) to head (cb21ca6).
⚠️ Report is 19 commits behind head on main.

Files with missing lines	Patch %	Lines
lib/cusparse/src/extra.jl	0.00%	29 Missing ⚠️
lib/cublas/src/cuBLAS.jl	41.66%	7 Missing ⚠️
lib/cublas/src/wrappers.jl	22.22%	7 Missing ⚠️
lib/cusparse/src/helpers.jl	0.00%	7 Missing ⚠️
lib/cusparse/src/generic.jl	0.00%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3155      +/-   ##
==========================================
- Coverage   16.40%   16.32%   -0.08%     
==========================================
  Files         124      124              
  Lines        9827     9875      +48     
==========================================
  Hits         1612     1612              
- Misses       8215     8263      +48

maleadt and others added 8 commits May 27, 2026 08:51

Bump JLLs.

2aa50f6

Update headers.

7f47c1d

Bump compat databases.

953b6e2

github-actions Bot reviewed May 27, 2026

maleadt added 2 commits May 27, 2026 11:52

Remove test that passes now.

a7904cb

Add CI for CUDA 13.3.

cb21ca6

maleadt enabled auto-merge May 28, 2026 11:19

maleadt disabled auto-merge May 28, 2026 11:19

maleadt merged commit e13541e into main May 28, 2026
1 of 2 checks passed

maleadt deleted the tb/ctk_13.3 branch May 28, 2026 11:20

maleadt merged 10 commits into main from tb/ctk_13.3
