Fix pointer types in cublasHgemm #3157

Open: lpawela wants to merge 1 commit into JuliaGPU:main from lpawela:master

lpawela commented May 28, 2026

On the current main branch, the following call fails:

using CUDA

N = 256
A = CUDA.rand(Float16, N, N)
B = CUDA.rand(Float16, N, N)
C = CUDA.zeros(Float16, N, N)
CUDA.CUBLAS.gemm!('N', 'N', one(Float16), A, B, zero(Float16), C)

with

ERROR: MethodError: no method matching unsafe_convert(::Type{Ptr{Float16}}, ::Float16)
The function `unsafe_convert` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  unsafe_convert(::Type{Cwstring}, ::Any)
   @ Base strings/cstring.jl:101
  unsafe_convert(::Type{Ptr{T}}, ::Array{T}) where T
   @ Base pointer.jl:65
  unsafe_convert(::Type{Ptr{T}}, ::Ptr{NTuple{N, T}}) where {N, T}
   @ Base refpointer.jl:188

This PR fixes the issue by correcting the pointer types used in the cublasHgemm wrapper.
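For readers unfamiliar with how Julia converts `ccall` arguments, the error above can be reproduced without a GPU. The following is a minimal sketch, not taken from the PR diff: it assumes the failing wrapper declared its scalar arguments as `Ptr{Float16}`, and it uses Base's `Ref{Float16}` only to illustrate a slot type that accepts a plain scalar. The reference type CUDA.jl actually uses in its CUBLAS wrappers may differ. The names `alpha`, `ptr_error`, `boxed`, and `roundtrip` are illustrative.

```julia
alpha = one(Float16)

# A `Ptr{Float16}` argument slot: `cconvert` passes the scalar through unchanged,
# and `unsafe_convert` then has no method for (Ptr{Float16}, Float16).
ptr_error = try
    Base.unsafe_convert(Ptr{Float16}, Base.cconvert(Ptr{Float16}, alpha))
    nothing
catch err
    err
end
println(ptr_error isa MethodError)

# A `Ref{Float16}` argument slot: `cconvert` boxes the scalar in a reference,
# and `unsafe_convert` yields a pointer to that box.
boxed = Base.cconvert(Ref{Float16}, alpha)
roundtrip = GC.@preserve boxed begin
    p = Base.unsafe_convert(Ref{Float16}, boxed)
    unsafe_load(p)  # read the value back through the pointer
end
println(roundtrip == alpha)
```

The two steps mirror what `ccall` does for each argument: `cconvert` first, then `unsafe_convert` on the result, with the intermediate object kept alive for the duration of the call.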

github-actions Bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: f7b7855 | Previous: 54c7586 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 99736 ns | 100040 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 75957 ns | 75598 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1585719 ns | 1585923 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 141680 ns | 140396 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 653773 ns | 652952 ns | 1.00 |
| array/accumulate/Int64/1d | 117344 ns | 116760 ns | 1.01 |
| array/accumulate/Int64/dims=1 | 79011 ns | 78966 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1698717 ns | 1697893 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 150749 ns | 150510 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 959443 ns | 959254 ns | 1.00 |
| array/broadcast | 18299 ns | 18315 ns | 1.00 |
| array/construct | 1198.3 ns | 1307.5 ns | 0.92 |
| array/copy | 16803 ns | 16659 ns | 1.01 |
| array/copyto!/cpu_to_gpu | 214558 ns | 211984 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 281252 ns | 280422 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 10590 ns | 10380 ns | 1.02 |
| array/iteration/findall/bool | 132091 ns | 131500 ns | 1.00 |
| array/iteration/findall/int | 145310 ns | 145666 ns | 1.00 |
| array/iteration/findfirst/bool | 68732 ns | 68920 ns | 1.00 |
| array/iteration/findfirst/int | 70960 ns | 71508 ns | 0.99 |
| array/iteration/findmin/1d | 67253 ns | 66419 ns | 1.01 |
| array/iteration/findmin/2d | 101147 ns | 101084 ns | 1.00 |
| array/iteration/logical | 190663 ns | 190475 ns | 1.00 |
| array/iteration/scalar | 65429 ns | 65525 ns | 1.00 |
| array/permutedims/2d | 49770 ns | 49765 ns | 1.00 |
| array/permutedims/3d | 50309 ns | 50702 ns | 0.99 |
| array/permutedims/4d | 50743 ns | 51020 ns | 0.99 |
| array/random/rand/Float32 | 11657 ns | 11632 ns | 1.00 |
| array/random/rand/Int64 | 22230 ns | 22338 ns | 1.00 |
| array/random/rand!/Float32 | 8009.333333333333 ns | 7935.333333333333 ns | 1.01 |
| array/random/rand!/Int64 | 18413 ns | 17826 ns | 1.03 |
| array/random/randn/Float32 | 36449 ns | 36306 ns | 1.00 |
| array/random/randn!/Float32 | 24057 ns | 23975 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 33655 ns | 33733 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 37928 ns | 38085 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 50402 ns | 50507 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 55582 ns | 55668 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 67451 ns | 67497 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 39612 ns | 40096 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 41044 ns | 40849 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 86458 ns | 86591 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 57784 ns | 58177 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 82871 ns | 83232 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 33492 ns | 33766 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 38230 ns | 38178 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 50431 ns | 50557 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 55368 ns | 55534 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 67964 ns | 68199 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 39467 ns | 39982 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 40521 ns | 41049 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 86277 ns | 86662 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 57786 ns | 58182 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 82512 ns | 83392 ns | 0.99 |
| array/reverse/1d | 16797 ns | 17096 ns | 0.98 |
| array/reverse/1dL | 67649 ns | 67978 ns | 1.00 |
| array/reverse/1dL_inplace | 65328 ns | 65357 ns | 1.00 |
| array/reverse/1d_inplace | 8854 ns | 8332.666666666666 ns | 1.06 |
| array/reverse/2d | 20034 ns | 20040 ns | 1.00 |
| array/reverse/2dL | 71800 ns | 71744 ns | 1.00 |
| array/reverse/2dL_inplace | 65102 ns | 65037 ns | 1.00 |
| array/reverse/2d_inplace | 10259 ns | 9712 ns | 1.06 |
| array/sorting/1d | 2724690 ns | 2725346 ns | 1.00 |
| array/sorting/2d | 1063501 ns | 1061549 ns | 1.00 |
| array/sorting/by | 3269235 ns | 3268510 ns | 1.00 |
| cuda/synchronization/context/auto | 1115.3 ns | 1148.8 ns | 0.97 |
| cuda/synchronization/context/blocking | 920 ns | 947 ns | 0.97 |
| cuda/synchronization/context/nonblocking | 5948.8 ns | 6099.8 ns | 0.98 |
| cuda/synchronization/stream/auto | 966.578947368421 ns | 1006 ns | 0.96 |
| cuda/synchronization/stream/blocking | 828.156626506024 ns | 859.8142857142857 ns | 0.96 |
| cuda/synchronization/stream/nonblocking | 5904.833333333333 ns | 5966.2 ns | 0.99 |
| integration/byval/reference | 143161 ns | 143152 ns | 1.00 |
| integration/byval/slices=1 | 145336 ns | 145205 ns | 1.00 |
| integration/byval/slices=2 | 283738 ns | 283811 ns | 1.00 |
| integration/byval/slices=3 | 422228 ns | 422282 ns | 1.00 |
| integration/cudadevrt | 101648 ns | 101717 ns | 1.00 |
| integration/volumerhs | 8883509 ns | 8896793 ns | 1.00 |
| kernel/indexing | 12728 ns | 12625 ns | 1.01 |
| kernel/indexing_checked | 13354 ns | 13467 ns | 0.99 |
| kernel/launch | 2109.5555555555557 ns | 2103.8888888888887 ns | 1.00 |
| kernel/occupancy | 735.5757575757576 ns | 692.2105263157895 ns | 1.06 |
| kernel/rand | 14893 ns | 14754 ns | 1.01 |
| latency/import | 3846061458 ns | 3848826206 ns | 1.00 |
| latency/precompile | 4624359073 ns | 4629101097 ns | 1.00 |
| latency/ttfp | 4491854168 ns | 4482824517 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 16.32%. Comparing base (54c7586) to head (f7b7855).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3157   +/-   ##
=======================================
  Coverage   16.32%   16.32%           
=======================================
  Files         124      124           
  Lines        9875     9875           
=======================================
  Hits         1612     1612           
  Misses       8263     8263           
