Try to support virtual devices #789

Open
maleadt wants to merge 10 commits into main from tb/virtual

Conversation

@maleadt
Member

@maleadt maleadt commented May 29, 2026

Another stab at #309. Things seem to work locally, at least in a macOS 15 VM on my M3 Pro.

@codecov

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 88.23529% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.05%. Comparing base (25dcb95) to head (3ebe242).
⚠️ Report is 15 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/mtl/device.jl | 80.00% | 2 Missing ⚠️ |
| src/state.jl | 50.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #789      +/-   ##
==========================================
+ Coverage   80.96%   81.05%   +0.09%     
==========================================
  Files          64       64              
  Lines        3057     3083      +26     
==========================================
+ Hits         2475     2499      +24     
- Misses        582      584       +2     


Contributor

@github-actions github-actions Bot left a comment


Metal Benchmarks

Details
| Benchmark suite | Current: 6fe48f0 | Previous: 3b6c586 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 759875 ns | 820187.5 ns | 0.93 |
| array/accumulate/Float32/dims=1 | 978875 ns | 1001416.5 ns | 0.98 |
| array/accumulate/Float32/dims=1L | 10410916 ns | 10577334 ns | 0.98 |
| array/accumulate/Float32/dims=2 | 1266458.5 ns | 1279958 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 4744958.5 ns | 4806791.5 ns | 0.99 |
| array/accumulate/Int64/1d | 960583.5 ns | 982083 ns | 0.98 |
| array/accumulate/Int64/dims=1 | 1109792 ns | 1116021 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 12286895.5 ns | 12565083 ns | 0.98 |
| array/accumulate/Int64/dims=2 | 1463709 ns | 1466354.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 9376812.5 ns | 8937708 ns | 1.05 |
| array/broadcast | 331833 ns | 329583 ns | 1.01 |
| array/construct | 6041 ns | 5625 ns | 1.07 |
| array/permutedims/2d | 624937.5 ns | 628208.5 ns | 0.99 |
| array/permutedims/3d | 1127125 ns | 1128667 ns | 1.00 |
| array/permutedims/4d | 1359917 ns | 1358834 ns | 1.00 |
| array/private/copy | 352229.5 ns | 365313 ns | 0.96 |
| array/private/copyto!/cpu_to_gpu | 244458 ns | 240750.5 ns | 1.02 |
| array/private/copyto!/gpu_to_cpu | 283458.5 ns | 234125 ns | 1.21 |
| array/private/copyto!/gpu_to_gpu | 264937.5 ns | 255375 ns | 1.04 |
| array/private/iteration/findall/bool | 1105125 ns | 1128729 ns | 0.98 |
| array/private/iteration/findall/int | 1218729 ns | 1291770.5 ns | 0.94 |
| array/private/iteration/findfirst/bool | 1258750 ns | 1245584 ns | 1.01 |
| array/private/iteration/findfirst/int | 1298542 ns | 1276500 ns | 1.02 |
| array/private/iteration/findmin/1d | 1381125 ns | 1432875 ns | 0.96 |
| array/private/iteration/findmin/2d | 1172166.5 ns | 1181438 ns | 0.99 |
| array/private/iteration/logical | 1727209 ns | 1770770.5 ns | 0.98 |
| array/private/iteration/scalar | 1715875 ns | 1476917 ns | 1.16 |
| array/random/rand/Float32 | 564792 ns | 600333 ns | 0.94 |
| array/random/rand/Int64 | 660000 ns | 648792 ns | 1.02 |
| array/random/rand!/Float32 | 507292 ns | 521666 ns | 0.97 |
| array/random/rand!/Int64 | 486541 ns | 481750 ns | 1.01 |
| array/random/randn/Float32 | 565333 ns | 568458 ns | 0.99 |
| array/random/randn!/Float32 | 478875 ns | 484104 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 535625 ns | 498833 ns | 1.07 |
| array/reductions/mapreduce/Float32/dims=1 | 459750 ns | 461979.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 716333 ns | 721625 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2 | 461750 ns | 466083.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 1120458 ns | 1094312.5 ns | 1.02 |
| array/reductions/mapreduce/Int64/1d | 777750 ns | 795687 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1 | 770875 ns | 773437.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 1211083 ns | 1222229.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2 | 967667 ns | 925458 ns | 1.05 |
| array/reductions/mapreduce/Int64/dims=2L | 2260375 ns | 2261708 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 494583 ns | 496459 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 457854.5 ns | 463333 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 715792 ns | 719875 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 463229.5 ns | 464875 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 1116083 ns | 1096729.5 ns | 1.02 |
| array/reductions/reduce/Int64/1d | 804250 ns | 787958 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1 | 763791.5 ns | 767125 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 1250625 ns | 1224375 ns | 1.02 |
| array/reductions/reduce/Int64/dims=2 | 953667 ns | 961167 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 2262625 ns | 2264104.5 ns | 1.00 |
| array/shared/copy | 167750 ns | 166458 ns | 1.01 |
| array/shared/copyto!/cpu_to_gpu | 48000 ns | 47709 ns | 1.01 |
| array/shared/copyto!/gpu_to_cpu | 41166 ns | 43500 ns | 0.95 |
| array/shared/copyto!/gpu_to_gpu | 48708 ns | 48333 ns | 1.01 |
| array/shared/iteration/findall/bool | 1110625 ns | 1134875 ns | 0.98 |
| array/shared/iteration/findall/int | 1219000 ns | 1293166.5 ns | 0.94 |
| array/shared/iteration/findfirst/bool | 1026792 ns | 1033333 ns | 0.99 |
| array/shared/iteration/findfirst/int | 1070688 ns | 1081708.5 ns | 0.99 |
| array/shared/iteration/findmin/1d | 1196875 ns | 1201542 ns | 1.00 |
| array/shared/iteration/findmin/2d | 1174333 ns | 1177146 ns | 1.00 |
| array/shared/iteration/logical | 1577646 ns | 1625834 ns | 0.97 |
| array/shared/iteration/scalar | 8375 ns | 5951.333333333333 ns | 1.41 |
| integration/byval/reference | 1175041.5 ns | 1174791 ns | 1.00 |
| integration/byval/slices=1 | 1177458 ns | 1178750 ns | 1.00 |
| integration/byval/slices=2 | 2122833 ns | 2125604 ns | 1.00 |
| integration/byval/slices=3 | 21131854.5 ns | 19836667 ns | 1.07 |
| integration/metaldevrt | 458041.5 ns | 458584 ns | 1.00 |
| kernel/indexing | 325666 ns | 302000 ns | 1.08 |
| kernel/indexing_checked | 313542 ns | 325084 ns | 0.96 |
| kernel/launch | 13208 ns | 14125 ns | 0.94 |
| kernel/rand | 271250 ns | 347958 ns | 0.78 |
| latency/import | 1403534666.5 ns | 1405863458.5 ns | 1.00 |
| latency/precompile | 29725972833 ns | 30266521250 ns | 0.98 |
| latency/ttfp | 1677229520.5 ns | 1681368375.5 ns | 1.00 |
| metal/synchronization/context | 1037.5 ns | 837.6436781609195 ns | 1.24 |
| metal/synchronization/stream | 665.6335403726708 ns | 427.5527638190955 ns | 1.56 |

This comment was automatically generated by workflow using github-action-benchmark.
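
For readers checking the table by hand: the Ratio column is consistent with Current divided by Previous, rounded to two decimals, so values above 1.00 mean the current commit is slower. The Python sketch below recomputes a few rows copied from the table and flags the slow ones. The `ALERT_RATIO` cutoff of 1.5 is a hypothetical value chosen for illustration, not a setting taken from this repository's workflow.

```python
# Recompute the Ratio column for a few rows copied from the table above.
# ALERT_RATIO is a hypothetical cutoff, not the project's configured value.
ALERT_RATIO = 1.5

rows = [
    # (benchmark, current_ns, previous_ns)
    ("array/accumulate/Float32/1d", 759875, 820187.5),
    ("array/shared/iteration/scalar", 8375, 5951.333333333333),
    ("kernel/rand", 271250, 347958),
    ("metal/synchronization/stream", 665.6335403726708, 427.5527638190955),
]


def ratio(current_ns: float, previous_ns: float) -> float:
    """Current / Previous, rounded to two decimals as in the report."""
    return round(current_ns / previous_ns, 2)


ratios = {name: ratio(cur, prev) for name, cur, prev in rows}
slow = [name for name, value in ratios.items() if value > ALERT_RATIO]

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
print("above cutoff:", slow)
```

The sub-microsecond synchronization benchmarks are the only sampled rows that cross this cutoff, and timings that small tend to be noisy, so a single run is weak evidence of a real regression.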

Comment thread src/utilities.jl Outdated
Comment thread src/utilities.jl Outdated
Comment thread src/utilities.jl Outdated
Comment thread test/runtests.jl Outdated
Comment thread src/initialization.jl Outdated
Comment thread .github/workflows/CI.yml
Tim Besard and others added 10 commits June 3, 2026 07:58
macOS VMs expose an "Apple Paravirtual device" that is backed by real
Apple Silicon and supports Metal 3, but under-reports its capabilities
via supportsFamily:, claiming to lack MTLGPUFamilyApple7/Metal3. This
made functional() return false, so KernelAbstractions.functional() was
false and the misleading "non-virtualized only" warnings fired.

Detect such devices (is_virtual), treat them as functional, replace the
warnings with a single best-effort notice, and make
serial_mapreduce_threshold robust to the unavailable GPU core count
(which otherwise degenerated the reduction heuristic to always-serial).
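
A rough Python illustration of the logic described above, not the package's Julia code. Several things here are assumptions made for the sketch: detecting a virtual device by its name, requiring only the Metal 3 family, computing the threshold as threads per group times core count, and the fallback core count of 8. The names `is_virtual`, `functional`, and `serial_mapreduce_threshold` come from the commit message; their bodies are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_FAMILY = "Metal3"      # assumption: the family functional() checks for
FALLBACK_CORE_COUNT = 8         # assumption: default when the count is unknown
MAX_THREADS_PER_GROUP = 1024    # assumption: typical threadgroup size limit


@dataclass(frozen=True)
class Device:
    name: str
    reported_families: frozenset   # what the device claims to support
    gpu_core_count: Optional[int]  # None (or 0) when it cannot be queried


def is_virtual(dev: Device) -> bool:
    # Assumption: paravirtualized GPUs are recognizable by name.
    return "Paravirtual" in dev.name


def functional(dev: Device) -> bool:
    # Virtual devices under-report their families, so trust them anyway.
    return is_virtual(dev) or REQUIRED_FAMILY in dev.reported_families


def serial_mapreduce_threshold(dev: Device) -> int:
    # A missing or zero core count would make the threshold 0, which
    # sends every reduction down the serial path; use a fallback instead.
    cores = dev.gpu_core_count or FALLBACK_CORE_COUNT
    return MAX_THREADS_PER_GROUP * cores


def use_serial_reduction(n_outputs: int, dev: Device) -> bool:
    # Assumption: go serial once independent outputs can saturate the GPU.
    return n_outputs >= serial_mapreduce_threshold(dev)


vm_gpu = Device("Apple Paravirtual device", frozenset(), None)
print(functional(vm_gpu))                  # True
print(serial_mapreduce_threshold(vm_gpu))  # 8192
print(use_serial_reduction(1, vm_gpu))     # False
```

Without the fallback, `use_serial_reduction` would return true for every input size on such a device, which matches the "always-serial" degeneration the commit message describes.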

Refs #309.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add the Metal shading language / AIR / metallib versions, and per-device
capabilities (highest Apple and Metal feature set, virtualization). This
makes it easy to spot when a device -- e.g. a future virtualized GPU --
gains support for a newer feature set such as Metal 4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Run the tests on paravirtualized GPUs instead of skipping them, and mark
the few behaviors they genuinely don't support (event device
back-reference, residency sets, GPU core count) as broken/skipped.

Gate the large, slow GPUArrays test suite behind a new --all flag
(following CUDA.jl's convention) so it can be skipped on the best-effort
virtualized GitHub Actions CI while still running on the real-hardware
Buildkite CI.
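
The gating idea can be sketched as follows, in Python rather than the Julia of the actual test/runtests.jl. The suite names and the `select_suites` helper are hypothetical; only the `--all` flag itself comes from the commit message.

```python
import argparse

# Hypothetical suite names, for illustration only.
CORE_SUITES = ["core", "kernels"]
SLOW_SUITES = ["gpuarrays"]


def select_suites(argv):
    """Return the suites to run; the slow ones only when --all is given."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--all", dest="run_all", action="store_true")
    # parse_known_args leaves any other test-runner options untouched.
    args, _unknown = parser.parse_known_args(argv)
    suites = list(CORE_SUITES)
    if args.run_all:
        suites += SLOW_SUITES
    return suites


print(select_suites([]))         # ['core', 'kernels']
print(select_suites(["--all"]))  # ['core', 'kernels', 'gpuarrays']
```

Making the slow suite opt-in keeps the default run short on the virtualized runners, while the real-hardware CI passes the flag explicitly.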

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Run the core tests on GitHub-hosted Apple Silicon runners (which only
have a paravirtualized GPU), covering both the OncePerProcess and legacy
functional() code paths via Julia 1.10 and 1.12. Pass --all from the
Buildkite CI so the GPUArrays test suite keeps running on real hardware.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Christian Guinard <28689358+christiangnrd@users.noreply.github.com>
2 participants