Try to support virtual devices #789
Open
maleadt wants to merge 10 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #789      +/-   ##
==========================================
+ Coverage   80.96%   81.05%   +0.09%
==========================================
  Files          64       64
  Lines        3057     3083      +26
==========================================
+ Hits         2475     2499      +24
- Misses        582      584       +2
Metal Benchmarks
| Benchmark suite | Current: 6fe48f0 | Previous: 3b6c586 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 759875 ns | 820187.5 ns | 0.93 |
| array/accumulate/Float32/dims=1 | 978875 ns | 1001416.5 ns | 0.98 |
| array/accumulate/Float32/dims=1L | 10410916 ns | 10577334 ns | 0.98 |
| array/accumulate/Float32/dims=2 | 1266458.5 ns | 1279958 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 4744958.5 ns | 4806791.5 ns | 0.99 |
| array/accumulate/Int64/1d | 960583.5 ns | 982083 ns | 0.98 |
| array/accumulate/Int64/dims=1 | 1109792 ns | 1116021 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 12286895.5 ns | 12565083 ns | 0.98 |
| array/accumulate/Int64/dims=2 | 1463709 ns | 1466354.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 9376812.5 ns | 8937708 ns | 1.05 |
| array/broadcast | 331833 ns | 329583 ns | 1.01 |
| array/construct | 6041 ns | 5625 ns | 1.07 |
| array/permutedims/2d | 624937.5 ns | 628208.5 ns | 0.99 |
| array/permutedims/3d | 1127125 ns | 1128667 ns | 1.00 |
| array/permutedims/4d | 1359917 ns | 1358834 ns | 1.00 |
| array/private/copy | 352229.5 ns | 365313 ns | 0.96 |
| array/private/copyto!/cpu_to_gpu | 244458 ns | 240750.5 ns | 1.02 |
| array/private/copyto!/gpu_to_cpu | 283458.5 ns | 234125 ns | 1.21 |
| array/private/copyto!/gpu_to_gpu | 264937.5 ns | 255375 ns | 1.04 |
| array/private/iteration/findall/bool | 1105125 ns | 1128729 ns | 0.98 |
| array/private/iteration/findall/int | 1218729 ns | 1291770.5 ns | 0.94 |
| array/private/iteration/findfirst/bool | 1258750 ns | 1245584 ns | 1.01 |
| array/private/iteration/findfirst/int | 1298542 ns | 1276500 ns | 1.02 |
| array/private/iteration/findmin/1d | 1381125 ns | 1432875 ns | 0.96 |
| array/private/iteration/findmin/2d | 1172166.5 ns | 1181438 ns | 0.99 |
| array/private/iteration/logical | 1727209 ns | 1770770.5 ns | 0.98 |
| array/private/iteration/scalar | 1715875 ns | 1476917 ns | 1.16 |
| array/random/rand/Float32 | 564792 ns | 600333 ns | 0.94 |
| array/random/rand/Int64 | 660000 ns | 648792 ns | 1.02 |
| array/random/rand!/Float32 | 507292 ns | 521666 ns | 0.97 |
| array/random/rand!/Int64 | 486541 ns | 481750 ns | 1.01 |
| array/random/randn/Float32 | 565333 ns | 568458 ns | 0.99 |
| array/random/randn!/Float32 | 478875 ns | 484104 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 535625 ns | 498833 ns | 1.07 |
| array/reductions/mapreduce/Float32/dims=1 | 459750 ns | 461979.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 716333 ns | 721625 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2 | 461750 ns | 466083.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 1120458 ns | 1094312.5 ns | 1.02 |
| array/reductions/mapreduce/Int64/1d | 777750 ns | 795687 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1 | 770875 ns | 773437.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 1211083 ns | 1222229.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2 | 967667 ns | 925458 ns | 1.05 |
| array/reductions/mapreduce/Int64/dims=2L | 2260375 ns | 2261708 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 494583 ns | 496459 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 457854.5 ns | 463333 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 715792 ns | 719875 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 463229.5 ns | 464875 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 1116083 ns | 1096729.5 ns | 1.02 |
| array/reductions/reduce/Int64/1d | 804250 ns | 787958 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1 | 763791.5 ns | 767125 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 1250625 ns | 1224375 ns | 1.02 |
| array/reductions/reduce/Int64/dims=2 | 953667 ns | 961167 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 2262625 ns | 2264104.5 ns | 1.00 |
| array/shared/copy | 167750 ns | 166458 ns | 1.01 |
| array/shared/copyto!/cpu_to_gpu | 48000 ns | 47709 ns | 1.01 |
| array/shared/copyto!/gpu_to_cpu | 41166 ns | 43500 ns | 0.95 |
| array/shared/copyto!/gpu_to_gpu | 48708 ns | 48333 ns | 1.01 |
| array/shared/iteration/findall/bool | 1110625 ns | 1134875 ns | 0.98 |
| array/shared/iteration/findall/int | 1219000 ns | 1293166.5 ns | 0.94 |
| array/shared/iteration/findfirst/bool | 1026792 ns | 1033333 ns | 0.99 |
| array/shared/iteration/findfirst/int | 1070688 ns | 1081708.5 ns | 0.99 |
| array/shared/iteration/findmin/1d | 1196875 ns | 1201542 ns | 1.00 |
| array/shared/iteration/findmin/2d | 1174333 ns | 1177146 ns | 1.00 |
| array/shared/iteration/logical | 1577646 ns | 1625834 ns | 0.97 |
| array/shared/iteration/scalar | 8375 ns | 5951.333333333333 ns | 1.41 |
| integration/byval/reference | 1175041.5 ns | 1174791 ns | 1.00 |
| integration/byval/slices=1 | 1177458 ns | 1178750 ns | 1.00 |
| integration/byval/slices=2 | 2122833 ns | 2125604 ns | 1.00 |
| integration/byval/slices=3 | 21131854.5 ns | 19836667 ns | 1.07 |
| integration/metaldevrt | 458041.5 ns | 458584 ns | 1.00 |
| kernel/indexing | 325666 ns | 302000 ns | 1.08 |
| kernel/indexing_checked | 313542 ns | 325084 ns | 0.96 |
| kernel/launch | 13208 ns | 14125 ns | 0.94 |
| kernel/rand | 271250 ns | 347958 ns | 0.78 |
| latency/import | 1403534666.5 ns | 1405863458.5 ns | 1.00 |
| latency/precompile | 29725972833 ns | 30266521250 ns | 0.98 |
| latency/ttfp | 1677229520.5 ns | 1681368375.5 ns | 1.00 |
| metal/synchronization/context | 1037.5 ns | 837.6436781609195 ns | 1.24 |
| metal/synchronization/stream | 665.6335403726708 ns | 427.5527638190955 ns | 1.56 |
This comment was automatically generated by workflow using github-action-benchmark.
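For anyone reading the table: the Ratio column is consistent with current time divided by previous time, so values above 1.00 mean the current commit is slower. Here is a minimal Python sketch that recomputes a few rows. The timings are copied from the table; the `THRESHOLD` cutoff is a hypothetical value picked for illustration, not a setting taken from the benchmark workflow.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current time divided by previous time, rounded to two decimals."""
    return round(current_ns / previous_ns, 2)


# (current, previous) timings in nanoseconds, copied from the table.
rows = {
    "array/accumulate/Float32/1d": (759875, 820187.5),
    "kernel/rand": (271250, 347958),
    "array/shared/iteration/scalar": (8375, 5951.333333333333),
    "metal/synchronization/stream": (665.6335403726708, 427.5527638190955),
}

# Hypothetical cutoff, chosen for this sketch only.
THRESHOLD = 1.2

ratios = {name: ratio(cur, prev) for name, (cur, prev) in rows.items()}

# Benchmarks whose current run is slower than the cutoff allows.
slower = sorted(name for name, r in ratios.items() if r > THRESHOLD)

for name, r in ratios.items():
    flag = "  <- slower than cutoff" if r > THRESHOLD else ""
    print(f"{name}: {r:.2f}{flag}")
```

The largest ratios here come from benchmarks that run for only a few microseconds or less, where small absolute differences produce large relative ones.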
macOS VMs expose an "Apple Paravirtual device" that is backed by real Apple Silicon and supports Metal 3, but under-reports its capabilities via supportsFamily:, claiming to lack MTLGPUFamilyApple7/Metal3. This made functional() return false, so KernelAbstractions.functional() was false and the misleading "non-virtualized only" warnings fired. Detect such devices (is_virtual), treat them as functional, replace the warnings with a single best-effort notice, and make serial_mapreduce_threshold robust to the unavailable GPU core count (which otherwise degenerated the reduction heuristic to always-serial). Refs #309.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
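To make the decision described in this commit message concrete, here is a rough Python sketch of the logic, not the actual Metal.jl code. `DeviceInfo` and its fields are hypothetical stand-ins for what a device reports. Matching on the device name is an assumption about how `is_virtual` might work, and requiring Apple7 or Metal3 for real hardware is inferred from the message rather than confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical stand-in for the properties a Metal device reports."""
    name: str
    reports_apple7: bool  # answer of a supportsFamily:-style query for Apple7
    reports_metal3: bool  # answer of a supportsFamily:-style query for Metal3


def is_virtual(dev: DeviceInfo) -> bool:
    # Assumption: recognise the paravirtualized GPU by its name. The commit
    # message only says such devices are detected, not how.
    return "paravirtual" in dev.name.lower()


def functional(dev: DeviceInfo) -> bool:
    # Virtual devices under-report their feature families, so accept them
    # on a best-effort basis instead of trusting the family queries.
    if is_virtual(dev):
        return True
    # Assumption: real hardware must report Apple7 or Metal3.
    return dev.reports_apple7 or dev.reports_metal3


vm_gpu = DeviceInfo("Apple Paravirtual device", False, False)
real_gpu = DeviceInfo("Apple M3 Pro", True, True)
old_gpu = DeviceInfo("Some older GPU", False, False)  # made-up example device

print(functional(vm_gpu), functional(real_gpu), functional(old_gpu))
```

The point of the sketch is the ordering: the virtualization check runs before the family checks, so an under-reporting VM no longer falls through to the "unsupported" branch.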
Add the Metal shading language / AIR / metallib versions, and per-device capabilities (highest Apple and Metal feature set, virtualization). This makes it easy to spot when a device -- e.g. a future virtualized GPU -- gains support for a newer feature set such as Metal 4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Run the tests on paravirtualized GPUs instead of skipping them, and mark the few behaviors they genuinely don't support (event device back-reference, residency sets, GPU core count) as broken/skipped. Gate the large, slow GPUArrays test suite behind a new --all flag (after CUDA.jl) so it can be skipped on the best-effort virtualized GitHub Actions CI while still running on the real-hardware buildkite CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
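The gating idea can be illustrated with a small Python sketch. It is not the project's test runner: the suite names and the `select_suites` helper are hypothetical, and only the `--all` flag itself comes from the commit message.

```python
import argparse

# Hypothetical suite names, for illustration only.
CORE_SUITES = ["core"]
SLOW_SUITES = ["gpuarrays"]


def select_suites(argv: list[str]) -> list[str]:
    """Return the suites to run; slow suites are opt-in via --all."""
    parser = argparse.ArgumentParser(description="test runner sketch")
    parser.add_argument(
        "--all",
        action="store_true",
        dest="run_all",
        help="also run the large, slow suites",
    )
    args = parser.parse_args(argv)
    return CORE_SUITES + (SLOW_SUITES if args.run_all else [])


print(select_suites([]))         # default: core suites only
print(select_suites(["--all"]))  # opt in to the slow suites as well
```

With this shape, the virtualized CI calls the runner without arguments and the real-hardware CI passes `--all`, so the default stays cheap.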
Run the core tests on GitHub-hosted Apple Silicon runners (which only have a paravirtualized GPU), covering both the OncePerProcess and legacy functional() code paths via Julia 1.10 and 1.12. Pass --all from the buildkite CI so the GPUArrays test suite keeps running on real hardware.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Christian Guinard <28689358+christiangnrd@users.noreply.github.com>
Another stab at #309 -- things seem to work locally at least on a virtual macOS 15 on my M3 Pro.