v0.3.0 crashes with latest CUDA_Compiler_jll #243

@AntonOresten

CUDA_Compiler_jll v0.4.4 (fails)

                             │   Test   │ ──────────────── CPU ──────────────── │
Test                (Worker) │ time (s) │ GC (s) │ GC % │ Alloc (MB) │ RSS (MB) │
codegen/assume          (16) │     4.76 │   0.30 │  6.3 │     602.11 │  1220.62 │
device/gather_scatter   (17) │     6.01 │   failed at 2026-05-30T14:46:10.253
codegen/reflection      (19) │     6.31 │   0.55 │  8.8 │     787.81 │  1220.62 │
device/control_flow     (11) │     7.34 │   failed at 2026-05-30T14:46:12.197
codegen/rng_intrinsics  (16) │     2.89 │   0.04 │  1.3 │     312.60 │  1220.62 │
examples/vadd           (11) │     1.25 │   failed at 2026-05-30T14:46:14.242
codegen/cse             (17) │     3.54 │   0.06 │  1.6 │     525.42 │  1500.14 │
device/types            (20) │    13.28 │   failed at 2026-05-30T14:46:17.623
examples/softmax        (19) │     7.25 │   failed at 2026-05-30T14:46:19.055
examples/batchmatmul    (16) │     5.94 │   failed at 2026-05-30T14:46:19.563
device/math             (18) │    14.77 │   failed at 2026-05-30T14:46:19.774
host/broadcast          (12) │    15.28 │   failed at 2026-05-30T14:46:20.281
examples/matmul         (17) │     5.23 │   failed at 2026-05-30T14:46:20.585
examples/fmha           (15) │    16.92 │   failed at 2026-05-30T14:46:21.095
codegen/kernel_state    (17) │     0.28 │   0.00 │  0.0 │      47.77 │  1566.61 │
ext/DLFP8TypesExt       (11) │     6.58 │   0.03 │  0.4 │     366.58 │  1539.62 │
device/slice            (19) │     2.46 │   failed at 2026-05-30T14:46:22.129
device/hints            (13) │    16.91 │   failed at 2026-05-30T14:46:22.547
codegen/slice           (20) │     4.13 │   0.06 │  1.4 │     600.78 │  1550.84 │
examples/transpose      (12) │     1.56 │   failed at 2026-05-30T14:46:22.752
device/views            (16) │     2.62 │   failed at 2026-05-30T14:46:22.958
codegen/bounds          (11) │     0.79 │   0.00 │  0.0 │     101.62 │  1539.62 │
device/print            (17) │     0.98 │   failed at 2026-05-30T14:46:23.161
device/integration      (19) │     0.46 │   failed at 2026-05-30T14:46:23.263
types                   (16) │     0.51 │   0.00 │  0.0 │      42.23 │  1546.50 │
device/kernel_state     (11) │     0.42 │   failed at 2026-05-30T14:46:24.378
codegen/views           (15) │     2.29 │   0.03 │  1.2 │     323.52 │  1510.98 │
analysis/dataflow       (12) │     0.85 │   0.00 │  0.0 │      46.16 │  1553.40 │
device/broadcast         (8) │    19.81 │   failed at 2026-05-30T14:46:24.985
host/cache              (20) │     1.55 │   0.00 │  0.0 │     176.50 │  1577.55 │
codegen/no_wrap         (13) │     3.43 │   0.04 │  1.3 │     421.16 │  1543.97 │
codegen/fpmode          (18) │     6.09 │   0.10 │  1.6 │     618.01 │  1547.52 │
examples/moe            (14) │    22.68 │   failed at 2026-05-30T14:46:27.818
examples/layernorm       (2) │    23.23 │   failed at 2026-05-30T14:46:28.326
codegen/integration      (9) │    22.98 │   1.00 │  4.3 │    3010.83 │  1270.38 │
device/atomics          (10) │    23.65 │   failed at 2026-05-30T14:46:29.137
device/reductions        (1) │    25.70 │   failed at 2026-05-30T14:46:29.238
host/mapreduce           (4) │    24.44 │   failed at 2026-05-30T14:46:29.340
device/core              (7) │    27.89 │   failed at 2026-05-30T14:46:32.174
device/tile              (5) │    27.52 │   failed at 2026-05-30T14:46:32.579
examples/fft            (17) │     8.95 │   failed at 2026-05-30T14:46:32.883
codegen/operations       (6) │    39.48 │   1.58 │  4.0 │    6231.41 │  1346.41 │

CUDA_Compiler_jll v0.4.3 (passes)

                             │   Test   │ ──────────────── CPU ──────────────── │
Test                (Worker) │ time (s) │ GC (s) │ GC % │ Alloc (MB) │ RSS (MB) │
device/types            (17) │    12.53 │   0.61 │  4.8 │     927.67 │  1499.96 │
device/control_flow      (5) │    13.42 │   0.68 │  5.1 │     847.60 │  1378.12 │
device/hints             (4) │    13.35 │   0.68 │  5.1 │     932.22 │  1461.20 │
examples/softmax        (18) │    15.23 │   1.36 │  8.9 │    2205.54 │  1641.98 │
examples/batchmatmul    (20) │    14.33 │   0.77 │  5.4 │    1274.81 │  1537.26 │
device/math             (19) │    15.47 │   0.72 │  4.6 │     846.00 │  1492.10 │
device/gather_scatter    (4) │     2.75 │   0.00 │  0.0 │     150.98 │  1478.85 │
codegen/rng_intrinsics  (18) │     2.88 │   0.02 │  0.8 │     396.37 │  1641.98 │
host/broadcast          (16) │    19.22 │   0.67 │  3.5 │    1230.98 │  1480.13 │
ext/DLFP8TypesExt       (20) │     5.27 │   0.03 │  0.5 │     295.14 │  1537.26 │
codegen/reflection       (5) │     6.64 │   0.11 │  1.6 │     841.72 │  1465.82 │
device/atomics          (15) │    21.14 │   1.75 │  8.3 │    2660.34 │  1606.46 │
codegen/cse             (19) │     5.09 │   0.07 │  1.3 │     557.88 │  1539.04 │
examples/matmul         (18) │     3.59 │   0.03 │  0.9 │     346.61 │  1641.98 │
examples/vadd           (17) │     8.12 │   0.47 │  5.8 │     977.50 │  1659.45 │
device/integration      (20) │     1.43 │   0.05 │  3.3 │     233.13 │  1612.38 │
codegen/views            (5) │     1.12 │   0.00 │  0.0 │     175.47 │  1465.82 │
codegen/kernel_state    (19) │     0.33 │   0.00 │  0.0 │      48.11 │  1539.04 │
codegen/assume           (4) │     5.50 │   0.07 │  1.3 │     686.51 │  1553.53 │
examples/transpose      (18) │     1.09 │   0.00 │  0.0 │     159.67 │  1641.98 │
host/cache              (17) │     1.42 │   0.00 │  0.0 │     176.52 │  1659.45 │
device/print             (5) │     1.52 │   0.00 │  0.0 │     129.78 │  1465.82 │
codegen/fpmode          (16) │     5.52 │   0.08 │  1.4 │     613.49 │  1536.75 │
codegen/bounds          (18) │     0.78 │   0.00 │  0.0 │      81.24 │  1641.98 │
device/slice            (20) │     2.64 │   0.00 │  0.0 │     193.34 │  1612.38 │
device/kernel_state      (5) │     0.38 │   0.00 │  0.0 │      70.60 │  1465.82 │
codegen/slice           (19) │     2.03 │   0.03 │  1.5 │     276.53 │  1553.02 │
analysis/dataflow       (17) │     0.68 │   0.00 │  0.0 │      46.15 │  1659.45 │
types                   (16) │     0.73 │   0.00 │  0.0 │      59.56 │  1536.75 │
device/views             (4) │     2.57 │   0.00 │  0.0 │     210.22 │  1556.88 │
codegen/no_wrap         (15) │     4.06 │   0.03 │  0.7 │     396.75 │  1623.64 │
examples/fmha           (13) │    27.72 │   0.95 │  3.4 │    2230.41 │  1591.43 │
codegen/integration     (14) │    27.83 │   1.09 │  3.9 │    3010.59 │  1275.29 │
examples/layernorm      (12) │    27.86 │   1.05 │  3.8 │    2387.82 │  1616.07 │
device/broadcast        (11) │    29.71 │   1.14 │  3.8 │    2081.63 │  1573.37 │
examples/moe            (10) │    32.43 │   2.62 │  8.1 │    9459.05 │  3637.04 │
device/core              (8) │    36.33 │   1.06 │  2.9 │    2218.43 │  1553.43 │
host/mapreduce           (3) │    37.21 │   1.63 │  4.4 │    4641.74 │  1770.61 │
device/tile              (9) │    39.70 │   1.34 │  3.4 │    3442.91 │  1618.11 │
examples/fft             (2) │    40.78 │   0.85 │  2.1 │    1478.96 │  1493.23 │
codegen/operations       (7) │    44.53 │   1.88 │  4.2 │    6230.20 │  1350.51 │
device/reductions        (6) │    50.07 │   2.20 │  4.4 │    6696.56 │  1679.93 │

Both runs were on cuTile.jl v0.3.0.

The CUDA_Tile_jll version does not seem to affect the outcome.
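For anyone trying to reproduce the comparison above, a minimal sketch of pinning the JLL to the passing version follows. It assumes cuTile v0.3.0 can be installed by name from the registry and that CUDA_Compiler_jll v0.4.3 satisfies its compat bounds. The temporary environment is only an illustration, not necessarily how the logs above were produced.

```julia
using Pkg

# Work in a throwaway environment so the default one is left untouched.
Pkg.activate(temp=true)

# Assumption: cuTile v0.3.0 is installable by name from the registry.
Pkg.add(name="cuTile", version="0.3.0")

# Install the JLL version reported as passing, then pin it so that
# later resolves do not move it to v0.4.4.
Pkg.add(name="CUDA_Compiler_jll", version="0.4.3")
Pkg.pin("CUDA_Compiler_jll")

# Run the package test suite against the pinned JLL.
Pkg.test("cuTile")
```

Swapping `"0.4.3"` for `"0.4.4"` in the same script should give the failing configuration, assuming nothing else in the resolved environment changes.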
