[CI] Add rocm-examples test (in-pipeline HIP runtime, CMake + bash hybrid)#208
lamb-j wants to merge 24 commits
Adds a test_rocm_examples job that compiles ROCm/rocm-examples HIP-Basic/hello_world via PR clang with `--offload-arch=amdgcnspirv` and runs the resulting binary on a gfx942 runner. This exercises the SPIRV translator end-to-end through real HIP compile + JIT kernel load, catching translator/codegen bugs that pass lit but fail at runtime.

Build job changes:
- After Comgr build, install comgr + device-libs into a staging/ tree via cmake --install. clang's --rocm-path expects installed-tree layout, not build-dir layout.
- Clone ROCm/rocm-systems (pinned) and build rocr-runtime + clr (libhsa-runtime64.so + libamdhip64.so) against the just-built clang, installing into the same staging/ tree.
- Clone ROCm/rocm-cmake (pinned) for the ROCM cmake modules CLR needs.
- Disable kpack, rocprofiler-register, and host-hipcc detection in CLR (they're optional and not part of the SPIRV CI scope).
- Strip and tar steps extended to include staging/.

Why in-pipeline runtime build (vs prebuilt nightly): TheRock's public nightly-tarball bucket only publishes consumer (dgpu) families on a daily cadence; gfx94X / gfx950 (dcgpu) tarballs stopped Oct 2025, and TheRock's own multi-arch CI for those families builds the runtime in-pipeline for the same reason. Building in-pipeline also ensures the runtime tracks the same llvm/clang amd-staging tip we're testing.

Test job:
- runs-on: linux-gfx942-1gpu-ossci-rocm (matches existing TheRock multi-arch CI runner choice).
- Compiles hello_world via PR clang + --rocm-path=$STAGING; explicit -lhsa-runtime64 needed since it's not pulled transitively from libamdhip64.so during exe link.
- Runs binary with LD_LIBRARY_PATH=$STAGING/lib.
- Greps for fixed host + device greeting lines.
- Informational initially; promote to required once stable.

Pinned SHAs (bump deliberately, ~weekly):
- ROCm/rocm-systems: 8bb3b73c (develop, 2026-05-15)
- ROCm/rocm-cmake: 4d391d7b (develop, 2026-05-12)
- ROCm/rocm-examples: b4ee9992 (develop, 2025-02-21)

Estimated wall-time impact: existing build ~14 min + ~7-10 min new (rocr ~2 min, clr ~5-8 min) -> ~22-25 min total. Inside 120-min timeout. Test job adds one GPU-runner job, runs ~1-2 min.
Mirrors the SPIRV CI structure on ROCm/SPIRV-LLVM-Translator
amd-staging: a top-level dispatcher (spirv-ci.yml) that calls per-
platform reusable workflows (spirv-ci-linux.yml, spirv-ci-windows.yml).
Each platform builds LLVM/Clang/LLD + amd-llvm-spirv translator +
device-libs + Comgr in one job, uploads the build trees as a single
artifact, and fans out to parallel test jobs that consume it:
Linux::release / Build (required)
Linux::release / Test SPIRV translator lit
Linux::release / Test LLVM SPIRV codegen
Linux::release / Test Comgr
Windows::release / Build
Windows::release / Test SPIRV translator lit
Windows::release / Test LLVM SPIRV codegen
Windows::release / Test Comgr
The translator-lit test job re-runs lit twice (PR head + amd-staging
baseline) and gates on new-failures-only — Khronos upstream breaks
~1 translator lit test per week, so without baseline-diffing the
suite would block unrelated PRs on those days. Pre-existing baseline
failures don't gate.
Goal: catch breakage in compiler / SPIRV translator that would fail
downstream Comgr / HIP testing without paying the cost of a full
TheRock build. Test categories will grow over time (rocm-examples,
HIP-tests SPIRV mode, etc.) — they slot in as new jobs in the
per-platform workflows.
This is the llvm-project mirror of the workflows on
ROCm/SPIRV-LLVM-Translator amd-staging. Two intentional divergences
from the translator copy:
1. No sticky PR comment on the translator-lit test job. The
translator copy posts a per-PR comment partitioning translator-
lit failures into new / fixed / pre-existing. On llvm-project
PRs (which mostly touch llvm/clang/lld, not the SPIRV translator)
that comment would just be noise. The failing-check signal in
the PR header is enough; the gate step still fails the job on
real regressions and prints the failing test list to the log.
pull-requests: write permission dropped accordingly.
2. No rocm-examples test job (yet). That work is under review on
the translator side first (ROCm/SPIRV-LLVM-Translator#208); it
will be brought over here once stable.
Replaces 13 incremental prototype commits with a single squash so the
diff matches the current intended state. The translator side is the
source of truth for these workflows going forward; bumps from there
will be replicated here.
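The new-failures-only gate described in this commit message for the translator-lit job can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual step: it assumes each lit run has already been reduced to a text file with one failing test name per line, and the function names `new_failures` and `gate_on_new_failures` are hypothetical.

```shell
# Sketch only: assumes one failing test name per line in each input file.
# new_failures and gate_on_new_failures are hypothetical helper names.

# Print tests that fail on the PR head but not on the baseline.
new_failures() {
  nf_base="$(mktemp)"
  nf_head="$(mktemp)"
  sort -u "$1" > "$nf_base"
  sort -u "$2" > "$nf_head"
  # comm -13 keeps only lines unique to the second (PR head) file.
  comm -13 "$nf_base" "$nf_head"
  rm -f "$nf_base" "$nf_head"
}

# Fail (non-zero) only when the PR head introduces new failures.
gate_on_new_failures() {
  gate_new="$(new_failures "$1" "$2")"
  if [ -n "$gate_new" ]; then
    printf 'New failures:\n%s\n' "$gate_new"
    return 1
  fi
  echo "No new failures; pre-existing baseline failures do not gate."
}

# Small demo with made-up test names.
demo_dir="$(mktemp -d)"
printf 'a.ll\nb.ll\n' > "$demo_dir/baseline.txt"
printf 'b.ll\nc.ll\n' > "$demo_dir/head.txt"
gate_on_new_failures "$demo_dir/baseline.txt" "$demo_dir/head.txt" || echo "gate failed"
```

In the demo, `b.ll` fails on both sides and is ignored, while `c.ll` fails only on the PR head and trips the gate.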
ROCR-Runtime's image library compiles OpenCL "blit" bitcode kernels for ~30 gfx targets via add_custom_command(... clang ...). Those clang invocations don't get --rocm-path forwarded; in the manylinux build container there's no /opt/rocm default to fall back to, so device-libs aren't found. Setting ROCM_PATH=$PWD/staging points clang's auto-detection at the staging tree where device-libs were installed in the previous step.
clang-linker-wrapper does its own library lookup before invoking the host linker; it doesn't pick up -L flags from the clang driver, so -lamdhip64 fails to resolve on the GPU runner where there's no system ROCm. Set LIBRARY_PATH=$STAGING/lib so the wrapper finds it in the same tree the host linker is searching. Add -Wl,-rpath,$STAGING/lib so the binary can dlopen the runtime libs without depending on LD_LIBRARY_PATH being set at exec time.
Helps diagnose why clang-linker-wrapper can't find -lamdhip64 even with LIBRARY_PATH set. Temporary; revert once we know the fix.
CMake's GNUInstallDirs picks lib64 on the manylinux2014 container (64-bit Red Hat heritage). clang's --rocm-path detection looks for libraries at <prefix>/lib, so libamdhip64.so + libhsa-runtime64.so landed where the HIP driver couldn't see them. Set CMAKE_INSTALL_LIBDIR=lib on both ROCR-Runtime and CLR configures to force the layout clang expects. Also drop the debug step and LIBRARY_PATH workaround now that the real cause is identified.
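A quick way to spot this kind of layout mismatch is to list any shared libraries that ended up under `lib64` in the install prefix. The sketch below is illustrative only; `misplaced_libs` and `check_libdir_layout` are hypothetical helper names, and the demo builds a throwaway directory instead of touching a real staging tree.

```shell
# Sketch only: hypothetical helpers for checking an install prefix layout.

# Print the names of shared libraries found under <prefix>/lib64.
misplaced_libs() {
  for ml_file in "$1"/lib64/*.so*; do
    # An unmatched glob stays literal; skip it.
    [ -e "$ml_file" ] || continue
    basename "$ml_file"
  done
}

# Return non-zero if anything landed in lib64 instead of lib.
check_libdir_layout() {
  cl_found="$(misplaced_libs "$1")"
  if [ -n "$cl_found" ]; then
    printf 'Libraries under %s/lib64 (expected %s/lib):\n%s\n' "$1" "$1" "$cl_found"
    return 1
  fi
  echo "Layout OK: nothing under $1/lib64"
}

# Demo on a throwaway tree that mimics the failure described above.
demo_prefix="$(mktemp -d)"
mkdir -p "$demo_prefix/lib" "$demo_prefix/lib64"
: > "$demo_prefix/lib64/libamdhip64.so"
check_libdir_layout "$demo_prefix" || echo "layout check failed"
```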
@lamb-j looks like the test itself is yielding unexpected output?
Do we know of any issues with device printf and amdgcnspirv? Maybe we should use a different test to prototype
It's an issue with the build config, re-testing
Helps determine whether the kernel launches and whether sync returns cleanly when the device printf output is missing. Temporary.
HIP runtime calls Comgr at hipLaunchKernel time to translate the embedded SPIR-V fatbin to AMDGPU code for the actual GPU. Comgr searches for the translator at <libamd_comgr.so>/../bin/amd-llvm-spirv. The translator binary is built in build/bin/ but wasn't being copied into staging/, so Comgr couldn't find it at JIT time, hipLaunchKernel returned hipErrorInvalidImage, and the kernel silently never ran. hipDeviceSynchronize then returned success (nothing to sync) so the binary exited cleanly with only the host-side print output. Copy amd-llvm-spirv + the llvm-spirv symlink into staging/bin/ after the CLR install step.
Comgr looks for amd-llvm-spirv by name; the llvm-spirv symlink is for downstream tools that expect the upstream name and isn't needed for the HIP runtime JIT path.
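The lookup rule quoted above (`<libamd_comgr.so>/../bin/amd-llvm-spirv`) suggests a simple pre-flight check on the staging tree. The following is a sketch under that assumption; it takes the path literally as the directory containing the library plus `../bin`, and the helper names `expected_translator_path` and `translator_is_staged` are hypothetical.

```shell
# Sketch only: models the lookup path quoted in the commit message.
# Helper names are hypothetical.

# Print where the translator is expected, given the path to libamd_comgr.so.
expected_translator_path() {
  printf '%s/../bin/amd-llvm-spirv\n' "$(dirname "$1")"
}

# Succeed only if an executable translator exists at the expected location.
translator_is_staged() {
  [ -x "$(expected_translator_path "$1")" ]
}

# Demo: a throwaway staging tree, checked before and after the copy step.
demo_staging="$(mktemp -d)"
mkdir -p "$demo_staging/lib" "$demo_staging/bin"
: > "$demo_staging/lib/libamd_comgr.so"
translator_is_staged "$demo_staging/lib/libamd_comgr.so" || echo "translator missing"
: > "$demo_staging/bin/amd-llvm-spirv"
chmod +x "$demo_staging/bin/amd-llvm-spirv"
translator_is_staged "$demo_staging/lib/libamd_comgr.so" && echo "translator staged"
```

Running a check like this right after the install step would surface a missing translator at build time instead of as a silent JIT failure on the GPU runner.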
Comgr's 'Failed to compile spirv to reloc' is opaque — the underlying error stays buried unless logs are redirected to stderr. Also list staging/bin to confirm the translator is actually there at runtime.
AMD_COMGR_REDIRECT_LOGS alone surfaced no extra detail for the 'Failed to compile spirv to reloc' failure. Add:
- ROCM_PATH=$STAGING so Comgr finds device-libs in staging instead of falling back to /opt/rocm
- AMD_COMGR_SAVE_TEMPS=/tmp/comgr to dump intermediate files showing how far the SPIR-V → reloc pipeline got
- Post-run listing of /tmp/comgr/ contents
Match the working invocation from ROCm/RocmCIForSPIRV spirv_external_llvm.sh. The critical addition is --offload-new-driver, which routes through the HIP offload codepath that produces a fatbin HIP runtime can JIT for amdgcnspirv. Without it the runtime hits "Failed to compile spirv to reloc" because the bundle format doesn't match what hip_fatbin.cpp expects. Also added:
- --rocm-device-lib-path explicit (auto-detect doesn't always find the staged amdgcn/bitcode dir)
- -D__HIP_PLATFORM_AMD__ to match
- -Wl,--unresolved-symbols=ignore-in-shared-libs (libamd_comgr embeds its own LLVM symbols)
HIP runtime reaches Comgr and Comgr's COMPILE_SPIRV_TO_RELOCATABLE action fails. Need to verify what's actually in the binary's fatbin section — SPIR-V with the right target string, or something else (LLVM IR? Wrong target name?). Add a step that lists bundle entries via clang-offload-bundler.
Global env var forces Comgr to log its internal actions without HIP needing to enable per-action logging. Targets the silent AMD_COMGR_ACTION_COMPILE_SPIRV_TO_RELOCATABLE failure at JIT time.
The gfx942 ossci-rocm runner is persistent (no container), so state from prior runs can leak in. Two things to investigate:
- ~/.cache/comgr / ~/.amd/comgr: Comgr's JIT cache. A cached failed compile from a prior run could reproduce the same failure even after we've fixed the inputs. RocmCIForSPIRV's clean_all explicitly wipes these for the same reason.
- /opt/rocm: if a system ROCm install is present, HIP runtime could be silently resolving against it instead of (or alongside) our staged tree.
Add a probe step that lists both, then wipes the Comgr caches before the Run step.
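A cache wipe of the kind described could look roughly like the sketch below. It is an illustration, not the step that was committed: `wipe_comgr_caches` is a hypothetical name, the two cache locations are the ones named in the message above, and the home directory is passed as a parameter so the demo can run against a throwaway directory.

```shell
# Sketch only: wipe_comgr_caches is a hypothetical helper name.
# The cache paths are the two locations named in the commit message.
wipe_comgr_caches() {
  wc_home="${1:-$HOME}"
  for wc_cache in "$wc_home/.cache/comgr" "$wc_home/.amd/comgr"; do
    if [ -d "$wc_cache" ]; then
      echo "wiping $wc_cache"
      rm -rf "$wc_cache"
    else
      echo "absent $wc_cache"
    fi
  done
}

# Demo against a throwaway "home" so no real cache is touched.
demo_home="$(mktemp -d)"
mkdir -p "$demo_home/.cache/comgr"
: > "$demo_home/.cache/comgr/stale-entry"
wipe_comgr_caches "$demo_home"
```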
Probe step now shows which libamdhip64 + libamd_comgr are actually resolving at runtime, and the rpath/runpath / NEEDED entries of our staged libs. /opt/rocm-6.4.1 exists on the runner — if the staged libs end up pulling Comgr from there, our COMPILE_SPIRV_TO_RELOCATABLE call lands in a stale Comgr API.
## Summary

Adds SPIRV-focused PR CI for `ROCm/llvm-project` `amd-staging`, mirroring the structure that landed on `ROCm/SPIRV-LLVM-Translator` `amd-staging`. Each platform builds LLVM/Clang/translator/Comgr in one job and fans out to parallel test jobs that consume a GHA-artifact build tree. Catches translator/Comgr breakage without paying the cost of a full TheRock build.

## New checks on PRs

```
SPIRV Compiler CI - amd-staging / Linux::release / Build
SPIRV Compiler CI - amd-staging / Linux::release / Test SPIRV translator lit
SPIRV Compiler CI - amd-staging / Linux::release / Test LLVM SPIRV codegen
SPIRV Compiler CI - amd-staging / Linux::release / Test Comgr
SPIRV Compiler CI - amd-staging / Windows::release / Build
SPIRV Compiler CI - amd-staging / Windows::release / Test SPIRV translator lit
SPIRV Compiler CI - amd-staging / Windows::release / Test LLVM SPIRV codegen
SPIRV Compiler CI - amd-staging / Windows::release / Test Comgr
```

All checks are informational initially; promote individually to required as each stabilizes.

## Differences from the translator copy

Two intentional divergences from the workflows on ROCm/SPIRV-LLVM-Translator `amd-staging`:

1. **No sticky PR comment from the translator-lit test job.** The translator copy posts a per-PR comment partitioning translator-lit failures into new / fixed / pre-existing. On llvm-project PRs (which mostly touch llvm/clang/lld, not the SPIRV translator) that comment would just be noise; the failing check in the PR header is enough signal. The gate step still fails the job on real regressions and prints the failing test list to the log. `pull-requests: write` permission dropped accordingly.
2. **No `rocm-examples` test job (yet).** That work is under review on the translator side first as ROCm/SPIRV-LLVM-Translator#208 and will be brought over here once stable.
The dynamic loader was resolving libamd_comgr.so.3 from /opt/rocm-6.4.1/lib (a system ROCm install on the GPU runner) instead of from our staged tree, because Comgr's install rules apparently don't always emit the libamd_comgr.so.<MAJOR> SONAME symlink in staging/lib. Without that symlink, LD_LIBRARY_PATH=staging/lib has nothing to match against and falls through to ld.so.cache, which hands back the old system Comgr. End result: our libamdhip64 (which calls the new AMD_COMGR_ACTION_COMPILE_SPIRV_TO_RELOCATABLE) ends up linked against 6.4.1's libamd_comgr that doesn't recognize that action — JIT fails with INVALID_ARGUMENT. Backfill the SONAME + dev symlinks after install, and log the lib directory before/after so we can confirm what's there.
Comgr's install was landing in staging/lib64 because the manylinux container's GNUInstallDirs defaults to lib64. Even though the symlink chain (libamd_comgr.so / .so.3 / .so.3.3.0) was created, it was in the wrong directory — LD_LIBRARY_PATH=staging/lib had no comgr to match, so the runtime resolver fell through to /opt/rocm-6.4.1's older Comgr (with a stale API that doesn't recognize COMPILE_SPIRV_TO_RELOCATABLE). Add -DCMAKE_INSTALL_LIBDIR=lib to Comgr's configure (matching what we already did for ROCR + CLR). Revert the buggy SONAME backfill — no longer needed once Comgr installs to the right place.
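The fallthrough described here can be reproduced with a toy model of ordered library search. The sketch below is a deliberate simplification: it only walks a colon-separated directory list in order, whereas the real dynamic loader also consults RPATH/RUNPATH and ld.so.cache. `resolve_soname` is a hypothetical helper, and the directories are throwaway stand-ins for staging/ and /opt/rocm-6.4.1.

```shell
# Sketch only: simplified model of ordered library search, not ld.so itself.
# resolve_soname is a hypothetical helper name.

# Print the first <dir>/<soname> that exists in a colon-separated dir list.
resolve_soname() {
  rs_soname="$1"
  rs_old_ifs="$IFS"
  IFS=':'
  for rs_dir in $2; do
    if [ -e "$rs_dir/$rs_soname" ]; then
      IFS="$rs_old_ifs"
      printf '%s\n' "$rs_dir/$rs_soname"
      return 0
    fi
  done
  IFS="$rs_old_ifs"
  return 1
}

# Demo: Comgr installed under staging/lib64, search path pointing at staging/lib.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/staging/lib" "$demo_root/staging/lib64" "$demo_root/system/lib"
: > "$demo_root/staging/lib64/libamd_comgr.so.3"
: > "$demo_root/system/lib/libamd_comgr.so.3"
demo_path="$demo_root/staging/lib:$demo_root/system/lib"

echo "before fix: $(resolve_soname libamd_comgr.so.3 "$demo_path")"
# Installing into lib (the effect of CMAKE_INSTALL_LIBDIR=lib) changes the winner.
: > "$demo_root/staging/lib/libamd_comgr.so.3"
echo "after fix:  $(resolve_soname libamd_comgr.so.3 "$demo_path")"
```

Before the fix the search skips the empty `staging/lib` and lands on the system copy; after the fix the staged copy wins because it appears earlier in the search order.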
Replace the single hello-world compile+run with loop-based steps covering the same SPIR-V-eligible example sets that ROCm/RocmCIForSPIRV's spirv_external_llvm.sh builds:
- SIMPLE_EXAMPLES (16): HIP-Basic single-source examples
- APPLICATIONS (6): HIP-Basic + External/ include path
- REDUCTION (9): Tutorials/reduction v1-v9 (c++20)
Excluded examples (opengl/vulkan interop, hello_world_cuda, hipify, assembly_to_executable, llvm_ir_to_executable, sobel_filter) either need libraries we don't have or use native-AMDGPU code paths incompatible with SPIR-V.
Verification: gate on exit code 0 per example (catches crashes, hipError returns, asserts). 60s timeout per example to avoid hangs. Per-example pass/fail reported in step output.
Bumped rocm-examples pin from b4ee9992 (develop, 2025-02) to e260595e (amd-staging, 2026-05-28): develop hadn't moved since Feb, while amd-staging tracks the active examples we want to test against.
Bumped job timeout 20m -> 45m to fit 31 examples (~30s build each + ~few seconds run each + GPU runner queue time).
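The verification scheme described above (exit code 0 per example, a per-example timeout, per-example pass/fail lines) can be sketched as a small loop. This is not the workflow's actual Run step: `run_examples` is a hypothetical name, it relies on the coreutils `timeout` command being available, and the demo substitutes `true` and `false` for real example binaries so it runs without a GPU.

```shell
# Sketch only: run_examples is a hypothetical helper; needs coreutils timeout.
# Usage: run_examples <seconds> <executable>...
run_examples() {
  re_limit="$1"
  shift
  re_failed=0
  for re_exe in "$@"; do
    if timeout "$re_limit" "$re_exe" >/dev/null 2>&1; then
      echo "PASS $re_exe"
    else
      # $? still holds the status of the timed command (124 means timed out).
      echo "FAIL $re_exe (exit $?)"
      re_failed=$((re_failed + 1))
    fi
  done
  # The step as a whole fails if any single example failed.
  [ "$re_failed" -eq 0 ]
}

# Demo with stand-in commands in place of real example binaries.
run_examples 60 true false || echo "suite failed"
```

The demo prints `PASS true`, then `FAIL false (exit 1)`, then `suite failed`, since one non-zero exit is enough to fail the step.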
Brings rocm-examples coverage to 34 — full parity with what RocmCIForSPIRV's spirv_external_llvm.sh builds, modulo multi_gpu_data_transfer (needs >1 GPU; this runner is 1gpu) and the math-libs-dependent tier (rocPRIM, hipCUB, monte_carlo_pi). bandwidth + matrix_multiplication live in HIP-Basic but need -I External (new EXTERNAL_EXAMPLES bucket). runtime_compilation needs -lhiprtc at link time; staged as a one-off.
Thank you for your work. Like Dmitry's scripts, this PR uses raw clang commands to build a subset of the rocm-examples. The Examples project has CMake, which is the default way to build all the apps, and it also has ctest. In my opinion, we should exercise the default build-and-test path instead of building each app one by one with our own scripts. I find the suggested solution unstable: it will break the next time any of the examples changes. This approach will also increase CI support cost, since we will need to follow changes made in another repo and adjust our CI accordingly.
- Bump actions/upload-artifact v4 -> v7, download-artifact v4 -> v8
(reviewer note: v4 is being deprecated June 2; v7/v8 are current).
- Remove the two diagnostic steps from test_rocm_examples that we
added while debugging the JIT failure:
* "Dump fatbin contents" (llvm-objdump / clang-offload-bundler
inspection of hello_world's fatbin)
* "Probe runner state + clean caches" (ldd, readelf, /opt/rocm
listing, comgr cache wipe)
These caught the lib64 vs lib mismatch and the libamd_comgr SONAME
fallthrough to /opt/rocm; both are fixed now (CMAKE_INSTALL_LIBDIR=lib
on Comgr install), so the probes have served their purpose.
- Drop the small "ls staging/lib/libamd_comgr*" diagnostic at the end
of the install step too.
Net -34 lines.
I like this. Let me see if I can update to do that. I'm wondering if we'd need to update rocm-examples CMake to get what we want (for example the right flags, and the right subset of tests for this use case). But I do agree that it would be more sustainable.
per reviewer suggestion (idubinov). built_examples.txt only contains names of executables that succeeded compile (BUILD_OK append happens after a successful compile), so the [ -x "./$exe" ] check was redundant — trust the list.
@lamb-j, I've built the HIP-Basic apps in a clean container using CMake; all apps (excluding hip_cooperative_groups) passed (https://github.com/ROCm/SPIRV-LLVM-Translator/actions/runs/26645427704/job/78533788555). https://github.com/ROCm/SPIRV-LLVM-Translator/pull/216/changes
Incorporates the good ideas from idubinov's PR #216:
Build job:
- CMAKE_INSTALL_LIBDIR=lib on LLVM Configure (manylinux defaults to lib64; downstream CMake projects expect lib).
- LLVM_ENABLE_RUNTIMES=compiler-rt + COMPILER_RT_BUILD_BUILTINS=ON (all other compiler-rt features off). hip-lang-config.cmake hard-requires libclang_rt.builtins.a for any downstream enable_language(HIP).
- builtins added to the ninja build target.
- CMAKE_INSTALL_LIBDIR=lib also added to device-libs Configure.
Test job:
- Now runs inside the same manylinux container as Build with GPU passthrough (--device=/dev/kfd /dev/dri --group-add video). Isolates from /opt/rocm-* and Comgr cache on the persistent host runner; also gives glibc consistency with the built libs.
- dnf install numactl-libs (HIP runtime dep).
HIP-Basic now built via rocm-examples' own CMakeLists (23 examples) instead of our hand-rolled bash loop (19 examples). The upstream CMake auto-skips SPIR-V-incompatible examples (assembly_to_executable, llvm_ir_to_executable, opengl/vulkan_interop, hello_world_cuda, hipify, sobel_filter) via find_package / GPU_RUNTIME gates. Picks up 3 examples we previously excluded: hip_static_host_library (+ _cxx), hip_module_api, hip_multi_gpu_data_transfer.
Applications + Tutorials/reduction stay on the bash loop: their CMakeLists pull in hipcub/rocrand/rocfft find_packages that aren't present in our staged tree.
Run step now iterates both examples-build/bin/* (CMake build) and built_examples.txt (bash loop), with hip_cooperative_groups skipped (known s.wait.asynccnt codegen bug).
Total coverage: ~32 examples (22 HIP-Basic + 6 Applications + 9 Tutorials/reduction - 1 skipped).
Dropped --offload-new-driver (now clang default) and -use-spirv-backend (stick with clang default translator codepath) from Igor's flag set.
It's a real codegen failure ('Cannot select intrinsic
%llvm.amdgcn.s.wait.asynccnt') and the whole point of the
test_rocm_examples job is to keep surfacing this until the AMDGPU
backend gets a lowering pattern. Gating the suite on it is the
correct behavior.
Adds a test_hip_tests job that builds the hip-tests `kernel` catch2 unit suite with -DENABLE_SPIRV=ON and runs it via ctest on a gfx942 GPU runner. Prototype scope: just the kernel unit dir to start; expand to more unit dirs once the SPIR-V baseline is characterized.
hip-tests has first-class SPIR-V support: -DENABLE_SPIRV=ON makes its own CMake set --offload-arch=amdgcnspirv and gate out cases that can't run in SPIR-V mode (per-test `if(NOT ENABLE_SPIRV)` blocks). catch2's catch_discover_tests registers each case with ctest, so we just configure -> build KernelTest -> ctest -R Kernel.
Reuses the in-pipeline runtime staging tree + manylinux container + GPU passthrough from the rocm-examples job. Same lib/lib64 + libgcc runtime-lib flags.
Stacked on the rocm-examples branch (shares the Build job). Will rebase onto amd-staging once #208 lands. Informational initially.
## Summary

Adds a `test_rocm_examples` job to `spirv-ci-linux.yml` that builds and runs a broad set of `ROCm/rocm-examples` HIP examples via PR clang with `--offload-arch=amdgcnspirv` on a gfx942 GPU runner. Exercises the SPIRV translator end-to-end through real HIP compile + JIT kernel load, catching translator/codegen bugs that pass lit but fail at runtime.

The Build job picks up an in-pipeline HIP runtime build (rocr-runtime + CLR + Comgr install into a `staging/` tree) so PR clang can target it via `--rocm-path`. No external prebuilt runtime is needed: TheRock's public nightly bucket dropped dcgpu families in Oct 2025.

## New check on PRs

Runs on `linux-gfx942-1gpu-ossci-rocm`, inside the same manylinux container the Build job uses with GPU passthrough (`--device=/dev/kfd /dev/dri --group-add video`). Isolates the test from /opt/rocm and Comgr cache state on the persistent host runner, and gives glibc consistency with the built libs. Informational initially.

## Example coverage

Two parallel build paths, both gating on exit code 0 per binary:

- `HIP-Basic`: `cmake -S HIP-Basic -B examples-build` + `cmake --build`. Upstream CMake auto-skips SPIR-V-incompatible examples (`assembly_to_executable`, `llvm_ir_to_executable`, opengl/vulkan_interop, `hello_world_cuda`, `hipify`, `sobel_filter`) via `find_package` / `GPU_RUNTIME` gates.
- `Applications`, `Tutorials/reduction`: bash loop with direct clang invocation. Their CMakeLists pull in `find_package(hipcub/rocrand/rocfft)` etc., which aren't in our staged tree, so direct clang invocation is simpler than staging math-libs.

Excluded from both paths:

- Math-libs-dependent tier (`rocPRIM`, `hipCUB`, `monte_carlo_pi`/hipRAND): needs rocm-libraries math-libs not built here

## Known failure (intentionally gating)

`hip_cooperative_groups` currently fails at runtime with `Cannot select intrinsic %llvm.amdgcn.s.wait.asynccnt`. The AMDGPU backend lacks a lowering pattern for `s.wait.asynccnt` for the target ISA. The Run step intentionally does not skip it: the whole point of `test_rocm_examples` is to keep surfacing this kind of break in CI until it's fixed. PRs that touch the translator / AMDGPU backend / Comgr should see this red check as a reminder. Remove the test-suite-fails state once the intrinsic gets a pattern.

## Pinned SHAs (bump deliberately)

- `ROCm/rocm-systems`: `8bb3b73c` (develop, 2026-05-15)
- `ROCm/rocm-cmake`: `4d391d7b` (develop, 2026-05-12)
- `ROCm/rocm-examples`: `e260595e` (amd-staging, 2026-05-28)

## Build-job changes worth noting

- `CMAKE_INSTALL_LIBDIR=lib` on LLVM Configure + device-libs + Comgr + ROCR + CLR (manylinux defaults to `lib64`; downstream CMake projects expect `lib`).
- `LLVM_ENABLE_RUNTIMES=compiler-rt` + `COMPILER_RT_BUILD_BUILTINS=ON` (all other compiler-rt features off): `hip-lang-config.cmake` hard-requires `libclang_rt.builtins.a` for any downstream `enable_language(HIP)`, which the CMake-driven HIP-Basic build needs.
- `builtins` added to the ninja build target.

## Credits

CMake-driven HIP-Basic build + container/GPU-passthrough setup adapted from @idubinov's PR #216 (stacked on this one).