[CI] Add rocm-examples test (in-pipeline HIP runtime, CMake + bash hybrid) by lamb-j · Pull Request #208 · ROCm/SPIRV-LLVM-Translator

lamb-j · 2026-05-18T22:14:34Z

Summary

Adds a test_rocm_examples job to spirv-ci-linux.yml that builds and runs a broad set of ROCm/rocm-examples HIP examples via PR clang with --offload-arch=amdgcnspirv on a gfx942 GPU runner. Exercises the SPIRV translator end-to-end through real HIP compile + JIT kernel load — catches translator/codegen bugs that pass lit but fail at runtime.

The Build job picks up an in-pipeline HIP runtime build (rocr-runtime + CLR + Comgr install into a staging/ tree) so PR clang can target it via --rocm-path. No external prebuilt runtime needed — TheRock's public nightly bucket dropped dcgpu families in Oct 2025.

New check on PRs

SPIRV Compiler CI / Linux::release / Test rocm-examples

Runs on linux-gfx942-1gpu-ossci-rocm, inside the same manylinux container the Build job uses with GPU passthrough (--device=/dev/kfd /dev/dri --group-add video). Isolates the test from /opt/rocm and Comgr cache state on the persistent host runner, and gives glibc consistency with the built libs. Informational initially.

Example coverage

Two parallel build paths, both gating on exit code 0 per binary:

| Path | Subdirs | Count | Mechanism |
| --- | --- | --- | --- |
| CMake (rocm-examples' own CMakeLists) | `HIP-Basic` | ~23 | `cmake -S HIP-Basic -B examples-build` + `cmake --build`. Upstream CMake auto-skips SPIR-V-incompatible examples (`assembly_to_executable`, `llvm_ir_to_executable`, opengl/vulkan_interop, `hello_world_cuda`, `hipify`, `sobel_filter`) via `find_package` / `GPU_RUNTIME` gates. |
| Bash loop (direct clang) | `Applications`, `Tutorials/reduction` | 15 (6+9) | These subdirs' CMakeLists call `find_package` for hipcub, rocrand, rocfft, etc., which aren't in our staged tree, so direct clang invocation is simpler than staging the math libs. |

Excluded from both paths:

Library-conditional examples (rocPRIM, hipCUB, monte_carlo_pi/hipRAND) — need rocm-libraries math-libs not built here

Known failure (intentionally gating)

hip_cooperative_groups currently fails at runtime with:

LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.s.wait.asynccnt

The AMDGPU backend lacks a lowering pattern for s.wait.asynccnt for the target ISA. The Run step intentionally does not skip it — the whole point of test_rocm_examples is to keep surfacing this kind of break in CI until it's fixed. PRs that touch the translator / AMDGPU backend / Comgr should see this red check as a reminder. Remove the test-suite-fails state once the intrinsic gets a pattern.

Pinned SHAs (bump deliberately)

ROCm/rocm-systems: 8bb3b73c (develop, 2026-05-15)
ROCm/rocm-cmake: 4d391d7b (develop, 2026-05-12)
ROCm/rocm-examples: e260595e (amd-staging, 2026-05-28)

Build-job changes worth noting

CMAKE_INSTALL_LIBDIR=lib on LLVM Configure + device-libs + Comgr + ROCR + CLR (manylinux defaults to lib64; downstream CMake projects expect lib).
LLVM_ENABLE_RUNTIMES=compiler-rt + COMPILER_RT_BUILD_BUILTINS=ON (all other compiler-rt features off): hip-lang-config.cmake hard-requires libclang_rt.builtins.a for any downstream enable_language(HIP), which the CMake-driven HIP-Basic build needs.
builtins added to the ninja build target.

Credits

CMake-driven HIP-Basic build + container/GPU-passthrough setup adapted from @idubinov's PR #216 (stacked on this one).

Adds a test_rocm_examples job that compiles ROCm/rocm-examples HIP-Basic/hello_world via PR clang with `--offload-arch=amdgcnspirv` and runs the resulting binary on a gfx942 runner. This exercises the SPIRV translator end-to-end through real HIP compile + JIT kernel load, catching translator/codegen bugs that pass lit but fail at runtime.

Build job changes:
- After Comgr build, install comgr + device-libs into a staging/ tree via cmake --install. clang's --rocm-path expects installed-tree layout, not build-dir layout.
- Clone ROCm/rocm-systems (pinned) and build rocr-runtime + clr (libhsa-runtime64.so + libamdhip64.so) against the just-built clang, installing into the same staging/ tree.
- Clone ROCm/rocm-cmake (pinned) for the ROCM cmake modules CLR needs.
- Disable kpack, rocprofiler-register, and host-hipcc detection in CLR (they're optional and not part of the SPIRV CI scope).
- Strip and tar steps extended to include staging/.

Why in-pipeline runtime build (vs prebuilt nightly): TheRock's public nightly-tarball bucket only publishes consumer (dgpu) families on a daily cadence; gfx94X / gfx950 (dcgpu) tarballs stopped Oct 2025, and TheRock's own multi-arch CI for those families builds the runtime in-pipeline for the same reason. Building in-pipeline also ensures the runtime tracks the same llvm/clang amd-staging tip we're testing.

Test job:
- runs-on: linux-gfx942-1gpu-ossci-rocm (matches existing TheRock multi-arch CI runner choice).
- Compiles hello_world via PR clang + --rocm-path=$STAGING; explicit -lhsa-runtime64 needed since it's not pulled transitively from libamdhip64.so during exe link.
- Runs binary with LD_LIBRARY_PATH=$STAGING/lib.
- Greps for fixed host + device greeting lines.
- Informational initially; promote to required once stable.

Pinned SHAs (bump deliberately, ~weekly):
- ROCm/rocm-systems: 8bb3b73c (develop, 2026-05-15)
- ROCm/rocm-cmake: 4d391d7b (develop, 2026-05-12)
- ROCm/rocm-examples: b4ee9992 (develop, 2025-02-21)

Estimated wall-time impact: existing build ~14 min + ~7-10 min new (rocr ~2 min, clr ~5-8 min) -> ~22-25 min total. Inside 120-min timeout. Test job adds one GPU-runner job, runs ~1-2 min.

Mirrors the SPIRV CI structure on ROCm/SPIRV-LLVM-Translator amd-staging: a top-level dispatcher (spirv-ci.yml) that calls per-platform reusable workflows (spirv-ci-linux.yml, spirv-ci-windows.yml). Each platform builds LLVM/Clang/LLD + amd-llvm-spirv translator + device-libs + Comgr in one job, uploads the build trees as a single artifact, and fans out to parallel test jobs that consume it:

- Linux::release / Build (required)
- Linux::release / Test SPIRV translator lit
- Linux::release / Test LLVM SPIRV codegen
- Linux::release / Test Comgr
- Windows::release / Build
- Windows::release / Test SPIRV translator lit
- Windows::release / Test LLVM SPIRV codegen
- Windows::release / Test Comgr

The translator-lit test job re-runs lit twice (PR head + amd-staging baseline) and gates on new-failures-only. Khronos upstream breaks ~1 translator lit test per week, so without baseline-diffing the suite would block unrelated PRs on those days. Pre-existing baseline failures don't gate.

Goal: catch breakage in compiler / SPIRV translator that would fail downstream Comgr / HIP testing without paying the cost of a full TheRock build. Test categories will grow over time (rocm-examples, HIP-tests SPIRV mode, etc.); they slot in as new jobs in the per-platform workflows.

This is the llvm-project mirror of the workflows on ROCm/SPIRV-LLVM-Translator amd-staging. Two intentional divergences from the translator copy:

1. No sticky PR comment on the translator-lit test job. The translator copy posts a per-PR comment partitioning translator-lit failures into new / fixed / pre-existing. On llvm-project PRs (which mostly touch llvm/clang/lld, not the SPIRV translator) that comment would just be noise. The failing-check signal in the PR header is enough; the gate step still fails the job on real regressions and prints the failing test list to the log. pull-requests: write permission dropped accordingly.
2. No rocm-examples test job (yet). That work is under review on the translator side first (ROCm/SPIRV-LLVM-Translator#208); it will be brought over here once stable.

Replaces 13 incremental prototype commits with a single squash so the diff matches the current intended state. The translator side is the source of truth for these workflows going forward; bumps from there will be replicated here.
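The new-failures-only gate described above comes down to a set difference between two lit runs. A minimal Python sketch, assuming each run has already been reduced to a list of failing test names (the function names and the input shape are illustrative, not the workflow's actual script):

```python
def partition_failures(pr_failures, baseline_failures):
    """Split failing test names into new / fixed / pre-existing buckets."""
    pr, base = set(pr_failures), set(baseline_failures)
    return {
        "new": sorted(pr - base),           # fail on PR head only: these gate
        "fixed": sorted(base - pr),         # fail on the baseline only
        "pre_existing": sorted(pr & base),  # fail on both: reported, not gating
    }


def gate_exit_code(pr_failures, baseline_failures):
    """Return 1 only when the PR head introduces failures the baseline lacks."""
    return 1 if partition_failures(pr_failures, baseline_failures)["new"] else 0
```

With this shape, a test that is already broken on the baseline lands in `pre_existing` and leaves the exit code at 0, which is the behaviour the commit message asks for.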

ROCR-Runtime's image library compiles OpenCL "blit" bitcode kernels for ~30 gfx targets via add_custom_command(... clang ...). Those clang invocations don't get --rocm-path forwarded; in the manylinux build container there's no /opt/rocm default to fall back to, so device-libs aren't found. Setting ROCM_PATH=$PWD/staging points clang's auto-detection at the staging tree where device-libs were installed in the previous step.

clang-linker-wrapper does its own library lookup before invoking the host linker; it doesn't pick up -L flags from the clang driver, so -lamdhip64 fails to resolve on the GPU runner where there's no system ROCm. Set LIBRARY_PATH=$STAGING/lib so the wrapper finds it in the same tree the host linker is searching. Add -Wl,-rpath,$STAGING/lib so the binary can dlopen the runtime libs without depending on LD_LIBRARY_PATH being set at exec time.

Helps diagnose why clang-linker-wrapper can't find -lamdhip64 even with LIBRARY_PATH set. Temporary; revert once we know the fix.


CMake's GNUInstallDirs picks lib64 on the manylinux2014 container (64-bit Red Hat heritage). clang's --rocm-path detection looks for libraries at <prefix>/lib, so libamdhip64.so + libhsa-runtime64.so landed where the HIP driver couldn't see them. Set CMAKE_INSTALL_LIBDIR=lib on both ROCR-Runtime and CLR configures to force the layout clang expects. Also drop the debug step and LIBRARY_PATH workaround now that the real cause is identified.
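One way to catch this class of problem early is to check where the runtime libraries actually landed after install. A minimal sketch, assuming the staging prefix and the library names are passed in by the caller (the helper name `libs_stranded_in_lib64` is hypothetical):

```python
import tempfile
from pathlib import Path


def libs_stranded_in_lib64(prefix, names):
    """Return the libraries found under <prefix>/lib64 but missing from <prefix>/lib."""
    prefix = Path(prefix)
    return [
        name
        for name in names
        if (prefix / "lib64" / name).exists() and not (prefix / "lib" / name).exists()
    ]


if __name__ == "__main__":
    # Demo on a throwaway tree that mimics an install using the lib64 default.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "lib64").mkdir()
        (Path(tmp) / "lib64" / "libamdhip64.so").touch()
        print(libs_stranded_in_lib64(tmp, ["libamdhip64.so", "libhsa-runtime64.so"]))
```

A non-empty result would suggest that a configure step is still missing CMAKE_INSTALL_LIBDIR=lib.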

AlexVlx · 2026-05-21T17:29:57Z

@lamb-j looks like the test itself is yielding unexpected output?

lamb-j · 2026-05-21T17:46:33Z

Do we know of any issues with device printf and amdgcnspirv? Maybe we should use a different test to prototype

It's an issue with the build config, re-testing

Helps determine whether the kernel launches and whether sync returns cleanly when the device printf output is missing. Temporary.

HIP runtime calls Comgr at hipLaunchKernel time to translate the embedded SPIR-V fatbin to AMDGPU code for the actual GPU. Comgr searches for the translator at <libamd_comgr.so>/../bin/amd-llvm-spirv. The translator binary is built in build/bin/ but wasn't being copied into staging/, so Comgr couldn't find it at JIT time, hipLaunchKernel returned hipErrorInvalidImage, and the kernel silently never ran. hipDeviceSynchronize then returned success (nothing to sync) so the binary exited cleanly with only the host-side print output. Copy amd-llvm-spirv + the llvm-spirv symlink into staging/bin/ after the CLR install step.
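The lookup rule quoted above can be turned into a small pre-flight check on the staging tree. This is a sketch that takes the rule `<libamd_comgr.so>/../bin/amd-llvm-spirv` at face value from the commit message; Comgr's real search logic may do more, and both helper names are hypothetical:

```python
import os
import tempfile
from pathlib import Path


def expected_translator_path(comgr_lib):
    """Translator location implied by the rule <libamd_comgr.so>/../bin/amd-llvm-spirv."""
    lib_dir = Path(comgr_lib).parent  # e.g. staging/lib
    return lib_dir.parent / "bin" / "amd-llvm-spirv"


def translator_is_staged(comgr_lib):
    """True when an executable amd-llvm-spirv sits where the rule above points."""
    path = expected_translator_path(comgr_lib)
    return path.is_file() and os.access(path, os.X_OK)


if __name__ == "__main__":
    # Demo: an empty staging tree has no translator next to the library.
    with tempfile.TemporaryDirectory() as tmp:
        lib = Path(tmp) / "lib" / "libamd_comgr.so"
        print(expected_translator_path(lib), translator_is_staged(lib))
```

Running a check like this right after the CLR install step would flag a missing translator at build time rather than as a silent no-op kernel launch.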

Comgr looks for amd-llvm-spirv by name; the llvm-spirv symlink is for downstream tools that expect the upstream name and isn't needed for the HIP runtime JIT path.


Comgr's 'Failed to compile spirv to reloc' is opaque — the underlying error stays buried unless logs are redirected to stderr. Also list staging/bin to confirm the translator is actually there at runtime.

AMD_COMGR_REDIRECT_LOGS alone surfaced no extra detail for the 'Failed to compile spirv to reloc' failure. Add:

- ROCM_PATH=$STAGING so Comgr finds device-libs in staging instead of falling back to /opt/rocm
- AMD_COMGR_SAVE_TEMPS=/tmp/comgr to dump intermediate files showing how far the SPIR-V → reloc pipeline got
- Post-run listing of /tmp/comgr/ contents

Match the working invocation from ROCm/RocmCIForSPIRV spirv_external_llvm.sh. The critical addition is --offload-new-driver, which routes through the HIP offload codepath that produces a fatbin HIP runtime can JIT for amdgcnspirv. Without it the runtime hits "Failed to compile spirv to reloc" because the bundle format doesn't match what hip_fatbin.cpp expects.

Also added:

- --rocm-device-lib-path explicit (auto-detect doesn't always find the staged amdgcn/bitcode dir)
- -D__HIP_PLATFORM_AMD__ to match
- -Wl,--unresolved-symbols=ignore-in-shared-libs (libamd_comgr embeds its own LLVM symbols)
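For reference, the flags named in this commit message can be assembled into a single clang invocation roughly as follows. This is a sketch only: the flag order, the `-L`/`-l` placement, and the function and parameter names are assumptions, and the device-lib directory is left as a parameter because its exact location under staging is not spelled out here.

```python
def hip_spirv_compile_argv(clang, source, output, staging, device_lib_dir):
    """Assemble a clang command line from the flags named in the commit message."""
    return [
        clang,
        "--offload-arch=amdgcnspirv",
        "--offload-new-driver",
        f"--rocm-path={staging}",
        f"--rocm-device-lib-path={device_lib_dir}",
        "-D__HIP_PLATFORM_AMD__",
        source,
        "-o", output,
        f"-L{staging}/lib",           # assumed: library search path for the host link
        "-lamdhip64",
        "-lhsa-runtime64",            # not pulled in transitively by libamdhip64.so
        f"-Wl,-rpath,{staging}/lib",  # lets the binary find the staged runtime libs
        "-Wl,--unresolved-symbols=ignore-in-shared-libs",
    ]
```

The returned list can be handed directly to `subprocess.run`, which avoids shell-quoting issues with the `-Wl,` arguments.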

HIP runtime reaches Comgr and Comgr's COMPILE_SPIRV_TO_RELOCATABLE action fails. Need to verify what's actually in the binary's fatbin section — SPIR-V with the right target string, or something else (LLVM IR? Wrong target name?). Add a step that lists bundle entries via clang-offload-bundler.

Global env var forces Comgr to log its internal actions without HIP needing to enable per-action logging. Targets the silent AMD_COMGR_ACTION_COMPILE_SPIRV_TO_RELOCATABLE failure at JIT time.

The gfx942 ossci-rocm runner is persistent (no container), so state from prior runs can leak in. Two things to investigate:

- ~/.cache/comgr / ~/.amd/comgr: Comgr's JIT cache. A cached failed compile from a prior run could reproduce the same failure even after we've fixed the inputs. RocmCIForSPIRV's clean_all explicitly wipes these for the same reason.
- /opt/rocm: if a system ROCm install is present, HIP runtime could be silently resolving against it instead of (or alongside) our staged tree.

Add a probe step that lists both, then wipes the Comgr caches before the Run step.
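The cache wipe can be sketched as below, assuming the two cache locations named above are the only ones that matter; the helper name is hypothetical, and the home directory is a parameter so the routine can be exercised on a scratch tree:

```python
import shutil
import tempfile
from pathlib import Path

# Cache locations named in the commit message, relative to the home directory.
COMGR_CACHE_DIRS = (".cache/comgr", ".amd/comgr")


def wipe_comgr_caches(home):
    """Delete any Comgr cache directories under `home`; return the paths removed."""
    removed = []
    for rel in COMGR_CACHE_DIRS:
        path = Path(home) / rel
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Demo against a scratch "home" so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / ".cache" / "comgr").mkdir(parents=True)
        print(wipe_comgr_caches(tmp))
```

Returning the removed paths makes it easy to log whether a stale cache was actually present on the runner.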

Probe step now shows which libamdhip64 + libamd_comgr are actually resolving at runtime, and the rpath/runpath / NEEDED entries of our staged libs. /opt/rocm-6.4.1 exists on the runner — if the staged libs end up pulling Comgr from there, our COMPILE_SPIRV_TO_RELOCATABLE call lands in a stale Comgr API.

## Summary

Adds SPIRV-focused PR CI for `ROCm/llvm-project` `amd-staging`, mirroring the structure that landed on `ROCm/SPIRV-LLVM-Translator` `amd-staging`. Each platform builds LLVM/Clang/translator/Comgr in one job and fans out to parallel test jobs that consume a GHA-artifact build tree. Catches translator/Comgr breakage without paying the cost of a full TheRock build.

## New checks on PRs

```
SPIRV Compiler CI - amd-staging / Linux::release / Build
SPIRV Compiler CI - amd-staging / Linux::release / Test SPIRV translator lit
SPIRV Compiler CI - amd-staging / Linux::release / Test LLVM SPIRV codegen
SPIRV Compiler CI - amd-staging / Linux::release / Test Comgr
SPIRV Compiler CI - amd-staging / Windows::release / Build
SPIRV Compiler CI - amd-staging / Windows::release / Test SPIRV translator lit
SPIRV Compiler CI - amd-staging / Windows::release / Test LLVM SPIRV codegen
SPIRV Compiler CI - amd-staging / Windows::release / Test Comgr
```

All checks are informational initially; promote individually to required as each stabilizes.

## Differences from the translator copy

Two intentional divergences from the workflows on ROCm/SPIRV-LLVM-Translator `amd-staging`:

1. **No sticky PR comment from the translator-lit test job.** The translator copy posts a per-PR comment partitioning translator-lit failures into new / fixed / pre-existing. On llvm-project PRs (which mostly touch llvm/clang/lld, not the SPIRV translator) that comment would just be noise; the failing check in the PR header is enough signal. The gate step still fails the job on real regressions and prints the failing test list to the log. `pull-requests: write` permission dropped accordingly.
2. **No `rocm-examples` test job (yet).** That work is under review on the translator side first as ROCm/SPIRV-LLVM-Translator#208 and will be brought over here once stable.

The dynamic loader was resolving libamd_comgr.so.3 from /opt/rocm-6.4.1/lib (a system ROCm install on the GPU runner) instead of from our staged tree, because Comgr's install rules apparently don't always emit the libamd_comgr.so.<MAJOR> SONAME symlink in staging/lib. Without that symlink, LD_LIBRARY_PATH=staging/lib has nothing to match against and falls through to ld.so.cache, which hands back the old system Comgr. End result: our libamdhip64 (which calls the new AMD_COMGR_ACTION_COMPILE_SPIRV_TO_RELOCATABLE) ends up linked against 6.4.1's libamd_comgr that doesn't recognize that action — JIT fails with INVALID_ARGUMENT. Backfill the SONAME + dev symlinks after install, and log the lib directory before/after so we can confirm what's there.

Comgr's install was landing in staging/lib64 because the manylinux container's GNUInstallDirs defaults to lib64. Even though the symlink chain (libamd_comgr.so / .so.3 / .so.3.3.0) was created, it was in the wrong directory — LD_LIBRARY_PATH=staging/lib had no comgr to match, so the runtime resolver fell through to /opt/rocm-6.4.1's older Comgr (with a stale API that doesn't recognize COMPILE_SPIRV_TO_RELOCATABLE). Add -DCMAKE_INSTALL_LIBDIR=lib to Comgr's configure (matching what we already did for ROCR + CLR). Revert the buggy SONAME backfill — no longer needed once Comgr installs to the right place.
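The fall-through described in the last two commit messages can be reproduced with a deliberately simplified model of library lookup. The sketch below only searches LD_LIBRARY_PATH entries and then a list of fallback directories; the real dynamic loader also consults DT_RPATH / DT_RUNPATH and ld.so.cache, so `system_dirs` is a stand-in for that step and the function name is hypothetical:

```python
import tempfile
from pathlib import Path


def first_match(soname, ld_library_path, system_dirs):
    """Return the first existing <dir>/<soname>, searching LD_LIBRARY_PATH before system dirs."""
    search = [d for d in ld_library_path.split(":") if d] + list(system_dirs)
    for directory in search:
        candidate = Path(directory) / soname
        if candidate.exists():
            return candidate
    return None


if __name__ == "__main__":
    # Demo: Comgr installed to staging/lib64, LD_LIBRARY_PATH pointing at staging/lib.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for sub in ("staging/lib", "staging/lib64", "system/lib"):
            (root / sub).mkdir(parents=True)
        (root / "staging/lib64/libamd_comgr.so.3").touch()
        (root / "system/lib/libamd_comgr.so.3").touch()
        print(first_match("libamd_comgr.so.3", str(root / "staging/lib"), [root / "system/lib"]))
```

In the demo tree the staged copy is never considered because its directory is not on the search path, which mirrors how the older system Comgr ended up being loaded.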

Replace the single hello-world compile+run with loop-based steps covering the same SPIR-V-eligible example sets that ROCm/RocmCIForSPIRV's spirv_external_llvm.sh builds:

- SIMPLE_EXAMPLES (16): HIP-Basic single-source examples
- APPLICATIONS (6): HIP-Basic + External/ include path
- REDUCTION (9): Tutorials/reduction v1-v9 (c++20)

Excluded examples (opengl/vulkan interop, hello_world_cuda, hipify, assembly_to_executable, llvm_ir_to_executable, sobel_filter) either need libraries we don't have or use native-AMDGPU code paths incompatible with SPIR-V.

Verification: gate on exit code 0 per example (catches crashes, hipError returns, asserts). 60s timeout per example to avoid hangs. Per-example pass/fail reported in step output.

Bumped rocm-examples pin from b4ee9992 (develop, 2025-02) to e260595e (amd-staging, 2026-05-28): develop hadn't moved since Feb, while amd-staging tracks the active examples we want to test against.

Bumped job timeout 20m -> 45m to fit 31 examples (~30s build each + a few seconds run each + GPU runner queue time).
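The verification policy above (exit code 0 per example, 60 s timeout, per-example report) can be sketched in Python as follows. The workflow itself uses a bash loop; this version is illustrative only, and the function names and result strings are assumptions:

```python
import subprocess
import sys


def run_examples(commands, timeout_s=60):
    """Run each command; map its name to PASS, FAIL(<exit code>) or TIMEOUT."""
    results = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(
                argv,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            results[name] = "TIMEOUT"  # a hung example counts as a failure
            continue
        results[name] = "PASS" if proc.returncode == 0 else f"FAIL({proc.returncode})"
    return results


def gate(results):
    """Exit code for the step: 0 only if every example passed."""
    return 0 if all(status == "PASS" for status in results.values()) else 1


if __name__ == "__main__":
    # Demo with a trivial stand-in command instead of a real HIP binary.
    demo = run_examples({"noop": [sys.executable, "-c", "pass"]})
    print(demo, gate(demo))
```

Collecting every result before gating keeps the per-example report complete even when an early example fails.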

Brings rocm-examples coverage to 34 — full parity with what RocmCIForSPIRV's spirv_external_llvm.sh builds, modulo multi_gpu_data_transfer (needs >1 GPU; this runner is 1gpu) and the math-libs-dependent tier (rocPRIM, hipCUB, monte_carlo_pi). bandwidth + matrix_multiplication live in HIP-Basic but need -I External (new EXTERNAL_EXAMPLES bucket). runtime_compilation needs -lhiprtc at link time; staged as a one-off.

idubinov · 2026-05-29T07:39:24Z

Thank you for your work.

Same as Dmitry's scripts, this PR uses raw clang commands to build a subset of the rocm examples.

The Examples project has CMake, and CMake is the default way to build all of the apps. The Examples project also has ctest.

In my opinion, we should exercise the default build & test path instead of building each app one by one with our own scripts. I find the suggested solution fragile: it will break the next time any of the examples changes. This approach will also increase CI support cost, since we will need to follow changes made in another repo and adjust our CI to match.

- Bump actions/upload-artifact v4 -> v7, download-artifact v4 -> v8 (reviewer note: v4 is being deprecated June 2; v7/v8 are current).
- Remove the two diagnostic steps from test_rocm_examples that we added while debugging the JIT failure:
  * "Dump fatbin contents" (llvm-objdump / clang-offload-bundler inspection of hello_world's fatbin)
  * "Probe runner state + clean caches" (ldd, readelf, /opt/rocm listing, comgr cache wipe)

  These caught the lib64 vs lib mismatch and the libamd_comgr SONAME fallthrough to /opt/rocm; both are fixed now (CMAKE_INSTALL_LIBDIR=lib on Comgr install), so the probes have served their purpose.
- Drop the small "ls staging/lib/libamd_comgr*" diagnostic at the end of the install step too.

Net -34 lines.

lamb-j · 2026-05-29T16:11:49Z

> Thank you for your work.
>
> Same as Dmitry's scripts, this PR uses raw clang commands to build a subset of the rocm examples.
>
> The Examples project has CMake, and CMake is the default way to build all of the apps. The Examples project also has ctest.
>
> In my opinion, we should exercise the default build & test path instead of building each app one by one with our own scripts. I find the suggested solution fragile: it will break the next time any of the examples changes. This approach will also increase CI support cost, since we will need to follow changes made in another repo and adjust our CI to match.

I like this. Let me see if I can update to do that. I'm wondering if we'd need to update rocm-examples CMake to get what we want (for example the right flags, and the right subset of tests for this use case). But I do agree that it would be more sustainable.

per reviewer suggestion (idubinov). built_examples.txt only contains names of executables that succeeded compile (BUILD_OK append happens after a successful compile), so the [ -x "./$exe" ] check was redundant — trust the list.
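The invariant this commit relies on (a name is appended to built_examples.txt only after its compile succeeds) can be sketched as below. This is a Python illustration of the idea rather than the workflow's bash loop, and the function name is hypothetical:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_and_record(builds, list_path):
    """Run each build command; append the name to list_path only after it succeeds."""
    built = []
    with open(list_path, "a") as listing:
        for name, argv in builds.items():
            if subprocess.run(argv).returncode == 0:
                listing.write(name + "\n")
                built.append(name)
    return built


if __name__ == "__main__":
    # Demo with stand-in commands: one "compile" succeeds, one fails.
    with tempfile.TemporaryDirectory() as tmp:
        listing = Path(tmp) / "built_examples.txt"
        builds = {
            "good": [sys.executable, "-c", "pass"],
            "broken": [sys.executable, "-c", "import sys; sys.exit(1)"],
        }
        print(build_and_record(builds, listing), listing.read_text().split())
```

Because failed builds never reach the list, the run loop can iterate over it without re-checking that each executable exists.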

idubinov · 2026-05-29T16:18:57Z

@lamb-j, I've built the HIP-Basic apps in a clean container using CMake, and all apps (excluding hip_cooperative_groups) passed (https://github.com/ROCm/SPIRV-LLVM-Translator/actions/runs/26645427704/job/78533788555).
Other parts of the examples have both necessary and unnecessary library dependencies. We need to figure out which ones are actually needed, where we can provide the libs, and which examples we should disable.

https://github.com/ROCm/SPIRV-LLVM-Translator/pull/216/changes

Incorporates the good ideas from idubinov's PR #216:

Build job:
- CMAKE_INSTALL_LIBDIR=lib on LLVM Configure (manylinux defaults to lib64; downstream CMake projects expect lib).
- LLVM_ENABLE_RUNTIMES=compiler-rt + COMPILER_RT_BUILD_BUILTINS=ON (all other compiler-rt features off). hip-lang-config.cmake hard-requires libclang_rt.builtins.a for any downstream enable_language(HIP).
- builtins added to the ninja build target.
- CMAKE_INSTALL_LIBDIR=lib also added to device-libs Configure.

Test job:
- Now runs inside the same manylinux container as Build with GPU passthrough (--device=/dev/kfd /dev/dri --group-add video). Isolates from /opt/rocm-* and Comgr cache on the persistent host runner; also gives glibc consistency with the built libs.
- dnf install numactl-libs (HIP runtime dep).

HIP-Basic now built via rocm-examples' own CMakeLists (23 examples) instead of our hand-rolled bash loop (19 examples). The upstream CMake auto-skips SPIR-V-incompatible examples (assembly_to_executable, llvm_ir_to_executable, opengl/vulkan_interop, hello_world_cuda, hipify, sobel_filter) via find_package / GPU_RUNTIME gates. Picks up 3 examples we previously excluded: hip_static_host_library (+ _cxx), hip_module_api, hip_multi_gpu_data_transfer.

Applications + Tutorials/reduction stay on the bash loop: their CMakeLists pull in hipcub/rocrand/rocfft find_packages that aren't present in our staged tree.

Run step now iterates both examples-build/bin/* (CMake build) and built_examples.txt (bash loop), with hip_cooperative_groups skipped (known s.wait.asynccnt codegen bug). Total coverage: ~36 examples (22 HIP-Basic + 6 Applications + 9 Tutorials/reduction - 1 skipped).

Dropped --offload-new-driver (now clang default) and -use-spirv-backend (stick with clang default translator codepath) from Igor's flag set.
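As of this commit the Run step draws its work list from two sources. A minimal sketch of that merge, assuming the CMake build drops executables into a single bin directory and the bash loop writes one name per line (the function name and the skip-list handling are illustrative):

```python
import os
import tempfile
from pathlib import Path


def collect_run_list(cmake_bin_dir, built_list_file, skip=()):
    """Merge executables from the CMake bin dir with names recorded by the bash loop."""
    names = []
    bin_dir = Path(cmake_bin_dir)
    if bin_dir.is_dir():
        # Only regular files with an execute bit count as runnable examples.
        names += sorted(
            p.name for p in bin_dir.iterdir() if p.is_file() and os.access(p, os.X_OK)
        )
    listing = Path(built_list_file)
    if listing.is_file():
        names += [line.strip() for line in listing.read_text().splitlines() if line.strip()]
    skipped = set(skip)
    return [name for name in names if name not in skipped]


if __name__ == "__main__":
    # Demo on an empty scratch directory: nothing to run yet.
    with tempfile.TemporaryDirectory() as tmp:
        print(collect_run_list(Path(tmp) / "bin", Path(tmp) / "built_examples.txt"))
```

Passing an empty `skip` makes every built example run, which is what gating on a known failure requires.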

It's a real codegen failure ('Cannot select intrinsic %llvm.amdgcn.s.wait.asynccnt') and the whole point of the test_rocm_examples job is to keep surfacing this until the AMDGPU backend gets a lowering pattern. Gating the suite on it is the correct behavior.

Adds a test_hip_tests job that builds the hip-tests `kernel` catch2 unit suite with -DENABLE_SPIRV=ON and runs it via ctest on a gfx942 GPU runner. Prototype scope: just the kernel unit dir to start; expand to more unit dirs once the SPIR-V baseline is characterized.

hip-tests has first-class SPIR-V support: -DENABLE_SPIRV=ON makes its own CMake set --offload-arch=amdgcnspirv and gate out cases that can't run in SPIR-V mode (per-test `if(NOT ENABLE_SPIRV)` blocks). catch2's catch_discover_tests registers each case with ctest, so we just configure -> build KernelTest -> ctest -R Kernel.

Reuses the in-pipeline runtime staging tree + manylinux container + GPU passthrough from the rocm-examples job. Same lib/lib64 + libgcc runtime-lib flags.

Stacked on the rocm-examples branch (shares the Build job). Will rebase onto amd-staging once #208 lands. Informational initially.

lamb-j mentioned this pull request May 18, 2026

[CI] Add SPIRV-focused PR CI workflow for amd-staging ROCm/llvm-project#2451

Merged

lamb-j added 3 commits May 18, 2026 16:22

[CI] rocm-examples: add staging-tree debug + -v on compile

24cb78a


lamb-j added 3 commits May 21, 2026 10:47

[CI] rocm-examples: diagnostic AMD_LOG_LEVEL on Run step

8220221


[CI] Drop llvm-spirv symlink from staging copy

1f263d9


lamb-j added 7 commits May 21, 2026 13:37

[CI] rocm-examples: enable AMD_COMGR_REDIRECT_LOGS for JIT debug

57460c7


[CI] rocm-examples: add AMD_COMGR_EMIT_VERBOSE_LOGS=1

919ca04


lamb-j added 4 commits May 28, 2026 13:14

lamb-j changed the title ~~[CI] Add rocm-examples hello-world test (in-pipeline HIP runtime build)~~ [CI] Add rocm-examples test (in-pipeline HIP runtime build, 34 examples) May 29, 2026

ROCm deleted a comment from github-actions Bot May 29, 2026

lamb-j requested a review from kirthana14m May 29, 2026 06:39

lamb-j requested a review from idubinov May 29, 2026 06:39

Merge branch 'amd-staging' into users/lambj/spirv-ci-rocm-examples

c9794cb

idubinov reviewed May 29, 2026


[CI] rocm-examples: drop redundant -x check in run loop

acd5d3b


lamb-j changed the title ~~[CI] Add rocm-examples test (in-pipeline HIP runtime build, 34 examples)~~ [CI] Add rocm-examples test (in-pipeline HIP runtime, CMake + bash hybrid) May 29, 2026

lamb-j mentioned this pull request Jun 4, 2026

[CI] Add hip-tests kernel suite in SPIRV mode #223

Open

lamb-j wants to merge 24 commits into amd-staging from users/lambj/spirv-ci-rocm-examples
