
[CI] Add rocm-examples test (in-pipeline HIP runtime, CMake + bash hybrid)#208

Open
lamb-j wants to merge 24 commits into amd-staging from users/lambj/spirv-ci-rocm-examples

@lamb-j commented May 18, 2026 (Collaborator)

Summary

Adds a test_rocm_examples job to spirv-ci-linux.yml that builds and runs a broad set of ROCm/rocm-examples HIP examples via PR clang with --offload-arch=amdgcnspirv on a gfx942 GPU runner. This exercises the SPIRV translator end-to-end through a real HIP compile and JIT kernel load, catching translator/codegen bugs that pass lit but fail at runtime.

The Build job now also builds the HIP runtime in-pipeline (rocr-runtime + CLR, installed together with Comgr into a staging/ tree) so PR clang can target it via --rocm-path. No external prebuilt runtime is needed; TheRock's public nightly bucket stopped publishing dcgpu families (gfx94X/gfx950) in Oct 2025.

New check on PRs

SPIRV Compiler CI / Linux::release / Test rocm-examples

Runs on linux-gfx942-1gpu-ossci-rocm, inside the same manylinux container the Build job uses, with GPU passthrough (--device=/dev/kfd --device=/dev/dri --group-add video). This isolates the test from /opt/rocm and the Comgr cache state on the persistent host runner, and keeps glibc consistent with the built libs. Informational initially.

Example coverage

Two parallel build paths, both gating on exit code 0 per binary:

| Path | Subdirs | Count | Mechanism |
|---|---|---|---|
| CMake (rocm-examples' own CMakeLists) | HIP-Basic | ~23 | cmake -S HIP-Basic -B examples-build + cmake --build. Upstream CMake auto-skips SPIR-V-incompatible examples (assembly_to_executable, llvm_ir_to_executable, opengl/vulkan_interop, hello_world_cuda, hipify, sobel_filter) via find_package / GPU_RUNTIME gates. |
| Bash loop (direct clang) | Applications, Tutorials/reduction | 15 (6+9) | These subdirs' CMakeLists call find_package(hipcub/rocrand/rocfft) etc., which aren't in our staged tree, so invoking clang directly is simpler than staging math-libs. |

Excluded from both paths:

  • Library-conditional examples (rocPRIM, hipCUB, monte_carlo_pi/hipRAND) — need rocm-libraries math-libs not built here
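Both paths feed the same Run-step gate. A minimal sketch of that loop, assuming it receives the executables from examples-build/bin/* and built_examples.txt as arguments; the `run_examples` name and the `TIMEOUT_S` override are hypothetical, while the 60 s per-example timeout comes from this PR's commit history. GNU `timeout` exits with 124 when it kills a hung example, so hangs count as failures too.

```shell
# Hypothetical sketch of the Run step: execute every built example under a
# per-binary timeout and fail if any of them exits non-zero.
run_examples() {
  local timeout_s="${TIMEOUT_S:-60}" failed=0 exe
  # Arguments are paths to executables (examples-build/bin/* plus the
  # names listed in built_examples.txt, prefixed with ./).
  for exe in "$@"; do
    if timeout "$timeout_s" "$exe"; then
      echo "PASS $exe"
    else
      # $? here is the exit status of the timeout command (124 on a hang).
      echo "FAIL $exe (exit $?)"
      failed=$((failed + 1))
    fi
  done
  echo "$failed of $# examples failed"
  [ "$failed" -eq 0 ]
}
```

For example, `run_examples examples-build/bin/* $(sed 's|^|./|' built_examples.txt)` from the directory holding the bash-loop binaries; the `./` prefix matters because `timeout` otherwise looks bare names up on `PATH`.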

Known failure (intentionally gating)

hip_cooperative_groups currently fails at runtime with:

LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.s.wait.asynccnt

The AMDGPU backend lacks a lowering pattern for s.wait.asynccnt on the target ISA. The Run step intentionally does not skip this example: the whole point of test_rocm_examples is to keep surfacing this kind of break in CI until it is fixed. PRs that touch the translator, the AMDGPU backend, or Comgr should treat this red check as a reminder. Drop this known-failure state once the intrinsic gets a lowering pattern.

Pinned SHAs (bump deliberately)

  • ROCm/rocm-systems: 8bb3b73c (develop, 2026-05-15)
  • ROCm/rocm-cmake: 4d391d7b (develop, 2026-05-12)
  • ROCm/rocm-examples: e260595e (amd-staging, 2026-05-28)

Build-job changes worth noting

  • CMAKE_INSTALL_LIBDIR=lib on LLVM Configure + device-libs + Comgr + ROCR + CLR (manylinux defaults to lib64; downstream CMake projects expect lib).
  • LLVM_ENABLE_RUNTIMES=compiler-rt + COMPILER_RT_BUILD_BUILTINS=ON (all other compiler-rt features off): hip-lang-config.cmake hard-requires libclang_rt.builtins.a for any downstream enable_language(HIP), which the CMake-driven HIP-Basic build needs.
  • builtins added to the ninja build target.

Credits

CMake-driven HIP-Basic build + container/GPU-passthrough setup adapted from @idubinov's PR #216 (stacked on this one).

Adds a test_rocm_examples job that compiles ROCm/rocm-examples
HIP-Basic/hello_world via PR clang with `--offload-arch=amdgcnspirv`
and runs the resulting binary on a gfx942 runner. This exercises the
SPIRV translator end-to-end through real HIP compile + JIT kernel
load, catching translator/codegen bugs that pass lit but fail at
runtime.

Build job changes:
- After Comgr build, install comgr + device-libs into a staging/ tree
  via cmake --install. clang's --rocm-path expects installed-tree
  layout, not build-dir layout.
- Clone ROCm/rocm-systems (pinned) and build rocr-runtime + clr
  (libhsa-runtime64.so + libamdhip64.so) against the just-built clang,
  installing into the same staging/ tree.
- Clone ROCm/rocm-cmake (pinned) for the ROCM cmake modules CLR needs.
- Disable kpack, rocprofiler-register, and host-hipcc detection in CLR
  (they're optional and not part of the SPIRV CI scope).
- Strip and tar steps extended to include staging/.

Why in-pipeline runtime build (vs prebuilt nightly): TheRock's public
nightly-tarball bucket only publishes consumer (dgpu) families on a
daily cadence; gfx94X / gfx950 (dcgpu) tarballs stopped Oct 2025, and
TheRock's own multi-arch CI for those families builds the runtime
in-pipeline for the same reason. Building in-pipeline also ensures the
runtime tracks the same llvm/clang amd-staging tip we're testing.

Test job:
- runs-on: linux-gfx942-1gpu-ossci-rocm (matches existing TheRock
  multi-arch CI runner choice).
- Compiles hello_world via PR clang + --rocm-path=$STAGING; explicit
  -lhsa-runtime64 needed since it's not pulled transitively from
  libamdhip64.so during exe link.
- Runs binary with LD_LIBRARY_PATH=$STAGING/lib.
- Greps for fixed host + device greeting lines.
- Informational initially; promote to required once stable.

Pinned SHAs (bump deliberately, ~weekly):
- ROCm/rocm-systems: 8bb3b73c (develop, 2026-05-15)
- ROCm/rocm-cmake:   4d391d7b (develop, 2026-05-12)
- ROCm/rocm-examples: b4ee9992 (develop, 2025-02-21)

Estimated wall-time impact: existing build ~14 min + ~7-10 min new
(rocr ~2 min, clr ~5-8 min) -> ~21-24 min total, well inside the
120-min timeout. Test job adds one GPU-runner job, runs ~1-2 min.
lamb-j added a commit to ROCm/llvm-project that referenced this pull request May 18, 2026
Mirrors the SPIRV CI structure on ROCm/SPIRV-LLVM-Translator
amd-staging: a top-level dispatcher (spirv-ci.yml) that calls per-
platform reusable workflows (spirv-ci-linux.yml, spirv-ci-windows.yml).
Each platform builds LLVM/Clang/LLD + amd-llvm-spirv translator +
device-libs + Comgr in one job, uploads the build trees as a single
artifact, and fans out to parallel test jobs that consume it:

  Linux::release / Build                          (required)
  Linux::release / Test SPIRV translator lit
  Linux::release / Test LLVM SPIRV codegen
  Linux::release / Test Comgr
  Windows::release / Build
  Windows::release / Test SPIRV translator lit
  Windows::release / Test LLVM SPIRV codegen
  Windows::release / Test Comgr

The translator-lit test job re-runs lit twice (PR head + amd-staging
baseline) and gates on new-failures-only — Khronos upstream breaks
~1 translator lit test per week, so without baseline-diffing the
suite would block unrelated PRs on those days. Pre-existing baseline
failures don't gate.
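The new-failures-only gate amounts to a set difference between two failure lists. A minimal sketch, assuming each lit run has already written its failing test names one per line to a file; `gate_new_failures` is a hypothetical helper, not the workflow's actual step. It prints the same new / fixed / pre-existing partition the translator copy posts as a PR comment, but only new failures affect the exit status.

```shell
# Hypothetical helper: partition failing lit tests into new / fixed /
# pre-existing and fail only when the PR head introduces new failures.
gate_new_failures() {
  local pr_list="$1" base_list="$2" tmp status=0
  tmp=$(mktemp -d) || return 2
  # comm(1) requires sorted input; -u also drops duplicate names.
  sort -u "$pr_list" > "$tmp/pr"
  sort -u "$base_list" > "$tmp/base"

  comm -23 "$tmp/pr" "$tmp/base" > "$tmp/new"       # fail only on PR head
  comm -13 "$tmp/pr" "$tmp/base" > "$tmp/fixed"     # fail only on baseline
  comm -12 "$tmp/pr" "$tmp/base" > "$tmp/existing"  # fail on both

  sed 's/^/NEW /' "$tmp/new"
  sed 's/^/FIXED /' "$tmp/fixed"
  sed 's/^/PRE-EXISTING /' "$tmp/existing"

  # Pre-existing baseline failures never gate; only new ones do.
  if [ -s "$tmp/new" ]; then
    status=1
  fi
  rm -rf "$tmp"
  return "$status"
}
```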

Goal: catch breakage in compiler / SPIRV translator that would fail
downstream Comgr / HIP testing without paying the cost of a full
TheRock build. Test categories will grow over time (rocm-examples,
HIP-tests SPIRV mode, etc.) — they slot in as new jobs in the
per-platform workflows.

This is the llvm-project mirror of the workflows on
ROCm/SPIRV-LLVM-Translator amd-staging. Two intentional divergences
from the translator copy:

  1. No sticky PR comment on the translator-lit test job. The
     translator copy posts a per-PR comment partitioning translator-
     lit failures into new / fixed / pre-existing. On llvm-project
     PRs (which mostly touch llvm/clang/lld, not the SPIRV translator)
     that comment would just be noise. The failing-check signal in
     the PR header is enough; the gate step still fails the job on
     real regressions and prints the failing test list to the log.
     pull-requests: write permission dropped accordingly.

  2. No rocm-examples test job (yet). That work is under review on
     the translator side first (ROCm/SPIRV-LLVM-Translator#208); it
     will be brought over here once stable.

Replaces 13 incremental prototype commits with a single squash so the
diff matches the current intended state. The translator side is the
source of truth for these workflows going forward; bumps from there
will be replicated here.
lamb-j added 3 commits May 18, 2026 16:22
ROCR-Runtime's image library compiles OpenCL "blit" bitcode kernels
for ~30 gfx targets via add_custom_command(... clang ...). Those
clang invocations don't get --rocm-path forwarded; in the manylinux
build container there's no /opt/rocm default to fall back to, so
device-libs aren't found. Setting ROCM_PATH=$PWD/staging points
clang's auto-detection at the staging tree where device-libs were
installed in the previous step.
clang-linker-wrapper does its own library lookup before invoking the
host linker; it doesn't pick up -L flags from the clang driver, so
-lamdhip64 fails to resolve on the GPU runner where there's no
system ROCm. Set LIBRARY_PATH=$STAGING/lib so the wrapper finds it
in the same tree the host linker is searching.

Add -Wl,-rpath,$STAGING/lib so the binary can dlopen the runtime
libs without depending on LD_LIBRARY_PATH being set at exec time.
Helps diagnose why clang-linker-wrapper can't find -lamdhip64 even
with LIBRARY_PATH set. Temporary; revert once we know the fix.
lamb-j added a commit to ROCm/llvm-project that referenced this pull request May 20, 2026
CMake's GNUInstallDirs picks lib64 on the manylinux2014 container
(64-bit Red Hat heritage). clang's --rocm-path detection looks for
libraries at <prefix>/lib, so libamdhip64.so + libhsa-runtime64.so
landed where the HIP driver couldn't see them.

Set CMAKE_INSTALL_LIBDIR=lib on both ROCR-Runtime and CLR configures
to force the layout clang expects. Also drop the debug step and
LIBRARY_PATH workaround now that the real cause is identified.
@AlexVlx commented May 21, 2026 (Contributor)

@lamb-j looks like the test itself is yielding unexpected output?

@lamb-j commented May 21, 2026 (Collaborator, Author)

Do we know of any issues with device printf and amdgcnspirv? Maybe we should use a different test to prototype.

It's an issue with the build config, re-testing

lamb-j added 3 commits May 21, 2026 10:47
Helps determine whether the kernel launches and whether sync returns
cleanly when the device printf output is missing. Temporary.
HIP runtime calls Comgr at hipLaunchKernel time to translate the
embedded SPIR-V fatbin to AMDGPU code for the actual GPU. Comgr
searches for the translator at <libamd_comgr.so>/../bin/amd-llvm-spirv.

The translator binary is built in build/bin/ but wasn't being copied
into staging/, so Comgr couldn't find it at JIT time, hipLaunchKernel
returned hipErrorInvalidImage, and the kernel silently never ran.
hipDeviceSynchronize then returned success (nothing to sync) so the
binary exited cleanly with only the host-side print output.

Copy amd-llvm-spirv + the llvm-spirv symlink into staging/bin/ after
the CLR install step.
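A minimal sketch of that staging step plus a self-check, assuming a build/bin and staging/ layout like the one described here and that the Comgr install has already created staging/lib. The helper names are hypothetical; the lookup rule (`<dir of libamd_comgr.so>/../bin/amd-llvm-spirv`) is the one quoted in the commit message. Failing the Build job here is cheaper than discovering a hipErrorInvalidImage on the GPU runner.

```shell
# Hypothetical helper: where Comgr looks for the translator, given the
# path of libamd_comgr.so (per the rule quoted in the commit message).
expected_translator_path() {
  local comgr_lib="$1"
  printf '%s/../bin/amd-llvm-spirv\n' "$(dirname "$comgr_lib")"
}

# Hypothetical helper: copy the translator into staging/bin and verify it
# is reachable via that rule. Assumes staging/lib already exists, since
# resolving lib/../bin requires the lib directory to be present.
stage_translator() {
  local build_bin="$1" staging="$2" expected
  mkdir -p "$staging/bin" || return 1
  cp "$build_bin/amd-llvm-spirv" "$staging/bin/" || return 1
  expected=$(expected_translator_path "$staging/lib/libamd_comgr.so")
  if [ ! -x "$expected" ]; then
    echo "translator not found at $expected" >&2
    return 1
  fi
}
```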
Comgr looks for amd-llvm-spirv by name; the llvm-spirv symlink is
for downstream tools that expect the upstream name and isn't needed
for the HIP runtime JIT path.
lamb-j added a commit to ROCm/llvm-project that referenced this pull request May 21, 2026
lamb-j added 7 commits May 21, 2026 13:37
Comgr's 'Failed to compile spirv to reloc' is opaque — the
underlying error stays buried unless logs are redirected to stderr.
Also list staging/bin to confirm the translator is actually there
at runtime.
AMD_COMGR_REDIRECT_LOGS alone surfaced no extra detail for the
'Failed to compile spirv to reloc' failure. Add:
- ROCM_PATH=$STAGING so Comgr finds device-libs in staging instead
  of falling back to /opt/rocm
- AMD_COMGR_SAVE_TEMPS=/tmp/comgr to dump intermediate files showing
  how far the SPIR-V → reloc pipeline got
- Post-run listing of /tmp/comgr/ contents
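One way to keep these settings in one place is a small wrapper around the example binary. A minimal sketch, assuming `STAGING` holds the staging prefix; `run_with_staging` is hypothetical, and the Comgr variable values are copied from this commit rather than from Comgr's documentation, so check the README of the Comgr version in use for their exact semantics.

```shell
# Hypothetical wrapper: run a command with the HIP runtime and Comgr
# pointed at the staged tree, plus the Comgr debugging knobs above.
run_with_staging() {
  : "${STAGING:?STAGING must point at the staging prefix}"
  # LD_LIBRARY_PATH is searched before ld.so.cache, so prepending
  # staging/lib keeps a system ROCm install from being picked up first.
  LD_LIBRARY_PATH="$STAGING/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
  ROCM_PATH="$STAGING" \
  AMD_COMGR_REDIRECT_LOGS=stderr \
  AMD_COMGR_SAVE_TEMPS=/tmp/comgr \
    "$@"
}
```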
Match the working invocation from ROCm/RocmCIForSPIRV
spirv_external_llvm.sh. The critical addition is --offload-new-driver,
which routes through the HIP offload codepath that produces a fatbin
HIP runtime can JIT for amdgcnspirv. Without it the runtime hits
"Failed to compile spirv to reloc" because the bundle format doesn't
match what hip_fatbin.cpp expects.

Also added:
- --rocm-device-lib-path explicit (auto-detect doesn't always find the
  staged amdgcn/bitcode dir)
- -D__HIP_PLATFORM_AMD__ to match
- -Wl,--unresolved-symbols=ignore-in-shared-libs (libamd_comgr embeds
  its own LLVM symbols)
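For reference, the resulting per-example invocation can be assembled from the flags listed in this commit. A minimal sketch that prints one argument per line so a bash caller can load it into an array; `hip_spirv_compile_cmd`, `CLANG`, and `STAGING` are hypothetical names, and `-L`, `-lamdhip64`, `-lhsa-runtime64`, and `-Wl,-rpath` come from earlier commits in this PR. A later commit drops `--offload-new-driver` once it became clang's default.

```shell
# Hypothetical helper: print, one argument per line, the direct-clang
# command for one example, using the flags named in this PR.
hip_spirv_compile_cmd() {
  local src="$1" out="$2"
  : "${STAGING:?STAGING must point at the staging prefix}"
  printf '%s\n' \
    "${CLANG:-clang++}" \
    --offload-arch=amdgcnspirv \
    --offload-new-driver \
    "--rocm-path=$STAGING" \
    "--rocm-device-lib-path=$STAGING/amdgcn/bitcode" \
    -D__HIP_PLATFORM_AMD__ \
    "$src" -o "$out" \
    "-L$STAGING/lib" -lamdhip64 -lhsa-runtime64 \
    "-Wl,-rpath,$STAGING/lib" \
    -Wl,--unresolved-symbols=ignore-in-shared-libs
}
```

In bash, for example: `mapfile -t cmd < <(hip_spirv_compile_cmd main.hip hello_world) && "${cmd[@]}"`.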
HIP runtime reaches Comgr and Comgr's COMPILE_SPIRV_TO_RELOCATABLE
action fails. Need to verify what's actually in the binary's fatbin
section — SPIR-V with the right target string, or something else
(LLVM IR? Wrong target name?). Add a step that lists bundle entries
via clang-offload-bundler.
Global env var forces Comgr to log its internal actions without HIP
needing to enable per-action logging. Targets the silent
AMD_COMGR_ACTION_COMPILE_SPIRV_TO_RELOCATABLE failure at JIT time.
The gfx942 ossci-rocm runner is persistent (no container), so state
from prior runs can leak in. Two things to investigate:

- ~/.cache/comgr / ~/.amd/comgr — Comgr's JIT cache. A cached failed
  compile from a prior run could reproduce the same failure even
  after we've fixed the inputs. RocmCIForSPIRV's clean_all explicitly
  wipes these for the same reason.
- /opt/rocm — if a system ROCm install is present, HIP runtime could
  be silently resolving against it instead of (or alongside) our
  staged tree.

Add a probe step that lists both, then wipes the Comgr caches before
the Run step.
Probe step now shows which libamdhip64 + libamd_comgr are actually
resolving at runtime, and the rpath/runpath / NEEDED entries of our
staged libs. /opt/rocm-6.4.1 exists on the runner — if the staged
libs end up pulling Comgr from there, our COMPILE_SPIRV_TO_RELOCATABLE
call lands in a stale Comgr API.
lamb-j added a commit to ROCm/llvm-project that referenced this pull request May 27, 2026
## Summary

Adds SPIRV-focused PR CI for `ROCm/llvm-project` `amd-staging`,
mirroring the structure that landed on `ROCm/SPIRV-LLVM-Translator`
`amd-staging`. Each platform builds LLVM/Clang/translator/Comgr in one
job and fans out to parallel test jobs that consume a GHA-artifact build
tree. Catches translator/Comgr breakage without paying the cost of a
full TheRock build.

## New checks on PRs

```
SPIRV Compiler CI - amd-staging / Linux::release / Build
SPIRV Compiler CI - amd-staging / Linux::release / Test SPIRV translator lit
SPIRV Compiler CI - amd-staging / Linux::release / Test LLVM SPIRV codegen
SPIRV Compiler CI - amd-staging / Linux::release / Test Comgr
SPIRV Compiler CI - amd-staging / Windows::release / Build
SPIRV Compiler CI - amd-staging / Windows::release / Test SPIRV translator lit
SPIRV Compiler CI - amd-staging / Windows::release / Test LLVM SPIRV codegen
SPIRV Compiler CI - amd-staging / Windows::release / Test Comgr
```

All checks are informational initially; promote individually to required
as each stabilizes.

## Differences from the translator copy

Two intentional divergences from the workflows on
ROCm/SPIRV-LLVM-Translator `amd-staging`:

1. **No sticky PR comment from the translator-lit test job.** The
translator copy posts a per-PR comment partitioning translator-lit
failures into new / fixed / pre-existing. On llvm-project PRs (which
mostly touch llvm/clang/lld, not the SPIRV translator) that comment
would just be noise — the failing check in the PR header is enough
signal. The gate step still fails the job on real regressions and prints
the failing test list to the log. `pull-requests: write` permission
dropped accordingly.

2. **No `rocm-examples` test job (yet).** That work is under review on
the translator side first as ROCm/SPIRV-LLVM-Translator#208 and will be
brought over here once stable.
lamb-j added 4 commits May 28, 2026 13:14
The dynamic loader was resolving libamd_comgr.so.3 from
/opt/rocm-6.4.1/lib (a system ROCm install on the GPU runner) instead
of from our staged tree, because Comgr's install rules apparently
don't always emit the libamd_comgr.so.<MAJOR> SONAME symlink in
staging/lib. Without that symlink, LD_LIBRARY_PATH=staging/lib has
nothing to match against and falls through to ld.so.cache, which
hands back the old system Comgr.

End result: our libamdhip64 (which calls the new
AMD_COMGR_ACTION_COMPILE_SPIRV_TO_RELOCATABLE) ends up linked against
6.4.1's libamd_comgr that doesn't recognize that action — JIT fails
with INVALID_ARGUMENT.

Backfill the SONAME + dev symlinks after install, and log the lib
directory before/after so we can confirm what's there.
Comgr's install was landing in staging/lib64 because the manylinux
container's GNUInstallDirs defaults to lib64. Even though the symlink
chain (libamd_comgr.so / .so.3 / .so.3.3.0) was created, it was in
the wrong directory — LD_LIBRARY_PATH=staging/lib had no comgr to
match, so the runtime resolver fell through to /opt/rocm-6.4.1's
older Comgr (with a stale API that doesn't recognize
COMPILE_SPIRV_TO_RELOCATABLE).

Add -DCMAKE_INSTALL_LIBDIR=lib to Comgr's configure (matching what
we already did for ROCR + CLR). Revert the buggy SONAME backfill —
no longer needed once Comgr installs to the right place.
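Both failure modes described in the last two commits (libraries landing in lib64, and no SONAME file in staging/lib) could be caught at the end of the Build job instead of at JIT time on the GPU runner. A minimal sketch of such a guard; `check_staged_lib` is hypothetical, and the caller supplies the SONAME major version (3 for the libamd_comgr.so.3 seen here).

```shell
# Hypothetical post-install guard for one staged runtime library.
check_staged_lib() {
  local staging="$1" name="$2" major="$3" f
  # Anything under lib64 is invisible to LD_LIBRARY_PATH=staging/lib.
  for f in "$staging/lib64/$name".so*; do
    if [ -e "$f" ]; then
      echo "$name installed under lib64: $f" >&2
      return 1
    fi
  done
  # The loader resolves the DT_NEEDED name, i.e. the SONAME
  # (e.g. libamd_comgr.so.3), so that exact file must exist in lib.
  if [ ! -e "$staging/lib/$name.so.$major" ]; then
    echo "missing $staging/lib/$name.so.$major" >&2
    return 1
  fi
}
```

For example, `check_staged_lib staging libamd_comgr 3` after the Comgr install step.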
Replace the single hello-world compile+run with loop-based steps
covering the same SPIR-V-eligible example sets that
ROCm/RocmCIForSPIRV's spirv_external_llvm.sh builds:

- SIMPLE_EXAMPLES (16): HIP-Basic single-source examples
- APPLICATIONS (6): HIP-Basic + External/ include path
- REDUCTION (9): Tutorials/reduction v1-v9 (c++20)

Excluded examples (opengl/vulkan interop, hello_world_cuda, hipify,
assembly_to_executable, llvm_ir_to_executable, sobel_filter) either
need libraries we don't have or use native-AMDGPU code paths
incompatible with SPIR-V.

Verification: gate on exit code 0 per example (catches crashes,
hipError returns, asserts). 60s timeout per example to avoid hangs.
Per-example pass/fail reported in step output.

Bumped rocm-examples pin from b4ee9992 (develop, 2025-02) to e260595e
(amd-staging, 2026-05-28) — develop hadn't moved since Feb, while
amd-staging tracks the active examples we want to test against.

Bumped job timeout 20m -> 45m to fit 31 examples (~30s build each
+ a few seconds run each + GPU runner queue time).
Brings rocm-examples coverage to 34 — full parity with what
RocmCIForSPIRV's spirv_external_llvm.sh builds, modulo
multi_gpu_data_transfer (needs >1 GPU; this runner is 1gpu) and
the math-libs-dependent tier (rocPRIM, hipCUB, monte_carlo_pi).

bandwidth + matrix_multiplication live in HIP-Basic but need
-I External (new EXTERNAL_EXAMPLES bucket). runtime_compilation
needs -lhiprtc at link time; staged as a one-off.
@lamb-j lamb-j changed the title [CI] Add rocm-examples hello-world test (in-pipeline HIP runtime build) [CI] Add rocm-examples test (in-pipeline HIP runtime build, 34 examples) May 29, 2026
@lamb-j lamb-j requested a review from kirthana14m May 29, 2026 06:39
@lamb-j lamb-j requested a review from idubinov May 29, 2026 06:39
@idubinov


Thank you for your work.

Like Dmitry's scripts, this PR uses raw clang commands to build a subset of the rocm-examples.

The Examples project has CMake, and CMake is the default way to build all of its apps. The Examples project also has CTest.

In my opinion, we should exercise the default build-and-test flow instead of building each app one by one with our own scripts. I find the suggested solution fragile: it will break the next time any of the examples changes. This approach will also increase CI support cost, since we will need to follow changes made in another repo and adjust our CI accordingly.

Comment thread .github/workflows/spirv-ci-linux.yml Outdated
- Bump actions/upload-artifact v4 -> v7, download-artifact v4 -> v8
  (reviewer note: v4 is being deprecated June 2; v7/v8 are current).
- Remove the two diagnostic steps from test_rocm_examples that we
  added while debugging the JIT failure:
  * "Dump fatbin contents" (llvm-objdump / clang-offload-bundler
    inspection of hello_world's fatbin)
  * "Probe runner state + clean caches" (ldd, readelf, /opt/rocm
    listing, comgr cache wipe)
  These caught the lib64 vs lib mismatch and the libamd_comgr SONAME
  fallthrough to /opt/rocm; both are fixed now (CMAKE_INSTALL_LIBDIR=lib
  on Comgr install), so the probes have served their purpose.
- Drop the small "ls staging/lib/libamd_comgr*" diagnostic at the end
  of the install step too.

Net -34 lines.
@lamb-j commented May 29, 2026 (Collaborator, Author)


I like this. Let me see if I can update the PR to do that. I'm wondering whether we'd need to update the rocm-examples CMake to get what we want (for example, the right flags and the right subset of tests for this use case), but I agree it would be more sustainable.

Per reviewer suggestion (idubinov): built_examples.txt only contains
the names of executables that compiled successfully (the BUILD_OK
append happens after a successful compile), so the [ -x "./$exe" ]
check was redundant. Trust the list.
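A minimal sketch of the bash-loop build side that makes the list trustworthy, assuming each example lives in its own directory as main.hip and that `HIP_COMPILE` names a command (for example a wrapper around the direct clang invocation) taking a source file and an output name; both the helper and the variable are hypothetical. Only successful compiles reach built_examples.txt, so the run step needs no separate existence check.

```shell
# Hypothetical sketch: compile each example and append its name to the
# list file only after a successful compile (the BUILD_OK step).
# HIP_COMPILE is an assumed command taking <source> <output>.
build_examples() {
  local list="$1" src name failed=0
  shift
  : > "$list"                      # start from an empty list
  for src in "$@"; do
    # Assumes <example>/main.hip, so the directory name is the binary name.
    name=$(basename "$(dirname "$src")")
    if "${HIP_COMPILE:?set HIP_COMPILE}" "$src" "$name"; then
      echo "$name" >> "$list"
    else
      echo "BUILD FAIL $name" >&2
      failed=1
    fi
  done
  [ "$failed" -eq 0 ]
}
```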
@idubinov


@lamb-j, I've built the HIP-Basic apps in a clean container using CMake; all apps (excluding hip_cooperative_groups) passed (https://github.com/ROCm/SPIRV-LLVM-Translator/actions/runs/26645427704/job/78533788555).
Other parts of the examples have library dependencies, some necessary and some not. We need to figure out which ones are actually needed, where we can provide the libs, and which examples we should disable.

https://github.com/ROCm/SPIRV-LLVM-Translator/pull/216/changes

Incorporates the good ideas from idubinov's PR #216:

Build job:
- CMAKE_INSTALL_LIBDIR=lib on LLVM Configure (manylinux defaults to
  lib64; downstream CMake projects expect lib).
- LLVM_ENABLE_RUNTIMES=compiler-rt + COMPILER_RT_BUILD_BUILTINS=ON
  (all other compiler-rt features off). hip-lang-config.cmake
  hard-requires libclang_rt.builtins.a for any downstream
  enable_language(HIP).
- builtins added to the ninja build target.
- CMAKE_INSTALL_LIBDIR=lib also added to device-libs Configure.

Test job:
- Now runs inside the same manylinux container as Build with GPU
  passthrough (--device=/dev/kfd --device=/dev/dri --group-add video).
  Isolates from /opt/rocm-* and Comgr cache on the persistent host
  runner; also gives glibc consistency with the built libs.
- dnf install numactl-libs (HIP runtime dep).

HIP-Basic now built via rocm-examples' own CMakeLists (23 examples)
instead of our hand-rolled bash loop (19 examples). The upstream
CMake auto-skips SPIR-V-incompatible examples (assembly_to_executable,
llvm_ir_to_executable, opengl/vulkan_interop, hello_world_cuda,
hipify, sobel_filter) via find_package / GPU_RUNTIME gates. Picks up
3 examples we previously excluded: hip_static_host_library (+ _cxx),
hip_module_api, hip_multi_gpu_data_transfer.

Applications + Tutorials/reduction stay on the bash loop — their
CMakeLists pull in hipcub/rocrand/rocfft find_packages that aren't
present in our staged tree.

Run step now iterates both examples-build/bin/* (CMake build) and
built_examples.txt (bash loop), with hip_cooperative_groups skipped
(known s.wait.asynccnt codegen bug). Total coverage: ~36 examples
(22 HIP-Basic + 6 Applications + 9 Tutorials/reduction - 1 skipped).

Dropped --offload-new-driver (now clang default) and -use-spirv-backend
(stick with clang default translator codepath) from Igor's flag set.
@lamb-j lamb-j changed the title [CI] Add rocm-examples test (in-pipeline HIP runtime build, 34 examples) [CI] Add rocm-examples test (in-pipeline HIP runtime, CMake + bash hybrid) May 29, 2026
It's a real codegen failure ('Cannot select intrinsic
%llvm.amdgcn.s.wait.asynccnt') and the whole point of the
test_rocm_examples job is to keep surfacing this until the AMDGPU
backend gets a lowering pattern. Gating the suite on it is the
correct behavior.
lamb-j added a commit that referenced this pull request Jun 4, 2026
Adds a test_hip_tests job that builds the hip-tests `kernel` catch2
unit suite with -DENABLE_SPIRV=ON and runs it via ctest on a gfx942
GPU runner. Prototype scope: just the kernel unit dir to start;
expand to more unit dirs once the SPIR-V baseline is characterized.

hip-tests has first-class SPIR-V support: -DENABLE_SPIRV=ON makes its
own CMake set --offload-arch=amdgcnspirv and gate out cases that can't
run in SPIR-V mode (per-test `if(NOT ENABLE_SPIRV)` blocks). catch2's
catch_discover_tests registers each case with ctest, so we just
configure -> build KernelTest -> ctest -R Kernel.
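A minimal sketch of that configure, build, and ctest sequence, assuming a hip-tests checkout at a caller-supplied path; `hip_tests_kernel`, `run_or_echo`, and `DRY_RUN` are hypothetical, and any further cache variables the real job passes (compiler, ROCm paths) are omitted. `ctest --test-dir` needs CMake 3.20 or newer.

```shell
# Hypothetical helper: echo the command when DRY_RUN=1, otherwise run it.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Configure hip-tests in SPIR-V mode, build only the kernel unit suite,
# and run the catch2 cases that catch_discover_tests registered with ctest.
hip_tests_kernel() {
  local src="$1" build="$2"
  run_or_echo cmake -S "$src" -B "$build" -DENABLE_SPIRV=ON &&
    run_or_echo cmake --build "$build" --target KernelTest &&
    run_or_echo ctest --test-dir "$build" -R Kernel --output-on-failure
}
```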

Reuses the in-pipeline runtime staging tree + manylinux container +
GPU passthrough from the rocm-examples job. Same lib/lib64 + libgcc
runtime-lib flags.

Stacked on the rocm-examples branch (shares the Build job). Will
rebase onto amd-staging once #208 lands. Informational initially.