[CI test - DO NOT MERGE] Intentionally break two lit tests (v2) by lamb-j · Pull Request #192 · ROCm/SPIRV-LLVM-Translator

lamb-j · 2026-05-08T19:34:26Z

Modifies CHECK directives in two currently-passing transcoding tests with sentinel strings that won't appear in actual translator output:

test/transcoding/fmax.ll — broke CHECK-SPIRV line
test/transcoding/CommonAddrspaceGlobalVar.ll — broke CHECK-LLVM line

github-actions · 2026-05-08T19:50:53Z

🔴 2 new translator lit failures introduced by this PR (non-blocking; see run).

🔴 New failures (2) — likely caused by this PR

FAIL: LLVM_SPIRV :: transcoding/CommonAddrspaceGlobalVar.ll
FAIL: LLVM_SPIRV :: transcoding/fmax.ll

🟢 Fixed by this PR (0) — failing on baseline, passing here

(none)

⚠️ Pre-existing on `amd-staging` (21)

FAIL: LLVM_SPIRV :: constant/local-float-point-constants.ll
FAIL: LLVM_SPIRV :: extensions/EXT/SPV_EXT_float8/conversions_matrix.ll
FAIL: LLVM_SPIRV :: extensions/EXT/SPV_EXT_float8/conversions_scalar_vector.ll
FAIL: LLVM_SPIRV :: extensions/EXT/SPV_EXT_shader_atomic_float_/atomicrmw_fsub_half.ll
FAIL: LLVM_SPIRV :: extensions/INTEL/SPV_INTEL_float4/conversions_packed.ll
FAIL: LLVM_SPIRV :: extensions/INTEL/SPV_INTEL_float4/conversions_scalar_vector.ll
FAIL: LLVM_SPIRV :: extensions/INTEL/SPV_INTEL_fp_conversions/spv_intel_fp_conversions.ll
FAIL: LLVM_SPIRV :: extensions/INTEL/SPV_INTEL_int4/conversions_packed.ll
FAIL: LLVM_SPIRV :: extensions/INTEL/SPV_INTEL_sigmoid/sigmoid_f16.ll
FAIL: LLVM_SPIRV :: extensions/KHR/SPV_KHR_bfloat16/cooperative_matrix_bfloat16.ll
FAIL: LLVM_SPIRV :: extensions/KHR/SPV_KHR_cooperative_matrix/conversion_instructions.ll
FAIL: LLVM_SPIRV :: extensions/KHR/SPV_KHR_subgroup_rotate/SPV_KHR_subgroup_rotate.cl
FAIL: LLVM_SPIRV :: extensions/KHR/SPV_KHR_uniform_group_instructions/group-instructions.ll
FAIL: LLVM_SPIRV :: transcoding/float16.ll
FAIL: LLVM_SPIRV :: transcoding/image_signedness_spv_ir.ll
FAIL: LLVM_SPIRV :: transcoding/OpImageSampleExplicitLod_arg.cl
FAIL: LLVM_SPIRV :: transcoding/spec_const.ll
FAIL: LLVM_SPIRV :: transcoding/sub_group_clustered_reduce.ll
FAIL: LLVM_SPIRV :: transcoding/sub_group_non_uniform_arithmetic.ll
FAIL: LLVM_SPIRV :: transcoding/sub_group_shuffle.ll
FAIL: LLVM_SPIRV :: transcoding/sub_group_shuffle_relative.ll

The PR-head and baseline lit steps stay continue-on-error so pre- existing amd-staging breakage doesn't block. The partition comment already surfaces 🔴 New / 🟢 Fixed / ⚠️ Pre-existing buckets. Add a final gate step that exits non-zero when newFails > 0 (PR head FAILs not present in baseline). Real PR-introduced regressions now turn the check red instead of just dropping into a comment. Degrades gracefully — if either capture file is missing (e.g., one of the lit runs failed to produce output) the gate emits a warning and passes rather than blocking on incomplete data. Now that the partition logic has been validated end-to-end on PRs #190/#192 (🔴 New) and #193 (🟢 Fixed), this is a safe step up from informational to blocking.

@sha256

* [CI] Split spirv-ci into Build + parallel Test jobs Today the SPIRV CI is one mega-job (Linux Build & Test) that builds the LLVM/translator/Comgr stack and runs all lit suites sequentially. As we add more test categories (rocm-examples, hip-tests per the SPIRV Automated Testing Status confluence page), one mega-check is too coarse — a failing suite hides the others, can't selectively re-run, and the required-check is all-or-nothing. Split into 4 jobs: Linux Build (required) ├─► Linux Test - SPIRV translator lit (informational, baseline-diff) ├─► Linux Test - LLVM SPIRV codegen (informational) └─► Linux Test - Comgr (informational) Build uploads `build/`, `build-comgr/`, `build-device-libs/` (after strip --strip-unneeded) as a single GHA artifact. Test jobs do a fresh checkout of source trees + download the artifact. Source isn't shipped in the artifact (cleaner: artifact = build outputs, checkout = source). Translator-lit baseline-diff (PR head vs amd-staging) preserved as-is — runs in the translator-lit test job: download artifact, run lit at PR head, swap translator src to amd-staging tip, cmake-reconfigure + ninja-incremental + lit-rerun, post sticky comment with new/fixed/ pre-existing partition. Translator checkout in this job uses fetch-depth: 0 so the baseline `git checkout amd-staging` works. Tests stay informational (per design discussion): promote to required individually as each suite stabilizes. Only Linux Build is required. Pragmatic deviations from TheRock conventions: - GHA artifacts (not S3) for transport — no OIDC/IAM access here yet. Will swap to S3 patterns (post_build_upload.py / fetch_artifacts.py) when this workflow moves into TheRock. Job graph + naming match TheRock so the swap is mechanical. - "Linux" prefix on job names — TheRock uses workflow-file-as-platform convention, but we keep the prefix consistent with the prior rename and our future Windows expansion plans. ACTION REQUIRED on merge: update the amd-staging-psdb ruleset's required-check context from "Linux Build & Test" → "Linux Build". Without it the old rule will dangle (no job named "Linux Build & Test" exists anymore) and PRs would block on a permanently-pending placeholder. * [CI] Refactor into 2-file workflow_call chain (TheRock conventions) Mirrors TheRock's Multi-Arch CI shape: a top-level dispatcher with platform-variant jobs that call reusable per-platform workflows. The Linux variant is byte-isolated so adding Windows later is just a new file + dispatcher entry. Files: - spirv-ci.yml — top-level dispatcher (~25 lines), byte-identical across both this repo and ROCm/llvm-project. Triggers on pull_request + workflow_dispatch. Sole job (linux_release, name Linux::release) calls spirv-ci-linux.yml. - spirv-ci-linux.yml — workflow_call. Holds the build job + 3 test jobs (factored from the prior single spirv-ci.yml). Rendered check_run names in the PR rollup (workflow_call composes the slash hierarchy): - SPIRV Compiler CI / Linux::release / Build - SPIRV Compiler CI / Linux::release / Test SPIRV translator lit - SPIRV Compiler CI / Linux::release / Test LLVM SPIRV codegen - SPIRV Compiler CI / Linux::release / Test Comgr Convention alignment with TheRock (verified against .github/workflows/multi_arch_ci.yml, multi_arch_ci_linux.yml and the rest of the workflow set): - Top-level workflow name Title Case, no branch suffix - Reusable workflow has its own descriptive name - snake_case job IDs, separate display name field - No literal slashes in job names — slashes come from workflow_call - concurrency: only on dispatcher, not on workflow_call - secrets: inherit at the dispatcher → variant call - permissions read-only at workflow level; pull-requests: write escalated only on the translator-lit job that posts the comment - workflow_dispatch alongside pull_request — Multi-Arch's check names don't get the "(pull_request)" suffix even with both triggers, so the disambiguation bug we hit before is sidestepped by the workflow_call structure - Container image pinned with @sha256: ACTION REQUIRED on merge: update the amd-staging-psdb ruleset's required-check context from "Linux Build" → "Linux::release / Build". Without it the dangling rule blocks PRs on a permanently-pending placeholder. * [CI] Grant pull-requests: write at dispatcher level (fix startup_failure) The previous commit moved pull-requests: write to a job-level permission on the test_translator_lit job inside spirv-ci-linux.yml. Per GHA rules, a called workflow's GITHUB_TOKEN permissions are capped by the caller's permissions; if the caller's workflow-level grant is `contents: read` only, the called workflow can't add pull-requests: write — validation fails at startup, no jobs run. Symptom: PR #194 ran with conclusion=startup_failure, no check_runs created, empty PR rollup. Fix: grant pull-requests: write at the dispatcher's workflow-level permissions. The called workflow's job-level grant on test_translator_lit still narrows the actual usage to that single job; this just lifts the caller-side cap. Multi-Arch CI gets away with `contents: read` only because none of its jobs post comments. Ours does, so this grant is required. * [CI] Tar build trees for artifact transport (preserve modes + .git) The previous commit's first run hit two distinct failures rooted in actions/upload-artifact@v4 defaults: 1. Comgr test job: "/bin/sh: clang-23: Permission denied" — v4 strips executable bits on upload, so binaries come back non-executable. 2. Codegen test job: cmake reconfigure (triggered when ninja sees the freshly-checked-out source as newer than the build tree) failed inside FetchContent's SPIRV-Headers update step with "fatal: not a git repository: '.git'" — v4 also excludes hidden files (the .git dir under build/_deps/) by default. Translator-lit job appeared green but actually hit the same first-lit breakage; continue-on-error: true masked it. Fix: tar the build trees ourselves before upload, untar after download. Tar preserves both file modes and hidden files in one shot, which is simpler than chmodding +x and toggling include-hidden-files separately. * [CI] Test-job wall-time fixes: tar -xmf + shallower translator clone Two wall-time bugs found while diagnosing PR #194's slow codegen test: 1. Untar with `tar -xf` restored mtimes from when the build job produced them. Test-job source checkouts run just before, so freshly-checked-out source files appeared newer than (older) build outputs from the tar, and ninja cascade-rebuilt instead of running the requested test target. Switch to `tar -xmf` (skip mtime restore) so build outputs are newer. Observed ~5-10 min wasted on codegen test job. 2. Translator-lit job's translator checkout used `fetch-depth: 0` (full history) only to enable the later `git checkout amd-staging` for the baseline swap. But `git fetch --depth=1 origin amd-staging` followed by `git checkout FETCH_HEAD` works on a shallow clone too. Switch to `fetch-depth: 1`. ~30-60s save per run. Drive-by: trim verbose comments on the tar/untar steps. * [CI] Gate translator-lit job on new failures The PR-head and baseline lit steps stay continue-on-error so pre- existing amd-staging breakage doesn't block. The partition comment already surfaces 🔴 New / 🟢 Fixed / ⚠️ Pre-existing buckets. Add a final gate step that exits non-zero when newFails > 0 (PR head FAILs not present in baseline). Real PR-introduced regressions now turn the check red instead of just dropping into a comment. Degrades gracefully — if either capture file is missing (e.g., one of the lit runs failed to produce output) the gate emits a warning and passes rather than blocking on incomplete data. Now that the partition logic has been validated end-to-end on PRs #190/#192 (🔴 New) and #193 (🟢 Fixed), this is a safe step up from informational to blocking.

Re-creation of #190 (closed) on top of current amd-staging which includes the "Linux Build & Test" job rename (#191) and required-check ruleset update. Verifies end-to-end: - the renamed job emits "Linux Build & Test" check_run name - the updated required-check rule resolves to satisfied - the partition comment correctly classifies these test failures as 🔴 New failures Modifies CHECK directives in two passing transcoding tests with sentinel strings that won't appear in the actual output: - test/transcoding/fmax.ll - test/transcoding/CommonAddrspaceGlobalVar.ll Close without merge once verified.

This was referenced May 8, 2026

[CI test - DO NOT MERGE] Intentionally break two lit tests #190

Closed

[CI test - DO NOT MERGE] Demo Fixed bucket via Khronos #3741 cherry-pick #193

Closed

lamb-j force-pushed the users/lambj/spirv-ci-test-new-failures-v2 branch from 176b27b to 430d6d3 Compare May 10, 2026 21:37

lamb-j closed this May 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CI test - DO NOT MERGE] Intentionally break two lit tests (v2)#192

[CI test - DO NOT MERGE] Intentionally break two lit tests (v2)#192
lamb-j wants to merge 1 commit into
amd-stagingfrom
users/lambj/spirv-ci-test-new-failures-v2

lamb-j commented May 8, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 8, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

lamb-j commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

lamb-j commented May 8, 2026 •

edited

Loading

github-actions Bot commented May 8, 2026 •

edited

Loading