Skip to content

[AMDGPU] comgr: rewrite s_sethalt to s_nop for GFX1250 A0#3087

Open
jammm wants to merge 1 commit into
amd-stagingfrom
users/jam/sethalt-hotswap-gfx1250-staging
Open

[AMDGPU] comgr: rewrite s_sethalt to s_nop for GFX1250 A0#3087
jammm wants to merge 1 commit into
amd-stagingfrom
users/jam/sethalt-hotswap-gfx1250-staging

Conversation

@jammm

@jammm jammm commented Jun 26, 2026

Copy link
Copy Markdown

Summary

This PR splits out the s_sethalt hotswap fix from #2519 into its own branch/PR.

For the GFX1250 B0-to-A0 hotswap path, COMGR now detects in-shader s_sethalt instructions and rewrites them in-place to s_nop 0. This avoids the A0 LD_SCALE/WMMA clause-break hazard caused by a shader halt immediately before scale/WMMA execution.

Details

  • Adds a new hotswap patch module: comgr-hotswap-patch-sethalt-fix.cpp
  • Registers the patch through the existing hotswap patch vtable/registry
  • Runs the patch after the existing whole-kernel hotswap passes
  • Rewrites only decoded s_sethalt instructions
  • Uses the existing pre-encoded LS.SNopBytes replacement bytes
  • Logs the kernel name and .text offset for each neutralized instruction

This only handles in-code-object s_sethalt. External halt mechanisms such as host/debugger SQ halt commands are outside COMGR’s code-object rewrite path and are not handled here.

Tests

Added hotswap-sethalt-fix.s, covering:

  • bare s_sethalt rewrite
  • s_sethalt immediately before a VOP3PX2 instruction
  • multiple s_sethalt instructions in one kernel
  • no-op behavior for kernels without s_sethalt

Validation run locally:

cmake --build /jam/TheRock-build --target amd-comgr+stage
PATH="/jam/TheRock-build/compiler/amd-llvm/build/bin:$PATH" \
  /jam/TheRock-build/compiler/amd-llvm/build/bin/llvm-lit -sv \
  /jam/TheRock-build/compiler/amd-comgr/build/test-lit --filter='hotswap'

Result: 51/51 hotswap lit tests passed.

Rewrites every s_sethalt to s_nop 0 in-place (4-byte overwrite, no
length change, no offset shifts). Defensive: LLVM almost never
emits s_sethalt in production code (only via the int_amdgcn_s_sethalt
intrinsic, mostly in debug builds), so this pass is dormant on most
inputs -- it exists for defense-in-depth against any input ELF,
compiler-emitted or hand-written, that ships s_sethalt.

The replacement bytes come from LS.SNopBytes (pre-encoded at
initLLVM() time, the same source the splitter uses for trampoline
padding).

Wired into b0a0.cpp dispatcher AFTER applyVop3pxWrapPatch (the
order doesn't matter functionally -- sethalt-fix and the wrap pass
don't interact -- but keeping all the A0-only post-loop passes
adjacent makes the dispatch order easy to read).

LIT test (hotswap-sethalt-fix.s): 4 cases -- bare s_sethalt is
neutralized, s_sethalt before VOP3PX2 is neutralized, multiple
s_sethalt instances are all neutralized, kernels without s_sethalt
are not modified.
@jammm jammm added comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature ci:skip Skip all CI builds/tests for this PR labels Jun 26, 2026
@jammm jammm marked this pull request as ready for review June 26, 2026 15:26
@jammm jammm requested review from chinmaydd and lamb-j as code owners June 26, 2026 15:26

@chinmaydd chinmaydd left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:skip Skip all CI builds/tests for this PR comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants