[SYCL][AMDGPU] Set amdgpu-flat-work-group-size for SYCL reqd_work_group_size by wenju-he · Pull Request #22452 · intel/llvm

wenju-he · 2026-06-26T09:28:02Z

The AMDGPU verifier (added in 6794e31) requires the amdgpu-flat-work-group-size function attribute whenever reqd_work_group_size metadata is present. setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL uses SYCLReqdWorkGroupSizeAttr, which goes through CodeGenFunction.cpp to emit the metadata but never sets the function attribute, triggering the verifier error on amdgcn targets.

Fixes CodeGenSYCL/reqd-work-group-size.cpp and check-work-group-attributes-match.cpp.
CMPLRLLVM-76303
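
To make the relationship between the metadata and the attribute concrete, here is a minimal sketch of how an attribute value could be derived from the three required work-group dimensions. It assumes the attribute value is a `min,max` string and that a required size pins both bounds to the product of the dimensions; the helper name `flatWorkGroupSizeAttrValue` is hypothetical and is not a clang or LLVM API.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical helper (not a clang API): build the "min,max" string for the
// amdgpu-flat-work-group-size attribute from reqd_work_group_size dimensions.
// Assumption: a required work-group size fixes both bounds to X * Y * Z.
std::string flatWorkGroupSizeAttrValue(const std::array<std::uint32_t, 3> &dims) {
  // Multiply in 64 bits so large dimensions do not overflow.
  const std::uint64_t flat =
      static_cast<std::uint64_t>(dims[0]) * dims[1] * dims[2];
  return std::to_string(flat) + "," + std::to_string(flat);
}
```

Under these assumptions, a kernel with a required size of 8x4x2 would carry the value `64,64`.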

This patch was a part of llvm/llvm-project#201170. I split the `icmp ptr` support from the original PR since I am worried it might not catch up for the LLVM 23 release (#201170 is blocked by #200672 for curating mixed provenance tests). I hope we can pick most of the low-hanging fruit exposed by fuzzers before the release. The released version should be able to run csmith-generated tests without obvious false positives or crashes. BTW, this patch doesn't respect the exact semantics of `icmp ptr` (i.e., truncating the address to the address width. The naming is a bit confusing...). Currently, we don't model external state in non-address bits of a pointer in llubi. So I think it is fine.

…l#22408)
Fixes build breakage caused by 448b725 ("clang/Driver: Use struct type for BoundArch instead of StringRef"), which changed virtual signatures in `ToolChain.h` and related APIs.
Update SYCL/CUDA driver code to use the new `BoundArch` struct type instead of `StringRef`/`const char*`:
- SYCL.h/Cuda.h: update override signatures to match base class (`getDeviceLibs`, `getSupportedSanitizers`)
- Driver.cpp: migrate `DeviceTargetInfo::BoundArch` field from `const char*` to `BoundArch`; fix all downstream uses including `appendSYCLDeviceLink`, `addSYCLDeviceLibs`, `CollectForEachInputs`, and `BuildJobsForActionNoCache` (fix stale `BoundArch` type-as-value references to use `BA` parameter); fix `nullptr`/`StringRef()` in `DeviceDependences::add`, `HostDependence`, and unbundling action `registerDependentActionInfo` calls
- Clang.cpp: fix `getOffloadingArch()` → `.ArchName` for `StringRef` contexts; fix `doOnEachDependence` lambda param type; fix `getArgsForToolChain` calls with empty arch
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Reverts llvm/llvm-project#173135 and adds two new IR tests to demonstrate the impact of different atomic orderings on Dead Store Elimination (DSE). This reverts commit c8941df. Co-authored-by: Aiden Grossman <aidengrossman@google.com>

Local build on Linux platform reports a compiler warning:
llvm-project/libc/utils/MPFRWrapper/MPCommon.cpp:546:15: warning: implicit conversion loses integer precision: 'long' to 'int' [-Wshorten-64-to-32]
  546 |     int mod = mpfr_get_si(value_ret_exact.value, MPFR_RNDN);
      |         ~~~   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
Signed-off-by: jinge90 <ge.jin@intel.com>

…el#22409) Test fails after 96eb0cb. Add NativeCPULibclcCall, SYCLGlobalVar, SYCLIntelESimdVectorize, SYCLUsesAspects to undocumented list; remove ReqdWorkGroupSize and WorkGroupSizeHint (now documented); update total 84->86. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Without asserts, we see failures like so:
/repo/llvm/llvm/lib/Target/Hexagon/HexagonAsmPrinter.cpp:982:43: error: unused variable 'NextI' [-Werror,-Wunused-variable]
  982 |   MachineBasicBlock::const_instr_iterator NextI = std::next(MI.getIterator());
      |                                           ^~~~~
1 error generated.
Mark NextI `maybe_unused` to address the issue. Fixes a regression introduced by f8aa5f6.
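
The same situation can be reproduced in isolation. The sketch below is not the Hexagon code; `hasSuccessor` and its parameters are made up for illustration. It shows a local that is only read inside `assert()`, so it becomes unused once `NDEBUG` is defined, and `[[maybe_unused]]` keeps `-Wunused-variable` quiet in that configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns true when the element at `pos` is followed by another element.
// `next` is only read inside assert(), so without [[maybe_unused]] a build
// with NDEBUG (asserts compiled out) would warn about an unused variable.
bool hasSuccessor(const std::vector<int> &values, std::size_t pos) {
  [[maybe_unused]] auto next =
      std::next(values.begin(), static_cast<std::ptrdiff_t>(pos));
  assert(next != values.end() && "pos must be in range");
  return pos + 1 < values.size();
}
```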

This PR fixes two related DWARF constant-handling bugs that were blocking each other.

First, LLDB's DWARF expression evaluator in [`DWARFExpression.cpp`](https://github.com/llvm/llvm-project/blob/main/lldb/source/Expression/DWARFExpression.cpp) handled `DW_OP_constu` and `DW_OP_consts` without going through `to_generic`. Under DWARF, these operators push a generic value: an address-sized integral value with unspecified signedness. That means the result should be truncated to the target address size (via `to_generic`).

Second, LLVM already had a producer-side issue tracked as [#47431](llvm/llvm-project#47431): on 32-bit targets, LLVM could emit `DW_OP_consts` / `DW_OP_constu` for source integer constants wider than the target generic type. If LLDB were fixed alone, those producer-emitted constants would become truncated as DWARF requires, exposing incorrect debug info for wide source values.

This patch fixes both sides together.

## What Changed

On the LLDB consumer side:
- `DW_OP_constu` now uses `to_generic`.
- `DW_OP_consts` now uses `to_generic`.
- The corresponding LLDB DWARF expression tests were updated to expect address-sized generic values.

On the LLVM producer side:
- Wide integer debug-location constants that cannot be represented by the target generic type are emitted as `DW_OP_implicit_value` instead of `DW_OP_const*`.
- This preserves the source value bytes instead of relying on an address-sized DWARF generic constant.
- The producer-side change is limited to complete constant values, where there are no remaining `DIExpression` operations.

## Validation

Locally verified with:

```text
build/tools/lldb/unittests/Expression/ExpressionTests --gtest_filter='DWARFExpression.*'
74 tests passed

build/bin/llvm-lit -sv llvm/test/DebugInfo/X86/constant-loclist.ll
1 test passed

ninja -C build check-lldb -j12
No unexpected failures

ninja -C build check-all -j12
Completed with one unrelated local failure in Clang Tools :: clang-doc/DR-141990.cpp, caused by host warning-option output.
No DebugInfo, DWARF, LLDB expression, or AsmPrinter-related failures were observed.
```
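
The truncation described above can be illustrated with a small standalone sketch. This is not LLDB's `to_generic` or LLVM's emission logic; `truncateToAddressSize` and `fitsInGeneric` are hypothetical names, and only unsigned constants are modelled.

```cpp
#include <cstdint>

// Hypothetical stand-in for the idea behind `to_generic`: keep only the low
// `addrBytes` bytes of a value, as an address-sized generic value would.
std::uint64_t truncateToAddressSize(std::uint64_t value, unsigned addrBytes) {
  if (addrBytes >= 8)
    return value; // Nothing to drop on a 64-bit (or wider) address size.
  const std::uint64_t mask = (std::uint64_t{1} << (addrBytes * 8)) - 1;
  return value & mask;
}

// Hypothetical producer-side check: an unsigned constant that survives the
// truncation unchanged fits in the generic type; otherwise a different
// encoding (the PR uses DW_OP_implicit_value) is needed to keep all bytes.
bool fitsInGeneric(std::uint64_t value, unsigned addrBytes) {
  return truncateToAddressSize(value, addrBytes) == value;
}
```

On a 4-byte address size, for instance, a consumer following this rule would see only the low 32 bits of a 64-bit constant, which is why the producer has to stop emitting such constants as `DW_OP_const*`.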

…hs (#205492) `performFusion()` and `fuseGuardedLoops()` carried two character-for-character identical tails: header-PHI migration plus latch rewiring, and the SCEV-forget / block-merge / latch-merge finalization. Extract them into `rewireFusedHeaderPHIsAndLatches()` and `finalizeFusedLoop()` and call both from each path.

This patch makes the following changes:
- Refactor the internal sorting functions to reduce code duplication.
- Move the testing machinery written for `qsort_r` to a shared place.
These changes are made in anticipation of the introduction of Annex K's `qsort_s`. This function shares most of its semantics with `qsort_r`, so most of the testing logic can be shared between the two. Besides, `qsort`, `qsort_r` and `qsort_s` are all very similar, hence we can attempt to reduce duplication a bit more.

The code which calculates the 'errsign' parameter to pass to `__compiler_rt_dunder` was wrong in two ways. It calculated the value with the wrong sign, and also in the wrong register, r12 rather than r2! In this code's original context, both of those things made sense (the 'dunder' function had a nonstandard ABI). Somehow none of the existing test cases detected the problem. We found this bug in a test case downstream that only failed big-endian (because that changes which half of the denominator mantissa is left in r2 to be accidentally used as errsign). However, the new test cases here are designed to detect the failure in both endiannesses.

This patch adds struct ip_mreq, ip_mreq_source, ip_mreqn, ip_opts, and ip_msfilter to <netinet/in.h>, along with IP level socket option macros (IP_TOS, IP_TTL, IP_ADD_MEMBERSHIP, etc.). I add basic unit tests verifying the size and member offsets of the new structures against standard layout expectations, mainly to make sure that the files are used /somewhere/. Assisted by Gemini.

…AL (#205353) When forwarding OPTIONAL in calls, lowering generates patterns that look like:
```
%present = fir.is_present %var : (T) -> i1
%if_result = fir.if %present -> (T) {
  fir.result %var : T
} else {
  %absent = fir.absent T
  fir.result %absent : T
}
```
This specific pattern is a no-op and `%var` can be used directly. The lowering logic that generates such patterns lives in non-trivial compiler code that has to handle scenarios where the code inside the `fir.if` is more complex. Add a FIR pattern to canonicalize such code to help with later analysis (like aliasing).

…el#22446) b3ca5fb introduced amdgpu-flat-work-group-size attributes with per-kernel values, causing each kernel to get a distinct attribute group.
Fixes CodeGenSYCL/check-work-group-attributes-match.cpp
CMPLRLLVM-76303
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…el#22434) TranslateOffloadTargetArgs had a broad exception for amdgcn OpenMP that passed all -m_Group args to the device DAL. This silently claimed flags like -mxnack/-mno-sramecc, suppressing the expected "unused argument" warning tested by amdgpu-xnack-sramecc-flags.c. The original intent was only to forward -mcpu to the device toolchain; narrow the condition accordingly. Fixes Driver/amdgpu-xnack-sramecc-flags.c CMPLRLLVM-76233 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…empty (intel#22432) BoundArch("") sets Arch=OffloadArch::Unknown (not Unused), so operator bool() returns true even for an empty arch name. This caused TranslateArgs to erase -march=<value> and re-add -march= (empty), producing "unsupported argument '' to option '-march='". Fix: use !BA.empty() (checks ArchName string) instead of if (BA) (checks Arch != Unused). Broken by 448b725. Fixes test Driver/sycl-native-cpu.cpp CMPLRLLVM-76405 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
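
The difference between the two checks can be shown with a simplified model. The struct below is a hypothetical stand-in, not the driver's real `BoundArch`; it only encodes the behaviour stated in the commit message (a value constructed from `""` gets `Unknown`, not `Unused`).

```cpp
#include <string>
#include <utility>

// Simplified, hypothetical stand-in for the driver's BoundArch type.
enum class ArchKind { Unused, Unknown };

struct BoundArchSketch {
  ArchKind Arch = ArchKind::Unused;
  std::string ArchName;

  BoundArchSketch() = default;
  // Assumption taken from the commit message: constructing from a string,
  // even an empty one, yields Arch = Unknown rather than Unused.
  explicit BoundArchSketch(std::string Name)
      : Arch(ArchKind::Unknown), ArchName(std::move(Name)) {}

  explicit operator bool() const { return Arch != ArchKind::Unused; }
  bool empty() const { return ArchName.empty(); }
};

// Pre-fix condition: true even when the arch name is empty.
bool shouldRewriteMarchOld(const BoundArchSketch &BA) {
  return static_cast<bool>(BA);
}

// Post-fix condition: true only when there is an arch name to forward.
bool shouldRewriteMarchNew(const BoundArchSketch &BA) { return !BA.empty(); }
```

In this model, `BoundArchSketch("")` passes the old check but not the new one, which corresponds to the empty `-march=` being re-added before the fix.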

89905ff added nofree to the tests below:
- check_device_code/vector/bf16_builtins_new_vec.cpp
- check_device_code/vector/bf16_builtins_old_vec.cpp
- check_device_code/vector/convert_bfloat_new_vec.cpp
- check_device_code/vector/convert_bfloat_old_vec.cpp
- check_device_code/vector/math_ops_new_vec.cpp
- check_device_code/vector/math_ops_old_vec.cpp
CMPLRLLVM-76303

…ble (intel#22449) 437f95c disabled address space optimization for kernel args, adding range/id byval struct params to NativeCPU kernel signatures (4->6 params). ESIMD kernels capturing int args now retain them in the signature.
Fixes
- check_device_code/esimd/NBarrierAttr.cpp
- check_device_code/esimd/dae.cpp
- check_device_code/esimd/genx_func_attr.cpp
- check_device_code/esimd/slm_init_specconst_size.cpp
- check_device_code/native_cpu/kernelhandler-scalar.cpp
- check_device_code/native_cpu/native_cpu_subhandler.cpp
CMPLRLLVM-76303
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…up_size AMDGPU verifier (added in 6794e31) requires amdgpu-flat-work-group-size function attribute whenever reqd_work_group_size metadata is present. setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL uses SYCLReqdWorkGroupSizeAttr which went through CodeGenFunction.cpp to emit the metadata but never set the function attribute, triggering the verifier error on amdgcn targets. Fixes CodeGenSYCL/reqd-work-group-size.cpp CMPLRLLVM-76303 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wenju-he · 2026-06-27T01:04:54Z

Not upstreamable because SYCLReqdWorkGroupSizeAttr doesn't exist in llvm-project.

jinge90 and others added 30 commits June 23, 2026 17:24

[SYCL][NFC] Move target triple name out of loop in KernelCompiler Dev…

36d5e81

…iceCompilation.cpp (intel#22344) Signed-off-by: jinge90 <ge.jin@intel.com>

[mlir][gpu] Fix memref.dim folding with negative index (#205338)

4f051ae

Fixes #205073.

[mlir] Simplify DimOp::fold by using getConstantIndex(NFC) (#205343)

dba3717

Refactor `DimOp::fold` in both memref and tensor dialects to use the existing `getConstantIndex()` helper instead of manually extracting the index via `IntegerAttr`.

[clang] Exclude EmptyRecord when calculating larger CXX records (#205…

9fa8669

…040) To match with GCC: https://godbolt.org/z/KPKGhhenK Fixes: #203760 Assisted-by: Claude Sonnet 4.6

[Github] Bump release-binaries python version (#179287)

1f0799c

This makes it more consistent with the rest of the repository.

[AArch64] Add final missing instructions to sForm (#167518)

20cf046

Fix missing opcodes in table of flag-setting instructions.

[clang][AST] Refactor EvaluatedStmt accessors in VarDecl (#205033)

2c0c6eb

1) Return the evaluated APValue as a const pointer since it may not be modified by callers. 2) Only return a non-nullptr from `getEvaluatedValue()` if the APValue is not absent.
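
A miniature of that accessor contract, using made-up types rather than clang's `APValue` and `VarDecl`, might look like the sketch below. `ValueSketch` and `EvaluatedHolder` are hypothetical names introduced only for illustration.

```cpp
// Hypothetical miniature of the accessor contract; these are not clang types.
struct ValueSketch {
  bool Absent = true; // Models an "absent" evaluated value.
  long Payload = 0;
};

class EvaluatedHolder {
  ValueSketch Evaluated;

public:
  void set(long V) {
    Evaluated.Absent = false;
    Evaluated.Payload = V;
  }

  // Callers get a read-only pointer, and nullptr whenever no value is stored,
  // so a non-null result always refers to a present value.
  const ValueSketch *getEvaluatedValue() const {
    return Evaluated.Absent ? nullptr : &Evaluated;
  }
};
```

With this shape, a caller needs only a null check before reading the value and cannot modify it through the returned pointer.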

[SYCL] Use ext_vector_type for optimizing marray arithmetic (intel#…

6c86c95

…22342) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

[AMDGPU] Reject src1 immediates with dpp when unsupported (#201494)

e72e958

This fixes an oversight in #164241.

[clang-tidy][NFC] Remove a wrong comment in ProTypeMemberInitCheck (#…

bf1f61b

…205477) `getAsCXXRecordDecl` will return nullptr for any dependent types. It's introduced by #192786, see llvm/llvm-project#192786 (comment) in original PR.

[LV] Accept swapped operands in early-exit condition compare (#199989)

10c0e8e

Use m_c_ICmp so the load can be on either side of the icmp.

[llubi] Implement memory manipulation intrinsics (#204932)

13c45a6

Implement `memset`, `memcpy`, `memmove` intrinsics and their corresponding inline version. Note that the `isvolatile` argument is ignored and left for future PRs.

[RISCV] Convert opaque pointers in vp-combine-reverse-load.ll. NFC (#…

da77f74

…205498)

[AArch64] Run cleanup one final time after peephole (#199711)

3dce6e6

It's a lightweight pass. Should always be the last SSA pass since peephole can end up making some instructions dead.

[ObjectYAML][NFC] Derive BBAddrMap section size from the CBA offset (…

7fe0b4c

…#204056) Add the CBA offset delta to sh_size once at the end instead of after each write.

[libc++][ranges] Applied [[nodiscard]] to reverse_view (#205186)

a1f50d6

Towards #172124
# References:
- https://wg21.link/range.reverse
- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

Remove unused variables in the monorepo (#204994)

11c6192

llvm/llvm-project#203084 adds diagnostics about unused variables to the libc++ containers. This patch is the fallout from the projects I tried to build with it.

[CIR] Handle const evaluated variable values (#205512)

7a6c8fb

Match the `VarDecl::evaluateValue()` contract updated by #205033 in CIR constant emission.

wenju-he and others added 7 commits June 26, 2026 11:17

Merge from 'sycl' to 'sycl-web' (18 commits)

5983d70

CONFLICT (content): Merge conflict in sycl/test-e2e/KernelAndProgram/level-zero-static-link-flow.cpp CONFLICT (content): Merge conflict in sycl/test-e2e/Properties/cache_config.cpp

wenju-he requested a review from a team as a code owner June 26, 2026 09:28

wenju-he mentioned this pull request Jun 27, 2026

[SYCL][NFC] Fix check-work-group-attributes-match.cpp for AMDGPU #22446

Merged

wenju-he force-pushed the sycl-web branch from a68962c to 74a1382 Compare June 28, 2026 07:57

wenju-he requested review from a team, Maetveis, bader and cperkinsintel as code owners June 28, 2026 07:57

wenju-he requested review from adamfidel and rafNNN and removed request for a team June 28, 2026 07:57

wenju-he wants to merge 528 commits into intel:sycl-web from wenju-he:fix-CodeGenSYCL-reqd-work-group-size.cpp

20 participants
