[SYCL][AMDGPU] Set amdgpu-flat-work-group-size for SYCL reqd_work_group_size#22452

Open
wenju-he wants to merge 528 commits into intel:sycl-web from wenju-he:fix-CodeGenSYCL-reqd-work-group-size.cpp

Conversation

@wenju-he commented Jun 26, 2026

Contributor

The AMDGPU verifier check added in 6794e31 requires the amdgpu-flat-work-group-size function attribute whenever reqd_work_group_size metadata is present. setFunctionDeclAttributes handled only OpenCL's ReqdWorkGroupSizeAttr. SYCL uses SYCLReqdWorkGroupSizeAttr, which goes through CodeGenFunction.cpp to emit the metadata but never set the function attribute, so the verifier reported an error on amdgcn targets.
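
As a rough illustration of how the attribute relates to the metadata, the attribute value can be derived from the three required dimensions. This is a minimal sketch, not the clang implementation: `flatWorkGroupSizeAttr` is a hypothetical helper, and it assumes that a required work-group size pins both the minimum and the maximum flat size to X*Y*Z, giving a value of the form "N,N".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a clang API): build the string value for the
// "amdgpu-flat-work-group-size" attribute from a required work-group size.
// Assumption: min and max are both X*Y*Z when the size is required.
std::string flatWorkGroupSizeAttr(uint32_t X, uint32_t Y, uint32_t Z) {
  assert(X > 0 && Y > 0 && Z > 0 && "work-group dimensions must be non-zero");
  // Widen before multiplying so the product cannot overflow 32 bits.
  const uint64_t Total = static_cast<uint64_t>(X) * Y * Z;
  return std::to_string(Total) + "," + std::to_string(Total);
}
```

Under that assumption, a kernel with a required size of 8x8x4 would carry the value "256,256".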

Fixes CodeGenSYCL/reqd-work-group-size.cpp and check-work-group-attributes-match.cpp.
CMPLRLLVM-76303

jinge90 and others added 30 commits June 23, 2026 17:24
…iceCompilation.cpp (intel#22344)

Signed-off-by: jinge90 <ge.jin@intel.com>
Refactor `DimOp::fold` in both memref and tensor dialects to use the
existing `getConstantIndex()` helper instead of manually extracting the
index via `IntegerAttr`.
This patch was a part of
llvm/llvm-project#201170. I split the `icmp ptr`
support from the original PR since I am worried it might not catch up
for the LLVM 23 release (#201170 is blocked by #200672 for curating
mixed provenance tests). I hope we can pick most of the low-hanging
fruit exposed by fuzzers before the release. The released version should
be able to run csmith-generated tests without obvious false positives or
crashes.

BTW, this patch doesn't respect the exact semantics of `icmp ptr` (i.e.,
truncating the address to the address width. The naming is a bit
confusing...). Currently, we don't model external state in non-address
bits of a pointer in llubi. So I think it is fine.
…l#22408)

Fixes build breakage caused by 448b725
("clang/Driver: Use struct type for BoundArch instead of StringRef"),
which changed virtual signatures in `ToolChain.h` and related APIs.

Update SYCL/CUDA driver code to use the new `BoundArch` struct type
instead of `StringRef`/`const char*`:
- SYCL.h/Cuda.h: update override signatures to match base class
(`getDeviceLibs`, `getSupportedSanitizers`)
- Driver.cpp: migrate `DeviceTargetInfo::BoundArch` field from `const
char*` to `BoundArch`; fix all downstream uses including
`appendSYCLDeviceLink`, `addSYCLDeviceLibs`, `CollectForEachInputs`, and
`BuildJobsForActionNoCache` (fix stale `BoundArch` type-as-value
references to use `BA` parameter); fix `nullptr`/`StringRef()` in
`DeviceDependences::add`, `HostDependence`, and unbundling action
`registerDependentActionInfo` calls
- Clang.cpp: fix `getOffloadingArch()` → `.ArchName` for `StringRef`
contexts; fix `doOnEachDependence` lambda param type; fix
`getArgsForToolChain` calls with empty arch

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…040)

To match with GCC: https://godbolt.org/z/KPKGhhenK

Fixes: #203760

Assisted-by: Claude Sonnet 4.6
This makes it more consistent with the rest of the repository.
Fix missing opcodes in table of flag-setting instructions.
Reverts llvm/llvm-project#173135 and adds two
new IR tests to demonstrate the impact of different atomic orderings on
Dead Store Elimination (DSE).

This reverts commit c8941df.

Co-authored-by: Aiden Grossman <aidengrossman@google.com>
1) Return the evaluated APValue as a const pointer since it
   may not be modified by callers.
2) Only return a non-nullptr from `getEvaluatedValue()` if
   the APValue is not absent.
Local build on Linux platform reports a compiler warning:
llvm-project/libc/utils/MPFRWrapper/MPCommon.cpp:546:15: warning:
implicit conversion loses integer precision: 'long' to 'int'
[-Wshorten-64-to-32]
  546 |     int mod = mpfr_get_si(value_ret_exact.value, MPFR_RNDN);
      |         ~~~   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.

Signed-off-by: jinge90 <ge.jin@intel.com>
…22342)

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
…el#22409)

Test fails after 96eb0cb.
Add NativeCPULibclcCall, SYCLGlobalVar, SYCLIntelESimdVectorize,
SYCLUsesAspects to undocumented list; remove ReqdWorkGroupSize and
WorkGroupSizeHint (now documented); update total 84->86.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…205477)

`getAsCXXRecordDecl` will return nullptr for any dependent types.

It's introduced by #192786, see
llvm/llvm-project#192786 (comment)
in original PR.
Use m_c_ICmp so the load can be on either side of the icmp.
Without asserts, we see failures like so:

/repo/llvm/llvm/lib/Target/Hexagon/HexagonAsmPrinter.cpp:982:43: error:
unused variable 'NextI' [-Werror,-Wunused-variable]
982 | MachineBasicBlock::const_instr_iterator NextI =
std::next(MI.getIterator());
          |                                           ^~~~~
    1 error generated.

Mark NextI `maybe_unused` to address the issue.

Fixes a regression introduced by f8aa5f6.
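
The failure mode above is easy to reproduce outside LLVM: a variable read only inside `assert()` becomes unused once `NDEBUG` removes the assertion. The following is a minimal sketch with a hypothetical function `sumChecked` (not code from the Hexagon backend) showing how `[[maybe_unused]]` keeps such a variable warning-free in both build modes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: Size is only read by assert(), which expands to
// nothing when NDEBUG is defined. Without [[maybe_unused]], a build with
// -Werror -Wunused-variable and assertions disabled would fail here.
int sumChecked(const std::vector<int> &Values) {
  [[maybe_unused]] const std::size_t Size = Values.size();
  assert(Size > 0 && "expected a non-empty input");

  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```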
Implement `memset`, `memcpy`, `memmove` intrinsics and their
corresponding inline version. Note that the `isvolatile` argument is
ignored and left for future PRs.
This PR fixes two related DWARF constant-handling bugs that were
blocking each other.

First, LLDB's DWARF expression evaluator in
[`DWARFExpression.cpp`](https://github.com/llvm/llvm-project/blob/main/lldb/source/Expression/DWARFExpression.cpp)
handled `DW_OP_constu` and `DW_OP_consts` without going through
`to_generic`. Under DWARF, these operators push a generic value: an
address-sized integral value with unspecified signedness. That means the
result should be truncated to the target address size (via
`to_generic`).

Second, LLVM already had a producer-side issue tracked as
[#47431](llvm/llvm-project#47431): on 32-bit
targets, LLVM could emit `DW_OP_consts` / `DW_OP_constu` for source
integer constants wider than the target generic type. If LLDB were fixed
alone, those producer-emitted constants would become truncated as DWARF
requires, exposing incorrect debug info for wide source values.

This patch fixes both sides together.
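
To make the truncation concrete, here is a minimal sketch of what reducing a constant to an address-sized generic value amounts to. `truncateToGeneric` is a hypothetical stand-in, not LLDB's `to_generic`, and it assumes address sizes between 1 and 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a "to generic" conversion: keep only the low
// AddrSizeBytes bytes of Value, as an address-sized generic value would.
uint64_t truncateToGeneric(uint64_t Value, unsigned AddrSizeBytes) {
  assert(AddrSizeBytes >= 1 && AddrSizeBytes <= 8 &&
         "unsupported address size");
  if (AddrSizeBytes == 8)
    return Value; // Nothing to drop; also avoids shifting by 64 bits.
  const uint64_t Mask = (uint64_t{1} << (AddrSizeBytes * 8)) - 1;
  return Value & Mask;
}
```

On a 32-bit target a 64-bit source constant would lose its upper half under this rule, which is why the producer side has to encode wide constants differently.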

## What Changed

On the LLDB consumer side:

- `DW_OP_constu` now uses `to_generic`.
- `DW_OP_consts` now uses `to_generic`.
- The corresponding LLDB DWARF expression tests were updated to expect
address-sized generic values.

On the LLVM producer side:

- Wide integer debug-location constants that cannot be represented by
the target generic type are emitted as `DW_OP_implicit_value` instead of
`DW_OP_const*`.
- This preserves the source value bytes instead of relying on an
address-sized DWARF generic constant.
- The producer-side change is limited to complete constant values, where
there are no remaining `DIExpression` operations.

## Validation

Locally verified with:

```text
build/tools/lldb/unittests/Expression/ExpressionTests --gtest_filter='DWARFExpression.*'
74 tests passed

build/bin/llvm-lit -sv llvm/test/DebugInfo/X86/constant-loclist.ll
1 test passed

ninja -C build check-lldb -j12
No unexpected failures

ninja -C build check-all -j12
Completed with one unrelated local failure in Clang Tools :: clang-doc/DR-141990.cpp, caused by host warning-option output. No DebugInfo, DWARF, LLDB expression, or AsmPrinter-related failures were observed.

```
…hs (#205492)

`performFusion()` and `fuseGuardedLoops()` carried two
character-for-character identical tails: header-PHI migration plus latch
rewiring, and the SCEV-forget / block-merge / latch-merge finalization.
Extract them into `rewireFusedHeaderPHIsAndLatches()` and
`finalizeFusedLoop()` and call both from each path.
It's a lightweight pass. Should always be the last SSA pass since
peephole can end up making some instructions dead.
…#204056)

Add the CBA offset delta to sh_size once at the end instead of after
each write.
This patch makes the following changes:
 - Refactor the internal sorting functions to reduce code duplication.
- Move the testing machinery done for the testing of `qsort_r` to a
shared place.

These changes are done in anticipation of the introduction of Annex K's
`qsort_s`. This function shares most of its semantics with `qsort_r`,
therefore most of the testing logic can be shared between the two.
Besides, `qsort`, `qsort_r` and `qsort_s` are all very similar, hence we
can attempt to reduce duplication a bit more.
The code which calculates the 'errsign' parameter to pass to
`__compiler_rt_dunder` was wrong in two ways. It calculated the value
with the wrong sign, and also in the wrong register, r12 rather than r2!
In this code's original context, both of those things made sense (the
'dunder' function had a nonstandard ABI). Somehow none of the existing
test cases detected the problem.

We found this bug in a test case downstream that only failed big-endian
(because that changes which half of the denominator mantissa is left in
r2 to be accidentally used as errsign). However, the new test cases here
are designed to detect the failure in both endiannesses.
This patch adds struct ip_mreq, ip_mreq_source, ip_mreqn, ip_opts, and
ip_msfilter to <netinet/in.h>, along with IP level socket option macros
(IP_TOS, IP_TTL, IP_ADD_MEMBERSHIP, etc.).

I add basic unit tests verifying the size and member offsets of the new
structures against standard layout expectations, mainly to make sure
that the files are used /somewhere/.

Assisted by Gemini.
…AL (#205353)

Lowering generates patterns when forwarding OPTIONAL in calls that
look like:

```
%present = fir.is_present %var : (T) -> i1
%if_result = fir.if %present -> (T) {
  fir.result %var : T
} else {
  %absent = fir.absent T
  fir.result %absent : T
}
```

This specific pattern is a no-op and `%var` can be used directly. The
lowering logic that generates such patterns is inside non-trivial
compiler code that has to deal with more complex scenarios where the
code inside the fir.if is more complex. Add a FIR pattern to
canonicalize such code to help with later analysis (like aliasing).
llvm/llvm-project#203084 adds diagnostics about
unused variables to the libc++ containers. This patch is the fallout
from the projects I tried to build with it.
Match the `VarDecl::evaluateValue()` contract updated by #205033 in CIR
constant emission.
wenju-he and others added 7 commits June 26, 2026 11:17
…el#22446)

b3ca5fb introduced amdgpu-flat-work-group-size attributes with
per-kernel values, causing each kernel to get a distinct attribute group.

Fixes CodeGenSYCL/check-work-group-attributes-match.cpp
CMPLRLLVM-76303

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…el#22434)

TranslateOffloadTargetArgs had a broad exception for amdgcn OpenMP that
passed all -m_Group args to the device DAL. This silently claimed flags
like -mxnack/-mno-sramecc, suppressing the expected "unused argument"
warning tested by amdgpu-xnack-sramecc-flags.c.

The original intent was only to forward -mcpu to the device toolchain;
narrow the condition accordingly.

Fixes Driver/amdgpu-xnack-sramecc-flags.c
CMPLRLLVM-76233

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…empty (intel#22432)

BoundArch("") sets Arch=OffloadArch::Unknown (not Unused), so operator
bool() returns true even for an empty arch name. This caused
TranslateArgs to erase -march=<value> and re-add -march= (empty),
producing "unsupported argument '' to option '-march='".

Fix: use !BA.empty() (checks ArchName string) instead of if (BA) (checks Arch != Unused).
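
The difference between the two checks can be shown with a simplified model. `BoundArchSketch`, `ArchKind`, and `shouldRewriteMarch` below are hypothetical stand-ins, not the clang driver types; the only behavior taken from the description above is that constructing from an empty string yields Unknown rather than Unused.

```cpp
#include <string>
#include <utility>

// Hypothetical, simplified model of the driver's bound-arch value.
enum class ArchKind { Unused, Unknown };

struct BoundArchSketch {
  ArchKind Arch = ArchKind::Unused;
  std::string ArchName;

  BoundArchSketch() = default;
  // Assumption from the commit message: any string, even "", maps to
  // Unknown rather than Unused.
  explicit BoundArchSketch(std::string Name)
      : Arch(ArchKind::Unknown), ArchName(std::move(Name)) {}

  // True whenever the value was constructed from a string.
  explicit operator bool() const { return Arch != ArchKind::Unused; }
  // True only when there is no architecture name to forward.
  bool empty() const { return ArchName.empty(); }
};

// Rewrite -march only when there is an actual name to substitute.
bool shouldRewriteMarch(const BoundArchSketch &BA) { return !BA.empty(); }
```

In this model `BoundArchSketch("")` converts to true, so testing the object itself would still rewrite `-march=` with an empty value, while `shouldRewriteMarch` returns false.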

Broken by 448b725.

Fixes test Driver/sycl-native-cpu.cpp
CMPLRLLVM-76405

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
89905ff added nofree to the tests below:
- check_device_code/vector/bf16_builtins_new_vec.cpp
- check_device_code/vector/bf16_builtins_old_vec.cpp
- check_device_code/vector/convert_bfloat_new_vec.cpp
- check_device_code/vector/convert_bfloat_old_vec.cpp
- check_device_code/vector/math_ops_new_vec.cpp
- check_device_code/vector/math_ops_old_vec.cpp

CMPLRLLVM-76303
…ble (intel#22449)

437f95c disabled address space optimization for kernel args, adding
range/id byval struct params to NativeCPU kernel signatures (4->6
params).
ESIMD kernels capturing int args now retain them in the signature.

Fixes
- check_device_code/esimd/NBarrierAttr.cpp
- check_device_code/esimd/dae.cpp
- check_device_code/esimd/genx_func_attr.cpp
- check_device_code/esimd/slm_init_specconst_size.cpp
- check_device_code/native_cpu/kernelhandler-scalar.cpp
- check_device_code/native_cpu/native_cpu_subhandler.cpp

CMPLRLLVM-76303

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
  CONFLICT (content): Merge conflict in sycl/test-e2e/KernelAndProgram/level-zero-static-link-flow.cpp
  CONFLICT (content): Merge conflict in sycl/test-e2e/Properties/cache_config.cpp
…up_size

AMDGPU verifier (added in 6794e31) requires amdgpu-flat-work-group-size
function attribute whenever reqd_work_group_size metadata is present.
setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL
uses SYCLReqdWorkGroupSizeAttr which went through CodeGenFunction.cpp to emit
the metadata but never set the function attribute, triggering the verifier
error on amdgcn targets.

Fixes CodeGenSYCL/reqd-work-group-size.cpp
CMPLRLLVM-76303

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wenju-he

Contributor Author

Not upstreamable because SYCLReqdWorkGroupSizeAttr does not exist in llvm-project.

@wenju-he wenju-he requested review from a team, Maetveis, bader and cperkinsintel as code owners June 28, 2026 07:57
@wenju-he wenju-he requested review from adamfidel and rafNNN and removed request for a team June 28, 2026 07:57