[SYCL][AMDGPU] Set amdgpu-flat-work-group-size for SYCL reqd_work_group_size#22452
Open
wenju-he wants to merge 528 commits into
…iceCompilation.cpp (intel#22344) Signed-off-by: jinge90 <ge.jin@intel.com>
Refactor `DimOp::fold` in both memref and tensor dialects to use the existing `getConstantIndex()` helper instead of manually extracting the index via `IntegerAttr`.
This patch was a part of llvm/llvm-project#201170. I split the `icmp ptr` support from the original PR since I am worried it might not catch up for the LLVM 23 release (#201170 is blocked by #200672 for curating mixed provenance tests). I hope we can pick most of the low-hanging fruit exposed by fuzzers before the release. The released version should be able to run csmith-generated tests without obvious false positives or crashes. BTW, this patch doesn't respect the exact semantics of `icmp ptr` (i.e., truncating the address to the address width. The naming is a bit confusing...). Currently, we don't model external state in non-address bits of a pointer in llubi. So I think it is fine.
…l#22408) Fixes build breakage caused by 448b725 ("clang/Driver: Use struct type for BoundArch instead of StringRef"), which changed virtual signatures in `ToolChain.h` and related APIs.

Update SYCL/CUDA driver code to use the new `BoundArch` struct type instead of `StringRef`/`const char*`:

- SYCL.h/Cuda.h: update override signatures to match base class (`getDeviceLibs`, `getSupportedSanitizers`)
- Driver.cpp: migrate `DeviceTargetInfo::BoundArch` field from `const char*` to `BoundArch`; fix all downstream uses including `appendSYCLDeviceLink`, `addSYCLDeviceLibs`, `CollectForEachInputs`, and `BuildJobsForActionNoCache` (fix stale `BoundArch` type-as-value references to use `BA` parameter); fix `nullptr`/`StringRef()` in `DeviceDependences::add`, `HostDependence`, and unbundling action `registerDependentActionInfo` calls
- Clang.cpp: fix `getOffloadingArch()` → `.ArchName` for `StringRef` contexts; fix `doOnEachDependence` lambda param type; fix `getArgsForToolChain` calls with empty arch

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…040) To match with GCC: https://godbolt.org/z/KPKGhhenK Fixes: #203760 Assisted-by: Claude Sonnet 4.6
This makes it more consistent with the rest of the repository.
Fix missing opcodes in table of flag-setting instructions.
Reverts llvm/llvm-project#173135 and adds two new IR tests to demonstrate the impact of different atomic orderings on Dead Store Elimination (DSE). This reverts commit c8941df. Co-authored-by: Aiden Grossman <aidengrossman@google.com>
1) Return the evaluated APValue as a const pointer since it
may not be modified by callers.
2) Only return a non-nullptr from `getEvaluatedValue()` if
the APValue is not absent.
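
The two-point contract above can be sketched in isolation. This is a minimal sketch under stated assumptions: `Value` and `EvaluatedHolder` are hypothetical stand-ins invented for illustration, not clang's `APValue` or `VarDecl`; only the shape of the accessor (const pointer, `nullptr` when the value is absent) follows the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for an evaluated constant; NOT clang's APValue.
struct Value {
  bool Absent = true;
  long Payload = 0;
};

// Hypothetical holder that caches the result of an evaluation.
class EvaluatedHolder {
  Value Evaluated;
  bool WasEvaluated = false;

public:
  // Record a successful evaluation.
  void set(long P) {
    Evaluated.Absent = false;
    Evaluated.Payload = P;
    WasEvaluated = true;
  }

  // Record that evaluation ran but produced no value.
  void markEvaluatedButAbsent() {
    Evaluated = Value{};
    WasEvaluated = true;
  }

  // 1) The result is a const pointer, so callers cannot modify the cache.
  // 2) nullptr is returned whenever there is no usable value.
  const Value *getEvaluatedValue() const {
    if (!WasEvaluated || Evaluated.Absent)
      return nullptr;
    return &Evaluated;
  }
};
```

With this shape, a caller only needs a single null check before reading the value, instead of a null check followed by a separate "is absent" query.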
Local build on Linux platform reports a compiler warning:
llvm-project/libc/utils/MPFRWrapper/MPCommon.cpp:546:15: warning:
implicit conversion loses integer precision: 'long' to 'int'
[-Wshorten-64-to-32]
546 | int mod = mpfr_get_si(value_ret_exact.value, MPFR_RNDN);
| ~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
Signed-off-by: jinge90 <ge.jin@intel.com>
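
The commit message above does not show how the warning was resolved. One common way to silence `-Wshorten-64-to-32` is to make the narrowing explicit; the sketch below assumes that approach, and `narrowToInt` is a hypothetical helper that does not appear in the patch.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: narrow a long to int explicitly.
// The assert documents the assumption that the value fits; the static_cast
// tells the compiler that the conversion is intentional.
inline int narrowToInt(long V) {
  assert(V >= INT_MIN && V <= INT_MAX && "value does not fit in int");
  return static_cast<int>(V);
}
```

A call such as `int mod = narrowToInt(some_long_value);` would then compile without the implicit-conversion warning.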
…22342) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
This fixes an oversight in #164241.
…205477) `getAsCXXRecordDecl` returns nullptr for any dependent type. The issue was introduced by #192786; see llvm/llvm-project#192786 (comment) in the original PR.
Use m_c_ICmp so the load can be on either side of the icmp.
Without asserts, we see failures like so:
/repo/llvm/llvm/lib/Target/Hexagon/HexagonAsmPrinter.cpp:982:43: error:
unused variable 'NextI' [-Werror,-Wunused-variable]
982 | MachineBasicBlock::const_instr_iterator NextI =
std::next(MI.getIterator());
| ^~~~~
1 error generated.
Mark NextI `maybe_unused` to address the issue.
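
The failure mode above can be reproduced outside LLVM: a variable that is only read inside `assert` becomes unused once asserts are compiled out. A minimal sketch, assuming a hypothetical function `firstElement` that is unrelated to `HexagonAsmPrinter.cpp`:

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical example: Next is only read inside assert(). When NDEBUG is
// defined the assert expands to nothing, so without [[maybe_unused]] the
// compiler would report an unused variable (an error under -Werror).
inline int firstElement(const std::vector<int> &V) {
  assert(!V.empty() && "expected a non-empty vector");
  [[maybe_unused]] auto Next = std::next(V.begin());
  assert(Next == V.end() && "expected exactly one element");
  return V.front();
}
```

The attribute keeps the declaration in place for assert-enabled builds while suppressing the diagnostic in release builds.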
Fixes a regression introduced by f8aa5f6.
Implement `memset`, `memcpy`, `memmove` intrinsics and their corresponding inline version. Note that the `isvolatile` argument is ignored and left for future PRs.
This PR fixes two related DWARF constant-handling bugs that were blocking each other.

First, LLDB's DWARF expression evaluator in [`DWARFExpression.cpp`](https://github.com/llvm/llvm-project/blob/main/lldb/source/Expression/DWARFExpression.cpp) handled `DW_OP_constu` and `DW_OP_consts` without going through `to_generic`. Under DWARF, these operators push a generic value: an address-sized integral value with unspecified signedness. That means the result should be truncated to the target address size (via `to_generic`).

Second, LLVM already had a producer-side issue tracked as [#47431](llvm/llvm-project#47431): on 32-bit targets, LLVM could emit `DW_OP_consts` / `DW_OP_constu` for source integer constants wider than the target generic type. If LLDB were fixed alone, those producer-emitted constants would become truncated as DWARF requires, exposing incorrect debug info for wide source values.

This patch fixes both sides together.

## What Changed

On the LLDB consumer side:

- `DW_OP_constu` now uses `to_generic`.
- `DW_OP_consts` now uses `to_generic`.
- The corresponding LLDB DWARF expression tests were updated to expect address-sized generic values.

On the LLVM producer side:

- Wide integer debug-location constants that cannot be represented by the target generic type are emitted as `DW_OP_implicit_value` instead of `DW_OP_const*`.
- This preserves the source value bytes instead of relying on an address-sized DWARF generic constant.
- The producer-side change is limited to complete constant values, where there are no remaining `DIExpression` operations.
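
The interaction between the two sides can be illustrated with a small model. This is a sketch under stated assumptions: `truncateToAddressSize` and `fitsInGeneric` are hypothetical helpers, not LLDB's `to_generic` or LLVM's producer code, and the model treats values as unsigned only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of truncating a value to the target address size.
// AddrBytes is the address size in bytes (for example 4 on a 32-bit target).
inline uint64_t truncateToAddressSize(uint64_t V, unsigned AddrBytes) {
  assert(AddrBytes >= 1 && AddrBytes <= 8 && "unsupported address size");
  if (AddrBytes == 8)
    return V; // Nothing to truncate on a 64-bit target.
  const uint64_t Mask = (uint64_t{1} << (AddrBytes * 8)) - 1;
  return V & Mask;
}

// Hypothetical producer-side check: a constant that does not survive
// truncation cannot be carried faithfully by an address-sized generic value,
// so some other encoding of its bytes would be needed.
inline bool fitsInGeneric(uint64_t V, unsigned AddrBytes) {
  return truncateToAddressSize(V, AddrBytes) == V;
}
```

On a 4-byte target, `0x100000000` fails `fitsInGeneric`, which is the kind of wide constant the description says is now emitted as `DW_OP_implicit_value`.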

## Validation

Locally verified with:

```text
build/tools/lldb/unittests/Expression/ExpressionTests --gtest_filter='DWARFExpression.*'
74 tests passed

build/bin/llvm-lit -sv llvm/test/DebugInfo/X86/constant-loclist.ll
1 test passed

ninja -C build check-lldb -j12
No unexpected failures

ninja -C build check-all -j12
Completed with one unrelated local failure in Clang Tools :: clang-doc/DR-141990.cpp,
caused by host warning-option output.
No DebugInfo, DWARF, LLDB expression, or AsmPrinter-related failures were observed.
```
…hs (#205492) `performFusion()` and `fuseGuardedLoops()` carried two character-for-character identical tails: header-PHI migration plus latch rewiring, and the SCEV-forget / block-merge / latch-merge finalization. Extract them into `rewireFusedHeaderPHIsAndLatches()` and `finalizeFusedLoop()` and call both from each path.
It's a lightweight pass. Should always be the last SSA pass since peephole can end up making some instructions dead.
…#204056) Add the CBA offset delta to sh_size once at the end instead of after each write.
This patch makes the following changes:

- Refactor the internal sorting functions to reduce code duplication.
- Move the `qsort_r` testing machinery to a shared place.

These changes are made in anticipation of the introduction of Annex K's `qsort_s`. This function shares most of its semantics with `qsort_r`, so most of the testing logic can be shared between the two. In addition, `qsort`, `qsort_r` and `qsort_s` are all very similar, so we can attempt to reduce duplication a bit more.
The code which calculates the 'errsign' parameter to pass to `__compiler_rt_dunder` was wrong in two ways. It calculated the value with the wrong sign, and also in the wrong register, r12 rather than r2! In this code's original context, both of those things made sense (the 'dunder' function had a nonstandard ABI). Somehow none of the existing test cases detected the problem. We found this bug in a test case downstream that only failed big-endian (because that changes which half of the denominator mantissa is left in r2 to be accidentally used as errsign). However, the new test cases here are designed to detect the failure in both endiannesses.
This patch adds struct ip_mreq, ip_mreq_source, ip_mreqn, ip_opts, and ip_msfilter to <netinet/in.h>, along with IP level socket option macros (IP_TOS, IP_TTL, IP_ADD_MEMBERSHIP, etc.). I add basic unit tests verifying the size and member offsets of the new structures against standard layout expectations, mainly to make sure that the files are used /somewhere/. Assisted by Gemini.
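
The kind of size and offset check described above can be sketched as follows. This is not the patch's unit test: it assumes a POSIX-like system header that provides `ip_mreq` with the conventional `imr_multiaddr` and `imr_interface` members, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <netinet/in.h>

// Hypothetical helpers that expose the layout facts being checked.
inline std::size_t multiaddrOffset() {
  return offsetof(ip_mreq, imr_multiaddr);
}

inline std::size_t interfaceOffset() {
  return offsetof(ip_mreq, imr_interface);
}

// Assumption: ip_mreq is two in_addr members laid out back to back.
static_assert(sizeof(ip_mreq) == 2 * sizeof(in_addr),
              "ip_mreq is expected to hold exactly two in_addr members");
```

Checks of this form fail at compile time if a structure gains padding or its members are reordered.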
…AL (#205353)
Lowering is generating patterns when forwarding OPTIONAL in calls that
look like:
```
%present = fir.is_present %var : (T) -> i1
%if_result = fir.if %present -> (T) {
fir.result %var : T
} else {
%absent = fir.absent T
fir.result %absent : T
}
```
This specific pattern is a no-op and `%var` can be used directly. The
lowering logic that generates such patterns is inside non-trivial
compiler code that has to deal with more complex scenarios where the
code inside the fir.if is more complex. Add a FIR pattern to
canonicalize such code to help with later analysis (like aliasing).
llvm/llvm-project#203084 adds diagnostics about unused variables to the libc++ containers. This patch is the fallout from the projects I tried to build with it.
Match the `VarDecl::evaluateValue()` contract updated by #205033 in CIR constant emission.
…el#22434) TranslateOffloadTargetArgs had a broad exception for amdgcn OpenMP that passed all -m_Group args to the device DAL. This silently claimed flags like -mxnack/-mno-sramecc, suppressing the expected "unused argument" warning tested by amdgpu-xnack-sramecc-flags.c. The original intent was only to forward -mcpu to the device toolchain; narrow the condition accordingly. Fixes Driver/amdgpu-xnack-sramecc-flags.c CMPLRLLVM-76233 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…empty (intel#22432) BoundArch("") sets Arch=OffloadArch::Unknown (not Unused), so operator bool() returns true even for an empty arch name. This caused TranslateArgs to erase -march=<value> and re-add -march= (empty), producing "unsupported argument '' to option '-march='". Fix: use !BA.empty() (checks ArchName string) instead of if (BA) (checks Arch != Unused). Broken by 448b725. Fixes test Driver/sycl-native-cpu.cpp CMPLRLLVM-76405 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
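
The difference between the two checks can be shown with a simplified model. This is a sketch only: `BoundArchModel` and `ArchKind` are hypothetical types that mimic just the behavior described above (an empty name still yields a non-`Unused` arch) and are not clang's `BoundArch`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified arch kinds; the real enum has many more values.
enum class ArchKind { Unused, Unknown };

// Hypothetical model of the described behavior.
struct BoundArchModel {
  ArchKind Arch = ArchKind::Unused;
  std::string ArchName;

  BoundArchModel() = default;

  // Constructing from any string, even "", marks the arch as Unknown.
  explicit BoundArchModel(std::string Name)
      : Arch(ArchKind::Unknown), ArchName(std::move(Name)) {}

  // Checks the arch kind, not the name.
  explicit operator bool() const { return Arch != ArchKind::Unused; }

  // Checks the name string.
  bool empty() const { return ArchName.empty(); }
};

// The check described as broken: true even for an empty arch name.
inline bool shouldRewriteMarchBuggy(const BoundArchModel &BA) {
  return static_cast<bool>(BA);
}

// The check described as the fix: only true when a name is present.
inline bool shouldRewriteMarchFixed(const BoundArchModel &BA) {
  return !BA.empty();
}
```

In this model an object built from `""` passes the boolean check but fails the `empty()` based one, which matches the empty `-march=` symptom in the message.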
89905ff added nofree to the tests below:

- check_device_code/vector/bf16_builtins_new_vec.cpp
- check_device_code/vector/bf16_builtins_old_vec.cpp
- check_device_code/vector/convert_bfloat_new_vec.cpp
- check_device_code/vector/convert_bfloat_old_vec.cpp
- check_device_code/vector/math_ops_new_vec.cpp
- check_device_code/vector/math_ops_old_vec.cpp

CMPLRLLVM-76303
…ble (intel#22449) 437f95c disabled address space optimization for kernel args, adding range/id byval struct params to NativeCPU kernel signatures (4->6 params). ESIMD kernels capturing int args now retain them in the signature.

Fixes

- check_device_code/esimd/NBarrierAttr.cpp
- check_device_code/esimd/dae.cpp
- check_device_code/esimd/genx_func_attr.cpp
- check_device_code/esimd/slm_init_specconst_size.cpp
- check_device_code/native_cpu/kernelhandler-scalar.cpp
- check_device_code/native_cpu/native_cpu_subhandler.cpp

CMPLRLLVM-76303

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
CONFLICT (content): Merge conflict in sycl/test-e2e/KernelAndProgram/level-zero-static-link-flow.cpp CONFLICT (content): Merge conflict in sycl/test-e2e/Properties/cache_config.cpp
…up_size AMDGPU verifier (added in 6794e31) requires amdgpu-flat-work-group-size function attribute whenever reqd_work_group_size metadata is present. setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL uses SYCLReqdWorkGroupSizeAttr which went through CodeGenFunction.cpp to emit the metadata but never set the function attribute, triggering the verifier error on amdgcn targets. Fixes CodeGenSYCL/reqd-work-group-size.cpp CMPLRLLVM-76303 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Not upstreamable because SYCLReqdWorkGroupSizeAttr doesn't exist in llvm-project.
The AMDGPU verifier (added in 6794e31) requires the amdgpu-flat-work-group-size function attribute whenever reqd_work_group_size metadata is present. setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL uses SYCLReqdWorkGroupSizeAttr, which went through CodeGenFunction.cpp to emit the metadata but never set the function attribute, triggering the verifier error on amdgcn targets.
Fixes CodeGenSYCL/reqd-work-group-size.cpp and check-work-group-attributes-match.cpp.
CMPLRLLVM-76303
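
The relationship between the two pieces of information can be sketched as follows. This is an illustration, not the patch itself: it assumes the attribute value has the form `min,max` and that a required work-group size pins both bounds to the product of the three dimensions. `flatWorkGroupSizeAttr` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: derive the value of the amdgpu-flat-work-group-size
// attribute from a required work-group size (X, Y, Z).
// Assumption: the flat size is X * Y * Z, used as both minimum and maximum.
inline std::string flatWorkGroupSizeAttr(uint32_t X, uint32_t Y, uint32_t Z) {
  const uint64_t Flat = uint64_t{X} * Y * Z; // Widen first to avoid overflow.
  return std::to_string(Flat) + "," + std::to_string(Flat);
}
```

Under these assumptions, a kernel with a required size of 8 x 4 x 2 would carry the attribute value `64,64` alongside its reqd_work_group_size metadata.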