Merge upstream/llvm into amd-debug by mariusz-sikora-at-amd · Pull Request #2863 · ROCm/llvm-project

mariusz-sikora-at-amd · 2026-06-11T07:25:26Z

Merge with all Scott's upstream changes

Please run `git show --remerge-diff` on top of this PR.

-march is now rewritten to -mcpu.

…cases. (llvm#199158) This is a follow-up on Jean's comment llvm#198933 (comment) This patch makes use of the descriptor strides when `fir.array_coor`'s memref is a `fir.box` that is not a fir.embox result.

This patch enables pulling slicing `fir.rebox` operations into `fir.array_coor`. This helps preserve information about the original rank of the array being accessed. `FIRToMemRef` and later passes may benefit from this. Assisted by: Claude

…199137)

Problem: `hasNearbyPairedStore` uses `stripAndAccumulateInBoundsConstantOffsets` to decompose store pointers into (base, offset) pairs and check whether two stores are 16 bytes apart. This fails when LSR has rewritten pointer arithmetic into non-inbounds GEPs because the function refuses to look through them. The two stores then appear to have different base pointers and the check returns false.

When this happens, `lowerInterleavedStore` proceeds to emit `ST2` for a pattern that would be more profitable as `zip+stp`, since the load-store optimizer can pair adjacent stores into `STP` but cannot merge `ST2` with anything. On a bf16-to-fp32 NEON conversion loop this causes a regression from 11 to 17 instructions per iteration.

Note: Interleaved stores support was added for RISCV in llvm#115354. Turning this off produces the desired STP instructions. https://godbolt.org/z/1afsjPd3e

Fix: Switch to `stripAndAccumulateConstantOffsets` with `AllowNonInbounds=true`. The function is a bail-out heuristic doing pure address arithmetic, so the inbounds semantic guarantee is not needed for correctness.

---------

Co-authored-by: Kunal Pathak <kupathak@fb.com>
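The (base, offset) decomposition described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `Gep`, `strip_offsets` and `stores_are_16_bytes_apart` are hypothetical names, and the model ignores everything the real heuristic does besides walking constant offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gep:
    """Hypothetical model of a constant-offset GEP on top of another pointer."""
    base: object      # either another Gep or a string naming the root pointer
    offset: int       # constant byte offset added by this GEP
    inbounds: bool    # whether the GEP carries the inbounds flag


def strip_offsets(ptr, allow_non_inbounds):
    """Walk through GEPs, summing offsets, and return (root, total_offset).

    When allow_non_inbounds is False the walk stops at the first
    non-inbounds GEP, which is then reported as the "base" itself.
    """
    total = 0
    while isinstance(ptr, Gep):
        if not ptr.inbounds and not allow_non_inbounds:
            break
        total += ptr.offset
        ptr = ptr.base
    return ptr, total


def stores_are_16_bytes_apart(p, q, allow_non_inbounds):
    """Simplified stand-in for the paired-store distance check."""
    base_p, off_p = strip_offsets(p, allow_non_inbounds)
    base_q, off_q = strip_offsets(q, allow_non_inbounds)
    return base_p == base_q and abs(off_p - off_q) == 16


# Two stores 16 bytes apart, expressed with non-inbounds GEPs.
first = Gep("buf", 0, inbounds=False)
second = Gep("buf", 16, inbounds=False)
```

With `allow_non_inbounds=False` the two pointers look unrelated, because neither GEP is looked through; with `allow_non_inbounds=True` both reduce to `"buf"` and the 16-byte distance is found.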

The [WebAssembly Component Model](https://component-model.bytecodealliance.org/) has added support for [cooperative multithreading](WebAssembly/component-model#557). This has been implemented in the [Wasmtime engine](bytecodealliance/wasmtime#11751) and is part of the wider project of [WASI preview 3](https://wasi.dev/roadmap#upcoming-wasi-03-releases), which is currently tracked [here](https://github.com/orgs/bytecodealliance/projects/16). These changes require updating the way that `__stack_pointer` and `__tls_base` work purely for a new `wasm32-wasip3` target; other targets will not be touched. Specifically, rather than using a Wasm global for tracking the stack pointer and TLS base, the new [`context.get/set`](https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md#-canon-contextget) component model builtin functions will be used (the intention being that runtimes will need to aggressively optimize these calls into single load/stores). For justification on this choice rather than switching out the global at context-switch boundaries, see [this comment](WebAssembly/wasi-libc#691 (comment)) and [this comment](WebAssembly/wasi-libc#691 (comment)). This PR adds support for using library calls instead of globals for holding the stack pointer and TLS base. When used, this thread context ABI emits calls to `__wasm_{get,set}_{stack_pointer,tls_base}` when needed. These functions can then be implemented in `libc`. This is enabled only for the WASIp3 target. There is a temporary macro define for `__wasm_libcall_thread_context__` which can be removed once `wasi-libc` has fully migrated to the new ABI for the WASIp3 target.

In Python 3.0 and later it is no longer necessary to explicitly derive from `object` to opt into "new-style" classes, they are the default. Since the current minimum Python version is 3.8, this is no longer required. This patch removes `object` from the base class lists of all affected classes in lit.
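A small check of that claim, with two hypothetical class names chosen only for illustration: both spellings end up with `object` as their sole direct base in Python 3.

```python
class ExplicitBase(object):
    """Spelling with an explicit `object` base, as before the patch."""


class ImplicitBase:
    """Spelling without a base list, as after the patch."""


def bases_of(cls):
    """Return the direct base classes of cls as a tuple."""
    return cls.__bases__


if __name__ == "__main__":
    print(bases_of(ExplicitBase))
    print(bases_of(ImplicitBase))
```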

llvm#199786) This patch removes future statements from lit for features that are mandatory in Python 3.0 and later. Specifically, it removes future statements for [`absolute_import`](https://docs.python.org/3/library/__future__.html#future__.absolute_import) and [`print_function`](https://docs.python.org/3/library/__future__.html#future__.print_function), since both became mandatory in Python 3.0.
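The standard `__future__` module records when each feature became mandatory, so the statement above can be checked directly. The helper name `mandatory_since` is an illustrative choice, not part of lit.

```python
import __future__


def mandatory_since(feature_name):
    """Return the (major, minor) release in which a future feature became mandatory."""
    feature = getattr(__future__, feature_name)
    return feature.getMandatoryRelease()[:2]


if __name__ == "__main__":
    for name in ("absolute_import", "print_function"):
        print(name, mandatory_since(name))
```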

…2457) This patch adds basic support for partial alias masking, which allows entering the vector loop even when there is aliasing within a single vector iteration. It does this by clamping the VF to the safe distance between pointers. This allows the runtime VF to be anywhere from 2 to the "static" VF.

Conceptually, this transform looks like:

```
// `c` and `b` may alias.
for (int i = 0; i < n; i++) {
  c[i] = a[i] + b[i];
}
```

->

```
svbool_t alias_mask = loop.dependence.war.mask(b, c);
int num_active = num_active_lanes(alias_mask);
if (num_active >= 2) {
  for (int i = 0; i < n; i += num_active) {
    // ... vector loop masked with `alias_mask`
  }
}
// ... scalar tail
```

This initial patch has a number of limitations:

- The loop must be tail-folded
  * We intend to follow-up with full alias-masking support for loops without tail-folding
- The mask and transform is only valid for IC = 1
  * Some recipes may not handle the "ClampedVF" correctly at IC > 1
  * Note: On AArch64, we also only have native alias mask instructions for IC = 1
- Reverse iteration is not supported
  * The mask reversal logic is not correct for the alias mask (or clamped ALM)
- First order recurrences are not supported
  * The `splice.right` is not lowered correctly for clamped VFs
- Reductions are not supported
  * The final horizontal reduction needs to set lanes past the "ClampedVF" to the identity value
- This style of vectorization is not enabled by default/costed
  * It can be enabled with `-force-partial-aliasing-vectorization`
  * When enabled, alias masking is used instead of the standard diff checks (when legal to do so)

This PR supersedes llvm#100579 (closes llvm#100579).
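The clamping idea can be simulated in plain Python to see why a reduced VF stays correct. This is a rough model, not the vectorizer's implementation: it assumes the mask keeps every lane when the written pointer is at or behind the read pointer, and otherwise keeps only the lanes below the element distance between them. All function names and the flat `mem` list are hypothetical.

```python
def scalar_reference(mem, a, b_off, c_off, n):
    """Original loop: c[i] = a[i] + b[i], with b and c living in the same memory."""
    mem = list(mem)
    for i in range(n):
        mem[c_off + i] = a[i] + mem[b_off + i]
    return mem


def num_safe_lanes(b_off, c_off, static_vf):
    """Assumed mask model: lanes below the read-to-write distance are safe."""
    distance = c_off - b_off
    if distance <= 0:
        return static_vf
    return min(static_vf, distance)


def clamped_vector_loop(mem, a, b_off, c_off, n, static_vf):
    """Vector loop stepping by the clamped VF, falling back to scalar below 2 lanes."""
    mem = list(mem)
    num_active = num_safe_lanes(b_off, c_off, static_vf)
    if num_active < 2:
        return scalar_reference(mem, a, b_off, c_off, n)
    for i in range(0, n, num_active):
        lanes = range(i, min(i + num_active, n))      # tail folding
        loaded = [mem[b_off + k] for k in lanes]      # all loads happen first
        for k, value in zip(lanes, loaded):           # then all stores
            mem[c_off + k] = a[k] + value
    return mem
```

For example, with `c` three elements ahead of `b` and a static VF of 4, the loop runs with three active lanes and produces the same memory as the scalar loop.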

…e unique linkage names. (llvm#198667) Use normalized path from the macro prefix map to generate the unique ids for the internal linkage names. That allows a reproducible hash on any build system. Regularly the macro prefix map gets normalized in favor of the target system before the path substitution.

)

This allows us to keep GUIDs consistent across compilation phases which may change the name or linkage type. See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 This is a large change since the addition of metadata breaks many tests. The test changes are mostly just trivial changes to checks to get them passing.

Reviewers: Pull Request: llvm#199804

Fix the column `0` for the `<total>` row in llvm-mca's `Average Wait times` report. The `total` row now represents the total dynamic execution count used to normalize the averages, instead of the per-instruction iteration count. Update the timeline view docs and autogenerated test expectations accordingly. Co-authored-by: liuxiaodong <liuxiaodong@sunmmio.com>

) The comment in getOutputSectionName has always called the second-dot stripping "for MinGW" (e.g. .ctors.NNNN), but the code applied it on every target. This hides a split-dwarf bug, llvm#199616. Take an isMinGW gate and skip the stripping when it is false.
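A simplified model of the gated behaviour, written in Python for brevity. The function names are hypothetical and the real `getOutputSectionName` handles more cases than this; `.debug_info.dwo` is used purely as an illustrative two-dot name, since the source does not say which section triggered the bug.

```python
def strip_second_dot(name):
    """Drop everything from the second dot on: '.ctors.1234' -> '.ctors'."""
    second = name.find(".", 1)
    return name if second == -1 else name[:second]


def output_section_name(name, is_mingw):
    """Apply the second-dot stripping only when targeting MinGW."""
    return strip_second_dot(name) if is_mingw else name


if __name__ == "__main__":
    print(output_section_name(".ctors.1234", is_mingw=True))
    print(output_section_name(".debug_info.dwo", is_mingw=False))
```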

Move out `setHasProfileAvailable` into `markFunctionsWithProfile`. This also allows extracting per-pre-aggregated type handling in `parseAggregatedLBREntry` into a switch statement.

Test Plan: NFC

Processing time change (wall time):
* 10MB pre-aggregated profile:
  - Parsing aggregated branch events: 0.16s -> 0.05s
  - Pre-process profile data (parsing+marking): 0.18s -> 0.16s
* 6GB perf.data file:
  - Parsing branch events: 29.06s -> 28.55s
  - Pre-process profile data (excluding perf script): 29.47s -> 29.13s

Reviewers: rafaelauler, yota9, maksfb, ayermolo, yozhu, yavtuk, paschalis-mpeis

Pull Request: llvm#199320

…99797) Sink the lets Defs = [VXSAT] into the classes. This makes the encoding-based structure of this file more consistent.

…=0. NFC (llvm#199798) We had a let outside the class and inside.

Since SegInstSEW is only used by segment load/store, no need to keep it for other builtins.

I broke this test in llvm#199739. As a result of that change, the start of the CODE section in the linked WASM file shifted from 0x41 to 0x37 (a shift of -10 bytes). I was not aware that `wasm-ld` had tests outside of `lld/test/wasm`.

GCC released a new version, so we should bump the versions installed in the CI so we can upgrade.

They are like regular record ctors.

@jmorse

…vm#196462) As suggested by @jmorse and @efriedma-quic in llvm#196223. --------- Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>

Fixes llvm#177852. The reproducer has two `.cfi_startproc` directives separated by a `.popsection`. The first is never closed; the second is properly paired with `.cfi_endproc`. `MCStreamer::finish()` only inspects the last entry of `DwarfFrameInfos`, so the unfinished earlier frame slips through and crashes `finishImpl()` when it emits frame data with a null End label. Use `hasUnfinishedDwarfFrameInfo()` instead, which walks the full `FrameInfoStack` and catches every unfinished frame. --------- Co-authored-by: Fangrui Song <i@maskray.me>
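The difference between checking only the last frame and checking every frame can be shown with a toy model. `FrameInfo` and both helper functions are hypothetical Python stand-ins; in particular the model uses a single list, whereas the source distinguishes `DwarfFrameInfos` from `FrameInfoStack`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameInfo:
    """Toy frame record: `end` stays None until a .cfi_endproc closes the frame."""
    begin: str
    end: Optional[str] = None


def last_frame_unfinished(frames):
    """Old-style check: look only at the most recent frame."""
    return bool(frames) and frames[-1].end is None


def any_frame_unfinished(frames):
    """New-style check: walk all frames and report any that was never closed."""
    return any(frame.end is None for frame in frames)


# Reproducer shape: first .cfi_startproc never closed, second one properly paired.
frames = [FrameInfo("start1"), FrameInfo("start2", "end2")]
```

On this input the last-entry check reports nothing wrong, while the full walk catches the unclosed first frame.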

… sources (llvm#199604) PR llvm#179924 and llvm#179925 added optimized assembly implementations for ARM double-precision and single-precision FP comparisons (arm/cmpdf2.S, arm/gedf2.S, arm/unorddf2.S, arm/cmpsf2.S, arm/gesf2.S, arm/unordsf2.S) but only added SUPERSEDES annotations for the thumb1 variants. The arm variants were missing these annotations, causing both the generic and optimized implementations to be included in libclang_rt.builtins.a.

For double-precision, the archive contains:
- comparedf2.c.obj (pos 28): defines __unorddf2, __aeabi_dcmpun, ...
- divdc3.c.obj (pos 32): defines __divdc3; refs __aeabi_dcmpun
- unorddf2.S.obj (pos 126): defines __unorddf2, __aeabi_dcmpun
- aeabi_dcmp.S.obj (pos 158): defines __aeabi_dcmpeq; refs __eqdf2

When linking divdc3_test.c, the linker loads divdc3.c.obj which introduces __aeabi_dcmpun as undefined. BFD-like linkers (GNU ld, ELD) continue scanning forward and resolve __aeabi_dcmpun from unorddf2.S.obj (pos 126). Later, aeabi_dcmp.S.obj introduces __eqdf2 as undefined, which is resolved by comparedf2.c.obj (pos 28) on the next pass. Since both comparedf2.c.obj and unorddf2.S.obj define __unorddf2, the linker reports a duplicate symbol error.

lld does not encounter this because of the difference in the way it resolves symbol references. This causes comparedf2.c.obj (pos 28) to be selected first for __aeabi_dcmpun, making unorddf2.S.obj unnecessary.

The same pattern exists for single-precision where arm/comparesf2.S and arm/unordsf2.S both define __unordsf2 and __aeabi_fcmpun.

The fix adds SUPERSEDES annotations so that the generic implementations (comparedf2.c for double-precision and arm/comparesf2.S for single-precision) are removed from the source list when the optimized assembly replacements are present. The assembly files together provide all symbols that the generic implementations define.

The surrounding code was reviewed, and this PR was developed with the assistance of AI.

…#196906) Move the bit name list of BBAddrMap::Features and BBAddrMap::BBEntry::Metadata into a new BBAddrMap.def and derive the enum, bitfield, encode(), decode(), and operator== from it. Adding a new bit now only requires one line in the .def file. Also expose BBAddrMap::Features::KnownMask for future use.

…vm#198192) Simplify LitConfig initialization and setter to allow None values. TestingConfig.maxIndividualTestTime is initialized to 0 (or resolved to 0 if None) strictly during initialization.

This fixes an issue where the aggressive BOLT timeout of 60s (previously set globally on lit_config) was leaking and affecting libc++ tests. By moving the timeout configuration from the global lit_config to the individual test suite config, we ensure that timeouts are isolated and respect suite-local settings without leaking.

PR Stack:
* ➤ llvm#198192
* llvm#198193

Assisted-by: Gemini
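The leak can be pictured with two toy configuration objects. This is a sketch only: `GlobalLitConfig`, `SuiteConfig` and the snake_case attribute are hypothetical stand-ins and do not reproduce lit's real classes or setters.

```python
class GlobalLitConfig:
    """Hypothetical stand-in for the run-wide lit_config object."""

    def __init__(self):
        self.max_individual_test_time = None


class SuiteConfig:
    """Hypothetical stand-in for one test suite's configuration."""

    def __init__(self, name, max_individual_test_time=None):
        self.name = name
        # None is resolved to 0 (no timeout) once, at construction time.
        self.max_individual_test_time = max_individual_test_time or 0


def leaky_timeout(global_config, suite):
    """Old shape: every suite reads the single shared value."""
    return global_config.max_individual_test_time or 0


def isolated_timeout(global_config, suite):
    """New shape: each suite reads only its own value."""
    return suite.max_individual_test_time


def demo():
    shared = GlobalLitConfig()
    shared.max_individual_test_time = 60     # set on behalf of the BOLT suite
    bolt = SuiteConfig("bolt", 60)
    libcxx = SuiteConfig("libc++")
    return (
        leaky_timeout(shared, libcxx),       # libc++ inherits BOLT's 60s
        isolated_timeout(shared, bolt),      # BOLT keeps its own 60s
        isolated_timeout(shared, libcxx),    # libc++ is unaffected
    )
```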

Debug labels did not exist in LLVM 3.7 and have no equivalent.

…ardian argument (llvm#198695) A function parameter of type RefPtr<T>& should not be used as a guardian variable of a raw pointer/reference variable if the function body contains an assignment to it since such an assignment can shorten the lifetime of the guarded object.

…lvm#200091)

Patches reverted:

commit c315c66
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Date: Wed May 27 12:51:13 2026

[AMDGPU] Fix codesize estimate after llvm#198005 (llvm#200033)

This fixes a failure in libc tests which check the exact encoding size. The encoding is now shorter, but the estimate did not recognize fp16 immediates as inlinable constants and assumed a literal encoding. Shorter encodings were created here: llvm#198005

commit 2b3bc03
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Date: Wed May 27 10:55:36 2026

[AMDGPU] Use shorter form for i16 operands (llvm#198005)

For 16-bit operands an inline constant is zero extended, which in particular allows FP constants to be used. These will have 16 bits of zeroes in the high half and the FP16 value in the low 16 bits. The patch changes the semantics of the FP literal argument used in i16 context in the asm parser to fp16.

Apparently this breaks some libc tests with bf16. I do not know why, these were not supposed to be affected. Reverting for now.

Failed tests: https://lab.llvm.org/buildbot/#/builders/10/builds/29005

Accompanies llvm#200018

…lvm#199917) With a constant lane index, split the vector and recurse on the single-GPR half containing Idx (already Custom-lowered).

…lvm#200000) The `RISCVMoveMerger` pass was incorrectly forming `CM_MVSA01/QC_CM_MVSA01` when `Zdinx` was enabled. The pass attempted CM merge for copy pairs even when the first copy was not an `a0/a1-based` CM candidate. Fix by only running `findMatchingInst` when the current copy is a valid CM candidate.

…lvm#199963) ExpandAMDGPUPredicateBuiltIn synthesized an IntegerLiteral typed _Bool/bool, a shape no other producer creates, and one that StmtPrinter::VisitIntegerLiteral has no case for. -ast-print on the resulting if-condition hit llvm_unreachable.

Emit the canonical boolean literal instead:
- C++, C23, OpenCL, HIP: CXXBoolLiteralExpr 'bool'
- pre-C23 C: IntegerLiteral 'int'

In the C case this matches what <stdbool.h>'s true/false macros expand to.

Fixes llvm#199563

The commit added a dep from profile -> interception, so define that target too Fixes 5db1364

PR llvm#177665 added an unconditional `extern` reference to `__llvm_profile_hip_collect_device_data` from `InstrProfilingFile.c`, which forces `InstrProfilingPlatformROCm.o` (and its sanitizer_common / interception dependencies) out of `libclang_rt.profile.a` in every PGO binary. That breaks bots without `-lpthread` and races dlsym/PLT state in non-HIP programs via the interceptor constructor.

Fix:
- Declare the hook `COMPILER_RT_WEAK` and gate the call on its address. No `COMPILER_RT_VISIBILITY`: a hidden weak-undef function would be non-preemptible and the address test would fold to true.
- Gate `installHipModuleInterceptors` on `dlsym(hipModuleLoad)` so the constructor is a no-op if `ROCm.o` is still pulled in.

Fixes:
- https://lab.llvm.org/buildbot/#/builders/66/builds/31311
- https://lab.llvm.org/buildbot/#/builders/174/builds/36180

Verified:
- `check-profile` 134/134 pass.
- `nm` on a non-HIP `clang -fprofile-generate` binary: zero `installHip`/`ROCm`/`sanitizer`/`hip_collect` symbols.
- HIP offload PGO end-to-end on gfx1101 (compile → run → `llvm-profdata merge` → `llvm-cov`) still works; interceptor installs, device profile collected via shared API.

SplitDebugName checked -o and /o but not /Fo, so clang-cl /Fo<path> /c fell through to the cwd-relative fallback and every .dwo landed in cwd under <source-stem>.dwo regardless of the .obj location.

This adds those test cases while llvm#111561 gathers dust.

`loop-fusion` treats any loop-invariant scalar non-anti dependence as safe to fuse. In the linked issue, it incorrectly allows scalar flow dependences where the first loop writes a loop-invariant location and the second loop later reads that same location. Fusion interleaves the producer and consumer and this changes the value observed by the second loop.

Example C source would look like:

```C
for (int i = 0; i < N; i++) {
  ptr[0] = i;
}
for (int j = 0; j < N; j++) {
  out[j] = ptr[0];
}
=>
for (int i = 0; i < N; i++) {
  ptr[0] = i;
  out[i] = ptr[0];
}
```

This patch makes the DA scalar-dependence shortcut **_more conservative_** by rejecting scalar non-anti and allowing input/output dependences. This preserves the existing safe read and write cases while preventing the miscompile above.

The patch also updates the `loop-fusion` debug message to reflect the narrower accepted case, updates the existing regression to check the new debug message, and adds a new regression from the linked issue.

Fixes llvm#191238
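The change in observed values is easy to reproduce outside the compiler. The following Python transliteration of the C example (function names are illustrative) runs the two loops separately and then in fused form.

```python
def run_unfused(n):
    """Two separate loops: the second one only ever sees the final ptr[0]."""
    ptr = [0]
    out = [0] * n
    for i in range(n):
        ptr[0] = i
    for j in range(n):
        out[j] = ptr[0]
    return out


def run_fused(n):
    """Incorrectly fused loop: each read sees the value just written."""
    ptr = [0]
    out = [0] * n
    for i in range(n):
        ptr[0] = i
        out[i] = ptr[0]
    return out


if __name__ == "__main__":
    print(run_unfused(4))  # [3, 3, 3, 3]
    print(run_fused(4))    # [0, 1, 2, 3]
```

The unfused version fills `out` with the last value written to `ptr[0]`, while the fused version records every intermediate value, which is the miscompile described above.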

…vm#198872) Add `-fcoverage-mapping`, `-fno-coverage-mapping`, `-fcoverage-compilation-dir=`, `-ffile-compilation-dir=`, and `-fcoverage-prefix-map=` to the LinkerWrapper `CompilerOptions` forwarding list. Without this, passing `-fprofile-instr-generate -fcoverage-mapping` to clang for a HIP program silently omits the coverage mapping flags from the embedded device recompilation, so `__llvm_covmap`/`__llvm_covfun` sections are never emitted for device code.

rocm-cciapp · 2026-06-11T07:28:48Z

PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/6/builds/209

dstutt

Primarily just using the downstream version, right? Were the changes already cherry-picked?

Either way, the resolution looks ok to me.

mariusz-sikora-at-amd · 2026-06-11T07:51:09Z

> Primarily just using the downstream version, right?
>
> Either way, the resolution looks ok to me.

I used mostly upstream version.

> Were the changes already cherry-picked?

You are referring to #2636?

dstutt · 2026-06-11T08:17:23Z

> > Primarily just using the downstream version, right?
> > Either way, the resolution looks ok to me.
>
> I used mostly upstream version.
>
> > Were the changes already cherry-picked?
>
> You are referring to #2636?

Not specifically, it just looked that way.
Even if that isn't the case, the changes LGTM.

arsenm and others added 30 commits May 26, 2026 22:26

clang/AMDGPU: Remove unnecessary fallback to check -march (llvm#199780)

e9e5d4e

-march is now rewritten to -mcpu.

clang/AMDGPU: Report all runtimeless sanitizers as available (llvm#19…

f263446

…9642)

[flang][FIRToMemRef] Get strides from descriptor for some array_coor …

f3c0f26

…cases. (llvm#199158) This is a follow-up on Jean's comment llvm#198933 (comment) This patch makes use of the descriptor strides when `fir.array_coor`'s memref is a `fir.box` that is not a fir.embox result.

[libc] Use containers for overlay precommit CIs. (llvm#199294)

fc60e08

[lld][WebAssembly] Only include __stack_pointer when needed (llvm#199739

e0ef143

)

[SLP][NFC]Add another test for external phi user, NFC

d139f65

Reviewers: Pull Request: llvm#199804

[NFC][AMDGPU] Improve the predicate uses for WMMAs (llvm#199807)

73de4c7

[RISCV][P-ext] Add DefVXSAT argument to tablegen classes. NFC (llvm#1…

7259dd6

…99797) Sink the lets Defs = [VXSAT] into the classes. This makes the encoding-based structure of this file more consistent.

[RISCV][P-ext] Remove duplicate hasSideEffects=0, mayLoad=0, mayStore…

05e1af7

…=0. NFC (llvm#199798) We had a let outside the class and inside.

[RISCV][NFC] Remove SegInstSEW for unused function (llvm#199598)

0d5b752

Since SegInstSEW is only used by segment load/store, no need to keep it for other builtins.

[libc++] Update the GCC head version to 17 (llvm#199823)

5a616ce

GCC released a new version, so we should bump the versions installed in the CI so we can upgrade.

[clang][bytecode] Fix non-defaulted union copy/move ctors (llvm#199394)

22ba468

They are like regular record ctors.

[clang][diagnostics] Reject embedded NUL characters in inline asm (ll…

80490b8

…vm#196462) As suggested by @jmorse and @efriedma-quic in llvm#196223. --------- Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>

[lldb] Use private stop for breakpoint-delaying decision (llvm#199639)

18ddec7

hvdijk and others added 14 commits May 28, 2026 03:13

[DirectX] Drop debug labels (llvm#197490)

740e52b

Debug labels did not exist in LLVM 3.7 and have no equivalent.

[SLP] Precommit tests for runtime strided stores (llvm#200019)

0381a09

Accompanies llvm#200018

[RISCV][P-ext] Split v4i16/v8i8 INSERT/EXTRACT_VECTOR_ELT on RV32. (l…

5b17cdb

…lvm#199917) With a constant lane index, split the vector and recurse on the single-GPR half containing Idx (already Custom-lowered).

[Bazel] Fixes 5db1364 (llvm#200104)

2d5dac5

The commit added a dep from profile -> interception, so define that target too Fixes 5db1364

[Driver] Honor /Fo when deriving the split-dwarf .dwo path (llvm#199613)

d5e97d7

SplitDebugName checked -o and /o but not /Fo, so clang-cl /Fo<path> /c fell through to the cwd-relative fallback and every .dwo landed in cwd under <source-stem>.dwo regardless of the .obj location.

[clang] NFC: add test cases from llvm#111561 (llvm#200105)

a20e85f

This adds those test cases while llvm#111561 gathers dust.

Merge llvm/main into amd-debug

b719329

mariusz-sikora-at-amd requested review from ScottEgerton, dstutt, slinder1 and sstipano June 11, 2026 07:25

mariusz-sikora-at-amd requested review from Pierre-vh, krzysz00, kuhar, lamb-j, ritter-x2a and vangthao95 as code owners June 11, 2026 07:25

dstutt approved these changes Jun 11, 2026


mariusz-sikora-at-amd merged commit b719329 into amd-debug Jun 11, 2026
29 checks passed

mariusz-sikora-at-amd deleted the amd/dev/masikora/amd-debug-merge-candidate branch June 11, 2026 08:43

mariusz-sikora-at-amd merged 184 commits into amd-debug from amd/dev/masikora/amd-debug-merge-candidate


20 participants
