Merge upstream/llvm into amd-debug #2863
Merged
mariusz-sikora-at-amd merged 184 commits on Jun 11, 2026
-march is now rewritten to -mcpu.
…cases. (llvm#199158) This is a follow-up on Jean's comment llvm#198933 (comment) This patch makes use of the descriptor strides when `fir.array_coor`'s memref is a `fir.box` that is not a fir.embox result.
This patch enables pulling slicing `fir.rebox` operations into `fir.array_coor`. This helps preserve information about the original rank of the array being accessed. `FIRToMemRef` and later passes may benefit from this. Assisted by: Claude
…199137)
Problem: `hasNearbyPairedStore` uses `stripAndAccumulateInBoundsConstantOffsets` to decompose store pointers into (base, offset) pairs and check whether two stores are 16 bytes apart. This fails when LSR has rewritten pointer arithmetic into non-inbounds GEPs because the function refuses to look through them. The two stores then appear to have different base pointers and the check returns false.
When this happens, `lowerInterleavedStore` proceeds to emit `ST2` for a pattern that would be more profitable as `zip+stp`, since the load-store optimizer can pair adjacent stores into `STP` but cannot merge `ST2` with anything. On a bf16-to-fp32 NEON conversion loop this causes a regression from 11 to 17 instructions per iteration.
Note: Interleaved stores support was added for RISCV in llvm#115354. Turning this off produces the desired STP instructions. https://godbolt.org/z/1afsjPd3e
Fix: Switch to `stripAndAccumulateConstantOffsets` with `AllowNonInbounds=true`. The function is a bail-out heuristic doing pure address arithmetic, so the inbounds semantic guarantee is not needed for correctness.
---------
Co-authored-by: Kunal Pathak <kupathak@fb.com>
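The failure mode described above can be illustrated with a toy model. This is only a sketch, not LLVM code: `Gep`, `strip_offsets`, and the string-based "base" are hypothetical stand-ins for the real pointer-stripping helpers, and the pointer chain is assumed to consist solely of constant-offset GEPs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gep:
    """One constant-offset GEP step in a toy pointer chain (hypothetical model)."""
    offset: int     # constant byte offset added by this step
    inbounds: bool  # whether the step carries the `inbounds` flag


def strip_offsets(base: str, chain: List[Gep], allow_non_inbounds: bool) -> Tuple[str, int]:
    """Accumulate constant offsets, walking from the outermost GEP inward.

    `chain` is ordered innermost-first, so the last element is the outermost GEP.
    Without allow_non_inbounds, the walk stops at the first non-inbounds step and
    whatever remains becomes part of an opaque base.
    """
    total = 0
    remaining = list(chain)
    while remaining:
        step = remaining[-1]
        if not step.inbounds and not allow_non_inbounds:
            break
        total += step.offset
        remaining.pop()
    opaque_base = base + "".join(f"+gep({g.offset})" for g in remaining)
    return opaque_base, total


def has_nearby_paired_store(a, b, allow_non_inbounds: bool) -> bool:
    """True when both stores share a base and sit exactly 16 bytes apart."""
    base_a, off_a = strip_offsets(a[0], a[1], allow_non_inbounds)
    base_b, off_b = strip_offsets(b[0], b[1], allow_non_inbounds)
    return base_a == base_b and abs(off_a - off_b) == 16


# Two stores off the same pointer `p`, 16 bytes apart, reached via non-inbounds GEPs.
store_a = ("p", [Gep(32, inbounds=False)])
store_b = ("p", [Gep(48, inbounds=False)])

print(has_nearby_paired_store(store_a, store_b, allow_non_inbounds=False))
print(has_nearby_paired_store(store_a, store_b, allow_non_inbounds=True))
```

In this model the inbounds-only walk leaves two distinct opaque bases, so the pairing check fails, while the permissive walk recovers the shared base and the 16-byte distance.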
The [WebAssembly Component Model](https://component-model.bytecodealliance.org/) has added support for [cooperative multithreading](WebAssembly/component-model#557). This has been implemented in the [Wasmtime engine](bytecodealliance/wasmtime#11751) and is part of the wider project of [WASI preview 3](https://wasi.dev/roadmap#upcoming-wasi-03-releases), which is currently tracked [here](https://github.com/orgs/bytecodealliance/projects/16). These changes require updating the way that `__stack_pointer` and `__tls_base` work purely for a new `wasm32-wasip3` target; other targets will not be touched. Specifically, rather than using a Wasm global for tracking the stack pointer and TLS base, the new [`context.get/set`](https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md#-canon-contextget) component model builtin functions will be used (the intention being that runtimes will need to aggressively optimize these calls into single load/stores). For justification on this choice rather than switching out the global at context-switch boundaries, see [this comment](WebAssembly/wasi-libc#691 (comment)) and [this comment](WebAssembly/wasi-libc#691 (comment)). This PR adds support for using library calls instead of globals for holding the stack pointer and TLS base. When used, this thread context ABI emits calls to `__wasm_{get,set}_{stack_pointer,tls_base}` when needed. These functions can then be implemented in `libc`. This is enabled only for the WASIp3 target. There is a temporary macro define for `__wasm_libcall_thread_context__` which can be removed once `wasi-libc` has fully migrated to the new ABI for the WASIp3 target.
In Python 3.0 and later it is no longer necessary to explicitly derive from `object` to opt into "new-style" classes, they are the default. Since the current minimum Python version is 3.8, this is no longer required. This patch removes `object` from the base class lists of all affected classes in lit.
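As a quick illustration (the class names are made up for this sketch and are not taken from lit), both spellings produce the same kind of class under Python 3:

```python
class WithObject(object):  # old spelling, still accepted
    pass


class WithoutObject:  # equivalent spelling in Python 3
    pass


def is_new_style(cls) -> bool:
    """In Python 3 every class is an instance of `type` and derives from `object`."""
    return isinstance(cls, type) and object in cls.__mro__


print(is_new_style(WithObject), is_new_style(WithoutObject))
print(WithObject.__bases__ == WithoutObject.__bases__ == (object,))
```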
llvm#199786) This patch removes future statements from lit for features that are mandatory in Python 3.0 and later. Specifically, it removes future statements for [`absolute_import`](https://docs.python.org/3/library/__future__.html#future__.absolute_import) and [`print_function`](https://docs.python.org/3/library/__future__.html#future__.print_function), since both became mandatory in Python 3.0.
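One way to check that such future statements are redundant is to ask the standard `__future__` module when each feature became mandatory. The helper name `is_redundant` is hypothetical; `getMandatoryRelease()` is part of the standard library.

```python
import __future__
import sys


def is_redundant(feature_name: str) -> bool:
    """True when the running interpreter already enforces the feature."""
    feature = getattr(__future__, feature_name)
    mandatory = feature.getMandatoryRelease()  # 5-tuple shaped like sys.version_info
    return sys.version_info[:3] >= mandatory[:3]


for name in ("absolute_import", "print_function"):
    print(name, "redundant:", is_redundant(name))
```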
…2457) This patch adds basic support for partial alias masking, which allows entering the vector loop even when there is aliasing within a single vector iteration. It does this by clamping the VF to the safe distance between pointers. This allows the runtime VF to be anywhere from 2 to the "static" VF.
Conceptually, this transform looks like:
```
// `c` and `b` may alias.
for (int i = 0; i < n; i++) {
  c[i] = a[i] + b[i];
}
```
->
```
svbool_t alias_mask = loop.dependence.war.mask(b, c);
int num_active = num_active_lanes(alias_mask);
if (num_active >= 2) {
  for (int i = 0; i < n; i += num_active) {
    // ... vector loop masked with `alias_mask`
  }
}
// ... scalar tail
```
This initial patch has a number of limitations:
- The loop must be tail-folded
  * We intend to follow-up with full alias-masking support for loops without tail-folding
- The mask and transform is only valid for IC = 1
  * Some recipes may not handle the "ClampedVF" correctly at IC > 1
  * Note: On AArch64, we also only have native alias mask instructions for IC = 1
- Reverse iteration is not supported
  * The mask reversal logic is not correct for the alias mask (or clamped ALM)
- First order recurrences are not supported
  * The `splice.right` is not lowered correctly for clamped VFs
- Reductions are not supported
  * The final horizontal reduction needs to set lanes past the "ClampedVF" to the identity value
- This style of vectorization is not enabled by default/costed
  * It can be enabled with `-force-partial-aliasing-vectorization`
  * When enabled, alias masking is used instead of the standard diff checks (when legal to do so)
This PR supersedes llvm#100579 (closes llvm#100579).
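The clamping idea can be simulated in plain Python. This is a sketch under stated assumptions, not the vectorizer's implementation: `b` and `c` are modeled as element offsets into one shared list, a "vector step" is modeled as loading all lanes before storing any, and every function name is hypothetical.

```python
def scalar_reference(mem, a, b_off, c_off, n):
    """Plain scalar loop: c[i] = a[i] + b[i], with b and c as offsets into mem."""
    mem = list(mem)
    for i in range(n):
        mem[c_off + i] = a[i] + mem[b_off + i]
    return mem


def num_safe_lanes(b_off, c_off, vf):
    """Lanes that may run together when b is read and c is written.

    If the written pointer is at or behind the read pointer, all lanes are safe;
    otherwise only the first (c_off - b_off) lanes are.
    """
    dist = c_off - b_off
    return vf if dist <= 0 else min(vf, dist)


def vector_loop(mem, a, b_off, c_off, n, lanes):
    """Simulated vector loop: each step loads every active lane, then stores them."""
    mem = list(mem)
    for i in range(0, n, lanes):
        active = range(i, min(i + lanes, n))       # tail folding
        loaded = [mem[b_off + k] for k in active]  # vector load of b
        for k, b_val in zip(active, loaded):       # vector store to c
            mem[c_off + k] = a[k] + b_val
    return mem


def alias_masked(mem, a, b_off, c_off, n, vf):
    """Enter the vector loop with a clamped lane count when at least 2 lanes are safe."""
    lanes = num_safe_lanes(b_off, c_off, vf)
    if lanes >= 2:
        return vector_loop(mem, a, b_off, c_off, n, lanes)
    return scalar_reference(mem, a, b_off, c_off, n)


mem = list(range(16))
a = [1] * 8
expected = scalar_reference(mem, a, b_off=0, c_off=3, n=8)
print(alias_masked(mem, a, b_off=0, c_off=3, n=8, vf=4) == expected)
print(vector_loop(mem, a, b_off=0, c_off=3, n=8, lanes=4) == expected)
```

With `c` three elements ahead of `b` and a static VF of 4, the clamped loop runs three lanes per step and matches the scalar result, whereas the unclamped four-lane loop reads a stale value.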
…e unique linkage names. (llvm#198667) Use the normalized path from the macro prefix map to generate the unique IDs for internal linkage names. This allows a reproducible hash on any build system. Normally, the macro prefix map is normalized for the target system before the path substitution.
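A rough sketch of why hashing the remapped path is reproducible follows. The helper names, the example paths, and the use of MD5 as the digest are assumptions made for illustration; this does not reproduce clang's actual ID format.

```python
import hashlib


def apply_prefix_map(path, prefix_map):
    """Replace the first matching old prefix with its new prefix (hypothetical helper)."""
    for old, new in prefix_map:
        if path.startswith(old):
            return new + path[len(old):]
    return path


def unique_id(path, prefix_map):
    """Hash the remapped path so the ID does not depend on the build directory."""
    normalized = apply_prefix_map(path, prefix_map)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Same source file checked out under two different build roots.
id_alice = unique_id("/home/alice/src/lib/foo.cpp", [("/home/alice/src", ".")])
id_ci = unique_id("/build/ci/src/lib/foo.cpp", [("/build/ci/src", ".")])
print(id_alice == id_ci)
```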
This allows us to keep GUIDs consistent across compilation phases which may change the name or linkage type. See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 This is a large change since the addition of metadata breaks many tests. The test changes are mostly just trivial changes to checks to get them passing.
Reviewers: Pull Request: llvm#199804
Fix the column `0` for the `<total>` row in llvm-mca's `Average Wait times` report. The `total` row now represents the total dynamic execution count used to normalize the averages, instead of the per-instruction iteration count. Update the timeline view docs and autogenerated test expectations accordingly. Co-authored-by: liuxiaodong <liuxiaodong@sunmmio.com>
) The comment in getOutputSectionName has always called the second-dot stripping "for MinGW" (e.g. .ctors.NNNN), but the code applied it on every target. This hides a split-dwarf bug (llvm#199616). Take an isMinGW gate and skip the stripping when it is false.
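The gating can be sketched as follows. This models only the second-dot stripping and is not the linker's code; the function name is a Python stand-in, and `.debug_info.dwo` is an input chosen for illustration, since the commit message does not say which section names triggered the bug.

```python
def output_section_name(name: str, is_mingw: bool) -> str:
    """Strip everything from the second dot onward, but only for MinGW."""
    if is_mingw:
        second_dot = name.find(".", 1)  # skip the leading dot of the section name
        if second_dot != -1:
            return name[:second_dot]
    return name


print(output_section_name(".ctors.12345", is_mingw=True))
print(output_section_name(".debug_info.dwo", is_mingw=False))
```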
Move out `setHasProfileAvailable` into `markFunctionsWithProfile`. This also allows extracting per-pre-aggregated type handling in `parseAggregatedLBREntry` into a switch statement.
Test Plan: NFC
Processing time change (wall time):
* 10MB pre-aggregated profile:
  - Parsing aggregated branch events: 0.16s -> 0.05s
  - Pre-process profile data (parsing+marking): 0.18s -> 0.16s
* 6GB perf.data file:
  - Parsing branch events: 29.06s -> 28.55s
  - Pre-process profile data (excluding perf script): 29.47s -> 29.13s
Reviewers: rafaelauler, yota9, maksfb, ayermolo, yozhu, yavtuk, paschalis-mpeis
Pull Request: llvm#199320
…99797) Sink the `let Defs = [VXSAT]` into the classes. This makes the encoding-based structure of this file more consistent.
…=0. NFC (llvm#199798) We had a let outside the class and inside.
Since SegInstSEW is only used by segment load/store, no need to keep it for other builtins.
I broke this test in llvm#199739. As a result of that change, the start of the CODE section in the linked WASM file shifted from 0x41 to 0x37 (a shift of -10 bytes). I was not aware that `wasm-ld` had testing outside of `lld/test/wasm`.
GCC released a new version, so we should bump the versions installed in the CI so we can upgrade.
They are like regular record ctors.
…vm#196462) As suggested by @jmorse and @efriedma-quic in llvm#196223. --------- Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>
Fixes llvm#177852. The reproducer has two `.cfi_startproc` directives separated by a `.popsection`. The first is never closed; the second is properly paired with `.cfi_endproc`. `MCStreamer::finish()` only inspects the last entry of `DwarfFrameInfos`, so the unfinished earlier frame slips through and crashes `finishImpl()` when it emits frame data with a null End label. Use `hasUnfinishedDwarfFrameInfo()` instead, which walks the full `FrameInfoStack` and catches every unfinished frame. --------- Co-authored-by: Fangrui Song <i@maskray.me>
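The difference between the two checks can be shown with a small model. This is a simplification: the real code tracks frames in more than one container, whereas here a single list of hypothetical `FrameInfo` records stands in for both.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameInfo:
    begin: str
    end: Optional[str] = None  # None models a missing .cfi_endproc


def last_frame_unfinished(frames: List[FrameInfo]) -> bool:
    """Old-style check: looks only at the most recent frame."""
    return bool(frames) and frames[-1].end is None


def any_frame_unfinished(frames: List[FrameInfo]) -> bool:
    """New-style check: walks every frame."""
    return any(frame.end is None for frame in frames)


# First .cfi_startproc is never closed; the second is properly paired.
frames = [FrameInfo("start1"), FrameInfo("start2", "end2")]
print(last_frame_unfinished(frames))
print(any_frame_unfinished(frames))
```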
… sources (llvm#199604)
PR llvm#179924 and llvm#179925 added optimized assembly implementations for ARM double-precision and single-precision FP comparisons (arm/cmpdf2.S, arm/gedf2.S, arm/unorddf2.S, arm/cmpsf2.S, arm/gesf2.S, arm/unordsf2.S) but only added SUPERSEDES annotations for the thumb1 variants. The arm variants were missing these annotations, causing both the generic and optimized implementations to be included in libclang_rt.builtins.a.
For double-precision, the archive contains:
- comparedf2.c.obj (pos 28): defines __unorddf2, __aeabi_dcmpun, ...
- divdc3.c.obj (pos 32): defines __divdc3; refs __aeabi_dcmpun
- unorddf2.S.obj (pos 126): defines __unorddf2, __aeabi_dcmpun
- aeabi_dcmp.S.obj (pos 158): defines __aeabi_dcmpeq; refs __eqdf2
When linking divdc3_test.c, the linker loads divdc3.c.obj which introduces __aeabi_dcmpun as undefined. BFD-like linkers (GNU ld, ELD) continue scanning forward and resolve __aeabi_dcmpun from unorddf2.S.obj (pos 126). Later, aeabi_dcmp.S.obj introduces __eqdf2 as undefined, which is resolved by comparedf2.c.obj (pos 28) on the next pass. Since both comparedf2.c.obj and unorddf2.S.obj define __unorddf2, the linker reports a duplicate symbol error.
lld does not encounter this because of the difference in the way it resolves symbol references. This causes comparedf2.c.obj (pos 28) to be selected first for __aeabi_dcmpun, making unorddf2.S.obj unnecessary.
The same pattern exists for single-precision where arm/comparesf2.S and arm/unordsf2.S both define __unordsf2 and __aeabi_fcmpun.
The fix adds SUPERSEDES annotations so that the generic implementations (comparedf2.c for double-precision and arm/comparesf2.S for single-precision) are removed from the source list when the optimized assembly replacements are present. The assembly files together provide all symbols that the generic implementations define.
The surrounding code was reviewed, and this PR was developed with the assistance of AI.
…#196906) Move the bit name list of BBAddrMap::Features and BBAddrMap::BBEntry::Metadata into a new BBAddrMap.def and derive the enum, bitfield, encode(), decode(), and operator== from it. Adding a new bit now only requires one line in the .def file. Also expose BBAddrMap::Features::KnownMask for future use.
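The single-source-of-truth pattern can be sketched in Python as an analogue of a `.def` file. The bit names below are hypothetical placeholders rather than the real `BBAddrMap::Features` bits, and rejecting unknown bits in `decode` is an assumption about how a known-bits mask might be used.

```python
# Hypothetical bit names; adding a feature means adding one entry here.
FEATURE_BITS = ("FeatureA", "FeatureB", "FeatureC")

# Mask of all bits this version knows about, derived from the same list.
KNOWN_MASK = (1 << len(FEATURE_BITS)) - 1


def encode(features: dict) -> int:
    """Pack the named flags into an integer, bit i for the i-th name."""
    return sum(1 << i for i, name in enumerate(FEATURE_BITS) if features.get(name, False))


def decode(value: int) -> dict:
    """Unpack an integer into named flags, rejecting bits outside KNOWN_MASK."""
    if value & ~KNOWN_MASK:
        raise ValueError(f"unknown feature bits set: {value & ~KNOWN_MASK:#x}")
    return {name: bool((value >> i) & 1) for i, name in enumerate(FEATURE_BITS)}


packed = encode({"FeatureA": True, "FeatureC": True})
print(packed, decode(packed))
```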
…vm#198192) Simplify LitConfig initialization and setter to allow None values. TestingConfig.maxIndividualTestTime is initialized to 0 (or resolved to 0 if None) strictly during initialization.
This fixes an issue where the aggressive BOLT timeout of 60s (previously set globally on lit_config) was leaking and affecting libc++ tests. By moving the timeout configuration from the global lit_config to the individual test suite config, we ensure that timeouts are isolated and respect suite-local settings without leaking.
PR Stack:
* ➤ llvm#198192
* llvm#198193
Assisted-by: Gemini
Debug labels did not exist in LLVM 3.7 and have no equivalent.
…ardian argument (llvm#198695) A function parameter of type RefPtr<T>& should not be used as a guardian variable of a raw pointer/reference variable if the function body contains an assignment to it since such an assignment can shorten the lifetime of the guarded object.
…lvm#200091)
Patches reverted:
commit c315c66
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Date: Wed May 27 12:51:13 2026
[AMDGPU] Fix codesize estimate after llvm#198005 (llvm#200033)
This fixes failure in libc tests which checks the exact encoding size. Encoding is now shorter, but it did not recognize fp16 immediates as an inlinable constant and assumes literal encoding. Shorter encodings were created here: llvm#198005
commit 2b3bc03
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Date: Wed May 27 10:55:36 2026
[AMDGPU] Use shorter form for i16 operands (llvm#198005)
For 16-bit operands an inline constant is zero extended which in particular allows to use FP constants. These will have 16 bits of zeroes in the high half and FP16 value in the low 16 bits. The patch changes semantics of the FP literal argument used in i16 context in the asm parser to fp16.
Apparently this breaks some libc tests with bf16. I do not know why, these were not supposed to be affected. Reverting for now.
Failed tests: https://lab.llvm.org/buildbot/#/builders/10/builds/29005
…lvm#199917) With a constant lane index, split the vector and recurse on the single-GPR half containing Idx (already Custom-lowered).
…lvm#200000) The `RISCVMoveMerger` pass was incorrectly forming `CM_MVSA01/QC_CM_MVSA01` when `Zdinx` was enabled. The pass attempted CM merge for copy pairs even when the first copy was not an `a0/a1-based` CM candidate. Fix by only running `findMatchingInst` when the current copy is a valid CM candidate.
…lvm#199963) ExpandAMDGPUPredicateBuiltIn synthesized an IntegerLiteral typed _Bool/bool, a shape no other producer creates and one that StmtPrinter::VisitIntegerLiteral has no case for. -ast-print on the resulting if-condition hit llvm_unreachable.
Emit the canonical boolean literal instead:
- C++, C23, OpenCL, HIP: CXXBoolLiteralExpr 'bool'
- pre-C23 C: IntegerLiteral 'int'
In the C case this matches what <stdbool.h>'s true/false macros expand to.
Fixes llvm#199563
The commit added a dep from profile -> interception, so define that target too.
Fixes 5db1364
PR llvm#177665 added an unconditional `extern` reference to `__llvm_profile_hip_collect_device_data` from `InstrProfilingFile.c`, which forces `InstrProfilingPlatformROCm.o` (and its sanitizer_common / interception dependencies) out of `libclang_rt.profile.a` in every PGO binary. That breaks bots without `-lpthread` and races dlsym/PLT state in non-HIP programs via the interceptor constructor.
Fix:
- Declare the hook `COMPILER_RT_WEAK` and gate the call on its address. No `COMPILER_RT_VISIBILITY`: a hidden weak-undef function would be non-preemptible and the address test would fold to true.
- Gate `installHipModuleInterceptors` on `dlsym(hipModuleLoad)` so the constructor is a no-op if `ROCm.o` is still pulled in.
Fixes:
- https://lab.llvm.org/buildbot/#/builders/66/builds/31311
- https://lab.llvm.org/buildbot/#/builders/174/builds/36180
Verified:
- `check-profile` 134/134 pass.
- `nm` on a non-HIP `clang -fprofile-generate` binary: zero `installHip`/`ROCm`/`sanitizer`/`hip_collect` symbols.
- HIP offload PGO end-to-end on gfx1101 (compile → run → `llvm-profdata merge` → `llvm-cov`) still works; interceptor installs, device profile collected via shared API.
SplitDebugName checked -o and /o but not /Fo, so clang-cl /Fo<path> /c fell through to the cwd-relative fallback and every .dwo landed in cwd under <source-stem>.dwo regardless of the .obj location.
This adds those test cases while llvm#111561 gathers dust.
`loop-fusion` treats any loop-invariant scalar non-anti dependence as
safe to fuse. In the linked issue, it incorrectly allows scalar flow
dependences where the first loop writes a loop-invariant location and
the second loop later reads that same location. Fusion interleaves the
producer and consumer and this changes the value observed by the second
loop.
Example C source would look like:
```C
// Before fusion:
for (int i = 0; i < N; i++) {
  ptr[0] = i;
}
for (int j = 0; j < N; j++) {
  out[j] = ptr[0];
}
// => after (incorrect) fusion:
for (int i = 0; i < N; i++) {
  ptr[0] = i;
  out[i] = ptr[0];
}
```
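To make the changed value concrete, here is a small Python simulation of the two C loop nests above. It is only an illustration of the observable difference; the function names are hypothetical and `ptr` is modeled as a one-element list.

```python
def unfused(n):
    """Two separate loops: the second loop sees only the final value of ptr[0]."""
    ptr = [0]
    out = [0] * n
    for i in range(n):
        ptr[0] = i
    for j in range(n):
        out[j] = ptr[0]
    return out


def fused(n):
    """Fused loop: each read of ptr[0] sees the value written in the same iteration."""
    ptr = [0]
    out = [0] * n
    for i in range(n):
        ptr[0] = i
        out[i] = ptr[0]
    return out


print(unfused(4))  # [3, 3, 3, 3]
print(fused(4))    # [0, 1, 2, 3]
```

For any `N` of at least 2 the two versions disagree, which is why the flow dependence on the loop-invariant location must block fusion.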
This patch makes the DA scalar-dependence shortcut **_more conservative_** by rejecting scalar flow dependences and accepting only input and output dependences. This preserves the existing safe read-read and write-write cases while preventing the miscompile above.
The patch also updates the `loop-fusion` debug message to reflect the
narrower accepted case, updates the existing regression to check the new
debug message, and adds a new regression from the linked issue.
Fixes llvm#191238
…vm#198872) Add `-fcoverage-mapping`, `-fno-coverage-mapping`, `-fcoverage-compilation-dir=`, `-ffile-compilation-dir=`, and `-fcoverage-prefix-map=` to the LinkerWrapper `CompilerOptions` forwarding list. Without this, passing `-fprofile-instr-generate -fcoverage-mapping` to clang for a HIP program silently omits the coverage mapping flags from the embedded device recompilation, so `__llvm_covmap`/`__llvm_covfun` sections are never emitted for device code.
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/6/builds/209
dstutt
approved these changes
Jun 11, 2026
dstutt
left a comment
Primarily just using the downstream version, right? Were the changes already cherry-picked?
Either way, the resolution looks ok to me.
Author
I mostly used the upstream version.
Are you referring to #2636?
Not specifically, it just looked that way.
Merge with all Scott's upstream changes
Please run `git show --remerge-diff` on top of this PR