[pull] main from llvm:main by pull[bot] · Pull Request #5793 · Ericsson/llvm-project

pull · 2026-06-26T13:14:30Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

…ant (#205870) A shuffle mask can select from the second operand even when that operand is poison. This caused unshuffleConstant to assert while trying to map those mask elements into the first operand's constant vector. Fix this by ignoring mask elements that select the poison operand. Fixes #205769
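
The fix can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the InstCombine code. `unshuffleConstantSketch` and the `Lane` alias are hypothetical, and poison lanes are modelled as `std::nullopt`. The mask indexes the concatenation of both operands, so values `>= NumSrcElts` select the second (poison) operand. Those elements are skipped rather than mapped onto the first operand's lanes.

```c++
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: a constant lane is either a value or poison (nullopt).
using Lane = std::optional<int>;

// Simplified sketch of "unshuffling" a constant through
// shuffle(X, poison, Mask): rebuild a constant for X such that shuffling it
// with Mask reproduces C. Mask values index the concatenation of the two
// NumSrcElts-wide operands; -1 marks an undefined lane.
std::optional<std::vector<Lane>>
unshuffleConstantSketch(const std::vector<int> &Mask,
                        const std::vector<Lane> &C, int NumSrcElts) {
  std::vector<Lane> Src(NumSrcElts); // every lane starts out as poison
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    int M = Mask[I];
    // Lanes taken from the poison second operand (M >= NumSrcElts) say
    // nothing about X, so skip them instead of indexing X's lanes with M.
    if (M < 0 || M >= NumSrcElts)
      continue;
    if (!C[I])
      continue; // poison result lane: no constraint on Src[M]
    if (Src[M] && *Src[M] != *C[I])
      return std::nullopt; // two result lanes need different values
    Src[M] = C[I];
  }
  return Src;
}
```

For example, with mask `<0, 5, 1, 7>` over 4-wide operands, lanes 1 and 3 of the result come from the poison operand and are simply ignored.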

Many of these are disabled as they do not yet lower successfully.

Follow up from comments on #202886. Make HWEvent a bitmask by default instead of having both the enum and a separate HWEventSet. This has the advantage of streamlining the code a bit and opening the possibility of adding "modifiers" to events; e.g. I imagine we could now fold "VMemType" into the Events. We already do this with things like SMEM_GROUP. At least now it's baked into the design. I opted for a bit more verbosity by taking inspiration from FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a class with helper functions. The downside is having to reimplement all the little bitwise ops, but the result is a cleaner, simpler interface than a raw enum (class) with many helper functions. I initially tried that, but I recoiled at the sight of things like `contains(A, B)`, which isn't very clear, while `A.contains(B)` is self-explanatory. Since HWEvent is a bitmask, I also implemented a simple iterator over all set bits of the mask, which is useful because some APIs in InsertWaitCnt rely on handling one event at a time.
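
A minimal sketch of the FMF-style design described above, assuming a 32-bit mask. `EventMask` and its event names are hypothetical placeholders, not the real HWEvent definitions. The sketch shows `A.contains(B)` as a member function and an iterator that yields one single-bit mask per set bit.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical FMF-style wrapper around an event bitmask. The event names
// and bit positions are illustrative, not the real AMDGPU definitions.
class EventMask {
public:
  enum Event : uint32_t {
    VMemReadAccess = 1u << 0,
    VMemWriteAccess = 1u << 1,
    SMemAccess = 1u << 2,
  };

  constexpr EventMask() = default;
  constexpr EventMask(uint32_t B) : Bits(B) {} // implicit, so Event converts

  constexpr uint32_t bits() const { return Bits; }
  constexpr bool empty() const { return Bits == 0; }
  // True if every event in Other is also set in *this.
  constexpr bool contains(EventMask Other) const {
    return (Bits & Other.Bits) == Other.Bits;
  }
  constexpr EventMask operator|(EventMask O) const { return EventMask(Bits | O.Bits); }
  constexpr EventMask operator&(EventMask O) const { return EventMask(Bits & O.Bits); }
  EventMask &operator|=(EventMask O) { Bits |= O.Bits; return *this; }

  // Iterates over the set bits, yielding one single-event mask each.
  class iterator {
  public:
    explicit constexpr iterator(uint32_t R) : Remaining(R) {}
    // Isolate the lowest set bit.
    EventMask operator*() const { return EventMask(Remaining & (~Remaining + 1u)); }
    // Clear the lowest set bit.
    iterator &operator++() { Remaining &= Remaining - 1u; return *this; }
    bool operator!=(const iterator &O) const { return Remaining != O.Remaining; }

  private:
    uint32_t Remaining;
  };
  iterator begin() const { return iterator(Bits); }
  iterator end() const { return iterator(0); }

private:
  uint32_t Bits = 0;
};
```

Wrapping the enum costs a few hand-written operators, but call sites read as `E.contains(X)` and `for (EventMask Single : E)`.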

…e header (#204544) I forgot to move those out of the way as they were not grouped with the others. Now `getEventsFor` does all the work.

Instead of having an HWEvent that can be either a read or a write depending on the target, keep the events as straightforward as possible and let InsertWaitCnt interpret them. Rename VMEM_ACCESS to VMEM_READ_ACCESS, and set VMEM_WRITE_ACCESS and similar events even if the target does not have a VSCnt. I think this conceptually makes more sense. It separates concerns better: HWEvents models events objectively, and InsertWaitCnt handles them as needed for the task it is trying to achieve (inserting wait instructions). My end goal with this series of changes is to untangle InsertWaitCnt so we can divide it into layers, each of which worries about its own thing. This is only possible with proper separation of concerns.

…terleaved access analysis (#205793) During interleaved access analysis, certain addresses require a no-wrap predicate to form an add recurrence and obtain the stride. However, when optimizing for size, generating SCEV runtime checks is disallowed. This patch changes the constant stride collection so that, when optimizing for size, it only collects strides that do not require predicates. This ensures that vectorization will not be blocked by disallowed predicates.

Remove the MLA commuted patterns added in #198566 and canonicalise those operations in instcombine instead.

…205815) Deduce the dst type of the new instructions that perform the load lowering from the destination type of the original load instead of from the MMO. Makes a difference with extendedLLTs.

…ge (#205816) In widenScalarMergeValues, WideTy is an input given by the target. Use the same LLT kind for the other types of different sizes instead of LLT::scalar. Makes a difference with extendedLLTs.

Add support for the DXContainer PRIV part in the ObjectYAML pipeline so it can be represented in structured YAML and round-tripped through yaml2obj/obj2yaml. The PRIV part can store arbitrary user-provided binary blobs in a DXContainer. Unlike other DXContainer parts, the size of the PRIV part does not have to be 4-byte aligned. Therefore, if it is present, it is always the last section in a DXContainer. llvm-objcopy is already able to extract the PRIV section. A test to verify extraction of a binary from PRIV is added.

…5848) There is still one test remaining: LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll but this looks more like a phase-ordering test and should probably be handled separately.

… constructor (#199480) When the user invokes **Go to Definition** on a call like `std::make_unique<T>(args...)` or `std::make_shared<T>(args...)`, surface the constructor of `T` that is actually invoked inside the wrapper, alongside the wrapper itself. The constructor is added before the wrapper so LSP clients that auto-jump to the first target land on it; clients that present a menu still let the user reach the wrapper. This is the forward-direction counterpart to the find-references work in #169742 (clangd/clangd#716): the same `isLikelyForwardingFunction` + `searchConstructorsInForwardingFunction` machinery, applied to `locateASTReferent`.

Remove the unsafe `getType` method from ReshapeOp. It unconditionally casts the result to `MemRefType`, but `memref.reshape` may return an `UnrankedMemRefType`, leading to an assertion failure. The redundant build method is also removed alongside this change. Fixes #203812.

Removes the need for gfx11_dasm_vop3_from_vop2_hi.txt sitting downstream. Catches a problem with printing op_sel for the tied operands in v_fmac_f16_e64.

…#206014) This feature was removed in a56993a. The test used to have a pair testing the enabled and disabled case, and there's no point leaving the enabled partner.

This code assumes that writing to an unbuffered raw_fd_ostream from multiple threads is somehow safe. From what I can see, raw_fd_ostream doesn't make any guarantees about this. The current raw_fd_ostream implementation also uses a looping write call to write the content in chunks, and doing this from multiple threads leads to interleaved log messages. This patch makes us unconditionally acquire the stream lock.
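
The effect of always taking the lock can be sketched as follows. This is an illustration under assumptions, not the patched code. `LockedLog` is hypothetical, and a `std::ostringstream` written in 4-byte chunks stands in for an unbuffered `raw_fd_ostream` whose write call loops over chunks. Because the mutex is held for the whole message, the chunks of one message cannot interleave with those of another thread.

```c++
#include <cassert>
#include <cstddef>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical log sink that holds a lock for the whole message.
class LockedLog {
public:
  void log(const std::string &Msg) {
    std::lock_guard<std::mutex> Guard(Lock); // acquired unconditionally
    // Write in small chunks to mimic a looping write(); the lock keeps the
    // chunks of one message contiguous in the output.
    for (std::size_t I = 0; I < Msg.size(); I += 4)
      Out << Msg.substr(I, 4);
    Out << '\n';
  }

  std::string contents() {
    std::lock_guard<std::mutex> Guard(Lock);
    return Out.str();
  }

private:
  std::mutex Lock;
  std::ostringstream Out;
};
```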

Neither SALU nor VALU supports direct conversion from f16 to/from i32. Previously, this conversion was still legal and handled by instruction selection patterns, forming the chains f16 -> f32 -> i32 and i32 -> f32 -> f16 for the two cases, respectively. This change marks the conversion illegal and creates the same chains as the patterns during (operation) legalization. This has the added benefit that a combination of FNEG and FPTOSI/UI can now merge the float negation into the source modifier of the f16-to-f32 conversion, as demonstrated by the GlobalISel tests. This fixes #177342. --------- Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

…ons (#202680) This is a follow-up to #187487. `v_cvt_pk_*` is used for vector cases, as well as for scalar types (by passing a dummy second input) on GFX11+. Relevant fallback patterns have also been added and `splitUnaryVectorOp` has been extended to handle trailing scalar ops if present. Assisted-by: Claude Code

The change to CGCall is required to avoid collisions of operator|=.

…FC (#205932) Delimited directives are those that come in begin/end pairs, e.g. "begin declare target"/"end declare target". Other block-associated directives in Fortran do have end-forms, but they don't need to have specific directive enums. Some such enums have been used in the past, but are not anymore. Delete those extraneous definitions to clean up the OMP.td file.

This removes the older overload of CheckAllowedClause(clauseId). After 0f1abfe that function was no longer doing anything.

…declarations). (#203615) The PR adds support for [DebugFunctionDeclaration](https://github.khronos.org/SPIRV-Registry/nonsemantic/NonSemantic.Shader.DebugInfo.html#DebugFunctionDeclaration).

…03355) This will allow some of the types in src/stdio/printf_core/ to be templated on character type for the implementation of `swprintf`.
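
To illustrate the idea, here is a minimal sketch of a character-generic view. `sketch::basic_string_view`, the `wstring_view` alias and `countChar` are hypothetical and much smaller than LLVM libc's `cpp::basic_string_view`. Keeping `string_view` as an alias for the `char` instantiation means existing narrow-string callers need not change, while helpers templated on `CharT` could then serve wide-character code such as a `swprintf` implementation.

```c++
#include <cassert>
#include <cstddef>

namespace sketch {
// Hypothetical minimal character-generic string view.
template <typename CharT> class basic_string_view {
public:
  constexpr basic_string_view() = default;
  constexpr basic_string_view(const CharT *Ptr, std::size_t N)
      : Data(Ptr), Len(N) {}
  // Length is found by scanning for the terminating CharT(0).
  constexpr basic_string_view(const CharT *Str) : Data(Str), Len(length(Str)) {}

  constexpr std::size_t size() const { return Len; }
  constexpr const CharT &operator[](std::size_t I) const { return Data[I]; }

private:
  static constexpr std::size_t length(const CharT *S) {
    std::size_t N = 0;
    while (S[N] != CharT(0))
      ++N;
    return N;
  }

  const CharT *Data = nullptr;
  std::size_t Len = 0;
};

// The narrow alias keeps existing string_view users unchanged.
using string_view = basic_string_view<char>;
using wstring_view = basic_string_view<wchar_t>; // illustrative only
} // namespace sketch

// A hypothetical helper written once, templated on the character type.
template <typename CharT>
std::size_t countChar(sketch::basic_string_view<CharT> S, CharT C) {
  std::size_t N = 0;
  for (std::size_t I = 0; I < S.size(); ++I)
    if (S[I] == C)
      ++N;
  return N;
}
```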

Lower CLMUL v4i32 by splitting it into two v2i32 operations and concatenating the results when AES is available. This avoids the much larger generic expansion and lets v4i16 benefit via legalization through v4i32. Update the cost model: v4i32 is costed as the 11-instruction PMULL sequence, and v4i16 as that sequence plus the required input widens and result narrow.
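
The split relies on CLMUL being a lane-wise operation, so a 4-lane operation can be computed as two 2-lane halves and then concatenated. The sketch below models only those semantics in portable C++, assuming each lane keeps the low 32 bits of the carry-less product. It does not model PMULL, AES, or the cost numbers, and all function names are hypothetical.

```c++
#include <array>
#include <cassert>
#include <cstdint>

// Carry-less multiply of two 32-bit lanes, keeping the low 32 bits: partial
// products are combined with XOR instead of addition, so no carries occur.
uint32_t clmul32(uint32_t A, uint32_t B) {
  uint32_t R = 0;
  for (int I = 0; I < 32; ++I)
    if (B & (1u << I))
      R ^= A << I;
  return R;
}

using V2 = std::array<uint32_t, 2>;
using V4 = std::array<uint32_t, 4>;

// Reference: lane-wise CLMUL on a 4-lane vector.
V4 clmulV4(const V4 &A, const V4 &B) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = clmul32(A[I], B[I]);
  return R;
}

// Two-lane CLMUL (stands in for one v2i32 operation).
V2 clmulV2(const V2 &A, const V2 &B) {
  return {clmul32(A[0], B[0]), clmul32(A[1], B[1])};
}

// Split form: low and high halves as two 2-lane CLMULs, then concatenate.
V4 clmulV4BySplit(const V4 &A, const V4 &B) {
  V2 Lo = clmulV2({A[0], A[1]}, {B[0], B[1]});
  V2 Hi = clmulV2({A[2], A[3]}, {B[2], B[3]});
  return {Lo[0], Lo[1], Hi[0], Hi[1]};
}
```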

This fixes a7eaec7. Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>

Add an `isDynamicShaped` helper to detect shaped types without a static shape (e.g., tensor<?xf32>). Return failure in 10 ops that use `createFloatConst` (which assumes static shapes for DenseElementsAttr): sinh, cosh, tanh, asinh, acosh, atanh, exp2, round, roundeven, ctlz. Refactor the existing guards in ceil and rsqrt to use the shared helper for consistency. Add lit test coverage for each op with both ?-shaped and unranked tensors, verifying the ops are preserved unchanged. Fixes #203753. --- Changes since last review: Change `isDynamicShaped` to `isUnrankedShaped` to detect unranked shapes (e.g., `tensor<*xf32>`). Modify `createConstFloat` and `createConstInt` to use `tensor.dim` and `tensor.splat` to handle dynamically-shaped ranked tensors. Modify lit tests for dynamically-shaped ranked tensors, verifying the ops are expanded using `tensor.dim` and `tensor.splat`.

LLDB on Windows requires GNU tools like `dirname`, which are not installed by default. They are bundled with Git for Windows, which also depends on them, but they are not in PATH. This patch adds those utilities to the PATH to fix build failures of the lldb test targets.

…can (#205587) The compressed offload bundle (CCOB) readers located the boundary between concatenated bundles by scanning for the literal 4-byte magic "CCOB". Those bytes can appear by chance inside a compressed payload, so a single valid bundle could be truncated at the spurious magic and then fail to decompress with "Src size is incorrect". At runtime this surfaced as hipErrorInvalidImage ("device kernel image is invalid") when loading affected HIP code objects. Use the authoritative FileSize recorded in the compressed bundle header (CompressedBundleHeader::tryParse, present for V2/V3) to delimit the current bundle, and search for the next bundle's "CCOB" magic only past that point. This keeps multi-bundle iteration working (and tolerant of inter-bundle padding) while ignoring magic-byte collisions inside the payload. Bundles without a recorded size (legacy V1) fall back to the previous magic scan.
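
A sketch of the delimiting logic follows. It uses a simplified header that is an assumption for illustration, not the real CCOB layout or the `CompressedBundleHeader` API: the version sits at bytes 4-5 and, for version 2 and later, the total bundle size sits at bytes 8-11. `splitBundles` and `readLE` are hypothetical. Sized bundles end where their recorded size says, and the next magic is searched for only from that point. Unsized legacy bundles fall back to the magic scan.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified header layout (NOT the real CCOB format):
//   bytes 0-3   magic "CCOB"
//   bytes 4-5   version, little-endian
//   bytes 6-7   compression method (ignored here)
//   bytes 8-11  total bundle size including the header (version >= 2 only)
constexpr std::string_view Magic = "CCOB";
constexpr std::size_t SizedHeaderLen = 12;

// Reads a little-endian unsigned integer of Width bytes at Off.
static uint32_t readLE(std::string_view Buf, std::size_t Off, std::size_t Width) {
  uint32_t V = 0;
  for (std::size_t I = 0; I < Width; ++I)
    V |= uint32_t(static_cast<unsigned char>(Buf[Off + I])) << (8 * I);
  return V;
}

// Splits concatenated bundles into views of each bundle.
std::vector<std::string_view> splitBundles(std::string_view Buf) {
  std::vector<std::string_view> Bundles;
  std::size_t Pos = Buf.find(Magic);
  while (Pos != std::string_view::npos) {
    std::size_t End;
    bool Sized = Buf.size() - Pos >= SizedHeaderLen && readLE(Buf, Pos + 4, 2) >= 2;
    if (Sized) {
      // The recorded size delimits the bundle, so magic bytes that happen
      // to occur inside the compressed payload are never considered.
      uint32_t Size = readLE(Buf, Pos + 8, 4);
      if (Size < SizedHeaderLen)
        break; // malformed size: stop rather than loop forever
      End = std::min(Buf.size(), Pos + Size);
    } else {
      // Legacy: no recorded size, fall back to scanning for the next magic.
      std::size_t Next = Buf.find(Magic, Pos + Magic.size());
      End = Next == std::string_view::npos ? Buf.size() : Next;
    }
    Bundles.push_back(Buf.substr(Pos, End - Pos));
    Pos = Buf.find(Magic, End); // also skips padding between bundles
  }
  return Bundles;
}
```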

```
/usr/bin/ld: tools/flang/unittests/Frontend/CMakeFiles/FlangFrontendTests.dir/CompilerInstanceTest.cpp.o: undefined reference to symbol '_ZN5clang17getDriverOptTableEv'
/usr/bin/ld: b/x86/lib/libclangOptions.so.23.0git: error adding symbols: DSO missing from command line
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```

…206030)

This replaces the REG_SEQUENCE we use for concat_vector with INSERT_SUBREG, which, whilst not perfect, can produce slightly better code overall and helps us avoid REG_SEQUENCE instructions.

This change removes `NodeBuilder`s from the functions connected to `defaultEvalCall` that were previously passing around `NodeBuilder` arguments instead of the more usual `ExplodedNodeSet &Dst` out-parameters. Although these `NodeBuilder`s "travelled through" many functions, their usage pattern was relatively simple and their back-and-forth set manipulation didn't provide any advantage over a plain exploded node set. In addition to the removal of the `NodeBuilder`s, this commit performs minor simplifications in the affected code and renames the old method `BifurcateCall` to the more specific `dynDispatchBifurcate` (because the old name was too vague now that we also have `ctuBifurcate`).

This patch introduces several tests in `llvm/test/FileCheck/dump-input/search-range-annotations` to demonstrate use cases that PR #198138 improves.

These are not performance-critical, and operator< in particular is expensive to compile due to the std::tie template instantiation.
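
For illustration, a minimal sketch of moving `std::tie`-based comparators out of line. `TargetTriple` is a hypothetical stand-in, not `llvm::Triple`. In a real project the declarations would stay in the header and the definitions would move to a single `.cpp` file, so only that translation unit pays for instantiating the tuple comparison templates.

```c++
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical Triple-like class: the header would contain only these
// declarations, keeping <tuple> comparisons out of every includer.
struct TargetTriple {
  std::string Arch, Vendor, OS;

  bool operator==(const TargetTriple &Other) const;
  bool operator<(const TargetTriple &Other) const;
};

// Out-of-line definitions (conceptually in TargetTriple.cpp).
bool TargetTriple::operator==(const TargetTriple &Other) const {
  return std::tie(Arch, Vendor, OS) ==
         std::tie(Other.Arch, Other.Vendor, Other.OS);
}

bool TargetTriple::operator<(const TargetTriple &Other) const {
  // Lexicographic: compare Arch first, then Vendor, then OS.
  return std::tie(Arch, Vendor, OS) <
         std::tie(Other.Arch, Other.Vendor, Other.OS);
}
```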

This change removes the remaining `NodeBuilder`s from `ExprEngineObjC.cpp` as a step of my project to gradually replace the class `NodeBuilder` with more straightforward tools. The `NodeBuilder`s that I remove were all used in very simple patterns, especially the two `NodeBuilder`s in `VisitObjCMessage`, where the `Frontier` set is thrown away and the code continues with the return value of `generateNode` (which is exactly the same as the return value of the appropriate `makeNode` call). In addition to the removal of the `NodeBuilder`s, this change also includes two additional NFC improvements:

1. `populateObjCForDestinationSet` was previously a static helper function with nine (!) awkward arguments; this change turns it into a method of `ExprEngine` with only five arguments (and the first three arguments are customary: the expression, `Pred`, `Dst`, as in the transfer functions). This change was necessary for the `NodeBuilder` removal: without this cleanup I would have had to add a tenth argument.
2. In `VisitObjCMessage` I remove two complex and pointless assertions. The ancestor of these was a simple and useful

   ```c++
   Pred = Bldr.generateNode(currStmt, Pred, notNilState);
   assert(Pred && "Should have cached out already!");
   ```

   introduced in 2012 by 5481cfe in a checker, but since then it was relocated to the engine, duplicated and weakened to `assert(Pred || HasTag)`. As `HasTag` can easily be true and is independent of everything else in this function, the current assertions (which are removed in this change) don't provide any helpful invariant; they just state that "`Pred` is not null ... except when it is null".

`MAX_PATH` is defined as 260. `PosixApi.h` already defines `PATH_MAX` as `32,768` characters which is the max path limit for Unicode paths on Windows. Use this in `lldb-dap` on Windows to avoid path truncation.

`MAX_PATH` is defined as `260` on Windows. Unicode paths however can be up to `32,767` characters long. Use `llvm::sys::windows::widenPath` to convert paths to unicode paths to allow targets that have long path names. This is the first part of a series of multiple commits that will fix long path support in lldb on Windows as well as adding regression tests. rdar://180711797

tbaederr and others added 30 commits June 26, 2026 09:16

[clang][bytecode] Fix division by zero in CXXNewExpr handling (#205800)

9d6e0dd

[ARM] Add basic bf16 instructions tests. NFC (#206003)

a42540b

[AMDGPU][InsertWaitCnts] Move TENSOR/ASYNC event detection to separat…

938ee65

[MLIR][XeGPU] Enable isa<> check for uarch (#204577)

bda6db4

[AArch64][SVE] add missing MLA commute instcombine (#205526)

a0248a2

GlobalISel/LegalizerHelper: Use type of input load dst for LowerLoad (#…

e1cbf0f

GlobalISel/LegalizerHelper: Use same LLT kind as WideTy for widen mer…

9e9de1f

[LV][NFC] Remove instcombine pass from RUN lines in target tests (#20…

10755f4

[AMDGPU][NFC] Roundtrip gfx11_asm_vop3_from_vop2.s (#205825)

2436174

AMDGPU: Remove leftover test for old promote-alloca subtarget feature (…

cdbc5ca

AMDGPU: Remove unnecessary target-cpu attributes from tests (#206015)

cb27ba9

[CMake][CodeGen] Add PCH for Clang CodeGen (#206018)

63ccf78

[flang][OpenMP] Delete no longer needed CheckAllowedClause (#205936)

c7d1932

[SPIRV] Emit NonSemantic DebugFunctionDeclaration for DISubprograms (…

74cc00f

[libc] Change cpp::string_view into cpp::basic_string_view<CharT> (#2…

a7eaec7

[Bazel] Fixes a7eaec7 (#206035)

933aa3e

michaelselehov and others added 10 commits June 26, 2026 13:42

[AMDGPU] Use hasVCvtPkIU16F32 instead of a generation check (NFC) (#…

3e75ca6

[ARM] Replace Neon concat patterns with insert_subregs. (#205505)

2028801

[FileCheck] Extract new tests for braced search ranges here (#199063)

685fe09

[TargetParser][NFC] Move Triple comparators out of line (#206032)

ba35346

[lldb-dap][Windows] use Unicode path limit (#206046)

0ee0a1e

pull Bot locked and limited conversation to collaborators Jun 26, 2026

pull Bot added the ⤵️ pull label Jun 26, 2026

pull Bot merged commit d9c24ae into Ericsson:main Jun 26, 2026
9 checks passed

pull[bot] merged 40 commits into Ericsson:main from llvm:main


20 participants
