[pull] main from llvm:main#5793

Merged
pull[bot] merged 40 commits into
Ericsson:mainfrom
llvm:main
Jun 26, 2026

@pull pull Bot commented Jun 26, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

tbaederr and others added 30 commits June 26, 2026 09:16
…ant (#205870)

A shuffle mask can select from the second operand even when that operand
is poison. This caused unshuffleConstant to assert while trying to map
those mask elements into the first operand's constant vector.

Fix this by ignoring mask elements that select the poison operand.

Fixes #205769
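
A toy model of the fix above, assuming the usual shuffle-mask convention (indices in `[0, N)` select the first operand, indices in `[N, 2N)` select the second, negative means a poison lane). `Lane` and `unshuffleToy` are hypothetical names for illustration; this is not the InstCombine code, only a sketch of "skip the mask elements that select the poison operand" while rebuilding the first operand's constant:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One lane of the reconstructed constant; unknown lanes stand in for poison.
struct Lane {
  bool Known = false;
  int Value = 0;
};

// Tries to find NewC such that shuffling NewC (with a poison second operand)
// by Mask yields C. Returns false if two result lanes disagree about the
// same source lane.
bool unshuffleToy(const std::vector<int> &Mask, const std::vector<int> &C,
                  std::size_t NumSrcLanes, std::vector<Lane> &NewC) {
  assert(Mask.size() == C.size() && "mask and constant must match in length");
  NewC.assign(NumSrcLanes, Lane());
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    int M = Mask[I];
    if (M < 0)
      continue; // Poison mask element: nothing to map.
    if (static_cast<std::size_t>(M) >= NumSrcLanes)
      continue; // Selects the poison second operand: ignore it.
    Lane &L = NewC[static_cast<std::size_t>(M)];
    if (L.Known && L.Value != C[I])
      return false; // Conflicting values for one source lane.
    L.Known = true;
    L.Value = C[I];
  }
  return true;
}
```

Without the second `continue`, an index such as `2` with a two-lane source would fall outside `NewC`, which is the situation the original assertion tripped on.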
Many of these are disabled as they do not yet lower successfully.
Follow up from comments on
#202886

Make HWEvent a bitmask by default instead of having both the enum, and a
separate HWEventSet. This has the advantage of streamlining the code a
bit and opening the possibility of adding "modifiers" to events, e.g. I
imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked
into the design.

I opted for a bit more verbosity by taking inspiration from
FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a
class w/ helper function. The downside is having to reimplement all the
little bitwise ops, but the result is a cleaner, simpler interface than
a raw enum (class) w/ many helper functions. I initially tried that but
I recoiled at the sight of things like `contains(A, B)` which isn't very
clear, while `A.contains(B)` is self explanatory.

Considering HWEvent is a bitmask, I also implemented a simple iterator
to iterate over all set bits of the mask, which is a useful thing to
have as some APIs in InsertWaitCnt rely on treating one event at a time.
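
For readers unfamiliar with the pattern, here is a minimal sketch of a bitmask wrapped in a class with member helpers and a set-bit iterator. The class name `EventMask`, the event bits, and `countEvents` are hypothetical and are not the actual HWEvent definitions; the sketch only illustrates the `A.contains(B)` style and one-event-at-a-time iteration described above:

```cpp
#include <cassert>
#include <cstdint>

class EventMask {
public:
  // Hypothetical event bits, chosen only for the example.
  enum Bit : uint32_t {
    None = 0,
    VMemRead = 1u << 0,
    VMemWrite = 1u << 1,
    SMem = 1u << 2,
  };

  constexpr EventMask() = default;
  constexpr EventMask(Bit B) : Bits(B) {}

  // Member helpers read naturally at the call site: A.contains(B).
  constexpr bool contains(EventMask Other) const {
    return (Bits & Other.Bits) == Other.Bits;
  }
  constexpr bool empty() const { return Bits == 0; }

  // The bitwise operators have to be spelled out by hand.
  constexpr EventMask operator|(EventMask RHS) const {
    return EventMask(Bits | RHS.Bits);
  }
  EventMask &operator|=(EventMask RHS) {
    Bits |= RHS.Bits;
    return *this;
  }
  constexpr bool operator==(EventMask RHS) const { return Bits == RHS.Bits; }

  // Forward iterator that yields one single-bit mask per set bit.
  class iterator {
  public:
    explicit constexpr iterator(uint32_t Remaining) : Remaining(Remaining) {}
    EventMask operator*() const {
      return EventMask(Remaining & (~Remaining + 1)); // Lowest set bit.
    }
    iterator &operator++() {
      Remaining &= Remaining - 1; // Clear the lowest set bit.
      return *this;
    }
    bool operator!=(const iterator &RHS) const {
      return Remaining != RHS.Remaining;
    }

  private:
    uint32_t Remaining;
  };
  iterator begin() const { return iterator(Bits); }
  iterator end() const { return iterator(0); }

private:
  explicit constexpr EventMask(uint32_t Raw) : Bits(Raw) {}
  uint32_t Bits = 0;
};

// Visits each set event individually and counts them.
unsigned countEvents(EventMask M) {
  unsigned N = 0;
  for (EventMask E : M) {
    (void)E;
    ++N;
  }
  return N;
}
```

A mask is built with `EventMask(EventMask::VMemRead) | EventMask::SMem`; a range-based `for` over it then visits the two single-event masks in ascending bit order.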
…e header (#204544)

I forgot to move those out of the way as they were not grouped with the
others.
Now `getEventsFor` does all the work.
Instead of having an HWEvent that can be either a read or a write
depending on the target, keep the events as straightforward as
possible and let InsertWaitCnt interpret it. Rename VMEM_ACCESS
to VMEM_READ_ACCESS and set VMEM_WRITE_ACCESS & similar events
even if the target does not have a VSCnt.

I think this conceptually makes more sense.
This separates concerns better so that HWEvents models events
objectively, and InsertWaitCnt handles them as necessary for the task
it is trying to achieve (insert wait instructions).

My end goal with this series of changes is to de-tangle InsertWaitCnt so
we can divide it into layers, and each layer worries about its own thing.  
This is only possible with proper separation of concerns.
…terleaved access analysis (#205793)

During interleaved access analysis, certain addresses require a no-wrap
predicate to form an add recurrence and obtain the stride. However, when
optimizing for size, generating SCEV runtime checks is disallowed.

This patch modifies the constant stride collection when optimizing for
size to only collect strides that do not require predicates. This
ensures that vectorization will not be blocked by disallowed predicates.
Remove the MLA commuted patterns added in #198566 and canonicalise
those operations in instcombine instead.
…205815)

Deduce the destination type of the new instructions that perform the load
lowering from the destination type of the original load instead of from
the MMO. This makes a difference with extendedLLTs.
…ge (#205816)

In widenScalarMergeValues, WideTy is an input provided by the target. Use
the same LLT kind for the other types of different sizes instead of
LLT::scalar. This makes a difference with extendedLLTs.
Add support for DXContainer PRIV in the ObjectYAML pipeline so it can be
represented in structured YAML and round-tripped through
yaml2obj/obj2yaml.

The PRIV part can store arbitrary user-provided binary blobs in a
DXContainer. Unlike other DXContainer parts, the size of the PRIV part
does not have to be 4-byte aligned. Therefore, if it is present, it is
always the last section in a DXContainer.

llvm-objcopy is already able to extract the PRIV section. A test that
verifies extraction of the binary from PRIV is added.
…5848)

There is still one test remaining:

  LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll

but this looks more like a phase-ordering test and should probably be
handled separately.
… constructor (#199480)

When the user invokes **Go to Definition** on a call like
`std::make_unique<T>(args...)` or `std::make_shared<T>(args...)`,
surface the constructor of `T` that is actually invoked inside the
wrapper, alongside the wrapper itself. The constructor is added before
the wrapper so LSP clients that auto-jump to the first target land on
it; clients that present a menu still let the user reach the wrapper.

This is the forward-direction counterpart to the find-references work in
#169742 (clangd/clangd#716): the same `isLikelyForwardingFunction` +
`searchConstructorsInForwardingFunction` machinery, applied to
`locateASTReferent`.
Remove the unsafe `getType` method from ReshapeOp. It unconditionally
casts the result to `MemRefType`, but `memref.reshape` may return an
`UnrankedMemRefType`, leading to an assertion failure. The redundant
build method is also removed alongside this change. Fixes #203812.
Removes the need for gfx11_dasm_vop3_from_vop2_hi.txt sitting
downstream.

Catches a problem with printing op_sel for the tied operands in
v_fmac_f16_e64.
…#206014)

This feature was removed in a56993a.
The test used to have a pair testing the enabled and disabled case,
and there's no point leaving the enabled partner.
This code assumes that writing to an unbuffered raw_fd_ostream from
multiple threads is somehow safe. raw_fd_ostream doesn't make any
guarantees about this from what I can see.

The current raw_fd_ostream implementation also uses a looping write call
to write the content in chunks, and doing this from multiple threads
leads to interleaving log messages.

This patch makes us acquire the stream lock unconditionally.
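
A minimal sketch of the idea, assuming a plain `std::ostream` and a `std::mutex` as stand-ins for the stream and its lock. `LockedLog` and `logFromThreads` are hypothetical names; this is not the patched code. The chunked loop imitates a looping write call, which is what lets unlocked writers interleave their messages:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

class LockedLog {
public:
  explicit LockedLog(std::ostream &OS) : OS(OS) {}

  void write(const std::string &Msg) {
    // Always take the lock, whether or not the stream is buffered.
    std::lock_guard<std::mutex> Guard(Mutex);
    // Emit the message in small chunks, like a looping write call would.
    for (std::size_t I = 0; I < Msg.size(); I += 4)
      OS << Msg.substr(I, 4);
    OS << '\n';
  }

private:
  std::ostream &OS;
  std::mutex Mutex;
};

// Logs one message per thread and returns the resulting lines.
std::vector<std::string> logFromThreads(unsigned NumThreads) {
  std::ostringstream Buffer;
  LockedLog Log(Buffer);
  std::vector<std::thread> Threads;
  for (unsigned T = 0; T < NumThreads; ++T)
    Threads.emplace_back([&Log, T] {
      Log.write("message from thread " + std::to_string(T));
    });
  for (std::thread &Th : Threads)
    Th.join();

  std::vector<std::string> Lines;
  std::istringstream In(Buffer.str());
  for (std::string Line; std::getline(In, Line);)
    Lines.push_back(Line);
  return Lines;
}
```

Because the whole chunk loop runs under the lock, every returned line is one complete message, although the order of the lines still depends on thread scheduling.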
Neither SALU nor VALU support direct conversion from f16 to/from i32.

Previously, this was still legal and handled by instruction selection
patterns, forming chains f16 -> f32 -> i32 and i32 -> f32 -> f16 for the
two cases, respectively.

This change marks the conversion illegal and creates the same chains as
the pattern during (operation) legalization.

This has the added benefit that a combination of FNEG and FPTOSI/UI can
now merge the float negation into the source modifier of the f16-to-f32
conversion, as demonstrated by the GlobalISel tests.

This fixes #177342.

---------

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
…ons (#202680)

This is a follow-up to #187487. `v_cvt_pk_*` is used for vector cases, as well as for scalar types (by passing a dummy second input) on GFX11+. Relevant fallback patterns have also been added and `splitUnaryVectorOp` has been extended to handle trailing scalar ops if present.

Assisted-by: Claude Code
The change to CGCall is required to avoid collisions of operator|=.
…FC (#205932)

Delimited directives are those that come in begin/end pairs, e.g. "begin
declare target"/"end declare target". Other block-associated directives
in Fortran do have end-forms, but they don't need to have specific
directive enums. Some such enums have been used in the past, but are not
anymore. Delete those extraneous definitions to clean up the OMP.td
file.
This removes the older overload of CheckAllowedClause(clauseId). After
0f1abfe that function was no longer doing anything.
…03355)

This will allow some of the types in src/stdio/printf_core/ to be
templated on character type for the implementation of `swprintf`.
Lower CLMUL v4i32 by splitting it into two v2i32 operations and
concatenating the results when AES is available. This avoids the much
larger generic expansion and lets v4i16 benefit via legalization through
v4i32.

Update the cost model: v4i32 is costed as the 11-instruction PMULL
sequence, and v4i16 as that sequence plus the required input widens and
result narrow.
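
The split-and-concatenate shape of this lowering can be modelled in scalar code. The sketch below assumes that the carry-less multiply keeps the low 32 bits of each lane product; `clmul32`, `clmulV2`, and `clmulV4BySplitting` are hypothetical helpers, and nothing here reflects the actual PMULL instruction sequence or its cost:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Carry-less (XOR-based) multiply of two 32-bit values, low 32 bits only.
uint32_t clmul32(uint32_t A, uint32_t B) {
  uint32_t Result = 0;
  for (unsigned Bit = 0; Bit < 32; ++Bit)
    if ((B >> Bit) & 1u)
      Result ^= A << Bit; // XOR replaces the add of an ordinary multiply.
  return Result;
}

using V2 = std::array<uint32_t, 2>;
using V4 = std::array<uint32_t, 4>;

// Lane-wise carry-less multiply on two lanes.
V2 clmulV2(const V2 &A, const V2 &B) {
  V2 R = {clmul32(A[0], B[0]), clmul32(A[1], B[1])};
  return R;
}

// Four-lane operation expressed as two two-lane operations plus a concat.
V4 clmulV4BySplitting(const V4 &A, const V4 &B) {
  V2 ALo = {A[0], A[1]}, AHi = {A[2], A[3]};
  V2 BLo = {B[0], B[1]}, BHi = {B[2], B[3]};
  V2 Lo = clmulV2(ALo, BLo);
  V2 Hi = clmulV2(AHi, BHi);
  V4 R = {Lo[0], Lo[1], Hi[0], Hi[1]};
  return R;
}
```

For example, `clmul32(3, 3)` is `3 ^ (3 << 1)`, which is 5 rather than 9, because no carries propagate between bit positions.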
This fixes a7eaec7.

Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
Add `isDynamicShaped` helper to detect shaped types without static shape
(e.g., tensor<?xf32>). Return failure in 10 ops that use
`createFloatConst` (which assumes static shapes for DenseElementsAttr):

sinh, cosh, tanh, asinh, acosh, atanh, exp2, round, roundeven, ctlz

Refactor existing guards in ceil and rsqrt to use the shared helper for
consistency.

Add lit test coverage for each op with both ?-shaped and unranked
tensors, verifying the ops are preserved unchanged.

Fix issue #203753

---

Changes since last review:

Change `isDynamicShaped` to `isUnrankedShaped` to detect unranked shape
(e.g., `tensor<*xf32>`).

Modify `createConstFloat` and `createConstInt` using `tensor.dim` and
`tensor.splat` to handle dynamically-shaped ranked tensors.

Modify lit tests for dynamically-shaped ranked tensors, verifying the
ops are expanded using `tensor.dim` and `tensor.splat`.
LLDB on Windows requires GNU tools like `dirname` which are not
installed by default. They are bundled with Git for Windows which also
depends on them. They are not in PATH however.

This patch adds those utilities to the PATH to fix lldb test targets
build failures.
michaelselehov and others added 10 commits June 26, 2026 13:42
…can (#205587)

The compressed offload bundle (CCOB) readers located the boundary
between concatenated bundles by scanning for the literal 4-byte magic
"CCOB". Those bytes can appear by chance inside a compressed payload, so
a single valid bundle could be truncated at the spurious magic and then
fail to decompress with "Src size is incorrect". At runtime this
surfaced as hipErrorInvalidImage ("device kernel image is invalid") when
loading affected HIP code objects.

Use the authoritative FileSize recorded in the compressed bundle header
(CompressedBundleHeader::tryParse, present for V2/V3) to delimit the
current bundle, and search for the next bundle's "CCOB" magic only past
that point. This keeps multi-bundle iteration working (and tolerant of
inter-bundle padding) while ignoring magic-byte collisions inside the
payload. Bundles without a recorded size (legacy V1) fall back to the
previous magic scan.
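
The boundary logic can be sketched as follows. The header layout used here (the 4-byte magic followed by a 4-byte little-endian total size, with 0 meaning "no size recorded") is a toy assumption and not the real CCOB header, and `makeBundle` and `splitBundles` are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

static const std::string Magic = "CCOB";

// Builds a toy bundle: magic, 4-byte little-endian total size, payload.
std::string makeBundle(const std::string &Payload, bool RecordSize) {
  uint32_t Size = RecordSize ? static_cast<uint32_t>(8 + Payload.size()) : 0;
  std::string Out = Magic;
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>((Size >> (8 * I)) & 0xFF));
  return Out + Payload;
}

// Returns (offset, length) for each bundle found in Buf.
std::vector<std::pair<std::size_t, std::size_t>>
splitBundles(const std::string &Buf) {
  std::vector<std::pair<std::size_t, std::size_t>> Bundles;
  std::size_t Pos = Buf.find(Magic);
  while (Pos != std::string::npos) {
    std::size_t Recorded = 0;
    if (Pos + 8 <= Buf.size())
      for (int I = 0; I < 4; ++I)
        Recorded |= static_cast<std::size_t>(
                        static_cast<uint8_t>(Buf[Pos + 4 + I]))
                    << (8 * I);
    // Trust the recorded size only when it is plausible for this buffer.
    bool HasSize = Recorded >= 8 && Recorded <= Buf.size() - Pos;
    // With a size, look for the next magic only past the current bundle;
    // without one, fall back to scanning right after this magic.
    std::size_t SearchFrom = Pos + (HasSize ? Recorded : Magic.size());
    std::size_t Next = Buf.find(Magic, SearchFrom);
    std::size_t End = HasSize ? Pos + Recorded
                              : (Next == std::string::npos ? Buf.size() : Next);
    Bundles.emplace_back(Pos, End - Pos);
    Pos = Next;
  }
  return Bundles;
}
```

With a recorded size, a payload that happens to contain the bytes "CCOB" stays inside its bundle and padding between bundles is skipped. With no recorded size, the same payload is split at the spurious magic, which mirrors the truncation described above.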
```
/usr/bin/ld: tools/flang/unittests/Frontend/CMakeFiles/FlangFrontendTests.dir/CompilerInstanceTest.cpp.o: undefined reference to symbol '_ZN5clang17getDriverOptTableEv'
/usr/bin/ld: b/x86/lib/libclangOptions.so.23.0git: error adding symbols: DSO missing from command line
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```
This replaces the REG_SEQUENCE we use for concat_vector with
INSERT_SUBREG, which, whilst not perfect, can produce slightly better
code overall and helps us avoid REG_SEQUENCE instructions.
This change removes `NodeBuilder`s from the functions connected to
`defaultEvalCall` that were previously passing around `NodeBuilder`
arguments instead of the more usual `ExplodedNodeSet &Dst`
out-parameters.

Although these `NodeBuilder`s "travelled through" many functions, their
usage pattern was relatively simple and their back-and-forth set
manipulation didn't provide any advantage over a plain exploded node
set.

In addition to the removal of the `NodeBuilder`s, this commit performs
minor simplifications in the affected code and renames the old method
`BifurcateCall` to the more specific `dynDispatchBifurcate` (because the
old name was too vague now that we also have `ctuBifurcate`).
This patch introduces several tests in
`llvm/test/FileCheck/dump-input/search-range-annotations` to demonstrate
use cases that PR #198138 improves.
These are not performance-critical and especially operator< is expensive
to compile due to the std::tie template instantiation.
This change removes the remaining `NodeBuilder`s from
`ExprEngineObjC.cpp` as a step of my project to gradually replace the
class `NodeBuilder` with more straightforward tools.

The `NodeBuilder`s that I remove were all used in very simple patterns,
especially the two `NodeBuilder`s in `VisitObjCMessage` where the
`Frontier` set is thrown away and the code continues with the return
value of `generateNode` (which is exactly the same as the return value
of the appropriate `makeNode` call).

In addition to the removal of the `NodeBuilder`s, this change also
includes two additional NFC improvements:

1. `populateObjCForDestinationSet` was previously a static helper
function with nine (!) awkward arguments; this change turns it into a
method of `ExprEngine` with only five arguments (and the first three
arguments are customary: _expression_, `Pred`, `Dst` as in the transfer
functions). This change was necessary for the `NodeBuilder` removal:
without this cleanup I would have had to add a tenth argument.

2. In `VisitObjCMessage` I remove two complex and pointless assertions.
The ancestor of these was a simple and useful
```c++
  Pred = Bldr.generateNode(currStmt, Pred, notNilState);
  assert(Pred && "Should have cached out already!");
```
introduced in 2012 by 5481cfe in a
checker, but since then it was relocated to the engine, duplicated and
weakened to `assert(Pred || HasTag)`.

As `HasTag` can easily be true and is independent of everything else in
this function, the current assertions (which are removed in this change)
don't provide any helpful invariant, they just state that "`Pred` is not
null ... except when it is null".
`MAX_PATH` is defined as 260. `PosixApi.h` already defines `PATH_MAX` as
`32,768` characters which is the max path limit for Unicode paths on
Windows.

Use this in `lldb-dap` on Windows to avoid path truncation.
`MAX_PATH` is defined as `260` on Windows. Unicode paths however can be
up to `32,767` characters long.

Use `llvm::sys::windows::widenPath` to convert paths to unicode paths to
allow targets that have long path names.

This is the first part of a series of multiple commits that will fix
long path support in lldb on Windows as well as adding regression tests.

rdar://180711797
@pull pull Bot locked and limited conversation to collaborators Jun 26, 2026
@pull pull Bot added the ⤵️ pull label Jun 26, 2026
@pull pull Bot merged commit d9c24ae into Ericsson:main Jun 26, 2026
9 checks passed