[pull] main from llvm:main #5794

Merged
pull[bot] merged 66 commits into Ericsson:main from llvm:main
Jun 26, 2026
@pull pull Bot commented Jun 26, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

aengelke and others added 30 commits June 26, 2026 13:29
The existing EnumEntry stores strings using StringRefs, which are large
and require two relocations per entry. Introduce a new, compact enum
string representation that stores strings relative to the enum entries
in memory, allowing a low-overhead and relocation-free storage.

Unfortunately, the enum definitions have to be written into a separate
constexpr variable; only C++20 supports structural template parameters.
It is also not possible to hide this behind a macro, because we want
enum entries to be sourced from other files and #include cannot occur
during a macro expansion.

When all uses of EnumEntry are ported to the new representation, this
will save 4.7k relocations on libLLVM.so (3% in an all-target assert
build), resulting in faster startup and lower max-rss, as these rarely
used pages don't need to be touched at all anymore.
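
To illustrate why offset-based storage needs no relocations, here is a minimal C++17 sketch. It is not the representation this patch introduces: `CompactEntry`, `Names`, `Entries`, and `nameOf` are hypothetical names, and the sketch packs all names into one shared character array instead of storing them relative to the entries.

```cpp
#include <array>
#include <cstdint>
#include <string_view>

// Hypothetical compact entry: it holds offsets, not pointers, so the
// dynamic loader has nothing to relocate.
struct CompactEntry {
  std::uint16_t NameOffset; // start of the name inside Names
  std::uint16_t NameLength; // length of the name in characters
  std::uint32_t Value;      // enum value
};

// All names packed back to back into a single character array.
constexpr char Names[] = "NoneReadWrite";

constexpr std::array<CompactEntry, 3> Entries = {{
    {0, 4, 0}, // "None"
    {4, 4, 1}, // "Read"
    {8, 5, 2}, // "Write"
}};

// Linear lookup of the name for a value; returns an empty view if the
// value is not in the table.
constexpr std::string_view nameOf(std::uint32_t Value) {
  for (const CompactEntry &E : Entries)
    if (E.Value == Value)
      return std::string_view(Names + E.NameOffset, E.NameLength);
  return {};
}

static_assert(nameOf(2) == "Write", "offsets resolve to the packed name");
```

Each entry here occupies 8 bytes, whereas an entry holding a pointer-and-length string reference needs at least one relocated pointer per name in a position-independent library.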
This PR has an
[RFC](https://discourse.llvm.org/t/rfc-filecheck-improving-input-dump-readability/91112).
It is stacked on PR #199063.

Example
=======

```
$ cat check
CHECK: start
CHECK-NEXT: end

$ FileCheck -v -dump-input-context=2 check < input |& tail -23
<<<<<<
          1: start
check:1      ^~~~~
next:2'0         {   search range start (exclusive)
          2: foo0
          3: foo1
          .
          .
          .
         21: foo19
         22: foo20
         23: end
next:2'1     !~~   error: match on wrong line
         24: bar0
         25: bar1
          .
          .
          .
         42: bar18
         43: bar19
         44: bar20
next:2'2           } search range end (exclusive)
>>>>>>
```

Without this patch, input lines 1-3 and 42-44 are not shown. However,
lines 1-3 are where the actual problem is because that is where the
`CHECK-NEXT` directive was expected to match but did not.

Search Ranges Are Helpful
=========================

In general, this patch marks any failed pattern's search range by using
the annotation style shown above, and these annotations are filtered in
when using `-dump-input-filter=error`, which is the default filter.
Seeing the search range can be helpful for understanding the pattern's
behavior. Moreover, the cause of the pattern failure is often the input
at the start or end of the search range. For example:

- A `CHECK-NEXT` or `CHECK-SAME` match on the wrong line, as in the
  above example.
- A `CHECK-NOT` unexpected match because a neighboring directive matched
  at an unexpected point, affecting the search range.
- An unmatched `CHECK` because a subsequent `CHECK-LABEL` matched at an
  unexpected point, affecting the search range end. (In this case, the
  search range start and thus the prior directive's match is already
  revealed without this patch.)

This patch updates tests in
`llvm/test/FileCheck/dump-input/search-range-annotations`, which
demonstrate its benefit for those cases.

This patch is a replacement for D96653, which attempted to address the
above cases but in a less straightforward and somewhat broken manner.
The idea of the current patch was discussed during that review.

Replacing `X~~`
===============

Without this patch, search ranges for unmatched patterns (whether a
success, as for `CHECK-NOT`, or a failure, as for `CHECK`) are already
marked with `X~~`. In those cases, this patch replaces those annotations
with the new annotations shown above. As described above, this patch
adds the new annotations to all other failed matched patterns as well.

`X~~` is thus no longer used by `-dump-input`. The `X~~` style is very
noisy, especially for consecutive unmatched unexpected patterns (like a
`CHECK-NOT` block), all of which mark every line of their identical
search ranges. Switching to the new style significantly reduces the
noise in such cases. That noise was discussed recently in issue #77257
and PRs linked from there, and that discussion led to the resurrection
of this patch.

Without this patch, `-dump-input-filter=error` filters in only the start
of a search range that spans multiple lines because an entire `X~~` is
one annotation. With this patch, it filters in both the start and end of
a search range because they are separate annotations (or a single
one-liner, as discussed below). With or without this patch, a different
argument to `-dump-input-filter` (and `-vv`) is required to filter in
the search range for an unmatched unexpected pattern, like `CHECK-NOT`,
because that is not an error.

One-Liners
==========

If a search range does not involve multiple input lines, this patch
keeps the `{` and `}` markers on the same output line like this:

```
<<<<<<
         1: start foo end
check:1     ^~~~~
check:3               ^~~
not:2           {     }     search range (exclusive bounds)
>>>>>>
```

Exclusive Boundaries
====================

`{` and `}` are to be interpreted as exclusive bounds. That is, the
characters at those markers are not included in the search range, but
everything in between is. To try to avoid confusion, this patch adds the
word "exclusive" in every search range annotation.

When the search range starts or ends at a line boundary, the marker
cannot be placed at the first or last character of the line because that
would exclude that character. This patch instead places the marker in
the margin of the input dump, either before the line's first character
or after the line's last character (usually a newline), rather than on
the adjacent line. I have found this makes the annotations easier to
read (more apt to have a one-liner, at least), and I do not think it would
make much sense to move a start annotation to an imaginary line 0. For
example, the `{` and `}` below appear in the input dump margins, before
the first character, `s`, and after the space representing the newline:

```
<<<<<<
           1: start
check:1'0    {      } search range (exclusive bounds)
check:1'1             error: no match found in search range
>>>>>>
```

Before trying exclusive bounds, I experimented with notations involving
inclusive bounds (e.g., `[` and `]`, or `[` and `)`). Such notations
either cannot distinguish empty ranges (without a cryptic inversion like
`][`) from one-character ranges, or they cannot represent them with
one-liners (because they must occupy the same column), increasing
verbosity. I ultimately decided I prefer exclusive bounds because they
are visually and semantically symmetric while consistently concise and
unambiguous.
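
As a small illustration of the exclusive-bounds arithmetic, the following sketch computes how many input columns lie strictly between the two markers. `searchRangeLength` and its column parameters are hypothetical and are not part of FileCheck.

```cpp
// Hypothetical helper: OpenColumn and CloseColumn are the 0-based columns
// of the `{` and `}` markers on one output line. Both bounds are
// exclusive, so neither marker column belongs to the range.
constexpr int searchRangeLength(int OpenColumn, int CloseColumn) {
  return CloseColumn - OpenColumn - 1;
}

static_assert(searchRangeLength(4, 5) == 0, "adjacent markers: empty range");
static_assert(searchRangeLength(4, 6) == 1, "one column between the markers");
```

Under this convention an empty range and a one-character range use different marker positions, which is the distinction the inclusive notations above could not make on one line.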

For comparison, match ranges (e.g., `^~~`) cannot distinguish
one-character ranges from empty ranges. However, in my experience, empty
match ranges are uncommon, and usually it is easy to distinguish them
based on the pattern or directive (e.g., `CHECK-EMPTY`). In contrast,
empty search ranges can occur repeatedly when directive matches are
adjacent and `-implicit-check-not` is used.
…206053)

This fixes the case where GFX7 fails expensive checks/machine
verification with GISel due to passing a literal directly to V_MIN that
is not supported on the architecture. This fixes the buildbot failure:
https://lab.llvm.org/buildbot/#/builders/187/builds/21241 caused by
#202680.
The embedded compressed payload is in little endian, and offload assumes
that host endianness is used. Skip the test if host endianness is not
little endian.

Alternative to #205822.
…." (#206062)

Reverts #205235

Due to timeouts reported in GitHub CI and reproduced locally by me. 
See #205879 for details.
)

The const and pure attributes may only be applied to a function
declaration. However, we were missing a subject list for the attributes
and so we would silently accept and retain the attribute on any kind of
declaration.

Empirical testing suggests that this attribute is not effective with
Objective-C method calls or indirect calls and so the subject is limited
to just function declarations.
…ments. (#165278)

FreeBSD coredumps use program headers to store mmap information. It is
possible for a program to use more than PN_XNUM mmaps. Therefore, we
implement support for PN_XNUM in readelf.
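
For context, the ELF specification reserves the value 0xffff (PN_XNUM) in `e_phnum` to signal that the real program header count is stored in the `sh_info` field of the section header at index 0. A minimal sketch of that rule follows; `kPnXnum` and `programHeaderCount` are hypothetical names, and this is not the code added to readelf.

```cpp
#include <cstdint>

// Sentinel from the ELF specification. Named kPnXnum here to avoid
// clashing with the PN_XNUM macro from <elf.h>.
constexpr std::uint16_t kPnXnum = 0xffff;

// EPhnum is e_phnum from the ELF header; Section0Info is sh_info of the
// section header at index 0. When e_phnum holds the sentinel, the real
// count lives in Section0Info.
constexpr std::uint32_t programHeaderCount(std::uint16_t EPhnum,
                                           std::uint32_t Section0Info) {
  return EPhnum == kPnXnum ? Section0Info : EPhnum;
}

static_assert(programHeaderCount(12, 0) == 12, "small counts are stored directly");
static_assert(programHeaderCount(0xffff, 70000) == 70000, "extended count");
```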
The X86 backend then lowers the shuffle through lowerV16F32Shuffle /
lowerV8F64Shuffle, which fall through to lowerShuffleWithPERMV (VPERMPS
/ VPERMPD). lowerShuffleAsVALIGN is asserted on i32 / i64 element types
only and is never called from the float-domain paths, even when the mask
is a clean concatenate-and-shift that VALIGN expresses exactly.

On znver5, VALIGN and VPERMPS / VPERMPD have identical latency (5 cycles
for zmm), throughput (2), and macro-op count (1). The real cost of
VPERMPS / VPERMPD is the extra zmm register required to hold the
permutation index vector.

The intrinsic path for _mm512_alignr_epi32 also gets a vperm. VALIGN is a win
in the generic path as well: vpermps zmm1, zmm0, zmm3 requires a dedicated
zmm register to hold the permutation index vector, whereas valignd zmm1, zmm3,
zmm3, 1 encodes the rotation count as an immediate (imm8 = 1), using no
extra registers.
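
To make "concatenate-and-shift" concrete, the sketch below recognizes a shuffle mask that selects N consecutive elements from the 2N-element concatenation of two inputs. It is an illustration only, not the matching logic in the X86 backend: `matchConcatShift` is a hypothetical name, and the sketch ignores undef mask elements, operand ordering, and single-input rotations.

```cpp
#include <array>
#include <cstddef>

// Returns the shift amount if Mask[I] == Shift + I for every lane, with
// 0 < Shift < N, i.e. the mask reads a contiguous window that straddles
// both inputs. Returns -1 otherwise.
template <std::size_t N>
constexpr int matchConcatShift(const std::array<int, N> &Mask) {
  static_assert(N > 0, "mask must have at least one element");
  const int Shift = Mask[0];
  if (Shift <= 0 || Shift >= static_cast<int>(N))
    return -1;
  for (std::size_t I = 0; I < N; ++I)
    if (Mask[I] != Shift + static_cast<int>(I))
      return -1;
  return Shift;
}

static_assert(matchConcatShift(std::array<int, 4>{{1, 2, 3, 4}}) == 1,
              "window shifted by one element");
```

A mask that passes this test can be expressed with an immediate shift count, which is why no index vector register is needed.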

Co-authored-by: Shivanshu
#205761)

This patch updates the ThreadSanitizer documentation in clang/docs by
documenting the run-time flags and suppressions, which was requested in
google/sanitizers#446.

Specifically:
- Adds a "Run-time Flags" section detailing common options that can be
passed in TSAN_OPTIONS (e.g. exitcode, log_path, history_size,
halt_on_error, report_atomic_races, etc.).
- Explains how to print the full list of options using help=1.
- Adds a "Suppressions" section documenting the syntax, wildcard rules,
and types of runtime suppressions (race, thread, called_from_lib) with a
practical example suppressions file.
- Adds compile-time ignorelist code examples.
- Documents limitations with C++ Exceptions, non-instrumented code, and
GDB/ASLR issues.
- Removes outdated references to the archived sanitizers wiki.
Signed-off-by: Ingo Müller <ingomueller@google.com>
)

-mtriple=amdgcn is by far the dominant form over space separation.
Convert these to simplify future bulk test updates.
Co-Authored-By: Claude <noreply@anthropic.com>
…alues" (#206034)

Reverts #205657

The original commit was causing pre-merge CI to fail for AArch64, as one
of the tests expects stepping behaviour that is not seen on
AArch64 targets; the test suite containing the failing test is meant to
be configured to not run for AArch64, but the unsupported label was not
being applied, due to an error in the unsupported check. This patch
fixes the unsupported check in scripts/lit.local.cfg, which should
prevent further errors.
Run `acc-bind-routine` on `FunctionOpInterface` and rewrite calls to
bound symbols in offload regions and `gpu.func`. For string bind names,
declare private functions in the enclosing `gpu.module` symbol table
when the call is inside device code.
Follow up to #200414
[comment](#200414 (comment))
to add explicit `-global-isel` flag to mixed tests.
Other than in 8a7846f (the C++23 bump), we apparently only
bump the standard for libc++, but not for libc++abi.
…ctions (#205612)

The default max interleave factor is 2. Increasing it to 4 universally
can spend a significant amount of code size on something that does not always
increase performance (especially if the loop gets over-unrolled). Small
reduction loops often benefit from extra interleaving due to the
multiple independent streams that can execute in parallel. This patch
increases the max interleave factor to 4 for such loops, limited to
where the VF is <= 4 to limit the impact for already highly vectorized
loops.
To handle bitcode inputs that are not in individual files on disk, such
as members of non-thin archives, DTLTO serializes those inputs to
temporary individual bitcode files.

This patch changes LLVM to serialize only uncached input modules and any
modules they import from.

For a link of Clang 22 (debug build with sanitizers and
instrumentation), I performed measurements with and without this patch
for an optimized toolchain (PGO non-LTO, based on recent main commit
c264e07). The measurements were run on:
- Windows 11 Pro build 26200, AMD Family 25 at approximately 4.5 GHz,
  16 cores / 32 threads, and 64 GB RAM.
- Ubuntu 24.04.3 LTS, Ryzen 9 5950X with 32 threads, and 62 GiB RAM.

There was no difference in serialization time when the cache was
disabled.

When the cache was enabled and all compilations hit in the cache,
serialization was eliminated, as was the time spent deleting the
previously serialized temporary files, which are no longer created. Mean
wall-clock time improved by about 10% on both machines in this scenario.
The horizontal reduction reuse-counter scale is built in
getRootNodeScalars() order and applied positionally to the emitted
reduction vector. For a root node with copyable elements the scalar
order is reordered while the emitted lanes still follow the reduced
values (candidates) order, so the repeat count was applied to the wrong
lane, producing a wrong reduction result.

Fixes #205614

Reviewers: 

Pull Request: #206102
Instead of storing a pointer+value pair, use the new enum tables to store
the same information more compactly and without dynamic relocations.
…both LHS and RHS are slices of the same array" (#206103)

Reverts #204532 due to regressions in numerous Fujitsu
tests and several important apps
…#205952)

The iteration order of DenseSet is not guaranteed, which affects the
output of code generated with GVNSink enabled. This can cause code to be
emitted in differing order, affect section ordering, and in some cases
was reported to result in larger binaries due to increased padding between
sections.

This patch addresses this by using SetVector, which has a deterministic
iteration order.
…to ensure middle-end is creating reduction intrinsics (#206101)

AVX512 is missing a llvm.vector.reduce.add.v16i32 call - will investigate
Reading a global or static variable on a Wasm target produced a wrong
value (or none at all). Two Wasm-only bugs combined to break it, both of
which need to be fixed to support `target variable` / `frame var`.

1. DWARFExpression::Evaluate special-cased DW_OP_addr and
DW_OP_addrx/DW_OP_GNU_addr_index on Wasm to push a LoadAddress, based on
the theory that "Wasm file sections aren't mapped into memory". But a
DW_OP_addr operand denotes a location in the module's address space,
i.e. a file address like on every other target. Forcing a load address
breaks the static (no-process) read path, since a file section cannot be
read as a load address.

2. ObjectFileWasm::SetLoadAddress mapped every section with
`load_address | GetFileOffset()`. For an active data segment that
Object-tags the address (top bit = code space) and uses the file offset
instead of the segment's linear-memory address, so a live read of a data
global resolved to a garbage address in the wrong space.

Address (1) by dropping the incorrect special casing. Address (2) by
mapping data sections into the Memory space at their linear VM address
while preserving the module id. Code and other sections keep their
Object-space module-offset addressing.
When reading shared cache libraries out of lldb's own memory (the
default, eSymbolSharedCacheUseHostAndInferiorSharedCache), the dyld
introspection path built a plain DataExtractor spanning an image's
[minVmAddr, maxVmAddr). A shared cache image's segments may not be
contiguous: other images' data and unmapped guard pages may lie between
them.

Take advantage of the VirtualDataExtractor with a per-segment lookup
table instead, matching the map_shared_cache_binary_segments path, so
reads are confined to mapped segments.
Avoid incompatible declarations, which are problematic with MSVC.
forking-google-bazel-bot Bot and others added 27 commits June 26, 2026 16:25
This fixes dd5357d.

Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
#205939)

`ACCRecipeMaterialization` can replace the placeholder with the actual
variable name when materializing the recipe.

Assisted-by: Claude Code
Instead of manually calculating the size and alignment of a union, we
can just generate an actual union and take the size and alignment of
that.
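
The idea can be sketched in C++17 as follows. This is an illustration rather than the code from the patch: `VariadicUnion` and `manualUnionSize` are hypothetical names, and the manual computation is shown only to contrast it with asking the compiler directly.

```cpp
#include <algorithm>
#include <cstddef>

// Recursive union over a parameter pack: the compiler computes its size
// and alignment, so no manual arithmetic is needed.
template <class... Ts> union VariadicUnion {};

template <class T, class... Ts> union VariadicUnion<T, Ts...> {
  T Head;
  VariadicUnion<Ts...> Tail;
};

// The manual alternative: the largest member size, rounded up to a
// multiple of the strictest member alignment.
template <class... Ts> constexpr std::size_t manualUnionSize() {
  constexpr std::size_t Size = std::max({sizeof(Ts)...});
  constexpr std::size_t Align = std::max({alignof(Ts)...});
  return (Size + Align - 1) / Align * Align;
}

static_assert(sizeof(VariadicUnion<char[5], int>) ==
                  manualUnionSize<char[5], int>(),
              "the real union agrees with the manual computation");
static_assert(alignof(VariadicUnion<char, double>) == alignof(double),
              "the real union takes the strictest alignment");
```

The rounding step in `manualUnionSize` is the kind of detail that is easy to get wrong by hand and that `sizeof` on a real union handles automatically.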

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
…efs (#195877)

It fixes the following case:
```
   vector.transfer_read %arg0[], %0 : memref<f16>, vector<f16>
```
The macro is only required inside `<fstream>`, so we can move it there
instead of having it as a general configuration macro.
Added a guard so the structured pack transform reports a normal tiling
failure when the target has already been bufferized, instead of reaching
a tensor-only path and asserting.
Fixes #205744
…6107)

`LLDB_LAUNCH_FLAG_USE_PIPES=1` is used in tests to run lldb without the
ConPTY on Windows. This reduces the flakiness of tests.

This patch ensures that we read the value of
`LLDB_LAUNCH_FLAG_USE_PIPES` when setting up gdbremote tests, to make
sure they don't use the ConPTY.

This fixes `tools/lldb-server/TestGdbRemote_qThreadStopInfo.py` on
https://ci-external.swift.org/job/lldb-windows/job/main/.
`std::countr_zero` can be used instead, which is a standard API.
This implementation detects a use-after-move for the 3-argument
std::move on containers. This PR fixes #137157.

Since my current implementation uses `IteratorModeling` which is in
alpha stage I mark this PR as draft.

When both `IteratorModeling` and `MoveChecker` are enabled, my
implementation detects the use-after-move for the 3-argument
std::move case.

```cpp
std::move(l1.begin(), l1.end(), std::back_inserter(l2));
std::cout << "l1: " << *l1.cbegin() << '\n'; // <--- should have a use-after-move
```

```text
move_iterator.cpp:14:28: warning: Method called on moved-from object 'l1' of
      type 'std::list' [cplusplus.Move]
   14 |     std::cout << "l1: " << *l1.cbegin() << '\n'; // <--- should ...
      |                            ^~~~~~~~~~~~
```

`evalCall` models the 3-arg `std::move` pattern and marks the source
container in `TrackedContentsMap` to avoid false positives on safe
method calls. In `checkPreCall` I recover the iterator's container
through `getIteratorPosition` and check it against `TrackedContentsMap`
to emit the warning.

I have been thinking about alternative solutions that do not depend on
`IteratorModeling`, but I think it would be more time saving to ask
maintainers about possible solutions before I start my own
implementation.
When constructing the dependency graph for compilation caching, the
dependency scanner needs to do some extra operations on the compiler
invocations. Historically, these have not utilized the copy-on-write
variant well. This patch takes care to minimize `CompilerInvocation`
copies, which improves incremental scans with populated up-to-date
scanning module cache by 16-18%. Together with
#203350 which operates in the
same space, wall-times are improved by 1.54x and instruction counts by
1.66x.
…ompiler.h> (#205590)

These macros are essentially there to query compiler features, so they
should be moved into `<__configuration/compiler.h>`.
Towards #172124

Co-authored-by: Hristo Hristov <zingam@outlook.com>
Resolve the names of CRITICAL constructs even if they are reserved
names.
This also limits locator parsing to known reserved names.

Fixes #205855
…eduction costs (#206124)

Fixes failure to fold to v16i32 reduction on AVX512 targets

We still need to determine better CostKind values - but that can wait until #194621 is complete
)

TLS 1.3 is only supported on Windows Server 2022 and beyond. Windows
Server 2019 only supports up to TLS 1.2.

This causes test failures on CI runners which run on Windows Server
2019.

This patch allows falling back to TLS 1.2 if 1.3 is not available.
…5864)

Prepares for `AF_UNIX` domain-socket support on Windows by separating
the cross-platform socket logic from the one platform-specific
operation.

Every domain-socket operation is identical on POSIX and Windows (via
`<afunix.h>`), so it now lives in a single base class
`DomainSocket`. The one operation that is different is `CreatePair()`.
It lives in `DomainSocketPosix` / `DomainSocketWindows`. It's selected
for the host as `DomainSocketPlatform` through
`lldb/Host/DomainSocket.h`.

This is an NFC patch: POSIX behavior is unchanged, and while the shared
code now also compiles on Windows it stays unreachable there. A
follow-up commit enables it.

rdar://180736036
…206129)

Host::OpenURL was only defined for Darwin (in Host.mm). Add a portable
implementation in the common Host.cpp: on Unix it launches xdg-open; on
Windows it returns "unsupported" for now. xdg-open is run without a
shell (run_in_shell=false) so query-string metacharacters in the URL are
never interpreted by the shell.

Also add Host::URLEncode, an RFC 3986 percent-encoder for assembling
tracker URLs. These are the building blocks for an upcoming "diagnostics
report" command that opens a pre-filled bug URL, and the encoder is
shared with a downstream tap-to-radar reporter.
The Diagnostics framework had a callback registry (AddCallback /
RemoveCallback) so subsystems could contribute files to a diagnostics
directory, intended to also run during crash handling. That crash-time
path never materialized, and the sole registered callback was the
Debugger copying its file-backed logs. If you had no logging enabled,
the directory would be empty, confusing the users.

Remove the registry and the callback loop in Diagnostics::Create (which
now just writes the in-memory log), and expose the log copying as
Debugger::CopyLogFilesToDirectory, which "diagnostics dump" calls
directly. The dump command now copies the invoking debugger's logs
rather than every debugger's, which is the more useful behavior I want
to double down on.
This fixes a problem with CIR failing to handle boolean result types for the __builtin_(add|sub|mul)_overflow functions. We were trying to lower to operations derived from CIR_BinOpOverflow, but these operations required an integer type for the return value. This change relaxes that requirement to allow integer or boolean types.

related non-CIR PR #192568.
)

The majority of the content of rdar://179151476 duplicates the
PointerFlow analysis after
#203633. Therefore, we only
need to upstream the tests for better test coverage and proving the
duplication.

rdar://179151476
Replace assertions that listed concrete types with generic ones that
check that the type is a vector with an even number of elements.

Update splitUnary and splitBinary.

I already updated splitBinary and splitTernary in #203472, but the
splitBinary change was accidentally removed in #203607, so I am bringing
it back.
When recipes are generated per type and not per variable, we can end up
with the same location for multiple private/firstprivate/reduction
variables. When materializing the recipes, set the Location of all
Operations within the recipe region to be that of the op that is being
materialized. It is okay to mutate the original recipes since the location
is already not "useful" and the recipes will always get removed at the end
of the pass.
…205531)

This continues the effort to split `<__config>` into self-contained
detail headers.
Introduce acc.gpu_shared_memory to represent GPU workgroup memory slots
in a compute region. It is used for planning before eventually being turned
into a `memref.view` of a dynamic slot within the workgroup allocation.
Fixes #201756

AI Usage: Used to Search codebase to find location of code to modify and
understand existing implementation.

---------

Co-authored-by: Simon Pilgrim <git@redking.me.uk>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Summary:
In the case of, say, a GPU sanitizer, there could be a pending report
that isn't flushed before the queue dies and the program terminates. Add
an explicit flush to ensure that all work at least posted *before* the
trap fired is cleared in the HSA error callback before actually
quitting.
Fixes #205571

---------

Co-authored-by: Barry Revzin <brevzin@jumptrading.com>
Co-authored-by: Björn Schäpers <bjoern@hazardy.de>
@pull pull Bot locked and limited conversation to collaborators Jun 26, 2026
@pull pull Bot added the ⤵️ pull label Jun 26, 2026
@pull pull Bot merged commit 00ca105 into Ericsson:main Jun 26, 2026
14 checks passed