Merge upstream llvm into amd-debug#2671

Merged
mariusz-sikora-at-amd merged 55 commits into amd-debug from amd/dev/masikora/amd-debug-merge-candidate
May 27, 2026

@mariusz-sikora-at-amd

New code from upstream; in amd-debug this function takes one extra parameter. Plus a test update, mostly metadata.

diff --git a/llvm/unittests/IR/DebugInfoTest.cpp b/llvm/unittests/IR/DebugInfoTest.cpp
index 6709adc09388..d481b709bbc2 100644
--- a/llvm/unittests/IR/DebugInfoTest.cpp
+++ b/llvm/unittests/IR/DebugInfoTest.cpp
@@ -1455,7 +1455,7 @@ TEST(DIBuilder, CompositeTypeAnnotations) {
       Ctx, nullptr, "", "", nullptr, 0, nullptr, 0, nullptr, 0, 0,
       DINode::FlagZero, DISubprogram::SPFlagZero, nullptr);
   DIVariable *Len = DIB.createAutoVariable(SPScope, "length", F, 0, nullptr,
-                                           false, DINode::FlagZero, 0);
+                                           false, DINode::FlagZero, dwarf::DW_MSPACE_LLVM_none, 0);
   DICompositeType *DynStruct = DIB.createStructType(
       CU, "MyDynStruct", F, 0, Len, 8, DINode::FlagZero, nullptr, {}, 0,
       nullptr, "DynStructUniqueIdentifier", nullptr, 0, DynStructAnnotations);

Mel-Chen and others added 30 commits May 22, 2026 16:30
…vm#199222)

VPlanTransforms::convertToStridedAccesses calls
VPWidenMemoryRecipe::computeCost, which uses VPTypeAnalysis in
VPCostContext to infer the pointer type of the load address. However,
CachedTypes in VPTypeAnalysis may be invalidated since earlier
transformations in tryToBuildVPlan could erase recipes from the plan.
This pollutes the cache with stale types.

Fix this by creating a new VPCostContext locally scoped to
convertToStridedAccesses, ensuring VPTypeAnalysis reflects the current
plan state. This serves as a quick fix to prevent accidental reuse by
future transformations.
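
A minimal sketch of the stale-cache hazard described above, and of why a locally scoped context avoids it. Every name here (`RecipeSketch`, `TypeAnalysisSketch`, `inferWithFreshContext`) is hypothetical and not part of VPlan, and keying the cache by an integer id is an assumption made to keep the example deterministic.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a recipe in the plan; not a VPlan type.
struct RecipeSketch {
  int id;
  std::string type;
};

// Hypothetical analysis that caches inferred types for as long as it lives.
class TypeAnalysisSketch {
public:
  explicit TypeAnalysisSketch(const std::vector<RecipeSketch> &plan)
      : m_plan(plan) {}

  std::string inferType(int id) {
    auto it = m_cache.find(id);
    if (it != m_cache.end())
      return it->second; // May be stale if the plan changed after caching.
    for (const RecipeSketch &r : m_plan)
      if (r.id == id)
        return m_cache[id] = r.type;
    return m_cache[id] = "<unknown>";
  }

private:
  const std::vector<RecipeSketch> &m_plan;
  std::map<int, std::string> m_cache;
};

// A transformation that builds its analysis locally starts with an empty
// cache, so it always reflects the current state of the plan.
std::string inferWithFreshContext(const std::vector<RecipeSketch> &plan,
                                  int id) {
  TypeAnalysisSketch fresh(plan);
  return fresh.inferType(id);
}
```

In this model, a long-lived analysis keeps answering with the type it cached before the plan was rewritten, while the locally scoped one does not.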
Follow-up to llvm#198941, which introduced Locked<T> and SharedLocked<T>.
Add GetObjectFileLocked, GetSymbolFileLocked, GetSymtabLocked, and
GetSectionListLocked alongside the existing accessors.

The locked variants cover two things:

1. They prevent the pointer from being swapped out from under the
caller. The old getters take m_mutex only during lazy initialization and
release it before returning. The unique_ptr or shared_ptr that owns the
pointee can therefore be reassigned by another thread while the caller
still holds the raw value. LockedPtr keeps the Module mutex held
alongside the borrowed pointer, pinning the binding for the lifetime of
the handle.

2. They serialize access to the pointee itself. This is not new: the
classes in question were already relying on the Module mutex for
synchronization.

Migrate the four call sites in Module where the existing pattern maps to
a single LockedPtr.

The legacy raw-pointer getters remain so call sites can migrate
incrementally.
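
A minimal sketch of the idea, assuming a recursive mutex on the owning object. `LockedPtrSketch`, `ModuleSketch` and `SymtabSketch` are hypothetical names for illustration only; the real `Locked<T>` and `LockedPtr` interfaces may differ.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical handle: keeps the owner's mutex held for as long as the
// borrowed pointer is in use.
template <typename T> class LockedPtrSketch {
public:
  LockedPtrSketch(std::unique_lock<std::recursive_mutex> lock, T *ptr)
      : m_lock(std::move(lock)), m_ptr(ptr) {}

  explicit operator bool() const { return m_ptr != nullptr; }
  T *operator->() const { return m_ptr; }
  T &operator*() const { return *m_ptr; }
  bool owns_lock() const { return m_lock.owns_lock(); }

private:
  std::unique_lock<std::recursive_mutex> m_lock;
  T *m_ptr;
};

struct SymtabSketch {
  int num_symbols = 0;
};

class ModuleSketch {
public:
  // Legacy-style getter: the mutex is held only during lazy initialization
  // and is released before the raw pointer reaches the caller.
  SymtabSketch *GetSymtab() {
    std::lock_guard<std::recursive_mutex> guard(m_mutex);
    if (!m_symtab)
      m_symtab = std::make_unique<SymtabSketch>();
    return m_symtab.get();
  }

  // Locked variant: the lock travels with the pointer, so m_symtab cannot
  // be reassigned by another thread while the handle is alive.
  LockedPtrSketch<SymtabSketch> GetSymtabLocked() {
    std::unique_lock<std::recursive_mutex> lock(m_mutex);
    if (!m_symtab)
      m_symtab = std::make_unique<SymtabSketch>();
    return LockedPtrSketch<SymtabSketch>(std::move(lock), m_symtab.get());
  }

private:
  std::recursive_mutex m_mutex;
  std::unique_ptr<SymtabSketch> m_symtab;
};
```

Because the mutex in this sketch is recursive, the thread holding a handle can still call the legacy getter without deadlocking, which is what allows call sites to migrate one at a time.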
…vm#199126)

Most callers are unchanged, since they either ignore the specific error
or have their own formatting of the error that includes both the path
and the errorToErrorCode-unwrapped value. However, for clients that just
forward the error it's helpful to ensure we do not lose track of the
filename that the error is associated with, so use FileError.

Incidentally remove two uses of errorToErrorCode that were being used
instead of consumeError; in both cases getOptionalFileRef was more
appropriate.
…results (llvm#199119)

With layout conflict handling this case is no longer an issue.
…ing (llvm#199189)

This prevents generating invalid C code in mixed-language headers by
leaving `typedef` declarations inside `extern "C"` blocks intact by
default.

Fixes llvm#141394
…ameter mapping (llvm#195995)" (llvm#199228)

This reverts commit 7e2821e, which
causes a crash-on-valid in clang:
llvm#199209
…essage (llvm#199233)

Help track whether a fold was attempted or not
Implement `MemRefElementTypeInterface` on `fir::RecordType` so that
`memref<!fir.type<…>>` verifies, enabling downstream passes to use
memrefs of Fortran derived types.
Co-authored-by: <konstantinos.parasyris@intel.com>
Not profitable with VF=4, but we only try smaller VFs if the load can
fit in a single vector register found by BoUpSLP::getVectorElementSize().
Requires propagation of bit widths through the fmuladd intrinsic to vectorize
at VF=2. This is from the hot block in `538.imagick_r` which fails to vectorize
when vectorization is removed from pre-LTO, see
llvm#195886 (comment).
Relax modular-format attribute validation in the Verifier to allow a
first-arg-index of 0 (meaning no variadic arguments, e.g. for v-family
functions like vsnprintf).

Guard InstCombine's optimizeModularFormat against zero index.

Generated by Gemini, reviewed by dthorn
This matches the name on SiFive's website.
Summary:
These are stored in the libc/shared and have a unified CMake helper to
find them. Likely these will be a more core dependency as LLVM uses them
for constexpr math, libcxx uses it, and compiler-rt will probably use
bits of it.

The original intention was to allow building flang-rt with a partial
checkout, but I don't think this is a reasonable use-case and I do not
think this exists in practice.
There are a lot of similar and repetitive variants of SDK lookups in the
Apple platform plugins. This commit unifies the implementations, error
handling and progress reporting.

Assisted-by: claude
…#199242)

Similar to other VectorCombine folds, in case of OldCost == NewCost, use
the reduction if at least the root BinOp is removed as well as the
ExtractElement.

Noticed while triaging codegen for llvm#199208
…8867)

Annotations are not indexed, so we need to skip them on the verifier.

Assisted by: claude
…Type (llvm#197331)

DICompositeType already has an "Annotations" ivar. This simply adds a
way to set it from the "createStructType" function.
D150880 (landed as 0726cb0) uses `APInt` to eliminate most integer
overflow issues from FileCheck numeric variables. It also removes the 4
tests in `llvm/test/FileCheck/match-time-error-propagation`.

While the elimination of overflow issues reduces the importance of those
tests, the tests still seem worthwhile. Without them, I see no test that
exercises the "unable to substitute variable or numeric expression:
overflow error" diagnostic in FileCheck input dumps.

This patch resurrects those tests and updates them to exercise the
remaining unsigned underflow case.
…e-side var metadata, internalize device side variables, and lower poison attribute (llvm#190087)

Signed-off-by: ZakyHermawan <zaky.hermawan9615@gmail.com>
This changes the documented semantics of the `noescape` attribute to
disallow freeing the pointer, and allow escapes of the integer value of
the memory address, as discussed in

https://discourse.llvm.org/t/rfc-updating-the-semantics-of-the-noescape-attribute/90326.

It also clarifies that the attribute may only be used to annotate the
outermost pointer level of nested pointer parameters.
This PR is stacked on PR llvm#198136.

This patch refactors `llvm/test/FileCheck/dump-input/annotations.txt` to
improve maintainability and coverage and to prepare for the upcoming
implementation of search range annotations.

Lit substitutions
=================

The test repeats the same basic set of RUN lines *many* times. This
patch encapsulates those in lit substitutions to improve
maintainability. By doing so, it also helps to ensure more consistent
coverage of all cases and thus slightly expands coverage.

-strict-whitespace
==================

Via those substitutions, this patch adds `-strict-whitespace` throughout
the test, and it drops the initial `-strict-whitespace` case because it
is then redundant. That causes many whitespace changes throughout the
test, so this patch is easier to read with `git diff -w`.

When I originally wrote the test, I thought maintaining it would be too
painful with `-strict-whitespace`. However, I now think it is important
for usability to thoroughly check that annotations are correctly aligned
with the input, especially given the upcoming search range annotations.

-dump-input-label-width
=======================

To address that anticipated maintenance pain, and to make the above
change easier to implement, this patch also implements a new hidden
FileCheck option, `-dump-input-label-width`. It enables tests like this
one not to have to fuss with fluctuations in the label column width that
are caused when varying the verbosity options. I do not anticipate this
option will be used outside FileCheck's own test suite.

Splitting directive blocks
==========================

To improve readability, this patch splits apart directive blocks where
the same annotations appear multiple times with small differences at
different verbosity levels. See new header comments for details.
These tests were failing on z/OS because the text input files were being
opened as binary.

```
FAIL: LLVM :: tools/dsymutil/AArch64/typedef-different-types.test
FAIL: LLVM :: tools/dsymutil/X86/mismatch.m
FAIL: LLVM :: tools/dsymutil/embed-resource.test
FAIL: LLVM :: tools/llvm-gsymutil/X86/elf-symtab-file.yaml
```
Open the files as text to solve the problems.
…erleave.ll (llvm#198666)

On the memory-interleave.ll test, some of the CHECK lines are failing on
z/OS due to a difference in rounding behaviour when printing the
Estimated cost per lane. Resolve this by removing the fractional part,
similar to what was done in the past with llvm@e8556ff and llvm@aeb88f6.
…egion. (llvm#199157)

This patch fixes a regression caused by llvm#198635: when we call getSource()
for a `fir.load` of a box we have to handle the input value that might be
a `BlockArgument` and pass-through it.
…lvm#197850)

Restore "Extend jump-threading to allow live local defs" llvm#135079. Long
compilation time with reduce.cu in hipcub/warp was partially addressed
in llvm#195744. Compilation time for reduce.cu with this PR (after llvm#195744)
is 6 minutes 40 seconds. Without (llvm#195744) compilation time was several
hours.

Long compilation time in reduce.cu was only exposed by jump-threading.
In my view the primary causes were due to inlining, SROA tripling the IR
code size, and SSA updating 26K phi-nodes resulting in an O(N^2) search
for duplicates. llvm#195744 limits phi search times.

This reverts commit a76750e.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
Was broken with

> when more than 1 dialect is present, one must be selected via
'-dialect'
igorban-intel and others added 21 commits May 22, 2026 14:09
…lvm#199232)

The intel_sub_group_block_write_ui[2,4,8] overloads for image2d_t were
declared with a read_only qualifier, both in opencl-c.h and in
OpenCLBuiltins.td. A write operation cannot target a read_only image,
and the base intel_sub_group_block_write together with the analogous
_us, _uc and _ul aliases all correctly use write_only image2d_t.

Per the cl_intel_subgroups_short [1], cl_intel_subgroups_char [2] and
cl_intel_subgroups_long [3] specifications, the _ui aliases are added
"for naming consistency [...] There is no change to the description or
behavior of these functions" relative to the cl_intel_subgroups base,
which uses write_only image2d_t for writes.

The typo was introduced in b833bf6 and preserved across all
later edits to this area.

Switch the qualifier from read_only to write_only in both opencl-c.h and
OpenCLBuiltins.td, and update intel-subgroups-builtins.cl to match the
corrected signature (the existing test was exercising the buggy
overload).

[1] https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups_short.html
[2] https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups_char.html
[3] https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups_long.html

Co-Authored-By: Claude Opus

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ins (llvm#199258)

Add cl_intel_subgroup_buffer_prefetch and cl_intel_subgroup_local_block_io
declarations to OpenCLBuiltins.td and cover them with header-free SPIR
tests.

This keeps the generated OpenCL builtins in sync with opencl-c.h for the
Intel subgroup buffer prefetch and local block I/O extensions.

Per the cl_intel_subgroup_local_block_io specification, the _ui local
aliases (intel_sub_group_block_read_ui*, intel_sub_group_block_write_ui*
with __local pointer) are declared under FuncExtIntelSubgroupLocalBlockIO
alone, without a char/short/long prerequisite. A dedicated test
(intel-subgroup-local-block-io-ui-without-char-short-long.cl) verifies
that they resolve when only cl_intel_subgroup_local_block_io is active.

Specification:

https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroup_buffer_prefetch.html

https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroup_local_block_io.html

Co-authored-by: Copilot
Padded CIR unions (e.g. libstdc++ `std::string` SSO layout) carry a
trailing byte-array member so the record matches the AST layout size.
`RecordType::getTypeSizeInBits` was returning only the largest-aligned
member and ignored that tail, so the CIR view of the union was 8 bytes
smaller than what `LowerToLLVM` emits.  Parent structs then picked up
a spurious trailing pad via `insertPadding`, arrays of those structs
used the wrong stride, and heap allocations could be overrun (Eigen's
`array_of_string` hits this directly).

The fix adds the padding member's size when the union is marked
`padded`, so struct size, GEP strides, and `new T[n]` allocation sizes
match OGCG.  Regression test models the SSO-shaped record and checks
the 96-byte `new` for three elements.
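
The size arithmetic can be sketched as follows. This is a simplified model, not the CIR implementation: the types and helpers are hypothetical, and the SSO-shaped numbers (a 16-byte `char` buffer overlapping an 8-byte capacity field, an 8-byte trailing pad, and 8-byte pointer and length fields on a 64-bit target) are assumptions chosen to be consistent with the 96-byte figure above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one union member.
struct MemberSketch {
  uint64_t sizeBytes;
  uint64_t alignBytes;
};

// Hypothetical model of a padded union: its real members plus an optional
// trailing byte-array pad.
struct PaddedUnionSketch {
  std::vector<MemberSketch> members;
  bool padded = false;
  uint64_t padBytes = 0;
};

// Size of the member with the largest alignment (larger size wins ties).
uint64_t largestAlignedMemberSize(const PaddedUnionSketch &u) {
  uint64_t bestAlign = 0, bestSize = 0;
  for (const MemberSketch &m : u.members) {
    if (m.alignBytes > bestAlign ||
        (m.alignBytes == bestAlign && m.sizeBytes > bestSize)) {
      bestAlign = m.alignBytes;
      bestSize = m.sizeBytes;
    }
  }
  return bestSize;
}

// Models the behaviour described as buggy: the trailing pad is ignored.
uint64_t unionSizeIgnoringPad(const PaddedUnionSketch &u) {
  return largestAlignedMemberSize(u);
}

// Models the fix: add the pad when the union is marked padded.
uint64_t unionSizeWithPad(const PaddedUnionSketch &u) {
  return largestAlignedMemberSize(u) + (u.padded ? u.padBytes : 0);
}

// Assumed SSO-shaped union: char[16] (align 1) overlapping a u64 (align 8).
PaddedUnionSketch ssoUnion() {
  PaddedUnionSketch u;
  u.members = {MemberSketch{16, 1}, MemberSketch{8, 8}};
  u.padded = true;
  u.padBytes = 8;
  return u;
}

// Assumed record: 8-byte pointer + 8-byte length + the union.
uint64_t ssoRecordSize() { return 8 + 8 + unionSizeWithPad(ssoUnion()); }
```

Under these assumptions the union is 8 bytes without the pad and 16 bytes with it, giving a 32-byte record and 3 * 32 = 96 bytes for a three-element array.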
CMake does not properly parse IN_LIST within the if() condition and
treats it as a plain token, which is not the desired behavior.
The CMP0057 policy enables support for the
[if() IN_LIST](https://cmake.org/cmake/help/latest/command/if.html#command:if)
operator.
Enable this policy to resolve the build error.


Fixes llvm#199282
Assisted by: Github Copilot
…access (llvm#199087)

This job checks out untrusted code from a PR in a trusted context
(issue_comment trigger), so we need to limit it to people with commit
access to avoid possible privilege escalation.
The libclc standalone build puts libclc.bc in the
${CMAKE_CURRENT_BINARY_DIR}/${TARGET_TRIPLE} dir. check-libclc fails
because the .cl test looks for libclc in the clang resource dir.
Fix this by adding the `--libclc-lib=:{path}` flag for the standalone
build, where `path` is the path to libclc.bc.
Note: this flag is not used in the in-tree build.
… ScopedHashTable traversal (llvm#196746)" (llvm#199288)

This reverts commit 371f57c due to
failing tests
Normally the open parens happen right before a.out, but on arm64e the
load address is placed there instead. So instead of:

$0 = 0x0000d00d (a.out...)

we instead have:

$0 = 0xcafed00d (actual=0x0000d00d a.out ...)
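
One way to accept both output shapes is sketched below. This is an illustration only, not the actual test change: the helper name and the regular expression are assumptions, built from the two sample lines above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: matches the expression result whether or not an
// "actual=0x..." field sits between the open paren and a.out.
bool matchesExpressionResult(const std::string &line) {
  static const std::regex pattern(
      R"(\$0 = 0x[0-9a-f]+ \((actual=0x[0-9a-f]+ )?a\.out)");
  return std::regex_search(line, pattern);
}
```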
…vm#199169)

This makes check-clang-format automatically build
clang-format-check-format, which checks that the new clang-format
doesn't break the existing format of the clang-format source.
…dVPValuesInPlan tests (llvm#199275)

llvm#195891 exposed a
use-after-free in the tests: `BinaryOperator *AI` [*] is deleted prior
to VPlan's destructor, which expects all the operands to still be alive.
This patch fixes the test (suggested by a Florian in
llvm#199252 (review)),
by preemptively detaching AI from the VPlan.

[*] No AI was harmed or used during the creation of this patch.
…llvm#198652)

This PR extends xegpu.load_matrix and xegpu.store_matrix to support 1D
mem_desc for contiguous SLM access:
- Added unit tests for 1D load/store (valid ops and invalid cases)
- Added an integration test verifying that both 1D (<4096xbf16>) and 2D
(<64x128xbf16>) correctly lower through the full WG→SG→WI→XeVM pipeline

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…)) (llvm#199281)

`captures(none)` has very restrictive semantics and is an easy footgun
for accidentally introducing UB into your code. Most significantly, it
does not allow any visible side effects of whether a pointer was null or
not to escape the function. This means that the function cannot perform
different side effects depending on whether a pointer marked `noescape`
is null. Relax this to `captures(address)`, which allows information
about the numerical address to escape the function, but no provenance
(i.e. nothing that could be dereferenced) may escape.

As discussed in
https://discourse.llvm.org/t/rfc-updating-the-semantics-of-the-noescape-attribute/90326.
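
A small sketch of the kind of function this relaxation is meant to keep well-defined, based only on the description above. The macro, function and globals are hypothetical; the attribute is applied only when compiling with clang.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: use the attribute only where the compiler is clang.
#if defined(__clang__)
#define NOESCAPE_SKETCH __attribute__((noescape))
#else
#define NOESCAPE_SKETCH
#endif

int g_null_calls = 0;
std::uintptr_t g_last_address = 0;

int readOrCount(NOESCAPE_SKETCH const int *p) {
  if (!p) {
    // Side effect that depends on whether the pointer is null: this is the
    // pattern described above as a footgun under captures(none).
    ++g_null_calls;
    return 0;
  }
  // Only the numerical address is recorded; the pointer itself is not
  // stored anywhere, so nothing dereferenceable escapes.
  g_last_address = reinterpret_cast<std::uintptr_t>(p);
  return *p;
}
```

Storing `p` itself in a global would still let something dereferenceable escape, which `captures(address)` does not permit according to the description above.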
…ng getVectorElementCount (llvm#199286)

Fixes the assert reported here:

<llvm#198446 (comment)>

I believe this happens when the element type isn't a legal RVV element
type and so has been scalarised by type legalisation.

Adding this guard also matches the AArch64 implementation.

The test change is LLM generated.
…th fixed length vectors (llvm#199227)

Implement IRTranslator support for fixed-length vectors when the V
extension is used. This implementation works similarly to SelectionDAG's:
we use insert and extract subvector ops to get the fixed-length vectors
out of the scalable-length vectors.
…lysis (llvm#199208)

Add full CostKinds, to improve a lot of reduction matching in
vectorcombine/slp passes

These are based off SMIN/UMIN numbers, and a few SMAX/UMAX numbers don't
always match, but are typically within +/-1
@z1-cciauto


@dstutt

dstutt commented May 26, 2026

I think the conflict resolution is fine - but I'm not sure what's going on with #2638 and #2636

@ScottEgerton ScottEgerton left a comment

LGTM.

@mariusz-sikora-at-amd mariusz-sikora-at-amd merged commit 05cad89 into amd-debug May 27, 2026
6 checks passed
@mariusz-sikora-at-amd mariusz-sikora-at-amd deleted the amd/dev/masikora/amd-debug-merge-candidate branch May 27, 2026 09:07