Merge upstream llvm into amd-debug #2671
Merged
mariusz-sikora-at-amd merged 55 commits on May 27, 2026
…vm#199222) VPlanTransforms::convertToStridedAccesses calls VPWidenMemoryRecipe::computeCost, which uses VPTypeAnalysis in VPCostContext to infer the pointer type of the load address. However, CachedTypes in VPTypeAnalysis may be invalidated since earlier transformations in tryToBuildVPlan could erase recipes from the plan. This pollutes the cache with stale types. Fix this by creating a new VPCostContext locally scoped to convertToStridedAccesses, ensuring VPTypeAnalysis reflects the current plan state. This serves as a quick fix to prevent accidental reuse by future transformations.
Follow-up to llvm#198941, which introduced Locked<T> and SharedLocked<T>. Add GetObjectFileLocked, GetSymbolFileLocked, GetSymtabLocked, and GetSectionListLocked alongside the existing accessors. The locked variants cover two things:

1. They prevent the pointer from being swapped out from under the caller. The old getters take m_mutex only during lazy initialization and release it before returning. The unique_ptr or shared_ptr that owns the pointee can therefore be reassigned by another thread while the caller still holds the raw value. LockedPtr keeps the Module mutex held alongside the borrowed pointer, pinning the binding for the lifetime of the handle.
2. They serialize access to the pointee itself. This is not new: the classes in question were already relying on the Module mutex for synchronization.

Migrate the four call sites in Module where the existing pattern maps to a single LockedPtr. The legacy raw-pointer getters remain so call sites can migrate incrementally.
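The difference between the legacy getters and the locked variants can be modeled outside LLDB. The following is a minimal Python sketch, not the LLDB implementation: `Module`, `LockedRef`, `get_symtab`, `get_symtab_locked`, and `replace_symtab` are hypothetical names chosen for illustration. It assumes a recursive mutex guards the binding and shows a handle that keeps that mutex held for as long as the borrowed object is in use.

```python
import threading


class LockedRef:
    """Hypothetical handle: holds the owner's mutex while the object is borrowed."""

    def __init__(self, mutex, getter):
        self._mutex = mutex
        self._getter = getter

    def __enter__(self):
        # Acquire first, then read the binding, so it cannot change underneath us.
        self._mutex.acquire()
        return self._getter()

    def __exit__(self, exc_type, exc, tb):
        self._mutex.release()
        return False


class Module:
    """Hypothetical stand-in for an object whose members are guarded by one mutex."""

    def __init__(self, symtab):
        self._mutex = threading.RLock()
        self._symtab = symtab

    def get_symtab(self):
        # Legacy-style getter: the lock is released before the caller uses the value.
        with self._mutex:
            return self._symtab

    def get_symtab_locked(self):
        # Locked-style getter: the lock stays held for the lifetime of the handle.
        return LockedRef(self._mutex, lambda: self._symtab)

    def replace_symtab(self, new_symtab):
        with self._mutex:
            self._symtab = new_symtab
```

With `get_symtab()`, another thread may call `replace_symtab()` immediately after the getter returns. Inside `with module.get_symtab_locked() as symtab:`, that call blocks until the `with` block exits.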
…vm#199126) Most callers are unchanged, since they either ignore the specific error or have their own formatting of the error that includes both the path and the errorToErrorCode-unwrapped value. However, for clients that just forward the error it's helpful to ensure we do not lose track of the filename that the error is associated with, so use FileError. Incidentally remove two uses of errorToErrorCode that were being used instead of consumeError; in both cases getOptionalFileRef was more appropriate.
…results (llvm#199119) With layout conflict handling this case is no longer an issue.
…ing (llvm#199189) This prevents generating invalid C code in mixed-language headers by leaving `typedef` declarations inside `extern "C"` blocks intact by default. Fixes llvm#141394
…ameter mapping (llvm#195995)" (llvm#199228) This reverts commit 7e2821e, which causes a crash-on-valid in clang: llvm#199209
…essage (llvm#199233) Help track whether a fold was attempted or not
Implement `MemRefElementTypeInterface` on `fir::RecordType` so that `memref<!fir.type<…>>` verifies, enabling downstream passes to use memrefs of Fortran derived types.
Co-authored-by: <konstantinos.parasyris@intel.com>
Not profitable with VF=4, but we only try smaller VFs if the load can fit in a single vector register found by BoUpSLP::getVectorElementSize(). Requires propagation of bit widths through the fmuladd intrinsic to vectorize at VF=2. This is from the hot block in `538.imagick_r`, which fails to vectorize when vectorization is removed from pre-LTO; see llvm#195886 (comment).
Relax modular-format attribute validation in the Verifier to allow a first-arg-index of 0 (meaning no variadic arguments, e.g. for v-family functions like vsnprintf). Guard InstCombine's optimizeModularFormat against zero index. Generated by Gemini, reviewed by dthorn
This matches the name on SiFive's website.
Summary: These are stored in the libc/shared and have a unified CMake helper to find them. Likely these will be a more core dependency as LLVM uses them for constexpr math, libcxx uses it, and compiler-rt will probably use bits of it. The original intention was to allow building flang-rt with a partial checkout, but I don't think this is a reasonable use-case and I do not think this exists in practice.
There are a lot of similar and repetitive variants of SDK lookups in the Apple platform plugins. This commit unifies the implementations, error handling and progress reporting. Assisted-by: claude
…#199242) Similar to other VectorCombine folds, in case of OldCost == NewCost, use the reduction if at least the root BinOp is removed as well as the ExtractElement. Noticed while triaging codegen for llvm#199208
…8867) Annotations are not indexed, so we need to skip them in the verifier. Assisted by: claude
…Type (llvm#197331) DICompositeType already has an "Annotations" ivar. This simply adds a way to set it from the "createStructType" function.
D150880 (landed as 0726cb0) uses `APInt` to eliminate most integer overflow issues from FileCheck numeric variables. It also removes the 4 tests in `llvm/test/FileCheck/match-time-error-propagation`. While the elimination of overflow issues reduces the importance of those tests, the tests still seem worthwhile. Without them, I see no test that exercises the "unable to substitute variable or numeric expression: overflow error" diagnostic in FileCheck input dumps. This patch resurrects those tests and updates them to exercise the remaining unsigned underflow case.
…e-side var metadata, internalize device side variables, and lower poison attribute (llvm#190087) Signed-off-by: ZakyHermawan <zaky.hermawan9615@gmail.com>
This changes the documented semantics of the `noescape` attribute to disallow freeing the pointer, and allow escapes of the integer value of the memory address, as discussed in https://discourse.llvm.org/t/rfc-updating-the-semantics-of-the-noescape-attribute/90326. It also clarifies that the attribute may only be used to annotate the outermost pointer level of nested pointer parameters.
This PR is stacked on PR llvm#198136.

This patch refactors `llvm/test/FileCheck/dump-input/annotations.txt` to improve maintainability and coverage and to prepare for the upcoming implementation of search range annotations.

Lit substitutions
=================

The test repeats the same basic set of RUN lines *many* times. This patch encapsulates those in lit substitutions to improve maintainability. By doing so, it also helps to ensure more consistent coverage of all cases and thus slightly expands coverage.

-strict-whitespace
==================

Via those substitutions, this patch adds `-strict-whitespace` throughout the test, and it drops the initial `-strict-whitespace` case because it is then redundant. That causes many whitespace changes throughout the test, so this patch is easier to read with `git diff -w`.

When I originally wrote the test, I thought maintaining it would be too painful with `-strict-whitespace`. However, I now think it is important for usability to thoroughly check that annotations are correctly aligned with the input, especially given the upcoming search range annotations.

-dump-input-label-width
=======================

To address that anticipated maintenance pain, and to make the above change easier to implement, this patch also implements a new hidden FileCheck option, `-dump-input-label-width`. It enables tests like this one not to have to fuss with fluctuations in the label column width that are caused when varying the verbosity options. I do not anticipate this option will be used outside FileCheck's own test suite.

Splitting directive blocks
==========================

To improve readability, this patch splits apart directive blocks where the same annotations appear multiple times with small differences at different verbosity levels. See new header comments for details.
These tests were failing on z/OS because the text input files were being opened as binary.
```
FAIL: LLVM :: tools/dsymutil/AArch64/typedef-different-types.test
FAIL: LLVM :: tools/dsymutil/X86/mismatch.m
FAIL: LLVM :: tools/dsymutil/embed-resource.test
FAIL: LLVM :: tools/llvm-gsymutil/X86/elf-symtab-file.yaml
```
Open the files as text to solve the problems.
…erleave.ll (llvm#198666) In the memory-interleave.ll test, some of the CHECK lines fail on z/OS due to a difference in rounding behaviour when printing the Estimated cost per lane. Resolve this by removing the fractional part, similar to what was done in the past with llvm@e8556ff and llvm@aeb88f6.
…egion. (llvm#199157) This patch fixes a regression caused by llvm#198635: when we call getSource() for a `fir.load` of a box, we have to handle an input value that might be a `BlockArgument` and pass through it.
…lvm#197850) Restore "Extend jump-threading to allow live local defs" llvm#135079. Long compilation time with reduce.cu in hipcub/warp was partially addressed in llvm#195744. Compilation time for reduce.cu with this PR (after llvm#195744) is 6 minutes 40 seconds. Without (llvm#195744) compilation time was several hours. Long compilation time in reduce.cu was only exposed by jump-threading. In my view the primary causes were due to inlining, SROA tripling the IR code size, and SSA updating 26K phi-nodes resulting in an O(N^2) search for duplicates. llvm#195744 limits phi search times. This reverts commit a76750e. --------- Signed-off-by: John Lu <John.Lu@amd.com>
Was broken with
> when more than 1 dialect is present, one must be selected via '-dialect'
…lvm#199232) The intel_sub_group_block_write_ui[2,4,8] overloads for image2d_t were declared with a read_only qualifier, both in opencl-c.h and in OpenCLBuiltins.td. A write operation cannot target a read_only image, and the base intel_sub_group_block_write together with the analogous _us, _uc and _ul aliases all correctly use write_only image2d_t. Per the cl_intel_subgroups_short [1], cl_intel_subgroups_char [2] and cl_intel_subgroups_long [3] specifications, the _ui aliases are added "for naming consistency [...] There is no change to the description or behavior of these functions" relative to the cl_intel_subgroups base, which uses write_only image2d_t for writes. The typo was introduced in b833bf6 and preserved across all later edits to this area. Switch the qualifier from read_only to write_only in both opencl-c.h and OpenCLBuiltins.td, and update intel-subgroups-builtins.cl to match the corrected signature (the existing test was exercising the buggy overload). [1] https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups_short.html [2] https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups_char.html [3] https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups_long.html Co-Authored-By: Claude Opus --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ins (llvm#199258) Add cl_intel_subgroup_buffer_prefetch and cl_intel_subgroup_local_block_io declarations to OpenCLBuiltins.td and cover them with header-free SPIR tests. This keeps the generated OpenCL builtins in sync with opencl-c.h for the Intel subgroup buffer prefetch and local block I/O extensions. Per the cl_intel_subgroup_local_block_io specification, the _ui local aliases (intel_sub_group_block_read_ui*, intel_sub_group_block_write_ui* with __local pointer) are declared under FuncExtIntelSubgroupLocalBlockIO alone, without a char/short/long prerequisite. A dedicated test (intel-subgroup-local-block-io-ui-without-char-short-long.cl) verifies that they resolve when only cl_intel_subgroup_local_block_io is active. Specification: https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroup_buffer_prefetch.html https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroup_local_block_io.html Co-authored-by: Copilot
Padded CIR unions (e.g. libstdc++ `std::string` SSO layout) carry a trailing byte-array member so the record matches the AST layout size. `RecordType::getTypeSizeInBits` was returning only the largest-aligned member and ignored that tail, so the CIR view of the union was 8 bytes smaller than what `LowerToLLVM` emits. Parent structs then picked up a spurious trailing pad via `insertPadding`, arrays of those structs used the wrong stride, and heap allocations could be overrun (Eigen's `array_of_string` hits this directly). The fix adds the padding member's size when the union is marked `padded`, so struct size, GEP strides, and `new T[n]` allocation sizes match OGCG. Regression test models the SSO-shaped record and checks the 96-byte `new` for three elements.
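The size arithmetic described above can be checked with a small model. This is a sketch under stated assumptions, not the CIR code: the member list (a 16-byte, 1-aligned local buffer and an 8-byte, 8-aligned capacity field), the 8-byte tail pad, and the two 8-byte leading fields are an assumed 64-bit SSO-shaped layout, and `Member`, `largest_aligned_member_size`, and `union_size` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    size: int   # size in bytes
    align: int  # alignment in bytes


def largest_aligned_member_size(members):
    # Pick the member with the highest alignment (ties broken by size).
    best = max(members, key=lambda m: (m.align, m.size))
    return best.size


def union_size(members, tail_pad=0):
    # With tail_pad=0 this mirrors the behavior described as buggy;
    # passing the padding member's size mirrors the described fix.
    return largest_aligned_member_size(members) + tail_pad


# Assumed 64-bit SSO-shaped union: the storage member is the 8-byte field,
# and a trailing byte array pads the record up to the 16-byte AST size.
SSO_UNION = [Member("local_buf", 16, 1), Member("capacity", 8, 8)]
TAIL_PAD = 8

size_without_tail = union_size(SSO_UNION)          # tail ignored
size_with_tail = union_size(SSO_UNION, TAIL_PAD)   # tail included

# Assumed enclosing record: pointer (8) + length (8) + padded union.
record_size = 8 + 8 + size_with_tail
allocation_for_three = 3 * record_size
```

Under these assumptions the union is 8 bytes short when the tail is ignored, and three records of the corrected size need 96 bytes, which is consistent with the numbers quoted in the commit message.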
CMake does not properly parse IN_LIST within the if condition, and treats it as a token. This is not desired behavior. The CMP0057 policy supports the new [if() IN_LIST](https://cmake.org/cmake/help/latest/command/if.html#command:if) operator. Enable this policy and resolve the build error. Fixes llvm#199282 Assisted by: GitHub Copilot
…access (llvm#199087) This job checks out untrusted code from a PR in a trusted context (issue_comment trigger), so we need to limit it to people with commit access to avoid possible privilege escalation.
The libclc standalone build puts libclc.bc in the ${CMAKE_CURRENT_BINARY_DIR}/${TARGET_TRIPLE} dir. check-libclc fails because the .cl test looks for libclc in the clang resource dir.
Fix this by adding the `--libclc-lib=:{path}` flag for the standalone build, where `path` is the path to libclc.bc.
Note: this flag is not used in the in-tree build.
… ScopedHashTable traversal (llvm#196746)" (llvm#199288) This reverts commit 371f57c due to failing tests
Normally the open parens happen right before a.out, but on arm64e the load address is placed there instead. So instead of:
    $0 = 0x0000d00d (a.out...)
we instead have:
    $0 = 0xcafed00d (actual=0x0000d00d a.out ...)
…vm#199169) This makes check-clang-format automatically build clang-format-check-format, which checks that the new clang-format doesn't break the existing format of the clang-format source.
…dVPValuesInPlan tests (llvm#199275) llvm#195891 exposed a use-after-free in the tests: `BinaryOperator *AI` [*] is deleted prior to VPlan's destructor, which expects all the operands to still be alive. This patch fixes the test (suggested by a Florian in llvm#199252 (review)), by preemptively detaching AI from the VPlan. [*] No AI was harmed or used during the creation of this patch.
…llvm#198652) This PR extends xegpu.load_matrix and xegpu.store_matrix to support 1D mem_desc for contiguous SLM access.
- Added unit tests for 1D load/store (valid ops and invalid cases)
- Added an integration test verifying that both 1D (<4096xbf16>) and 2D (<64x128xbf16>) cases lower correctly through the full WG→SG→WI→XeVM pipeline

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…)) (llvm#199281) `captures(none)` has very restrictive semantics and is an easy footgun for accidentally introducing UB into your code. Most significantly, it does not allow any visible side effects of whether a pointer was null or not to escape the function. This means that the function cannot perform different side effects depending on whether a pointer marked `noescape` is null. Relax this to `captures(address)`, which allows information about the numerical address to escape the function, but no provenance (i.e. nothing that could be dereferenced) may escape. As discussed in https://discourse.llvm.org/t/rfc-updating-the-semantics-of-the-noescape-attribute/90326.
…ng getVectorElementCount (llvm#199286) Fixes the assert reported here: <llvm#198446 (comment)> I believe this happens when the element type isn't a legal RVV element type and so has been scalarised by type legalisation. Adding this guard also matches the AArch64 implementation. The test change is LLM generated.
…th fixed length vectors (llvm#199227) Implement IRTranslator support for fixed length vectors when the V extension is used. This implementation works similarly to SelectionDAG's. We use insert and extract subvector ops to get the fixed length vectors out of the scalable length vectors.
…lysis (llvm#199208) Add full CostKinds to improve a lot of reduction matching in the VectorCombine/SLP passes. These are based off SMIN/UMIN numbers; a few SMAX/UMAX numbers don't always match, but are typically within +/-1.
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/6/builds/201
New code from upstream; in amd-debug we have one extra parameter in this function, plus a test update (mostly metadata).