[pull] main from llvm:main#5794
The existing EnumEntry stores strings using StringRefs, which are large and require two relocations per entry. Introduce a new, compact enum string representation that stores strings relative to the enum entries in memory, allowing low-overhead, relocation-free storage. Unfortunately, the enum definitions have to be written into a separate constexpr variable; only C++20 supports structural template parameters. It is also not possible to hide this behind a macro, because we want enum entries to be sourced from other files and #include cannot occur during a macro expansion. When all uses of EnumEntry are ported to the new representation, this will save 4.7k relocations on libLLVM.so (3% in an all-target assert build), resulting in faster startup and lower max-rss, as these rarely used pages don't need to be touched at all anymore.
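To illustrate why an offset-based table avoids relocations, here is a minimal sketch. It is not the representation introduced by this commit: it swaps in a simpler technique (integer offsets into one shared character blob rather than strings stored relative to the entries), and every name in it (`ColorNames`, `CompactEntry`, `ColorTable`, `lookupName`) is hypothetical.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical sketch, not LLVM's actual table layout.
// All names live in one constant character blob, so the entries below hold
// only integers and contain no pointers that would need dynamic relocations.
inline constexpr char ColorNames[] = "Red\0Green\0Blue";

struct CompactEntry {
  std::uint16_t NameOffset; // offset of the name within ColorNames
  std::uint16_t NameLength; // length of the name, excluding the terminator
  std::uint32_t Value;      // enum value
};

inline constexpr CompactEntry ColorTable[] = {
    {0, 3, 1},  // "Red"
    {4, 5, 2},  // "Green"
    {10, 4, 4}, // "Blue"
};

// Rebuild a string view from the offset and length stored in an entry.
constexpr std::string_view nameOf(const CompactEntry &E) {
  return std::string_view(ColorNames + E.NameOffset, E.NameLength);
}

// Linear search by value; returns an empty view if the value is unknown.
constexpr std::string_view lookupName(std::uint32_t Value) {
  for (const CompactEntry &E : ColorTable)
    if (E.Value == Value)
      return nameOf(E);
  return {};
}
```

A pointer-plus-length entry would need the loader to patch each pointer in a position-independent library; the integer-only entries above can stay in read-only pages that are never touched unless a lookup happens.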
This PR has an [RFC](https://discourse.llvm.org/t/rfc-filecheck-improving-input-dump-readability/91112). It is stacked on PR #199063.

Example
=======

```
$ cat check
CHECK: start
CHECK-NEXT: end

$ FileCheck -v -dump-input-context=2 check < input |& tail -23
<<<<<<
1: start
check:1 ^~~~~
next:2'0 { search range start (exclusive)
2: foo0
3: foo1
.
.
.
21: foo19
22: foo20
23: end
next:2'1 !~~ error: match on wrong line
24: bar0
25: bar1
.
.
.
42: bar18
43: bar19
44: bar20
next:2'2 } search range end (exclusive)
>>>>>>
```

Without this patch, input lines 1-3 and 42-44 are not shown. However, lines 1-3 are where the actual problem is because that is where the `CHECK-NEXT` directive was expected to match but did not.

Search Ranges Are Helpful
=========================

In general, this patch marks any failed pattern's search range by using the annotation style shown above, and these annotations are filtered in when using `-dump-input-filter=error`, which is the default filter.

Seeing the search range can be helpful for understanding the pattern's behavior. Moreover, the cause of the pattern failure is often the input at the start or end of the search range. For example:

- A `CHECK-NEXT` or `CHECK-SAME` match on the wrong line, as in the above example.
- A `CHECK-NOT` unexpected match because a neighboring directive matched at an unexpected point, affecting the search range.
- An unmatched `CHECK` because a subsequent `CHECK-LABEL` matched at an unexpected point, affecting the search range end. (In this case, the search range start and thus the prior directive's match is already revealed without this patch.)

This patch updates tests in `llvm/test/FileCheck/dump-input/search-range-annotations`, which demonstrate its benefit for those cases.

This patch is a replacement for D96653, which attempted to address the above cases but in a less straightforward and somewhat broken manner. The idea of the current patch was discussed during that review.
Replacing `X~~`
===============

Without this patch, search ranges for unmatched patterns (whether a success, as for `CHECK-NOT`, or a failure, as for `CHECK`) are already marked with `X~~`. In those cases, this patch replaces those annotations with the new annotations shown above. As described above, this patch adds the new annotations to all other failed matched patterns as well. `X~~` is thus no longer used by `-dump-input`.

The `X~~` style is very noisy, especially for consecutive unmatched unexpected patterns (like a `CHECK-NOT` block), all of which mark every line of their identical search ranges. Switching to the new style significantly reduces the noise in such cases. That noise was discussed recently in issue #77257 and PRs linked from there, and that discussion led to the resurrection of this patch.

Without this patch, `-dump-input-filter=error` filters in only the start of a search range that spans multiple lines because an entire `X~~` is one annotation. With this patch, it filters in both the start and end of a search range because they are separate annotations (or a single one-liner, as discussed below).

With or without this patch, a different argument to `-dump-input-filter` (and `-vv`) is required to filter in the search range for an unmatched unexpected pattern, like `CHECK-NOT`, because that is not an error.

One-Liners
==========

If a search range does not involve multiple input lines, this patch keeps the `{` and `}` markers on the same output line like this:

```
<<<<<<
1: start foo end
check:1 ^~~~~
check:3 ^~~
not:2 { } search range (exclusive bounds)
>>>>>>
```

Exclusive Boundaries
====================

`{` and `}` are to be interpreted as exclusive bounds. That is, the characters at those markers are not included in the search range, but everything in between is. To try to avoid confusion, this patch adds the word "exclusive" in every search range annotation.
When the search range starts or ends at a line boundary, the marker cannot be placed at the first or last character of the line because that would exclude that character. This patch instead places the marker in the margin of the input dump, either before the line's first character or after the line's last character (usually a newline), rather than on the adjacent line. I have found this makes the annotations easier to read (more apt to have a one-liner, at least), and I do not think it would make much sense to move a start annotation to an imaginary line 0. For example, the `{` and `}` below appear in the input dump margins, before the first character, `s`, and after the space representing the newline:

```
<<<<<<
1: start
check:1'0 { } search range (exclusive bounds)
check:1'1 error: no match found in search range
>>>>>>
```

Before trying exclusive bounds, I experimented with notations involving inclusive bounds (e.g., `[` and `]`, or `[` and `)`). Such notations either cannot distinguish empty ranges (without a cryptic inversion like `][`) from one-character ranges, or they cannot represent them with one-liners (because they must occupy the same column), increasing verbosity. I ultimately decided I prefer exclusive bounds because they are visually and semantically symmetric while consistently concise and unambiguous.

For comparison, match ranges (e.g., `^~~`) cannot distinguish one-character ranges from empty ranges. However, in my experience, empty match ranges are uncommon, and usually it is easy to distinguish them based on the pattern or directive (e.g., `CHECK-EMPTY`). In contrast, empty search ranges can occur repeatedly when directive matches are adjacent and `-implicit-check-not` is used.
…206053) This fixes the case where GFX7 fails expensive checks/machine verification with GISel due to passing a literal directly to V_MIN that is not supported on the architecture. This fixes the buildbot failure: https://lab.llvm.org/buildbot/#/builders/187/builds/21241 caused by #202680.
The embedded compressed payload is in little endian, and offload assumes that host endianness is used. Skip the test if host endianness is not little endian. Alternative to #205822.
) The const and pure attributes may only be applied to a function declaration. However, we were missing a subject list for the attributes and so we would silently accept and retain the attribute on any kind of declaration. Empirical testing suggests that these attributes are not effective with Objective-C method calls or indirect calls and so the subject is limited to just function declarations.
…ments. (#165278) FreeBSD coredumps use program headers to store mmap information. It is possible for a program to use more than PN_XNUM mmaps. Therefore, we implement support for PN_XNUM in readelf.
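For context, the ELF gABI handles an overflowing program header count with an escape value: when `e_phnum` equals `PN_XNUM` (0xffff), the real count is stored in the `sh_info` field of the section header at index 0. The sketch below shows only that rule; the function name and parameters are hypothetical and are not taken from the readelf change.

```cpp
#include <cstdint>
#include <optional>

// Escape value defined by the ELF gABI for e_phnum.
constexpr std::uint16_t kPnXnum = 0xffff;

// Hypothetical helper: resolve the real program header count.
// EPhnum       - the e_phnum field from the ELF header.
// Section0Info - sh_info of section header 0, if that header is available.
// Returns std::nullopt when e_phnum is PN_XNUM but section header 0 is missing,
// because the real count cannot be recovered in that case.
std::optional<std::uint32_t>
programHeaderCount(std::uint16_t EPhnum,
                   std::optional<std::uint32_t> Section0Info) {
  if (EPhnum != kPnXnum)
    return EPhnum; // ordinary case: the header field is the count
  return Section0Info; // extended case: count lives in section header 0
}
```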
The X86 backend then lowers the shuffle through lowerV16F32Shuffle / lowerV8F64Shuffle, which fall through to lowerShuffleWithPERMV (VPERMPS / VPERMPD). lowerShuffleAsVALIGN is asserted on i32 / i64 element types only and is never called from the float-domain paths, even when the mask is a clean concatenate-and-shift that VALIGN expresses exactly. On znver5, VALIGN and VPERMPS / VPERMPD have identical latency (5 cycles for zmm), throughput (2), and macro-op count (1). The real cost of VPERMPS / VPERMPD is the extra zmm register required to hold the permutation index vector. The intrinsic path for _mm512_alignr_epi32 also gets a vperm. It's a win in the generic path as well: vpermps zmm1, zmm0, zmm3 requires a dedicated zmm register to hold the permutation index vector, whereas valignd zmm1, zmm3, zmm3, 1 encodes the rotation count as an immediate (imm8 = 1), using no extra registers. Co-authored-by: Shivanshu
#205761) This patch updates the ThreadSanitizer documentation in clang/docs by documenting the run-time flags and suppressions, which was requested in google/sanitizers#446. Specifically:

- Adds a "Run-time Flags" section detailing common options that can be passed in TSAN_OPTIONS (e.g. exitcode, log_path, history_size, halt_on_error, report_atomic_races, etc.).
- Explains how to print the full list of options using help=1.
- Adds a "Suppressions" section documenting the syntax, wildcard rules, and types of runtime suppressions (race, thread, called_from_lib) with a practical example suppressions file.
- Adds compile-time ignorelist code examples.
- Documents limitations with C++ Exceptions, non-instrumented code, and GDB/ASLR issues.
- Removes outdated references to the archived sanitizers wiki.
Co-Authored-By: Claude <noreply@anthropic.com>
…alues" (#206034) Reverts #205657 The original commit was causing pre-merge CI to fail for AArch64, as one of the tests expects stepping behaviour that is not seen on AArch64 targets; the test suite containing the failing test is meant to be configured to not run for AArch64, but the unsupported label was not being applied, due to an error in the unsupported check. This patch fixes the unsupported check in scripts/lit.local.cfg, which should prevent further errors.
Run `acc-bind-routine` on `FunctionOpInterface` and rewrite calls to bound symbols in offload regions and `gpu.func`. For string bind names, declare private functions in the enclosing `gpu.module` symbol table when the call is inside device code.
Follow up to #200414 [comment](#200414 (comment)) to add explicit `-global-isel` flag to mixed tests.
…ctions (#205612) The default max interleave factor is 2. Increasing it to 4 universally can spend a significant amount of code size on something that does not always increase performance (especially if the loop gets over-unrolled). Small reduction loops often benefit from extra interleaving due to the multiple independent streams that can execute in parallel. This patch increases the max interleave factor to 4 for such loops, limited to where the VF is <= 4 to limit the impact for already highly vectorized loops.
To handle bitcode inputs that are not in individual files on disk, such as members of non-thin archives, DTLTO serializes those inputs to temporary individual bitcode files. This patch changes LLVM to serialize only uncached input modules and any modules they import from. For a link of Clang 22 (debug build with sanitizers and instrumentation), I performed measurements with and without this patch for an optimized toolchain (PGO non-LTO, based on recent main commit c264e07). The measurements were run on: - Windows 11 Pro build 26200, AMD Family 25 at approximately 4.5 GHz, 16 cores / 32 threads, and 64 GB RAM. - Ubuntu 24.04.3 LTS, Ryzen 9 5950X with 32 threads, and 62 GiB RAM. There was no difference in serialization time when the cache was disabled. When the cache was enabled and all compilations hit in the cache, serialization was eliminated, as was the time spent deleting the previously serialized temporary files, which are no longer created. Mean wall-clock time improved by about 10% on both machines in this scenario.
The horizontal reduction reuse-counter scale is built in getRootNodeScalars() order and applied positionally to the emitted reduction vector. For a root node with copyable elements the scalar order is reordered while the emitted lanes still follow the reduced values (candidates) order, so the repeat count was applied to the wrong lane, producing a wrong reduction result. Fixes #205614 Reviewers: Pull Request: #206102
Instead of storing a pointer+value pair, use the new enum tables to store the same information more compactly and without dynamic relocations.
…#205952) The iteration order of DenseSet is not guaranteed, which affects the output of code generated with GVNSink enabled. This can cause code to be emitted in differing order, affect section ordering, and in some cases was reported to result in larger binaries due to increased padding between sections. This patch addresses this by using SetVector, which has a deterministic iteration order.
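The idea behind a SetVector-style container can be sketched with standard library types: a hash set answers membership queries, while a vector records insertion order and is the only thing ever iterated. This is an illustrative stand-in with a hypothetical class name, not `llvm::SetVector` itself and not the GVNSink code.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical minimal analogue of a set with deterministic iteration order.
template <typename T> class InsertionOrderedSet {
  std::vector<T> Order;       // elements in first-insertion order
  std::unordered_set<T> Seen; // membership only; never iterated

public:
  // Returns true if V was newly inserted, false if it was already present.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  // Iteration walks the vector, so the order depends only on the sequence
  // of insert() calls and not on hashing or pointer values.
  auto begin() const { return Order.begin(); }
  auto end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }
};
```

The cost is storing each element twice, which is the usual trade for reproducible output.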
…to ensure middle-end is creating reduction intrinsics (#206101) AVX512 is missing a llvm.vector.reduce.add.v16i32 call - will investigate
Reading a global or static variable on a Wasm target produced a wrong value (or none at all). Two Wasm-only bugs combined to break it, both of which need to be fixed to support `target variable` / `frame var`. 1. DWARFExpression::Evaluate special-cased DW_OP_addr and DW_OP_addrx/DW_OP_GNU_addr_index on Wasm to push a LoadAddress, based on the theory that "Wasm file sections aren't mapped into memory". But a DW_OP_addr operand denotes a location in the module's address space, i.e. a file address like on every other target. Forcing a load address breaks the static (no-process) read path, since a file section cannot be read as a load address. 2. ObjectFileWasm::SetLoadAddress mapped every section with `load_address | GetFileOffset()`. For an active data segment that Object-tags the address (top bit = code space) and uses the file offset instead of the segment's linear-memory address, so a live read of a data global resolved to a garbage address in the wrong space. Address (1) by dropping the incorrect special casing. Address (2) by mapping data sections into the Memory space at their linear VM address while preserving the module id. Code and other sections keep their Object-space module-offset addressing.
When reading shared cache libraries out of lldb's own memory (the default, eSymbolSharedCacheUseHostAndInferiorSharedCache), the dyld introspection path built a plain DataExtractor spanning an image's [minVmAddr, maxVmAddr). A shared cache image's segments may not be contiguous: other images' data and unmapped guard pages may lie between them. Take advantage of the VirtualDataExtractor with a per-segment lookup table instead, matching the map_shared_cache_binary_segments path, so reads are confined to mapped segments.
Avoid incompatible declarations, which are problematic with MSVC.
#205939) `ACCRecipeMaterialization` can replace the placeholder with the actual variable name when materializing the recipe. Assisted-by: Claude Code
Instead of manually calculating the size and alignment of a union, we can just generate an actual union and take the size and alignment of that. Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
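A small sketch of the approach, using hypothetical member types `Small` and `Wide` that do not come from the patch: declaring a real union lets the compiler produce the layout, and `sizeof` / `alignof` then report it without any hand-written rounding arithmetic.

```cpp
#include <cstddef>

// Hypothetical member types chosen so that size and alignment differ.
struct Small {
  char Bytes[9];
};
struct Wide {
  long long Value;
};

// The compiler computes the layout; no manual max/round-up is needed.
union Storage {
  Small S;
  Wide W;
};

constexpr std::size_t StorageSize = sizeof(Storage);
constexpr std::size_t StorageAlign = alignof(Storage);

// Properties that hold for any union, regardless of the target ABI.
static_assert(StorageSize >= sizeof(Small) && StorageSize >= sizeof(Wide));
static_assert(StorageSize % StorageAlign == 0);
```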
…efs (#195877) It fixes the following case: ``` vector.transfer_read %arg0[], %0 : memref<f16>, vector<f16> ```
The macro is only required inside `<fstream>`, so we can move it there instead of having it as a general configuration macro.
Added a guard so the structured pack transform reports a normal tiling failure when the target has already been bufferized, instead of reaching a tensor-only path and asserting. Fixes #205744
…6107) `LLDB_LAUNCH_FLAG_USE_PIPES=1` is used in tests to run lldb without the ConPTY on Windows. This reduces the flakiness of tests. This patch ensures that we read the value of `LLDB_LAUNCH_FLAG_USE_PIPES` when setting up gdbremote tests, to make sure they don't use the ConPTY. This fixes `tools/lldb-server/TestGdbRemote_qThreadStopInfo.py` on https://ci-external.swift.org/job/lldb-windows/job/main/.
`std::countr_zero` can be used instead, which is a standard API.
This implementation detects a use-after-move for the three-argument std::move on containers. This PR fixes #137157.

Since my current implementation uses `IteratorModeling`, which is in the alpha stage, I am marking this PR as a draft. When both `IteratorModeling` and `MoveChecker` are enabled, my implementation detects the use-after-move for the three-argument std::move case.

```cpp
std::move(l1.begin(), l1.end(), std::back_inserter(l2));
std::cout << "l1: " << *l1.cbegin() << '\n'; // <--- should have a use-after-move
```

```text
move_iterator.cpp:14:28: warning: Method called on moved-from object 'l1' of type 'std::list' [cplusplus.Move]
14 | std::cout << "l1: " << *l1.cbegin() << '\n'; // <--- should ...
| ^~~~~~~~~~~~
```

`evalCall` models the three-argument `std::move` pattern and marks the source container in `TrackedContentsMap` to avoid false positives on safe method calls. In `checkPreCall` I recover the iterator's container through `getIteratorPosition` and check it against `TrackedContentsMap` to emit the warning.

I have been thinking about alternative solutions that do not depend on `IteratorModeling`, but I think it would save time to ask maintainers about possible solutions before I start my own implementation.
When constructing the dependency graph for compilation caching, the dependency scanner needs to do some extra operations on the compiler invocations. Historically, these have not utilized the copy-on-write variant well. This patch takes care to minimize `CompilerInvocation` copies, which improves incremental scans with a populated, up-to-date scanning module cache by 16-18%. Together with #203350, which operates in the same space, wall-times are improved by 1.54x and instruction counts by 1.66x.
…ompiler.h> (#205590) These macros are essentially there to query compiler features, so they should be moved into `<__configuration/compiler.h>`.
Towards #172124 Co-authored-by: Hristo Hristov <zingam@outlook.com>
Resolve the names of CRITICAL constructs even if they are reserved names. This also limits locator parsing to known reserved names. Fixes #205855
…5864) Prepares for `AF_UNIX` domain-socket support on Windows by separating the cross-platform socket logic from the one platform-specific operation. Every domain-socket operation is identical on POSIX and Windows (via `<afunix.h>`), so it now lives in a single base class `DomainSocket`. The one operation that is different is `CreatePair()`. It lives in `DomainSocketPosix` / `DomainSocketWindows`. It's selected for the host as `DomainSocketPlatform` through `lldb/Host/DomainSocket.h`. This is an NFC patch: POSIX behavior is unchanged, and while the shared code now also compiles on Windows it stays unreachable there. A follow-up commit enables it. rdar://180736036
…206129) Host::OpenURL was only defined for Darwin (in Host.mm). Add a portable implementation in the common Host.cpp: on Unix it launches xdg-open; on Windows it returns "unsupported" for now. xdg-open is run without a shell (run_in_shell=false) so query-string metacharacters in the URL are never interpreted by the shell. Also add Host::URLEncode, an RFC 3986 percent-encoder for assembling tracker URLs. These are the building blocks for an upcoming "diagnostics report" command that opens a pre-filled bug URL, and the encoder is shared with a downstream tap-to-radar reporter.
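As a rough illustration of what an RFC 3986 percent-encoder does, the sketch below passes through the unreserved set (letters, digits, `-`, `.`, `_`, `~`) and escapes every other byte as `%XX` with uppercase hex digits. The function name `percentEncode` and its signature are assumptions made for this example; they are not the actual `Host::URLEncode` interface.

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch of RFC 3986 percent-encoding; not LLDB's implementation.
std::string percentEncode(std::string_view In) {
  static constexpr char Hex[] = "0123456789ABCDEF";
  std::string Out;
  Out.reserve(In.size());
  for (unsigned char C : In) {
    // RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
    const bool Unreserved = (C >= 'A' && C <= 'Z') ||
                            (C >= 'a' && C <= 'z') ||
                            (C >= '0' && C <= '9') || C == '-' || C == '.' ||
                            C == '_' || C == '~';
    if (Unreserved) {
      Out.push_back(static_cast<char>(C));
    } else {
      // Everything else becomes '%' followed by two uppercase hex digits.
      Out.push_back('%');
      Out.push_back(Hex[C >> 4]);
      Out.push_back(Hex[C & 0x0F]);
    }
  }
  return Out;
}
```

Encoding each query value this way before assembling the URL keeps characters such as `&` and `=` inside a value from being read as separators.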
The Diagnostics framework had a callback registry (AddCallback / RemoveCallback) so subsystems could contribute files to a diagnostics directory, intended to also run during crash handling. That crash-time path never materialized, and the sole registered callback was the Debugger copying its file-backed logs. If you had no logging enabled, the directory would be empty, confusing the users. Remove the registry and the callback loop in Diagnostics::Create (which now just writes the in-memory log), and expose the log copying as Debugger::CopyLogFilesToDirectory, which "diagnostics dump" calls directly. The dump command now copies the invoking debugger's logs rather than every debugger's, which is the more useful behavior I want to double down on.
This fixes a problem with CIR failing to handle boolean result types for the __builtin_(add|sub|mul)_overflow functions. We were trying to lower to operations derived from CIR_BinOpOverflow, but these operations required an integer type for the return value. This change relaxes that requirement to allow integer or boolean types. related non-CIR PR #192568.
) The majority of the content of rdar://179151476 duplicates the PointerFlow analysis after #203633. Therefore, we only need to upstream the tests for better test coverage and proving the duplication. rdar://179151476
Replace assertions that listed concrete types with generic ones that check that the type is a vector with an even number of elements. Update splitUnary and splitBinary. I already updated splitBinary and splitTernary in #203472, but the splitBinary change was accidentally removed in #203607, so I am bringing it back.
When recipes are generated per type and not per variable, we can end up with the same location for multiple private/firstprivate/reduction variables. When materializing the recipes, set the Location of all Operations within the recipe region to be that of the op that is being materialized. It is okay to mutate the original recipes since the location is already not "useful" and the recipes will always get removed at the end of the pass.
…205531) This continues the effort to split `<__config>` into self-contained detail headers.
Introduce acc.gpu_shared_memory to represent GPU workgroup memory slots in a compute region. It is used for planning before eventually being turned into a `memref.view` of a dynamic slot within the workgroup allocation.
Fixes #201756 AI Usage: Used to Search codebase to find location of code to modify and understand existing implementation. --------- Co-authored-by: Simon Pilgrim <git@redking.me.uk> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Summary: In the case of, say, a GPU sanitizer, there could be a pending report that isn't flushed before the queue dies and the program terminates. Add an explicit flush to ensure that all work at least posted *before* the trap fired is cleared in the HSA error callback before actually quitting.
Fixes #205571 --------- Co-authored-by: Barry Revzin <brevzin@jumptrading.com> Co-authored-by: Björn Schäpers <bjoern@hazardy.de>
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)