Rebased: Mark drop calls in landing pads cold instead of noinline #102099
bors merged 4 commits into rust-lang:master
|
Some changes occurred in compiler/rustc_codegen_gcc cc @antoyo |
|
r? @davidtwco (rust-highfive has picked a reviewer for you, use r? to override) |
|
r? @nagisa |
|
Please adjust the PR description to include the motivation (@nikic’s comment) explaining why it is okay to reapply the patch. r=me otherwise. |
|
Probably fixes #97217 as well. |
|
We should also check perf. before relanding. (I don't know if this PR is ready for that, so I didn't schedule it yet to avoid spam). |
|
@bors try @rust-timer queue |
|
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
|
⌛ Trying commit d1ddf27de674cddc714f857f5ff76cbf315460e6 with merge 252b42173601b80baba38d146cd7ca8f59813819... |
|
Did anything change in serde since last time, or are we accepting that merging this will "reopen" #94390 (it's not closed, but supposedly fixed)? Does https://github.com/MarkDDR/long_compile_rustc_1.59.0 still build slowly with this PR compared to nightly? |
|
@lqd I don't know if anything changed in serde, but the intention here is indeed to potentially regress compile-time for certain types of excessively large machine-generated code in favor of fixing a class of common optimization failures. |
|
It would be reassuring to have an idea of how common these excessively large cases are, but I don't think we have such data. It feels like it should be rare to me, and worth the better codegen for the general case. Because of the tradeoff, do we know if that's a call that t-compiler should make (at their next meeting), or is wg-llvm signoff enough? |
|
☀️ Try build successful - checks-actions |
|
Queued 252b42173601b80baba38d146cd7ca8f59813819 with parent edadc7c, future comparison URL. |
|
Finished benchmarking commit (252b42173601b80baba38d146cd7ca8f59813819): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment. |
|
Looking at the perf results, it seems like the one-off regressions are LLVM doing more stuff in some instances. Not super much more, but also quite expected given that LLVM now has a free pass to inline a bunch more code. |
|
I found there's a patch https://reviews.llvm.org/D150989 in LLVM 17 which claims to fix the mutual inlining explosion. If that's the case, does it mean we don't need cold/noinline on landing pads anymore at all? Unfortunately I couldn't find any recent enough test to verify. #94390 seems irrelevant to this inlining explosion since their example is still slow to compile on current nightly 68c2f5b (with noinline) for me. This ancient comment #41696 (comment) also cannot reproduce. |
|
@oxalica I'm confused what you're responding to. I don't think anyone is in doubt that the exponential inlining issues are fixed, that's the reason this PR removes We want |
My bad, I misunderstood it to be all drop calls. Yeah, landing pads should be cold. |
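For readers less familiar with the two attributes being discussed, here is a rough source-level analogy in Rust. It is only a sketch: the PR changes attributes that rustc attaches to drop calls in the generated LLVM IR, not anything a user writes, and the function names below (`report_never_inlined`, `report_cold`, `check`) are made up for illustration.

```rust
// `#[inline(never)]` asks the optimizer not to inline this function at all.
#[inline(never)]
fn report_never_inlined(code: u32) -> String {
    format!("error code {}", code)
}

// `#[cold]` only hints that calls to this function are unlikely.
// The optimizer remains free to inline it where it sees a benefit.
#[cold]
fn report_cold(code: u32) -> String {
    format!("error code {}", code)
}

// Values below 100 are accepted; anything else takes the rare failure path.
fn check(value: u32, use_cold: bool) -> Result<u32, String> {
    if value < 100 {
        return Ok(value);
    }
    Err(if use_cold {
        report_cold(value)
    } else {
        report_never_inlined(value)
    })
}

fn main() {
    println!("{:?}", check(7, true));
    println!("{:?}", check(500, true));
    println!("{:?}", check(500, false));
}
```

Both helpers behave identically at runtime; the difference is only in what the optimizer is allowed or encouraged to do with the call.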
|
If I'm understanding things correctly here, a reduced example would be:
@type_info = external global ptr
define void @test(ptr %arg) personality ptr @__CxxFrameHandler3 {
bb:
%a1 = alloca ptr, align 4
%a2 = alloca ptr, align 4
call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %a2)
invoke void @throw()
to label %bb14 unwind label %bb8
bb8: ; preds = %bb7
%i9 = cleanuppad within none []
call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %a1)
store ptr %arg, ptr %a1, align 4
call fastcc void @foo(ptr %a1) [ "funclet"(token %i9) ]
call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %a1)
cleanupret from %i9 unwind label %bb15
bb14: ; preds = %bb7
unreachable
bb15: ; preds = %bb13, %bb5
%cs = catchswitch within none [label %bb17] unwind to caller
bb17: ; preds = %bb15
%cp = catchpad within %cs [ptr @type_info, i32 8, ptr %a2]
%p = load ptr, ptr %a2, align 4
call fastcc void @cleanup(ptr %p) [ "funclet"(token %cp) ]
catchret from %cp to label %exit
exit:
call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %a2)
ret void
}
declare i32 @__CxxFrameHandler3(...)
declare void @throw()
declare void @cleanup(ptr)
declare void @foo(ptr)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture)
Here Something I don't fully understand is at which point precisely the exception object |
|
I've filed llvm/llvm-project#66984 upstream. Some helpful references on C++ SEH: https://www.codeproject.com/Articles/2126/How-a-C-compiler-implements-exception-handling and https://gitlab.winehq.org/wine/wine/-/blob/ee17400c05d88fa29d0b895fa01902adfc91ba7f/dlls/msvcrt/except_x86_64.c#L513 |
|
🔒 Merge conflict This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again. |
|
The WinEH issue should be fixed now. We can try this again with a condition for LLVM >= 17.0.2. |
Co-authored-by: Max Fan <git@max.fan> Co-authored-by: Nikita Popov <npopov@redhat.com>
We no longer generate a protector for the strong case in this test, which is actually the expected behavior per the test comment.
|
I've limited this to LLVM >= 17.0.2. Fingers crossed that i686-msvc works after the LLVM fix. @bors r+ |
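A version gate like the one described above can be sketched as a tuple comparison. This is an illustration under stated assumptions, not the code from this PR: the enum `LandingPadDropAttr` and the function `landing_pad_drop_attr` are hypothetical names, and falling back to noinline on older LLVM is assumed from the previous behavior rather than stated in the thread.

```rust
// Hypothetical names throughout; this is not the actual rustc code.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum LandingPadDropAttr {
    Cold,
    NoInline,
}

// Tuples compare lexicographically, so `(major, minor, patch) >= (17, 0, 2)`
// orders versions by major, then minor, then patch.
fn landing_pad_drop_attr(llvm_version: (u32, u32, u32)) -> LandingPadDropAttr {
    if llvm_version >= (17, 0, 2) {
        LandingPadDropAttr::Cold
    } else {
        // Assumption: older LLVM keeps the earlier noinline behavior.
        LandingPadDropAttr::NoInline
    }
}

fn main() {
    for v in [(16, 0, 5), (17, 0, 1), (17, 0, 2), (18, 1, 0)] {
        println!("{:?} -> {:?}", v, landing_pad_drop_attr(v));
    }
}
```

Comparing the whole tuple avoids the usual mistake of checking major, minor and patch independently, which would wrongly reject a version such as 18.0.0.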
|
💥 Test timed out |
|
Looks spurious: @bors retry |
|
☀️ Test successful - checks-actions |
|
Finished benchmarking commit (2e5a9dd): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 629.674s -> 624.341s (-0.85%) |
|
@nikic @InnovativeInventor - I'm doing the perf triage. Looks like the results are on average an improvement - the one concern I have is the very large regression in ripgrep opt. I suppose this is to be expected with LLVM doing more (which is backed up by the detailed perf results which show more time spent in LLVM). I just wanted to make sure that my interpretation is indeed correct, and the regressions we're seeing are both expected and acceptable. |
|
The regex search runtime benchmark (which is usually quite stable) saw some improvement from this change - https://perf.rust-lang.org/compare.html?start=5333b878c8bc1c4267a67ea3682663629e47541a&end=2e5a9dd6c9eaa42f0684b4b760bd68fc27cbe51b&stat=instructions%3Au&tab=runtime. It uses the |
Yes, that's right. As the change allows more inlining, LLVM will spend more time optimizing in some cases, but also producing better code. (Which is why we're seeing improvements on check builds.) |
This is not surprising -- the compiled code output that I was looking at that initially led me to submit this PR was in |
I noticed that certain inlining optimizations were missing while staring at some compiled code output. I'd like to see this relanded, so I rebased the PR from @erikdesjardins (PR #94823).
This PR reapplies #92419, which was reverted in #94402 due to #94390.
Fixes #46515, fixes #87055.
Update: fixes #97217.
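To illustrate what "drop calls in landing pads" refers to, here is a small sketch of a function whose local value is dropped on two paths: the normal return and the unwind (cleanup) path. The names `Guard`, `may_panic`, `work`, and `drops_after` are hypothetical and exist only for this example, which assumes the default `panic = "unwind"` setting.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `Guard::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Stands in for any call that can unwind.
fn may_panic(fail: bool) {
    if fail {
        panic!("unwinding through `work`");
    }
}

// Conceptually, `_guard` is dropped in two places: at the normal end of the
// function, and in the cleanup block that runs only if `may_panic` unwinds.
// The second drop call is the kind this PR is about.
fn work(fail: bool) {
    let _guard = Guard;
    may_panic(fail);
}

// Runs `work`, catching any panic, and reports how many drops happened.
fn drops_after(fail: bool) -> usize {
    DROPS.store(0, Ordering::SeqCst);
    let _ = panic::catch_unwind(AssertUnwindSafe(|| work(fail)));
    DROPS.load(Ordering::SeqCst)
}

fn main() {
    println!("normal path drops: {}", drops_after(false));
    println!("unwind path drops: {}", drops_after(true));
}
```

Each run drops the guard exactly once, but through a different code path. The unwind path is expected to be rare, which is the motivation for treating it as cold rather than forbidding inlining outright.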