Rebased: Mark drop calls in landing pads cold instead of noinline by InnovativeInventor · Pull Request #102099 · rust-lang/rust

InnovativeInventor · 2022-09-21T14:09:07Z

I noticed that certain inlining optimizations were missing while staring at some compiled code output. I'd like to see this relanded, so I rebased the PR from @erikdesjardins (PR #94823).

This PR reapplies #92419, which was reverted in #94402 due to #94390.

Fixes #46515, fixes #87055.

Update: fixes #97217.

rustbot · 2022-09-21T14:09:11Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo

rust-highfive · 2022-09-21T14:09:11Z

r? @davidtwco

(rust-highfive has picked a reviewer for you, use r? to override)

davidtwco · 2022-09-21T14:25:13Z

r? @nagisa

nagisa · 2022-09-22T11:34:51Z

Please adjust the PR description to include the motivation (@nikic’s comment) explaining why it is okay to reapply the patch.

r=me otherwise.

compiler/rustc_codegen_gcc/src/builder.rs

nikic · 2022-09-30T20:44:08Z

Probably fixes #97217 as well.

Kobzol · 2022-10-01T16:02:16Z

We should also check perf before relanding. (I don't know if this PR is ready for that, so I haven't scheduled a run yet, to avoid spam.)

nagisa · 2022-10-01T18:24:31Z

@bors try @rust-timer queue

rust-timer · 2022-10-01T18:24:33Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-10-01T18:25:13Z

⌛ Trying commit d1ddf27de674cddc714f857f5ff76cbf315460e6 with merge 252b42173601b80baba38d146cd7ca8f59813819...

lqd · 2022-10-01T18:48:50Z

Did anything change in serde since last time, or are we accepting that merging this will "reopen" #94390 (it's not closed, but supposedly fixed)? Does https://github.com/MarkDDR/long_compile_rustc_1.59.0 still build slowly with this PR compared to nightly?

nikic · 2022-10-01T18:56:24Z

@lqd I don't know if anything changed in serde, but the intention here is indeed to potentially regress compile-time for certain types of excessively large machine-generated code in favor of fixing a class of common optimization failures.

lqd · 2022-10-01T19:13:32Z

It would be reassuring to have an idea of how common these excessively large cases are, but I don't think we have such data. It feels like it should be rare to me, and worth the better codegen for the general case.

Because of the tradeoff, do we know whether that's a call that t-compiler should make (at their next meeting), or is wg-llvm signoff enough?

bors · 2022-10-01T19:56:07Z

☀️ Try build successful - checks-actions
Build commit: 252b42173601b80baba38d146cd7ca8f59813819 (252b42173601b80baba38d146cd7ca8f59813819)

rust-timer · 2022-10-01T19:56:09Z

Queued 252b42173601b80baba38d146cd7ca8f59813819 with parent edadc7c, future comparison URL.

rust-timer · 2022-10-02T01:02:35Z

Finished benchmarking commit (252b42173601b80baba38d146cd7ca8f59813819): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	1.1%	[0.5%, 3.9%]	12
Regressions ❌ (secondary)	3.3%	[1.7%, 4.8%]	3
Improvements ✅ (primary)	-0.7%	[-3.5%, -0.2%]	160
Improvements ✅ (secondary)	-0.8%	[-5.0%, -0.2%]	128
All ❌✅ (primary)	-0.5%	[-3.5%, 3.9%]	172

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	2.2%	[0.8%, 3.7%]	3
Regressions ❌ (secondary)	3.3%	[2.1%, 4.8%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.2%	[0.8%, 3.7%]	3

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.2%	[3.2%, 3.2%]	1
Improvements ✅ (primary)	-2.4%	[-4.7%, -1.6%]	5
Improvements ✅ (secondary)	-2.5%	[-3.2%, -2.0%]	9
All ❌✅ (primary)	-2.4%	[-4.7%, -1.6%]	5

¹ the arithmetic mean of the percent change
² number of relevant changes

nagisa · 2022-11-05T14:18:33Z

Looking at the perf results, it seems the one-off regressions are cases where LLVM does more work. Not a lot more, and that is quite expected given that LLVM now has a free pass to inline a bunch more code.

oxalica · 2023-09-11T16:45:50Z

I found a patch, https://reviews.llvm.org/D150989, in LLVM 17 that claims to fix the mutual inlining explosion. If that's the case, does it mean we no longer need cold/noinline on landing pads at all?

Unfortunately, I couldn't find a recent enough test to verify this. #94390 seems unrelated to this inlining explosion, since its example is still slow to compile for me on current nightly 68c2f5b (with noinline). I also cannot reproduce the example from this ancient comment, #41696 (comment).

erikdesjardins · 2023-09-11T22:37:33Z

@oxalica I'm confused what you're responding to. I don't think anyone is in doubt that the exponential inlining issues are fixed, that's the reason this PR removes noinline.

We want cold, since panicking paths are cold, pretty much by definition.

oxalica · 2023-09-12T16:28:35Z

> @oxalica I'm confused what you're responding to. I don't think anyone is in doubt that the exponential inlining issues are fixed, that's the reason this PR removes noinline.
>
> We want cold, since panicking paths are cold, pretty much by definition.

My bad, I misunderstood it to be all drop calls. Yeah, landing pads should be cold.
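
For readers unfamiliar with the distinction being drawn here, the following is a minimal sketch (not code from this PR) of a drop call that runs on the unwind path. The names `Guard`, `may_panic`, and `run` are made up for illustration. When `may_panic` panics, `_guard` is dropped by cleanup code in a landing pad rather than at the normal end of scope; those cleanup calls are the ones this PR marks cold. The sketch assumes the default `panic=unwind` strategy.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical guard type that counts how many times it is dropped.
struct Guard<'a>(&'a Cell<u32>);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Hypothetical function that unwinds when `fail` is true.
fn may_panic(fail: bool) {
    if fail {
        panic!("simulated failure");
    }
}

/// Returns (whether a panic occurred, how many times the guard was dropped).
fn run(fail: bool) -> (bool, u32) {
    let drops = Cell::new(0);
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        let _guard = Guard(&drops);
        // If this call unwinds, `_guard` is dropped from the cleanup
        // (landing pad) path instead of at the closing brace below.
        may_panic(fail);
    }));
    (result.is_err(), drops.get())
}

fn main() {
    println!("normal path: {:?}", run(false));
    println!("unwind path: {:?}", run(true));
}
```

In both cases the guard is dropped exactly once; only the path that performs the drop differs. The panic message is also printed to stderr on the unwind path.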

nikic · 2023-09-20T14:44:46Z

If I'm understanding things correctly here, a reduced example would be:

@type_info = external global ptr

define void @test(ptr %arg) personality ptr @__CxxFrameHandler3 {
bb:
  %a1 = alloca ptr, align 4
  %a2 = alloca ptr, align 4
  call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %a2)
  invoke void @throw()
          to label %bb14 unwind label %bb8

bb8:                                              ; preds = %bb7
  %i9 = cleanuppad within none []
  call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %a1)
  store ptr %arg, ptr %a1, align 4
  call fastcc void @foo(ptr %a1) [ "funclet"(token %i9) ]
  call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %a1)
  cleanupret from %i9 unwind label %bb15

bb14:                                             ; preds = %bb7
  unreachable

bb15:                                             ; preds = %bb13, %bb5
  %cs = catchswitch within none [label %bb17] unwind to caller

bb17:                                             ; preds = %bb15
  %cp = catchpad within %cs [ptr @type_info, i32 8, ptr %a2]
  %p = load ptr, ptr %a2, align 4
  call fastcc void @cleanup(ptr %p) [ "funclet"(token %cp) ]
  catchret from %cp to label %exit

exit:
  call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %a2)
  ret void
}

declare i32 @__CxxFrameHandler3(...)
declare void @throw()
declare void @cleanup(ptr)
declare void @foo(ptr)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture)

Here %a1 and %a2 get allocated to the same stack slot.

Something I don't fully understand is at which point precisely the exception object %a2 gets written. For the stack slot reuse to be problematic, this must happen before the cleanuppad runs, rather than before the catchswitch/catchpad runs. Is that what happens? Is there any documentation (or open-source implementation) for this anywhere?

nikic · 2023-09-21T07:09:18Z

I've filed llvm/llvm-project#66984 upstream.

Some helpful references on C++ SEH: https://www.codeproject.com/Articles/2126/How-a-C-compiler-implements-exception-handling and https://gitlab.winehq.org/wine/wine/-/blob/ee17400c05d88fa29d0b895fa01902adfc91ba7f/dlls/msvcrt/except_x86_64.c#L513

bors · 2023-09-25T00:16:08Z

🔒 Merge conflict

This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again.

How do I rebase?

Assuming self is your fork and upstream is this repository, you can resolve the conflict following these steps:

1. git checkout re-cold-land (switch to your branch)
2. git fetch upstream master (retrieve the latest master)
3. git rebase upstream/master -p (rebase on top of it)
4. Follow the on-screen instruction to resolve conflicts (check git status if you got lost).
5. git push self re-cold-land --force-with-lease (update this PR)

You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial.

Please avoid the "Resolve conflicts" button on GitHub. It uses git merge instead of git rebase which makes the PR commit history more difficult to read.

Sometimes step 4 will complete without asking for resolution. This is usually due to difference between how Cargo.lock conflict is handled during merge and rebase. This is normal, and you should still perform step 5 to update this PR.

Error message

Auto-merging compiler/rustc_codegen_ssa/src/mir/block.rs
CONFLICT (content): Merge conflict in compiler/rustc_codegen_ssa/src/mir/block.rs
Auto-merging compiler/rustc_codegen_llvm/src/builder.rs
Auto-merging compiler/rustc_codegen_gcc/src/builder.rs
Automatic merge failed; fix conflicts and then commit the result.

nikic · 2023-09-28T19:32:25Z

The WinEH issue should be fixed now. We can try this again with a condition for LLVM >= 17.0.2.

nikic · 2023-10-02T10:45:23Z

I've limited this to LLVM >= 17.0.2. Fingers crossed that i686-msvc works after the LLVM fix.

@bors r+
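
A version gate of this kind can be sketched as below. This is an illustration only, not the code that landed: the enum `CleanupCallAttr`, the constant `MIN_FIXED_LLVM`, and the function `cleanup_call_attr` are hypothetical names, and the assumption that older LLVM versions keep the previous noinline behavior is inferred from the PR title rather than stated in this thread.

```rust
/// Hypothetical attribute choice for drop calls in landing pads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CleanupCallAttr {
    Cold,
    NoInline,
}

/// First LLVM release assumed to contain the WinEH fix (from the comment above).
const MIN_FIXED_LLVM: (u32, u32, u32) = (17, 0, 2);

/// Picks the attribute based on a (major, minor, patch) LLVM version.
/// Rust compares tuples lexicographically, so this is a plain `>=`.
fn cleanup_call_attr(llvm_version: (u32, u32, u32)) -> CleanupCallAttr {
    if llvm_version >= MIN_FIXED_LLVM {
        CleanupCallAttr::Cold
    } else {
        // Assumption: fall back to the pre-existing noinline behavior.
        CleanupCallAttr::NoInline
    }
}

fn main() {
    for version in [(16, 0, 6), (17, 0, 1), (17, 0, 2), (18, 1, 0)] {
        println!("{:?} -> {:?}", version, cleanup_call_attr(version));
    }
}
```

Comparing whole tuples avoids hand-written chains of major/minor/patch checks, which are easy to get wrong at the boundaries.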

bors · 2023-10-02T10:45:25Z

📌 Commit 5bcf4f2 has been approved by nikic

It is now in the queue for this repository.

bors · 2023-10-02T11:29:28Z

⌛ Testing commit 5bcf4f2 with merge 41a28aa...

bors · 2023-10-02T15:44:28Z

💥 Test timed out

rust-log-analyzer · 2023-10-02T15:57:16Z

A job failed! Check out the build log: (web) (plain)

nikic · 2023-10-02T20:26:03Z

Looks spurious:

2023-10-02T15:06:15.1832520Z     Updating crates.io index
2023-10-02T15:56:43.8689800Z ##[error]The operation was canceled.

@bors retry

bors · 2023-10-02T22:02:16Z

⌛ Testing commit 5bcf4f2 with merge 2e5a9dd...

bors · 2023-10-02T23:50:05Z

☀️ Test successful - checks-actions
Approved by: nikic
Pushing 2e5a9dd to master...

rust-timer · 2023-10-03T01:12:16Z

Finished benchmarking commit (2e5a9dd): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.5%	[0.3%, 5.9%]	12
Regressions ❌ (secondary)	2.7%	[0.4%, 7.7%]	5
Improvements ✅ (primary)	-0.5%	[-4.5%, -0.2%]	78
Improvements ✅ (secondary)	-0.9%	[-6.9%, -0.1%]	65
All ❌✅ (primary)	-0.2%	[-4.5%, 5.9%]	90

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.1%, 0.1%]	1
Regressions ❌ (secondary)	3.4%	[3.0%, 3.8%]	2
Improvements ✅ (primary)	-7.0%	[-7.0%, -7.0%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-3.5%	[-7.0%, 0.1%]	2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.6%	[1.0%, 4.0%]	6
Regressions ❌ (secondary)	4.8%	[2.7%, 7.0%]	2
Improvements ✅ (primary)	-2.2%	[-5.7%, -1.2%]	6
Improvements ✅ (secondary)	-2.9%	[-6.9%, -0.9%]	9
All ❌✅ (primary)	-0.3%	[-5.7%, 4.0%]	12

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.1%, 1.3%]	20
Regressions ❌ (secondary)	2.2%	[2.1%, 2.3%]	6
Improvements ✅ (primary)	-1.9%	[-5.5%, -0.0%]	35
Improvements ✅ (secondary)	-5.4%	[-11.2%, -0.2%]	10
All ❌✅ (primary)	-1.0%	[-5.5%, 1.3%]	55

Bootstrap: 629.674s -> 624.341s (-0.85%)
Artifact size: 273.32 MiB -> 271.92 MiB (-0.51%)
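
As a quick check of how the percentages in the two summary lines above relate to the raw figures, here is a small sketch. The helper `percent_change` is a hypothetical name, not part of the perf tooling; it assumes the usual definition of relative change, (after - before) / before.

```rust
/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Figures taken from the "Bootstrap" and "Artifact size" lines above.
    let bootstrap = percent_change(629.674, 624.341);
    let artifact = percent_change(273.32, 271.92);
    println!("bootstrap: {:.2}%", bootstrap);
    println!("artifact size: {:.2}%", artifact);
}
```

Rounded to two decimals, these match the -0.85% and -0.51% reported above.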

rylev · 2023-10-03T14:06:26Z

@nikic @InnovativeInventor - I'm doing the perf triage. Looks like the results are on average an improvement - the one concern I have is the very large regression in ripgrep opt. I suppose this is to be expected with LLVM doing more (which is backed up by the detailed perf results which show more time spent in LLVM). I just wanted to make sure that my interpretation is indeed correct, and the regressions we're seeing are both expected and acceptable.

Kobzol · 2023-10-03T14:09:23Z

The regex search runtime benchmark (which is usually quite stable) saw some improvement from this change - https://perf.rust-lang.org/compare.html?start=5333b878c8bc1c4267a67ea3682663629e47541a&end=2e5a9dd6c9eaa42f0684b4b760bd68fc27cbe51b&stat=instructions%3Au&tab=runtime. It uses the regex crate (which is also used by ripgrep) internally. I wouldn't make much of it, but could be a hint that indeed regex optimizes slightly better thanks to this.

nikic · 2023-10-03T14:13:53Z

> @nikic @InnovativeInventor - I'm doing the perf triage. Looks like the results are on average an improvement - the one concern I have is the very large regression in ripgrep opt. I suppose this is to be expected with LLVM doing more (which is backed up by the detailed perf results which show more time spent in LLVM). I just wanted to make sure that my interpretation is indeed correct, and the regressions we're seeing are both expected and acceptable.

Yes, that's right. As the change allows more inlining, LLVM will spend more time optimizing in some cases, but it will also produce better code. (Which is why we're seeing improvements on check builds.)

InnovativeInventor · 2023-10-03T14:55:37Z

> The regex search runtime benchmark (which is usually quite stable) saw some improvement from this change - https://perf.rust-lang.org/compare.html?start=5333b878c8bc1c4267a67ea3682663629e47541a&end=2e5a9dd6c9eaa42f0684b4b760bd68fc27cbe51b&stat=instructions%3Au&tab=runtime. It uses the regex crate (which is also used by ripgrep) internally. I wouldn't make much of it, but could be a hint that indeed regex optimizes slightly better thanks to this.

This is not surprising -- the compiled code output that I was looking at that initially led me to submit this PR was in ripgrep.

rust-highfive assigned davidtwco Sep 21, 2022

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 21, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 21, 2022

rust-highfive assigned nagisa and unassigned davidtwco Sep 21, 2022

erikdesjardins reviewed Sep 22, 2022

compiler/rustc_codegen_gcc/src/builder.rs (outdated)

nikic mentioned this pull request Oct 1, 2022

#[inline] on generic functions #102539

Open

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 1, 2022

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 2, 2022

nagisa added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 28, 2022

nagisa added the perf-regression-triaged The performance regression has been triaged. label Nov 5, 2022

nikic mentioned this pull request Sep 28, 2023

Update LLVM submodule #116227

Merged

erikdesjardins and others added 3 commits October 2, 2023 10:37

Reapply: Mark drop calls in landing pads cold instead of noinline

31ee8b1

Co-authored-by: Max Fan <git@max.fan> Co-authored-by: Nikita Popov <npopov@redhat.com>

Fix codegen tests on panic=abort targets

0608fca

Update stack protector test

ebbc687

We no longer generate a protector for the strong case in this test, which is actually the expected behavior per the test comment.

Limit to LLVM 17.0.2 to work around WinEH codegen bug

5bcf4f2

RalfJung mentioned this pull request Oct 3, 2023

Rust 1.59 rustc greatly increased compile time with include_str! #94390

Open

pvdrz mentioned this pull request Oct 4, 2023

Fix divergence from upstream master ferrocene/ferrocene#23

Merged

ehuss mentioned this pull request Feb 6, 2024

build --release takes forever on latest rust release #119822

Closed

erikdesjardins mentioned this pull request Feb 24, 2024

Heap allocation in Box optimized out with rustc 1.60 but not with 1.61+ #98679

Closed
