Rebased: Mark drop calls in landing pads cold instead of noinline #102099

Merged
bors merged 4 commits into rust-lang:master from InnovativeInventor:re-cold-land
Oct 2, 2023
Conversation

@InnovativeInventor
Contributor

@InnovativeInventor InnovativeInventor commented Sep 21, 2022

I noticed that certain inlining optimizations were missing while staring at some compiled code output. I'd like to see this relanded, so I rebased the PR from @erikdesjardins (PR #94823).

This PR reapplies #92419, which was reverted in #94402 due to #94390.

Fixes #46515, fixes #87055.

Update: fixes #97217.
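
For readers unfamiliar with the term, here is a minimal sketch of where a "drop call in a landing pad" comes from. The names `Guard`, `may_panic`, `with_guard` and `drops_observed` are hypothetical and exist only for this illustration; the sketch assumes the default `panic=unwind` strategy. A value with a destructor that is live across a call that may unwind gets dropped on two paths: the normal return path and the cleanup (landing pad) path. This PR concerns the attribute placed on the second kind of call.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical guard type: counts how often its destructor runs.
struct Guard<'a>(&'a AtomicUsize);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

// Hypothetical callee that may unwind.
fn may_panic(fail: bool) {
    if fail {
        panic!("boom");
    }
}

// `_guard` is live across the call to `may_panic`, so its destructor is
// invoked either on the normal return path or, if the callee panics,
// from the cleanup code that runs while unwinding.
fn with_guard(drops: &AtomicUsize, fail: bool) {
    let _guard = Guard(drops);
    may_panic(fail);
}

// Runs `with_guard`, catches a possible panic, and reports how many
// times the destructor ran.
fn drops_observed(fail: bool) -> usize {
    let drops = AtomicUsize::new(0);
    let _ = panic::catch_unwind(AssertUnwindSafe(|| with_guard(&drops, fail)));
    drops.load(Ordering::SeqCst)
}

fn main() {
    println!("drops on normal return: {}", drops_observed(false));
    println!("drops while unwinding:  {}", drops_observed(true));
}
```

The destructor runs exactly once on either path. The difference is only in which generated call executes it, which is why the choice between `cold` and `noinline` on the unwinding call affects optimization but not behavior.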

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 21, 2022
@rustbot
Collaborator

rustbot commented Sep 21, 2022

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo

@rust-highfive
Contributor

r? @davidtwco

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 21, 2022
@davidtwco
Member

r? @nagisa

@rust-highfive rust-highfive assigned nagisa and unassigned davidtwco Sep 21, 2022
@nagisa
Member

nagisa commented Sep 22, 2022

Please adjust the PR description to include the motivation (@nikic’s comment) explaining why it is okay to reapply the patch.

r=me otherwise.

@nikic
Contributor

nikic commented Sep 30, 2022

Probably fixes #97217 as well.

@Kobzol
Member

Kobzol commented Oct 1, 2022

We should also check perf before relanding. (I don't know if this PR is ready for that, so I haven't scheduled a run yet, to avoid spam.)

@nagisa
Member

nagisa commented Oct 1, 2022

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 1, 2022
@bors
Collaborator

bors commented Oct 1, 2022

⌛ Trying commit d1ddf27de674cddc714f857f5ff76cbf315460e6 with merge 252b42173601b80baba38d146cd7ca8f59813819...

@lqd
Member

lqd commented Oct 1, 2022

Did anything change in serde since last time, or are we accepting that merging this will "reopen" #94390 (it's not closed, but supposedly fixed)? Does https://github.com/MarkDDR/long_compile_rustc_1.59.0 still build slowly with this PR compared to nightly?

@nikic
Contributor

nikic commented Oct 1, 2022

@lqd I don't know if anything changed in serde, but the intention here is indeed to potentially regress compile-time for certain types of excessively large machine-generated code in favor of fixing a class of common optimization failures.

@lqd
Member

lqd commented Oct 1, 2022

It would be reassuring to have an idea of how common these excessively large cases are, but I don't think we have such data. It feels like it should be rare to me, and worth the better codegen for the general case.

Because of the tradeoff, do we know if that's a call that t-compiler should make (at their next meeting), or is wg-llvm signoff enough?

@bors
Collaborator

bors commented Oct 1, 2022

☀️ Try build successful - checks-actions
Build commit: 252b42173601b80baba38d146cd7ca8f59813819

@rust-timer
Collaborator

Queued 252b42173601b80baba38d146cd7ca8f59813819 with parent edadc7c, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (252b42173601b80baba38d146cd7ca8f59813819): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | 1.1% | [0.5%, 3.9%] | 12 |
| Regressions ❌ (secondary) | 3.3% | [1.7%, 4.8%] | 3 |
| Improvements ✅ (primary) | -0.7% | [-3.5%, -0.2%] | 160 |
| Improvements ✅ (secondary) | -0.8% | [-5.0%, -0.2%] | 128 |
| All ❌✅ (primary) | -0.5% | [-3.5%, 3.9%] | 172 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [0.8%, 3.7%] | 3 |
| Regressions ❌ (secondary) | 3.3% | [2.1%, 4.8%] | 3 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.2% | [0.8%, 3.7%] | 3 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.2% | [3.2%, 3.2%] | 1 |
| Improvements ✅ (primary) | -2.4% | [-4.7%, -1.6%] | 5 |
| Improvements ✅ (secondary) | -2.5% | [-3.2%, -2.0%] | 9 |
| All ❌✅ (primary) | -2.4% | [-4.7%, -1.6%] | 5 |

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 2, 2022
@nagisa nagisa added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 28, 2022
@nagisa
Member

nagisa commented Nov 5, 2022

Looking at the perf results, it seems like the one-off regressions come from LLVM doing more work in some instances. Not much more, and this is quite expected given that LLVM is now free to inline a good deal more code.

@nagisa nagisa added the perf-regression-triaged The performance regression has been triaged. label Nov 5, 2022
@oxalica
Contributor

oxalica commented Sep 11, 2023

I found there's a patch https://reviews.llvm.org/D150989 in LLVM 17 which claims to fix the mutual inlining explosion. If that's the case, does it mean we don't need cold/noinline on landing pads anymore at all?

Unfortunately I couldn't find any recent enough test to verify this. #94390 seems unrelated to this inlining explosion, since its example is still slow to compile on current nightly 68c2f5b (with noinline) for me. The example in this ancient comment, #41696 (comment), also no longer reproduces.

@erikdesjardins
Contributor

@oxalica I'm confused what you're responding to. I don't think anyone is in doubt that the exponential inlining issues are fixed, that's the reason this PR removes noinline.

We want cold, since panicking paths are cold, pretty much by definition.
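
As a source-level analogy (not the codegen change itself), Rust exposes a similar hint through the `#[cold]` function attribute. A minimal sketch, with the hypothetical names `report_out_of_range` and `get_checked`: the attribute marks the failure path as unlikely, which affects things like block placement and inlining heuristics, but unlike `#[inline(never)]` it does not forbid inlining.

```rust
// Hypothetical failure handler. `#[cold]` hints that calls to this
// function are unlikely; the optimizer may still inline it if it wants to.
#[cold]
fn report_out_of_range(index: usize, len: usize) -> ! {
    panic!("index {index} out of range for length {len}");
}

// The hot path is the plain indexing; the cold call sits on the
// rarely taken branch.
fn get_checked(data: &[u32], index: usize) -> u32 {
    if index >= data.len() {
        report_out_of_range(index, data.len());
    }
    data[index]
}

fn main() {
    let data = [10, 20, 30];
    println!("{}", get_checked(&data, 1));
}
```

The PR applies the LLVM `cold` attribute to individual call sites in cleanup blocks rather than to function definitions, so this sketch only illustrates the intent of the hint.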

@oxalica
Contributor

oxalica commented Sep 12, 2023

> @oxalica I'm confused what you're responding to. I don't think anyone is in doubt that the exponential inlining issues are fixed, that's the reason this PR removes noinline.
>
> We want cold, since panicking paths are cold, pretty much by definition.

My bad, I misunderstood it to be all drop calls. Yeah, landing pads should be cold.

@nikic
Contributor

nikic commented Sep 20, 2023

If I'm understanding things correctly here, a reduced example would be:

@type_info = external global ptr

define void @test(ptr %arg) personality ptr @__CxxFrameHandler3 {
bb:
  %a1 = alloca ptr, align 4
  %a2 = alloca ptr, align 4
  call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %a2)
  invoke void @throw()
          to label %bb14 unwind label %bb8

bb8:                                              ; preds = %bb
  %i9 = cleanuppad within none []
  call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %a1)
  store ptr %arg, ptr %a1, align 4
  call fastcc void @foo(ptr %a1) [ "funclet"(token %i9) ]
  call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %a1)
  cleanupret from %i9 unwind label %bb15

bb14:                                             ; preds = %bb
  unreachable

bb15:                                             ; preds = %bb8
  %cs = catchswitch within none [label %bb17] unwind to caller

bb17:                                             ; preds = %bb15
  %cp = catchpad within %cs [ptr @type_info, i32 8, ptr %a2]
  %p = load ptr, ptr %a2, align 4
  call fastcc void @cleanup(ptr %p) [ "funclet"(token %cp) ]
  catchret from %cp to label %exit

exit:
  call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %a2)
  ret void
}

declare i32 @__CxxFrameHandler3(...)
declare void @throw()
declare void @cleanup(ptr)
declare void @foo(ptr)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture)

Here %a1 and %a2 get allocated to the same stack slot.

Something I don't fully understand is at which point precisely the exception object %a2 gets written. For the stack slot reuse to be problematic, this must happen before the cleanuppad runs, rather than before the catchswitch/catchpad runs. Is that what happens? Is there any documentation (or open-source implementation) for this anywhere?

@nikic
Contributor

nikic commented Sep 21, 2023

I've filed llvm/llvm-project#66984 upstream.

Some helpful references on C++ SEH: https://www.codeproject.com/Articles/2126/How-a-C-compiler-implements-exception-handling and https://gitlab.winehq.org/wine/wine/-/blob/ee17400c05d88fa29d0b895fa01902adfc91ba7f/dlls/msvcrt/except_x86_64.c#L513

@bors
Collaborator

bors commented Sep 25, 2023

🔒 Merge conflict

This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again.

How do I rebase?

Assuming self is your fork and upstream is this repository, you can resolve the conflict following these steps:

  1. git checkout re-cold-land (switch to your branch)
  2. git fetch upstream master (retrieve the latest master)
  3. git rebase upstream/master -p (rebase on top of it)
  4. Follow the on-screen instruction to resolve conflicts (check git status if you got lost).
  5. git push self re-cold-land --force-with-lease (update this PR)

You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial.

Please avoid the "Resolve conflicts" button on GitHub. It uses git merge instead of git rebase which makes the PR commit history more difficult to read.

Sometimes step 4 will complete without asking for resolution. This is usually due to difference between how Cargo.lock conflict is handled during merge and rebase. This is normal, and you should still perform step 5 to update this PR.

Error message
Auto-merging compiler/rustc_codegen_ssa/src/mir/block.rs
CONFLICT (content): Merge conflict in compiler/rustc_codegen_ssa/src/mir/block.rs
Auto-merging compiler/rustc_codegen_llvm/src/builder.rs
Auto-merging compiler/rustc_codegen_gcc/src/builder.rs
Automatic merge failed; fix conflicts and then commit the result.

@nikic nikic mentioned this pull request Sep 28, 2023
@nikic
Contributor

nikic commented Sep 28, 2023

The WinEH issue should be fixed now. We can try this again with a condition for LLVM >= 17.0.2.

erikdesjardins and others added 3 commits October 2, 2023 10:37
Co-authored-by: Max Fan <git@max.fan>
Co-authored-by: Nikita Popov <npopov@redhat.com>
We no longer generate a protector for the strong case in this test,
which is actually the expected behavior per the test comment.

@nikic
Contributor

nikic commented Oct 2, 2023

I've limited this to LLVM >= 17.0.2. Fingers crossed that i686-msvc works after the LLVM fix.

@bors r+
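
A version gate like the one described above can be sketched as a tuple comparison. This is only an illustration: `CleanupCallAttr`, `MIN_LLVM_FOR_COLD` and `cleanup_call_attr` are hypothetical names, not rustc's actual API, and the sketch assumes that older LLVM versions keep the previous `noinline` behavior, which the thread implies but does not spell out.

```rust
// Hypothetical stand-in for the attribute chosen for drop calls
// emitted in cleanup blocks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CleanupCallAttr {
    Cold,
    NoInline,
}

// First LLVM release mentioned in the thread as containing the WinEH fix.
const MIN_LLVM_FOR_COLD: (u32, u32, u32) = (17, 0, 2);

// Rust tuples compare lexicographically, so (major, minor, patch)
// tuples order the same way as version numbers.
fn cleanup_call_attr(llvm_version: (u32, u32, u32)) -> CleanupCallAttr {
    if llvm_version >= MIN_LLVM_FOR_COLD {
        CleanupCallAttr::Cold
    } else {
        // Assumption: fall back to the pre-PR behavior on older LLVM.
        CleanupCallAttr::NoInline
    }
}

fn main() {
    for version in [(16, 0, 6), (17, 0, 1), (17, 0, 2), (18, 0, 0)] {
        println!("{:?} -> {:?}", version, cleanup_call_attr(version));
    }
}
```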

@bors
Collaborator

bors commented Oct 2, 2023

📌 Commit 5bcf4f2 has been approved by nikic

It is now in the queue for this repository.

@bors
Collaborator

bors commented Oct 2, 2023

⌛ Testing commit 5bcf4f2 with merge 41a28aa...

@bors
Collaborator

bors commented Oct 2, 2023

💥 Test timed out

@rust-log-analyzer
Collaborator

A job failed! Check out the build log: (web) (plain)

@nikic
Contributor

nikic commented Oct 2, 2023

Looks spurious:

2023-10-02T15:06:15.1832520Z     Updating crates.io index
2023-10-02T15:56:43.8689800Z ##[error]The operation was canceled.

@bors retry

@bors
Collaborator

bors commented Oct 2, 2023

⌛ Testing commit 5bcf4f2 with merge 2e5a9dd...

@bors
Collaborator

bors commented Oct 2, 2023

☀️ Test successful - checks-actions
Approved by: nikic
Pushing 2e5a9dd to master...

@rust-timer
Collaborator

Finished benchmarking commit (2e5a9dd): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.5% | [0.3%, 5.9%] | 12 |
| Regressions ❌ (secondary) | 2.7% | [0.4%, 7.7%] | 5 |
| Improvements ✅ (primary) | -0.5% | [-4.5%, -0.2%] | 78 |
| Improvements ✅ (secondary) | -0.9% | [-6.9%, -0.1%] | 65 |
| All ❌✅ (primary) | -0.2% | [-4.5%, 5.9%] | 90 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.1%, 0.1%] | 1 |
| Regressions ❌ (secondary) | 3.4% | [3.0%, 3.8%] | 2 |
| Improvements ✅ (primary) | -7.0% | [-7.0%, -7.0%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.5% | [-7.0%, 0.1%] | 2 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.6% | [1.0%, 4.0%] | 6 |
| Regressions ❌ (secondary) | 4.8% | [2.7%, 7.0%] | 2 |
| Improvements ✅ (primary) | -2.2% | [-5.7%, -1.2%] | 6 |
| Improvements ✅ (secondary) | -2.9% | [-6.9%, -0.9%] | 9 |
| All ❌✅ (primary) | -0.3% | [-5.7%, 4.0%] | 12 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.1%, 1.3%] | 20 |
| Regressions ❌ (secondary) | 2.2% | [2.1%, 2.3%] | 6 |
| Improvements ✅ (primary) | -1.9% | [-5.5%, -0.0%] | 35 |
| Improvements ✅ (secondary) | -5.4% | [-11.2%, -0.2%] | 10 |
| All ❌✅ (primary) | -1.0% | [-5.5%, 1.3%] | 55 |

Bootstrap: 629.674s -> 624.341s (-0.85%)
Artifact size: 273.32 MiB -> 271.92 MiB (-0.51%)
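
The percentages in the two summary lines above are plain relative changes. A minimal sketch of the arithmetic, using a hypothetical helper `percent_change` (not part of rust-timer) and the before/after values quoted above:

```rust
// Relative change from `before` to `after`, in percent.
// Negative values mean the metric went down (an improvement here).
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Values taken from the bootstrap and artifact size lines above.
    println!("Bootstrap: {:.2}%", percent_change(629.674, 624.341));
    println!("Artifact size: {:.2}%", percent_change(273.32, 271.92));
}
```

Rounded to two decimals, this reproduces the reported -0.85% and -0.51%.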

@rylev
Member

rylev commented Oct 3, 2023

@nikic @InnovativeInventor - I'm doing the perf triage. Looks like the results are on average an improvement - the one concern I have is the very large regression in ripgrep opt. I suppose this is to be expected with LLVM doing more (which is backed up by the detailed perf results which show more time spent in LLVM). I just wanted to make sure that my interpretation is indeed correct, and the regressions we're seeing are both expected and acceptable.

@Kobzol
Member

Kobzol commented Oct 3, 2023

The regex search runtime benchmark (which is usually quite stable) saw some improvement from this change - https://perf.rust-lang.org/compare.html?start=5333b878c8bc1c4267a67ea3682663629e47541a&end=2e5a9dd6c9eaa42f0684b4b760bd68fc27cbe51b&stat=instructions%3Au&tab=runtime. It uses the regex crate (which is also used by ripgrep) internally. I wouldn't make much of it, but could be a hint that indeed regex optimizes slightly better thanks to this.

@nikic
Contributor

nikic commented Oct 3, 2023

> @nikic @InnovativeInventor - I'm doing the perf triage. Looks like the results are on average an improvement - the one concern I have is the very large regression in ripgrep opt. I suppose this is to be expected with LLVM doing more (which is backed up by the detailed perf results which show more time spent in LLVM). I just wanted to make sure that my interpretation is indeed correct, and the regressions we're seeing are both expected and acceptable.

Yes, that's right. As the change allows more inlining, LLVM will spend more time optimizing in some cases, but will also produce better code. (Which is why we're seeing improvements on check builds.)

@InnovativeInventor
Contributor Author

> The regex search runtime benchmark (which is usually quite stable) saw some improvement from this change - https://perf.rust-lang.org/compare.html?start=5333b878c8bc1c4267a67ea3682663629e47541a&end=2e5a9dd6c9eaa42f0684b4b760bd68fc27cbe51b&stat=instructions%3Au&tab=runtime. It uses the regex crate (which is also used by ripgrep) internally. I wouldn't make much of it, but could be a hint that indeed regex optimizes slightly better thanks to this.

This is not surprising -- the compiled code output that I was looking at that initially led me to submit this PR was in ripgrep.

Labels

merged-by-bors: This PR was explicitly merged by bors.
perf-regression: Performance regression.
perf-regression-triaged: The performance regression has been triaged.
S-waiting-on-bors: Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.