Skip to content

Comments

Deterministic query cycles for parallel front-end#149849

Open
zetanumbers wants to merge 5 commits intorust-lang:mainfrom
zetanumbers:deterministic_cycles
Open

Deterministic query cycles for parallel front-end#149849
zetanumbers wants to merge 5 commits intorust-lang:mainfrom
zetanumbers:deterministic_cycles

Conversation

@zetanumbers
Copy link
Contributor

@zetanumbers zetanumbers commented Dec 10, 2025

View all comments

The mechanism is similar to cycle detection in single-threaded mode. We traverse the deadlocked query graph from the top active query downwards to subqueries until we visit some query a second time, thus finding a cycle. With multi-thread front-end enabled one query may now have more than one active subqueries, aka we used one of parallel interfaces parallel!, join, par_for_each, etc. As such we have to traverse the "leftmost" active subquery to recover the sequential behavior of these parallel interfaces in single-threaded mode. New TreeNodeIndex saves implicit context information about what join (or scope) task we entered while executing a query, which we then use in break_query_cycle.

However we then have to guarantee the query stack from single-threaded mode is included in the active query graph. This is true for join function as their first task will be completed on the same thread and same will be tried for the second task unless stolen which is fine for us. scope places tasks in local queue and pops them in LIFO maner, while other worker threads could only steal from that queue in FIFO maner, thus we can guarantee the next task is either stolen or available for execution.

Fixes #142064
Fixes #142063
Fixes #127971

UPDATE: commits are sliced to the finest detail

@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 10, 2025
@rustbot
Copy link
Collaborator

rustbot commented Dec 10, 2025

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

@rustbot
Copy link
Collaborator

rustbot commented Dec 10, 2025

r? @eholk

rustbot has assigned @eholk.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rust-log-analyzer

This comment has been minimized.

@matthiaskrgr
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

rust-bors bot added a commit that referenced this pull request Dec 10, 2025
Deterministic query cycles for parallel front-end
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 10, 2025
@Kivooeo
Copy link
Member

Kivooeo commented Dec 10, 2025

@matthiaskrgr there is a compilation error in this PR, not sure if this can be benchmarked?

Ok this is surprising

@rust-bors
Copy link
Contributor

rust-bors bot commented Dec 10, 2025

☀️ Try build successful (CI)
Build commit: 306a768 (306a768e2f03217e62aee6655739249be1d8aa95, parent: 377656d3dd3f9c23a9c8713e163f4365a5261a84)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (306a768): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
0.5% [0.2%, 1.8%] 155
Regressions ❌
(secondary)
0.7% [0.2%, 1.9%] 138
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.0% [-0.0%, -0.0%] 1
All ❌✅ (primary) 0.5% [0.2%, 1.8%] 155

Max RSS (memory usage)

Results (secondary -1.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.3% [2.3%, 2.3%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.1% [-3.5%, -0.9%] 3
All ❌✅ (primary) - - 0

Cycles

Results (primary 2.2%, secondary 2.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
2.2% [2.0%, 2.4%] 3
Regressions ❌
(secondary)
3.6% [2.8%, 5.3%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-4.6% [-4.6%, -4.6%] 1
All ❌✅ (primary) 2.2% [2.0%, 2.4%] 3

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 471.42s -> 474.532s (0.66%)
Artifact size: 389.04 MiB -> 389.08 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 10, 2025
@Zoxc
Copy link
Contributor

Zoxc commented Dec 14, 2025

It's a bit unclear to me what source of non-determinism this tries to fix. Does this alter parallel execution in any way other than picking which query in the cycle use for resumption? It looks to me like you're trying to pick the point in the cycle to break which corresponds to a single threaded execution, but I don't think that point is guaranteed to be resumable.

Currently it looks like we're not deterministically picking which query in a cycle to resume when multiple is present. That's fairly simple to improve, though I think the queries available for resumption is non-deterministic to start with.

@zetanumbers
Copy link
Contributor Author

It's a bit unclear to me what source of non-determinism this tries to fix. Does this alter parallel execution in any way other than picking which query in the cycle use for resumption? It looks to me like you're trying to pick the point in the cycle to break which corresponds to a single threaded execution, but I don't think that point is guaranteed to be resumable.

I make an assumption that when we get a query cycle there is the "point in the cycle to break which corresponds to a single threaded execution" which currently rustc_thread_scope::scope violates due to the order of task executions (and which I plan to replace with join). If you traverse query graph in a "single-threaded manner" you will eventually find a thread waiting for already visited query to finish because graph is finite and every active query during a deadlock has a subquery (either direct or waiting on) to go to next. That last query wait closing a loop has to be resumed. I assume any query can be reached by traversing down to some subquery from the root query which I also assume is unique.

though I think the queries available for resumption is non-deterministic to start with.

That's what I am trying to make deterministic.

@rustbot

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@reddevilmidzy reddevilmidzy added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 9, 2026
@rustbot
Copy link
Collaborator

rustbot commented Jan 13, 2026

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

@rustbot

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Feb 13, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@petrochenkov
Copy link
Contributor

petrochenkov commented Feb 13, 2026

Also, how is more determinism able to fix ICEs like #142064 or #127971?

@zetanumbers
Copy link
Contributor Author

zetanumbers commented Feb 15, 2026

As I've described I resume the same query in the first cycle as in single-threaded mode. Before that we were sorting query hashes to more deterministically choose query to resume given a set of query waits. So new code no longer uses these hashes and I removed them in f9c6527. #152229 resembles this commit, but also changes the old cycle handling code to break on now unspecified query in the cycle. So you can consider this PR to implement all #152229 changes and merging it would only require me to do rebase again.

  • Why is 252dc42 necessary for deterministic query cycles? The commit doesn't state the motivation, and the change looks unrelated to the rest of the PR. If it's a justified refactoring, please submit it separately (with motivation), and we'll merge it.

Because in the next commit (ca51215) I've added some functionality to those interfaces relying on ImplicitCtxt from rustc_middle. No, I don't know how you would be able to neatly move ImplicitCtxt instead. I do not make such disturbing changes (move code across crates) unprompted, so assume every unexplained code change serves the PR's one function.

@zetanumbers
Copy link
Contributor Author

Also, how is more determinism able to fix ICEs like #142064 or #127971?

New cycle breaking code is guaranteed to break on the same query as single-threaded mode would. Breaking query would prompt a query wait to preemptively stop and call value_from_cycle_error which in turn calls Value::from_cycle_error. That trait method for some queries is responsible for emitting a compiler error related to query cycle. #127971 and #142064's problem comes from us breaking cycle on a query different from the single-threaded mode with a reported cycle in CycleError different from the single-threaded mode. And so I make those to always be the same for a cycle's first unblocked thread.

@rustbot
Copy link
Collaborator

rustbot commented Feb 24, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@rust-log-analyzer
Copy link
Collaborator

The job tidy failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)
    ╭▸ compiler/rustc_query_impl/src/job.rs:142:69
    │
142 │ /// This difference in behavior is strictly more tolerable than the undeterministic cycle breaking.
    ╰╴                                                                    ━━━━━━━━━━━━━━━
rerun tidy with `--extra-checks=spellcheck --bless` to fix typos
tidy [extra_checks]: checks with external tool 'typos' failed
tidy [extra_checks]: FAIL
tidy: The following check failed: extra_checks
Command `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools-bin/rust-tidy --root-path=/checkout --cargo-path=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo --output-dir=/checkout/obj/build --concurrency=4 --npm-path=/node/bin/yarn --extra-checks=py,cpp,js,spellcheck` failed with exit code 1
Created at: src/bootstrap/src/core/build_steps/tool.rs:1612:23
Executed at: src/bootstrap/src/core/build_steps/test.rs:1365:29

Command has failed. Rerun with -v to see more details.
Bootstrap failed while executing `test src/tools/tidy tidyselftest --extra-checks=py,cpp,js,spellcheck`
Build completed unsuccessfully in 0:02:45
  local time: Tue Feb 24 15:50:23 UTC 2026
  network time: Tue, 24 Feb 2026 15:50:23 GMT
##[error]Process completed with exit code 1.

@zetanumbers
Copy link
Contributor Author

Added a big doc comment about new query cycle breaking code.

@zetanumbers
Copy link
Contributor Author

@rustbot ready
@rustbot reroll

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 24, 2026
@rustbot rustbot assigned chenyukang and unassigned cjgillot and petrochenkov Feb 24, 2026
@zetanumbers
Copy link
Contributor Author

r? @nnethercote

This PR is stalling for 3.5 months. Please, I need any tangible feedback.

@rustbot rustbot assigned nnethercote and unassigned chenyukang Feb 24, 2026
@rust-bors
Copy link
Contributor

rust-bors bot commented Feb 24, 2026

☔ The latest upstream changes (presumably #153047) made this pull request unmergeable. Please resolve the merge conflicts.

@nnethercote
Copy link
Contributor

@zetanumbers: I see this PR hasn't been handled well and has been frustrating for you. I'm sorry about that. I also know very little about the query cycle handling code but I will do my best to do a close review, and try to get this PR back on track.

Copy link
Contributor

@nnethercote nnethercote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First review is just looking at the TreeNodeIndex. Seems reasonable, though there is scope for more comments plus some tests.

View changes since this review

///
/// ## Order
///
/// Encoding allows to sort nodes in left < parent < right linear order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this why you start at the high bits, instead of the low bits?

/// ```
///
/// Node reach after traversal of `LRLRRLLR` branches should be represented as `0b0101100110000...0`.
/// Root is obviously encoded as `0b10000...0`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's totally obvious :) It would be nice to have a small table with three or four examples, with the root as the first example, and then longer examples.


impl Display for BranchingError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
"TreeNodeIndex's free bits have been exhausted, make sure recursion is used carefully"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if we might ever reach this limit? A balanced binary tree that is 64 deep is very large, but is it possible we might get a highly unbalanced binary tree of that depth?


/// Error for exhausting free bits
#[derive(Debug)]
pub struct BranchingError(());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the () for? Could this just be pub struct BranchingError;?

Self(0x80000000_00000000)
}

/// Append tree branch no. `branch_idx` reserving `bits` bits.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments on try_bits_branch and branch are terse. I'm having trouble understanding exactly what "reserve" means, and what bits and branch_num. Can you expand the comment slightly?

use std::error::Error;
use std::fmt::Display;

/// Ordered index for dynamic trees
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few unit tests for this module would be good.

pub struct QueryInclusion {
pub id: QueryJobId,
pub branch: TreeNodeIndex,
}
Copy link
Contributor

@nnethercote nnethercote Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on this type and its fields would be welcome. In particular explaning why the TreeNodeIndex is present.

Does "Inclusion" have any particular meaning? I'm not sure what it's trying to communicate.

ret
}

fn serial_join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful in the commit message to add a brief explanation why these functions are being moved. Something like "Because the next commit will modify them to use ImplicitCtxt which is not available in rustc_data_structures."

) {
match items {
[] => return,
[item] => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does par_slice get called with zero or one items a lot in practice?

Copy link
Contributor Author

@zetanumbers zetanumbers Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did, as without this code parallel-rustc tests start to break as I've rewrote other parts of par_slice. AFAIK it should happen at least with methods like par_foreign_items, because we might not have foreign items, etc.

pub fn par_foreign_items(
&self,
f: impl Fn(ForeignItemId) -> Result<(), ErrorGuaranteed> + DynSend + DynSync,
) -> Result<(), ErrorGuaranteed> {
try_par_for_each_in(&self.foreign_items[..], |&&id| f(id))
}

Also I have intended to avoid rustc_thread_pool::scope call there since it can involve a bit of heavy machinery inside.

})
}

fn raw_branched_join<A, B, RA: Send, RB: Send>(oper_a: A, oper_b: B) -> (RA, RB)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment.

rustc_thread_pool::join(|| branch_context(0, 2, oper_a), || branch_context(1, 2, oper_b))
}

fn branch_context<F, R>(branch_num: u64, branch_space: u64, f: F) -> R
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this needs a comment.

A: FnOnce() -> RA + Send,
B: FnOnce() -> RB + Send,
{
rustc_thread_pool::join(|| branch_context(0, 2, oper_a), || branch_context(1, 2, oper_b))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is 2 used for branch_space?

@nnethercote
Copy link
Contributor

I had lots of little comments, but in general the first four commits seem reasonable. They are basically adding a bunch of plumbing to track node ordering.

The final commit is the important one, where the plumbing is put to use to change the cycle handling. This is difficult for me to evaluate because I know very little about cycle handling. AFAIK @Zoxc and @zetanumbers are the only ones who do know about cycle handling. I do like the expanded comment with the examples.

@Zoxc: you asked some questions earlier, and @zetanumbers gave what seem like reasonable answers. Do you have any other observations or concerns that would prevent this PR from being merged. (It appears that this PR served as, at least, partial inspiration for #152229, which speaks to its merits.)

@zetanumbers You said this fixes #142064,
#142063, and
#127971. Is it possible to write tests for these issues? It would be good to have test-based evidence that the problems are fixed, and won't regress in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ICE: parallel: internal error: entered unreachable code diagnostics flipflopping with Zthreads ICE: only 'variances_of' returns '&[ty::Variance]'