Deterministic query cycles for parallel front-end #149849
zetanumbers wants to merge 5 commits into rust-lang:main from
Conversation
These commits modify the […]. If this was unintentional then you should revert the changes before this PR is merged.
@bors try
@rust-timer queue
Deterministic query cycles for parallel front-end
Ok, this is surprising.
Finished benchmarking commit (306a768): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never

- Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
- Max RSS (memory usage): Results (secondary -1.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
- Cycles: Results (primary 2.2%, secondary 2.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
- Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 471.42s -> 474.532s (0.66%)
It's a bit unclear to me what source of non-determinism this tries to fix. Does this alter parallel execution in any way other than picking which query in the cycle to use for resumption? It looks to me like you're trying to pick the point in the cycle to break which corresponds to a single-threaded execution, but I don't think that point is guaranteed to be resumable. Currently it looks like we're not deterministically picking which query in a cycle to resume when multiple are present. That's fairly simple to improve, though I think the set of queries available for resumption is non-deterministic to start with.
I'm making the assumption that when we get a query cycle there is the "point in the cycle to break which corresponds to a single-threaded execution". That's what I am trying to make deterministic.
Force-pushed 0a5000f to ba45f0b (compare)
Force-pushed ba45f0b to 206fec5 (compare)
Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3
Reminder, once the PR becomes ready for a review, use
As I've described, I resume the same query in the first cycle as in single-threaded mode. Before that we were sorting query hashes to more deterministically choose a query to resume given a set of query waits. The new code no longer uses these hashes, so I removed them in f9c6527. #152229 resembles this commit, but also changes the old cycle handling code to break on a now unspecified query in the cycle. So you can consider this PR to implement all the #152229 changes, and merging it would only require me to do a rebase again.
Because in the next commit (ca51215) I've added some functionality to those interfaces relying on
The new cycle breaking code is guaranteed to break on the same query as single-threaded mode would. The breaking query prompts a query waiter to preemptively stop and call
Force-pushed f9c6527 to f466828 (compare)
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
The job failed. Click to see the possible cause of the failure (guessed by this bot).
Added a big doc comment about the new query cycle breaking code.
r? @nnethercote

This PR has been stalled for 3.5 months. Please, I need any tangible feedback.
☔ The latest upstream changes (presumably #153047) made this pull request unmergeable. Please resolve the merge conflicts. |
@zetanumbers: I see this PR hasn't been handled well and has been frustrating for you. I'm sorry about that. I also know very little about the query cycle handling code but I will do my best to do a close review, and try to get this PR back on track. |
    ///
    /// ## Order
    ///
    /// Encoding allows sorting nodes in left < parent < right linear order.
Is this why you start at the high bits, instead of the low bits?
    /// ```
    ///
    /// Node reach after traversal of `LRLRRLLR` branches should be represented as `0b0101100110000...0`.
    /// Root is obviously encoded as `0b10000...0`.
I don't think it's totally obvious :) It would be nice to have a small table with three or four examples, with the root as the first example, and then longer examples.
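To make such a table concrete, here is a hedged sketch of the encoding as the doc comment describes it: path bits (L = 0, R = 1) are packed from the most significant bit down, terminated by a single sentinel 1 bit, then zeros. `encode_path` is a hypothetical helper for illustration, not the PR's API.

```rust
// Sketch of the TreeNodeIndex bit layout from the doc comment: path bits
// occupy the high bits, then one sentinel 1 bit, then zeros.
fn encode_path(path: &str) -> u64 {
    let mut bits: u64 = 0;
    let mut pos = 63u32; // start filling from the most significant bit
    for branch in path.chars() {
        if branch == 'R' {
            bits |= 1u64 << pos;
        }
        pos -= 1;
    }
    bits | (1u64 << pos) // sentinel bit marks where the path ends
}

fn main() {
    assert_eq!(encode_path(""), 0x8000_0000_0000_0000); // root: 0b1000...0
    assert_eq!(encode_path("L"), 0x4000_0000_0000_0000); // 0b0100...0
    assert_eq!(encode_path("R"), 0xC000_0000_0000_0000); // 0b1100...0
    assert_eq!(encode_path("LRLRRLLR"), 0b0101_1001_1u64 << 55);
    // The left < parent < right order falls out of plain integer comparison:
    assert!(encode_path("L") < encode_path("") && encode_path("") < encode_path("R"));
    println!("ok");
}
```

The two `assert_eq!` values for the root and for `LRLRRLLR` match the doc comment's own examples.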
    impl Display for BranchingError {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            "TreeNodeIndex's free bits have been exhausted, make sure recursion is used carefully"
Do you know if we might ever reach this limit? A balanced binary tree that is 64 deep is very large, but is it possible we might get a highly unbalanced binary tree of that depth?
    /// Error for exhausting free bits
    #[derive(Debug)]
    pub struct BranchingError(());
What's the `()` for? Could this just be `pub struct BranchingError;`?
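For what it's worth, a private `()` field is a common Rust pattern for making a type unconstructible outside its module, whereas `pub struct BranchingError;` could be built by any downstream code. A minimal illustration (the module name and constructor are made up):

```rust
mod tree_index {
    /// The private `()` field means only this module can create the error.
    #[derive(Debug)]
    pub struct BranchingError(());

    pub fn exhausted() -> BranchingError {
        BranchingError(())
    }
}

fn main() {
    // `tree_index::BranchingError(())` would not compile here: the field is private.
    let err = tree_index::exhausted();
    println!("{err:?}"); // prints "BranchingError(())"
}
```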
    Self(0x80000000_00000000)
    }
    /// Append tree branch no. `branch_idx` reserving `bits` bits.
The comments on `try_bits_branch` and `branch` are terse. I'm having trouble understanding exactly what "reserve" means, and what `bits` and `branch_num` are. Can you expand the comments slightly?
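My reading of "reserving `bits` bits" (an assumption from the surrounding diff, not the PR's actual code): the sentinel bit moves down by `bits` positions and the branch number is written into the freed positions, so each nested fork consumes part of the 64-bit budget. A hypothetical sketch:

```rust
/// Hypothetical stand-in for the PR's TreeNodeIndex; only the bit layout
/// (path bits from the MSB, then a sentinel 1 bit, then zeros) is assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct TreeNodeIndex(u64);

impl TreeNodeIndex {
    fn root() -> Self {
        Self(0x8000_0000_0000_0000)
    }

    /// Append branch number `branch_idx`, reserving `bits` bits for it:
    /// the branch bits replace the old sentinel, and a new sentinel is set
    /// `bits` positions lower. Returns None once the free bits run out.
    fn try_bits_branch(self, branch_idx: u64, bits: u32) -> Option<Self> {
        debug_assert!(bits >= 1 && branch_idx < (1u64 << bits));
        let s = self.0.trailing_zeros(); // sentinel position (only zeros lie below it)
        if s < bits {
            return None; // free bits exhausted (the BranchingError case)
        }
        Some(Self(
            (self.0 & !(1u64 << s))              // clear the old sentinel
                | (branch_idx << (s - bits + 1)) // branch bits take its place
                | (1u64 << (s - bits)),          // new sentinel
        ))
    }
}

fn main() {
    let root = TreeNodeIndex::root();
    let left = root.try_bits_branch(0, 1).unwrap();
    let right = root.try_bits_branch(1, 1).unwrap();
    // The left < parent < right ordering from the doc comment:
    assert!(left < root && root < right);
    assert_eq!(left.0, 0x4000_0000_0000_0000);
    assert_eq!(right.0, 0xC000_0000_0000_0000);
    println!("ok");
}
```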
    use std::error::Error;
    use std::fmt::Display;

    /// Ordered index for dynamic trees
A few unit tests for this module would be good.
    pub struct QueryInclusion {
        pub id: QueryJobId,
        pub branch: TreeNodeIndex,
    }
Comments on this type and its fields would be welcome, in particular explaining why the `TreeNodeIndex` is present.
Does "Inclusion" have any particular meaning? I'm not sure what it's trying to communicate.
    ret
    }

    fn serial_join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
It would be helpful in the commit message to add a brief explanation why these functions are being moved. Something like "Because the next commit will modify them to use ImplicitCtxt which is not available in rustc_data_structures."
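For readers following along, a serial `join` is typically just sequential execution of the two closures; a minimal sketch of what the moved function plausibly looks like (reconstructed from the signature above, not copied from the PR):

```rust
// Serial fallback for `join`: run both closures on the current thread,
// first `oper_a`, then `oper_b`, and return both results as a pair.
fn serial_join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
where
    A: FnOnce() -> RA,
    B: FnOnce() -> RB,
{
    let a = oper_a();
    let b = oper_b();
    (a, b)
}

fn main() {
    let (x, y) = serial_join(|| 1 + 1, || "two");
    assert_eq!((x, y), (2, "two"));
    println!("ok");
}
```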
    ) {
        match items {
            [] => return,
            [item] => {
Does `par_slice` get called with zero or one items a lot in practice?
It did; without this code, parallel-rustc tests start to break as I rewrote other parts of `par_slice`. AFAIK it should happen at least with methods like `par_foreign_items`, because we might not have foreign items, etc.
rust/compiler/rustc_middle/src/hir/mod.rs, lines 130 to 135 in 0028f34
Also I intended to avoid the `rustc_thread_pool::scope` call there, since it can involve a bit of heavy machinery inside.
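The special-casing being discussed can be sketched roughly like this (a simplification; the real `par_slice` forks into the thread pool rather than using the sequential placeholder shown here):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch: empty and single-element slices skip the parallel machinery
// entirely, avoiding the cost of rustc_thread_pool::scope for trivial input.
fn par_slice<T: Sync, F: Fn(&T) + Sync>(items: &[T], f: F) {
    match items {
        [] => {}                        // e.g. a crate with no foreign items
        [item] => f(item),              // single item: run inline
        _ => items.iter().for_each(|i| f(i)), // placeholder for the parallel path
    }
}

fn main() {
    let count = AtomicUsize::new(0);
    let empty: &[i32] = &[];
    par_slice(empty, |_| { count.fetch_add(1, Ordering::Relaxed); });
    par_slice(&[1], |_| { count.fetch_add(1, Ordering::Relaxed); });
    par_slice(&[1, 2, 3], |_| { count.fetch_add(1, Ordering::Relaxed); });
    assert_eq!(count.load(Ordering::Relaxed), 4);
    println!("ok");
}
```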
    })
    }
    fn raw_branched_join<A, B, RA: Send, RB: Send>(oper_a: A, oper_b: B) -> (RA, RB)
        rustc_thread_pool::join(|| branch_context(0, 2, oper_a), || branch_context(1, 2, oper_b))
    }

    fn branch_context<F, R>(branch_num: u64, branch_space: u64, f: F) -> R
And this needs a comment.
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
    {
        rustc_thread_pool::join(|| branch_context(0, 2, oper_a), || branch_context(1, 2, oper_b))
Why is 2 used for `branch_space`?
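Presumably `join` always forks into exactly two tasks, so `branch_space` is 2 and a single bit distinguishes the branches. Under that assumption, the bits reserved per fork would be ceil(log2(branch_space)); a tiny illustration (the helper name is made up):

```rust
// Bits needed to label one of `branch_space` sibling branches,
// rounding the space up to the next power of two.
fn bits_for(branch_space: u64) -> u32 {
    branch_space.next_power_of_two().trailing_zeros()
}

fn main() {
    assert_eq!(bits_for(2), 1); // join: two branches, one bit per fork
    assert_eq!(bits_for(4), 2);
    assert_eq!(bits_for(5), 3); // non-powers of two round up
    println!("ok");
}
```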
I had lots of little comments, but in general the first four commits seem reasonable. They are basically adding a bunch of plumbing to track node ordering. The final commit is the important one, where the plumbing is put to use to change the cycle handling. This is difficult for me to evaluate because I know very little about cycle handling. AFAIK @Zoxc and @zetanumbers are the only ones who do know about cycle handling. I do like the expanded comment with the examples.

@Zoxc: you asked some questions earlier, and @zetanumbers gave what seem like reasonable answers. Do you have any other observations or concerns that would prevent this PR from being merged? (It appears that this PR served as, at least, partial inspiration for #152229, which speaks to its merits.)

@zetanumbers: You said this fixes #142064,
The mechanism is similar to cycle detection in single-threaded mode. We traverse the deadlocked query graph from the top active query downwards through subqueries until we visit some query a second time, thus finding a cycle. With the multi-threaded front-end enabled, one query may now have more than one active subquery, i.e. we used one of the parallel interfaces `parallel!`, `join`, `par_for_each`, etc. As such we have to traverse the "leftmost" active subquery to recover the sequential behavior these parallel interfaces have in single-threaded mode. The new `TreeNodeIndex` saves implicit context information about which `join` (or `scope`) task we entered while executing a query, which we then use in `break_query_cycle`.

However, we then have to guarantee that the query stack from single-threaded mode is included in the active query graph. This is true for the `join` function, as its first task will be completed on the same thread, and the same will be tried for the second task unless it is stolen, which is fine for us. `scope` places tasks in a local queue and pops them in LIFO manner, while other worker threads can only steal from that queue in FIFO manner, so we can guarantee the next task is either stolen or available for execution.

Fixes #142064
Fixes #142063
Fixes #127971
UPDATE: commits are sliced to the finest detail
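The "leftmost active subquery" selection described above reduces to taking the minimum index once paths are encoded with high-order path bits plus a sentinel, since plain integer comparison then orders siblings left-to-right. A hypothetical sketch (names and the raw-`u64` representation are illustrative, not the PR's API):

```rust
// Pick the leftmost active subquery: with the TreeNodeIndex encoding,
// "leftmost" is simply the smallest index value.
fn leftmost_subquery<'a>(active: &[(u64, &'a str)]) -> &'a str {
    active
        .iter()
        .min_by_key(|(idx, _)| *idx)
        .expect("a deadlocked query has at least one active subquery")
        .1
}

fn main() {
    // 0b01... is the left child of the root, 0b11... is the right child.
    let active = [
        (0xC000_0000_0000_0000u64, "right_subquery"),
        (0x4000_0000_0000_0000u64, "left_subquery"),
    ];
    assert_eq!(leftmost_subquery(&active), "left_subquery");
    println!("ok");
}
```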