fix(sync/pending): stop chasing the tip when building pre-confirmed data#3499
Open
t00ts wants to merge 5 commits into
Open
fix(sync/pending): stop chasing the tip when building pre-confirmed data#3499t00ts wants to merge 5 commits into
t00ts wants to merge 5 commits into
Conversation
023394d to
3dda16c
Compare
Contributor
Author
|
Note: This does not fix the latency issue. Especially in mainnet we'll need variable depth (3-5 blocks) for the pre-confirmed flow. But that's another big (and orthogonal) change to this one, which focuses on fixing the bug and cleaning up the polling flow. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem:
Clients querying the
pre_confirmedtag were getting intermittent 503s.The producer asked the gateway for its
latestpre-confirmed block and required it to line up with the pending (pre-latest) block fetched alongside it.While this makes sense in theory, in practice it's hard to get right because they come from different endpoints sampled a few moments apart. This is due to how the fgw API is designed, and there are no plans to change it.
TDLR: By the time we read the pre-latest at
N, the pre-confirmed had already raced ahead toN+2. Right now if they don't chain we discard the update. This can result in long periods of nothing to serve. Which is wrong.Solution:
Stop chasing a moving target. Instead of asking for
latest, we anchor the pre-confirmed query on our own (known) committed head. So the block we fetch (by number) is the exact pre-confirmed block that chains onto it by construction. The race should no longer happen. Ever.Then, one step further as suggested by SW: The pre-latest is really just a pre-confirmed block one height down that's already past consensus (waiting for block hash), which is exactly what the pending block was giving us. So instead of a separate pending poll, we fetch the pre-latest by number from the same
get_preconfirmedendpoint. Both levels then come from the same place.Also, a couple notable changes:
panic!from the read path. We just return an empty block instead.This is a major rewrite. I have tested it live on
sepolia-integrationand thepre_confirmedserves well with zero drops.See also #3495