fix: use snapshot metadata for prev-log boundary checks#5
Draft
anyasabo wants to merge 1 commit into
Draft
Conversation
appendEntries can incorrectly reject replay when PrevLogEntry equals or predates the last snapshot boundary, because compacted entries are no longer in LogStore. Use snapshot metadata for boundary comparisons and add deterministic tests for matching and mismatched boundary terms. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
appendEntriesprev-log validation rejects valid replay at the snapshot boundary because compacted entries are no longer present inLogStore.Pre-fix behavior effectively did:
PrevLogEntry == lastLogIndex: compare term,GetLog(PrevLogEntry).When
PrevLogEntryequals a compacted snapshot boundary index,GetLogfails even though snapshot metadata should satisfy the check.How this can happen in concrete terms
S.PrevLogEntry = S,PrevLogTerm = snapshotTerm.Sfrom log store (not present after compaction).This is a natural post-snapshot catch-up path (no adversarial input required).
Mitigations that help (but do not fix root cause)
Neither removes the correctness issue; boundary checks must use snapshot metadata.
Impact
Replication liveness and resilience risk:
How we would notice in production
failed to get previous logwhereprevious-indexis at/near snapshot boundary.Provenance / Preconditions
fdb8145f, 2016).PrevLogEntryat or below compacted snapshot index.What this PR changes
Reviewer reproduction (live in-process Raft state)
Reproduce pre-fix behavior
git checkout b5e9b90^git checkout fix/append-prevlog-snapshot-boundary -- raft_test.gogo test -run "TestRaft_AppendEntries(AcceptsPrevLogAtSnapshotBoundary|RejectsPrevLogAtSnapshotBoundaryWithWrongTerm)$" -count=1 .Verify fixed behavior
fix/append-prevlog-snapshot-boundary).go test -run "TestRaft_AppendEntries(AcceptsPrevLogAtSnapshotBoundary|RejectsPrevLogAtSnapshotBoundaryWithWrongTerm)$" -count=1 .Test plan
go test -run "TestRaft_AppendEntries(AcceptsPrevLogAtSnapshotBoundary|RejectsPrevLogAtSnapshotBoundaryWithWrongTerm)$" -count=1 .go test -run "TestRaft_AppendEntry$|TestRaft_AppendEntries(AcceptsPrevLogAtSnapshotBoundary|RejectsPrevLogAtSnapshotBoundaryWithWrongTerm)$" -count=1 .