
fix: sync lastLog cache when append config processing fails #6

Draft
anyasabo wants to merge 1 commit into main from fix/append-config-process-failure-lastlog

anyasabo commented May 6, 2026

Problem

appendEntries can store new entries durably, then fail while decoding a configuration entry and return without updating the cached lastLog.

Result: the durable log advances, but r.getLastLog() remains stale.

How this can happen in concrete terms

  1. Leader sends AppendEntries containing at least one config-changing entry.
  2. Follower successfully writes new entries via StoreLogs.
  3. Config-processing step fails (e.g., malformed peers/config payload in the new entry).
  4. Function returns error before cached lastLog update step.

This produces a cache/store mismatch even though append persistence already succeeded.
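To make the ordering in steps 1 to 4 concrete, here is a minimal Go model of the pre-fix control flow. It is a sketch, not the raft package's code: `Entry`, `follower`, `decodeConfig`, and `appendEntriesPreFix` are hypothetical names, a plain slice stands in for StoreLogs, and an empty `Payload` stands in for a malformed peers/config payload.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is a simplified, hypothetical log entry. IsConfig marks a
// configuration-changing record; Payload stands in for its encoded peers.
type Entry struct {
	Index, Term uint64
	IsConfig    bool
	Payload     string
}

// follower models only the two pieces of state that can diverge:
// the durable log and the cached lastLog index/term.
type follower struct {
	durable   []Entry
	lastIndex uint64
	lastTerm  uint64
}

// decodeConfig is a hypothetical stand-in for config decoding; an empty
// payload is treated as malformed.
func decodeConfig(e Entry) error {
	if e.Payload == "" {
		return errors.New("failed to decode peers")
	}
	return nil
}

// appendEntriesPreFix mirrors the pre-fix ordering: store, then process
// config entries, then refresh the cache. A decode error returns before the
// cache refresh runs, so durable and cached state diverge.
func (f *follower) appendEntriesPreFix(entries []Entry) error {
	if len(entries) == 0 {
		return nil
	}
	// Step 2: the store write succeeds, so the entries are now durable.
	f.durable = append(f.durable, entries...)

	// Step 3: config processing fails on a malformed entry.
	for _, e := range entries {
		if e.IsConfig {
			if err := decodeConfig(e); err != nil {
				// Step 4: early return; the cache update below is skipped.
				return fmt.Errorf("failed to append entry %d: %w", e.Index, err)
			}
		}
	}

	last := entries[len(entries)-1]
	f.lastIndex, f.lastTerm = last.Index, last.Term
	return nil
}
```

Calling it with entries at indexes 1 and 2, where entry 2 is a config entry with an empty `Payload`, returns an error while `len(f.durable)` is 2 and `f.lastIndex` is still 0: exactly the cache/store mismatch described above.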

Mitigations that help

  • Avoid malformed config-entry payloads (transport hygiene, tooling checks).
  • Ensure config-change producers are well-validated.

But before this fix, once this error path was hit, the cache still diverged from the store.

Impact

Risk of availability/liveness degradation after such an error event:

  • stale replication metadata,
  • incorrect follow-up consistency checks/retry behavior,
  • prolonged instability after malformed config-entry events.

How we would notice in production

  • Warning logs: failed to append entry ... failed to decode peers (or decode config).
  • Replication behavior inconsistent with actual durable index.
  • Possible mismatch between debug/stats last_log_* and on-disk state immediately after failure.
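One way to look for the last symptom is to compare the cached index reported in stats with the store's durable last index. The sketch below is hypothetical: it assumes a `"last_log_index"` key in a stats map and a store exposing `LastIndex() (uint64, error)`, and `checkLastLog` and `fixedStore` are illustrative names only.

```go
package main

import (
	"fmt"
	"strconv"
)

// lastIndexer is the one store method this check needs. It is an assumed
// interface shaped like a log store's LastIndex method.
type lastIndexer interface {
	LastIndex() (uint64, error)
}

// checkLastLog compares the cached last log index reported in a stats map
// (key "last_log_index", assumed) with the store's durable last index and
// returns an error describing any mismatch or read failure.
func checkLastLog(stats map[string]string, store lastIndexer) error {
	cached, err := strconv.ParseUint(stats["last_log_index"], 10, 64)
	if err != nil {
		return fmt.Errorf("parse last_log_index: %w", err)
	}
	durable, err := store.LastIndex()
	if err != nil {
		return fmt.Errorf("read store last index: %w", err)
	}
	if cached != durable {
		return fmt.Errorf("cached last_log_index %d != durable last index %d", cached, durable)
	}
	return nil
}

// fixedStore is a trivial in-memory lastIndexer used for demonstration.
type fixedStore uint64

func (s fixedStore) LastIndex() (uint64, error) { return uint64(s), nil }
```

On a live node, appends keep running between the two reads, so a single mismatch is only a hint; a mismatch that persists right after a "failed to append entry" warning is the stronger signal.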

Provenance / Preconditions

  • Not a brand-new regression: this specific return path was introduced with the config-processing integration (e59f65d6, 2021).
  • Trigger requires:
    • an append in which StoreLogs succeeds, and
    • a subsequent config-entry decode failure in the processing loop.

What this PR changes

  • On a config-processing failure after a successful append, updates the cached lastLog to the last newly stored entry before returning.
  • Adds a deterministic regression test covering the store-success + config-decode-failure path.
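The fixed ordering can be sketched with the same kind of toy model. Again, `Entry`, `follower`, `decodeConfig`, `setLastLog`, and `appendEntriesFixed` are hypothetical simplifications, not the raft package's types or functions; the point is only that once the store write has succeeded, every exit path refreshes the cache.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry, follower, and decodeConfig are hypothetical simplifications used
// only to illustrate the ordering of the fix.
type Entry struct {
	Index, Term uint64
	IsConfig    bool
	Payload     string
}

type follower struct {
	durable   []Entry
	lastIndex uint64
	lastTerm  uint64
}

// decodeConfig treats an empty payload as a malformed config entry.
func decodeConfig(e Entry) error {
	if e.Payload == "" {
		return errors.New("failed to decode peers")
	}
	return nil
}

// setLastLog refreshes the cached last index/term.
func (f *follower) setLastLog(index, term uint64) {
	f.lastIndex, f.lastTerm = index, term
}

// appendEntriesFixed refreshes the cache on every exit path that follows
// a successful store, including the config-processing error path.
func (f *follower) appendEntriesFixed(entries []Entry) error {
	if len(entries) == 0 {
		return nil
	}
	f.durable = append(f.durable, entries...) // store succeeded
	last := entries[len(entries)-1]

	for _, e := range entries {
		if !e.IsConfig {
			continue
		}
		if err := decodeConfig(e); err != nil {
			// The entries are already durable, so sync the cache first.
			f.setLastLog(last.Index, last.Term)
			return fmt.Errorf("failed to append entry %d: %w", e.Index, err)
		}
	}

	f.setLastLog(last.Index, last.Term)
	return nil
}
```

Calling `setLastLog` explicitly on the error path follows the change as described; registering a `defer` right after the store write would be an alternative that also covers any early returns added later.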

Reviewer reproduction (live in-process path)

Reproduce pre-fix behavior

  1. Check out the parent commit:
    • git checkout 53b8474^
  2. Bring this PR's regression test into that tree:
    • git checkout fix/append-config-process-failure-lastlog -- raft_test.go
  3. Run:
    • go test -run "TestRaft_AppendEntriesConfigProcessFailureRefreshesLastLog" -count=1 .
  4. Expected pre-fix: the test fails on the stale cached lastLog assertion.

Verify fixed behavior

  1. Check out this branch (fix/append-config-process-failure-lastlog).
  2. Run:
    • go test -run "TestRaft_AppendEntriesConfigProcessFailureRefreshesLastLog" -count=1 .
  3. Expected: the test passes; the cached lastLog matches the durable store after the error.

Test plan

  • go test -run "TestRaft_AppendEntriesConfigProcessFailureRefreshesLastLog" -count=1 .
  • go test -run "TestRaft_AppendEntry$|TestRaft_AppendEntriesConfigProcessFailureRefreshesLastLog" -count=1 .

appendEntries stores new entries before processing configuration records, so a decode failure can leave durable logs ahead of cached lastLog metadata. Update lastLog on this error path and add regression coverage for malformed config-entry processing.

Co-authored-by: Cursor <cursoragent@cursor.com>