fix: reject non-voter leadership transfer targets #3

Draft
anyasabo wants to merge 1 commit into main from fix/leadership-transfer-nonvoter-target


@anyasabo anyasabo commented May 6, 2026

Problem

LeadershipTransferToServer can accept an explicit target that has already been demoted to a non-voter.

The generic transfer path (LeadershipTransfer) already avoids non-voters because it selects its own target, but the explicit-target path did not check suffrage. Stale requests from automation or operators could therefore still attempt a transfer to an ineligible target.

How this can happen in concrete terms

  1. The cluster has leader L and follower F.
  2. F is demoted (or removed and then re-added as a non-voter) during membership churn.
  3. External automation still references F as the preferred transfer target (stale cache or config).
  4. LeadershipTransferToServer(F) is issued.
  5. Before this fix, the leader could begin the transfer flow even though F has no vote.

Mitigations that help (but are not complete)

  • Prefer LeadershipTransfer() (auto-pick) over an explicit target when possible.
  • Refresh the membership view before a targeted transfer.
  • Hold targeted transfer requests until configuration changes have settled.

Gap addressed by this PR: the explicit-target path now enforces voter suffrage directly.

Impact

Availability/control-plane stability risk:

  • unnecessary term churn,
  • potential transient leader instability,
  • avoidable write latency spikes during transfer attempts.

How we would notice in production

  • Leadership transfer errors around explicit targets.
  • Term bumps/leader changes immediately after targeted transfer attempts.
  • Elevated election metrics correlated with config-change windows.

Provenance / Preconditions

  • Longstanding behavior, not newly introduced this month.
  • The transfer loop in its current form dates back to the introduction of leadership transfer (eba83432, 2019), with later flow updates (d68b78bc, 2022).
  • Condition requires:
    • use of explicit LeadershipTransferToServer, and
    • target server currently non-voting.

What this PR changes

  • Adds an explicit voter check to the targeted transfer path.
  • Returns a clear error for a non-voter target before the transfer begins.
  • Adds a regression test to ensure that a failed transfer to a non-voter does not change the term or the leader.

Reviewer reproduction (live in-process cluster)

Reproduce the problematic behavior pre-fix

  1. Check out the parent commit:
    • git checkout 6b313b5^
  2. Bring this PR's transfer regression test into that tree:
    • git checkout fix/leadership-transfer-nonvoter-target -- raft_test.go
  3. Run the test repeatedly to expose the churn behavior:
    • go test -run "TestRaft_LeadershipTransferToNonvoterDoesNotDisruptLeader" -count=20 .

Verify the fixed behavior

  1. Check out this branch (fix/leadership-transfer-nonvoter-target).
  2. Run:
    • go test -run "TestRaft_LeadershipTransfer(IgnoresNonvoters|ToNonvoterDoesNotDisruptLeader)$" -count=1 .
  3. Expected: a targeted transfer to a non-voter fails immediately, and the leader and term remain stable.

Test plan

  • go test -run "TestRaft_LeadershipTransfer(IgnoresNonvoters|ToNonvoterDoesNotDisruptLeader)$" -count=1 .

LeadershipTransferToServer can currently target a demoted server and trigger unnecessary election churn. Reject non-voter targets up front and add a regression test to confirm leader/term stability after failed transfer attempts.

Co-authored-by: Cursor <cursoragent@cursor.com>