Add ability to transfer Raft leader to another node #200
Open
taurus-forever wants to merge 2 commits into
Stopping a Raft leader node causes a Raft leader re-election. The election itself takes some time and, on a tight schedule (slow hardware and/or bad luck), may trigger a Primary step-down because Patroni cannot update the DCS in the meantime. Patroni-based operators also have to handle the temporary Raft leader outage gracefully, which makes scale-down logic complex.

The main idea of this PR is to let operators transfer Raft leadership from the current node to another member of the Raft cluster before shutting down the current server.

Example:

```
> syncobj-admin -conn 10.69.235.38:2222 -pass mypass -status
commit_idx: 518
enabled_code_version: 0
has_quorum: True
last_applied: 518
leader: 10.69.235.38:2222     <<<<<<<<<<<<<<<< 10.69.235.38 is a Leader
leader_commit_idx: 518
log_len: 50
match_idx_count: 2
match_idx_server_10.69.235.244:2222: 518
match_idx_server_10.69.235.36:2222: 518
next_node_idx_count: 2
next_node_idx_server_10.69.235.244:2222: 519
next_node_idx_server_10.69.235.36:2222: 519
partner_node_status_server_10.69.235.244:2222: 2
partner_node_status_server_10.69.235.36:2222: 2
partner_nodes_count: 2
raft_term: 9
readonly_nodes_count: 0
revision: deprecated
self: 10.69.235.38:2222
self_code_version: 0
state: 2
uptime: 421
version: 0.3.15

> syncobj-admin -conn 10.69.235.38:2222 -pass mypass -transfer 10.69.235.36:2222
SUCCESS TRANSFER 10.69.235.36:2222

> syncobj-admin -conn 10.69.235.38:2222 -pass mypass -status
commit_idx: 527
enabled_code_version: 0
has_quorum: True
last_applied: 527
leader: 10.69.235.36:2222     <<<<<<<<<< 10.69.235.36 is a new Leader
leader_commit_idx: 527
log_len: 59
match_idx_count: 2
match_idx_server_10.69.235.244:2222: 526
match_idx_server_10.69.235.36:2222: 526
next_node_idx_count: 2
next_node_idx_server_10.69.235.244:2222: 527
next_node_idx_server_10.69.235.36:2222: 527
partner_node_status_server_10.69.235.244:2222: 2
partner_node_status_server_10.69.235.36:2222: 2
partner_nodes_count: 2
raft_term: 10
readonly_nodes_count: 0
revision: deprecated
self: 10.69.235.38:2222
self_code_version: 0
state: 0
uptime: 438
version: 0.3.15

> ssh 10.69.235.38 -- shutdown -P now  # will cause no Raft troubles
```

Assisted-by: Claude:claude-4.8-opus
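An operator would probably want to confirm that leadership really moved before powering the node off, rather than trusting the `SUCCESS` reply alone. The following is a minimal sketch of such a check, assuming the `-status` output keeps the `key: value` layout shown in the example. `parse_status` and `leadership_moved` are hypothetical helper names, not part of PySyncObj, and treating a higher `raft_term` as evidence of a completed election is an assumption based on the 9 to 10 change in the example.

```python
def parse_status(text):
    """Parse `key: value` status lines into a dict.

    Keys such as `match_idx_server_10.69.235.36:2222` contain colons
    themselves, so each line is split on the LAST ': ' separator.
    """
    status = {}
    for line in text.splitlines():
        key, sep, value = line.strip().rpartition(": ")
        if sep:  # skip blank lines and anything without a separator
            status[key] = value
    return status


def leadership_moved(before, after, target):
    """Return True if `target` became leader and the Raft term advanced."""
    return (
        before["leader"] != target
        and after["leader"] == target
        and int(after["raft_term"]) > int(before["raft_term"])
    )


# Abbreviated copies of the two status dumps from the example.
BEFORE = """\
leader: 10.69.235.38:2222
match_idx_server_10.69.235.36:2222: 518
raft_term: 9
self: 10.69.235.38:2222
"""
AFTER = """\
leader: 10.69.235.36:2222
match_idx_server_10.69.235.36:2222: 526
raft_term: 10
self: 10.69.235.38:2222
"""

before = parse_status(BEFORE)
after = parse_status(AFTER)
moved = leadership_moved(before, after, "10.69.235.36:2222")
print("safe to shut down:", moved)
```

In a real operator the two text blobs would come from running the status command before and after the transfer; how that command is invoked is left out here on purpose.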
transferLeadership() triggers an election on the target follower by zeroing its __raftElectionDeadline when a 'timeout_now' message arrives. The election only fires on the follower's next tick, so any same-term append_entries heartbeat from the still-active outgoing leader that was processed in between reset the deadline and silently cancelled the transfer - the admin still got SUCCESS while leadership never moved.

Add a one-shot __transferInProgress flag: set it when 'timeout_now' is accepted, skip the append_entries deadline reset while it is set so the outgoing leader's heartbeats can no longer postpone the forced election, and clear it the moment the election fires. A higher-term append_entries (a different leader already won) clears the flag and resumes normal resets, so the node still yields correctly to a genuinely new leader.

Add tests covering a successful transfer under active heartbeats and the NOT_LEADER denial when transferLeadership() is called on a follower.

Assisted-by: Claude:claude-4.8-opus
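The interaction between `timeout_now`, heartbeats, and the one-shot flag can be illustrated with a toy model. This is a simplified sketch and not the PySyncObj implementation: `ToyFollower`, its method names, and the explicit `now` argument are all hypothetical, and only the behaviour described in the commit message is modelled (same-term heartbeats do not reset the deadline while a transfer is pending, a higher-term heartbeat clears the flag, and the flag is cleared when the election fires).

```python
class ToyFollower:
    """Toy follower tracking only term, election deadline and transfer flag."""

    def __init__(self, term, election_timeout=1.0):
        self.term = term
        self.election_timeout = election_timeout
        self.election_deadline = election_timeout  # absolute time
        self.transfer_in_progress = False          # one-shot flag
        self.elections_started = 0

    def on_timeout_now(self):
        # Force an election on the next tick and remember that it was forced.
        self.election_deadline = 0.0
        self.transfer_in_progress = True

    def on_append_entries(self, leader_term, now):
        if leader_term < self.term:
            return  # stale leader, ignore
        if leader_term > self.term:
            # A different leader already won: yield to it.
            self.term = leader_term
            self.transfer_in_progress = False
        if not self.transfer_in_progress:
            # Normal heartbeat handling: postpone the election.
            self.election_deadline = now + self.election_timeout

    def tick(self, now):
        if now >= self.election_deadline:
            self.transfer_in_progress = False  # flag is one-shot
            self.term += 1
            self.elections_started += 1
            self.election_deadline = now + self.election_timeout


# Case 1: a same-term heartbeat arrives between timeout_now and the next tick.
a = ToyFollower(term=9)
a.on_timeout_now()
a.on_append_entries(leader_term=9, now=0.1)
a.tick(now=0.2)

# Case 2: a higher-term heartbeat arrives instead, so the node yields.
b = ToyFollower(term=9)
b.on_timeout_now()
b.on_append_entries(leader_term=10, now=0.1)
b.tick(now=0.2)

print(a.elections_started, a.term, b.elections_started, b.term)
```

Removing the `transfer_in_progress` check in `on_append_entries` reproduces the failure described above: in case 1 the heartbeat would push the deadline out and the tick would start no election.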
Manually tested on PySyncObj 0.3.15, Patroni 3.3.8, PostgreSQL 16, Ubuntu 24.04.
Assisted-by: Claude:claude-4.8-opus