Add ability to transfer Raft leader to another node #200

Open
taurus-forever wants to merge 2 commits into bakwc:master from taurus-forever:add-transfer-leadership

taurus-forever commented Jun 30, 2026

Stopping a Raft leader node causes a Raft leader re-election.
The election itself takes some time and, on a tight schedule
(slow hardware and/or bad luck), may trigger a Primary step-down
(because Patroni cannot update the DCS in the meantime).

Also, Patroni-based operators have to handle the temporary Raft
leader outage gracefully, which makes scale-down logic complex.

The main idea of this PR is to let operators transfer Raft
leadership from the current node to another member of the
Raft cluster before shutting down the current server.

Example:

```
> syncobj-admin -conn 10.69.235.38:2222 -pass mypass -status
commit_idx: 518
enabled_code_version: 0
has_quorum: True
last_applied: 518
leader: 10.69.235.38:2222 <<<<<<<<<<<<<<<<  .38 is a Raft Leader
leader_commit_idx: 518
log_len: 50
match_idx_count: 2
match_idx_server_10.69.235.244:2222: 518
match_idx_server_10.69.235.36:2222: 518
next_node_idx_count: 2
next_node_idx_server_10.69.235.244:2222: 519
next_node_idx_server_10.69.235.36:2222: 519
partner_node_status_server_10.69.235.244:2222: 2
partner_node_status_server_10.69.235.36:2222: 2
partner_nodes_count: 2
raft_term: 9
readonly_nodes_count: 0
revision: deprecated
self: 10.69.235.38:2222
self_code_version: 0
state: 2
uptime: 421
version: 0.3.15

> syncobj-admin -conn 10.69.235.38:2222 -pass mypass -transfer 10.69.235.36:2222
SUCCESS TRANSFER 10.69.235.36:2222

> syncobj-admin -conn 10.69.235.38:2222 -pass mypass -status
commit_idx: 527
enabled_code_version: 0
has_quorum: True
last_applied: 527
leader: 10.69.235.36:2222 <<<<<<<<<<  .36 is a new Raft Leader
leader_commit_idx: 527
log_len: 59
match_idx_count: 2
match_idx_server_10.69.235.244:2222: 526
match_idx_server_10.69.235.36:2222: 526
next_node_idx_count: 2
next_node_idx_server_10.69.235.244:2222: 527
next_node_idx_server_10.69.235.36:2222: 527
partner_node_status_server_10.69.235.244:2222: 2
partner_node_status_server_10.69.235.36:2222: 2
partner_nodes_count: 2
raft_term: 10
readonly_nodes_count: 0
revision: deprecated
self: 10.69.235.38:2222
self_code_version: 0
state: 0
uptime: 438
version: 0.3.15

> ssh 10.69.235.38 -- reboot # will cause no Raft troubles
```

Manually tested on PySyncObj 0.3.15, Patroni 3.3.8, PostgreSQL 16, Ubuntu 24.04.
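
For operators that want to script the flow shown in the example, here is a minimal sketch in Python. It only parses the `key: value` text printed by `-status` and builds the `-transfer` command line. It does not call into PySyncObj. The helper names (`parse_status`, `pick_transfer_target`, `transfer_command`) are hypothetical. Reading a `partner_node_status_server_*` value of `2` as "connected" is an assumption based on the output above. Picking the first connected partner in sorted order is an arbitrary choice made for illustration.

```python
PARTNER_PREFIX = "partner_node_status_server_"


def parse_status(text):
    """Parse 'key: value' lines as printed by `syncobj-admin -status`."""
    status = {}
    for line in text.splitlines():
        # Keys may contain ':' (host:port), so split on the first ': ' only.
        key, sep, value = line.partition(": ")
        if sep:
            status[key.strip()] = value.strip()
    return status


def pick_transfer_target(status, connected="2"):
    """Return a partner address to transfer to, or None.

    None means either this node is not the leader (nothing to do) or no
    partner looks connected (assumption: status value "2" means connected).
    """
    leader = status.get("leader")
    if leader is None or leader != status.get("self"):
        return None
    candidates = sorted(
        key[len(PARTNER_PREFIX):]
        for key, value in status.items()
        if key.startswith(PARTNER_PREFIX) and value == connected
    )
    return candidates[0] if candidates else None


def transfer_command(conn, password, target):
    """Build the argv for the `-transfer` call proposed in this PR."""
    return ["syncobj-admin", "-conn", conn, "-pass", password, "-transfer", target]
```

A caller would run `syncobj-admin ... -status`, feed its stdout to `parse_status`, and execute the command returned by `transfer_command` only when `pick_transfer_target` returns an address. Re-checking `-status` afterwards, as the example does, is the only way shown here to confirm that leadership actually moved.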

Assisted-by: Claude:claude-4.8-opus

transferLeadership() triggers an election on the target follower by
zeroing its __raftElectionDeadline when a 'timeout_now' message arrives.
The election only fires on the follower's next tick, so any same-term
append_entries heartbeat from the still-active outgoing leader that was
processed before that tick reset the deadline and silently cancelled the
transfer: the admin still got SUCCESS while leadership never moved.

Add a one-shot __transferInProgress flag: set it when 'timeout_now' is
accepted, skip the append_entries deadline reset while it is set so the
outgoing leader's heartbeats can no longer postpone the forced election,
and clear it the moment the election fires. A higher-term append_entries
(a different leader already won) clears the flag and resumes normal
resets, so the node still yields correctly to a genuinely new leader.

Add tests covering a successful transfer under active heartbeats and the
NOT_LEADER denial when transferLeadership() is called on a follower.
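
The race and the fix described above can be illustrated with a toy model of the follower's election timer. This is a simplified sketch, not the PySyncObj implementation. The class `FollowerModel`, its method names, and the injectable `clock` are all hypothetical. Only the behaviour (zeroed deadline, one-shot flag, higher-term override) follows the description in this commit message.

```python
import time


class FollowerModel:
    """Toy follower election timer (hypothetical, not PySyncObj code)."""

    def __init__(self, term, election_timeout=1.0, clock=time.monotonic):
        self.term = term
        self.election_timeout = election_timeout
        self.clock = clock
        self.election_deadline = clock() + election_timeout
        self.transfer_in_progress = False

    def on_timeout_now(self):
        # The outgoing leader asks this node to start an election at once.
        self.election_deadline = 0.0
        self.transfer_in_progress = True

    def on_append_entries(self, leader_term):
        if leader_term < self.term:
            return  # stale leader, ignore
        if leader_term > self.term:
            # A different leader already won a newer term: yield to it.
            self.term = leader_term
            self.transfer_in_progress = False
        if not self.transfer_in_progress:
            # Normal heartbeat handling: postpone our own election.
            self.election_deadline = self.clock() + self.election_timeout

    def tick(self):
        """Return True if an election was started on this tick."""
        if self.clock() < self.election_deadline:
            return False
        self.transfer_in_progress = False  # one-shot: cleared when it fires
        self.term += 1
        self.election_deadline = self.clock() + self.election_timeout
        return True


if __name__ == "__main__":
    follower = FollowerModel(term=9, clock=lambda: 100.0)
    follower.on_timeout_now()
    follower.on_append_entries(9)  # same-term heartbeat arrives before the tick
    print(follower.tick(), follower.term)
```

Without the `transfer_in_progress` check in `on_append_entries`, the same-term heartbeat in the usage example would push the deadline back into the future and `tick()` would return `False`, which is the silent cancellation this commit fixes.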

Assisted-by: Claude:claude-4.8-opus