CNTRLPLANE-3460: Refactor defrag controller to lower impact of defrag#1618
CNTRLPLANE-3460: Refactor defrag controller to lower impact of defrag#1618dusk125 wants to merge 3 commits into
Conversation
The defrag controller now tracks elapsed time during each sync and skips remaining etcd members if the 5-minute budget is exceeded. Budget-skipped members are deferred to the next reconciliation loop and do not count toward the degraded failure threshold. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
WalkthroughAdds a defragmentation cooldown to DefragController, refactors runDefrag to target one fragmented member per sync (filtering learners, sorting by fragmentation, leader last), updates failure/degraded handling, adjusts clearDegraded, updates fakeEtcdClient.Defragment, and adds/updates tests including TestDefragCooldown. ChangesDefrag controller cooldown & refactor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/operator/defragcontroller/defragcontroller_test.go`:
- Around line 464-468: The test uses a tiny non-zero defragBudget which can be
flaky; set the DefragController's defragBudget to 0 (or switch the controller to
use a controllable/test clock) so budget exhaustion is deterministic. Locate
where defragBudget is set in the test setup (the variable named defragBudget
used to construct the controller) and replace 1 * time.Nanosecond with 0 (or
adapt the controller constructor to accept a mock clock and use that in the
test) so calling controller.sync(context.TODO(),
factory.NewSyncContext("defrag-controller", eventRecorder)) always observes an
exhausted budget.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e0de8aac-2c9a-42a2-9dee-693e18c04822
📒 Files selected for processing (2)
pkg/operator/defragcontroller/defragcontroller.gopkg/operator/defragcontroller/defragcontroller_test.go
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@dusk125: This pull request references CNTRLPLANE-3460 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Rework the defrag controller to defrag at most one etcd member per reconciliation loop instead of all members in a single pass. After a successful defrag, a 1-hour cooldown prevents further defrags until the next cycle, spacing them out to be gentler on the cluster. Key changes: - Replace time budget with per-sync cooldown (defaultDefragCooldown = 1h) - Sort non-leader members by fragmentation descending, leader always last - Post-defrag: verify member responds via Status(), log before/after DbSize - Replace deprecated wait.Poll with wait.PollUntilContextTimeout - Fix fake Defragment() to update per-member status for multi-sync tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@dusk125: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| if len(member.ClientURLs) == 0 { | ||
| // skip unstarted member | ||
| var ( | ||
| etcdMembers []*etcdserverpb.Member |
|
|
||
| // Sort non-leader members by fragmentation percentage descending so we | ||
| // defrag the most fragmented member first each cycle. | ||
| slices.SortFunc(endpointStatus, func(a, b *clientv3.StatusResponse) int { |
There was a problem hiding this comment.
can you please add a unit test for this sort?
|
|
||
| if !c.lastDefragTime.IsZero() && time.Since(c.lastDefragTime) < c.defragCooldown { | ||
| klog.V(4).Infof("Defrag cooldown active, %v remaining", c.defragCooldown-time.Since(c.lastDefragTime)) | ||
| return nil |
There was a problem hiding this comment.
we could schedule a next defrag if we know the remaining time here
| return nil | ||
| } | ||
|
|
||
| if !c.lastDefragTime.IsZero() && time.Since(c.lastDefragTime) < c.defragCooldown { |
There was a problem hiding this comment.
Could it be a problem that a member could exceed the fragmentation limit for 1h in high churn cluster, since this backfoff / cooldown is member agnostic?
There was a problem hiding this comment.
hmm maybe we can frame this cooldown differently.... e.g after a defrag, the cluster must have no unhealthy members for at least 10m before defragmenting the next member
| recorder.Eventf("DefragControllerDefragmentSuccess", "etcd member has been defragmented: %s, memberID: %d", member.Name, member.ID) | ||
|
|
||
| // Give cluster time to recover before the next sync defrags another member. | ||
| if err := wait.PollUntilContextTimeout(ctx, pollWaitDuration, pollTimeoutDuration, false, |
There was a problem hiding this comment.
This is not the best pattern for controllers. Could we just select a single member try to defrag it and use a queue for any backoff checks like checking the health?
I suppose the defrag is not that often so a 10m between each member defrag shouldn't be an issue?
There was a problem hiding this comment.
line 144 already has that
if !etcdcli.IsClusterHealthy(memberHealth)
so we can have the backoff work through the sync loop for some period of time
| numDefragFailures int | ||
| defragWaitDuration time.Duration | ||
| defragCooldown time.Duration | ||
| lastDefragTime time.Time |
There was a problem hiding this comment.
you will have to persist this somewhere (operator conditions?), imagine you're restarting CEO within that hour.
| recorder.Eventf("DefragControllerDefragmentAttempt", "Attempting defrag on member: %s, memberID: %x, dbSize: %d, dbInUse: %d, leader ID: %d", member.Name, member.ID, status.DbSize, status.DbSizeInUse, status.Leader) | ||
| if _, err := c.defragClient.Defragment(ctx, member); err != nil { | ||
| // Defrag can timeout if defragmentation takes longer than etcdcli.DefragDialTimeout. | ||
| errMsg := fmt.Sprintf("failed defrag on member: %s, memberID: %x: %v", member.Name, member.ID, err) |
There was a problem hiding this comment.
it might not indicate a failure in this case. We have two options: increase the timeout (plus some slack) based on the db size, or just always use some crazy value like 10 minutes.
Not sure it makes sense to poll the status instead, so have a short timeout always and just poll until we're back up and the size has been reduced.
There was a problem hiding this comment.
okay I see you added the polling on success - but we need the polling on failure 😄
| } | ||
| recorder.Eventf("DefragControllerDefragmentSuccess", "etcd member has been defragmented: %s, memberID: %d", member.Name, member.ID) | ||
|
|
||
| // Give cluster time to recover before the next sync defrags another member. |
There was a problem hiding this comment.
if the defrag call came back, then the cluster should be in healthy state and respond. I think the polling is a bit misplaced here.
Summary
Test plan
TestDefragBudgetExceededtest verifies budget enforcement with a near-zero budget (1ns): no defrags occur, aDefragControllerBudgetExceededevent is recorded, and the controller does not degrade🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Tests