CORENET-7154: Fix debounce timer for OperatorConfig level being incorrectly cleared by tpantelis · Pull Request #3011 · openshift/cluster-network-operator

tpantelis · 2026-05-18T13:59:06Z

The debounce timer for the OperatorConfig status level was being incorrectly cleared when the operator config exists. The code was using the OperatorConfig status level internally to track the debounce timer for when the operator config is missing. However, it also unconditionally cleared the timer when the operator config exists, which is the normal case, potentially interfering with a caller's usage of the OperatorConfig status level.

This commit removes the debouncing for this case. The operconfig should never actually be missing, so it was likely some other error that prompted adding the debounce. Since it is unknown what caused the original scenario, we shouldn't assume that delaying the degraded status is the correct solution.
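To make the interference concrete, here is a minimal sketch of how clearing a shared per-level timer can keep a caller's debounce from ever expiring. It is not the operator's actual code: the `debouncer` type, `shouldDegrade`, `buggySet`, `degradedAfter`, and the two-minute threshold are all hypothetical names and values chosen for illustration; only the idea of a first-seen timestamp keyed by status level comes from the PR discussion.

```go
package main

import (
	"fmt"
	"time"
)

// StatusLevel identifies which component is reporting status.
// This is a simplified stand-in, not the operator's real type.
type StatusLevel int

const OperatorConfig StatusLevel = iota

// debouncer is a hypothetical reduction of the status manager's timer
// state: it records when a failure at each level was first seen.
type debouncer struct {
	failureFirstSeen map[StatusLevel]time.Time
	threshold        time.Duration
}

func newDebouncer(threshold time.Duration) *debouncer {
	return &debouncer{failureFirstSeen: map[StatusLevel]time.Time{}, threshold: threshold}
}

// shouldDegrade reports whether a failure at level has persisted for at
// least the threshold, starting the timer on the first observation.
func (d *debouncer) shouldDegrade(level StatusLevel, now time.Time) bool {
	first, ok := d.failureFirstSeen[level]
	if !ok {
		d.failureFirstSeen[level] = now
		return false
	}
	return now.Sub(first) >= d.threshold
}

// buggySet models the problematic path: whenever the operator config
// exists (the normal case), the OperatorConfig timer is cleared, even if
// a caller started it for an unrelated failure at that level.
func (d *debouncer) buggySet(operConfigExists bool) {
	if operConfigExists {
		delete(d.failureFirstSeen, OperatorConfig)
	}
}

// degradedAfter simulates a caller reporting a persistent OperatorConfig
// failure once per simulated minute, optionally running the set path
// between reports, and returns the result of the last report.
func degradedAfter(ticks int, runSet bool) bool {
	d := newDebouncer(2 * time.Minute)
	now := time.Unix(0, 0)
	degraded := false
	for i := 0; i < ticks; i++ {
		degraded = d.shouldDegrade(OperatorConfig, now)
		if runSet {
			d.buggySet(true)
		}
		now = now.Add(time.Minute)
	}
	return degraded
}

func main() {
	fmt.Println("without clearing:", degradedAfter(5, false))
	fmt.Println("with clearing:   ", degradedAfter(5, true))
}
```

In this model the caller's failure reaches the threshold when nothing else touches the timer, but never does when the set path resets it on every pass. Whether the real caller-visible effect is exactly this depends on how the status manager schedules its updates, which the sketch does not attempt to reproduce.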

Summary by CodeRabbit

Bug Fixes
- Cluster operator is now marked degraded immediately when required operator configuration is missing, ensuring prompt and consistent degraded-state reporting.
Tests
- Tests updated to verify the immediate degraded behavior and to ensure debounce/timer state does not delay or hide missing-configuration reporting.

openshift-ci-robot · 2026-05-18T13:59:11Z

@tpantelis: This pull request references CORENET-7154 which is a valid jira issue.

Details

In response to this:

The debounce timer for the OperatorConfig status level was being incorrectly cleared when the operator config exists. The code was using the OperatorConfig status level internally to track the debounce timer for when the operator config is missing. However it also blindly cleared the timer when the operator config exists, which is the normal case, potentially interfering with a caller's usage of the OperatorConfig status level.

This commit introduces a new operConfigMissing constant specifically for tracking the debounce timer when operator config is missing, preventing it from interfering with the OperatorConfig status level's debounce timer.

coderabbitai · 2026-05-18T13:59:21Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between 014baef and 1c1868b.

📒 Files selected for processing (2)

pkg/controller/statusmanager/status_manager.go
pkg/controller/statusmanager/status_manager_test.go

💤 Files with no reviewable changes (2)

pkg/controller/statusmanager/status_manager.go
pkg/controller/statusmanager/status_manager_test.go

Walkthrough

StatusManager.set now immediately marks ClusterOperator degraded with reason NoOperConfig and message "Failed to get networks.operator.openshift.io cluster" when operator config is missing. Tests were updated to assert the immediate degraded condition instead of using fake-clock debounce checks.

Changes

Operator Config Missing Debounce Tracking

Layer / File(s)	Summary
Immediate degrade and removal of debounce `pkg/controller/statusmanager/status_manager.go`	When `operStatus` is `nil`, `set()` now immediately sets the ClusterOperator degraded condition with reason `NoOperConfig` and message `Failed to get networks.operator.openshift.io cluster`, removing the prior failureFirstSeen/debounce threshold logic.
Test: assert immediate NoOperConfig condition `pkg/controller/statusmanager/status_manager_test.go`	`TestStatusManager_set` no longer relies on fake-clock debounce progression; it immediately retrieves the ClusterOperator after `status.set(false)` and asserts a single condition with reason `NoOperConfig`.
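
The behavior summarized in the table above can be sketched roughly as follows. This is a simplified illustration, not the code in `status_manager.go`: the `condition` and `operStatus` types and the `conditionsFor` function are hypothetical stand-ins, and only the reason `NoOperConfig` and the message string are taken from the walkthrough.

```go
package main

import "fmt"

// condition is a simplified stand-in for a ClusterOperator status condition.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// operStatus is a hypothetical placeholder for the operator config status;
// a nil pointer models the "operator config is missing" case.
type operStatus struct{}

// conditionsFor returns the conditions to publish. With no operator config
// it reports Degraded right away: there is no first-seen timestamp and no
// threshold to wait for.
func conditionsFor(oc *operStatus, current []condition) []condition {
	if oc == nil {
		return []condition{{
			Type:    "Degraded",
			Status:  "True",
			Reason:  "NoOperConfig",
			Message: "Failed to get networks.operator.openshift.io cluster",
		}}
	}
	return current
}

func main() {
	conds := conditionsFor(nil, nil)
	fmt.Println(len(conds), conds[0].Reason)
}
```

A test against this shape needs no fake clock: it calls the function once and checks for a single condition with reason `NoOperConfig`, which matches the updated assertion described for `TestStatusManager_set`.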

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (11 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: fixing a bug where a debounce timer for OperatorConfig was being incorrectly cleared, which directly aligns with the core changes removing debounce logic for the missing operator config case.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	Test file uses standard Go testing package, not Ginkgo. The custom check for Ginkgo test names is not applicable to this PR as there are no Ginkgo test declarations.
Test Structure And Quality	✅ Passed	The PR modifies standard Go tests (using testing.T), not Ginkgo tests. The custom check specifically requires "Review Ginkgo test code" - which is not applicable here.
Microshift Test Compatibility	✅ Passed	No new Ginkgo e2e tests are added in this PR. Changes only affect unit tests in status_manager_test.go using standard Go testing framework, not Ginkgo.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	This PR does not add any new Ginkgo e2e tests. It only modifies existing unit tests in pkg/controller/statusmanager/ using standard Go testing framework, not Ginkgo.
Topology-Aware Scheduling Compatibility	✅ Passed	PR modifies StatusManager controller status logic, removing debouncing for operator-config-missing case. No scheduling constraints (affinity, topology, nodeSelector, replicas) are introduced.
Ote Binary Stdout Contract	✅ Passed	The PR modifies status_manager components which are libraries, not OTE binaries. Only Go standard log (stderr by default) and klog (configured to stderr) are used. No stdout violations found.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No Ginkgo e2e tests were added in this PR. The changes modify only unit tests using Go's standard testing package in status_manager_test.go. The check does not apply.


openshift-ci · 2026-05-18T13:59:25Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tpantelis
Once this PR has been reviewed and has the lgtm label, please assign tssurya for approval. For more information see the Code Review Process.

Details

Needs approval from an approver in each of these files:

OWNERS


danwinship · 2026-05-19T13:47:36Z

I don't understand the description. What's the exact scenario and what's the incorrect behavior in that scenario? What is the user-visible impact of the bug?

tpantelis · 2026-05-19T13:51:59Z

> I don't understand the description. What's the exact scenario and what's the incorrect behavior in that scenario? What is the user-visible impact of the bug?

I outlined the scenario in https://redhat.atlassian.net/browse/CORENET-7154.

danwinship · 2026-05-20T12:56:28Z

 	CertificateSigner
 	InfrastructureConfig
 	DashboardConfig
+	operConfigMissing


OK, so the problem with this is that operconfig should never actually be missing, and if it was, we'd be fine with immediately going degraded. According to the discussion in #2896 (comment), it seemed like some other error was occurring, which the code was mistakenly classifying as "operator config doesn't exist", and this sporadically happened in some real job, but in the example @jluhrsen linked to, the pod logs have been deleted now so there's no further information.

Anyway, I think we should remove the debouncing for this case, reproduce the failure that led to Jamo adding this check, and then figure out the right fix for it (because it may actually indicate a real problem with the cluster which we are just masking by delaying the degraded status).

Sounds good.

(you can remove operConfigMissing now too)

danwinship · 2026-05-20T15:54:52Z

/assign @jluhrsen

we can test this to try to reproduce the original failure here by doing payload jobs, can't we?

The debounce timer for the OperatorConfig status level was being incorrectly cleared when the operator config exists. The code was using the OperatorConfig StatusLevel internally to track the debounce timer for when the operator config is missing. However it also blindly cleared the timer when the operator config exists, which is the normal case, potentially interfering with a caller's usage of the OperatorConfig StatusLevel.

This commit removes the debouncing for this case as the operconfig should never actually be missing so it was likely some other error that occurred that facilitated adding the debounce. Since it is unknown what caused the original scenario, we shouldn't assume that delaying the degraded status is the correct solution.

Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

openshift-ci · 2026-05-21T00:04:45Z

@tpantelis: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn-serial-2of2	`1c1868b`	link	true	`/test e2e-aws-ovn-serial-2of2`
ci/prow/e2e-gcp-ovn	`1c1868b`	link	true	`/test e2e-gcp-ovn`
ci/prow/security	`1c1868b`	link	false	`/test security`

openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) May 18, 2026

openshift-ci Bot requested review from martinkennelly and miheer May 18, 2026 13:59

danwinship reviewed May 20, 2026

tpantelis force-pushed the sm_debounce_bug branch from 62ad0d0 to 014baef Compare May 20, 2026 14:56

openshift-ci Bot assigned jluhrsen May 20, 2026

tpantelis force-pushed the sm_debounce_bug branch from 014baef to 1c1868b Compare May 20, 2026 18:37

tpantelis wants to merge 1 commit into openshift:master from tpantelis:sm_debounce_bug
