OCPBUGS-56274: add datacenter consistency check by RomanBednar · Pull Request #212 · openshift/vsphere-problem-detector

RomanBednar · 2026-02-19T15:41:01Z

When using zonal deployments of vSphere with OpenShift, if a datacenter referenced by a failure domain in the Infrastructure CR (infrastructure.config.openshift.io/cluster) is missing from the cloud provider config (cloud-provider-config ConfigMap in openshift-config), the CSI driver silently fails to find VMs in that zone and the cluster degrades. The vSphere Problem Detector (VPD) had no check to detect this misconfiguration. This fix adds a new cluster-level check, CheckDatacenterConsistency, that compares each failure domain's required datacenter against the datacenters listed in the parsed cloud.conf (ctx.VMConfig.Config.VirtualCenter[server].Datacenters). When a datacenter is absent, VPD emits a WARNING that names the missing datacenter and the affected failure domain, and instructs the administrator to update the cloud-provider-config ConfigMap in the openshift-config namespace.

Cluster Setup

Two failure domains configured:

us-east-1 → datacenter nested-devqedatacenter-1
us-west-1 → datacenter nested-devqedatacenter-2

Both on vCenter 232-15-184-10.in-addr.arpa.

Simulating the Bug

The datacenter nested-devqedatacenter-2 was removed from cloud-provider-config:

# Edit cloud-provider-config to remove nested-devqedatacenter-2
oc -n openshift-config edit configmap cloud-provider-config
# Changed: datacenters = nested-devqedatacenter-1,nested-devqedatacenter-2
# To:      datacenters = nested-devqedatacenter-1

# Verified propagation to vsphere-csi-config-secret:
oc -n openshift-cluster-csi-drivers get secret/vsphere-csi-config-secret \
  -o jsonpath='{.data.cloud\.conf}' | base64 -d
# Output confirmed: datacenters = nested-devqedatacenter-1
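The propagation check above relies on reading the decoded cloud.conf by eye. The hypothetical helper `datacentersBySection` below sketches how that could be scripted, assuming an INI layout with `[VirtualCenter "<server>"]` section headers and a `datacenters = ...` key that may or may not be quoted. It is a plain line scanner for illustration, not the INI/YAML parser VPD itself uses.

```go
package main

import (
	"bufio"
	"strings"
)

// datacentersBySection scans INI-style cloud.conf text and returns the raw
// datacenters value of each [VirtualCenter "<server>"] section, keyed by server.
// Scanner errors are ignored because the input is an in-memory string.
func datacentersBySection(conf string) map[string]string {
	out := map[string]string{}
	current := ""
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "[VirtualCenter"):
			// Strip `[VirtualCenter`, the closing `]`, spaces and quotes to get the server name.
			current = strings.Trim(strings.TrimSuffix(strings.TrimPrefix(line, "[VirtualCenter"), "]"), ` "`)
		case strings.HasPrefix(line, "["):
			current = "" // any other section, e.g. [Global]
		case current != "":
			key, val, ok := strings.Cut(line, "=")
			if ok && strings.TrimSpace(key) == "datacenters" {
				out[current] = strings.Trim(strings.TrimSpace(val), `"`)
			}
		}
	}
	return out
}
```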

Unpatched Behaviour (openshift/main)

export KUBECONFIG=/Users/MAC/openshift/clusters/vsphere/cluster-01/auth/kubeconfig
git checkout openshift/main && make
./vsphere-problem-detector start -v 5 \
  --kubeconfig=$KUBECONFIG \
  --namespace=openshift-cluster-storage-operator

Relevant log lines:

I0219 16:17:18.909862   17481 infra_config.go:15] Checking infrastructure and cloud provider config for consistency.
I0219 16:17:18.909897   17481 vsphere_check.go:302] CheckInfraConfig passed
I0219 16:17:24.169406   17481 vsphere_check.go:109] Finished running all vSphere specific checks in the cluster
I0219 16:17:24.307163   17481 event.go:377] ... type: 'Normal' reason: 'SucceededVSphereCheckInfraConfig' Check succeeded

No warning or error is reported about the missing datacenter nested-devqedatacenter-2.

Patched Behaviour (OCPBUGS-56274)

git checkout OCPBUGS-56274 && make
./vsphere-problem-detector start -v 5 \
  --kubeconfig=$KUBECONFIG \
  --namespace=openshift-cluster-storage-operator

Relevant log lines:

I0219 16:23:24.680681   32885 datacenter_consistency.go:16] Checking datacenter consistency between failure domains and cloud provider config.
W0219 16:23:24.680821   32885 datacenter_consistency.go:50] Datacenter-Consistency: failure domain "us-west-1" (infrastructure.config.openshift.io/cluster) requires datacenter "nested-devqedatacenter-2" on vCenter "232-15-184-10.in-addr.arpa", but it is not listed in the cloud provider config (datacenters = "nested-devqedatacenter-1" in vsphere-csi-config-secret, namespace openshift-cluster-csi-drivers). Add "nested-devqedatacenter-2" to the datacenters list in the cloud-provider-config ConfigMap in the openshift-config namespace.
I0219 16:23:24.680835   32885 vsphere_check.go:299] CheckDatacenterConsistency failed: Datacenter-Consistency: failure domain "us-west-1" ...
I0219 16:23:30.292865   32885 event.go:377] ... type: 'Warning' reason: 'FailedVSphereCheckDatacenterConsistency' Datacenter-Consistency: failure domain "us-west-1" (infrastructure.config.openshift.io/cluster) requires datacenter "nested-devqedatacenter-2" on vCenter "232-15-184-10.in-addr.arpa" ...

A WARNING is emitted that explicitly names nested-devqedatacenter-2 as missing and includes remediation instructions.

Summary by CodeRabbit

New Features
- Added a validation that ensures vSphere failure domains reference datacenters present in the configured vCenter settings; emits warnings and returns an error when mismatches are found.
Tests
- Added comprehensive tests covering legacy behavior, single/multi-vCenter setups, missing datacenters, and datacenter-list parsing.

openshift-ci-robot · 2026-02-19T15:41:07Z

@RomanBednar: This pull request references Jira Issue OCPBUGS-56274, which is invalid:

expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-02-19T15:42:10Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RomanBednar

The full list of commands accepted by this bot can be found here.

The pull request process is described here.


Needs approval from an approver in each of these files:

~~OWNERS~~ [RomanBednar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

RomanBednar · 2026-02-19T15:43:22Z

/jira refresh

openshift-ci-robot · 2026-02-19T15:43:31Z

@RomanBednar: This pull request references Jira Issue OCPBUGS-56274, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.22.0) matches configured target version for branch (4.22.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request.


RomanBednar · 2026-03-09T09:27:04Z

/assign @gnufied

For review.

RomanBednar · 2026-04-14T12:25:06Z

@coderabbitai review

coderabbitai · 2026-04-14T12:25:14Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-04-14T12:25:25Z

Walkthrough

Adds a new cluster-level check, CheckDatacenterConsistency, that fetches the Infrastructure CR and validates each vSphere failure domain's referenced datacenter against configured datacenters in the cloud provider config and the Infrastructure vCenters list, reporting mismatches as accumulated errors.

Changes

Cohort / File(s)	Summary
Datacenter consistency check `pkg/check/datacenter_consistency.go`	New exported function `CheckDatacenterConsistency(ctx *CheckContext) error` that: fetches `Infrastructure`; skips legacy/no-failure-domain cases; for each vSphere failure domain looks up the vCenter entry in the cloud config, parses its `Datacenters` string (`parseDatacenters`), compares required datacenter presence, logs warnings and accumulates errors; performs a second pass comparing against `infra.Spec.PlatformSpec.VSphere.VCenters`.
Tests `pkg/check/datacenter_consistency_test.go`	New table-driven tests for `CheckDatacenterConsistency` covering legacy/no-FD, successful validations (single/multi vCenter, ini/yaml variants), missing datacenters, unknown vCenter entries, and unit tests for `parseDatacenters` parsing/trimming behavior.
Check registration `pkg/check/interface.go`	Added `"CheckDatacenterConsistency": CheckDatacenterConsistency` to `DefaultClusterChecks` map (formatting/alignment adjusted).
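Registering the check amounts to one more entry in a name-to-function map. A minimal sketch, assuming a placeholder `CheckContext` and a hypothetical `runClusterChecks` runner; the real `DefaultClusterChecks` map and runner in `pkg/check` may differ in shape. The `SucceededVSphere`/`FailedVSphere` prefixes are inferred from the event reasons in the log lines above (`SucceededVSphereCheckInfraConfig`, `FailedVSphereCheckDatacenterConsistency`), not taken from the source code.

```go
package main

import (
	"fmt"
	"sort"
)

// CheckContext is a placeholder for the real check context.
type CheckContext struct{}

// ClusterCheck mirrors the func(ctx *CheckContext) error shape of cluster checks.
type ClusterCheck func(ctx *CheckContext) error

// Hypothetical stub checks for the sketch.
func CheckInfraConfig(ctx *CheckContext) error { return nil }

func CheckDatacenterConsistency(ctx *CheckContext) error {
	return fmt.Errorf("datacenter-consistency: example failure")
}

// DefaultClusterChecks maps check names to implementations; adding a check
// means adding one entry here.
var DefaultClusterChecks = map[string]ClusterCheck{
	"CheckInfraConfig":           CheckInfraConfig,
	"CheckDatacenterConsistency": CheckDatacenterConsistency,
}

// runClusterChecks runs every registered check in sorted name order (Go map
// iteration order is random) and returns one event-style reason per check.
func runClusterChecks(ctx *CheckContext, checks map[string]ClusterCheck) []string {
	names := make([]string, 0, len(checks))
	for name := range checks {
		names = append(names, name)
	}
	sort.Strings(names)
	reasons := make([]string, 0, len(names))
	for _, name := range names {
		if err := checks[name](ctx); err != nil {
			reasons = append(reasons, "FailedVSphere"+name)
		} else {
			reasons = append(reasons, "SucceededVSphere"+name)
		}
	}
	return reasons
}
```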

Sequence Diagram(s)

sequenceDiagram
  participant Runner as Runner
  participant Check as CheckDatacenterConsistency
  participant Kube as KubeClient
  participant Config as CloudConfig
  participant Infra as Infrastructure
  participant Logger as Logger/ErrorAccum

  Runner->>Check: invoke CheckDatacenterConsistency(ctx)
  Check->>Kube: GetInfrastructure(ctx)
  Kube-->>Check: Infrastructure (or error)
  alt fetch error
    Check->>Logger: log error
    Check-->>Runner: return error
  else infra fetched
    Check->>Infra: read PlatformSpec.VSphere / FailureDomains
    alt no vSphere or no FailureDomains
      Check->>Logger: debug skip
      Check-->>Runner: return nil
    else have failure domains
      loop for each FailureDomain
        Check->>Config: lookup cfg.VirtualCenter[fd.Server]
        alt config entry missing
          Check->>Logger: debug skip for that fd
        else entry present
          Check->>Check: parseDatacenters(entry.Datacenters)
          Check->>Check: compare parsed list with fd.Topology.Datacenter
          alt mismatch
            Check->>Logger: warn + append error
          end
        end
      end
      loop second pass for each FailureDomain
        Check->>Infra: lookup matched vCenter in infra.Spec.PlatformSpec.VSphere.VCenters
        Check->>Check: compare infra vcenter.datacenters with fd.Topology.Datacenter
        alt mismatch
          Check->>Logger: warn + append error
        end
      end
      Check-->>Runner: return joined errors if any, else nil
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (11 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: adding a new datacenter consistency check to the vSphere Problem Detector.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	The pull request uses standard Go testing functions without Ginkgo, making the Ginkgo test name stability check not applicable.
Test Structure And Quality	✅ Passed	Test code demonstrates strong structure following Go patterns with table-driven subtests, proper setup/cleanup, appropriate timeouts, and consistency with codebase patterns.
Microshift Test Compatibility	✅ Passed	PR adds standard Go unit tests using testing.T framework, not Ginkgo e2e tests, so MicroShift Test Compatibility check does not apply.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR adds standard Go unit tests with testing.T package, not Ginkgo e2e tests targeted by this check.
Topology-Aware Scheduling Compatibility	✅ Passed	The PR adds only a diagnostic validation function that checks infrastructure configuration consistency; it does not introduce deployment manifests, workloads, or scheduling constraints.
Ote Binary Stdout Contract	✅ Passed	Code analysis reveals no process-level stdout writes in new/modified files violating OTE Binary Stdout Contract.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Tests use standard Go unit testing with mocking, no IPv4 hardcodes, no external connectivity requirements.


openshift-ci-robot · 2026-04-14T12:26:38Z

@RomanBednar: This pull request references Jira Issue OCPBUGS-56274, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.22.0) matches configured target version for branch (4.22.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/check/datacenter_consistency_test.go`:
- Around line 103-104: Save the original value of the global timeout
(util.Timeout) before mutating it in the subtest, then restore it after the
subtest finishes (e.g., store orig := *util.Timeout and use defer to set
*util.Timeout = orig) so changes around setting *util.Timeout = time.Second in
the test that calls CheckDatacenterConsistency(ctx) do not leak to other tests;
apply this restore pattern around where you modify util.Timeout in
pkg/check/datacenter_consistency_test.go.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: cc2fd6fa-3677-4457-8bdf-7479823f7f60

📥 Commits

Reviewing files that changed from the base of the PR and between 36a0ee6 and 5039e0d.

📒 Files selected for processing (3)

pkg/check/datacenter_consistency.go
pkg/check/datacenter_consistency_test.go
pkg/check/interface.go

openshift-ci-robot · 2026-04-21T12:57:50Z

@RomanBednar: This pull request references Jira Issue OCPBUGS-56274, which is invalid:

expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "4.22.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/check/datacenter_consistency.go`:
- Around line 57-62: The error constructed into err via fmt.Errorf in
datacenter_consistency.go ends the message with a trailing period which violates
Go's ST1005 rule; update the format string in that fmt.Errorf (the one
referencing fd.Name, fd.Topology.Datacenter, fd.Server and vc.Datacenters) to
remove the final punctuation so the error string does not end with a period
(ensure the rest of the message and argument ordering remain unchanged).
- Around line 41-47: The fmt.Errorf call that builds the error (assigned to err)
in datacenter_consistency.go ends the formatted message with a trailing period
("%s namespace."); remove the final punctuation so the error string does not end
with a period — update the fmt.Errorf format string (the call that references
fd.Name, fd.Topology.Datacenter, fd.Server, vcConfig.Datacenters,
fd.Topology.Datacenter, util.CloudConfigNamespace) to omit the trailing "." at
the end of the message.
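Staticcheck's ST1005 asks that error strings not be capitalized and not end with punctuation, because callers usually wrap them into longer messages, as in the `CheckDatacenterConsistency failed: ...` log line. The later log output in this thread (`datacenter-Consistency: ...`, no trailing period) follows this style. A small sketch of the effect, using hypothetical helpers `missingDatacenterErr` and `wrapCheckErr`:

```go
package main

import "fmt"

// missingDatacenterErr builds an error in ST1005 style: lowercase first word,
// no trailing period, so it reads cleanly when embedded in another message.
func missingDatacenterErr(fd, dc string) error {
	return fmt.Errorf("failure domain %q references datacenter %q, which is not configured", fd, dc)
}

// wrapCheckErr prefixes an error with the check name, similar to the
// "<check> failed: ..." log lines; %w keeps the original error unwrappable.
func wrapCheckErr(name string, err error) error {
	return fmt.Errorf("%s failed: %w", name, err)
}
```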


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 40e991a5-bede-4476-934d-bbc167071ca2

📥 Commits

Reviewing files that changed from the base of the PR and between 5039e0d and 9b88326.

📒 Files selected for processing (3)

pkg/check/datacenter_consistency.go
pkg/check/datacenter_consistency_test.go
pkg/check/interface.go

🚧 Files skipped from review as they are similar to previous changes (1)

pkg/check/interface.go

RomanBednar · 2026-04-21T14:08:16Z

/hold

Need to get an env and test this first.

gnufied · 2026-04-29T19:33:08Z

@RomanBednar is this PR fixed now?

RomanBednar · 2026-05-13T11:32:25Z

/retest

RomanBednar · 2026-05-13T11:36:57Z

@gnufied The infra check was not included in the original spec, so I had to reconfigure the zonal cluster, add a test case for it, and retest everything; it is ready for review now. Here's the test for a missing DC in the infra object:

# 1. Backup Infrastructure CR
oc get infrastructure cluster -o yaml > /tmp/infrastructure-backup.yaml

# 2. Remove nested-devqedatacenter-2 from vcenters
oc patch infrastructure cluster --type=json \
  -p='[{"op":"replace","path":"/spec/platformSpec/vsphere/vcenters/0/datacenters","value":["nested-devqedatacenter-1"]}]'

# 3. Verify
oc get infrastructure cluster -o jsonpath='{.spec.platformSpec.vsphere.vcenters[0].datacenters}'
# Output: ["nested-devqedatacenter-1"]

# 4-5. Restart VPD and check logs
oc -n openshift-cluster-storage-operator delete pod -l name=vsphere-problem-detector-operator
oc -n openshift-cluster-storage-operator wait --for=condition=Ready pod \
  -l name=vsphere-problem-detector-operator --timeout=120s
# (waited 45s)
oc -n openshift-cluster-storage-operator logs deployment/vsphere-problem-detector-operator \
  | grep -i "datacenter.consistency\|CheckDatacenterConsistency"

Log output:

I0513 11:16:04.727922  1 datacenter_consistency.go:16] Checking datacenter consistency between failure domains and cloud provider config.
W0513 11:16:04.727951  1 datacenter_consistency.go:64] datacenter-Consistency: failure domain "us-west-1" references datacenter "nested-devqedatacenter-2" on vCenter "197-15-184-10.in-addr.arpa", but it is not listed in the vcenters section of infrastructure.config.openshift.io/cluster (datacenters = [nested-devqedatacenter-1]), add "nested-devqedatacenter-2" to the datacenters list for vCenter "197-15-184-10.in-addr.arpa" in the Infrastructure CR
I0513 11:16:04.728127  1 vsphere_check.go:299] CheckDatacenterConsistency failed: ...
I0513 11:16:05.142746  1 event.go:377] ... type: 'Warning' reason: 'FailedVSphereCheckDatacenterConsistency' ...
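Step 2 above hand-writes a JSON Patch (RFC 6902) document inside shell quotes. The sketch below builds the same payload with `encoding/json`, which avoids quoting mistakes when the datacenter list changes. `patchOp` and `datacentersPatch` are hypothetical helpers, and the `vcenters/0` index assumes the target vCenter is the first entry, as in the test above. The resulting string would be passed to `oc patch infrastructure cluster --type=json -p=...`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation with a string-list value.
type patchOp struct {
	Op    string   `json:"op"`
	Path  string   `json:"path"`
	Value []string `json:"value"`
}

// datacentersPatch returns a patch that replaces the datacenters list of the
// vCenter at index idx in the Infrastructure CR's platformSpec.
func datacentersPatch(idx int, datacenters []string) (string, error) {
	ops := []patchOp{{
		Op:    "replace",
		Path:  fmt.Sprintf("/spec/platformSpec/vsphere/vcenters/%d/datacenters", idx),
		Value: datacenters,
	}}
	b, err := json.Marshal(ops)
	return string(b), err
}
```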

RomanBednar · 2026-05-13T11:41:25Z

/jira refresh

openshift-ci-robot · 2026-05-13T11:41:29Z

@RomanBednar: This pull request references Jira Issue OCPBUGS-56274, which is invalid:

expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


RomanBednar · 2026-05-13T11:41:31Z

/unhold

openshift-ci · 2026-05-13T11:44:13Z

@RomanBednar: all tests passed!


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

RomanBednar · 2026-05-13T11:53:16Z

/jira refresh

openshift-ci-robot · 2026-05-13T11:53:22Z

@RomanBednar: This pull request references Jira Issue OCPBUGS-56274, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request.


In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

RomanBednar · 2026-05-13T11:53:39Z

@coderabbitai review

coderabbitai · 2026-05-13T11:53:45Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

openshift-ci-robot added the jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) and jira/invalid-bug (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) labels on Feb 19, 2026

openshift-ci requested review from dfajmon and mpatlasov on February 19, 2026 15:41

openshift-ci added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Feb 19, 2026

openshift-ci-robot added the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed the jira/invalid-bug label on Feb 19, 2026

RomanBednar force-pushed the OCPBUGS-56274 branch from c4dfdec to 5039e0d on February 19, 2026 15:54

openshift-ci assigned gnufied on Mar 9, 2026

coderabbitai reviewed Apr 14, 2026

Comment thread pkg/check/datacenter_consistency_test.go

gnufied reviewed Apr 17, 2026

Comment thread pkg/check/datacenter_consistency.go

Commit 9b88326: add datacenter consistency check

RomanBednar force-pushed the OCPBUGS-56274 branch from 5039e0d to 9b88326 on April 21, 2026 12:53

openshift-ci-robot added the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) and removed the jira/valid-bug label on Apr 21, 2026

coderabbitai reviewed Apr 21, 2026

Comment thread pkg/check/datacenter_consistency.go

Comment thread pkg/check/datacenter_consistency.go

openshift-ci added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Apr 21, 2026

openshift-ci removed the do-not-merge/hold label on May 13, 2026

openshift-ci-robot added the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed the jira/invalid-bug label on May 13, 2026
