
[SREP-4823] Allow CMO config overrides#2743

Open
bergmannf wants to merge 3 commits into
openshift:masterfrom
bergmannf:cmo-config-overrides

Conversation

@bergmannf
Contributor

@bergmannf bergmannf commented May 6, 2026

What type of PR is this?

feature

What this PR does / why we need it?

This PR would allow modifying the monitoring configuration per customer or per cluster (anything that can be filtered in a SelectorSyncSet).

This is currently just a proof of concept; it would let us roll out modified configurations at a finer granularity.

Which Jira/Github issue(s) this PR fixes?

Fixes #

Special notes for your reviewer:

Pre-checks (if applicable):

  • Tested latest changes against a cluster

  • Included documentation changes with PR

  • If this is a new object that is not intended for the FedRAMP environment (if unsure, please reach out to team FedRAMP), please exclude it with:

    matchExpressions:
    - key: api.openshift.com/fedramp
      operator: NotIn
      values: ["true"]

Summary by CodeRabbit

  • New Features

    • Added cluster monitoring configuration variants for OpenShift 4.5–4.16, supporting User Workload Monitoring (UWM)-enabled and non-UWM clusters, FedRAMP compliance, and management clusters.
    • New extended retention configuration example for organization-specific customizations.
  • Refactor

    • Modernized configuration generation system to use a modular, variant-driven architecture for improved maintainability and scalability.

bergmannf added 3 commits May 5, 2026 11:30
Extend generate-cmo-config.py to support per-cluster or per-organization
CMO config overrides via declarative YAML files in
resources/cluster-monitoring-config/overrides/.

Key changes:

1. Move selector definitions from static deploy/*/config.yaml files to a
   centralized resources/cluster-monitoring-config/selectors.yaml. The
   script now generates all config.yaml files, enabling automatic
   injection of override exclusions.

2. Override files define a target (cluster ID or organization), which
   deploy trees they apply to (uwm/non-uwm/fedramp/both), and config
   overrides that are deep-merged onto the base monitoring config.

3. For each override, the script:
   - Adds NotIn exclusions to all affected default config.yaml files,
     preventing targeted clusters from receiving duplicate configs
   - Creates override subdirectories with In selectors and merged
     ConfigMap outputs

4. Without any override files present, all outputs remain semantically
   identical to the previous static config.yaml files (only the formatting
   differs).

See resources/cluster-monitoring-config/overrides/EXAMPLE.yaml.disabled
for the override file format and documentation.
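
The deep merge described in point 2 could look roughly like the sketch below. This is not the code from generate-cmo-config.py: the function name `deep_merge` and the sample retention and storage values are assumptions chosen for illustration.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` recursively merged onto `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative values only, not taken from the repository.
base = {
    "prometheusK8s": {
        "retention": "11d",
        "volumeClaimTemplate": {
            "spec": {"resources": {"requests": {"storage": "100Gi"}}}
        },
    }
}
override = {"prometheusK8s": {"retention": "30d"}}

merged = deep_merge(base, override)
```

Only `retention` changes in `merged`; the storage request is inherited from the base config, which is the point of merging rather than replacing the whole section.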
…tory

Replace the three separate configuration sources (selectors.yaml,
variants.yaml, overrides/*.yaml) with a single variants/ directory where
every file is self-contained: it carries both its selector (which clusters
it targets) and its config transformations (what monitoring config those
clusters get).

Base variants have a 'selector' field. Override variants have a 'parent'
and 'target' field — they specialize a base variant for specific clusters
or organizations. The script automatically adds NotIn exclusions on the
parent's selector so targeted clusters don't receive conflicting configs.

The script is simplified from 309 to 215 lines. All 20 generated output
files (10 ConfigMaps + 10 config.yaml) are byte-identical to before.
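
As a rough illustration of the parent/override split described above: the sketch below derives both selectors from one target. The helper name `split_selectors`, the label keys, and the assumption that an override inherits its parent's expressions are all hypothetical and not confirmed by the PR.

```python
def split_selectors(parent_expressions, target_key, target_values):
    """Build mutually exclusive selectors for a parent variant and an override.

    The parent gains a NotIn exclusion for the targeted values; the override
    reuses the parent's expressions (an assumption) plus a matching In.
    """
    exclusion = {"key": target_key, "operator": "NotIn", "values": list(target_values)}
    inclusion = {"key": target_key, "operator": "In", "values": list(target_values)}
    parent = [dict(e) for e in parent_expressions] + [exclusion]
    override = [dict(e) for e in parent_expressions] + [inclusion]
    return parent, override


# Hypothetical label keys and values, for illustration only.
parent_sel, override_sel = split_selectors(
    [{"key": "example.com/uwm", "operator": "In", "values": ["true"]}],
    "example.com/org-id",
    ["org-123"],
)
```

A cluster labelled with `org-123` then matches only `override_sel`, and every other cluster matched by the parent keeps matching `parent_sel`.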

Replace generic example with concrete extended-retention override example

Replace EXAMPLE.yaml.disabled with a focused, realistic example
(50-extended-retention-org.yaml.example) that demonstrates extending
Prometheus retention and storage for a specific organization.

The .example suffix is naturally excluded by the *.yaml glob pattern.
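
A quick way to see why the suffix is skipped, assuming the script discovers variants with a `*.yaml` pattern as stated above. The file name `00-base.yaml` is hypothetical; only the `.example` name comes from this commit.

```python
from fnmatch import fnmatch

# Hypothetical directory listing of the variants/ directory.
candidates = [
    "00-base.yaml",
    "50-extended-retention-org.yaml.example",
]

# "*.yaml" requires the name to end in ".yaml", so the ".example" file is skipped.
selected = [name for name in candidates if fnmatch(name, "*.yaml")]
```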
The previous 'target' abstraction (type: organization/cluster, values: [...])
hid what people already know — SelectorSyncSet matchExpressions. Overrides
now specify matchExpressions directly, which:

- Uses the same syntax as base variant selectors (no new concepts)
- Supports any label key, not just cluster ID and org ID
- Supports all operators: In, NotIn, Exists, DoesNotExist

The script auto-negates each override expression (In<->NotIn,
Exists<->DoesNotExist) on the parent variant's selector to ensure
mutual exclusion.
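
A minimal sketch of the auto-negation, reusing the names `NEGATE_OPERATOR` and `negate_expression` that appear in the review summary of this PR. The bodies are assumptions rather than the script's actual code, and the label keys are hypothetical.

```python
# Each matchExpressions operator maps to its logical opposite.
NEGATE_OPERATOR = {
    "In": "NotIn",
    "NotIn": "In",
    "Exists": "DoesNotExist",
    "DoesNotExist": "Exists",
}


def negate_expression(expr):
    """Return a copy of a matchExpressions entry with its operator inverted."""
    negated = dict(expr)
    negated["operator"] = NEGATE_OPERATOR[expr["operator"]]
    return negated


# Hypothetical override expressions and parent selector.
override_exprs = [
    {"key": "example.com/org-id", "operator": "In", "values": ["org-123"]},
    {"key": "example.com/long-retention", "operator": "Exists"},
]
parent_exprs = [{"key": "example.com/uwm", "operator": "In", "values": ["true"]}]

# Append the negation of every override expression to the parent selector.
parent_with_exclusions = parent_exprs + [negate_expression(e) for e in override_exprs]
```

One thing to check during review, if the script does append every negated expression to the parent as sketched here: with more than one override expression, a cluster that matches only some of them matches neither the override nor the parent, because all matchExpressions in a selector are ANDed.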
@bergmannf
Contributor Author

/hold

@coderabbitai

coderabbitai Bot commented May 6, 2026

Caution

Review failed

Failed to post review comments

Walkthrough

This PR refactors cluster monitoring configuration from a hard-coded Python generation script into a modular, variant-driven system. It adds new variant configuration files for UWM and non-UWM deployments across OpenShift versions and reformats existing deploy configuration files to multiline YAML syntax for consistency.

Changes

Cluster Monitoring Configuration Variant System

| Layer | File(s) | Summary |
|---|---|---|
| Script Infrastructure | scripts/generate-cmo-config.py (lines 24-36) | New path constants (BASE_DIR, RESOURCES_DIR, DEPLOY_DIR, INPUT_FILE_PATH, VARIANTS_DIR) and operator negation mapping (NEGATE_OPERATOR) established to support the variant system. |
| Script Helpers | scripts/generate-cmo-config.py (lines 60-81) | Helper functions remove_dotted_key() and negate_expression() added to support dotted-key deletion and automatic expression negation for variant logic. |
| Variant Loading & Validation | scripts/generate-cmo-config.py (lines 85-143) | Functions load_base_config() and load_variants() implemented to load base monitoring config and validate variant definitions, checking for required fields and operator constraints. |
| Configuration Transformation | scripts/generate-cmo-config.py (lines 147-186) | Functions apply_transformations(), build_configmap(), and write_yaml() added to apply variant overrides, wrap configs into ConfigMaps, and persist outputs. |
| Generation Orchestration | scripts/generate-cmo-config.py (lines 190-279) | Core generate() function and main() entrypoint orchestrate variant indexing, grouping, negation, and per-variant ConfigMap/selector emission; replaced ad-hoc hard-coded calls with scalable variant-driven flow. |
| Variant Definitions | resources/cluster-monitoring-config/variants/0*.yaml, 1*.yaml, 2*.yaml, 5*.yaml.example | Eight new UWM variants and six non-UWM variants (plus one FedRAMP and one example) added, defining deployment selectors, feature toggles (enableUserWorkload), Prometheus retention, and config key mappings for OpenShift versions 4.5–4.16+. |
| Deploy Configuration Reformatting | deploy/cluster-monitoring-config*/4.11-4.15/config.yaml, deploy/cluster-monitoring-config*/clusters-v4.5/config.yaml, deploy/cluster-monitoring-config*/config.yaml, deploy/cluster-monitoring-config*/management-clusters/config.yaml, deploy/cluster-monitoring-config*/pre-4.11/config.yaml, deploy/osd-fedramp-cluster-monitoring-config/config.yaml | Existing deploy configs reformatted from inline YAML arrays and quoted values to multiline block sequences and unquoted strings (e.g., values: ["4.5"] becomes values:\n - 4.5); no semantic changes to selector logic or values. |
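
For readers unfamiliar with dotted-key deletion, here is one way a helper like remove_dotted_key() might work. Only the function name comes from the summary above; the signature, return value, and sample keys are assumptions for illustration.

```python
def remove_dotted_key(config, dotted_key):
    """Delete the nested key named by `dotted_key` (e.g. "a.b.c") in place.

    Returns True if the key existed and was removed, False otherwise.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.get(part)
        if not isinstance(node, dict):
            # The path does not exist, so there is nothing to remove.
            return False
    if leaf in node:
        del node[leaf]
        return True
    return False


# Illustrative config, not taken from the repository.
config = {"prometheusK8s": {"retention": "11d", "retentionSize": "90GB"}}
removed = remove_dotted_key(config, "prometheusK8s.retentionSize")
missing = remove_dotted_key(config, "alertmanagerMain.resources")
```

A helper of this kind would let a variant drop a setting that an older OpenShift version does not understand, without restating the rest of the base config.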

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[WIP] Allow CMO config overrides' is directly related to the main change: introducing a variant-driven system for cluster monitoring configuration overrides. It accurately describes the primary objective of the pull request. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | No Ginkgo tests or test code found in this PR. The PR modifies YAML configuration files and a Python script only. The custom check is not applicable to non-test code. |
| Test Structure And Quality | ✅ Passed | PR contains no Ginkgo tests. Files modified are YAML configs (21 files) and a Python script. Custom check for Ginkgo test structure is not applicable. |
| Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests have been added. The PR consists of YAML config reformatting and a Python script refactor. The custom check for MicroShift test compatibility is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR. Changes consist of YAML configuration files and Python tooling only. Check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | This PR refactors cluster monitoring configuration management and generates ConfigMaps, not pod/deployment manifests. It does not introduce scheduling constraints that assume standard HA topology. |
| Ote Binary Stdout Contract | ✅ Passed | OTE Binary Stdout Contract check is not applicable. PR modifies only YAML configuration and Python utility scripts; contains no Go code, test infrastructure, or OTE binary implementations. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR contains no Ginkgo e2e tests or Go test files. The changes are limited to YAML configuration and a Python utility script. The check is not applicable. |



@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress and do-not-merge/hold labels May 6, 2026
@openshift-ci openshift-ci Bot requested review from boranx and cblecker May 6, 2026 15:17
@openshift-ci
Contributor

openshift-ci Bot commented May 6, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bergmannf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label May 6, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 6, 2026

@bergmannf: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@cblecker
Member

cblecker commented May 6, 2026

/uncc

@openshift-ci openshift-ci Bot removed the request for review from cblecker May 6, 2026 17:29
@bergmannf bergmannf changed the title [WIP] Allow CMO config overrides [SREP-4823] Allow CMO config overrides May 8, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress label May 8, 2026

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

2 participants