[SREP-4823] Allow CMO config overrides#2743
Extend generate-cmo-config.py to support per-cluster or per-organization
CMO config overrides via declarative YAML files in
resources/cluster-monitoring-config/overrides/.
Key changes:
1. Move selector definitions from static deploy/*/config.yaml files to a
centralized resources/cluster-monitoring-config/selectors.yaml. The
script now generates all config.yaml files, enabling automatic
injection of override exclusions.
2. Override files define a target (cluster ID or organization), which
deploy trees they apply to (uwm/non-uwm/fedramp/both), and config
overrides that are deep-merged onto the base monitoring config.
3. For each override, the script:
- Adds NotIn exclusions to all affected default config.yaml files,
preventing targeted clusters from receiving duplicate configs
- Creates override subdirectories with In selectors and merged
ConfigMap outputs
4. Without any override files present, all outputs remain semantically
identical to the previous static config.yaml files (only the formatting
differs).
See resources/cluster-monitoring-config/overrides/EXAMPLE.yaml.disabled
for the override file format and documentation.
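
The deep merge described in item 2 can be sketched as follows. This is a minimal illustration, not the code in generate-cmo-config.py: the function name `deep_merge` and the sample keys and values are assumptions chosen for the example.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict with `override` recursively merged onto `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base config and override, for illustration only.
base_config = {
    "prometheusK8s": {
        "retention": "11d",
        "volumeClaimTemplate": {"storage": "100Gi"},
    },
    "alertmanagerMain": {"enabled": True},
}
override_config = {"prometheusK8s": {"retention": "30d"}}

merged_config = deep_merge(base_config, override_config)
print(merged_config["prometheusK8s"])
```

Only the keys named in the override change; sibling keys such as the storage setting are carried over from the base config untouched.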
…tory
Replace the three separate configuration sources (selectors.yaml, variants.yaml, overrides/*.yaml) with a single variants/ directory where every file is self-contained: it carries both its selector (which clusters it targets) and its config transformations (what monitoring config those clusters get).
Base variants have a 'selector' field. Override variants have a 'parent' and a 'target' field; they specialize a base variant for specific clusters or organizations. The script automatically adds NotIn exclusions on the parent's selector so targeted clusters don't receive conflicting configs.
The script is simplified from 309 to 215 lines. All 20 generated output files (10 ConfigMaps + 10 config.yaml) are byte-identical to before.
Replace generic example with concrete extended-retention override example
Replace EXAMPLE.yaml.disabled with a focused, realistic example (50-extended-retention-org.yaml.example) that demonstrates extending Prometheus retention and storage for a specific organization. The .example suffix is naturally excluded by the *.yaml glob pattern.
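
A quick way to see why the .example suffix keeps the file out of the build: a `*.yaml` pattern has to match the whole file name, so a name ending in `.yaml.example` does not qualify. The sketch below uses `fnmatch` from the standard library; whether the script itself uses `glob`, `pathlib`, or something else is an assumption, and `10-uwm.yaml` is a hypothetical file name.

```python
from fnmatch import fnmatch

# One hypothetical active variant plus the example file from this PR.
file_names = [
    "10-uwm.yaml",
    "50-extended-retention-org.yaml.example",
]

# Keep only names that match the pattern in full.
active_variants = [name for name in file_names if fnmatch(name, "*.yaml")]
print(active_variants)
```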
The previous 'target' abstraction (type: organization/cluster, values: [...]) hid what people already know: SelectorSyncSet matchExpressions. Overrides now specify matchExpressions directly, which:
- Uses the same syntax as base variant selectors (no new concepts)
- Supports any label key, not just cluster ID and org ID
- Supports all operators: In, NotIn, Exists, DoesNotExist
The script auto-negates each override expression (In<->NotIn, Exists<->DoesNotExist) on the parent variant's selector to ensure mutual exclusion.
/hold
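
The auto-negation step might look roughly like the sketch below. It is an illustration under stated assumptions rather than the script's actual code: the function names, the label key `example.com/org-id`, and the sample values are all hypothetical.

```python
# Each operator maps to the operator that matches exactly the opposite set.
NEGATED_OPERATOR = {
    "In": "NotIn",
    "NotIn": "In",
    "Exists": "DoesNotExist",
    "DoesNotExist": "Exists",
}


def negate_expression(expression):
    """Return a copy of a matchExpression with its operator inverted."""
    negated = dict(expression)
    negated["operator"] = NEGATED_OPERATOR[expression["operator"]]
    return negated


def exclude_override(parent_expressions, override_expressions):
    """Append the negation of every override expression to the parent's list."""
    return list(parent_expressions) + [
        negate_expression(expression) for expression in override_expressions
    ]


# Hypothetical selectors, for illustration only.
parent_selector = [{"key": "example.com/uwm", "operator": "Exists"}]
override_selector = [
    {"key": "example.com/org-id", "operator": "In", "values": ["org-123"]}
]

parent_with_exclusion = exclude_override(parent_selector, override_selector)
print(parent_with_exclusion)
```

With a single override expression, as here, a cluster matches either the override or the amended parent selector, never both.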
Caution: Review failed. Failed to post review comments.
Walkthrough
This PR refactors cluster monitoring configuration from a hard-coded Python generation script into a modular, variant-driven system. It adds new variant configuration files for UWM and non-UWM deployments across OpenShift versions and reformats existing deploy configuration files to multiline YAML syntax for consistency.
Changes: Cluster Monitoring Configuration Variant System
Estimated code review effort: 4 (Complex) | ~55 minutes
Pre-merge checks: 12 passed
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bergmannf
@bergmannf: all tests passed!
/uncc
What type of PR is this?
feature
What this PR does / why we need it?
This PR would allow modifying the monitoring configuration per customer or cluster (anything that can be filtered in a SelectorSyncSet).
This is currently just a proof of concept that would let us push modified configurations at a finer granularity.
Which Jira/Github issue(s) this PR fixes?
Fixes #
Special notes for your reviewer:
Pre-checks (if applicable):
Tested latest changes against a cluster
Included documentation changes with PR
If this is a new object that is not intended for the FedRAMP environment (if unsure, please reach out to team FedRAMP), please exclude it with: