
OTA-1927: Eval cluster update prompts#2908

Open
fao89 wants to merge 1 commit into
openshift:mainfrom
fao89:OTA-1927

Conversation

@fao89

@fao89 fao89 commented Apr 29, 2026

Member

Add comprehensive MCP test scenarios to evaluation dataset for validating OpenShift cluster update workflow AI responses. These scenarios establish quality benchmarks for LLM outputs across different update phases.

Test Scenarios Added (conv_798-802):

  • Precheck: Pre-upgrade validation and readiness assessment. Comprehensive analysis of cluster health, available updates, and upgrade blockers before initiating updates.

  • Precheck-Specific: Targeted upgrade path validation. Validates specific version availability and upgrade feasibility for planned update targets.

  • No-Updates: Cluster health assessment at latest version. Health monitoring and operational status when no updates are available in the current channel.

  • Progress: Real-time upgrade progress monitoring. Tracks upgrade progress with component status, timeline analysis, and ETA calculations during active updates.

  • Troubleshoot: Upgrade failure diagnosis and remediation. Root cause analysis and conservative troubleshooting guidance for failed or stuck upgrade scenarios.

Each scenario includes:

  • Complete analysis prompts with constraints and requirements
  • Full ClusterVersion YAML data as attachments
  • Full ClusterOperator YAML data as attachments
  • Expected responses with Summary and TL;DR sections
  • Real cluster data from production-like scenarios

These scenarios mirror the CONSOLE-5118 OLS integration workflow phases and provide the evaluation baseline for cluster update AI assistance.
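As an illustration of the "Summary and TL;DR sections" requirement, here is a minimal Python sketch of a format check on a response. It is not part of the PR or the evaluation framework. The function name `missing_sections` and the assumption that a section begins a line (optionally behind markdown heading or bold markers) are hypothetical.

```python
import re

# Section names taken from the PR description; how they are rendered in a
# real response (heading, bold, plain label) is an assumption.
REQUIRED_SECTIONS = ("Summary", "TL;DR")


def missing_sections(response: str) -> list[str]:
    """Return the required section names that never start a line in `response`."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # Accept optional leading '#', '*' or whitespace, e.g. "## Summary"
        # or "**TL;DR**", then the section name as a whole word.
        pattern = rf"^[#*\s]*{re.escape(name)}\b"
        if not re.search(pattern, response, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(name)
    return missing
```

A check like this only confirms that the sections are present. Whether their content is correct is still left to the judge-based metrics.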

Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com

Ref: openshift/console#16131

Summary by CodeRabbit

  • Documentation
    • Expanded evaluation instructions, including Python 3.11+ prerequisites, setup, and how to run full/short and “cluster-updates” tests.
    • Enhanced “What’s Included” with dataset/test listings (tags and conversation ranges) and references to the evaluation tool.
  • Chores
    • Added a new “cluster-updates” evaluation system configuration with judge settings, metric definitions, output/CSV configuration, and logging/telemetry adjustments.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 29, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 29, 2026


@fao89: This pull request references OTA-1927 which is a valid jira issue.


@openshift-ci openshift-ci Bot requested review from blublinsky and raptorsun April 29, 2026 15:44
@openshift-ci

openshift-ci Bot commented Apr 29, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 16, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: df9d827e-60a9-4ba3-9de6-062ca8a6071b

📥 Commits

Reviewing files that changed from the base of the PR and between e1c10db and 47bc7ce.

📒 Files selected for processing (3)
  • eval/README.md
  • eval/eval_data_cluster_updates.yaml
  • eval/system_cluster_updates.yaml
✅ Files skipped from review due to trivial changes (1)
  • eval/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • eval/system_cluster_updates.yaml

Walkthrough

A new eval/system_cluster_updates.yaml configuration file is added for the LightSpeed Evaluation Framework, defining judge LLM settings, API call parameters, turn-level and conversation-level metrics, output/CSV configuration, visualization options, and logging overrides. The eval/README.md is expanded with usage commands, dataset details, test-category tags, and descriptions of both system configuration presets.

Changes

Cluster-Updates Evaluation Setup

| Layer / File(s) | Summary |
|---|---|
| Judge LLM and API configuration (eval/system_cluster_updates.yaml) | Establishes OpenAI judge LLM provider with model selection, temperature, and generation/request limits; configures query-style API endpoint targeting a local HTTPS server with cluster-specific query provider/model and optional tool/system-prompt field overrides. |
| Turn and conversation-level metrics (eval/system_cluster_updates.yaml) | Defines turn-level metrics with default custom answer correctness scoring and optional GEval criteria for Kubernetes condition interpretation, output format validation (requiring Summary/TL;DR sections), technical accuracy, and actionable guidance; adds conversation-level optional DeepEval metrics for completeness, relevancy, and knowledge retention (all disabled by default). |
| Results output, visualization, and operational configuration (eval/system_cluster_updates.yaml) | Configures results output directory and CSV column definitions for recording turn/metric data; sets visualization figure sizing and enabled graph types; defines environment variable overrides to suppress DeepEval telemetry and control LiteLLM logging; specifies per-package log levels and formatting behavior. |
| README: setup instructions and configuration reference (eval/README.md) | Adds evaluation framework prerequisites (Python 3.11+) and setup instructions; documents run commands for full, short, and cluster-updates evaluation variants with tag-based filtering; expands "What's Included" with dataset file listings, test-category tags mapped to conversation ranges, and descriptions of both system configuration presets with their metrics; adds reference link to the Lightspeed evaluation tool. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'OTA-1927: Eval cluster update prompts' clearly identifies the main change: adding evaluation test scenarios for cluster update workflows. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | This PR modifies documentation and YAML configuration files only. The custom check for stable Ginkgo test names does not apply as this is a Python project with zero Go files and no Ginkgo tests. |
| Test Structure And Quality | ✅ Passed | PR contains no Ginkgo test code; it's a Python project that adds YAML evaluation dataset scenarios and configuration files. Custom check is not applicable. |
| Microshift Test Compatibility | ✅ Passed | This PR does not add any Ginkgo e2e tests. It only modifies YAML evaluation datasets and configuration files in lightspeed-service. The check is not applicable to this PR. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR; it only adds evaluation configuration, test data, and documentation files. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR contains no deployment manifests, operator code, or controllers. Changes are documentation (README) and evaluation framework configuration/test data only. Check not applicable. |
| Ote Binary Stdout Contract | ✅ Passed | This PR is a documentation and YAML configuration update to the evaluation dataset for OLS (OpenShift Lightspeed). It contains no Go code, test binaries, or process-level code (main(), init(), Test... |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR adds only YAML evaluation config and data files plus documentation. No Go test files or Ginkgo e2e tests were added, so the IPv6/disconnected network compatibility check does not apply. |
| No-Weak-Crypto | ✅ Passed | No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or insecure secret comparisons found in the PR changes. |
| Container-Privileges | ✅ Passed | PR contains no container/K8s manifests or privileged security configurations; it only adds evaluation documentation and test data files. |
| No-Sensitive-Data-In-Logs | ✅ Passed | The pull request adds evaluation dataset test scenarios and configuration files without exposing sensitive data in logs. The logging configuration sets source_level to INFO and package_level to ERR... |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@eval/README.md`:
- Around line 54-63: The cluster-updates example commands in the eval/README.md
reference system_cluster_updates.yaml which uses https://localhost:8080, but the
local setup starts OLS at http://localhost:8080, causing a TLS mismatch. Add a
clarifying note in the README near these example commands explaining that for
local runs, users need to either modify the api_base setting in
system_cluster_updates.yaml to use http instead of https, or provide
instructions pointing to a separate local cluster-updates configuration preset
that uses HTTP. This will prevent users from encountering immediate
connection/TLS failures when attempting to run these commands locally.
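One possible way to catch the scheme mismatch described in this finding is a small check on the `api_base` value before a local run. The sketch below is hypothetical and not part of the PR. The helper name `local_http_api_base` and the set of hosts treated as local are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts treated as "local" for this sketch (an assumption).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def local_http_api_base(api_base: str) -> str:
    """Return api_base with https downgraded to http, but only for local hosts."""
    parts = urlsplit(api_base)
    if parts.scheme == "https" and parts.hostname in LOCAL_HOSTS:
        # SplitResult is a named tuple, so _replace swaps just the scheme.
        return urlunsplit(parts._replace(scheme="http"))
    return api_base
```

Remote endpoints pass through unchanged, so the sketch cannot silently weaken a real HTTPS target.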

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8d9c1f51-f25b-40f6-b7f8-eba854c9da4a

📥 Commits

Reviewing files that changed from the base of the PR and between a8aa7a8 and 2064cd9.

📒 Files selected for processing (3)
  • eval/README.md
  • eval/eval_data_cluster_updates.yaml
  • eval/system_cluster_updates.yaml

Comment thread eval/README.md
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Fabricio Aguiar <fabricio.aguiar@gmail.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
@openshift-ci

openshift-ci Bot commented Jun 16, 2026


@fao89: all tests passed!


@openshift-ci openshift-ci Bot requested review from cambelem and sriroopar June 17, 2026 17:37
@fao89

fao89 commented Jun 17, 2026

Member Author

/cc @sriroopar @rioloc

@openshift-ci openshift-ci Bot requested a review from rioloc June 17, 2026 17:38
@sriroopar

Contributor
  1. Turn metrics need to be defined for every turn, as necessary.
  2. The provider name needs to be standardized to openai.
  3. https should be replaced with http.
  4. All conversations have a single tag, but the README suggests otherwise.
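To compare the tags actually present in the dataset with what the README documents (point 4), a grouping along these lines could help. This is a rough sketch, not code from the PR. It assumes the dataset has already been loaded into a list of dicts (for example with a YAML loader) and that each entry carries the `conversation_group_id` and `tag` keys seen in the review context. The function name is hypothetical.

```python
from collections import defaultdict


def conversations_by_tag(conversations: list[dict]) -> dict[str, list[str]]:
    """Group conversation_group_id values by their tag."""
    groups: dict[str, list[str]] = defaultdict(list)
    for conv in conversations:
        # Entries without a tag are collected under a placeholder key.
        tag = conv.get("tag", "<untagged>")
        groups[tag].append(conv["conversation_group_id"])
    return dict(groups)
```

If the result has a single key while the README lists several tags, the two are out of sync.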

- Clear recommendation should be provided
- conversation_group_id: conv_800
  tag: cluster-updates-scenarios
  turns:


Thank you very much for your PR, Fabricio :)

A major bug is that turn metrics are not set up for every turn, so the metrics we may want to analyze will not be captured. The rest looks okay; I dropped a couple of minor mismatches in a comment.
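A quick way to find the affected turns would be a scan like the one below. It is a sketch only. The per-turn key name `turn_metrics` is an assumption about the dataset schema, and the function name is hypothetical. The input is the dataset already loaded into a list of dicts.

```python
def turns_without_metrics(conversations: list[dict]) -> list[tuple[str, int]]:
    """Return (conversation_group_id, turn_index) for turns lacking turn metrics."""
    missing = []
    for conv in conversations:
        group_id = conv.get("conversation_group_id", "<unknown>")
        for index, turn in enumerate(conv.get("turns") or []):
            # Assumption: metrics are listed per turn under "turn_metrics".
            # A missing key and an empty list are both reported.
            if not turn.get("turn_metrics"):
                missing.append((group_id, index))
    return missing
```

An empty result means every turn declares at least one metric. It does not say whether the chosen metrics are the right ones.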
