OTA-1927: Eval cluster update prompts by fao89 · Pull Request #2908 · openshift/lightspeed-service

fao89 · 2026-04-29T15:43:15Z

Add comprehensive MCP test scenarios to evaluation dataset for validating OpenShift cluster update workflow AI responses. These scenarios establish quality benchmarks for LLM outputs across different update phases.

Test Scenarios Added (conv_798-802):

- Precheck: Pre-upgrade validation and readiness assessment. Comprehensive analysis of cluster health, available updates, and upgrade blockers before initiating updates.
- Precheck-Specific: Targeted upgrade path validation. Validates specific version availability and upgrade feasibility for planned update targets.
- No-Updates: Cluster health assessment at latest version. Health monitoring and operational status when no updates are available in the current channel.
- Progress: Real-time upgrade progress monitoring. Tracks upgrade progress with component status, timeline analysis, and ETA calculations during active updates.
- Troubleshoot: Upgrade failure diagnosis and remediation. Root cause analysis and conservative troubleshooting guidance for failed or stuck upgrade scenarios.

Each scenario includes:

- Complete analysis prompts with constraints and requirements
- Full ClusterVersion YAML data as attachments
- Full ClusterOperator YAML data as attachments
- Expected responses with Summary and TL;DR sections
- Real cluster data from production-like scenarios

These scenarios mirror the CONSOLE-5118 OLS integration workflow phases and provide the evaluation baseline for cluster update AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Ref: openshift/console#16131

Summary by CodeRabbit

Documentation
- Expanded evaluation instructions, including Python 3.11+ prerequisites, setup, and how to run full/short and “cluster-updates” tests.
- Enhanced “What’s Included” with dataset/test listings (tags and conversation ranges) and references to the evaluation tool.
Chores
- Added a new “cluster-updates” evaluation system configuration with judge settings, metric definitions, output/CSV configuration, and logging/telemetry adjustments.

openshift-ci-robot · 2026-04-29T15:43:20Z

@fao89: This pull request references OTA-1927 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-04-29T15:44:08Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-06-16T17:41:51Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: df9d827e-60a9-4ba3-9de6-062ca8a6071b

📥 Commits

Reviewing files that changed from the base of the PR and between e1c10db and 47bc7ce.

📒 Files selected for processing (3)

eval/README.md
eval/eval_data_cluster_updates.yaml
eval/system_cluster_updates.yaml

✅ Files skipped from review due to trivial changes (1)

eval/README.md

🚧 Files skipped from review as they are similar to previous changes (1)

eval/system_cluster_updates.yaml

Walkthrough

A new eval/system_cluster_updates.yaml configuration file is added for the LightSpeed Evaluation Framework, defining judge LLM settings, API call parameters, turn-level and conversation-level metrics, output/CSV configuration, visualization options, and logging overrides. The eval/README.md is expanded with usage commands, dataset details, test-category tags, and descriptions of both system configuration presets.

Changes

Cluster-Updates Evaluation Setup

| Layer / File(s) | Summary |
| --- | --- |
| Judge LLM and API configuration `eval/system_cluster_updates.yaml` | Establishes OpenAI judge LLM provider with model selection, temperature, and generation/request limits; configures query-style API endpoint targeting a local HTTPS server with cluster-specific query provider/model and optional tool/system-prompt field overrides. |
| Turn and conversation-level metrics `eval/system_cluster_updates.yaml` | Defines turn-level metrics with default custom answer correctness scoring and optional GEval criteria for Kubernetes condition interpretation, output format validation (requiring Summary/TL;DR sections), technical accuracy, and actionable guidance; adds conversation-level optional DeepEval metrics for completeness, relevancy, and knowledge retention (all disabled by default). |
| Results output, visualization, and operational configuration `eval/system_cluster_updates.yaml` | Configures results output directory and CSV column definitions for recording turn/metric data; sets visualization figure sizing and enabled graph types; defines environment variable overrides to suppress DeepEval telemetry and control LiteLLM logging; specifies per-package log levels and formatting behavior. |
| README: setup instructions and configuration reference `eval/README.md` | Adds evaluation framework prerequisites (Python 3.11+) and setup instructions; documents run commands for full, short, and cluster-updates evaluation variants with tag-based filtering; expands "What's Included" with dataset file listings, test-category tags mapped to conversation ranges, and descriptions of both system configuration presets with their metrics; adds reference link to the Lightspeed evaluation tool. |
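The output format criterion in the table above (Summary/TL;DR sections) is described as a GEval check, so in the real configuration it is scored by the judge LLM. As a rough deterministic stand-in, a minimal Python sketch could test whether a response contains both sections. The helper name `has_required_sections` and the line-start matching rule are assumptions made for illustration; they are not part of the PR or of the evaluation tool.

```python
import re

# Section names taken from the PR description ("Summary and TL;DR sections").
REQUIRED_SECTIONS = ("Summary", "TL;DR")


def has_required_sections(response: str) -> bool:
    """Return True if every required section name starts some line of the response.

    Hypothetical helper: leading Markdown heading or emphasis markers are
    ignored, so "## Summary", "**TL;DR**", and "Summary:" all count.
    """
    for name in REQUIRED_SECTIONS:
        # Allow '#', '*', and whitespace before the section name at line start.
        pattern = rf"^[#*\s]*{re.escape(name)}\b"
        if not re.search(pattern, response, flags=re.IGNORECASE | re.MULTILINE):
            return False
    return True
```

A check like this only confirms that the headings exist; it says nothing about whether the content under them is correct, which is what the LLM-judged metrics are for.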

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 15

✅ Passed checks (15 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'OTA-1927: Eval cluster update prompts' clearly identifies the main change: adding evaluation test scenarios for cluster update workflows. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | This PR modifies documentation and YAML configuration files only. The custom check for stable Ginkgo test names does not apply as this is a Python project with zero Go files and no Ginkgo tests. |
| Test Structure And Quality | ✅ Passed | PR contains no Ginkgo test code; it's a Python project that adds YAML evaluation dataset scenarios and configuration files. Custom check is not applicable. |
| Microshift Test Compatibility | ✅ Passed | This PR does not add any Ginkgo e2e tests. It only modifies YAML evaluation datasets and configuration files in lightspeed-service. The check is not applicable to this PR. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR; it only adds evaluation configuration, test data, and documentation files. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR contains no deployment manifests, operator code, or controllers. Changes are documentation (README) and evaluation framework configuration/test data only. Check not applicable. |
| Ote Binary Stdout Contract | ✅ Passed | This PR is a documentation and YAML configuration update to the evaluation dataset for OLS (OpenShift Lightspeed). It contains no Go code, test binaries, or process-level code (main(), init(), Test... |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR adds only YAML evaluation config and data files plus documentation. No Go test files or Ginkgo e2e tests were added, so the IPv6/disconnected network compatibility check does not apply. |
| No-Weak-Crypto | ✅ Passed | No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or insecure secret comparisons found in the PR changes. |
| Container-Privileges | ✅ Passed | PR contains no container/K8s manifests or privileged security configurations; it only adds evaluation documentation and test data files. |
| No-Sensitive-Data-In-Logs | ✅ Passed | The pull request adds evaluation dataset test scenarios and configuration files without exposing sensitive data in logs. The logging configuration sets source_level to INFO and package_level to ERR... |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@eval/README.md`:
- Around line 54-63: The cluster-updates example commands in the eval/README.md
reference system_cluster_updates.yaml which uses https://localhost:8080, but the
local setup starts OLS at http://localhost:8080, causing a TLS mismatch. Add a
clarifying note in the README near these example commands explaining that for
local runs, users need to either modify the api_base setting in
system_cluster_updates.yaml to use http instead of https, or provide
instructions pointing to a separate local cluster-updates configuration preset
that uses HTTP. This will prevent users from encountering immediate
connection/TLS failures when attempting to run these commands locally.
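The mismatch described in this comment can be detected before a run starts. The following is a minimal Python sketch, assuming the `api_base` value has already been read from the configuration file; the function name `scheme_mismatch` is hypothetical and not something the evaluation tool provides.

```python
from urllib.parse import urlsplit


def scheme_mismatch(api_base: str, server_url: str) -> bool:
    """Return True when two URLs share host and port but differ in scheme."""
    a, b = urlsplit(api_base), urlsplit(server_url)
    same_endpoint = (a.hostname, a.port) == (b.hostname, b.port)
    return same_endpoint and a.scheme != b.scheme


# The two URLs quoted in the review comment above.
print(scheme_mismatch("https://localhost:8080", "http://localhost:8080"))  # True
```

Comparing host and port first avoids flagging a configuration that deliberately points at a different endpoint.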


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8d9c1f51-f25b-40f6-b7f8-eba854c9da4a

📥 Commits

Reviewing files that changed from the base of the PR and between a8aa7a8 and 2064cd9.

📒 Files selected for processing (3)

eval/README.md
eval/eval_data_cluster_updates.yaml
eval/system_cluster_updates.yaml

Add comprehensive MCP test scenarios to evaluation dataset for validating OpenShift cluster update workflow AI responses. These scenarios establish quality benchmarks for LLM outputs across different update phases.

Test Scenarios Added (conv_798-802):

- Precheck: Pre-upgrade validation and readiness assessment. Comprehensive analysis of cluster health, available updates, and upgrade blockers before initiating updates.
- Precheck-Specific: Targeted upgrade path validation. Validates specific version availability and upgrade feasibility for planned update targets.
- No-Updates: Cluster health assessment at latest version. Health monitoring and operational status when no updates are available in the current channel.
- Progress: Real-time upgrade progress monitoring. Tracks upgrade progress with component status, timeline analysis, and ETA calculations during active updates.
- Troubleshoot: Upgrade failure diagnosis and remediation. Root cause analysis and conservative troubleshooting guidance for failed or stuck upgrade scenarios.

Each scenario includes:

- Complete analysis prompts with constraints and requirements
- Full ClusterVersion YAML data as attachments
- Full ClusterOperator YAML data as attachments
- Expected responses with Summary and TL;DR sections
- Real cluster data from production-like scenarios

These scenarios mirror the CONSOLE-5118 OLS integration workflow phases and provide the evaluation baseline for cluster update AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Fabricio Aguiar <fabricio.aguiar@gmail.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

openshift-ci · 2026-06-16T18:16:47Z

@fao89: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

fao89 · 2026-06-17T17:38:15Z

/cc @sriroopar @rioloc

sriroopar · 2026-06-17T20:23:47Z

- Turn metrics need to be defined for every turn, as necessary.
- The provider name needs to be standardized to openai.
- https should be replaced with http.
- All conversations have a single tag, but the README suggests otherwise.
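The tag point can be checked mechanically by comparing the tags used in the dataset with the tags the README lists. The sketch below is a minimal Python illustration that assumes the dataset has already been loaded into a list of dictionaries (for example with a YAML parser). The keys `conversation_group_id` and `tag` appear in the diff excerpt in this thread; the function names and the shape of the README tag list are assumptions.

```python
from collections import Counter


def tag_counts(groups):
    """Count how many conversation groups carry each tag."""
    return Counter(group.get("tag", "<missing>") for group in groups)


def undocumented_or_unused(groups, documented_tags):
    """Compare tags used in the dataset with tags a README claims to support.

    Returns (used but not documented, documented but not used), both sorted.
    """
    used = set(tag_counts(groups))
    documented = set(documented_tags)
    return sorted(used - documented), sorted(documented - used)
```

For example, if every group carried `cluster-updates-scenarios` while the README listed additional tags, the second returned list would name the tags that select no conversations.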

sriroopar · 2026-06-17T20:25:47Z

+      - Clear recommendation should be provided
+- conversation_group_id: conv_800
+  tag: cluster-updates-scenarios
+  turns:


Thank you very much for your PR, Fabricio :)

A major bug is that turn metrics are not set up for every turn, so the metrics we may want to analyze will not be captured. The rest looks okay; I dropped a couple of minor mismatches in a comment.
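A check for the turn-metrics issue raised in this review could look like the following minimal Python sketch. It assumes the dataset has already been loaded into a list of dictionaries and that per-turn metrics live under a `turn_metrics` key; that key name is an assumption inferred from the review wording, while `conversation_group_id` and `turns` appear in the diff excerpt above. Whether an absent key falls back to defaults depends on the evaluation tool; this sketch simply reports it as missing.

```python
def turns_missing_metrics(groups, metrics_key="turn_metrics"):
    """Return (conversation_group_id, turn index) pairs for turns lacking metrics.

    A turn counts as lacking metrics when the key is absent or its value is empty.
    """
    missing = []
    for group in groups:
        for index, turn in enumerate(group.get("turns", [])):
            if not turn.get(metrics_key):
                missing.append((group.get("conversation_group_id"), index))
    return missing
```

Running this over the loaded dataset before an evaluation gives a list of turns to fix, rather than discovering the gap from empty columns in the results CSV.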

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 29, 2026

openshift-ci Bot requested review from blublinsky and raptorsun April 29, 2026 15:44

fao89 force-pushed the OTA-1927 branch from d564306 to 2064cd9 Compare June 16, 2026 17:41

coderabbitai Bot reviewed Jun 16, 2026

Comment thread eval/README.md

fao89 force-pushed the OTA-1927 branch from 2064cd9 to e1c10db Compare June 16, 2026 17:50

fao89 force-pushed the OTA-1927 branch from e1c10db to 47bc7ce Compare June 16, 2026 17:55

openshift-ci Bot requested review from cambelem and sriroopar June 17, 2026 17:37

openshift-ci Bot requested a review from rioloc June 17, 2026 17:38

sriroopar reviewed Jun 17, 2026

fao89 wants to merge 1 commit into openshift:main from fao89:OTA-1927

3 participants
