Skip to content

OLS-2928 - Implement rhoai on lseval#2946

Open
JoaoFula wants to merge 3 commits into
openshift:mainfrom
JoaoFula:implement-rhoai-on-lseval
Open

OLS-2928 - Implement rhoai on lseval#2946
JoaoFula wants to merge 3 commits into
openshift:mainfrom
JoaoFula:implement-rhoai-on-lseval

Conversation

@JoaoFula

@JoaoFula JoaoFula commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Description

  • Adding RHOAI installation for lseval.
  • Expanding lseval call script to install RHOAI in case RHOAI_PROVISION=true. RHOAI will not install unless this env is set (so it shouldn't interfere with existing eval)
  • Changing temperature for gpt5 as judge to 1 since gpt5 does not allow for different temperatures
  • Increasing timeout for route in tests (and evals) since weaker models such as llama 3.1 8b (currently used in RHOAI tests) was causing constant route timeouts

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added support for rhoai_vllm provider in evaluation tests
    • Added GPU infrastructure and vLLM model serving automation for test environments
  • Configuration Changes

    • Updated evaluation scoring parameters for increased randomness and token generation limits
    • Extended default model configuration options

@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@JoaoFula, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 3 minutes. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ba2b3bd3-b0f9-4171-9dcf-5985a0b327b0

📥 Commits

Reviewing files that changed from the base of the PR and between 0f3be56 and 969bcd8.

📒 Files selected for processing (5)
  • eval/system_rhoai_vllm_lseval.yaml
  • tests/e2e/utils/ols_installer.py
  • tests/rhoai/manifests/operators/operatorgroup.yaml
  • tests/rhoai/scripts/gpu-setup.sh
  • tests/scripts/test-lseval-periodic.sh
📝 Walkthrough

Walkthrough

This PR adds RHOAI vLLM provider support to the lightspeed-service evaluation testing infrastructure. It updates judge LLM temperature settings across multiple evaluation configs, integrates a new rhoai_vllm provider into the test framework, provides Kubernetes manifests and orchestration scripts for RHOAI cluster provisioning, and extends the periodic LSEval test suite to optionally provision and test against a self-hosted vLLM deployment.

Changes

RHOAI vLLM Provider Integration

Layer / File(s) Summary
Evaluation judge configuration updates
eval/system_*.yaml
Judge LLM temperature changed from 0.0 to 1 across Azure OpenAI, RhelAI vLLM, and Watsonx configs. OpenAI config temperature normalized to 1. RHOAI vLLM config model switched to meta-llama/Llama-3.1-8B-Instruct with temperature: 1 and max_tokens: 4096.
Test framework provider registration
tests/e2e/conftest.py, tests/e2e/evaluation/test_lseval_periodic.py
Added rhoai_vllm to pytest --lseval_provider choices and test provider matrix. Mapped rhoai_vllm to system_rhoai_vllm_lseval.yaml config. Added cleanup of stale evaluation_* artifacts before test runs.
OLS operator CRD and configuration
tests/config/operator_install/olsconfig.crd.rhoai_vllm_lseval.yaml, tests/config/operator_install/route.yaml, tests/e2e/utils/ols_installer.py
Created OLSConfig resource for rhoai_vllm provider with Llama-3.1-8B model and ${KSVC_URL}/v1 endpoint. Added route timeout annotation. Implemented environment variable substitution for rhoai_vllm CRD templates via os.path.expandvars.
Kubernetes operators and namespaces
tests/rhoai/manifests/namespaces/*.yaml, tests/rhoai/manifests/operators/*.yaml
Created Namespace manifests for openshift-nfd and nvidia-gpu-operator. Defined OperatorGroup resources for global, GPU, and NFD operators. Created Subscription resources for RHODS, GPU Operator, and NFD. Configured DataScienceCluster with kserve enabled but serving components disabled.
GPU cluster policy and NFD configuration
tests/rhoai/manifests/gpu/*.yaml, tests/rhoai/scripts/gpu-setup.sh
Created ClusterPolicy enabling GPU driver, toolkit, device plugin, DCGM, and NFD components with GPU node tolerations. Created NodeFeatureDiscovery instance. Added gpu-setup.sh orchestrating NFD deployment, ClusterPolicy application, operator readiness, GPU node labeling, and capacity validation with timeouts and diagnostics.
vLLM serving runtime and inference service
tests/rhoai/manifests/vllm/*.yaml
Created ServingRuntime vllm-gpu with vLLM command args, HuggingFace/API key secrets, chat template and cache volumes, GPU/CPU/memory resources. Created InferenceService vllm-model with RawDeployment mode and GPU resource limits.
RHOAI bootstrap and deployment scripts
tests/rhoai/scripts/bootstrap.sh, tests/rhoai/scripts/deploy-vllm.sh, tests/rhoai/scripts/fetch-vllm-image.sh, tests/rhoai/scripts/get-vllm-pod-info.sh
Added bootstrap.sh orchestrating operator installation and DataScienceCluster provisioning. Added deploy-vllm.sh waiting for KServe CRDs, controllers, GPU nodes, and GPU capacity before deploying vLLM. Added fetch-vllm-image.sh resolving vLLM CUDA image from templates or ServingRuntimes. Added get-vllm-pod-info.sh discovering InferenceService pods and extracting Knative/cluster service URLs.
Test suite provisioning and orchestration
tests/scripts/test-lseval-periodic.sh, tests/scripts/utils.sh
Extended test-lseval-periodic.sh with optional RHOAI_PROVISION=true mode conditionally setting up RHOAI namespaces, operators, GPU, vLLM, and secrets. Updated run_suites and record_trends to branch between OpenAI and RHOAI provider paths. Updated utils.sh to use prefix matching (lseval_periodic*) for suite routing.

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'OLS-2928 - Implement rhoai on lseval' clearly summarizes the main change: implementing RHOAI support for the lseval evaluation pipeline.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

When RHOAI_PROVISION=true, the lseval periodic script provisions
GPU infra and a vLLM model serving endpoint (Llama-3.1-8B-Instruct)
on the OpenShift cluster before running evals. When not set, behavior
is unchanged (OpenAI GPT-4o-mini).

Changes:
- Copy RHOAI manifests and scripts from lightspeed-stack, adapted for
  RawDeployment mode (3 operators: RHODS, GPU, NFD — no Service Mesh
  or Serverless)
- Add OLSConfig CR with KSVC_URL placeholder for envsubst substitution
- Add envsubst support in ols_installer.py for rhoai_vllm CRs
- Add rhoai_vllm to the lseval periodic provider matrix
- Update eval config model from gpt-3.5-turbo to Llama-3.1-8B-Instruct
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 5, 2026
@JoaoFula JoaoFula force-pushed the implement-rhoai-on-lseval branch from 3a0d10d to 0f3be56 Compare June 5, 2026 07:51
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 5, 2026
@openshift-ci openshift-ci Bot requested review from bparees and raptorsun June 5, 2026 07:51
@openshift-ci

openshift-ci Bot commented Jun 5, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign blublinsky for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

🧹 Nitpick comments (1)
tests/rhoai/scripts/gpu-setup.sh (1)

57-74: 💤 Low value

JSON patch replaces all existing tolerations.

The add operation with a full array value replaces any existing tolerations on the NFD worker daemonset. If the daemonset has default tolerations (e.g., for control-plane nodes), they would be lost. Consider using merge-patch or appending to the existing array if preserving defaults matters.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/rhoai/scripts/gpu-setup.sh` around lines 57 - 74, The current oc patch
command replaces the entire /spec/template/spec/tolerations array; instead
append each toleration so existing tolerations aren't lost: change the JSON
patch to use "op": "add" with the path "/spec/template/spec/tolerations/-" for
each toleration object (targeting the nfd-worker DaemonSet patch invocation in
gpu-setup.sh) so you add the two tolerations rather than overwrite the array, or
switch to a merge-type patch that merges/appends tolerations instead of
replacing them.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@eval/system_azure_openai_lseval.yaml`:
- Line 18: Remove the explicit "temperature: 1" field from the Azure OpenAI
request config for the gpt-5-mini model (i.e., drop the temperature key) and/or
update the request-building logic to omit the temperature parameter when model
=== "gpt-5-mini" so the client/proxy does not send a temperature field (ensure
the check references the "gpt-5-mini" model string and the "temperature" config
key).

In `@eval/system_rhoai_vllm_lseval.yaml`:
- Line 22: The rhoai_vllm judge LLM entry currently sets max_tokens: 4096 which
is asymmetric versus other providers at 512; either change the rhoai_vllm
max_tokens value to 512 to standardize evaluation length across providers (edit
the max_tokens field in the rhoai_vllm/judge config) or add a concise comment
next to the rhoai_vllm max_tokens setting documenting the explicit rationale for
why it needs to be 4096 (e.g., to allow extended judge reasoning for weaker
models) so reviewers understand the intentional deviation.
- Line 37: The model string "meta-llama/Llama-3.1-8B-Instruct" in the config is
inconsistent with the PR text claiming Llama 3.3 8B; confirm which model is
actually deployed and make them match by either updating the config's model
field to the deployed identifier (e.g., change the "model" value to the correct
Llama-3.3-8B identifier) or updating the PR description to reference
"meta-llama/Llama-3.1-8B-Instruct" so both the config (the model field) and the
PR description reference the same version.

In `@tests/e2e/utils/ols_installer.py`:
- Around line 444-453: The current branch that handles "rhoai_vllm" reads the CR
YAML and expands environment vars with os.path.expandvars(raw) but doesn't
verify KSVC_URL is set, so an unresolved "${KSVC_URL}" can be applied; update
the block that detects "rhoai_vllm" to validate that the environment variable
KSVC_URL is present (or that the substituted content no longer contains
"${KSVC_URL}") before calling cluster_utils.run_oc (refer to the "rhoai_vllm"
conditional, the use of os.path.expandvars, and cluster_utils.run_oc), and if
KSVC_URL is missing, log an error / raise an exception and abort the install to
fail fast.

In `@tests/rhoai/manifests/operators/operatorgroup.yaml`:
- Around line 6-7: The OperatorGroup resource named "global-operators" currently
has its spec left as YAML null; update the manifest so the OperatorGroup's spec
is an explicit empty object (spec: {}) instead of null to match the CRD/schema
expectations and mirror the other OperatorGroup entries; locate the
"global-operators" OperatorGroup in the operatorgroup.yaml and replace the
null/empty spec with an explicit empty object for the spec field.

In `@tests/rhoai/manifests/vllm/vllm-runtime-gpu.yaml`:
- Around line 22-24: The manifest currently sets the model download directory to
"/tmp/models-cache" via the "--download-dir" flag, but the writable cache volume
is mounted at "/mnt/models-cache"; change the "--download-dir" value from
"/tmp/models-cache" to "/mnt/models-cache" wherever it appears (e.g., near the
"--download-dir" flag and any related container args in vllm-runtime-gpu.yaml)
so model downloads use the mounted cache volume instead of container root
storage.
- Around line 14-70: Add an explicit restrictive container securityContext for
the kserve-container by setting runAsNonRoot: true and a non-root UID (e.g.,
runAsUser: 1000), disallowing privilege escalation (allowPrivilegeEscalation:
false), dropping all capabilities (capabilities.drop: ["ALL"]), enabling a
read-only root filesystem (readOnlyRootFilesystem: true), and adding a default
seccompProfile (type: RuntimeDefault) and optional runAsGroup if required by the
image; apply this under the container spec for the container named
kserve-container to harden the runtime.

In `@tests/rhoai/scripts/bootstrap.sh`:
- Around line 13-16: The CSV discovery loop using `until oc get csv -n
"${NAMESPACE}" -l "${OPERATOR_LABEL}" --no-headers` can spin forever; add a
bounded timeout (e.g., CSV_TIMEOUT_SECONDS or MAX_ATTEMPTS) and track elapsed
time or attempt count around that loop, sleeping as before but breaking out and
calling `echo` with a clear error and `exit 1` if the timeout is reached; update
the loop that prints `"...still waiting for ${OPERATOR_NAME} CSV to show up"` to
include the elapsed/attempts or timeout value for clarity.

In `@tests/rhoai/scripts/deploy-vllm.sh`:
- Around line 6-10: The polling loops that run commands like "until oc get crd
$crd &>/dev/null; do sleep 5; done" (and the similar loops at the other
occurrences) need an upper bound: add a timeout mechanism (e.g. record start
time and max_wait_seconds or use the timeout command) and on expiry print a
clear error including the resource name and exit non‑zero; update each loop (the
one iterating over "servingruntimes.serving.kserve.io
inferenceservices.serving.kserve.io" and the similar loops at the other two
locations) to break on success or fail fast after max wait rather than looping
forever.
- Around line 1-3: Add strict shell mode at the top of the script (immediately
after the #!/bin/bash line) to fail fast: insert "set -euo pipefail" and define
a safe IFS (e.g. IFS=$'\n\t'). This will ensure errors (including unset
variables and pipeline failures) abort execution; locate the existing
BASE_DIR="$1" line to place this change just before it.

In `@tests/rhoai/scripts/fetch-vllm-image.sh`:
- Line 25: The VLLM_IMAGE assignment uses a cluster-wide search and head -1
which yields nondeterministic first-match behavior; change the logic that sets
VLLM_IMAGE (the VLLM_IMAGE variable / the oc get servingruntime invocation) to
target a specific namespace or a caller-provided namespace (e.g., accept
NAMESPACE env/arg) and to select by exact servingruntime name or a deterministic
selector (avoid piping to head -1); ensure the command uses oc -n <namespace>
and a precise grep/filter to return the intended odh-vllm-cuda-rhel9 image only.

In `@tests/rhoai/scripts/gpu-setup.sh`:
- Around line 23-25: The shell script uses unquoted variable expansions for node
names in the oc get node commands (the occurrences of $node), which can cause
word-splitting or globbing; update both usages to quote the variable (use
"$node") so the commands call oc with the full node name as a single argument
and avoid accidental splitting or glob expansion.
- Line 88: The until loop uses unquoted command substitutions for the oc get
pods pipelines which can break if output is empty or contains whitespace; update
the condition in the until loop that contains the two command substitutions (the
oc get pods ... | awk ... and the oc get pods ... | wc -l) to quote their
substitutions so the shell treats their results as single words (i.e., quote
"$(oc get pods -n nvidia-gpu-operator --no-headers 2>/dev/null | awk '{if ($3 !=
"Running" && $3 != "Completed") exit 1}')" and the count substitution "$(oc get
pods -n nvidia-gpu-operator --no-headers 2>/dev/null | wc -l)") to make the test
robust.
- Around line 159-197: The until condition only checks GPU capacity but not
allocatable, so the loop can exit when capacity > 0 while allocatable is still
0; update the until test that currently uses oc get nodes ...
status.capacity.nvidia\.com/gpu to also fetch and verify
status.allocatable.nvidia\.com/gpu (the same oc jsonpath logic used later for
the capacity and allocatable variables) and require both values to be non-empty
and not "0" before breaking out of the loop; modify the until block and its oc
commands (and keep the existing capacity and allocatable variable
assignments/prints) so the guard enforces both capacity and allocatable > 0.

In `@tests/scripts/test-lseval-periodic.sh`:
- Around line 72-79: Replace the one-shot create logic that uses "oc create ...
|| echo ...", because it ignores rotated credentials, with an idempotent apply
flow: generate the secret manifest client-side and pipe it to "oc apply -f -" so
hf-token-secret and vllm-api-key-secret are created or updated atomically;
specifically, change the two invocations that reference the secret names
hf-token-secret and vllm-api-key-secret to use a client-side dry-run output
(e.g., --dry-run=client -o yaml) piped to oc apply -f - so reruns will update
secrets when values change.
- Around line 106-109: The temporary file created as RHOAI_KEY_FILE is not
removed after the script exits; add a cleanup that removes "$RHOAI_KEY_FILE" on
exit and on interrupt: define a small cleanup function (e.g., cleanup() { rm -f
"$RHOAI_KEY_FILE"; }) and register it with trap to run on EXIT, INT, TERM before
exporting RHOAI_PROVIDER_KEY_PATH so the sensitive key file is deleted when the
script finishes or is interrupted.

---

Nitpick comments:
In `@tests/rhoai/scripts/gpu-setup.sh`:
- Around line 57-74: The current oc patch command replaces the entire
/spec/template/spec/tolerations array; instead append each toleration so
existing tolerations aren't lost: change the JSON patch to use "op": "add" with
the path "/spec/template/spec/tolerations/-" for each toleration object
(targeting the nfd-worker DaemonSet patch invocation in gpu-setup.sh) so you add
the two tolerations rather than overwrite the array, or switch to a merge-type
patch that merges/appends tolerations instead of replacing them.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5fcc1c9a-5d7b-4dfd-9e01-910024da7495

📥 Commits

Reviewing files that changed from the base of the PR and between f600c71 and 0f3be56.

📒 Files selected for processing (26)
  • eval/system_azure_openai_lseval.yaml
  • eval/system_openai_lseval.yaml
  • eval/system_rhelai_vllm_lseval.yaml
  • eval/system_rhoai_vllm_lseval.yaml
  • eval/system_watsonx_lseval.yaml
  • tests/config/operator_install/olsconfig.crd.rhoai_vllm_lseval.yaml
  • tests/config/operator_install/route.yaml
  • tests/e2e/conftest.py
  • tests/e2e/evaluation/test_lseval_periodic.py
  • tests/e2e/utils/ols_installer.py
  • tests/rhoai/manifests/gpu/cluster-policy.yaml
  • tests/rhoai/manifests/gpu/create-nfd.yaml
  • tests/rhoai/manifests/namespaces/nfd.yaml
  • tests/rhoai/manifests/namespaces/nvidia-operator.yaml
  • tests/rhoai/manifests/operators/ds-cluster.yaml
  • tests/rhoai/manifests/operators/operatorgroup.yaml
  • tests/rhoai/manifests/operators/operators.yaml
  • tests/rhoai/manifests/vllm/vllm-inference-service-gpu.yaml
  • tests/rhoai/manifests/vllm/vllm-runtime-gpu.yaml
  • tests/rhoai/scripts/bootstrap.sh
  • tests/rhoai/scripts/deploy-vllm.sh
  • tests/rhoai/scripts/fetch-vllm-image.sh
  • tests/rhoai/scripts/get-vllm-pod-info.sh
  • tests/rhoai/scripts/gpu-setup.sh
  • tests/scripts/test-lseval-periodic.sh
  • tests/scripts/utils.sh

provider: "openai"
model: "gpt-5-mini"
temperature: 0.0
temperature: 1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does OpenAI's gpt-5-mini model require temperature to be set to 1, or does it support other values like 0.0?

💡 Result:

OpenAI's gpt-5-mini model does not support setting the temperature parameter to values other than 1 [1][2]. If you attempt to provide a temperature value such as 0.0 or any other setting besides the default, the API will return a 400 Bad Request error indicating that the value is unsupported and only the default value of 1 is allowed [1][2]. To use the model successfully, you should either omit the temperature parameter from your request or explicitly set it to 1 [3].

Citations:


🌐 Web query:

OpenAI gpt-5-mini temperature parameter only allowed value 1 official documentation

💡 Result:

The claim that GPT-5-mini only allows a temperature parameter value of 1 is not supported by official OpenAI documentation. In practice, the GPT-5 model family—including GPT-5-mini—does not support the temperature parameter at all [1][2][3]. Attempting to pass the temperature parameter to the GPT-5-mini model will result in an API error (often a 400 Bad Request) stating that the parameter is unsupported [1][2][3]. While some users and developers have observed that setting temperature to 1 effectively "bypasses" this error because it aligns with the model's internal default, the model does not "support" the parameter in a functional sense, nor is 1 an "allowed value" in the way that it would be for models that permit sampling control [4][2][3]. The GPT-5 family utilizes multi-stage inference and complex internal reasoning processes that are pre-calibrated for stability and safety [4][3]. Exposing external sampling knobs like temperature would disrupt these processes [4][3]. Therefore, rather than using temperature, users are encouraged to steer model output through prompting techniques or specific parameters like reasoning_effort when available [4][3].

Citations:


🌐 Web query:

Azure OpenAI gpt-5-mini temperature parameter unsupported 400 Bad Request

💡 Result:

The "400 Bad Request" error when using the temperature parameter with gpt-5-mini on Azure OpenAI occurs because this model (and others in the GPT-5 series) enforces a strict default temperature setting and does not support custom temperature values [1][2][3]. When a request includes a temperature parameter—even one set to the default value of 1.0—the API may reject it if the client library or the deployment configuration interprets the presence of the parameter as an attempt to override the model's fixed sampling behavior [1][4][3]. Key points regarding this issue: 1. Root Cause: The gpt-5-mini model is configured to ignore or strictly reject custom temperature values [1][3]. Many development tools and SDKs automatically inject a default temperature (e.g., 0.7 or 1.0) into API payloads [5][4][3]. If the payload explicitly contains the temperature key, the Azure OpenAI service returns a "400 Bad Request" error with a message indicating that the provided value is unsupported or that only the default value is allowed [1][2][3]. 2. Resolution Strategies: - Omit the Parameter: The most effective fix is to ensure the temperature parameter is completely omitted from the API request payload, rather than setting it to a specific value [4][3]. - Update Client Libraries: Many AI development frameworks (such as LiteLLM, LangChain, and various SDKs) have released updates to detect the model family and automatically strip unsupported parameters like temperature [6][5]. Ensure you are using the latest versions of your libraries [5]. - Configure "Drop Params": If using a gateway or proxy service (e.g., LiteLLM), enable "drop_params" or similar functionality in your configuration to automatically filter out unsupported parameters before they reach the Azure endpoint [1]. - Hardcode Defaults (with caution): If you cannot omit the parameter, some users have reported that explicitly setting temperature to 1.0 works for certain deployments, though omitting it entirely is generally the more robust and recommended solution [4][3]. If you are using specific tools, check your configuration files to verify if a default temperature is being applied automatically and attempt to set it to null or remove the configuration key entirely [1][4][3].

Citations:


Remove temperature from Azure OpenAI requests for gpt-5-mini (don’t rely on temperature: 1).

The GPT-5(-mini) Azure behavior rejects the temperature parameter when it’s present in the request payload (often returning a 400 Bad Request), so keeping temperature: 1 is brittle even if it matches a default. Omit the temperature field for this model (or ensure your client/proxy strips it).

temperature: 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/system_azure_openai_lseval.yaml` at line 18, Remove the explicit
"temperature: 1" field from the Azure OpenAI request config for the gpt-5-mini
model (i.e., drop the temperature key) and/or update the request-building logic
to omit the temperature parameter when model === "gpt-5-mini" so the
client/proxy does not send a temperature field (ensure the check references the
"gpt-5-mini" model string and the "temperature" config key).

Comment thread eval/system_rhoai_vllm_lseval.yaml Outdated
Comment thread eval/system_rhoai_vllm_lseval.yaml
Comment thread tests/e2e/utils/ols_installer.py Outdated
Comment thread tests/rhoai/manifests/operators/operatorgroup.yaml Outdated
Comment thread tests/rhoai/scripts/gpu-setup.sh Outdated
Comment thread tests/rhoai/scripts/gpu-setup.sh Outdated
Comment thread tests/rhoai/scripts/gpu-setup.sh
Comment thread tests/scripts/test-lseval-periodic.sh
Comment thread tests/scripts/test-lseval-periodic.sh
@JoaoFula

JoaoFula commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

/retest

5 similar comments
@JoaoFula

JoaoFula commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@JoaoFula

JoaoFula commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@JoaoFula

JoaoFula commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@JoaoFula

JoaoFula commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@JoaoFula

Copy link
Copy Markdown
Contributor Author

/retest

@blublinsky

Copy link
Copy Markdown
Contributor

tests/scripts/test-lseval-periodic.sh — Temp API key file may not be cleaned up on normal exit

When RHOAI_PROVISION=true, you register trap cleanup EXIT INT TERM to remove RHOAI_KEY_FILE (lines 107–108). Later, trap finish EXIT (line 198) replaces the EXIT handler, so on a successful run only finish runs and the temp file containing VLLM_API_KEY is left in /tmp.

INT/TERM still invoke cleanup, but the common CI path exits via EXIT after run_suites completes.

Suggestion: chain cleanup into finish, e.g.:

function finish() {
  [[ -n "${RHOAI_KEY_FILE:-}" ]] && rm -f "$RHOAI_KEY_FILE"
  record_trends
  ...
}

Or use a single EXIT trap that calls both. Low severity on ephemeral CI nodes, but worth fixing for secret hygiene.

@blublinsky

Copy link
Copy Markdown
Contributor

tests/e2e/utils/adapt_ols_config.py${KSVC_URL} substitution missing on adapt path

install_ols() handles olsconfig.crd.rhoai_vllm_lseval.yaml by expanding ${KSVC_URL} and piping the result to oc create -f - (ols_installer.py, ~444–457).

adapt_ols_config.apply_olsconfig() applies the same file directly with oc apply -f (lines 36–39), so if the operator is already installed the CR can get a literal url: "${KSVC_URL}/v1" instead of the real vLLM endpoint.

That’s fine for the normal RHOAI periodic flow (cleanup_ols_operator → fresh install_ols), but it breaks re-runs or local dev when pytest takes the adapt path instead of install.

Suggestion: reuse the same substitution logic in apply_olsconfig() (shared helper, or the same expandvars + oc apply -f - branch for rhoai_vllm_lseval). Nice-to-have unless you expect adapt-path runs for RHOAI.

@blublinsky

Copy link
Copy Markdown
Contributor

tests/rhoai/ — consider a short README

This PR adds a fair amount of RHOAI bootstrap infra (manifests + shell scripts) with no local docs under tests/rhoai/. The test-lseval-periodic.sh header covers some env vars, but a short tests/rhoai/README.md would help the next person maintain this:

  • When to set RHOAI_PROVISION=true
  • Required env (HUGGING_FACE_HUB_TOKEN, VLLM_API_KEY, etc.)
  • GPU cluster prerequisites
  • Rough runtime expectations (operators + GPU + model pull)
  • Script flow (bootstrap → gpu-setup → deploy-vllm → get-vllm-pod-info)

Nice-to-have for this PR or a quick follow-up.

adding except for oauth and reforcing logic around filtering for events
@JoaoFula JoaoFula force-pushed the implement-rhoai-on-lseval branch from 969bcd8 to 52b9bdb Compare June 17, 2026 09:20
@JoaoFula

Copy link
Copy Markdown
Contributor Author

/retest

@openshift-ci

openshift-ci Bot commented Jun 17, 2026

Copy link
Copy Markdown

@JoaoFula: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants