OLS-2928 - Implement rhoai on lseval#2946
Conversation
|
Warning Review limit reached
More reviews will be available in 3 minutes. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR adds RHOAI vLLM provider support to the lightspeed-service evaluation testing infrastructure. It updates judge LLM temperature settings across multiple evaluation configs, integrates a new ChangesRHOAI vLLM Provider Integration
🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
When RHOAI_PROVISION=true, the lseval periodic script provisions GPU infra and a vLLM model serving endpoint (Llama-3.1-8B-Instruct) on the OpenShift cluster before running evals. When not set, behavior is unchanged (OpenAI GPT-4o-mini). Changes: - Copy RHOAI manifests and scripts from lightspeed-stack, adapted for RawDeployment mode (3 operators: RHODS, GPU, NFD — no Service Mesh or Serverless) - Add OLSConfig CR with KSVC_URL placeholder for envsubst substitution - Add envsubst support in ols_installer.py for rhoai_vllm CRs - Add rhoai_vllm to the lseval periodic provider matrix - Update eval config model from gpt-3.5-turbo to Llama-3.1-8B-Instruct
3a0d10d to
0f3be56
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 16
🧹 Nitpick comments (1)
tests/rhoai/scripts/gpu-setup.sh (1)
57-74: 💤 Low valueJSON patch replaces all existing tolerations.
The
addoperation with a full array value replaces any existing tolerations on the NFD worker daemonset. If the daemonset has default tolerations (e.g., for control-plane nodes), they would be lost. Consider usingmerge-patchor appending to the existing array if preserving defaults matters.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/rhoai/scripts/gpu-setup.sh` around lines 57 - 74, The current oc patch command replaces the entire /spec/template/spec/tolerations array; instead append each toleration so existing tolerations aren't lost: change the JSON patch to use "op": "add" with the path "/spec/template/spec/tolerations/-" for each toleration object (targeting the nfd-worker DaemonSet patch invocation in gpu-setup.sh) so you add the two tolerations rather than overwrite the array, or switch to a merge-type patch that merges/appends tolerations instead of replacing them.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@eval/system_azure_openai_lseval.yaml`:
- Line 18: Remove the explicit "temperature: 1" field from the Azure OpenAI
request config for the gpt-5-mini model (i.e., drop the temperature key) and/or
update the request-building logic to omit the temperature parameter when model
=== "gpt-5-mini" so the client/proxy does not send a temperature field (ensure
the check references the "gpt-5-mini" model string and the "temperature" config
key).
In `@eval/system_rhoai_vllm_lseval.yaml`:
- Line 22: The rhoai_vllm judge LLM entry currently sets max_tokens: 4096 which
is asymmetric versus other providers at 512; either change the rhoai_vllm
max_tokens value to 512 to standardize evaluation length across providers (edit
the max_tokens field in the rhoai_vllm/judge config) or add a concise comment
next to the rhoai_vllm max_tokens setting documenting the explicit rationale for
why it needs to be 4096 (e.g., to allow extended judge reasoning for weaker
models) so reviewers understand the intentional deviation.
- Line 37: The model string "meta-llama/Llama-3.1-8B-Instruct" in the config is
inconsistent with the PR text claiming Llama 3.3 8B; confirm which model is
actually deployed and make them match by either updating the config's model
field to the deployed identifier (e.g., change the "model" value to the correct
Llama-3.3-8B identifier) or updating the PR description to reference
"meta-llama/Llama-3.1-8B-Instruct" so both the config (the model field) and the
PR description reference the same version.
In `@tests/e2e/utils/ols_installer.py`:
- Around line 444-453: The current branch that handles "rhoai_vllm" reads the CR
YAML and expands environment vars with os.path.expandvars(raw) but doesn't
verify KSVC_URL is set, so an unresolved "${KSVC_URL}" can be applied; update
the block that detects "rhoai_vllm" to validate that the environment variable
KSVC_URL is present (or that the substituted content no longer contains
"${KSVC_URL}") before calling cluster_utils.run_oc (refer to the "rhoai_vllm"
conditional, the use of os.path.expandvars, and cluster_utils.run_oc), and if
KSVC_URL is missing, log an error / raise an exception and abort the install to
fail fast.
In `@tests/rhoai/manifests/operators/operatorgroup.yaml`:
- Around line 6-7: The OperatorGroup resource named "global-operators" currently
has its spec left as YAML null; update the manifest so the OperatorGroup's spec
is an explicit empty object (spec: {}) instead of null to match the CRD/schema
expectations and mirror the other OperatorGroup entries; locate the
"global-operators" OperatorGroup in the operatorgroup.yaml and replace the
null/empty spec with an explicit empty object for the spec field.
In `@tests/rhoai/manifests/vllm/vllm-runtime-gpu.yaml`:
- Around line 22-24: The manifest currently sets the model download directory to
"/tmp/models-cache" via the "--download-dir" flag, but the writable cache volume
is mounted at "/mnt/models-cache"; change the "--download-dir" value from
"/tmp/models-cache" to "/mnt/models-cache" wherever it appears (e.g., near the
"--download-dir" flag and any related container args in vllm-runtime-gpu.yaml)
so model downloads use the mounted cache volume instead of container root
storage.
- Around line 14-70: Add an explicit restrictive container securityContext for
the kserve-container by setting runAsNonRoot: true and a non-root UID (e.g.,
runAsUser: 1000), disallowing privilege escalation (allowPrivilegeEscalation:
false), dropping all capabilities (capabilities.drop: ["ALL"]), enabling a
read-only root filesystem (readOnlyRootFilesystem: true), and adding a default
seccompProfile (type: RuntimeDefault) and optional runAsGroup if required by the
image; apply this under the container spec for the container named
kserve-container to harden the runtime.
In `@tests/rhoai/scripts/bootstrap.sh`:
- Around line 13-16: The CSV discovery loop using `until oc get csv -n
"${NAMESPACE}" -l "${OPERATOR_LABEL}" --no-headers` can spin forever; add a
bounded timeout (e.g., CSV_TIMEOUT_SECONDS or MAX_ATTEMPTS) and track elapsed
time or attempt count around that loop, sleeping as before but breaking out and
calling `echo` with a clear error and `exit 1` if the timeout is reached; update
the loop that prints `"...still waiting for ${OPERATOR_NAME} CSV to show up"` to
include the elapsed/attempts or timeout value for clarity.
In `@tests/rhoai/scripts/deploy-vllm.sh`:
- Around line 6-10: The polling loops that run commands like "until oc get crd
$crd &>/dev/null; do sleep 5; done" (and the similar loops at the other
occurrences) need an upper bound: add a timeout mechanism (e.g. record start
time and max_wait_seconds or use the timeout command) and on expiry print a
clear error including the resource name and exit non‑zero; update each loop (the
one iterating over "servingruntimes.serving.kserve.io
inferenceservices.serving.kserve.io" and the similar loops at the other two
locations) to break on success or fail fast after max wait rather than looping
forever.
- Around line 1-3: Add strict shell mode at the top of the script (immediately
after the #!/bin/bash line) to fail fast: insert "set -euo pipefail" and define
a safe IFS (e.g. IFS=$'\n\t'). This will ensure errors (including unset
variables and pipeline failures) abort execution; locate the existing
BASE_DIR="$1" line to place this change just before it.
In `@tests/rhoai/scripts/fetch-vllm-image.sh`:
- Line 25: The VLLM_IMAGE assignment uses a cluster-wide search and head -1
which yields nondeterministic first-match behavior; change the logic that sets
VLLM_IMAGE (the VLLM_IMAGE variable / the oc get servingruntime invocation) to
target a specific namespace or a caller-provided namespace (e.g., accept
NAMESPACE env/arg) and to select by exact servingruntime name or a deterministic
selector (avoid piping to head -1); ensure the command uses oc -n <namespace>
and a precise grep/filter to return the intended odh-vllm-cuda-rhel9 image only.
In `@tests/rhoai/scripts/gpu-setup.sh`:
- Around line 23-25: The shell script uses unquoted variable expansions for node
names in the oc get node commands (the occurrences of $node), which can cause
word-splitting or globbing; update both usages to quote the variable (use
"$node") so the commands call oc with the full node name as a single argument
and avoid accidental splitting or glob expansion.
- Line 88: The until loop uses unquoted command substitutions for the oc get
pods pipelines which can break if output is empty or contains whitespace; update
the condition in the until loop that contains the two command substitutions (the
oc get pods ... | awk ... and the oc get pods ... | wc -l) to quote their
substitutions so the shell treats their results as single words (i.e., quote
"$(oc get pods -n nvidia-gpu-operator --no-headers 2>/dev/null | awk '{if ($3 !=
"Running" && $3 != "Completed") exit 1}')" and the count substitution "$(oc get
pods -n nvidia-gpu-operator --no-headers 2>/dev/null | wc -l)") to make the test
robust.
- Around line 159-197: The until condition only checks GPU capacity but not
allocatable, so the loop can exit when capacity > 0 while allocatable is still
0; update the until test that currently uses oc get nodes ...
status.capacity.nvidia\.com/gpu to also fetch and verify
status.allocatable.nvidia\.com/gpu (the same oc jsonpath logic used later for
the capacity and allocatable variables) and require both values to be non-empty
and not "0" before breaking out of the loop; modify the until block and its oc
commands (and keep the existing capacity and allocatable variable
assignments/prints) so the guard enforces both capacity and allocatable > 0.
In `@tests/scripts/test-lseval-periodic.sh`:
- Around line 72-79: Replace the one-shot create logic that uses "oc create ...
|| echo ...", because it ignores rotated credentials, with an idempotent apply
flow: generate the secret manifest client-side and pipe it to "oc apply -f -" so
hf-token-secret and vllm-api-key-secret are created or updated atomically;
specifically, change the two invocations that reference the secret names
hf-token-secret and vllm-api-key-secret to use a client-side dry-run output
(e.g., --dry-run=client -o yaml) piped to oc apply -f - so reruns will update
secrets when values change.
- Around line 106-109: The temporary file created as RHOAI_KEY_FILE is not
removed after the script exits; add a cleanup that removes "$RHOAI_KEY_FILE" on
exit and on interrupt: define a small cleanup function (e.g., cleanup() { rm -f
"$RHOAI_KEY_FILE"; }) and register it with trap to run on EXIT, INT, TERM before
exporting RHOAI_PROVIDER_KEY_PATH so the sensitive key file is deleted when the
script finishes or is interrupted.
---
Nitpick comments:
In `@tests/rhoai/scripts/gpu-setup.sh`:
- Around line 57-74: The current oc patch command replaces the entire
/spec/template/spec/tolerations array; instead append each toleration so
existing tolerations aren't lost: change the JSON patch to use "op": "add" with
the path "/spec/template/spec/tolerations/-" for each toleration object
(targeting the nfd-worker DaemonSet patch invocation in gpu-setup.sh) so you add
the two tolerations rather than overwrite the array, or switch to a merge-type
patch that merges/appends tolerations instead of replacing them.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5fcc1c9a-5d7b-4dfd-9e01-910024da7495
📒 Files selected for processing (26)
eval/system_azure_openai_lseval.yamleval/system_openai_lseval.yamleval/system_rhelai_vllm_lseval.yamleval/system_rhoai_vllm_lseval.yamleval/system_watsonx_lseval.yamltests/config/operator_install/olsconfig.crd.rhoai_vllm_lseval.yamltests/config/operator_install/route.yamltests/e2e/conftest.pytests/e2e/evaluation/test_lseval_periodic.pytests/e2e/utils/ols_installer.pytests/rhoai/manifests/gpu/cluster-policy.yamltests/rhoai/manifests/gpu/create-nfd.yamltests/rhoai/manifests/namespaces/nfd.yamltests/rhoai/manifests/namespaces/nvidia-operator.yamltests/rhoai/manifests/operators/ds-cluster.yamltests/rhoai/manifests/operators/operatorgroup.yamltests/rhoai/manifests/operators/operators.yamltests/rhoai/manifests/vllm/vllm-inference-service-gpu.yamltests/rhoai/manifests/vllm/vllm-runtime-gpu.yamltests/rhoai/scripts/bootstrap.shtests/rhoai/scripts/deploy-vllm.shtests/rhoai/scripts/fetch-vllm-image.shtests/rhoai/scripts/get-vllm-pod-info.shtests/rhoai/scripts/gpu-setup.shtests/scripts/test-lseval-periodic.shtests/scripts/utils.sh
| provider: "openai" | ||
| model: "gpt-5-mini" | ||
| temperature: 0.0 | ||
| temperature: 1 |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does OpenAI's gpt-5-mini model require temperature to be set to 1, or does it support other values like 0.0?
💡 Result:
OpenAI's gpt-5-mini model does not support setting the temperature parameter to values other than 1 [1][2]. If you attempt to provide a temperature value such as 0.0 or any other setting besides the default, the API will return a 400 Bad Request error indicating that the value is unsupported and only the default value of 1 is allowed [1][2]. To use the model successfully, you should either omit the temperature parameter from your request or explicitly set it to 1 [3].
Citations:
- 1: LLM Feature with gpt-5-mini not working due to wrong temperature pgadmin-org/pgadmin4#9719
- 2: [BUG] - OpenAI parser fails with GPT-5 models due to unsupported 'temperature' parameter mealie-recipes/mealie#6019
- 3: [BUG] Can't use GPT-5 model family - Unsupported parameter:
temperatureis not supported with this model. getzep/graphiti#874
🌐 Web query:
OpenAI gpt-5-mini temperature parameter only allowed value 1 official documentation
💡 Result:
The claim that GPT-5-mini only allows a temperature parameter value of 1 is not supported by official OpenAI documentation. In practice, the GPT-5 model family—including GPT-5-mini—does not support the temperature parameter at all [1][2][3]. Attempting to pass the temperature parameter to the GPT-5-mini model will result in an API error (often a 400 Bad Request) stating that the parameter is unsupported [1][2][3]. While some users and developers have observed that setting temperature to 1 effectively "bypasses" this error because it aligns with the model's internal default, the model does not "support" the parameter in a functional sense, nor is 1 an "allowed value" in the way that it would be for models that permit sampling control [4][2][3]. The GPT-5 family utilizes multi-stage inference and complex internal reasoning processes that are pre-calibrated for stability and safety [4][3]. Exposing external sampling knobs like temperature would disrupt these processes [4][3]. Therefore, rather than using temperature, users are encouraged to steer model output through prompting techniques or specific parameters like reasoning_effort when available [4][3].
Citations:
- 1: https://community.openai.com/t/temperature-in-gpt-5-models/1337133
- 2: [BUG] Can't use GPT-5 model family - Unsupported parameter:
temperatureis not supported with this model. getzep/graphiti#874 - 3: https://ai.plainenglish.io/gpt-5-killed-temperature-control-heres-what-that-actually-means-e323f98685c9
- 4: https://hippocampus-garden.com/llm_temperature/
🌐 Web query:
Azure OpenAI gpt-5-mini temperature parameter unsupported 400 Bad Request
💡 Result:
The "400 Bad Request" error when using the temperature parameter with gpt-5-mini on Azure OpenAI occurs because this model (and others in the GPT-5 series) enforces a strict default temperature setting and does not support custom temperature values [1][2][3]. When a request includes a temperature parameter—even one set to the default value of 1.0—the API may reject it if the client library or the deployment configuration interprets the presence of the parameter as an attempt to override the model's fixed sampling behavior [1][4][3]. Key points regarding this issue: 1. Root Cause: The gpt-5-mini model is configured to ignore or strictly reject custom temperature values [1][3]. Many development tools and SDKs automatically inject a default temperature (e.g., 0.7 or 1.0) into API payloads [5][4][3]. If the payload explicitly contains the temperature key, the Azure OpenAI service returns a "400 Bad Request" error with a message indicating that the provided value is unsupported or that only the default value is allowed [1][2][3]. 2. Resolution Strategies: - Omit the Parameter: The most effective fix is to ensure the temperature parameter is completely omitted from the API request payload, rather than setting it to a specific value [4][3]. - Update Client Libraries: Many AI development frameworks (such as LiteLLM, LangChain, and various SDKs) have released updates to detect the model family and automatically strip unsupported parameters like temperature [6][5]. Ensure you are using the latest versions of your libraries [5]. - Configure "Drop Params": If using a gateway or proxy service (e.g., LiteLLM), enable "drop_params" or similar functionality in your configuration to automatically filter out unsupported parameters before they reach the Azure endpoint [1]. - Hardcode Defaults (with caution): If you cannot omit the parameter, some users have reported that explicitly setting temperature to 1.0 works for certain deployments, though omitting it entirely is generally the more robust and recommended solution [4][3]. If you are using specific tools, check your configuration files to verify if a default temperature is being applied automatically and attempt to set it to null or remove the configuration key entirely [1][4][3].
Citations:
- 1: [Bug]: Azure OpenAI gpt-5 Tempature Value BerriAI/litellm#13397
- 2: [Bug] MIRPOv2 Temperature argument not supported by GPT-5 series stanfordnlp/dspy#8642
- 3: Legacy OpenAI_Chat: temperature=0.7 causes 400 with models that only support default vanna-ai/vanna#1112
- 4: Error with Azure OpenAI Reasoning Models RooCodeInc/Roo-Code#1400
- 5: AzureChatOpenAI Deployment “o3-mini” Returns 400 Error Due to Unsupported ‘temperature’ Parameter Azure/azure-sdk-for-python#39938
- 6: Fix: Azure GPT-5 incorrectly routed to O-series config (temperature parameter unsupported) BerriAI/litellm#16246
Remove temperature from Azure OpenAI requests for gpt-5-mini (don’t rely on temperature: 1).
The GPT-5(-mini) Azure behavior rejects the temperature parameter when it’s present in the request payload (often returning a 400 Bad Request), so keeping temperature: 1 is brittle even if it matches a default. Omit the temperature field for this model (or ensure your client/proxy strips it).
temperature: 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/system_azure_openai_lseval.yaml` at line 18, Remove the explicit
"temperature: 1" field from the Azure OpenAI request config for the gpt-5-mini
model (i.e., drop the temperature key) and/or update the request-building logic
to omit the temperature parameter when model === "gpt-5-mini" so the
client/proxy does not send a temperature field (ensure the check references the
"gpt-5-mini" model string and the "temperature" config key).
|
/retest |
5 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
When INT/TERM still invoke Suggestion: chain cleanup into function finish() {
[[ -n "${RHOAI_KEY_FILE:-}" ]] && rm -f "$RHOAI_KEY_FILE"
record_trends
...
}Or use a single EXIT trap that calls both. Low severity on ephemeral CI nodes, but worth fixing for secret hygiene. |
|
That’s fine for the normal RHOAI periodic flow ( Suggestion: reuse the same substitution logic in |
|
This PR adds a fair amount of RHOAI bootstrap infra (manifests + shell scripts) with no local docs under
Nice-to-have for this PR or a quick follow-up. |
adding except for oauth and reforcing logic around filtering for events
969bcd8 to
52b9bdb
Compare
|
/retest |
|
@JoaoFula: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Description
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Configuration Changes