fix(frontend): clarify the model-not-ready 503 message #10995
Conversation
WalkthroughOpenAI and Anthropic now return a shared 503 “model not ready” response for unavailable models, and OpenAI readiness checks use the same message helper. One OpenAI test fixture also adds an explicit ChangesModel not ready responses
Tool parser fixture update
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lib/llm/src/http/service/openai.rs (1)
4657-4660: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winExercise the actual readiness gate in this drift test.
This manually builds
service_unavailable_with_body(model_not_ready_message(...)), so it would still pass ifcheck_model_serving_readylater stopped using the canonical helper. Prefer asserting the body fromcheck_model_serving_ready(...)for a registered-not-ready model, or add that assertion to the existing/readyintegration coverage.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/src/http/service/openai.rs` around lines 4657 - 4660, The drift test is manually constructing the service-unavailable body instead of exercising the real readiness gate, so it can miss regressions in check_model_serving_ready. Update the test to assert the body returned by check_model_serving_ready for a registered-but-not-ready model, or add that assertion alongside the existing /ready integration coverage, using the same canonical model_not_ready_message path to compare against the gate output.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@lib/llm/src/http/service/openai.rs`:
- Around line 4657-4660: The drift test is manually constructing the
service-unavailable body instead of exercising the real readiness gate, so it
can miss regressions in check_model_serving_ready. Update the test to assert the
body returned by check_model_serving_ready for a registered-but-not-ready model,
or add that assertion alongside the existing /ready integration coverage, using
the same canonical model_not_ready_message path to compare against the gate
output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 48a4322d-651b-4dd6-9933-dd42ed05240b
📒 Files selected for processing (3)
lib/llm/src/http/service/anthropic.rslib/llm/src/http/service/openai.rslib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs
308683b to
78700e2
Compare
The "model registered but not ready to serve" 503 was produced at three
sites with three different bodies, two of which leaked internal taxonomy
(prefill/decode/encode worker types, namespaces, "worker set", "worker
startup logs"):
- OpenAI readiness gate (check_model_serving_ready)
- OpenAI dispatch backstop (from_model_error -> ModelUnavailable,
"Model temporarily unavailable")
- Anthropic gate (anthropic_messages -> ModelUnavailable)
Add a single canonical helper `model_not_ready_message(model_name)` and
route all three sites through it, so a client gets one clear, retryable
503 with no internal deployment jargon. Remove the now-dead
model_unavailable() helper.
Scope: this is message-only. It does NOT change readiness behavior --
the gate still uses has_ready_workers() and the listing still uses
serving_ready_display_names() exactly as before. It also does not unify
the two missing-role cases: a decode-missing deployment returns this 503
(it reaches the readiness gate), while a prefill-only deployment still
returns 404 via a separate endpoint-enablement gate that runs before the
readiness check. Consolidating those two paths is a separate change and
is intentionally not attempted here.
Also add the missing llm_metrics: None field in the tool_parser_v2 test
so the lib test target compiles (pre-existing break, unrelated).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jie Hao <jihao@nvidia.com>
78700e2 to
56e1897
Compare
Summary
Make the "model registered but not yet ready to serve" 503 message clear and consistent. Message-only — no behavioral change.
The 503 was produced at three sites with three different bodies, two of which leaked internal taxonomy (prefill/decode/encode worker types, namespaces, "worker set", "worker startup logs"):
check_model_serving_ready) — covers chat, completions, embeddings, responses, images, audiosfrom_model_error→ModelUnavailable, was "Model temporarily unavailable")anthropic_messages→ModelUnavailable)All three now route through one canonical helper
model_not_ready_message(model_name):(503. Body text is identical across the OpenAI family and Anthropic; the error envelope still differs by API convention — OpenAI
type=service_unavailable, Anthropictype=overloaded_error.) Removes the now-deadmodel_unavailable().Scope / what this deliberately does NOT do
has_ready_workers()and the listing still usesserving_ready_display_names()exactly as before. An earlier revision routed the gate/listing throughis_ready_to_serve(); that broke correct behavior (a declaratively-ready aggregated model vanished from/v1/modelsand/readybegan contradicting the gate), so it was dropped.service_v2.rs). Consolidating that is out of scope (deferred)._service_unavailable, used by/healthand the service middleware), backend-overload 503/529 (SanitizedError), the Realtime API (/v1/realtime, its ownmodel_unavailable), and KServe gRPC. Only the model-not-ready 503 on the OpenAI/Anthropic HTTP paths is unified here.Behavioral map (verified, real vLLM)
The response is decided by whether a decode (chat-serving) worker is registered:
Testing
Container full-stack with real vLLM backends on an RTX PRO 6000 (
--router-mode kv):cargo clippy -p dynamo-llmclean.test_unavailable_paths_share_one_message); the live container runs exercised the OpenAI endpoints.Note
tool_parser_v2test gains a missingllm_metrics: Noneso thelib/llmtest target compiles (pre-existing break onmain, unrelated).Closes DYN-3290.
🤖 Generated with Claude Code