fix(frontend): clarify the model-not-ready 503 message #10995

Open
jh-nv wants to merge 2 commits into main from readiness_message_DYN-3290
@jh-nv jh-nv commented Jun 26, 2026

Contributor

Summary

Make the "model registered but not yet ready to serve" 503 message clear and consistent. Message-only — no behavioral change.

The 503 was produced at three sites with three different bodies, two of which leaked internal taxonomy (prefill/decode/encode worker types, namespaces, "worker set", "worker startup logs"):

  • OpenAI readiness gate (check_model_serving_ready) — covers chat, completions, embeddings, responses, images, audios
  • OpenAI dispatch backstop (from_model_error → ModelUnavailable, was "Model temporarily unavailable")
  • Anthropic gate (anthropic_messages → ModelUnavailable)

All three now route through one canonical helper model_not_ready_message(model_name):

Model <model> is not ready to serve requests yet. The deployment may still be starting up or is not fully provisioned. Please retry shortly.

(503. Body text is identical across the OpenAI family and Anthropic; the error envelope still differs by API convention — OpenAI type=service_unavailable, Anthropic type=overloaded_error.) Removes the now-dead model_unavailable().
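
A minimal sketch of the shape described above, for readers who want to see it concretely. Only the function name `model_not_ready_message`, the message text, the 503 status, and the two `type` strings come from this description; the signature, the `ErrorEnvelope` struct, and the `openai_not_ready` / `anthropic_not_ready` wrappers are hypothetical stand-ins, not the code in `openai.rs` or `anthropic.rs`.

```rust
/// Sketch of the canonical helper. The text is quoted from the PR
/// description; the `&str -> String` signature is an assumption.
pub fn model_not_ready_message(model_name: &str) -> String {
    format!(
        "Model {model_name} is not ready to serve requests yet. \
         The deployment may still be starting up or is not fully provisioned. \
         Please retry shortly."
    )
}

/// Hypothetical, simplified error envelope used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct ErrorEnvelope {
    pub status: u16,
    pub error_type: &'static str,
    pub message: String,
}

/// OpenAI-family envelope: same body text, `type=service_unavailable`.
pub fn openai_not_ready(model_name: &str) -> ErrorEnvelope {
    ErrorEnvelope {
        status: 503,
        error_type: "service_unavailable",
        message: model_not_ready_message(model_name),
    }
}

/// Anthropic envelope: same body text, `type=overloaded_error`.
pub fn anthropic_not_ready(model_name: &str) -> ErrorEnvelope {
    ErrorEnvelope {
        status: 503,
        error_type: "overloaded_error",
        message: model_not_ready_message(model_name),
    }
}
```

Routing every site through one function means the wording can only drift if a call site stops using the helper, which is what the unit test mentioned under Testing is meant to catch.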

Scope / what this deliberately does NOT do

  • No readiness-behavior change. The gate still uses has_ready_workers() and the listing still uses serving_ready_display_names() exactly as before. An earlier revision routed the gate/listing through is_ready_to_serve(); that broke correct behavior (a declaratively-ready aggregated model vanished from /v1/models and /ready began contradicting the gate), so it was dropped.
  • Does not unify the 404 vs 503 split. A request that never reaches the readiness gate (because no chat-serving worker is registered) still returns a bare 404 from a separate endpoint-enablement gate (service_v2.rs). Consolidating that is out of scope (deferred).
  • Other 503 surfaces are untouched and separate: the process-level "Service is not ready" (_service_unavailable, used by /health and the service middleware), backend-overload 503/529 (SanitizedError), the Realtime API (/v1/realtime, its own model_unavailable), and KServe gRPC. Only the model-not-ready 503 on the OpenAI/Anthropic HTTP paths is unified here.

Behavioral map (verified, real vLLM)

The response is decided by whether a decode (chat-serving) worker is registered:

| Deployment state | Response |
| --- | --- |
| Complete (all required roles present) | 200 |
| Decode present but incomplete (missing prefill and/or encode) | 503 (this message) |
| No decode/chat-serving worker (prefill-only, encode-only, encode+prefill) | 404 (bare, empty body; returned by the endpoint gate, before the readiness gate) |
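
The map above can be restated as a small decision function. This is a model of the observed behavior only, assuming a deployment can be summarized by which roles are registered and which are required; `RegisteredRoles`, `RequiredRoles`, and `expected_status` are hypothetical names and do not correspond to the actual gates in `service_v2.rs` or `openai.rs`.

```rust
/// Which worker roles are currently registered (hypothetical summary type).
#[derive(Debug, Clone, Copy, Default)]
pub struct RegisteredRoles {
    pub decode: bool,
    pub prefill: bool,
    pub encode: bool,
}

/// Which roles the deployment needs besides decode (hypothetical summary type).
#[derive(Debug, Clone, Copy, Default)]
pub struct RequiredRoles {
    pub prefill: bool,
    pub encode: bool,
}

/// Expected HTTP status for a chat request, following the behavioral map.
pub fn expected_status(registered: RegisteredRoles, required: RequiredRoles) -> u16 {
    // No decode (chat-serving) worker: the endpoint gate answers first
    // with a bare 404, so the readiness gate is never reached.
    if !registered.decode {
        return 404;
    }
    // Decode is present: the readiness gate decides between 200 and 503.
    let prefill_ok = !required.prefill || registered.prefill;
    let encode_ok = !required.encode || registered.encode;
    if prefill_ok && encode_ok {
        200
    } else {
        503
    }
}
```

The ordering of the two checks is the point: the 404 case is decided before readiness is considered, which is why this PR leaves the 404 vs 503 split in place.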

Testing

Container full-stack with real vLLM backends on an RTX PRO 6000 (--router-mode kv):

  • P/D (Qwen3-0.6B): decode-only (prefill missing) → 503 (this message); P+D complete → 200; prefill dies → 503.
  • E/P/D (Qwen2-VL-2B, multimodal): encode-only → 404; encode+prefill (no decode) → 404; E+P+D complete → 200 (text and image); prefill+decode (encode missing) → 503; decode-only (prefill missing) → 503. Confirms the decode-present → 503 / decode-absent → 404 rule above, including the EPD encode role.
  • Rust unit + integration green; cargo clippy -p dynamo-llm clean.
  • Anthropic message-sharing is covered by code + unit test (test_unavailable_paths_share_one_message); the live container runs exercised the OpenAI endpoints.

Note

tool_parser_v2 test gains a missing llm_metrics: None so the lib/llm test target compiles (pre-existing break on main, unrelated).

Closes DYN-3290.

🤖 Generated with Claude Code

@jh-nv jh-nv requested a review from a team as a code owner June 26, 2026 16:01
@jh-nv jh-nv temporarily deployed to external_collaborator June 26, 2026 16:01 — with GitHub Actions Inactive
@github-actions github-actions Bot added fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Jun 26, 2026
@datadog-official

datadog-official Bot commented Jun 26, 2026


Pipelines

⚠️ Warnings

🚦 1 Pipeline job failed

Docs link check | lychee   View in Datadog   GitHub Actions

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: d48c11f

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 2 potential issues.

Comment thread lib/llm/src/http/service/openai.rs
Comment thread lib/llm/src/http/service/openai.rs
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Contributor

Review Change Stack

Walkthrough

OpenAI and Anthropic now return a shared 503 “model not ready” response for unavailable models, and OpenAI readiness checks use the same message helper. One OpenAI test fixture also adds an explicit llm_metrics: None field.

Changes

Model not ready responses

| Layer / File(s) | Summary |
| --- | --- |
| Shared message and dispatch mapping (lib/llm/src/http/service/openai.rs) | model_not_ready_message(model_name) is introduced, and ErrorMessage::from_model_error returns a 503 body from that helper for ModelUnavailable. |
| Readiness gate and Anthropic reuse (lib/llm/src/http/service/openai.rs, lib/llm/src/http/service/anthropic.rs) | check_model_serving_ready and the Anthropic ModelUnavailable branch both return the shared model_not_ready_message(...) 503 body. |
| Response tests (lib/llm/src/http/service/openai.rs) | Tests now construct the unavailable response through from_model_error and assert the readiness-gate and dispatch paths produce matching 503 bodies. |

Tool parser fixture update

| Layer / File(s) | Summary |
| --- | --- |
| Stream response fixture shape (lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs) | The chunk helper now sets llm_metrics: None in NvCreateChatCompletionStreamResponse. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title captures the main change: clarifying the customer-facing model-not-ready 503 message. |
| Description check | ✅ Passed | The PR description is detailed, covers summary, scope, testing, and issue linkage, and is mostly aligned with the template. |

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
lib/llm/src/http/service/openai.rs (1)

4657-4660: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Exercise the actual readiness gate in this drift test.

This manually builds service_unavailable_with_body(model_not_ready_message(...)), so it would still pass if check_model_serving_ready later stopped using the canonical helper. Prefer asserting the body from check_model_serving_ready(...) for a registered-not-ready model, or add that assertion to the existing /ready integration coverage.
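
One way to read this suggestion, sketched under heavy assumptions: the test should obtain the body from the gate itself and compare it with the helper, rather than building both sides from the helper. The `check_model_serving_ready` below is a hypothetical stand-in with an invented signature (a boolean in place of the real worker lookup, a `(status, body)` tuple in place of the real response type); only the two function names and the 503 status come from this PR.

```rust
/// Canonical message text, as quoted in the PR description.
pub fn model_not_ready_message(model_name: &str) -> String {
    format!(
        "Model {model_name} is not ready to serve requests yet. \
         The deployment may still be starting up or is not fully provisioned. \
         Please retry shortly."
    )
}

/// Hypothetical stand-in for the readiness gate. `has_ready_workers` replaces
/// the real lookup; the error is a simplified (status, body) pair.
pub fn check_model_serving_ready(
    model_name: &str,
    has_ready_workers: bool,
) -> Result<(), (u16, String)> {
    if has_ready_workers {
        Ok(())
    } else {
        Err((503, model_not_ready_message(model_name)))
    }
}

/// Drift test that goes through the gate, so it fails if the gate ever
/// stops using the canonical helper.
#[test]
fn gate_body_matches_canonical_message() {
    let (status, body) = check_model_serving_ready("demo-model", false)
        .expect_err("a registered-but-not-ready model should be rejected");
    assert_eq!(status, 503);
    assert_eq!(body, model_not_ready_message("demo-model"));
}
```

In the real code the not-ready state would have to be set up through whatever fixture the existing /ready integration tests use; that setup is not shown here.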

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/http/service/openai.rs` around lines 4657 - 4660, The drift test
is manually constructing the service-unavailable body instead of exercising the
real readiness gate, so it can miss regressions in check_model_serving_ready.
Update the test to assert the body returned by check_model_serving_ready for a
registered-but-not-ready model, or add that assertion alongside the existing
/ready integration coverage, using the same canonical model_not_ready_message
path to compare against the gate output.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48a4322d-651b-4dd6-9933-dd42ed05240b

📥 Commits

Reviewing files that changed from the base of the PR and between 91b6259 and 5ff08fe.

📒 Files selected for processing (3)
  • lib/llm/src/http/service/anthropic.rs
  • lib/llm/src/http/service/openai.rs
  • lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs

@jh-nv jh-nv changed the title fix(frontend): unify model-not-ready 503 and hide worker-type internals (DYN-3290) fix(frontend): unify model-not-ready 503 and hide worker-type internals Jun 26, 2026
@jh-nv jh-nv temporarily deployed to external_collaborator June 26, 2026 16:31 — with GitHub Actions Inactive
@jh-nv jh-nv changed the title fix(frontend): unify model-not-ready 503 and hide worker-type internals fix(frontend): gate serving readiness on actual servability, not just worker types (DYN-3290) Jun 26, 2026
@jh-nv jh-nv force-pushed the readiness_message_DYN-3290 branch from 308683b to 78700e2 Compare June 26, 2026 18:27
@jh-nv jh-nv temporarily deployed to external_collaborator June 26, 2026 18:27 — with GitHub Actions Inactive
@jh-nv jh-nv changed the title fix(frontend): gate serving readiness on actual servability, not just worker types (DYN-3290) fix(frontend): clarify the model-not-ready 503 message (DYN-3290) Jun 26, 2026
@jh-nv jh-nv changed the title fix(frontend): clarify the model-not-ready 503 message (DYN-3290) fix(frontend): clarify the model-not-ready 503 message Jun 26, 2026
The "model registered but not ready to serve" 503 was produced at three
sites with three different bodies, two of which leaked internal taxonomy
(prefill/decode/encode worker types, namespaces, "worker set", "worker
startup logs"):

  - OpenAI readiness gate (check_model_serving_ready)
  - OpenAI dispatch backstop (from_model_error -> ModelUnavailable,
    "Model temporarily unavailable")
  - Anthropic gate (anthropic_messages -> ModelUnavailable)

Add a single canonical helper `model_not_ready_message(model_name)` and
route all three sites through it, so a client gets one clear, retryable
503 with no internal deployment jargon. Remove the now-dead
model_unavailable() helper.

Scope: this is message-only. It does NOT change readiness behavior --
the gate still uses has_ready_workers() and the listing still uses
serving_ready_display_names() exactly as before. It also does not unify
the two missing-role cases: a decode-missing deployment returns this 503
(it reaches the readiness gate), while a prefill-only deployment still
returns 404 via a separate endpoint-enablement gate that runs before the
readiness check. Consolidating those two paths is a separate change and
is intentionally not attempted here.

Also add the missing llm_metrics: None field in the tool_parser_v2 test
so the lib test target compiles (pre-existing break, unrelated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jie Hao <jihao@nvidia.com>
@jh-nv jh-nv force-pushed the readiness_message_DYN-3290 branch from 78700e2 to 56e1897 Compare June 26, 2026 19:34
@jh-nv jh-nv temporarily deployed to external_collaborator June 26, 2026 19:34 — with GitHub Actions Inactive
Comment thread lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs
@rmccorm4 rmccorm4 temporarily deployed to external_collaborator June 26, 2026 19:36 — with GitHub Actions Inactive
Labels

fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/L

2 participants