fix(frontend): clarify the model-not-ready 503 message by jh-nv · Pull Request #10995 · ai-dynamo/dynamo

jh-nv · 2026-06-26T16:01:49Z

Summary

Make the "model registered but not yet ready to serve" 503 message clear and consistent. Message-only — no behavioral change.

The 503 was produced at three sites with three different bodies, two of which leaked internal taxonomy (prefill/decode/encode worker types, namespaces, "worker set", "worker startup logs"):

OpenAI readiness gate (check_model_serving_ready) — covers chat, completions, embeddings, responses, images, audios
OpenAI dispatch backstop (from_model_error → ModelUnavailable, was "Model temporarily unavailable")
Anthropic gate (anthropic_messages → ModelUnavailable)

All three now route through one canonical helper model_not_ready_message(model_name):

Model <model> is not ready to serve requests yet. The deployment may still be starting up or is not fully provisioned. Please retry shortly.

The status is 503. The body text is identical across the OpenAI family and Anthropic; the error envelope still differs by API convention (OpenAI type=service_unavailable, Anthropic type=overloaded_error). The now-dead model_unavailable() is removed.
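
As a rough illustration of the "one helper, two envelopes" idea (a sketch, not the code in this PR): the message wording is the canonical body quoted above, while the `&str -> String` signature, the `ErrorEnvelope` struct, and the two constructor functions are hypothetical stand-ins invented for this example.

```rust
/// Canonical body text for the model-not-ready 503.
/// The wording comes from the PR description; the signature is an assumption.
pub fn model_not_ready_message(model_name: &str) -> String {
    format!(
        "Model {model_name} is not ready to serve requests yet. \
         The deployment may still be starting up or is not fully provisioned. \
         Please retry shortly."
    )
}

/// Hypothetical envelope: HTTP status, API-specific error type, shared message.
#[derive(Debug, PartialEq)]
pub struct ErrorEnvelope {
    pub status: u16,
    pub error_type: &'static str,
    pub message: String,
}

/// OpenAI-style envelope (type=service_unavailable per the description above).
pub fn openai_not_ready(model_name: &str) -> ErrorEnvelope {
    ErrorEnvelope {
        status: 503,
        error_type: "service_unavailable",
        message: model_not_ready_message(model_name),
    }
}

/// Anthropic-style envelope (type=overloaded_error per the description above).
pub fn anthropic_not_ready(model_name: &str) -> ErrorEnvelope {
    ErrorEnvelope {
        status: 503,
        error_type: "overloaded_error",
        message: model_not_ready_message(model_name),
    }
}
```

Because both constructors call the same helper, the body text cannot drift between the two API families; only the envelope type differs.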

Scope / what this deliberately does NOT do

No readiness-behavior change. The gate still uses has_ready_workers() and the listing still uses serving_ready_display_names() exactly as before. An earlier revision routed the gate/listing through is_ready_to_serve(); that broke correct behavior (a declaratively-ready aggregated model vanished from /v1/models and /ready began contradicting the gate), so it was dropped.
Does not unify the 404 vs 503 split. A request that never reaches the readiness gate (because no chat-serving worker is registered) still returns a bare 404 from a separate endpoint-enablement gate (service_v2.rs). Consolidating that is out of scope (deferred).
Other 503 surfaces are untouched and separate: the process-level "Service is not ready" (_service_unavailable, used by /health and the service middleware), backend-overload 503/529 (SanitizedError), the Realtime API (/v1/realtime, its own model_unavailable), and KServe gRPC. Only the model-not-ready 503 on the OpenAI/Anthropic HTTP paths is unified here.

Behavioral map (verified, real vLLM)

The response is decided by whether a decode (chat-serving) worker is registered:

Deployment state	Response
Complete (all required roles present)	200
Decode present but incomplete (missing prefill and/or encode)	503 — this message
No decode/chat-serving worker (prefill-only, encode-only, encode+prefill)	404 (bare, empty body — endpoint gate, before the readiness gate)
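
The table above can be restated as a small decision function. This is only a model of the observed behavior, not the frontend's implementation: `Roles`, `Required`, and `expected_status` are hypothetical names, and the sketch assumes decode is always a required role.

```rust
/// Roles currently registered for a model (illustrative type).
#[derive(Debug, Clone, Copy, Default)]
pub struct Roles {
    pub encode: bool,
    pub prefill: bool,
    pub decode: bool,
}

/// Roles the deployment needs besides decode (illustrative type).
#[derive(Debug, Clone, Copy)]
pub struct Required {
    pub encode: bool,
    pub prefill: bool,
}

/// Expected HTTP status for a request, following the behavioral map:
/// no decode worker -> 404, decode present but incomplete -> 503, complete -> 200.
pub fn expected_status(present: Roles, required: Required) -> u16 {
    if !present.decode {
        // The endpoint-enablement gate answers before the readiness gate runs.
        return 404;
    }
    let prefill_ok = !required.prefill || present.prefill;
    let encode_ok = !required.encode || present.encode;
    if prefill_ok && encode_ok {
        200
    } else {
        503
    }
}
```

For example, an E/P/D deployment with only prefill and decode registered maps to 503, while the same deployment with only encode and prefill registered maps to 404.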

Testing

Container full-stack with real vLLM backends on an RTX PRO 6000 (--router-mode kv):

P/D (Qwen3-0.6B): decode-only (prefill missing) → 503 (this message); P+D complete → 200; prefill dies → 503.
E/P/D (Qwen2-VL-2B, multimodal): encode-only → 404; encode+prefill (no decode) → 404; E+P+D complete → 200 (text and image); prefill+decode (encode missing) → 503; decode-only (prefill missing) → 503. Confirms the decode-present → 503 / decode-absent → 404 rule above, including the EPD encode role.
Rust unit + integration green; cargo clippy -p dynamo-llm clean.
Anthropic message-sharing is covered by code + unit test (test_unavailable_paths_share_one_message); the live container runs exercised the OpenAI endpoints.

Note

The tool_parser_v2 test adds the missing llm_metrics: None field so the lib/llm test target compiles (pre-existing break on main, unrelated to this change).

Closes DYN-3290.

🤖 Generated with Claude Code

datadog-official · 2026-06-26T16:02:54Z

⚠️ Warnings

🚦 1 Pipeline job failed

Docs link check | lychee

devin-ai-integration

Devin Review found 2 potential issues.

coderabbitai · 2026-06-26T16:05:50Z

Walkthrough

OpenAI and Anthropic now return a shared 503 “model not ready” response for unavailable models, and OpenAI readiness checks use the same message helper. One OpenAI test fixture also adds an explicit llm_metrics: None field.

Changes

Model not ready responses

Layer / File(s)	Summary
Shared message and dispatch mapping `lib/llm/src/http/service/openai.rs`	`model_not_ready_message(model_name)` is introduced, and `ErrorMessage::from_model_error` returns a 503 body from that helper for `ModelUnavailable`.
Readiness gate and Anthropic reuse `lib/llm/src/http/service/openai.rs`, `lib/llm/src/http/service/anthropic.rs`	`check_model_serving_ready` and the Anthropic `ModelUnavailable` branch both return the shared `model_not_ready_message(...)` 503 body.
Response tests `lib/llm/src/http/service/openai.rs`	Tests now construct the unavailable response through `from_model_error` and assert the readiness-gate and dispatch paths produce matching 503 bodies.

Tool parser fixture update

Layer / File(s)	Summary
Stream response fixture shape `lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs`	The `chunk` helper now sets `llm_metrics: None` in `NvCreateChatCompletionStreamResponse`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title captures the main change: clarifying the customer-facing model-not-ready 503 message.
Description check	✅ Passed	The PR description is detailed, covers summary, scope, testing, and issue linkage, and is mostly aligned with the template.

coderabbitai

🧹 Nitpick comments (1)

lib/llm/src/http/service/openai.rs (1)
4657-4660: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Exercise the actual readiness gate in this drift test.

This manually builds service_unavailable_with_body(model_not_ready_message(...)), so it would still pass if check_model_serving_ready later stopped using the canonical helper. Prefer asserting the body from check_model_serving_ready(...) for a registered-not-ready model, or add that assertion to the existing /ready integration coverage.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/http/service/openai.rs` around lines 4657 - 4660, The drift test
is manually constructing the service-unavailable body instead of exercising the
real readiness gate, so it can miss regressions in check_model_serving_ready.
Update the test to assert the body returned by check_model_serving_ready for a
registered-but-not-ready model, or add that assertion alongside the existing
/ready integration coverage, using the same canonical model_not_ready_message
path to compare against the gate output.
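
The shape of the suggested drift test might look like the sketch below. It is an assumption-laden illustration: `gate_stub` is a hypothetical stand-in for check_model_serving_ready (whose real signature is not shown in this thread), and the helper is re-declared locally so the example is self-contained. The point is only that the expected body comes from the helper while the actual body comes from the gate.

```rust
/// Local copy of the canonical helper (wording from the PR description).
pub fn model_not_ready_message(model_name: &str) -> String {
    format!(
        "Model {model_name} is not ready to serve requests yet. \
         The deployment may still be starting up or is not fully provisioned. \
         Please retry shortly."
    )
}

/// Hypothetical stand-in for the readiness gate: Ok when workers are ready,
/// otherwise an (HTTP status, body) pair.
pub fn gate_stub(model_name: &str, has_ready_workers: bool) -> Result<(), (u16, String)> {
    if has_ready_workers {
        Ok(())
    } else {
        Err((503, model_not_ready_message(model_name)))
    }
}

/// Drift check: take the actual body from the gate and compare it with the
/// helper, instead of building both sides from the helper.
pub fn gate_body_matches_helper(model_name: &str) -> bool {
    match gate_stub(model_name, false) {
        Err((503, body)) => body == model_not_ready_message(model_name),
        _ => false,
    }
}
```

If the gate later stopped calling the helper, a check of this shape would fail, whereas a test that constructs the body directly from the helper would keep passing.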

📥 Commits

Reviewing files that changed from the base of the PR and between 91b6259 and 5ff08fe.

📒 Files selected for processing (3)

lib/llm/src/http/service/anthropic.rs
lib/llm/src/http/service/openai.rs
lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs

The "model registered but not ready to serve" 503 was produced at three sites with three different bodies, two of which leaked internal taxonomy (prefill/decode/encode worker types, namespaces, "worker set", "worker startup logs"):

- OpenAI readiness gate (check_model_serving_ready)
- OpenAI dispatch backstop (from_model_error -> ModelUnavailable, "Model temporarily unavailable")
- Anthropic gate (anthropic_messages -> ModelUnavailable)

Add a single canonical helper `model_not_ready_message(model_name)` and route all three sites through it, so a client gets one clear, retryable 503 with no internal deployment jargon. Remove the now-dead model_unavailable() helper.

Scope: this is message-only. It does NOT change readiness behavior -- the gate still uses has_ready_workers() and the listing still uses serving_ready_display_names() exactly as before. It also does not unify the two missing-role cases: a decode-missing deployment returns this 503 (it reaches the readiness gate), while a prefill-only deployment still returns 404 via a separate endpoint-enablement gate that runs before the readiness check. Consolidating those two paths is a separate change and is intentionally not attempted here.

Also add the missing llm_metrics: None field in the tool_parser_v2 test so the lib test target compiles (pre-existing break, unrelated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jie Hao <jihao@nvidia.com>

jh-nv requested a review from a team as a code owner June 26, 2026 16:01

pull-request-size Bot added the size/L label Jun 26, 2026

jh-nv temporarily deployed to external_collaborator June 26, 2026 16:01 — with GitHub Actions Inactive

github-actions Bot added fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Jun 26, 2026

devin-ai-integration Bot reviewed Jun 26, 2026

Comment thread lib/llm/src/http/service/openai.rs

Comment thread lib/llm/src/http/service/openai.rs

coderabbitai Bot reviewed Jun 26, 2026

jh-nv changed the title ~~fix(frontend): unify model-not-ready 503 and hide worker-type internals (DYN-3290)~~ fix(frontend): unify model-not-ready 503 and hide worker-type internals Jun 26, 2026

jh-nv temporarily deployed to external_collaborator June 26, 2026 16:31 — with GitHub Actions Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 16:31 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 16:37 Inactive

jh-nv changed the title ~~fix(frontend): unify model-not-ready 503 and hide worker-type internals~~ fix(frontend): gate serving readiness on actual servability, not just worker types (DYN-3290) Jun 26, 2026

jh-nv force-pushed the readiness_message_DYN-3290 branch from 308683b to 78700e2 Compare June 26, 2026 18:27

jh-nv temporarily deployed to external_collaborator June 26, 2026 18:27 — with GitHub Actions Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 18:27 Inactive

jh-nv changed the title ~~fix(frontend): gate serving readiness on actual servability, not just worker types (DYN-3290)~~ fix(frontend): clarify the model-not-ready 503 message (DYN-3290) Jun 26, 2026

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 18:50 Inactive

dynamo-review-agent Bot approved these changes Jun 26, 2026

jh-nv changed the title ~~fix(frontend): clarify the model-not-ready 503 message (DYN-3290)~~ fix(frontend): clarify the model-not-ready 503 message Jun 26, 2026

jh-nv force-pushed the readiness_message_DYN-3290 branch from 78700e2 to 56e1897 Compare June 26, 2026 19:34

jh-nv temporarily deployed to external_collaborator June 26, 2026 19:34 — with GitHub Actions Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 19:34 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 19:35 Inactive

rmccorm4 reviewed Jun 26, 2026

Comment thread lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs

Merge branch 'main' into readiness_message_DYN-3290

d48c11f

rmccorm4 temporarily deployed to external_collaborator June 26, 2026 19:36 — with GitHub Actions Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 19:36 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 26, 2026 19:40 Inactive

fix(frontend): clarify the model-not-ready 503 message #10995
jh-nv wants to merge 2 commits into main from readiness_message_DYN-3290
