
fix(sglang): derive enable_eagle from SpeculativeAlgorithm.is_eagle() (covers EAGLE3)#10982

Draft
yifjiang wants to merge 1 commit into ai-dynamo:main from yifjiang:yifjiang/sglang-eagle3-enable-eagle


@yifjiang yifjiang commented Jun 26, 2026

Contributor

Problem

components/src/dynamo/sglang/register.py sets ModelRuntimeConfig.enable_eagle = True from a hand-maintained name set ("EAGLE", "NEXTN"). The KV router uses enable_eagle (lib/llm/src/discovery/watcher.rs) to bigram-align the frontend's prompt-block hashes (window = stride + 1, lib/kv-router/src/protocols.rs) so they match the worker's KV events.

But the worker's radix cache bigram-keys its KV-event hashes iff SpeculativeAlgorithm.is_eagle() (srt/managers/scheduler.py) = {EAGLE, EAGLE3, FROZEN_KV_MTP}. The name set had drifted from that predicate:

  • EAGLE3 was missing → an EAGLE3 worker publishes enable_eagle=false → the frontend hashes at plain page_size while the worker emits bigram-keyed hashes → overlap is always 0 → KV-aware routing is cache-blind for EAGLE3 (falls back to load-only).
  • "NEXTN" is dead: ServerArgs .upper()s the value and _resolve_speculative_algorithm_alias normalizes NEXTN/EAGLE → EAGLE (or FROZEN_KV_MTP for Gemma4 drafts) before register sees it, so the literal "NEXTN" never matches. FROZEN_KV_MTP (is_eagle()=true) was also missing from the set.
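
The mismatch described above can be sketched with a toy model. This is only an illustration, not the router's or sglang's actual hashing: `block_hashes`, `overlap`, the SHA-256 hash, and the 930-token / page-size-16 values are assumptions chosen for the example. The only detail taken from the description is that the eagle window is stride + 1 tokens while the plain window is one page.

```python
import hashlib
from typing import List


def block_hashes(tokens: List[int], page_size: int, eagle: bool) -> List[str]:
    """Toy block hashing (hypothetical): one hash per full window of tokens.

    Plain mode hashes `page_size` tokens per block. Eagle mode keeps the same
    stride but widens each window to `page_size + 1` tokens (stride + 1).
    """
    window = page_size + 1 if eagle else page_size
    hashes = []
    # Step by one page; stop when a full window no longer fits.
    for start in range(0, len(tokens) - window + 1, page_size):
        chunk = tokens[start:start + window]
        hashes.append(hashlib.sha256(repr(chunk).encode()).hexdigest())
    return hashes


def overlap(frontend: List[str], worker: List[str]) -> int:
    """Count the leading blocks on which frontend and worker hashes agree."""
    matched = 0
    for a, b in zip(frontend, worker):
        if a != b:
            break
        matched += 1
    return matched


prompt = list(range(930))  # illustrative prompt length
worker_events = block_hashes(prompt, 16, eagle=True)  # worker uses the wider window

# Frontend believes enable_eagle=false: plain windows never match the worker's.
before = overlap(block_hashes(prompt, 16, eagle=False), worker_events)
# Frontend believes enable_eagle=true: same windows on both sides.
after = overlap(block_hashes(prompt, 16, eagle=True), worker_events)
print(before, after)
```

In this toy model the two sides cover the same tokens with the same stride, yet no block matches unless both agree on the window width, which is why a wrong `enable_eagle` flag yields zero overlap rather than a partial one.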

Fix

Derive enable_eagle from spec_algorithm.is_eagle() — the same predicate the radix cache uses — so the frontend window and the worker's events stay in lockstep by construction; this covers EAGLE3 and FROZEN_KV_MTP and drops the dead literal:

- if server_args.speculative_algorithm in ("EAGLE", "NEXTN"):
+ if _eagle_enabled_for(server_args.speculative_algorithm):   # SpeculativeAlgorithm.from_string(...).is_eagle()
      runtime_config.enable_eagle = True
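
A minimal sketch of what such a helper could look like follows. It does not import sglang: `SpecAlgo` is a hypothetical stand-in for `SpeculativeAlgorithm`, its member list is taken only from the names mentioned in this PR, and `eagle_enabled_for` is an illustrative name rather than the PR's actual `_eagle_enabled_for` implementation.

```python
from enum import Enum, auto
from typing import Optional


class SpecAlgo(Enum):
    """Hypothetical stand-in for sglang's SpeculativeAlgorithm."""

    NONE = auto()
    EAGLE = auto()
    EAGLE3 = auto()
    FROZEN_KV_MTP = auto()
    DFLASH = auto()
    NGRAM = auto()
    STANDALONE = auto()

    @classmethod
    def from_string(cls, name: Optional[str]) -> "SpecAlgo":
        # Assumption: a missing value means "no speculative decoding".
        return cls.NONE if name is None else cls[name.upper()]

    def is_eagle(self) -> bool:
        # The eagle family as listed in this PR's description.
        return self in (SpecAlgo.EAGLE, SpecAlgo.EAGLE3, SpecAlgo.FROZEN_KV_MTP)


def eagle_enabled_for(name: Optional[str]) -> bool:
    """Ask the predicate instead of comparing against a hand-kept name list."""
    return SpecAlgo.from_string(name).is_eagle()


# Truth table matching the cases named in the Testing section.
results = {n: eagle_enabled_for(n) for n in ("EAGLE", "EAGLE3", "FROZEN_KV_MTP", "NGRAM", None)}
```

The design point is that the registration code holds no algorithm names of its own, so a new eagle-family algorithm added upstream is picked up without touching register.py.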

e2e before/after (public repro)

Stock dynamo + sglang (nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.3.0-dev.1-cuda12, sglang 0.5.12.post1), Qwen/Qwen3-4B + AngelSlim/Qwen3-4B_eagle3 (EAGLE3 draft), single warm node, --router-mode kv --router-kv-events, EAGLE3 spec, a repeated ~930-token prefix (page-size 16 → 58 blocks). Same binary, same models — only register.py differs between phases:

| Phase | Published enable_eagle | KV-router effective cached blocks on the warm repeat |
| --- | --- | --- |
| Before (stock register.py) | false | 0.00 (cache-blind every repeat) |
| After (this fix) | true | 58.00 (full prefix credited) |

Before, the router's Formula logged with 0.00 effective cached blocks on every warm request (EAGLE3 worker's bigram events never matched the plain-token frontend hashes). After, the same warm prefix logs with 58.00 effective cached blocks — the worker's events now match, so KV-aware routing sees the cache. (The bug and enable_eagle: false were also reproduced on a separate EAGLE3 deployment; this Qwen3-4B run is the public, reproducible demonstration.)

Testing

  • New parametrized unit test test_eagle_enabled_for_speculative_algorithm pins the enabled set to is_eagle(): EAGLE/EAGLE3/FROZEN_KV_MTP → True; DFLASH/NGRAM/STANDALONE/NONE/None → False — guarding against the set drifting again.
  • The downstream half is already covered: lib/kv-router/src/protocols.rs::test_compute_block_hash_for_seq_eagle_windows exercises is_eagle = Some(true) → the stride+1 bigram window. So once enable_eagle tracks is_eagle(), EAGLE3/FROZEN_KV_MTP flow through the same validated bigram-window path EAGLE already used.

Scope

Any EAGLE3 (or FROZEN_KV_MTP) model under --router-mode kv with ≥2 workers and worker KV events; single-worker / no-router / no-kv-events unaffected.

@github-actions

Contributor

👋 Hi yifjiang! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added external-contribution Pull request is from an external contributor backend::sglang Relates to the sglang backend labels Jun 26, 2026
@yifjiang yifjiang force-pushed the yifjiang/sglang-eagle3-enable-eagle branch from 635fe3d to f6d54b0 Compare June 26, 2026 05:09
@yifjiang yifjiang temporarily deployed to external_collaborator June 26, 2026 05:09 — with GitHub Actions Inactive
@datadog-official

datadog-official Bot commented Jun 26, 2026


Pipelines

⚠️ Warnings

🚦 4 Pipeline jobs failed

PR | deploy-operator

PR | deploy-status-check

PR | dynamo-runtime / rust-gpu

View all 4 failed jobs.

🔗 Commit SHA: e4b95f4

@yifjiang yifjiang force-pushed the yifjiang/sglang-eagle3-enable-eagle branch from f6d54b0 to 25ec553 Compare June 26, 2026 05:31
@yifjiang yifjiang temporarily deployed to external_collaborator June 26, 2026 05:31 — with GitHub Actions Inactive
… (covers EAGLE3)

ModelRuntimeConfig.enable_eagle was set from a hand-maintained name set
("EAGLE", "NEXTN"). The KV router uses enable_eagle to bigram-align the frontend's
prompt-block hashes so they match the worker's KV events. But sglang's radix cache
bigrams its KV-event hashes iff SpeculativeAlgorithm.is_eagle() (srt/managers/scheduler.py)
= {EAGLE, EAGLE3, FROZEN_KV_MTP}, so the name set had drifted from the real predicate:

  - EAGLE3 was missing -> an EAGLE3 worker publishes enable_eagle=false -> the frontend
    hashes prompt blocks at plain page_size while the worker emits bigram-keyed hashes ->
    overlap is always 0 -> KV-aware routing is cache-blind for EAGLE3.
  - "NEXTN" in the set is dead: ServerArgs normalizes NEXTN/EAGLE to EAGLE (or FROZEN_KV_MTP
    for Gemma4 drafts) before register sees it, so the literal never matches "NEXTN" -- and
    FROZEN_KV_MTP (is_eagle()=true) was also missing.

Derive enable_eagle from spec_algorithm.is_eagle() so it stays in lockstep with the radix's
bigram condition by construction; this covers EAGLE3 and FROZEN_KV_MTP and drops the dead
literal. Add a parametrized unit test pinning the enabled set to is_eagle().

Signed-off-by: Yifan Jiang <19356972+yifjiang@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yifjiang yifjiang force-pushed the yifjiang/sglang-eagle3-enable-eagle branch from 25ec553 to e4b95f4 Compare June 26, 2026 05:33
@yifjiang yifjiang temporarily deployed to external_collaborator June 26, 2026 05:33 — with GitHub Actions Inactive
@yifjiang yifjiang changed the title fix(sglang): include EAGLE3 in enable_eagle so KV-aware routing works for EAGLE3 workers fix(sglang): derive enable_eagle from SpeculativeAlgorithm.is_eagle() (covers EAGLE3) Jun 26, 2026

Labels

backend::sglang Relates to the sglang backend external-contribution Pull request is from an external contributor fix size/M
