perf(mocker): reduce scheduler and FPM publish overhead by jthomson04 · Pull Request #11025 · ai-dynamo/dynamo

jthomson04 · 2026-06-28T02:57:28Z

Summary

- Skip `tokio_timerfd::Delay` construction when scheduler work has already consumed a mocker's modeled deadline; future deadlines retain the precise timerfd path.
- Reuse the single FPM publisher task's MessagePack buffer, borrow the worker ID during serialization, and encode event envelopes from borrowed payload/topic data.
- Cache the formatted event-plane subject and parsed NATS subject instead of rebuilding and validating them for every FPM event.
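
The first item can be pictured with a minimal sketch of an expired-deadline fast path. It uses only the standard library: `remaining`, `precise_sleep`, and `wait_until` are hypothetical names, and `std::thread::sleep` stands in for the `tokio_timerfd::Delay` path, which this sketch does not call.

```rust
use std::time::{Duration, Instant};

/// Time left until `deadline`, or `None` when the deadline has already
/// been reached or passed.
fn remaining(deadline: Instant, now: Instant) -> Option<Duration> {
    match deadline.checked_duration_since(now) {
        Some(d) if !d.is_zero() => Some(d),
        _ => None,
    }
}

/// Hypothetical stand-in for the precise timer (assumption: the real code
/// awaits a timerfd-backed delay here instead of blocking the thread).
fn precise_sleep(d: Duration) {
    std::thread::sleep(d);
}

/// Waits until `deadline`. Returns `true` only when a timer was armed.
fn wait_until(deadline: Instant) -> bool {
    match remaining(deadline, Instant::now()) {
        // Fast path: the deadline was already consumed, so no timer is built.
        None => false,
        // Slow path: the deadline is still in the future, so sleep precisely.
        Some(d) => {
            precise_sleep(d);
            true
        }
    }
}
```

Calling `wait_until(Instant::now())` returns immediately without touching the timer, while a deadline some milliseconds ahead still goes through `precise_sleep`.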

The FPM path still publishes every snapshot in channel order and preserves the existing per-rank counters and idle-heartbeat behavior. The owned event publication and public string-based NATS entry points remain available, and a regression test verifies that the borrowed envelope is byte-for-byte identical to the existing owned encoding.
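
As a rough illustration of the buffer reuse and the byte-for-byte equivalence described above, the sketch below encodes the same envelope from owned and borrowed data. All type and function names are hypothetical, and the length-prefixed layout is a simplified stand-in chosen for illustration; it is not MessagePack and not the runtime's actual codec.

```rust
/// Owned form: the caller must allocate or clone the topic and payload.
struct OwnedEnvelope {
    topic: String,
    payload: Vec<u8>,
}

/// Borrowed form: points at data the caller already holds.
struct EnvelopeRef<'a> {
    topic: &'a str,
    payload: &'a [u8],
}

/// Appends one field as a u32 little-endian length followed by the bytes.
/// Hypothetical wire layout for illustration only.
fn put_field(buf: &mut Vec<u8>, bytes: &[u8]) {
    buf.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    buf.extend_from_slice(bytes);
}

impl OwnedEnvelope {
    /// Encodes into a freshly allocated buffer on every call.
    fn encode(&self) -> Vec<u8> {
        let mut buf = Vec::new();
        put_field(&mut buf, self.topic.as_bytes());
        put_field(&mut buf, &self.payload);
        buf
    }
}

impl EnvelopeRef<'_> {
    /// Clears `buf` and encodes into it; `clear` keeps the allocation,
    /// so a long-lived publisher can reuse one buffer across events.
    fn encode_into(&self, buf: &mut Vec<u8>) {
        buf.clear();
        put_field(buf, self.topic.as_bytes());
        put_field(buf, self.payload);
    }
}
```

A regression check in this style encodes one event both ways and compares the resulting bytes, which is the property the PR's test is described as verifying for the real codec.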

Benchmark

The comparison used an optimized build with debug symbols on the same host throughout:

- one mocker process/worker pinned to CPUs 0-1
- one KV-routing frontend plus aiperf sharing CPUs 2-23
- Qwen/Qwen3-0.6B, 64-token blocks, `--speedup-ratio 1000000`
- aiperf c256 with exact ISL/OSL 1024/1024, zero variance, ignore_eos, fixed seed, 10-second warmup, and a 45-second measured interval
- fresh frontend/mocker processes per run; all reported runs had zero errors and cancellations

| Build | Clean throughput | Incremental benefit | Cumulative benefit |
|---|---|---|---|
| original `main` | 207.95 req/s | baseline | baseline |
| expired-deadline fast path | 258.47 req/s | +24.30% | +24.30% |
| FPM buffer + subject reuse | 273.65 req/s | +5.87% | +31.59% |

The FPM change also reduced mean latency by 5.62% and p99 latency by 5.44% relative to the timer-only build. A separate perf stat run measured 7.09% higher throughput and 5.85% fewer task-clock CPU seconds/request.
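
The incremental and cumulative columns of the throughput table compose multiplicatively rather than additively. The helpers below are a small sketch with hypothetical names that recomputes them from the rounded throughput figures; because those inputs are rounded to two decimals, the results agree with the reported percentages only to within about 0.01 percentage points.

```rust
/// Percentage gain of `new` over `old`.
fn gain_pct(old: f64, new: f64) -> f64 {
    (new / old - 1.0) * 100.0
}

/// Combines successive percentage gains multiplicatively:
/// (1 + g1/100) * (1 + g2/100) * ... - 1, expressed as a percentage.
fn compose_pct(gains: &[f64]) -> f64 {
    let factor: f64 = gains.iter().map(|g| 1.0 + g / 100.0).product();
    (factor - 1.0) * 100.0
}
```

For example, `gain_pct(207.95, 273.65)` gives the cumulative figure directly, and `compose_pct(&[24.30, 5.87])` arrives at nearly the same value from the two incremental steps.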

User+kernel profiles at 99 Hz contained about 11K samples with no lost samples and less than 1% unknown leaves. The timer change reduced the precise-sleep stack from 12.15% to 0.18%. The FPM change then reduced FPM publisher inclusive cycles from 14.50% to 9.78%, serialization from 3.64% to 0.71%, subject conversion from 0.65% to 0.04%, and NATS subject validation from 0.84% to 0.17%.
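
The drop in subject conversion and validation cost comes from doing that work once rather than per event. A minimal sketch of the idea follows; `ValidSubject`, `validate`, and `Publisher` are hypothetical names, and the validity rule is a simplified assumption for illustration, not the NATS client's actual check.

```rust
/// A subject string that has already been formatted and validated.
#[derive(Debug, Clone, PartialEq)]
struct ValidSubject(String);

/// Hypothetical validity rule: non-empty, no whitespace, and no empty
/// dot-separated token. A real NATS client may apply different rules.
fn validate(subject: &str) -> Option<ValidSubject> {
    let ok = !subject.is_empty()
        && !subject.chars().any(char::is_whitespace)
        && subject.split('.').all(|tok| !tok.is_empty());
    ok.then(|| ValidSubject(subject.to_owned()))
}

/// Publisher that formats and validates its subject once, at construction.
struct Publisher {
    subject: ValidSubject,
}

impl Publisher {
    /// Builds the subject as `<namespace>.<topic>` (assumed layout) and
    /// validates it a single time; returns `None` if it is invalid.
    fn new(namespace: &str, topic: &str) -> Option<Self> {
        let formatted = format!("{namespace}.{topic}");
        validate(&formatted).map(|subject| Publisher { subject })
    }

    /// Per-event path: hands back the cached subject with no formatting
    /// or validation work.
    fn subject(&self) -> &ValidSubject {
        &self.subject
    }
}
```

With this shape, every publish call reads `publisher.subject()` and the string formatting and validation cost is paid only in `Publisher::new`.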

Validation

- `cargo fmt --all --check`
- `cargo clippy -p dynamo-runtime -p dynamo-llm --lib -- -D warnings`
- `cargo test -p dynamo-llm --lib` (1,389 passed, 5 ignored)
- `cargo test -p dynamo-mocker` (468 passed, 1 ignored)
- `cargo test -p dynamo-runtime transports::event_plane::codec::tests -- --nocapture` (3 passed)
- focused FPM serialization/heartbeat tests (6 passed)
- two clean 45-second aiperf repeats, one 60-second user+kernel profile, and one separate counter run for each optimization stage

The full dynamo-runtime suite was attempted, but the existing pipeline::network::egress::push_router::tests::transport_resolution_falls_back_when_selected_instance_disappears test hung. It reproduced when run alone and was stopped after 120 seconds; focused tests exercising this patch pass.

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

datadog-official · 2026-06-28T02:57:54Z

⚠️ Warnings

🚦 3 Pipeline jobs failed

Docs link check | lychee

PR | dynamo-runtime / rust-gpu

PR | dynamo-status-check

This comment will be updated automatically if new data arrives.

Commit SHA: 6aa9491

perf(mocker): reduce scheduler and FPM publish overhead

6aa9491

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

pull-request-size Bot added the size/L label Jun 28, 2026

jthomson04 temporarily deployed to external_collaborator June 28, 2026 02:57 — with GitHub Actions Inactive

github-actions Bot added the perf label Jun 28, 2026

jthomson04 wants to merge 1 commit into main from jthomson04/mocker-cpu-overhead
