fix(mocker): make G1 use allocation transactional [DYN-3299]#11019
fix(mocker): make G1 use allocation transactional [DYN-3299]#11019PeaBrane wants to merge 17 commits into
Conversation
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
WalkthroughThe PR changes KV acquisition and reservation APIs to return ChangesG1 Acquire and Offload Flow
Offline Replay Liveness Docs
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Comment |
Signed-off-by: PeaBrane <yanrpei@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/mocker/src/kv_manager/kvbm_backend.rs`:
- Around line 430-434: The offload dependency path in kvbm_backend.rs can return
Some(OffloadDependency) with both fields empty when
current_g1_offload_dependency() is None, which leaves refresh loops treating the
request as blocked without any wakeup. Update the logic around
has_nonterminal_g1_pipeline() and current_g1_offload_dependency() so that
offload blocking is only reported when there is a concrete future wakeup (real
transfer_id/deadline, scheduled completion, or similar notification), and
otherwise return no offload dependency. Use the existing OffloadDependency
construction in the current_g1_offload_dependency handling to locate and adjust
this branch.
- Around line 945-947: The eviction flow in
kvbm_backend::handle_evictions_with_source_slots_at is currently treated as
infallible by the caller that assigns to outcome and then unwraps with expect,
but it can legitimately return None when evicted PLHs do not map to
shadow-registry offload blocks. Update the allocation path at that call site to
handle the None case gracefully by returning a bounded retry/backoff instead of
panicking, and keep the existing dependency-outcome handling only for the
Some(outcome) path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 12722e39-6679-4a54-8d6a-ed1e3c9d996b
📒 Files selected for processing (11)
lib/mocker/src/kv_manager/kvbm_backend.rslib/mocker/src/kvbm_offload/bandwidth_sharing_model.rslib/mocker/src/kvbm_offload/engine.rslib/mocker/src/kvbm_offload/mod.rslib/mocker/src/kvbm_offload/worker.rslib/mocker/src/replay/offline/CLAUDE.mdlib/mocker/src/replay/offline/components/engine.rslib/mocker/src/scheduler/source_holds.rslib/mocker/src/scheduler/vllm/core.rslib/mocker/src/scheduler/vllm/policy/tests.rslib/mocker/src/scheduler/vllm/tests.rs
dynamo-ops
left a comment
There was a problem hiding this comment.
Architecture review: LGTM ✅
The core change — making G1 use allocation transactional via prepare_use/commit_use with an explicit G1Acquire<T> outcome enum — is well-designed. Key observations:
-
Atomicity is correctly preserved: The prepare phase collects existing handles (active/inactive hits) and reserves fresh slots in one batch via
reserve_g1_slots, then commit applies all mutations. On any failure arm (CapacityExhausted,BlockedOnOffload), no partial state leaks — existingImmutableBlockclones are dropped, unusedMutableBlocks return to the reset pool via their RAII Drop. -
RetryNow + validate_retry_witness: The bounded single-retry mechanism for presence-filtered evictions (where PLHs resolve to G2 shadow entries, releasing source slots without a DMA) is tight — asserts one retry max, positive slot delta, and generation advancement. The loop terminates.
-
Offload dependency propagation: The
offload_dependencyfield onVllmRequestState+refresh_request_offload_dependencyis clean. The refresh correctly clears the dependency when the specific transfer completes (checked byis_g1_offload_active) or when no nonterminal pipeline entries remain. -
Speculative/decode gating: The early-exit on offload dependency before entering allocation loops (prefill admission, speculative batch, decode allocation) prevents wasted work. The
sequence.pop()onBlockedOnOffloadin the decode path correctly undoes the uncommitted speculative token. -
CI test failure: The rust-tests failure should be investigated — likely a test not yet adapted to the new
G1Acquirereturn type or a timing issue in the offload engine mock.
The two CodeRabbit suggestions (defensive handling for None dependency fields in refresh_offload_dependency, and graceful fallback on handle_evictions_with_source_slots_at returning None) are reasonable hardening but not blocking — the current code's expect() is a valid contract assertion given the offload-enabled feature gate.
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/kvbm-engine/src/offload/pipeline.rs (1)
1475-1507: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winFinalize transfer states before returning block-transfer errors.
Line 1506 can return after states were marked
Transferring/in_flight, but the affectedTransferStates are nevermark_failed/set_error, soTransferHandle::wait()can hang while settlement only records the batch failure. Apply the same terminal-state handling to the allocation andexecute_local_transferearly-error paths too.Proposed fix pattern
- // Wait for transfer completion - notification.await?; + // Wait for transfer completion + if let Err(error) = notification.await { + let message = format!("block transfer failed: {error}"); + for (_transfer_id, (state, block_ids)) in &transfer_states { + let mut state_guard = state.lock().unwrap(); + state_guard.mark_failed(block_ids.iter().copied()); + state_guard.set_error(message.clone()); + } + return Err(error); + } phase.mark_settling();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/kvbm-engine/src/offload/pipeline.rs` around lines 1475 - 1507, The transfer completion path in pipeline.rs leaves some TransferState entries stuck in Transferring/in_flight when an error returns from notification.await or from the earlier allocation/execute_local_transfer steps, which can make TransferHandle::wait() hang. Update the transfer flow around execute_local_transfer, notification.await, and the dst_blocks allocation path to apply terminal failure handling before returning: mark each affected TransferState as failed and set its error, and ensure phase/settlement state is finalized consistently on all early-exit paths. Keep the existing successful Transferring/in_flight transition, but add the same failure cleanup used elsewhere in this transfer pipeline for both allocation failures and local transfer execution failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/kvbm-logical/src/manager/mod.rs`:
- Around line 106-110: The registration counter in BlockPoolMetrics is being
used as a publication watermark, but inc_registrations_by() and snapshot() in
manager/mod.rs still rely on Ordering::Relaxed. Update the synchronization
around this watermark by using a release/acquire pair or a separate notifier so
that the batch’s store transitions and presence updates are fully visible before
another thread observes the increment. Keep the change centered on the
inc_registrations_by()/snapshot() flow and the check_presence()/swap-in planning
path.
---
Outside diff comments:
In `@lib/kvbm-engine/src/offload/pipeline.rs`:
- Around line 1475-1507: The transfer completion path in pipeline.rs leaves some
TransferState entries stuck in Transferring/in_flight when an error returns from
notification.await or from the earlier allocation/execute_local_transfer steps,
which can make TransferHandle::wait() hang. Update the transfer flow around
execute_local_transfer, notification.await, and the dst_blocks allocation path
to apply terminal failure handling before returning: mark each affected
TransferState as failed and set its error, and ensure phase/settlement state is
finalized consistently on all early-exit paths. Keep the existing successful
Transferring/in_flight transition, but add the same failure cleanup used
elsewhere in this transfer pipeline for both allocation failures and local
transfer execution failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f28e8ffa-12c1-4950-871a-97e4375d91f4
📒 Files selected for processing (27)
lib/kv-router/src/sequences/multi_worker.rslib/kv-router/src/sequences/single.rslib/kv-router/src/sequences/topology.rslib/kvbm-engine/src/offload/engine.rslib/kvbm-engine/src/offload/handle.rslib/kvbm-engine/src/offload/mod.rslib/kvbm-engine/src/offload/pipeline.rslib/kvbm-engine/src/offload/settlement.rslib/kvbm-logical/src/manager/mod.rslib/kvbm-logical/src/metrics/pool_metrics.rslib/kvbm-logical/src/pools/store.rslib/kvbm-logical/src/registry/mod.rslib/kvbm-logical/src/registry/tests.rslib/mocker/src/kv_manager/kvbm_backend.rslib/mocker/src/kvbm_offload/bandwidth_sharing_model.rslib/mocker/src/kvbm_offload/engine.rslib/mocker/src/kvbm_offload/mod.rslib/mocker/src/kvbm_offload/worker.rslib/mocker/src/replay/offline/CLAUDE.mdlib/mocker/src/replay/offline/components/engine.rslib/mocker/src/replay/offline/state.rslib/mocker/src/replay/router_shared.rslib/mocker/src/scheduler/source_holds.rslib/mocker/src/scheduler/vllm/core.rslib/mocker/src/scheduler/vllm/live.rslib/mocker/src/scheduler/vllm/policy/tests.rslib/mocker/src/scheduler/vllm/tests.rs
💤 Files with no reviewable changes (1)
- lib/mocker/src/replay/offline/state.rs
Signed-off-by: PeaBrane <yanrpei@gmail.com>
|
@coderabbitai Addressed the block-transfer failure-path finding in e7d93b0. Destination-allocation, synchronous transfer-start, and asynchronous completion errors now mark every affected block failed and publish a terminal TransferState error before returning, so TransferHandle::wait() cannot remain stranded. Added focused coverage for both queued and in-flight handles. The separate relaxed registration-watermark comment remains intentionally deferred. |
|
Tip For best results, initiate chat on the files or code changes.
Keeping the relaxed registration-watermark comment deferred is fine. If you want, I can help open a follow-up issue for that separately. |
Signed-off-by: PeaBrane <yanrpei@gmail.com>
This reverts commit 513b6c2. Signed-off-by: PeaBrane <yanrpei@gmail.com>
Summary
MoveBlock::Useallocation an all-or-nothing prepare/reserve/commit transaction, so failed attempts cannot publish phantom cache entries, partial active references, orStoredeventsThis implements the transactional allocation boundary described by #11016 and the semaphore-handoff settlement slice of #11018. Lifecycle watermarks and generalized auto-chain closure remain deferred.
Validation
Correctness and flake gate
Validated exact candidate
76e1d7afb7e4b7733e8e76bde6d3ca081c068094on an SC-01 CPU-only x86_64 allocation:kvbm-engine: 125 passedkvbm-logical: 493 passed plus 3 doctestscargo test --locked -p dynamo-mocker --features kvbm-offload: 557 passed, 1 ignoredidle_destination_reservation_wakes_at_offload_deadline: 200/200 repetitionslive_destination_eviction_uses_command_application_time: 200/200 repetitionsBoth known regressions fail on the exact baseline: the idle-destination case times out, and command-application-time eviction does not schedule G1→G2. Their candidate passes are intended fixes.
Deterministic tests also cover concurrency one, simultaneous completions, no successor, progress between subscription and initial check, block/object executors, cancellation, task abort, panic, executor error, shutdown, stale/foreign/unavailable tokens, overflow, G2-only release, partial chains, and speculative reservation.
White-box live path validation
Temporary tracing was applied only in the scratch candidate checkout, preserved as an artifact, then removed before rebuilding the exact candidate for A/B measurement.
queuedpeaked at 4,startingat 2)The traced path-validation load remained sub-saturation: frontend/prefill/decode CPU medians were 7.1%/10.6%/10.9%, with 39/39 successful measured requests.
Deterministic semantic A/B
Normalized traces remove wall timestamps, generated IDs, ports, paths, and absolute clock origins while preserving outputs, completion order, router decisions, modeled transfer durations, batch/block counts, and source-slot release.
Live before/after overhead ablation
Seven paired 60-second measured trials per workload after 15-second warmups, using closed-loop concurrency two, alternating arm order, paired seeds, and fresh services/discovery state per run:
All agreed performance gates passed.
Aggressive-tracing stress observation
With an intentionally tiny 12-block G1, concurrency four, and aggressive debug tracing, requests could not coexist once tracing stretched their overlap. The live decode scheduler entered 122,879 preempt/requeue passes, produced a 475 MB debug log, and seven clients timed out. The identical normal-logging workload completed 24/24 exact outputs in order.
The capacity preemption is expected for this artificial setup; the tight spin is a broader scheduler yield/backoff concern and is not attributed to pipeline settlement.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation