feat: bound phantom_loop ticks with a wall-clock timeout #17
Merged
electronicBlacksmith merged 1 commit into main on Apr 11, 2026
Conversation
A loop tick could hang indefinitely when the agent ran a command inside
'docker exec' that ignored signals (e.g. pytest in a container). The SDK's
async iterator never yielded, LoopRunner.tick() awaited forever, the
inFlight set was never released, and every subsequent tick was silently
dropped.

The fix gives the loop runner a contract: a tick is bounded in wall-clock
time, regardless of what the agent or its subprocesses are doing. Two
layers:

1. AbortController plumbed through AgentRuntime.handleMessage into
   runQuery's internal controller (cooperative path for slow-but-responsive
   ticks).
2. Promise.race with a hard-cancel timer in LoopRunner.tick() (escape hatch
   for wedged subprocesses that ignore signals). On hard cancel the runner
   explicitly calls AgentRuntime.releaseSession() so the activeSessions
   bookkeeping isn't leaked by the orphan promise.

Default budget is 30 minutes (informed by real-world ticks averaging
~10 min on small/medium work), min 1 minute, max 60 minutes. Exposed on the
MCP tool as max_tick_duration_minutes to match the existing timeout_minutes
convention; ms is preserved internally where setTimeout needs it.

Timed-out ticks finalize with status 'timed_out' and a diagnostic lastError
including elapsed time, the configured limit, and the last tool that ran.
softTimerFired is tracked explicitly so post-SDK exceptions (cost tracker,
session touch) can't be misclassified as timeouts.

The agent prompt now instructs wrapping 'docker exec' with the host
'timeout' utility, since signals don't propagate through docker exec.

Notes on divergence from the original draft:

- No clamp() in runner.start() for maxTickDurationMs. The Zod schema at the
  MCP tool boundary already enforces the 1-60 minute window, and
  runner-level clamping would force soft/hard timeout tests to wait a full
  minute. The runner trusts its single internal caller.
- softTimerFired is checked on the success path too, not just in the catch
  branch: the SDK swallows abort errors and returns a normal response with
  error text, so the race resolves cleanly when the soft timer fires.
  Without the extra check a timed-out tick would still recordTick + re-read
  the state file.
- A no-op .catch is attached to the losing messagePromise so that a later
  rejection after the hard timeout wins the race can't become an
  unhandled-rejection warning.
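The two layers described above can be sketched roughly as follows. This is an illustrative shape only, not the PR's actual code: `boundedTick` and `hardGraceMs` are hypothetical names, and the real runner does more bookkeeping (recordTick, state-file handling) than shown here.

```typescript
// Illustrative sketch of the two-layer cancel; identifiers are hypothetical.
async function boundedTick(
  runTick: (signal: AbortSignal) => Promise<string>,
  maxTickDurationMs: number,
  hardGraceMs = 5_000, // assumed grace before the hard cancel fires
): Promise<string> {
  // Layer 1: cooperative abort for slow-but-responsive ticks.
  const controller = new AbortController();
  const softTimer = setTimeout(() => controller.abort(), maxTickDurationMs);

  // Layer 2: hard cancel for wedged subprocesses that ignore signals.
  let hardTimer: ReturnType<typeof setTimeout> | undefined;
  const hardCancel = new Promise<string>((resolve) => {
    hardTimer = setTimeout(
      () => resolve("timed_out"),
      maxTickDurationMs + hardGraceMs,
    );
  });

  const messagePromise = runTick(controller.signal);
  // No-op .catch: if the hard timer wins the race, a later rejection from
  // the orphaned promise must not become an unhandled-rejection warning.
  messagePromise.catch(() => {});

  try {
    return await Promise.race([messagePromise, hardCancel]);
  } finally {
    clearTimeout(softTimer);
    clearTimeout(hardTimer);
    // The real runner also releases session bookkeeping here on hard
    // cancel (AgentRuntime.releaseSession in the PR).
  }
}
```

A tick that resolves before the soft timer returns its value unchanged; a tick that never resolves is abandoned to the hard-cancel branch and reported as timed out.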
Summary
A loop tick could hang indefinitely when the agent ran a command inside `docker exec` that ignored signals (e.g. `pytest` in a container). The SDK's async iterator never yielded, `LoopRunner.tick()` awaited forever, the `inFlight` set was never released, and every subsequent tick was silently dropped.

This PR bounds every tick in wall-clock time, regardless of what the agent or its subprocesses are doing.
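The dropped-tick failure mode reduces to a simple in-flight guard. A minimal sketch (hypothetical names; the real runner's bookkeeping is more involved):

```typescript
// Hypothetical sketch of the failure mode: if tick() never resolves, the
// finally block never runs, the key is never deleted, and every later
// tick for that loop is silently dropped.
const inFlight = new Set<string>();

async function runTick(
  loopId: string,
  tick: () => Promise<void>,
): Promise<boolean> {
  if (inFlight.has(loopId)) return false; // silently dropped
  inFlight.add(loopId);
  try {
    await tick(); // if this awaits forever...
    return true;
  } finally {
    inFlight.delete(loopId); // ...this release never happens
  }
}
```

Bounding the awaited promise in wall-clock time guarantees the `finally` eventually runs, so the guard is always released.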
Design
Two-layer cancel:
- `AbortController` plumbed through `AgentRuntime.handleMessage` into `runQuery`'s internal controller. Cooperative path for slow-but-responsive ticks.
- `Promise.race` with a hard-cancel timer in `LoopRunner.tick()`. Escape hatch for wedged subprocesses that ignore signals. On hard cancel, the runner explicitly calls `AgentRuntime.releaseSession()` so the `activeSessions` bookkeeping isn't leaked by the orphan promise.

Budget: default 30 minutes (informed by real-world ticks averaging ~10 min on small/medium work), min 1 minute, max 60 minutes. Exposed on the MCP tool as `max_tick_duration_minutes` to match the existing `timeout_minutes` convention; ms is preserved internally where `setTimeout` needs it.

Timed-out ticks finalize with status `timed_out` and a diagnostic `last_error` including elapsed time, the configured limit, and the last tool that ran. `softTimerFired` is tracked explicitly so post-SDK exceptions can't be misclassified as timeouts.

The agent prompt now instructs wrapping `docker exec` with the host `timeout` utility, since signals don't propagate through `docker exec`.

Notes on implementation
- No `clamp()` in `runner.start()` for `maxTickDurationMs`. The Zod schema at the MCP tool boundary already enforces the 1-60 minute window; runner-level clamping would force soft/hard-timeout tests to wait a full minute. The runner trusts its single internal caller.
- `softTimerFired` is checked on the success path too, not just in the catch branch. `AgentRuntime.runQuery` swallows abort errors and returns a normal response with error text, so the race resolves cleanly when the soft timer fires. Without the extra check a timed-out tick would still `recordTick` and re-read the state file.
- A no-op `.catch` on the losing `messagePromise` so that a later rejection after the hard timeout wins the race can't become an unhandled-rejection warning.

Test plan
- `bun run typecheck` clean
- `bun run lint` clean
- `bun test`: 1011 pass, 0 fail (includes 5 new runner tests for the timeout paths)
- `phantom_loop.start` with a wedged shell command (e.g. `sleep 9999` inside `docker exec`), `max_tick_duration_minutes: 1`, `max_iterations: 1`. Verify the loop finalizes as `timed_out` within ~90s with a diagnostic `last_error`.

Included fix
This branch also cherry-picks 506485d (fix: make getListenerCount test resilient to leaked listeners) from PR #16 so CI is green against current `main`. No conflict expected when either PR merges first.