
feat: bound phantom_loop ticks with a wall-clock timeout#17

Merged
electronicBlacksmith merged 1 commit into main from fix/loop-tick-timeout
Apr 11, 2026
Conversation

@electronicBlacksmith
Owner

@electronicBlacksmith electronicBlacksmith commented Apr 10, 2026

Summary

A loop tick could hang indefinitely when the agent ran a command inside docker exec that ignored signals (e.g. pytest in a container). The SDK's async iterator never yielded, LoopRunner.tick() awaited forever, the inFlight set was never released, and every subsequent tick was silently dropped.

This PR bounds every tick in wall-clock time, regardless of what the agent or its subprocesses are doing.

Design

Two-layer cancel:

  1. Soft cancel: an AbortController plumbed through AgentRuntime.handleMessage into runQuery's internal controller. Cooperative path for slow-but-responsive ticks.
  2. Hard cancel: a Promise.race with a hard-cancel timer in LoopRunner.tick(). Escape hatch for wedged subprocesses that ignore signals. On hard cancel, the runner explicitly calls AgentRuntime.releaseSession() so the activeSessions bookkeeping isn't leaked by the orphan promise.
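
The two-layer shape can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: names like `tickWithTimeout` and the `runTick` callback are assumptions, and the real runner's bookkeeping (releaseSession, recordTick) is only noted in comments.

```typescript
// Sketch: a tick bounded by a soft (cooperative) and a hard (race) timeout.
async function tickWithTimeout(
  runTick: (signal: AbortSignal) => Promise<string>,
  softMs: number,
  hardGraceMs: number,
): Promise<{ status: "ok" | "timed_out"; result?: string }> {
  const controller = new AbortController();
  let softTimerFired = false;

  // Layer 1: soft cancel. Ask the tick to stop cooperatively via the signal.
  const softTimer = setTimeout(() => {
    softTimerFired = true;
    controller.abort();
  }, softMs);

  // Layer 2: hard cancel. Stop waiting on the promise entirely.
  let hardTimer: ReturnType<typeof setTimeout>;
  const hardCancel = new Promise<"hard_timeout">((resolve) => {
    hardTimer = setTimeout(() => resolve("hard_timeout"), softMs + hardGraceMs);
  });

  const tickPromise = runTick(controller.signal);
  // No-op catch: if the hard timer wins and the orphaned tick later rejects,
  // this keeps the runtime from reporting an unhandled rejection.
  tickPromise.catch(() => {});

  try {
    const winner = await Promise.race([tickPromise, hardCancel]);
    // Check softTimerFired on the success path too: a runtime that swallows
    // abort errors resolves the race cleanly even when the tick timed out.
    if (winner === "hard_timeout" || softTimerFired) {
      // The real runner would call releaseSession() here so activeSessions
      // bookkeeping isn't leaked by the orphan promise.
      return { status: "timed_out" };
    }
    return { status: "ok", result: winner };
  } finally {
    clearTimeout(softTimer);
    clearTimeout(hardTimer!);
  }
}
```

The key property is that the hard timer does not depend on the tick's promise ever settling, so even a signal-ignoring subprocess cannot wedge the loop.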

Budget: default 30 minutes (informed by real-world ticks averaging ~10 min on small/medium work), min 1 minute, max 60 minutes. Exposed on the MCP tool as max_tick_duration_minutes to match the existing timeout_minutes convention; ms is preserved internally where setTimeout needs it.
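
The budget handling described above might look roughly like this plain-TypeScript sketch. The real project enforces the window with a Zod schema at the MCP tool boundary; the constant and function names here are illustrative, not the actual code.

```typescript
// Sketch of the tick-budget bounds: default 30 min, clamped window 1-60 min.
const DEFAULT_TICK_MINUTES = 30;
const MIN_TICK_MINUTES = 1;
const MAX_TICK_MINUTES = 60;

function resolveMaxTickDurationMs(maxTickDurationMinutes?: number): number {
  const minutes = maxTickDurationMinutes ?? DEFAULT_TICK_MINUTES;
  if (minutes < MIN_TICK_MINUTES || minutes > MAX_TICK_MINUTES) {
    throw new RangeError(
      `max_tick_duration_minutes must be between ${MIN_TICK_MINUTES} and ${MAX_TICK_MINUTES}`,
    );
  }
  // Minutes at the tool boundary; milliseconds internally where setTimeout needs them.
  return minutes * 60_000;
}
```

Validating once at the boundary and passing milliseconds inward keeps the runner itself free of unit conversions.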

Timed-out ticks finalize with status timed_out and a diagnostic last_error including elapsed time, the configured limit, and the last tool that ran. softTimerFired is tracked explicitly so post-SDK exceptions (cost tracker, session touch) can't be misclassified as timeouts.
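
A diagnostic along those lines might be assembled like this. The function name, message wording, and field shapes are assumptions for illustration; only the three ingredients (elapsed time, configured limit, last tool) come from the PR.

```typescript
// Sketch: build the timed_out diagnostic from the three facts the PR names.
function buildTimeoutError(
  elapsedMs: number,
  limitMs: number,
  lastTool: string | null,
): string {
  const elapsedMin = (elapsedMs / 60_000).toFixed(1);
  const limitMin = Math.round(limitMs / 60_000);
  return `tick timed out after ${elapsedMin}m (limit ${limitMin}m); last tool: ${lastTool ?? "none"}`;
}
```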

The agent prompt now instructs wrapping docker exec with the host timeout utility, since signals don't propagate through docker exec.

Notes on implementation

  • No clamp in runner.start() for maxTickDurationMs. The Zod schema at the MCP tool boundary already enforces the 1-60 minute window; runner-level clamping would force soft/hard-timeout tests to wait a full minute. The runner trusts its single internal caller.
  • softTimerFired is checked on the success path too, not just in the catch branch. AgentRuntime.runQuery swallows abort errors and returns a normal response with error text, so the race resolves cleanly when the soft timer fires. Without the extra check a timed-out tick would still recordTick + re-read the state file.
  • No-op .catch on the losing messagePromise so that a later rejection after the hard-timeout wins the race can't become an unhandled-rejection warning.
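
The no-op .catch point in the last bullet can be isolated into a minimal sketch. `raceWithHardTimeout` is a hypothetical helper, not the project's API; it only demonstrates why the losing promise needs a rejection handler.

```typescript
// Sketch: race work against a hard timeout without leaking a late rejection.
function raceWithHardTimeout<T>(
  work: Promise<T>,
  ms: number,
): Promise<T | "timed_out"> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<"timed_out">((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), ms);
  });
  // No-op catch: if the timeout wins and `work` rejects later, this handler
  // marks the rejection as observed, so no unhandled-rejection warning fires.
  work.catch(() => {});
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer!));
}
```

Promise.race subscribes to `work` independently, so the extra handler does not change which branch wins; it only absorbs the orphaned rejection after the race has settled.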

Test plan

  • bun run typecheck clean
  • bun run lint clean
  • bun test — 1011 pass, 0 fail (includes 5 new runner tests for the timeout paths)
  • Manual smoke: phantom_loop.start with a wedged shell command (e.g. sleep 9999 inside docker exec), max_tick_duration_minutes: 1, max_iterations: 1. Verify the loop finalizes as timed_out within ~90s with a diagnostic last_error.

Included fix

This branch also cherry-picks 506485d (fix: make getListenerCount test resilient to leaked listeners) from PR #16 so CI stays green against current main. No conflict is expected whichever PR merges first.

@electronicBlacksmith electronicBlacksmith merged commit 4f01081 into main Apr 11, 2026
1 check passed
@electronicBlacksmith electronicBlacksmith deleted the fix/loop-tick-timeout branch April 11, 2026 04:56
