
feat: bound phantom_loop ticks with a wall-clock timeout#17

Merged
electronicBlacksmith merged 1 commit into main from fix/loop-tick-timeout
Apr 11, 2026
Conversation

@electronicBlacksmith
Owner

@electronicBlacksmith electronicBlacksmith commented Apr 10, 2026

Summary

A loop tick could hang indefinitely when the agent ran a command inside docker exec that ignored signals (e.g. pytest in a container). The SDK's async iterator never yielded, LoopRunner.tick() awaited forever, the inFlight set was never released, and every subsequent tick was silently dropped.

This PR bounds every tick in wall-clock time, regardless of what the agent or its subprocesses are doing.

Design

Two-layer cancel:

  1. Soft cancel: an AbortController plumbed through AgentRuntime.handleMessage into runQuery's internal controller. Cooperative path for slow-but-responsive ticks.
  2. Hard cancel: a Promise.race with a hard-cancel timer in LoopRunner.tick(). Escape hatch for wedged subprocesses that ignore signals. On hard cancel, the runner explicitly calls AgentRuntime.releaseSession() so the activeSessions bookkeeping isn't leaked by the orphan promise.
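
The two-layer shape can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: names like `tickWithTimeout` and the `runTick` callback are assumptions, and the real runner's bookkeeping (releaseSession, recordTick) is only noted in comments.

```typescript
// Sketch: a tick bounded by a soft (cooperative) and a hard (race) timeout.
async function tickWithTimeout(
  runTick: (signal: AbortSignal) => Promise<string>,
  softMs: number,
  hardGraceMs: number,
): Promise<{ status: "ok" | "timed_out"; result?: string }> {
  const controller = new AbortController();
  let softTimerFired = false;

  // Layer 1: soft cancel. Ask the tick to stop cooperatively via the signal.
  const softTimer = setTimeout(() => {
    softTimerFired = true;
    controller.abort();
  }, softMs);

  // Layer 2: hard cancel. Stop waiting on the promise entirely.
  let hardTimer: ReturnType<typeof setTimeout>;
  const hardCancel = new Promise<"hard_timeout">((resolve) => {
    hardTimer = setTimeout(() => resolve("hard_timeout"), softMs + hardGraceMs);
  });

  const tickPromise = runTick(controller.signal);
  // No-op catch: if the hard timer wins and the orphaned tick later rejects,
  // this keeps the runtime from reporting an unhandled rejection.
  tickPromise.catch(() => {});

  try {
    const winner = await Promise.race([tickPromise, hardCancel]);
    // Check softTimerFired on the success path too: a runtime that swallows
    // abort errors resolves the race cleanly even when the tick timed out.
    if (winner === "hard_timeout" || softTimerFired) {
      // The real runner would call releaseSession() here so activeSessions
      // bookkeeping isn't leaked by the orphan promise.
      return { status: "timed_out" };
    }
    return { status: "ok", result: winner };
  } finally {
    clearTimeout(softTimer);
    clearTimeout(hardTimer!);
  }
}
```

The key property is that the hard timer does not depend on the tick's promise ever settling, so even a signal-ignoring subprocess cannot wedge the loop.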

Budget: default 30 minutes (informed by real-world ticks averaging ~10 min on small/medium work), min 1 minute, max 60 minutes. Exposed on the MCP tool as max_tick_duration_minutes to match the existing timeout_minutes convention; ms is preserved internally where setTimeout needs it.
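
The budget handling described above might look roughly like this plain-TypeScript sketch. The real project enforces the window with a Zod schema at the MCP tool boundary; the constant and function names here are illustrative, not the actual code.

```typescript
// Sketch of the tick-budget bounds: default 30 min, clamped window 1-60 min.
const DEFAULT_TICK_MINUTES = 30;
const MIN_TICK_MINUTES = 1;
const MAX_TICK_MINUTES = 60;

function resolveMaxTickDurationMs(maxTickDurationMinutes?: number): number {
  const minutes = maxTickDurationMinutes ?? DEFAULT_TICK_MINUTES;
  if (minutes < MIN_TICK_MINUTES || minutes > MAX_TICK_MINUTES) {
    throw new RangeError(
      `max_tick_duration_minutes must be between ${MIN_TICK_MINUTES} and ${MAX_TICK_MINUTES}`,
    );
  }
  // Minutes at the tool boundary; milliseconds internally where setTimeout needs them.
  return minutes * 60_000;
}
```

Validating once at the boundary and passing milliseconds inward keeps the runner itself free of unit conversions.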

Timed-out ticks finalize with status timed_out and a diagnostic last_error including elapsed time, the configured limit, and the last tool that ran. softTimerFired is tracked explicitly so post-SDK exceptions (cost tracker, session touch) can't be misclassified as timeouts.
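
A diagnostic along those lines might be assembled like this. The function name, message wording, and field shapes are assumptions for illustration; only the three ingredients (elapsed time, configured limit, last tool) come from the PR.

```typescript
// Sketch: build the timed_out diagnostic from the three facts the PR names.
function buildTimeoutError(
  elapsedMs: number,
  limitMs: number,
  lastTool: string | null,
): string {
  const elapsedMin = (elapsedMs / 60_000).toFixed(1);
  const limitMin = Math.round(limitMs / 60_000);
  return `tick timed out after ${elapsedMin}m (limit ${limitMin}m); last tool: ${lastTool ?? "none"}`;
}
```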

The agent prompt now instructs wrapping docker exec with the host timeout utility, since signals don't propagate through docker exec.

Notes on implementation

  • No clamp in runner.start() for maxTickDurationMs. The Zod schema at the MCP tool boundary already enforces the 1-60 minute window; runner-level clamping would force soft/hard-timeout tests to wait a full minute. The runner trusts its single internal caller.
  • softTimerFired is checked on the success path too, not just in the catch branch. AgentRuntime.runQuery swallows abort errors and returns a normal response with error text, so the race resolves cleanly when the soft timer fires. Without the extra check a timed-out tick would still recordTick + re-read the state file.
  • No-op .catch on the losing messagePromise so that a later rejection after the hard-timeout wins the race can't become an unhandled-rejection warning.
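
The no-op .catch point in the last bullet can be isolated into a minimal sketch. `raceWithHardTimeout` is a hypothetical helper, not the project's API; it only demonstrates why the losing promise needs a rejection handler.

```typescript
// Sketch: race work against a hard timeout without leaking a late rejection.
function raceWithHardTimeout<T>(
  work: Promise<T>,
  ms: number,
): Promise<T | "timed_out"> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<"timed_out">((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), ms);
  });
  // No-op catch: if the timeout wins and `work` rejects later, this handler
  // marks the rejection as observed, so no unhandled-rejection warning fires.
  work.catch(() => {});
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer!));
}
```

Promise.race subscribes to `work` independently, so the extra handler does not change which branch wins; it only absorbs the orphaned rejection after the race has settled.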

Test plan

  • bun run typecheck clean
  • bun run lint clean
  • bun test — 1011 pass, 0 fail (includes 5 new runner tests for the timeout paths)
  • Manual smoke: phantom_loop.start with a wedged shell command (e.g. sleep 9999 inside docker exec), max_tick_duration_minutes: 1, max_iterations: 1. Verify the loop finalizes as timed_out within ~90s with a diagnostic last_error.

Included fix

This branch also cherry-picks 506485d (fix: make getListenerCount test resilient to leaked listeners) from PR #16 so CI stays green against current main. No conflict is expected whichever PR merges first.

@electronicBlacksmith electronicBlacksmith merged commit 4f01081 into main Apr 11, 2026
1 check passed
@electronicBlacksmith electronicBlacksmith deleted the fix/loop-tick-timeout branch April 11, 2026 04:56
