diff --git a/CHANGELOG.md b/CHANGELOG.md index d7d7ac0380..1e328e0edb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,58 @@ # Changelog +## [1.43.3.0] - 2026-05-21 + +## **Headed Chromium embedded by external supervisors stops auto-shutting-down after 30 minutes of HTTP idle.** +## **Four module-level lifecycle handlers in `browse/src/server.ts` now read through an `activeBrowserManager` indirection so embedders (gbrowser's phoenix overlay) reach the right `BrowserManager` instance instead of the dead module-level one.** + +The dual-instance bug surfaced when a Codex plan review caught what the static eng review missed: `idleCheckTick`, the parent-process watchdog, the SIGTERM handler, and `onDisconnect` wiring all read the module-level `BrowserManager` directly. Embedders pass their own instance into `buildFetchHandler({ browserManager: ... })`, so the module-level instance never has `launchHeaded()` called on it. Its `connectionMode` stays `'launched'` forever, headed-mode early-returns never fire, and after 30 minutes of HTTP idle the server kills itself out from under a still-open overlay window. The onDisconnect leak — window-close cleanup running against the wrong instance — was masked by the 30-min auto-shutdown until this fix; both ship together because they share a single root cause. + +The fix introduces `let activeBrowserManager: BrowserManager` at module scope, symmetric with the existing `let activeShutdown` pattern. `buildFetchHandler` retargets it at `cfg.browserManager` and CHAINS `cfg.browserManager.onDisconnect` to `activeShutdown` instead of overwriting any handler the caller already installed. Caller exceptions are logged but never block gstack shutdown — defensive symmetry with `safeUnlinkQuiet` / `safeKill` in `error-handling.ts`. Caller-set onDisconnect handlers run first so embedders can snapshot or log before the process exits; gstack's shutdown owns `process.exit(code)` and runs last. + +### The numbers that matter + +Source: `bun test browse/test/server-factory.test.ts` — 33 tests, all green. New describe block `idle timer + onDisconnect dual-instance fix` pins five behavioral guarantees plus a static guard. + +| Surface | Before | After | +|---|---|---| +| gbrowser overlay session, headed, 31 min HTTP idle | Server self-terminates; overlay window orphaned | Server stays alive; idleCheckTick reads cfg-instance and returns early | +| Headless CLI, 31 min idle | Auto-shutdown (regression-protected by Test 2) | Same behavior, regression test added | +| Tunnel-active session, headless, 31 min idle | Auto-shutdown skipped (already correct) | Same; Test 4 pins it behaviorally | +| Window-close on embedder-owned headed window | `browserManager.onDisconnect` fires on dead module-level instance; no cleanup | `cfgBrowserManager.onDisconnect` chained to activeShutdown; full cleanup runs | +| Embedder pre-installed onDisconnect handler | Silently overwritten by `buildFetchHandler` | Chained: caller's handler runs first, then gstack shutdown | +| SIGTERM in headed mode (embedder) | Reads stale module-level instance (Codex-caught, original plan missed) | Reads via `activeBrowserManager` | + +The static guard (Test 5) counts `activeBrowserManager.getConnectionMode()` calls outside `buildFetchHandler` and pins the count at exactly 3 — `idleCheckTick`, the parent watchdog, and the SIGTERM handler. A future refactor that reintroduces a stale read against module-level `browserManager` at one of those sites fails CI before the user-visible bug returns. + +### What this means for gbrowser + +gbrowser's phoenix overlay can hold a headed Chromium window open indefinitely without gstack pulling the rug out at the 30-minute mark. Window-close cleanup reaches the right `BrowserManager` instance, so terminal-agent, profile locks, and state files all get torn down against the cfg-owned chrome rather than the dead module-level one. Embedders that pre-wire `cfg.browserManager.onDisconnect` for their own pre-shutdown work (logging, snapshotting, gbd handoff) now have that handler preserved instead of clobbered. gbrowser bumps its gstack submodule SHA after this lands; no gbrowser-side code changes required. + +### Itemized changes + +#### Fixed + +- **`browse/src/server.ts`**: Six edit sites apply the indirection. + - Edit 1 (line ~705): Declared `let activeBrowserManager: BrowserManager = browserManager;` alongside the module-level `const browserManager`. Module-level `browserManager.onDisconnect` default wire stays in place as the safety net for the CLI flow before `buildFetchHandler` runs. + - Edit 2 (line ~596): Extracted the idle-check setInterval callback into a named `idleCheckTick()` function so behavioral tests can drive it directly. Reads `activeBrowserManager.getConnectionMode()`. + - Edit 3 (line ~658): Parent watchdog now reads `activeBrowserManager.getConnectionMode()`. + - Edit 4 (inside `buildFetchHandler`, line ~1387): Retargets `activeBrowserManager` at `cfgBrowserManager` and CHAINS the cfg-instance's onDisconnect to `activeShutdown` (preserving any caller-installed handler). Replaces what would have been a bare `cfg.onDisconnect = ...` clobber — caught by Codex against an earlier draft. + - Edit 5 (no code change): Confirmed the module-level `browserManager.onDisconnect` at line 714 stays in place. + - Edit 6 (line ~1212): SIGTERM handler reads `activeBrowserManager.getConnectionMode()`. Caught by Codex; the original eng-review plan missed this fourth lifecycle site. +- **`__testInternals__` export**: New test-only surface in `browse/src/server.ts` exposing `idleCheckTick`, `setTunnelActive`, `setLastActivity`, and `resetShutdownState`. Lets tests exercise the dual-instance behavior deterministically without mutating `Date.now` globally (which would interact with the leaked module-level setInterval) or leaking `isShuttingDown` state between tests. + +#### Added + +- **`browse/test/server-factory.test.ts`**: New `idle timer + onDisconnect dual-instance fix` describe block with five behavioral tests. Reuses the existing `makeMinimalConfig()` + `__resetRegistry()` patterns from the factory contract tests; new `makeMockBrowserManager()` helper. Tests T1 (REGRESSION — headed embedder does not auto-shutdown), T2 (paired defensive — headless still shuts down), T3 (chain semantics — caller-set onDisconnect preserved + async via `.rejects.toThrow`), T4 (tunnelActive blocks shutdown), T5 (static guard — exactly 3 lifecycle sites use the indirection). + +#### Changed + +- **`browse/test/sidebar-ux.test.ts`**: Deleted the old `idle check skips in headed mode` string-grep test at line 1596 — it grepped for `=== 'headed'` + `return` and would have passed even with the dual-instance bug present. Behavioral coverage moved to `server-factory.test.ts` per Codex finding (duplicating partial test helpers across files rots; the factory test file already solved minimal-cfg + registry-reset). + +#### For contributors + +- **Cross-model review note**: The eng review's static-assessment pass said "0 issues" in Architecture, Code Quality, and Performance. Codex's plan review then grounded six issues in actual code reads: Bun memoizes dynamic imports (so `await import('../src/server')` doesn't give fresh module state per test), `initRegistry` throws on token-reuse between tests, `shutdown()` is async (sync `.toThrow()` cannot catch the rejection), `cfg.browserManager.onDisconnect` is a public field that callers may set, the original plan missed the SIGTERM site at line 1186, and tests belong in `server-factory.test.ts` not `sidebar-ux.test.ts`. All six were verified against the actual code and incorporated into the shipped plan. The static eng review's blind spot here was runtime/module-cache semantics; the lesson is that "0 issues" from a static pass is a weaker signal than two-model consensus. + ## [1.43.2.0] - 2026-05-21 ## **Three flagship workflows stop lying to users: /retro detects stale base before fabricating a narrative, /sync-gbrain resumes from gbrain's checkpoint instead of restarting the 35-min import loop, and /review forces every finding to quote the code line that motivates it.** @@ -93,6 +146,7 @@ If you run `/retro` on a Conductor branch that's been around for a few days, the - Defer-doc artifact `~/.gstack-dev/plans/1539-framework-aware-review.md` describes the multi-week framework-aware ORM verification extension (Django/Rails/SQLAlchemy detection, model-introspection helpers, migration-history-aware checks) intentionally deferred from this wave. Promote to active plan when v1.43.0.0 ships and a second high-volume FP report lands on a different framework, or a follow-up retro shows the lighter quoted-line gate doesn't deliver measurable FP reduction. - Wave shape preserved from Daegu pattern: ONE bundled PR with bisect commits, atomic squashed commits for `.tmpl` edit + `gen:skill-docs` regen pairs, intermediate verification checkpoints, original contributors credited in commit author + footer. See `[[feedback_one_pr_fix_waves]]` in agent memory. + ## [1.43.1.0] - 2026-05-21 ## **Local gbrain PGLite now defaults to Voyage's code-specialized embedding model when `VOYAGE_API_KEY` is set.** @@ -141,6 +195,7 @@ If you have `VOYAGE_API_KEY` set and run `/setup-gbrain` on a fresh machine, `gb **Regenerated** - `setup-gbrain/SKILL.md`, `sync-gbrain/SKILL.md` — refreshed via `bun run gen:skill-docs --host all` after the template edits + ## [1.43.0.0] - 2026-05-20 ## **iOS QA on a real iPhone — no XCTest, no WebDriverAgent, no simulators.** diff --git a/VERSION b/VERSION index d6b96738e8..f2c85ebe28 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.43.2.0 +1.43.3.0 diff --git a/browse/src/server.ts b/browse/src/server.ts index 05db6665b0..afeb5a09e9 100644 --- a/browse/src/server.ts +++ b/browse/src/server.ts @@ -590,17 +590,39 @@ function resetIdleTimer() { lastActivity = Date.now(); } -const idleCheckInterval = setInterval(() => { +// Named for behavioral testing via __testInternals__. The factory tests in +// server-factory.test.ts call this directly so the idle-shutdown path can be +// exercised without waiting 60s for the interval to fire. +function idleCheckTick() { // Headed mode: the user is looking at the browser. Never auto-die. // Only shut down when the user explicitly disconnects or closes the window. - if (browserManager.getConnectionMode() === 'headed') return; + // Reads via the activeBrowserManager indirection so embedders that pass + // their own BrowserManager into buildFetchHandler hit the right instance. + if (activeBrowserManager.getConnectionMode() === 'headed') return; // Tunnel mode: remote agents may send commands sporadically. Never auto-die. if (tunnelActive) return; if (Date.now() - lastActivity > IDLE_TIMEOUT_MS) { console.log(`[browse] Idle for ${IDLE_TIMEOUT_MS / 1000}s, shutting down`); activeShutdown?.(); } -}, 60_000); +} +const idleCheckInterval = setInterval(idleCheckTick, 60_000); + +// Test-only surface for server-factory.test.ts. Lets the dual-instance +// idle-timer behavior be exercised deterministically without mutating +// Date.now (which would interact with the leaked module-level setInterval). +// Production code must never import this — see `idle timer + onDisconnect +// dual-instance fix` describe block for usage. +export const __testInternals__ = { + idleCheckTick, + setTunnelActive: (v: boolean) => { tunnelActive = v; }, + setLastActivity: (t: number) => { lastActivity = t; }, + // Reset the module-level shutdown latch so tests that drive shutdown to + // completion (process.exit-stubbed) can be followed by tests that also + // need shutdown to fire. Without this, the second test's shutdown + // returns early at the `if (isShuttingDown) return;` guard. + resetShutdownState: () => { isShuttingDown = false; }, +}; // ─── Parent-Process Watchdog ──────────────────────────────────────── // When the spawning CLI process (e.g. a Claude Code session) exits, this @@ -638,7 +660,7 @@ if (BROWSE_PARENT_PID > 0 && !IS_HEADED_WATCHDOG) { // the parent shell between invocations. The idle timeout (30 min) // handles eventual cleanup. if (hasActivePicker()) return; - const headed = browserManager.getConnectionMode() === 'headed'; + const headed = activeBrowserManager.getConnectionMode() === 'headed'; if (headed || tunnelActive) { console.log(`[browse] Parent process ${BROWSE_PARENT_PID} exited in ${headed ? 'headed' : 'tunnel'} mode, shutting down`); activeShutdown?.(); @@ -678,13 +700,22 @@ function emitInspectorEvent(event: any): void { // ─── Server ──────────────────────────────────────────────────── const browserManager = new BrowserManager(); +// Indirection for embedders. Module-level handlers (idleCheckTick, parent +// watchdog, SIGTERM) read activeBrowserManager so that buildFetchHandler can +// retarget them at a caller-supplied BrowserManager. Symmetric with the +// existing `let activeShutdown` pattern at module scope (line ~113). +// Without this, embedders like gbrowser hit the dead module-level instance +// whose connectionMode never leaves 'launched' — and headed mode never +// short-circuits idle-shutdown. +let activeBrowserManager: BrowserManager = browserManager; // When the user closes the headed browser window, run full cleanup // (kill sidebar-agent, save session, remove profile locks, delete state file) // before exiting. Exit code 0 means user-initiated clean quit (Cmd+Q on // macOS) so process supervisors like gbrowser's gbd skip the restart loop; // 2 means a real crash that should respawn. The fallback `?? 2` preserves // legacy crash semantics for any caller that invokes onDisconnect without -// an explicit code. +// an explicit code. This is the safety-net default for the CLI flow before +// any buildFetchHandler call rebinds onDisconnect onto the cfg instance. browserManager.onDisconnect = (code) => activeShutdown?.(code ?? 2); let isShuttingDown = false; @@ -1183,7 +1214,7 @@ if (import.meta.main) { console.log('[browse] Received SIGTERM but cookie picker is active, ignoring to avoid stranding the picker UI'); return; } - const headed = browserManager.getConnectionMode() === 'headed'; + const headed = activeBrowserManager.getConnectionMode() === 'headed'; if (headed || tunnelActive) { console.log(`[browse] Received SIGTERM in ${headed ? 'headed' : 'tunnel'} mode, shutting down`); activeShutdown?.(); @@ -1360,6 +1391,31 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle { // differs from the module-level instance. activeShutdown = shutdown; + // Retarget the BrowserManager indirection at the cfg-instance so the + // module-level idleCheckTick + parent watchdog + SIGTERM handler all read + // the right connectionMode. Without this, headed embedders auto-shutdown + // after 30 min of HTTP idle because the dead module-level instance still + // reports connectionMode === 'launched'. + activeBrowserManager = cfgBrowserManager; + + // Wire the cfg-instance's onDisconnect to run shutdown when the user + // closes the headed browser window. CHAIN any caller-provided handler + // instead of overwriting it: gbrowser may have set its own onDisconnect + // before calling buildFetchHandler (e.g. for snapshot/log work that needs + // to run before the process exits). Caller errors are logged but never + // block gstack shutdown — defensive symmetry with the safeUnlinkQuiet / + // safeKill philosophy in error-handling.ts. + const callerOnDisconnect = cfgBrowserManager.onDisconnect; + cfgBrowserManager.onDisconnect = async (code) => { + if (callerOnDisconnect) { + try { await callerOnDisconnect(code); } + catch (err: any) { + console.warn('[browse] caller onDisconnect threw:', err?.message ?? err); + } + } + await activeShutdown?.(code ?? 2); + }; + // Substitute cfgBrowserManager for module-level browserManager in the // dispatcher body so all browser-state reads/writes go through the cfg // instance. Other module-level references (handleCommand, getTokenInfo, diff --git a/browse/test/server-factory.test.ts b/browse/test/server-factory.test.ts index bf9d3f7fcf..6b5feb2641 100644 --- a/browse/test/server-factory.test.ts +++ b/browse/test/server-factory.test.ts @@ -1,7 +1,8 @@ -import { describe, test, expect, beforeEach } from 'bun:test'; +import { describe, test, expect, beforeEach, mock } from 'bun:test'; import { resolveConfigFromEnv, buildFetchHandler, + __testInternals__, type ServerConfig, type ServerHandle, type Surface, @@ -11,6 +12,8 @@ import { __resetRegistry, initRegistry } from '../src/token-registry'; import { BrowserManager } from '../src/browser-manager'; import { resolveConfig } from '../src/config'; import * as crypto from 'crypto'; +import * as fs from 'node:fs'; +import * as path from 'node:path'; /** * Tests for the factory-export API surface added so gbrowser (phoenix) can @@ -381,3 +384,141 @@ describe('buildFetchHandler factory contract', () => { expect(() => initRegistry('second-token-pad-to-16-chars')).toThrow(/already initialized/i); }); }); + +// ─── Idle timer + onDisconnect dual-instance fix (v1.42.3.0) ────────── +// +// Before this fix, module-level handlers (idleCheckTick, parent watchdog, +// SIGTERM, onDisconnect default wire) all read the module-level +// BrowserManager directly. For embedders (gbrowser) that pass their own +// BrowserManager into buildFetchHandler, the module-level instance never +// has launchHeaded() called on it — so connectionMode stays 'launched' +// forever and headed mode never short-circuits idle-shutdown. Result: +// 30-min auto-shutdown of overlay sessions. +// +// Fix: introduce `let activeBrowserManager` indirection (symmetric with +// the existing `let activeShutdown` pattern). buildFetchHandler retargets +// it at cfg.browserManager AND chains cfg.browserManager.onDisconnect to +// activeShutdown (without clobbering any caller-provided handler). + +function makeMockBrowserManager(mode: 'launched' | 'headed') { + return { + getConnectionMode: () => mode, + isWatching: () => false, + stopWatch: () => {}, + close: async () => {}, + onDisconnect: null as ((code?: number) => void | Promise) | null, + }; +} + +describe('idle timer + onDisconnect dual-instance fix', () => { + beforeEach(() => { + __resetRegistry(); + // Reset module state every test. Bun memoizes the server.ts module + // import for the whole test process, so `lastActivity`, `tunnelActive`, + // `activeShutdown`, `activeBrowserManager`, and `isShuttingDown` leak + // between tests. We reset what we touch here; the rest is fresh + // because each test calls buildFetchHandler with a new mock instance. + __testInternals__.setTunnelActive(false); + __testInternals__.setLastActivity(Date.now()); + __testInternals__.resetShutdownState(); + }); + + test('CRITICAL — REGRESSION: headed embedder does not auto-shutdown at idle', () => { + const exitMock = mock((_code?: number) => { throw new Error('process.exit called'); }); + const originalExit = process.exit; + (process as any).exit = exitMock; + try { + const mockBM = makeMockBrowserManager('headed'); + buildFetchHandler(makeMinimalConfig({ browserManager: mockBM as any })); + // Drive lastActivity past the idle threshold via the test seam instead + // of mutating Date.now — the leaked module-level setInterval would + // see fake-time and could fire shutdown if the timing aligned. + __testInternals__.setLastActivity(Date.now() - (31 * 60 * 1000)); + __testInternals__.idleCheckTick(); + expect(exitMock).not.toHaveBeenCalled(); + } finally { + (process as any).exit = originalExit; + } + }); + + test('headless still auto-shuts down at idle (paired defensive)', async () => { + // Non-throwing mock: idleCheckTick fires shutdown as a fire-and-forget + // async call. Throwing from process.exit becomes an unhandled rejection + // that the test runner catches. Recording the call is enough. + const exitMock = mock((_code?: number) => {}); + const originalExit = process.exit; + (process as any).exit = exitMock; + try { + const mockBM = makeMockBrowserManager('launched'); + buildFetchHandler(makeMinimalConfig({ browserManager: mockBM as any })); + __testInternals__.setLastActivity(Date.now() - (31 * 60 * 1000)); + __testInternals__.idleCheckTick(); + // Drain microtasks: shutdown awaits flushBuffers + cfgBrowserManager.close + // before reaching process.exit. + await Promise.resolve(); + await Promise.resolve(); + await new Promise(r => setImmediate(r)); + await new Promise(r => setImmediate(r)); + expect(exitMock).toHaveBeenCalled(); + } finally { + (process as any).exit = originalExit; + } + }); + + test('buildFetchHandler chains cfgBrowserManager.onDisconnect, preserving caller-set handler', async () => { + const mockBM = makeMockBrowserManager('headed'); + const callerCb = mock(async (_code?: number) => {}); + mockBM.onDisconnect = callerCb; + buildFetchHandler(makeMinimalConfig({ browserManager: mockBM as any })); + // gstack should have wrapped the caller-installed handler instead of + // clobbering it (Codex finding: BrowserManager.onDisconnect is a public + // field; gbrowser may set it before calling buildFetchHandler). + expect(typeof mockBM.onDisconnect).toBe('function'); + expect(mockBM.onDisconnect).not.toBe(callerCb); + // Verify the chain: invoking the wrapped handler runs the caller + // callback AND reaches activeShutdown (which calls process.exit at the + // very end of its async path). Stubbing process.exit to throw aborts + // the chain before isShuttingDown can leak into later tests. + const exitMock = mock((_code?: number) => { throw new Error('process.exit called'); }); + const originalExit = process.exit; + (process as any).exit = exitMock; + try { + await expect((mockBM.onDisconnect as any)(0)).rejects.toThrow('process.exit called'); + expect(callerCb).toHaveBeenCalledWith(0); + expect(exitMock).toHaveBeenCalledWith(0); + } finally { + (process as any).exit = originalExit; + } + }); + + test('tunnelActive blocks idle-shutdown even in headless mode', () => { + const exitMock = mock((_code?: number) => { throw new Error('process.exit called'); }); + const originalExit = process.exit; + (process as any).exit = exitMock; + try { + const mockBM = makeMockBrowserManager('launched'); + buildFetchHandler(makeMinimalConfig({ browserManager: mockBM as any })); + __testInternals__.setTunnelActive(true); + __testInternals__.setLastActivity(Date.now() - (31 * 60 * 1000)); + __testInternals__.idleCheckTick(); + expect(exitMock).not.toHaveBeenCalled(); + } finally { + (process as any).exit = originalExit; + } + }); + + test('lifecycle handlers (idleCheckTick + parent watchdog + SIGTERM) read activeBrowserManager, not module-level browserManager', () => { + // Static guard against a future refactor reintroducing a stale read. + // The 3 lifecycle sites this plan fixed all call getConnectionMode via + // the indirection. Other module-level browserManager reads inside + // handleCommandInternalImpl (informational mode reporting in response + // payloads) are out of scope and intentionally untouched. + const src = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8'); + const factoryStart = src.indexOf('export function buildFetchHandler'); + expect(factoryStart).toBeGreaterThan(0); + const moduleLevel = src.slice(0, factoryStart); + const activeCount = (moduleLevel.match(/activeBrowserManager\.getConnectionMode\(\)/g) || []).length; + // Edit 2 (idleCheckTick), Edit 3 (parent watchdog), Edit 6 (SIGTERM). + expect(activeCount).toBe(3); + }); +}); diff --git a/browse/test/sidebar-ux.test.ts b/browse/test/sidebar-ux.test.ts index 1ae3feabea..74ced5efd0 100644 --- a/browse/test/sidebar-ux.test.ts +++ b/browse/test/sidebar-ux.test.ts @@ -1589,19 +1589,17 @@ describe('tool calls collapse into reasoning disclosure', () => { }); // ─── Idle timeout disabled in headed mode (server.ts) ─────────── +// +// The original 'idle check skips in headed mode' string-grep test was deleted +// in v1.42.3.0 — it would have passed even with the dual-instance bug present +// because it only grepped for "=== 'headed'" + 'return' in the same window. +// Behavioral coverage lives in browse/test/server-factory.test.ts under the +// 'idle timer + onDisconnect dual-instance fix' describe block, which +// exercises the headed/headless/tunnel branches of idleCheckTick directly. describe('idle timeout behavior (server.ts)', () => { const serverSrc = fs.readFileSync(path.join(ROOT, 'src', 'server.ts'), 'utf-8'); - test('idle check skips in headed mode', () => { - const idleCheck = serverSrc.slice( - serverSrc.indexOf('idleCheckInterval'), - serverSrc.indexOf('idleCheckInterval') + 300, - ); - expect(idleCheck).toContain("=== 'headed'"); - expect(idleCheck).toContain('return'); - }); - test('sidebar-command resets idle timer', () => { const sidebarCmd = serverSrc.slice( serverSrc.indexOf("url.pathname === '/sidebar-command'"),