Symptom
After DOMShell has been idle for hours/days (no MCP traffic, Claude Desktop wasn't actively driving the browser), the next domshell_execute call from a client hits an MCP -32001 Request timed out. The container is healthy, the WS socket on port 9876 still reads ESTABLISHED in netstat, the side panel says "connected (authenticated)" — but commands sent over the WS never get a response. Only fix observed: restart the thv proxy AND the Chrome extension (both — neither alone reliably recovers).
Reproduced live on 2026-06-12 against @apireno/domshell@2.0.3 running thv-managed (thv list shows workload domshell-mcp-server, Up 3 days). The container log shows ~10 pairs of "MCP session initialized" followed by "→ sending SESSION_START to extension", with no acknowledgement back from the extension between them. After a manual restart of the extension plus a proxy kick, the log shows "Extension disconnected" → "Extension connected (authenticated)", everything works again, and the next pwd returns in ~58 ms.
Root cause
The keepalive in src/background/index.ts:696-708 is one-sided:
    chrome.alarms.onAlarm.addListener((alarm) => {
      if (alarm.name !== KEEPALIVE_ALARM) return;
      // The mere act of this listener firing wakes the service worker.
      if (ws?.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify({ type: "pong" })); // one-way send
      }
      if (wsEnabled && wsToken && (!ws || ws.readyState !== WebSocket.OPEN) && !wsReconnectTimer) {
        wsConnect();
      }
    });
The extension sends a "pong" and considers itself healthy as long as ws.readyState is WebSocket.OPEN. It never confirms that anything actually comes back from the server. The server side (mcp-server/index.ts) is equally passive: it only sends EXECUTE messages, never heartbeats.
When the underlying TCP pipe goes "zombie" (TCP layer ESTABLISHED but WS framing or JS handler wedged — common during MV3 service-worker suspension cycles + macOS network sleep transitions), ws.send() succeeds at the TCP layer (acked by the kernel) but never reaches the server's WS event handler. Both sides keep WebSocket.OPEN true. The keepalive sees "OPEN" and skips reconnect. The user has to manually intervene.
Why the chrome.alarms keepalive isn't enough
The 24-second alarm DOES successfully wake the service worker (verified by the listener firing). But the alarm only checks ws.readyState, which is a JS-level state that doesn't reflect whether the WS framing layer is actually functional. The alarm is preventing SW suspension correctly; it's just doing the wrong liveness check.
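To make the gap concrete, here is a minimal model of the reconnect decision, reduced to the one field the alarm listener inspects. The names SocketView and shouldReconnectToday are illustrative and not from the codebase, and the wsToken and wsReconnectTimer conditions are left out for brevity.

```typescript
// WebSocket.OPEN and WebSocket.CLOSED have these numeric values in browsers.
const OPEN = 1;
const CLOSED = 3;

// Hypothetical stand-in for the socket: only readyState matters to the check.
interface SocketView {
  readyState: number;
}

// Simplified mirror of the existing check: reconnect only when the socket
// is missing or not OPEN.
function shouldReconnectToday(ws: SocketView | null, wsEnabled: boolean): boolean {
  return wsEnabled && (ws === null || ws.readyState !== OPEN);
}

// A zombie socket: nothing has come back for hours, yet readyState is OPEN.
const zombie: SocketView = { readyState: OPEN };
const closed: SocketView = { readyState: CLOSED };

console.log(shouldReconnectToday(zombie, true)); // false: the wedge goes unnoticed
console.log(shouldReconnectToday(closed, true)); // true: only a visible close recovers
```

Nothing in this decision depends on inbound traffic, so a socket that is OPEN but silent can never trigger a reconnect.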
Proposed fix (two-sided)
- Server-side periodic PING: mcp-server/index.ts sends { "type": "PING", "id": <timestamp> } over the WS every ~20 seconds. Track per-connection.
- Extension-side PONG reply + last-seen tracking: in the existing onmessage handler, recognize PING, reply with PONG, AND update lastInboundAt = Date.now().
- Extension-side liveness check in the alarm: in the alarm listener, if wsEnabled && (Date.now() - lastInboundAt) > 60_000, force-close the WS (ws.close()) and rely on the existing onclose reconnect path. The 60s window survives 2 missed server PINGs.
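The extension-side half of the steps above could be sketched roughly as follows. This is a sketch under stated assumptions, not the actual patch: the helper names onInbound and isStale, the LivenessState shape, and the fake socket are hypothetical, and it assumes any inbound message (not only PING) counts as proof of life.

```typescript
// Hypothetical message shape; only the PING/PONG "type" and "id" fields
// come from the proposal.
type Message = { type: string; id?: number };

interface LivenessState {
  lastInboundAt: number; // ms timestamp of the most recent inbound message
}

const STALE_AFTER_MS = 60_000;

// Would be called from the existing onmessage handler for every inbound
// message. Returns the reply to send, if any.
function onInbound(state: LivenessState, msg: Message, now: number): Message | null {
  state.lastInboundAt = now; // assumption: any inbound traffic proves the pipe works
  if (msg.type === "PING") {
    return { type: "PONG", id: msg.id };
  }
  return null;
}

// Would be called from the keepalive alarm listener.
function isStale(state: LivenessState, wsEnabled: boolean, now: number): boolean {
  return wsEnabled && now - state.lastInboundAt > STALE_AFTER_MS;
}

// Wiring demo with a stand-in socket object instead of a real WebSocket.
const sent: string[] = [];
let closeCalls = 0;
const fakeWs = {
  send: (data: string) => { sent.push(data); },
  close: () => { closeCalls += 1; },
};

const state: LivenessState = { lastInboundAt: 0 };

// A PING arrives at t = 1 s and is answered with a PONG.
const reply = onInbound(state, { type: "PING", id: 1_000 }, 1_000);
if (reply) fakeWs.send(JSON.stringify(reply));

// Alarm at t = 25 s: 24 s of silence, still fresh, no close.
if (isStale(state, true, 25_000)) fakeWs.close();
// Alarm at t = 73 s: 72 s of silence, stale, force-close.
if (isStale(state, true, 73_000)) fakeWs.close();

console.log(sent, closeCalls);
```

In a real integration lastInboundAt would presumably also be reset when the socket opens, so that a fresh connection is not judged stale before the first PING arrives.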
This catches all the failure modes:
- Server-side TCP alive but extension JS dead → lastInboundAt stops, alarm reconnects
- Server died and TCP RST not received → lastInboundAt stops, alarm reconnects
- Network change (e.g. WiFi handoff) → lastInboundAt stops, alarm reconnects
- macOS sleep/wake cycles dropping the WS without notification → lastInboundAt stops, alarm reconnects
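As a rough worked example of how quickly the alarm would notice any of these cases, the sketch below combines the 24-second alarm period with the proposed 60-second window. It assumes alarms fire exactly on schedule, which chrome.alarms does not strictly guarantee, and detectionTime is a hypothetical helper.

```typescript
const ALARM_PERIOD_MS = 24_000; // alarm period quoted in this report
const STALE_AFTER_MS = 60_000;  // proposed staleness window

// Given the time of the last inbound message and the time of the first alarm
// at or after it, return the time of the first alarm at which the stale
// check (now - lastInboundAt > STALE_AFTER_MS) trips.
function detectionTime(lastInboundAt: number, firstAlarmAt: number): number {
  let t = firstAlarmAt;
  while (t - lastInboundAt <= STALE_AFTER_MS) {
    t += ALARM_PERIOD_MS;
  }
  return t;
}

// Try a few alarm phases relative to the last inbound message (all in ms).
for (const phase of [12_000, 12_001, 24_000]) {
  const latency = detectionTime(0, phase);
  console.log(`first alarm at ${phase} ms -> stale detected at ${latency} ms`);
}
```

Under these assumptions a dead pipe is detected between just over 60 s and 84 s after the last inbound message, depending on where the alarm falls relative to it.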
Alternative: WebSocket protocol-level ping/pong frames
The ws library on the server has built-in ping/pong frame support that Chrome's WS implementation responds to automatically. Cleaner protocol-wise but:
- Doesn't necessarily wake the suspended SW (auto-pong happens at the C++ layer, not the JS event loop)
- Doesn't update lastInboundAt in our JS state
So WS-level pings would need to be paired with an SW-waking mechanism anyway. The application-level PING approach above gives both liveness AND SW wakeup in one channel.
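The server half of the application-level approach might look something like the sketch below. PingTarget, sendPing and startPingLoop are hypothetical names; the sketch assumes only that the server's socket object exposes readyState (with 1 meaning OPEN) and send(), and leaves out how mcp-server/index.ts actually tracks connections.

```typescript
// Smallest socket surface the sketch needs; the real server would pass its
// own WebSocket connection object here.
interface PingTarget {
  readyState: number;
  send(data: string): void;
}

const OPEN = 1; // numeric value of the OPEN ready state
const PING_INTERVAL_MS = 20_000;

// Sends one application-level PING; returns false if the socket is not OPEN.
function sendPing(socket: PingTarget, now: number): boolean {
  if (socket.readyState !== OPEN) return false;
  socket.send(JSON.stringify({ type: "PING", id: now }));
  return true;
}

// Starts a per-connection PING loop and returns a function that stops it.
function startPingLoop(socket: PingTarget, intervalMs: number = PING_INTERVAL_MS): () => void {
  const timer = setInterval(() => { sendPing(socket, Date.now()); }, intervalMs);
  return () => clearInterval(timer);
}

// Demo with a stand-in socket, calling sendPing directly instead of waiting
// for the timer.
const outbox: string[] = [];
const fakeSocket: PingTarget = {
  readyState: OPEN,
  send: (data: string) => { outbox.push(data); },
};

sendPing(fakeSocket, 1_718_000_000_000);
console.log(outbox[0]);
```

The stop function returned by startPingLoop would be called from the connection's close handler so that each connection owns exactly one timer.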
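Related
- createAgentLane silently swallows groupNew failure (kernel-side reliability)
- -- end-of-options separator (and consider -e <pattern> for grep) #49: parseArgs -- end-of-options support (kernel-side; CWS queue)
This issue joins the kernel-side queue for the next CWS-justifying extension release.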
Sequencing
Both the server-side PING and extension-side PONG/check need to land together. Server-only changes don't help (extension would ignore unknown PING type today); extension-only changes don't help (nothing to detect on inbound). Bundle into a single release: DOMShell extension v1.3.2 + MCP server 2.0.4.
Workaround until then: if the side panel shows "connected" but commands hang, restart the extension via chrome://extensions/ → DOMShell → toggle off+on. Optionally also kick the thv proxy (thv restart domshell-mcp-server) to flush the dead socket on the server side.