fix(pi): kill the whole Pi process tree on session close (F08, F07)#712
fix(pi): kill the whole Pi process tree on session close (F08, F07)#712SabhyaC26 wants to merge 3 commits into
Conversation
|
_PiRpcSession spawned Pi without process-group isolation and teardown only signalled the direct child, so the sandbox launcher + pi node CLI and any descendants it spawned (MCP servers, shell tools) were orphaned on every close/interrupt. Mirror codex_executor: spawn with start_new_session on posix and os.killpg(SIGTERM->SIGKILL) the group, falling back to direct terminate()/kill() off-posix. Also reap the killed child with await wait() so it can't linger as a zombie under non-default child watchers (F07).
Strengthen the F08/F07 fix by falling back to direct process signals when process-group signalling fails, and add a real subprocess-tree regression test proving close() reaches descendants instead of only the launcher.
Polly/github-code-quality flagged the two empty 'except' blocks around os.killpg(SIGTERM/SIGKILL) in _PiRpcSession.close() as silent. They are best-effort: on failure the group_*_sent flag stays False and the fallback terminates/kills the direct child. Document that intent. Co-authored-by: Isaac
09cda6b to
6c411b1
Compare
Summary
Fixes F08 (high) and F07 (low) from
SDK_INTEGRATION_BUG_AUDIT.md._PiRpcSession.start()spawned Pi via_create_subprocess_execwith nostart_new_session/process-group isolation, andclose()onlyterminate()/kill()'d the single direct child PID. The spawned child is frequently the sandbox launcher wrapping thepinode CLI, which itself spawns MCP servers / shell tools — so on every close/interrupt the real pi tree was orphaned (reparented to init), leaking CPU/memory/FDs per abandoned turn. Now mirrorscodex_executor: spawn withstart_new_session=(os.name == "posix")and tear down withos.killpg(SIGTERM → SIGKILL)against the group, falling back to directterminate()/kill()off-posix.close()never reaped the child, so it could linger as a zombie under non-default child watchers. Added a guardedawait self.process.wait()after the kill (mirrors codex ~1225-1229). Theawaitis wrapped incontextlib.suppress(Exception)so a wrong-event-loopRuntimeError(test fixtures) can't abort the rest ofclose().Added
import signalfor the newos.killpgcalls.Test plan
pytest tests/inner/test_pi_executor.py -q→ 113 passedruff check omnigent/inner/pi_executor.py→ cleanexplicit-anyerrors are pre-existing and unrelated to these lines)