feat(windows): ConPTY terminal, zellij discovery, agent launcher trampoline, codex shim resolution #310
On Windows the desktop supervisor can only TerminateProcess the daemon (no POSIX signal reaches a detached child), so the daemon's graceful shutdown never runs and ~/.ao/running.json is never removed. The leaked file survives into the next launch, and because Windows reuses PIDs aggressively the recorded PID usually belongs to an unrelated process. The startup pre-flight trusted PID liveness alone (runfile.CheckStale -> processalive.Alive), so it concluded a daemon was "already running" and exited with "refusing to start" on every restart. A dead daemon then makes the renderer's loopback REST calls (e.g. Spawn Orchestrator) fail silently.
Verify the recorded port is actually served by an AO daemon with the recorded PID (a /healthz probe matching service + pid, the same ground truth inspectDaemon already uses) before refusing. A run-file left by a crashed, hard-killed, or reused-PID predecessor is treated as stale and overwritten, so startup is robust to a leaked run-file from any cause.
Fixes aoagents#256
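The probe described above could look roughly like the following. This is only a sketch under stated assumptions: the JSON field names (`service`, `pid`), the service value `"ao"`, and the helpers `runFileIsLive` and `newFakeDaemon` are hypothetical and are not taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// health mirrors an ASSUMED /healthz payload; the real field names may differ.
type health struct {
	Service string `json:"service"`
	PID     int    `json:"pid"`
}

// runFileIsLive reports whether baseURL is served by an AO daemon whose
// self-reported PID equals the PID recorded in the run-file. Any transport
// error, non-200 status, undecodable body, or mismatch counts as stale.
func runFileIsLive(client *http.Client, baseURL string, recordedPID int) bool {
	resp, err := client.Get(baseURL + "/healthz")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false
	}
	var h health
	if err := json.NewDecoder(resp.Body).Decode(&h); err != nil {
		return false
	}
	return h.Service == "ao" && h.PID == recordedPID
}

// newFakeDaemon starts a stand-in daemon (hypothetical, for demonstration only).
func newFakeDaemon(pid int) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(health{Service: "ao", PID: pid})
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeDaemon(4242)
	defer srv.Close()
	client := &http.Client{Timeout: 2 * time.Second}
	fmt.Println(runFileIsLive(client, srv.URL, 4242)) // true: same service and PID
	fmt.Println(runFileIsLive(client, srv.URL, 9999)) // false: PID was reused by something else
}
```

Treating every failure mode as "stale" matches the stated goal: only a positive match on both service and PID blocks startup.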
build-daemon.mjs compiles the bundled `ao` daemon with the build host's
GOOS and names it off the host platform (ao.exe only when the builder is
Windows). The release workflow ran only on macos-latest, so a Windows
package would ship a macOS binary named `ao` with no `ao.exe`, and the
app could not launch a valid Windows daemon ("This program cannot be run
in DOS mode" / binary not found).
Run the release as a per-OS matrix (macOS + Windows) so host == target
and each installer bundles a daemon compiled for its own platform, and
pin the Go toolchain with setup-go since build-daemon needs it on every
runner.
Fixes aoagents#235
Replaces the Windows stub in internal/terminal/pty_windows.go with a real ConPTY implementation backed by github.com/aymanbagabas/go-pty, so the daemon's /mux attach can stream a live terminal to the renderer on Windows.
PTYSource.AttachCommand now returns (argv, env, err). On Windows the zellij attach is spawned directly, with no powershell.exe wrapper, because wrapping ConPTY startup around a shell surfaces as modal application-error dialogs; the per-session ZELLIJ_SOCKET_DIR is delivered via the spawn's CreateProcess env block instead of an 'env -u NO_COLOR' shim. Unix continues to use the env-shim wrapper and returns nil env.
Adds go-pty v0.2.3 (+ bumps golang.org/x/sys to v0.44.0 transitively). Updates the in-process test fakes (terminal/fakes_test.go, httpd/terminal_mux_test.go) for the new signature.
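To illustrate the new three-value return, here is a rough sketch of how the two platforms might differ. The function name `attachCommand`, its parameters, and the exact Unix argv layout are assumptions for illustration; only `zellij attach`, `env -u NO_COLOR`, and ZELLIJ_SOCKET_DIR come from the description above.

```go
package main

import "fmt"

// attachCommand returns the argv and any extra environment needed to attach
// to a zellij session. goos is a parameter (instead of runtime.GOOS) only so
// that both branches can be exercised on any host; this is a hypothetical
// shape, not the PR's actual PTYSource.AttachCommand.
func attachCommand(goos, zellijBin, session, socketDir string) (argv, env []string, err error) {
	if session == "" {
		return nil, nil, fmt.Errorf("empty session name")
	}
	socketVar := "ZELLIJ_SOCKET_DIR=" + socketDir
	if goos == "windows" {
		// Spawn zellij directly; the socket dir travels in the process env block.
		return []string{zellijBin, "attach", session}, []string{socketVar}, nil
	}
	// Unix: env(1) unsets NO_COLOR and sets the socket dir inline (assumed
	// layout), so no separate env slice is returned.
	return []string{"env", "-u", "NO_COLOR", socketVar, zellijBin, "attach", session}, nil, nil
}

func main() {
	argv, env, err := attachCommand("windows", `C:\tools\zellij.exe`, "demo", `C:\tmp\zj`)
	fmt.Printf("%q %q %v\n", argv, env, err)
	argv, env, err = attachCommand("linux", "zellij", "demo", "/tmp/zj")
	fmt.Printf("%q %q %v\n", argv, env, err)
}
```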
…eout
Defaults the zellij binary to whatever exec.LookPath finds first (preferring zellij.exe on Windows), falling back to LOCALAPPDATA\Programs\zellij\zellij.exe and ProgramFiles{,(x86)}\{zellij,Zellij}\zellij.exe so a fresh-installed Windows user gets a working runtime without setting Options.Binary.
Raises the per-command timeout from 5s to 30s on Windows: the first zellij invocation after install routinely takes longer than 5s on Windows due to filesystem/AV warmup, which was causing benign DeadlineExceeded failures during session create.
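The discovery order above can be sketched as follows. The helper names `fallbackCandidates` and `findZellij` are hypothetical, and the ordering among the fallback paths is an assumption; paths are joined with a literal backslash so the sketch behaves the same on any host.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// fallbackCandidates lists the well-known Windows install paths named in the
// commit message. getenv is injected so the function can be tested without
// touching the real environment.
func fallbackCandidates(getenv func(string) string) []string {
	var out []string
	if base := getenv("LOCALAPPDATA"); base != "" {
		out = append(out, base+`\Programs\zellij\zellij.exe`)
	}
	for _, key := range []string{"ProgramFiles", "ProgramFiles(x86)"} {
		base := getenv(key)
		if base == "" {
			continue
		}
		for _, dir := range []string{"zellij", "Zellij"} {
			out = append(out, base+`\`+dir+`\zellij.exe`)
		}
	}
	return out
}

// findZellij tries PATH first (preferring zellij.exe), then the fallbacks.
func findZellij() (string, error) {
	for _, name := range []string{"zellij.exe", "zellij"} {
		if p, err := exec.LookPath(name); err == nil {
			return p, nil
		}
	}
	for _, p := range fallbackCandidates(os.Getenv) {
		if info, err := os.Stat(p); err == nil && !info.IsDir() {
			return p, nil
		}
	}
	return "", fmt.Errorf("zellij not found on PATH or in default install locations")
}

func main() {
	p, err := findZellij()
	fmt.Println(p, err)
}
```

An explicit Options.Binary would presumably still take precedence over this lookup; the sketch covers only the default case.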
On Windows, zellij's KDL `args` quoting cannot round-trip codex's --config key=value flags (or any argv with embedded quotes), and shell-wrapping the agent in powershell/cmd quoting is equally unsound. This adds a small launch trampoline so zellij runs a known-fixed argv and the real argv is delivered out-of-band.
How it works on Windows:
1. zellij.Runtime.writeLayout persists cfg.Argv to a temp JSON spec via the new agentlaunch package (AO_LAUNCH_SPEC env var points at the file).
2. The KDL layout runs the trampoline as `<ao.exe> launch` (windowsLaunchArgv); PATH is augmented so the trampoline resolves.
3. The new hidden `ao launch` subcommand reads the spec, deletes the temp file, and execs the real agent with cfg.Argv inside cfg.WorkspacePath.
Also adds:
- runner.Start fire-and-forget path (process_windows.go uses powershell.exe -EncodedCommand + Start-Process -WindowStyle Hidden with CREATE_NEW_CONSOLE so the daemon is not blocked on zellij's --create-background settling).
- powerShellEncodedCommand helper and switch from -Command to -EncodedCommand for the existing powershell shellLaunchSpec (avoids brittle KDL→PowerShell quoting round-trips).
Unix is unchanged: writeLayout passes cfg.Env straight through, createSession stays synchronous via runner.Run, and process_other.go is a stub that returns an error if anyone calls into the background path.
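A minimal sketch of the out-of-band spec handoff, assuming a JSON file with `argv` and `dir` fields. The `launchSpec` type and the helpers `writeSpec`, `readAndRemoveSpec`, and `fileExists` are hypothetical; the real agentlaunch format and API may differ, and the final exec step is omitted.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// launchSpec is an ASSUMED on-disk shape for the trampoline spec.
type launchSpec struct {
	Argv []string `json:"argv"`
	Dir  string   `json:"dir"`
}

// writeSpec persists the spec to a temp file and returns its path. The caller
// would export that path (e.g. via AO_LAUNCH_SPEC) to the trampoline.
func writeSpec(spec launchSpec) (string, error) {
	data, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	f, err := os.CreateTemp("", "ao-launch-*.json")
	if err != nil {
		return "", err
	}
	_, writeErr := f.Write(data)
	closeErr := f.Close() // close before any Remove: Windows cannot delete open files
	if err := errors.Join(writeErr, closeErr); err != nil {
		_ = os.Remove(f.Name())
		return "", err
	}
	return f.Name(), nil
}

// readAndRemoveSpec loads the spec and deletes the file, as the trampoline
// side would do before starting the real agent.
func readAndRemoveSpec(path string) (launchSpec, error) {
	var spec launchSpec
	data, err := os.ReadFile(path)
	if err != nil {
		return spec, err
	}
	_ = os.Remove(path) // remove first so a malformed spec does not linger
	if err := json.Unmarshal(data, &spec); err != nil {
		return spec, err
	}
	if len(spec.Argv) == 0 {
		return spec, fmt.Errorf("launch spec %s has empty argv", path)
	}
	return spec, nil
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	path, err := writeSpec(launchSpec{Argv: []string{"codex", "--config", `key="quoted value"`}, Dir: "."})
	if err != nil {
		fmt.Println("write:", err)
		return
	}
	spec, err := readAndRemoveSpec(path)
	fmt.Printf("%q %v %v\n", spec.Argv, err, fileExists(path))
}
```

Because the argv travels as JSON rather than through KDL or a shell, embedded quotes never pass through a second quoting layer.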
…iteral strings
Three Windows-targeted refinements to the codex agent plugin so a default Windows install lands in a working state:
1. ResolveCodexBinary now follows .cmd/.ps1 shims to the underlying codex.exe (resolveNativeWindowsCodex + windowsNativeCodexCandidatesForShim). The npm-distributed codex shim cannot be exec'd directly under ConPTY without a shell wrapper; jumping straight to the .exe avoids that wrapper.
2. appendTerminalCompatibilityFlags adds Windows-specific args (e.g. --no-alt-screen) so codex's TUI renders correctly inside zellij's pane without the alternate-screen buffer churn that breaks ConPTY redraws.
3. hooks.go gains codexTOMLLiteralString / codexTOMLConfigString / containsTOMLControl so paths and other values with backslashes and quotes round-trip through codex's --config TOML parser using literal strings ('...') when basic strings would require unsafe escaping.
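The literal-versus-basic string choice in item 3 can be sketched like this. The names `tomlString` and `containsControl` are hypothetical stand-ins (not the PR's helpers), and the selection rule is an assumption: use a TOML literal string whenever the value has no single quote and no control character, and otherwise fall back to an escaped basic string.

```go
package main

import (
	"fmt"
	"strings"
)

// isControl reports whether r must not appear raw in a TOML string
// (tab is allowed).
func isControl(r rune) bool {
	return (r < 0x20 && r != '\t') || r == 0x7f
}

// containsControl reports whether s holds any such character.
func containsControl(s string) bool {
	for _, r := range s {
		if isControl(r) {
			return true
		}
	}
	return false
}

// tomlString renders s as a TOML string value. Literal strings ('...') take
// the content verbatim, so Windows paths keep their backslashes untouched;
// they cannot hold a single quote or control characters, so those cases fall
// back to a basic string ("...") with escapes.
func tomlString(s string) string {
	if !strings.Contains(s, "'") && !containsControl(s) {
		return "'" + s + "'"
	}
	var b strings.Builder
	b.WriteByte('"')
	for _, r := range s {
		switch {
		case r == '\\':
			b.WriteString(`\\`)
		case r == '"':
			b.WriteString(`\"`)
		case isControl(r):
			fmt.Fprintf(&b, `\u%04X`, r)
		default:
			b.WriteRune(r)
		}
	}
	b.WriteByte('"')
	return b.String()
}

func main() {
	fmt.Println(tomlString(`C:\Users\me\hook.exe`)) // 'C:\Users\me\hook.exe'
	fmt.Println(tomlString(`it's "here"`))          // "it's \"here\""
}
```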
@miniMaddy please resolve the CI failures.
Vaibhaav-Tiwari
left a comment
go test -v ./internal/adapters/runtime/zellij fails on Windows in TestRuntimeIntegration:
Create: zellij runtime: list panes ao_itest_zj: exit status 1: There is no active session!
This is directly in scope for the Windows issue. The new Windows zellij path starts the session fire-and-forget, then
polls list-panes, but the integration session is already gone before discovery. That means the runtime path still has
an unhandled Windows failure mode and the changed package does not pass its own Windows integration test.
I also noticed a smaller cleanup issue: writeLayout creates an ao-launch-*.json spec on Windows, but Create only
defers removal of the layout file. If zellij startup or pane discovery fails, the launch spec can be left behind in
%TEMP%.
Tests I ran:
go test -v ./internal/terminal
go test -v ./internal/daemon
go test -v ./internal/cli
go test -v ./internal/adapters/agent/codex
Those passed.
go test -v ./internal/adapters/runtime/zellij
Failed as above.
…entlaunch, codex test quotes
Vaibhaav-Tiwari
left a comment
The lint/doc cleanup looks fine, but the Windows runtime blocker is still present. On Windows:
go test -v ./internal/adapters/runtime/zellij
now fails in TestRuntimeIntegration at SendMessage:
SendMessage: zellij runtime: paste message ao_itest_zj/terminal_0: exit status 1: There is no active session!
So Create now returns, but the zellij session is gone before the runtime can paste/send input. That is still directly
in scope for the Windows issue because AO needs a stable runtime handle after spawn, not just a successful Create
return.
Other focused tests passed:
go test -v ./internal/terminal
go test -v ./internal/adapters/agent/codex
go test -v ./internal/daemon ./internal/cli
Summary
Four atomic commits adding Windows-native support across the daemon stack:
1. feat(terminal): Windows ConPTY support for /mux attach
2. feat(zellij): discover zellij binary on Windows and raise command timeout
3. feat(zellij,cli): Windows agent launcher trampoline for codex argv
4. feat(codex): Windows binary resolution, terminal compat flags, TOML literal strings
Testing