feat(windows): ConPTY terminal, zellij discovery, agent launcher trampoline, codex shim resolution #310

Open
miniMaddy wants to merge 9 commits into aoagents:main from miniMaddy:fix/windows-daemon-startup-and-packaging

@miniMaddy

Summary

Four atomic commits adding Windows-native support across the daemon stack:

1. feat(terminal): Windows ConPTY support for /mux attach

  • Adds go-pty v0.2.3 (ConPTY) and bumps golang.org/x/sys v0.44.0
  • Implements pty_windows.go using ConPTY via winpty.New() with detach-grace + Kill fallback
  • Plumbs env param through PTYSource.AttachCommand on all platforms

2. feat(zellij): discover zellij binary on Windows and raise command timeout

  • Probes zellij.exe via PATH then well-known install locations (LOCALAPPDATA, ProgramFiles)
  • Raises default command timeout to 30 s on Windows (vs. the existing 10 s Unix default)

3. feat(zellij,cli): Windows agent launcher trampoline for codex argv

  • New agentlaunch package: serialises agent argv to a temp JSON spec file, reads-and-removes on launch
  • New hidden ao launch subcommand: the trampoline entry point invoked inside the zellij pane
  • Windows zellij Create writes a direct layout (no shell wrapper) and starts the pane via Start-Process (CREATE_NEW_CONSOLE, hidden window)
  • PowerShell launch uses -EncodedCommand (UTF-16 LE base64) to avoid quoting issues

4. feat(codex): Windows binary resolution, terminal compat flags, TOML literal strings

  • ResolveCodexBinary follows .cmd/.ps1 npm shims to the native .exe on Windows
  • Appends --no-alt-screen on Windows to prevent alternate-screen escape sequences breaking the ConPTY pane
  • TOML config values are now emitted as literal strings (single-quote) to avoid backslash escapes in Windows paths

Testing

  • All changed packages pass go test: terminal, zellij, agentlaunch, cli, codex
  • go build ./... and go vet ./... clean on both Unix and Windows
  • Pre-existing TestBuild_MatchesEmbedded failure in httpd/apispec/specgen is present on main too (stale embedded openapi.yaml) — not introduced by this PR

harshitsinghbhandari and others added 7 commits June 16, 2026 21:30
On Windows the desktop supervisor can only TerminateProcess the daemon
(no POSIX signal reaches a detached child), so the daemon's graceful
shutdown never runs and ~/.ao/running.json is never removed. The leaked
file survives into the next launch, and because Windows reuses PIDs
aggressively the recorded PID usually belongs to an unrelated process.
The startup pre-flight trusted PID liveness alone (runfile.CheckStale ->
processalive.Alive), so it concluded a daemon was "already running" and
exited with "refusing to start" on every restart. A dead daemon then
makes the renderer's loopback REST calls (e.g. Spawn Orchestrator) fail
silently.

Verify the recorded port is actually served by an AO daemon with the
recorded PID (a /healthz probe matching service + pid, the same ground
truth inspectDaemon already uses) before refusing. A run-file left by a
crashed, hard-killed, or reused-PID predecessor is treated as stale and
overwritten, so startup is robust to a leaked run-file from any cause.
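
The liveness check described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: the `healthInfo` type, its JSON field names (`service`, `pid`), and the `runFileIsLive` helper are all hypothetical, and the real /healthz payload may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// healthInfo is an assumed shape for the /healthz response body.
type healthInfo struct {
	Service string `json:"service"`
	PID     int    `json:"pid"`
}

// matches reports whether the probe answer identifies the expected daemon.
func (h healthInfo) matches(service string, pid int) bool {
	return h.Service == service && h.PID == pid
}

// runFileIsLive probes baseURL/healthz. Any transport error, non-200 status,
// or malformed body is treated as "not live", so the run-file counts as stale.
func runFileIsLive(client *http.Client, baseURL, service string, pid int) bool {
	resp, err := client.Get(baseURL + "/healthz")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false
	}
	var h healthInfo
	if err := json.NewDecoder(resp.Body).Decode(&h); err != nil {
		return false
	}
	return h.matches(service, pid)
}

func main() {
	// Stand-in daemon that reports service "ao" with PID 4242.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		_ = json.NewEncoder(w).Encode(healthInfo{Service: "ao", PID: 4242})
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	fmt.Println("recorded PID matches:", runFileIsLive(client, srv.URL, "ao", 4242))
	fmt.Println("PID was reused:", runFileIsLive(client, srv.URL, "ao", 1337))
}
```

Only a probe that answers with both the expected service name and the recorded PID blocks startup; a reused PID with nothing listening on the recorded port falls through to the "stale, overwrite" path.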

Fixes aoagents#256
build-daemon.mjs compiles the bundled `ao` daemon with the build host's
GOOS and names it off the host platform (ao.exe only when the builder is
Windows). The release workflow ran only on macos-latest, so a Windows
package would ship a macOS binary named `ao` with no `ao.exe`, and the
app could not launch a valid Windows daemon ("This program cannot be run
in DOS mode" / binary not found).

Run the release as a per-OS matrix (macOS + Windows) so host == target
and each installer bundles a daemon compiled for its own platform, and
pin the Go toolchain with setup-go since build-daemon needs it on every
runner.

Fixes aoagents#235
Replaces the Windows stub in internal/terminal/pty_windows.go with a real ConPTY implementation backed by github.com/aymanbagabas/go-pty, so the daemon's /mux attach can stream a live terminal to the renderer on Windows.

PTYSource.AttachCommand now returns (argv, env, err). On Windows the zellij attach is spawned directly (no powershell.exe wrapper) — wrapping ConPTY startup around a shell surfaces as modal application-error dialogs — and the per-session ZELLIJ_SOCKET_DIR is delivered via the spawn's CreateProcess env block instead of an 'env -u NO_COLOR' shim. Unix continues to use the env-shim wrapper and returns nil env.

Adds go-pty v0.2.3 (+ bumps golang.org/x/sys to v0.44.0 transitively). Updates the in-process test fakes (terminal/fakes_test.go, httpd/terminal_mux_test.go) for the new signature.
…eout

Defaults the zellij binary to whatever exec.LookPath finds first (preferring zellij.exe on Windows), falling back to LOCALAPPDATA\Programs\zellij\zellij.exe and ProgramFiles{,(x86)}\{zellij,Zellij}\zellij.exe so a fresh-installed Windows user gets a working runtime without setting Options.Binary.

Raises the per-command timeout from 5s to 30s on Windows: the first zellij invocation after install routinely takes longer than 5s on Windows due to filesystem/AV warmup, which was causing benign DeadlineExceeded failures during session create.
On Windows, zellij's KDL `args` quoting cannot round-trip codex's --config key=value flags (or any argv with embedded quotes), and shell-wrapping the agent in powershell/cmd quoting is equally unsound. This adds a small launch trampoline so zellij runs a known-fixed argv and the real argv is delivered out-of-band.

How it works on Windows:

1. zellij.Runtime.writeLayout persists cfg.Argv to a temp JSON spec via the new agentlaunch package (AO_LAUNCH_SPEC env var points at the file).

2. The KDL layout runs the trampoline as `<ao.exe> launch` (windowsLaunchArgv); PATH is augmented so the trampoline resolves.

3. The new hidden `ao launch` subcommand reads the spec, deletes the temp file, and execs the real agent with cfg.Argv inside cfg.WorkspacePath.

Also adds:

- runner.Start fire-and-forget path (process_windows.go uses powershell.exe -EncodedCommand + Start-Process -WindowStyle Hidden with CREATE_NEW_CONSOLE so the daemon is not blocked on zellij's --create-background settling).

- powerShellEncodedCommand helper and switch from -Command to -EncodedCommand for the existing powershell shellLaunchSpec (avoids brittle KDL→PowerShell quoting round-trips).

Unix is unchanged: writeLayout passes cfg.Env straight through, createSession stays synchronous via runner.Run, and process_other.go is a stub that returns an error if anyone calls into the background path.
…iteral strings

Three Windows-targeted refinements to the codex agent plugin so a default Windows install lands in a working state:

1. ResolveCodexBinary now follows .cmd/.ps1 shims to the underlying codex.exe (resolveNativeWindowsCodex + windowsNativeCodexCandidatesForShim). The npm-distributed codex shim cannot be exec'd directly under ConPTY without a shell wrapper; jumping straight to the .exe avoids that wrapper.

2. appendTerminalCompatibilityFlags adds Windows-specific args (e.g. --no-alt-screen) so codex's TUI renders correctly inside zellij's pane without the alternate-screen buffer churn that breaks ConPTY redraws.

3. hooks.go gains codexTOMLLiteralString / codexTOMLConfigString / containsTOMLControl so paths and other values with backslashes and quotes round-trip through codex's --config TOML parser using literal strings ('...') when basic strings would require unsafe escaping.
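
To illustrate the third point, here is a sketch of choosing between TOML literal and basic strings. The function names below are hypothetical and the logic is an assumption based on the TOML rules (literal strings cannot contain a single quote or control characters other than tab); the helpers in hooks.go may decide differently.

```go
package main

import (
	"fmt"
	"strings"
)

// needsBasicString reports whether s cannot be written as a TOML literal
// string: it contains a single quote or a control character other than tab.
func needsBasicString(s string) bool {
	for _, r := range s {
		if r == '\'' || (r < 0x20 && r != '\t') || r == 0x7f {
			return true
		}
	}
	return false
}

// tomlBasicString emits s as a double-quoted TOML basic string with escapes.
func tomlBasicString(s string) string {
	var b strings.Builder
	b.WriteByte('"')
	for _, r := range s {
		switch {
		case r == '\\':
			b.WriteString(`\\`)
		case r == '"':
			b.WriteString(`\"`)
		case r == '\n':
			b.WriteString(`\n`)
		case r == '\r':
			b.WriteString(`\r`)
		case r == '\t':
			b.WriteString(`\t`)
		case r < 0x20 || r == 0x7f:
			fmt.Fprintf(&b, `\u%04X`, r)
		default:
			b.WriteRune(r)
		}
	}
	b.WriteByte('"')
	return b.String()
}

// tomlConfigString prefers a literal string, which keeps Windows backslashes
// verbatim, and falls back to a basic string only when it has to.
func tomlConfigString(s string) string {
	if needsBasicString(s) {
		return tomlBasicString(s)
	}
	return "'" + s + "'"
}

func main() {
	fmt.Println(tomlConfigString(`C:\Users\me\.codex\hooks`))
	fmt.Println(tomlConfigString(`it's`))
	fmt.Println(tomlConfigString("line1\nline2"))
}
```
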
@Priyanchew

Collaborator

@miniMaddy please resolve the CI failures.

@Vaibhaav-Tiwari Vaibhaav-Tiwari self-requested a review June 18, 2026 10:23

@Vaibhaav-Tiwari Vaibhaav-Tiwari left a comment

Collaborator

go test -v ./internal/adapters/runtime/zellij fails on Windows in TestRuntimeIntegration:

Create: zellij runtime: list panes ao_itest_zj: exit status 1: There is no active session!

This is directly in scope for the Windows issue. The new Windows zellij path starts the session fire-and-forget, then
polls list-panes, but the integration session is already gone before discovery. That means the runtime path still has
an unhandled Windows failure mode and the changed package does not pass its own Windows integration test.

I also noticed a smaller cleanup issue: writeLayout creates an ao-launch-*.json spec on Windows, but Create only
defers removal of the layout file. If zellij startup or pane discovery fails, the launch spec can be left behind in
%TEMP%.

Tests I ran:

go test -v ./internal/terminal
go test -v ./internal/daemon
go test -v ./internal/cli
go test -v ./internal/adapters/agent/codex

Those passed.

go test -v ./internal/adapters/runtime/zellij

Failed as above.

@Vaibhaav-Tiwari Vaibhaav-Tiwari left a comment

Collaborator

The lint/doc cleanup looks fine, but the Windows runtime blocker is still present. On Windows:

go test -v ./internal/adapters/runtime/zellij

now fails in TestRuntimeIntegration at SendMessage:

SendMessage: zellij runtime: paste message ao_itest_zj/terminal_0: exit status 1: There is no active session!

So Create now returns, but the zellij session is gone before the runtime can paste/send input. That is still directly
in scope for the Windows issue because AO needs a stable runtime handle after spawn, not just a successful Create
return.

Other focused tests passed:

go test -v ./internal/terminal
go test -v ./internal/adapters/agent/codex
go test -v ./internal/daemon ./internal/cli
