feat: playwright self-validation and general browser capability by mcheemaa · Pull Request #53 · ghostwright/phantom

mcheemaa · 2026-04-13T03:45:38Z

Summary

Closes both UI gaps in one change: the agent can now self-validate pages it generates via phantom_create_page, and it has the full first-party @playwright/mcp 21-tool browsing surface always available. Both tools share one Chromium Browser and one per-query BrowserContext.

What changed

phantom_preview_page (src/ui/preview.ts): custom in-process MCP tool. Navigates headless Chromium to a /ui/<path>, captures a full-page PNG, bundles HTTP status, title, console messages, and failed network requests into one multimodal result. Stubs window.EventSource in an init script so Phantom's SSE live-reload wiring cannot hold the page open past load.
phantom-browser (src/ui/browser-mcp.ts): embeds @playwright/mcp via createConnection(config, contextGetter). Shares the same BrowserContext as phantom_preview_page via getOrCreatePreviewContext() so the agent can mix tools within a single query. browser.isolated: false is mandatory for the contextGetter path; documented inline.
Async MCP factories (src/agent/runtime.ts): mcpServerFactories now accepts () => McpServerConfig | Promise<McpServerConfig>. Resolution wrapped in Promise.all. Existing sync factories pass through unchanged.
Graceful shutdown (src/index.ts): closePreviewResources() wired into the existing onShutdown(...) registry next to Scheduler and HTTP server.
Agent prompt (src/agent/prompt-assembler.ts): two new guidance blocks (self-validate every UI page, general browser capability). Explicit warning against browser_run_code on external origins.
Dockerfile: bunx playwright install --with-deps --only-shell chromium added after the node_modules copy, with PLAYWRIGHT_BROWSERS_PATH=/home/phantom/.cache/ms-playwright and ownership chown to phantom:phantom.
package.json: exact-pinned playwright@1.59.1 and @playwright/mcp@0.0.70. No caret ranges because the contextGetter API has moved between point releases.

Research references

local/2026-04-12-phantom-ui-chapter/research/01-playwright-self-validation.md (auth model, SSE workaround, Docker plan, bundled-result design)
local/2026-04-12-phantom-ui-chapter/research/01b-playwright-expanded/implementation-plan.md (file-by-file changes, Dockerfile diff, prompt blocks, test plan, acceptance criteria)
local/2026-04-12-phantom-ui-chapter/research/01b-playwright-expanded/findings.md (createConnection contextGetter workaround, five-probe trace, 01b benchmark)

Test count

Before: 931 pass
After: 940 pass + 6 opt-in integration skipped by default = 946 total
Integration run with PHANTOM_INTEGRATION=1: 3 pass in 530ms with real Chromium on macOS arm64
bun run lint clean, bun run typecheck clean, zero regressions.

Docker image delta

Baseline main: 914 MiB
feat branch: 1.91 GiB
Delta: +996 MiB (Chromium headless shell plus apt deps for --with-deps)

The 8 GiB container RAM cap from PR #52 comfortably absorbs the larger image.

Test plan

bun install installs pinned versions
bun run typecheck zero errors
bun run lint zero warnings
bun test 940 pass, 0 fail, 6 opt-in skip
PHANTOM_INTEGRATION=1 bun test src/ui/__tests__/*.integration.test.ts passes with real Chromium
docker build -f Dockerfile -t phantom-playwright:v1 . succeeds
Image contains /home/phantom/.cache/ms-playwright/chromium_headless_shell-* owned by phantom:phantom
Deploy smoke test: orchestrator runs on a VM after review, verifies agent calls phantom_preview_page in a Slack flow and the screenshot looks right

Handoff

Full handoff report with decisions and open questions: local/2026-04-12-phantom-ui-chapter/handoffs/01-playwright-handoff.md

Adds two new MCP tool servers that share one Chromium instance and one BrowserContext per query: - phantom_preview_page: one-call self-validation for /ui/<path> pages. Navigates headless Chromium, captures a full-page PNG, and bundles HTTP status, title, console messages, and failed network requests alongside the screenshot. Stubs window.EventSource in an init script so Phantom's live-reload SSE wiring cannot hold the page open past 'load'. - phantom-browser: embeds the full 21-tool @playwright/mcp surface in process via createConnection with a contextGetter, giving the agent browser_navigate, browser_click, browser_snapshot, browser_run_code, and the rest out of the box. Both share one Browser singleton launched lazily on first use and one BrowserContext minted per query with a short-lived phantom_session cookie (10 minute TTL, domain: localhost). The browser and context are torn down on SIGTERM via the existing graceful-shutdown registry. The contextGetter path is load-bearing: the naive createConnection hangs under Bun on Linux amd64 (traced across five probes in research 01b). browser.isolated: false is mandatory because SimpleBrowser.newContext() throws; the embed defers to our shared context instead. mcpServerFactories now accepts async factories. The per-query resolution path in AgentRuntime uses Promise.all so sync factories pass through as resolved promises without behavior change. Agent prompt gains two new guidance blocks: always validate after phantom_create_page, and prefer phantom_preview_page for self-validation while using browser_* directly for multi-step browsing. Dockerfile installs the --only-shell chromium build (~75 MiB saved over the full binary) with PLAYWRIGHT_BROWSERS_PATH set to a phantom-owned cache directory. playwright@1.59.1 and @playwright/mcp@0.0.70 are pinned exactly because contextGetter and the capability surface have shifted between point releases already. Research: local/2026-04-12-phantom-ui-chapter/research/01-playwright-self-validation.md Research: local/2026-04-12-phantom-ui-chapter/research/01b-playwright-expanded/implementation-plan.md Research: local/2026-04-12-phantom-ui-chapter/research/01b-playwright-expanded/findings.md Test count: 931 -> 940 pass (+9) plus 6 opt-in integration tests skipped by default. Zero regressions. lint and typecheck clean. Docker image delta: +996 MiB (914 MiB -> 1.91 GiB).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41369cfda9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-13T03:49:00Z

src/ui/preview.ts

+	if (currentContext) return currentContext;
+	if (currentContextPromise) return currentContextPromise;


Refresh preview auth instead of reusing one context forever

This function always returns the same BrowserContext, but that context only receives a preview session cookie once (minted by createPreviewSession, which now expires after 10 minutes). After the token expires, /ui/* requests become unauthenticated and phantom_preview_page/browser_* calls against local UI pages will start landing on login/401 responses until process restart. Rotating the context (or re-minting/re-injecting the cookie) per query is needed to keep the new preview capability working beyond the first 10 minutes.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-13T03:49:00Z

src/ui/preview.ts

+	browserPromise = (async () => {
+		const b = await chromium.launch({
+			headless: true,


Clear cached launch promise on Chromium startup failure

If chromium.launch(...) throws once (e.g., transient resource error), browserPromise remains set to that rejected promise, so every later call immediately fails and the preview/browser tools cannot recover without restarting the process. The cached promise should be cleared on failure so a subsequent call can retry launching Chromium.

Useful? React with 👍 / 👎.

Surgical fix pass on top of 41369cf in response to the internal review at local/2026-04-12-phantom-ui-chapter/review/01-playwright-review.md and Codex PR comments on #53. Fourteen acceptance criteria met. Preview correctness (src/ui/preview.ts): - F5: cookie path reverted from / to /ui, matching the magic-link posture in src/ui/serve.ts (buildPreviewCookie helper extracted). - F6 / Codex P1: rotate the preview session cookie on every warm getOrCreatePreviewContext call once Date.now() is inside the 2 minute safety margin before the 10 minute TTL. Uses addCookies in place so the cached BrowserContext stays the same instance. - F7: shuttingDown module flag set in closePreviewResources and guarded at the top of both getOrCreateBrowser and getOrCreatePreviewContext. Concurrent tool calls during SIGTERM now throw cleanly instead of spawning a resurrected Chromium. - F8: null currentContext in the page.close() catch block so the next call recreates a fresh context instead of handing back a dead one. - F9: replace window.EventSource = undefined with Reflect.deleteProperty(globalThis, "EventSource"). Covers the 'in' operator and drops a TypeScript widening. - Codex P2: try/finally pattern around browserPromise and currentContextPromise so a transient chromium.launch error no longer permanently poisons the cache. Subsequent calls retry cleanly. Browser MCP (src/ui/browser-mcp.ts): - F17: header note 3 rewritten to reflect the actual on-disk layout (top-level hoisted playwright-core@1.60.0-alpha vs. nested playwright/node_modules/playwright-core@1.59.1; no nested playwright-core under @playwright/mcp/node_modules). Cites the integration test that verifies the cross-version BrowserContext boundary end to end. Tests: - F1: preview.integration.test.ts rewritten to drive the real phantom_preview_page handler through a Client + InMemoryTransport pair. Asserts the bundled {image, text} result shape, base64 round-trip, status, title, console capture, and requestfailed capture. - F2 + F4: browser-mcp.integration.test.ts extended with a real Client.listTools() call asserting exactly 21 tools by name and a real browser_navigate round-trip across the cross-version BrowserContext boundary. - New preview-correctness.test.ts with 8 unit tests covering launch-failure recovery, shuttingDown, cookie rotation threshold, cookie path, and the F8 invariant. Uses spyOn(chromium, "launch") plus a fake Browser / BrowserContext so no real Chromium is spawned. Docs: - F3 + F11: Dockerfile image cost breakdown comment expanded to record the 996 MiB delta split: ~327 MB headless shell, ~91 MB fonts, ~500+ MB X11/GTK shared libs. Documents why --only-shell cannot trim more. Test count: 940 pass + 6 skip baseline -> 948 pass + 10 skip after (strictly greater, zero regressions). bun run lint and bun run typecheck both clean. PHANTOM_INTEGRATION=1 integration run on macOS arm64 passes 14/14 with real Chromium. Deferred per fix brief anti-patterns: F10 (v1.1 error-path screenshot), F11 (afterEach scope narrowing, pre-existing), F12 (graceful.ts tasks.reverse, out of scope).

chatgpt-codex-connector bot reviewed Apr 13, 2026

View reviewed changes

mcheemaa merged commit 527fa30 into main Apr 13, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: playwright self-validation and general browser capability#53

feat: playwright self-validation and general browser capability#53
mcheemaa merged 2 commits intomainfrom
feat/playwright-browser-capability

mcheemaa commented Apr 13, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 13, 2026

Uh oh!

chatgpt-codex-connector bot Apr 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if (currentContext) return currentContext;
		if (currentContextPromise) return currentContextPromise;

Conversation

mcheemaa commented Apr 13, 2026

Summary

What changed

Research references

Test count

Docker image delta

Test plan

Handoff

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant