Skip to content

feat: playwright self-validation and general browser capability#53

Merged
mcheemaa merged 2 commits intomainfrom
feat/playwright-browser-capability
Apr 13, 2026
Merged

feat: playwright self-validation and general browser capability#53
mcheemaa merged 2 commits intomainfrom
feat/playwright-browser-capability

Conversation

@mcheemaa
Copy link
Copy Markdown
Member

Summary

Closes both UI gaps in one change: the agent can now self-validate pages it generates via phantom_create_page, and it has the full first-party @playwright/mcp 21-tool browsing surface always available. Both tools share one Chromium Browser and one per-query BrowserContext.

What changed

  • phantom_preview_page (src/ui/preview.ts): custom in-process MCP tool. Navigates headless Chromium to a /ui/<path>, captures a full-page PNG, bundles HTTP status, title, console messages, and failed network requests into one multimodal result. Stubs window.EventSource in an init script so Phantom's SSE live-reload wiring cannot hold the page open past load.
  • phantom-browser (src/ui/browser-mcp.ts): embeds @playwright/mcp via createConnection(config, contextGetter). Shares the same BrowserContext as phantom_preview_page via getOrCreatePreviewContext() so the agent can mix tools within a single query. browser.isolated: false is mandatory for the contextGetter path; documented inline.
  • Async MCP factories (src/agent/runtime.ts): mcpServerFactories now accepts () => McpServerConfig | Promise<McpServerConfig>. Resolution wrapped in Promise.all. Existing sync factories pass through unchanged.
  • Graceful shutdown (src/index.ts): closePreviewResources() wired into the existing onShutdown(...) registry next to Scheduler and HTTP server.
  • Agent prompt (src/agent/prompt-assembler.ts): two new guidance blocks (self-validate every UI page, general browser capability). Explicit warning against browser_run_code on external origins.
  • Dockerfile: bunx playwright install --with-deps --only-shell chromium added after the node_modules copy, with PLAYWRIGHT_BROWSERS_PATH=/home/phantom/.cache/ms-playwright and ownership chown to phantom:phantom.
  • package.json: exact-pinned playwright@1.59.1 and @playwright/mcp@0.0.70. No caret ranges because the contextGetter API has moved between point releases.

Research references

  • local/2026-04-12-phantom-ui-chapter/research/01-playwright-self-validation.md (auth model, SSE workaround, Docker plan, bundled-result design)
  • local/2026-04-12-phantom-ui-chapter/research/01b-playwright-expanded/implementation-plan.md (file-by-file changes, Dockerfile diff, prompt blocks, test plan, acceptance criteria)
  • local/2026-04-12-phantom-ui-chapter/research/01b-playwright-expanded/findings.md (createConnection contextGetter workaround, five-probe trace, 01b benchmark)

Test count

  • Before: 931 pass
  • After: 940 pass + 6 opt-in integration skipped by default = 946 total
  • Integration run with PHANTOM_INTEGRATION=1: 3 pass in 530ms with real Chromium on macOS arm64
  • bun run lint clean, bun run typecheck clean, zero regressions.

Docker image delta

  • Baseline main: 914 MiB
  • feat branch: 1.91 GiB
  • Delta: +996 MiB (Chromium headless shell plus apt deps for --with-deps)

The 8 GiB container RAM cap from PR #52 comfortably absorbs the larger image.

Test plan

  • bun install installs pinned versions
  • bun run typecheck zero errors
  • bun run lint zero warnings
  • bun test 940 pass, 0 fail, 6 opt-in skip
  • PHANTOM_INTEGRATION=1 bun test src/ui/__tests__/*.integration.test.ts passes with real Chromium
  • docker build -f Dockerfile -t phantom-playwright:v1 . succeeds
  • Image contains /home/phantom/.cache/ms-playwright/chromium_headless_shell-* owned by phantom:phantom
  • Deploy smoke test: orchestrator runs on a VM after review, verifies agent calls phantom_preview_page in a Slack flow and the screenshot looks right

Handoff

Full handoff report with decisions and open questions: local/2026-04-12-phantom-ui-chapter/handoffs/01-playwright-handoff.md

Adds two new MCP tool servers that share one Chromium instance and one
BrowserContext per query:

- phantom_preview_page: one-call self-validation for /ui/<path> pages.
  Navigates headless Chromium, captures a full-page PNG, and bundles HTTP
  status, title, console messages, and failed network requests alongside
  the screenshot. Stubs window.EventSource in an init script so Phantom's
  live-reload SSE wiring cannot hold the page open past 'load'.
- phantom-browser: embeds the full 21-tool @playwright/mcp surface in
  process via createConnection with a contextGetter, giving the agent
  browser_navigate, browser_click, browser_snapshot, browser_run_code,
  and the rest out of the box.

Both share one Browser singleton launched lazily on first use and one
BrowserContext minted per query with a short-lived phantom_session
cookie (10 minute TTL, domain: localhost). The browser and context are
torn down on SIGTERM via the existing graceful-shutdown registry.

The contextGetter path is load-bearing: the naive createConnection hangs
under Bun on Linux amd64 (traced across five probes in research 01b).
browser.isolated: false is mandatory because SimpleBrowser.newContext()
throws; the embed defers to our shared context instead.

mcpServerFactories now accepts async factories. The per-query resolution
path in AgentRuntime uses Promise.all so sync factories pass through as
resolved promises without behavior change.

Agent prompt gains two new guidance blocks: always validate after
phantom_create_page, and prefer phantom_preview_page for self-validation
while using browser_* directly for multi-step browsing.

Dockerfile installs the --only-shell chromium build (~75 MiB saved over
the full binary) with PLAYWRIGHT_BROWSERS_PATH set to a phantom-owned
cache directory. playwright@1.59.1 and @playwright/mcp@0.0.70 are
pinned exactly because contextGetter and the capability surface have
shifted between point releases already.

Research: local/2026-04-12-phantom-ui-chapter/research/01-playwright-self-validation.md
Research: local/2026-04-12-phantom-ui-chapter/research/01b-playwright-expanded/implementation-plan.md
Research: local/2026-04-12-phantom-ui-chapter/research/01b-playwright-expanded/findings.md

Test count: 931 -> 940 pass (+9) plus 6 opt-in integration tests skipped
by default. Zero regressions. lint and typecheck clean.
Docker image delta: +996 MiB (914 MiB -> 1.91 GiB).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41369cfda9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +59 to +60
if (currentContext) return currentContext;
if (currentContextPromise) return currentContextPromise;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Refresh preview auth instead of reusing one context forever

This function always returns the same BrowserContext, but that context only receives a preview session cookie once (minted by createPreviewSession, which now expires after 10 minutes). After the token expires, /ui/* requests become unauthenticated and phantom_preview_page/browser_* calls against local UI pages will start landing on login/401 responses until process restart. Rotating the context (or re-minting/re-injecting the cookie) per query is needed to keep the new preview capability working beyond the first 10 minutes.

Useful? React with 👍 / 👎.

Comment on lines +46 to +48
browserPromise = (async () => {
const b = await chromium.launch({
headless: true,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear cached launch promise on Chromium startup failure

If chromium.launch(...) throws once (e.g., transient resource error), browserPromise remains set to that rejected promise, so every later call immediately fails and the preview/browser tools cannot recover without restarting the process. The cached promise should be cleared on failure so a subsequent call can retry launching Chromium.

Useful? React with 👍 / 👎.

Surgical fix pass on top of 41369cf in response to the internal review
at local/2026-04-12-phantom-ui-chapter/review/01-playwright-review.md
and Codex PR comments on #53. Fourteen acceptance criteria met.

Preview correctness (src/ui/preview.ts):
- F5: cookie path reverted from / to /ui, matching the magic-link posture
  in src/ui/serve.ts (buildPreviewCookie helper extracted).
- F6 / Codex P1: rotate the preview session cookie on every warm
  getOrCreatePreviewContext call once Date.now() is inside the 2 minute
  safety margin before the 10 minute TTL. Uses addCookies in place so
  the cached BrowserContext stays the same instance.
- F7: shuttingDown module flag set in closePreviewResources and guarded
  at the top of both getOrCreateBrowser and getOrCreatePreviewContext.
  Concurrent tool calls during SIGTERM now throw cleanly instead of
  spawning a resurrected Chromium.
- F8: null currentContext in the page.close() catch block so the next
  call recreates a fresh context instead of handing back a dead one.
- F9: replace window.EventSource = undefined with
  Reflect.deleteProperty(globalThis, "EventSource"). Covers the 'in'
  operator and drops a TypeScript widening.
- Codex P2: try/finally pattern around browserPromise and
  currentContextPromise so a transient chromium.launch error no longer
  permanently poisons the cache. Subsequent calls retry cleanly.

Browser MCP (src/ui/browser-mcp.ts):
- F17: header note 3 rewritten to reflect the actual on-disk layout
  (top-level hoisted playwright-core@1.60.0-alpha vs. nested
  playwright/node_modules/playwright-core@1.59.1; no nested
  playwright-core under @playwright/mcp/node_modules). Cites the
  integration test that verifies the cross-version BrowserContext
  boundary end to end.

Tests:
- F1: preview.integration.test.ts rewritten to drive the real
  phantom_preview_page handler through a Client + InMemoryTransport
  pair. Asserts the bundled {image, text} result shape, base64
  round-trip, status, title, console capture, and requestfailed capture.
- F2 + F4: browser-mcp.integration.test.ts extended with a real
  Client.listTools() call asserting exactly 21 tools by name and a real
  browser_navigate round-trip across the cross-version BrowserContext
  boundary.
- New preview-correctness.test.ts with 8 unit tests covering
  launch-failure recovery, shuttingDown, cookie rotation threshold,
  cookie path, and the F8 invariant. Uses spyOn(chromium, "launch")
  plus a fake Browser / BrowserContext so no real Chromium is spawned.

Docs:
- F3 + F11: Dockerfile image cost breakdown comment expanded to record
  the 996 MiB delta split: ~327 MB headless shell, ~91 MB fonts, ~500+
  MB X11/GTK shared libs. Documents why --only-shell cannot trim more.

Test count: 940 pass + 6 skip baseline -> 948 pass + 10 skip after
(strictly greater, zero regressions). bun run lint and bun run
typecheck both clean. PHANTOM_INTEGRATION=1 integration run on macOS
arm64 passes 14/14 with real Chromium.

Deferred per fix brief anti-patterns: F10 (v1.1 error-path screenshot),
F11 (afterEach scope narrowing, pre-existing), F12 (graceful.ts
tasks.reverse, out of scope).
@mcheemaa mcheemaa merged commit 527fa30 into main Apr 13, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant