
[STG-1940] feat: add browser-swarm extension bridge POC #100

Open
shrey150 wants to merge 60 commits into main from shrey/browser-swarm-extension-bridge

Conversation

shrey150 (Contributor) commented May 7, 2026

Summary

  • add a browser-swarm skill with a Chrome extension bridge and localhost relay
  • create browser-managed worker tabs, with a colored Chrome tab group when the browser supports tab groups cleanly
  • document the parent-harness/worker model with unique browse --session + --cdp contracts and structured worker reports
  • harden worker-scoped endpoints so sibling tab access and direct tab lifecycle changes are rejected
  • add Arc-safe no-group mode so Arc swarms never call chrome.tabGroups.* or chrome.tabs.group
  • add isolated relay-port support for disposable Chrome tests when another installed browser extension is already connected on the default port
  • stabilize parallel input by activating the owning tab and serializing forwarded Input.* CDP commands
  • expose connected extension metadata in /health so stale Arc extension reloads are visible
  • version the MV3 service-worker filename and patch the manifest-declared worker for disposable relay-port tests
  • share the Arc serialized and parallel click smoke tests through one reusable harness
  • add a running issues/evidence log at skills/browser-swarm/RUNNING_TEST_NOTES.md
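The input-stabilization bullet above (activate the owning tab, then serialize forwarded Input.* commands) can be pictured as a promise-chain queue. This is a minimal sketch under stated assumptions, not the code in this PR: `InputQueue`, `activateTab`, and `sendCommand` are hypothetical names, and the real extension may order or batch commands differently.

```javascript
// Hypothetical sketch: run Input.* CDP commands one at a time, everything else immediately.
class InputQueue {
  constructor({ activateTab, sendCommand }) {
    this.activateTab = activateTab; // assumed helper: brings the owning tab to the foreground
    this.sendCommand = sendCommand; // assumed helper: forwards one CDP command to a tab
    this.tail = Promise.resolve();  // end of the chain of pending input commands
  }

  dispatch(tabId, method, params) {
    // Non-input commands are not serialized.
    if (!method.startsWith("Input.")) {
      return this.sendCommand(tabId, method, params);
    }
    const run = async () => {
      await this.activateTab(tabId);
      return this.sendCommand(tabId, method, params);
    };
    // Start after the previous input command settles, whether it succeeded or failed.
    const result = this.tail.then(run, run);
    // Keep the chain alive even if this command rejects; the caller still sees the rejection.
    this.tail = result.catch(() => {});
    return result;
  }
}
```

With a queue like this, two workers clicking at the same time would have their pointer events delivered back to back rather than interleaved, at the cost of briefly switching the active tab for each command.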

Linear: https://linear.app/browserbase/issue/STG-1940/add-browser-swarm-extension-bridge-poc

E2E Test Matrix

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| node --check for browser-swarm relay, e2e, setup, launcher, and extension service worker | All syntax checks exited 0; extension/manifest.json parsed as JSON | Proves the touched scripts, extension worker, and manifest parse successfully, including the Arc no-group path, relay-port launcher path, input queue path, and latest relay/setup hardening changes. |
| BROWSER_SWARM_PORT=19997 BROWSER_SWARM_BROWSE_BIN=<local oclif browse> npm run e2e from skills/browser-swarm | PASS: disposable Chrome launched with a temp extension copy patched to relay port 19997, extensionConnected=true with extension version 0.1.1, three labeled tabs created in a Chrome tab group, each worker session read title/url, tab list returned exactly one tab, snapshots and screenshots were captured, and relay CLI screenshot --path wrote a 1509844 byte PNG | Proves the real extension + relay + target-bound endpoint + new --session/--cdp browse workflow works end to end in a disposable browser, even while Arc has an installed browser-swarm extension on the default port. Also proves the relay screenshot CLI honors --path. |
| BROWSER_SWARM_PORT=20000 BROWSER_SWARM_BROWSE_BIN=<local oclif browse> npm run e2e from skills/browser-swarm | PASS: disposable Chrome launched with extension version 0.1.1; session-scoped Target.getTargets saw exactly one worker target; session-scoped sibling attach/info errored; root Target.createTarget increased target count 3 -> 4; root Target.closeTarget reduced it 4 -> 3; relay screenshot wrote a 1510778 byte PNG; three same-page workers submitted alpha-worker/beta-worker/gamma-worker in parallel | Proves the latest target-scoping fix closes the session-forwarding isolation bypass, proves root lifecycle cleanup removes closed targets from relay state, and reproves simultaneous actions on identical-page tabs after the fix. |
| BROWSER_SWARM_PORT=20005 BROWSER_SWARM_BROWSE_BIN=<local oclif browse> npm run e2e from skills/browser-swarm | PASS: disposable Chrome launched with extension version 0.1.1; root Target.createTarget with a root sessionId increased target count 3 -> 4; root Target.closeTarget with the same sessionId reduced it 4 -> 3; the root CDP client observed exactly one Target.detachedFromTarget event for the closed target; relay screenshot wrote a 1510792 byte PNG; same-page workers again submitted alpha-worker/beta-worker/gamma-worker in parallel | Proves the duplicate-detach regression is fixed and that root lifecycle commands with a sessionId still go through relay bookkeeping instead of raw forwarding. |
| BROWSER_SWARM_PORT=20007 BROWSER_SWARM_BROWSE_BIN=<local oclif browse> npm run e2e from skills/browser-swarm | PASS: disposable Chrome launched with extension version 0.1.1; worker endpoint isolation and session-scoped sibling/lifecycle rejection passed; root Target.createTarget/Target.closeTarget with a root sessionId moved target count 3 -> 4 -> 3; exactly one detach event was observed; relay screenshot wrote a 1509933 byte PNG; same-page workers filled and clicked submit for alpha-worker/beta-worker/gamma-worker in parallel | Reconfirms the runtime-tested PR path after code/contract updates passes the full real Chrome e2e path, including target isolation, root lifecycle bookkeeping, screenshot output, and simultaneous same-page writes. Later commits after d455b35 only update browser-swarm docs and RUNNING_TEST_NOTES.md. |
| BROWSER_SWARM_PORT=20018 BROWSER_SWARM_BROWSE_BIN=<local oclif browse> npm run e2e from skills/browser-swarm | PASS: disposable Chrome launched from current PR head 112ecf8 with extension id bljijffbkipealmmglkacdmfjjpdjphm, extension version 0.1.1, and service worker service-worker-v0-1-1.js; worker endpoint isolation and session-scoped sibling/lifecycle rejection passed; root Target.createTarget/Target.closeTarget with a root sessionId moved target count 3 -> 4 -> 3; exactly one detach event was observed; relay screenshot wrote a 1515911 byte PNG; same-page workers filled and clicked submit for alpha-worker/beta-worker/gamma-worker in parallel, and each final tab list exposed exactly one target | Proves the current pushed head still passes the full real Chrome e2e path after extension metadata assertions, including real simultaneous same-page actions through target-bound worker endpoints. |
| Automated Chrome same-page read/write inside npm run e2e | PASS: three worker tabs were navigated to the same local URL/title, parallel browse fill wrote alpha-worker/beta-worker/gamma-worker, parallel click #submit submitted each form, and final title/text/value/tab-list assertions all passed | Proves read-and-write workflows on identical pages route by target-bound endpoint, not URL/title, and that the input queue fix handles parallel click submission in Chrome. |
| node scripts/setup-real-browser.mjs --browser arc --no-open --no-start-relay --timeout 2 --json, node scripts/setup-real-browser.mjs --browser chrome --no-open --no-start-relay --port 19995 --timeout 10 --json, and node scripts/setup-real-browser.mjs --browser not-a-browser --no-open --no-start-relay --no-wait | PASS for detection behavior: stale Arc worker exited 3 with connected version 0.1.0, expected manifest version 0.1.1, and versionMatches: false; disposable Chrome exited 0 with connected version 0.1.1 and versionMatches: true; unknown browser exited 1 with the supported browser list | Proves the setup helper no longer treats extensionConnected: true as sufficient when a stale MV3 service worker is active, and validates browser names before relay/browser side effects. It detects the current Arc blocker and passes on a fresh Chrome happy path. |
| Raw CDP isolation probe inside npm run e2e | PASS: worker endpoint saw only its assigned target; sibling Target.attachToTarget and Target.getTargetInfo returned errors; worker Target.createTarget and Target.closeTarget returned explicit browser-swarm harness errors; the same lifecycle commands with an attached sessionId also returned errors; an unknown sessionId failed closed instead of falling back to the worker target | Proves the scoped endpoint enforces the worker-owned-tab contract, blocks direct worker tab lifecycle changes, and covers the session-forwarding bypass/regression path. |
| Chrome grouped stress with actual Codex worker subagents | PASS: spawned three real Codex worker agents against three target-bound Chrome endpoints; each worker reported one visible target, operated only on its own assigned page, and captured a screenshot (/tmp/browser-swarm-subagent-alpha.png, /tmp/browser-swarm-subagent-beta.png, /tmp/browser-swarm-subagent-gamma.png) | Proves browser-swarm endpoints work with actual Codex subagents, not just scripted local commands. This proves target isolation per subagent; scheduling is still governed by Codex worker/tool execution rather than browser-swarm itself. |
| Current-head same-page stress with actual Codex worker subagents | PASS at commit 112ecf8 / recorded in b172bf8: spawned three real Codex worker agents (Franklin, Raman, Mencius) against three target-bound endpoints in one disposable Chrome profile on identical http://127.0.0.1:18105/same tabs; workers wrote proper-codex-alpha / proper-codex-beta / proper-codex-gamma, returned structured JSON, and the main harness independently verified each endpoint exposed exactly one target with its own title/text/value state | Directly proves proper Codex subagents can take actions concurrently on same-URL/same-title pages without stealing each other's tab state, with routing by target-bound endpoint rather than active browser tab. |
| Mixed Codex + Claude Code live workflow | PASS: one Codex worker and one claude -p --permission-mode bypassPermissions --allowedTools Bash --output-format json agent ran concurrently in the same disposable Chrome profile on identical same-page tabs; each reported distinct title/text/value/url/tabCount/targetId/screenshot; the main harness independently verified one visible target per endpoint | Proves both Codex and Claude Code agents can operate as browser-swarm workers in parallel and report structured evidence back to the main harness. |
| Latest-head mixed Codex + Claude Code workflow after relay hardening | PASS at commit a3fe8ad: one real Codex worker subagent and one claude -p --permission-mode bypassPermissions --allowedTools Bash --output-format json worker ran concurrently in the same disposable Chrome profile on identical http://127.0.0.1:18087/same tabs; Codex reported latest-codex-worker, Claude reported latest-claude-worker, and the main harness independently verified each target-bound endpoint saw exactly one tab with its own title/text/value state | Proves the relay hardening changes did not regress actual mixed-agent worker execution, structured worker reporting, or same-URL target isolation. |
| Current-head mixed Codex + Claude Code workflow after root-lifecycle fix | PASS at commit 87267b6: one real Codex worker subagent and one claude -p --permission-mode bypassPermissions --allowedTools Bash --output-format json worker ran concurrently in a disposable Chrome profile on identical same-page tabs. Codex reported current-head-codex-worker-87267b6; Claude reported current-head-claude-worker-87267b6; the main harness independently verified each target-bound endpoint exposed exactly one target with distinct title/text/value state. Follow-up strict browse-only probing showed long session names can block browse daemon startup, while short session bs-claude-9039 passed get title, fill, click, tab list, get text, get value, and screenshot --path with one visible tab. | Proves the latest root-lifecycle changes did not regress real mixed-agent worker execution, and closes prompt/contract reliability issues around shell selector quoting, raw relay probes, and overly long browse session names. |
| Latest runtime-tested mixed Codex + Claude Code workflow | PASS at commit d455b35: one real Codex worker subagent and one claude -p --permission-mode bypassPermissions --allowedTools Bash --output-format json worker ran concurrently in a disposable Chrome profile on identical http://127.0.0.1:18090/same tabs. Codex reported current-head-codex-d455b35; Claude reported current-head-claude-d455b35; screenshots were written to /tmp/browser-swarm-current-codex-d455b35.png and /tmp/browser-swarm-current-claude-d455b35.png; the main harness independently verified each target-bound endpoint exposed exactly one target with distinct title/text/value state. | Reconfirms that real Codex and Claude Code workers can run at the same time, perform same-page read/write actions, and report structured evidence back to the main harness. Later commits after d455b35 only update browser-swarm docs and RUNNING_TEST_NOTES.md. |
| Latest runtime-tested mixed Codex + Claude Code workflow after versioned worker | PASS at commit aba8036: one real Codex worker subagent and one claude -p --permission-mode bypassPermissions --allowedTools Bash --output-format json worker ran concurrently in a disposable Chrome profile with extension version 0.1.1 on identical http://127.0.0.1:55459/same tabs. Codex reported codex-latest-a726f6c; Claude reported claude-latest-a726f6c; screenshots were written to /tmp/browser-swarm-mixed-codex-a726f6c.png (29743 bytes) and /tmp/browser-swarm-mixed-claude-a726f6c.png (29104 bytes); the main harness independently verified each target-bound endpoint exposed exactly one tab with distinct title/text/value state. | Reconfirms real Codex and Claude Code workers can run concurrently and report structured evidence after the versioned extension worker and shared Arc harness changes. |
| Arc mixed Codex + Claude Code DOM-write workflow | PASS: two Codex workers and one Claude Code worker ran concurrently in Arc no-group mode against identical local same-page tabs; all reported success, each saw one target, and main harness verified distinct title/text/value states for arc-alpha, arc-beta, and arc-gamma | Proves Arc real-profile workers can perform parallel read/write work with mixed agent types when irreversible writes use DOM-level commands. Does not prove Arc pointer-click submission on the stale 0.1.0 service worker. |
| Chrome grouped root lifecycle stress | PASS: root relay endpoint saw all three targets, Target.createTarget created a fourth target, relay navigation worked on that target, target-bound endpoint for the created tab saw only itself, worker endpoints blocked Target.createTarget/Target.closeTarget, and root Target.closeTarget cleaned the target back to the original three | Proves the parent harness can manage tab lifecycle while workers remain scoped and cannot create/close tabs themselves. |
| Manual Chrome same-page simultaneous action test | PASS: created three Chrome tabs with identical URL http://127.0.0.1:18080/same and identical title same-page-action-test; issued parallel actions; final DOM/title states were distinct (alpha-worker, beta-worker, gamma-worker) and each worker tab list still returned exactly one target | Proves commands route by target-bound endpoint even before the automated e2e version was added. This also exposed the fill --press-enter/parallel input flake that led to the Input.* queue fix. |
| Arc real-profile grouped smoke | about:blank tab appeared, then the old grouped path emitted timeout warnings around chrome.tabGroups.query / chrome.tabs.group; Arc 1.146.0 crashed shortly after | Derisk result: Arc Spaces should not be treated as Chrome tab groups. Latest code adds --no-group and documents Arc mode as no visual grouping, with isolation still provided by target-bound endpoints. |
| Arc real-profile no-group smoke after extension reload | PASS: /health returned extensionConnected: true; ensure --no-group --count 1 returned groupId: null, groupDisabled: true, and one target; browse get title/url/tab list --session <arc-smoke> --cdp <target-ws> returned Example Domain, https://example.com/, and exactly one visible target; relay emitted no extension warnings; Arc process stayed running | Proves Arc can use the extension bridge when visual tab grouping is disabled. |
| Arc real-profile no-group real-world task | PASS: created three Arc worker tabs for flights, surf rentals, and dinner; each unique browse --session + --cdp endpoint loaded public Google result pages, extracted titles/snapshots, and captured screenshots; tab list returned exactly one visible target per worker after session initialization; raw CDP Target.getTargets returned only the assigned target for each endpoint, and sibling Target.attachToTarget failed; relay emitted no extension warnings; Arc stayed running | Proves the Arc no-group path handles a realistic multi-tab research workflow with target isolation. It does not prove high-volume stress or protected-site behavior. |
| BROWSER_SWARM_BROWSE_BIN=<local oclif browse> npm run e2e:arc-serialized-click from skills/browser-swarm | PASS: the new reusable Arc smoke started a relay on port 19989, connected to Arc's active extension worker 0.1.0 while the unpacked manifest expected 0.1.1, created two no-group target-bound tabs on the same local URL, filled arc-serialized-alpha / arc-serialized-beta in parallel, clicked #submit sequentially through the top-level harness, verified each tab title/result/input matched its assigned value, verified each endpoint reported exactly one tab, then closed its tabs and daemons. | Proves Arc no-group can do read/write via target-bound DOM operations and can do pointer-click submissions when the top-level harness serializes irreversible clicks. Leaves one open check: manually stop/reload or restart Arc so /health reports version 0.1.1, then rerun parallel pointer-click submission against the latest service worker input queue. |
| npm run diagnose:arc-worker -- --json from skills/browser-swarm | BLOCKED diagnostic as designed on current Arc: command exited 3 with status: "STALE_ARC_SERVICE_WORKER_REGISTRATION", connected version 0.1.0, expected version 0.1.1; Arc Secure Preferences points at the current unpacked extension path but still records service-worker registration version 0.1.0; exact old worker URL hits 18, exact expected worker URL hits 0 | Proves the current Arc blocker is the stale Browser Swarm MV3 service-worker registration, not just a relay-health mismatch. This is diagnostic evidence only; the actual Arc parallel-click pass still requires restarting/reloading Arc until /health reports extension version 0.1.1. |
| BROWSER_SWARM_BROWSE_BIN=<local oclif browse> npm run e2e:arc-parallel-click from skills/browser-swarm | BLOCKED as designed on current Arc: command exited 3 with status: "BLOCKED_STALE_EXTENSION", expected extension version 0.1.1, connected Arc worker version 0.1.0, and targetCount: 0. | Proves the final Arc parallel pointer-click verifier exists and refuses to judge the new input queue against Arc's stale MV3 worker. The actual parallel-click pass still requires manually reloading/restarting Arc until /health reports extension version 0.1.1. |

Artifacts from the real e2e runs:

  • <temp>/browser-swarm-e2e/report.json
  • <temp>/browser-swarm-e2e/flights.png
  • <temp>/browser-swarm-e2e/rentals.png
  • <temp>/browser-swarm-e2e/dinner.png
  • /tmp/browser-swarm-subagent-alpha.png
  • /tmp/browser-swarm-subagent-beta.png
  • /tmp/browser-swarm-subagent-gamma.png
  • /tmp/browser-swarm-mixed-codex.png
  • /tmp/browser-swarm-mixed-claude.png
  • <temp>/browser-swarm-realworld-flights.png
  • <temp>/browser-swarm-realworld-rentals.png
  • <temp>/browser-swarm-realworld-dinner.png
  • skills/browser-swarm/RUNNING_TEST_NOTES.md

Note

High Risk
Adds a new local Chrome extension + relay that forwards and scopes CDP traffic; mistakes could expose cross-tab access or unsafe lifecycle control despite added guardrails.

Overview
Introduces a new browser-swarm skill that coordinates multiple agents against a single real Chromium profile via a localhost relay (scripts/swarm-relay.mjs) and a Manifest V3 extension bridge, producing target-bound CDP endpoints per tab.

Adds hardening to the relay/extension boundary: worker endpoints are scoped to one target, reject sibling target inspection/attach, and block Target.createTarget/Target.closeTarget; the relay also exposes extension metadata in /health and makes detach handling/idempotency more robust.
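The scoping rules described above can be read as a small policy check applied to each command arriving on a worker endpoint. The following is a sketch only: `checkWorkerCommand`, the `worker` shape (`targetId`, `sessionIds`), and the reason strings are hypothetical and are not taken from swarm-relay.mjs; the method names are standard CDP Target-domain methods.

```javascript
// Hypothetical policy check for a worker-scoped CDP endpoint.
// `worker` is an assumed shape: { targetId: string, sessionIds: Set<string> }.
const LIFECYCLE_METHODS = new Set(["Target.createTarget", "Target.closeTarget"]);
const TARGET_ADDRESSED_METHODS = new Set(["Target.attachToTarget", "Target.getTargetInfo"]);

function checkWorkerCommand(worker, { method, params = {}, sessionId }) {
  // Workers never create or close tabs, with or without a sessionId.
  if (LIFECYCLE_METHODS.has(method)) {
    return { allowed: false, reason: `${method} is reserved for the parent harness` };
  }
  // Fail closed on unknown sessions instead of falling back to the worker target.
  if (sessionId !== undefined && !worker.sessionIds.has(sessionId)) {
    return { allowed: false, reason: "unknown sessionId" };
  }
  // Reject inspection or attachment of any target other than the worker's own.
  if (
    TARGET_ADDRESSED_METHODS.has(method) &&
    params.targetId !== undefined &&
    params.targetId !== worker.targetId
  ) {
    return { allowed: false, reason: "target is not owned by this worker" };
  }
  return { allowed: true };
}
```

Checking lifecycle methods before looking at the sessionId mirrors the probe in the test matrix, where the same lifecycle commands were rejected both with and without an attached session.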

Adds operational tooling and docs: setup-real-browser.mjs (real-profile setup with version-staleness detection), launch-chrome.mjs (disposable Chrome with --relay-port by patching the extension), Arc-specific --no-group mode plus Arc diagnostics (diagnose-arc-worker.mjs), and e2e harnesses for same-page parallel input/click behavior. Updates plugin marketplace + README to list the new skill and adds RUNNING_TEST_NOTES.md with stress-test evidence.
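The version-staleness detection mentioned above amounts to comparing the extension version reported by the relay with the version in the unpacked manifest. The sketch below assumes a /health payload of the form `{ extensionConnected, extension: { version } }`; only `extensionConnected` and `versionMatches` appear in the PR text, so the nested `extension.version` field and the function name `checkExtensionVersion` are assumptions.

```javascript
// Hypothetical staleness check. `health` mirrors an assumed /health payload:
// { extensionConnected: boolean, extension?: { version: string } }.
function checkExtensionVersion(health, expectedVersion) {
  const extensionConnected = health.extensionConnected === true;
  const connectedVersion = health.extension?.version ?? null;
  // A connected extension is not enough: the running worker must match the manifest.
  const versionMatches = extensionConnected && connectedVersion === expectedVersion;
  return { extensionConnected, connectedVersion, expectedVersion, versionMatches };
}
```

In the recorded runs, a stale Arc worker (connected 0.1.0, expected 0.1.1) made the setup helper exit 3, while a fresh disposable Chrome with a matching version exited 0; a caller could map `versionMatches` to those exit codes.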

Reviewed by Cursor Bugbot for commit edad4bb. Bugbot is set up for automated code reviews on this repo.

shrey150 marked this pull request as ready for review May 7, 2026 08:20
Comment thread skills/browser-swarm/scripts/swarm-relay.mjs
Comment thread skills/browser-swarm/scripts/swarm-relay.mjs
Comment thread skills/browser-swarm/scripts/swarm-relay.mjs
Comment thread skills/browser-swarm/scripts/launch-chrome.mjs
Comment thread skills/browser-swarm/scripts/swarm-relay.mjs
Comment thread skills/browser-swarm/scripts/setup-real-browser.mjs
Comment thread skills/browser-swarm/scripts/setup-real-browser.mjs
shrey150 force-pushed the shrey/browser-swarm-extension-bridge branch from d8c3cb7 to 86a3aa3 May 14, 2026 07:39
Comment thread skills/browser-swarm/scripts/swarm-relay.mjs
Comment thread skills/browser-swarm/scripts/swarm-relay.mjs
Comment thread skills/browser-swarm/scripts/swarm-relay.mjs Outdated
Comment thread skills/browser-swarm/scripts/swarm-relay.mjs Outdated
cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.


waitingForDebugger: false
}
});
}

Synthetic event references wrong target for attachToTarget

Low Severity

In emitSyntheticEvents, the Target.attachToTarget path looks up the target via this.findTarget(params.targetId, client), but handleCdpCommand resolves the target via sessionTarget || this.findTarget(null, client) when params.targetId is absent. For a root client with multiple targets, sessionTarget (from the provided sessionId) may differ from the first target returned by findTarget(null, ...). The synthetic Target.attachedToTarget event would then carry targetInfo for the wrong target compared to the sessionId in the response.
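One way to avoid the mismatch described here would be to resolve the target once and reuse that result for both the command response and the synthetic event. The helper below is a hypothetical sketch, not code from swarm-relay.mjs: `resolveAttachTarget` is an invented name, and `findTarget` is assumed to return the first visible target when called with `null`, as the comment describes.

```javascript
// Hypothetical shared resolver for Target.attachToTarget.
// `findTarget(targetId)` is assumed to return the first visible target when targetId is null.
function resolveAttachTarget(params, sessionTarget, findTarget) {
  // An explicit targetId always wins.
  if (params && params.targetId) {
    return findTarget(params.targetId);
  }
  // Otherwise prefer the target bound to the caller's sessionId, then the default target.
  return sessionTarget || findTarget(null);
}
```

If both the command handler and the synthetic-event emitter called a resolver like this, the `targetInfo` in Target.attachedToTarget would describe the same target as the sessionId in the response.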


}, 30000);
this.extensionRequests.set(id, { resolve, reject, timer });
this.extension.send(JSON.stringify(payload));
});

Relay timeout too short for multi-tab ensureTabs

Medium Severity

sendToExtension uses a fixed 30-second timeout for all extension requests. The extension's ensureTabs calls waitForTabLoad with a 15-second timeout per tab sequentially, plus attachTab overhead. Creating 3+ tabs pointing to slow-loading URLs can exceed 30 seconds total, causing the relay to time out and return an error to the caller even though the extension is still successfully creating tabs. This leaves relay and extension state out of sync.
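The arithmetic behind this concern can be made concrete by scaling the relay timeout with the number of tabs requested. This is an illustrative sketch, not a proposed patch: `ensureTabsTimeoutMs` is a hypothetical helper, the 30-second and 15-second figures come from the comment, and the 2-second per-tab attach overhead is an assumed placeholder.

```javascript
// Figures from the review comment.
const BASE_TIMEOUT_MS = 30000;     // fixed sendToExtension timeout
const TAB_LOAD_TIMEOUT_MS = 15000; // waitForTabLoad budget per tab, applied sequentially
// Assumption for illustration only; the real attachTab overhead is not stated.
const ATTACH_OVERHEAD_MS = 2000;

// Hypothetical helper: a relay timeout that covers the extension's worst case.
function ensureTabsTimeoutMs(tabCount) {
  const sequentialBudget = tabCount * (TAB_LOAD_TIMEOUT_MS + ATTACH_OVERHEAD_MS);
  return Math.max(BASE_TIMEOUT_MS, sequentialBudget);
}
```

Under these assumptions three slow tabs need up to 51 seconds, which is why a fixed 30-second relay timeout can fire while the extension is still working.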


params,
targetId: firstTarget?.targetId
}).catch(() => {});
return {};

AutoAttach silently lost when no targets exist yet

Low Severity

When Target.setAutoAttach is sent before any targets exist, the relay forwards it to the extension with targetId: undefined. The extension's forwardCDPCommand calls findTarget, which returns null and throws — but the relay swallows this with .catch(() => {}) and returns {} to the client. The extension's autoAttachParams is never set, so subsequently created tabs skip auto-attach setup in attachTab, even though the client believes auto-attach is active.
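A possible shape for a fix is to remember the auto-attach parameters independently of whether any target exists, then apply them whenever a tab is attached later. The class below is a sketch under assumptions: `AutoAttachState`, `remember`, `applyTo`, and the `sendToTarget` callback are hypothetical and do not correspond to names in the extension.

```javascript
// Hypothetical holder for the client's Target.setAutoAttach parameters.
class AutoAttachState {
  constructor() {
    this.params = null; // null means the client never requested auto-attach
  }

  // Record the request even when there is no target to forward it to yet.
  remember(params) {
    this.params = { ...params };
  }

  // Called when a tab is attached; replays the stored request, if any.
  // `sendToTarget(targetId, method, params)` is an assumed forwarding callback.
  async applyTo(targetId, sendToTarget) {
    if (this.params === null) return false;
    await sendToTarget(targetId, "Target.setAutoAttach", this.params);
    return true;
  }
}
```

Storing the request before forwarding means a missing target no longer discards it, so tabs created afterwards can still be set up the way the client expects.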

