Skip to content

moc-com/browser-automation-token-benchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The Hidden Token Tax

Quantifying the True Cost of AI Browser Automation

Every click on a complex page silently consumes 10,000+ tokens via MCP. The same click via CLI? Just 35.

License: MIT Playwright CLI Playwright MCP Benchmark


An empirical benchmark revealing 2-5x token waste in standard MCP browser tools, with measured CDP protocol overhead and a hybrid routing solution.


Abstract

AI agents increasingly rely on browser automation for web interaction, but the token cost of these interactions remains poorly understood. We present the first systematic, empirical benchmark comparing four browser automation approaches for AI agents: @playwright/cli (file-based snapshots), @playwright/mcp (protocol-embedded snapshots), claude --chrome (CDP relay), and raw CDP (Chrome DevTools Protocol).

Our measurements across 4 real-world websites (5-882 DOM elements) reveal that MCP embeds the full accessibility tree in every action response, creating O(n) context growth that silently drains the agent's context window. In contrast, CLI's file-based approach achieves O(1) per-action cost, reducing total token consumption by 37-84% in multi-step workflows.

For a team of 10 developers running 50 browser automation workflows per day, this translates to $20,890 in annual API cost savings (at Claude Sonnet pricing).

We provide reproducible benchmark scripts, raw measurement data, and a practical hybrid routing strategy.


Table of Contents


Visual Summary

5-Action Workflow Token Cost Comparison

Context Window Growth Over 10 Actions

Annual API Cost Impact

Per-Action Token Cost (Log Scale)


Key Findings

🔍 Finding 1: The Token Tax

MCP includes the full ARIA snapshot in every action response.

Verified in source code: setIncludeSnapshot() is called on browser_click, browser_type, browser_navigate, and all interaction tools.

Every click, every keystroke, every navigation — the entire page tree is re-sent.

📊 Finding 2: The Scale

Page MCP per-action CLI per-action Ratio
Simple (5 elements) 86 tok 35 tok 2.5x
Form (44 elements) 248 tok 35 tok 7.1x
Complex (458 elements) 3,499 tok 41 tok 85x
Heavy (882 elements) 10,162 tok 35 tok 290x

💰 Finding 3: The Cost

A 10-person team running 50 browser workflows/day wastes $20,890/year on redundant tokens.

MCP CLI Savings
Per workflow (HN) $0.159 $0.044 72%
Per developer/year $2,900 $811 $2,089
10-person team/year $29,000 $8,110 $20,890

At Claude Sonnet $3/MTok input pricing

⚡ Finding 4: CDP Raw Overhead

The raw CDP accessibility tree is 15-157x larger than the processed ARIA snapshot.

Page ARIA Snapshot CDP Raw A11y Ratio
example.com 232 B 5,807 B 25x
httpbin 912 B 39,311 B 43x
Hacker News 40,620 B 631,263 B 15.5x

This reveals that Playwright's ARIA processing pipeline provides massive compression — but MCP negates this by re-sending every action.


The Problem: Invisible Context Drain

When an AI agent uses MCP browser tools, every action silently injects the full page accessibility tree into the context window:

Action 1: navigate    → +10,162 tokens (full snapshot)
Action 2: click link  → +10,167 tokens (full snapshot again)
Action 3: fill form   → +10,162 tokens (full snapshot again)
Action 4: click btn   → +10,167 tokens (full snapshot again)
Action 5: snapshot    → +10,162 tokens (full snapshot again)
─────────────────────────────────────────────────────
Total: 50,820 tokens consumed (for 5 simple actions on Hacker News)

With CLI, the same workflow:

Action 1: navigate    → +99 tokens (URL + title only)
Action 2: click link  → +35 tokens (confirmation only)
Action 3: fill form   → +35 tokens (confirmation only)
Action 4: click btn   → +35 tokens (confirmation only)
Action 5: snapshot    → +39 tokens (stdout) + 14,646 tokens (file, read once)
─────────────────────────────────────────────────────
Total: 14,889 tokens consumed (71% savings)

The difference? CLI writes snapshots to files. MCP embeds them in every response.

Context Growth Visualization

---
config:
  themeVariables:
    xyChart:
      plotColorPalette: "#10B981, #F59E0B"
---
xychart-beta
    title "Cumulative Token Cost Over 10 Actions (Medium Page, S=3000 tok)"
    x-axis ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
    y-axis "Total Tokens" 0 --> 35000
    line "CLI (O(1) + on-demand read)" [3035, 3070, 3105, 3140, 6175, 6210, 6245, 6280, 6315, 6350]
    line "MCP (O(n) every action)" [3035, 6070, 9105, 12140, 15175, 18210, 21245, 24280, 27315, 30350]
Loading

CLI grows at 35 tokens/action (constant). MCP grows at 3,035 tokens/action (linear with page size). By action 10, MCP has consumed 4.8x more tokens than CLI.


Methodology

Environment

Parameter Value
OS WSL2 (Linux 6.6.87.2-microsoft-standard-WSL2)
Node.js v22.21.1
@playwright/cli v0.1.1 (Playwright 1.59.0-alpha)
@playwright/mcp v0.0.68
Browser Chromium (bundled, headless)
Token estimation ceil(bytes / 4) (standard English text approximation)

Target Pages

We selected 4 pages spanning the complexity spectrum:

Page URL DOM Elements HTML Size Complexity
example.com https://example.com 5 528 B Minimal
httpbin forms https://httpbin.org/forms/post 44 1,419 B Form-heavy
Hacker News https://news.ycombinator.com 882 34,706 B Content-rich
GitHub Trending https://github.com/trending 458 567,849 B Complex SPA

Measurement Approach

flowchart LR
    subgraph "Three Independent Measurement Paths"
        A["CLI\n(subprocess)"] --> D["stdout bytes\n+ file bytes"]
        B["MCP-equivalent\nPlaywright API"] --> E["JSON response\nbytes"]
        C["CDP Raw\n(DevTools Protocol)"] --> F["Protocol response\nbytes"]
    end
    D --> G["Token = ceil(bytes/4)"]
    E --> G
    F --> G
    G --> H["Comparison\nTables"]
Loading
  1. CLI: Actual playwright-cli subprocess calls, measuring stdout and snapshot file sizes
  2. MCP-equivalent: Playwright API with ariaSnapshot(), wrapped in MCP JSON envelope
  3. CDP Raw: Direct page.context().newCDPSession() for raw protocol measurements

Each measurement was run 3 times; we report the median.

Reproducibility

All benchmark scripts and raw data are included in this repository. See Reproduce Our Results for instructions.


Results

1. Per-Action Token Cost

The fundamental metric: how many tokens does each approach consume per browser action?

CLI stdout vs MCP response (per action)

Page CLI stdout MCP response MCP/CLI ratio
example.com 35 tok 86 tok 2.5x
httpbin forms 35 tok 248 tok 7.1x
Hacker News 35 tok 10,162 tok 290x
GitHub Trending 41 tok 3,499 tok 85x

Key Insight: CLI stdout is constant (~35 tokens) regardless of page complexity. It contains only the URL, title, and a file link. The snapshot is written to disk.

When the CLI snapshot file IS read

Page CLI file MCP response Ratio
example.com 79 tok 86 tok 1.1x
httpbin forms 420 tok 248 tok 0.6x*
Hacker News 14,646 tok 10,162 tok 0.7x*
GitHub Trending 6,625 tok 3,499 tok 0.5x*

*CLI file is sometimes larger due to element refs (e1, e2, ...) and YAML formatting. But the file is read once, while MCP sends the snapshot on every action.

This is the critical insight: per-snapshot, MCP and CLI are similar. But MCP sends it N times; CLI sends it K times (K << N).

2. Multi-Step Workflow Cost

Real-world workflows involve multiple sequential actions. This is where the token tax compounds.

5-Action Workflow Comparison

---
config:
  themeVariables:
    xyChart:
      plotColorPalette: "#10B981, #F59E0B"
---
xychart-beta
    title "5-Action Workflow: Total Token Cost"
    x-axis ["example.com", "httpbin", "Hacker News", "GitHub"]
    y-axis "Total Tokens" 0 --> 55000
    bar "CLI (1 snapshot read)" [254, 595, 14821, 6800]
    bar "MCP (snapshot every action)" [430, 1370, 53090, 18745]
Loading
Page CLI Total MCP Total Savings Savings %
example.com 254 tok 430 tok 176 tok 41%
httpbin forms 595 tok 1,370 tok 775 tok 57%
Hacker News 14,821 tok 53,090 tok 38,269 tok 72%
GitHub Trending 6,800 tok 18,745 tok 11,945 tok 64%

Formula:

  • CLI: C(n) = n × 35 + k × S(page) where k = snapshot reads (typically 1-2)
  • MCP: C(n) = n × (35 + S(page)) — every action includes the full snapshot

7-Action Form Fill Workflow (httpbin)

navigate → snapshot → fill × 3 → click → snapshot

Mode Total Bytes Tokens vs MCP
CLI (standard) 4,637 B ~1,159 39% saved
CLI (optimized, 1 read) 2,921 B ~730 62% saved
MCP (standard) 7,658 B ~1,914 baseline

6-Action Complex Page (Hacker News)

navigate → snapshot → click → snapshot → click → snapshot

Mode Total Bytes Tokens vs MCP
CLI (standard) 122,764 B ~30,691 52% saved
CLI (optimized, 1 read) 41,788 B ~10,447 84% saved
MCP (standard) 254,826 B ~63,706 baseline

3. Context Growth Model

The mathematical relationship between actions and context consumption:

CLI:  C(n) = n × 35 + k × S(page)    // k = snapshot reads (1-3 typical)
MCP:  C(n) = n × (35 + S(page))       // snapshot in every response
CDP:  C(n) = n × (35 + R(page))       // R >> S (raw >> processed)

For n = 10 actions:

Page Size (S) CLI (k=2) MCP CDP Raw MCP/CLI CDP/CLI
50 tok (minimal) 450 850 15,000+ 1.9x 33x
500 tok (form) 1,350 5,350 100,000+ 4.0x 74x
3,000 tok (medium) 6,350 30,350 1,500,000+ 4.8x 236x
10,000 tok (heavy) 20,350 100,350 4.9x

Cross-over point: CLI advantage becomes significant when S > 200 tokens (any page with >10 interactive elements).

4. CDP Protocol Deep Dive

We measured the raw CDP protocol output to understand what Playwright's ARIA processing compresses.

CDP Response Size Comparison

Page DOMSnapshot Full A11y Tree ARIA Snapshot A11y/ARIA Ratio
example.com 999 B 5,807 B 232 B 25x
httpbin forms 1,002 B 39,311 B 912 B 43x
Hacker News 1,001 B 631,263 B 40,620 B 15.5x
GitHub Trending 999 B 1,036 B 1,436 B 0.7x*

*GitHub Trending showed anomalously low CDP A11y output (1,036 B), likely due to dynamic content loading or anti-scraping measures affecting the CDP session.

Insight: Playwright's ARIA processing pipeline provides 15-43x compression over raw CDP accessibility data. This is valuable work — but MCP negates the benefit by re-sending the compressed result on every action.

The Data Pipeline

flowchart LR
    A["Raw DOM\n(34-567KB HTML)"] --> B["CDP Accessibility\ngetFullAXTree\n(1-631KB)"]
    B --> C["ARIA Snapshot\nProcessing\n(232B-40KB)"]
    C --> D{"Delivery Method"}
    D -->|"MCP"| E["Embedded in\nevery response\n❌ O(n) growth"]
    D -->|"CLI"| F["Written to file\nread on demand\n✅ O(1) growth"]
    D -->|"chrome"| G["Same as MCP\n+ 15% CDP relay\n❌ O(n) growth"]

    style E fill:#FEF3C7,stroke:#F59E0B
    style F fill:#D1FAE5,stroke:#10B981
    style G fill:#EDE9FE,stroke:#6366F1
Loading

claude --chrome Analysis

claude --chrome uses the same Playwright MCP server internally, connected via a CDP WebSocket relay:

Claude CLI → MCP JSON-RPC → Playwright MCP Server → CDP WebSocket
    → CDPRelayServer → WebSocket → Chrome Extension → Native Messaging → Chrome

Source code analysis of cdpRelay.js confirms:

  • Same ARIA snapshot processing as MCP
  • Additional WebSocket relay overhead (~15%)
  • Not available on WSL2 (requires Chrome UI + Native Messaging bridge)

Estimated token cost: C_chrome(n) ≈ 1.15 × C_mcp(n)

5. Latency Analysis

Token efficiency and latency represent a trade-off:

Operation CLI MCP CDP Direct
Cold start (browser launch + navigate) 3,200-6,100ms 590-1,500ms 50-910ms
Warm snapshot 130-350ms 16-345ms 8-140ms
Click/Fill action ~1,200ms ~200ms N/A

CLI has higher per-action latency (~1.2s) due to Playwright code generation. But the total workflow cost is lower because:

  1. Fewer context tokens = faster LLM inference
  2. Fewer round-trips needed (snapshot on demand)
  3. Token savings far outweigh latency cost at scale

6. Cost Impact Analysis

Per-Workflow Cost (Claude Sonnet, $3/MTok input)

Page MCP Cost CLI Cost Savings
Simple page (5 actions) $0.001 $0.001 ~$0
Form fill (7 actions) $0.006 $0.002 $0.004
Complex page (5 actions) $0.159 $0.044 $0.115
GitHub scrape (4 actions) $0.056 $0.020 $0.036

Annualized Impact

Scale MCP Annual CLI Annual Annual Savings
Solo developer (10/day) $580 $162 $418
Active developer (50/day) $2,900 $811 $2,089
10-person team (50/day each) $29,000 $8,110 $20,890
Enterprise (100 devs) $290,000 $81,100 $208,900

Assumes complex-page-heavy workload. Real savings vary by page complexity distribution. Claude Opus pricing ($15/MTok) would increase savings by 5x.


Architecture Comparison

flowchart TB
    subgraph CLI ["@playwright/cli (Token-Efficient)"]
        direction LR
        A1["Agent"] -->|"subprocess"| A2["playwright-cli"]
        A2 -->|"35 tok stdout"| A1
        A2 -->|"file write"| A3[".playwright-cli/page.yml"]
        A3 -.->|"read on demand"| A1
    end

    subgraph MCP ["@playwright/mcp (Standard)"]
        direction LR
        B1["Agent"] -->|"JSON-RPC"| B2["MCP Server"]
        B2 -->|"response + full snapshot\n(100-10,000+ tok)"| B1
    end

    subgraph Chrome ["claude --chrome (Extension)"]
        direction LR
        C1["Agent"] -->|"JSON-RPC"| C2["MCP Server"]
        C2 -->|"CDP WebSocket"| C3["Relay"]
        C3 -->|"Native Msg"| C4["Chrome"]
        C2 -->|"response + snapshot\n+ 15% overhead"| C1
    end

    subgraph CDP ["Raw CDP (Reference)"]
        direction LR
        D1["Agent"] -->|"CDP Session"| D2["Browser"]
        D2 -->|"Full A11y Tree\n(15-157x larger)"| D1
    end

    style CLI fill:#D1FAE5,stroke:#10B981,stroke-width:3px
    style MCP fill:#FEF3C7,stroke:#F59E0B,stroke-width:2px
    style Chrome fill:#EDE9FE,stroke:#6366F1,stroke-width:2px
    style CDP fill:#FEE2E2,stroke:#EF4444,stroke-width:2px
Loading

Why CLI is Different

The key architectural difference is when and how the snapshot is delivered:

Aspect CLI MCP chrome
Snapshot delivery File on disk In every response In every response + relay
Agent reads snapshot When needed (0-3 times) Forced (every action) Forced (every action)
Per-action overhead ~35 tokens (constant) ~S tokens (page-dependent) ~1.15S tokens
Context growth O(1) O(n) O(n)
Element references e1, e2, ... (pointer IDs) Full tree re-parse Full tree re-parse

The Solution: Hybrid Routing

Based on our measurements, we recommend a hybrid routing strategy:

flowchart TD
    A["Browser Action Request"] --> B{"Action Type?"}
    B -->|"Deterministic\n(click, fill, type,\npress, goto)"| C["CLI\n✅ 35 tok/action"]
    B -->|"Exploratory\n(unknown page,\ndebugging)"| D["MCP\n📊 Full tree access"]
    B -->|"Visual/Manual\nCollaboration"| E["chrome\n🖥️ GUI needed"]

    C --> F{"Need snapshot?"}
    F -->|"Yes (first time\nor state changed)"| G["Read file\n(one-time cost)"]
    F -->|"No (using refs)"| H["Skip\n(0 extra tokens)"]

    style C fill:#D1FAE5,stroke:#10B981
    style D fill:#FEF3C7,stroke:#F59E0B
    style E fill:#EDE9FE,stroke:#6366F1
Loading

Routing Decision Table

Use Case Recommended Token Ratio vs MCP Reason
Form filling (5+ fields) CLI 1.9x savings Element refs, no repeated snapshots
Click sequences CLI 2-5x savings O(1) per action
Large page scraping (>500 elements) CLI 4.9x savings File-based snapshot, read once
CI/CD E2E testing CLI 4.8x savings Subprocess-friendly, low cost
Unknown page exploration MCP baseline Full tree needed for discovery
Real-time debugging MCP/chrome Interactive tree, state management
GUI collaboration chrome Visual context, WSL2 not supported

Reproduce Our Results

Prerequisites

# Node.js 18+ required
node --version  # v18.0.0+

# Install dependencies
npm install

# Install Chromium browser
npx playwright install chromium

Run Benchmarks

# Full benchmark (all 4 pages, all 3 approaches)
npm run benchmark

# JSON output (for further analysis)
npm run benchmark:json > my_results.json

# CLI workflow demo
npm run demo:cli

Configuration

For environments where Google Chrome is not installed (e.g., CI, WSL2):

# Use bundled Chromium instead of Chrome
export PLAYWRIGHT_MCP_CONFIG=$(cat <<'EOF'
{
  "browser": {
    "browserName": "chromium",
    "launchOptions": {
      "channel": "chromium",
      "headless": true,
      "chromiumSandbox": false
    }
  }
}
EOF
)

npm run benchmark

Raw Data

Full benchmark results are available in data/results.json.

Summary Table (all measurements)
Page Approach Action Tokens Bytes Latency (ms)
example.com CLI open 126 503 3,356
example.com CLI snapshot (file) 79 315 130
example.com MCP navigate 88 352 593
example.com MCP snapshot 78 312 22
example.com CDP Full A11y Tree 1,452 5,807 22
example.com CDP ARIA processed 58 232 74
httpbin CLI open 94 373 3,529
httpbin CLI snapshot (file) 420 1,677 149
httpbin MCP navigate 258 1,032 651
httpbin MCP snapshot 248 992 16
httpbin CDP Full A11y Tree 9,827 39,311 43
httpbin CDP ARIA processed 228 912 77
Hacker News CLI open 99 396 3,728
Hacker News CLI snapshot (file) 14,646 58,637 284
Hacker News MCP navigate 10,172 40,688 830
Hacker News MCP snapshot 10,162 40,648 125
Hacker News CDP Full A11y Tree 157,732 631,263 139
Hacker News CDP ARIA processed 10,142 40,620 240
GitHub Trending CLI open 107 429 6,096
GitHub Trending CLI snapshot (file) 6,625 26,502 348
GitHub Trending MCP navigate 3,509 14,036 1,499
GitHub Trending MCP snapshot 3,499 13,996 345
GitHub Trending CDP Full A11y Tree 259 1,036 8
GitHub Trending CDP ARIA processed 359 1,436 382

Discussion

Validated Claims

Claim Source Our Measurement Status
CLI is 4.2x more token-efficient Microsoft 1.7-6.1x (workflow) ✅ Lower bound confirmed
CLI up to 100x for single actions Microsoft 2.5-290x (per-action stdout) ✅ Exceeds claim for complex pages
On-demand snapshots reduce context Design goal O(1) vs O(n) verified ✅ Confirmed empirically
CLI has higher per-action latency Expected ~1.2s vs ~200ms ✅ Confirmed (code generation step)

Limitations

  1. Token estimation: We use ceil(bytes/4) as a proxy. Actual tokenizer output may vary by ±15%
  2. Single-run timing: Latency measurements show median of 3 runs; network variance affects results
  3. GitHub Trending anomaly: CDP A11y tree for GitHub Trending showed unexpectedly low output (1,036 B), suggesting dynamic content or anti-bot measures affected the CDP session
  4. WSL2 environment: claude --chrome could not be directly measured; token cost is estimated from source code analysis
  5. MCP ariaSnapshotDiff: Incremental diff support exists in @playwright/mcp but is not broadly exposed yet; when available, it would reduce MCP's context growth

Novel Contributions

  1. First empirical measurement of per-action token cost across CLI, MCP, and CDP
  2. Source code verification that MCP embeds snapshots in every action response (setIncludeSnapshot())
  3. CDP relay analysis confirming claude --chrome uses the same ARIA pipeline with ~15% overhead
  4. Context growth model with mathematical formulation: C(n) = n × 35 + k × S vs C(n) = n × (35 + S)
  5. Practical routing strategy based on action type classification

Future Work

  • Benchmark ariaSnapshotDiff when broadly available in MCP
  • Measure with actual LLM tokenizers (cl100k, o200k) instead of byte-based estimation
  • Extend to other MCP browser implementations (e.g., Browserbase, Stagehand)
  • Measure end-to-end task completion rates (token efficiency vs task success)

References

  1. Microsoft Playwright Team. "@playwright/cli — Token-efficient browser automation for AI agents." npm, 2025. https://www.npmjs.com/package/@playwright/cli
  2. Microsoft Playwright Team. "@playwright/mcp — Playwright tools for MCP." npm, 2025. https://www.npmjs.com/package/@playwright/mcp
  3. Anthropic. "Computer use with Claude — Browser tool (claude --chrome)." Anthropic Docs, 2025. https://docs.anthropic.com/en/docs/claude-code/browser-tool
  4. Chrome DevTools Protocol. "Accessibility Domain." Chrome DevTools Protocol, 2025. https://chromedevtools.github.io/devtools-protocol/tot/Accessibility/
  5. W3C. "WAI-ARIA Accessible Rich Internet Applications." W3C Recommendation, 2024. https://www.w3.org/TR/wai-aria/

License

MIT License. See LICENSE for details.


If this benchmark helped you optimize your AI browser automation costs,
please consider giving it a ⭐

Built with empirical measurements, not estimates.

About

The Hidden Token Tax: Quantifying the True Cost of AI Browser Automation — empirical benchmark of @playwright/cli vs @playwright/mcp vs CDP

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors