The Hidden Token Tax
Every click on a complex page silently consumes 10,000+ tokens via MCP. The same click via CLI? Just 35.
An empirical benchmark revealing 2-5x token waste in standard MCP browser tools, with measured CDP protocol overhead and a hybrid routing solution.
AI agents increasingly rely on browser automation for web interaction, but the token cost of these interactions remains poorly understood. We present the first systematic, empirical benchmark comparing four browser automation approaches for AI agents: @playwright/cli (file-based snapshots), @playwright/mcp (protocol-embedded snapshots), claude --chrome (CDP relay), and raw CDP (Chrome DevTools Protocol).
Our measurements across 4 real-world websites (5-882 DOM elements) reveal that MCP embeds the full accessibility tree in every action response, creating O(n) context growth that silently drains the agent's context window. In contrast, CLI's file-based approach achieves O(1) per-action cost, reducing total token consumption by 37-84% in multi-step workflows.
For a team of 10 developers running 50 browser automation workflows per day, this translates to $20,890 in annual API cost savings (at Claude Sonnet pricing).
We provide reproducible benchmark scripts, raw measurement data, and a practical hybrid routing strategy.
- Key Findings
- The Problem: Invisible Context Drain
- Methodology
- Results
- Architecture Comparison
- The Solution: Hybrid Routing
- Reproduce Our Results
- Raw Data
- Discussion
- References
|
MCP includes the full ARIA snapshot in every action response. Verified in source code: Every click, every keystroke, every navigation — the entire page tree is re-sent. |
|
||||||||||||||||||||||||||||||||
|
A 10-person team running 50 browser workflows/day wastes $20,890/year on redundant tokens.
At Claude Sonnet $3/MTok input pricing |
The raw CDP accessibility tree is 15-157x larger than the processed ARIA snapshot.
This reveals that Playwright's ARIA processing pipeline provides massive compression — but MCP negates this by re-sending every action. |
When an AI agent uses MCP browser tools, every action silently injects the full page accessibility tree into the context window:
Action 1: navigate → +10,162 tokens (full snapshot)
Action 2: click link → +10,167 tokens (full snapshot again)
Action 3: fill form → +10,162 tokens (full snapshot again)
Action 4: click btn → +10,167 tokens (full snapshot again)
Action 5: snapshot → +10,162 tokens (full snapshot again)
─────────────────────────────────────────────────────
Total: 50,820 tokens consumed (for 5 simple actions on Hacker News)
With CLI, the same workflow:
Action 1: navigate → +99 tokens (URL + title only)
Action 2: click link → +35 tokens (confirmation only)
Action 3: fill form → +35 tokens (confirmation only)
Action 4: click btn → +35 tokens (confirmation only)
Action 5: snapshot → +39 tokens (stdout) + 14,646 tokens (file, read once)
─────────────────────────────────────────────────────
Total: 14,889 tokens consumed (71% savings)
The difference? CLI writes snapshots to files. MCP embeds them in every response.
---
config:
themeVariables:
xyChart:
plotColorPalette: "#10B981, #F59E0B"
---
xychart-beta
title "Cumulative Token Cost Over 10 Actions (Medium Page, S=3000 tok)"
x-axis ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
y-axis "Total Tokens" 0 --> 35000
line "CLI (O(1) + on-demand read)" [3035, 3070, 3105, 3140, 6175, 6210, 6245, 6280, 6315, 6350]
line "MCP (O(n) every action)" [3035, 6070, 9105, 12140, 15175, 18210, 21245, 24280, 27315, 30350]
CLI grows at 35 tokens/action (constant). MCP grows at 3,035 tokens/action (linear with page size). By action 10, MCP has consumed 4.8x more tokens than CLI.
| Parameter | Value |
|---|---|
| OS | WSL2 (Linux 6.6.87.2-microsoft-standard-WSL2) |
| Node.js | v22.21.1 |
| @playwright/cli | v0.1.1 (Playwright 1.59.0-alpha) |
| @playwright/mcp | v0.0.68 |
| Browser | Chromium (bundled, headless) |
| Token estimation | ceil(bytes / 4) (standard English text approximation) |
We selected 4 pages spanning the complexity spectrum:
| Page | URL | DOM Elements | HTML Size | Complexity |
|---|---|---|---|---|
| example.com | https://example.com |
5 | 528 B | Minimal |
| httpbin forms | https://httpbin.org/forms/post |
44 | 1,419 B | Form-heavy |
| Hacker News | https://news.ycombinator.com |
882 | 34,706 B | Content-rich |
| GitHub Trending | https://github.com/trending |
458 | 567,849 B | Complex SPA |
flowchart LR
subgraph "Three Independent Measurement Paths"
A["CLI\n(subprocess)"] --> D["stdout bytes\n+ file bytes"]
B["MCP-equivalent\nPlaywright API"] --> E["JSON response\nbytes"]
C["CDP Raw\n(DevTools Protocol)"] --> F["Protocol response\nbytes"]
end
D --> G["Token = ceil(bytes/4)"]
E --> G
F --> G
G --> H["Comparison\nTables"]
- CLI: Actual
playwright-clisubprocess calls, measuring stdout and snapshot file sizes - MCP-equivalent: Playwright API with
ariaSnapshot(), wrapped in MCP JSON envelope - CDP Raw: Direct
page.context().newCDPSession()for raw protocol measurements
Each measurement was run 3 times; we report the median.
All benchmark scripts and raw data are included in this repository. See Reproduce Our Results for instructions.
The fundamental metric: how many tokens does each approach consume per browser action?
| Page | CLI stdout | MCP response | MCP/CLI ratio |
|---|---|---|---|
| example.com | 35 tok | 86 tok | 2.5x |
| httpbin forms | 35 tok | 248 tok | 7.1x |
| Hacker News | 35 tok | 10,162 tok | 290x |
| GitHub Trending | 41 tok | 3,499 tok | 85x |
Key Insight: CLI stdout is constant (~35 tokens) regardless of page complexity. It contains only the URL, title, and a file link. The snapshot is written to disk.
| Page | CLI file | MCP response | Ratio |
|---|---|---|---|
| example.com | 79 tok | 86 tok | 1.1x |
| httpbin forms | 420 tok | 248 tok | 0.6x* |
| Hacker News | 14,646 tok | 10,162 tok | 0.7x* |
| GitHub Trending | 6,625 tok | 3,499 tok | 0.5x* |
*CLI file is sometimes larger due to element refs (e1, e2, ...) and YAML formatting. But the file is read once, while MCP sends the snapshot on every action.
This is the critical insight: per-snapshot, MCP and CLI are similar. But MCP sends it N times; CLI sends it K times (K << N).
Real-world workflows involve multiple sequential actions. This is where the token tax compounds.
---
config:
themeVariables:
xyChart:
plotColorPalette: "#10B981, #F59E0B"
---
xychart-beta
title "5-Action Workflow: Total Token Cost"
x-axis ["example.com", "httpbin", "Hacker News", "GitHub"]
y-axis "Total Tokens" 0 --> 55000
bar "CLI (1 snapshot read)" [254, 595, 14821, 6800]
bar "MCP (snapshot every action)" [430, 1370, 53090, 18745]
| Page | CLI Total | MCP Total | Savings | Savings % |
|---|---|---|---|---|
| example.com | 254 tok | 430 tok | 176 tok | 41% |
| httpbin forms | 595 tok | 1,370 tok | 775 tok | 57% |
| Hacker News | 14,821 tok | 53,090 tok | 38,269 tok | 72% |
| GitHub Trending | 6,800 tok | 18,745 tok | 11,945 tok | 64% |
Formula:
- CLI:
C(n) = n × 35 + k × S(page)where k = snapshot reads (typically 1-2)- MCP:
C(n) = n × (35 + S(page))— every action includes the full snapshot
navigate → snapshot → fill × 3 → click → snapshot
| Mode | Total Bytes | Tokens | vs MCP |
|---|---|---|---|
| CLI (standard) | 4,637 B | ~1,159 | 39% saved |
| CLI (optimized, 1 read) | 2,921 B | ~730 | 62% saved |
| MCP (standard) | 7,658 B | ~1,914 | baseline |
navigate → snapshot → click → snapshot → click → snapshot
| Mode | Total Bytes | Tokens | vs MCP |
|---|---|---|---|
| CLI (standard) | 122,764 B | ~30,691 | 52% saved |
| CLI (optimized, 1 read) | 41,788 B | ~10,447 | 84% saved |
| MCP (standard) | 254,826 B | ~63,706 | baseline |
The mathematical relationship between actions and context consumption:
CLI: C(n) = n × 35 + k × S(page) // k = snapshot reads (1-3 typical)
MCP: C(n) = n × (35 + S(page)) // snapshot in every response
CDP: C(n) = n × (35 + R(page)) // R >> S (raw >> processed)
For n = 10 actions:
| Page Size (S) | CLI (k=2) | MCP | CDP Raw | MCP/CLI | CDP/CLI |
|---|---|---|---|---|---|
| 50 tok (minimal) | 450 | 850 | 15,000+ | 1.9x | 33x |
| 500 tok (form) | 1,350 | 5,350 | 100,000+ | 4.0x | 74x |
| 3,000 tok (medium) | 6,350 | 30,350 | 1,500,000+ | 4.8x | 236x |
| 10,000 tok (heavy) | 20,350 | 100,350 | — | 4.9x | — |
Cross-over point: CLI advantage becomes significant when
S > 200 tokens(any page with >10 interactive elements).
We measured the raw CDP protocol output to understand what Playwright's ARIA processing compresses.
| Page | DOMSnapshot | Full A11y Tree | ARIA Snapshot | A11y/ARIA Ratio |
|---|---|---|---|---|
| example.com | 999 B | 5,807 B | 232 B | 25x |
| httpbin forms | 1,002 B | 39,311 B | 912 B | 43x |
| Hacker News | 1,001 B | 631,263 B | 40,620 B | 15.5x |
| GitHub Trending | 999 B | 1,036 B | 1,436 B | 0.7x* |
*GitHub Trending showed anomalously low CDP A11y output (1,036 B), likely due to dynamic content loading or anti-scraping measures affecting the CDP session.
Insight: Playwright's ARIA processing pipeline provides 15-43x compression over raw CDP accessibility data. This is valuable work — but MCP negates the benefit by re-sending the compressed result on every action.
flowchart LR
A["Raw DOM\n(34-567KB HTML)"] --> B["CDP Accessibility\ngetFullAXTree\n(1-631KB)"]
B --> C["ARIA Snapshot\nProcessing\n(232B-40KB)"]
C --> D{"Delivery Method"}
D -->|"MCP"| E["Embedded in\nevery response\n❌ O(n) growth"]
D -->|"CLI"| F["Written to file\nread on demand\n✅ O(1) growth"]
D -->|"chrome"| G["Same as MCP\n+ 15% CDP relay\n❌ O(n) growth"]
style E fill:#FEF3C7,stroke:#F59E0B
style F fill:#D1FAE5,stroke:#10B981
style G fill:#EDE9FE,stroke:#6366F1
claude --chrome uses the same Playwright MCP server internally, connected via a CDP WebSocket relay:
Claude CLI → MCP JSON-RPC → Playwright MCP Server → CDP WebSocket
→ CDPRelayServer → WebSocket → Chrome Extension → Native Messaging → Chrome
Source code analysis of cdpRelay.js confirms:
- Same ARIA snapshot processing as MCP
- Additional WebSocket relay overhead (~15%)
- Not available on WSL2 (requires Chrome UI + Native Messaging bridge)
Estimated token cost: C_chrome(n) ≈ 1.15 × C_mcp(n)
Token efficiency and latency represent a trade-off:
| Operation | CLI | MCP | CDP Direct |
|---|---|---|---|
| Cold start (browser launch + navigate) | 3,200-6,100ms | 590-1,500ms | 50-910ms |
| Warm snapshot | 130-350ms | 16-345ms | 8-140ms |
| Click/Fill action | ~1,200ms | ~200ms | N/A |
CLI has higher per-action latency (~1.2s) due to Playwright code generation. But the total workflow cost is lower because:
- Fewer context tokens = faster LLM inference
- Fewer round-trips needed (snapshot on demand)
- Token savings far outweigh latency cost at scale
| Page | MCP Cost | CLI Cost | Savings |
|---|---|---|---|
| Simple page (5 actions) | $0.001 | $0.001 | ~$0 |
| Form fill (7 actions) | $0.006 | $0.002 | $0.004 |
| Complex page (5 actions) | $0.159 | $0.044 | $0.115 |
| GitHub scrape (4 actions) | $0.056 | $0.020 | $0.036 |
| Scale | MCP Annual | CLI Annual | Annual Savings |
|---|---|---|---|
| Solo developer (10/day) | $580 | $162 | $418 |
| Active developer (50/day) | $2,900 | $811 | $2,089 |
| 10-person team (50/day each) | $29,000 | $8,110 | $20,890 |
| Enterprise (100 devs) | $290,000 | $81,100 | $208,900 |
Assumes complex-page-heavy workload. Real savings vary by page complexity distribution. Claude Opus pricing ($15/MTok) would increase savings by 5x.
flowchart TB
subgraph CLI ["@playwright/cli (Token-Efficient)"]
direction LR
A1["Agent"] -->|"subprocess"| A2["playwright-cli"]
A2 -->|"35 tok stdout"| A1
A2 -->|"file write"| A3[".playwright-cli/page.yml"]
A3 -.->|"read on demand"| A1
end
subgraph MCP ["@playwright/mcp (Standard)"]
direction LR
B1["Agent"] -->|"JSON-RPC"| B2["MCP Server"]
B2 -->|"response + full snapshot\n(100-10,000+ tok)"| B1
end
subgraph Chrome ["claude --chrome (Extension)"]
direction LR
C1["Agent"] -->|"JSON-RPC"| C2["MCP Server"]
C2 -->|"CDP WebSocket"| C3["Relay"]
C3 -->|"Native Msg"| C4["Chrome"]
C2 -->|"response + snapshot\n+ 15% overhead"| C1
end
subgraph CDP ["Raw CDP (Reference)"]
direction LR
D1["Agent"] -->|"CDP Session"| D2["Browser"]
D2 -->|"Full A11y Tree\n(15-157x larger)"| D1
end
style CLI fill:#D1FAE5,stroke:#10B981,stroke-width:3px
style MCP fill:#FEF3C7,stroke:#F59E0B,stroke-width:2px
style Chrome fill:#EDE9FE,stroke:#6366F1,stroke-width:2px
style CDP fill:#FEE2E2,stroke:#EF4444,stroke-width:2px
The key architectural difference is when and how the snapshot is delivered:
| Aspect | CLI | MCP | chrome |
|---|---|---|---|
| Snapshot delivery | File on disk | In every response | In every response + relay |
| Agent reads snapshot | When needed (0-3 times) | Forced (every action) | Forced (every action) |
| Per-action overhead | ~35 tokens (constant) | ~S tokens (page-dependent) | ~1.15S tokens |
| Context growth | O(1) | O(n) | O(n) |
| Element references | e1, e2, ... (pointer IDs) |
Full tree re-parse | Full tree re-parse |
Based on our measurements, we recommend a hybrid routing strategy:
flowchart TD
A["Browser Action Request"] --> B{"Action Type?"}
B -->|"Deterministic\n(click, fill, type,\npress, goto)"| C["CLI\n✅ 35 tok/action"]
B -->|"Exploratory\n(unknown page,\ndebugging)"| D["MCP\n📊 Full tree access"]
B -->|"Visual/Manual\nCollaboration"| E["chrome\n🖥️ GUI needed"]
C --> F{"Need snapshot?"}
F -->|"Yes (first time\nor state changed)"| G["Read file\n(one-time cost)"]
F -->|"No (using refs)"| H["Skip\n(0 extra tokens)"]
style C fill:#D1FAE5,stroke:#10B981
style D fill:#FEF3C7,stroke:#F59E0B
style E fill:#EDE9FE,stroke:#6366F1
| Use Case | Recommended | Token Ratio vs MCP | Reason |
|---|---|---|---|
| Form filling (5+ fields) | CLI | 1.9x savings | Element refs, no repeated snapshots |
| Click sequences | CLI | 2-5x savings | O(1) per action |
| Large page scraping (>500 elements) | CLI | 4.9x savings | File-based snapshot, read once |
| CI/CD E2E testing | CLI | 4.8x savings | Subprocess-friendly, low cost |
| Unknown page exploration | MCP | baseline | Full tree needed for discovery |
| Real-time debugging | MCP/chrome | — | Interactive tree, state management |
| GUI collaboration | chrome | — | Visual context, WSL2 not supported |
# Node.js 18+ required
node --version # v18.0.0+
# Install dependencies
npm install
# Install Chromium browser
npx playwright install chromium# Full benchmark (all 4 pages, all 3 approaches)
npm run benchmark
# JSON output (for further analysis)
npm run benchmark:json > my_results.json
# CLI workflow demo
npm run demo:cliFor environments where Google Chrome is not installed (e.g., CI, WSL2):
# Use bundled Chromium instead of Chrome
export PLAYWRIGHT_MCP_CONFIG=$(cat <<'EOF'
{
"browser": {
"browserName": "chromium",
"launchOptions": {
"channel": "chromium",
"headless": true,
"chromiumSandbox": false
}
}
}
EOF
)
npm run benchmarkFull benchmark results are available in data/results.json.
Summary Table (all measurements)
| Page | Approach | Action | Tokens | Bytes | Latency (ms) |
|---|---|---|---|---|---|
| example.com | CLI | open | 126 | 503 | 3,356 |
| example.com | CLI | snapshot (file) | 79 | 315 | 130 |
| example.com | MCP | navigate | 88 | 352 | 593 |
| example.com | MCP | snapshot | 78 | 312 | 22 |
| example.com | CDP | Full A11y Tree | 1,452 | 5,807 | 22 |
| example.com | CDP | ARIA processed | 58 | 232 | 74 |
| httpbin | CLI | open | 94 | 373 | 3,529 |
| httpbin | CLI | snapshot (file) | 420 | 1,677 | 149 |
| httpbin | MCP | navigate | 258 | 1,032 | 651 |
| httpbin | MCP | snapshot | 248 | 992 | 16 |
| httpbin | CDP | Full A11y Tree | 9,827 | 39,311 | 43 |
| httpbin | CDP | ARIA processed | 228 | 912 | 77 |
| Hacker News | CLI | open | 99 | 396 | 3,728 |
| Hacker News | CLI | snapshot (file) | 14,646 | 58,637 | 284 |
| Hacker News | MCP | navigate | 10,172 | 40,688 | 830 |
| Hacker News | MCP | snapshot | 10,162 | 40,648 | 125 |
| Hacker News | CDP | Full A11y Tree | 157,732 | 631,263 | 139 |
| Hacker News | CDP | ARIA processed | 10,142 | 40,620 | 240 |
| GitHub Trending | CLI | open | 107 | 429 | 6,096 |
| GitHub Trending | CLI | snapshot (file) | 6,625 | 26,502 | 348 |
| GitHub Trending | MCP | navigate | 3,509 | 14,036 | 1,499 |
| GitHub Trending | MCP | snapshot | 3,499 | 13,996 | 345 |
| GitHub Trending | CDP | Full A11y Tree | 259 | 1,036 | 8 |
| GitHub Trending | CDP | ARIA processed | 359 | 1,436 | 382 |
| Claim | Source | Our Measurement | Status |
|---|---|---|---|
| CLI is 4.2x more token-efficient | Microsoft | 1.7-6.1x (workflow) | ✅ Lower bound confirmed |
| CLI up to 100x for single actions | Microsoft | 2.5-290x (per-action stdout) | ✅ Exceeds claim for complex pages |
| On-demand snapshots reduce context | Design goal | O(1) vs O(n) verified | ✅ Confirmed empirically |
| CLI has higher per-action latency | Expected | ~1.2s vs ~200ms | ✅ Confirmed (code generation step) |
- Token estimation: We use
ceil(bytes/4)as a proxy. Actual tokenizer output may vary by ±15% - Single-run timing: Latency measurements show median of 3 runs; network variance affects results
- GitHub Trending anomaly: CDP A11y tree for GitHub Trending showed unexpectedly low output (1,036 B), suggesting dynamic content or anti-bot measures affected the CDP session
- WSL2 environment:
claude --chromecould not be directly measured; token cost is estimated from source code analysis - MCP
ariaSnapshotDiff: Incremental diff support exists in @playwright/mcp but is not broadly exposed yet; when available, it would reduce MCP's context growth
- First empirical measurement of per-action token cost across CLI, MCP, and CDP
- Source code verification that MCP embeds snapshots in every action response (
setIncludeSnapshot()) - CDP relay analysis confirming
claude --chromeuses the same ARIA pipeline with ~15% overhead - Context growth model with mathematical formulation:
C(n) = n × 35 + k × SvsC(n) = n × (35 + S) - Practical routing strategy based on action type classification
- Benchmark
ariaSnapshotDiffwhen broadly available in MCP - Measure with actual LLM tokenizers (cl100k, o200k) instead of byte-based estimation
- Extend to other MCP browser implementations (e.g., Browserbase, Stagehand)
- Measure end-to-end task completion rates (token efficiency vs task success)
- Microsoft Playwright Team. "@playwright/cli — Token-efficient browser automation for AI agents." npm, 2025. https://www.npmjs.com/package/@playwright/cli
- Microsoft Playwright Team. "@playwright/mcp — Playwright tools for MCP." npm, 2025. https://www.npmjs.com/package/@playwright/mcp
- Anthropic. "Computer use with Claude — Browser tool (
claude --chrome)." Anthropic Docs, 2025. https://docs.anthropic.com/en/docs/claude-code/browser-tool - Chrome DevTools Protocol. "Accessibility Domain." Chrome DevTools Protocol, 2025. https://chromedevtools.github.io/devtools-protocol/tot/Accessibility/
- W3C. "WAI-ARIA Accessible Rich Internet Applications." W3C Recommendation, 2024. https://www.w3.org/TR/wai-aria/
MIT License. See LICENSE for details.
If this benchmark helped you optimize your AI browser automation costs,
please consider giving it a ⭐
Built with empirical measurements, not estimates.