The Hidden Token Tax

Quantifying the True Cost of AI Browser Automation

Every click on a complex page silently consumes 10,000+ tokens via MCP. The same click via CLI? Just 35.

An empirical benchmark revealing 2-5x token waste in standard MCP browser tools, with measured CDP protocol overhead and a hybrid routing solution.

Abstract

AI agents increasingly rely on browser automation for web interaction, but the token cost of these interactions remains poorly understood. We present the first systematic, empirical benchmark comparing four browser automation approaches for AI agents: @playwright/cli (file-based snapshots), @playwright/mcp (protocol-embedded snapshots), claude --chrome (CDP relay), and raw CDP (Chrome DevTools Protocol).

Our measurements across 4 real-world websites (5-882 DOM elements) reveal that MCP embeds the full accessibility tree in every action response, creating O(n) context growth that silently drains the agent's context window. In contrast, CLI's file-based approach achieves O(1) per-action cost, reducing total token consumption by 37-84% in multi-step workflows.

For a team of 10 developers running 50 browser automation workflows per day, this translates to $20,890 in annual API cost savings (at Claude Sonnet pricing).

We provide reproducible benchmark scripts, raw measurement data, and a practical hybrid routing strategy.

Visual Summary

Key Findings

🔍 Finding 1: The Token Tax

MCP includes the full ARIA snapshot in every action response.

Verified in source code: setIncludeSnapshot() is called on browser_click, browser_type, browser_navigate, and all interaction tools.

Every click, every keystroke, every navigation — the entire page tree is re-sent.

📊 Finding 2: The Scale

Page	MCP per-action	CLI per-action	Ratio
Simple (5 elements)	86 tok	35 tok	2.5x
Form (44 elements)	248 tok	35 tok	7.1x
Complex (458 elements)	3,499 tok	41 tok	85x
Heavy (882 elements)	10,162 tok	35 tok	290x

💰 Finding 3: The Cost

A 10-person team running 50 browser workflows/day wastes $20,890/year on redundant tokens.

	MCP	CLI	Savings
Per workflow (HN)	$0.159	$0.044	72%
Per developer/year	$2,900	$811	$2,089
10-person team/year	$29,000	$8,110	$20,890

At Claude Sonnet $3/MTok input pricing

⚡ Finding 4: CDP Raw Overhead

The raw CDP accessibility tree is 15-157x larger than the processed ARIA snapshot.

Page	ARIA Snapshot	CDP Raw A11y	Ratio
example.com	232 B	5,807 B	25x
httpbin	912 B	39,311 B	43x
Hacker News	40,620 B	631,263 B	15.5x

This reveals that Playwright's ARIA processing pipeline provides massive compression — but MCP negates this by re-sending every action.

The Problem: Invisible Context Drain

When an AI agent uses MCP browser tools, every action silently injects the full page accessibility tree into the context window:

Action 1: navigate    → +10,162 tokens (full snapshot)
Action 2: click link  → +10,167 tokens (full snapshot again)
Action 3: fill form   → +10,162 tokens (full snapshot again)
Action 4: click btn   → +10,167 tokens (full snapshot again)
Action 5: snapshot    → +10,162 tokens (full snapshot again)
─────────────────────────────────────────────────────
Total: 50,820 tokens consumed (for 5 simple actions on Hacker News)

With CLI, the same workflow:

Action 1: navigate    → +99 tokens (URL + title only)
Action 2: click link  → +35 tokens (confirmation only)
Action 3: fill form   → +35 tokens (confirmation only)
Action 4: click btn   → +35 tokens (confirmation only)
Action 5: snapshot    → +39 tokens (stdout) + 14,646 tokens (file, read once)
─────────────────────────────────────────────────────
Total: 14,889 tokens consumed (71% savings)

The difference? CLI writes snapshots to files. MCP embeds them in every response.

Context Growth Visualization

---
config:
  themeVariables:
    xyChart:
      plotColorPalette: "#10B981, #F59E0B"
---
xychart-beta
    title "Cumulative Token Cost Over 10 Actions (Medium Page, S=3000 tok)"
    x-axis ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
    y-axis "Total Tokens" 0 --> 35000
    line "CLI (O(1) + on-demand read)" [3035, 3070, 3105, 3140, 6175, 6210, 6245, 6280, 6315, 6350]
    line "MCP (O(n) every action)" [3035, 6070, 9105, 12140, 15175, 18210, 21245, 24280, 27315, 30350]

CLI grows at 35 tokens/action (constant). MCP grows at 3,035 tokens/action (linear with page size). By action 10, MCP has consumed 4.8x more tokens than CLI.

Methodology

Environment

Parameter	Value
OS	WSL2 (Linux 6.6.87.2-microsoft-standard-WSL2)
Node.js	v22.21.1
@playwright/cli	v0.1.1 (Playwright 1.59.0-alpha)
@playwright/mcp	v0.0.68
Browser	Chromium (bundled, headless)
Token estimation	`ceil(bytes / 4)` (standard English text approximation)

Target Pages

We selected 4 pages spanning the complexity spectrum:

Page	URL	DOM Elements	HTML Size	Complexity
example.com	`https://example.com`	5	528 B	Minimal
httpbin forms	`https://httpbin.org/forms/post`	44	1,419 B	Form-heavy
Hacker News	`https://news.ycombinator.com`	882	34,706 B	Content-rich
GitHub Trending	`https://github.com/trending`	458	567,849 B	Complex SPA

Measurement Approach

flowchart LR
    subgraph "Three Independent Measurement Paths"
        A["CLI\n(subprocess)"] --> D["stdout bytes\n+ file bytes"]
        B["MCP-equivalent\nPlaywright API"] --> E["JSON response\nbytes"]
        C["CDP Raw\n(DevTools Protocol)"] --> F["Protocol response\nbytes"]
    end
    D --> G["Token = ceil(bytes/4)"]
    E --> G
    F --> G
    G --> H["Comparison\nTables"]

CLI: Actual playwright-cli subprocess calls, measuring stdout and snapshot file sizes
MCP-equivalent: Playwright API with ariaSnapshot(), wrapped in MCP JSON envelope
CDP Raw: Direct page.context().newCDPSession() for raw protocol measurements

Each measurement was run 3 times; we report the median.

Reproducibility

All benchmark scripts and raw data are included in this repository. See Reproduce Our Results for instructions.

Results

1. Per-Action Token Cost

The fundamental metric: how many tokens does each approach consume per browser action?

CLI stdout vs MCP response (per action)

Page	CLI stdout	MCP response	MCP/CLI ratio
example.com	35 tok	86 tok	2.5x
httpbin forms	35 tok	248 tok	7.1x
Hacker News	35 tok	10,162 tok	290x
GitHub Trending	41 tok	3,499 tok	85x

Key Insight: CLI stdout is constant (~35 tokens) regardless of page complexity. It contains only the URL, title, and a file link. The snapshot is written to disk.

When the CLI snapshot file IS read

Page	CLI file	MCP response	Ratio
example.com	79 tok	86 tok	1.1x
httpbin forms	420 tok	248 tok	0.6x*
Hacker News	14,646 tok	10,162 tok	0.7x*
GitHub Trending	6,625 tok	3,499 tok	0.5x*

*CLI file is sometimes larger due to element refs (e1, e2, ...) and YAML formatting. But the file is read once, while MCP sends the snapshot on every action.

This is the critical insight: per-snapshot, MCP and CLI are similar. But MCP sends it N times; CLI sends it K times (K << N).

2. Multi-Step Workflow Cost

Real-world workflows involve multiple sequential actions. This is where the token tax compounds.

5-Action Workflow Comparison

---
config:
  themeVariables:
    xyChart:
      plotColorPalette: "#10B981, #F59E0B"
---
xychart-beta
    title "5-Action Workflow: Total Token Cost"
    x-axis ["example.com", "httpbin", "Hacker News", "GitHub"]
    y-axis "Total Tokens" 0 --> 55000
    bar "CLI (1 snapshot read)" [254, 595, 14821, 6800]
    bar "MCP (snapshot every action)" [430, 1370, 53090, 18745]

Page	CLI Total	MCP Total	Savings	Savings %
example.com	254 tok	430 tok	176 tok	41%
httpbin forms	595 tok	1,370 tok	775 tok	57%
Hacker News	14,821 tok	53,090 tok	38,269 tok	72%
GitHub Trending	6,800 tok	18,745 tok	11,945 tok	64%

Formula:

CLI: C(n) = n × 35 + k × S(page) where k = snapshot reads (typically 1-2)

MCP: C(n) = n × (35 + S(page)) — every action includes the full snapshot

7-Action Form Fill Workflow (httpbin)

navigate → snapshot → fill × 3 → click → snapshot

Mode	Total Bytes	Tokens	vs MCP
CLI (standard)	4,637 B	~1,159	39% saved
CLI (optimized, 1 read)	2,921 B	~730	62% saved
MCP (standard)	7,658 B	~1,914	baseline

6-Action Complex Page (Hacker News)

navigate → snapshot → click → snapshot → click → snapshot

Mode	Total Bytes	Tokens	vs MCP
CLI (standard)	122,764 B	~30,691	52% saved
CLI (optimized, 1 read)	41,788 B	~10,447	84% saved
MCP (standard)	254,826 B	~63,706	baseline

3. Context Growth Model

The mathematical relationship between actions and context consumption:

CLI:  C(n) = n × 35 + k × S(page)    // k = snapshot reads (1-3 typical)
MCP:  C(n) = n × (35 + S(page))       // snapshot in every response
CDP:  C(n) = n × (35 + R(page))       // R >> S (raw >> processed)

For n = 10 actions:

Page Size (S)	CLI (k=2)	MCP	CDP Raw	MCP/CLI	CDP/CLI
50 tok (minimal)	450	850	15,000+	1.9x	33x
500 tok (form)	1,350	5,350	100,000+	4.0x	74x
3,000 tok (medium)	6,350	30,350	1,500,000+	4.8x	236x
10,000 tok (heavy)	20,350	100,350	—	4.9x	—

Cross-over point: CLI advantage becomes significant when S > 200 tokens (any page with >10 interactive elements).

4. CDP Protocol Deep Dive

We measured the raw CDP protocol output to understand what Playwright's ARIA processing compresses.

CDP Response Size Comparison

Page	DOMSnapshot	Full A11y Tree	ARIA Snapshot	A11y/ARIA Ratio
example.com	999 B	5,807 B	232 B	25x
httpbin forms	1,002 B	39,311 B	912 B	43x
Hacker News	1,001 B	631,263 B	40,620 B	15.5x
GitHub Trending	999 B	1,036 B	1,436 B	0.7x*

*GitHub Trending showed anomalously low CDP A11y output (1,036 B), likely due to dynamic content loading or anti-scraping measures affecting the CDP session.

Insight: Playwright's ARIA processing pipeline provides 15-43x compression over raw CDP accessibility data. This is valuable work — but MCP negates the benefit by re-sending the compressed result on every action.

The Data Pipeline

flowchart LR
    A["Raw DOM\n(34-567KB HTML)"] --> B["CDP Accessibility\ngetFullAXTree\n(1-631KB)"]
    B --> C["ARIA Snapshot\nProcessing\n(232B-40KB)"]
    C --> D{"Delivery Method"}
    D -->|"MCP"| E["Embedded in\nevery response\n❌ O(n) growth"]
    D -->|"CLI"| F["Written to file\nread on demand\n✅ O(1) growth"]
    D -->|"chrome"| G["Same as MCP\n+ 15% CDP relay\n❌ O(n) growth"]

    style E fill:#FEF3C7,stroke:#F59E0B
    style F fill:#D1FAE5,stroke:#10B981
    style G fill:#EDE9FE,stroke:#6366F1

claude --chrome Analysis

claude --chrome uses the same Playwright MCP server internally, connected via a CDP WebSocket relay:

Claude CLI → MCP JSON-RPC → Playwright MCP Server → CDP WebSocket
    → CDPRelayServer → WebSocket → Chrome Extension → Native Messaging → Chrome

Source code analysis of cdpRelay.js confirms:

Same ARIA snapshot processing as MCP
Additional WebSocket relay overhead (~15%)
Not available on WSL2 (requires Chrome UI + Native Messaging bridge)

Estimated token cost: C_chrome(n) ≈ 1.15 × C_mcp(n)

5. Latency Analysis

Token efficiency and latency represent a trade-off:

Operation	CLI	MCP	CDP Direct
Cold start (browser launch + navigate)	3,200-6,100ms	590-1,500ms	50-910ms
Warm snapshot	130-350ms	16-345ms	8-140ms
Click/Fill action	~1,200ms	~200ms	N/A

CLI has higher per-action latency (~1.2s) due to Playwright code generation. But the total workflow cost is lower because:

Fewer context tokens = faster LLM inference

Fewer round-trips needed (snapshot on demand)

Token savings far outweigh latency cost at scale

6. Cost Impact Analysis

Per-Workflow Cost (Claude Sonnet, $3/MTok input)

Page	MCP Cost	CLI Cost	Savings
Simple page (5 actions)	$0.001	$0.001	~$0
Form fill (7 actions)	$0.006	$0.002	$0.004
Complex page (5 actions)	$0.159	$0.044	$0.115
GitHub scrape (4 actions)	$0.056	$0.020	$0.036

Annualized Impact

Scale	MCP Annual	CLI Annual	Annual Savings
Solo developer (10/day)	$580	$162	$418
Active developer (50/day)	$2,900	$811	$2,089
10-person team (50/day each)	$29,000	$8,110	$20,890
Enterprise (100 devs)	$290,000	$81,100	$208,900

Assumes complex-page-heavy workload. Real savings vary by page complexity distribution. Claude Opus pricing ($15/MTok) would increase savings by 5x.

Architecture Comparison

flowchart TB
    subgraph CLI ["@playwright/cli (Token-Efficient)"]
        direction LR
        A1["Agent"] -->|"subprocess"| A2["playwright-cli"]
        A2 -->|"35 tok stdout"| A1
        A2 -->|"file write"| A3[".playwright-cli/page.yml"]
        A3 -.->|"read on demand"| A1
    end

    subgraph MCP ["@playwright/mcp (Standard)"]
        direction LR
        B1["Agent"] -->|"JSON-RPC"| B2["MCP Server"]
        B2 -->|"response + full snapshot\n(100-10,000+ tok)"| B1
    end

    subgraph Chrome ["claude --chrome (Extension)"]
        direction LR
        C1["Agent"] -->|"JSON-RPC"| C2["MCP Server"]
        C2 -->|"CDP WebSocket"| C3["Relay"]
        C3 -->|"Native Msg"| C4["Chrome"]
        C2 -->|"response + snapshot\n+ 15% overhead"| C1
    end

    subgraph CDP ["Raw CDP (Reference)"]
        direction LR
        D1["Agent"] -->|"CDP Session"| D2["Browser"]
        D2 -->|"Full A11y Tree\n(15-157x larger)"| D1
    end

    style CLI fill:#D1FAE5,stroke:#10B981,stroke-width:3px
    style MCP fill:#FEF3C7,stroke:#F59E0B,stroke-width:2px
    style Chrome fill:#EDE9FE,stroke:#6366F1,stroke-width:2px
    style CDP fill:#FEE2E2,stroke:#EF4444,stroke-width:2px

Why CLI is Different

The key architectural difference is when and how the snapshot is delivered:

Aspect	CLI	MCP	chrome
Snapshot delivery	File on disk	In every response	In every response + relay
Agent reads snapshot	When needed (0-3 times)	Forced (every action)	Forced (every action)
Per-action overhead	~35 tokens (constant)	~S tokens (page-dependent)	~1.15S tokens
Context growth	O(1)	O(n)	O(n)
Element references	`e1`, `e2`, ... (pointer IDs)	Full tree re-parse	Full tree re-parse

The Solution: Hybrid Routing

Based on our measurements, we recommend a hybrid routing strategy:

flowchart TD
    A["Browser Action Request"] --> B{"Action Type?"}
    B -->|"Deterministic\n(click, fill, type,\npress, goto)"| C["CLI\n✅ 35 tok/action"]
    B -->|"Exploratory\n(unknown page,\ndebugging)"| D["MCP\n📊 Full tree access"]
    B -->|"Visual/Manual\nCollaboration"| E["chrome\n🖥️ GUI needed"]

    C --> F{"Need snapshot?"}
    F -->|"Yes (first time\nor state changed)"| G["Read file\n(one-time cost)"]
    F -->|"No (using refs)"| H["Skip\n(0 extra tokens)"]

    style C fill:#D1FAE5,stroke:#10B981
    style D fill:#FEF3C7,stroke:#F59E0B
    style E fill:#EDE9FE,stroke:#6366F1

Routing Decision Table

Use Case	Recommended	Token Ratio vs MCP	Reason
Form filling (5+ fields)	CLI	1.9x savings	Element refs, no repeated snapshots
Click sequences	CLI	2-5x savings	O(1) per action
Large page scraping (>500 elements)	CLI	4.9x savings	File-based snapshot, read once
CI/CD E2E testing	CLI	4.8x savings	Subprocess-friendly, low cost
Unknown page exploration	MCP	baseline	Full tree needed for discovery
Real-time debugging	MCP/chrome	—	Interactive tree, state management
GUI collaboration	chrome	—	Visual context, WSL2 not supported

Reproduce Our Results

Prerequisites

# Node.js 18+ required
node --version  # v18.0.0+

# Install dependencies
npm install

# Install Chromium browser
npx playwright install chromium

Run Benchmarks

# Full benchmark (all 4 pages, all 3 approaches)
npm run benchmark

# JSON output (for further analysis)
npm run benchmark:json > my_results.json

# CLI workflow demo
npm run demo:cli

Configuration

For environments where Google Chrome is not installed (e.g., CI, WSL2):

# Use bundled Chromium instead of Chrome
export PLAYWRIGHT_MCP_CONFIG=$(cat <<'EOF'
{
  "browser": {
    "browserName": "chromium",
    "launchOptions": {
      "channel": "chromium",
      "headless": true,
      "chromiumSandbox": false
    }
  }
}
EOF
)

npm run benchmark

Raw Data

Full benchmark results are available in data/results.json.

Summary Table (all measurements)

Page	Approach	Action	Tokens	Bytes	Latency (ms)
example.com	CLI	open	126	503	3,356
example.com	CLI	snapshot (file)	79	315	130
example.com	MCP	navigate	88	352	593
example.com	MCP	snapshot	78	312	22
example.com	CDP	Full A11y Tree	1,452	5,807	22
example.com	CDP	ARIA processed	58	232	74
httpbin	CLI	open	94	373	3,529
httpbin	CLI	snapshot (file)	420	1,677	149
httpbin	MCP	navigate	258	1,032	651
httpbin	MCP	snapshot	248	992	16
httpbin	CDP	Full A11y Tree	9,827	39,311	43
httpbin	CDP	ARIA processed	228	912	77
Hacker News	CLI	open	99	396	3,728
Hacker News	CLI	snapshot (file)	14,646	58,637	284
Hacker News	MCP	navigate	10,172	40,688	830
Hacker News	MCP	snapshot	10,162	40,648	125
Hacker News	CDP	Full A11y Tree	157,732	631,263	139
Hacker News	CDP	ARIA processed	10,142	40,620	240
GitHub Trending	CLI	open	107	429	6,096
GitHub Trending	CLI	snapshot (file)	6,625	26,502	348
GitHub Trending	MCP	navigate	3,509	14,036	1,499
GitHub Trending	MCP	snapshot	3,499	13,996	345
GitHub Trending	CDP	Full A11y Tree	259	1,036	8
GitHub Trending	CDP	ARIA processed	359	1,436	382

Discussion

Validated Claims

Claim	Source	Our Measurement	Status
CLI is 4.2x more token-efficient	Microsoft	1.7-6.1x (workflow)	✅ Lower bound confirmed
CLI up to 100x for single actions	Microsoft	2.5-290x (per-action stdout)	✅ Exceeds claim for complex pages
On-demand snapshots reduce context	Design goal	O(1) vs O(n) verified	✅ Confirmed empirically
CLI has higher per-action latency	Expected	~1.2s vs ~200ms	✅ Confirmed (code generation step)

Limitations

Token estimation: We use ceil(bytes/4) as a proxy. Actual tokenizer output may vary by ±15%
Single-run timing: Latency measurements show median of 3 runs; network variance affects results
GitHub Trending anomaly: CDP A11y tree for GitHub Trending showed unexpectedly low output (1,036 B), suggesting dynamic content or anti-bot measures affected the CDP session
WSL2 environment: claude --chrome could not be directly measured; token cost is estimated from source code analysis
MCP ariaSnapshotDiff: Incremental diff support exists in @playwright/mcp but is not broadly exposed yet; when available, it would reduce MCP's context growth

Novel Contributions

First empirical measurement of per-action token cost across CLI, MCP, and CDP
Source code verification that MCP embeds snapshots in every action response (setIncludeSnapshot())
CDP relay analysis confirming claude --chrome uses the same ARIA pipeline with ~15% overhead
Context growth model with mathematical formulation: C(n) = n × 35 + k × S vs C(n) = n × (35 + S)
Practical routing strategy based on action type classification

Future Work

Benchmark ariaSnapshotDiff when broadly available in MCP
Measure with actual LLM tokenizers (cl100k, o200k) instead of byte-based estimation
Extend to other MCP browser implementations (e.g., Browserbase, Stagehand)
Measure end-to-end task completion rates (token efficiency vs task success)

References

Microsoft Playwright Team. "@playwright/cli — Token-efficient browser automation for AI agents." npm, 2025. https://www.npmjs.com/package/@playwright/cli
Microsoft Playwright Team. "@playwright/mcp — Playwright tools for MCP." npm, 2025. https://www.npmjs.com/package/@playwright/mcp
Anthropic. "Computer use with Claude — Browser tool (claude --chrome)." Anthropic Docs, 2025. https://docs.anthropic.com/en/docs/claude-code/browser-tool
Chrome DevTools Protocol. "Accessibility Domain." Chrome DevTools Protocol, 2025. https://chromedevtools.github.io/devtools-protocol/tot/Accessibility/
W3C. "WAI-ARIA Accessible Rich Internet Applications." W3C Recommendation, 2024. https://www.w3.org/TR/wai-aria/

License

MIT License. See LICENSE for details.

If this benchmark helped you optimize your AI browser automation costs,
please consider giving it a ⭐

Built with empirical measurements, not estimates.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
benchmarks		benchmarks
data		data
docs		docs
figures		figures
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
package.json		package.json

Folders and files

Latest commit

History

Repository files navigation

The Hidden Token Tax

Quantifying the True Cost of AI Browser Automation

Abstract

Table of Contents

Visual Summary

Key Findings

🔍 Finding 1: The Token Tax

📊 Finding 2: The Scale

💰 Finding 3: The Cost

⚡ Finding 4: CDP Raw Overhead

The Problem: Invisible Context Drain

Context Growth Visualization

Methodology

Environment

Target Pages

Measurement Approach

Reproducibility

Results

1. Per-Action Token Cost

CLI stdout vs MCP response (per action)

When the CLI snapshot file IS read

2. Multi-Step Workflow Cost

5-Action Workflow Comparison

7-Action Form Fill Workflow (httpbin)

6-Action Complex Page (Hacker News)

3. Context Growth Model

4. CDP Protocol Deep Dive

CDP Response Size Comparison

The Data Pipeline

claude --chrome Analysis

5. Latency Analysis

6. Cost Impact Analysis

Per-Workflow Cost (Claude Sonnet, $3/MTok input)

Annualized Impact

Architecture Comparison

Why CLI is Different

The Solution: Hybrid Routing

Routing Decision Table

Reproduce Our Results

Prerequisites

Run Benchmarks

Configuration

Raw Data

Discussion

Validated Claims

Limitations

Novel Contributions

Future Work

References

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages