Context
SimBench compared iOS-simulator agent tools on speed and token cost using agent-device@0.17.6.
Source: https://github.com/rizwankce/SimBench
Results: https://github.com/rizwankce/SimBench/blob/main/RESULTS.md
Aggregate metrics: https://github.com/rizwankce/SimBench/blob/main/out/MERGED-final/aggregate.json
Agent-device adapter: https://github.com/rizwankce/SimBench/blob/main/harness/adapters/agent-device.sh
The benchmark is favorable on token cost: agent-device snapshot -i --force-full produced about 224 tokens for the initial Settings root UI dump, roughly tied with RocketSim and much smaller than XcodeBuildMCP, Mobilewright, or AXe.
The concern is latency variance. SimBench reports a warm get-ui-state.1 median of about 3265 ms with a variance of about 278050 (ms², a standard deviation of roughly 527 ms), the highest variance in the table. Later, simpler screen reads were much faster, so the issue is likely screen- or substrate-dependent, but the jitter is still worth understanding.
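The reported variance is easier to reason about once it is converted to a standard deviation and a coefficient of variation. Below is a minimal TypeScript sketch. It assumes SimBench's "variance" is the sample variance of per-run latencies in milliseconds, so its unit is ms². `summarize` and `LatencySummary` are illustrative names, not SimBench or agent-device code.

```typescript
// Summary statistics for latency samples in milliseconds.
// Assumption: "variance" means sample variance (n - 1 denominator), in ms².

interface LatencySummary {
  median: number; // ms
  variance: number; // ms²
  stdDev: number; // ms
  cv: number; // stdDev / median, unitless
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

function summarize(samplesMs: number[]): LatencySummary {
  if (samplesMs.length < 2) throw new Error("need at least two samples");
  const mean = samplesMs.reduce((sum, x) => sum + x, 0) / samplesMs.length;
  const variance =
    samplesMs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (samplesMs.length - 1);
  const med = median(samplesMs);
  const stdDev = Math.sqrt(variance);
  return { median: med, variance, stdDev, cv: stdDev / med };
}

// The figures reported for warm get-ui-state.1, read the same way:
const reportedStdDev = Math.sqrt(278050); // ≈ 527.3 ms
const reportedCv = reportedStdDev / 3265; // ≈ 0.16, i.e. ~16% of the median
```

Under these assumptions, the reported spread is roughly ±527 ms around the median. A single variance figure cannot distinguish a uniformly wide distribution from a few slow outliers. Collecting the raw per-run samples and percentiles such as p95 is likely more informative than the aggregate alone.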
Goal
Identify where iOS snapshot -i/interactive get-UI-state latency variance comes from and reduce it where practical.
Investigation Notes
- Add or use existing diagnostics to split snapshot time into runner request, accessibility tree retrieval, processing/shaping, transport/daemon IPC, and CLI output phases.
- Reproduce with a heavy built-in screen such as Settings root and a simpler screen to separate accessibility-substrate cost from agent-device overhead.
- Compare snapshot -i, plain snapshot, and JSON output where relevant to isolate the cost of output shaping and token-friendly formatting.
- Check whether daemon/session reuse, state-dir placement, or runner lifecycle causes occasional slow outliers.
- Keep the compact snapshot -i output as a strength; the goal is lower jitter without expanding token output.
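A small accumulating timer is one way to get the per-phase split from the first note. This is a hypothetical sketch. `PhaseTimer` and the phase names mirror the list above and are not existing agent-device identifiers. The timer only sees work inside the process that owns it. Accessibility-tree retrieval inside the XCTest runner would therefore have to be timed there and reported back, which is what the `record` method is for.

```typescript
// Hypothetical phase names mirroring the proposed split; not agent-device identifiers.
type Phase =
  | "runnerRequest"
  | "axTreeRetrieval"
  | "processing"
  | "daemonIpc"
  | "cliOutput";

class PhaseTimer {
  private readonly durations = new Map<Phase, number>();

  // Time a sync or async step and add its wall-clock cost to the phase total.
  async time<T>(phase: Phase, fn: () => Promise<T> | T): Promise<T> {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      this.record(phase, performance.now() - start);
    }
  }

  // Add a duration measured elsewhere, e.g. a runner-side timing echoed in its response.
  record(phase: Phase, ms: number): void {
    this.durations.set(phase, (this.durations.get(phase) ?? 0) + ms);
  }

  // Plain object suitable for a debug/trace dump next to the snapshot result.
  toJSON(): Record<string, number> {
    return Object.fromEntries(this.durations);
  }
}
```

The normal compact output can stay unchanged if `timer.toJSON()` is emitted only in a debug or trace path.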
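Timing the same snapshot variants on a heavy screen (Settings root) and on a simpler one helps separate accessibility-substrate cost from agent-device overhead. Below is a minimal sketch with these assumptions:

- `agent-device` is on `PATH`.
- Any device or session selection the CLI needs is already configured.
- The simulator has been navigated to the target screen beforehand, since navigation is out of scope.

Only argument lists that appear in this issue are used. The JSON-output variant is omitted because its exact flag is not stated here. `measureVariants` is a hypothetical helper.

```typescript
import { spawnSync } from "node:child_process";

// Argument lists taken from this issue; add a JSON-output variant with the real flag.
const VARIANTS: Record<string, string[]> = {
  "snapshot -i": ["snapshot", "-i"],
  "snapshot -i --force-full": ["snapshot", "-i", "--force-full"],
  "snapshot": ["snapshot"],
};

type Runner = (args: string[]) => void;

// Default runner: invoke the CLI with stdout piped (spawnSync's default),
// so the cost of writing output is included, and fail on a non-zero exit.
const cliRunner: Runner = (args) => {
  const res = spawnSync("agent-device", args);
  if (res.error) throw res.error;
  if (res.status !== 0) {
    throw new Error(`agent-device ${args.join(" ")} exited with ${res.status}`);
  }
};

// Run every variant `runs` times, interleaving variants within each iteration
// so drift in simulator state affects them roughly equally.
// Returns end-to-end wall-clock milliseconds per variant.
function measureVariants(runs: number, run: Runner = cliRunner): Map<string, number[]> {
  const results = new Map<string, number[]>();
  for (const name of Object.keys(VARIANTS)) results.set(name, []);
  for (let i = 0; i < runs; i++) {
    for (const [name, args] of Object.entries(VARIANTS)) {
      const start = performance.now();
      run(args);
      results.get(name)!.push(performance.now() - start);
    }
  }
  return results;
}
```

These timings are end-to-end and include CLI process startup. That matches what a harness like SimBench observes, but it does not break the time down per phase. If the spread shrinks for every variant on the simpler screen, the jitter most likely tracks the accessibility tree rather than output shaping.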
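For the daemon/session/runner-lifecycle question, one approach is to flag slow outliers and check whether they coincide with a fresh session. The sketch below uses Tukey's upper fence (Q3 + 1.5·IQR) as the outlier rule. The `freshSession` field is hypothetical: the harness would set it itself when it knows the daemon, session, or runner was (re)started before that run.

```typescript
interface Run {
  index: number; // position in the measurement sequence
  ms: number; // end-to-end latency
  freshSession: boolean; // hypothetical: set by the harness after a (re)start
}

// Linear-interpolation quantile on an ascending-sorted array.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Runs above Tukey's upper fence, Q3 + k * IQR.
function slowOutliers(runs: Run[], k = 1.5): Run[] {
  const sorted = runs.map((r) => r.ms).sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const fence = q3 + k * (q3 - q1);
  return runs.filter((r) => r.ms > fence);
}

// How many slow outliers there are and how many followed a fresh session.
function outlierReport(runs: Run[]) {
  const outliers = slowOutliers(runs);
  return {
    outliers: outliers.length,
    onFreshSession: outliers.filter((r) => r.freshSession).length,
    indices: outliers.map((r) => r.index),
  };
}
```

If most outliers land on fresh sessions, lifecycle or warm-up cost is the likely cause. If they are scattered across warm runs, the accessibility traversal or IPC path is the better suspect.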
Acceptance Criteria
- A short benchmark or diagnostic trace identifies the dominant source of latency and outliers for iOS snapshot -i.
- If an optimization is viable, implement it without changing the public snapshot contract or compact output semantics.
- If the bottleneck is primarily XCTest/accessibility traversal, document that with measured evidence and identify any remaining agent-device-controlled overhead.
- Add or update focused tests for any changed snapshot-processing or routing behavior where practical.
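Given per-phase timings for each run, the dominant source of jitter can be quantified by attributing the variance of the total to each phase. When the total is the sum of the phases, Var(total) = Σᵢ Cov(phaseᵢ, total), so the shares below add up to 1. This sketch assumes the recorded phases do not overlap and together cover the end-to-end time; any unaccounted time should be recorded as its own phase. `varianceShares` is a hypothetical helper.

```typescript
// Per-phase latency samples in ms; index i in every array is the same run.
type PhaseSamples = Record<string, number[]>;

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample covariance (n - 1 denominator); covariance(xs, xs) is the variance.
function covariance(xs: number[], ys: number[]): number {
  const mx = mean(xs);
  const my = mean(ys);
  let acc = 0;
  for (let i = 0; i < xs.length; i++) acc += (xs[i] - mx) * (ys[i] - my);
  return acc / (xs.length - 1);
}

// share_p = Cov(phase_p, total) / Var(total). Shares sum to 1; an individual
// share can be negative if a phase tends to move against the total.
function varianceShares(samples: PhaseSamples): Record<string, number> {
  const phases = Object.keys(samples);
  if (phases.length === 0) throw new Error("no phases");
  const n = samples[phases[0]].length;
  const total = Array.from({ length: n }, (_, i) =>
    phases.reduce((sum, p) => sum + samples[p][i], 0),
  );
  const varTotal = covariance(total, total);
  if (varTotal === 0) throw new Error("total latency has no variance");
  return Object.fromEntries(
    phases.map((p): [string, number] => [p, covariance(samples[p], total) / varTotal]),
  );
}
```

Suppose the runner-side accessibility-tree phase carries most of the share. That would be the measured evidence that the bottleneck is XCTest/accessibility traversal, and the remaining shares would bound the agent-device-controlled overhead.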