perf: investigate iOS snapshot/get-ui-state latency jitter #917

Description

@thymikee

Context

SimBench compared iOS-simulator agent tools on speed and token cost using agent-device@0.17.6.

Source: https://github.com/rizwankce/SimBench
Results: https://github.com/rizwankce/SimBench/blob/main/RESULTS.md
Aggregate metrics: https://github.com/rizwankce/SimBench/blob/main/out/MERGED-final/aggregate.json
Agent-device adapter: https://github.com/rizwankce/SimBench/blob/main/harness/adapters/agent-device.sh

The benchmark favors agent-device on token cost: agent-device snapshot -i --force-full produced about 224 tokens for the initial Settings root UI dump, roughly tied with RocketSim and much smaller than the equivalent output from XcodeBuildMCP, Mobilewright, or AXe.

The concern is latency variance. SimBench reports a warm get-ui-state.1 median of about 3265 ms with a variance of about 278050, the highest variance in the table; if that variance is in ms², it corresponds to a standard deviation of roughly 527 ms. Later reads of simpler screens were much faster, so the cost is likely screen- or substrate-dependent, but the jitter is still worth understanding.
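To make the variance figure easier to reason about, here is a minimal sketch that summarizes repeated timings into median, sample variance, standard deviation, and coefficient of variation. It assumes SimBench's variance is the sample variance of per-run latencies in milliseconds; summarize_latencies and LatencySummary are hypothetical names, not part of SimBench or agent-device.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class LatencySummary:
    median_ms: float
    variance_ms2: float  # sample variance (n - 1 denominator), in ms^2
    stdev_ms: float
    cv: float  # coefficient of variation: stdev / mean


def summarize_latencies(samples_ms: list[float]) -> LatencySummary:
    """Summarize per-run latencies in ms (assumed units, see lead-in)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to estimate variance")
    variance = statistics.variance(samples_ms)
    stdev = math.sqrt(variance)
    return LatencySummary(
        median_ms=statistics.median(samples_ms),
        variance_ms2=variance,
        stdev_ms=stdev,
        cv=stdev / statistics.mean(samples_ms),
    )


# Assuming ms^2, a variance of ~278050 is a standard deviation of ~527 ms,
# roughly 16% of the ~3265 ms median.
print(round(math.sqrt(278050)))  # 527
```

Reporting the standard deviation or coefficient of variation alongside the median keeps the jitter in the same units as the latency itself, which makes before/after comparisons easier to read.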

Goal

Identify where the latency variance of iOS snapshot -i (interactive get-UI-state) comes from, and reduce it where practical.

Investigation Notes

  • Add or use existing diagnostics to split snapshot time into runner request, accessibility tree retrieval, processing/shaping, transport/daemon IPC, and CLI output phases.
  • Reproduce with a heavy built-in screen such as Settings root and a simpler screen to separate accessibility-substrate cost from agent-device overhead.
  • Compare snapshot -i, plain snapshot, and JSON output where relevant to isolate output shaping/token-friendly formatting cost.
  • Check whether daemon/session reuse, state-dir placement, or runner lifecycle causes occasional slow outliers.
  • Keep compact snapshot -i output as a strength; the goal is lower jitter without expanding token output.
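The per-phase split in the first note can be prototyped by wrapping each stage in a timer and recording wall-clock durations per snapshot. This is a minimal sketch assuming a Python harness: PhaseTimer, fake_snapshot, and the phase names are hypothetical and only mirror the checklist above. They are not existing agent-device diagnostics, and the sleeps merely simulate stages.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator


class PhaseTimer:
    """Accumulates wall-clock durations (ms) per named phase for one request."""

    def __init__(self) -> None:
        self.durations_ms: defaultdict[str, float] = defaultdict(float)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record even if the phase raises, so failed runs still show where time went.
            self.durations_ms[name] += (time.perf_counter() - start) * 1000.0

    def total_ms(self) -> float:
        return sum(self.durations_ms.values())


def fake_snapshot(timer: PhaseTimer) -> None:
    # Hypothetical stand-in for one snapshot; each sleep simulates one stage.
    with timer.phase("runner_request"):
        time.sleep(0.002)
    with timer.phase("ax_tree_retrieval"):
        time.sleep(0.010)
    with timer.phase("shaping"):
        time.sleep(0.001)
    with timer.phase("ipc"):
        time.sleep(0.001)
    with timer.phase("cli_output"):
        time.sleep(0.001)
```

In a real harness each with timer.phase(...) block would wrap the corresponding call instead of a sleep, for example a subprocess.run(["agent-device", "snapshot", "-i"]) for end-to-end CLI time. Collecting one durations_ms dict per run, for both the Settings root and a simpler screen, gives directly comparable per-phase samples.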

Acceptance Criteria

  • A short benchmark or diagnostic trace identifies the dominant source of latency and outliers for iOS snapshot -i.
  • If an optimization is viable, implement it without changing the public snapshot contract or compact output semantics.
  • If the bottleneck is primarily XCTest/accessibility traversal, document that with measured evidence and identify any remaining agent-device-controlled overhead.
  • Add or update focused tests for any changed snapshot-processing or routing behavior where practical.
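For the first criterion, one simple way to attribute outliers is to flag runs whose total exceeds a Tukey upper fence (Q3 + 1.5 × IQR), then, for each flagged run, pick the phase with the largest excess over that phase's median. This is a sketch under those assumptions: the fence rule is one common choice rather than anything SimBench uses, and tukey_upper_fence and attribute_outliers are hypothetical helpers that consume per-run phase dicts like the ones a phase timer would produce.

```python
import statistics


def tukey_upper_fence(values: list[float], k: float = 1.5) -> float:
    """Upper outlier fence Q3 + k * IQR (quartiles via statistics.quantiles)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)


def attribute_outliers(runs: list[dict[str, float]]) -> list[tuple[int, str, float]]:
    """Return (run_index, dominant_phase, excess_ms) for each outlier run.

    Each run maps phase name -> duration in ms; all runs share the same phases.
    A run is an outlier if its total exceeds the Tukey upper fence of all totals.
    The dominant phase is the one with the largest excess over its own median.
    """
    totals = [sum(run.values()) for run in runs]
    fence = tukey_upper_fence(totals)
    phases = list(runs[0])
    medians = {p: statistics.median([run[p] for run in runs]) for p in phases}

    result = []
    for i, (run, total) in enumerate(zip(runs, totals)):
        if total <= fence:
            continue
        excess = {p: run[p] - medians[p] for p in phases}
        dominant = max(excess, key=excess.get)
        result.append((i, dominant, excess[dominant]))
    return result
```

If most flagged runs point at the accessibility-tree phase, that would support the "XCTest/accessibility traversal" conclusion; outliers dominated by IPC or output phases would point at agent-device-controlled overhead instead.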