| **Token Savings** | baseline | **70-74% per step** | Significant |
| **SPA Hydration** | No built-in wait | **`check().eventually()` handles it** | More reliable |

**Why A11y Tree Failed at Step 4:**

The A11y (accessibility tree) approach failed to click the login button because:

1. **Element ID mismatch**: The A11y tree assigns sequential IDs based on DOM traversal order, which can change between snapshots as the SPA re-renders. The LLM selected element 47 ("Sign in"), but that ID no longer pointed to the button after form state changed.
2. **No stable identifiers**: Unlike Predicate's `data-predicate-id` attributes (injected by the browser extension), A11y IDs are ephemeral and not anchored to the actual DOM elements.
3. **SPA state changes**: After filling both form fields, the button transitioned from disabled → enabled. This state change can cause the A11y tree to re-order elements, invalidating the LLM's element selection.

**Predicate Snapshot succeeded because:**
- `data-predicate-id` attributes are stable across re-renders
- ML ranking surfaces the most relevant elements (button with "Sign in" text)
- `runtime.check().eventually()` waits for SPA hydration before acting
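The ID-mismatch failure described above can be reproduced with a toy model. The sketch below is not how the A11y tree or Predicate actually assign IDs; the `Node` class, the `p-N` identifiers, and the inserted status node are all hypothetical. It only illustrates why an ID derived from traversal order goes stale after a re-render, while an ID attached to the element itself does not.

```python
from dataclasses import dataclass


@dataclass
class Node:
    role: str
    text: str
    stable_id: str  # hypothetical stand-in for an attribute stored on the element


def traversal_ids(nodes):
    """Number nodes 1..N in traversal order, as a fresh snapshot would."""
    return {i: node for i, node in enumerate(nodes, start=1)}


def stable_ids(nodes):
    """Index nodes by the identifier attached to each element."""
    return {node.stable_id: node for node in nodes}


# Snapshot the agent sees before the form state changes.
before = [
    Node("textbox", "Email", "p-1"),
    Node("textbox", "Password", "p-2"),
    Node("button", "Sign in", "p-3"),
]

# Hypothetical re-render: a status node appears ahead of the button.
after = [
    before[0],
    before[1],
    Node("status", "Looks good", "p-4"),
    before[2],
]

# The agent picks "Sign in" from the first snapshot.
chosen_seq = next(i for i, n in traversal_ids(before).items() if n.text == "Sign in")
chosen_stable = before[2].stable_id

# Resolving those choices against the re-rendered page:
stale = traversal_ids(after)[chosen_seq]   # now points at the status node
fresh = stable_ids(after)[chosen_stable]   # still points at the button

print(f"sequential id {chosen_seq} -> {stale.role!r}")
print(f"stable id {chosen_stable} -> {fresh.role!r}")
```

In this model the sequential ID silently resolves to a different element, which matches the symptom in step 4: the click is issued, but not on the button.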
#### Raw Demo Logs

<details>
<summary>Click to expand full demo output</summary>
> **Note:** Latency includes network time for ML ranking via the Predicate gateway. Token savings translate directly into cost savings, since LLM usage is billed per token: 97% fewer tokens means 97% lower LLM costs.
>
> **Key Insight:** Predicate Snapshot not only reduces tokens by 70%+ per step, but also **improves automation reliability** on SPAs by waiting for hydration automatically via `runtime.check().eventually()`. The stable element IDs survive React/Next.js re-renders that break A11y tree-based approaches.
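For readers unfamiliar with the "eventually" pattern, here is a minimal sketch of a polling wait. It is not the implementation of `runtime.check().eventually()`; the `eventually` function, its parameters, and the `FakePage` class are hypothetical and exist only to show the general idea of retrying a check until it passes or a deadline expires.

```python
import time
from typing import Callable


def eventually(predicate: Callable[[], bool],
               timeout_s: float = 5.0,
               interval_s: float = 0.05) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


class FakePage:
    """Hypothetical page whose button becomes enabled after a few polls."""

    def __init__(self, ready_after_polls: int) -> None:
        self.polls = 0
        self.ready_after_polls = ready_after_polls

    def button_enabled(self) -> bool:
        self.polls += 1
        return self.polls >= self.ready_after_polls


# Simulates hydration completing on the third check.
page = FakePage(ready_after_polls=3)
hydrated = eventually(page.button_enabled, timeout_s=2.0, interval_s=0.01)

# A condition that never holds times out instead of blocking forever.
never = eventually(lambda: False, timeout_s=0.05, interval_s=0.01)
```

The design point is that the agent acts only after the condition holds, rather than acting on whatever the first snapshot happened to contain.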
## Why Predicate Snapshot Over Accessibility Tree?

OpenClaw and similar browser automation frameworks default to the **Accessibility Tree (A11y)** for navigating websites. While A11y works for simple cases, it has fundamental limitations that make it unreliable for production LLM-driven automation:

### A11y Tree Limitations
| Problem | Description | Impact on LLM Agents |
|---------|-------------|----------------------|
| **Optimized for Consumption, Not Action** | A11y is designed for assistive technology (screen readers), not action verification or layout reasoning | Lacks precise semantic geometry and ordinality (e.g., "the first item in a list") that agents need for reliable reasoning |
| **Hydration Lag & Structural Inconsistency** | In JS-heavy SPAs, A11y often lags behind hydration or misrepresents dynamic overlays and grouping | Snapshots miss interactive nodes or incorrectly label states (e.g., confusing `focused` with `active`) |
| **Shadow DOM & Iframe Blind Spots** | A11y struggles to maintain global order across Shadow DOM and iframe boundaries | Cross-shadow ARIA delegation is inconsistent; iframe contents are often missing or lose spatial context |
| **Token Inefficiency** | Extracting the entire A11y tree for small actions wastes context window and compute | Superfluous nodes (like `genericContainer`) consume tokens without helping the agent |
| **Missing Visual/Layout Bugs** | A11y trees miss rendering-time issues like overlapping buttons or z-index conflicts | Agent reports elements as "correct" but cannot detect visual collisions |
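To make the token-inefficiency row concrete, the sketch below prunes wrapper nodes from a small, made-up tree and compares serialized sizes. It is a toy illustration only: the tree shape, the `INTERACTIVE_ROLES` set, and the `prune` function are assumptions, character count is a crude proxy for tokens, and nothing here reflects how any real snapshot engine ranks or prunes elements.

```python
import json

# Assumed set of roles an agent can act on; a real system would use a richer signal.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox"}


def prune(node):
    """Return the nodes that replace `node`: interactive nodes are kept,
    and the surviving descendants of any other node are hoisted upward."""
    kept_children = [
        kept
        for child in node.get("children", [])
        for kept in prune(child)
    ]
    if node["role"] in INTERACTIVE_ROLES:
        pruned = {"role": node["role"], "name": node.get("name", "")}
        if kept_children:
            pruned["children"] = kept_children
        return [pruned]
    return kept_children


# Made-up login form wrapped in layers of generic containers.
tree = {
    "role": "genericContainer",
    "children": [
        {"role": "genericContainer", "children": [
            {"role": "textbox", "name": "Email"},
            {"role": "textbox", "name": "Password"},
        ]},
        {"role": "genericContainer", "children": [
            {"role": "genericContainer", "children": [
                {"role": "button", "name": "Sign in"},
            ]},
        ]},
    ],
}

pruned = prune(tree)
full_size = len(json.dumps(tree))
pruned_size = len(json.dumps(pruned))
print(f"full: {full_size} chars, pruned: {pruned_size} chars")
```

All three interactive elements survive while every wrapper is dropped, so the serialized form sent to the model is shorter without losing any actionable target.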
### Predicate Snapshot Advantages

| Capability | How Predicate Solves It |
|------------|------------------------|
| **Post-Rendered Geometry** | Layers in the actual bounding boxes and grouping that are missing from standard A11y representations |
| **Live DOM Synchronization** | Anchors on the live, post-rendered DOM, ensuring perfect sync with the actual page state |
| **Unified Cross-Boundary Grounding** | Rust/WASM engine prunes and ranks elements across Shadow DOM and iframes, maintaining unified element ordering |
| **Token-Efficient Pruning** | Prunes uninformative branches while preserving all interactive elements, enabling 3B-parameter models to perform at the level of larger models |
| **Deterministic Verification** | Binds intent to deterministic outcomes via snapshot diff, providing an auditable "truth" layer rather than just a structural "report" |
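The "Deterministic Verification" row can be pictured as comparing two snapshots keyed by stable element IDs and checking that the expected change actually occurred. The sketch below is an assumption-laden illustration, not Predicate's snapshot format or diff algorithm: the dictionary layout, `diff_snapshots`, `verify`, and the `p-N` IDs are all hypothetical.

```python
def diff_snapshots(before, after):
    """Compare two {element_id: {field: value}} snapshots."""
    changed = {}
    for element_id in before.keys() & after.keys():
        old, new = before[element_id], after[element_id]
        fields = {
            key: (old.get(key), new.get(key))
            for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)
        }
        if fields:
            changed[element_id] = fields
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": changed,
    }


def verify(diff, element_id, field, expected):
    """True only if the diff shows `field` of `element_id` changing to `expected`."""
    old_new = diff["changed"].get(element_id, {}).get(field)
    return old_new is not None and old_new[1] == expected


# Hypothetical snapshots taken before and after filling the login form.
before = {
    "p-1": {"role": "textbox", "name": "Email", "value": ""},
    "p-3": {"role": "button", "name": "Sign in", "enabled": False},
}
after = {
    "p-1": {"role": "textbox", "name": "Email", "value": "user@example.com"},
    "p-3": {"role": "button", "name": "Sign in", "enabled": True},
}

diff = diff_snapshots(before, after)
button_enabled = verify(diff, "p-3", "enabled", True)
print(diff["changed"])
```

Because the check is a plain comparison of recorded state, the same two snapshots always give the same verdict, and the diff itself can be logged as an audit record of what the action changed.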
> **Bottom Line:** A11y trees tell you what *should* be there. Predicate Snapshots tell you what *is* there—and prove it.