Commit d62cced

Merge pull request #3 from PredicateSystems/demo3
changed demo to login form
2 parents 869ac92 + 017c9f3 commit d62cced

6 files changed

Lines changed: 983 additions & 67 deletions

File tree

README.md

Lines changed: 182 additions & 49 deletions
Original file line number | Diff line number | Diff line change
@@ -246,9 +246,25 @@ news.ycombinator.com (REAL)
246246
Monthly Savings: $11,557.63
247247
```
248248

249-
### Run LLM Action Demo
249+
### Run Login Demo (Multi-Step Workflow)
250250

251-
Test that Predicate snapshots work for real browser automation with an LLM.
251+
This demo shows real-world browser automation with a **6-step login workflow**:
252+
253+
1. Navigate to login page and wait for delayed hydration
254+
2. Fill username field (LLM selects element + human-like typing)
255+
3. Fill password field (button state: disabled → enabled)
256+
4. Click login button
257+
5. Navigate to profile page
258+
6. Extract username from profile card
259+
260+
![Local Llama Land Login Form with Predicate Overlay](demo/localLlamaLand_demo.png)
261+
262+
*The Predicate overlay shows ML-ranked element IDs (green borders) that the LLM uses to select form fields.*
263+
264+
Target site: `https://www.localllamaland.com/login` - a test site with intentional challenges:
265+
- **Delayed hydration**: Form loads after ~600ms (SPA pattern)
266+
- **State transitions**: Login button disabled until both fields filled
267+
- **Late-loading content**: Profile card loads after 800-1200ms
252268

253269
**Setup:**
254270

@@ -266,36 +282,63 @@ OPENAI_API_KEY=sk-your-openai-api-key-here
266282
PREDICATE_API_KEY=sk-your-predicate-api-key-here
267283
```
268284

269-
3. Run the demo:
285+
3. Run the demo with visible browser and element overlay:
270286
```bash
271-
npm run demo:llm
272-
273-
# With visible browser and element overlay (for debugging)
274-
npm run demo:llm -- --headed --overlay
287+
npm run demo:login -- --headed --overlay
275288
```
276289

277290
**Alternative LLM providers:**
278291
```bash
279292
# Anthropic Claude
280-
ANTHROPIC_API_KEY=sk-... npm run demo:llm
293+
ANTHROPIC_API_KEY=sk-... npm run demo:login -- --headed --overlay
281294

282295
# Local LLM (Ollama)
283-
SENTIENCE_LOCAL_LLM_BASE_URL=http://localhost:11434/v1 npm run demo:llm
296+
SENTIENCE_LOCAL_LLM_BASE_URL=http://localhost:11434/v1 npm run demo:login -- --headed --overlay
297+
298+
# Headless mode (no browser window)
299+
npm run demo:login
284300
```
285301

286302
**Flags:**
287303
- `--headed` - Run browser in visible window (not headless)
288304
- `--overlay` - Show green borders around captured elements (requires `--headed`)
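
The dependency between the two flags can be sketched as a small validation step. This is only an illustration: `parseDemoFlags` and `DemoFlags` are hypothetical names, and the demo script's actual argument handling may differ.

```typescript
// Hypothetical flag parser; not taken from the demo's source.
interface DemoFlags {
  headed: boolean;  // run the browser in a visible window
  overlay: boolean; // draw green borders around captured elements
}

function parseDemoFlags(argv: string[]): DemoFlags {
  const headed = argv.includes("--headed");
  const overlay = argv.includes("--overlay");
  // The overlay is drawn in the page, so it only makes sense with a visible browser.
  if (overlay && !headed) {
    throw new Error("--overlay requires --headed");
  }
  return { headed, overlay };
}

const flags = parseDemoFlags(["--headed", "--overlay"]);
console.log(flags);
```

Failing fast on `--overlay` without `--headed` is one possible design; silently ignoring the flag would be another.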
289305

290-
This demo compares:
291-
- **Tokens**: A11y tree vs Predicate snapshot input size
292-
- **Latency**: Total time including LLM response
293-
- **Success**: Whether the LLM correctly identifies the target element
306+
This demo compares A11y Tree vs Predicate Snapshot across **all 6 steps**, measuring:
307+
- **Tokens per step**: Input size for each LLM call
308+
- **Latency**: Time per step including form interactions
309+
- **Success rate**: Step completion across the workflow
310+
311+
#### Key Observations
312+
313+
| Metric | A11y Tree | Predicate Snapshot | Delta |
314+
|--------|-----------|-------------------|-------|
315+
| **Steps Completed** | 3/6 (failed at step 4) | **6/6** | Predicate wins |
316+
| **Token Savings** | baseline | **71-74% per step** (steps 2-4) | Significant |
317+
| **SPA Hydration** | No built-in wait | **`check().eventually()` handles it** | More reliable |
318+
319+
**Why A11y Tree Failed at Step 4:**
320+
321+
The A11y (accessibility tree) approach failed to click the login button because:
322+
323+
1. **Element ID mismatch**: The A11y tree assigns sequential IDs based on DOM traversal order, which can change between snapshots as the SPA re-renders. The LLM selected element 47 ("Sign in"), but that ID no longer pointed to the button after form state changed.
324+
325+
2. **No stable identifiers**: Unlike Predicate's `data-predicate-id` attributes (injected by the browser extension), A11y IDs are ephemeral and not anchored to the actual DOM elements.
326+
327+
3. **SPA state changes**: After filling both form fields, the button transitioned from disabled → enabled. This state change can cause the A11y tree to re-order elements, invalidating the LLM's element selection.
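
The failure mode described above can be illustrated with a toy model. The node list, the inserted "hint" element, and both ID schemes are assumptions made for the example; they do not reproduce the real A11y tree or Predicate's implementation.

```typescript
// Toy model of position-based IDs versus IDs anchored to the element itself.
interface UiNode {
  key: string;   // stands in for a stable attribute such as data-predicate-id
  role: string;
  label: string;
}

const beforeRender: UiNode[] = [
  { key: "user", role: "textbox", label: "Username" },
  { key: "pass", role: "textbox", label: "Password" },
  { key: "submit", role: "button", label: "Sign in" },
];

// Assumed re-render: a status hint appears before the button, shifting positions.
const afterRender: UiNode[] = [
  beforeRender[0],
  beforeRender[1],
  { key: "hint", role: "status", label: "Looks good" },
  beforeRender[2],
];

// The LLM picked the button from the first snapshot.
const chosenPosition = 2;    // sequential ID: index in traversal order
const chosenKey = "submit";  // anchored ID: travels with the element

const byPosition = afterRender[chosenPosition];
const byKey = afterRender.filter((n) => n.key === chosenKey)[0];

console.log(byPosition.label); // "Looks good" (stale: no longer the button)
console.log(byKey.label);      // "Sign in"
```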
328+
329+
**Predicate Snapshot succeeded because:**
330+
- `data-predicate-id` attributes are stable across re-renders
331+
- ML-ranking surfaces the most relevant elements (button with "Sign in" text)
332+
- `runtime.check().eventually()` properly waits for SPA hydration
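
Waiting for hydration usually comes down to polling a condition until it holds or a deadline passes. The sketch below shows that general pattern only. `eventually`, `EventuallyOptions`, and `makeFakePage` are hypothetical names, and this is not the signature of Predicate's `runtime.check().eventually()`.

```typescript
// Generic poll-until-true helper; an illustration, not the Predicate SDK.
interface EventuallyOptions {
  timeoutMs: number;  // give up after this long
  intervalMs: number; // pause between attempts
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function eventually(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs, intervalMs }: EventuallyOptions,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return true;        // condition met
    if (Date.now() >= deadline) return false;  // timed out
    await sleep(intervalMs);
  }
}

// Simulated page whose form becomes ready some time after "load",
// similar in spirit to the ~600 ms hydration delay on the demo site.
function makeFakePage(hydrationDelayMs: number) {
  const loadedAt = Date.now();
  return { formReady: (): boolean => Date.now() - loadedAt >= hydrationDelayMs };
}
```

A call such as `await eventually(page.formReady, { timeoutMs: 5000, intervalMs: 100 })` would then resolve to `true` once the simulated form has hydrated, or `false` if the timeout elapses first.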
333+
334+
#### Raw Demo Logs
335+
336+
<details>
337+
<summary>Click to expand full demo output</summary>
294338

295-
**Example output:**
296339
```
297340
======================================================================
298-
LLM Browser Navigation COMPARISON: A11y Tree vs. Predicate Snapshot
341+
LOGIN + PROFILE CHECK: A11y Tree vs. Predicate Snapshot
299342
======================================================================
300343
Using OpenAI provider
301344
Model: gpt-4o-mini
@@ -304,51 +347,114 @@ Overlay enabled: elements will be highlighted with green borders
304347
Predicate snapshots: REAL (ML-ranked)
305348
======================================================================
306349
307-
Task: Click first news link on HN
350+
======================================================================
351+
Running with A11Y approach
352+
======================================================================
308353
309-
[A11y Tree] Click first news link on HN
310-
Chose element: 48
311-
Tokens: 35191, Latency: 3932ms
312-
[Predicate] Click first news link on HN
313-
Chose element: 48
314-
Tokens: 864, Latency: 2477ms
354+
[2026-02-25 01:14:50] Step 1: Wait for login form hydration
355+
Waiting for form to hydrate using runtime.check().eventually()...
356+
Button initially disabled: false
357+
PASS (11822ms) | Found 19 elements
315358
316-
Task: Click More link on HN
359+
[2026-02-25 01:15:02] Step 2: Fill username field
360+
Snapshot: 45 elements, 1241 tokens
361+
LLM chose element 37: "Username"
362+
PASS (6771ms) | Typed "testuser"
363+
Tokens: prompt=1241 total=1251
317364
318-
[A11y Tree] Click More link on HN
319-
Chose element: 1199
320-
Tokens: 35179, Latency: 2366ms
321-
[Predicate] Click More link on HN
322-
Chose element: 11
323-
Tokens: 861, Latency: 1979ms
365+
[2026-02-25 01:15:08] Step 3: Fill password field
366+
LLM chose element 42: "Password"
367+
Waiting for login button to become enabled...
368+
PASS (12465ms) | Button enabled: true
369+
Tokens: prompt=1295 total=1305
324370
325-
Task: Click search on Example.com
371+
[2026-02-25 01:15:21] Step 4: Click login button
372+
LLM chose element 47: "Sign in"
373+
FAIL (7801ms) | Navigated to https://www.localllamaland.com/login
374+
Tokens: prompt=1367 total=1377
375+
376+
======================================================================
377+
Running with PREDICATE approach
378+
======================================================================
326379
327-
[A11y Tree] Click search on Example.com
328-
Chose element: 7
329-
Tokens: 272, Latency: 492ms
330-
[Predicate] Click search on Example.com
331-
Chose element: 6
332-
Tokens: 44, Latency: 6255ms
380+
[2026-02-25 01:15:29] Step 1: Wait for login form hydration
381+
Waiting for form to hydrate using runtime.check().eventually()...
382+
Button initially disabled: false
383+
PASS (10586ms) | Found 19 elements
384+
385+
[2026-02-25 01:15:40] Step 2: Fill username field
386+
Snapshot: 19 elements, 351 tokens
387+
LLM chose element 23: "username"
388+
PASS (12877ms) | Typed "testuser"
389+
Tokens: prompt=351 total=361
390+
391+
[2026-02-25 01:15:53] Step 3: Fill password field
392+
LLM chose element 25: "Password"
393+
Waiting for login button to become enabled...
394+
PASS (17886ms) | Button enabled: true
395+
Tokens: prompt=352 total=362
396+
397+
[2026-02-25 01:16:10] Step 4: Click login button
398+
LLM chose element 29: "Sign in"
399+
PASS (12690ms) | Navigated to https://www.localllamaland.com/profile
400+
Tokens: prompt=346 total=356
401+
402+
[2026-02-25 01:16:23] Step 5: Navigate to profile page
403+
PASS (1ms) | Already on profile page
404+
405+
[2026-02-25 01:16:23] Step 6: Extract username from profile
406+
Waiting for profile card to load...
407+
Found username: testuser@localllama.land
408+
Found email: Profile testuser testuser@localllama.lan
409+
PASS (20760ms) | username=testuser@localllama.land
410+
Tokens: prompt=480 total=480
333411
334412
======================================================================
335413
RESULTS SUMMARY
336414
======================================================================
337415
338-
┌─────────────────────────────────────────────────────────────────────┐
339-
│ Metric │ A11y Tree │ Predicate │ Δ │
340-
├─────────────────────────────────────────────────────────────────────┤
341-
│ Total Tokens │ 70642 │ 1769 │ -97% │
342-
│ Avg Tokens/Task │ 23547 │ 590 │ │
343-
│ Total Latency (ms) │ 6790 │ 10711 │ -58% │
344-
│ Success Rate │ 3/3 │ 3/3 │ │
345-
└─────────────────────────────────────────────────────────────────────┘
346-
347-
Key Insight: Predicate snapshots use ~97% fewer tokens
348-
while achieving the same task success rate.
416+
+-----------------------------------------------------------------------+
417+
| Metric | A11y Tree | Predicate | Delta |
418+
+-----------------------------------------------------------------------+
419+
| Total Tokens | 3933 | 1559 | -60% |
420+
| Total Latency (ms) | 38859 | 74800 | +92% |
421+
| Steps Passed | 3/6 | 6/6 | |
422+
+-----------------------------------------------------------------------+
423+
424+
Key Insight: Predicate snapshots use 60% fewer tokens
425+
for a multi-step login workflow with form filling.
426+
427+
Step-by-step breakdown:
428+
----------------------------------------------------------------------
429+
Step 1: Wait for login form hydration
430+
A11y: 0 tokens, 11822ms, PASS
431+
Pred: 0 tokens, 10586ms, PASS (0% savings)
432+
Step 2: Fill username field
433+
A11y: 1251 tokens, 6771ms, PASS
434+
Pred: 361 tokens, 12877ms, PASS (71% savings)
435+
Step 3: Fill password field
436+
A11y: 1305 tokens, 12465ms, PASS
437+
Pred: 362 tokens, 17886ms, PASS (72% savings)
438+
Step 4: Click login button
439+
A11y: 1377 tokens, 7801ms, FAIL
440+
Pred: 356 tokens, 12690ms, PASS (74% savings)
349441
```
350442

351-
> **Note:** Latency includes network time for ML ranking via the Predicate gateway. Token savings translate directly to cost savings—97% fewer tokens = 97% lower LLM costs.
443+
</details>
444+
445+
#### Summary
446+
447+
| Step | A11y Tree | Predicate Snapshot | Token Savings |
448+
|------|-----------|-------------------|---------------|
449+
| Step 1: Navigate to localllamaland.com/login | PASS | PASS | - |
450+
| Step 2: Fill username | 1,251 tokens, PASS | 361 tokens, PASS | **71%** |
451+
| Step 3: Fill password | 1,305 tokens, PASS | 362 tokens, PASS | **72%** |
452+
| Step 4: Click login | 1,377 tokens, **FAIL** | 356 tokens, PASS | **74%** |
453+
| Step 5: Navigate to profile | (not reached) | PASS | - |
454+
| Step 6: Extract username | (not reached) | 480 tokens, PASS | - |
455+
| **Total** | **3,933 tokens, 3/6 steps** | **1,559 tokens, 6/6 steps** | **60%** |
456+
457+
> **Key Insight:** Predicate Snapshot not only cuts tokens by more than 70% per step but also **improves automation reliability** on SPAs by waiting for hydration automatically via `runtime.check().eventually()`. Its stable element IDs survive the React/Next.js re-renders that break A11y tree-based approaches.
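
The percentages in the table can be re-derived from the logged token counts. The figures below are copied from the demo output above; the helper and variable names are illustrative.

```typescript
// Token and latency figures come from the demo log; names are illustrative.
interface StepTokens {
  step: string;
  a11y: number;
  predicate: number;
}

const llmSteps: StepTokens[] = [
  { step: "Fill username", a11y: 1251, predicate: 361 },
  { step: "Fill password", a11y: 1305, predicate: 362 },
  { step: "Click login", a11y: 1377, predicate: 356 },
];

// Signed percentage change, rounded to a whole percent.
function percentChange(from: number, to: number): number {
  return Math.round(((to - from) / from) * 100);
}

const perStepSavings = llmSteps.map((s) => -percentChange(s.a11y, s.predicate));

const a11yTotal = llmSteps.reduce((sum, s) => sum + s.a11y, 0);
const predicateSteps2to4 = llmSteps.reduce((sum, s) => sum + s.predicate, 0);
// The Predicate run also spent 480 tokens on step 6, which the A11y run never reached.
const predicateTotal = predicateSteps2to4 + 480;

console.log(perStepSavings);                                // [71, 72, 74]
console.log(percentChange(a11yTotal, predicateTotal));      // -60
console.log(percentChange(a11yTotal, predicateSteps2to4));  // -73
console.log(percentChange(38859, 74800));                   // 92 (total latency, ms)
```

The 60% total compares three A11y calls against four Predicate calls. Restricting both sides to steps 2-4 gives roughly 73%.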
352458
353459
### Build
354460

@@ -372,7 +478,8 @@ predicate-snapshot-skill/
372478
│ └── act.ts # PredicateActTool implementation
373479
├── demo/
374480
│ ├── compare.ts # Token comparison demo
375-
│ └── llm-action.ts # LLM action comparison demo
481+
│ ├── llm-action.ts # Simple LLM action demo (single clicks)
482+
│ └── login-demo.ts # Multi-step login workflow demo
376483
├── SKILL.md # OpenClaw skill manifest
377484
└── package.json
378485
```
@@ -390,3 +497,29 @@ predicate-snapshot-skill/
390497

391498
- Documentation: [predicatesystems.ai/docs](https://predicatesystems.ai/docs)
392499
- Issues: [GitHub Issues](https://github.com/PredicateSystems/openclaw-predicate-skill/issues)
500+
501+
## Why Predicate Snapshot Over Accessibility Tree?
502+
503+
OpenClaw and similar browser automation frameworks default to the **Accessibility Tree (A11y)** for navigating websites. While A11y works for simple cases, it has fundamental limitations that make it unreliable for production LLM-driven automation:
504+
505+
### A11y Tree Limitations
506+
507+
| Problem | Description | Impact on LLM Agents |
508+
|---------|-------------|----------------------|
509+
| **Optimized for Consumption, Not Action** | A11y is designed for assistive technology (screen readers), not action verification or layout reasoning | Lacks precise semantic geometry and ordinality (e.g., "the first item in a list") that agents need for reliable reasoning |
510+
| **Hydration Lag & Structural Inconsistency** | In JS-heavy SPAs, A11y often lags behind hydration or misrepresents dynamic overlays and grouping | Snapshots miss interactive nodes or incorrectly label states (e.g., confusing `focused` with `active`) |
511+
| **Shadow DOM & Iframe Blind Spots** | A11y struggles to maintain global order across Shadow DOM and iframe boundaries | Cross-shadow ARIA delegation is inconsistent; iframe contents are often missing or lose spatial context |
512+
| **Token Inefficiency** | Extracting the entire A11y tree for small actions wastes context window and compute | Superfluous nodes (like `genericContainer`) consume tokens without helping the agent |
513+
| **Missing Visual/Layout Bugs** | A11y trees miss rendering-time issues like overlapping buttons or z-index conflicts | Agent reports elements as "correct" but cannot detect visual collisions |
514+
515+
### Predicate Snapshot Advantages
516+
517+
| Capability | How Predicate Solves It |
518+
|------------|------------------------|
519+
| **Post-Rendered Geometry** | Layers in actual bounding boxes and grouping missing from standard A11y representations |
520+
| **Live DOM Synchronization** | Anchors on the live, post-rendered DOM ensuring perfect sync with actual page state |
521+
| **Unified Cross-Boundary Grounding** | Rust/WASM engine prunes and ranks elements across Shadow DOM and iframes, maintaining unified element ordering |
522+
| **Token-Efficient Pruning** | Prunes uninformative branches while preserving all interactive elements, so that 3B-parameter models can perform at the level of larger models |
523+
| **Deterministic Verification** | Binds intent to deterministic outcomes via snapshot diff, providing an auditable "truth" layer rather than just a structural "report" |
524+
525+
> **Bottom Line:** A11y trees tell you what *should* be there. Predicate Snapshots tell you what *is* there—and prove it.
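
To make the pruning idea concrete, here is a minimal sketch that removes unnamed `genericContainer` wrappers from a tree and hoists their children. It assumes a simplified node shape and a single pruning rule. Predicate's Rust/WASM engine and its ML ranking are not shown in this commit and are not reproduced here.

```typescript
// Simplified accessibility-style node; the shape is an assumption for this example.
interface TreeNode {
  role: string;
  name?: string;
  children: TreeNode[];
}

// Returns the nodes that should replace `node` in its parent's child list.
function prune(node: TreeNode): TreeNode[] {
  const children: TreeNode[] = [];
  for (const child of node.children) {
    children.push(...prune(child));
  }
  const isWrapper = node.role === "genericContainer" && !node.name;
  // Drop the wrapper itself but keep whatever survived beneath it.
  if (isWrapper) return children;
  return [{ ...node, children }];
}

function countNodes(node: TreeNode): number {
  return node.children.reduce((sum, c) => sum + countNodes(c), 1);
}

const loginTree: TreeNode = {
  role: "form",
  name: "Login",
  children: [
    {
      role: "genericContainer",
      children: [
        {
          role: "genericContainer",
          children: [{ role: "textbox", name: "Username", children: [] }],
        },
        { role: "genericContainer", children: [] },
      ],
    },
    { role: "button", name: "Sign in", children: [] },
  ],
};

const [prunedTree] = prune(loginTree);
console.log(countNodes(loginTree), "->", countNodes(prunedTree)); // 6 -> 3
```

Both interactive elements survive while the three wrapper nodes disappear, which is the property the table attributes to pruning.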

SKILL.md

Lines changed: 5 additions & 2 deletions
@@ -24,7 +24,10 @@ Reduces prompt token usage by **95%** while preserving actionable elements.
2424
## Requirements
2525

2626
- Node.js 18+
27-
- `PREDICATE_API_KEY` environment variable
27+
- `PREDICATE_API_KEY` environment variable (optional)
28+
29+
**Without API key:** Local heuristic-based pruning (~80% token reduction)
30+
**With API key:** ML-powered ranking for cleaner output (~95% token reduction, less noise)
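
The optional-key behavior can be pictured as a simple fallback. This is a sketch under assumptions: `selectRankingMode`, `RankingConfig`, and the mode names are hypothetical, and the reduction figures are the approximate ones quoted above rather than measured values.

```typescript
// Hypothetical mode selection; the skill's real configuration code may differ.
type RankingMode = "local-heuristic" | "ml-ranked";

interface RankingConfig {
  mode: RankingMode;
  approxTokenReduction: number; // approximate fraction quoted in the requirements
}

// The environment is passed in explicitly so the function is easy to test.
function selectRankingMode(
  env: Record<string, string | undefined>,
): RankingConfig {
  const apiKey = (env["PREDICATE_API_KEY"] ?? "").trim();
  return apiKey.length > 0
    ? { mode: "ml-ranked", approxTokenReduction: 0.95 }
    : { mode: "local-heuristic", approxTokenReduction: 0.8 };
}

console.log(selectRankingMode({}).mode);                                 // "local-heuristic"
console.log(selectRankingMode({ PREDICATE_API_KEY: "sk-example" }).mode); // "ml-ranked"
```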
2831

2932
Get your free API key at [predicate.systems/keys](https://predicate.systems/keys)
3033

@@ -43,7 +46,7 @@ cd ~/.openclaw/skills/predicate-snapshot && npm install && npm run build
4346

4447
## Configuration
4548

46-
Set your API key:
49+
For enhanced ML-powered ranking, set your API key:
4750

4851
```bash
4952
export PREDICATE_API_KEY="sk-..."

demo/localLlamaLand_demo.png

53.8 KB