Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
100 changes: 100 additions & 0 deletions skills/stagehand-export/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
---
name: stagehand-export
description: Translate a graduated /autobrowse task into a deterministic Stagehand TypeScript script. Mines the last passing trace.json for working XPath/CSS selectors, bakes them in as cached Action descriptors, falls back to observe() for ARIA-ref clicks, and auto-generates a Zod schema for extract() from task.md's Output block. Use after /autobrowse has converged to ship a no-LLM-loop runnable script (tsx, Browserbase Functions, cron). Trigger keywords: stagehand-export, export to stagehand, autobrowse to stagehand, convert browse skill to typescript.
allowed-tools: Bash Read Write Edit Glob Grep
metadata:
author: browserbase
homepage: https://github.com/browserbase/skills
---

# stagehand-export — autobrowse → Stagehand bridge

`/autobrowse` produces a `strategy.md` and trace history that a Claude session can replay step-by-step using the `browse` CLI. Every replay still pays an LLM bill. `stagehand-export` collapses that loop into a deterministic Stagehand TypeScript script that uses the **exact selectors that worked** in the last passing run, replaying them via Stagehand's cached-`Action` path (no LLM call per step).

## When to use

- A `/autobrowse` task has graduated (`**Status:** completed` + `success: true` JSON on a recent run).
- You want to schedule the task, deploy it as a Browserbase Function, or invoke it from non-Claude code.
- You want to ship the automation without paying per-step inference costs.

Do **not** use this before `/autobrowse` converges — the export needs a passing trace to mine.

## Inputs

A workspace produced by `/autobrowse`:

```
<workspace>/ # default: ./autobrowse
├── tasks/<task>/
│ ├── task.md
│ └── strategy.md
└── traces/<task>/
├── run-001/, run-002/, …
└── latest/ # symlink to newest run
```

## How it works

1. **Pick the run to mine.** If `--run` is passed, use it. Otherwise walk runs newest-first and pick the most recent whose `summary.md` shows `**Status:** completed` and whose final JSON has `success: true`.
2. **Parse `task.md` → Zod schema.** The fenced JSON block under `## Output` is walked into a Zod schema. Strings, numbers, booleans, arrays, and nullable fields are inferred; mixed types fall back to `z.unknown()`.
3. **Walk `trace.json`, classify each successful `browse` command** (see `references/command-mapping.md`). Each command becomes one of:
- **Cached `Action`** — for `click`/`fill`/`select` with a stable XPath or CSS selector. Emitted as `await stagehand.act({ selector, method, arguments, description })`. Description is sourced from the assistant's reasoning on that turn.
- **`observe()` fallback** — for ARIA-ref clicks (`0-970`, `[0-58]`). Refs are session-scoped and can't be replayed; the script emits `const actions = await stagehand.observe('<intent>'); await stagehand.act(actions[0]);`.
- **Playwright primitive** — `page.goto`, `page.waitForLoadState`, `page.waitForTimeout`, `page.keyboard.press`/`type`, etc.
- **Dropped** — `browse snapshot`, `browse env`, `browse stop`, `browse get text body` (replaced by `Stagehand` init/close + `extract()`).
4. **Emit `extract()`** at the end with the inferred Zod schema. Stagehand pulls the final result JSON.
5. **Write outputs** into `<workspace>/tasks/<task>/stagehand/`:
- `<task>.ts` — the script
- `selectors.cache.json` — the cached Action descriptors as a human-readable index
- `package.json`, `tsconfig.json` — minimal scaffold
6. **Verify** — `npm install --silent` + `npx tsx <task>.ts`. Parse the script's stdout JSON; pass = `success: true`. Report and stop (no autoloop back into autobrowse).

## How to run

```bash
# Default — verify after generating
node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name>

# Custom workspace
node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name> --workspace ./autobrowse

# Force a specific run
node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name> --run run-022

# Skip the verification run (write files only)
node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name> --no-verify
```

The script prints a JSON report to stdout and diagnostics to stderr. Exit codes: `0` = generated and verified, `2` = verification failed (or `--no-verify` and just generated), `1` = generator/install error.

## Output report

```json
{
"task": "sf-311-request",
"run": "run-022",
"script": "./autobrowse/tasks/sf-311-request/stagehand/sf-311-request.ts",
"cache": "./autobrowse/tasks/sf-311-request/stagehand/selectors.cache.json",
"cached_actions": 7,
"observe_fallbacks": 3,
"schema_fields": 6,
"verified": true,
"passed": true,
"exit_code": 0,
"run_log": "./autobrowse/tasks/sf-311-request/stagehand/run.log",
"output": { "success": true, "confirmation_number": "101003821426", ... }
}
```

## Rules

- **Read-only on the autobrowse workspace** — never edit `task.md`, `strategy.md`, or trace files. Only write inside `tasks/<task>/stagehand/`.
- **Re-runnable.** Running `stagehand-export` twice overwrites `<task>.ts` and `selectors.cache.json` but leaves `package.json`/`tsconfig.json`/`node_modules` alone.
- **Fail loud on no passing runs.** Don't silently export from a failed run — the whole point is to bake in *what worked*.
- **Don't reinvent caching.** The generated script uses Stagehand's built-in `cacheDir`. `selectors.cache.json` is a side-car index for humans, not a parallel cache layer.
- **Auth is out of scope for v1.** If `task.md` mentions credentials/cookies/login, the generated script emits a `// TODO: wire up authed context` comment near the Stagehand constructor.

## Reference

- `references/command-mapping.md` — the full `browse` → Stagehand translation table the generator uses.
- `scripts/export.mjs` — the generator. Self-contained Node script, no dependencies beyond stdlib.
78 changes: 78 additions & 0 deletions skills/stagehand-export/references/command-mapping.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# `browse` CLI → Stagehand translation table

This is the table the generator (`scripts/export.mjs`) applies when walking `trace.json`. Each row is "what the autobrowse inner agent did with the `browse` CLI" → "what Stagehand call the generator emits."

## Why three different output shapes?

The `browse` CLI mixes three element-targeting strategies:

- **Stable selectors** (XPath like `//label[normalize-space(.)='Street']`, CSS like `#dform_widget_Request_description`). These survive across sessions and DOM rerenders. They map cleanly to Stagehand's `Action` descriptor and replay with no LLM call.
- **ARIA refs** (`[0-970]`, `0-58`). Valid only inside the snapshot they came from — re-snapshotting can reassign refs. They cannot be cached. Stagehand's `observe()` is the natural replacement: at runtime, ask Stagehand to find an element matching the natural-language intent we captured from the assistant's reasoning, then `act()` on the returned `Action`.
- **Pure natural language** (`browse type "hello"` with no selector context — assumes focus). These also have no stable target. Stagehand can either accept a plain string instruction to `act()`, or we can drop to Playwright primitives (`page.keyboard.type`).

## Mapping

| `browse` form | Stagehand emission | Why |
|---|---|---|
| `browse stop` / `browse env local` / `browse env remote` | (dropped — replaced by `new Stagehand({ env })` + `stagehand.init()` at top) | Stagehand owns the session. The env flag picks `LOCAL` vs `BROWSERBASE`. |
| `browse open <url>` / `browse newpage <url>` | `await page.goto(url)` | Direct Playwright primitive. |
| `browse wait load` | `await page.waitForLoadState('load')` | Direct primitive. |
| `browse wait timeout <ms>` | `await page.waitForTimeout(ms)` | Direct primitive. |
| `browse wait selector "<sel>"` | `await page.waitForSelector(sel)` | Direct primitive. |
| `browse snapshot` | (dropped) | Snapshots exist to surface ARIA refs to an LLM. Stagehand's `act`/`observe` see the DOM themselves. |
| `browse screenshot <path>` | (dropped) | Debug-only; not needed in a deterministic replay. |
| `browse get url/title` | (dropped) | If the strategy uses these to gate a step, the corresponding step is preserved; the read itself is dropped. |
| `browse get text <sel>` | `await page.locator(sel).innerText()` (manual port; v1 drops) | v1 leaves these out and relies on the final `extract()` to pull all output data. |
| `browse click "//xpath-or-#css"` | **Cached `Action`**: `await stagehand.act({ selector, method: "click", arguments: [], description })` | Stable selector → deterministic replay, no LLM. |
| `browse click <ref>` (e.g. `0-970`, `[0-58]`) | **observe() fallback**: `const a = await stagehand.observe(intent); await stagehand.act(a[0]);` | Refs are session-scoped; resolve at runtime via NL intent. |
| `browse fill "<sel>" "<value>" [--no-press-enter]` | Cached `Action` with `method: "fill"`, `arguments: [value]` | The `--no-press-enter` flag is stripped (Stagehand's fill doesn't press Enter). |
| `browse select "<sel>" "<value>"` | Cached `Action` with `method: "selectOptionFromDropdown"`, `arguments: [value]` | Stable selector. |
| `browse type "<text>"` (no selector — types into focused element) | `await page.keyboard.type(text)` | No targetable element; assume focus is correct from the prior step. |
| `browse press <key>` | `await page.keyboard.press(key)` | Direct primitive. |
| `browse scroll <x> <y> <dx> <dy>` | `await page.mouse.move(x,y); await page.mouse.wheel(dx,dy)` | Direct primitive. |
| `browse back` / `forward` / `reload` | `page.goBack()` / `page.goForward()` / `page.reload()` | Direct primitive. |
| (anything else) | `// TODO: unhandled browse verb <verb>` comment | Surfaces gaps; the user hand-ports. |

## Intent sourcing

Each `Action` descriptor needs a `description` for self-healing. The generator pulls intent from two sources, in priority order:

1. **The assistant's `reasoning` text on the same turn as the tool call.** From `trace.json`, the entry with `role: "assistant"` and a `reasoning` field that immediately precedes the `tool_input`. The first line/sentence becomes the description. This is the highest-quality source because it captures the agent's actual goal.
2. **The strategy.md section heading** for the turn range. Strategy headers carry markers like `### Page 2 — Location (turns 8–18)`. If reasoning is empty, the section heading is the fallback.

If both are missing, the description falls back to `click element (turn N)`.

## Selectors cache side-car (`selectors.cache.json`)

The generator writes a sibling file alongside the `.ts`:

```json
{
"task": "sf-311-request",
"generated_from": { "workspace": "...", "run": "run-022" },
"actions": [
{
"description": "Now I'll click the Street radio button.",
"selector": "//label[normalize-space(.)='Street']",
"method": "click",
"arguments": [],
"turn": 8,
"section": "Page 2 — Location (turns 8–18)"
},
...
]
}
```

This is **not** a parallel cache that the script reads at runtime — Stagehand's built-in `cacheDir` (configured to `./.stagehand-cache` in the constructor) handles runtime caching. The side-car exists so a human can scan, edit, or diff the selectors the export captured.

## Selector classification (heuristic)

The generator classifies the target string of `click`/`fill`/`select`:

- **Ref** — matches `^\[?\d+-\d+\]?$` (e.g., `0-970`, `[0-58]`).
- **XPath** — starts with `/`, `./`, or `//`.
- **CSS** — starts with `#`, `.`, `[`, or matches a tag-name pattern.
- **Unknown** — anything else; falls back to `observe()`.

Heuristics are intentionally permissive — a misclassification just means the generated line is the wrong shape; the user can edit it. The verification run catches issues that matter at runtime.
Loading