feat: add stagehand-export skill #105

Open: aq17 wants to merge 1 commit into main from add-stagehand-export-skill



aq17 commented May 13, 2026

Summary

  • Adds /stagehand-export — translates a graduated /autobrowse task into a deterministic Stagehand TypeScript script.
  • Mines the last passing trace.json for XPath/CSS selectors → cached Action descriptors. Falls back to observe() for ARIA-ref clicks. Auto-generates a Zod schema for extract() from task.md's Output block.
  • Verified end-to-end on the sdml-grants task: autobrowse iter 1 ($0.36, 7 turns, 75s) → stagehand-export → npx tsx replay returned the same 7 grants with success: true.

Why

/autobrowse produces a strategy.md that another Claude session can replay step by step, but every replay still pays per-step inference. This PR collapses that loop into a deterministic script with a single LLM call (just extract()) that you can ship with bb functions deploy, schedule with cron, or invoke from non-Claude code.

Files added

  • skills/stagehand-export/SKILL.md — entry point + workflow doc
  • skills/stagehand-export/scripts/export.mjs — generator (~620 lines, stdlib only, no deps)
  • skills/stagehand-export/references/command-mapping.md: browse CLI → Stagehand translation table

Test plan

  • Run node skills/stagehand-export/scripts/export.mjs --task <name> --workspace ./autobrowse --no-verify against a graduated autobrowse task; inspect the generated .ts
  • Run without --no-verify to execute the script and confirm exit 0 + success: true JSON
  • Confirm cached Action count and observe() fallback count match the trace's command shape

🤖 Generated with Claude Code


Note

Medium Risk
Adds a new code generator that parses trace/markdown inputs and runs npm install + executes generated scripts; failures or heuristic mis-parsing could produce incorrect automation or unexpected local side effects during verification.

Overview
Adds a new stagehand-export skill that exports a graduated /autobrowse task into a deterministic Stagehand TypeScript script, mining the most recent passing trace.json for stable XPath/CSS selectors and emitting cached stagehand.act(...) calls with observe() fallbacks for ARIA refs.

The generator (scripts/export.mjs) also infers a Zod OutputSchema from task.md’s ## Output JSON block, scaffolds package.json/tsconfig.json, writes a selectors.cache.json sidecar, and can optionally verify by running npm install and executing the generated script, reporting pass/fail and logs.
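A minimal sketch of how a Zod schema could be inferred from the sample JSON in task.md's ## Output block. The helper name `zodSourceFor`, the sample fields `grants`, `title`, and `amount`, and the inference rules (the first array element stands in for every element) are assumptions for illustration; this is not the logic in export.mjs.

```javascript
// Hypothetical helper: turn a sample JSON value into Zod schema source text.
function zodSourceFor(value) {
  if (value === null) return "z.null()";
  if (Array.isArray(value)) {
    // Assumption: the first element is representative of the whole array.
    return value.length > 0
      ? `z.array(${zodSourceFor(value[0])})`
      : "z.array(z.unknown())";
  }
  switch (typeof value) {
    case "string":
      return "z.string()";
    case "number":
      return "z.number()";
    case "boolean":
      return "z.boolean()";
    case "object": {
      const fields = Object.entries(value)
        .map(([key, v]) => `${JSON.stringify(key)}: ${zodSourceFor(v)}`)
        .join(", ");
      return `z.object({ ${fields} })`;
    }
    default:
      return "z.unknown()";
  }
}

// Hypothetical sample, shaped like a task.md Output block.
const sample = { success: true, grants: [{ title: "Example", amount: 100 }] };
const schemaSource = zodSourceFor(sample);
console.log(`const OutputSchema = ${schemaSource};`);
```

Inferring from a single sample cannot tell optional fields from required ones, so a generated schema like this is best treated as a starting point to review by hand.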

Reviewed by Cursor Bugbot for commit a1ccf2d. Bugbot is set up for automated code reviews on this repo.

Translate a graduated /autobrowse task into a deterministic Stagehand
TypeScript script. The autobrowse loop converges on a working workflow
but every replay still pays per-step LLM inference. stagehand-export
collapses that into a single .ts file by mining the last passing
trace.json for the XPath/CSS selectors that worked, baking them in as
cached Action descriptors, and falling back to observe() for
ARIA-ref clicks. The Zod schema for extract() is auto-generated from
task.md's Output block.

Verified end-to-end on the sdml-grants task: autobrowse iter 1 ($0.36,
7 turns, 75s) -> stagehand-export -> npx tsx replay returned the same
7 grants with success: true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.

// The script prints OutputSchema JSON to stdout as the last block.
const stdout = run.stdout ?? "";
const lastBrace = stdout.lastIndexOf("{");
if (lastBrace >= 0) parsedOutput = JSON.parse(stdout.slice(lastBrace));

Verification JSON parsing fails for nested output objects

High Severity

stdout.lastIndexOf("{") finds the opening brace of the last nested object in pretty-printed JSON, not the root object. For any output with nested objects or arrays of objects (e.g., the 7-grants result from the PR description), JSON.parse(stdout.slice(lastBrace)) parses a substring starting mid-structure with trailing ] and } characters, which is invalid JSON. The parse throws, parsedOutput stays null, and verification incorrectly reports failure. Using stdout.indexOf("{") or JSON.parse(stdout.trim()) would correctly target the root object.
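The failure described above can be reproduced with a small sample, along with one possible alternative that scans forward from the first brace. The sample stdout and the helper names `parseFromLastBrace` and `parseTrailingJson` are hypothetical. The sketch assumes the JSON is the last thing the script prints.

```javascript
// Hypothetical stdout: one log line followed by pretty-printed JSON output.
const stdout = [
  "[log] run finished",
  JSON.stringify({ success: true, grants: [{ title: "A" }, { title: "B" }] }, null, 2),
].join("\n");

// The approach under review: slice from the LAST "{", which lands on a nested object.
function parseFromLastBrace(text) {
  const lastBrace = text.lastIndexOf("{");
  return JSON.parse(text.slice(lastBrace));
}

// One alternative: try each "{" from the start until a slice parses through to the end.
function parseTrailingJson(text) {
  for (let i = text.indexOf("{"); i >= 0; i = text.indexOf("{", i + 1)) {
    try {
      return JSON.parse(text.slice(i));
    } catch {
      // Not the root object (for example a "{" inside a log line); keep scanning.
    }
  }
  return null;
}

let lastBraceFailed = false;
try {
  parseFromLastBrace(stdout);
} catch {
  lastBraceFailed = true;
}
const parsed = parseTrailingJson(stdout);
console.log({ lastBraceFailed, grants: parsed.grants.length });
```

Scanning forward tolerates log lines that happen to contain a brace, which a bare `stdout.indexOf("{")` would not. It still assumes nothing is printed after the JSON.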

const trace = JSON.parse(fs.readFileSync(tracePath, "utf-8"));
const taskMd = fs.readFileSync(taskFile, "utf-8");
const strategyMd = fs.readFileSync(strategyFile, "utf-8");
const summaryMd = fs.readFileSync(summaryPath, "utf-8");

Unused summaryMd read crashes on missing file

Medium Severity

summaryMd is read via fs.readFileSync(summaryPath) but never referenced afterward—it's dead code. Worse, when --run is forced, the isPassing() check is skipped, so there's no guarantee summary.md exists. If the forced run directory lacks a summary.md, this line throws an uncaught ENOENT error and crashes the script before any generation happens.
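Since summaryMd is unused, the simplest fix is probably to delete the read. If the summary is kept for later use, a tolerant read is one option. The sketch below uses a hypothetical helper, `readOptional`, and a throwaway temp directory in place of a real run directory. It uses `require` so that it runs as a standalone script; in export.mjs the equivalent would be an `import`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: return the file contents, or null when the file is absent.
function readOptional(filePath) {
  try {
    return fs.readFileSync(filePath, "utf-8");
  } catch (err) {
    if (err.code === "ENOENT") return null; // a missing file is tolerated
    throw err; // anything else (permissions, etc.) is still a real error
  }
}

// Demo against a throwaway run directory that starts without a summary.md.
const runDir = fs.mkdtempSync(path.join(os.tmpdir(), "run-"));
const summaryPath = path.join(runDir, "summary.md");
const missing = readOptional(summaryPath);
fs.writeFileSync(summaryPath, "# Summary\n");
const present = readOptional(summaryPath);
fs.rmSync(runDir, { recursive: true, force: true });
console.log({ missing, present });
```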

const selector = args[0];
const value = args.slice(1).join(" ");
ops.push({ kind: "act", method: "selectOptionFromDropdown", selector, arguments: [value], ...base });
break;

select case skips ref classification unlike click/fill

Medium Severity

The select case unconditionally pushes a kind: "act" op with the raw selector, without calling classifySelector first. Unlike the click and fill cases—which check for ARIA refs and route them to observe_act—a browse select [0-58] "value" command would emit a cached Action with an ephemeral ARIA ref as the selector, producing a generated script line that can't replay deterministically.
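A sketch of what routing select through the same classification could look like. The real classifySelector signature and return values are not shown in this review, so the version here is a stand-in. It assumes ARIA refs look like `[0-58]`. The shape of the observe_act op and the helper name `selectOp` are also assumptions.

```javascript
// Stand-in classifier. Assumption: ARIA refs look like "[0-58]" or "0-58";
// anything else is treated as a durable XPath/CSS selector.
function classifySelector(selector) {
  return /^\[?\d+-\d+\]?$/.test(selector) ? "aria_ref" : "selector";
}

// Hypothetical version of the select case that mirrors the click/fill routing.
function selectOp(args, base = {}) {
  const selector = args[0];
  const value = args.slice(1).join(" ");
  if (classifySelector(selector) === "aria_ref") {
    // Ephemeral ref: defer to observe() at replay time instead of caching it.
    return { kind: "observe_act", method: "selectOptionFromDropdown", arguments: [value], ...base };
  }
  return { kind: "act", method: "selectOptionFromDropdown", selector, arguments: [value], ...base };
}

const cached = selectOp(["#state", "New", "York"]);
const fallback = selectOp(["[0-58]", "New York"]);
console.log(cached.kind, fallback.kind);
```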

const STAGEHAND_ENV = usesBrowserbase ? "BROWSERBASE" : "LOCAL";

// First goto URL
const firstGoto = ops.find((o) => o.kind === "goto");

Several computed values are assigned but never used

Low Severity

firstGoto is computed on line 373 but never read or referenced anywhere in the script — it appears to be leftover from an earlier design (perhaps to emit the initial URL separately). Similarly, lastReasoning and lastTurn on lines 248–249 are declared and never read, having been superseded by the per-turn turnReasoning logic. These dead stores add confusion about whether functionality is missing.
