feat: add stagehand-export skill #105
Translate a graduated /autobrowse task into a deterministic Stagehand TypeScript script.

The autobrowse loop converges on a working workflow but every replay still pays per-step LLM inference. stagehand-export collapses that into a single .ts file by mining the last passing trace.json for the XPath/CSS selectors that worked, baking them in as cached Action descriptors, and falling back to observe() for ARIA-ref clicks. The Zod schema for extract() is auto-generated from task.md's Output block.

Verified end-to-end on the sdml-grants task: autobrowse iter 1 ($0.36, 7 turns, 75s) -> stagehand-export -> npx tsx replay returned the same 7 grants with success: true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Reviewed by Cursor Bugbot for commit a1ccf2d. Configure here.
    // The script prints OutputSchema JSON to stdout as the last block.
    const stdout = run.stdout ?? "";
    const lastBrace = stdout.lastIndexOf("{");
    if (lastBrace >= 0) parsedOutput = JSON.parse(stdout.slice(lastBrace));
Verification JSON parsing fails for nested output objects
High Severity
stdout.lastIndexOf("{") finds the opening brace of the last nested object in the printed JSON, not the root object. For any output with nested objects or arrays of objects (e.g., the 7-grants result from the PR description), stdout.slice(lastBrace) starts mid-structure and still carries the trailing ] and } that close the enclosing containers, so it is not valid JSON. JSON.parse throws, parsedOutput stays null, and verification incorrectly reports failure. Using stdout.indexOf("{") or JSON.parse(stdout.trim()) would target the root object, provided no earlier output contains a brace (for indexOf) or stdout holds nothing but the JSON (for trim).
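The failure described above can be reproduced in isolation. The following is a minimal sketch, not the code in export.mjs: the sample stdout, `parseFromLastBrace`, and `parseTrailingJson` are hypothetical names made up for illustration. The second helper shows one possible alternative, which tries each "{" from the left until the rest of the text parses, so it tolerates log lines printed before the JSON block.

```javascript
// Hypothetical stand-in for run.stdout: one log line, then pretty-printed JSON.
const sampleStdout = [
  "[info] starting replay",
  JSON.stringify({ success: true, grants: [{ name: "A" }, { name: "B" }] }, null, 2),
].join("\n");

// Mirrors the approach in the diff: slice at the LAST "{".
// With nested objects, that brace belongs to the last array element.
function parseFromLastBrace(text) {
  const lastBrace = text.lastIndexOf("{");
  if (lastBrace < 0) return null;
  try {
    return JSON.parse(text.slice(lastBrace));
  } catch {
    return null; // trailing "]" and "}" make the slice invalid JSON
  }
}

// Alternative sketch: scan "{" positions left to right and return the
// first slice that parses as a complete JSON document.
function parseTrailingJson(text) {
  for (let i = text.indexOf("{"); i >= 0; i = text.indexOf("{", i + 1)) {
    try {
      return JSON.parse(text.slice(i));
    } catch {
      // Not a valid start of the trailing JSON block; try the next brace.
    }
  }
  return null;
}

console.log(parseFromLastBrace(sampleStdout)); // null
console.log(parseTrailingJson(sampleStdout).grants.length); // 2
```

The scan is quadratic in the worst case, which should be acceptable for the small outputs a verification run prints; it still assumes nothing is printed after the JSON block.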
    const trace = JSON.parse(fs.readFileSync(tracePath, "utf-8"));
    const taskMd = fs.readFileSync(taskFile, "utf-8");
    const strategyMd = fs.readFileSync(strategyFile, "utf-8");
    const summaryMd = fs.readFileSync(summaryPath, "utf-8");
Unused summaryMd read crashes on missing file
Medium Severity
summaryMd is read via fs.readFileSync(summaryPath) but never referenced afterward—it's dead code. Worse, when --run is forced, the isPassing() check is skipped, so there's no guarantee summary.md exists. If the forced run directory lacks a summary.md, this line throws an uncaught ENOENT error and crashes the script before any generation happens.
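Since summaryMd is unused, the simplest fix is probably to delete the read. If the summary is meant to be consumed later, one option is to treat the file as optional. The sketch below assumes a hypothetical helper, `readOptionalText`, that takes an fs-like object as a parameter so it can be exercised without touching disk; it is not part of export.mjs.

```javascript
// Hypothetical helper: return the file's text, or null when the file is missing.
// Any error other than ENOENT (permissions, EISDIR, ...) is rethrown.
function readOptionalText(fsLike, filePath) {
  try {
    return fsLike.readFileSync(filePath, "utf-8");
  } catch (err) {
    if (err && err.code === "ENOENT") return null;
    throw err;
  }
}

// Stub standing in for node:fs when the forced run directory has no summary.md.
const missingFs = {
  readFileSync() {
    const err = new Error("no such file or directory");
    err.code = "ENOENT";
    throw err;
  },
};

console.log(readOptionalText(missingFs, "summary.md")); // null
```

In the generator this would be called with the real module, for example `readOptionalText(fs, summaryPath)`, so a forced `--run` against a directory without summary.md no longer aborts before generation.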
    const selector = args[0];
    const value = args.slice(1).join(" ");
    ops.push({ kind: "act", method: "selectOptionFromDropdown", selector, arguments: [value], ...base });
    break;
select case skips ref classification unlike click/fill
Medium Severity
The select case unconditionally pushes a kind: "act" op with the raw selector, without calling classifySelector first. Unlike the click and fill cases—which check for ARIA refs and route them to observe_act—a browse select [0-58] "value" command would emit a cached Action with an ephemeral ARIA ref as the selector, producing a generated script line that can't replay deterministically.
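A sketch of how the select case could route ARIA refs the same way click and fill are described as doing. Everything here is an assumption for illustration: the real classifySelector in export.mjs is not shown in the review, so the regex, the "aria-ref" and "stable" labels, the `selectOp` helper, and the shape of the observe_act op are all hypothetical.

```javascript
// Hypothetical re-implementation: treats "[0-58]"-style tokens as ephemeral ARIA refs.
function classifySelector(selector) {
  return /^\[\d+-\d+\]$/.test(selector.trim()) ? "aria-ref" : "stable";
}

// Builds the op for a `browse select <selector> <value...>` command.
function selectOp(args, base = {}) {
  const selector = args[0];
  const value = args.slice(1).join(" ");
  if (classifySelector(selector) === "aria-ref") {
    // ARIA refs do not survive a new session, so defer to observe() at replay time.
    // The instruction wording and field names are assumptions.
    return { kind: "observe_act", instruction: `select "${value}" from the dropdown`, ...base };
  }
  // Stable XPath/CSS selector: safe to bake in as a cached action.
  return { kind: "act", method: "selectOptionFromDropdown", selector, arguments: [value], ...base };
}

console.log(selectOp(["[0-58]", "Open", "grants"]).kind); // observe_act
console.log(selectOp(["//select[@id='status']", "Open"]).kind); // act
```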
    const STAGEHAND_ENV = usesBrowserbase ? "BROWSERBASE" : "LOCAL";

    // First goto URL
    const firstGoto = ops.find((o) => o.kind === "goto");
Several computed values are assigned but never used
Low Severity
firstGoto is computed on line 373 but never read or referenced anywhere in the script — it appears to be leftover from an earlier design (perhaps to emit the initial URL separately). Similarly, lastReasoning and lastTurn on lines 248–249 are declared and never read, having been superseded by the per-turn turnReasoning logic. These dead stores add confusion about whether functionality is missing.


Summary
- /stagehand-export: translates a graduated /autobrowse task into a deterministic Stagehand TypeScript script.
- Mines the last passing trace.json for XPath/CSS selectors → cached Action descriptors. Falls back to observe() for ARIA-ref clicks. Auto-generates a Zod schema for extract() from task.md's Output block.
- Verified end-to-end on the sdml-grants task: autobrowse iter 1 ($0.36, 7 turns, 75s) → stagehand-export → npx tsx replay returned the same 7 grants with success: true.

Why
/autobrowse produces a strategy.md another Claude session can replay step-by-step, but every replay still pays per-step inference. This collapses the loop into a deterministic script with a single LLM call (just extract()) that you can bb functions deploy, cron, or invoke from non-Claude code.

Files added
- skills/stagehand-export/SKILL.md: entry point + workflow doc
- skills/stagehand-export/scripts/export.mjs: generator (~620 lines, stdlib only, no deps)
- skills/stagehand-export/references/command-mapping.md: browse CLI → Stagehand translation table

Test plan
- node skills/stagehand-export/scripts/export.mjs --task <name> --workspace ./autobrowse --no-verify against a graduated autobrowse task; inspect the generated .ts
- Run without --no-verify to execute the script and confirm exit 0 + success: true JSON

🤖 Generated with Claude Code
Note
Medium Risk
Adds a new code generator that parses trace/markdown inputs and runs npm install + executes generated scripts; failures or heuristic mis-parsing could produce incorrect automation or unexpected local side effects during verification.

Overview
Adds a new stagehand-export skill that exports a graduated /autobrowse task into a deterministic Stagehand TypeScript script, mining the most recent passing trace.json for stable XPath/CSS selectors and emitting cached stagehand.act(...) calls with observe() fallbacks for ARIA refs.

The generator (scripts/export.mjs) also infers a Zod OutputSchema from task.md’s ## Output JSON block, scaffolds package.json/tsconfig.json, writes a selectors.cache.json sidecar, and can optionally verify by running npm install and executing the generated script, reporting pass/fail and logs.
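To illustrate the schema inference mentioned in the overview, here is a rough sketch of how a sample JSON value might be mapped to Zod schema source text. It is not the generator's actual logic: `zodSourceFor` and the sample object are hypothetical, and the sketch assumes arrays are homogeneous, so the first element stands in for all of them. Only the Zod builders z.string(), z.number(), z.boolean(), z.null(), z.unknown(), z.array(), and z.object() are referenced, and only as emitted text.

```javascript
// Hypothetical helper: map a sample JSON value to Zod schema source text.
function zodSourceFor(value) {
  if (value === null) return "z.null()";
  if (Array.isArray(value)) {
    // Assumption: arrays are homogeneous; an empty array gives no type information.
    return value.length > 0 ? `z.array(${zodSourceFor(value[0])})` : "z.array(z.unknown())";
  }
  switch (typeof value) {
    case "string":
      return "z.string()";
    case "number":
      return "z.number()";
    case "boolean":
      return "z.boolean()";
    case "object": {
      const fields = Object.entries(value)
        .map(([key, v]) => `${JSON.stringify(key)}: ${zodSourceFor(v)}`)
        .join(", ");
      return `z.object({ ${fields} })`;
    }
    default:
      return "z.unknown()";
  }
}

// Hypothetical example of what a task.md Output block might contain.
const sample = { success: true, grants: [{ name: "Example grant", amount: 1000 }] };
console.log(`const OutputSchema = ${zodSourceFor(sample)};`);
```

A sample-driven schema like this marks every field as required and cannot tell optional fields from always-present ones, so the generated OutputSchema is worth reviewing by hand before relying on it.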