
Audit Agent Experience Skill #79

Open: jay-sahnan wants to merge 6 commits into main from audit-agent-experience

Contributor

@jay-sahnan jay-sahnan commented Apr 25, 2026

Spawns parallel Claude subagents against a target docs/SDK/SKILL.md from a one-sentence prompt, captures structured traces, and renders a graded HTML report scoring Setup Friction, Speed, Efficiency, Error Recovery, and Doc Quality. Includes narrative cross-agent review to surface convergent hallucinations and silent workarounds the JSON self-report misses.


Note

Low Risk
Primarily adds new skill documentation/templates and a static prospecting profile, with no executable code changes beyond what future skill runners may follow.

Overview
Introduces a new audit-agent-experience skill, including a detailed SKILL.md playbook for running parallel subagent onboarding audits (config prompts, credential-handling guidance, trace parsing, scoring rubric, and report generation flow).

Adds supporting assets for the skill: an HTML report template (assets/report-template.html) plus reference docs (references/*) defining prompt variants, subagent trace schema/brief, and scoring rules, and includes a new MIT LICENSE.txt. Also adds a Browserbase prospecting profile JSON under skills/event-prospecting/profiles/.

Reviewed by Cursor Bugbot for commit 8f64258. Bugbot is set up for automated code reviews on this repo.

jay-sahnan and others added 2 commits April 25, 2026 09:07
Spawns parallel Claude subagents against a target docs/SDK/SKILL.md from a
one-sentence prompt, captures structured traces, and renders a graded HTML
report scoring Setup Friction, Speed, Efficiency, Error Recovery, and Doc
Quality. Includes narrative cross-agent review to surface convergent
hallucinations and silent workarounds the JSON self-report misses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread skills/audit-agent-experience/assets/report-template.html
Comment thread skills/audit-agent-experience/references/prompt-variants.md
Comment thread skills/audit-agent-experience/SKILL.md Outdated
Comment thread skills/audit-agent-experience/SKILL.md Outdated
Comment thread skills/audit-agent-experience/SKILL.md Outdated
Comment thread skills/audit-agent-experience/assets/report-template.html
Comment thread skills/audit-agent-experience/assets/report-template.html
Comment thread skills/audit-agent-experience/assets/report-template.html Outdated
Comment thread skills/audit-agent-experience/references/prompt-variants.md Outdated
Comment thread skills/audit-agent-experience/SKILL.md Outdated
@shrey150 shrey150 self-requested a review May 5, 2026 16:18
@jay-sahnan
Contributor Author

@cursor review


@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Reviewed by Cursor Bugbot for commit 8f64258.

Comment thread skills/event-prospecting/profiles/browserbase.json Outdated
Comment thread skills/audit-agent-experience/assets/report-template.html Outdated
Comment thread skills/audit-agent-experience/SKILL.md Outdated
Contributor

@shrey150 shrey150 left a comment

Pre-approving, quick fixes to make

description: "Audit the developer experience of a product, SDK, docs site, or SKILL.md by dropping multiple Claude subagents at it with only a tiny task prompt and real tools (WebFetch, Bash, Write). Agents must discover the docs themselves, install deps, ask for credentials if needed, and attempt real execution. The skill captures each agent's trace — tool calls, retries, wall time, errors — and scores on Setup Friction, Speed, Efficiency, Error Recovery, and Doc Quality, then emits an HTML report with an A–F grade and concrete fixes. Use when the user asks to audit agent experience, test a skill, audit docs for agents, check if a SDK is agent-friendly, validate a SKILL.md, measure agent DX, or benchmark how painful onboarding is for an AI agent. Triggers: 'audit agent experience', 'test this skill', 'audit docs for agents', 'is my SDK agent-friendly', 'run a DX audit', 'agent experience test', 'test my docs', 'how do agents do with my product'."
license: MIT
metadata:
  author: jay
Contributor

Want to make this your GH username?


Write the value into per-agent workspace `.env` files using the same generic names (`API_KEY`, `PROJECT_ID`, `SECRET`) as the paste flow — see Step 2. The discovery layer is upstream of injection; downstream behavior (generic names, agent must read docs to map them) is unchanged.

**Orchestrator-retained credentials.** After writing per-agent `.env` files, the orchestrator keeps the **original product-specific names → values** (e.g. `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`) available to itself for downstream verification work in Steps 6 / 6.5 / 8 — for example, calling the product's API with `curl` to confirm that a session ID an agent reported actually resolves, or fetching session metadata to enrich the report. The orchestrator can read them with `printenv` (no need to store anywhere — the parent shell already has them since auto-discover sourced them from there).
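
For readers of this thread, the flow described in the two paragraphs above can be sketched roughly as follows. This is an illustrative sketch only, not the skill's implementation: the helper names `discover` and `write_agent_env`, the `GENERIC_NAMES` mapping, and the workspace layout are all hypothetical. Only the variable names `BROWSERBASE_API_KEY` and `API_KEY` come from the quoted text.

```python
from pathlib import Path

# Hypothetical mapping from a product-specific variable name to the generic
# name the agents see. Only the names themselves appear in the quoted text;
# the mapping structure is an assumption made for illustration.
GENERIC_NAMES = {"BROWSERBASE_API_KEY": "API_KEY"}


def discover(environ, mapping=GENERIC_NAMES):
    """Return the product-specific names that are set, with their values."""
    return {name: environ[name] for name in mapping if environ.get(name)}


def write_agent_env(workspace, found, mapping=GENERIC_NAMES):
    """Write a per-agent .env that exposes only the generic names."""
    workspace = Path(workspace)
    workspace.mkdir(parents=True, exist_ok=True)
    env_path = workspace / ".env"
    # Values are written unquoted; this sketch assumes they contain no
    # whitespace or shell metacharacters.
    lines = [f"{mapping[name]}={value}" for name, value in found.items()]
    env_path.write_text("\n".join(lines) + "\n")
    env_path.chmod(0o600)  # keep the file private to the current user
    return env_path
```

In this sketch the orchestrator would call `discover(os.environ)` once and then `write_agent_env(...)` per agent workspace. Nothing product-specific is written to disk, so the original names stay readable only from the parent environment, which matches the "no need to store anywhere" point in the quoted paragraph.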
Contributor

remove BROWSERBASE_PROJECT_ID

2 participants