Merged
24 changes: 24 additions & 0 deletions skills/dotagents-qa/Dockerfile
@@ -0,0 +1,24 @@
FROM node:24-bookworm

ENV CI=1
ENV PNPM_HOME=/pnpm
ENV COREPACK_HOME=/corepack
ENV PATH=/pnpm:$PATH

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    ca-certificates \
    git \
    jq \
    less \
    openssh-client \
    ripgrep \
  && rm -rf /var/lib/apt/lists/*

RUN mkdir -p "$PNPM_HOME" "$COREPACK_HOME" \
  && corepack enable \
  && corepack prepare pnpm@10.28.2 --activate

WORKDIR /sandbox

CMD ["bash"]
225 changes: 162 additions & 63 deletions skills/dotagents-qa/SKILL.md
@@ -1,38 +1,121 @@
---
name: dotagents-qa
description: QA dotagents behavior changes with a local fixture project. Use when changes may affect dotagents install, sync, list, doctor, skill placement, agent symlinks, MCP or hook config generation, user scope, or package/runtime behavior and an agent needs to prove a representative agents.toml setup still installs correctly.
description: QA dotagents behavior changes in a Docker sandbox. Use when changes may affect dotagents install, sync, list, doctor, skill placement, agent symlinks, MCP or hook config generation, user scope, or package/runtime behavior.
---

# dotagents QA

Answer the practical question: "With this local dotagents build, does a representative `agents.toml` still install and wire the expected files?"
Do real QA for the change in front of you. Docker is the safety boundary, not
the test plan: use it so dotagents cannot write to host agent config, host home
directories, or host cache state while you build fixtures that prove the
changed behavior.

## Workflow
## 1. Understand The Change

1. Inspect the changed surface:
- `git status --short`
- `git diff --stat`
- `git diff -- <paths>`
2. Build the local CLI before smoke testing:
Start from the diff and identify the behavior that could regress:

```bash
pnpm build
git status --short
git diff --stat
git diff -- <paths>
```

3. Create a temp project that exercises the changed behavior. Start here and add only what the change needs.
Write down the QA target before running commands:
- Which command path changed: `install`, `sync`, `list`, `doctor`, `add`,
`remove`, `mcp`, `trust`, `init`, package runtime, or scope resolution.
- Which surfaces must be inspected: `.agents/skills`, agent skill symlinks,
MCP config, hook config, lockfile, gitignore, CLI output, or user scope.
- Which fixture shape proves it: local skills, nested skills, wildcard source,
specific agents, MCP entries, hooks, existing broken state, or remote source.

Run focused Vitest coverage for logic bugs. Use this skill for end-to-end QA
evidence, not as a substitute for regression tests.

## 2. Enter A Docker Sandbox

Build the repo-local QA image when it is missing, when this Dockerfile changes,
or when the repo `packageManager` pnpm version changes:

```bash
docker build \
  -f skills/dotagents-qa/Dockerfile \
  -t dotagents-qa:local \
  skills/dotagents-qa
```
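
One way to notice pnpm drift is to compare the version pinned in the Dockerfile with the root `packageManager` field. This is a minimal sketch, assuming `package.json` uses the `pnpm@<version>` form and the Dockerfile keeps its `corepack prepare pnpm@<version>` line. `pinned_pnpm` and `pnpm_pins_match` are hypothetical helper names, not part of dotagents.

```shell
# Hypothetical helper: print the first pnpm@<major.minor.patch> pin in a file.
pinned_pnpm() {
  grep -oE 'pnpm@[0-9]+\.[0-9]+\.[0-9]+' "$1" | head -n 1
}

# Hypothetical helper: succeed only when both files pin the same pnpm version.
pnpm_pins_match() {
  local docker_pin package_pin
  docker_pin="$(pinned_pnpm "$1")"
  package_pin="$(pinned_pnpm "$2")"
  [ -n "$docker_pin" ] && [ "$docker_pin" = "$package_pin" ]
}

# Example (run from the repo root on the host):
#   pnpm_pins_match skills/dotagents-qa/Dockerfile package.json \
#     || echo "pnpm pin changed: rebuild dotagents-qa:local" >&2
```

A mismatch only signals that the image should be rebuilt after updating the Dockerfile; it does not say which side is correct.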

Use an interactive container so the QA steps stay change-specific:

```bash
set -euo pipefail
REPO="$(pwd)"
TMP="$(mktemp -d)"
mkdir -p "$TMP/local-skills" "$TMP/home" "$TMP/state"
for skill in review commit; do
  mkdir -p "$TMP/local-skills/$skill"
  printf -- "---\nname: %s\ndescription: Fixture %s skill.\n---\n\n%s fixture.\n" \
    "$skill" "$skill" "$skill" > "$TMP/local-skills/$skill/SKILL.md"
done

cat > "$TMP/agents.toml" <<'EOF'
OUT="$(mktemp -d "${TMPDIR:-/tmp}/dotagents-qa.XXXXXX")"
docker run --rm -it \
  -v "$REPO:/host-repo:ro" \
  -v "$OUT:/qa-out" \
  dotagents-qa:local
```

If your tool environment is not attached to a TTY, use `-i` instead of `-it`
and feed the same commands with a here-doc. Keep `-i`; without stdin attached,
the container shell will receive no script.
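
The non-TTY form can be sketched as a small wrapper that pipes a script into the container. This assumes the `dotagents-qa:local` image is already built and that the script is read from stdin via `bash -s`. `run_qa_script` is a hypothetical helper name.

```shell
# Hypothetical helper: run a QA script from stdin inside the sandbox image.
# $1 = host checkout (mounted read-only), $2 = host directory for evidence.
run_qa_script() {
  local repo="$1" out="$2"
  docker run --rm -i \
    -v "$repo:/host-repo:ro" \
    -v "$out:/qa-out" \
    dotagents-qa:local bash -s
}

# Example (on the host, no TTY required):
#   run_qa_script "$(pwd)" "$OUT" <<'EOF'
#   set -euo pipefail
#   ls /host-repo > /qa-out/repo-listing.out
#   EOF
```

Quoting the here-doc delimiter (`'EOF'`) keeps the host shell from expanding variables that are meant to be evaluated inside the container.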

Inside the container:

```bash
set -euo pipefail
export CI=1
export HOME=/sandbox/home
export DOTAGENTS_STATE_DIR=/sandbox/state
export DOTAGENTS_HOME=/sandbox/user-agents

mkdir -p "$HOME" "$DOTAGENTS_STATE_DIR" "$DOTAGENTS_HOME" /sandbox/repo
tar -C /host-repo \
  --exclude=.git \
  --exclude=node_modules \
  --exclude=.turbo \
  --exclude=coverage \
  --exclude=core \
  --exclude='*.tsbuildinfo' \
  --exclude='packages/*/dist' \
  -cf - . | tar -C /sandbox/repo -xf -

cd /sandbox/repo
pnpm install --frozen-lockfile
pnpm build
```
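
To confirm the copy really dropped host state, a check like the following can run right after the `tar` step and before `pnpm install` recreates `node_modules`. It is a sketch that covers only some of the excluded names. `leaked_host_artifacts` is a hypothetical helper name.

```shell
# Hypothetical helper: print host artifacts that should not be in the copy.
# Matches are pruned, so each leaked directory is reported once.
leaked_host_artifacts() {
  find "$1" \( -name .git -o -name node_modules -o -name .turbo \
    -o -name coverage -o -name '*.tsbuildinfo' \) -prune -print
}

# Example (inside the container, before `pnpm install`):
#   leaked="$(leaked_host_artifacts /sandbox/repo)"
#   [ -z "$leaked" ] || { echo "host artifacts leaked:" >&2; echo "$leaked" >&2; exit 1; }
```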

Run `pnpm check` inside Docker unless the change requires a narrower
target or the check is already known to be unrelated. If `build` or `check`
fails, treat that as a QA finding and stop before fixture work unless you are
explicitly isolating the playbook mechanics. If you skip or bypass either
command, report why.

## 3. Build The Fixture

Create the smallest fixture that proves the changed behavior. Start from this
shape only when it fits; edit it aggressively for the diff you are testing.

```bash
fixture=/sandbox/fixture
mkdir -p "$fixture/local-skills/review" "$fixture/local-skills/commit"

cat > "$fixture/local-skills/review/SKILL.md" <<'SKILL'
---
name: review
description: Fixture review skill.
---

Review fixture.
SKILL

cat > "$fixture/local-skills/commit/SKILL.md" <<'SKILL'
---
name: commit
description: Fixture commit skill.
---

Commit fixture.
SKILL

cat > "$fixture/agents.toml" <<'TOML'
version = 1
agents = ["claude", "cursor"]

@@ -52,78 +135,94 @@ args = ["-e", "process.exit(0)"]
[[hooks]]
event = "Stop"
command = "echo fixture"
EOF
TOML
```

4. Run the local CLI from inside the temp project, with home/cache isolated:
Useful fixture changes:
- Skill resolution: nested `skills/` layouts, wildcard sources, duplicate
names, local paths outside `.agents`, or the exact `path:` shape touched.
- Agent placement: include only affected agents, or include all supported
agents when shared config or registry behavior changed.
- MCP and hooks: use the exact command, URL, headers, env refs, hook event, or
matcher affected by the diff.
- Sync and doctor: pre-create broken or legacy state, then prove repair and
diagnostics.
- User scope: run from outside a project with `--user` and inspect
`$DOTAGENTS_HOME`; never use the host home directory.
- Remote sources: use the real source and trust policy only when source,
trust, git, well-known, or network behavior changed.
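
When a fixture needs several skills or a nested layout, a small generator keeps the shape easy to edit. This sketch reuses the front-matter format from the fixture files. `make_fixture_skill` is a hypothetical helper name and the nested path in the example is illustrative only.

```shell
# Hypothetical helper: write a minimal SKILL.md for one fixture skill.
# $1 = parent directory, $2 = skill name.
make_fixture_skill() {
  local dir="$1" name="$2"
  mkdir -p "$dir/$name"
  printf -- '---\nname: %s\ndescription: Fixture %s skill.\n---\n\n%s fixture.\n' \
    "$name" "$name" "$name" > "$dir/$name/SKILL.md"
}

# Example: a nested skills/ layout (adjust to the layout the diff touches).
#   make_fixture_skill "$fixture/local-skills/skills" review
#   make_fixture_skill "$fixture/local-skills/skills" commit
```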

## 4. Exercise The CLI

Run the built local CLI from the fixture. Capture output and inspect generated
files, not just exit codes.

```bash
set -euo pipefail
export HOME="$TMP/home" DOTAGENTS_STATE_DIR="$TMP/state"
cd "$TMP"
node "$REPO/packages/dotagents/dist/cli/index.js" install
node "$REPO/packages/dotagents/dist/cli/index.js" list > "$TMP/list.out"
node "$REPO/packages/dotagents/dist/cli/index.js" doctor --fix
node "$REPO/packages/dotagents/dist/cli/index.js" doctor
cli=(node /sandbox/repo/packages/dotagents/dist/cli/index.js)
cd "$fixture"

"${cli[@]}" install | tee /qa-out/install.out
"${cli[@]}" list | tee /qa-out/list.out
"${cli[@]}" doctor --fix | tee /qa-out/doctor-fix.out
"${cli[@]}" doctor | tee /qa-out/doctor.out
```

5. Assert the behavior that matters:
Assert what matters for this change. Examples:

```bash
set -euo pipefail
grep -q "review" "$TMP/list.out" && grep -q "commit" "$TMP/list.out"
test -f .agents/skills/review/SKILL.md && test -f .agents/skills/commit/SKILL.md
grep -q "review" /qa-out/list.out
test -f .agents/skills/review/SKILL.md
test -L .claude/skills
test -f .mcp.json
test -f .cursor/mcp.json
test -f .claude/settings.json
test -f .cursor/hooks.json
```
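
`test -L` also succeeds for a dangling symlink, so it can help to check that a skill is actually reachable through the link. A minimal sketch follows. `assert_skill_visible` is a hypothetical helper name, and it makes no assumption about where the link points.

```shell
# Hypothetical helper: assert a path is a symlink and that a skill's SKILL.md
# can be read through it.
assert_skill_visible() {
  local link="$1" skill="$2"
  [ -L "$link" ] || { echo "not a symlink: $link" >&2; return 1; }
  [ -f "$link/$skill/SKILL.md" ] || { echo "missing $skill via $link" >&2; return 1; }
}

# Example (from the fixture directory):
#   assert_skill_visible .claude/skills review
```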

For `sync` changes, break generated state and assert repair:
For `sync` changes, break the generated state in the way the diff claims to
repair, then verify the repair:

```bash
set -euo pipefail
rm .mcp.json .claude/skills
node "$REPO/packages/dotagents/dist/cli/index.js" sync
"${cli[@]}" sync | tee /qa-out/sync.out
test -f .mcp.json
test -L .claude/skills
```
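
A before/after listing makes a repair easier to inspect than individual `test` calls. The sketch below records files and symlinks without following links. `snapshot_tree` is a hypothetical helper name, and a difference is something to inspect rather than an automatic failure, since `sync` may legitimately add files.

```shell
# Hypothetical helper: print a sorted list of files and symlinks under a directory.
snapshot_tree() {
  (cd "$1" && find . \( -type f -o -type l \) | sort)
}

# Example (from the fixture directory, with cli defined as above):
#   snapshot_tree . > /qa-out/tree.before
#   rm .mcp.json .claude/skills
#   "${cli[@]}" sync | tee /qa-out/sync.out
#   snapshot_tree . > /qa-out/tree.after
#   diff -u /qa-out/tree.before /qa-out/tree.after
```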

## Adjust The Fixture

- Skill resolution changes: use multiple local skills, nested `skills/` directories, or the exact `path:` layout touched by the change.
- Agent placement changes: edit `agents = [...]` and assert expected symlinks/config files.
- MCP or hook changes: include representative `[[mcp]]` or `[[hooks]]` entries and inspect generated JSON/TOML.
- User-scope changes: set `DOTAGENTS_HOME="$TMP/user-home"` and pass `--user`; never write to the real home directory.
- Package/runtime changes: run the built CLI as above, or pack/install the package only when the packaging path itself changed.
- Remote source changes: use `getsentry/skills`; avoid remotes for ordinary install-location checks.

## Optional Agent CLI Checks

Use only when discovery/registration changed.
For user-scope changes:

```bash
set -euo pipefail
mkdir -p "$TMP/codex-home"
CODEX_HOME="$TMP/codex-home" codex debug prompt-input "probe skills" > "$TMP/codex-prompt.json"
grep -q "review" "$TMP/codex-prompt.json"
cd /sandbox
"${cli[@]}" --user install | tee /qa-out/user-install.out
test -f "$DOTAGENTS_HOME/agents.toml"
test -d "$DOTAGENTS_HOME/skills"
```
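
Before any `--user` command, a coarse guard can confirm that the relevant variables still point inside the sandbox. This is a sketch, assuming the `/sandbox` prefix used above. `require_sandboxed` is a hypothetical helper name, and it does not resolve `..` segments or symlinks.

```shell
# Hypothetical guard: fail unless every named variable holds a path under /sandbox.
require_sandboxed() {
  local name value
  for name in "$@"; do
    eval "value=\${$name:-}"   # read the variable whose name is in $name
    case "$value" in
      /sandbox/*) ;;
      *) echo "$name is not under /sandbox: '$value'" >&2; return 1 ;;
    esac
  done
}

# Example (inside the container):
#   require_sandboxed HOME DOTAGENTS_STATE_DIR DOTAGENTS_HOME
```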

Claude: no cheap dry-run skill list. If auth/network/model cost is acceptable, run a minimal non-interactive `/skill-name` prompt from `$TMP`; otherwise report skipped.
Copy useful evidence before leaving the container:

## Optional Remote Check
```bash
cp -a "$fixture" /qa-out/fixture
cp -a "$DOTAGENTS_HOME" /qa-out/user-agents 2>/dev/null || true
```

Use only when source resolution, trust, git, well-known, or network behavior changed.
## 5. Real Agent Clients

```toml
[[skills]]
name = "code-review"
source = "getsentry/skills"
```
Use real clients only when discovery or registration in Claude, Cursor, Codex,
VS Code, or OpenCode changed. Docker proves generated files and symlinks; it
does not prove an installed host client notices them.

Run `pnpm check` after the smoke test unless there is a concrete reason to run only a targeted package check.
Keep host-client checks isolated with explicit temp homes/config dirs where
the client supports it. If a client cannot run without reading host state, say
so and report the Docker-generated files you inspected instead.

## Report
## 6. Report Evidence

Include the temp path if it remains for debugging, the exact fixture shape, commands run, assertions made, and any skipped checks.
Report:
- the changed behavior you targeted
- Docker image and setup used
- fixture shape and why it matched the diff
- commands run
- generated files or command output inspected
- assertions that passed
- `/qa-out` host path if retained for debugging
- skipped checks and residual risk
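
The report items above can be captured as a skeleton file next to the other evidence. This is only a sketch: `write_report_skeleton` is a hypothetical helper name and the heading wording is illustrative.

```shell
# Hypothetical helper: write an empty report skeleton to the given path.
write_report_skeleton() {
  cat > "$1" <<'REPORT'
## Target behavior

## Docker image and setup

## Fixture shape

## Commands run

## Output and files inspected

## Assertions passed

## /qa-out host path

## Skipped checks and residual risk
REPORT
}

# Example (inside the container):
#   write_report_skeleton /qa-out/report.md
```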
17 changes: 13 additions & 4 deletions skills/dotagents-qa/SOURCES.md
@@ -9,17 +9,26 @@
| `specs/SPEC.md` | canonical `.agents/skills/`, agent symlink, MCP, hook, install, sync, list, and doctor behavior |
| `packages/dotagents/src/agents/definitions/claude.ts` | Claude project skills and generated config paths |
| `packages/dotagents/src/agents/definitions/cursor.ts` | Cursor shares `.claude/skills` and writes Cursor-specific MCP/hook config |
| `packages/dotagents/src/agents/definitions/vscode.ts` | VS Code reads `.agents/skills` natively and writes `.vscode/mcp.json` plus shared Claude-style hooks |
| `packages/dotagents/src/agents/definitions/codex.ts` | Codex reads `.agents/skills` natively and writes `.codex/config.toml` |
| `packages/dotagents/src/agents/definitions/opencode.ts` | OpenCode reads `.agents/skills` natively and writes `opencode.json` |
| `packages/dotagents/src/cli/cache.ts` | `DOTAGENTS_STATE_DIR` cache isolation |
| `packages/dotagents/src/cli/update-notifier.ts` | `HOME` isolation for update-check cache |
| `skills/dotagents/SKILL.md` | sibling skill layout |
| `/Users/dcramer/src/sentry-mcp/.agents/skills/mcp-qa/SKILL.md` | Numbered QA flow, primary path guidance, optional client checks, and pass criteria structure |
| local `codex debug prompt-input` probe | verifies Codex can expose `.agents/skills` metadata without a model call |

## Decisions

- Keep the runtime skill as an inline smoke-test guide rather than a broad QA matrix.
- Use local `path:` skills by default so the fixture tests dotagents wiring without network/trust noise.
- Build first and run `packages/dotagents/dist/cli/index.js` from inside the temp project.
- Include representative local skills, agents, MCP, and hooks because those cover the main generated outputs.
- Keep the runtime skill as a guided QA playbook rather than a broad QA matrix or fixed smoke harness.
- Provide a repo-local Dockerfile for the sandbox/toolchain only; keep fixture design and assertions manual.
- Keep the Dockerfile's prepared pnpm version aligned with the root `packageManager`.
- Mount the host checkout read-only and do dependency install/build inside Docker to avoid host system-file writes and host binary assumptions.
- Exclude `*.tsbuildinfo` with `dist` so TypeScript project references rebuild cleanly instead of reusing stale host incremental state.
- Document `-i` for non-TTY Docker runs because stdin must stay attached when an agent feeds commands through a here-doc.
- Make agents derive the fixture from the diff first, then use local `path:` skills, specific agents, MCP entries, hooks, broken state, user scope, or remote sources as needed.
- Run `pnpm check` and `pnpm build` inside Docker when appropriate, then run `packages/dotagents/dist/cli/index.js` from inside the fixture project.
- Require inspection of generated files and command output that demonstrate the changed behavior, not only exit codes.
- Expose the skill through `.agents/skills/dotagents-qa` so existing Claude/Cursor symlinks discover it.
- Keep agent CLI registration and remote-source checks optional because they add auth, network, or tool-version variance.

26 changes: 17 additions & 9 deletions skills/dotagents-qa/SPEC.md
@@ -2,32 +2,40 @@

## Intent

Provide a concise workflow for proving that local dotagents changes still install a representative `agents.toml` setup into the expected project locations.
Provide a concise, Docker-sandboxed playbook for performing change-specific dotagents QA without writing to host agent config or host home directories.

## Scope

In scope:
- building and running the local dotagents CLI
- creating disposable project fixtures with local skill sources
- designing disposable project fixtures inside Docker based on the changed behavior
- verifying `.agents/skills/`, agent skills symlinks, MCP configs, hook configs, `list`, `doctor`, and `sync`
- verifying `--user` behavior with `DOTAGENTS_HOME` inside Docker
- optionally checking agent CLI registration when discovery paths changed
- optionally using `getsentry/skills` for remote source behavior
- isolating home and cache state during smoke tests
- isolating home and cache state from the host

Out of scope:
- release publishing
- network-backed source testing for ordinary install-location changes
- replacing focused Vitest regression coverage for logic bugs
- treating a fixed smoke script as sufficient QA for behavior-specific changes

## Runtime Contract

- Start from the changed dotagents behavior and customize the fixture only as needed.
- Run the built CLI from inside the temp project so project scope resolves correctly.
- Isolate `HOME`, `DOTAGENTS_STATE_DIR`, and `DOTAGENTS_HOME` for user-scope checks.
- Start from the changed dotagents behavior and state the behavior being proved before running commands.
- Use the repo-local QA Dockerfile for a stable Node/pnpm/system-tool baseline.
- Mount the host checkout read-only and copy it into a throwaway container before installing dependencies or building.
- Exclude host build artifacts and TypeScript build-info files from the container copy.
- Support both interactive Docker sessions and stdin-attached non-TTY runs.
- Run the built CLI from inside the container fixture project so project scope resolves correctly.
- Inspect generated files and command output that demonstrate the changed behavior, not just exit codes.
- Keep `HOME`, `DOTAGENTS_STATE_DIR`, and `DOTAGENTS_HOME` inside Docker for user-scope checks.
- Report fixture shape, commands, assertions, skipped checks, and residual risk.

## Maintenance

- Keep `SKILL.md` focused on temp-project smoke testing.
- Update the fixture when dotagents changes config fields, generated file locations, or supported agents.
- Add scripts only if the same fixture becomes stable enough to automate.
- Keep `SKILL.md` focused on guided, change-specific Docker QA.
- Keep examples editable and illustrative; do not turn the skill into a fixed test harness.
- Keep the Dockerfile pnpm version aligned with the root `packageManager`.
- Update examples when dotagents changes config fields, generated file locations, or supported agents.