feat(bench): add npm run bench + bench:cold-start runners #1111
Pull request overview
This PR introduces local benchmarking entrypoints plus multiple “repo health” automation scripts/workflows and several product/test updates (CLI/SDK/docs/templates), effectively broadening beyond the stated “bench foundation” scope.
Changes:
- Add `npm run bench` and `npm run bench:cold-start` runners under `scripts/` with smoke tests.
- Add PR health automation (readiness, impact analysis, repo health checks, PR nudge) via new scripts and GitHub workflows.
- Update CLI/SDK behavior and tests (e.g., new `squad skill` command, watch agent spawning utilities, remote URL parsing fixes), plus version bumps/changesets.
Show a summary per file
| File | Description |
|---|---|
| test/template-sync.test.ts | Re-sync templates before parity assertions; updates sync invocation |
| test/scripts/security-review.test.ts | Updates expected security-review categories |
| test/scripts/security-review-skills.test.ts | Removes skill-scanner tests |
| test/scripts/risk-scorer.test.ts | Adds unit tests for risk scoring |
| test/scripts/parse-diff.test.ts | Adds unit tests for diff parsing helpers |
| test/platform-adapter.test.ts | Adds regression cases for dotted repo names |
| test/migrate-directory.test.cjs | Adjusts .NET workflow expectations to upgrade |
| test/cross-package-exports.test.ts | Adds runtime export smoke test (CLI → SDK) |
| test/comms-teams-integration.test.ts | Updates expected warning message text |
| test/cli/upgrade.test.ts | Uses OS temp dir; removes warnIfSkillCustomized tests |
| test/cli/loop.test.ts | Updates error expectation to “Copilot CLI” |
| test/cli/init.test.ts | Uses OS temp dir for init tests |
| test/bench-runner.test.ts | Adds smoke tests for run-benchmarks runner |
| scripts/security-review.mjs | Adds PR diff security scanner script |
| scripts/run-benchmarks.mjs | Adds npm run bench CLI shim runner |
| scripts/repo-health-comment.mjs | Adds helper to upsert repo-health PR comments |
| scripts/pr-readiness.mjs | Adds PR readiness checks + comment upsert script |
| scripts/measure-cold-start.mjs | Adds cold-start latency measurement runner |
| scripts/impact-utils/risk-scorer.mjs | Adds risk tier scoring utility |
| scripts/impact-utils/report-generator.mjs | Adds markdown impact report generator |
| scripts/impact-utils/parse-diff.mjs | Adds diff parsing helpers |
| scripts/check-squad-leakage.mjs | Adds .squad/ leakage detector |
| scripts/check-bootstrap-deps.mjs | Adds bootstrap “node:* only” dependency gate |
| scripts/architectural-review.mjs | Adds architectural review scanner script |
| scripts/analyze-impact.mjs | Adds PR impact analysis generator (gh-driven) |
| packages/squad-sdk/src/platform/detect.ts | Widens repo-name regex to allow dots |
| packages/squad-sdk/package.json | Bumps SDK version to 0.9.4 |
| packages/squad-cli/src/cli/commands/watch/agent-spawn.ts | Adds shared agent spawn utilities + copilot detection |
| packages/squad-cli/src/cli/commands/skill.ts | Adds squad skill (APM integration) command |
| packages/squad-cli/src/cli-entry.ts | Wires new skill command entrypoint |
| packages/squad-cli/package.json | Bumps CLI version + pins SDK dependency |
| package.json | Bumps root version; adds bench scripts |
| package-lock.json | Updates lockfile versions/deps |
| index.cjs | Renames hasForce to shouldForce |
| docs/src/navigation.ts | Updates SDK API Reference nav slug |
| docs/src/content/docs/reference/sdk.md | Fixes API reference link target |
| docs/src/content/docs/features/state-backends.md | Adds state backends documentation page |
| CHANGELOG.md | Adds 0.9.4 release header |
| .squad/templates/agents/challenger.md | Adds Challenger agent template |
| .squad/skills/fact-checking/SKILL.md | Adds fact-checking skill |
| .squad-templates/workflow-wiring-guide.md | Adds workflow wiring guide template |
| .squad-templates/workflow-wiring-appendix-b-documenter.md | Adds documenter wiring appendix |
| .squad-templates/workflow-wiring-appendix-a-code-reviewer.md | Adds code reviewer wiring appendix |
| .squad-templates/squad.agent.md | Updates coordinator template instructions |
| .github/workflows/squad-scope-check.yml | Adds repo-health scope boundary check |
| .github/workflows/squad-repo-health.yml | Adds repo health workflow (bootstrap/leakage/arch/security) |
| .github/workflows/squad-pr-readiness.yml | Adds PR readiness comment workflow |
| .github/workflows/squad-pr-nudge.yml | Adds scheduled stale PR nudge workflow |
| .github/workflows/squad-npm-publish.yml | Tightens lockfile integrity check condition |
| .github/workflows/squad-impact.yml | Adds automated PR impact report comment |
| .github/workflows/squad-docs-links.yml | Adds manual docs link-check workflow |
| .github/copilot-instructions.md | Documents automated PR nudge workflow |
| .changeset/watch-rate-limit-detection.md | Removes changeset entry |
| .changeset/watch-p0-p1-fixes.md | Removes changeset entry |
| .changeset/teams-adapter-security.md | Removes changeset entry |
| .changeset/teams-adapter-fixes.md | Removes changeset entry |
| .changeset/start-tunnel-node-pty.md | Removes changeset entry |
| .changeset/skill-security-scanner.md | Removes changeset entry |
| .changeset/shell-injection-fixes.md | Removes changeset entry |
| .changeset/review-findings-fix.md | Removes changeset entry |
| .changeset/pid-tracker-cleanup.md | Removes changeset entry |
| .changeset/monorepo-subfolder-support.md | Removes changeset entry |
| .changeset/fix-copilot-message-flag.md | Removes changeset entry |
| .changeset/fix-cast-base-path.md | Removes changeset entry |
| .changeset/external-capability-loading.md | Removes changeset entry |
| .changeset/dynamic-state-discovery.md | Removes changeset entry |
| .changeset/fix-watch-windows-shared-fetch.md | Adds changeset for watch Windows/shared fetch work |
| .changeset/deprecate-tunnel-rc-repl.md | Adds changeset for deprecation warnings |
| .changeset/apm-integration.md | Adds changeset for APM/skill command feature |
| .changeset/audit-onboarding-skill-guards.md | Removes changeset entry |
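
The table mentions widening the repo-name regex in `packages/squad-sdk/src/platform/detect.ts` to allow dots, with regression cases for dotted repo names. The actual pattern is not shown in this overview, so the following is only a minimal sketch of the idea. `GITHUB_REMOTE` and `parseGitHubRemote` are hypothetical names, not the SDK's API.

```javascript
// Hypothetical pattern, NOT the one shipped in detect.ts.
// The repo segment accepts any non-slash, non-space characters (dots included)
// and strips one optional trailing ".git" plus an optional trailing slash.
const GITHUB_REMOTE = /github\.com[:/]([^/\s]+)\/([^/\s]+?)(?:\.git)?\/?$/;

// Returns { owner, repo } for SSH or HTTPS GitHub remotes, or null otherwise.
function parseGitHubRemote(url) {
  const match = GITHUB_REMOTE.exec(url.trim());
  if (!match) return null;
  return { owner: match[1], repo: match[2] };
}
```

A pattern that excludes dots from the repo segment would cut `my.repo` down to `my` or fail to match, which is the kind of regression the dotted-name test cases presumably guard against.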
Copilot's findings
- Files reviewed: 71/72 changed files
- Comments generated: 6
PR Checkup: feedback triaged
Status: 5 of 6 threads resolved (out-of-scope script/workflow bugs to be addressed in a focused repo-health PR + rebase). 1 thread left open: the meta scope concern on
Root cause: PR was branched from stale
Recommended next step: land focused repo-health PR to
@copilot-pull-request-reviewer: thanks for the careful review.
55b699a to f73f0cb
Foundation PR for measuring perf changes consistently.

- scripts/run-benchmarks.mjs: thin CLI shim over the existing BenchmarkSuite (packages/squad-sdk/src/runtime/benchmarks.ts). Supports --iterations=N, --filter=NAME, --json, --help.
- scripts/measure-cold-start.mjs: wall-clock CLI cold-start timer (squad --version, squad help) over N runs with avg/min/max/p95. Uses NO_COLOR=1 to avoid ANSI noise in captured output.
- New npm scripts: bench, bench:cold-start.
- test/bench-runner.test.ts: smoke tests for the runner shim: argument parsing, --help, --json, --filter, missing dist/. Runtime tests use it.runIf(SDK_BUILT) so they skip cleanly on fresh checkouts where dist/ has not been built yet.

Both scripts fail fast with a helpful message if the SDK or CLI dist/ output is missing. No production code changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
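
The flag handling listed above (`--iterations=N`, `--filter=NAME`, `--json`, `--help`) might be sketched roughly as follows. This is an illustration, not the contents of `scripts/run-benchmarks.mjs`: `parseBenchArgs` is a hypothetical name, and rejecting unknown flags and non-positive iteration counts is an assumption about reasonable behavior.

```javascript
// Hypothetical parser for the documented flags; the real script may differ.
function parseBenchArgs(argv) {
  const options = { iterations: undefined, filter: undefined, json: false, help: false };
  for (const arg of argv) {
    if (arg === '--json') {
      options.json = true;
    } else if (arg === '--help' || arg === '-h') {
      options.help = true;
    } else if (arg.startsWith('--iterations=')) {
      // Number() rejects trailing garbage such as "3abc" (yields NaN).
      const n = Number(arg.slice('--iterations='.length));
      if (!Number.isInteger(n) || n <= 0) {
        throw new Error(`Invalid --iterations value: ${arg}`);
      }
      options.iterations = n;
    } else if (arg.startsWith('--filter=')) {
      options.filter = arg.slice('--filter='.length);
    } else {
      // Assumption: unknown flags are an error rather than silently ignored.
      throw new Error(`Unknown argument: ${arg}`);
    }
  }
  return options;
}
```

In a script this would typically be called as `parseBenchArgs(process.argv.slice(2))`, which is why `npm run bench -- --json` needs the `--` separator to forward flags past npm.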
f73f0cb to d757437
✅ All previously-flagged review feedback has been addressed:
@copilot-pull-request-reviewer: ready for another look when you get a chance. Thanks!
Summary
Foundation PR for the performance-improvement series: adds a thin CLI shim over the existing `BenchmarkSuite` (already shipped in `@bradygaster/squad-sdk/runtime/benchmarks`) plus a wall-clock CLI cold-start timer. Lets every subsequent perf PR include before/after numbers.

No production code changes. Two new npm scripts and two new files under `scripts/`.

What Changed
- `scripts/run-benchmarks.mjs`: wraps `BenchmarkSuite.runAll()` + `formatBenchmarkReport()`. Flags: `--iterations=N`, `--filter=NAME`, `--json`, `-h`. Fails fast with a helpful message if `packages/squad-sdk/dist/` is missing.
- `scripts/measure-cold-start.mjs`: spawns the built CLI under `node` with `NO_COLOR=1` and times `squad --version` / `squad help` over N runs. Reports avg/min/max/p95 + a `Fails` column. Exits non-zero if any run failed (this is critical so CI catches a broken CLI instead of reporting valid-looking timings for a non-functional binary).
- `package.json`: new scripts `bench` and `bench:cold-start`.
- `test/bench-runner.test.ts` (new, 7 cases): argument parsing, `--help`, `--json`, `--filter`, missing dist behavior. Runtime tests use `it.runIf(SDK_BUILT)` so the suite skips cleanly on a fresh checkout where `dist/` has not been built.

Testing
- `npm run bench -- --filter=routing --iterations=3` produces a clean formatted table.
- `npm run bench -- --json --iterations=2` emits parseable JSON with `results / totalTime / timestamp`.
- `npm run bench:cold-start -- --runs=3` produces timing output and exits 0 when the CLI is healthy.

Notes
Part of a five-PR series. Land this first: every subsequent PR's description can then reference before/after numbers from `npm run bench`.

Companion PRs:
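
To make the cold-start reporting described under What Changed more concrete, here is a minimal sketch of how avg/min/max/p95 and a failure count could be derived from per-run samples, and how that count could drive the exit code. It is not taken from `scripts/measure-cold-start.mjs`. `summarizeRuns` and `exitCodeFor` are hypothetical names, the `{ ms, exitCode }` sample shape is assumed, and computing statistics over successful runs only, with a nearest-rank p95, is one possible choice among several.

```javascript
// Assumed sample shape: { ms: number, exitCode: number } per spawned CLI run.
function summarizeRuns(runs) {
  // Only successful runs contribute timings; failures are counted separately.
  const ok = runs
    .filter((r) => r.exitCode === 0)
    .map((r) => r.ms)
    .sort((a, b) => a - b);
  const fails = runs.length - ok.length;
  if (ok.length === 0) {
    return { avg: NaN, min: NaN, max: NaN, p95: NaN, fails };
  }
  const sum = ok.reduce((total, ms) => total + ms, 0);
  // Nearest-rank percentile: the ceil(0.95 * n)-th smallest value (1-based).
  const rank = Math.ceil(0.95 * ok.length);
  return {
    avg: sum / ok.length,
    min: ok[0],
    max: ok[ok.length - 1],
    p95: ok[rank - 1],
    fails,
  };
}

// Non-zero exit if any command had a failed run, so CI does not accept
// plausible-looking timings from a broken binary.
function exitCodeFor(summaries) {
  return summaries.some((s) => s.fails > 0) ? 1 : 0;
}
```

With only a handful of runs (for example `--runs=3`), a nearest-rank p95 is simply the slowest successful run, so the percentile becomes informative only at larger run counts.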