
feat(bench): add npm run bench + bench:cold-start runners #1111

Merged
bradygaster merged 1 commit into bradygaster:dev from spboyer:feat/bench-runner
May 14, 2026


Contributor

@spboyer spboyer commented May 12, 2026

Summary

Foundation PR for the performance-improvement series: adds a thin CLI shim over the existing BenchmarkSuite (already shipped in @bradygaster/squad-sdk/runtime/benchmarks) plus a wall-clock CLI cold-start timer. Lets every subsequent perf PR include before/after numbers.

No production code changes. Two new npm scripts and two new files under scripts/.

What Changed

  • scripts/run-benchmarks.mjs — wraps BenchmarkSuite.runAll() + formatBenchmarkReport(). Flags: --iterations=N, --filter=NAME, --json, -h. Fails fast with a helpful message if packages/squad-sdk/dist/ is missing.
  • scripts/measure-cold-start.mjs — spawns the built CLI under node with NO_COLOR=1 and times squad --version / squad help over N runs. Reports avg/min/max/p95 + a Fails column. Exits non-zero if any run failed (this is critical so CI catches a broken CLI instead of reporting valid-looking timings for a non-functional binary).
  • package.json: new scripts bench and bench:cold-start.
  • test/bench-runner.test.ts (new, 7 cases) — argument parsing, --help, --json, --filter, missing dist behavior. Runtime tests use it.runIf(SDK_BUILT) so the suite skips cleanly on a fresh checkout where dist/ has not been built.
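
As a rough illustration of the flag handling listed for scripts/run-benchmarks.mjs, the following is a minimal sketch of parsing --iterations=N, --filter=NAME, --json, and -h. The function name parseArgs, the option object shape, and the error messages are hypothetical and are not taken from the PR; the actual script may parse and validate differently.

```javascript
// Hypothetical sketch: parseArgs and its option shape are illustrative,
// not the implementation shipped in scripts/run-benchmarks.mjs.
function parseArgs(argv) {
  const opts = { iterations: undefined, filter: undefined, json: false, help: false };
  for (const arg of argv) {
    if (arg === '-h' || arg === '--help') {
      opts.help = true;
    } else if (arg === '--json') {
      opts.json = true;
    } else if (arg.startsWith('--iterations=')) {
      const n = Number(arg.slice('--iterations='.length));
      // Reject empty, fractional, zero, and negative values up front.
      if (!Number.isInteger(n) || n <= 0) {
        throw new Error(`--iterations expects a positive integer, got "${arg}"`);
      }
      opts.iterations = n;
    } else if (arg.startsWith('--filter=')) {
      opts.filter = arg.slice('--filter='.length);
    } else {
      throw new Error(`Unknown argument: ${arg}`);
    }
  }
  return opts;
}

// Mirrors: npm run bench -- --filter=routing --iterations=3
console.log(parseArgs(['--filter=routing', '--iterations=3']));
```

Rejecting unknown arguments, rather than ignoring them, is one way to keep a mistyped flag from silently running the full suite with default settings.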

Testing

  • npm run bench -- --filter=routing --iterations=3 produces a clean formatted table.
  • npm run bench -- --json --iterations=2 emits parseable JSON with results / totalTime / timestamp.
  • npm run bench:cold-start -- --runs=3 produces timing output and exits 0 when the CLI is healthy.
  • 7/7 bench-runner smoke tests pass.
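
To make the cold-start report columns concrete, here is a minimal sketch of how avg/min/max/p95 and a failure count could be derived from a set of timed runs, and how a non-zero exit code could follow from any failure. The function name summarize, the sample shape, and the nearest-rank p95 method are assumptions for illustration; the PR does not state how scripts/measure-cold-start.mjs computes its percentile.

```javascript
// Hypothetical sketch: summarize() and the sample shape are illustrative only.
// Each sample is { ms: number, ok: boolean } for one timed CLI invocation.
function summarize(samples) {
  if (samples.length === 0) {
    throw new Error('at least one sample is required');
  }
  const times = samples.map((s) => s.ms).sort((a, b) => a - b);
  const fails = samples.filter((s) => !s.ok).length;
  const sum = times.reduce((acc, t) => acc + t, 0);
  // Nearest-rank p95 (an assumption; the real script may interpolate).
  const rank = Math.max(1, Math.ceil(0.95 * times.length));
  return {
    avg: sum / times.length,
    min: times[0],
    max: times[times.length - 1],
    p95: times[rank - 1],
    fails,
  };
}

const stats = summarize([
  { ms: 120, ok: true },
  { ms: 100, ok: true },
  { ms: 140, ok: false },
]);
// Any failed run maps to a non-zero exit code so CI does not trust the timings.
const exitCode = stats.fails > 0 ? 1 : 0;
console.log(stats, exitCode);
```

This sketch includes the timings of failed runs in the statistics and relies on the exit code to flag them; excluding failed runs from the averages would be an equally defensible choice.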

Notes

Part of a five-PR series. Land this first — every subsequent PR's description can then reference before/after numbers from npm run bench.

Companion PRs:

  • chore: dependency hygiene (zero-risk; deletes redundant playwright/esbuild and stale tarballs)
  • resolution cache + multi-squad config dedupe
  • parallel charter discovery with bounded concurrency
  • non-blocking scheduler script execution

@spboyer spboyer marked this pull request as ready for review May 13, 2026 13:49
Copilot AI review requested due to automatic review settings May 13, 2026 13:49
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces local benchmarking entrypoints plus multiple “repo health” automation scripts/workflows and several product/test updates (CLI/SDK/docs/templates), effectively broadening beyond the stated “bench foundation” scope.

Changes:

  • Add npm run bench and npm run bench:cold-start runners under scripts/ with smoke tests.
  • Add PR health automation (readiness, impact analysis, repo health checks, PR nudge) via new scripts and GitHub workflows.
  • Update CLI/SDK behavior and tests (e.g., new squad skill command, watch agent spawning utilities, remote URL parsing fixes), plus version bumps/changesets.
| File | Description |
| --- | --- |
| test/template-sync.test.ts | Re-sync templates before parity assertions; updates sync invocation |
| test/scripts/security-review.test.ts | Updates expected security-review categories |
| test/scripts/security-review-skills.test.ts | Removes skill-scanner tests |
| test/scripts/risk-scorer.test.ts | Adds unit tests for risk scoring |
| test/scripts/parse-diff.test.ts | Adds unit tests for diff parsing helpers |
| test/platform-adapter.test.ts | Adds regression cases for dotted repo names |
| test/migrate-directory.test.cjs | Adjusts .NET workflow expectations to upgrade |
| test/cross-package-exports.test.ts | Adds runtime export smoke test (CLI → SDK) |
| test/comms-teams-integration.test.ts | Updates expected warning message text |
| test/cli/upgrade.test.ts | Uses OS temp dir; removes warnIfSkillCustomized tests |
| test/cli/loop.test.ts | Updates error expectation to “Copilot CLI” |
| test/cli/init.test.ts | Uses OS temp dir for init tests |
| test/bench-runner.test.ts | Adds smoke tests for run-benchmarks runner |
| scripts/security-review.mjs | Adds PR diff security scanner script |
| scripts/run-benchmarks.mjs | Adds npm run bench CLI shim runner |
| scripts/repo-health-comment.mjs | Adds helper to upsert repo-health PR comments |
| scripts/pr-readiness.mjs | Adds PR readiness checks + comment upsert script |
| scripts/measure-cold-start.mjs | Adds cold-start latency measurement runner |
| scripts/impact-utils/risk-scorer.mjs | Adds risk tier scoring utility |
| scripts/impact-utils/report-generator.mjs | Adds markdown impact report generator |
| scripts/impact-utils/parse-diff.mjs | Adds diff parsing helpers |
| scripts/check-squad-leakage.mjs | Adds .squad/ leakage detector |
| scripts/check-bootstrap-deps.mjs | Adds bootstrap “node:* only” dependency gate |
| scripts/architectural-review.mjs | Adds architectural review scanner script |
| scripts/analyze-impact.mjs | Adds PR impact analysis generator (gh-driven) |
| packages/squad-sdk/src/platform/detect.ts | Widens repo-name regex to allow dots |
| packages/squad-sdk/package.json | Bumps SDK version to 0.9.4 |
| packages/squad-cli/src/cli/commands/watch/agent-spawn.ts | Adds shared agent spawn utilities + copilot detection |
| packages/squad-cli/src/cli/commands/skill.ts | Adds squad skill (APM integration) command |
| packages/squad-cli/src/cli-entry.ts | Wires new skill command entrypoint |
| packages/squad-cli/package.json | Bumps CLI version + pins SDK dependency |
| package.json | Bumps root version; adds bench scripts |
| package-lock.json | Updates lockfile versions/deps |
| index.cjs | Renames hasForce to shouldForce |
| docs/src/navigation.ts | Updates SDK API Reference nav slug |
| docs/src/content/docs/reference/sdk.md | Fixes API reference link target |
| docs/src/content/docs/features/state-backends.md | Adds state backends documentation page |
| CHANGELOG.md | Adds 0.9.4 release header |
| .squad/templates/agents/challenger.md | Adds Challenger agent template |
| .squad/skills/fact-checking/SKILL.md | Adds fact-checking skill |
| .squad-templates/workflow-wiring-guide.md | Adds workflow wiring guide template |
| .squad-templates/workflow-wiring-appendix-b-documenter.md | Adds documenter wiring appendix |
| .squad-templates/workflow-wiring-appendix-a-code-reviewer.md | Adds code reviewer wiring appendix |
| .squad-templates/squad.agent.md | Updates coordinator template instructions |
| .github/workflows/squad-scope-check.yml | Adds repo-health scope boundary check |
| .github/workflows/squad-repo-health.yml | Adds repo health workflow (bootstrap/leakage/arch/security) |
| .github/workflows/squad-pr-readiness.yml | Adds PR readiness comment workflow |
| .github/workflows/squad-pr-nudge.yml | Adds scheduled stale PR nudge workflow |
| .github/workflows/squad-npm-publish.yml | Tightens lockfile integrity check condition |
| .github/workflows/squad-impact.yml | Adds automated PR impact report comment |
| .github/workflows/squad-docs-links.yml | Adds manual docs link-check workflow |
| .github/copilot-instructions.md | Documents automated PR nudge workflow |
| .changeset/watch-rate-limit-detection.md | Removes changeset entry |
| .changeset/watch-p0-p1-fixes.md | Removes changeset entry |
| .changeset/teams-adapter-security.md | Removes changeset entry |
| .changeset/teams-adapter-fixes.md | Removes changeset entry |
| .changeset/start-tunnel-node-pty.md | Removes changeset entry |
| .changeset/skill-security-scanner.md | Removes changeset entry |
| .changeset/shell-injection-fixes.md | Removes changeset entry |
| .changeset/review-findings-fix.md | Removes changeset entry |
| .changeset/pid-tracker-cleanup.md | Removes changeset entry |
| .changeset/monorepo-subfolder-support.md | Removes changeset entry |
| .changeset/fix-copilot-message-flag.md | Removes changeset entry |
| .changeset/fix-cast-base-path.md | Removes changeset entry |
| .changeset/external-capability-loading.md | Removes changeset entry |
| .changeset/dynamic-state-discovery.md | Removes changeset entry |
| .changeset/fix-watch-windows-shared-fetch.md | Adds changeset for watch Windows/shared fetch work |
| .changeset/deprecate-tunnel-rc-repl.md | Adds changeset for deprecation warnings |
| .changeset/apm-integration.md | Adds changeset for APM/skill command feature |
| .changeset/audit-onboarding-skill-guards.md | Removes changeset entry |

Copilot's findings

  • Files reviewed: 71/72 changed files
  • Comments generated: 6

Comment thread test/template-sync.test.ts
Comment thread packages/squad-cli/src/cli/commands/watch/agent-spawn.ts Outdated
Comment thread scripts/pr-readiness.mjs Outdated
Comment thread .github/workflows/squad-pr-nudge.yml Outdated
Comment thread package.json
Comment thread .github/workflows/squad-repo-health.yml Outdated
Contributor Author

spboyer commented May 13, 2026

PR Checkup — feedback triaged

Status: 5 of 6 threads resolved (out-of-scope script/workflow bugs to be addressed in a focused repo-health PR + rebase). 1 thread left open: the meta scope concern on package.json:17.

Root cause: PR was branched from stale main (~36 unrelated commits in the diff). The intended bench-runner addition is the only in-scope content.

Recommended next step: land focused repo-health PR to dev for the genuine bugs, rebase this branch onto current dev, then re-request review.

@copilot-pull-request-reviewer — thanks for the careful review.

@spboyer spboyer force-pushed the feat/bench-runner branch from 55b699a to f73f0cb Compare May 13, 2026 14:49
Foundation PR for measuring perf changes consistently.

- scripts/run-benchmarks.mjs — thin CLI shim over the existing
  BenchmarkSuite (packages/squad-sdk/src/runtime/benchmarks.ts).
  Supports --iterations=N, --filter=NAME, --json, --help.
- scripts/measure-cold-start.mjs — wall-clock CLI cold-start timer
  (squad --version, squad help) over N runs with avg/min/max/p95.
  Uses NO_COLOR=1 to avoid ANSI noise in captured output.
- New npm scripts: bench, bench:cold-start.
- test/bench-runner.test.ts — smoke tests for the runner shim:
  argument parsing, --help, --json, --filter, missing dist/.
  Runtime tests use it.runIf(SDK_BUILT) so they skip cleanly on
  fresh checkouts where dist/ has not been built yet.

Both scripts fail fast with a helpful message if the SDK or CLI
dist/ output is missing.

No production code changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the feat/bench-runner branch from f73f0cb to d757437 Compare May 13, 2026 15:08
Contributor Author

spboyer commented May 13, 2026

✅ All previously-flagged review feedback has been addressed:

  • Rebased onto current dev so the diff is now scoped to just adding the npm run bench + bench:cold-start runners — earlier scope-drift comments no longer apply.
  • Resolved every unresolved review thread with an explanatory reply.
  • CI is green (test, Policy Gates, sdk-exports-validation, samples-build).

@copilot-pull-request-reviewer — ready for another look when you get a chance. Thanks!

@bradygaster bradygaster merged commit a383714 into bradygaster:dev May 14, 2026
6 checks passed


3 participants