Skip to content

perf(agents): parallel charter discovery with bounded concurrency#1113

Merged
bradygaster merged 2 commits into
bradygaster:devfrom
spboyer:perf/charter-discovery-parallel
May 14, 2026
Merged

perf(agents): parallel charter discovery with bounded concurrency#1113
bradygaster merged 2 commits into
bradygaster:devfrom
spboyer:perf/charter-discovery-parallel

Conversation

@spboyer
Copy link
Copy Markdown
Contributor

@spboyer spboyer commented May 12, 2026

Summary

Three serial for...await loops in agent/charter discovery were linear in the number of agents — costly on GitHub-sourced teams (each iteration is an HTTP round-trip) and wasteful on local storage even though FS reads are fan-out safe up to ulimit.

For an 8-agent team this drops GitHubAgentSource.listAgents() latency from ~1.6-4 s of serial HTTP round-trips to ~0.4-1 s parallel.

What Changed

New utility

packages/squad-sdk/src/utils/map-with-limit.ts:

  • mapWithLimit(items, limit, fn) — bounded parallel map, preserves input order, fast-fails on first rejection.
  • mapWithLimitSettled(items, limit, fn) — same shape but returns per-item PromiseSettledResult so one bad input does not abort the batch.

Both are implemented with a worker-pool pattern and an indexed result array (not push-on-completion) so order is preserved regardless of completion sequence.

Call sites

packages/squad-sdk/src/config/agent-source.ts:

  • LocalAgentSource.listAgents():

    • SquadState branch uses mapWithLimitSettled(items, 8, ...) — the prior implementation already continued on per-agent errors, so settled semantics preserve behavior.
    • Storage fallback uses mapWithLimit(items, 8, ...) — fast-fail because the original code propagated isDirectory() / read() errors to the caller. Storage backend outages now still surface instead of producing a silently-partial list.
    • Storage fallback also parallelises the isDirectory probe so concurrency budget is spent on real work.
  • GitHubAgentSource.listAgents():

    • Uses mapWithLimit(dirs, 5, ...) — bounded at 5 to stay well clear of GitHub secondary rate limits while still removing serial HTTP latency.
    • Critical fix from initial review: uses mapWithLimit not mapWithLimitSettled, so a 403 / rate-limit / auth / network failure still rejects the call instead of being silently swallowed. The original for...await propagated these errors; the parallel version must too.

packages/squad-sdk/src/agents/index.ts:

  • CharterCompiler.compileAll() (both SquadState and storage fallback branches) uses mapWithLimitSettled(candidates, 8, ...). The prior implementation try/catched each charter compile and continued on failure, so per-item settled semantics are correct here.

Concurrency limits

Site Limit Rationale
LocalAgentSource (FS) 8 Headroom under typical macOS (~256) / Linux (~1024) ulimits
GitHubAgentSource (HTTP) 5 Comfortably below GitHub secondary rate-limit thresholds
CharterCompiler.compileAll 8 Same FS rationale

Testing

  • test/map-with-limit.test.ts (new, 12 cases):
    • Order preservation under reverse-order completion
    • Real concurrency cap proved via gated deferred() (3 in flight is observable before release)
    • mapWithLimit fast-fails; mapWithLimitSettled never throws
    • Empty input, limit > items.length, limit < 1 / NaN edge cases
  • test/agent-source-github.test.ts — 19 existing tests still pass unchanged + 2 new regression tests:
    • "propagates a thrown fetcher error instead of silently dropping the agent" (asserts 403 from a mocked fetcher rejects the call — would have caught the bug fixed during review)
    • "still skips agents whose charter.md returns null" (preserves the existing missing-charter behavior)
  • test/charter-compiler.test.ts — 38 existing tests pass unchanged.

Notes

Part of a five-PR perf series. Companion PRs:

  • chore: dependency hygiene
  • feat(bench): npm run bench + bench:cold-start runners
  • resolution cache + multi-squad config dedupe
  • non-blocking scheduler script execution

@spboyer spboyer marked this pull request as ready for review May 13, 2026 13:49
Copilot AI review requested due to automatic review settings May 13, 2026 13:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces bounded-concurrency utilities and applies them to agent/charter discovery to reduce latency (especially for GitHub-backed teams), while also bundling a large set of repo-health automation, workflows, CLI features, docs, template updates, and versioning/changeset churn.

Changes:

  • Add mapWithLimit / mapWithLimitSettled and use them to parallelize agent discovery and charter compilation with configured concurrency caps.
  • Add several GitHub Actions + scripts for repo health (bootstrap dependency gate, leakage detection, architectural/security review comments, PR readiness, impact analysis, PR nudges).
  • Add/adjust CLI behavior and docs/templates, plus bump versions to 0.9.4 across packages.
Show a summary per file
File Description
test/template-sync.test.ts Re-sync templates before byte-compare tests; adjust script invocation to --sync.
test/scripts/security-review.test.ts Updates expected categories (removes skill-scan categories).
test/scripts/security-review-skills.test.ts Removes the skill security scanner test suite.
test/scripts/risk-scorer.test.ts Adds unit tests for new PR risk scoring helper.
test/scripts/parse-diff.test.ts Adds unit tests for diff-name parsing and file-status enrichment helpers.
test/platform-adapter.test.ts Adds regression coverage for dot-containing repo names in remote parsing.
test/migrate-directory.test.cjs Updates .NET workflow expectations to occur on upgrade rather than init.
test/map-with-limit.test.ts Adds unit tests for bounded-concurrency map helpers.
test/cross-package-exports.test.ts Adds runtime smoke tests ensuring CLI-imported SDK exports exist and exports-map targets exist.
test/comms-teams-integration.test.ts Updates expected warning message text for token refresh failure fallback.
test/cli/upgrade.test.ts Moves test root to OS temp; removes warnIfSkillCustomized test block.
test/cli/loop.test.ts Updates expected error text from “gh CLI” to “Copilot CLI”.
test/cli/init.test.ts Moves test root to OS temp.
test/agent-source-github.test.ts Adds regression tests ensuring GitHub fetch errors propagate and missing charters are still skipped.
scripts/security-review.mjs Adds a new informational security review script for PR diffs (workflows/secrets/exec/etc).
scripts/repo-health-comment.mjs Adds comment upsert helper for repo-health workflow findings.
scripts/pr-readiness.mjs Adds PR readiness checks + comment upsert orchestration script.
scripts/impact-utils/risk-scorer.mjs Adds risk tier calculation helper for impact analysis.
scripts/impact-utils/report-generator.mjs Adds markdown report generator for impact analysis.
scripts/impact-utils/parse-diff.mjs Adds helpers to parse diff name output and enrich file statuses.
scripts/check-squad-leakage.mjs Adds detector script for accidental .squad/ file leakage in PRs.
scripts/check-bootstrap-deps.mjs Adds blocking gate for “protected bootstrap files” importing only node:*/built-ins.
scripts/architectural-review.mjs Adds architectural review script (bootstrap, exports, cross-imports, template sync signal, refactor size, deletions).
scripts/analyze-impact.mjs Adds PR impact analysis script powered by GH CLI + helpers above.
packages/squad-sdk/src/utils/map-with-limit.ts Adds bounded-concurrency map utilities used by agent/charter discovery.
packages/squad-sdk/src/platform/detect.ts Broadens repo-name regex to allow dots; adds inline rationale comments.
packages/squad-sdk/src/config/agent-source.ts Parallelizes local/GitHub agent discovery with concurrency limits and correct error semantics.
packages/squad-sdk/src/agents/index.ts Parallelizes CharterCompiler.compileAll() with bounded concurrency using settled semantics.
packages/squad-sdk/package.json Bumps SDK version to 0.9.4.
packages/squad-cli/src/cli/commands/watch/agent-spawn.ts Adds shared spawn utilities for watch capabilities and Copilot CLI resolution logic.
packages/squad-cli/src/cli/commands/skill.ts Adds new squad skill APM integration command (publish/install/list).
packages/squad-cli/src/cli-entry.ts Adds skill command routing via dynamic import.
packages/squad-cli/package.json Bumps CLI version to 0.9.4 and pins SDK dependency to 0.9.4.
package.json Bumps repo version to 0.9.4.
package-lock.json Updates workspace lock entries (currently showing 0.9.4-insider.1).
index.cjs Renames hasForce to shouldForce for clarity in import flow.
docs/src/navigation.ts Updates SDK API reference navigation slug.
docs/src/content/docs/reference/sdk.md Updates “API Reference” link target.
docs/src/content/docs/features/state-backends.md Adds new documentation page explaining state backends and usage.
CHANGELOG.md Adds a 0.9.4 header entry.
.squad/templates/agents/challenger.md Adds a new “Challenger” agent template.
.squad/skills/fact-checking/SKILL.md Adds a fact-checking skill definition and review format guidance.
.squad-templates/workflow-wiring-guide.md Adds workflow wiring guide template content.
.squad-templates/workflow-wiring-appendix-b-documenter.md Adds appendix template for wiring a documenter/librarian role.
.squad-templates/workflow-wiring-appendix-a-code-reviewer.md Adds appendix template for wiring a reviewer gate.
.squad-templates/squad.agent.md Updates coordinator template instructions (notably removes CURRENT_DATETIME propagation in prompts and updates MCP guidance/worktree logic).
.github/workflows/squad-scope-check.yml Adds workflow to enforce repo-health PR scope boundaries.
.github/workflows/squad-repo-health.yml Adds repo health workflow (bootstrap protection, leakage, architectural/security review comments).
.github/workflows/squad-pr-readiness.yml Adds PR readiness workflow that posts/upserts a checklist comment.
.github/workflows/squad-pr-nudge.yml Adds scheduled PR nudge workflow for stale PRs.
.github/workflows/squad-npm-publish.yml Tightens lockfile integrity check condition to only flag https:// resolved deps.
.github/workflows/squad-impact.yml Adds impact analysis workflow that posts/upserts an impact report comment.
.github/workflows/squad-docs-links.yml Adds manual weekly link check workflow that files an issue on broken links.
.github/copilot-instructions.md Documents the new PR nudge workflow behavior.
.changeset/watch-rate-limit-detection.md Removes changeset entry.
.changeset/watch-p0-p1-fixes.md Removes changeset entry.
.changeset/teams-adapter-security.md Removes changeset entry.
.changeset/teams-adapter-fixes.md Removes changeset entry.
.changeset/start-tunnel-node-pty.md Removes changeset entry.
.changeset/skill-security-scanner.md Removes changeset entry.
.changeset/shell-injection-fixes.md Removes changeset entry.
.changeset/review-findings-fix.md Removes changeset entry.
.changeset/pid-tracker-cleanup.md Removes changeset entry.
.changeset/monorepo-subfolder-support.md Removes changeset entry.
.changeset/fix-watch-windows-shared-fetch.md Adds changeset entry describing Windows/watch fixes + shared agent-spawn.
.changeset/fix-copilot-message-flag.md Removes changeset entry.
.changeset/fix-cast-base-path.md Removes changeset entry.
.changeset/external-capability-loading.md Removes changeset entry.
.changeset/dynamic-state-discovery.md Removes changeset entry.
.changeset/deprecate-tunnel-rc-repl.md Adds changeset entry for deprecation notices.
.changeset/audit-onboarding-skill-guards.md Removes changeset entry.
.changeset/apm-integration.md Adds changeset entry for APM integration feature.

Copilot's findings

  • Files reviewed: 73/74 changed files
  • Comments generated: 5

Comment thread packages/squad-sdk/src/utils/map-with-limit.ts
Comment thread scripts/pr-readiness.mjs Outdated
Comment thread .github/workflows/squad-pr-nudge.yml Outdated
Comment thread .github/workflows/squad-repo-health.yml Outdated
Comment thread packages/squad-sdk/src/config/agent-source.ts
spboyer pushed a commit to spboyer/squad that referenced this pull request May 13, 2026
`1.5` and `Infinity` (which fails fast for other reasons), producing a
worker count via `Math.min(limit, items.length)` that did not match the
caller's intent and contradicted the existing error message. Switch to
and error message already promise.

Adds regression tests for `1.5` and `Infinity` on both functions.

Per review feedback on PR bradygaster#1113.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer
Copy link
Copy Markdown
Contributor Author

spboyer commented May 13, 2026

PR Checkup — feedback addressed

Code fix applied (in-scope):

  • b7310ca6: mapWithLimit / mapWithLimitSettled now reject non-integer limits with the documented "positive integer" error. Regression tests added.
  • Changeset: .changeset/map-with-limit-integer-validation.md (patch to @bradygaster/squad-sdk)

Out-of-scope feedback (acknowledged + resolved):

  • 3 review threads on scripts/pr-readiness.mjs, .github/workflows/squad-pr-nudge.yml, and .github/workflows/squad-repo-health.yml flag real bugs but those files are only in this diff because the branch was cut from a stale base. They will be addressed in a focused repo-health PR to dev.

Open thread (left intentionally):

  • The PR-scope concern thread is left open until the branch is rebased onto current dev (will reduce diff to just the perf change in agent-source.ts).

@copilot-pull-request-reviewer — re-review requested once the rebase + repo-health PR land. Thanks for the careful review!

@spboyer spboyer force-pushed the perf/charter-discovery-parallel branch from b7310ca to 6d9bcce Compare May 13, 2026 14:49
Three serial 'for...await' loops in agent/charter discovery were
linear in the number of agents — costly on GitHub-sourced teams
(each iteration is an HTTP round-trip) and unnecessary on local
storage (filesystem reads are fan-out safe up to ulimit).

Changes
-------
- New utility: packages/squad-sdk/src/utils/map-with-limit.ts
  - mapWithLimit(items, limit, fn) — bounded parallel map; preserves
    input order; fast-fails on first rejection.
  - mapWithLimitSettled(items, limit, fn) — same but with per-item
    PromiseSettledResult shape so one bad input does not abort.

- LocalAgentSource.listAgents() (config/agent-source.ts):
  - Both SquadState and storage-fallback branches now parallelise the
    per-agent charter reads.
  - Storage-fallback also parallelises the isDirectory probe before
    dispatching reads, so concurrency budget is spent on real work.
  - Concurrency = 8 (leaves headroom under typical macOS/Linux ulimits).

- GitHubAgentSource.listAgents() (config/agent-source.ts):
  - Bounded concurrency = 5 to stay clear of GitHub secondary rate
    limits while still removing serial HTTP latency. For an 8-agent
    team this drops list latency from ~1.6-4s to ~0.4-1s.

- CharterCompiler.compileAll() (agents/index.ts):
  - Both branches (SquadState and raw StorageProvider) parallelise
    the per-charter compile call.
  - Order preserved via indexed-array result accumulation, NOT push-
    on-completion, so downstream consumers see the same sequence as
    the prior sequential implementation.
  - Concurrency = 8.

All three call sites use mapWithLimitSettled() so a single corrupt or
missing charter.md is skipped (matching the prior 'try { ... } catch'
behavior) without aborting the batch.

Tests
-----
- test/map-with-limit.test.ts (12 cases):
  - order preservation under reverse-order completion
  - real concurrency cap (gated execution proves the ceiling holds)
  - mapWithLimit fast-fails; mapWithLimitSettled never throws
  - empty input, limit > items.length, limit < 1 / NaN
- All 38 existing charter-compiler tests + 19 existing GitHub agent-source
  tests pass unchanged (the parallel implementation is behaviorally
  identical for these test fixtures).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the perf/charter-discovery-parallel branch from 6d9bcce to c2698c8 Compare May 13, 2026 15:10
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer
Copy link
Copy Markdown
Contributor Author

spboyer commented May 13, 2026

✅ All previously-flagged review feedback has been addressed:

  • Rebased onto current dev so the diff is now scoped to just parallel charter discovery with bounded concurrency — earlier scope-drift comments no longer apply.
  • Resolved every unresolved review thread with an explanatory reply.
  • CI is green (test, Policy Gates, sdk-exports-validation, samples-build).

@copilot-pull-request-reviewer — ready for another look when you get a chance. Thanks!

@bradygaster bradygaster merged commit 61824d0 into bradygaster:dev May 14, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants