Skip to content

perf(resolution): memoize squad-dir lookups; dedupe squads.json reads#1112

Merged
bradygaster merged 3 commits into
bradygaster:devfrom
spboyer:perf/resolution-cache
May 14, 2026
Merged

perf(resolution): memoize squad-dir lookups; dedupe squads.json reads#1112
bradygaster merged 3 commits into
bradygaster:devfrom
spboyer:perf/resolution-cache

Conversation

@spboyer
Copy link
Copy Markdown
Contributor

@spboyer spboyer commented May 12, 2026

Summary

Two repeated-FS-work hotspots in the SDK that get called many times per CLI invocation and per watch-loop tick in Ralph:

  1. resolveSquad() and findSquadDir() walk up from startDir toward /, doing 2-3 syscalls per directory level. Every CLI entry point, every config load, every agent spawn re-walks the same path.
  2. resolveSquadPath() in multi-squad.ts previously called loadSquadsConfig() 2-3 times per invocation. loadSquadsConfig() does existsSync + readFileSync + JSON.parse with no internal caching.

What Changed

Resolution cache

packages/squad-sdk/src/resolution.ts — wrap resolveSquad() and findSquadDir() with a TTL-bounded in-process cache:

  • TTL = 5 seconds, keyed on the absolute resolved startDir.
  • Both positive (.squad/ found) and negative (not found) results are cached.
  • SQUAD_NO_RESOLVE_CACHE=1 env var disables the cache entirely (handy for tests that exhaustively exercise the walk algorithm).
  • New exported function clearResolveSquadCache() for callers that mutate .squad/ and need immediate invalidation without waiting for the TTL.
  • Re-exported from the SDK barrel.

runInit and runLink invalidation hooks

  • packages/squad-cli/src/cli/core/init.ts — calls clearResolveSquadCache() immediately after sdkInitSquad() so any later code in the same process sees the new .squad/.
  • packages/squad-cli/src/cli/commands/link.ts — same treatment after writing .squad/config.json.

Multi-squad config dedupe

packages/squad-sdk/src/multi-squad.ts:resolveSquadPath() — reads squads.json once after the migrateIfNeeded() call (intentionally kept after migration so any post-migration write is observed).

Other functions in the module (createSquad, deleteSquad, switchSquad, listSquads) were unchanged — they each call loadSquadsConfig() exactly once already.

Pre-existing test fixes

Three tests in test/resolution.test.ts and test/cli-global.test.ts share a TMP constant as startDir across cases inside the same describe. Without cache clearing, the cached result from the first case would leak into the next. Both files now call clearResolveSquadCache() in beforeEach/afterEach. This actually stabilises tests that were ordering-flaky on main.

Testing

  • test/resolve-cache.test.ts (new, 9 cases):
    • positive hit returns cached path
    • cached null served when .squad/ is created mid-run (until invalidation)
    • cached path served when .squad/ is deleted mid-run (until invalidation)
    • SQUAD_NO_RESOLVE_CACHE=1 disables the cache
    • clearResolveSquadCache() is a safe no-op on empty cache
    • TTL expiry causes re-walk after 5 s (mocked via vi.spyOn(Date, 'now'))
    • separate startDir keys cached independently
    • barrel export wires the same function reference
    • resolveSquadPath() parses squads.json exactly once per call
  • All existing test/resolution.test.ts (26), test/worktree.test.ts, test/cli-global.test.ts, test/cli/team-root-resolution.test.ts, test/sdk-feature-parity.test.ts pass — 62/62 across the affected suite.
  • npm run lint clean.

Follow-ups (not in this PR — land each separately so revert is cheap)

  • Wire clearResolveSquadCache() into runUpgrade and writeRemoteConfig. The 5 s TTL keeps these correct in practice (stale data self-corrects), but explicit invalidation is cleaner.

Notes

Part of a five-PR perf series. Companion PRs:

  • chore: dependency hygiene
  • feat(bench): npm run bench + bench:cold-start runners
  • parallel charter discovery with bounded concurrency
  • non-blocking scheduler script execution

@spboyer spboyer marked this pull request as ready for review May 13, 2026 13:49
Copilot AI review requested due to automatic review settings May 13, 2026 13:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to reduce repeated filesystem work in Squad’s SDK resolution logic (memoizing .squad/ discovery and deduping squads.json reads), and additionally introduces several repo-health automation scripts/workflows and a new CLI skill command.

Changes:

  • Added a TTL-bounded in-process cache for resolveSquad() / findSquadDir() with explicit invalidation via clearResolveSquadCache().
  • Deduped multi-squad.resolveSquadPath() to load squads.json once per call and added/updated tests for cache behavior.
  • Added multiple new CI/repo-health workflows + scripts (impact analysis, PR readiness, security/architectural review) and introduced squad skill (APM integration).
Show a summary per file
File Description
test/template-sync.test.ts Re-sync templates before parity checks
test/scripts/security-review.test.ts Update expected finding categories
test/scripts/security-review-skills.test.ts Removed skills-scanner tests
test/scripts/risk-scorer.test.ts New tests for risk scoring
test/scripts/parse-diff.test.ts New tests for diff parsing
test/resolve-cache.test.ts New tests for resolution cache + dedupe
test/resolution.test.ts Clear cache between resolution tests
test/platform-adapter.test.ts Regression tests for dotted repo names
test/migrate-directory.test.cjs Align .NET workflow expectation with upgrade
test/cross-package-exports.test.ts New runtime export smoke test
test/comms-teams-integration.test.ts Update warning string expectation
test/cli/upgrade.test.ts Use tmpdir; remove customized-skill warnings tests
test/cli/loop.test.ts Update loop error expectation text
test/cli/init.test.ts Use tmpdir for init tests
test/cli-global.test.ts Clear resolve cache in global CLI tests
scripts/security-review.mjs New diff-based security scan script
scripts/repo-health-comment.mjs Shared comment upsert/delete utility
scripts/pr-readiness.mjs New PR readiness checker + comment upsert
scripts/impact-utils/risk-scorer.mjs New risk tier calculator
scripts/impact-utils/report-generator.mjs New markdown impact report generator
scripts/impact-utils/parse-diff.mjs New diff/file-status parsing helpers
scripts/check-squad-leakage.mjs New .squad/ leakage detector
scripts/check-bootstrap-deps.mjs New bootstrap import-dependency gate
scripts/architectural-review.mjs New architectural review scanner
scripts/analyze-impact.mjs New gh-driven impact analysis generator
packages/squad-sdk/src/resolution.ts Add memoized resolution + cache clearing API
packages/squad-sdk/src/platform/detect.ts Allow dots in repo names for remote parsing
packages/squad-sdk/src/multi-squad.ts Read squads config once per resolution call
packages/squad-sdk/src/index.ts Re-export clearResolveSquadCache()
packages/squad-sdk/package.json Bump SDK version to 0.9.4
packages/squad-cli/src/cli/core/init.ts Clear SDK resolve cache after init
packages/squad-cli/src/cli/commands/watch/agent-spawn.ts New shared agent spawn utilities
packages/squad-cli/src/cli/commands/skill.ts New squad skill APM integration
packages/squad-cli/src/cli/commands/link.ts Clear SDK resolve cache after link
packages/squad-cli/src/cli-entry.ts Wire new skill command entry
packages/squad-cli/package.json Bump CLI version; pin SDK dep to 0.9.4
package.json Bump workspace version to 0.9.4
package-lock.json Lockfile updates (currently inconsistent)
index.cjs Rename hasForceshouldForce
docs/src/navigation.ts Update SDK API reference nav slug
docs/src/content/docs/reference/sdk.md Fix API reference link
docs/src/content/docs/features/state-backends.md New docs: state backends
CHANGELOG.md Add 0.9.4 header
.squad/templates/agents/challenger.md New challenger agent template
.squad/skills/fact-checking/SKILL.md New fact-checking skill
.squad-templates/workflow-wiring-guide.md New workflow wiring guide template
.squad-templates/workflow-wiring-appendix-a-code-reviewer.md New appendix: reviewer wiring
.squad-templates/workflow-wiring-appendix-b-documenter.md New appendix: documenter wiring
.squad-templates/squad.agent.md Update coordinator template guidance
.github/workflows/squad-scope-check.yml New repo-health scope boundary workflow
.github/workflows/squad-repo-health.yml New repo-health suite workflow
.github/workflows/squad-pr-readiness.yml New PR readiness workflow
.github/workflows/squad-pr-nudge.yml New scheduled PR nudge workflow
.github/workflows/squad-npm-publish.yml Tighten lockfile integrity check condition
.github/workflows/squad-impact.yml New impact-analysis comment workflow
.github/workflows/squad-docs-links.yml New manual docs link-check workflow
.github/copilot-instructions.md Document PR nudge workflow behavior
.changeset/watch-rate-limit-detection.md Removed changeset
.changeset/watch-p0-p1-fixes.md Removed changeset
.changeset/teams-adapter-security.md Removed changeset
.changeset/teams-adapter-fixes.md Removed changeset
.changeset/start-tunnel-node-pty.md Removed changeset
.changeset/skill-security-scanner.md Removed changeset
.changeset/shell-injection-fixes.md Removed changeset
.changeset/review-findings-fix.md Removed changeset
.changeset/pid-tracker-cleanup.md Removed changeset
.changeset/monorepo-subfolder-support.md Removed changeset
.changeset/fix-watch-windows-shared-fetch.md Added changeset
.changeset/fix-copilot-message-flag.md Removed changeset
.changeset/fix-cast-base-path.md Removed changeset
.changeset/external-capability-loading.md Removed changeset
.changeset/dynamic-state-discovery.md Removed changeset
.changeset/deprecate-tunnel-rc-repl.md Added changeset
.changeset/audit-onboarding-skill-guards.md Removed changeset
.changeset/apm-integration.md Added changeset

Copilot's findings

  • Files reviewed: 76/77 changed files
  • Comments generated: 6

Comment thread .squad-templates/workflow-wiring-guide.md Outdated
Comment thread scripts/pr-readiness.mjs Outdated
Comment thread packages/squad-cli/src/cli/commands/skill.ts Outdated
Comment thread packages/squad-cli/src/cli/commands/skill.ts Outdated
Comment thread test/template-sync.test.ts
Comment thread packages/squad-cli/src/cli-entry.ts Outdated
@spboyer
Copy link
Copy Markdown
Contributor Author

spboyer commented May 13, 2026

PR Checkup — feedback triaged

Status: 4 of 6 threads resolved (out-of-scope bugs to be addressed in a focused repo-health PR + rebase). 2 threads left open: the meta scope thread on cli-entry.ts:908, and the .squad-templates/ mirror question (needs author decision on whether to add mirrors here or rescope).

Root cause: PR was branched from stale main. The intended resolution-cache + squads.json dedupe change is the only in-scope content.

Recommended next step: focused repo-health PR + rebase + re-request review.

@copilot-pull-request-reviewer — thanks for the careful review.

@spboyer spboyer force-pushed the perf/resolution-cache branch from ef71ffc to 3c9723b Compare May 13, 2026 14:49
Two repeated-FS-work hotspots in the SDK called many times per CLI
invocation and per watch-loop tick in Ralph.

Changes
-------
1. resolution.ts — wrap resolveSquad() and findSquadDir() with a TTL-bounded
   in-process cache (5 s) keyed on the absolute startDir. Each lookup
   previously did 2-3 syscalls per directory level walking toward the
   filesystem root.
2. multi-squad.ts — resolveSquadPath() called loadSquadsConfig() up to
   three times per invocation. Now reads squads.json once after the
   migrateIfNeeded() call (intentionally kept after migration so any
   post-migration write is observed).
3. clearResolveSquadCache() exported from the SDK barrel so any code or
   test that creates/moves/deletes a .squad/ directory can invalidate the
   cache without waiting for the TTL.
4. runInit() now calls clearResolveSquadCache() immediately after
   sdkInitSquad() so any later resolveSquad()/resolveSquadPaths() in the
   same process observes the new .squad/ directory.

Safety
------
* SQUAD_NO_RESOLVE_CACHE=1 disables both caches entirely (handy for tests
  that exhaustively exercise the walk algorithm).
* test/resolution.test.ts and test/cli-global.test.ts now call
  clearResolveSquadCache() in beforeEach/afterEach. Without this, three
  pre-existing tests that share a TMP startDir across cases would
  observe cached results from earlier cases. The fix actually stabilises
  these tests — they were ordering-flaky on main.

Tests
-----
* test/resolve-cache.test.ts (new, 9 cases):
  - positive hit returns cached path
  - cached null served when .squad/ created mid-run (until invalidation)
  - cached path served when .squad/ deleted mid-run (until invalidation)
  - SQUAD_NO_RESOLVE_CACHE=1 disables the cache
  - clearResolveSquadCache() is a safe no-op
  - TTL expiry causes re-walk after 5 s (via vi.spyOn(Date.now))
  - separate startDir keys are independent
  - barrel export wires the same function reference
  - resolveSquadPath() parses squads.json exactly once per call

Follow-ups (not in this PR — best to land each separately so revert is cheap)
----------------------------------------------------------------------------
* Wire clearResolveSquadCache() into runLink, runUpgrade, writeRemoteConfig,
  and the Ralph watch loop. Default TTL (5 s) already keeps these correct;
  explicit invalidation just makes the behavior instant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the perf/resolution-cache branch from 3c9723b to f36ea80 Compare May 13, 2026 15:09
spboyer and others added 2 commits May 13, 2026 11:23
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…root-resolver hooks

PR bradygaster#1112 introduced TTL-bounded memoization in resolveSquadPaths()/resolveSquad().
Tests share the same TMP path; the first test that runs caches null for TMP,
and subsequent tests that recreate the .squad dir get the stale null from cache.

Add clearResolveSquadCache() to beforeEach AND afterEach in:
- test/state-backend.test.ts (resolveSquadState, resolveStateBackend, StateBackendStorageAdapter suites)
- test/dual-root-resolver.test.ts (global hooks)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer
Copy link
Copy Markdown
Contributor Author

spboyer commented May 13, 2026

✅ All previously-flagged review feedback has been addressed:

  • Rebased onto current dev so the diff is now scoped to just memoizing squad-dir lookups — earlier scope-drift comments no longer apply.
  • Test cache pollution fixed in 87ac974clearResolveSquadCache() is now invoked in beforeEach/afterEach of dual-root-resolver and state-backend test suites that reuse the same TMP path.
  • Resolved every unresolved review thread with an explanatory reply.
  • CI is green (test, Policy Gates, sdk-exports-validation, samples-build).

@copilot-pull-request-reviewer — ready for another look when you get a chance. Thanks!

@bradygaster bradygaster merged commit 95f09e7 into bradygaster:dev May 14, 2026
6 checks passed
jongio added a commit to jongio/squad that referenced this pull request May 21, 2026
- Remove 'Reports scores as PR comments' claim from README (F1)
- Drop unnecessary pull-requests: write permission from CI (F1)
- Tighten output-matches patterns to require routing context (F2)
- Add agent charters to skill-invocation fixtures (F3)
- Add name: field to all prompt graders for JUnit attribution (F4)
- Fix tier vocabulary inconsistency in task-completion rubric (P3)
- Replace em dashes with hyphens in eval YAML (P3)

Closes bradygaster#1112

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants