perf(resolution): memoize squad-dir lookups; dedupe squads.json reads#1112
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce repeated filesystem work in Squad’s SDK resolution logic (memoizing .squad/ discovery and deduping squads.json reads), and additionally introduces several repo-health automation scripts/workflows and a new CLI skill command.
Changes:
- Added a TTL-bounded in-process cache for
resolveSquad()/findSquadDir()with explicit invalidation viaclearResolveSquadCache(). - Deduped
multi-squad.resolveSquadPath()to loadsquads.jsononce per call and added/updated tests for cache behavior. - Added multiple new CI/repo-health workflows + scripts (impact analysis, PR readiness, security/architectural review) and introduced
squad skill(APM integration).
Show a summary per file
| File | Description |
|---|---|
| test/template-sync.test.ts | Re-sync templates before parity checks |
| test/scripts/security-review.test.ts | Update expected finding categories |
| test/scripts/security-review-skills.test.ts | Removed skills-scanner tests |
| test/scripts/risk-scorer.test.ts | New tests for risk scoring |
| test/scripts/parse-diff.test.ts | New tests for diff parsing |
| test/resolve-cache.test.ts | New tests for resolution cache + dedupe |
| test/resolution.test.ts | Clear cache between resolution tests |
| test/platform-adapter.test.ts | Regression tests for dotted repo names |
| test/migrate-directory.test.cjs | Align .NET workflow expectation with upgrade |
| test/cross-package-exports.test.ts | New runtime export smoke test |
| test/comms-teams-integration.test.ts | Update warning string expectation |
| test/cli/upgrade.test.ts | Use tmpdir; remove customized-skill warnings tests |
| test/cli/loop.test.ts | Update loop error expectation text |
| test/cli/init.test.ts | Use tmpdir for init tests |
| test/cli-global.test.ts | Clear resolve cache in global CLI tests |
| scripts/security-review.mjs | New diff-based security scan script |
| scripts/repo-health-comment.mjs | Shared comment upsert/delete utility |
| scripts/pr-readiness.mjs | New PR readiness checker + comment upsert |
| scripts/impact-utils/risk-scorer.mjs | New risk tier calculator |
| scripts/impact-utils/report-generator.mjs | New markdown impact report generator |
| scripts/impact-utils/parse-diff.mjs | New diff/file-status parsing helpers |
| scripts/check-squad-leakage.mjs | New .squad/ leakage detector |
| scripts/check-bootstrap-deps.mjs | New bootstrap import-dependency gate |
| scripts/architectural-review.mjs | New architectural review scanner |
| scripts/analyze-impact.mjs | New gh-driven impact analysis generator |
| packages/squad-sdk/src/resolution.ts | Add memoized resolution + cache clearing API |
| packages/squad-sdk/src/platform/detect.ts | Allow dots in repo names for remote parsing |
| packages/squad-sdk/src/multi-squad.ts | Read squads config once per resolution call |
| packages/squad-sdk/src/index.ts | Re-export clearResolveSquadCache() |
| packages/squad-sdk/package.json | Bump SDK version to 0.9.4 |
| packages/squad-cli/src/cli/core/init.ts | Clear SDK resolve cache after init |
| packages/squad-cli/src/cli/commands/watch/agent-spawn.ts | New shared agent spawn utilities |
| packages/squad-cli/src/cli/commands/skill.ts | New squad skill APM integration |
| packages/squad-cli/src/cli/commands/link.ts | Clear SDK resolve cache after link |
| packages/squad-cli/src/cli-entry.ts | Wire new skill command entry |
| packages/squad-cli/package.json | Bump CLI version; pin SDK dep to 0.9.4 |
| package.json | Bump workspace version to 0.9.4 |
| package-lock.json | Lockfile updates (currently inconsistent) |
| index.cjs | Rename hasForce → shouldForce |
| docs/src/navigation.ts | Update SDK API reference nav slug |
| docs/src/content/docs/reference/sdk.md | Fix API reference link |
| docs/src/content/docs/features/state-backends.md | New docs: state backends |
| CHANGELOG.md | Add 0.9.4 header |
| .squad/templates/agents/challenger.md | New challenger agent template |
| .squad/skills/fact-checking/SKILL.md | New fact-checking skill |
| .squad-templates/workflow-wiring-guide.md | New workflow wiring guide template |
| .squad-templates/workflow-wiring-appendix-a-code-reviewer.md | New appendix: reviewer wiring |
| .squad-templates/workflow-wiring-appendix-b-documenter.md | New appendix: documenter wiring |
| .squad-templates/squad.agent.md | Update coordinator template guidance |
| .github/workflows/squad-scope-check.yml | New repo-health scope boundary workflow |
| .github/workflows/squad-repo-health.yml | New repo-health suite workflow |
| .github/workflows/squad-pr-readiness.yml | New PR readiness workflow |
| .github/workflows/squad-pr-nudge.yml | New scheduled PR nudge workflow |
| .github/workflows/squad-npm-publish.yml | Tighten lockfile integrity check condition |
| .github/workflows/squad-impact.yml | New impact-analysis comment workflow |
| .github/workflows/squad-docs-links.yml | New manual docs link-check workflow |
| .github/copilot-instructions.md | Document PR nudge workflow behavior |
| .changeset/watch-rate-limit-detection.md | Removed changeset |
| .changeset/watch-p0-p1-fixes.md | Removed changeset |
| .changeset/teams-adapter-security.md | Removed changeset |
| .changeset/teams-adapter-fixes.md | Removed changeset |
| .changeset/start-tunnel-node-pty.md | Removed changeset |
| .changeset/skill-security-scanner.md | Removed changeset |
| .changeset/shell-injection-fixes.md | Removed changeset |
| .changeset/review-findings-fix.md | Removed changeset |
| .changeset/pid-tracker-cleanup.md | Removed changeset |
| .changeset/monorepo-subfolder-support.md | Removed changeset |
| .changeset/fix-watch-windows-shared-fetch.md | Added changeset |
| .changeset/fix-copilot-message-flag.md | Removed changeset |
| .changeset/fix-cast-base-path.md | Removed changeset |
| .changeset/external-capability-loading.md | Removed changeset |
| .changeset/dynamic-state-discovery.md | Removed changeset |
| .changeset/deprecate-tunnel-rc-repl.md | Added changeset |
| .changeset/audit-onboarding-skill-guards.md | Removed changeset |
| .changeset/apm-integration.md | Added changeset |
Copilot's findings
- Files reviewed: 76/77 changed files
- Comments generated: 6
PR Checkup — feedback triagedStatus: 4 of 6 threads resolved (out-of-scope bugs to be addressed in a focused repo-health PR + rebase). 2 threads left open: the meta scope thread on Root cause: PR was branched from stale Recommended next step: focused repo-health PR + rebase + re-request review. @copilot-pull-request-reviewer — thanks for the careful review. |
ef71ffc to
3c9723b
Compare
Two repeated-FS-work hotspots in the SDK called many times per CLI invocation and per watch-loop tick in Ralph. Changes ------- 1. resolution.ts — wrap resolveSquad() and findSquadDir() with a TTL-bounded in-process cache (5 s) keyed on the absolute startDir. Each lookup previously did 2-3 syscalls per directory level walking toward the filesystem root. 2. multi-squad.ts — resolveSquadPath() called loadSquadsConfig() up to three times per invocation. Now reads squads.json once after the migrateIfNeeded() call (intentionally kept after migration so any post-migration write is observed). 3. clearResolveSquadCache() exported from the SDK barrel so any code or test that creates/moves/deletes a .squad/ directory can invalidate the cache without waiting for the TTL. 4. runInit() now calls clearResolveSquadCache() immediately after sdkInitSquad() so any later resolveSquad()/resolveSquadPaths() in the same process observes the new .squad/ directory. Safety ------ * SQUAD_NO_RESOLVE_CACHE=1 disables both caches entirely (handy for tests that exhaustively exercise the walk algorithm). * test/resolution.test.ts and test/cli-global.test.ts now call clearResolveSquadCache() in beforeEach/afterEach. Without this, three pre-existing tests that share a TMP startDir across cases would observe cached results from earlier cases. The fix actually stabilises these tests — they were ordering-flaky on main. Tests ----- * test/resolve-cache.test.ts (new, 9 cases): - positive hit returns cached path - cached null served when .squad/ created mid-run (until invalidation) - cached path served when .squad/ deleted mid-run (until invalidation) - SQUAD_NO_RESOLVE_CACHE=1 disables the cache - clearResolveSquadCache() is a safe no-op - TTL expiry causes re-walk after 5 s (via vi.spyOn(Date.now)) - separate startDir keys are independent - barrel export wires the same function reference - resolveSquadPath() parses squads.json exactly once per call Follow-ups (not in this PR — best to land each separately so revert is cheap) ---------------------------------------------------------------------------- * Wire clearResolveSquadCache() into runLink, runUpgrade, writeRemoteConfig, and the Ralph watch loop. Default TTL (5 s) already keeps these correct; explicit invalidation just makes the behavior instant. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3c9723b to
f36ea80
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…root-resolver hooks PR bradygaster#1112 introduced TTL-bounded memoization in resolveSquadPaths()/resolveSquad(). Tests share the same TMP path; the first test that runs caches null for TMP, and subsequent tests that recreate the .squad dir get the stale null from cache. Add clearResolveSquadCache() to beforeEach AND afterEach in: - test/state-backend.test.ts (resolveSquadState, resolveStateBackend, StateBackendStorageAdapter suites) - test/dual-root-resolver.test.ts (global hooks) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
✅ All previously-flagged review feedback has been addressed:
@copilot-pull-request-reviewer — ready for another look when you get a chance. Thanks! |
- Remove 'Reports scores as PR comments' claim from README (F1) - Drop unnecessary pull-requests: write permission from CI (F1) - Tighten output-matches patterns to require routing context (F2) - Add agent charters to skill-invocation fixtures (F3) - Add name: field to all prompt graders for JUnit attribution (F4) - Fix tier vocabulary inconsistency in task-completion rubric (P3) - Replace em dashes with hyphens in eval YAML (P3) Closes bradygaster#1112 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Two repeated-FS-work hotspots in the SDK that get called many times per CLI invocation and per watch-loop tick in Ralph:
resolveSquad()andfindSquadDir()walk up fromstartDirtoward/, doing 2-3 syscalls per directory level. Every CLI entry point, every config load, every agent spawn re-walks the same path.resolveSquadPath()inmulti-squad.tspreviously calledloadSquadsConfig()2-3 times per invocation.loadSquadsConfig()doesexistsSync+readFileSync+JSON.parsewith no internal caching.What Changed
Resolution cache
packages/squad-sdk/src/resolution.ts— wrapresolveSquad()andfindSquadDir()with a TTL-bounded in-process cache:startDir..squad/found) and negative (not found) results are cached.SQUAD_NO_RESOLVE_CACHE=1env var disables the cache entirely (handy for tests that exhaustively exercise the walk algorithm).clearResolveSquadCache()for callers that mutate.squad/and need immediate invalidation without waiting for the TTL.runInitandrunLinkinvalidation hookspackages/squad-cli/src/cli/core/init.ts— callsclearResolveSquadCache()immediately aftersdkInitSquad()so any later code in the same process sees the new.squad/.packages/squad-cli/src/cli/commands/link.ts— same treatment after writing.squad/config.json.Multi-squad config dedupe
packages/squad-sdk/src/multi-squad.ts:resolveSquadPath()— readssquads.jsononce after themigrateIfNeeded()call (intentionally kept after migration so any post-migration write is observed).Other functions in the module (
createSquad,deleteSquad,switchSquad,listSquads) were unchanged — they each callloadSquadsConfig()exactly once already.Pre-existing test fixes
Three tests in
test/resolution.test.tsandtest/cli-global.test.tsshare aTMPconstant asstartDiracross cases inside the samedescribe. Without cache clearing, the cached result from the first case would leak into the next. Both files now callclearResolveSquadCache()inbeforeEach/afterEach. This actually stabilises tests that were ordering-flaky onmain.Testing
test/resolve-cache.test.ts(new, 9 cases):.squad/is created mid-run (until invalidation).squad/is deleted mid-run (until invalidation)SQUAD_NO_RESOLVE_CACHE=1disables the cacheclearResolveSquadCache()is a safe no-op on empty cachevi.spyOn(Date, 'now'))startDirkeys cached independentlyresolveSquadPath()parsessquads.jsonexactly once per calltest/resolution.test.ts(26),test/worktree.test.ts,test/cli-global.test.ts,test/cli/team-root-resolution.test.ts,test/sdk-feature-parity.test.tspass — 62/62 across the affected suite.npm run lintclean.Follow-ups (not in this PR — land each separately so revert is cheap)
clearResolveSquadCache()intorunUpgradeandwriteRemoteConfig. The 5 s TTL keeps these correct in practice (stale data self-corrects), but explicit invalidation is cleaner.Notes
Part of a five-PR perf series. Companion PRs: