[AAASM-2328] 🐛 (ci): Retry gh release download on cross-repo race (release not found)#65
Conversation
release-node.yml fires on the same tag-push event as agent-assembly's
Release workflow, in parallel. agent-assembly takes ~10 min to build
+ publish the binaries that release-node depends on, so the first
gh release download attempt consistently fails with:
release not found
(observed on v0.0.1-alpha.2 and v0.0.1-alpha.3 dry-runs; would have
failed every prior alpha too if not blocked at earlier steps).
Fix: wrap gh release download in a retry loop. Up to 20 attempts
with 60-second sleeps = 20 min ceiling waiting for the Release.
Distinguish 'release not found' (race condition → retry) from
other errors (auth failure, rate limit, etc. → fail-fast).
Verified locally:
* retry-on-race: 3-attempt simulation with first 2 returning
'release not found' → third succeeds → loop exits cleanly
* fail-fast on other error: 'HTTP 403 rate limit' → grep miss
on 'release not found' → exit 1 without retry
Trade-off vs alternatives considered:
- workflow_run cross-repo trigger: NOT possible; workflow_run
only fires for same-repo workflows
- repository_dispatch from agent-assembly: cleaner architecturally
but needs PAT + extra cross-repo wiring. Defer until needed.
- Manual workflow_dispatch only: loses automation entirely.
Retry-with-backoff is the smallest change that solves the race.
Migrate to repository_dispatch later if polling cost becomes real.
Tracked: AAASM-2328
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Claude Code review — AAASM-2328CI state25/25 SUCCESS — Scope vs. acceptance criteria
Design choices worth recordingWhy retry-with-backoff over alternatives
Retry-with-backoff is the smallest change that solves the actual problem. If 20 min polling per release becomes meaningful CI-minutes cost (~$0.04 at ubuntu-latest rates), revisit Why 20 attempts × 60sagent-assembly's Release pipeline observed durations:
20 min ceiling gives ~2× headroom for outliers. Could tighten to 15 if observed durations stay stable. Subtle correctness detailThe
The exit code we actually want is Not blocking — the current form works correctly under the observed failure modes. But a small follow-up to use VerdictReady for human approval and merge. The fix is correct, locally verified, design choices documented. The pipe-construct quibble in the section above is style, not correctness. — Claude Code (Opus 4.7, 1M context) |



Description
release-node.yml fires on the same tag-push event as agent-assembly's Release workflow, in parallel. agent-assembly takes ~10 min to build + publish the binaries release-node depends on, so the first download attempt consistently fails:
Observed on alpha-2 and alpha-3 dry-runs. Hidden in alpha-1 by an earlier-failing pnpm-lock step (AAASM-2098).
Fix
Wrap
gh release downloadin a retry loop: up to 20 attempts × 60s = 20 min ceiling. Distinguish 'release not found' (race → retry) from other errors (auth, rate-limit → fail-fast).Alternatives considered
workflow_runcross-repo triggerrepository_dispatchfrom agent-assemblyRetry-with-backoff is the smallest change that solves the race.
Local verification
HTTP 403 rate limit→ grep miss on 'release not found' → exit 1 without retryRelated
— Claude Code (Opus 4.7, 1M context)