fix(deploy): tag tarballs with run_id to prevent parallel-deploy clobber #175
…oy clobber

The deploy workflow's concurrency block keys per-environment, so prod and staging dispatches can run in parallel. Both used the same /tmp/<name>.tar.gz paths on the shared VPS, racing each other:

1. Runner A and Runner B both scp to /tmp/nextjs-bundle.tar.gz
2. The second upload wins (overwrites)
3. The first SSH-extract sees the OTHER deploy's bundle
4. After extract, that first deploy rm's the tarball
5. The second extract finds the file gone → exit 1

Today's deploy hit this: staging dispatched 1s after production, production's upload landed first (17:32:23Z), staging's overwrote (17:32:25Z), staging extracted ITS bundle (correct branch by upload ordering luck) and rm'd it, production's extract found nothing. Production failed; staging happened to deploy the right code, but the race could equally have flipped: staging would have silently deployed main's tarball to staging's APP_DIR.

Fix: tag every /tmp tarball with $BUNDLE_TAG = github.run_id (a monotonic system-assigned integer per workflow run, so no injection risk). Each deploy now owns a unique filename:

  /tmp/nextjs-bundle-<run_id>.tar.gz
  /tmp/wp-theme-<run_id>.tar.gz
  /tmp/wp-plugin-redis-translations-<run_id>.tar.gz
  /tmp/wp-plugin-cdcf-mcp-<run_id>.tar.gz

BUNDLE_TAG is added to the job env so every step inherits it; the SSH commands interpolate it runner-side (same pattern as the existing $APP_DIR / $WP_THEME_DIR) before sending the literal text to the VPS shell.

Concurrency group is unchanged; it still prevents a second push to the SAME environment from racing. The change only enables parallel prod + staging to coexist on disk.

Verification: `python -c "import yaml; yaml.safe_load(...)"` clean; 14 BUNDLE_TAG references confirmed across create/upload/extract/rm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
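The five-step race in the commit message can be replayed deterministically without a VPS. The sketch below is illustrative only: it uses a temp directory in place of /tmp on the server, plain files in place of tarballs, and hypothetical helper names (`upload`, `extract`, `replay`) and made-up run ids (111, 222) that do not come from the workflow.

```shell
#!/usr/bin/env bash
# Deterministic replay of the interleaving described above. A temp directory
# stands in for the VPS /tmp; every name here is illustrative.
set -u

work="$(mktemp -d)"

# upload <env> <path>: stand-in for scp; the "bundle" is just the env name.
upload() { printf '%s\n' "$1" > "$2"; }

# extract <path>: stand-in for the SSH extract step. Prints what it found,
# removes the file, and fails if the file is already gone.
extract() {
  [ -f "$1" ] || { echo "missing"; return 1; }
  cat "$1"
  rm -f "$1"
}

# replay <prod_path> <staging_path>: production uploads, staging uploads,
# staging extracts, production extracts (the order seen in the incident).
replay() {
  upload production "$1"
  upload staging "$2"
  staging_saw="$(extract "$2")" && staging_rc=0 || staging_rc=$?
  prod_saw="$(extract "$1")" && prod_rc=0 || prod_rc=$?
}

# Shared path: staging overwrites, then removes, production's file.
replay "$work/nextjs-bundle.tar.gz" "$work/nextjs-bundle.tar.gz"
shared_result="staging=$staging_saw/$staging_rc production=$prod_saw/$prod_rc"

# Tagged paths: each run owns its own file (111 and 222 are made-up run ids).
replay "$work/nextjs-bundle-111.tar.gz" "$work/nextjs-bundle-222.tar.gz"
tagged_result="staging=$staging_saw/$staging_rc production=$prod_saw/$prod_rc"

echo "shared: $shared_result"
echo "tagged: $tagged_result"
rm -rf "$work"
```

With the shared path, production's extract reports `missing` with exit status 1; with tagged paths both extracts find their own bundle and succeed.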
@coderabbitai review
✅ Action performed
Review finished.
Problem
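The deploy workflow's concurrency group is keyed per-environment, so a production dispatch and a staging dispatch can, and do, run in parallel. Both used the same /tmp/<name>.tar.gz paths on the shared VPS, racing each other:

1. Runner A and Runner B both scp to /tmp/nextjs-bundle.tar.gz
2. The second upload wins (overwrites)
3. The first SSH-extract sees the other deploy's bundle
4. After extract, that first deploy rms the tarball
5. The second extract finds the file gone → exit 1

Today's deploy hit this (staging run 27030067375, prod run 27030068492):

- Production's upload landed at 17:32:23Z
- Staging's upload landed at 17:32:25Z (overwriting)
- Staging extracted its bundle and rmd it at 17:32:25–27Z
- Production's extract found nothing at 17:32:27Z → failed

Staging deployed the right code by luck of upload ordering. Had the order flipped, staging would have silently deployed main's tarball to staging's APP_DIR, a worse failure mode (silent wrong-branch deploy) than the visible exit-1 we got.

Fix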
Tag every /tmp tarball with $BUNDLE_TAG = github.run_id (system-assigned integer per workflow run, so no injection risk):

- /tmp/nextjs-bundle-<run_id>.tar.gz
- /tmp/wp-theme-<run_id>.tar.gz
- /tmp/wp-plugin-redis-translations-<run_id>.tar.gz
- /tmp/wp-plugin-cdcf-mcp-<run_id>.tar.gz

BUNDLE_TAG is added to the job's env: block so every step inherits it; the SSH commands interpolate it runner-side (same pattern as the existing $APP_DIR / $WP_THEME_DIR) before sending the literal text to the VPS shell.

Concurrency group is unchanged; it still prevents a second push to the SAME environment from racing. The change only enables parallel prod + staging to coexist on disk.
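To make "interpolated runner-side" concrete, here is a minimal sketch of how quoting decides which shell expands $BUNDLE_TAG. It is not the workflow's actual step: the tar/rm command, the APP_DIR value, the run id, and the DEPLOY_HOST name are all placeholders assumed for illustration.

```shell
#!/usr/bin/env bash
# Sketch of runner-side interpolation. In the real workflow BUNDLE_TAG and
# APP_DIR come from the job env; the values below are placeholders.
set -u

BUNDLE_TAG="123456789"        # placeholder standing in for github.run_id
APP_DIR="/srv/example-app"    # hypothetical path, not taken from the PR

# Double quotes: the runner's shell expands both variables now, so the string
# handed to ssh already contains the literal tagged filename.
remote_cmd="tar -xzf /tmp/nextjs-bundle-${BUNDLE_TAG}.tar.gz -C ${APP_DIR} && rm -f /tmp/nextjs-bundle-${BUNDLE_TAG}.tar.gz"

# Single quotes: nothing expands locally. The VPS shell would receive the
# text ${BUNDLE_TAG}, which is not set there, and look for the wrong file.
unexpanded_cmd='tar -xzf /tmp/nextjs-bundle-${BUNDLE_TAG}.tar.gz'

echo "sent to VPS:  $remote_cmd"
echo "not expanded: $unexpanded_cmd"

# In a workflow step the expanded string would be passed along the lines of:
#   ssh "$DEPLOY_HOST" "$remote_cmd"    # DEPLOY_HOST is a hypothetical name
```

Because the tag is resolved before the command leaves the runner, the VPS never needs BUNDLE_TAG in its own environment, which matches how the description says $APP_DIR and $WP_THEME_DIR are already handled.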
Diff stats
`python -c "import yaml; yaml.safe_load(...)"` clean
Test plan
fix/deploy-tarball-race) and confirm green