This is the fourth audit window of 2026-06-02 (after morning, afternoon, and evening), covering completed runs between 19:54Z and 22:06Z. The night window saw 31 runs (28 completed, 3 in-progress) at 85.7% success (24/28). Four failures fell into just two classes: a NEW transient Docker Hub registry-pull timeout (2 runs, pure infra) and the persistent safe-output partial-failure-intolerance class (2 runs, 2 new variants). Both of the day's recurring high-severity classes — token-budget-429 and copilot-sdk session timeout — were absent this window. Notably, 3 of the 4 failures occurred on a single dev feature branch (copilot/add-support-for-copilot-connection-token), so production-main impact was limited to one infra blip.
Window Summary
| Metric | Value |
| --- | --- |
| Runs (completed / in-progress) | 31 (28 / 3) |
| Success rate (completed) | 85.7% (24/28) |
| Failures | 4 across 2 classes |
| Tokens / effective tokens | 14.0M / 63.4M |
| Turns / action-minutes | 276 / 225 |
| Missing tools / data / MCP failures | 0 / 0 / 0 |
| Safe items emitted | 26 |
| Firewall blocked | 287 / 1337 (21.5%) |
| Engines | copilot 17, claude 9, codex 2, antigravity 1, gemini 1, pi 1 |
Critical Findings
🆕 docker-registry-pull-timeout (infra, 2 runs) — NEW class
Both Smoke CI (§26849511621, main) and Smoke Codex (§26847739078, feature branch) failed the agent step at container-image pull: Get (registry1.docker.io/redacted) context deadline exceeded for node:lts-alpine, three retries, then exit 123. This is transient Docker Hub registry friction (~21:05–21:39Z), not agent logic and not a firewall block. Two unrelated workflows hitting the same root cause in one window points to a brief registry slowdown or outage rather than a workflow defect.
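The failure pattern above (three attempts, then exit) is what retry-with-backoff is meant to absorb. The following is a minimal sketch only, not the workflow's actual pull logic: the function names `backoff_delays` and `pull_with_backoff`, the six-attempt default, and the 2-second base with a 60-second cap are all assumptions chosen for illustration. It shells out to the standard `docker pull` command.

```python
import random
import subprocess
import time


def backoff_delays(attempts, base=2.0, cap=60.0):
    """Un-jittered wait in seconds before each retry: base * 2**i, capped."""
    return [min(cap, base * (2 ** i)) for i in range(attempts - 1)]


def pull_with_backoff(image, attempts=6, run=subprocess.run, sleep=time.sleep):
    """Try `docker pull` up to `attempts` times; return True on first success.

    `run` and `sleep` are injectable so the retry loop can be exercised
    without Docker or real waiting (hypothetical design, not from the audit).
    """
    delays = backoff_delays(attempts)
    for i in range(attempts):
        result = run(["docker", "pull", image], capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if i < len(delays):
            # Small random jitter so parallel jobs do not retry in lockstep.
            sleep(delays[i] + random.uniform(0, 1))
    return False
```

With these assumed defaults the waits grow as 2, 4, 8, 16, 32 seconds, so a registry slowdown of about a minute would be ridden out rather than failing the job after three quick attempts.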
🔁 safe-output-partial-failure-intolerance (2 runs) — RECURRING, 2 new variants
A single failed safe-output message still red-fails the entire safe_outputs job even when other messages succeed:
Changeset Generator (§26847739151) — allowed-files-list-rejection variant: push_to_pull_request_branch rejected because the patch modified pkg/cli/codemod_pull_request_target_checkout_false.go, outside the allowed-files list. The security guard worked as intended, but the single rejection reddened the job.
Smoke Copilot (§26847739375) — dispatch-no-ref-on-branch variant: dispatch_workflow "haiku-printer" failed with No ref found for: refs/heads/copilot/add-support-for-copilot-connection-token; create_discussion and other messages landed fine, but message 3 red-failed the job.
This class has now produced six distinct variants over recent windows and remains the dominant, fixable failure mode.
Positives This Window
✅ token-budget-429-effective-tokens absent (3rd consecutive window; top effective-token run only 10.65M, well under the 25M cap)
✅ copilot-sdk session.idle/auth timeouts absent (had escalated to prod-main in the evening window)
✅ 0 missing tools, 0 missing data, 0 MCP failures
✅ New-engine smokes (pi / gemini / antigravity) all passed again
Trend Charts (14-day)
Workflow Health — Daily Runs & Success Rate
Success rate has held in a healthy 80–96% band since the 05-23 dip (41.6%, a known bad day). The 06-02 daily rollup (84.2%) sits at the lower end; intraday the day actually trended 84.2% → 89.4% → 97.8% → 85.7% across the four windows, with the night dip driven by infra + dev-branch noise rather than a workflow regression.
Token Usage — Daily + 7-day Moving Average
The 7-day moving average has drifted down from ~60M to ~38.6M tokens/day, with daily totals oscillating between ~14M and ~69M. The 06-02 full-day total (32.7M) is below the moving average — consumption is stable with no runaway-cost trend.
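For readers unfamiliar with the smoothing used in this chart, a trailing 7-day moving average can be computed as in the sketch below. The function name `moving_average` is hypothetical, and the daily values are round placeholder numbers (in millions of tokens) picked to make the arithmetic easy to follow; they are not the audit's actual daily totals.

```python
def moving_average(values, window=7):
    """Trailing moving average: one output per day once `window` days exist."""
    out = []
    for i in range(window - 1, len(values)):
        out.append(sum(values[i - window + 1 : i + 1]) / window)
    return out


# Placeholder daily totals in millions of tokens (NOT real audit data).
daily = [60.0, 50.0, 40.0, 30.0, 20.0, 10.0, 70.0, 25.0]
print(moving_average(daily))  # prints [40.0, 35.0]
```

A single day below the average, such as the 06-02 total mentioned above, pulls the next average value down only by one seventh of the difference, which is why the smoothed line drifts rather than jumps.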
Failure Detail Table
| Workflow | Run | Engine | Branch | Failed Job | Class |
| --- | --- | --- | --- | --- | --- |
| Smoke CI | 26849511621 | copilot | main | agent | docker-registry-pull-timeout |
| Smoke Codex | 26847739078 | codex | feature | agent | docker-registry-pull-timeout |
| Changeset Generator | 26847739151 | codex | feature | safe_outputs | safe-output-partial-failure (allowed-files) |
| Smoke Copilot | 26847739375 | copilot | feature | safe_outputs | safe-output-partial-failure (dispatch no-ref) |
Firewall Hotspots (by design — smoke probes)
| Workflow | Blocked / Total |
| --- | --- |
| Smoke Antigravity | 10 / 12 (83%) |
| Smoke Copilot | 57 / 127 (45%) |
| Smoke Gemini | 20 / 63 (32%) |
| Smoke Claude | 30 / 101 (30%) |
Overall 287/1337 (21.5%) blocked — in line with prior windows. Smoke-test workflows intentionally probe egress, so high block rates here are expected and not a concern.
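The percentages in this section follow directly from the blocked and total counts. The short sketch below recomputes them from the figures reported above; the dictionary layout and the `block_rate` helper are illustrative names, not part of any audit tooling. Note that the four hotspot workflows account for only part of the overall 287/1337, since the remaining requests come from workflows not listed here.

```python
# Blocked / total request counts as reported for the hotspot workflows.
hotspots = {
    "Smoke Antigravity": (10, 12),
    "Smoke Copilot": (57, 127),
    "Smoke Gemini": (20, 63),
    "Smoke Claude": (30, 101),
}


def block_rate(blocked, total):
    """Share of requests blocked by the firewall, as a percentage."""
    return 100.0 * blocked / total


for name, (blocked, total) in hotspots.items():
    print(f"{name}: {round(block_rate(blocked, total))}%")

# Overall window figure, reported to one decimal place.
print(f"Overall: {round(block_rate(287, 1337), 1)}%")
```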
Recommendations
Harden container base-image pulls against transient Docker Hub timeouts — cache/pin node:lts-alpine in a warm layer, add a registry mirror, or extend retry-with-backoff beyond 3 attempts. (Watch for recurrence first; single-window so far.)
Make safe_outputs tolerate per-message failures — exit non-red when ≥1 message lands and the remaining failures are expected/guarded conditions (allowed-files rejection, dispatch no-ref-on-branch, count-exceeded, missing-issue-context). This is the day's dominant fixable class.
Continue watching token-budget-429 and copilot-sdk timeouts: both quiet this window but intermittent; the SDK class hit prod-main earlier today.
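The safe_outputs recommendation above can be read as a small decision rule. The sketch below is one possible interpretation, not the actual safe_outputs implementation: the `MessageResult` type, the `job_conclusion` function, the reason strings, and the three-way "success" / "warning" / "failure" outcome are all assumptions made for illustration. The expected-failure set mirrors the guarded conditions named in the recommendation.

```python
from dataclasses import dataclass
from typing import Optional

# Guarded conditions listed in the recommendation (labels are hypothetical).
EXPECTED_FAILURES = {
    "allowed-files-rejection",
    "dispatch-no-ref-on-branch",
    "count-exceeded",
    "missing-issue-context",
}


@dataclass
class MessageResult:
    kind: str                           # e.g. "create_discussion"
    ok: bool                            # True if the message landed
    failure_reason: Optional[str] = None


def job_conclusion(results):
    """Non-red when >=1 message lands and every failure is an expected one."""
    landed = sum(1 for r in results if r.ok)
    failures = [r for r in results if not r.ok]
    unexpected = [r for r in failures if r.failure_reason not in EXPECTED_FAILURES]
    if not failures:
        return "success"
    if landed >= 1 and not unexpected:
        return "warning"
    return "failure"


# Shape of the Smoke Copilot run: one message lands, the dispatch has no ref.
smoke_copilot = [
    MessageResult("create_discussion", ok=True),
    MessageResult("dispatch_workflow", ok=False,
                  failure_reason="dispatch-no-ref-on-branch"),
]
print(job_conclusion(smoke_copilot))  # prints warning
```

Under this rule a run where nothing lands, or where any failure falls outside the expected set, still fails red, so genuine regressions stay visible.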