diagnostic(ci): instrument checkout sites to capture includeIf flake state #402
Merged
The 'could not read Username for https://github.com' / 'Bad credentials' failures across today's runs (#388 macos, #393 main-push ubuntu, #401 ubuntu test, etc.) all happen inside actions/checkout@v6.0.2's 'Fetching the repository' step, right after a successful 'Setting up auth' that writes an includeIf-scoped credentials config. The failure looks consistent with an includeIf gitdir path resolution mismatch, but we have no direct evidence yet. The hypothesis ought to be confirmed before we swap to a workaround that downgrades security (persist-credentials: true) or replaces actions/checkout outright.

This PR adds .github/actions/diagnose-checkout-failure (composite action) and wires every actions/checkout call site in ci.yml to:

1. continue-on-error: true on the checkout step (id: checkout)
2. follow-up step that runs only on steps.checkout.conclusion == 'failure' and dumps:
   - git version
   - workspace path / readlink chain (catches symlink mismatches)
   - .git/config contents (raw)
   - resolved gitdir from git rev-parse --absolute-git-dir
   - effective config after includeIf evaluation
   - existence of every includeIf-referenced path
   - leftover credentials files in RUNNER_TEMP / /tmp / /github/runner_temp
   - per-component readlink of GITHUB_WORKSPACE (symlink detection)
3. Re-fails the job at the end so the diagnostic doesn't silently convert a real failure into a pass.

Once the next flake fires, the dump will tell us which of:

- includeIf path doesn't match actual gitdir
- credentials file got cleaned up between setup and fetch
- workspace path has a symlink we didn't anticipate
- the runner had no token in the first place

…actually caused the failure, and we can apply the proportionate fix.

This PR is intentionally diagnostic-only: no behavior change beyond the failure-path noise. Once root cause is identified and fixed, revert by removing the diagnostic step / continue-on-error from each checkout block.
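The per-component readlink dump described above could look roughly like the following. This is only a sketch: the function name `walk_components` is hypothetical and is not taken from the composite action in this PR, and it assumes an absolute path such as `$GITHUB_WORKSPACE`.

```shell
# Hypothetical helper (not the PR's actual action): print every ancestor
# of an absolute path, flagging each component that is a symlink.
walk_components() (
  rest="${1#/}"        # assumes an absolute path; drop the leading slash
  current=""
  while [ -n "$rest" ]; do
    part="${rest%%/*}"               # next component
    case "$rest" in
      */*) rest="${rest#*/}" ;;      # more components remain
      *)   rest="" ;;                # this was the last one
    esac
    [ -z "$part" ] && continue       # skip empty parts from '//' 
    current="$current/$part"
    if [ -L "$current" ]; then
      # readlink prints the link target exactly as stored
      printf '%s -> %s\n' "$current" "$(readlink "$current")"
    else
      printf '%s\n' "$current"
    fi
  done
)
```

Calling `walk_components "$GITHUB_WORKSPACE"` in a failure-only step would print one line per path component, so an unexpected symlink anywhere in the chain shows up directly in the log.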
The 'Update branch' merge commit was authored by the GitHub UI using GITHUB_TOKEN, which by GH Actions design does not trigger downstream pull_request workflows on the resulting SHA. Only the Mergify check (lazy-evaluated) registered against 53e93a7; CI / CodeQL / ecp PR analyze all targeted the prior SHA (8ae2fc0). This empty commit emits a user-authored push so pull_request:synchronize fires and the full check matrix runs against the actual PR head.
coseto6125 added a commit that referenced this pull request on May 23, 2026

… ref (#404)

Root cause for today's recurring 'could not read Username for https://github.com' flakes (#388 macos, #393 ubuntu, #395 dep-review, #397 main-push, #401 ubuntu Test, #402 instrumentation): the runner image ships with a default credential.helper in /etc/gitconfig that errors with ENXIO when git falls back to it. actions/checkout sets up http.extraheader scoped to the repo URL, but on certain runner image revisions the auth setup escapes our isolation and the system helper gets invoked anyway.

Rather than work around the broken helper (which would leave a permanent shellcheck-style `-c credential.helper=` debt on every git command), we eliminate the failure surface entirely: the only steps that do post-checkout fetches all want the same thing, main's tip SHA, and GitHub already provides that in the pull_request event payload (`github.event.pull_request.base.sha`).

# ci.yml: `Detect code changes` job

Was:

    git fetch --no-tags origin "$BASE_REF"
    diff_range="origin/$BASE_REF...HEAD"

Now:

    # Event payload exposes base.sha for free; checkout used default
    # ref (refs/pull/N/merge) so both sides are in local object DB.
    diff_range="$BASE_SHA...HEAD"

Three-dot range still gives merge-base..HEAD semantics: equivalent to the old behavior, no network needed.

# ecp-pr-analyze.yml: drop `Fetch base ref` + recompute branch point locally

Was:

    - uses: actions/checkout@v6.0.2
      with:
        ref: ${{ pull_request.head.sha }}   # only PR head ancestors fetched
    - name: Fetch base ref
      run: git fetch origin "$BASE_REF:..."   # network; triggers ENXIO flake
    ...
    BASE=$(git merge-base "origin/$BASE_REF" HEAD)

Now:

    - uses: actions/checkout@v6.0.2
      with:
        fetch-depth: 0
        # No `ref:` override. Default refs/pull/N/merge brings both PR
        # head AND base history into local object DB.
    - name: Compute branch point + switch HEAD to PR head
      run: |
        PR_HEAD=$(git rev-parse HEAD^1)    # merge ref's parent 1
        BASE_TIP=$(git rev-parse HEAD^2)   # merge ref's parent 2
        BRANCH_POINT=$(git merge-base "$PR_HEAD" "$BASE_TIP")
        git checkout "$PR_HEAD"

Branch point is the SAME value the old `git merge-base origin/<base>` would produce, but derived purely from local objects (the merge ref's two parents) instead of a network fetch.

# Edge cases

- PR with merge conflicts: GitHub doesn't compute refs/pull/N/merge, checkout fails. This is correct: conflicted PRs can't merge, so ecp impact analysis would be meaningless. Author resolves conflict, ref recomputed, next run works.
- Push to main / merge_group / workflow_dispatch: unchanged code path (already used BEFORE_SHA / blanket 'code=true', no fetch).

# Result

- One entire class of CI flake eliminated: no post-checkout git fetch means no credential-helper invocation means no ENXIO.
- No upstream-bug workaround comment debt.
- Slightly faster CI (one fewer network round-trip per PR job).
- Closes the path that diagnostic instrumentation in PR #402 was trying to capture; PR #402 can be closed once this lands.
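To make the three-dot claim concrete, here is a minimal sketch of the two local-only operations the commit message relies on. The function names are hypothetical, and both assume that the base and head commits are already present in the local object database, as the message describes.

```shell
# Hypothetical helpers; they only read local objects and never fetch.

# Print the branch point (merge base) of two revisions.
branch_point() {
  git merge-base "$1" "$2"
}

# List files changed on the head side since the branch point.
# "A...B" in `git diff` means: diff from merge-base(A, B) to B.
changed_since_branch_point() {
  git diff --name-only "$1...$2"
}
```

With a base SHA taken from the event payload, `changed_since_branch_point "$BASE_SHA" HEAD` lists only what the PR side changed, even if the base branch has moved on since the branch point.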
Why
Today's CI flakes (#388 macos, #393 ubuntu push, #397 push, #401 ubuntu Test) all hit `fatal: could not read Username for 'https://github.com'` in `actions/checkout@v6.0.2`'s "Fetching the repository" step, right after a successful "Setting up auth" that wrote an `includeIf.gitdir:...path = /tmp/git-credentials-X.config` directive.
Working hypothesis: includeIf gitdir path resolution mismatch — the path checkout writes in includeIf doesn't always match the canonicalized gitdir git later resolves at fetch time (could be symlink, case sensitivity, runner image git version differences). But this is just a hypothesis from log-reading; we have no direct evidence.
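One cheap way to test the symlink part of this hypothesis is to compare a directory's literal path with its symlink-resolved form. The sketch below is illustrative only: `path_is_canonical` is a hypothetical name, and it says nothing about how git itself matches `includeIf` patterns, only whether the two spellings of the path differ.

```shell
# Hypothetical check: does the literal path equal its physical path?
# Exit status: 0 = identical, 1 = differs (a symlink is involved),
# 2 = the directory could not be entered.
path_is_canonical() (
  logical="${1%/}"                     # ignore one trailing slash
  [ -n "$logical" ] || logical="/"     # keep "/" intact
  physical="$(cd "$1" 2>/dev/null && pwd -P)" || exit 2
  [ "$logical" = "$physical" ]
)
```

Running it against the gitdir reported by `git rev-parse --absolute-git-dir` would show whether the path the runner uses is already canonical.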
Before swapping to a workaround that downgrades security (`persist-credentials: true`) or replaces `actions/checkout` entirely, capture diagnostic data so the fix matches the actual root cause.
What this PR does
Add `.github/actions/diagnose-checkout-failure/action.yml`, a composite action that dumps git, workspace, and credential-config state when checkout fails.
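As an illustration of one such dump, the sketch below lists every `includeIf` `path` entry in a config file and reports whether the referenced file still exists. The function name is hypothetical and this is not the action's actual code; it assumes the `includeIf` keys contain no spaces, since the value is taken as everything after the first space.

```shell
# Hypothetical dump step: check that each includeIf-referenced file exists.
# $1 = path to a git config file (e.g. .git/config)
check_include_paths() {
  # --get-regexp matches canonicalized keys, where the section name is
  # lowercased, and prints "key value" per match.
  git config --file "$1" --get-regexp '^includeif\..*\.path$' |
  while IFS= read -r line; do
    target="${line#* }"    # value after the first space (assumes no spaces in the key)
    if [ -e "$target" ]; then
      printf 'present %s\n' "$target"
    else
      printf 'missing %s\n' "$target"
    fi
  done
}
```

A `missing` line in the dump would point at the credentials file having been cleaned up between auth setup and fetch.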
Wire `ci.yml`'s 5 `actions/checkout` call sites to run it when checkout fails, then re-fail the job.
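The final re-fail step could be as small as the sketch below. It is an assumption-laden illustration, not the PR's code: the function name is hypothetical, and the argument is assumed to be passed in by the workflow from the checkout step's result. GitHub's documentation defines `steps.<id>.outcome` as the result before `continue-on-error` is applied and `steps.<id>.conclusion` as the result after, so this sketch assumes the value comes from `outcome`.

```shell
# Hypothetical last step of a job: turn a tolerated checkout failure
# back into a job failure once the diagnostics have been printed.
# $1 = checkout result, assumed to be supplied by the workflow.
refail_if_checkout_failed() {
  case "${1:-}" in
    failure)
      # "::error::" is the GitHub Actions workflow command for an error annotation.
      echo "::error::actions/checkout failed; see the diagnostic dump above"
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

Placing this after all other steps keeps the diagnostic output visible while making sure a real checkout failure still marks the job as failed.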
Test plan
Notes