External ingest: harden finalize against 412 phase mismatch#322254
Open
bhavyaus wants to merge 3 commits into
Open
External ingest: harden finalize against 412 phase mismatch#322254bhavyaus wants to merge 3 commits into
bhavyaus wants to merge 3 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the Copilot external-ingest indexing flow against server-side phase gating, specifically addressing cases where POST /external/code/ingest/finalize returns HTTP 412 (“phase mismatch”) after document upload appears to have succeeded. It improves correctness and diagnosability by reconciling missing documents and surfacing real upload failures earlier, and it adds protocol-level tests using a mock GitHub fetcher.
Changes:
- Add a bounded finalize retry loop that re-polls
/external/code/ingest/batch, re-uploads still-missing documents, backs off briefly, and retries finalize. - Stop counting failed document uploads as successful; instead fail the pass with a descriptive error (including status and requestId when available), and abort deterministically on 404 (“ingest gone”).
- Add an
ExternalIngestClienttest suite that exercises the real client against a mockIGithubApiFetcherServicefor retries, re-upload behavior, retry exhaustion, and error surfacing.
Show a summary per file
| File | Description |
|---|---|
| extensions/copilot/src/platform/workspaceChunkSearch/node/codeSearch/externalIngestClient.ts | Implements finalize 412 reconciliation/retry and makes document-upload failures fail the ingestion pass deterministically. |
| extensions/copilot/src/platform/workspaceChunkSearch/test/node/externalIngest.spec.ts | Adds end-to-end-ish protocol tests for finalize retries and upload failure behavior using a recording mock fetcher. |
Copilot's findings
Comments suppressed due to low confidence (1)
extensions/copilot/src/platform/workspaceChunkSearch/node/codeSearch/externalIngestClient.ts:473
- The upload task throws when the server requests a docSha that isn't present in
mappings, but that rejection is later swallowed by the outerPromise.all(uploading)error handler, letting the pass continue and potentially turning a client-side mapping mismatch into a confusing finalize 412/retry flow. Treat this as a recorded failure so the pass fails deterministically with a descriptive error.
const fileEntry = mappings.get(requestedDocSha);
if (!fileEntry) {
throw new Error(`No mapping for docSha: ${requestedDocSha}`);
}
- Files reviewed: 2/2 changed files
- Comments generated: 1
Fixes the "Build Codebase Semantic Index" failure for non-GitHub repos
where `/external/code/ingest/finalize` returns HTTP 412 ("request sent
for phase which was not the current phase"). The server only advances to
the finalize phase once it has received every expected document, so a
finalize can fail when the server is still missing documents.
- Retry finalize with backoff (bounded) after a 412, re-polling `/batch`
for documents the server still reports as missing and re-uploading them.
- Stop swallowing per-document upload failures: collect them and fail the
pass with a descriptive error (status + requestId) before finalize, so
the real cause is surfaced instead of a misleading finalize 412. Failed
uploads are no longer counted as uploaded.
- Make a `/document` 404 ("ingest is gone") abort the pass deterministically
rather than relying on a rejection an intermediate `Promise.all` can swallow.
- When finalize keeps returning 412 but the server reports no missing
documents, surface a distinct "likely server-side" error after retries.
Adds an ExternalIngestClient test suite driving the real client through the
ingest HTTP protocol (retry-then-success, re-upload still-missing, exhaustion,
server-side-no-missing-docs, persistent upload failure, and 404 abort).
Refs #320915
- Test suite uses fake timers so finalize-retry backoff doesn't add real wall-clock time - Record unmapped docSha as a deterministic upload failure instead of throwing (rejection was swallowed by Promise.all) - updatedFileCount counts unique uploaded docShas across passes instead of summing per-pass attempts - Add test for the unmapped-docSha failure path
46d97fe to
ec99d2f
Compare
auto-merge was automatically disabled
June 21, 2026 05:48
Pull request was converted to draft
createPlatformServices() wires a static auth service backed by createStaticGitHubTokenProvider(), which throws when no GITHUB_PAT/GITHUB_OAUTH_TOKEN is set (CI). Override IAuthenticationService with a static token so getAuthToken() resolves deterministically. Verified by running the suite with the token env vars unset.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes the "GitHub Copilot: Build Codebase Semantic Index" failure reported in #320915, where, for non-GitHub repositories, document upload succeeds but
POST /external/code/ingest/finalizereturns HTTP 412 withrequest sent for phase which was not the current phase.Root cause
The external-ingest protocol is phase-gated server-side: create checkpoint → reconcile coded symbols → upload documents → finalize. The server only advances to the finalize phase once it has received every expected document. The client previously:
finalizeexactly once with no reconciliation, and412rather than the real cause.Fixes #320915