test(gateway): de-flake integration tests via shared helper #407
Merged
The gateway integration tests spawned `slv gateway run` as a subprocess on a random port and polled /healthz for 10s. Two flake sources in CI:
- a random port could already be in use → bind fails → the test polled a dead process for the full 10s;
- a cold `deno run --no-check src/index.ts` recompiles the whole CLI graph, and under the parallel load of all 5 files a single start could exceed 10s.

Extract the duplicated plumbing into `cli/test/integration/_gateway_helpers.ts` and fix all three failure modes (port collisions, polling a dead process, cold-start timeouts):
- `pickPort()` returns an OS-assigned free loopback port (bind :0), so no collisions;
- `waitForHealthz` watches `child.status` and fails fast with stderr if the subprocess exits before healthz, instead of polling a dead process;
- default healthz timeout 10s → 30s;
- warm the Deno module cache once per run (`deno cache <entry>`) so every spawn boots from a warm graph.

No test assertions/scenarios changed. Local run: 18 passed / 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review follow-up: `homePrefix` was exposed on `spawnGateway`/`startGateway` but never passed by any caller. Drop it; temp dirs use a single 'slv-gw-it-' prefix (each `makeTempDir` is already unique).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
The `cli/test/integration/gateway_*.test.ts` suite intermittently fails CI. This is the flake that has been forcing reruns on essentially every PR. Two root causes:
- The port was picked as `30000 + random*10000` with no free-port check; an in-use port → the subprocess fails to bind → the test polls a dead process for the full 10s.
- Each test spawns `deno run -A --no-check src/index.ts gateway run`, which recompiles the whole CLI module graph; under the parallel load of all 5 files a single cold start can exceed the 10s deadline on loaded CI runners.
Fix
Extracted the duplicated spawn/health/port/cleanup plumbing (copy-pasted across 5 files) into `cli/test/integration/_gateway_helpers.ts`, and fixed all three failure modes there:
- `pickPort()` binds `127.0.0.1:0` and returns the OS-assigned free port, so no more collisions.
- `waitForHealthz` watches `child.status` and fails fast (surfacing stderr) if the subprocess exits before `/healthz`, instead of polling a dead process until the deadline.
- The default healthz timeout goes from 10s to 30s, and the Deno module cache is warmed once per run (`deno cache <entry>`) so every spawn boots from a warm graph; this is the decisive fix under parallel load.
The 5 test files now import the helper and drop their local duplicates. No test assertions or scenarios were changed, only the plumbing.
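For readers who have not opened the helper, the fail-fast wait can be sketched roughly as below. This is an illustration under assumptions, not the code in `_gateway_helpers.ts`: the signature, the `ExitInfo` shape, the `probe` callback, and the 100ms poll interval are all hypothetical. It is written runtime-agnostically; in Deno, `exited` would presumably be derived from `child.status` plus the collected stderr.

```typescript
// Hypothetical sketch: names, signature, and defaults other than the 30s
// timeout are assumptions, not the PR's actual helper.

/** How the subprocess ended (in Deno, presumably built from `child.status`). */
interface ExitInfo {
  code: number;
  stderr: string;
}

interface WaitOptions {
  timeoutMs?: number; // overall deadline; the PR raises its default to 30s
  intervalMs?: number; // pause between probes (assumed value)
}

type Wake = { kind: "tick" } | { kind: "exit"; info: ExitInfo };

/**
 * Polls `probe` until it reports healthy, the process exits, or the deadline
 * passes. `probe` should enforce its own per-request timeout.
 */
export async function waitForHealthz(
  probe: () => Promise<boolean>,
  exited: Promise<ExitInfo>,
  { timeoutMs = 30_000, intervalMs = 100 }: WaitOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  const exitEvent = exited.then((info): Wake => ({ kind: "exit", info }));

  while (true) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) {
      throw new Error(`/healthz not ready after ${timeoutMs}ms`);
    }

    // A failed connection counts as "not ready yet", not as an error.
    if (await probe().catch(() => false)) return;

    // Sleep until the next probe, but wake immediately if the process exits.
    let timer: ReturnType<typeof setTimeout> | undefined;
    const tick = new Promise<Wake>((resolve) => {
      timer = setTimeout(
        () => resolve({ kind: "tick" }),
        Math.min(intervalMs, remaining),
      );
    });
    const wake = await Promise.race([exitEvent, tick]);
    clearTimeout(timer);

    if (wake.kind === "exit") {
      // Fail fast and surface stderr instead of polling a dead process.
      throw new Error(
        `gateway exited with code ${wake.info.code} before /healthz:\n${wake.info.stderr}`,
      );
    }
  }
}
```

With this shape, a bind failure shows up as an immediate error carrying the subprocess's stderr, while the longer deadline only matters for slow but healthy starts.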
Testing
- `cd cli && deno test -A --config=deno.json test/integration/gateway_*.test.ts` → 18 passed / 0 failed; run multiple times consecutively, all green.
- `deno fmt --check` and `deno check` clean on all 6 files.
🤖 Generated with Claude Code