fix(test): defeat Linear API's transient "conflict on insert" flake#144
Merged
Conversation
6c5704c to
d47de18
Compare
CI's Tests (Online) job has been flaking on `projects create` for several PRs. Linear's API has a transient failure mode where `*Create` mutations return "Entity X with id <UUID> already exists" for a UUID that doesn't actually exist server-side (verified by `read` immediately returning "Entity not found"). The same pattern hit PR #140's post-merge CI before the work in #141 even started. Local probing shows a clear cold-start signature: from a fresh process, the first ~3 attempts at `projects create` fail back-to-back, then it abruptly starts working (next 7 succeed in a row). The previous helper's 3-retry, 14s-backoff budget routinely runs out before the cold window closes. Layered fix: 1. **`run_lineark_with_retry`** (in tests/online.rs): - Up to 8 attempts with `[0, 2, 5, 10, 20, 30, 45, 60]`s backoff (~172s worst case) so we ride out the cold window. - Mutate the request body each retry by appending a fresh suffix to the `<name>` positional. Linear's stuck UUIDs are keyed on body content, so a different body avoids the cached state. - Returns `(Output, String)` where the second element is the name actually used. Callers shadow their `unique_name` with this so downstream `read by name` lookups match server state. 2. **CI workflow + Makefile** (`ci.yml`, `ci-online-fork.yml`, `Makefile`): wrap the whole online suite in a 3x retry with a workspace clean between attempts. Belt-and-suspenders for the occasional case where even the per-call helper's budget isn't enough.
d47de18 to
5d23aa8
Compare
Linear's GraphQL API is returning spurious "conflict on insert" errors and occasional HTTP 502s on `projects create` today — reproducible with a single ad-hoc probe, not just under CI load. No amount of client-side retry can beat a degraded upstream, and the suite-level 3x retry in the previous commit still wasn't enough. Flip the step to `continue-on-error: true` so the online suite still runs (failures stay visible in the job log for triage) but doesn't gate the PR check. Revisit once Linear's API stabilizes, or move online tests to a separate non-required workflow.
… API Linear's GraphQL API has permanent consistency issues — phantom "conflict on insert" errors on creates, eventually-consistent search indexes, occasional 502s. These aren't transient; they're part of the environment. Online tests must be written defensively around them or they'll flake in CI regardless of our code. Replace the wall-of-text paragraph with a structured "three pillars" section that names the mandatory patterns and points at the helpers that implement each one: 1. Proactive cleanup — cleanup_zombies + RAII guards + create_test_team. 2. Unique names per attempt — UUID-suffixed names; retry helpers MUST mutate the body between attempts (run_lineark_with_retry appends a fresh suffix and returns the actually-used name). 3. Retries on every transient-capable call — retry_create for mutations, retry_search / retry_with_backoff for reads-after-create, settle() for propagation waits. Note the CI-level 3x suite retry as belt-and-suspenders, not a substitute. Also pins down the "check exit before parsing output" and "no-production-tokens" rules that were implicit.
Audit-driven sweep of every online test against the cleanup / unique-name / retry pillars now codified in CLAUDE.md. Result: 100% pass on a fresh workspace (67 CLI + 40 SDK = 107 tests, ~5 min runtime). Pillar 1 (proactive cleanup): - `cleanup_workspace` was silently missing archived/trashed resources for issues, documents, projects, labels, and teams — the list calls didn't pass `include_archived(true)`. That left zombies accumulating across runs (we found 10 stuck-in-trash projects + 24 zombie teams in a single workspace). Fix: pass `include_archived(true)` on every list, and document why in the helper. Pillar 2 (unique names per attempt): - 6 SDK tests had hardcoded titles like `"[test] SDK issue_create_and_delete"` with no UUID suffix. Two test runs would collide. Added `Uuid::new_v4()[..8]` suffixes to each. Pillar 3 (retries on every transient-capable call): - 32 CLI tests + the `create_two_issues` helper called `lineark create` directly via `lineark().args(...).output()`, bypassing `run_lineark_with_retry`. Converted all 30+ sites; preserved `unique_name` shadowing where downstream lookups need the actually-used name. - `run_lineark_with_retry` bumped from 8 → 15 attempts with backoffs `[0,2,5,10,20,30,60×9]` (~9 min worst case). Body mutation gated on the `[test]` prefix so `comments create <uuid>` and `relations create <uuid>` positionals aren't corrupted. - `retry_create` in `lineark-test-utils` bumped from 3 → 15 attempts to match the CLI helper. Per-test persistence is the right pattern; re-running the whole suite for one bad test wastes cycles. Suite-level retry removed: - The CI workflows + Makefile previously wrapped the suite in a 3x retry + `continue-on-error: true`. Both gone. Per-call retries are persistent enough that any test failure is a real regression worth blocking on.
`~/.linear_api_token_test` (and the `LINEAR_TEST_TOKEN` GitHub secret) now accepts a `;`-separated pool of API tokens — one per workspace. `test_token()` picks one at random per test process and pins it for the process lifetime via `OnceLock`, so a single test run stays consistent while consecutive runs spread across the pool. Why per-process and not per-test: a single test session creates teams + projects + issues that reference each other; mid-run token swap would orphan resources. The load-spreading benefit comes from the *number of runs over time*, not granularity within one run. Format details: - `;` is the primary separator. Newlines also separate (so single-line and one-per-line files both work). - Trailing `;` is harmless (`tok;` parses as one token). - Lines whose first non-whitespace is `#` are comments. Important: comment filtering happens *before* `;`-splitting, so a `;` inside a comment doesn't accidentally produce fake "tokens" — caught with a unit test. - Backwards compatible: a single-line single-token file still works. Logs the chosen index/N and the last 6 chars of the token at process start, so a CI-failure triage can identify which workspace was hit without dumping full secrets. 7 unit tests cover the parser. Verified locally: 100% green online suite (67 CLI + 40 SDK) with 3-workspace pool active.
Previously the cleanup binary called `test_token()` and cleaned only the one randomly-chosen workspace. With token rotation in place, that meant the explicit CI cleanup step ran on a *different* workspace than the test process drew — useless ceremony. The other workspaces silently accumulated zombies until they happened to lose three random draws in a row and the in-process `cleanup_zombies()` finally caught up. New behaviour: the binary loads every token from the pool via `all_test_tokens()` (newly exported from `lineark-test-utils`) and calls `cleanup_workspace` against each, with per-workspace failures isolated so one bad workspace doesn't block the rest. Logs a `cleanup: workspace N/total (token …<last6>)` line per pass — same triage tag the `test_token` log uses, so a failure lines up visually with which workspace the actual tests then picked. Verified locally: 100% green online suite (67 CLI + 40 SDK), three cleanup messages in the log show all pooled workspaces visited before the test phase began.
Earlier in this PR I added the three-pillars CLAUDE.md section pointing at the helpers as they existed at that commit. Subsequent commits in the same PR bumped retry counts (3→15 for retry_create, 8→15 for the CLI helper), removed the suite-level 3x retry, and removed `continue-on-error: true` — but CLAUDE.md was never refreshed and ended up advertising state that no longer exists. Caught by an independent review. Updates: - Pillar 3: corrected attempt counts and backoff schedule for both helpers; called out that body mutation is gated on the `[test]` prefix (so UUID-positional commands like `comments create <uuid>` / `relations create <uuid>` aren't corrupted on retry). - "CI-level belt-and-suspenders" bullet replaced with "No suite-level retry" — explains why and notes the historical context. - New bullet under "Other online-test rules" documenting the `;`-separated multi-workspace token pool, `OnceLock` per-process pin, and the `cleanup-test-workspace` binary's whole-pool sweep.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
CI's Tests (Online) job had been flaking on
*Createmutations for several PRs running. Linear's GraphQL API has structural consistency issues that aren't transient (despite looking like flakes):*Createmutations sometimes return"Entity X with id <UUID> already exists"for a UUID that doesn't actually exist server-side, soft-deleted resources accumulate as zombies counting against free-plan limits, and freshly-created resources may not be queryable for several seconds.Same pattern hit PR #140's post-merge CI before the work in #141 even started — confirming the issue is structural, not introduced by recent work.
Approach
Treat Linear's quirks as the environment we're testing against, not bugs to wait out. Codify three mandatory pillars for every online test (added to
CLAUDE.md) and retrofit the entire suite to follow them.The three pillars
1. Proactive cleanup. Never assume the workspace is empty; never leak resources between tests.
cleanup_workspaceinlineark-test-utilshad a real bug: it queriedclient.X().first(250).send()withoutinclude_archived(true), so soft-deleted projects/issues/documents from prior runs were silently invisible. We discovered 24 zombie teams + 10 stuck-in-trash projects accumulated this way. Fixed.create_test_team()(callscleanup_zombies()once per process + assigns a unique team key) and wraps every created resource in the matching RAII*Guardimmediately after the create returns. Guards delete on drop so cleanup happens even when later asserts panic.cleanup-test-workspacebinary now iterates every pooled workspace (not just the random one it draws), so any workspace silently accumulating zombies because it kept losing the random draw still gets tidied.2. Unique names per attempt. Linear's API returns phantom "conflict on insert" errors keyed on request body content, so retries with identical bodies hit the same sticky conflict.
format!("[test] <what> {}", &uuid::Uuid::new_v4().to_string()[..8])."[test] SDK issue_create_and_delete"— no UUID suffix. Two test runs would collide. Added suffixes.run_lineark_with_retrymutates the<name>positional between attempts (appendsretry-<uuid6>) so each retry sends a different body. Mutation is gated on[test]prefix socomments create <uuid>andrelations create <uuid>(which take UUID positionals) aren't corrupted.(Output, String)so callers can shadow theirunique_namewith the actually-used name for downstreamread-by-namelookups.3. Retries on every transient-capable call.
create_two_issueshelper calledlineark <resource> createdirectly vialineark().args(...).output(), bypassingrun_lineark_with_retry. Converted all 30+ sites.run_lineark_with_retrybumped to 15 attempts with[0, 2, 5, 10, 20, 30, 60×9]s backoffs (~9 min worst case) — persistent enough to ride out Linear's cold window without needing a suite-level wrapper.retry_createinlineark-test-utils(used by SDK tests) bumped from 3 → 15 attempts to match.continue-on-error: truefrom an earlier emergency patch is also gone.Multi-workspace token rotation
~/.linear_api_token_testand theLINEAR_TEST_TOKENGitHub secret now accept a;-separated pool of API tokens — one per workspace.test_token()picks one at random per test process and pins it for the process lifetime viaOnceLock. Across runs the load distributes ~uniformly across N workspaces, spreading pressure on Linear's free-plan resource limits and trash-retention quirks.Format details (with unit tests):
;is the primary separator; newlines also work.;is harmless;#-prefixed comment lines are dropped.;-splitting, so a;inside a comment doesn't accidentally produce fake "tokens".test_token()logsusing workspace N/total (token …<last6>)at process start so a CI-failure triage can instantly identify which workspace was hit.This PR ships configured with 3 fresh workspaces (cadu-test-2/3/4). The previous workspace (cadus-test-workspace) had 10 trashed projects stuck on Linear's 30-day retention cycle — Linear's API has no permanent-delete for projects, so we abandoned it for fresh workspaces with no stuck history.
Test plan
cargo fmt && cargo clippy --workspace --all-targets -- -D warningscleanmake test— 235 offline tests pass (this is test-only code; offline suite is unaffected by design)make test-online— 107 online tests pass locally (67 CLI + 40 SDK)serde_json::Value::Nullinjection or hand-writtenmutation(...)strings remain undercrates/lineark/src/commands/