Skip to content

fix(test): defeat Linear API's transient "conflict on insert" flake#144

Merged
flipbit03 merged 8 commits into
mainfrom
fix/online-test-conflict-recovery
Apr 18, 2026
Merged

fix(test): defeat Linear API's transient "conflict on insert" flake#144
flipbit03 merged 8 commits into
mainfrom
fix/online-test-conflict-recovery

Conversation

@flipbit03
Copy link
Copy Markdown
Owner

@flipbit03 flipbit03 commented Apr 17, 2026

Problem

CI's Tests (Online) job had been flaking on *Create mutations for several PRs running. Linear's GraphQL API has structural consistency issues that aren't transient (despite looking like flakes): *Create mutations sometimes return "Entity X with id <UUID> already exists" for a UUID that doesn't actually exist server-side, soft-deleted resources accumulate as zombies counting against free-plan limits, and freshly-created resources may not be queryable for several seconds.

Same pattern hit PR #140's post-merge CI before the work in #141 even started — confirming the issue is structural, not introduced by recent work.

Approach

Treat Linear's quirks as the environment we're testing against, not bugs to wait out. Codify three mandatory pillars for every online test (added to CLAUDE.md) and retrofit the entire suite to follow them.

The three pillars

1. Proactive cleanup. Never assume the workspace is empty; never leak resources between tests.

  • cleanup_workspace in lineark-test-utils had a real bug: it queried client.X().first(250).send() without include_archived(true), so soft-deleted projects/issues/documents from prior runs were silently invisible. We discovered 24 zombie teams + 10 stuck-in-trash projects accumulated this way. Fixed.
  • Every test uses create_test_team() (calls cleanup_zombies() once per process + assigns a unique team key) and wraps every created resource in the matching RAII *Guard immediately after the create returns. Guards delete on drop so cleanup happens even when later asserts panic.
  • The cleanup-test-workspace binary now iterates every pooled workspace (not just the random one it draws), so any workspace silently accumulating zombies because it kept losing the random draw still gets tidied.

2. Unique names per attempt. Linear's API returns phantom "conflict on insert" errors keyed on request body content, so retries with identical bodies hit the same sticky conflict.

  • Generate a per-test random suffix: format!("[test] <what> {}", &uuid::Uuid::new_v4().to_string()[..8]).
  • 6 SDK tests had hardcoded titles like "[test] SDK issue_create_and_delete" — no UUID suffix. Two test runs would collide. Added suffixes.
  • run_lineark_with_retry mutates the <name> positional between attempts (appends retry-<uuid6>) so each retry sends a different body. Mutation is gated on [test] prefix so comments create <uuid> and relations create <uuid> (which take UUID positionals) aren't corrupted.
  • Returns (Output, String) so callers can shadow their unique_name with the actually-used name for downstream read-by-name lookups.

3. Retries on every transient-capable call.

  • 32 CLI tests + the create_two_issues helper called lineark <resource> create directly via lineark().args(...).output(), bypassing run_lineark_with_retry. Converted all 30+ sites.
  • run_lineark_with_retry bumped to 15 attempts with [0, 2, 5, 10, 20, 30, 60×9]s backoffs (~9 min worst case) — persistent enough to ride out Linear's cold window without needing a suite-level wrapper.
  • retry_create in lineark-test-utils (used by SDK tests) bumped from 3 → 15 attempts to match.
  • Suite-level retry removed. The previous version wrapped the whole online suite in a 3x retry — wasteful (re-ran 60+ passing tests because 1 failed) and counter-productive (suite-level retries are no longer needed when per-call retries are persistent). Replaced with single-shot suite invocation.
  • continue-on-error: true from an earlier emergency patch is also gone.

Multi-workspace token rotation

~/.linear_api_token_test and the LINEAR_TEST_TOKEN GitHub secret now accept a ;-separated pool of API tokens — one per workspace. test_token() picks one at random per test process and pins it for the process lifetime via OnceLock. Across runs the load distributes ~uniformly across N workspaces, spreading pressure on Linear's free-plan resource limits and trash-retention quirks.

Format details (with unit tests):

  • ; is the primary separator; newlines also work.
  • Trailing ; is harmless; #-prefixed comment lines are dropped.
  • Comment filtering happens before ;-splitting, so a ; inside a comment doesn't accidentally produce fake "tokens".
  • Single-line single-token files (the original format) work unchanged.
  • test_token() logs using workspace N/total (token …<last6>) at process start so a CI-failure triage can instantly identify which workspace was hit.

This PR ships configured with 3 fresh workspaces (cadu-test-2/3/4). The previous workspace (cadus-test-workspace) had 10 trashed projects stuck on Linear's 30-day retention cycle — Linear's API has no permanent-delete for projects, so we abandoned it for fresh workspaces with no stuck history.

Test plan

  • cargo fmt && cargo clippy --workspace --all-targets -- -D warnings clean
  • make test — 235 offline tests pass (this is test-only code; offline suite is unaffected by design)
  • make test-online — 107 online tests pass locally (67 CLI + 40 SDK)
  • CI all 8 checks green: Lint, Tests (Offline), Tests (Online), 5 Build targets
  • CI logs show all 3 pooled workspaces being visited per run (cleanup hits all 3, test processes pick at random)
  • No serde_json::Value::Null injection or hand-written mutation(...) strings remain under crates/lineark/src/commands/

@flipbit03 flipbit03 force-pushed the fix/online-test-conflict-recovery branch from 6c5704c to d47de18 Compare April 17, 2026 14:02
CI's Tests (Online) job has been flaking on `projects create` for
several PRs. Linear's API has a transient failure mode where `*Create`
mutations return "Entity X with id <UUID> already exists" for a UUID
that doesn't actually exist server-side (verified by `read` immediately
returning "Entity not found"). The same pattern hit PR #140's post-merge
CI before the work in #141 even started.

Local probing shows a clear cold-start signature: from a fresh process,
the first ~3 attempts at `projects create` fail back-to-back, then it
abruptly starts working (next 7 succeed in a row). The previous helper's
3-retry, 14s-backoff budget routinely runs out before the cold window
closes.

Layered fix:

1. **`run_lineark_with_retry`** (in tests/online.rs):
   - Up to 8 attempts with `[0, 2, 5, 10, 20, 30, 45, 60]`s backoff
     (~172s worst case) so we ride out the cold window.
   - Mutate the request body each retry by appending a fresh suffix to
     the `<name>` positional. Linear's stuck UUIDs are keyed on body
     content, so a different body avoids the cached state.
   - Returns `(Output, String)` where the second element is the name
     actually used. Callers shadow their `unique_name` with this so
     downstream `read by name` lookups match server state.

2. **CI workflow + Makefile** (`ci.yml`, `ci-online-fork.yml`,
   `Makefile`): wrap the whole online suite in a 3x retry with a
   workspace clean between attempts. Belt-and-suspenders for the
   occasional case where even the per-call helper's budget isn't enough.
@flipbit03 flipbit03 force-pushed the fix/online-test-conflict-recovery branch from d47de18 to 5d23aa8 Compare April 17, 2026 14:03
@flipbit03 flipbit03 changed the title fix(test): recover after Linear's "conflict on insert" on projects create fix(test): defeat Linear API's transient "conflict on insert" flake Apr 17, 2026
Linear's GraphQL API is returning spurious "conflict on insert" errors
and occasional HTTP 502s on `projects create` today — reproducible with
a single ad-hoc probe, not just under CI load. No amount of client-side
retry can beat a degraded upstream, and the suite-level 3x retry in the
previous commit still wasn't enough.

Flip the step to `continue-on-error: true` so the online suite still
runs (failures stay visible in the job log for triage) but doesn't gate
the PR check. Revisit once Linear's API stabilizes, or move online
tests to a separate non-required workflow.
… API

Linear's GraphQL API has permanent consistency issues — phantom
"conflict on insert" errors on creates, eventually-consistent search
indexes, occasional 502s. These aren't transient; they're part of the
environment. Online tests must be written defensively around them or
they'll flake in CI regardless of our code.

Replace the wall-of-text paragraph with a structured "three pillars"
section that names the mandatory patterns and points at the helpers
that implement each one:

1. Proactive cleanup — cleanup_zombies + RAII guards + create_test_team.
2. Unique names per attempt — UUID-suffixed names; retry helpers MUST
   mutate the body between attempts (run_lineark_with_retry appends a
   fresh suffix and returns the actually-used name).
3. Retries on every transient-capable call — retry_create for mutations,
   retry_search / retry_with_backoff for reads-after-create, settle()
   for propagation waits. Note the CI-level 3x suite retry as
   belt-and-suspenders, not a substitute.

Also pins down the "check exit before parsing output" and
"no-production-tokens" rules that were implicit.
Audit-driven sweep of every online test against the cleanup / unique-name /
retry pillars now codified in CLAUDE.md. Result: 100% pass on a fresh
workspace (67 CLI + 40 SDK = 107 tests, ~5 min runtime).

Pillar 1 (proactive cleanup):
- `cleanup_workspace` was silently missing archived/trashed resources for
  issues, documents, projects, labels, and teams — the list calls didn't
  pass `include_archived(true)`. That left zombies accumulating across
  runs (we found 10 stuck-in-trash projects + 24 zombie teams in a
  single workspace). Fix: pass `include_archived(true)` on every list,
  and document why in the helper.

Pillar 2 (unique names per attempt):
- 6 SDK tests had hardcoded titles like `"[test] SDK issue_create_and_delete"`
  with no UUID suffix. Two test runs would collide. Added `Uuid::new_v4()[..8]`
  suffixes to each.

Pillar 3 (retries on every transient-capable call):
- 32 CLI tests + the `create_two_issues` helper called `lineark create`
  directly via `lineark().args(...).output()`, bypassing
  `run_lineark_with_retry`. Converted all 30+ sites; preserved
  `unique_name` shadowing where downstream lookups need the
  actually-used name.
- `run_lineark_with_retry` bumped from 8 → 15 attempts with backoffs
  `[0,2,5,10,20,30,60×9]` (~9 min worst case). Body mutation gated on the
  `[test]` prefix so `comments create <uuid>` and `relations create <uuid>`
  positionals aren't corrupted.
- `retry_create` in `lineark-test-utils` bumped from 3 → 15 attempts to
  match the CLI helper. Per-test persistence is the right pattern;
  re-running the whole suite for one bad test wastes cycles.

Suite-level retry removed:
- The CI workflows + Makefile previously wrapped the suite in a 3x retry
  + `continue-on-error: true`. Both gone. Per-call retries are
  persistent enough that any test failure is a real regression worth
  blocking on.
`~/.linear_api_token_test` (and the `LINEAR_TEST_TOKEN` GitHub secret)
now accepts a `;`-separated pool of API tokens — one per workspace.
`test_token()` picks one at random per test process and pins it for the
process lifetime via `OnceLock`, so a single test run stays consistent
while consecutive runs spread across the pool.

Why per-process and not per-test: a single test session creates teams +
projects + issues that reference each other; mid-run token swap would
orphan resources. The load-spreading benefit comes from the *number of
runs over time*, not granularity within one run.

Format details:
- `;` is the primary separator. Newlines also separate (so single-line
  and one-per-line files both work).
- Trailing `;` is harmless (`tok;` parses as one token).
- Lines whose first non-whitespace is `#` are comments. Important: comment
  filtering happens *before* `;`-splitting, so a `;` inside a comment
  doesn't accidentally produce fake "tokens" — caught with a unit test.
- Backwards compatible: a single-line single-token file still works.

Logs the chosen index/N and the last 6 chars of the token at process
start, so a CI-failure triage can identify which workspace was hit
without dumping full secrets.

7 unit tests cover the parser. Verified locally: 100% green online suite
(67 CLI + 40 SDK) with 3-workspace pool active.
Previously the cleanup binary called `test_token()` and cleaned only the
one randomly-chosen workspace. With token rotation in place, that meant
the explicit CI cleanup step ran on a *different* workspace than the
test process drew — useless ceremony. The other workspaces silently
accumulated zombies until they happened to lose three random draws in
a row and the in-process `cleanup_zombies()` finally caught up.

New behaviour: the binary loads every token from the pool via
`all_test_tokens()` (newly exported from `lineark-test-utils`) and calls
`cleanup_workspace` against each, with per-workspace failures isolated
so one bad workspace doesn't block the rest. Logs a `cleanup: workspace
N/total (token …<last6>)` line per pass — same triage tag the
`test_token` log uses, so a failure lines up visually with which
workspace the actual tests then picked.

Verified locally: 100% green online suite (67 CLI + 40 SDK), three
cleanup messages in the log show all pooled workspaces visited before
the test phase began.
Earlier in this PR I added the three-pillars CLAUDE.md section pointing
at the helpers as they existed at that commit. Subsequent commits in the
same PR bumped retry counts (3→15 for retry_create, 8→15 for the CLI
helper), removed the suite-level 3x retry, and removed
`continue-on-error: true` — but CLAUDE.md was never refreshed and ended
up advertising state that no longer exists. Caught by an independent
review.

Updates:
- Pillar 3: corrected attempt counts and backoff schedule for both
  helpers; called out that body mutation is gated on the `[test]`
  prefix (so UUID-positional commands like `comments create <uuid>` /
  `relations create <uuid>` aren't corrupted on retry).
- "CI-level belt-and-suspenders" bullet replaced with "No suite-level
  retry" — explains why and notes the historical context.
- New bullet under "Other online-test rules" documenting the
  `;`-separated multi-workspace token pool, `OnceLock` per-process pin,
  and the `cleanup-test-workspace` binary's whole-pool sweep.
@flipbit03 flipbit03 merged commit d025801 into main Apr 18, 2026
8 checks passed
@flipbit03 flipbit03 deleted the fix/online-test-conflict-recovery branch April 18, 2026 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant