Flaky test report: committed-code failures on 2026-05-06 #257

@andrross

Tests that failed against committed code (Timer or Post Merge Action builds against main) in the 24 hours ending 2026-05-06T10:00 UTC. All failures were non-reproducible locally with the original seed, consistent with timing-dependent flakiness.

Summary Table (sorted by total builds affected)

| # | Test | Builds Affected | First Seen | Recent Build | Reproduced? | Pattern |
|---|------|-----------------|------------|--------------|-------------|---------|
| 1 | IndexActionIT.testAutoGenerateIdNoDuplicates | 254 | 2024-03-26 | 75923 | No | Chronic, worsening |
| 2 | SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase | 180 | 2024-04-04 | 75870 | No | Chronic, worsening |
| 3 | RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest | 133 | 2024-05-03 | 75883 | No | Chronic, worsening |
| 4 | FullRollingRestartIT.testFullRollingRestart_withNoRecoveryPayloadAndSource | 116 | 2024-10-11 | 75870 | No | Chronic, worsening |
| 5 | RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes | 103 | 2024-04-03 | 75942 | No | Chronic, worsening |
| 6 | RareClusterStateIT.testDisassociateNodesWhileShardInit | 33 | 2024-11-04 | 75923 | No | Chronic, spike in Apr 2026 |
| 7 | SmokeTestMultiNodeClientYamlTestSuiteIT (20_terms/numeric profiler) | 33 | 2025-04-25 | 75870 | No | Chronic, stable |
| 8 | CloneSnapshotIT.testCloneShallowSnapshotIndex | 31 | 2024-04-02 | 75929 | No | Chronic, low-rate |
| 9 | CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled | 28 | 2024-04-11 | 75929 | No | Chronic, low-rate |
| 10 | EhcacheDiskCacheManagerTests.testCreateAndCloseCacheConcurrently | 27 | 2025-07-30 | 75902 | No | Worsening since Apr 2026 |

Detailed Findings

1. IndexActionIT.testAutoGenerateIdNoDuplicates (SEGMENT replication)

  • Build: 75923, 75870
  • Seed: 15B7E75E9FF97558, 45E706BEDD6C2A23
  • Reproduced locally: No
  • First seen: 2024-03-26
  • Total builds affected: 254
  • Pattern: Chronic flake present since early 2024. Significant worsening in 2026: 11 builds in Feb, 19 in Mar, 21 in Apr, 14 in May (first 6 days). The April 2026 CI runner migration to m7a.8xlarge likely amplified this timing-sensitive test.
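The May figure covers only six days, so it is not directly comparable to the full-month counts. A minimal sketch of normalizing the counts above to builds per day follows; the class and method names are hypothetical, and the calculation assumes CI build volume per day is roughly constant, which the report does not state.

```java
import java.time.YearMonth;
import java.util.Locale;

/** Normalizes monthly failure counts to builds per day so partial months compare fairly. */
final class FlakeRate {

    /** Affected builds per day over the observed part of a month. */
    static double perDay(int buildsAffected, int daysObserved) {
        if (daysObserved <= 0) {
            throw new IllegalArgumentException("daysObserved must be positive");
        }
        return (double) buildsAffected / daysObserved;
    }

    public static void main(String[] args) {
        // Counts taken from the pattern note for test 1 (all in 2026).
        double feb = perDay(11, YearMonth.of(2026, 2).lengthOfMonth()); // 28 days
        double mar = perDay(19, YearMonth.of(2026, 3).lengthOfMonth()); // 31 days
        double apr = perDay(21, YearMonth.of(2026, 4).lengthOfMonth()); // 30 days
        double may = perDay(14, 6); // only the first 6 days are in the data

        // Prints: Feb 0.39, Mar 0.61, Apr 0.70, May 2.33 builds/day
        System.out.printf(Locale.ROOT,
                "Feb %.2f, Mar %.2f, Apr %.2f, May %.2f builds/day%n", feb, mar, apr, may);
    }
}
```

On this basis the early-May rate is roughly three times April's, although six days is a small sample.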

2. SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase

  • Build: 75870
  • Seed: 45E706BEDD6C2A23
  • Reproduced locally: No
  • First seen: 2024-04-04
  • Total builds affected: 180
  • Pattern: Long-standing flake. Notable spike in Apr 2026 (16 builds) and May 2026 (8 builds in 6 days). Likely CPU-speed amplified.

3. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest (SEGMENT)

  • Build: 75883
  • Seed: A8FAB62DF9249109
  • Reproduced locally: No
  • First seen: 2024-05-03
  • Total builds affected: 133
  • Pattern: Chronic. Quiet from Oct 2025 to Jan 2026, then resurgence: 8 in Feb, 11 in Mar, 13 in Apr, 3 in May. Consistent with environmental sensitivity.

4. FullRollingRestartIT.testFullRollingRestart_withNoRecoveryPayloadAndSource (SEGMENT)

  • Build: 75870
  • Seed: 45E706BEDD6C2A23
  • Reproduced locally: No
  • First seen: 2024-10-11
  • Total builds affected: 116
  • Pattern: Large spike in Aug 2025 (24 builds), quiet Sep-Jan, then resurgence Feb-May 2026. Timing-sensitive.

5. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes (SEGMENT)

  • Build: 75942
  • Seed: E5DFB55DF375D11E
  • Reproduced locally: No
  • First seen: 2024-04-03
  • Total builds affected: 103
  • Pattern: Similar to sibling test above. Spike in Mar 2026 (17 builds), sustained in Apr (13) and May (5 in 6 days).

6. RareClusterStateIT.testDisassociateNodesWhileShardInit

  • Build: 75923, 75872
  • Seed: 15B7E75E9FF97558, BCE18225FF96300E
  • Reproduced locally: No
  • First seen: 2024-11-04
  • Total builds affected: 33
  • Pattern: Low-rate chronic flake that spiked dramatically in Apr 2026 (12 builds vs 3-4/month prior). Strong candidate for CPU-speed amplification.

7. SmokeTestMultiNodeClientYamlTestSuiteIT (search.aggregation/20_terms/numeric profiler)

  • Build: 75870
  • Seed: 45E706BEDD6C2A23
  • Reproduced locally: No
  • First seen: 2025-04-25
  • Total builds affected: 33
  • Pattern: Relatively stable at 1-5 builds/month since inception. No clear worsening trend.

8. CloneSnapshotIT.testCloneShallowSnapshotIndex

  • Build: 75929
  • Seed: F850481D7A3BDBD2
  • Reproduced locally: No
  • First seen: 2024-04-02
  • Total builds affected: 31
  • Pattern: Low-rate chronic flake, 1-2 builds/month. No significant worsening.

9. CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled

  • Build: 75929
  • Seed: F850481D7A3BDBD2
  • Reproduced locally: No
  • First seen: 2024-04-11
  • Total builds affected: 28
  • Pattern: Low-rate chronic flake, similar to sibling test. Both failed in the same build (75929), suggesting a shared environmental trigger.

10. EhcacheDiskCacheManagerTests.testCreateAndCloseCacheConcurrently

  • Build: 75902
  • Seed: 1F25B4F7632D6D6C (failure was suite timeout, not assertion)
  • Reproduced locally: No
  • First seen: 2025-07-30
  • Total builds affected: 27
  • Pattern: Worsening since Apr 2026 (7 builds), continuing in May 2026 (6 builds in 6 days). The failure mode is a suite timeout (>1,200,000 ms, i.e. 20 minutes), not an assertion failure. This suggests the test hangs intermittently rather than producing a wrong result.
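The difference between the two failure modes can be sketched in plain Java. This is an illustration only, not the test framework's actual timeout mechanism: `FailureMode`, `classify`, and `hangingBody` are hypothetical names, and the 200 ms limit stands in for the real suite timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FailureMode {

    enum Outcome { PASSED, FAILED, TIMEOUT }

    /** Runs a test body with a time limit and reports how it ended. */
    static Outcome classify(Runnable testBody, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "test-body");
            t.setDaemon(true); // a hung body must not keep the JVM alive
            return t;
        });
        Future<?> result = executor.submit(testBody);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.PASSED;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the body threw, e.g. an AssertionError
        } catch (TimeoutException e) {
            return Outcome.TIMEOUT; // the body never finished: a hang, not a wrong result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            result.cancel(true);
            executor.shutdownNow();
        }
    }

    /** A body that blocks on a latch nobody releases; stands in for a deadlock. */
    static Runnable hangingBody() {
        return () -> {
            try {
                new CountDownLatch(1).await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(classify(hangingBody(), 200)); // TIMEOUT
        System.out.println(classify(() -> { throw new AssertionError("expected 1, got 2"); }, 200)); // FAILED
    }
}
```

A timeout outcome carries no assertion message or failing value, which is why a thread dump taken at the moment of the hang is usually more informative than the seed for this kind of failure.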

Observations

  1. None of the 10 tests reproduced locally with the original seed. This is expected for timing-dependent flakes — the seed controls randomization but not thread scheduling, GC pauses, or network timing.

  2. The April 2026 CI runner migration (m5.8xlarge → m7a.8xlarge) correlates with worsening rates for tests #1-6 and #10. Faster CPUs compress timing windows, making races more likely to manifest.

  3. Four of the five highest-impact tests (#1, #3, #4, and #5) are integration tests that failed under the SEGMENT replication strategy. This parameterized variant appears disproportionately affected, suggesting that segment replication introduces additional timing sensitivity.

  4. EhcacheDiskCacheManagerTests (#10) has a distinct failure mode: a suite timeout rather than an assertion failure. This likely indicates a deadlock or resource exhaustion rather than a race that produces a wrong result.
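The first observation, that a seed pins down randomized inputs but not thread interleaving, can be illustrated with a small sketch. It uses `java.util.Random` as a stand-in for the test framework's randomization (an assumption; the framework's seed handling differs), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SeedVsScheduling {

    /** Same seed, same values: this part of a run is reproducible. */
    static List<Integer> seededValues(long seed, int count) {
        Random random = new Random(seed);
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            values.add(random.nextInt(100));
        }
        return values;
    }

    /** Two threads log events; the merged order is decided by the scheduler, not by any seed. */
    static List<String> interleaving(int eventsPerThread) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Thread a = new Thread(() -> {
            for (int i = 0; i < eventsPerThread; i++) log.add("A" + i);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < eventsPerThread; i++) log.add("B" + i);
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        long seed = Long.parseUnsignedLong("15B7E75E9FF97558", 16); // seed reported for build 75923
        System.out.println(seededValues(seed, 5).equals(seededValues(seed, 5))); // true
        System.out.println(interleaving(3)); // relative order of A* and B* entries can differ between runs
    }
}
```

Rerunning with the original seed therefore replays the same inputs, but whether the failing interleaving occurs again depends on the machine and its load.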

Data Source

Historical failure data from https://metrics.opensearch.org/_dashboards (index pattern: gradle-check-*). Includes all build types (Timer, Post Merge Action, Pull Request) for flake rate assessment.
