Flaky test report: committed-code failures on 2026-05-07 #258

@andrross

Summary

Analysis of gradle-check failures against committed code (Timer and Post Merge Action builds) in the past 24 hours. 6 distinct tests failed across 5 builds.

Summary Table

| # | Test | Builds Affected (All-Time) | First Seen | Pattern | Build |
|---|------|----------------------------|------------|---------|-------|
| 1 | MixedClusterClientYamlTestSuiteIT 310_match_bool_prefix/multi_match multiple fields partial term | 365 | 2024-03-25 | Stable/chronic | 76064 |
| 2 | MixedClusterClientYamlTestSuiteIT 310_match_bool_prefix/multi_match multiple fields complete term | 352 | 2024-03-25 | Stable/chronic | 76064 |
| 3 | IngestFromKinesisIT testKinesisIngestion | 255 | 2025-03-24 | Worsening (spike Mar 2026) | 76083 |
| 4 | MixedClusterClientYamlTestSuiteIT cluster.health/10_basic/cluster health with closed index | 202 | 2024-03-25 | Stable/chronic | 76076 |
| 5 | FlightMetricsTests testComprehensiveMetrics | 71 | 2025-07-25 | Stable (~6/month) | 76138 |
| 6 | EhcacheDiskCacheManagerTests testCreateAndCloseCacheConcurrently | 29 | 2025-03-05 | Worsening (Apr-May 2026) | 76071 |

Detailed Findings

1. MixedClusterClientYamlTestSuiteIT - 310_match_bool_prefix/multi_match multiple fields partial term

  • Build: 76064 (Post Merge Action)
  • Seed: AD6EEB0DC58E72AE
  • Error: hits.hits.0._id: expected String [4] but was String [1]
  • Reproduced locally: N/A — BWC test requires multi-version cluster infrastructure
  • First seen: 2024-03-25
  • Total unique builds affected: 365
  • Pattern: Chronic flake. Major spike in Sep 2024 (137 builds), otherwise steady at 3-16 builds/month. Stable and ongoing — no improvement or worsening trend in recent months.

2. MixedClusterClientYamlTestSuiteIT - 310_match_bool_prefix/multi_match multiple fields complete term

  • Build: 76064 (Post Merge Action)
  • Seed: AD6EEB0DC58E72AE
  • Error: hits.hits.0._id: expected String [4] but was String [1]
  • Reproduced locally: N/A — BWC test requires multi-version cluster infrastructure
  • First seen: 2024-03-25
  • Total unique builds affected: 352
  • Pattern: Nearly identical to the partial term variant. Chronic flake with same spike in Sep 2024 (133 builds). These two tests always fail together in the same builds.

3. IngestFromKinesisIT - testKinesisIngestion

  • Build: 76083 (Post Merge Action)
  • Seed: E0E189648EF687DD
  • Error: ResourceInUseException: Stream test already exists — test cleanup/setup race condition with the embedded Kinesis mock
  • Reproduced locally: No — passed with seed E0E189648EF687DD
  • First seen: 2025-03-24
  • Total unique builds affected: 255
  • Pattern: Worsening. Large spikes in Mar 2025 (51 builds), Sep 2025 (111 builds), and Mar 2026 (53 builds). The error indicates a resource cleanup race that is timing-dependent and not seed-reproducible.
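
A "stream already exists" failure of this kind usually means two setups collided on one fixed resource name. The sketch below shows one common mitigation, a unique name per test run. It assumes the stream name is chosen by the test; `UniqueStreamNames` is a hypothetical helper and not part of IngestFromKinesisIT.

```java
import java.util.UUID;

// Hypothetical helper (not from the OpenSearch code base): gives each test run
// its own stream name so a leftover "test" stream from an earlier run cannot collide.
final class UniqueStreamNames {
    private UniqueStreamNames() {}

    // Returns the prefix plus the first 8 hex characters of a random UUID,
    // for example "test-3f2a9c1e".
    static String next(String prefix) {
        String suffix = UUID.randomUUID().toString().substring(0, 8);
        return prefix + "-" + suffix;
    }

    public static void main(String[] args) {
        System.out.println(next("test"));
    }
}
```

Unique names only sidestep the collision. If the underlying problem is that cleanup does not wait for deletion to finish, that would still need to be confirmed and fixed separately.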

4. MixedClusterClientYamlTestSuiteIT - cluster.health/10_basic/cluster health with closed index

  • Build: 76076 (Timer, main)
  • Seed: 467A7A407AF287D8
  • Error: expected [2xx] status code but api [cluster.health] returned [408 Request Timeout] — cluster health timed out with status:red and 51 unassigned shards
  • Reproduced locally: N/A — BWC test requires multi-version cluster infrastructure
  • First seen: 2024-03-25
  • Total unique builds affected: 202
  • Pattern: Chronic flake. Spike in Sep 2024 (58 builds), otherwise 1-14 builds/month. Uptick in Apr 2026 (10 builds) may correlate with the m7a.8xlarge runner migration (faster CPUs can change cluster formation timing).

5. FlightMetricsTests - testComprehensiveMetrics

  • Build: 76138 (Post Merge Action)
  • Seed: 2E69A50FFD89D2D0
  • Error: BindTransportException: Failed to bind to [/0:0:0:0:0:0:0:1%lo, /127.0.0.1]:PortsRange{portRange='25401'} — port conflict
  • Reproduced locally: No — passed with seed 2E69A50FFD89D2D0
  • First seen: 2025-07-25
  • Total unique builds affected: 71
  • Pattern: Stable at ~5-11 builds/month since inception. Port binding failures are inherently environment-dependent and not seed-reproducible. Slight uptick in Apr 2026 (11 builds).
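
The bind failure above involves a single fixed port (25401). The sketch below uses only the standard `java.net.ServerSocket` API to contrast an OS-assigned ephemeral port with a fixed port that is already taken. `EphemeralPortSketch` is a hypothetical class for illustration; whether FlightMetricsTests can use port 0 is an assumption not confirmed by this report.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration, not taken from FlightMetricsTests.
final class EphemeralPortSketch {
    private EphemeralPortSketch() {}

    // Port 0 asks the OS for any currently free port, so no fixed port can conflict.
    static int bindEphemeral() {
        try (ServerSocket socket = new ServerSocket(0, 0, InetAddress.getLoopbackAddress())) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Binding a second listener to a port that is already in use fails with
    // BindException, the same class of error as the CI failure.
    static boolean fixedPortConflicts() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket(0, 0, loopback)) {
            try (ServerSocket second = new ServerSocket(first.getLocalPort(), 0, loopback)) {
                return false; // unexpected: the port was free for a second listener
            } catch (BindException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ephemeral port: " + bindEphemeral());
        System.out.println("fixed port conflicts: " + fixedPortConflicts());
    }
}
```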

6. EhcacheDiskCacheManagerTests - testCreateAndCloseCacheConcurrently

  • Build: 76071 (Post Merge Action)
  • Seed: DF0F4253E4345108
  • Error: Suite timeout exceeded (>= 1200000 msec) — test hung until the 20-minute suite timeout
  • Reproduced locally: No — passed with seed DF0F4253E4345108
  • First seen: 2025-03-05
  • Total unique builds affected: 29
  • Pattern: Worsening. Was dormant Feb-Mar 2026, then 7 builds affected in Apr 2026 and another 7 in May 2026 (only 7 days into the month). The timeout suggests a deadlock or livelock in concurrent cache creation/close that manifests under specific thread scheduling, which is consistent with the m7a.8xlarge runner migration amplifying latent races.
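
A hang like this is easier to diagnose if the concurrent section fails after a short bound instead of running into the 20-minute suite timeout. The sketch below shows the general pattern with `java.util.concurrent` only. `BoundedConcurrencyCheck` and its parameters are hypothetical and do not reflect how EhcacheDiskCacheManagerTests is actually written.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a task on several threads at once and reports
// whether all of them finished within a bounded wait.
final class BoundedConcurrencyCheck {
    private BoundedConcurrencyCheck() {}

    static boolean finishesWithin(Runnable task, int threads, long timeout, TimeUnit unit) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);       // releases all workers together
        CountDownLatch done = new CountDownLatch(threads);  // counts finished workers
        try {
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> {
                    try {
                        start.await();
                        task.run();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            start.countDown();
            // Returns false if any worker is still running when the timeout expires.
            return done.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupts stuck workers; tasks that ignore interrupts will keep running.
            pool.shutdownNow();
        }
    }
}
```

A caller would typically assert on the result and capture a thread dump when it is `false`, which points at the blocked threads directly.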

Reproduction Summary

| Test | Seed | Reproduced? |
|------|------|-------------|
| MixedClusterClientYamlTestSuiteIT (partial term) | AD6EEB0DC58E72AE | N/A (BWC infra required) |
| MixedClusterClientYamlTestSuiteIT (complete term) | AD6EEB0DC58E72AE | N/A (BWC infra required) |
| MixedClusterClientYamlTestSuiteIT (cluster.health) | 467A7A407AF287D8 | N/A (BWC infra required) |
| EhcacheDiskCacheManagerTests | DF0F4253E4345108 | No |
| IngestFromKinesisIT | E0E189648EF687DD | No |
| FlightMetricsTests | 2E69A50FFD89D2D0 | No |

None of the three locally runnable tests reproduced with their CI seeds, which is consistent with those failures being timing- or environment-dependent rather than seed-deterministic. The three BWC tests could not be checked locally.
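
For anyone repeating the local runs, the sketch below assembles a reproduction command from a test name and seed. It relies on Gradle's `--tests` filter and the `-Dtests.seed` system property used by the OpenSearch test framework. The Gradle task path is left as a placeholder because the report does not name the owning project, and `ReproLine` is a hypothetical helper.

```java
// Hypothetical helper: formats a seed-pinned Gradle invocation for one test method.
final class ReproLine {
    private ReproLine() {}

    // gradleTask is a placeholder such as ":<project>:test"; the report does not
    // say which Gradle project owns each test. The leading "*" in the filter
    // matches any package, so the fully qualified class name is not needed.
    static String build(String gradleTask, String testClass, String method, String seed) {
        return String.format(
            "./gradlew '%s' --tests \"*%s.%s\" -Dtests.seed=%s",
            gradleTask, testClass, method, seed);
    }

    public static void main(String[] args) {
        System.out.println(
            build(":<project>:test", "FlightMetricsTests", "testComprehensiveMetrics", "2E69A50FFD89D2D0"));
    }
}
```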

Notes

  • The EhcacheDiskCacheManagerTests.classMethod failure is a secondary artifact of the suite timeout caused by testCreateAndCloseCacheConcurrently hanging — it is not a separate flaky test.
  • The MixedClusterClientYamlTestSuiteIT 310_match_bool_prefix tests (partial and complete term) always fail together and share the same root cause (document scoring/ordering non-determinism in mixed-version clusters).
  • Data source: gradle-check-* index at metrics.opensearch.org, queried 2026-05-07.
