Summary
Flaky test failures observed in committed-code CI builds (Timer and Post Merge Action) during the 24-hour window ending 2026-05-10T10:00Z. None of the failures that could be run locally reproduced with the original seed (the BWC test was skipped), which is consistent with timing/environment-dependent flakes rather than deterministic failures.
Failing Tests
| # | Test | Build | Seed | Reproduced Locally | First Seen | Total Builds Affected | Trend |
|---|---|---|---|---|---|---|---|
| 1 | MixedClusterClientYamlTestSuiteIT (310_match_bool_prefix) | 76388 | 50E3E31F4E9E08C2 | Skipped (BWC test) | 2024-03-25 | 463 | Stable/chronic (~5-18 builds/month) |
| 2 | RemoteRestoreSnapshotIT.testClusterManagerFailoverDuringSnapshotCreation | 76405 | 51E20D91F32548B6 | No | 2024-09-02 | 215 | Stable (~6-17 builds/month) |
| 3 | RemoteRestoreSnapshotIT.classMethod (suite timeout) | 76405 | 51E20D91F32548B6 | No | 2024-08-30 | 135 | Stable (~5-16 builds/month) |
| 4 | RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes | 76419 | 3B49CE0DA9CDBF7C | No | 2024-04-03 | 108 | Worsening (0/month → 10-17/month since Feb 2026) |
| 5 | LangPainlessClientYamlTestSuiteIT (derived_fields search definition) | 76396 | 320563A8B9CEAE61 | No | 2024-05-15 | 63 | Stable/intermittent (~1-6 builds/month) |
| 6 | ReindexIT.testReindexTask | 76394, 76398 | CC8FB0BBA1B722D9, 703D20324B41038C | No | 2024-06-18 | 25 | Stable/low (~1-3 builds/month) |
| 7 | WarmIndexSegmentReplicationIT.testPrimaryReceivesDocsDuringReplicaRecovery | 76394 | CC8FB0BBA1B722D9 | No | 2025-03-17 | 11 | Slightly worsening (3 in Apr 2026) |
| 8 | IndicesRequestCacheCleanupIT.testDynamicStalenessThresholdUpdate | 76398 | 703D20324B41038C | No | 2024-08-08 | 10 | Stable/low (~1 build/month) |
Failure Details
MixedClusterClientYamlTestSuiteIT (310_match_bool_prefix)
- Error: `hits.hits.0._id: expected String [4] but was String [1]` (document ordering mismatch in multi_match bool prefix queries)
- Pattern: Chronic flake since March 2024 with 463 affected builds. Peaked at 185 builds in Sep 2024, otherwise steady at 5-18/month. This is a BWC test requiring a mixed-version cluster so it cannot be reproduced with a simple local gradle command.
RemoteRestoreSnapshotIT.testClusterManagerFailoverDuringSnapshotCreation
- Error: `Test abandoned because suite timeout was reached (>= 1200000 msec)`
- Pattern: Consistent flake since Sep 2024. The test exercises cluster manager failover during snapshot creation, which is inherently timing-sensitive. The `classMethod` failure is a consequence of the same suite timeout.
RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes
- Error: `replica shards haven't caught up with primary expected:<27> but was:<24>`
- Pattern: Was dormant from Jun 2024 through Jan 2025, then resurfaced in Feb 2026 and has since occurred far more often (4, 17, 13, and 10 builds/month for Feb through May 2026; May is a partial month). The rise overlaps with the mid-April 2026 CI runner migration to m7a.8xlarge, where faster CPUs may be amplifying a latent race in segment replication recovery. However, the monthly counts show the increase starting in Feb 2026, so the migration may be a contributing factor rather than the original trigger.
LangPainlessClientYamlTestSuiteIT (derived_fields)
- Error: `hits.total: expected Integer [4] but was Integer [3]` (missing document in derived field search results)
- Pattern: Intermittent since May 2024 with gaps of several months between occurrences. Low overall impact.
ReindexIT.testReindexTask
- Error: `java.lang.AssertionError` (assertTrue failure in task completion check)
- Pattern: Low-frequency chronic flake since Jun 2024. Appeared in 2 separate builds in the same 24h window, suggesting a possible environmental trigger.
WarmIndexSegmentReplicationIT.testPrimaryReceivesDocsDuringReplicaRecovery
- Error: `CorruptIndexException` during refresh on replica (segment replication race during recovery)
- Pattern: Relatively new (Mar 2025), low frequency but slightly increasing in recent months.
IndicesRequestCacheCleanupIT.testDynamicStalenessThresholdUpdate
- Error: `expected:<1> but was:<2>` (cache entry count mismatch after staleness threshold update)
- Pattern: Very low frequency (10 builds total over 22 months). Rare but persistent.
Methodology
- Queried the `gradle-check-*` index in the OpenSearch metrics cluster for failures in Timer/main and Post Merge Action builds within the past 24 hours
- Aggregated historical failures by month across all build types (including PR builds) using a cardinality aggregation on `build_number`
- Extracted seeds from Jenkins test report stack traces
- Attempted local reproduction using `./gradlew :<module>:<testTask> --tests "<class>.<method>" -Dtests.seed=<SEED>`
- None reproduced — consistent with timing-dependent flakes where the seed controls randomization but not thread scheduling or I/O timing
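
The aggregation and reproduction steps above can be sketched roughly as follows. This is a minimal illustration, not the exact query or script used for this report: `build_number` comes from the methodology, but the other field names (`test_status`, `test_class`, `test_name`, `build_start_time`) and the `FAILED` value are assumptions about the index mapping, and the module name in the usage note is a placeholder.

```python
import shlex
from typing import Optional


def monthly_failure_query(test_class: str, test_name: Optional[str] = None) -> dict:
    """Build a search body that counts distinct failing builds per calendar month."""
    filters = [
        {"term": {"test_status": "FAILED"}},   # assumed field name and value
        {"term": {"test_class": test_class}},  # assumed field name
    ]
    if test_name is not None:
        filters.append({"term": {"test_name": test_name}})  # assumed field name
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "by_month": {
                "date_histogram": {
                    "field": "build_start_time",  # assumed timestamp field
                    "calendar_interval": "month",
                },
                "aggs": {
                    # Distinct builds, so repeated failures in one build count once.
                    "affected_builds": {"cardinality": {"field": "build_number"}}
                },
            }
        },
    }


def repro_command(module: str, task: str, test: str, seed: str) -> str:
    """Assemble the local reproduction command from the methodology as one string."""
    argv = ["./gradlew", f":{module}:{task}", "--tests", test, f"-Dtests.seed={seed}"]
    return shlex.join(argv)  # quotes arguments only where the shell requires it
```

For example, `monthly_failure_query("RecoveryWhileUnderLoadIT")` would be sent as the body of a search against `gradle-check-*`, and `repro_command("<module>", "<testTask>", "ReindexIT.testReindexTask", "CC8FB0BBA1B722D9")` yields the command line once the placeholders are replaced with the real module and task. Note that the cardinality aggregation is approximate, so monthly counts for high-volume tests may differ slightly from exact distinct counts.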
Recommendations
- RecoveryWhileUnderLoadIT deserves priority attention given its worsening trend correlating with the CI runner migration
- RemoteRestoreSnapshotIT and MixedClusterClientYamlTestSuiteIT are high-volume chronic flakes that contribute significant CI noise
- WarmIndexSegmentReplicationIT is worth watching as a potentially worsening trend