Overview
This report covers test failures observed in committed-code gradle-check runs (Timer and Post Merge Action builds against main) during the 24-hour window ending 2026-04-30. Nine distinct failing tests were identified across seven builds. Historical failure data is aggregated across all build types (including PR builds) to show the true flake rate.
None of the failures reproduced deterministically with the original seed on a local dev desktop, which is consistent with timing-sensitive flaky tests whose seeds control randomization but not thread scheduling, GC pauses, or network timing.
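To make that caveat concrete, the following minimal Java sketch (not OpenSearch code; the seed constant is borrowed from finding 1 purely for flavor) shows why a fixed seed pins randomized inputs but not outcomes that depend on thread scheduling:

```java
import java.util.Random;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SeedVsScheduling {
    public static void main(String[] args) throws InterruptedException {
        Random random = new Random(0x2259C86F44A4690DL); // fixed "suite seed"
        ExecutorService pool = Executors.newFixedThreadPool(4);
        ConcurrentLinkedQueue<Integer> completionOrder = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 8; i++) {
            int value = random.nextInt(100);               // identical sequence on every run
            pool.submit(() -> completionOrder.add(value)); // order decided by the scheduler
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // The set of values is reproducible from the seed; their completion
        // order is not, which is exactly the gap a scheduling race lives in.
        System.out.println(completionOrder);
    }
}
```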
Summary Table
Sorted by total unique builds affected (all-time, all build types):
| Test | Builds Affected (Total) | First Seen | Apr 2026 Builds | Trend | Reproduced Locally? |
| --- | --- | --- | --- | --- | --- |
| IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode | 257 | 2024-05 | 12 | ⚠️ Worsening (resurgence after long quiet period) | No |
| IndexActionIT.testAutoGenerateIdNoDuplicates | 240 | 2024-03 | 21 | ⚠️ Worsening (steady climb since 2025-04) | No |
| FullRollingRestartIT.testFullRollingRestart | 232 | 2024-10 | 21 | ⚠️ Worsening (burst pattern, active since 2026-02) | No |
| RemoteStoreStatsIT.testNonZeroPrimaryStatsOnNewlyCreatedIndexWithZeroDocs | 190 | 2024-03 | 7 | ➡️ Stable (chronic, low-level) | No |
| AzureBlobStoreRepositoryTests.testMultipleSnapshotAndRollback | 158 | 2024-03 | 4 | ➡️ Stable (chronic, low-level) | N/A (requires Docker fixture) |
| FullRollingRestartIT.testFullRollingRestart_withNoRecoveryPayloadAndSource | 113 | 2024-10 | 9 | ⚠️ Worsening (tracks testFullRollingRestart pattern) | No |
| EhcacheDiskCacheManagerTests.testCreateAndCloseCacheConcurrently | 21 | 2025-07 | 7 | ⚠️ Worsening (spike in Apr 2026) | No |
| WarmIndexSegmentReplicationIT.testRestartPrimary_NoReplicas | 21 | 2024-04 | 3 | ➡️ Stable (low-frequency, long-lived) | No |
| WarmIndexSegmentReplicationIT.testReplicaHasDiffFilesThanPrimary | 16 | 2024-07 | 3 | ➡️ Stable (low-frequency) | No |
Detailed Findings
1. IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode
- Recent build: #75416
- Error: Suite timeout reached
- Seed: 2259C86F44A4690D (suite-level only — individual test seed not available due to timeout)
- Local reproduction: Did not reproduce
- Pattern: Massive burst in May–Jun 2024 (120+119 builds), then nearly silent until Mar 2026 (3 builds) and a sharp resurgence in Apr 2026 (12 builds). The Apr 2026 uptick coincides with the CI runner migration to m7a.8xlarge — faster CPUs may be amplifying a latent timing issue. The failure mode (suite timeout) suggests the test or a co-located test in the suite hangs intermittently.
- Monthly: 2024-05:120, 2024-06:119, 2024-07:1, 2024-12:1, 2025-02:1, 2026-03:3, 2026-04:12
2. IndexActionIT.testAutoGenerateIdNoDuplicates
- Recent builds: #75437, #75422
- Error: Count mismatch — the actual document count does not match the expected count (e.g., "Count is 79 but 73 was expected")
- Seeds: DF7FD6BB7B77DE60:EC27A2EA45D12891, 29883286046413C0:1AD046D73AC2E531
- Local reproduction: Did not reproduce with either seed
- Pattern: Chronic flaky test present since Mar 2024. Significant escalation starting Apr 2025 (46 builds), and it has remained elevated since. The test runs with the cluster.indices.replication.strategy=SEGMENT parameterization. The count mismatch suggests a race between auto-generated ID indexing and the subsequent count verification under segment replication; a sketch of a polling-based check follows this finding.
- Monthly: 2024-03:1, 2024-04:1, 2024-06:3, 2024-07:3, 2024-09:3, 2024-10:10, 2024-11:4, 2024-12:2, 2025-01:6, 2025-02:4, 2025-03:7, 2025-04:46, 2025-05:12, 2025-06:11, 2025-07:20, 2025-08:13, 2025-09:9, 2025-10:9, 2025-11:8, 2025-12:10, 2026-01:7, 2026-02:11, 2026-03:19, 2026-04:21
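Where such a race is confirmed, the usual remedy is to poll rather than assert once. A hypothetical sketch, assuming the standard OpenSearchIntegTestCase helpers (assertBusy, client()); the method and index names are illustrative, not the actual test code:

```java
import java.util.concurrent.TimeUnit;

// Inside a subclass of OpenSearchIntegTestCase (illustrative only):
private void assertHitCountEventually(String index, long expected) throws Exception {
    // Retry until replicas catch up instead of asserting immediately;
    // under SEGMENT replication the visible count can briefly lag indexing.
    assertBusy(() -> {
        long actual = client().prepareSearch(index)
            .setSize(0)
            .setTrackTotalHits(true)
            .get()
            .getHits()
            .getTotalHits().value;
        assertEquals(expected, actual);
    }, 30, TimeUnit.SECONDS);
}
```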
3. FullRollingRestartIT.testFullRollingRestart
- Recent build: #75436
- Error: replica shards haven't caught up with primary expected:<24> but was:<19>
- Seed: A69557A7C9948C52:A7D259FE9CC06338
- Local reproduction: Did not reproduce
- Pattern: Burst pattern — isolated hit in Oct 2024, then a massive spike in Jul 2025 (105 builds), tapering through Aug 2025, quiet Sep 2025–Jan 2026, then a resurgence in Feb 2026 (35) continuing through Apr 2026 (21). Runs with cluster.indices.replication.strategy=SEGMENT. The assertion failure indicates replicas lag behind the primary during rolling restart, a timing-sensitive condition; a stats-based wait is sketched after this finding.
- Monthly: 2024-10:1, 2025-04:1, 2025-07:105, 2025-08:43, 2025-09:1, 2026-02:35, 2026-03:25, 2026-04:21
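The same polling idea applies to the rolling-restart lag, this time via shard stats. A hedged sketch, again assuming OpenSearchIntegTestCase helpers and a one-replica-per-shard layout; the helper name is hypothetical:

```java
import java.util.concurrent.TimeUnit;

import org.opensearch.action.admin.indices.stats.IndicesStatsResponse;
import org.opensearch.action.admin.indices.stats.ShardStats;
import org.opensearch.index.shard.DocsStats;

// Illustrative only: after each node restart, wait until replicas report the
// same document totals as the primaries rather than asserting in one shot.
private void awaitReplicasCaughtUp(String index) throws Exception {
    assertBusy(() -> {
        IndicesStatsResponse stats = client().admin().indices().prepareStats(index).get();
        long primaryDocs = 0, replicaDocs = 0;
        for (ShardStats shard : stats.getShards()) {
            DocsStats docs = shard.getStats().getDocs();
            assertNotNull("docs stats not ready yet", docs); // retried by assertBusy
            if (shard.getShardRouting().primary()) {
                primaryDocs += docs.getCount();
            } else {
                replicaDocs += docs.getCount();
            }
        }
        // With exactly one replica per shard, the totals converge to equality.
        assertEquals(primaryDocs, replicaDocs);
    }, 60, TimeUnit.SECONDS);
}
```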
4. RemoteStoreStatsIT.testNonZeroPrimaryStatsOnNewlyCreatedIndexWithZeroDocs
- Recent build: #75412
- Error: java.lang.AssertionError (bare assertion, no message)
- Seed: 6BE1A9E35D5F1D77:A6167C3F1B40CAB1
- Local reproduction: Did not reproduce
- Pattern: One of the longest-running flaky tests, present since Mar 2024. Peaked Aug–Sep 2024 (19 builds/month), then declined to a low chronic rate of 2–9 builds/month. Stable and persistent — not worsening but not improving either.
- Monthly: 2024-03:1, 2024-04:8, 2024-05:14, 2024-06:5, 2024-07:13, 2024-08:19, 2024-09:19, 2024-10:16, 2024-11:9, 2024-12:8, 2025-01:16, 2025-02:5, 2025-03:3, 2025-04:2, 2025-05:4, 2025-06:3, 2025-07:3, 2025-08:7, 2025-09:3, 2025-10:2, 2025-11:3, 2025-12:2, 2026-01:7, 2026-02:2, 2026-03:9, 2026-04:7
5. AzureBlobStoreRepositoryTests.testMultipleSnapshotAndRollback
- Recent build: #75422
- Error: RepositoryVerificationException: path is not accessible on cluster-manager node
- Seed: Not available in error output
- Local reproduction: Could not attempt — requires Docker compose for Azure fixture
- Pattern: Chronic since Mar 2024. Peaked Aug–Sep 2024 (18 and 22 builds) and Dec 2025 (15), otherwise steady at roughly 1–11 builds/month. The error suggests an infrastructure/fixture issue rather than a code logic bug.
- Monthly: 2024-03:1, 2024-04:6, 2024-05:3, 2024-06:6, 2024-07:4, 2024-08:18, 2024-09:22, 2024-10:8, 2024-11:5, 2024-12:11, 2025-01:3, 2025-02:4, 2025-03:3, 2025-04:7, 2025-05:6, 2025-06:2, 2025-07:1, 2025-08:1, 2025-09:6, 2025-10:6, 2025-11:5, 2025-12:15, 2026-01:4, 2026-02:4, 2026-03:3, 2026-04:4
6. FullRollingRestartIT.testFullRollingRestart_withNoRecoveryPayloadAndSource
- Recent build: #75424
- Error: replica shards haven't caught up with primary expected:<16> but was:<13>
- Seed: ECF4F5EFE2ED5E0C:DFBD9FAF4BA03F54
- Local reproduction: Did not reproduce
- Pattern: Tracks testFullRollingRestart almost exactly — same class, same root cause (replica lag under SEGMENT replication during rolling restart), same burst timeline: Jul 2025 spike (47), Feb 2026 resurgence (17), continuing in Apr 2026 (9).
- Monthly: 2024-10:1, 2025-04:1, 2025-07:47, 2025-08:24, 2025-09:1, 2026-02:17, 2026-03:13, 2026-04:9
7. EhcacheDiskCacheManagerTests.testCreateAndCloseCacheConcurrently
- Recent build: #75418
- Error: Suite timeout reached
- Seed: E8DBFF5D8F617ACF (suite-level only)
- Local reproduction: Did not reproduce
- Pattern: Newer flaky test, first seen Jul 2025. Low frequency (1–4 builds/month) until the Apr 2026 spike to 7 builds. The Apr 2026 increase coincides with the m7a.8xlarge CI runner migration. The suite timeout suggests the concurrent cache create/close test may deadlock or hang under certain thread interleavings; an illustrative deadlock shape is sketched after this finding.
- Monthly: 2025-07:1, 2025-08:2, 2025-09:1, 2025-10:2, 2025-11:4, 2025-12:2, 2026-01:2, 2026-04:7
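A suite timeout with no assertion message is consistent with a lock-ordering deadlock. The following self-contained Java demo shows the suspected shape in general terms; it is not the Ehcache manager code, just two threads taking the same pair of locks in opposite order:

```java
public class CreateCloseDeadlock {
    private static final Object managerLock = new Object();
    private static final Object cacheLock = new Object();

    private static void pause() {
        try {
            Thread.sleep(10); // widen the race window for the demo
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Thread create = new Thread(() -> {
            synchronized (managerLock) {          // create: manager -> cache
                pause();
                synchronized (cacheLock) { /* register the new cache */ }
            }
        });
        Thread close = new Thread(() -> {
            synchronized (cacheLock) {            // close: cache -> manager
                pause();
                synchronized (managerLock) { /* deregister the cache */ }
            }
        });
        create.start();
        close.start();
        // With unlucky interleaving both threads block forever; in CI this
        // surfaces as a suite timeout rather than a test assertion failure.
    }
}
```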
8. WarmIndexSegmentReplicationIT.testRestartPrimary_NoReplicas
- Recent build: #75437
- Error: FileCache did not initialise correctly
- Seed: DF7FD6BB7B77DE60:A56DA7FD5AE1A066
- Local reproduction: Did not reproduce
- Pattern: Low-frequency, long-lived flaky test since Apr 2024. Sporadic 1–3 builds/month with no clear trend. The FileCache initialization error suggests a race condition during primary restart in warm index segment replication.
- Monthly: 2024-04:2, 2024-05:1, 2024-07:2, 2024-09:2, 2024-12:1, 2025-06:1, 2025-08:3, 2025-09:1, 2025-12:2, 2026-02:3, 2026-04:3
9. WarmIndexSegmentReplicationIT.testReplicaHasDiffFilesThanPrimary
- Recent build: #75424
- Error: Unexpected ShardFailures — refresh operation failed with RemoteTransportException
- Seed: ECF4F5EFE2ED5E0C:1B18569E388D817B
- Local reproduction: Did not reproduce
- Pattern: Lowest frequency of all tests in this report (16 total builds since Jul 2024). Sporadic 1–3 builds/month. The shard refresh failure during segment replication suggests a transient network/transport issue in the test cluster.
- Monthly: 2024-07:2, 2024-09:2, 2024-12:1, 2025-03:1, 2025-07:1, 2025-08:3, 2025-12:1, 2026-01:2, 2026-04:3
Notes
- CI runner migration: Around 2026-04-15, gradle-check runners moved from m5.8xlarge to m7a.8xlarge. Several tests show Apr 2026 upticks that may be attributable to faster CPUs amplifying latent timing races (notably IndicesRequestCacheIT and EhcacheDiskCacheManagerTests).
- Seed non-determinism: Every local reproduction attempt (8 of the 9 tests; the Azure test requires a Docker fixture) passed with the recorded seed, confirming these failures depend on factors beyond RandomizedRunner seed control (thread scheduling, GC timing, network ordering). This is expected for integration tests, especially those involving segment replication and rolling restarts.
- SEGMENT replication: 4 of 9 tests run with the cluster.indices.replication.strategy=SEGMENT parameterization. Segment replication timing sensitivity is a recurring theme; a sketch of the parameterization pattern follows these notes.
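For reference, this kind of parameterization usually enters OpenSearch integration tests through RandomizedRunner's @ParametersFactory. A hedged sketch of the common shape (the concrete base class and constructor wiring vary per test, so treat the details as assumptions):

```java
import java.util.Arrays;
import java.util.Collection;

import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
import org.opensearch.common.settings.Settings;

// Each Object[] becomes a full run of the test class, so one flaky test can
// appear under multiple parameterizations in gradle-check output.
@ParametersFactory
public static Collection<Object[]> parameters() {
    return Arrays.asList(
        new Object[] { Settings.builder().put("cluster.indices.replication.strategy", "DOCUMENT").build() },
        new Object[] { Settings.builder().put("cluster.indices.replication.strategy", "SEGMENT").build() }
    );
}
```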