Summary
One test failure was detected against committed code (Timer/Post Merge Action builds on main) in the past 24 hours (2026-05-10 to 2026-05-11).
Failing Tests
| Test | Build | Builds Affected (Total) | First Failure | Pattern |
|------|-------|-------------------------|---------------|---------|
| RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes | 76481 | 112 | 2024-04-03 | Worsening |
Detailed Findings
RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes
Build: 76481 (Timer, main)
Error:
    java.lang.AssertionError: replica shards haven't caught up with primary expected:<25> but was:<22>
        at OpenSearchIntegTestCase.waitForReplication(OpenSearchIntegTestCase.java:2570)
        at RecoveryWhileUnderLoadIT.assertAfterRefreshAndWaitForReplication(RecoveryWhileUnderLoadIT.java:504)
        at RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes(RecoveryWhileUnderLoadIT.java:350)
Seed: C8CCF036B428F9A5:AC55FEE67C7B2DF6
Local reproduction: NOT reproducible. Ran 6 times with the original seed — all passed. The failure is timing-dependent and not deterministic with the seed alone.
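Why a seed alone isn't enough here: the test seed pins randomized inputs (every run draws the same pseudo-random values), but it has no influence over wall-clock races such as replica catch-up timing. A minimal Java sketch of that distinction (the seed value reuses the report's master-seed word purely as an example input):

```java
import java.util.Random;

public class SeedVsTiming {
    public static void main(String[] args) {
        // Same seed => identical pseudo-random draws on every run. This is the
        // part of the test a reproduction seed does make deterministic.
        long seed = 0xC8CCF036B428F9A5L;
        int drawA = new Random(seed).nextInt(100);
        int drawB = new Random(seed).nextInt(100);
        System.out.println(drawA == drawB);
        // A race on scheduling or wall-clock timing (e.g. whether replicas have
        // caught up before an assertion fires) is outside the seed's control,
        // which is why 6 seeded re-runs can all pass while CI still fails.
    }
}
```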
Historical pattern (monthly unique builds affected):
- 2024-04 to 2024-08: Low (1-4/month)
- 2024-09 to 2025-03: Mostly dormant (0-1/month)
- 2025-04: 22 (spike begins)
- 2025-06: 77 (peak)
- 2025-07: 43
- 2025-08 to 2026-01: Low (0-14/month)
- 2026-02: 13 (resurgence)
- 2026-03: 29
- 2026-04: 28
- 2026-05: 19 (11 days in, on pace for ~52/month)
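The "~52/month" pace in the last row is a simple linear extrapolation; the sketch below reproduces it (the 30-day normalization is an assumption, chosen because it matches the reported figure):

```java
public class FailurePace {
    public static void main(String[] args) {
        int failuresSoFar = 19;  // unique builds affected, 2026-05-01 through 2026-05-11
        int daysElapsed = 11;
        int daysInMonth = 30;    // assumed normalization (calendar May has 31 days)
        double pace = (double) failuresSoFar / daysElapsed * daysInMonth;
        System.out.printf("projected builds affected this month: ~%.0f%n", pace);
    }
}
```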
Assessment: This is a chronic flaky test with a worsening trend. The recent resurgence (Feb 2026 onward) correlates with the CI runner migration to faster m7a.8xlarge instances in mid-April 2026, which amplifies timing-sensitive races. The test exercises segment replication recovery under load with node allocation changes — a scenario where replica catch-up timing is inherently non-deterministic. The assertBusy timeout in waitForReplication appears insufficient under faster execution conditions.
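The failure mode described above reduces to a bounded poll: the wait helper retries an assertion until a deadline and rethrows the last failure when time runs out, so any fixed timeout can be too short under unlucky scheduling. A minimal illustrative sketch of that pattern (this helper and the simulated doc counts are hypothetical, not OpenSearch's actual implementation):

```java
public class AssertBusySketch {
    /** Polls until check passes or timeoutMillis elapses; rethrows the last failure. */
    static void assertBusy(Runnable check, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long sleep = 10;  // backoff between attempts, capped below
        while (true) {
            try {
                check.run();
                return;                       // assertion passed within the window
            } catch (AssertionError e) {
                if (System.currentTimeMillis() >= deadline) {
                    throw e;                  // timed out: surface the last failure
                }
                Thread.sleep(sleep);
                sleep = Math.min(sleep * 2, 500);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Simulated replica catch-up: the replica "doc count" only reaches the
        // primary's 25 docs after ~200 ms, mimicking the race in the failing test.
        assertBusy(() -> {
            long replicaDocs = (System.currentTimeMillis() - start) >= 200 ? 25 : 22;
            if (replicaDocs != 25) {
                throw new AssertionError("replica shards haven't caught up with primary"
                        + " expected:<25> but was:<" + replicaDocs + ">");
            }
        }, 5_000);
        System.out.println("caught up within timeout");
    }
}
```

With a generous deadline the poll converges; shrink the timeout below the catch-up delay and the same code surfaces exactly the AssertionError seen in build 76481.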
Other Builds
The remaining Timer builds on main in this period either passed all tests or experienced build-level (non-test) failures:
- Build 76511: FAILURE, 0 test failures (build infrastructure issue, only 144 tests ran)
- Build 76493: FAILURE, 0 test failures (build infrastructure issue, only 257 tests ran)
- Build 76457 (Post Merge Action): FAILURE, 0 test failures (build infrastructure issue)
- All other Timer builds: Passed all tests