test: reduce CI flakiness in Ray-backed tests #253

Draft
Copilot wants to merge 4 commits into main from copilot/improve-test-reliability
Contributor

Copilot AI commented Apr 23, 2026

Summary

Tighten test reliability for failures seen on main: intermittent Ray worker/actor startup crashes in unit tests. The changes target test isolation and narrow retry handling without weakening the underlying assertions.

Changes

  • ZMQ test fixture isolation

    • Replace per-test os.environ mutation for PLUGBOARD_FLAGS_ZMQ_PUBSUB_PROXY with scoped DI settings overrides.
    • Apply the change consistently across shared, unit, and integration ZMQ fixtures to avoid cross-test/process races when Ray workers start in parallel.
  • Targeted reruns for Ray-backed flakes

    • Add limited reruns to the Ray-backed unit tests that intermittently fail during actor/worker startup:
      • test_multiprocessing_channel
      • test_state_backend_init
      • test_state_backend_init_with_existing_job
    • Keep reruns narrow to preserve signal and avoid broad pipeline inflation.
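The description does not show how the reruns are configured, so the following is only a library-free sketch of what a "narrow" rerun means: retry a bounded number of times, and only for the startup-style failure, so that ordinary assertion failures still surface on the first attempt. The names `rerun_on` and `WorkerStartupError` are hypothetical and do not come from this PR or from Ray. In a pytest suite, the pytest-rerunfailures plugin offers a comparable `@pytest.mark.flaky(reruns=...)` marker; whether this PR uses it is not stated here.

```python
import functools
import time


class WorkerStartupError(RuntimeError):
    """Hypothetical stand-in for an intermittent worker/actor startup crash."""


def rerun_on(exc_type, reruns=2, delay=0.0):
    """Re-run the wrapped callable up to `reruns` extra times, only for `exc_type`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(reruns + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    # Out of reruns: let the startup failure propagate.
                    if attempt == reruns:
                        raise
                    time.sleep(delay)
            raise RuntimeError("reruns must be non-negative")

        return wrapper

    return decorator


attempts = {"count": 0}


@rerun_on(WorkerStartupError, reruns=2)
def flaky_startup_check():
    """Fails twice with a simulated startup crash, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise WorkerStartupError("simulated crash during startup")
    return "ok"


result = flaky_startup_check()
```

Because only `WorkerStartupError` is caught, an `AssertionError` raised inside the wrapped function is never retried, which is the property that keeps the test signal intact.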
testing_settings = Settings.model_validate({"flags": {"zmq_pubsub_proxy": zmq_pubsub_proxy}})
with override_settings(testing_settings):
    yield ZMQConnector
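To illustrate why a scoped override is easier to isolate than mutating `os.environ`, here is a minimal standard-library sketch of the pattern. It is not plugboard's implementation: the `Settings`, `Flags`, `get_settings`, and `override_settings` definitions below are hypothetical stand-ins that only mirror the names in the snippet above, and the fixture yields a plain string instead of a connector class.

```python
import contextlib
import contextvars
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flags:
    zmq_pubsub_proxy: bool = False


@dataclass(frozen=True)
class Settings:
    flags: Flags = field(default_factory=Flags)


# Hypothetical holder for the active settings; the default applies outside any override.
_current_settings = contextvars.ContextVar("current_settings", default=Settings())


def get_settings():
    """Return the settings that are active in the current context."""
    return _current_settings.get()


@contextlib.contextmanager
def override_settings(settings):
    """Activate `settings` inside the with-block and restore the previous value on exit."""
    token = _current_settings.set(settings)
    try:
        yield
    finally:
        _current_settings.reset(token)


def zmq_connector_fixture(zmq_pubsub_proxy):
    """Generator shaped like a pytest fixture: the override lasts only while it is suspended at yield."""
    testing_settings = Settings(flags=Flags(zmq_pubsub_proxy=zmq_pubsub_proxy))
    with override_settings(testing_settings):
        yield "proxy" if get_settings().flags.zmq_pubsub_proxy else "direct"


gen = zmq_connector_fixture(True)
mode_inside = next(gen)  # the override is active while the "test" would run
gen.close()              # teardown: the with-block exits and the default is restored
flag_after = get_settings().flags.zmq_pubsub_proxy
```

In this sketch the override is undone even if the test body raises, and the process environment is never touched, so nothing is left behind for a later test to inherit.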

Copilot AI and others added 3 commits April 23, 2026 19:47
Agent-Logs-Url: https://github.com/plugboard-dev/plugboard/sessions/9de3125e-8da3-432a-a1b1-73188d656af3

Co-authored-by: toby-coleman <13170610+toby-coleman@users.noreply.github.com>
@toby-coleman
Contributor

@copilot Remove the changes to the smoke tests. They are not that flaky.

Agent-Logs-Url: https://github.com/plugboard-dev/plugboard/sessions/a6cdaafd-95ac-40e9-bed6-65c91b47a1ba

Co-authored-by: toby-coleman <13170610+toby-coleman@users.noreply.github.com>
Contributor Author

Copilot AI commented May 5, 2026

> @copilot Remove the changes to the smoke tests. They are not that flaky.

Removed the smoke-test changes in 3d8471b.

Copilot AI changed the title from "test: reduce CI flakiness in Ray-backed and smoke tests" to "test: reduce CI flakiness in Ray-backed tests" May 5, 2026
Copilot AI requested a review from toby-coleman May 5, 2026 19:51
@github-actions

github-actions Bot commented May 5, 2026

Benchmark comparison for 07750b3c (base) vs 3d8471bb (PR)


------------------------------------------------------------------------------------------------------------------ benchmark: 2 tests -----------------------------------------------------------------------------------------------------------------
Name (time in ms)                                                                         Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_process_run (main/.benchmarks/Linux-CPython-3.14-64bit/0001_base)     352.8164 (1.0)      358.8909 (1.0)      355.8906 (1.0)      2.1652 (1.0)      356.0099 (1.00)     2.0833 (1.0)           2;0  2.8099 (1.0)           5           1
test_benchmark_process_run (pr/.benchmarks/Linux-CPython-3.14-64bit/0001_pr)         353.6085 (1.00)     364.1089 (1.01)     357.5500 (1.00)     4.7998 (2.22)     354.4052 (1.0)      7.7744 (3.73)          1;0  2.7968 (1.00)          5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
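As a quick sanity check on the table, the figures can be recomputed from the reported values. The sketch below assumes that the parenthesised numbers are ratios relative to the best value in each column, which is how the output appears to read; the variable names are illustrative only.

```python
# Values copied from the benchmark table above (times in milliseconds).
base = {"mean_ms": 355.8906, "stddev_ms": 2.1652, "iqr_ms": 2.0833}
pr = {"mean_ms": 357.5500, "stddev_ms": 4.7998, "iqr_ms": 7.7744}


def ops(mean_ms):
    """Operations per second: 1 / mean, with the mean converted from ms to s."""
    return 1000.0 / mean_ms


base_ops = round(ops(base["mean_ms"]), 4)  # 2.8099, matching the OPS column
pr_ops = round(ops(pr["mean_ms"]), 4)      # 2.7968, matching the OPS column

# Ratios of the PR run to the base run, as shown in parentheses in the table.
stddev_ratio = round(pr["stddev_ms"] / base["stddev_ms"], 2)  # 2.22
iqr_ratio = round(pr["iqr_ms"] / base["iqr_ms"], 2)           # 3.73

# Relative change in mean run time, in percent.
mean_change_pct = 100.0 * (pr["mean_ms"] / base["mean_ms"] - 1.0)
```

The mean differs by under half a percent between the two runs, while the larger spread on the PR side comes from only 5 rounds each, so the table alone does not indicate a meaningful performance change.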
