feat: async run_experiment via RunHandle + cancellation + status widget by hinderling · Pull Request #10 · pertzlab/faro

hinderling · 2026-05-15T14:08:05Z

Summary

Move the MDA feed loop onto a worker thread, expose live status through a RunHandle (psygnal Signal), and add a napari dock widget that mirrors + steers the current run. Replaces the synchronous-blocking run_experiment / continue_experiment API.

Draft: breaks the public API. Notebook updates required (see below) before merging. The async demo notebook included here is a test artifact — it must be removed before merge (see Demo notebook section).

Why

The controller's feed loop ran on the main thread, so:

napari froze for the duration of every run (no Qt-event processing).
run_experiment blocked the calling cell — no interactive monitoring / cancellation without Ctrl-C (which sometimes left device state half-set).
Status was opaque: "what timepoint are we on, are we lagging?" was unanswerable.
No clean way to cancel or pause a long run.

Moving the loop onto its own thread fixes all of these: napari is responsive by construction, the cell returns immediately, and cancellation / pause / live status become natural.

What changed

New: `faro/core/run_status.py`

RunStatus — immutable snapshot dataclass: state, current_event_index, current_fov, n_events_total, n_events_consumed, n_frames_received, started_at / finished_at, lag_ms, background_errors, fatal_error, …
RunHandle — owns the worker thread + cooperative cancel/pause events, carries the run's (sorted) event list. Methods: status(), wait(), cancel(), pause(), resume(), is_running(), is_paused(). Signal: statusChanged (psygnal) emitting the latest RunStatus.
RunState: pending → running ⇄ pausing/paused → done/error (cancelling on cancel).
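The handle pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `faro/core/run_status.py`: the plain listener list stands in for the psygnal `statusChanged` signal, `start()`, `connect()`, `update()` and the toy `feed` function are hypothetical names, and only a handful of the `RunStatus` fields are modelled.

```python
import threading
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class RunStatus:
    """Immutable snapshot; only a few of the fields named above are modelled."""
    state: str = "pending"
    n_events_total: int = 0
    n_events_consumed: int = 0
    fatal_error: Optional[BaseException] = None


class RunHandle:
    """Owns the worker thread and the cooperative cancel/pause flags."""

    def __init__(self, work: Callable[["RunHandle"], None], n_events_total: int):
        self.cancel_event = threading.Event()
        self.pause_event = threading.Event()
        self._lock = threading.Lock()
        self._status = RunStatus(n_events_total=n_events_total)
        # Plain callback list standing in for the psygnal statusChanged signal.
        self._listeners: List[Callable[[RunStatus], None]] = []
        self._thread = threading.Thread(target=self._run, args=(work,), daemon=True)

    def start(self) -> "RunHandle":  # hypothetical helper for this sketch
        self._thread.start()
        return self

    def _run(self, work) -> None:
        self.update(state="running")
        try:
            work(self)
            self.update(state="done")
        except Exception as exc:  # failures land on the status, not on the caller
            self.update(state="error", fatal_error=exc)

    def update(self, **changes) -> None:
        # Swap in a new frozen snapshot, then notify listeners outside the lock.
        with self._lock:
            self._status = replace(self._status, **changes)
            snapshot = self._status
        for listener in list(self._listeners):
            listener(snapshot)

    def connect(self, listener: Callable[[RunStatus], None]) -> None:
        self._listeners.append(listener)

    def status(self) -> RunStatus:
        with self._lock:
            return self._status

    def wait(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def cancel(self) -> None:
        self.pause_event.clear()  # a cancel while paused must still release the loop
        self.cancel_event.set()

    def is_running(self) -> bool:
        return self._thread.is_alive()


def feed(handle: RunHandle) -> None:
    """Toy feed loop: consume events until finished or cancelled."""
    for i in range(handle.status().n_events_total):
        if handle.cancel_event.is_set():
            return
        handle.update(n_events_consumed=i + 1)


states = []
handle = RunHandle(feed, n_events_total=5)
handle.connect(lambda status: states.append(status.state))
handle.start()
handle.wait()
final = handle.status()
```

Because every update replaces a frozen snapshot under a lock, a reader on another thread always sees a consistent `RunStatus`, which is presumably why the PR describes it as an immutable dataclass.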

`faro/core/controller.py`

Controller.runStarted = Signal(object) fires on each new run/continue carrying the fresh RunHandle.
run_experiment / continue_experiment spawn a worker thread and return the handle immediately; validation still runs synchronously on the caller. Events are sorted once and stashed on the handle so the widget renders them in execution order.
_run_worker centralises pre-flight setup and wraps the feed loop so failures land in handle.fatal_error instead of crashing the user.
_run_mda_with_events polls cancel_event and pause_event each iteration — pause halts feeding after the in-flight backpressure window drains; resume continues.
fix: the engine queue is recreated per run. A cancelled run aborts the engine mid-drain, leaving a stale STOP_EVENT behind; reusing the queue made the next run's engine consume that sentinel and stall after a few events ("stuck at 3/80").
fix: _bump_status_for_frame skips IMG_STIM snaps — a stim emission is the SLM-illuminated snap paired with its imaging frame; counting it double-updated lag/elapsed and drifted the frame count off the RTMEvent count.
napari preview: the controller no longer carries its own preview-layer machinery, and live mode no longer has to be manually disconnected before a run. napari-micromanager's own _NapariMDAHandler keeps routing frames into the preview layer throughout the run; the controller just stops continuous sequence acquisition once at MDA start to avoid a snap-buffer race. Notebooks can drop the old "break the CoreViewerLink before running" dance.
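A rough sketch of the cooperative pause/cancel polling described for `_run_mda_with_events`, assuming that a set `pause_event` means "paused" and that the engine is fed through a plain `queue.Queue`. The function name `feed_events` and the poll interval are hypothetical; the real loop also handles backpressure and status updates, which are omitted here.

```python
import queue
import threading
import time


def feed_events(events, engine_queue, cancel_event, pause_event, poll_s=0.01):
    """Feed events one at a time, checking the pause and cancel flags each iteration."""
    fed = 0
    for event in events:
        # Idle while paused; a cancel must still be able to release the loop.
        while pause_event.is_set() and not cancel_event.is_set():
            time.sleep(poll_s)
        if cancel_event.is_set():
            break
        engine_queue.put(event)
        fed += 1
    return fed


cancel_event, pause_event = threading.Event(), threading.Event()
engine_queue = queue.Queue()
result = {}

pause_event.set()  # start in the paused state
worker = threading.Thread(
    target=lambda: result.update(
        fed=feed_events(range(3), engine_queue, cancel_event, pause_event)
    ),
    daemon=True,
)
worker.start()
time.sleep(0.05)
fed_while_paused = engine_queue.qsize()  # nothing is fed while paused

pause_event.clear()  # resume
worker.join(timeout=5)
```

Events already handed to the engine before the pause are not recalled in this sketch, which matches the description that a pause takes effect once the in-flight window has drained.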

New: `faro/widgets/experiment_status.py`

ExperimentStatusWidget — a napari dock panel that mirrors and controls the current run:

State chip, legend (imaging / stim / ref).
Event strip — one cell per RTMEvent, color-coded by type, past=opaque / future=dimmed progress fill, current cell bordered. Scales to thousands of events.
FOV map — one dot per unique stage position, equal-aspect, visit-order path, active dot recolored to the current event type.
Stats — event N/M, elapsed, scheduled, lag (red > 5 s), remaining, errors.
Pause / Resume + Stop buttons.
Theme-adaptive (napari light/dark), auto-rebinds on every new run via runStarted.

Async/Qt fixes folded in

PYMM_SIGNALS_BACKEND=psygnal forced in faro/microscope/base.py — with a QApplication loaded, pymmcore-plus otherwise picks the Qt signal backend and queues frameReady to the main thread; if the main thread is blocked (handle.wait()), frames never reach the controller. Forcing psygnal keeps the data path direct/synchronous on the engine thread.
Widget connects statusChanged with thread="main" + drives psygnal.qt.start_emitting_from_queue() so worker-thread emits reach QWidgets safely.
uv.lock: bumped pymmcore-widgets past an upstream fix (_presets_widget crashing on an empty device label during MDA events).
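Forcing the signal backend amounts to setting one environment variable early enough. The sketch below assumes the variable is read when pymmcore-plus creates its signal objects, so it has to be set before a core object exists; the helper name `force_psygnal_backend` is hypothetical and the exact placement inside `faro/microscope/base.py` is not shown in this PR description.

```python
import os


def force_psygnal_backend() -> str:
    """Pin pymmcore-plus to the psygnal signal backend for this process."""
    # Assumption: this runs before any pymmcore-plus core object is created,
    # for example at import time of the microscope base module.
    os.environ["PYMM_SIGNALS_BACKEND"] = "psygnal"
    return os.environ["PYMM_SIGNALS_BACKEND"]


backend = force_psygnal_backend()
```

An unconditional assignment (rather than `setdefault`) matches the word "forced" in the description: a value inherited from the shell would otherwise win.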

BREAKING: notebook updates required

Before

ctrl.run_experiment(events, stim_mode="current")   # blocked here
ctrl.finish_experiment()

After — choose one:

(a) Blocking equivalent (smallest diff):

ctrl.run_experiment(events, stim_mode="current").wait()
ctrl.finish_experiment()

(b) Non-blocking, with status / cancel / pause:

handle = ctrl.run_experiment(events, stim_mode="current")
# other cells can run; handle.status() / handle.cancel() / handle.pause()
handle.wait()                  # block at the end if desired
ctrl.finish_experiment()

Optional napari widget:

from faro.widgets import ExperimentStatusWidget
viewer.window.add_dock_widget(ExperimentStatusWidget(ctrl), name="Experiment")

Demo notebook (test artifact — remove before merge)

experiments/02_demo_sim_optogenetic/demo_sim_optogenetic_napari_async.ipynb is included only to exercise this PR against the virtual-microscope optogenetic backend (async run, pause/resume, cancel/restart, the status widget, multi-FOV). It doubles as a worked example of what the migrated notebooks could look like. It should be deleted before this PR merges — the real deliverable is the API + widget, not this notebook.

What to check / test before merging

Every notebook in experiments/* that calls run_experiment / continue_experiment — migrate to .wait() or the non-blocking flow. Confirm none rely on the old blocking return.
Notebooks that manually tear down the napari live link / CoreViewerLink before a run — that workaround is no longer needed; remove it and verify that the preview layer keeps updating during the run.
tests/hardware/* — update for the new RunHandle return type; run on the Moench rig.
Multi-channel imaging: the widget's frame counter / strip cursor assume ~1 imaging frame per RTMEvent. For multi-channel plans n_frames_received outpaces the RTMEvent count — verify the strip/stats still read sensibly or gate the assumption.
continue_experiment + the widget: confirm the strip/map rebuild correctly for the appended events and the FOV map merges positions.
Headless / no-Qt runs (CI, non-microscope dev machine) — import faro stays Qt-free; .wait() path works without a QApplication.
Cancel-then-restart and pause/resume on real hardware (verified on the simulator; engine-abort semantics differ per device).
Bump the virtual-microscope lockfile pin — uv lock --upgrade-package virtual-microscope to pick up the fixes now on its default branch (JIT pre-warm; SimCameraDevice digital ROI / MDA-teardown fix). Without this the demo notebook's first ~4 s of frames stall and the napari Snap preview freezes after a run. Commit the uv.lock change separately (it is not async/widget code).

Related (separate repo)

Two virtual-microscope fixes were needed for the demo notebook and have already landed on its default branch (virtual-env):

JIT pre-warm — pre-warms the numba physics-step JIT before the RealtimeEngine starts; otherwise the first ~4 s of snaps stall behind a compile holding the sim lock, so frames arrive in a burst instead of paced.
SimCameraDevice digital ROI — implements real ROI cropping. It also fixes an MDA-teardown bug: the camera previously raised NotImplementedError from set_roi, which aborted MDARunner._finish_run before it emitted sequenceFinished; napari-micromanager then never cleared _mda_running, so the Snap preview silently stopped updating after a run.

These are not part of this PR — faro just needs the lockfile bump above to pick them up.

Verification

Exercised end-to-end against the virtual-microscope optogenetic backend (napari + napari-micromanager + the widget):

Live status flows worker → widget on the main thread (psygnal queued delivery); strip / FOV map / stats update in real time.
Cancel mid-run, then restart from the notebook — reaches steady state, no stall.
Pause halts feeding after the backpressure window drains; resume runs to completion.
Frame count tracks RTMEvents 1:1 for single-channel plans; stim snaps no longer double-count.
87 unit tests pass.

Compatibility notes

Headless / no Qt: works — psygnal delivers slots synchronously without Qt. Widget package is opt-in (import faro.widgets); import faro / import faro.core stay Qt-free.
MDA engines other than pymmcore-plus: no regression — the controller still talks to hardware exclusively through AbstractMicroscope.

Screenshot

Move the MDA feed loop onto a worker thread, expose live status through a RunHandle + psygnal Signal, and add a minimal napari widget that mirrors the current run.

Breaking change: ctrl.run_experiment(events, ...) and ctrl.continue_experiment(...) now return a RunHandle immediately instead of blocking until the run is done. Existing notebooks that did `ctrl.run_experiment(events, ...)` must be updated to either `handle = ctrl.run_experiment(events, ...); handle.wait()` for the old blocking semantics, or to use the new non-blocking flow (poll handle.status(), subscribe to handle.statusChanged, call handle.cancel() to stop early).

What's in this commit:

- faro/core/run_status.py (new):
  * RunStatus -- immutable snapshot dataclass with state, event/FOV indices, frame count, lag_ms, error info.
  * RunHandle -- owns the worker thread + cooperative cancel event, exposes status()/wait()/cancel()/is_running() + a psygnal statusChanged signal that emits the latest RunStatus on each update. Subscribers on the main thread see queued-connection delivery via psygnal's Qt integration.
- faro/core/controller.py:
  * Controller exposes a class-level runStarted = Signal(object). Fires on every new run/continue so widgets can re-bind.
  * run_experiment / continue_experiment spawn a worker thread, return the handle, emit runStarted. Validation still happens synchronously so a bad event list raises on the calling thread.
  * _run_worker centralises pre-flight setup (writer init -- including the potentially-slow zarr rmtree on overwrite -- and Analyzer construction) and wraps the feed loop in try/except so worker-side failures land in handle.fatal_error rather than crashing the user.
  * _run_mda_with_events accepts the handle, checks handle.cancel_event at each loop iteration and in the backpressure throttle, asks the engine to cancel the in-flight event when set, and emits status updates on each RTMEvent dequeue.
  * _on_frame_ready (and ControllerSimulated._on_frame_ready) call a shared _bump_status_for_frame helper that increments n_frames_received and computes lag_ms vs event.min_start_time.
  * Now off the main thread, all the prior Qt-pumping helpers (_pump_qt_and_sleep, _qt_join, _wait_for_frame_pumping_qt) and the superqt ensure_main_thread import are obsolete and removed. The preview-layer machinery (viewer=, _on_preview_frame, _apply_preview, PREVIEW_LAYER_NAME) is also removed -- napari-micromanager's own _NapariMDAHandler already routes generator events into the preview layer.
  * finish_experiment now waits for the current handle before shutting down the Analyzer.
  * _pending_sentinels guarded by a Lock since extend_experiment now runs on the calling thread while the feed loop runs on the worker.
- faro/widgets/experiment_status.py (new):
  * ExperimentStatusWidget -- read-out of state, FOV, event index, frame count, lag, elapsed time, error count. Has a Stop button that calls handle.cancel(). Subscribes to controller.runStarted so it automatically re-binds when a new run begins; cleans up the previous handle's signal subscription on each rebind.

Verified end-to-end via a Qt smoke test:

- Live updates flow from the worker thread to the widget on the main thread (psygnal+Qt queued delivery).
- Stop button triggers handle.cancel(); the worker's cancel-check fires within one iteration and the run exits at the next event boundary.
- Starting a new run re-binds the widget to the new handle and resets the progress bar / counters.

The OmeZarrWriter init in _run_worker still pulled image height/width via self._mic.mmc.getImageHeight/Width -- a pymmcore-plus-specific call that breaks any non-pymmcore microscope. Use the AbstractMicroscope-level convention: subclasses populate self.image_height / self.image_width on the microscope instance (Moench already does this in init_scope). Fall back to mmc if the attributes aren't present but mmc is, so existing pymmcore-only microscopes keep working without code changes. Raise a clear error when neither path is available.
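The lookup order described in this commit message could look roughly like the following. It is a sketch under stated assumptions: `resolve_image_shape` is a hypothetical helper name, the microscope objects are stand-ins built with `SimpleNamespace`, and only `getImageHeight()` / `getImageWidth()` are taken from the message itself.

```python
from types import SimpleNamespace


def resolve_image_shape(mic):
    """Return (height, width), preferring microscope-level attributes over mmc."""
    height = getattr(mic, "image_height", None)
    width = getattr(mic, "image_width", None)
    if height is not None and width is not None:
        return int(height), int(width)

    # Fallback for microscopes that only expose a pymmcore-plus core object.
    mmc = getattr(mic, "mmc", None)
    if mmc is not None:
        return int(mmc.getImageHeight()), int(mmc.getImageWidth())

    raise AttributeError(
        "microscope must define image_height/image_width or expose an mmc core"
    )


# Stand-in microscopes for illustration only.
with_attrs = SimpleNamespace(image_height=512, image_width=1024)
mmc_only = SimpleNamespace(
    mmc=SimpleNamespace(getImageHeight=lambda: 256, getImageWidth=lambda: 256)
)
bare = SimpleNamespace()

shape_from_attrs = resolve_image_shape(with_attrs)
shape_from_mmc = resolve_image_shape(mmc_only)
try:
    resolve_image_shape(bare)
    raised = False
except AttributeError:
    raised = True
```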

Three independent bugs surfaced when running the new async run_experiment + ExperimentStatusWidget against a napari viewer (reproduced with the optogenetic virtual_microscope backend):

1. pymmcore-plus's signals_backend() auto-selects the *qt* backend whenever a QApplication is loaded. core.mda.events.frameReady then becomes a QtCore.SignalInstance and cross-thread emits land in Qt.QueuedConnection, where they're delivered only when the main thread pumps events. With Controller.run_experiment now spawning a worker and RunHandle.wait() joining on it, the main thread is typically idle-blocked exactly when the engine is firing frames -- so the controller's _on_frame_ready never ran, the engine completed "successfully" with zero frames received, and the pipeline never saw any data. Force PYMM_SIGNALS_BACKEND=psygnal in faro/microscope/base.py so the data path stays direct/synchronous on the engine thread regardless of whether Qt is loaded. The widget-side path (RunHandle.statusChanged) still uses psygnal's own queued delivery -- see fix #2.

2. ExperimentStatusWidget connected handle.statusChanged with the default (direct) connection. Status updates emitted from the worker thread therefore ran the widget's _refresh slot synchronously off-main, calling QLabel.setText / QProgressBar.setValue from a non-GUI thread. Under napari that lands in vispy's OpenGL compositor and aborts with "Cannot make QOpenGLContext current in a different thread" -> SIGABRT (kernel hard-crash in VSCode Jupyter). Switch to connect(..., thread="main") so psygnal queues the call into its main-thread queue.

3. psygnal's queued callbacks live in QueuedCallback._GLOBAL_QUEUE, which nothing drains by default -- the widget would be invoked on the main thread, but only when something explicitly calls psygnal.emit_queued(). RunHandle's docstring claims auto-Qt delivery; that's not how psygnal actually works. Call psygnal.qt.start_emitting_from_queue() in the widget's __init__, which installs a main-thread QTimer that fires emit_queued() on every Qt event-loop tick. Idempotent and global, so multiple widgets / multiple runs are safe.

Lockfile: bump pymmcore-widgets (8c8f76e -> 48ff414) so the fix for the unrelated upstream crash in pymmcore_widgets._presets_widget._on_property_changed when handed an empty device label (virtual_microscope's shutter) is included. Without that bump, the MDA engine itself aborts on the first setShutterOpen() once frames actually start flowing.

Verified end-to-end against virtual_microscope's optogenetic backend:

- headless async run: 5/5 frames (regression check, unchanged)
- napari.Viewer() + handle.wait(): 5/5 frames (was 0/5)
- napari + napari-micromanager + widget: 5/5 frames, no crash, exit 0
- widget visibly updates progress / frames / state mid-experiment (sampled QLabel.text() while pumping Qt events)
- 87 unit tests still pass
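The queued-delivery behaviour behind fixes 2 and 3 can be illustrated with a standard-library analogue. This is not psygnal's implementation: `post_to_main` and `drain_main_queue` are hypothetical names that mimic, respectively, a `thread="main"` connection and the `emit_queued()` call that a timer would fire on each event-loop tick.

```python
import queue
import threading

_main_queue = queue.Queue()


def post_to_main(callback, *args):
    """Worker side: enqueue the call instead of running it on the emitting thread."""
    _main_queue.put((callback, args))


def drain_main_queue():
    """GUI-thread side: run everything queued so far; returns the number of calls."""
    delivered = 0
    while True:
        try:
            callback, args = _main_queue.get_nowait()
        except queue.Empty:
            return delivered
        callback(*args)
        delivered += 1


seen = []


def refresh(status):
    # Record which thread actually executed the slot.
    seen.append((status, threading.get_ident()))


worker = threading.Thread(target=lambda: [post_to_main(refresh, i) for i in range(3)])
worker.start()
worker.join()

seen_before_drain = list(seen)  # nothing runs until the queue is drained
gui_thread = threading.get_ident()
delivered = drain_main_queue()
```

The point of bug 3 is visible in `seen_before_drain`: queued callbacks sit in the queue indefinitely unless something on the GUI thread drains it.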

Sibling of demo_sim_optogenetic.ipynb that exercises the new async run_experiment + RunHandle + ExperimentStatusWidget end-to-end against virtual_microscope's optogenetic backend, with a live napari viewer dock-attached. Walks through: handle = ctrl.run_experiment(...) is non-blocking, the kernel is free; poll handle.status() while it runs; subscribe to handle.statusChanged from the kernel side; cancel via the widget Stop button or handle.cancel(); handle.wait() blocks if you want the old synchronous semantics; continue_experiment() re-binds the widget automatically via runStarted. Phases are concatenated with combine(..., axis="t") per the new RTMSequence API.

Backend changes that make an async run inspectable and steerable -- the data the new ExperimentStatusWidget renders, plus two bug fixes surfaced while building it.

run_status.py

- RunHandle.events: optional snapshot of the (sorted) RTMEvents the handle is driving, so widgets can render per-event visualisations (event strip, FOV map) that need the full plan up front.
- Pause/resume: RunState gains "pausing"/"paused"; RunHandle gains pause()/resume()/is_paused() and a pause_event the feed loop polls. cancel() now also clears the pause event so a cancel while paused still releases the feed loop.

controller.py

- run_experiment / continue_experiment sort events once (by min_start_time, then position) and stash the sorted list on the handle, so the order the worker processes matches what the widget displays.
- Feed loop honors pause_event: before pulling the next RTMEvent it checks the flag, flips state to "paused", and idles until resume() -- the MDA engine drains whatever is already queued, then waits.
- fix: the engine queue (self._queue) is recreated per run. The finally-block feeds a STOP_EVENT sentinel to stop the engine; on a *cancelled* run cancel_mda() aborts the engine, which may stop without draining the queue, leaving stale events + the sentinel behind. Reusing that queue made the next run's engine consume the stale sentinel and exit after a few events ("stuck at 3/80"). A fresh queue per run fixes it.
- fix: _bump_status_for_frame skips IMG_STIM frames. A stim emission is the SLM-illuminated snap paired with its imaging frame; counting it double-updated the status (lag/elapsed refreshing twice per stim event) and made n_frames_received drift away from the RTMEvent count. Imaging + ref frames are the meaningful data frames.

Verified end-to-end against the optogenetic virtual-microscope backend: cancel mid-run then restart reaches steady state (no stall); pause halts feeding after the backpressure window drains and resume continues to completion; frame count tracks RTMEvents 1:1 for single-channel plans.
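The stale-sentinel failure is easy to reproduce in miniature. The sketch below is a simplified model, assuming the engine is a loop that consumes a `queue.Queue` until it sees a sentinel; `run_engine` and `leftover_after_cancel` are hypothetical names, and the real engine's timing (which produced "stuck at 3/80") is not modelled.

```python
import queue

STOP_EVENT = object()  # sentinel that tells the engine loop to exit


def run_engine(q):
    """Consume the queue until the sentinel arrives; return the events executed."""
    executed = []
    while True:
        item = q.get()
        if item is STOP_EVENT:
            return executed
        executed.append(item)


def leftover_after_cancel():
    """Queue state after an aborted run: an undrained event plus the sentinel."""
    q = queue.Queue()
    q.put("stale-event")
    q.put(STOP_EVENT)
    return q


new_events = [f"event-{i}" for i in range(5)]

# Reusing the old queue: the engine hits the stale sentinel before any new event.
reused = leftover_after_cancel()
for ev in new_events:
    reused.put(ev)
reused.put(STOP_EVENT)
ran_with_reuse = run_engine(reused)

# Fresh queue per run: every new event is executed.
fresh = queue.Queue()
for ev in new_events:
    fresh.put(ev)
fresh.put(STOP_EVENT)
ran_with_fresh = run_engine(fresh)
```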

Rework the minimal status widget into a full run dashboard, driven by the RunHandle data exposed in the previous commit.

Components (top to bottom):

- State chip -- RUNNING / PAUSED / DONE / ... as plain text in a translucent-neutral rounded chip (no per-state fill: a colored banner competed with the imaging/stim/ref legend colors).
- Legend chips -- imaging / stim / ref; the chip matching the current event type is fully opaque, the others dimmed.
- EventStrip -- one cell per RTMEvent, color-coded by type. Past + current cells opaque (progress fill), future cells dimmed. Same-type runs are coalesced into single fills so thousands of events render with correct alpha instead of over-stacking at sub-pixel widths. Empty state draws a "(no events loaded)" placeholder.
- FovMap -- one dot per unique FOV position, equal-aspect (a straight line of FOVs stays a line), grey visit-order path, active dot recolored to the current event type. Pinned square via resizeEvent. Paints its own rounded panel background; "FOV X/Y" counter in the corner.
- Stats form -- event N/M, elapsed, scheduled, lag, remaining, errors. Times formatted hh:mm:ss with the leading unit suffixed and dropped when zero; lag turns red past 5 s. Wrapped in a shaded panel echoing napari's layer-controls boxes.
- Pause/Resume + Stop buttons.

Threading / theming details:

- statusChanged is connected with thread="main" and the widget calls psygnal.qt.start_emitting_from_queue() so worker-thread emits are delivered on the GUI thread (drives QWidgets safely under napari).
- A 250 ms QTimer ticks the elapsed/remaining clocks between status emissions so time fields don't freeze between frames.
- The strip cursor tracks n_frames_received (actual snaps), not n_events_consumed (the feed loop runs 3-4 ahead via backpressure, which made the strip jump several cells at run start).
- Colors/fonts derive from the Qt palette so the widget adapts to napari's light/dark theme; corner radii match napari widgets.
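The run coalescing mentioned for the EventStrip can be sketched with `itertools.groupby`. This is an illustration of the idea only: `coalesce_runs` is a hypothetical helper, the string type labels are placeholders, and the split at the past/future cursor that the widget would also need is left out.

```python
from itertools import groupby


def coalesce_runs(event_types):
    """Collapse consecutive equal types into (type, start_index, length) runs."""
    runs = []
    index = 0
    for kind, group in groupby(event_types):
        length = sum(1 for _ in group)
        runs.append((kind, index, length))
        index += length
    return runs


runs = coalesce_runs(["imaging", "imaging", "stim", "imaging", "ref", "ref"])
# runs == [("imaging", 0, 2), ("stim", 2, 1), ("imaging", 3, 1), ("ref", 4, 2)]
```

Painting one rectangle per run rather than one per event keeps the number of draw calls proportional to the number of type changes, and avoids stacking many translucent sub-pixel cells on top of each other.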

Add a second stage position (20, 20, 0) to the baseline / stim / recovery sequences so the demo exercises a 2-FOV acquisition -- the ExperimentStatusWidget's FOV map then shows both positions and the visit-order path between them. Drop the frame interval 1.5s -> 1s.

hinderling · 2026-05-16T11:38:39Z

@alandolt can you have a look if you see any general issues with this architecture change? still a few open TODOs before merging, but the main idea is there i think! but would be great to have your input before i start migrating the other notebooks etc. I think this will also be useful more long-term, running experiments on different microscopes simultaneously with BO for example, in combo with pymmcore-proxy.

Add FrameDispenser.cancel() and the FrameWaitCancelled exception so a thread blocked in wait_for_frame / get_predecessor is woken immediately instead of sitting out the full timeout. This lets an experiment abort promptly: a feed loop parked in an up-to-80s stim-mask wait is released the instant the run is cancelled.
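One way to get an immediately interruptible wait is a `threading.Condition` whose predicate also checks a cancel flag. The sketch below is a simplified stand-in for the real FrameDispenser: `put_frame`, the dictionary storage and the string key are hypothetical, and `get_predecessor` is not modelled.

```python
import threading
import time


class FrameWaitCancelled(Exception):
    """Raised in a waiter when the dispenser is cancelled."""


class FrameDispenser:
    def __init__(self):
        self._cond = threading.Condition()
        self._frames = {}
        self._cancelled = False

    def put_frame(self, key, frame):  # hypothetical producer-side method
        with self._cond:
            self._frames[key] = frame
            self._cond.notify_all()

    def wait_for_frame(self, key, timeout=80.0):
        with self._cond:
            # Wake on either the frame arriving or a cancel, whichever is first.
            arrived = self._cond.wait_for(
                lambda: self._cancelled or key in self._frames, timeout
            )
            if self._cancelled:
                raise FrameWaitCancelled(key)
            if not arrived:
                raise TimeoutError(key)
            return self._frames[key]

    def cancel(self):
        with self._cond:
            self._cancelled = True
            self._cond.notify_all()  # release every blocked waiter now


dispenser = FrameDispenser()
outcome = {}


def waiter():
    start = time.monotonic()
    try:
        dispenser.wait_for_frame("mask-0", timeout=80.0)
    except FrameWaitCancelled:
        outcome["cancelled_after_s"] = time.monotonic() - start


thread = threading.Thread(target=waiter, daemon=True)
thread.start()
time.sleep(0.05)
dispenser.cancel()
thread.join(timeout=5)
```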

Cancellation: RunHandle gains an on_cancel hook, invoked synchronously from cancel(), that wakes a feed loop blocked in a stim-mask wait via Analyzer.cancel_pending_waits(). Previously a cancel issued during that wait took up to the stim-mask timeout (~80s) to take effect, leaving the frame handler connected in the meantime.

Queue stats: Analyzer.queue_stats() / Controller.queue_stats() expose storage, pipeline and deferred queue depths for the status widget.

finish_experiment runs its teardown (run wait + Analyzer drain) on a worker thread and pumps Qt, so napari stays responsive during the drain.

Lag is anchored to the first frame's acquisition start rather than the worker's start time, so worker/engine startup (~1s) is no longer charged to every lag reading.
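The effect of the new lag anchor can be shown with a small calculation. This sketch assumes lag is the elapsed time since the anchor minus the event's `min_start_time`, and that the first event is scheduled at t = 0; the function name `lag_ms` and the numbers are illustrative only.

```python
def lag_ms(frame_time_s, anchor_time_s, min_start_time_s):
    """Lag of a frame behind its schedule, in milliseconds, relative to an anchor."""
    elapsed_s = frame_time_s - anchor_time_s
    return (elapsed_s - min_start_time_s) * 1000.0


worker_start_s = 100.0  # worker thread spawned
first_frame_s = 101.0   # first frame acquired after ~1 s of startup
frame_s = 103.2         # a frame scheduled at min_start_time = 2.0 s

lag_from_worker_start = lag_ms(frame_s, worker_start_s, 2.0)  # about 1200 ms
lag_from_first_frame = lag_ms(frame_s, first_frame_s, 2.0)    # about 200 ms
```

With the worker's start time as the anchor, the one second of startup shows up in every reading; anchoring to the first frame leaves only the delay the acquisition itself accumulated.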

Stop now cancels the run and then runs finish_experiment(), so the next run starts clean instead of leaking the old Analyzer; the state banner shows STOPPING... while the drain runs. Stats are split into three panels (timing / queues / errors). The storage and pipeline queue depths render as grayscale fill bars that turn red past 80% of capacity; deferred shows as a plain count. The FovMap is freely resizable instead of pinned square.

alandolt · 2026-05-19T09:30:31Z

looks super cool and well executed. Thanks.
After a first glance through the code I don't see any issue, will probably soon push forward to also expand controller by an update method that replaces the old stored acquisition events (as seen here https://github.com/pertzlab/faro/blob/main/faro/core/controller.py), as for my agent stuff this is the way to go for some agent classes.
Will try it out on the real mic tomorrow.

hinderling mentioned this pull request May 16, 2026: fix: stop live mode + pump Qt event loop during run_experiment #9 (Closed, 3 tasks)

hinderling and others added 7 commits May 16, 2026 11:35

hinderling force-pushed the feat/async-run-handle branch from d473b9b to 3c0e798 Compare May 16, 2026 09:53

hinderling marked this pull request as ready for review May 16, 2026 11:29

hinderling added 3 commits May 19, 2026 09:52

hinderling wants to merge 10 commits into pertzlab:main from hinderling:feat/async-run-handle
