fix: OME-Zarr init_stream PermissionError on SMB/network drives #12

Draft
hinderling wants to merge 1 commit into pertzlab:main from hinderling:fix/ome-zarr-init-stream-smb

@hinderling
Collaborator

Problem

OmeZarrWriter._init_stream_direct (the multi-position OME-Zarr path) builds the store like this:

root = zarr.open_group(self._zarr_path, mode="w")   # writes zarr.json
...
root.attrs["ome"] = {...}                            # rewrites zarr.json

Assigning root.attrs rewrites the group's zarr.json a second time. zarr v3's LocalStore writes metadata atomically: it writes a zarr.<uuid>.partial temp file, then os.replace()s it over the target.

On SMB / network drives (here a Windows Z: share) that os.replace() over an existing zarr.json — one created microseconds earlier by the open_group call — intermittently fails:

PermissionError: [WinError 5] Access is denied:
'Z:\...\acquisition.ome.zarr\zarr.<uuid>.partial' -> 'Z:\...\acquisition.ome.zarr\zarr.json'

The just-written zarr.json is still pinned by an SMB oplock (or an AV scan) when the replace runs. The first write succeeds precisely because its target does not exist yet — that is a plain rename, not a replace-over-existing.
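
For readers unfamiliar with the temp-file pattern, here is a minimal stdlib-only sketch of the two cases. It is not zarr's actual LocalStore code: `atomic_write` and the temp-file naming are illustrative assumptions that only mirror the behaviour described above. On a local disk both calls succeed; the second one is the shape that intermittently fails on the SMB share.

```python
import os
import tempfile
import uuid
from pathlib import Path


def atomic_write(target: Path, data: bytes) -> None:
    """Hypothetical helper: write to a sibling temp file, then move it onto target."""
    tmp = target.with_name(f"{target.stem}.{uuid.uuid4().hex}.partial")
    tmp.write_bytes(data)
    # If target does not exist yet, this is a plain rename.
    # If target exists, this replaces an existing file, which is the
    # step reported to fail with WinError 5 on the network drive.
    os.replace(tmp, target)


with tempfile.TemporaryDirectory() as store_dir:
    meta = Path(store_dir) / "zarr.json"
    atomic_write(meta, b'{"attributes": {}}')           # first write: rename into a free path
    atomic_write(meta, b'{"attributes": {"ome": {}}}')  # second write: replace over existing
    final_bytes = meta.read_bytes()
    # No temp files should remain once both moves have completed.
    leftovers = [p.name for p in Path(store_dir).iterdir() if p.suffix == ".partial"]
```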

init_stream is called once at run start (from Controller._run_worker), outside the writer's write() retry loop — so this surfaced as a hard crash that aborted the acquisition before the first frame.

Fix

Bake the OME metadata into the group's creation call so zarr.json is written exactly once:

root = zarr.open_group(self._zarr_path, mode="w", attributes={"ome": ome_metadata})

With a single write, the only os.replace() is a rename into a path that does not exist yet — which is not subject to the replace-over-existing failure mode. Verified on zarr 3.1.4: open_group(attributes=...) writes the attributes in the initial metadata write, and the subsequent create_array("0", ...) adds the child array without rewriting the parent group's zarr.json.

Where else this pattern appears — and why the same fix does not drop in cleanly

The label-writing paths have the same multi-write shape — _create_label_array, in both OmeZarrWriter and OmeZarrWriterPlate:

labels_grp = img_grp.require_group("labels")
...
labels_grp.attrs["ome"]    = ome_attrs        # write
labels_grp.attrs["labels"] = existing         # write
label_grp = labels_grp.require_group(name)
...
label_grp.attrs["ome"]         = {...}        # write
label_grp.attrs["multiscales"] = multiscales  # write
label_grp.attrs["image-label"] = image_label  # write

Every attrs[...] = is another atomic replace-over-existing of that group's zarr.json, so each could in principle hit the same WinError 5.

The "bake attributes at creation" fix does not transfer cleanly here:

  1. labels_grp is read-modify-write. It reads the existing ome attrs, appends the new label name, and writes back — the labels container accumulates label names across calls. The final attribute set is not known at creation time, and require_group opens an already-existing group, so there is no one-shot creation write to bake into.
  2. label_grp is created with require_group(name) — idempotent open-or-create — not a one-shot create_group, so again there is no single creation write to attach attributes to.

So the structural single-write fix is specific to init_stream, where the group genuinely is created fresh in one call.

Why the label paths are nonetheless safe today: _create_label_array is reached only via _write_label, which is dispatched by write(), and write() wraps every call in a retry loop (_WRITE_RETRY_ATTEMPTS, exponential backoff) that catches PermissionError / OSError. A WinError 5 from a label attr write is therefore caught and retried, and _create_label_array is safe to re-run on retry: require_group is idempotent, create_array is called with overwrite=True, and the label-name merge skips names already added. init_stream was the one metadata-writing path with no retry net, which is exactly why it crashed while the label paths do not.
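
The retry net described above has roughly the following shape. This is a sketch, not the writer's actual code: `call_with_retry`, its defaults, and `flaky_attr_write` are hypothetical names, and the real loop is governed by _WRITE_RETRY_ATTEMPTS.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    op: Callable[[], T],
    attempts: int = 5,          # assumed default, not the writer's real value
    base_delay: float = 0.05,   # assumed default, in seconds
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Run op, retrying on OSError with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return op()
        except OSError:  # PermissionError is a subclass of OSError
            sleep(base_delay * 2 ** attempt)
    return op()  # final attempt: any error propagates to the caller


# Simulated label attr write that is denied twice, then succeeds.
calls = {"n": 0}


def flaky_attr_write() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise PermissionError(5, "Access is denied")
    return "written"


delays: list[float] = []
result = call_with_retry(flaky_attr_write, sleep=delays.append)  # record delays instead of sleeping
```

A wrapper like this only helps if the wrapped operation can be re-run safely, which is why the idempotence of _create_label_array matters.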

A reasonable follow-up (out of scope here) would be to collapse the label paths' 2-3 separate attrs[...] = assignments into a single update_attributes({...}) call — fewer network round-trips, fewer chances to flake into the retry loop — but it is lower priority since the retry already covers them.
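
To make the write-count argument concrete, here is a toy model of that follow-up. `CountingGroup` is a hypothetical stand-in, not a zarr class: it assumes each attribute change costs one metadata rewrite and that update_attributes merges into the existing attributes, both of which should be checked against the zarr version in use.

```python
class CountingGroup:
    """Hypothetical stand-in for a group whose attribute changes each rewrite zarr.json."""

    def __init__(self) -> None:
        self.attrs_data: dict = {}
        self.metadata_writes = 0

    def set_attr(self, key: str, value: object) -> None:
        # Models `group.attrs[key] = value`: one metadata rewrite per assignment.
        self.attrs_data[key] = value
        self.metadata_writes += 1

    def update_attributes(self, new_attrs: dict) -> None:
        # Models a batched update: all keys land in a single metadata rewrite.
        self.attrs_data.update(new_attrs)
        self.metadata_writes += 1


# Placeholder values; the real attributes are built by the writer.
label_attrs = {"ome": {}, "multiscales": [], "image-label": {}}

per_key = CountingGroup()
for key, value in label_attrs.items():
    per_key.set_attr(key, value)          # three separate rewrites

batched = CountingGroup()
batched.update_attributes(label_attrs)    # one rewrite
```

Both groups end with the same attributes; the batched version simply gives the replace-over-existing step one chance to fail instead of three.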

Test plan

  • zarr 3.1.4: open_group(mode="w", attributes={"ome": ...}) writes attrs in the creation write; create_array does not rewrite the parent zarr.json.
  • Existing writer / zarr / ome tests pass (14 passed).
  • On the SMB drive: a multi-position OME-Zarr run starts without PermissionError.

_init_stream_direct created the multi-position store with
zarr.open_group(mode="w") and then assigned root.attrs["ome"], which
rewrites zarr.json a second time via an atomic temp-file + os.replace.
On SMB/network drives that replace-over-existing intermittently fails
with PermissionError (WinError 5): the file written microseconds
earlier is still held by an SMB oplock or an AV scan. init_stream runs
outside the writer's retry loop, so the run crashed.

Bake the OME metadata into the group's creation call
(open_group(..., attributes={"ome": ...})) so zarr.json is written
exactly once -- the lone write is a rename into a non-existent path,
which does not hit the replace-over-existing failure mode.