Tracking the work to resolve the 15 open issues labelled fill-sentinel-values, plus the related upstream gaps in xarray's FillValueCoder.
Framing
VirtualiZarr correctness is measured against the Zarr spec, not against xarray equivalence. Several recent reports (zarr-developers/VirtualiZarr#989, zarr-developers/VirtualiZarr#485, zarr-developers/VirtualiZarr#628) hit xarray's FillValueCoder.decode failing on JSON-native scalars in zarr metadata: the parser produces spec-compliant output that xarray's HDF5-style coder can't consume. This is tracked upstream at pydata/xarray#11332; these are upstream xarray issues, not virtualizarr bugs.
The property-test infrastructure added in zarr-developers/VirtualiZarr#990 distinguishes failure categories:
Root-cause clusters
The 15 open issues plus two new findings collapse into 8 underlying problems. Each issue is listed under its primary cluster; cross-cluster cascades are noted inline.
A. Parser crashes during fill extraction: local parser fixes, ~5-20 lines each.
(new finding, no GH issue yet) structured-dtype _FillValue raises TypeError at _extract_attrs in parsers/hdf/hdf.py:364 due to a v == "DIMENSION_SCALE" comparison against a void scalar.
B. HDF parser _FillValue encoding gaps: local parser fix; emit base64 for kind S per docs/custom_parsers.md.
zarr-developers/VirtualiZarr#628: same root cause as #785; symptom is xarray's base64 assertion downstream.
C. xarray FillValueCoder lacking branches: upstream, tracked at pydata/xarray#11332. Out of virtualizarr scope.
D. h5py default fillvalue propagated indiscriminately: a parser fix that uses dataset.id.get_create_plist().fill_value_defined() to skip propagating defaults. Fixing D removes the cascade into C for vlen-string-without-_FillValue cases.
E. Cross-parser inconsistency: different parsers produce different fill defaults / metadata for the same source. Architectural fix.
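The check proposed for cluster D might look roughly like the sketch below. It assumes an open h5py.Dataset; the helper name explicit_fill_value is hypothetical and is not taken from the VirtualiZarr code base. fill_value_defined() reports whether the creation property list carries a user-defined fill, the library default, or no fill at all, and only the first case is propagated here.

```python
import h5py
from h5py import h5d


def explicit_fill_value(dataset: h5py.Dataset):
    """Return the dataset's fill value only if the file's author set one.

    Hypothetical helper: returns None when HDF5 reports the fill as the
    library default or as undefined, so callers can skip propagating it.
    """
    status = dataset.id.get_create_plist().fill_value_defined()
    if status == h5d.FILL_VALUE_USER_DEFINED:
        return dataset.fillvalue
    # FILL_VALUE_DEFAULT or FILL_VALUE_UNDEFINED: nothing explicit to carry over.
    return None
```

For a dataset created with an explicit fillvalue argument this returns that value; for a dataset created without one it returns None, so the default never reaches zarr storage.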
F. Writer-side fill semantics: writer-API design questions, distinct from parser fixes.
null is a valid fill_value for Zarr V2 zarr-developers/VirtualiZarr#478
G. Attribute serialization fidelity: zarr v3 metadata is JSON; lossy for some attribute shapes.
float32→float64).
H. Cross-cutting encoding model: meta-discussion; closes via the totality of the other clusters.
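As a rough illustration of the parser-local fixes in clusters A and B, the sketch below shows one way to guard the DIMENSION_SCALE comparison and to base64-encode a bytes fill value. Both helper names are hypothetical, the guard is only one possible fix for the crash at _extract_attrs, and the exact encoding contract is whatever docs/custom_parsers.md specifies; this is standard-library Python, not the parser's actual code.

```python
import base64


def is_dimension_scale(value) -> bool:
    """Return True only when an attribute value is the DIMENSION_SCALE marker.

    The type is checked before `==` is evaluated, so values that cannot be
    compared with a string (such as a structured/void scalar) never reach
    the comparison. NumPy's np.bytes_ and np.str_ scalars subclass bytes and
    str, so they still pass these checks.
    """
    if isinstance(value, bytes):
        return value == b"DIMENSION_SCALE"
    if isinstance(value, str):
        return value == "DIMENSION_SCALE"
    return False


def encode_bytes_fill(fill: bytes) -> str:
    """Encode a fixed-length bytes (kind S) fill value as base64 ASCII text."""
    return base64.standard_b64encode(fill).decode("ascii")
```

With these helpers, a structured-dtype _FillValue is simply reported as "not a dimension scale" instead of raising, and a bytes fill such as b"N/A" is emitted as a JSON-safe string.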
Phases
_extract_attrs, ZarrParser default lookup, S-dtype base64 encoding. ~50 lines total across several small PRs.
FillValueCoder JSON-native-scalar gap; engage with zarr-specs#351, zarr-extensions#33.
fill_value_defined() distinction: stop propagating h5py-default fills to zarr storage.
Cross-parser consistency: extend the property-test suite to Kerchunk, TIFF; port HDFParser conventions; document the contract in docs/custom_parsers.md.
Writer-side round-trips: Icechunk / Kerchunk writers preserve fill semantics.
Phase 2 runs in parallel with all others. BothEnginesFailedIdenticallyError cases auto-resolve when xarray ships the fix; no virtualizarr code change required.
Status
_get_fill_value + StringDType + kind-skip)
References
fix/problem_fillvalues)