Skip to content

Correctly handle HDF5 fillvalue for string dtype arrays.#988

Open
sharkinsspatial wants to merge 1 commit into
mainfrom
fix/problem_fillvalues
Open

Correctly handle HDF5 fillvalue for string dtype arrays.#988
sharkinsspatial wants to merge 1 commit into
mainfrom
fix/problem_fillvalues

Conversation

@sharkinsspatial
Copy link
Copy Markdown
Collaborator

What I did

Updated the HDF5 parser to support string dtype arrays with a bytes or str fillvalue.

Acceptance criteria:

@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.93%. Comparing base (2f04001) to head (9ed7c07).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
virtualizarr/parsers/hdf/hdf.py 84.61% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #988      +/-   ##
==========================================
- Coverage   89.97%   89.93%   -0.04%     
==========================================
  Files          33       33              
  Lines        2064     2077      +13     
==========================================
+ Hits         1857     1868      +11     
- Misses        207      209       +2     
Files with missing lines Coverage Δ
virtualizarr/codecs.py 93.44% <100.00%> (+0.22%) ⬆️
virtualizarr/parsers/hdf/hdf.py 94.55% <84.61%> (-1.04%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +85 to +86
if dtype.kind == "O":
dtype = np.dtypes.StringDType()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if dtype.kind == "O":
dtype = np.dtypes.StringDType()
# Discriminate against other object-kind HDF5 dtypes (vlen arrays, object/
# region references) that would silently be coerced to StringDType and
# produce garbage downstream — those aren't supported yet, so fail loudly.
if h5py.check_string_dtype(dtype) is not None:
dtype = np.dtypes.StringDType()
elif dtype.kind == "O":
raise NotImplementedError(
f"HDF5 object dtype {dtype!r} is not a variable-length string and "
f"is not yet supported by HDFParser. h5py exposes vlen arrays "
f"(`h5py.vlen_dtype`) and object/region references "
f"(`h5py.ref_dtype`, `h5py.regionref_dtype`) as numpy object dtype; "
f"please open an issue if your file needs one of these."
)

I don't think all hdf5 object-kind dtypes are variable-length strings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working fill-sentinel-values

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Failure to read HDF5 data with data arrays of dtype='<U1'

2 participants