
Segmentation fault when reading corrupted compressed HDF5 chunks instead of raising exception #1456

@polymood


Hey everyone,

I ran into a problem when trying to read a NetCDF4/HDF5 file that contains a corrupt gzip-compressed chunk. When I read the file using netCDF4-python, Python crashes with a segmentation fault. When I read the exact same file and slice with h5py, it raises a clean exception instead of crashing.

It looks like the crash happens somewhere in the underlying netCDF-C / HDF5 filter decompression code, but since h5py manages to surface the error safely, I’m hoping there might be a way to handle this more gracefully on the Python side.

Environment

Python: 3.13.2 (Windows)
netCDF4: 1.7.3
netCDF4 HDF5 lib version: 1.14.6
netCDF4 netcdf lib version: 4.9.2
h5py: 3.15.1
h5py HDF5: 1.14.6
numpy: 2.3.5

Minimal example

The reproduction is tied to my specific corrupt file, but the problem should generalize to any file (or variable within a file) containing a corrupt compressed chunk.

corrupt_chunk_minimal.zip

import netCDF4

filepath = "path/to/file_with_corrupt_chunk.nc"

with netCDF4.Dataset(filepath, 'r') as ds:
    # Causes a crash instead of an exception
    lst = ds.variables['lst'][0:1, 15000:16000, 33000:34000]

The same read using h5py works: it raises an exception instead of crashing.

import h5py

filepath = "path/to/file_with_corrupt_chunk.nc"

with h5py.File(filepath, 'r') as f:
    lst = f['lst'][0:1, 15000:16000, 33000:34000]
    # Raises: OSError: Can't synchronously read data 
    # (filter returned failure during read)
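Until this is fixed, a workaround on my side is to route reads of suspect files through h5py and catch the error. A minimal sketch, using the same file and slice as above:

import h5py

filepath = "path/to/file_with_corrupt_chunk.nc"

with h5py.File(filepath, 'r') as f:
    try:
        lst = f['lst'][0:1, 15000:16000, 33000:34000]
    except OSError as err:
        # h5py surfaces the HDF5 filter failure as a Python exception,
        # so the bad region can be skipped or filled instead of crashing.
        print(f"Corrupt data in requested slice: {err}")
        lst = None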

File info

  • Format: NetCDF4/HDF5
  • lst shape: (1, 18000, 36000)
  • Chunking: (1, 1000, 1000)
  • Compression: gzip level 9 + shuffle
  • Total chunks: 648
  • Corrupt chunks: 1, at chunk index (0, 15, 33) (found with the scan sketched below)
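
For reference, this is roughly how I pinpointed the corrupt chunk: a brute-force scan (sketch below, with chunk_shape taken from the file info above) that reads each chunk-aligned slab through h5py and records the ones whose decompression fails:

import h5py

filepath = "path/to/file_with_corrupt_chunk.nc"
chunk_shape = (1, 1000, 1000)  # matches the file's chunking

with h5py.File(filepath, 'r') as f:
    dset = f['lst']
    bad = []
    # Each chunk-aligned read forces HDF5 to decompress exactly one
    # chunk, so a failing read pinpoints that chunk's index.
    for i in range(0, dset.shape[0], chunk_shape[0]):
        for j in range(0, dset.shape[1], chunk_shape[1]):
            for k in range(0, dset.shape[2], chunk_shape[2]):
                try:
                    _ = dset[i:i + chunk_shape[0],
                             j:j + chunk_shape[1],
                             k:k + chunk_shape[2]]
                except OSError:
                    bad.append((i // chunk_shape[0],
                                j // chunk_shape[1],
                                k // chunk_shape[2]))

print(f"{len(bad)} corrupt chunk(s) at indices: {bad}")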

What I found

Digging into _netCDF4.pyx, the crash seems to happen inside the nc_get_vara() call while HDF5 is decompressing the data. Since the segfault happens inside the C library, the Python exception logic never gets a chance to run.

h5py somehow manages to get a clean error back from HDF5 instead of crashing, so there might be a way to surface the error at the netCDF4 layer too.

Ideally, when netCDF4 hits a corrupt compressed chunk, it would raise a Python exception (e.g., a RuntimeError or something HDF5-specific) instead of crashing the interpreter.

Something like:

RuntimeError: HDF5 filter decompression failed: corrupt or invalid compressed data

That would be much easier to handle; right now I am forced to exclude the whole file from my dataset.
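
With an exception, I could even salvage the 647 readable chunks instead of dropping the file. Something along these lines would do it (a hypothetical h5py-based sketch that leaves the dataset's fill value in place wherever a chunk-aligned read fails; for a variable this size you would stream the result to disk rather than hold it all in memory):

import h5py
import numpy as np

filepath = "path/to/file_with_corrupt_chunk.nc"
chunk_shape = (1, 1000, 1000)

with h5py.File(filepath, 'r') as f:
    dset = f['lst']
    # Pre-fill the output with the dataset's fill value (assumes one is set).
    out = np.full(dset.shape, dset.fillvalue, dtype=dset.dtype)
    for i in range(0, dset.shape[0], chunk_shape[0]):
        for j in range(0, dset.shape[1], chunk_shape[1]):
            for k in range(0, dset.shape[2], chunk_shape[2]):
                sel = np.s_[i:i + chunk_shape[0],
                            j:j + chunk_shape[1],
                            k:k + chunk_shape[2]]
                try:
                    out[sel] = dset[sel]
                except OSError:
                    pass  # corrupt chunk: keep the fill value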

Happy to provide the file or run additional tests. Let me know!
