
Address h5netcdf scalar issue #3331

Open
mraspaud wants to merge 12 commits into pytroll:main from mraspaud:h5netcdf-cache-fix

Conversation

@mraspaud
Member

@mraspaud mraspaud commented Feb 3, 2026

This PR addresses the caching of scalar values in the netcdf util when using h5netcdf >= 1.7.0.
The solution (catching a ValueError on top of the IndexError) was already available, but only in a duplicated part of the code that was not used by the caching path.

While setting up the test case for this issue and adding more h5netcdf engine tests (by simply parametrising existing tests to also use that engine), it became clear that netcdf4 and h5netcdf cannot be imported at the same time in some environments. Hence the changes to the netcdf file creation fixture, the removal of netcdf4 as a top-level import, and the use of pytest-forked in CI. For reference, I am running on my RHEL9 laptop inside a uv environment with Python 3.14.
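
For illustration, a minimal sketch of the pattern the fix relies on; the helper name and the exact indexing below are made up for this example and are not satpy's actual code:

def _read_variable_value(variable):
    """Read the value behind a file handle variable, scalars included."""
    try:
        return variable[:]
    except (IndexError, ValueError):
        # Slicing a scalar variable fails: older h5netcdf raises IndexError,
        # h5netcdf >= 1.7.0 raises ValueError. Fall back to an empty-tuple
        # index, which works for scalars.
        return variable[()]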

  • Tests added

@mraspaud mraspaud self-assigned this Feb 3, 2026
@mraspaud mraspaud requested a review from djhoese as a code owner February 3, 2026 10:42
@mraspaud mraspaud added the bug label Feb 3, 2026
Member

@pnuu pnuu left a comment


Just one question and one typo fix inline. In general I'm fine with this.

Comment on lines 73 to 76
from multiprocessing import Process
p = Process(target=_create_netcdf_file, args=(filename, ))
p.start()
p.join()
Member

Why this complication? Does it help us in some way?

Member Author

Yes, see the text I wrote in the PR note. Basically, it’s because in some environments (like mine), writing with netcdf4 and reading with h5netcdf from the same process crashes, making all the h5netcdf tests fail in my case.
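
To make the idea concrete, this is roughly what the fixture change amounts to (a sketch only; the fixture name and tmp_path wiring are assumptions, and _create_netcdf_file is the writer helper from the diff above):

import pytest
from multiprocessing import Process


@pytest.fixture
def nc_filename(tmp_path):
    """Create the test file in a child process so netCDF4 is never imported in the test process."""
    filename = tmp_path / "test.nc"
    # _create_netcdf_file writes the file with netCDF4 in the child process.
    p = Process(target=_create_netcdf_file, args=(filename,))
    p.start()
    p.join()
    return filename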

Member

Ah, ok. I didn't see the connection between the comment and the fixture.

Comment on lines +65 to +69
# WARN: This is very confusing, but it seems to work: The reader_kwargs, when passed without a reader key,
# are put into a dictionary where keys are individual letters of the reader name, and the value is the
# reader kwargs; however, notice the `reader[idx]` part in the call here, as idx is 0 and thus the first
# letter of the reader is used to index `reader_kwargs_without_filter`, thus retrieving the correct
# reader_kwargs.
Member

So basically this logic is incorrect and works by luck/chance and needs to be fixed?

Member Author

I'm not sure if it was the intended behaviour or not... It's working in the end, so maybe it is? But yeah, it would definitely benefit from a dedicated test and then a refactor...
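
For reference, a toy illustration of the quirk that comment describes (the reader name and kwargs are made up; this is not satpy's actual code):

reader = "example_reader"
reader_kwargs = {"mask_saturated": True}

# Iterating over the reader *string* yields one entry per letter, each
# mapping to the same kwargs dictionary:
reader_kwargs_without_filter = {letter: reader_kwargs for letter in reader}

# With idx = 0, reader[idx] is the first letter of the reader name, so the
# lookup still lands on the correct kwargs:
idx = 0
assert reader_kwargs_without_filter[reader[idx]] is reader_kwargs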

Member

So what did you do to run into the issue? You called it from the Scene or noticed it within a test you already had?

@djhoese
Member

djhoese commented Feb 3, 2026

it became clear that netcdf4 and h5netcdf cannot be imported at the same time in some environments

Is this intended to not work? Is this a bug in h5netcdf then? Is it something that can or should be fixed upstream? The forking with the pytest flag I can kind of agree with...maybe, but the process creation inside the test file creation is too much for me when all of this comes down to an HDF5 issue from what we can tell.

@djhoese
Member

djhoese commented Feb 3, 2026

Oh, and if you have error messages, could you please put them in the PR description so that this PR is easily discoverable in the future.

@djhoese
Member

djhoese commented Feb 3, 2026

I'm guessing you're running into the same issues users describe here when installing packages from PyPI:

#3043

If you're able to reproduce this easily that'd be great for making progress on that issue.

@mraspaud
Member Author

mraspaud commented Feb 4, 2026

Oh, and if you have error messages, could you please put them in the PR description so that this PR is easily discoverable in the future.

Good point! Here is the error I get repeatedly from pytest in the h5netcdf cases (I tried creating coordinate variables, which doesn't help; the only thing that helps is not having netCDF4 loaded).

TL;DR: RuntimeError: Unspecified error in H5DSis_scale (return value <0)

_______________________________________________________________________________________________________ test_get_data_as_xarray_h5netcdf _______________________________________________________________________________________________________

tmp_path = PosixPath('/tmp/pytest-of-a001673/pytest-3/test_get_data_as_xarray_h5netc0')

    def test_get_data_as_xarray_h5netcdf(tmp_path):
        """Test getting xr.DataArray from h5netcdf variable."""
        import numpy as np
    
        from satpy.readers.core.netcdf import get_data_as_xarray
    
        data = np.array([1, 2, 3])
        fname = tmp_path / "test.nc"
>       fid = _write_test_h5netcdf(fname, data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

data       = array([1, 2, 3])
fname      = PosixPath('/tmp/pytest-of-a001673/pytest-3/test_get_data_as_xarray_h5netc0/test.nc')
get_data_as_xarray = <function get_data_as_xarray at 0x7fc017c50930>
np         = <module 'numpy' from '/home/a001673/usr/src/satpy/h5netcdf-cache-fix/.venv/lib/python3.14/site-packages/numpy/__init__.py'>
tmp_path   = PosixPath('/tmp/pytest-of-a001673/pytest-3/test_get_data_as_xarray_h5netc0')

satpy/tests/reader_tests/test_netcdf_utils.py:389: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
satpy/tests/reader_tests/test_netcdf_utils.py:401: in _write_test_h5netcdf
    fid.dimensions = {"y": data.size}
    ^^^^^^^^^^^^^^
        data       = array([1, 2, 3])
        fid        = <h5netcdf.File 'test.nc' (mode r+, backend h5py)>
Dimensions:
Groups:
Variables:
Attributes:
        fname      = PosixPath('/tmp/pytest-of-a001673/pytest-3/test_get_data_as_xarray_h5netc0/test.nc')
        h5netcdf   = <module 'h5netcdf' from '/home/a001673/usr/src/satpy/h5netcdf-cache-fix/.venv/lib/python3.14/site-packages/h5netcdf/__init__.py'>
.venv/lib/python3.14/site-packages/h5netcdf/core.py:1257: in dimensions
    self._dimensions.update(value)
        self       = <h5netcdf.File 'test.nc' (mode r+, backend h5py)>
Dimensions:
Groups:
Variables:
Attributes:
        value      = {'y': 3}
<frozen _collections_abc>:971: in update
    ???
        key        = 'y'
        kwds       = {}
        other      = {'y': 3}
        self       = <h5netcdf.Dimensions: >
.venv/lib/python3.14/site-packages/h5netcdf/dimensions.py:43: in __setitem__
    self._objects[name] = Dimension(self._group, name, size, create_h5ds=True)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        name       = 'y'
        self       = <h5netcdf.Dimensions: >
        size       = 3
.venv/lib/python3.14/site-packages/h5netcdf/dimensions.py:108: in __init__
    self._create_scale()
        create_h5ds = True
        name       = 'y'
        parent     = <h5netcdf.File 'test.nc' (mode r+, backend h5py)>
Dimensions:
Groups:
Variables:
Attributes:
        phony      = False
        self       = <h5netcdf.Dimension 'y': size 3>
        size       = 3
.venv/lib/python3.14/site-packages/h5netcdf/dimensions.py:228: in _create_scale
    if not self._root._h5py.h5ds.is_scale(self._h5ds.id):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        NOT_A_VARIABLE = b'This is a netCDF dimension but not a netCDF variable.'
        dimid      = 0
        dimlen     = b'         3'
        kwargs     = {'track_order': True}
        scale_name = b'This is a netCDF dimension but not a netCDF variable.         3'
        self       = <h5netcdf.Dimension 'y': size 3>
        size       = 3
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
    ???
h5py/h5ds.pyx:34: in h5py.h5ds.is_scale
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   RuntimeError: Unspecified error in H5DSis_scale (return value <0)


h5py/defs.pyx:4326: RuntimeError

@codecov

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 99.59350% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 96.33%. Comparing base (2554db5) to head (42ee530).
⚠️ Report is 26 commits behind head on main.

Files with missing lines         Patch %   Lines
satpy/readers/smos_l2_wind.py    80.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3331      +/-   ##
==========================================
- Coverage   96.34%   96.33%   -0.01%     
==========================================
  Files         463      463              
  Lines       58916    58854      -62     
==========================================
- Hits        56760    56697      -63     
- Misses       2156     2157       +1     
Flag             Coverage Δ
behaviourtests   3.60% <0.00%> (+<0.01%) ⬆️
unittests        96.42% <99.59%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

Comment on lines +150 to +154
if ds_id["name"] == "lat":
data = data.where((data > -90.0) & (data < 90.0), drop=True)
else:
# Is there such a case?
data = data.where((data.y > 0) & (data.y < len(data.y) - 1), drop=True)
Member Author

@TAlonglong That's the part I'm unsure about...

@mraspaud
Member Author

mraspaud commented Feb 4, 2026

@joleenf @djhoese I did a refactor of the mimic and geocat tests; if you could have a look, that would be great!



class TestGEOCATReader(unittest.TestCase):
@pytest.fixture(scope="session")
Member

If these aren't used outside of this module, the scope should be module, not session.
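
A minimal example of the suggested change (the fixture name and body are just illustrative):

import pytest


@pytest.fixture(scope="module")
def geocat_test_filename(tmp_path_factory):
    """Module scope: shared only by the tests in this file."""
    return tmp_path_factory.mktemp("geocat") / "test.nc"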
