Open
Description
Describe the bug
xarray emits a chunking warning when opening HDF5/NetCDF files from the NSRDB on Kestrel. This happens because we create a dask-backed dataset with chunks that are not aligned with the chunks stored in the files.
This probably degrades performance for the agrivoltaics irradiance dataset pipelines. At best, it creates extra noise in the output.
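To make the misalignment concrete, here is a simplified model of the condition the warning describes. It is not xarray's actual implementation, and the chunk sizes below are hypothetical since I have not checked the real on-disk chunking of the NSRDB files. A requested chunk boundary that is not a multiple of the stored chunk size cuts through a stored chunk, so that stored chunk has to be read by more than one dask task.

```python
def first_split_index(stored_chunk, requested_chunk, dim_length):
    """Return the first index where a requested chunk boundary falls
    inside a stored chunk, or None if every boundary is aligned.

    Simplified model with uniform chunk sizes along one dimension.
    """
    if stored_chunk <= 0 or requested_chunk <= 0:
        raise ValueError("chunk sizes must be positive")

    boundary = requested_chunk
    while boundary < dim_length:
        # A boundary is aligned only if it sits on a stored-chunk edge.
        if boundary % stored_chunk != 0:
            return boundary
        boundary += requested_chunk
    return None


# Hypothetical sizes: files stored in chunks of 1000, opened with chunks of 500.
misaligned = first_split_index(stored_chunk=1000, requested_chunk=500, dim_length=2000)

# Requesting whole multiples of the stored chunk size does not split anything.
aligned = first_split_index(stored_chunk=500, requested_chunk=1000, dim_length=2000)
```

With these made-up sizes the first split lands at index 500, which is the same shape of problem the warning reports for `phony_dim_1`.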
To Reproduce
import pvdeg

WEATHER_DB = "NSRDB"
WEATHER_ARG = {
    "satellite": "Americas",
    "names": "TMY",
    "NREL_HPC": True,
    "attributes": pvdeg.pysam.INSPIRE_NSRDB_ATTRIBUTES,
}
# Opening the geospatial dataset on Kestrel triggers the warning below.
pvdeg.weather.get(database=WEATHER_DB, geospatial=True, **WEATHER_ARG)
/home/tford/.conda-envs/geospatial/lib/python3.9/site-packages/xarray/core/dataset.py:277: UserWarning: The specified chunks separate the stored chunks along dimension "phony_dim_1" starting at index 500. This could degrade performance. Instead, consider rechunking after loading.
warnings.warn(
Expected behavior
Instead of forcing chunks on open, we should let xarray/dask determine how the loaded dataset is chunked.
If we want to force a particular chunking, we can rechunk the dask-backed dataset after opening all the files. This should be faster at load time or produce simpler dask graphs.
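A minimal sketch of that approach, assuming plain xarray calls rather than pvdeg's current loading code. The helper names, the `time`/`gid` dimension names, and the sizes are hypothetical. Passing `chunks={}` to `xr.open_dataset` asks xarray to use the backend's preferred chunk sizes, which generally match the chunks stored in the file, and `Dataset.chunk` then rechunks the lazy dataset afterwards.

```python
import numpy as np
import xarray as xr


def open_with_native_chunks(path, **open_kwargs):
    """Open one file lazily, keeping the chunking preferred by the backend.

    Hypothetical helper: chunks={} avoids imposing our own chunk sizes on open.
    """
    return xr.open_dataset(path, chunks={}, **open_kwargs)


def rechunk_after_open(ds, target_chunks):
    """Rechunk an already-opened, dask-backed dataset.

    Dimensions named in target_chunks but absent from ds are ignored.
    """
    present = {dim: size for dim, size in target_chunks.items() if dim in ds.dims}
    return ds.chunk(present)


# Small in-memory stand-in for an opened file, chunked as if stored in
# blocks of 500 along "gid" (hypothetical names and sizes).
demo = xr.Dataset(
    {"ghi": (("time", "gid"), np.zeros((8, 1000)))}
).chunk({"time": 8, "gid": 500})

# Apply the chunking we actually want only after the dataset is open.
rechunked = rechunk_after_open(demo, {"time": 8, "gid": 250})
```

On real files this would be `rechunk_after_open(open_with_native_chunks(path), {...})`. Whether it is actually faster on Kestrel still needs to be measured.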
Metadata
Assignees
Labels
bug: Something isn't working