
Remove forced chunking on NSRDB open (Kestrel) #250

@tobin-ford

Describe the bug
Opening the NSRDB h5/netCDF files on Kestrel raises an xarray chunking warning. This happens because we try to create a dask-backed dataset with chunks that are not aligned with the chunks stored in the files.

This probably causes performance degradation for the agrivoltaics irradiance dataset pipelines. At best it creates extra noise.

To Reproduce

import pvdeg

WEATHER_DB = "NSRDB"
WEATHER_ARG = {
  "satellite": "Americas",
  "names": "TMY",
  "NREL_HPC": True,
  "attributes": pvdeg.pysam.INSPIRE_NSRDB_ATTRIBUTES,
}
pvdeg.weather.get(database=WEATHER_DB, geospatial=True, **WEATHER_ARG)


/home/tford/.conda-envs/geospatial/lib/python3.9/site-packages/xarray/core/dataset.py:277: UserWarning: The specified chunks separate the stored chunks along dimension "phony_dim_1" starting at index 500. This could degrade performance. Instead, consider rechunking after loading.
  warnings.warn(
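
For anyone wondering what "separate the stored chunks" means here, below is a simplified, pure-Python sketch of the idea as I understand it. It is not xarray's actual code: `first_chunk_break` is a hypothetical helper, and the sizes in the example are made up so that the first break lands at index 500 like in the warning. The real stored and requested chunk sizes for the NSRDB files are not stated in this issue.

```python
from typing import Optional


def first_chunk_break(size: int, requested: int, stored: int) -> Optional[int]:
    """Return the first requested chunk boundary that is not also a stored
    chunk boundary, or None if every requested boundary is aligned.

    Hypothetical helper for illustration only; assumes uniform chunk sizes
    along a single dimension of length `size`.
    """
    # Interior boundaries only: the end of the dimension is always shared.
    requested_stops = set(range(requested, size, requested))
    stored_stops = set(range(stored, size, stored))
    # A requested boundary that falls inside a stored chunk splits that chunk,
    # so the stored chunk has to be read by more than one dask task.
    breaks = requested_stops - stored_stops
    return min(breaks) if breaks else None


# Made-up sizes: requested chunks of 500 on top of stored chunks of 2000.
print(first_chunk_break(size=4000, requested=500, stored=2000))   # 500
# Requested chunks that are whole multiples of the stored chunks are aligned.
print(first_chunk_break(size=4000, requested=2000, stored=1000))  # None
```

So the warning should go away if the requested chunks are whole multiples of the stored chunks, or if we do not request chunks at open time at all.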

Expected behavior
Instead of forcing chunks on open, we should let xarray/dask determine how the loaded dataset is chunked.
If we want to force a particular chunking, we can rechunk the dask-backed dataset after opening all the files. This should be faster at load time or produce simpler dask graphs.
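
A minimal sketch of what I have in mind, assuming a single file and plain `xarray.open_dataset`. The function name `open_then_rechunk` and its parameters are hypothetical, not existing pvdeg API, and the actual pvdeg loader may need to look different (for example when several files are opened together).

```python
import xarray as xr


def open_then_rechunk(path, target_chunks=None, **open_kwargs):
    """Open one file lazily with the backend's chunking, then optionally rechunk.

    Hypothetical helper for illustration; not part of pvdeg.
    """
    # chunks={} gives a dask-backed dataset but leaves the chunk sizes to the
    # backend's preferred chunking (normally the chunks stored in the file),
    # so we never request boundaries that split stored chunks.
    ds = xr.open_dataset(path, chunks={}, **open_kwargs)
    if target_chunks is not None:
        # Rechunk only after opening, and only if we really need a
        # specific layout for the downstream computation.
        ds = ds.chunk(target_chunks)
    return ds
```

Extra keyword arguments are passed straight to `xr.open_dataset`, so the engine can still be chosen by the caller. If we do rechunk, `target_chunks` would be keyed by dimension name, e.g. the `phony_dim_1` dimension from the warning.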

Labels: bug
