Zarr write + Zarr groups#368

Merged
davidhassell merged 42 commits into NCAS-CMS:main from
davidhassell:zarr-write
Jan 8, 2026

@davidhassell
Contributor

Fixes #354 and #355

Orientation

cfdm/data/aggregatedarray.py

  • Change to allow URIs to be Python strings or scalar numpy string types.
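
As an illustration of what accepting both kinds of string can involve, here is a minimal sketch. It is not the cfdm implementation: `normalise_uri` is a hypothetical helper name, and rejecting non-scalar arrays is an assumption made for the example.

```python
import numpy as np


def normalise_uri(uri):
    """Return a fragment URI as a built-in ``str``.

    Hypothetical helper: accepts a Python string, a NumPy string
    scalar (``np.str_``), or a 0-d NumPy string array.
    """
    if isinstance(uri, np.ndarray):
        if uri.ndim != 0:
            raise TypeError("URI must be a scalar, not an array")
        # .item() converts a 0-d array to the matching Python scalar
        uri = uri.item()

    # np.str_ is a subclass of str, so one isinstance check covers both
    if not isinstance(uri, str):
        raise TypeError(f"URI must be a string, got {type(uri).__name__}")

    # str() on an np.str_ instance gives a plain Python str
    return str(uri)
```

Normalising once at the boundary means downstream code only ever sees built-in strings.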

cfdm/data/data.py

  • Include Zarr dataset sharding methods
  • Include Zarr fragment class
  • Move the test for an "empty" slice to the parent construct (in cfdm/mixin/propertiesdata.py), because sometimes we want to allow slices on abstract data (just like numpy does)

cfdm/data/fragment/fragmentfilearray.py

  • Include Zarr fragment class

cfdm/data/fragment/fragmentzarrarray.py

  • New file defining a Zarr fragment (very similar to cfdm/data/fragment/fragmentnetcdf4array.py)

cfdm/data/netcdfindexer.py

  • Account for zarr using the new numpy T data type

cfdm/mixin/netcdf.py

  • New mixin class for dataset shards

cfdm/read_write/abstract/abstractio.py

  • Rename "file" to "dataset"

cfdm/read_write/netcdf/flatten/flatten.py

  • Rename "file" to "dataset"
  • Use match clauses to switch between dataset format APIs
  • Changes to allow Zarr datasets to be flattened
    • Zarr datasets do not have well-defined Dimensions, so finding
      which dimensions belong to which groups for Zarr datasets is more
      involved, and needs configuring with the new
      group_dimension_search keyword.

cfdm/read_write/netcdf/netcdfread.py

  • Rename "file" to "dataset"
  • Use match clauses to switch between dataset format APIs
  • Handle Zarr shards
  • Performance improvements in _cache_data_elements, largely aimed at
    reducing the number of requests to disk

cfdm/read_write/netcdf/netcdfwrite.py

  • Rename "file" to "dataset"
  • Use match clauses to switch between dataset format APIs
  • Handle Zarr shards

cfdm/read_write/netcdf/zarr.py

  • Include a reference variable in ZarrDimension

cfdm/read_write/read.py

  • New keywords store_dataset_shards and group_dimension_search

cfdm/read_write/write.py

  • Rename "file" to "dataset"
  • New keyword dataset_shards

setup.py

  • zarr import optional

@davidhassell added this to the NEXTVERSION milestone Nov 13, 2025
@davidhassell added the enhancement (New feature or request), dataset write (Relating to writing datasets), and dataset read (Relating to reading datasets) labels Nov 13, 2025
Member

@sadielbartholomew sadielbartholomew left a comment

Great stuff! With a satisfying amount of match-case additions 🙂

Only minor comments, though note especially the more significant one about append mode, and as a general comment when I run test_zarr with Python 3.12 and Zarr 3.0.8 I get spammed with a warning of:

/home/slb93/miniconda3/envs/cf-env-312-numpy2/lib/python3.12/site-packages/zarr/codecs/vlen_utf8.py:44: UserWarning: The codec `vlen-utf8` is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  return cls(**configuration_parsed)

but when I run it with Python 3.13 and Zarr 3.1.0 I don't see those. So I suspect the Zarr version 3.0.8 might be to blame. Ideally we can check this isn't emerging as a result of our code and fix it if it is.

Otherwise all good so please merge once you've considered all of my feedback here.
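
If the warning ever needed silencing in the test suite rather than avoiding through the Zarr version, one option is a narrowly targeted filter. The following is a sketch using only the standard library; `call_quietly` is a hypothetical helper, not something in cfdm or Zarr.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call ``func``, suppressing only the Zarr ``vlen-utf8`` UserWarning."""
    with warnings.catch_warnings():
        # `message` is a regular expression matched against the start
        # of the warning text, so other warnings still get through
        warnings.filterwarnings(
            "ignore",
            message=r"The codec `vlen-utf8` is currently not part",
            category=UserWarning,
        )
        return func(*args, **kwargs)
```

Because the filter is installed inside `catch_warnings()`, it is removed again as soon as the call returns.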

@sadielbartholomew
Member

(Think GitHub is playing up a bit - top-level review comment came through twice for the one review 🤔 )

davidhassell and others added 3 commits January 8, 2026 09:23
Co-authored-by: Sadie L. Bartholomew <sadie.bartholomew@ncas.ac.uk>
Co-authored-by: Sadie L. Bartholomew <sadie.bartholomew@ncas.ac.uk>
@davidhassell
Contributor Author

On the Zarr version, the requirements give zarr>=3.1.3, so we should be covered for this. However, what it does show a need for, perhaps, is some version checking via importlib.metadata, as you demonstrated in #362 (comment) ... one for another issue!

davidhassell and others added 3 commits January 8, 2026 09:53
Co-authored-by: Sadie L. Bartholomew <sadie.bartholomew@ncas.ac.uk>
@davidhassell
Contributor Author

Thanks Sadie - very thorough. Some resolutions and comments back to you ...

@sadielbartholomew
Member

On the Zarr version, the requirements give zarr>=3.1.3, so we should be covered for this.

Ah, fair - good point. What happened was, when I went to check what Zarr we require I referenced the removed dependency version (zarr>=3.0.8) but of course should have checked the optional dependency version specification where that has been bumped up. Sorry for the mistake. The good news is that means the spammy warning issue goes away nicely.

@sadielbartholomew
Member

Thanks for such a quick response to the feedback, David. I have responded to all comments and the only thing remaining to be addressed is that at: #368 (comment). Please take a look and once it's resolved we can merge.

@sadielbartholomew
Member

OK now ready to merge as far as I am concerned, thanks!

@davidhassell
Contributor Author

w00t! Thanks again. Merging now.

@davidhassell davidhassell merged commit 5781b4d into NCAS-CMS:main Jan 8, 2026
Development

Successfully merging this pull request may close these issues.

Write Zarr datasets

2 participants