Support to use zarr.sync.ProcessSynchronizer(path) with S3 as path #1224

Hi everyone,

I've been digging around to see if there's already an existing way to use `zarr.sync.ProcessSynchronizer(path)` with S3 as `path`, but no luck.

My scenario: I have a Lambda function that listens to S3 events and writes NetCDF files to a Zarr store (also on S3). Each Lambda invocation processes one NetCDF file.

As Lambda is a distributed system, 10 newly uploaded files trigger 10 separate processes that try to write to the Zarr store at almost the same time, and I am seeing data corruption.
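To illustrate one way this kind of corruption can arise, here is a minimal sketch of a lost update. It assumes two writers touch the same chunk and each does an unsynchronized read-modify-write; whether that is what actually happens in my store has not been confirmed. The `store` dict and the `read_chunk` / `write_chunk` helpers are hypothetical stand-ins, not Zarr or S3 APIs.

```python
import copy

# Hypothetical in-memory stand-in for one stored chunk holding two time steps.
store = {"chunk-0": [None, None]}


def read_chunk(key):
    # Return a copy, as a remote read would hand each writer its own buffer.
    return copy.copy(store[key])


def write_chunk(key, data):
    # Overwrite the whole chunk, as an object store replaces a whole object.
    store[key] = data


# Both writers read the chunk before either one has written it back.
buffer_a = read_chunk("chunk-0")
buffer_b = read_chunk("chunk-0")

# Each writer fills in only its own time step.
buffer_a[0] = "file-A"
buffer_b[1] = "file-B"

# The second write replaces the first, so writer A's data is lost.
write_chunk("chunk-0", buffer_a)
write_chunk("chunk-0", buffer_b)

print(store["chunk-0"])  # [None, 'file-B']
```

A synchronizer avoids this by making the read-modify-write of a chunk exclusive, so the second writer only reads after the first has finished writing.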

Passing a `zarr.sync.ProcessSynchronizer()` to `xarray.Dataset.to_zarr(synchronizer=...)` with a `DirectoryStore` seems to solve this write consistency issue.

But keeping the Zarr store on S3 is important to us, and a cloud-optimised format like Zarr should be able to fully support S3. So I wonder whether this is a bug, a feature that does not exist yet, or something I have simply missed.

Please advise.

Thanks everyone.
