
Support using zarr.sync.ProcessSynchronizer(path) with an S3 path #1224

@vietnguyengit


Hi everyone,

I've been looking for an existing way to use zarr.sync.ProcessSynchronizer(path) with an S3 location as the path, but with no luck.

My scenario: I have a Lambda function that listens to S3 events and writes NetCDF files into a Zarr store on S3. Each Lambda invocation processes one NetCDF file.
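The per-file flow described above can be sketched as a Python Lambda handler. The event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`) follows the S3 event notification format; the helper `netcdf_urls_from_event`, the `.nc` filter, and the sample bucket and key are illustrative assumptions, and the actual NetCDF-to-Zarr write is left as a placeholder.

```python
from urllib.parse import unquote_plus


def netcdf_urls_from_event(event):
    """Collect s3:// URLs of the .nc objects named in an S3 event notification."""
    urls = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 events (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(".nc"):
            urls.append(f"s3://{bucket}/{key}")
    return urls


def lambda_handler(event, context):
    urls = netcdf_urls_from_event(event)
    for url in urls:
        # Hypothetical placeholder: open `url` with xarray and write it into
        # the shared Zarr store on S3. Concurrent invocations all reach this
        # step independently, which is where the write race comes from.
        print("would write", url, "to the Zarr store")
    return {"processed": len(urls)}


# Hypothetical event with one uploaded file (bucket and key are made up).
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/sample+file.nc"}}}
    ]
}
print(lambda_handler(sample_event, None))
```

Ten uploads produce ten independent invocations of this handler, and nothing in them coordinates access to the shared store.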

Because Lambda is a distributed system, uploading 10 new files triggers 10 separate processes that all try to write to the Zarr store at almost the same time, and I am seeing data corruption.

Passing a zarr.sync.ProcessSynchronizer() to xarray.Dataset.to_zarr(synchronizer=...) when writing to a DirectoryStore seems to solve this write-consistency issue.
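Why this works for a DirectoryStore but not for S3 can be sketched with the standard library. In zarr-python 2.x, `ProcessSynchronizer(path)` provides file locks (via the fasteners package) kept under a directory that must live on a file system shared by all writing processes, and a lock is taken around each chunk write. An S3 prefix offers no such lock primitive. The sketch below is not zarr's implementation: `FileLockSynchronizer`, `write_region` and the JSON "chunk" file are hypothetical stand-ins, and it uses POSIX `fcntl.flock`, so it assumes Linux or macOS.

```python
import fcntl
import json
import os
import tempfile
import threading
from contextlib import contextmanager


class FileLockSynchronizer:
    """Hypothetical stand-in for the idea behind zarr.sync.ProcessSynchronizer:
    sync[key] returns a lock backed by a file under a shared directory."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    @contextmanager
    def _locked(self, key):
        lock_path = os.path.join(self.path, key.replace("/", "_") + ".lock")
        # A fresh descriptor per acquisition, so flock() also serialises
        # threads within one process, not just separate processes.
        with open(lock_path, "a") as fh:
            fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until the lock is free
            try:
                yield
            finally:
                fcntl.flock(fh, fcntl.LOCK_UN)

    def __getitem__(self, key):
        return self._locked(key)


def write_region(chunk_path, sync, index, value):
    """Read-modify-write one element of a shared 'chunk' file.

    Without the lock, two writers can read the same old chunk and the
    last one to write back silently discards the other's update."""
    with sync["temp/0"]:
        with open(chunk_path) as f:
            chunk = json.load(f)
        chunk[index] = value
        with open(chunk_path, "w") as f:
            json.dump(chunk, f)


def run_concurrent_writers(n=10):
    with tempfile.TemporaryDirectory() as tmp:
        chunk_path = os.path.join(tmp, "chunk.json")
        with open(chunk_path, "w") as f:
            json.dump([0] * n, f)
        sync = FileLockSynchronizer(os.path.join(tmp, "example.sync"))
        threads = [
            threading.Thread(target=write_region, args=(chunk_path, sync, i, i + 1))
            for i in range(n)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(chunk_path) as f:
            return json.load(f)


result = run_concurrent_writers()
print(result)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The lock only helps when every writer can see the same lock directory. Separate Lambda invocations each get their own local `/tmp`, so a local path would not coordinate them either.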

However, keeping the Zarr store on S3 is important to us, and a cloud-optimised format like Zarr should be able to fully support S3. So I wonder whether this is a bug, a missing feature, or something I just haven't found yet.

Please advise.

Thanks everyone.
