[WIP] Extract ZarrCore as a separate library#252

Draft
asinghvi17 wants to merge 11 commits into master from as/zarrcore

@asinghvi17
Member

Fixes #186.

The idea of this PR is to extract a core library, ZarrCore.jl, so that it can be (a) used by Zarr.jl and (b) potentially depended on by Zarrs.jl as well. This way:

  • the two libraries become more interoperable and swappable: Zarr.jl could, for example, use a store defined in Zarrs.jl, because both stores would subtype the same supertype.
  • people who require zero binary dependencies get a pure-Julia version of Zarr.
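
To illustrate the interoperability point, here is a minimal sketch of two libraries sharing one abstract store type. The module names (`StoreCore`, `LibA`, `LibB`), the `getkey` and `readstring` functions, and both concrete stores are hypothetical stand-ins, not the actual ZarrCore API.

```julia
# Hypothetical stand-in for the shared core package.
module StoreCore
    abstract type AbstractStore end

    # Interface function: each store adds its own method.
    function getkey end

    # Generic code written only against the abstract interface.
    function readstring(s::AbstractStore, key)
        bytes = getkey(s, key)
        # copy() because String(::Vector{UInt8}) takes ownership of the buffer
        return bytes === nothing ? nothing : String(copy(bytes))
    end
end

# Hypothetical first library: an in-memory store.
module LibA
    using ..StoreCore
    struct MemStore <: StoreCore.AbstractStore
        data::Dict{String,Vector{UInt8}}
    end
    StoreCore.getkey(s::MemStore, key) = get(s.data, key, nothing)
end

# Hypothetical second library: a store that returns the same payload for every key.
module LibB
    using ..StoreCore
    struct ConstStore <: StoreCore.AbstractStore
        payload::Vector{UInt8}
    end
    StoreCore.getkey(s::ConstStore, key) = s.payload
end

a = LibA.MemStore(Dict("x" => Vector{UInt8}("hi")))
b = LibB.ConstStore(Vector{UInt8}("zarr"))
println(StoreCore.readstring(a, "x"))
println(StoreCore.readstring(b, "anything"))
```

Because `readstring` only depends on `AbstractStore`, either library could accept a store defined by the other.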

Below follows a more detailed changelog. I'm still testing this and not quite there with the API yet. But it seems to work so far.

  • Extract a minimal ZarrCore.jl package at lib/ZarrCore/ containing all core types, registries, and pure-Julia implementations sufficient to read/write uncompressed Zarr v2 and
    v3 arrays
  • Rewrite Zarr.jl as a thin wrapper that depends on ZarrCore, registering Blosc/Zlib/Zstd compressors, V3 compression codecs, and network/archive stores
  • Replace hardcoded codec parsing in metadata3.jl with an extensible v3_codec_parsers registry and codec_to_dict dispatch
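
As a rough sketch of what registry-based parsing plus `codec_to_dict` dispatch could look like: the names `v3_codec_parsers`, `parse_v3_codec`, `codec_to_dict`, `BytesCodec` and `TransposeCodec` come from this PR, but the struct fields, the parser signature (a function of the configuration Dict), and the error handling are assumptions made for illustration only.

```julia
abstract type V3Codec end

# Field layouts are assumed for this sketch.
struct BytesCodec <: V3Codec
    endian::Symbol
end

struct TransposeCodec <: V3Codec
    order::Vector{Int}
end

# Registry: codec name => function building a codec from its configuration.
const v3_codec_parsers = Dict{String,Function}()

# Registered at top level here because this is a script, not a precompiled package.
v3_codec_parsers["bytes"] = cfg -> BytesCodec(Symbol(get(cfg, "endian", "little")))
v3_codec_parsers["transpose"] = cfg -> TransposeCodec(collect(Int, cfg["order"]))

# Look up the parser by name instead of walking an if/elseif chain.
function parse_v3_codec(d::AbstractDict)
    name = d["name"]
    parser = get(v3_codec_parsers, name, nothing)
    parser === nothing && throw(ArgumentError("unknown codec: $name"))
    return parser(get(d, "configuration", Dict{String,Any}()))
end

# Serialization via multiple dispatch instead of isa checks.
codec_to_dict(c::BytesCodec) =
    Dict("name" => "bytes", "configuration" => Dict("endian" => String(c.endian)))
codec_to_dict(c::TransposeCodec) =
    Dict("name" => "transpose", "configuration" => Dict("order" => c.order))

c = parse_v3_codec(Dict("name" => "bytes", "configuration" => Dict("endian" => "big")))
t = parse_v3_codec(codec_to_dict(TransposeCodec([1, 0])))
```

A downstream package can then add a codec by inserting one entry into the registry and defining one `codec_to_dict` method, without touching the core parsing code.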

Detailed description

What's in ZarrCore

  • Types: ZarrFormat, AbstractCodecPipeline, MetadataV2, MetadataV3, V2Pipeline, V3Pipeline, ZArray, ZGroup
  • Registries: compressortypes, v3_codec_parsers, codec_to_dict, filterdict, storageregexlist, chunk_key_encoding_parsers
  • Compressor: NoCompressor only (default_compressor() returns it)
  • V3 Codecs: BytesCodec, TransposeCodec
  • Filters: All pure-Julia (Fletcher32, Shuffle, Delta, Quantize, FixedScaleOffset, VLen)
  • Stores: DirectoryStore, DictStore, ConsolidatedStore
  • API: zcreate, zopen, zzeros, zgroup, consolidate_metadata

What stays in Zarr.jl

  • Blosc/Zlib/Zstd compressors (override default_compressor() → BloscCompressor())
  • V3 compression codecs: GzipV3Codec, BloscV3Codec, ZstdV3Codec, CRC32cV3Codec, ShardingCodec
  • Network/archive stores: HTTPStore, GCStore, S3Store, ZipStore
  • AWSS3 extension (unchanged)
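
The `default_compressor()` override can be sketched with a `Ref` holding a factory function. Only the names `DEFAULT_COMPRESSOR_FACTORY`, `default_compressor` and `NoCompressor` are taken from this PR; the module names, the `FakeBlosc` type and the `use_blosc!` helper are hypothetical, and the real override is described as happening in `Zarr.__init__` rather than in an explicit call.

```julia
# Hypothetical stand-in for the core package.
module CompressorCore
    abstract type Compressor end
    struct NoCompressor <: Compressor end

    # Mutable slot holding a zero-argument factory for the default compressor.
    const DEFAULT_COMPRESSOR_FACTORY = Ref{Any}(() -> NoCompressor())

    default_compressor() = DEFAULT_COMPRESSOR_FACTORY[]()
end

# Hypothetical stand-in for the wrapper package.
module CompressorWrapper
    using ..CompressorCore

    # Placeholder for a real binary-backed compressor type.
    struct FakeBlosc <: CompressorCore.Compressor
        clevel::Int
    end

    # Swap the factory at runtime; the core module never needs to know this type.
    function use_blosc!()
        CompressorCore.DEFAULT_COMPRESSOR_FACTORY[] = () -> FakeBlosc(5)
        return nothing
    end
end

before = CompressorCore.default_compressor()
CompressorWrapper.use_blosc!()
after = CompressorCore.default_compressor()
```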

Key design decisions

  • __init__ registration pattern: All Dict/Array mutations (compressor types, codec parsers, storage regexes) happen in Zarr.__init__() since top-level mutations don't persist
    through Julia precompilation
  • DEFAULT_COMPRESSOR_FACTORY Ref: Allows Zarr.jl to override ZarrCore's default compressor at runtime
  • Registry-based V3 codec parsing: v3_codec_parsers Dict replaces the hardcoded if/elseif chain in metadata3.jl, making codecs extensible without modifying core code
  • codec_to_dict dispatch: Replaces hardcoded isa checks in lower3 serialization
  • Backward-compatible module paths: Zarr.
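
A minimal sketch of the `__init__` registration pattern, assuming a registry Dict owned by the core module. The module names and `MyCompressor` are hypothetical; `compressortypes` is the registry name used in this PR, but its element type here is an assumption.

```julia
# Hypothetical stand-in for the core package, which owns the registry.
module RegistryCore
    const compressortypes = Dict{String,Any}()
end

# Hypothetical stand-in for the wrapper package.
module RegistryWrapper
    using ..RegistryCore

    struct MyCompressor end

    # In a precompiled package, a top-level statement such as
    #     RegistryCore.compressortypes["my"] = MyCompressor
    # would run while this module is being precompiled, and the change to the
    # other module's Dict would not be stored in the cache. __init__ runs each
    # time the module is loaded, so the registration is redone at load time.
    function __init__()
        RegistryCore.compressortypes["my"] = MyCompressor
        return nothing
    end
end
```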

asinghvi17 and others added 10 commits March 23, 2026 19:11
Minimal core package with Project.toml (UUID, deps on JSON, DiskArrays,
OffsetArrays, DateTimes64, Dates) and module entry point defining
ZarrFormat{V}, DV constant, and AbstractCodecPipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- chunkkeyencoding.jl: AbstractChunkKeyEncoding, ChunkKeyEncoding
- MaxLengthStrings.jl: MaxLengthString type
- Compressors: abstract Compressor, compressortypes registry,
  NoCompressor, default_compressor(), zcompress/zuncompress fallbacks
- Codecs: V3Codecs module with V3Codec{In,Out} abstract type,
  v3_codec_parsers registry, codec_to_dict dispatch, BytesCodec,
  TransposeCodec
- Filters: all pure-Julia filters (Fletcher32, Shuffle, Delta,
  Quantize, FixedScaleOffset, VLen)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- metadata.jl: MetadataV2, type string system, fill value
  encoding/decoding, default_compressor() in constructors
- metadata3.jl: MetadataV3 with registry-based codec parsing via
  parse_v3_codec/v3_codec_parsers (replaces hardcoded if/elseif),
  codec_to_dict dispatch for serialization (replaces isa checks),
  compressor_to_v3_bytes_codecs dispatch
- pipeline.jl: V2Pipeline, V3Pipeline, pipeline_encode/pipeline_decode!

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Storage.jl: AbstractStore, storageregexlist, metadata I/O,
  SequentialRead/ConcurrentRead strategies
- DirectoryStore, DictStore, ConsolidatedStore (core stores)
- ZArray.jl: ZArray type, readblock!/writeblock!, zcreate/zzeros/zopen
  with default_compressor()
- ZGroup.jl: ZGroup, zgroup, zopen, consolidate_metadata
  (without HTTP.serve/writezip)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Exports: ZArray, ZGroup, zopen, zzeros, zcreate, storagesize,
  storageratio, zinfo, DirectoryStore, DictStore, ConsolidatedStore,
  zgroup
- Smoke tests: NoCompressor roundtrip, DirectoryStore, DictStore,
  V3 uncompressed roundtrip, default_compressor verification

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- src/Zarr.jl: imports from ZarrCore, re-exports public API, includes
  compressor/codec/store files, overrides default_compressor() to
  BloscCompressor via __init__, adds Codecs.V3Codecs wrapper module
  for backward-compatible module paths
- src/Codecs/V3/compression_codecs.jl: GzipV3Codec, BloscV3Codec,
  ZstdV3Codec, CRC32cV3Codec, ShardingCodec with full infrastructure,
  all registering into ZarrCore.Codecs.V3Codecs.v3_codec_parsers
- Compressor files: registration moved to Zarr.__init__
- Storage files: minor fixes for ZarrCore compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add ZarrCore as path dependency via [sources.ZarrCore]
- Remove DataStructures from [deps] and [compat] (unused)
- Add ZarrCore to test/Project.toml with source path

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Remove 21 source files from src/ that now live in
lib/ZarrCore/src/. Zarr.jl no longer includes these directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add ZarrCore test command and package structure documentation
describing the core/wrapper split, default_compressor override
pattern, and v3_codec_parsers registry.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- 2026-03-20-zarrcore-extraction-design.md: architectural decisions
- 2026-03-20-zarrcore-extraction.md: 22-task implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@asinghvi17 asinghvi17 changed the title Extract ZarrCore as a separate library [WIP] Extract ZarrCore as a separate library Mar 23, 2026
Julia LTS (1.10) doesn't support [sources] in Project.toml for local
path dependencies. Add explicit Pkg.develop step to resolve ZarrCore
from lib/ZarrCore/ before the build step.
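
A sketch of what such a develop step might look like when run from Julia. The `develop_core` helper and its file-existence guard are hypothetical; only `Pkg.activate` and `Pkg.develop(path=...)` are standard Pkg calls, and the `lib/ZarrCore` location is taken from the description above.

```julia
using Pkg

# Hypothetical helper: develop the in-repo core package if it is present.
function develop_core(root::AbstractString = pwd())
    corepath = joinpath(root, "lib", "ZarrCore")
    # Skip quietly when the sub-package is not where we expect it.
    isfile(joinpath(corepath, "Project.toml")) || return false
    Pkg.activate(root)             # use the repository's own environment
    Pkg.develop(path = corepath)   # resolve ZarrCore from the local path
    return true
end
```

Calling `develop_core()` from the repository root before the build step would resolve the local path dependency without relying on `[sources]`.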

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@coveralls

coveralls commented Mar 23, 2026

Pull Request Test Coverage Report for Build 23455574299

Details

  • 102 of 170 (60.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-5.1%) to 77.218%

Changes Missing Coverage              Covered Lines   Changed/Added Lines   %
src/Codecs/V3/compression_codecs.jl   79              147                   53.74%

Totals
Change from base Build 23337285125: -5.1%
Covered Lines: 322
Relevant Lines: 417

💛 - Coveralls

Development

Successfully merging this pull request may close these issues.

Refactor Zarr.jl into ZarrCore (minimal pure Julia implementation) and Zarr.jl could load all the extra packages.