
perf: Add additional sharding benchmarks #3712

Open
mkitti wants to merge 4 commits into zarr-developers:main from mkitti:mkitti-morton-order-shard-indexing-benchmarks

Conversation


@mkitti mkitti commented Feb 16, 2026

Summary

Added benchmarks for monitoring Morton order computation in sharded arrays. These benchmarks help assess the impact of Morton order optimizations in the context of I/O operations.

Benchmarks Added

  • test_sharded_morton_indexing - Sharded array indexing with power-of-2 chunks per shard
  • test_sharded_morton_indexing_large - Large shard with 32^3 = 32,768 chunks
  • test_sharded_morton_single_chunk - Reading a single chunk from a large shard
  • test_morton_order_iter - Direct benchmark of morton_order_iter (no I/O)
  • test_sharded_morton_write_single_chunk - Writing a single chunk to a large shard (best end-to-end test)
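
To make the benchmarked operation concrete, here is a minimal pure-Python sketch of Morton (Z-order) iteration over a power-of-2 hypercube of chunks. This is an illustrative reimplementation, not zarr's actual `morton_order_iter`; the bit-interleaving order and function signatures are assumptions.

```python
def decode_morton_scalar(z: int, ndim: int, nbits: int) -> tuple[int, ...]:
    """Decode one Morton (Z-order) code into per-axis coordinates by
    de-interleaving its bits (illustrative scalar version)."""
    coords = [0] * ndim
    for bit in range(nbits):
        for axis in range(ndim):
            coords[axis] |= ((z >> (bit * ndim + axis)) & 1) << bit
    return tuple(coords)


def morton_order_iter(shape: tuple[int, ...]):
    """Yield chunk coordinates of `shape` in Morton order.
    Sketch only: assumes every extent is the same power of two."""
    ndim = len(shape)
    nbits = max(shape).bit_length() - 1
    total = 1
    for s in shape:
        total *= s
    for z in range(total):
        yield decode_morton_scalar(z, ndim, nbits)
```

For a (32, 32, 32) shard this yields all 32,768 chunk coordinates, one scalar decode per chunk, which is exactly the per-chunk cost the profiles below attribute to `decode_morton (scalar)`.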

Benchmark Results

Single Chunk Write (Best End-to-End Test)

Writing a single 1x1x1 chunk to a shard with 32^3 = 32,768 chunks:

| Branch                 | Mean Time | Improvement         |
|------------------------|-----------|---------------------|
| Main (no optimization) | 425 ms    | -                   |
| Optimized (PR #3708)   | 261 ms    | 164 ms (39% faster) |

Morton Order Computation (Micro-benchmark)

Direct morton_order_iter benchmark without I/O:

| Shape        | Main Branch | Optimized | Speedup |
|--------------|-------------|-----------|---------|
| (8, 8, 8)    | 2.73 ms     | 0.85 ms   | 3.2x    |
| (16, 16, 16) | 25.53 ms    | 6.31 ms   | 4.0x    |
| (32, 32, 32) | 229.25 ms   | 51.31 ms  | 4.5x    |
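
Timings like these can be gathered with a small `timeit` harness. The harness below is a generic sketch; it does not import zarr (the import path of the real `morton_order_iter` is not shown in this PR), and taking the minimum over repeats is a conventional choice because it is least affected by scheduler noise.

```python
import timeit


def bench(func, *args, repeat: int = 5, number: int = 1) -> float:
    """Return the best-of-`repeat` wall time (seconds) for one call
    of func(*args), averaging over `number` inner calls."""
    return min(timeit.repeat(lambda: func(*args),
                             number=number, repeat=repeat)) / number


# Hypothetical usage against zarr's iterator (import path assumed):
#   t = bench(lambda s: list(morton_order_iter(s)), (32, 32, 32))
#   print(f"{t * 1e3:.2f} ms")
```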

Profiling Analysis

Profile of single chunk write benchmark showing where time is spent:

Main Branch (977ms total)

| Function               | Time   | Calls  | % of Total |
|------------------------|--------|--------|------------|
| decode_morton (scalar) | 289 ms | 32,768 | 30%        |
| get_chunk_slice        | 104 ms | 32,768 | 11%        |
| _localize_chunk        | 103 ms | 32,768 | 11%        |
| _morton_order          | 99 ms  | 1      | 10%        |
| Generator expressions  | 94 ms  | 262k   | 10%        |
| all() / len()          | 87 ms  | 263k   | 9%         |

Optimized Branch (456ms total)

| Function                 | Time   | Calls  | % of Total |
|--------------------------|--------|--------|------------|
| get_chunk_slice          | 110 ms | 32,768 | 24%        |
| _localize_chunk          | 105 ms | 32,768 | 23%        |
| _morton_order            | 66 ms  | 1      | 14%        |
| Generator expressions    | 38 ms  | 131k   | 8%         |
| decode_morton_vectorized | 9 ms   | 1      | 2%         |

Key Optimization Wins

  1. Vectorized decoding: Eliminates 32,768 scalar decode_morton calls (289ms → 9ms)
  2. Reduced bounds checking: Hypercube optimization eliminates all() checks for in-bounds coordinates
  3. Fewer function calls: 1.1M calls reduced to 299k calls
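
The "vectorized decoding" win can be sketched with NumPy: decode every Morton code in the shard at once with one bitwise operation per (bit, axis) pair, instead of one Python call per code. This is an illustrative sketch, not zarr's actual `decode_morton_vectorized`; the name and bit ordering here are assumptions.

```python
import numpy as np


def decode_morton_vectorized(ndim: int, nbits: int) -> np.ndarray:
    """De-interleave all 2**(ndim*nbits) Morton codes in one pass.
    Returns an array of shape (n_codes, ndim): row z holds the
    per-axis coordinates of Morton code z."""
    codes = np.arange(1 << (ndim * nbits), dtype=np.uint64)
    coords = np.zeros((codes.size, ndim), dtype=np.uint64)
    for bit in range(nbits):
        for axis in range(ndim):
            src = np.uint64(bit * ndim + axis)
            coords[:, axis] |= ((codes >> src) & np.uint64(1)) << np.uint64(bit)
    return coords
```

Because the hot loop runs `ndim * nbits` times (15 iterations for a 32^3 shard) rather than once per code (32,768 times), the per-code Python overhead disappears, matching the 289 ms to 9 ms drop in the profile above.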

Remaining Optimization Opportunity

get_chunk_slice and _localize_chunk are called 32,768 times even when writing a single chunk due to line 508 in sharding.py:

shard_dict = {k: shard_reader.get(k) for k in morton_order_iter(chunks_per_shard)}

This builds a dict of ALL chunks before writing. Optimizing this read-modify-write pattern could save an additional ~215ms.
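
One hypothetical shape for that optimization: copy only the chunks the shard already holds and overwrite the target chunk, rather than probing every Morton-ordered key. The sketch below uses a plain dict standing in for the shard reader; names and the reader interface are assumptions, not zarr's API.

```python
def update_single_chunk(shard_reader: dict, chunk_key: tuple,
                        new_bytes: bytes) -> dict:
    """Hypothetical read-modify-write: build the new shard dict from
    the chunks already present (O(chunks present)), then replace the
    single chunk being written, instead of calling get() for all
    32,768 Morton-ordered keys."""
    shard_dict = dict(shard_reader)  # existing chunks only
    shard_dict[chunk_key] = new_bytes
    return shard_dict
```

Whether this is safe depends on how the real shard reader materializes chunk slices, so it is an optimization opportunity to verify rather than a drop-in change.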

Checklist

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Feb 16, 2026
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Feb 16, 2026

mkitti commented Feb 17, 2026

If we wanted to minimize this pull request, I would reduce it to just "test_sharded_morton_write_single_chunk".


mkitti commented Feb 17, 2026

@d-v-b, please merge this or add the benchmark label.
