perf: Add additional sharding benchmarks #3712
Open
mkitti wants to merge 4 commits into zarr-developers:main from
Author: If we wanted to minimize this pull request, I would reduce it to just `test_sharded_morton_write_single_chunk`.
Author: @d-v-b, please merge or add the benchmark label.
Summary
Added benchmarks for monitoring Morton order computation in sharded arrays. These benchmarks help assess the impact of Morton order optimizations in the context of I/O operations.
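For reference, here is a minimal stdlib sketch of the Morton (Z-order) layout these benchmarks exercise. It is not zarr's `decode_morton`, `decode_morton_vectorized`, or `morton_order_iter`: the `*_sketch` names are hypothetical, and the bit convention (dimension 0 takes the lowest bit, and a dimension stops taking bits once its extent is covered) is an assumption. It shows why power-of-2 chunks per shard are the cheap case. Every code decodes to a valid chunk, so no per-coordinate `all()` bounds check is needed. The bulk variant applies each bit step to the whole batch of codes, which is the loop structure an array-at-a-time (e.g. NumPy) decoder would use.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def decode_morton_sketch(z: int, grid_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Decode one Morton code into chunk coordinates (hypothetical helper).

    Bits of ``z`` are dealt round-robin to the dimensions, dimension 0 first.
    A dimension stops taking bits once it has enough to cover its extent.
    """
    bits = [(n - 1).bit_length() for n in grid_shape]  # ceil(log2(n)) for n >= 1
    coords = [0] * len(grid_shape)
    input_bit = 0
    for coord_bit in range(max(bits, default=0)):
        for dim, nbits in enumerate(bits):
            if coord_bit < nbits:
                coords[dim] |= ((z >> input_bit) & 1) << coord_bit
                input_bit += 1
    return tuple(coords)


def decode_morton_bulk_sketch(
    codes: Sequence[int], grid_shape: tuple[int, ...]
) -> list[tuple[int, ...]]:
    """Same decoding, but each bit step is applied to the whole batch of codes.

    With NumPy, each column would be an array and the inner loop one expression;
    in pure Python this is not faster, it only shows the loop structure.
    """
    bits = [(n - 1).bit_length() for n in grid_shape]
    columns = [[0] * len(codes) for _ in grid_shape]
    input_bit = 0
    for coord_bit in range(max(bits, default=0)):
        for dim, nbits in enumerate(bits):
            if coord_bit < nbits:
                col = columns[dim]
                for i, z in enumerate(codes):
                    col[i] |= ((z >> input_bit) & 1) << coord_bit
                input_bit += 1
    return list(zip(*columns))


def morton_order_sketch(grid_shape: tuple[int, ...]) -> list[tuple[int, ...]]:
    """All chunk coordinates of the grid, in Morton order (hypothetical helper)."""
    bits = [(n - 1).bit_length() for n in grid_shape]
    decoded = decode_morton_bulk_sketch(range(1 << sum(bits)), grid_shape)
    if all(1 << b == n for b, n in zip(bits, grid_shape)):
        # Power-of-2 extents: every code maps to a valid chunk, no filtering needed.
        return decoded
    # Otherwise some codes fall outside the grid and need a bounds check each.
    return [c for c in decoded if all(x < n for x, n in zip(c, grid_shape))]
```

For example, `morton_order_sketch((2, 2))` returns `[(0, 0), (1, 0), (0, 1), (1, 1)]`. A 32^3 shard uses 15-bit codes, and all 32,768 of them decode in bounds.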
Benchmarks Added
- `test_sharded_morton_indexing`: Sharded array indexing with power-of-2 chunks per shard
- `test_sharded_morton_indexing_large`: Large shard with 32^3 = 32,768 chunks
- `test_sharded_morton_single_chunk`: Reading a single chunk from a large shard
- `test_morton_order_iter`: Direct benchmark of `morton_order_iter` (no I/O)
- `test_sharded_morton_write_single_chunk`: Writing a single chunk to a large shard (best end-to-end test)

Benchmark Results
Single Chunk Write (Best End-to-End Test)
Writing a single 1x1x1 chunk to a shard with 32^3 = 32,768 chunks:
Morton Order Computation (Micro-benchmark)
Direct `morton_order_iter` benchmark without I/O:

Profiling Analysis
Profile of the single-chunk write benchmark, showing where time is spent:
Main Branch (977ms total)
- `decode_morton` (scalar)
- `get_chunk_slice`
- `_localize_chunk`
- `_morton_order`
- `all()` / `len()`

Optimized Branch (456ms total)
- `get_chunk_slice`
- `_localize_chunk`
- `_morton_order`
- `decode_morton_vectorized`

Key Optimization Wins
- `decode_morton` calls (289ms → 9ms)
- `all()` checks for in-bounds coordinates

Remaining Optimization Opportunity
`get_chunk_slice` and `_localize_chunk` are called 32,768 times even when writing a single chunk, because line 508 in `sharding.py` builds a dict of all chunks before writing. Optimizing this read-modify-write pattern could save an additional ~215ms.
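To make the per-chunk cost concrete, here is a sketch built around a hypothetical `CountingShardIndex`. It is not zarr's shard index: its `get_chunk_slice` is a stand-in, and the fixed-size, C-order chunk layout is an assumption. The sketch compares eagerly building a dict of every chunk's byte range against a lazy `Mapping` that resolves a range only when that chunk is accessed. It does not show whether the write path in `sharding.py` can actually avoid touching the other chunks; it only illustrates where the 32,768 lookups come from.

```python
from __future__ import annotations

import math
from collections.abc import Iterator, Mapping
from itertools import product


class CountingShardIndex:
    """Hypothetical shard index mapping chunk coords to a (start, stop) byte range.

    Assumes every chunk is present, fixed-size, and laid out in C order; it
    counts lookups so the two access patterns below can be compared.
    """

    def __init__(self, chunks_per_shard: tuple[int, ...], chunk_nbytes: int) -> None:
        self.chunks_per_shard = chunks_per_shard
        self.chunk_nbytes = chunk_nbytes
        self.lookups = 0

    def get_chunk_slice(self, coords: tuple[int, ...]) -> tuple[int, int]:
        if len(coords) != len(self.chunks_per_shard) or not all(
            0 <= c < n for c, n in zip(coords, self.chunks_per_shard)
        ):
            raise KeyError(coords)
        self.lookups += 1
        flat = 0
        for c, n in zip(coords, self.chunks_per_shard):
            flat = flat * n + c  # C-order linear index of the chunk
        start = flat * self.chunk_nbytes
        return start, start + self.chunk_nbytes

    def all_coords(self) -> Iterator[tuple[int, ...]]:
        return product(*(range(n) for n in self.chunks_per_shard))


def eager_chunk_dict(index: CountingShardIndex) -> dict[tuple[int, ...], tuple[int, int]]:
    # Materialize every chunk's byte range up front: one lookup per chunk.
    return {coords: index.get_chunk_slice(coords) for coords in index.all_coords()}


class LazyChunkMap(Mapping):
    """Read-only mapping that resolves a chunk's byte range only on access."""

    def __init__(self, index: CountingShardIndex) -> None:
        self._index = index

    def __getitem__(self, coords: tuple[int, ...]) -> tuple[int, int]:
        return self._index.get_chunk_slice(coords)

    def __iter__(self) -> Iterator[tuple[int, ...]]:
        return self._index.all_coords()

    def __len__(self) -> int:
        return math.prod(self._index.chunks_per_shard)
```

With `chunks_per_shard=(32, 32, 32)`, calling `eager_chunk_dict` performs 32,768 lookups before any chunk is touched. Reading one entry through `LazyChunkMap` performs exactly one.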
Checklist
- `docs/user-guide/*.md`
- `changes/`