Fix pack-results with chunked lazy downsampling, improved error handling, and comprehensive tests #195
Conversation
- Enhanced error handling in MatchedCrop._load_source_array with try-except blocks
- Added detailed logging for zarr format detection (OME-NGFF, non-OME Group, single Array)
- Fixed bug: changed level_path.path to str(level_path) for consistency
- Added warning messages when voxel_size or translation metadata is missing
- Wrapped package_crop logic in try-except to catch and log errors
- Enhanced package_submission to track and report failed, skipped, and successful crops
- Improved error messages to help users diagnose zarr structure issues

Co-authored-by: rhoadesScholar <[email protected]>
@copilot It does not appear that you addressed the original issue from the user, where there seems to be inappropriate memory allocation: #187 (comment)
You're correct — the error handling I added doesn't fix the underlying allocation. The actual memory issue is in the original loading paths of `_load_source_array`: these load the full array into memory before rescaling to the target size. Properly fixing the memory issue would require refactoring the loading logic to downsample lazily instead of materializing the whole array first.

This is a more significant change than the error handling I added. Would you like me to implement this memory-efficient loading as a separate fix, or should I revert my error handling changes?
I've now addressed the memory allocation issue. Changes in commits bb15c83 and cf3274f:
- Changed MAX_VOLUME_SIZE_RATIO from 8^3 (512x) to 4^3 (64x)
- This reduces max memory per crop from 4GB to 512MB (float32)
- Added detailed error message with memory estimates when ratio is exceeded
- Added warning when loading arrays >500MB
- Helps prevent inappropriate memory allocation issues reported in #187

Co-authored-by: rhoadesScholar <[email protected]>
@copilot Perhaps we can utilize cellmap-data's pattern style of Xarray+Tensorstore/Dask to lazily downsample a few chunks at a time, rather than the whole array, so that even volumes needing significant downsampling can be effectively returned without blowing up memory?
- Increased MAX_VOLUME_SIZE_RATIO from 4³ (64x) to 16³ (4096x) to allow larger arrays
- Added _load_array_chunked method for memory-efficient chunked downsampling
- Process arrays in ~100MB chunks instead of loading entire array into memory
- Only use chunked loading when array >500MB and downsampling is needed
- Returns already_downsampled flag to skip redundant rescaling in load_aligned
- Uses Dask-like chunked processing pattern for better memory efficiency
- Addresses feedback to use lazy downsampling approach

Co-authored-by: rhoadesScholar <[email protected]>
- Corrected docstring: scale_factors = in_vs / tgt_vs (not tgt_vs / in_vs)
- Added clarifying comments about scale_factors < 1.0 for downsampling
- Fixed OME-NGFF case to use warning instead of incorrect "using chunked loading" message
- Removed redundant semantic thresholding (now done once at end)
- Clarified that OME-NGFF relies on CellMapImage's resolution selection

Co-authored-by: rhoadesScholar <[email protected]>
- Updated docstring to explain: in_vs < tgt_vs → scale_factors < 1.0 → downsample
- Added example: in_vs=2nm, tgt_vs=8nm → scale_factors=0.25 → 0.25x smaller
- Clarified that scale_factors represents output_size/input_size dimension ratio
- Fixed misleading comments that didn't explain the relationship clearly

Co-authored-by: rhoadesScholar <[email protected]>
Implemented chunked lazy downsampling in commits 2af3fda, bccff66, and 56769e3. The implementation:
Example: A 2048³ prediction at 2nm resolution can now be downsampled to 256³ at 8nm without loading the full 32GB array into memory. The non-OME zarr cases now use this chunked approach, while OME-NGFF arrays rely on CellMapImage's existing resolution selection.
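The chunked approach described above can be sketched as follows. This is a minimal, hypothetical nearest-neighbour version — the PR's actual `_load_array_chunked` uses rescale with order 0/1 depending on class type; the `downsample_chunked` helper, its chunk size, and the slab-wise indexing here are illustrative, not the real code:

```python
import numpy as np

def downsample_chunked(src, scale, chunk=64):
    """Nearest-neighbour downsample a 3D array by per-axis `scale` (< 1.0),
    reading one z-slab of at most `chunk` slices at a time so the full
    source array is never materialised in memory at once."""
    out_shape = tuple(int(round(s * f)) for s, f in zip(src.shape, scale))
    out = np.empty(out_shape, dtype=src.dtype)
    # For each output index along an axis, the source index it samples from.
    src_z = np.minimum((np.arange(out_shape[0]) / scale[0]).astype(int), src.shape[0] - 1)
    yi = np.minimum((np.arange(out_shape[1]) / scale[1]).astype(int), src.shape[1] - 1)
    xi = np.minimum((np.arange(out_shape[2]) / scale[2]).astype(int), src.shape[2] - 1)
    for z0 in range(0, src.shape[0], chunk):
        z1 = min(z0 + chunk, src.shape[0])
        sel = (src_z >= z0) & (src_z < z1)  # output slices fed by this slab
        if not sel.any():
            continue
        slab = np.asarray(src[z0:z1])       # only this slab is materialised
        out[sel] = slab[src_z[sel] - z0][:, yi][:, :, xi]
    return out
```

With a zarr array as `src`, each `src[z0:z1]` read pulls only one slab from storage, which is what keeps peak memory bounded to roughly one slab plus the (much smaller) output.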
@copilot Add tests to cover the new functionality/code.
- Added test_check_size_ratio_returns_ratio_and_memory
- Added test_should_use_chunked_loading_raises_on_too_large
- Added test_should_use_chunked_loading_returns_true_for_large_memory
- Added test_should_use_chunked_loading_returns_false_for_small_memory
- Added test_load_array_chunked_instance_downsampling
- Added test_load_array_chunked_semantic_downsampling
- Added test_load_array_chunked_semantic_thresholding
- Added test_load_aligned_uses_chunked_for_large_array_with_downsampling
- Added test_load_aligned_chunked_preserves_data_integrity

Tests cover:
- Size ratio calculation and memory estimation
- Threshold logic for when to use chunked loading
- Chunked downsampling for both instance (nearest) and semantic (linear) classes
- Semantic thresholding after chunked processing
- Integration with load_aligned method
- Data integrity verification

Co-authored-by: rhoadesScholar <[email protected]>
Added comprehensive tests in commit 3622609. The tests cover:

Unit tests for helper methods:
Chunked loading tests:
Integration tests:
All 9 new tests follow existing patterns in test_matched_crop.py and cover the main code paths of the chunked lazy downsampling implementation.
Pull request overview
This PR addresses memory allocation errors and unclear error messages in the csc pack-results command by implementing chunked lazy downsampling for large zarr arrays and improving error handling throughout the processing pipeline.
Changes:
- Implemented memory-efficient chunked downsampling for arrays >500MB that need resolution adjustment (processes data in ~100MB chunks)
- Enhanced error handling in matched_crop.py with try-except blocks and informative error messages for all zarr format detection paths
- Improved package_submission() to track and report successful, failed, and skipped crops separately
- Added 9 comprehensive tests covering chunked loading functionality, size ratio checking, and data integrity verification
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/cellmap_segmentation_challenge/utils/matched_crop.py | Implements chunked lazy downsampling with _load_array_chunked(), adds error handling to _load_source_array(), increases MAX_VOLUME_SIZE_RATIO to 16³ (4096x), and adds already_downsampled flag to prevent redundant rescaling |
| src/cellmap_segmentation_challenge/utils/submission.py | Wraps match_crop_space in try-except, tracks failed/skipped/successful crops separately, and provides detailed error reporting |
| tests/test_matched_crop.py | Adds 9 unit and integration tests for chunked loading, size ratio checking, and semantic/instance downsampling scenarios |
```python
if use_chunked:
    logger.warning(
        f"Large OME-NGFF array detected ({estimated_memory_mb:.1f} MB). "
        f"Loading entire array - CellMapImage should have already selected appropriate resolution level. "
        f"If memory issues occur, ensure predictions are saved at appropriate resolution."
    )
```
The OME-NGFF path detects when chunked loading should be used (lines 310-312) but then logs a warning and loads the entire array anyway (line 319). The warning message says "CellMapImage should have already selected appropriate resolution level", but this doesn't prevent the potential OOM issue. If the selected level is still too large (>500MB), the system will attempt to load it all into memory.
Consider adding a check: if use_chunked is True after CellMapImage has selected a level, this indicates a problem with either the available resolution levels or the target resolution being too coarse. The code should either fail with a clear error or implement chunked loading for OME-NGFF as well.
Suggested change:

```diff
 if use_chunked:
-    logger.warning(
-        f"Large OME-NGFF array detected ({estimated_memory_mb:.1f} MB). "
-        f"Loading entire array - CellMapImage should have already selected appropriate resolution level. "
-        f"If memory issues occur, ensure predictions are saved at appropriate resolution."
-    )
+    # For OME-NGFF, a large selected level indicates a mismatch between
+    # available multiscale resolutions and the requested target settings.
+    # Loading the entire array would risk out-of-memory errors, so fail fast
+    # with a clear, actionable error instead of proceeding unsafely.
+    raise ValueError(
+        "Selected OME-NGFF multiscale level is too large to load into memory "
+        f"({estimated_memory_mb:.1f} MB estimated). This suggests that either the "
+        "available OME-NGFF resolutions are too coarse or the requested target "
+        "voxel size/shape is incompatible. Please regenerate predictions at an "
+        "appropriate resolution or adjust the target settings to select a smaller level."
+    )
```
```python
# Estimate memory usage (assuming float32)
estimated_memory_mb = (src_size * BYTES_PER_FLOAT32) / (1024 * 1024)
```
The memory estimation assumes all arrays are float32 (4 bytes per voxel), but the actual array dtype could be different (e.g., uint8 = 1 byte, float64 = 8 bytes, bool = 1 byte). This means the memory estimate could be off by up to 8x. For a uint8 array, the code would overestimate memory by 4x, potentially triggering chunked loading unnecessarily. For a float64 array, it would underestimate by 2x, potentially missing cases that need chunked loading.
Consider using the actual array dtype for memory estimation: `estimated_memory_mb = (src_size * arr.dtype.itemsize) / (1024 * 1024)`. However, this would require passing the array to `_check_size_ratio`, or checking dtype separately in `_load_source_array` before calling `_check_size_ratio`.
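A dtype-aware estimator along the lines the comment suggests could look like this — a sketch only; the standalone `estimated_memory_mb` helper is hypothetical, the actual code computes the estimate inline:

```python
import numpy as np

def estimated_memory_mb(arr) -> float:
    """Estimate the in-memory size of `arr` in MB using its actual dtype,
    instead of assuming 4-byte float32 voxels."""
    return (arr.size * arr.dtype.itemsize) / (1024 * 1024)
```

For a uint8 array this is 4x smaller than the float32 assumption, and for float64 it is 2x larger, matching the over/underestimation described above.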
```python
out_z_start = int(z_start * scale_factors[0])
out_z_end = min(int(np.ceil(z_end * scale_factors[0])), output_shape[0])
out_y_start = int(y_start * scale_factors[1])
out_y_end = min(int(np.ceil(y_end * scale_factors[1])), output_shape[1])
out_x_start = int(x_start * scale_factors[2])
```
There's a potential issue with overlapping or gaps at chunk boundaries when downsampling. The output position is calculated as int(z_start * scale_factors[0]), which floors the value, while the end is calculated with np.ceil. When processing adjacent chunks with scale_factors[0] = 0.5, consider:
- Chunk 1: z_start=0, z_end=215, out_z_start=0, out_z_end=ceil(215*0.5)=108
- Chunk 2: z_start=215, z_end=430, out_z_start=int(215*0.5)=107, out_z_end=ceil(430*0.5)=215

Notice chunk 2's out_z_start (107) overlaps with chunk 1's out_z_end (108). This means output[107:108] will be written twice, with chunk 2's data overwriting chunk 1's data. This could cause discontinuities or incorrect values at chunk boundaries.
The correct approach would be to ensure non-overlapping output regions, perhaps by using int(np.ceil(z_start * scale_factors[0])) for out_z_start.
Suggested change:

```diff
-out_z_start = int(z_start * scale_factors[0])
+out_z_start = int(np.ceil(z_start * scale_factors[0]))
 out_z_end = min(int(np.ceil(z_end * scale_factors[0])), output_shape[0])
-out_y_start = int(y_start * scale_factors[1])
+out_y_start = int(np.ceil(y_start * scale_factors[1]))
 out_y_end = min(int(np.ceil(y_end * scale_factors[1])), output_shape[1])
-out_x_start = int(x_start * scale_factors[2])
+out_x_start = int(np.ceil(x_start * scale_factors[2]))
```
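The off-by-one can be checked numerically with the numbers from the comment (scale factor 0.5, chunk boundary at z=215):

```python
import numpy as np

scale = 0.5
chunks = [(0, 215), (215, 430)]  # adjacent input chunks along z

# Current behaviour: floor the output start, ceil the output end.
floor_starts = [int(s * scale) for s, _ in chunks]        # [0, 107]
ceil_ends = [int(np.ceil(e * scale)) for _, e in chunks]  # [108, 215]
# Chunk 2 starts writing at 107 although chunk 1 already wrote [0, 108):
assert floor_starts[1] < ceil_ends[0]  # overlap: output[107] written twice

# Suggested fix: ceil the start too, so output regions tile exactly.
ceil_starts = [int(np.ceil(s * scale)) for s, _ in chunks]  # [0, 108]
assert ceil_starts[1] == ceil_ends[0]  # no overlap, no gap
```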
```diff
 def _load_source_array(self):
     """
-    Returns (image, input_voxel_size, input_translation)
-    where image is a numpy array.
+    Returns (image, input_voxel_size, input_translation, already_downsampled)
+    where image is a numpy array and already_downsampled indicates if chunked downsampling was applied.
     """
```
The method `_load_source_array` now returns a 4-tuple `(image, input_voxel_size, input_translation, already_downsampled)` but lacks a return type annotation. The docstring correctly documents the return values, but adding a type annotation would improve type safety and IDE support. Consider adding: `-> Tuple[np.ndarray, Optional[Tuple[float, ...]], Optional[Tuple[float, ...]], bool]:`
```python
if use_chunked and vs is not None:
    logger.info(f"Using chunked loading for large single-scale array ({estimated_memory_mb:.1f} MB)")
    # Calculate scale factors for downsampling
    in_vs = np.asarray(vs, dtype=float)
    tgt_vs = np.asarray(self.target_voxel_size, dtype=float)
    if in_vs.size != tgt_vs.size:
        in_vs = in_vs[-tgt_vs.size:]

    if not np.allclose(in_vs, tgt_vs):
        # Downsampling needed - use chunked approach
        # scale_factors = in_vs / tgt_vs
        # When in_vs < tgt_vs (fine→coarse), scale_factors < 1.0, causing rescale to downsample
        scale_factors = in_vs / tgt_vs
        image = self._load_array_chunked(ds, scale_factors)
        return image, self.target_voxel_size, tr, True  # Already downsampled
    else:
        # No downsampling needed, load normally
        image = ds[:]
        return image, vs, tr, False
else:
    image = ds[:]
    return image, vs, tr, False
```
Similar to the non-OME Group case, when voxel_size metadata is missing but the array is large (>500MB), the code falls through to line 407 which loads the entire array with ds[:]. This could cause out-of-memory errors for large arrays. The code should fail fast with a clear error message when it detects a large array without the metadata needed for chunked downsampling.
```python
# Middle boundary should have one of the values
assert out_chunked[4, 8//2, 8//2] in [1, 2]
```
While the tests cover basic chunked loading functionality, they don't test error handling paths in _load_source_array, such as:
- The try-except block that catches zarr.open failures (lines 269-276)
- The error handling for OME-NGFF loading failures (lines 321-325)
- The error handling for non-OME Group loading failures (lines 360-365)
Consider adding tests that verify these error paths produce the expected error messages with helpful diagnostic information.
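One way to exercise such an error path, sketched with a stand-in loader rather than the real `_load_source_array` (the wrapping pattern is what the PR describes; the helper name and message text here are illustrative):

```python
def load_source_array(path, _open=open):
    """Stand-in for the try/except wrapping in _load_source_array: re-raise
    open failures with the offending path and a structural hint attached."""
    try:
        return _open(path)
    except Exception as e:
        raise ValueError(
            f"Failed to open zarr at {path!r}: {e}. Check that the path points "
            "to an OME-NGFF group, a non-OME zarr Group, or a single zarr Array."
        ) from e

# A test would assert the message carries enough diagnostic context:
try:
    load_source_array("/no/such/predictions.zarr")
except ValueError as e:
    assert "/no/such/predictions.zarr" in str(e)
    assert "OME-NGFF" in str(e)
```

The same shape works with `pytest.raises` in the actual suite, asserting on `str(excinfo.value)`.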
```python
# Crop or pad to match
chunk_downsampled = chunk_downsampled[:chunk_out_shape[0], :chunk_out_shape[1], :chunk_out_shape[2]]
```
The logic for handling shape mismatches in chunked downsampling could lead to data loss or incorrect results. When chunk_downsampled.shape doesn't match chunk_out_shape, the code only crops the downsampled chunk but never pads it. This can happen when the last chunk in a dimension is smaller than chunk_size_per_dim, leading to a downsampled chunk that's smaller than expected.
Additionally, the cropping is done with simple slicing which could silently discard data if the downsampled chunk is larger than expected (which can happen due to rounding in rescale operations). This should either use proper padding when the chunk is too small, or have assertions/logging to catch unexpected size mismatches that could indicate a bug in the calculation logic.
Suggested change:

```diff
-# Crop or pad to match
-chunk_downsampled = chunk_downsampled[:chunk_out_shape[0], :chunk_out_shape[1], :chunk_out_shape[2]]
+# First, pad if the downsampled chunk is smaller than expected
+pad_width = []
+for i, (actual, expected) in enumerate(zip(chunk_downsampled.shape, chunk_out_shape)):
+    # Only pad at the end of each dimension; no padding at the start
+    pad_after = max(0, expected - actual)
+    pad_width.append((0, pad_after))
+if any(p[1] > 0 for p in pad_width):
+    chunk_downsampled = np.pad(
+        chunk_downsampled,
+        pad_width=tuple(pad_width),
+        mode="constant",
+    )
+# Then, crop if the (possibly padded) chunk is larger than expected
+if chunk_downsampled.shape != chunk_out_shape:
+    logger.warning(
+        "Downsampled chunk shape %s does not match expected output shape %s; "
+        "cropping to fit. This may indicate a scale/index rounding issue.",
+        chunk_downsampled.shape,
+        chunk_out_shape,
+    )
+    chunk_downsampled = chunk_downsampled[
+        :chunk_out_shape[0],
+        :chunk_out_shape[1],
+        :chunk_out_shape[2],
+    ]
```
```python
if use_chunked and vs is not None:
    logger.info(f"Using chunked loading for large non-OME array ({estimated_memory_mb:.1f} MB)")
    # Calculate scale factors for downsampling
    in_vs = np.asarray(vs, dtype=float)
    tgt_vs = np.asarray(self.target_voxel_size, dtype=float)
    if in_vs.size != tgt_vs.size:
        in_vs = in_vs[-tgt_vs.size:]

    if not np.allclose(in_vs, tgt_vs):
        # Downsampling needed - use chunked approach
        # scale_factors = in_vs / tgt_vs
        # When in_vs < tgt_vs (fine→coarse), scale_factors < 1.0, causing rescale to downsample
        scale_factors = in_vs / tgt_vs
        image = self._load_array_chunked(arr, scale_factors)
        return image, self.target_voxel_size, tr, True  # Already downsampled
    else:
        # No downsampling needed, load normally
        image = arr[:]
        return image, vs, tr, False
else:
    image = arr[:]
    return image, vs, tr, False
```
The condition use_chunked and vs is not None prevents chunked loading from being used when voxel_size metadata is missing. However, if use_chunked is True (meaning the array is >500MB), loading the entire array into memory with arr[:] on line 358 could still cause memory issues. The code should either fail with a clear error message when encountering large arrays without metadata, or implement chunked loading that doesn't rely on voxel_size for downsampling (e.g., just loading chunks and concatenating them without downsampling).
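The "load chunks and concatenate" fallback the comment mentions could be sketched like this — a hypothetical `load_in_slabs` helper, not code from the PR. It avoids one monolithic whole-array read request, though as noted the final array is still fully materialised:

```python
import numpy as np

def load_in_slabs(arr, slab=64):
    """Read fixed-size z-slabs and concatenate them, rather than issuing a
    single whole-array request. Useful when voxel-size metadata is missing,
    so chunked *downsampling* is not possible."""
    slabs = [np.asarray(arr[z : z + slab]) for z in range(0, arr.shape[0], slab)]
    return np.concatenate(slabs, axis=0)
```

The fail-fast alternative (raising a clear error for large arrays without metadata) may still be preferable, since this fallback only smooths the read pattern without bounding peak memory.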
```python
assert out_chunked[0, 0, 0] == 1
assert out_chunked[7, 7, 7] == 2
# Middle boundary should have one of the values
assert out_chunked[4, 8//2, 8//2] in [1, 2]
```
The tests don't cover the case where array dimensions are not evenly divisible by chunk_size_per_dim, which is the most common scenario in production. For example, if an array is 100x100x100 and chunk_size_per_dim is 215 (as mentioned in the PR description), only one chunk would be processed. More critically, if the array were 300x300x300, the chunks would be 215, 85 (300-215), leading to irregular chunk sizes that could trigger the shape mismatch handling on lines 250-253. A test with non-aligned dimensions would help verify the correctness of the boundary handling logic.
Suggested change:

```diff
 assert out_chunked[4, 8//2, 8//2] in [1, 2]
+
+
+def test_load_aligned_chunked_handles_non_aligned_dimensions(tmp_path, instance_classes):
+    """Test that chunked loading correctly handles arrays whose dimensions are not
+    evenly divisible by the chunk size per dimension."""
+    root, grp = _make_group(tmp_path)
+    # Use non-aligned dimensions to exercise irregular boundary chunks
+    src = np.ones((30, 35, 40), dtype=np.uint8)
+    arr = grp.create_dataset("s0", data=src)
+    _set_attrs(arr, voxel_size=(2, 2, 2), translation=(0, 0, 0))
+    arr_path = str(root / "s0")
+    # Target voxel size and shape chosen so that overall scale is 1.0;
+    # this isolates the chunking/boundary logic from any resampling logic.
+    m = mc.MatchedCrop(
+        path=arr_path,
+        class_label=INSTANCE_CLASSES[0],
+        target_voxel_size=(2, 2, 2),
+        target_shape=src.shape,
+        target_translation=(0, 0, 0),
+        instance_classes=instance_classes,
+    )
+    scale_factors = (1.0, 1.0, 1.0)
+    out_chunked = m._load_array_chunked(arr, scale_factors)
+    # The output should match the original shape despite non-aligned dimensions,
+    # and all values should be preserved.
+    assert out_chunked.shape == src.shape
+    assert out_chunked.dtype == src.dtype
+    assert np.all(out_chunked == 1)
```
The `csc pack-results` command fails with unclear errors when processing zarr files that don't match expected formats or lack metadata. Additionally, it can cause memory allocation errors when processing high-resolution predictions.

Changes
Chunked Lazy Downsampling (matched_crop.py) - NEW
- `already_downsampled` flag to skip redundant rescaling in `load_aligned`

MatchedCrop._load_source_array() (matched_crop.py)
- `logger.info(f"Detected OME-NGFF multiscale format at {self.path}")`
- `zarr.open(level_path.path, ...)` → `zarr.open(str(level_path), ...)`

package_crop() (submission.py)
- `f"Error: Failed to package crop {crop.id} ({crop.class_label})... {exception}"` on failure

package_submission() (submission.py)
Comprehensive Test Coverage (test_matched_crop.py) - NEW
- `_check_size_ratio()` and `_should_use_chunked_loading()`
- `_load_array_chunked()` covering instance and semantic downsampling
- `load_aligned()` using monkeypatching

Example Error Output
Before:
After (with better diagnostics):
After (with memory limit exceeded):
Technical Details
The chunked downsampling implementation:
Testing
Original prompt