
feat: optimize property access with cached_property decorator#61

Merged
rhoadesScholar merged 12 commits into main from improve_cache
Feb 25, 2026

Conversation

@rhoadesScholar
Member

  • Replaced @property with @cached_property in multiple classes (CellMapDatasetWriter, CellMapDataSplit, CellMapImage, CellMapMultiDataset) to improve performance by caching results of expensive computations.
  • Removed redundant try-except blocks for attribute access, simplifying the code and enhancing readability.
  • Updated related methods to ensure consistent behavior with the new caching mechanism.
  • Removed test for zero-byte cache limit as it was deemed unnecessary.

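The manual caching pattern this PR replaces, and the cached_property version, can be sketched as follows (a minimal illustration with a hypothetical Dataset class, not the actual CellMap-Data code):

```python
from functools import cached_property

class Dataset:
    """Hypothetical sketch of the pattern this PR adopts."""

    def __init__(self, bounds):
        self.bounds = bounds
        self.compute_calls = 0

    @cached_property
    def size(self) -> int:
        # Runs once; the result is stored in self.__dict__["size"] and
        # returned directly on every later access.
        self.compute_calls += 1
        return int(sum(stop - start for start, stop in self.bounds.values()))
```

Repeated access computes once; `del instance.size` drops the cached value so the next access recomputes, which is what the updated cache invalidation/reset logic relies on.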
Copilot AI left a comment

Pull request overview

This PR optimizes repeated access to expensive, derived attributes in CellMap-Data’s dataset/image abstractions by switching from manual @property-with-_cached patterns to functools.cached_property, and updates cache-bounding behavior in the DataLoader accordingly.

Changes:

  • Replaced several computed @property implementations with @cached_property across dataset/image/split/multidataset writer components.
  • Updated TensorStore cache bounding and context-application logic in CellMapDataLoader, including new edge-case handling and warnings.
  • Removed the test that asserted a 0-byte TensorStore cache limit disables caching.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Summary per file:
  • tests/test_dataloader.py: Removes the zero-byte cache-limit test related to TensorStore cache bounding.
  • src/cellmap_data/multidataset.py: Converts derived weight/count/index properties to @cached_property and updates empty() cache initialization.
  • src/cellmap_data/image.py: Converts multiple expensive image metadata/data accessors to @cached_property.
  • src/cellmap_data/datasplit.py: Converts combined-dataset accessors to @cached_property and updates cache invalidation/reset logic.
  • src/cellmap_data/dataset_writer.py: Converts writer-derived geometry/index helpers to @cached_property.
  • src/cellmap_data/dataset.py: Converts dataset-derived geometry/count/index helpers to @cached_property.
  • src/cellmap_data/dataloader.py: Updates TensorStore context application to align with cached_property caching and adds cache-splitting edge-case warning + refresh() warning.

Comment on lines +266 to +268
self.classes,
self.input_arrays,
self.target_arrays,

Copilot AI Feb 25, 2026

UserWarning("Validation datasets not loaded.") creates a warning object but does not emit it, so this is effectively a no-op. If this is meant to notify callers, use warnings.warn(...) (or logger.warning(...)) instead.
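The fix this comment suggests can be sketched as follows (hypothetical load_validation function for illustration):

```python
import warnings

def load_validation() -> None:
    # Constructing UserWarning(...) on its own is a no-op; warnings.warn
    # actually routes it through the warnings machinery so callers (and
    # warning filters) can see it.
    warnings.warn("Validation datasets not loaded.", UserWarning)
```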

Comment on lines +176 to +177
UserWarning(
f"Unable to get validation indices for dataset {dataset}\n skipping"

Copilot AI Feb 25, 2026


In the AttributeError handler, UserWarning(...) is instantiated but never emitted, so missing validation_indices will be silently ignored. Use warnings.warn(...) or logger.warning(...) so callers understand that validation indices were skipped for that dataset.

Suggested change:
-            UserWarning(
-                f"Unable to get validation indices for dataset {dataset}\n skipping"
+            logger.warning(
+                "Unable to get validation indices for dataset %r; skipping this "
+                "dataset when building validation_indices.",
+                dataset,
+            )

Comment on lines 208 to +221
per_worker_bytes = tensorstore_cache_bytes // effective_workers
if per_worker_bytes == 0 and tensorstore_cache_bytes > 0:
per_worker_bytes = 1
logger.warning(
"tensorstore_cache_bytes=%d with num_workers=%d results in "
"per-worker cache limit of 0 bytes, which TensorStore treats as "
"unlimited. Setting per-worker cache limit to 1 byte to enforce "
"a meaningful bound. To avoid this warning, set tensorstore_cache_bytes "
"to at least %d bytes for num_workers=%d.",
tensorstore_cache_bytes,
num_workers,
effective_workers,
effective_workers,
)

Copilot AI Feb 25, 2026

When tensorstore_cache_bytes < num_workers, forcing per_worker_bytes=1 means the aggregate cache budget across workers can exceed the requested tensorstore_cache_bytes (e.g., total=1 with 4 workers becomes 4 bytes). Either adjust the wording/docs (since the budget is no longer truly “shared across all workers”), or change the splitting strategy to keep the effective total bounded (e.g., cap the number of workers that get a cache_pool limit, or explicitly document the minimum per-worker limit behavior).
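One possible splitting strategy that keeps the aggregate bounded, as the comment suggests, might look like this (a hypothetical split_cache_budget helper, not the PR's actual code):

```python
def split_cache_budget(total_bytes: int, num_workers: int) -> list[int]:
    """Hypothetical splitting strategy that keeps the aggregate bounded.

    When total_bytes < num_workers, give 1 byte each to only the first
    total_bytes workers, rather than clamping every worker to 1 byte
    (which would let the aggregate exceed the requested budget).
    """
    workers = max(num_workers, 1)
    per_worker = total_bytes // workers
    if per_worker > 0:
        return [per_worker] * workers
    # Fewer bytes than workers: cap how many workers get a cache pool at all.
    return [1] * total_bytes + [0] * (workers - total_bytes)
```

With total=1 and 4 workers this hands out [1, 0, 0, 0], so the sum never exceeds the requested budget.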

Comment on lines 63 to 66
 def _apply_context_to_image(image: "CellMapImage", context) -> None:
     """Set the TensorStore context on a single CellMapImage, warning if already opened."""
-    if "_array" in getattr(image, "__dict__", {}):
+    if "array" in getattr(image, "__dict__", {}):
         logger.warning(

Copilot AI Feb 25, 2026


The module-level docstring above still refers to the TensorStore array being cached under "_array", but the implementation now checks for the cached_property name "array". Update the docstring/comments accordingly to avoid misleading future changes/debugging.
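The behavioral detail behind this comment is that functools.cached_property stores its result in the instance __dict__ under the property's own name, not under a leading-underscore attribute. A minimal sketch (hypothetical Image class and is_opened helper):

```python
from functools import cached_property

class Image:
    """Hypothetical stand-in for CellMapImage."""

    @cached_property
    def array(self):
        # cached_property stores its result in the instance __dict__ under
        # the property's own name ("array"), not under "_array".
        return [1, 2, 3]

def is_opened(image: Image) -> bool:
    # Mirrors the shape of the check in _apply_context_to_image.
    return "array" in getattr(image, "__dict__", {})
```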

Comment on lines +209 to +221
if per_worker_bytes == 0 and tensorstore_cache_bytes > 0:
per_worker_bytes = 1
logger.warning(
"tensorstore_cache_bytes=%d with num_workers=%d results in "
"per-worker cache limit of 0 bytes, which TensorStore treats as "
"unlimited. Setting per-worker cache limit to 1 byte to enforce "
"a meaningful bound. To avoid this warning, set tensorstore_cache_bytes "
"to at least %d bytes for num_workers=%d.",
tensorstore_cache_bytes,
num_workers,
effective_workers,
effective_workers,
)

Copilot AI Feb 25, 2026


New behavior for the per_worker_bytes == 0 case (warning + clamping to 1 byte) is introduced here but there is no test asserting the warning and resulting per-image context when tensorstore_cache_bytes < num_workers. Consider adding a unit test in TestTensorStoreCacheBounding to lock in this edge-case behavior.
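Such a test might be sketched as follows, using a hypothetical standalone reimplementation of the clamping logic so the edge case can be exercised in isolation (the real code lives in CellMapDataLoader):

```python
import logging

def per_worker_cache_bytes(total_bytes: int, effective_workers: int,
                           logger: logging.Logger) -> int:
    # Hypothetical standalone version of the clamping logic under test.
    per_worker = total_bytes // effective_workers
    if per_worker == 0 and total_bytes > 0:
        per_worker = 1
        logger.warning(
            "per-worker cache limit of 0 bytes clamped to 1 byte "
            "(total=%d, workers=%d)", total_bytes, effective_workers,
        )
    return per_worker

def test_zero_per_worker_bytes_warns_and_clamps() -> None:
    logger = logging.getLogger("cache_split_test")
    logger.propagate = False
    records: list[logging.LogRecord] = []
    handler = logging.Handler()
    handler.emit = records.append  # capture records instead of formatting them
    logger.addHandler(handler)
    try:
        assert per_worker_cache_bytes(1, 4, logger) == 1
        assert any("clamped" in r.getMessage() for r in records)
    finally:
        logger.removeHandler(handler)
```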

Comment on lines +479 to +493
 @cached_property
 def sampling_box_shape(self) -> dict[str, int]:
     """Returns the shape of the sampling box of the dataset in voxels of the largest voxel size requested."""
-    try:
-        return self._sampling_box_shape
-    except AttributeError:
-        self._sampling_box_shape = self._get_box_shape(self.sampling_box)
-        if self.pad:
-            for c, size in self._sampling_box_shape.items():
-                if size <= 0:
-                    logger.debug(
-                        "Sampling box for axis %s has size %d <= 0. "
-                        "Setting to 1 and padding.",
-                        c,
-                        size,
-                    )
-                    self._sampling_box_shape[c] = 1
-        return self._sampling_box_shape
+    shape = self._get_box_shape(self.sampling_box)
+    if self.pad:
+        for c, size in shape.items():
+            if size <= 0:
+                logger.debug(
+                    "Sampling box for axis %s has size %d <= 0. "
+                    "Setting to 1 and padding.",
+                    c,
+                    size,
+                )
+                shape[c] = 1
+    return shape

Copilot AI Feb 25, 2026


sampling_box_shape is now a @cached_property, so setting instance._sampling_box_shape = ... (as done in CellMapDataset.empty()) will no longer affect the returned value. This can break empty-dataset behavior (e.g., len() or sampling) because the cached_property will recompute from sampling_box instead of using the pre-set shape. Update CellMapDataset.empty() (or any other callers) to pre-populate sampling_box_shape in __dict__ (or clear/pop it appropriately) rather than writing _sampling_box_shape.
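Seeding a cached_property ahead of time works by writing to the instance __dict__ under the property name. A minimal sketch of the pattern the comment recommends (hypothetical Dataset.empty, not the actual CellMapDataset):

```python
from functools import cached_property

class Dataset:
    """Hypothetical sketch of the empty() pattern the comment describes."""

    @cached_property
    def sampling_box_shape(self) -> dict[str, int]:
        raise RuntimeError("would derive the shape from sampling_box here")

    @classmethod
    def empty(cls) -> "Dataset":
        self = cls()
        # Writing self._sampling_box_shape would now be ignored; seeding the
        # instance __dict__ under the property name short-circuits the
        # cached_property so it never recomputes.
        self.__dict__["sampling_box_shape"] = {"x": 0, "y": 0, "z": 0}
        return self
```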

-    size = np.prod([stop - start for start, stop in self.bounding_box.items()])
-    self._size = int(size)
-    return self._size
+    return int(np.prod([stop - start for start, stop in self.bounding_box.items()]))

Copilot AI Feb 25, 2026


size computes stop - start over self.bounding_box.items(), but .items() yields (axis, [start, stop]). This will raise a TypeError (subtracting list/str) and should instead subtract the numeric bounds (e.g., iterate over .values() or unpack (axis, (start, stop))).

Suggested change:
-    return int(np.prod([stop - start for start, stop in self.bounding_box.items()]))
+    return int(np.prod([stop - start for start, stop in self.bounding_box.values()]))
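A tiny demonstration of the difference (using math.prod in place of np.prod, with a hypothetical bounding_box mapping):

```python
from math import prod  # stands in for np.prod in this sketch

# Hypothetical bounding_box mapping axis name -> (start, stop).
bounding_box = {"x": (0.0, 4.0), "y": (0.0, 3.0)}

# .items() yields (axis_name, (start, stop)); unpacking that as (start, stop)
# binds start to the axis *string*, so the subtraction raises TypeError.
# .values() yields the numeric (start, stop) pairs, which is what size needs.
size = int(prod([stop - start for start, stop in bounding_box.values()]))
# size is 12 here (extent 4 * extent 3)
```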

Comment on lines 247 to +250
array_future = ts.open(
spec, read=True, write=False, context=self.context
)
try:
array = array_future.result()
except ValueError as e:
Warning(e)
UserWarning("Falling back to zarr3 driver")
spec["driver"] = "zarr3"
array_future = ts.open(
spec, read=True, write=False, context=self.context
)
array = array_future.result()
data = xt._TensorStoreAdapter(array)
self._array = xarray.DataArray(data=data, coords=self.full_coords)
return self._array
array = array_future.result()

Copilot AI Feb 25, 2026


Warning(e) and UserWarning(...) are being constructed here but not emitted/logged, so failures to open with the default driver will be silently ignored. Use logger.warning(...) and/or warnings.warn(...) (and consider including the exception details) so users can actually see the fallback behavior.
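An emission-plus-fallback pattern like the one the comment suggests might be sketched as follows (hypothetical open_with_fallback helper; open_fn stands in for ts.open(...).result()):

```python
import logging
import warnings

logger = logging.getLogger(__name__)

def open_with_fallback(open_fn, spec: dict):
    """Hypothetical helper: emit the failure instead of constructing unused
    Warning objects, then retry with the zarr3 driver.

    open_fn stands in for `ts.open(...).result()`.
    """
    try:
        return open_fn(spec)
    except ValueError as e:
        # Actually surface the failure and the fallback, including the
        # exception details, instead of silently swallowing them.
        logger.warning(
            "Open failed with driver %r: %s; falling back to zarr3",
            spec.get("driver"), e,
        )
        warnings.warn(f"Falling back to zarr3 driver after: {e}", UserWarning)
        return open_fn({**spec, "driver": "zarr3"})
```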

@codecov

codecov bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 79.39189% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.48%. Comparing base (726bfd0) to head (4b6a177).
⚠️ Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
src/cellmap_data/dataset_writer.py 62.06% 22 Missing ⚠️
src/cellmap_data/image.py 79.48% 16 Missing ⚠️
src/cellmap_data/dataset.py 86.25% 11 Missing ⚠️
src/cellmap_data/datasplit.py 77.14% 8 Missing ⚠️
src/cellmap_data/multidataset.py 92.30% 3 Missing ⚠️
src/cellmap_data/dataloader.py 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #61      +/-   ##
==========================================
+ Coverage   70.41%   70.48%   +0.07%     
==========================================
  Files          28       28              
  Lines        2633     2450     -183     
==========================================
- Hits         1854     1727     -127     
+ Misses        779      723      -56     

☔ View full report in Codecov by Sentry.
@rhoadesScholar
Member Author

@copilot open a new pull request to apply changes based on the comments in this thread


Copilot AI commented Feb 25, 2026

@rhoadesScholar I've opened a new pull request, #62, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 9 commits February 25, 2026 18:00
Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
…m_workers

Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
fix: address PR review feedback - warning emission, cached_property compatibility, and documentation
@rhoadesScholar rhoadesScholar merged commit 2c9a16e into main Feb 25, 2026
10 checks passed
@rhoadesScholar rhoadesScholar deleted the improve_cache branch February 25, 2026 19:56
