feat: optimize property access with cached_property decorator#61
rhoadesScholar merged 12 commits into main from
Conversation
rhoadesScholar
commented
Feb 25, 2026
- Replaced @property with @cached_property in multiple classes (CellMapDatasetWriter, CellMapDataSplit, CellMapImage, CellMapMultiDataset) to improve performance by caching results of expensive computations.
- Removed redundant try-except blocks for attribute access, simplifying the code and enhancing readability.
- Updated related methods to ensure consistent behavior with the new caching mechanism.
- Removed test for zero-byte cache limit as it was deemed unnecessary.
Pull request overview
This PR optimizes repeated access to expensive, derived attributes in CellMap-Data’s dataset/image abstractions by switching from manual @property-with-_cached patterns to functools.cached_property, and updates cache-bounding behavior in the DataLoader accordingly.
Changes:
- Replaced several computed @property implementations with @cached_property across dataset/image/split/multidataset writer components.
- Updated TensorStore cache bounding and context-application logic in CellMapDataLoader, including new edge-case handling and warnings.
- Removed the test that asserted a 0-byte TensorStore cache limit disables caching.
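The refactor the overview describes can be illustrated with a minimal before/after sketch. The class and attribute names below are made up for illustration; the real classes are listed in the table that follows:

```python
from functools import cached_property


class ManualCache:
    """Old pattern: @property backed by a hand-rolled cache attribute."""

    @property
    def size(self) -> int:
        try:
            return self._size
        except AttributeError:
            self._size = self._compute_size()  # expensive computation
            return self._size

    def _compute_size(self) -> int:
        return 42


class WithCachedProperty:
    """New pattern: functools.cached_property stores the result in __dict__."""

    @cached_property
    def size(self) -> int:
        return self._compute_size()  # computed once, then served from __dict__

    def _compute_size(self) -> int:
        return 42
```

Note that cached_property stores the computed value in the instance `__dict__` under the property's own name, so `del obj.size` (or `obj.__dict__.pop("size", None)`) is how a cache entry is invalidated; this detail matters for the cache-reset logic discussed in the review comments below.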
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/test_dataloader.py | Removes the zero-byte cache-limit test related to TensorStore cache bounding. |
| src/cellmap_data/multidataset.py | Converts derived weight/count/index properties to @cached_property and updates empty() cache initialization. |
| src/cellmap_data/image.py | Converts multiple expensive image metadata/data accessors to @cached_property. |
| src/cellmap_data/datasplit.py | Converts combined-dataset accessors to @cached_property and updates cache invalidation/reset logic. |
| src/cellmap_data/dataset_writer.py | Converts writer-derived geometry/index helpers to @cached_property. |
| src/cellmap_data/dataset.py | Converts dataset-derived geometry/count/index helpers to @cached_property. |
| src/cellmap_data/dataloader.py | Updates TensorStore context application to align with cached_property caching and adds cache-splitting edge-case warning + refresh() warning. |
    self.classes,
    self.input_arrays,
    self.target_arrays,
UserWarning("Validation datasets not loaded.") creates a warning object but does not emit it, so this is effectively a no-op. If this is meant to notify callers, use warnings.warn(...) (or logger.warning(...)) instead.
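The distinction the reviewer is drawing can be demonstrated in a few lines: constructing a `UserWarning` object does nothing unless it is passed through `warnings.warn`, which routes it to the warnings machinery:

```python
import warnings


def silent() -> None:
    # Creates a warning object but never emits it -- effectively a no-op.
    UserWarning("Validation datasets not loaded.")


def emitting() -> None:
    # Actually emits the warning so callers (and filters) can see it.
    warnings.warn("Validation datasets not loaded.", UserWarning, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    silent()
    assert len(caught) == 0  # nothing was emitted
    emitting()
    assert len(caught) == 1 and caught[0].category is UserWarning
```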
src/cellmap_data/multidataset.py
Outdated
    UserWarning(
        f"Unable to get validation indices for dataset {dataset}\n skipping"
In the AttributeError handler, UserWarning(...) is instantiated but never emitted, so missing validation_indices will be silently ignored. Use warnings.warn(...) or logger.warning(...) so callers understand that validation indices were skipped for that dataset.
Suggested change:

    - UserWarning(
    -     f"Unable to get validation indices for dataset {dataset}\n skipping"
    + logger.warning(
    +     "Unable to get validation indices for dataset %r; skipping this "
    +     "dataset when building validation_indices.",
    +     dataset,
    per_worker_bytes = tensorstore_cache_bytes // effective_workers
    if per_worker_bytes == 0 and tensorstore_cache_bytes > 0:
        per_worker_bytes = 1
        logger.warning(
            "tensorstore_cache_bytes=%d with num_workers=%d results in "
            "per-worker cache limit of 0 bytes, which TensorStore treats as "
            "unlimited. Setting per-worker cache limit to 1 byte to enforce "
            "a meaningful bound. To avoid this warning, set tensorstore_cache_bytes "
            "to at least %d bytes for num_workers=%d.",
            tensorstore_cache_bytes,
            num_workers,
            effective_workers,
            effective_workers,
        )
When tensorstore_cache_bytes < num_workers, forcing per_worker_bytes=1 means the aggregate cache budget across workers can exceed the requested tensorstore_cache_bytes (e.g., total=1 with 4 workers becomes 4 bytes). Either adjust the wording/docs (since the budget is no longer truly “shared across all workers”), or change the splitting strategy to keep the effective total bounded (e.g., cap the number of workers that get a cache_pool limit, or explicitly document the minimum per-worker limit behavior).
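One bounded splitting strategy along the lines the comment suggests can be sketched as follows. The function name is hypothetical; the idea is to give a cache pool to only as many workers as the budget allows, so the aggregate never exceeds the requested total:

```python
def split_cache_budget(total_bytes: int, num_workers: int) -> list[int]:
    """Distribute total_bytes across workers without exceeding the total.

    When the budget is smaller than the worker count, workers beyond the
    budget get 0 bytes (no cache pool) instead of a clamped 1-byte minimum,
    keeping sum(result) <= total_bytes.
    """
    workers = max(num_workers, 1)
    per_worker = total_bytes // workers
    if per_worker > 0:
        return [per_worker] * workers
    # Budget smaller than worker count: 1 byte each for the first
    # total_bytes workers, nothing for the rest.
    return [1] * total_bytes + [0] * (workers - total_bytes)
```

With this scheme, `tensorstore_cache_bytes=1` and 4 workers yields `[1, 0, 0, 0]` rather than an aggregate of 4 bytes; the trade-off is that most workers then run with an unbounded (TensorStore-default) or absent cache pool, which is exactly the documentation question the reviewer raises.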
    def _apply_context_to_image(image: "CellMapImage", context) -> None:
        """Set the TensorStore context on a single CellMapImage, warning if already opened."""
    -   if "_array" in getattr(image, "__dict__", {}):
    +   if "array" in getattr(image, "__dict__", {}):
            logger.warning(
The module-level docstring above still refers to the TensorStore array being cached under "_array", but the implementation now checks for the cached_property name "array". Update the docstring/comments accordingly to avoid misleading future changes/debugging.
    if per_worker_bytes == 0 and tensorstore_cache_bytes > 0:
        per_worker_bytes = 1
        logger.warning(
            "tensorstore_cache_bytes=%d with num_workers=%d results in "
            "per-worker cache limit of 0 bytes, which TensorStore treats as "
            "unlimited. Setting per-worker cache limit to 1 byte to enforce "
            "a meaningful bound. To avoid this warning, set tensorstore_cache_bytes "
            "to at least %d bytes for num_workers=%d.",
            tensorstore_cache_bytes,
            num_workers,
            effective_workers,
            effective_workers,
        )
New behavior for the per_worker_bytes == 0 case (warning + clamping to 1 byte) is introduced here but there is no test asserting the warning and resulting per-image context when tensorstore_cache_bytes < num_workers. Consider adding a unit test in TestTensorStoreCacheBounding to lock in this edge-case behavior.
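A test of the kind suggested might look like the sketch below. Since the real splitting code lives inside `CellMapDataLoader`, the sketch replicates the clamping logic in a standalone helper (`compute_per_worker_bytes` is a hypothetical name) purely to show the shape of the assertion on the emitted warning:

```python
import logging

logger = logging.getLogger("cellmap_data.dataloader")


def compute_per_worker_bytes(total_bytes: int, num_workers: int) -> int:
    """Illustrative replica of the clamping logic under test."""
    workers = max(num_workers, 1)
    per_worker = total_bytes // workers
    if per_worker == 0 and total_bytes > 0:
        per_worker = 1
        logger.warning(
            "tensorstore_cache_bytes=%d with num_workers=%d clamped to "
            "1 byte per worker.",
            total_bytes,
            num_workers,
        )
    return per_worker


def test_clamps_and_warns_when_budget_smaller_than_workers():
    records = []
    handler = logging.Handler()
    handler.emit = records.append  # capture records instead of formatting them
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING)
    try:
        assert compute_per_worker_bytes(1, 4) == 1
    finally:
        logger.removeHandler(handler)
    assert any("clamped" in rec.getMessage() for rec in records)
```

In a pytest suite the capture would more idiomatically use the `caplog` fixture; the manual handler here just keeps the sketch self-contained.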
    @cached_property
    def sampling_box_shape(self) -> dict[str, int]:
        """Returns the shape of the sampling box of the dataset in voxels of the largest voxel size requested."""
    -   try:
    -       return self._sampling_box_shape
    -   except AttributeError:
    -       self._sampling_box_shape = self._get_box_shape(self.sampling_box)
    -       if self.pad:
    -           for c, size in self._sampling_box_shape.items():
    -               if size <= 0:
    -                   logger.debug(
    -                       "Sampling box for axis %s has size %d <= 0. "
    -                       "Setting to 1 and padding.",
    -                       c,
    -                       size,
    -                   )
    -                   self._sampling_box_shape[c] = 1
    -       return self._sampling_box_shape
    +   shape = self._get_box_shape(self.sampling_box)
    +   if self.pad:
    +       for c, size in shape.items():
    +           if size <= 0:
    +               logger.debug(
    +                   "Sampling box for axis %s has size %d <= 0. "
    +                   "Setting to 1 and padding.",
    +                   c,
    +                   size,
    +               )
    +               shape[c] = 1
    +   return shape
sampling_box_shape is now a @cached_property, so setting instance._sampling_box_shape = ... (as done in CellMapDataset.empty()) will no longer affect the returned value. This can break empty-dataset behavior (e.g., len() or sampling) because the cached_property will recompute from sampling_box instead of using the pre-set shape. Update CellMapDataset.empty() (or any other callers) to pre-populate sampling_box_shape in __dict__ (or clear/pop it appropriately) rather than writing _sampling_box_shape.
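The `__dict__` pre-population the reviewer describes can be sketched like this (class and attribute contents are illustrative, not the real CellMapDataset internals):

```python
from functools import cached_property


class Dataset:
    def __init__(self, bounds: dict[str, int]):
        self._bounds = bounds

    @cached_property
    def sampling_box_shape(self) -> dict[str, int]:
        # Stand-in for the expensive derivation from sampling_box.
        return dict(self._bounds)

    @classmethod
    def empty(cls) -> "Dataset":
        ds = cls(bounds={})
        # Writing ds._sampling_box_shape would now be silently ignored by the
        # cached_property; seed the cache via __dict__ under the property's
        # own name instead.
        ds.__dict__["sampling_box_shape"] = {"x": 0, "y": 0, "z": 0}
        return ds
```

Because cached_property looks up the instance `__dict__` entry named after the property before computing, seeding that key makes the pre-set shape win; conversely, `ds.__dict__.pop("sampling_box_shape", None)` forces a recompute on next access.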
src/cellmap_data/dataset.py
Outdated
    -   size = np.prod([stop - start for start, stop in self.bounding_box.items()])
    -   self._size = int(size)
    -   return self._size
    +   return int(np.prod([stop - start for start, stop in self.bounding_box.items()]))
size computes stop - start over self.bounding_box.items(), but .items() yields (axis, [start, stop]). This will raise a TypeError (subtracting list/str) and should instead subtract the numeric bounds (e.g., iterate over .values() or unpack (axis, (start, stop))).
Suggested change:

    -   return int(np.prod([stop - start for start, stop in self.bounding_box.items()]))
    +   return int(np.prod([stop - start for start, stop in self.bounding_box.values()]))
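The failure mode is easy to reproduce standalone: `.items()` yields `(axis, bounds)` pairs, so the unpacked `start` is the axis name rather than a number. The sketch uses `math.prod` in place of `np.prod` to stay stdlib-only; the behavior is the same:

```python
from math import prod  # np.prod in the original; math.prod keeps this stdlib-only

bounding_box = {"x": [0, 10], "y": [0, 4]}

# .items() yields ("x", [0, 10]): start unpacks to the axis name, not a number.
try:
    prod([stop - start for start, stop in bounding_box.items()])
    raise AssertionError("expected a TypeError")
except TypeError:
    pass  # list - str is undefined, as the review comment predicts

# .values() yields [0, 10]: unpacking gives the numeric bounds.
size = int(prod(stop - start for start, stop in bounding_box.values()))  # 10 * 4 = 40
```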
    array_future = ts.open(
        spec, read=True, write=False, context=self.context
    )
    try:
        array = array_future.result()
    except ValueError as e:
        Warning(e)
        UserWarning("Falling back to zarr3 driver")
        spec["driver"] = "zarr3"
        array_future = ts.open(
            spec, read=True, write=False, context=self.context
        )
        array = array_future.result()
    data = xt._TensorStoreAdapter(array)
    self._array = xarray.DataArray(data=data, coords=self.full_coords)
    return self._array
    array = array_future.result()
Warning(e) and UserWarning(...) are being constructed here but not emitted/logged, so failures to open with the default driver will be silently ignored. Use logger.warning(...) and/or warnings.warn(...) (and consider including the exception details) so users can actually see the fallback behavior.
Codecov Report

❌ Patch coverage is not fully reported. Additional details and impacted files:

    @@ Coverage Diff @@
    ##             main      #61      +/-   ##
    ==========================================
    + Coverage   70.41%   70.48%   +0.07%
    ==========================================
      Files          28       28
      Lines        2633     2450     -183
    ==========================================
    - Hits         1854     1727     -127
    + Misses        779      723      -56

☔ View full report in Codecov by Sentry.
@copilot open a new pull request to apply changes based on the comments in this thread |
@rhoadesScholar I've opened a new pull request, #62, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
…m_workers Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
…e warning emission
Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
fix: address PR review feedback - warning emission, cached_property compatibility, and documentation