Skip to content

Fix memory leak from cached xarray wrappers in CellMapImage#64

Merged
rhoadesScholar merged 21 commits intomainfrom
copilot/fix-memory-leak-issue
Feb 27, 2026
Merged

Fix memory leak from cached xarray wrappers in CellMapImage#64
rhoadesScholar merged 21 commits intomainfrom
copilot/fix-memory-leak-issue

Conversation

Copy link
Contributor

Copilot AI commented Feb 25, 2026

  • Analyze memory leak issue and identify root causes
  • Fix cached_property memory accumulation in CellMapImage.array
  • Add explicit cleanup method (_clear_array_cache) to clear cached array after data retrieval
  • Invalidate array cache after each getitem call to prevent accumulation
  • Add comprehensive tests for memory management including simulated training loop
  • Document the memory management approach and clarify TensorStore cache behavior
  • Verify minimal performance impact (TensorStore chunk cache is preserved)
  • Create detailed documentation of the fix (MEMORY_LEAK_FIX.md)
  • Verify EmptyImage doesn't have the same issue
  • Address PR review feedback:
    • Wrap cache clearing in try/finally to ensure cleanup on exceptions
    • Short-circuit empty classes in multidataset class_counts
    • Use self.executor instead of creating new ThreadPoolExecutor
    • Handle Windows+TensorStore in multidataset class_counts computation
    • Fix UserWarning/Warning construction to properly emit warnings using logger
Original prompt

This section details on the original issue you should resolve

<issue_title>Potential memory leak, or unintentional accumulation</issue_title>
<issue_description>Image
With a batch size that should be ~350MB each, num_workers=11, persistent_workers=False, prefetch_factor=1, and CELLMAP_TENSORSTORE_CACHE_BYTES=1, a training loop blows past the 480GB memory limit for the job. Could this be due to upsampling happening during array retrieval not getting cleared properly?</issue_description>

<agent_instructions>The training loop mentioned can be found here if you need it: https://github.com/janelia-cellmap/cellmap-segmentation-challenge/blob/main/src/cellmap_segmentation_challenge/train.py</agent_instructions>

Comments on the Issue (you are @copilot in this section)


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 4 commits February 25, 2026 21:35
Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix potential memory leak during training loop execution Fix memory leak from cached xarray wrappers in CellMapImage Feb 25, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request fixes a critical memory leak in CellMapImage where cached xarray wrappers accumulated during training loops, causing memory consumption to grow from 0 to nearly 500GB over 20 minutes despite 350MB batches. The fix adds explicit cache clearing after each data retrieval operation while preserving TensorStore's chunk cache performance.

Changes:

  • Adds _clear_array_cache() method to remove cached xarray wrapper from __dict__ after each __getitem__ call
  • Preserves TensorStore connection and chunk cache for performance
  • Includes comprehensive test suite and documentation

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/cellmap_data/image.py Implements cache clearing mechanism by adding _clear_array_cache() method and calling it after data retrieval in __getitem__(), plus updated docstrings
tests/test_memory_management.py New comprehensive test suite verifying cache clearing behavior across different scenarios including spatial transforms, value transforms, and interpolation
MEMORY_LEAK_FIX.md Detailed documentation explaining the root cause, solution, and architectural considerations
tests/demo_memory_fix.py Demo script showing the fix (though it only prints information rather than executing actual demonstrations)

@rhoadesScholar rhoadesScholar marked this pull request as ready for review February 25, 2026 21:59
rhoadesScholar and others added 4 commits February 25, 2026 17:00
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Implemented cache clearing in __getitem__ to prevent memory accumulation.
- Added tests to verify cache clearing behavior during simulated training loops.
- Updated demo script for memory profiling and added objgraph support.
- Moved read limiter to utils for better organization.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 10 changed files in this pull request and generated 3 comments.

rhoadesScholar and others added 3 commits February 26, 2026 14:41
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 12 changed files in this pull request and generated 7 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@rhoadesScholar
Copy link
Member

@copilot apply changes based on the comments in this thread

…and use executor pattern

Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 12 changed files in this pull request and generated 2 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 2 comments.

@rhoadesScholar
Copy link
Member

@copilot apply changes based on the comments in this thread

…ed warning objects

Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 2 comments.

rhoadesScholar and others added 3 commits February 26, 2026 18:00
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 78.98551% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.83%. Comparing base (2c9a16e) to head (907b28a).
⚠️ Report is 22 commits behind head on main.

Files with missing lines Patch % Lines
src/cellmap_data/image.py 66.66% 12 Missing ⚠️
src/cellmap_data/image_writer.py 75.67% 9 Missing ⚠️
src/cellmap_data/multidataset.py 83.33% 7 Missing ⚠️
src/cellmap_data/dataset_writer.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #64      +/-   ##
==========================================
+ Coverage   70.48%   71.83%   +1.34%     
==========================================
  Files          28       28              
  Lines        2450     2464      +14     
==========================================
+ Hits         1727     1770      +43     
+ Misses        723      694      -29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rhoadesScholar rhoadesScholar merged commit 58bceb7 into main Feb 27, 2026
12 checks passed
@rhoadesScholar rhoadesScholar deleted the copilot/fix-memory-leak-issue branch February 27, 2026 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Potential memory leak, or unintentional accumulation

3 participants