Skip to content

Fix class_weights background voxel count to use actual array volume size#66

Merged
rhoadesScholar merged 8 commits intorewritefrom
copilot/sub-pr-65
Mar 17, 2026
Merged

Fix class_weights background voxel count to use actual array volume size#66
rhoadesScholar merged 8 commits intorewritefrom
copilot/sub-pr-65

Conversation

Copy link
Contributor

Copilot AI commented Mar 13, 2026

class_weights was computing total_voxels = sum(fg_counts.values()) — the sum of all foreground voxels across all classes — rather than the actual spatial volume of each class's data array. This made bg = total_voxels - fg incorrect, corrupting the fg/bg imbalance ratio used for weighted sampling.

Changes

  • CellMapImage.total_voxels — new property; product of the s0 array's spatial shape, scaled to training resolution via the existing _scale_count helper
  • EmptyImage.total_voxels — returns 0 (no backing array)
  • CellMapDataset.class_counts — adds "totals_total" key alongside "totals", sourced from each target image's total_voxels
  • CellMapMultiDataset.class_counts — aggregates "totals_total" across datasets
  • CellMapMultiDataset.class_weights — computes bg = totals_total[cls] - fg per class instead of summing across all fg counts
# Before (wrong): bg was total fg across all classes minus this class's fg
total_voxels = sum(counts.values())  # sum of fg voxels for all classes
bg = total_voxels - fg

# After (correct): bg is the actual array volume minus this class's fg
bg = max(total_counts[cls] - fg, 0)  # total_counts[cls] = np.prod(array_shape) scaled to training res

📍 Connect Copilot coding agent with Jira, Azure Boards or Linear to delegate work to Copilot in one click without leaving your project management tool.

@rhoadesScholar
Copy link
Member

@copilot Address the purpose of this PR

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 81.25000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.22%. Comparing base (ce2a150) to head (ef33153).
⚠️ Report is 9 commits behind head on rewrite.

Files with missing lines Patch % Lines
src/cellmap_data/image.py 69.23% 4 Missing ⚠️
src/cellmap_data/dataset.py 80.00% 1 Missing ⚠️
src/cellmap_data/empty_image.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           rewrite      #66      +/-   ##
===========================================
+ Coverage    80.81%   81.22%   +0.41%     
===========================================
  Files           23       23              
  Lines         1538     1561      +23     
===========================================
+ Hits          1243     1268      +25     
+ Misses         295      293       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI changed the title [WIP] [WIP] Address feedback on voxel count calculation in rewrite PR Fix class_weights background voxel count to use actual array volume size Mar 13, 2026
@rhoadesScholar rhoadesScholar requested a review from Copilot March 13, 2026 19:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates how class voxel statistics are computed across CellMapImageCellMapDatasetCellMapMultiDataset, adding explicit “total voxel” counts so class sampling weights can use true background volume rather than deriving totals from foreground counts.

Changes:

  • Add CellMapImage.total_voxels and propagate per-class total-volume counts via a new totals_total key in class_counts.
  • Update CellMapMultiDataset.class_weights to compute background as totals_total - totals per class.
  • Extend unit tests to validate the new keys and the corrected weight calculation.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/cellmap_data/image.py Adds total_voxels for computing per-image volume at training resolution.
src/cellmap_data/empty_image.py Adds a total_voxels stub for API compatibility with EmptyImage.
src/cellmap_data/dataset.py Extends class_counts to include totals_total per class.
src/cellmap_data/multidataset.py Aggregates totals_total and fixes class_weights background calculation.
tests/test_image.py Adds a test asserting total_voxels equals array size in the fixture.
tests/test_multidataset.py Adds tests for totals_total and for corrected background/weight math.

Co-authored-by: Copilot Autofix powered by AI <[email protected]>
@rhoadesScholar rhoadesScholar self-requested a review March 13, 2026 19:12
@rhoadesScholar rhoadesScholar marked this pull request as ready for review March 13, 2026 19:13
Copy link
Contributor Author

Copilot AI commented Mar 13, 2026

@copilot Address the purpose of this PR

The PR's purpose is to fix total_voxels to reflect the actual data volume size (spatial shape) for each class, rather than using the sum of all foreground voxel counts. CellMapImage.total_voxels is now derived from the cached bounding_box (world-space extent divided by training-resolution voxel size) so no extra zarr I/O is needed. CellMapDataset.class_counts exposes a "totals_total" key, and class_weights computes bg = totals_total[cls] - fg per class. Committed in 0d4bf5a.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines how class statistics are computed for sampling by distinguishing per-class foreground voxel counts from the total voxel volume of the underlying label arrays, ensuring background-vs-foreground ratios are based on the real volume size rather than inferred totals.

Changes:

  • Extend class_counts to return both per-class foreground totals (totals) and per-class total volume (totals_total) across CellMapDataset and CellMapMultiDataset.
  • Update CellMapMultiDataset.class_weights to compute background voxels using totals_total - totals.
  • Add CellMapImage.total_voxels (and EmptyImage.total_voxels) and introduce targeted regression tests for the new behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/cellmap_data/image.py Adds total_voxels derived from bounding_box and training scale.
src/cellmap_data/empty_image.py Provides total_voxels for empty sources (returns 0).
src/cellmap_data/dataset.py Expands class_counts to include totals_total and caches the result.
src/cellmap_data/multidataset.py Aggregates totals_total across datasets and fixes class_weights background calculation.
tests/test_image.py Adds a test validating total_voxels equals array size for a simple fixture.
tests/test_multidataset.py Adds tests asserting totals_total and class weight math use the true volume size.

Co-authored-by: Copilot Autofix powered by AI <[email protected]>
@rhoadesScholar rhoadesScholar merged commit e3f395d into rewrite Mar 17, 2026
12 checks passed
@rhoadesScholar rhoadesScholar deleted the copilot/sub-pr-65 branch March 17, 2026 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants