
Add mask support for training on foreground-only regions #220

Closed
balazsthomay wants to merge 1 commit into pierotofy:main from balazsthomay:feature/mask-support

Conversation

@balazsthomay

@balazsthomay balazsthomay commented Dec 30, 2025

Summary

Add mask support for training on foreground-only regions. This enables users to exclude background pixels during training, so only the region of interest contributes to the loss. Useful for object-centric reconstruction, cleaner outputs for compositing, and handling complex/cluttered backgrounds.

Changes

  • CLI: Add --mask-dir option to specify a directory containing binary mask images
  • Camera struct: Add maskPath, mask tensor, and maskPyramids cache fields
  • Mask I/O: Add imreadMask(), maskToTensor(), tensorToMask() functions in cv_utils
  • Mask loading: Load masks in loadImage() with automatic resizing to match image dimensions
  • Mask pyramid: Add getMask() with nearest-neighbor downscaling (like image pyramids)
  • L1 loss: Modify to compute weighted mean over masked pixels only
  • SSIM loss: Apply post-masking to SSIM map before averaging
  • Training loop: Pass masks through to mainLoss() for both training and validation
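The mask-pyramid entry above depends on nearest-neighbor downscaling so that scaled masks stay strictly binary. A minimal plain-Python sketch of that choice (the PR's actual code operates on libtorch tensors; the function and names here are illustrative only):

```python
def downscale_nearest(mask, h, w, new_h, new_w):
    # mask: flat row-major list of h*w binary values (0 or 1).
    # Nearest-neighbor sampling copies an existing value, so the result
    # stays binary; bilinear would blur mask edges into fractional weights.
    out = []
    for y in range(new_h):
        sy = min(int(y * h / new_h), h - 1)      # nearest source row
        for x in range(new_w):
            sx = min(int(x * w / new_w), w - 1)  # nearest source column
            out.append(mask[sy * w + sx])
    return out
```

Downscaling a binary mask this way at each pyramid level mirrors how the image pyramids are cached, just with a mode that cannot introduce gray values.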

Usage

./opensplat /path/to/project --mask-dir /path/to/masks -n 30000

Masks should be binary images (white=include, black=exclude) with filenames matching the source images (e.g., images/001.jpg → masks/001.png).
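Matching by filename stem (ignoring the extension) could look like the following sketch — a hypothetical helper for illustration, not the PR's actual C++ code:

```python
from pathlib import Path

def find_mask(image_path, mask_dir):
    # Pair a mask with an image by filename stem, ignoring extension:
    # images/001.jpg pairs with masks/001.png, masks/001.jpg, etc.
    stem = Path(image_path).stem
    for candidate in sorted(Path(mask_dir).iterdir()):
        if candidate.is_file() and candidate.stem == stem:
            return candidate
    return None  # no mask for this image
```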

Testing

Build Verification

  • CMake configuration succeeds: cmake -B build .
  • Build completes: cmake --build build -j8
  • Binary runs: ./build/opensplat --help

Functional Testing

  • Tested with sample dataset (407 images + masks)
  • Output PLY file is valid
  • Loss values differ with masks vs without masks

Platform Testing (if applicable)

  • CUDA build tested
  • Metal/MPS build tested
  • CPU-only build tested

Breaking Changes

None - this is an additive feature. Existing workflows without --mask-dir continue to work unchanged.

Add --mask-dir option to specify a directory containing binary mask images
(0=exclude, 1=include). Masks are matched to images by filename stem and
applied during loss computation (L1 and SSIM) so only foreground pixels
contribute to training.

Features:
- Binary mask loading with automatic resizing to match image dimensions
- Mask pyramid caching (like image pyramids) using nearest-neighbor interpolation
- Masked L1 loss: weighted mean over foreground pixels only
- Masked SSIM: post-masking of SSIM map before averaging
- Validation loss also respects masks
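The post-masking above (as opposed to masking the inputs before computing SSIM) amounts to a weighted average of the per-pixel SSIM map. A plain-Python sketch with illustrative names — the actual code works on libtorch tensors:

```python
def masked_ssim_mean(ssim_map, mask, eps=1e-8):
    # "Post-masking": the SSIM map is computed as usual, then weighted
    # by the binary mask so only foreground pixels enter the average.
    weighted = sum(s * m for s, m in zip(ssim_map, mask))
    return weighted / (sum(mask) + eps)  # eps guards an all-zero mask
```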
@pierotofy
Owner

Thanks for the PR! Did you use LLMs to generate this? It mostly looks very good, but can you explain how you arrived at:

    if (mask.numel() > 0){
        // Expand mask from [H,W,1] to [H,W,3] for broadcasting
        torch::Tensor expandedMask = mask.expand_as(diff);
        // Masked mean: sum of masked values / count of masked pixels
        torch::Tensor maskedDiff = diff * expandedMask;
        return maskedDiff.sum() / (expandedMask.sum() + 1e-8f);
    }

?

@balazsthomay
Author

Yes, I used Claude to help me, sorry I didn't mention that.

The mask is greyscale but the image is RGB, so I just copy the mask value across the 3 channels.

Then, I wanted to measure how wrong/different our rendered image is, but only for the white/foreground pixels.

Without a mask I'd add up all the errors and divide by the total number of pixels. But with a mask:

  1. Zero out the errors for pixels we don't care about (multiply by the mask)
  2. Add up what's left (only foreground errors)
  3. Divide by how many foreground pixels there are (not total pixels)

and we get the average error across all foreground pixels that mainLoss() can use.

The + 1e-8f is just a safety net so we don't divide by zero if someone passes in an all-black mask.
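Those three steps amount to a weighted mean. A plain-Python model of the arithmetic (the real code operates on libtorch tensors; the function and names here are illustrative):

```python
def masked_l1(rendered, target, mask, eps=1e-8):
    # rendered/target: lists of (r, g, b) pixels; mask: one 0/1 value per pixel.
    total = 0.0   # sum of |error| over foreground samples (steps 1 and 2)
    count = 0.0   # number of foreground samples (step 3)
    for pred, gt, m in zip(rendered, target, mask):
        for c in range(3):                  # mask value broadcast over RGB
            total += abs(pred[c] - gt[c]) * m
            count += m
    return total / (count + eps)            # eps guards an all-black mask
```

Dividing by the foreground count rather than the total pixel count keeps the loss scale comparable across images with different amounts of background.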

The two failed tests (Docker CUDA) couldn't complete, not sure why.

@pierotofy
Owner

The thing is, Claude has no idea how this works.

[image attachment]

You can confirm this with some careful testing.

@pierotofy pierotofy closed this Jan 2, 2026
@JorisGoosen JorisGoosen mentioned this pull request Jan 28, 2026
