
ncsu-yoon-lab/image_segmentation


Regarding the DINO Model

Number of patches per side: 672 / 14 = 48. Total patches: 48 × 48 = 2304. Each patch then has a 384-dimensional feature vector.
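The patch arithmetic above can be checked directly (a minimal sketch; the 672-pixel input size, 14-pixel patch size, and 384-d features are the values stated above):

```python
# DINO ViT-style settings, as stated above
image_size = 672       # input resolution (pixels per side)
patch_size = 14        # each patch covers a 14x14 pixel square
feature_dim = 384      # feature vector length per patch

patches_per_side = image_size // patch_size   # 672 / 14 = 48
total_patches = patches_per_side ** 2         # 48 * 48 = 2304

print(patches_per_side, total_patches)  # 48 2304
```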

Regarding Segmentation Head

  • It takes patch features and upsamples them back to full image resolution.

  • The output produced by DINO has shape [batch_size, num_patches, 384], where 384 is the feature dimension.

  • patch_features = patch_features.view(batch_size, self.num_patches_per_side, self.num_patches_per_side, 384).permute(0, 3, 1, 2). Here, the flat sequence of 2304 patches is converted into a 2D spatial grid of shape [batch_size, 384, 48, 48].

  • This 2D spatial grid is then passed to ConvTranspose2d(384, 512, kernel_size=3, stride=2, padding=1, output_padding=1).

  • stride=2: doubles the spatial size (48×48 → 96×96) - this is the upsampling.

  • 384 → 512: increases the channel count - this is where we get richer features.

  • kernel_size=3: uses 3×3 kernels to learn how to upsample intelligently.

  • What is upsampling? The process of making the data spatially larger - increasing the height and width dimensions. The problem is that DINO gives us patch resolution (48×48), but we need pixel-level predictions (672×672). The solution: gradually upsample through multiple steps: 48×48 → 96×96 → 192×192 → 384×384 → 672×672.
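The step sizes in the chain above can be verified with the standard ConvTranspose2d output-size formula, out = (in − 1)·stride − 2·padding + kernel_size + output_padding (a plain-Python check using the layer hyperparameters quoted above). Note that a stride-2 layer doubles 48 → 96 → 192 → 384, but 384 → 672 is a 1.75× step, so the final resize presumably uses something other than a stride-2 transposed convolution (e.g. interpolation to the target size - an assumption, not stated in the text):

```python
def conv_transpose_out(size, kernel_size=3, stride=2, padding=1, output_padding=1):
    """Output size of ConvTranspose2d along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + kernel_size + output_padding

size = 48
for _ in range(3):                # three stride-2 stages: 48 -> 96 -> 192 -> 384
    size = conv_transpose_out(size)
print(size)                       # 384
# The final 384 -> 672 step is a 1.75x resize, so it cannot be another
# stride-2 transposed convolution; a direct interpolation to 672 is
# assumed here.
```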

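The view + permute step from the list above can be sketched with NumPy as a stand-in for the torch calls (reshape/transpose have the same semantics as view/permute on contiguous tensors; the batch size of 1 is an arbitrary choice for illustration):

```python
import numpy as np

batch_size, patches_per_side, feature_dim = 1, 48, 384

# DINO output: one 384-d feature per patch, patches in a flat sequence
patch_features = np.zeros((batch_size, patches_per_side ** 2, feature_dim))

# Equivalent of .view(B, 48, 48, 384).permute(0, 3, 1, 2) in torch:
grid = patch_features.reshape(batch_size, patches_per_side,
                              patches_per_side, feature_dim)
grid = grid.transpose(0, 3, 1, 2)   # channels-first layout for conv layers

print(grid.shape)  # (1, 384, 48, 48)
```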