MixViT ConvMAE #103

Hello, I have seen that you refer to the ConvMAE-pretrained method as MixViT-ConvMAE. Looking at your implementation, though, the backbone is much closer to the MixCvT layout, with multiple patch-embedding stages and blocks.
Am I missing something, or is that actually the case?
I ask because I am trying to adapt [PiMAE](https://github.com/BLVLab/PiMAE) in the same way you adapted the ConvMAE model. Thank you!

Moreover, I noticed that during training you pass the template and search tokens through the same backbone multiple times. How does the training procedure handle this? I ask because I would like to give your model some notion of hand trajectory (for example, when the tracked object is being held or manipulated).
