Thoughts on padding images of different sizes for VisionTransformer? #1609
Replies: 2 comments
-
Hi, I had the same question for a while, and here are my thoughts. We know where we are padding in every image, so we also know how many padded patches will be created and where they sit in the patch grid. HF transformers have
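To make the bookkeeping concrete, here is a minimal sketch in plain Python. It assumes padding is added only on the bottom and right edges, and that the padded size is a multiple of the patch size. The function name `pad_only_patches` and the image sizes are hypothetical, chosen for illustration; they are not from timm or HF transformers.

```python
def pad_only_patches(orig_h, orig_w, padded_h, padded_w, patch=32):
    """Return (row, col) grid positions of patches that contain only padding.

    Assumption: padding sits on the bottom and right edges only, and the
    padded size is a multiple of the patch size.
    """
    if padded_h % patch or padded_w % patch:
        raise ValueError("padded size must be a multiple of the patch size")
    rows, cols = padded_h // patch, padded_w // patch
    positions = []
    for r in range(rows):
        for c in range(cols):
            # A patch is pad-only when its top-left corner already lies
            # outside the original image along at least one axis.
            if r * patch >= orig_h or c * patch >= orig_w:
                positions.append((r, c))
    return positions


# Hypothetical sizes: a 384x512 image padded to 384x640, 32x32 patches.
pads = pad_only_patches(384, 512, 384, 640, patch=32)
print(len(pads))  # 48 (12 rows x 4 fully padded columns)
print(pads[:4])   # [(0, 16), (0, 17), (0, 18), (0, 19)]
```

Patches that mix real pixels and padding are deliberately not reported, since they still carry image content.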
-
While you could use padding/masking here, as you say, it's not likely it'd line up exactly on patch sizes. Especially at 384x384 and larger, for the few models that have it, squash crop (distorting the aspect ratio) actually works quite well, and often better.
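A rough way to see the trade-off, as a plain-Python sketch: squashing always yields a whole number of patches, at the cost of stretching one axis. The helper `squash_report` and the example size are hypothetical, not part of timm.

```python
def squash_report(h, w, target=384, patch=32):
    """Describe what squashing an h x w image to target x target does."""
    scale_h = target / h
    scale_w = target / w
    # Ratio between the two per-axis scales; 1.0 means no distortion.
    distortion = max(scale_h, scale_w) / min(scale_h, scale_w)
    # After squashing, the patch grid is always complete.
    n_patches = (target // patch) ** 2
    # With padding instead, this many real pixels would land in a
    # partially filled patch column (0 means the width lines up).
    leftover_w = w % patch
    return {
        "scale_h": scale_h,
        "scale_w": scale_w,
        "distortion": round(distortion, 3),
        "n_patches": n_patches,
        "leftover_w": leftover_w,
    }


# Hypothetical image: shorter side 384, longer side 500.
report = squash_report(384, 500)
print(report["n_patches"])   # 144
print(report["leftover_w"])  # 20
```

Here the 500-pixel width leaves 20 real pixels in a mixed patch column if the image is padded, which is the misalignment mentioned above.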
-
Hello,
I have a problem where I am trying to preserve the image aspect ratio (as best as possible).
The shorter side of each image is resampled to 384 (so I can use `vit_base_patch32_384`), and the longer side has an upper limit of 640.
Consider a batch with two images,
I know I can pad `img_0` to match the dimensions of `img_1`.
Questions
Is there some way for the VisionTransformer to ignore the padded pixels?
Ignoring all padding might be impossible at times, since the patches have a fixed size. But could ViT ignore patches which only contain padding?
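One common way to make attention ignore specific tokens is to exclude them from the softmax over keys. The sketch below shows that idea for a single query in plain Python. It is an illustration of masking in general, not timm's API: whether a particular ViT implementation accepts such a mask is an assumption to verify, and `masked_attention_weights` is a hypothetical helper.

```python
import math


def masked_attention_weights(scores, key_is_pad):
    """Softmax over one query's attention scores, skipping padded keys.

    scores: raw attention scores, one per key token (patch).
    key_is_pad: True for keys that come from pad-only patches.
    """
    kept = [s for s, pad in zip(scores, key_is_pad) if not pad]
    if not kept:
        raise ValueError("at least one key must be a real patch")
    m = max(kept)  # subtract the max for numerical stability
    exps = [0.0 if pad else math.exp(s - m)
            for s, pad in zip(scores, key_is_pad)]
    total = sum(exps)
    return [e / total for e in exps]


# Four keys; the third one is a pad-only patch with a large raw score.
weights = masked_attention_weights([2.0, 1.0, 5.0, 0.5],
                                   [False, False, True, False])
print(weights[2])  # 0.0
```

The padded key gets exactly zero weight regardless of its raw score, so it cannot influence the output for that query. Patches that contain a mix of image and padding would still be attended to.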