Initialize weights of reg_token for ViT #2229
Merged
The `reg_token` in `VisionTransformer` is not initialized in the `init_weights()` function. Therefore, when setting `reg_tokens>0` and `no_embed_class=True`, all reg_tokens are initialized to 0 and remain identical throughout training (with `no_embed_class=True` no positional embedding distinguishes them, so identical tokens receive identical gradients), unless ROPE is used to break the symmetry. As a result, all of the models from Searching for Better ViT Baselines with `reg_tokens=4` have 4 reg_tokens with identical weights, except `vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k`. To verify this, run the check sketched below.
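A minimal sketch of the check. The checkpoint name is one example from the sbb family and is assumed to be downloadable; `reg_token` is the attribute timm's `VisionTransformer` uses for the register tokens:

```python
import timm
import torch

# Example choice of checkpoint: any sbb model with reg_tokens=4 and no ROPE
# should exhibit the issue.
model = timm.create_model('vit_betwixt_patch16_reg4_gap_256.sbb_in1k', pretrained=True)

reg = model.reg_token.detach()  # shape (1, 4, embed_dim)
# If init_weights() never touched reg_token, the four registers never diverged.
print(torch.allclose(reg[0, 0], reg[0, 1]))  # expected: True for affected models
print(reg.std(dim=1).max().item())           # expected: ~0, no spread across registers
```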
Therefore, the reg_tokens should be randomly initialized in the `init_weights()` function.
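A minimal sketch of such an initialization, mirroring the small-std normal init that `init_weights()` already applies to `cls_token`; the exact `std` value and placement are assumptions here:

```python
import torch.nn as nn

class VisionTransformer(nn.Module):
    ...
    def init_weights(self, mode: str = '') -> None:
        ...
        if self.reg_token is not None:
            # Small random init so the registers start distinct and can
            # diverge during training instead of staying tied at zero.
            nn.init.normal_(self.reg_token, std=1e-6)
```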