* The training script has some extra args worth noting
* The `--naflex-train-seq-lens` argument specifies which sequence lengths to randomly pick from per batch during training
* The `--naflex-max-seq-len` argument sets the target sequence length for validation
* Adding `--model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24` will enable random patch size selection per-batch w/ interpolation
* The `--naflex-loss-scale` arg changes the per-batch loss scaling mode relative to the batch size; `timm`'s NaFlex loader changes the batch size for each seq len
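The interplay between the args above can be illustrated with a small sketch: each batch samples a sequence length, the batch size is adjusted to hold the token budget roughly constant, and the loss is scaled relative to a base batch size. This is a hypothetical illustration of the idea, not the actual `timm` loader code; `sample_batch_config`, `token_budget`, and `base_batch_size` are invented names for this sketch.

```python
import random

def sample_batch_config(seq_lens, token_budget, base_batch_size, rng=random):
    """Hypothetical per-batch config: pick a seq len, derive batch size and loss scale.

    Mimics the described behavior: shorter sequences get larger batches so the
    total token count per batch stays roughly constant, and the loss is scaled
    relative to the base batch size. Not timm's actual implementation.
    """
    seq_len = rng.choice(seq_lens)
    batch_size = max(1, token_budget // seq_len)   # shorter seqs -> bigger batches
    loss_scale = batch_size / base_batch_size      # per-batch loss scaling
    return seq_len, batch_size, loss_scale

if __name__ == "__main__":
    # e.g. the set passed via --naflex-train-seq-lens (values here are illustrative)
    seq_lens = [128, 256, 576, 784, 1024]
    for _ in range(3):
        print(sample_batch_config(seq_lens, token_budget=256 * 1024, base_batch_size=256))
```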
## May 28, 2025
* Add a number of small/fast models thanks to https://github.com/brianhou0208
* SwiftFormer - [(ICCV2023) SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://github.com/Amshaker/SwiftFormer)