Commit 6c7ce45

Update README.md with some more NaFlexVit details

1 file changed: README.md (+19 −0 lines)

@@ -12,6 +12,25 @@
## What's New
## June 5, 2025
* Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
  1. Encapsulated embedding and position encoding in a single module
  2. Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs (see the sketch below)
  3. Support for NaFlex variable aspect, variable resolution (SigLIP 2: https://arxiv.org/abs/2502.14786)
  4. Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
  5. Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
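
To illustrate item 2, a minimal sketch of what a pre-patchified dictionary input can look like; the key names (`patches`, `patch_coord`, `patch_valid`) and shapes below are illustrative assumptions, not the confirmed NaFlexVit interface:

```python
import torch

def prepatchify(img: torch.Tensor, patch_size: int = 16):
    # Flatten an arbitrary (patch-aligned) image into patch tokens suitable for
    # an nn.Linear embed, plus per-patch grid coordinates and a validity mask.
    # Key names are hypothetical, for illustration only.
    c, h, w = img.shape
    gh, gw = h // patch_size, w // patch_size
    # (C, H, W) -> (gh, gw, C, p, p) -> (gh*gw, p*p*C)
    patches = img.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(gh * gw, -1)
    # (y, x) coordinate of each patch, usable by factorized position embeddings
    yy, xx = torch.meshgrid(torch.arange(gh), torch.arange(gw), indexing='ij')
    coord = torch.stack([yy, xx], dim=-1).reshape(-1, 2)
    return {
        'patches': patches,                                   # (N, 768) for p=16, C=3
        'patch_coord': coord,                                 # (N, 2)
        'patch_valid': torch.ones(gh * gw, dtype=torch.bool), # padding mask
    }

tokens = prepatchify(torch.randn(3, 224, 160))  # non-square input is fine
print(tokens['patches'].shape)  # torch.Size([140, 768])
```

Packing tokens this way is what lets a single batch mix images of different sizes and aspect ratios.
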
* Existing vit models in `vision_transformer.py` can be loaded into the NaFlexVit model by adding the `use_naflex=True` flag to `create_model` (see the example below)
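
For example, loading classic ViT weights into the NaFlexVit implementation (weights download on first use):

```python
import timm
import torch

# use_naflex=True routes the vit_base_patch16_224 weights
# into the NaFlexVit implementation
model = timm.create_model('vit_base_patch16_224', pretrained=True, use_naflex=True).eval()

# Standard fixed-size image input still works as before
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])
```
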
* Some native weights coming soon
* A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
  * To enable in `train.py` and `validate.py`, add the `--naflex-loader` arg; it must be used with a NaFlexVit
* To evaluate an existing (classic) ViT loaded into the NaFlexVit model w/ the NaFlex data pipe:
  * `python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256`
* The training script has some extra args worth noting (example invocation below):
  * The `--naflex-train-seq-lens` argument specifies which sequence lengths to randomly pick from per batch during training
  * The `--naflex-max-seq-len` argument sets the target sequence length for validation
  * Adding `--model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24` will enable random patch size selection per batch w/ interpolation
  * The `--naflex-loss-scale` arg changes the loss scaling mode per batch relative to the batch size; `timm` NaFlex loading changes the batch size for each seq len
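* As an example, a hypothetical `train.py` invocation combining the args above (dataset path and hyper-parameter choices are placeholders, not tested settings):
  * `python train.py /imagenet --model vit_base_patch16_224 --model-kwargs use_naflex=True enable_patch_interpolator=True --naflex-loader --naflex-train-seq-lens 256 576 1024 --naflex-patch-sizes 12 16 24 --amp`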
## May 28, 2025
* Add a number of small/fast models thanks to https://github.com/brianhou0208
  * SwiftFormer - [(ICCV2023) SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://github.com/Amshaker/SwiftFormer)
