norm_act_layer altered training dynamics for mobilenetv2_120d? #1447
I've noticed a fairly significant change in my training dynamics after updating timm. In the above figure, all inputs, configurations, and dependencies are identical except for the timm version. My current hypothesis, based on the delta between those two commits (here), is that the introduction of the norm_act_layer changes is responsible.

**Details**

Unfortunately, the model code is not something I can readily share. However, I have included a few details below, the most interesting of which are the ONNX export differences. It's very clear the convolutional layer stack is different, but some elements, like the ONNX ...

**ONNX Differences**

I have no idea where that ...

**Old Export**

Input normalization layers and initial block of ...

**New Export**

Input normalization layers and initial block of ...
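Purely for illustration, here is a rough sketch of how the exports above could be reproduced and diffed; the input resolution, opset version, and file name are assumptions on my part, not details from my actual setup:

```python
import torch
import timm
import onnx

# Build the backbone the same way under the old and new timm commits,
# export each to ONNX, then diff the resulting graphs / text dumps.
model = timm.create_model('mobilenetv2_120d', pretrained=True, num_classes=0)
model.eval()

dummy = torch.randn(1, 3, 224, 224)  # assumed input resolution
torch.onnx.export(
    model, dummy, 'mobilenetv2_120d.onnx',
    opset_version=13,  # assumed opset
    input_names=['input'], output_names=['features'],
)

# Print the first few nodes (input normalization / stem) to compare layer stacks
graph = onnx.load('mobilenetv2_120d.onnx').graph
for node in graph.node[:10]:
    print(node.op_type, node.name)
```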
**Common Model Config**

```yaml
backbone:
  model_name: mobilenetv2_120d
  global_pool: avg
  pretrained: true
  preprocess: true
```
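For anyone trying to reproduce this, the backbone block above presumably maps onto timm.create_model roughly as sketched below (an assumption about how my framework consumes it; the preprocess key is handled outside timm):

```python
import timm

# Sketch: num_classes=0 yields pooled features when the model is used as a backbone.
backbone = timm.create_model(
    'mobilenetv2_120d',
    pretrained=True,
    global_pool='avg',
    num_classes=0,
)
```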
**Common Optimizer Config**

```yaml
optimizer_config:
  opt: adam
  lr: 1.0e-3
  weight_decay: 1.0e-5
  filter_bias_and_bn: True
  kwargs:
    eps: 1.0e-7
```
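Similarly, the optimizer block corresponds roughly to timm's optimizer factory; a sketch assuming create_optimizer_v2 is the entry point, with the extra kwargs forwarded to torch.optim.Adam:

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('mobilenetv2_120d', pretrained=True, num_classes=0)

optimizer = create_optimizer_v2(
    model,
    opt='adam',
    lr=1.0e-3,
    weight_decay=1.0e-5,
    filter_bias_and_bn=True,  # keep weight decay off bias / BN parameters
    eps=1.0e-7,               # forwarded to the underlying Adam optimizer
)
```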
**Common Scheduler Config**

```yaml
scheduler_config:
  sched: cosine
  warmup_lr: 1.0e-5
  warmup_epochs: 2
  decay_epochs: 1.0
  decay_rate: 1.0
  epochs: 30
  lr_cycle_decay: 0.95
  lr_cycle_limit: 20
```
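And the scheduler block maps onto timm's CosineLRScheduler roughly like this; again a sketch, and the mapping of the config keys onto the cosine cycle parameters is my assumption:

```python
from timm.scheduler import CosineLRScheduler

# decay_epochs / decay_rate (both 1.0 above) are not used by the cosine schedule
# here and are left at their defaults.
scheduler = CosineLRScheduler(
    optimizer,
    t_initial=30,           # epochs
    warmup_t=2,             # warmup_epochs
    warmup_lr_init=1.0e-5,  # warmup_lr
    cycle_decay=0.95,       # lr_cycle_decay
    cycle_limit=20,         # lr_cycle_limit
)

for epoch in range(30):
    # ... train one epoch ...
    scheduler.step(epoch + 1)
```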
**Common Dependencies**
If anyone has already encountered this or notices something I have missed, I would be very thankful. Thanks for your time, and thanks for the amazing repo! I am very motivated to figure this out so that I can benefit from all the improvements since.
Replies: 1 comment 5 replies
@AffineParameter please see #1444 and #1254 ... does that answer the issue? (i.e. are you using sync BN?) In general I would avoid syncbn unless you really need it (you're down at very low batch sizes like < 16). The torch native sync bn conversion hack does not work with norm + act layers, so I've added a timm version (works for native AMP + syncbn, but I haven't added support for APEX).
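To make the two options concrete, here is a minimal sketch of both conversion paths, assuming the timm helper is exposed as convert_sync_batchnorm (the exact import path may differ between releases):

```python
import torch
import timm
# Assumption: in some releases this lives under timm.models.layers instead.
from timm.layers import convert_sync_batchnorm

model = timm.create_model('mobilenetv2_120d', pretrained=True)

# torch-native conversion: per the reply above, this does not handle timm's
# fused norm + act layers (e.g. BatchNormAct2d) correctly.
model_native = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

# timm's conversion: aware of the fused norm + act layers (native AMP + syncbn
# works; APEX is not supported per the reply).
model_timm = convert_sync_batchnorm(model)
```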