`acceleration/fast_model_training_guide.md` — 2 additions, 2 deletions

```diff
@@ -305,13 +305,13 @@ With all the above strategies, in this section, we introduce how to apply them t
 ### 1. Spleen segmentation

 - Select the algorithms based on the experiments.
-  As a binary segmentation task, we replaced the baseline `Dice` loss with a `DiceCE` loss, which helps improve convergence. We also tried several numerical optimizers, finally replacing the baseline `Adam` optimizer with `Novograd`. To achieve the target metric (`mean Dice = 0.94` on the `foreground` channel), this reduces the number of training epochs from 280 to 135.
+  As a binary segmentation task, we replaced the baseline `Dice` loss with a `DiceCE` loss, which helps improve convergence. We analyzed the training curve, tuned different network parameters, and tested several numerical optimizers, finally replacing the baseline `Adam` optimizer with `SGD`. To achieve the target metric (`mean Dice = 0.94` on the `foreground` channel only), this reduces the number of training epochs from 280 to 60.
 - Optimize GPU utilization.
   1. With `AMP`, the training speed is significantly improved and can achieve almost the same validation metric as without `AMP`.
   2. The deterministic transform results of the whole spleen dataset are around 8 GB, which can be cached in the memory of a V100 GPU. So we cached all the data in GPU memory and executed the subsequent transforms directly on GPU.
 - Replace `DataLoader` with `ThreadDataLoader`. As all the data are cached on GPU, the computation of randomized transforms runs on GPU and is lightweight; `ThreadDataLoader` helps avoid the inter-process communication (IPC) cost of multiprocessing in `DataLoader` and increases GPU utilization.

-  In summary, with a V100 GPU and the target validation `mean Dice = 0.94` on the `foreground` channel only, this is approximately a `40x` speedup compared with the regular PyTorch implementation when achieving the same metric, and every epoch is `20x` faster than regular training.
+  In summary, with a V100 GPU and the target validation `mean Dice = 0.94` on the `foreground` channel only, this is more than a `100x` speedup compared with the regular PyTorch implementation when achieving the same metric, and every epoch is `20x` faster than regular training.

 More details are available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/master/acceleration/fast_training_tutorial.ipynb).
```
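
The strategies in the hunk above map onto a handful of MONAI and PyTorch APIs. As a minimal sketch of the loss and optimizer choice, not the tutorial's exact configuration (the `UNet` hyperparameters and learning rate below are illustrative placeholders):

```python
import torch
from monai.losses import DiceCELoss
from monai.networks.nets import UNet

# Illustrative 3D UNet for binary (background + spleen) segmentation;
# channels/strides are placeholders, not the tutorial's exact settings.
model = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
).to("cuda:0")

# DiceCE adds a cross-entropy term to the Dice loss, which the guide
# reports converges faster than plain `Dice` on this binary task.
loss_fn = DiceCELoss(to_onehot_y=True, softmax=True)

# The updated guide settles on SGD in place of the `Adam` baseline;
# the earlier revision used `monai.optimizers.Novograd` here instead.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, momentum=0.9)
```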
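
Caching the deterministic transform outputs in GPU memory and swapping `DataLoader` for `ThreadDataLoader` could look roughly like the following. Here `train_files`, the intensity window, and the crop size are assumptions, and `EnsureTyped`'s `device` argument requires a reasonably recent MONAI release:

```python
from monai.data import CacheDataset, ThreadDataLoader
from monai.transforms import (
    Compose,
    EnsureChannelFirstd,
    EnsureTyped,
    LoadImaged,
    RandCropByPosNegLabeld,
    ScaleIntensityRanged,
)

keys = ["image", "label"]
train_transforms = Compose([
    # Deterministic transforms: executed once, then cached by CacheDataset.
    LoadImaged(keys=keys),
    EnsureChannelFirstd(keys=keys),
    ScaleIntensityRanged(keys="image", a_min=-57, a_max=164,
                         b_min=0.0, b_max=1.0, clip=True),
    # Moving tensors to GPU here means the cache lives in GPU memory and
    # the random crop below executes on GPU.
    EnsureTyped(keys=keys, device="cuda:0"),
    # Randomized transform: recomputed every epoch, but lightweight on GPU.
    RandCropByPosNegLabeld(keys=keys, label_key="label",
                           spatial_size=(96, 96, 96),
                           pos=1, neg=1, num_samples=4),
])

# `train_files` is an assumed list of {"image": path, "label": path} dicts.
train_ds = CacheDataset(data=train_files, transform=train_transforms,
                        cache_rate=1.0, num_workers=8)

# ThreadDataLoader runs in the main process, avoiding the IPC cost of
# multiprocessing workers in the regular DataLoader.
train_loader = ThreadDataLoader(train_ds, batch_size=4,
                                num_workers=0, shuffle=True)
```

Keeping `num_workers=0` in `ThreadDataLoader` is deliberate: with the cache already on GPU, there is little CPU-side work left to parallelize.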
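
Finally, a generic `torch.cuda.amp` training step, reusing `model`, `loss_fn`, `optimizer`, and `train_loader` from the sketches above (a standard AMP pattern, not the tutorial's exact loop):

```python
import torch

scaler = torch.cuda.amp.GradScaler()
model.train()

for batch in train_loader:
    # Tensors are already on GPU when cached as above; .to() is then a no-op.
    images = batch["image"].to("cuda:0")
    labels = batch["label"].to("cuda:0")
    optimizer.zero_grad()
    # Forward pass and loss run in mixed precision; the scaler rescales
    # the loss so small fp16 gradients do not underflow during backward.
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(images), labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```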