* [DLMED] improve fast model accuracy
Signed-off-by: Nic Ma <[email protected]>
* [DLMED] update tutorial doc
Signed-off-by: Nic Ma <[email protected]>
* [DLMED] update doc
Signed-off-by: Nic Ma <[email protected]>
* [DLMED] update latest algorithm
Signed-off-by: Nic Ma <[email protected]>
* [DLMED] make multi-gpu config more reusable
Signed-off-by: Nic Ma <[email protected]>
* [DLMED] clear log
Signed-off-by: Nic Ma <[email protected]>
* [DLMED] update according to comments
Signed-off-by: Nic Ma <[email protected]>
* [DLMED] update doc
Signed-off-by: Nic Ma <[email protected]>
**README.md** (1 addition, 1 deletion)
```diff
@@ -175,7 +175,7 @@ And compares the training speed and memory usage with/without AMP.
 This notebook compares the performance of `Dataset`, `CacheDataset` and `PersistentDataset`. These classes differ in how data is stored (in memory or on disk), and at which moment transforms are applied.
 This tutorial compares the training performance of a pure PyTorch program and the optimized program in MONAI, based on an NVIDIA GPU and the latest CUDA library.
-The optimization methods mainly include: `AMP`, `CacheDataset` and `Novograd`.
```
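For context, `Novograd` is available as a built-in MONAI optimizer. A minimal sketch of swapping it in, assuming a placeholder `UNet` and an illustrative learning rate rather than the tutorial's tuned configuration; the `AMP` and `CacheDataset` pieces are sketched after the second diff below:

```python
from monai.networks.nets import UNet
from monai.optimizers import Novograd  # ships with MONAI

# placeholder network; any torch.nn.Module is handled the same way
model = UNet(
    spatial_dims=3, in_channels=1, out_channels=2,
    channels=(16, 32, 64, 128), strides=(2, 2, 2),
)

# drop-in replacement for torch.optim.Adam; the lr is an assumption here
optimizer = Novograd(model.parameters(), lr=1e-2)
```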
**acceleration/fast_model_training_guide.md** (2 additions, 3 deletions)
```diff
@@ -305,14 +305,13 @@ With all the above strategies, in this section, we introduce how to apply them t
 ### 1. Spleen segmentation
 
 - Select the algorithms based on the experiments.
-  1. As a binary segmentation task, we replaced the baseline `Dice` loss with a `DiceCE` loss, it can help improve the convergence. To achieve the target metric (mean Dice = 0.95) it reduces the number of training epochs from 200 to 50.
-  2. We tried several numerical optimizers, and finally replaced the baseline `Adam` optimizer with `Novograd`, which consistently reduce the number of training epochs from 50 to 30.
+  As a binary segmentation task, we replaced the baseline `Dice` loss with a `DiceCE` loss, which helps improve convergence. We analyzed the training curve, tuned different network parameters, and tested several numerical optimizers, finally replacing the baseline `Adam` optimizer with `SGD`. To achieve the target metric (`mean Dice = 0.94` of the `foreground` channel only), this reduces the number of training epochs from 280 to 60.
 - Optimize GPU utilization.
   1. With `AMP`, the training speed is significantly improved and can achieve almost the same validation metric as without `AMP`.
   2. The deterministic transform results of the whole spleen dataset are around 8 GB, which can be cached in a V100 GPU's memory. So, we cached all the data in GPU memory and executed the following transforms on GPU directly.
 - Replace `DataLoader` with `ThreadDataLoader`. As all the data are cached in GPU memory, the computation of randomized transforms is on GPU and lightweight; `ThreadDataLoader` helps avoid the IPC cost of multi-processing in `DataLoader` and increases GPU utilization.
 
-In summary, with a V100 GPU, we can achieve the training converges at a target validation mean Dice of `0.95` within one minute (`52s` on a V100 GPU, `41s` on an A100 GPU), it is approximately `200x` faster compared with the native PyTorch implementation when achieving the target metric. And each epoch is `20x` faster than the regular training.
+In summary, with a V100 GPU and a target validation `mean Dice = 0.94` of the `foreground` channel only, training is more than `100x` faster than the regular PyTorch implementation when achieving the same validation metric, and each epoch is `20x` faster than regular training.
 
 More details are available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb).
```
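The loss and optimizer change described in the hunk above amounts to a two-line swap in code. A minimal sketch, assuming a placeholder `UNet` and illustrative hyperparameters (the tutorial's tuned values may differ):

```python
import torch
from monai.losses import DiceCELoss
from monai.networks.nets import UNet

# placeholder network for a binary (foreground/background) segmentation task
model = UNet(
    spatial_dims=3, in_channels=1, out_channels=2,
    channels=(16, 32, 64, 128), strides=(2, 2, 2),
).cuda()

# DiceCE adds a cross-entropy term to plain Dice, which the guide
# reports improves convergence on this task
loss_fn = DiceCELoss(to_onehot_y=True, softmax=True)

# baseline was Adam; the guide reports SGD reaching the target metric
# in fewer epochs (lr and momentum here are assumptions, not tuned values)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
```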
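The `AMP` point maps onto PyTorch's native automatic mixed precision. A hedged sketch of one training epoch, reusing `model`, `loss_fn` and `optimizer` from the sketch above and a `train_loader` like the one in the next sketch:

```python
import torch

# GradScaler scales the loss so fp16 gradients do not underflow
scaler = torch.cuda.amp.GradScaler()

for batch in train_loader:  # model, loss_fn, optimizer: see neighboring sketches
    images = batch["image"].cuda(non_blocking=True)
    labels = batch["label"].cuda(non_blocking=True)
    optimizer.zero_grad()
    # the forward pass and loss run in mixed precision inside autocast
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(images), labels)
    # backward on the scaled loss, then unscale and step
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```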
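The GPU-caching and `ThreadDataLoader` points combine into one data pipeline: deterministic transforms run once and are cached as GPU tensors, so only the lightweight random transforms execute per epoch, directly on device. A sketch assuming a hypothetical file list; the intensity window and crop size are illustrative, not the tutorial's exact values:

```python
from monai.data import CacheDataset, ThreadDataLoader
from monai.transforms import (
    Compose,
    EnsureChannelFirstd,
    EnsureTyped,
    LoadImaged,
    RandCropByPosNegLabeld,
    ScaleIntensityRanged,
)

# hypothetical file list; the tutorial builds this from the Spleen dataset
train_files = [{"image": "img_0.nii.gz", "label": "seg_0.nii.gz"}]

train_transforms = Compose([
    # deterministic transforms: executed once, results cached
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    ScaleIntensityRanged(keys="image", a_min=-57, a_max=164, b_min=0.0, b_max=1.0, clip=True),
    # move the cached tensors to GPU so everything below runs on device
    EnsureTyped(keys=["image", "label"], device="cuda"),
    # random transform: executed every epoch, directly on GPU tensors
    RandCropByPosNegLabeld(
        keys=["image", "label"], label_key="label",
        spatial_size=(96, 96, 96), pos=1, neg=1, num_samples=4,
    ),
])

# cache_rate=1.0 keeps every deterministic result resident (about 8 GB
# for the whole spleen dataset, per the guide)
train_ds = CacheDataset(data=train_files, transform=train_transforms, cache_rate=1.0)

# a worker thread instead of worker processes: no IPC copies of data
# that already lives in GPU memory, so num_workers stays 0
train_loader = ThreadDataLoader(train_ds, batch_size=4, num_workers=0, shuffle=True)
```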