Commit 21108f4

[DLMED] update latest algorithm
Signed-off-by: Nic Ma <[email protected]>
1 parent 130af81 commit 21108f4

File tree

3 files changed: +59 −37 lines changed


acceleration/fast_model_training_guide.md

Lines changed: 2 additions & 2 deletions
@@ -305,13 +305,13 @@ With all the above strategies, in this section, we introduce how to apply them t
 ### 1. Spleen segmentation

 - Select the algorithms based on the experiments.
-  As a binary segmentation task, we replaced the baseline `Dice` loss with a `DiceCE` loss, it can help improve the convergence. And we tried several numerical optimizers, and finally replaced the baseline `Adam` optimizer with `Novograd`. To achieve the target metric (`mean Dice = 0.94` of the `foreground` channel) it reduces the number of training epochs from 280 to 135.
+  As this is a binary segmentation task, we replaced the baseline `Dice` loss with a `DiceCE` loss, which helps improve convergence. We analyzed the training curve, tuned different network parameters, and tested several numerical optimizers, finally replacing the baseline `Adam` optimizer with `SGD`. This reduces the number of training epochs needed to reach the target metric (`mean Dice = 0.94` of the `foreground` channel only) from 280 to 60.
 - Optimize GPU utilization.
   1. With `AMP`, the training speed is significantly improved and achieves almost the same validation metric as without `AMP`.
   2. The deterministic transform results for the whole spleen dataset are around 8 GB, which fits in a V100 GPU's memory. So we cached all the data in GPU memory and executed the subsequent transforms directly on GPU.
 - Replace `DataLoader` with `ThreadDataLoader`. As all the data are cached on GPU, the computation of randomized transforms happens on GPU and is lightweight; `ThreadDataLoader` avoids the IPC cost of multi-processing in `DataLoader` and increases GPU utilization.

-In summary, with a V100 GPU and the target validation `mean dice = 0.94` of the `forground` channel only, it's approximately `40x` speedup compared with the Pytorch regular implementation when achieving the same metric. And every epoch is `20x` faster than regular training.
+In summary, with a V100 GPU and the target validation `mean Dice = 0.94` of the `foreground` channel only, training is more than `100x` faster than the regular PyTorch implementation while achieving the same metric, and every epoch is `20x` faster than regular training.
 ![spleen fast training](../figures/fast_training.png)

 More details are available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/master/acceleration/fast_training_tutorial.ipynb).
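
To make the algorithm selection concrete, here is a minimal sketch of the `DiceCE` loss plus `SGD` optimizer combination the updated guide describes. The `UNet` configuration and the hyperparameters (`lr`, `momentum`) are illustrative assumptions, not values taken from this commit:

```python
import torch
from monai.losses import DiceCELoss
from monai.networks.nets import UNet

# network config and hyperparameters are assumptions for illustration only
model = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
).to("cuda")

# DiceCE combines Dice loss with cross entropy; the guide reports it
# converges faster than plain Dice for this binary segmentation task
loss_fn = DiceCELoss(to_onehot_y=True, softmax=True)

# this commit moves the optimizer choice from Novograd to plain SGD
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
```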
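The `AMP` bullet maps onto the standard PyTorch mixed-precision pattern. Here is a sketch of one training step, reusing `model`, `loss_fn`, and `optimizer` from the snippet above and assuming a `train_loader` yielding MONAI-style dictionaries:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

for batch in train_loader:
    images = batch["image"].to("cuda")
    labels = batch["label"].to("cuda")
    optimizer.zero_grad()
    # forward pass runs in float16 where safe, float32 elsewhere
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(images), labels)
    # scale the loss to avoid float16 gradient underflow,
    # then unscale before the optimizer step
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```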
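The GPU-caching and `ThreadDataLoader` bullets work together: deterministic transforms are computed once and cached as GPU tensors, the random transforms then run directly on the device, and a thread-based loader shares that cache without crossing process boundaries. A sketch under those assumptions, where the intensity window, crop size, and batch size are illustrative and `train_files` is a hypothetical list of `{"image": ..., "label": ...}` dictionaries:

```python
from monai.data import CacheDataset, ThreadDataLoader
from monai.transforms import (
    Compose,
    EnsureChannelFirstd,
    EnsureTyped,
    LoadImaged,
    RandCropByPosNegLabeld,
    ScaleIntensityRanged,
    ToDeviced,
)

transforms = Compose([
    # deterministic transforms: executed once, results cached by CacheDataset
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    ScaleIntensityRanged(keys="image", a_min=-57, a_max=164, b_min=0.0, b_max=1.0, clip=True),
    EnsureTyped(keys=["image", "label"]),                  # convert arrays to torch tensors
    ToDeviced(keys=["image", "label"], device="cuda:0"),   # the cache lives on GPU from here on
    # random transform: recomputed every epoch, directly on the GPU tensors
    RandCropByPosNegLabeld(
        keys=["image", "label"], label_key="label",
        spatial_size=(96, 96, 96), pos=1, neg=1, num_samples=4,
    ),
])

train_ds = CacheDataset(data=train_files, transform=transforms, cache_rate=1.0)
# threads share the GPU-resident cache without the IPC cost of worker processes
train_loader = ThreadDataLoader(train_ds, batch_size=4, num_workers=0, shuffle=True)
```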

acceleration/fast_training_tutorial.ipynb

Lines changed: 57 additions & 35 deletions
Large diffs are not rendered by default.

figures/fast_training.png

-3.3 KB
