Signed-off-by: Mingxin Zheng <[email protected]>

Fixes #1210.
### Description
The MONAI Tutorials repo has version tags. When users browse the repo at a given tag, hyperlinks are expected to point to files and folders within that same tag, rather than to other branches such as `main`.
- Fixed hyperlinks to local repo resources and ensured they use internal relative paths
- Removed a duplicated license
### Checks
- [x] Avoid including large-size files in the PR.
- [x] Clean up long text outputs from code cells in the notebook.
- [x] For security purposes, please check the contents and remove any sensitive info such as user names and private keys.
- [x] Ensure that (1) hyperlinks and markdown anchors work, (2) tutorial repo files are referenced with relative paths, and (3) figures and graphs are placed in the `./figure` folder.
- [ ] Notebook runs automatically `./runner.sh -t <path to .ipynb file>`
---------
Signed-off-by: Mingxin Zheng <[email protected]>
`3d_segmentation/unet_segmentation_3d_catalyst.ipynb` (+2, -1)

@@ -1,6 +1,7 @@
  {
   "cells": [
    {
+     "attachments": {},
      "cell_type": "markdown",
      "metadata": {
       "colab_type": "text",

@@ -33,7 +34,7 @@
    "* Sliding window inference method.\n",
    "* Deterministic training for reproducibility.\n",
    "\n",
-   "This tutorial is based on [unet_training_dict.py](https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/torch/unet_training_dict.py) and [spleen_segmentation_3d.ipynb](https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/spleen_segmentation_3d.ipynb).\n",
+   "This tutorial is based on [unet_training_dict.py](torch/unet_training_dict.py) and [spleen_segmentation_3d.ipynb](spleen_segmentation_3d.ipynb).\n",
    "\n",
    "[](https://colab.research.google.com/github/Project-MONAI/tutorials/blob/main/3d_segmentation/unet_segmentation_3d_catalyst.ipynb)"
`acceleration/fast_model_training_guide.md` (+16, -16)

@@ -77,7 +77,7 @@ And the output `.json` file contains various aspects of GPU information.
  [NVIDIA Nsight™ Systems](https://developer.nvidia.com/nsight-systems) is a system-wide performance analysis tool designed to visualize algorithms, help to identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs.

  Nsight provides a great GUI to visualize the output database (`.qdrep` file) from the analysis results of DLProf. With necessary annotation inside the existing training scripts. The GPU utilization of each operation can be seen through the interface. Then, users understand better which components are the bottlenecks. The detailed example is shown in the following
  As shown in the following figure, each training epoch can be decomposed into multiple steps, including data loading (I/O), model forward/backward operation, optimization, etc. Then, necessary improvement can be conducted targeting certain steps. For example, if data loading (I/O) takes too much time during training, we can try to cache them into CPU/GPU bypassing data loading and pre-processing. After program optimization, users can re-run the analysis to compare the results before and after optimization.

@@ -126,7 +126,7 @@ with torch.cuda.amp.autocast():
      nvtx.end_range(rng_train_forward)
  ```

- The concrete examples can be found in the profiling tutorials of [radiology pipeline](https://github.com/Project-MONAI/tutorials/blob/main/performance_profiling/radiology) and [pathology pipelines](https://github.com/Project-MONAI/tutorials/blob/main/pathology/tumor_detection/ignite/profiling_camelyon_pipeline.ipynb).
+ The concrete examples can be found in the profiling tutorials of [radiology pipeline](../performance_profiling/radiology) and [pathology pipelines](../pathology/tumor_detection/ignite/profiling_camelyon_pipeline.ipynb).

  ### 4. NVIDIA Management Library (NVML)
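As an aside (not part of this PR's diff), here is a minimal sketch of the kind of NVTX + AMP annotation the hunk above refers to, using `torch.cuda.nvtx` ranges around an autocast forward/backward pass. The toy model, data, and hyperparameters are placeholders rather than the tutorial's actual code:

```python
import torch
from torch.cuda import nvtx

device = torch.device("cuda")
model = torch.nn.Conv3d(1, 2, kernel_size=3, padding=1).to(device)  # toy stand-in model
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

images = torch.rand(2, 1, 64, 64, 64, device=device)            # toy batch
labels = torch.randint(0, 2, (2, 64, 64, 64), device=device)

optimizer.zero_grad()

nvtx.range_push("train_forward")     # shows up as a named range in Nsight Systems
with torch.cuda.amp.autocast():
    outputs = model(images)
    loss = loss_fn(outputs, labels)
nvtx.range_pop()

nvtx.range_push("train_backward")
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
nvtx.range_pop()
```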
@@ -176,12 +176,12 @@ With the tools described in the previous sections, we can identify the bottlenec
  Users often need to train the model with many (potentially thousands of) epochs over the training data to achieve decent model quality. A native PyTorch implementation may repeatedly load data and run the same pre-processing steps for every data point during training, which can be time-consuming and redundant.

- MONAI provides a multi-thread `CacheDataset` and `LMDBDataset` to accelerate these loading steps during training by storing the intermediate outcomes before the first randomized transform in the transform chain. Enabling this feature could potentially give 10x training speedups in the [Datasets experiment](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/dataset_type_performance.ipynb).
+ MONAI provides a multi-thread `CacheDataset` and `LMDBDataset` to accelerate these loading steps during training by storing the intermediate outcomes before the first randomized transform in the transform chain. Enabling this feature could potentially give 10x training speedups in the [Datasets experiment](dataset_type_performance.ipynb).

  ### 2. Cache intermediate outcomes into persistent storage

- `PersistentDataset` is similar to `CacheDataset`, where the caches are persisted to disk storage or LMDB for rapid retrieval across experimental runs (as is the case when tuning hyperparameters), or when the entire size of the dataset exceeds available memory. `PersistentDataset` could achieve similar performance when comparing to `CacheDataset` in [Datasets experiment](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/dataset_type_performance.ipynb).
+ `PersistentDataset` is similar to `CacheDataset`, where the caches are persisted to disk storage or LMDB for rapid retrieval across experimental runs (as is the case when tuning hyperparameters), or when the entire size of the dataset exceeds available memory. `PersistentDataset` could achieve similar performance when comparing to `CacheDataset` in [Datasets experiment](dataset_type_performance.ipynb).
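As another illustrative aside, a minimal sketch of swapping in `CacheDataset` or `PersistentDataset`; the file paths, cache rate, and worker count are placeholder values:

```python
from monai.data import CacheDataset, PersistentDataset
from monai.transforms import Compose, LoadImaged, EnsureChannelFirstd, RandFlipd

data_dicts = [{"image": "img0.nii.gz", "label": "seg0.nii.gz"}]  # placeholder file list

transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=0),  # first random transform
])

# deterministic transforms before RandFlipd are computed once and cached in RAM
cache_ds = CacheDataset(data=data_dicts, transform=transforms, cache_rate=1.0, num_workers=4)

# same idea, but the cache is written to disk so it survives across runs
persistent_ds = PersistentDataset(data=data_dicts, transform=transforms, cache_dir="./cache")
```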
@@ … @@
- Full example of `SmartCacheDataset` is available at [Distributed training with SmartCache](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/unet_training_smartcache.py).
+ Full example of `SmartCacheDataset` is available at [Distributed training with SmartCache](distributed_training/unet_training_smartcache.py).

  ### 4. `ThreadDataLoader` vs. `DataLoader`

  If the transforms are light-weighted, especially when we cache all the data in RAM, the multiprocessing of PyTorch `DataLoader` may cause unnecessary IPC time and cause the drop of GPU utilization after every epoch. MONAI provides `ThreadDataLoader` which executes transforms in a separate thread:
- a `ThreadDataLoader` example is available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb).
+ a `ThreadDataLoader` example is available at [Spleen fast training tutorial](fast_training_tutorial.ipynb).

  ## Algorithmic improvement
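For reference, a hedged sketch of the `SmartCacheDataset` plus `ThreadDataLoader` combination mentioned in the hunks above, with toy in-memory data and illustrative cache/replace rates (not the tutorial's settings):

```python
import torch
from monai.data import SmartCacheDataset, ThreadDataLoader
from monai.transforms import ScaleIntensityd

# toy in-memory items; in practice these would be dictionaries of image/label paths
data = [{"image": torch.rand(1, 32, 32, 32), "label": i % 2} for i in range(100)]

# cache 50% of the items; replace 20% of the cached items with fresh ones each epoch
ds = SmartCacheDataset(data=data, transform=ScaleIntensityd(keys="image"),
                       cache_rate=0.5, replace_rate=0.2)

# single-process loader: avoids multiprocessing IPC when transforms are light or data is cached
loader = ThreadDataLoader(ds, batch_size=4, num_workers=0, shuffle=True)

ds.start()                    # start the background cache-replacement workers
for epoch in range(2):
    for batch in loader:
        pass                  # training step would go here
    ds.update_cache()         # swap in the replacement items for the next epoch
ds.shutdown()
```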
@@ -221,7 +221,7 @@ The following figure shows the great improvement in model convergence after we c

  Furthermore, we changed default optimizer to Novograd, modified learning rate related settings, and added other necessary improvements. The concrete examples are shown in
- [spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb) and [BraTS distributed training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py). Both are very typical applications in 3D medical image segmentation but with unique challenges. Spleen segmentation has very limited data but with large image size, and brain tumor segmentation has relatively small image samples but with a much larger data pool. Combing algorithmic improvement with computing improvement, our model training cost is significantly reduced when reaching the same level of performance as the existing pipeline.
+ [spleen fast training tutorial](fast_training_tutorial.ipynb) and [BraTS distributed training tutorial](distributed_training/brats_training_ddp.py). Both are very typical applications in 3D medical image segmentation but with unique challenges. Spleen segmentation has very limited data but with large image size, and brain tumor segmentation has relatively small image samples but with a much larger data pool. Combing algorithmic improvement with computing improvement, our model training cost is significantly reduced when reaching the same level of performance as the existing pipeline.

  ## Optimizing GPU utilization
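As a small hedged illustration of the optimizer swap mentioned above, MONAI ships a `Novograd` implementation; the network configuration and learning-rate value below are stand-ins, not the guide's exact settings:

```python
import torch
from monai.networks.nets import UNet
from monai.optimizers import Novograd

model = UNet(spatial_dims=3, in_channels=1, out_channels=2,
             channels=(16, 32, 64, 128), strides=(2, 2, 2))

# Novograd with a larger initial learning rate (illustrative value), plus a cosine schedule
optimizer = Novograd(model.parameters(), lr=1e-2)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
```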
@@ -238,15 +238,15 @@ We tried to compare the training speed of the spleen segmentation task if AMP ON
- AMP tutorial is available at [AMP tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/automatic_mixed_precision.ipynb).
+ AMP tutorial is available at [AMP tutorial](automatic_mixed_precision.ipynb).

  ### 2. Execute transforms on GPU

  Running preprocessing transforms on CPU while keeping GPU busy by running the model training is a common practice and is an optimal resource distribution in many use cases.
  From MONAI v0.7 we introduced PyTorch `Tensor` based computation in transforms, many transforms already support both `NumPy array` and PyTorch `Tensor` as input types and computational backends. To get the supported backends of every transform, please execute: `python monai/transforms/utils.py`.

  To accelerate the high-computation transforms, users can first convert input data into GPU Tensor by `ToTensor` or `EnsureType` transform, then the following transforms can execute on GPU based on PyTorch `Tensor` APIs.
- GPU transform tutorial is available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb).
+ GPU transform tutorial is available at [Spleen fast training tutorial](fast_training_tutorial.ipynb).

@@ … @@
  Here we convert to PyTorch `Tensor` with `EnsureTyped` transform and move data to GPU with `ToDeviced` transform. `CacheDataset` caches the transform results until `ToDeviced`, so it is in GPU memory. Then in every epoch, the program fetches cached data from GPU memory and only execute the random transform `RandCropByPosNegLabeld` on GPU directly.
- GPU caching example is available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb).
+ GPU caching example is available at [Spleen fast training tutorial](fast_training_tutorial.ipynb).
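Again as an aside, a minimal sketch of the GPU-caching transform chain described in the hunk above: deterministic transforms are cached through `ToDeviced`, and only the random crop runs each epoch on GPU tensors. The file paths, intensity range, and crop size are placeholder values:

```python
from monai.data import CacheDataset, ThreadDataLoader
from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd, ScaleIntensityRanged,
    EnsureTyped, ToDeviced, RandCropByPosNegLabeld,
)

data_dicts = [{"image": "img0.nii.gz", "label": "seg0.nii.gz"}]  # placeholder file list

transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    ScaleIntensityRanged(keys="image", a_min=-57, a_max=164, b_min=0.0, b_max=1.0, clip=True),
    EnsureTyped(keys=["image", "label"]),                 # convert to torch.Tensor
    ToDeviced(keys=["image", "label"], device="cuda:0"),  # cached output lives in GPU memory
    # only this random transform runs every epoch, directly on the GPU tensors
    RandCropByPosNegLabeld(keys=["image", "label"], label_key="label",
                           spatial_size=(96, 96, 96), pos=1, neg=1, num_samples=4),
])

train_ds = CacheDataset(data=data_dicts, transform=transforms, cache_rate=1.0, num_workers=0)
# ThreadDataLoader avoids multiprocessing, which cannot cheaply share the GPU-cached tensors
train_loader = ThreadDataLoader(train_ds, batch_size=1, num_workers=0, shuffle=True)
```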
  ## Leveraging multi-GPU distributed training

@@ -293,19 +293,19 @@ Additionally, with more GPU devices, we can achieve more benefits:
  - Some training algorithms can converge faster with a larger batch size and the training progress is more stable.
  - If caching data in GPU memory, every GPU only needs to cache a partition, so we can use a larger cache rate to cache more data in total to accelerate training. Caching data to GPU can largely reduce CPU-based operations during model training. It can greatly improve the model training efficiency.

- For example, during the training of brain tumor segmentation task, with 8 GPUs, we can cache all the data in GPU memory directly and execute the following transforms on GPU device, so it's more than `10x` faster than single GPU training. More details are available at [BraTS distributed training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py).
+ For example, during the training of brain tumor segmentation task, with 8 GPUs, we can cache all the data in GPU memory directly and execute the following transforms on GPU device, so it's more than `10x` faster than single GPU training. More details are available at [BraTS distributed training tutorial](distributed_training/brats_training_ddp.py).

  ## Leveraging multi-node distributed training

  Distributed data parallelism (DDP) is an important feature of PyTorch to connect multiple GPU devices in multiple nodes to train or evaluate models. It can further improve the training speed when we fully leveraged multiple GPUs on multiple nodes.

- The distributed data parallel APIs of MONAI are compatible with the native PyTorch distributed module, PyTorch-ignite distributed module, Horovod, XLA, and the SLURM platform. Here we provide [a real-world training example](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py) based on [Decathlon challenge](http://medicaldecathlon.com/) Task01 - Brain Tumor segmentation using the module `torch.distributed.launch`.
+ The distributed data parallel APIs of MONAI are compatible with the native PyTorch distributed module, PyTorch-ignite distributed module, Horovod, XLA, and the SLURM platform. Here we provide [a real-world training example](distributed_training/brats_training_ddp.py) based on [Decathlon challenge](http://medicaldecathlon.com/) Task01 - Brain Tumor segmentation using the module `torch.distributed.launch`.

  For more details about the PyTorch distributed training setup, please refer to: https://pytorch.org/docs/stable/distributed.html.

  And if using [SLURM](https://developer.nvidia.com/slurm) workload manager, please refer to [SLURM + Singularity MONAI example](https://github.com/UFResearchComputing/MultiNode_MONAI_example).

- More details are available at [BraTS distributed training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py).
+ More details are available at [BraTS distributed training tutorial](distributed_training/brats_training_ddp.py).
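For orientation (not part of the diff), a skeletal PyTorch DDP setup of the kind the BraTS example builds on. The dataset and model are toy stand-ins, and it assumes the script is launched via `torchrun` or `torch.distributed.launch` so that the rank and world-size environment variables (including `LOCAL_RANK`) are set:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    dist.init_process_group(backend="nccl", init_method="env://")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # toy dataset and model as stand-ins for the BraTS data and network used in the tutorial
    ds = TensorDataset(torch.rand(64, 1, 32, 32, 32), torch.randint(0, 2, (64,)))
    sampler = DistributedSampler(ds)  # each rank sees a disjoint shard of the data
    loader = DataLoader(ds, batch_size=4, sampler=sampler)

    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32 * 32, 2)).cuda(local_rank)
    model = DistributedDataParallel(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for images, labels in loader:
            images, labels = images.cuda(local_rank), labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```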
  ## Examples

@@ -323,7 +323,7 @@ With all the above strategies, in this section, we introduce how to apply them t
  In summary, with a A100 GPU and the target validation `mean dice = 0.94` of the `forground` channel only, it's more than `150x` speedup compared with the Pytorch regular implementation when achieving the same metric (validation accuracies). And every epoch is `50x` faster than regular training.

- More details are available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb).
+ More details are available at [Spleen fast training tutorial](fast_training_tutorial.ipynb).

  ### 2. Brain tumor segmentation

@@ -343,7 +343,7 @@ In summary, combining the optimization strategies, the training time of eight A1
- More details are available at [BraTS distributed training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py).
+ More details are available at [BraTS distributed training tutorial](distributed_training/brats_training_ddp.py).

  ### 3. Pathology metastasis detection task

@@ -356,4 +356,4 @@ In this way, we accelerated the data loading, data augmentation and preprocessin
  - In these two experiments, the corresponding best FROC achieved is 0.70 for baseline (Numpy) pipeline at epoch 6, and 0.69 for CuCIM pipeline at epoch 2. Please note that the epoch at which the best model is achieved, as well as its corresponding FROC, can have some variabilities across runs with different random seeds.
- More details are available at [Profiling Pathology Metastasis Detection Pipeline](https://github.com/Project-MONAI/tutorials/blob/main/performance_profiling/pathology/profiling_train_base_nvtx.md).
+ More details are available at [Profiling Pathology Metastasis Detection Pipeline](../performance_profiling/pathology/profiling_train_base_nvtx.md).
`bundle/introducing_config/README.md` (+1, -1)

@@ -155,5 +155,5 @@ plt.show()
  - Running customized Python components (made available on the `PYTHONPATH`, more examples [in the model_zoo](https://github.com/Project-MONAI/model-zoo)).
  - Overriding the component in `example.yaml` using, for example, `--id=new_value` in the command line.
  - Multiple configuration files and cross-file references.
- - Replacing in terms of plain texts instead of Python objects ([tutorial](https://github.com/Project-MONAI/tutorials/blob/main/bundle/get_started.md)).
+ - Replacing in terms of plain texts instead of Python objects ([tutorial](../get_started.md)).
  - The debugging mode to investigate the intermediate variables and results.
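As a hedged illustration of the config mechanics this README covers, a small sketch using `monai.bundle.ConfigParser` to load a config, override a value by its id (the programmatic equivalent of a `--id=new_value` command-line override), and instantiate a component. The `example.yaml` contents shown in the comment are hypothetical:

```python
from monai.bundle import ConfigParser

# hypothetical example.yaml:
#   lr: 0.001
#   net:
#     _target_: monai.networks.nets.BasicUNet
#     spatial_dims: 3
#     in_channels: 1
#     out_channels: 2

parser = ConfigParser()
parser.read_config("example.yaml")

parser["lr"] = 0.01                      # override a value by its id
net = parser.get_parsed_content("net")   # instantiate the configured component
print(type(net), parser["lr"])
```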
`deployment/bentoml/mednist_classifier_bentoml.ipynb` (+2, -1)

@@ -17,12 +17,13 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "# Deploying a MedNIST Classifier with BentoML\n",
     "\n",
-    "This notebook demos the process of packaging up a trained model using BentoML into an artifact which can be run as a local program performing inference, a web service doing the same, and a Docker containerized web service. BentoML provides various ways of deploying models with existing platforms like AWS or Azure but we'll focus on local deployment here since researchers are more likely to do this. This tutorial will train a MedNIST classifier like the [MONAI tutorial here](https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/mednist_tutorial.ipynb) and then do the packaging as described in this [BentoML tutorial](https://github.com/bentoml/gallery/blob/master/pytorch/fashion-mnist/pytorch-fashion-mnist.ipynb)."
+    "This notebook demos the process of packaging up a trained model using BentoML into an artifact which can be run as a local program performing inference, a web service doing the same, and a Docker containerized web service. BentoML provides various ways of deploying models with existing platforms like AWS or Azure but we'll focus on local deployment here since researchers are more likely to do this. This tutorial will train a MedNIST classifier like the [MONAI tutorial here](../../2d_classification/mednist_tutorial.ipynb) and then do the packaging as described in this [BentoML tutorial](https://github.com/bentoml/gallery/blob/master/pytorch/fashion-mnist/pytorch-fashion-mnist.ipynb)."
`experiment_management/bundle_integrate_mlflow.ipynb` (+2, -1)

@@ -1,6 +1,7 @@
  {
   "cells": [
    {
+     "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [

@@ -22,7 +23,7 @@
    "2. Use MLflow in MONAI bundle with a settings JSON file.\n",
    "3. Use MLflow in parsed MONAI bundle with python code.\n",
    "\n",
-   "This tutorial takes the [3D spleen segmentation task](https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/spleen_segmentation_3d.ipynb) as an example. In order to quickly verify the MLflow function, each example will only run 10 epochs."
+   "This tutorial takes the [3D spleen segmentation task](../3d_segmentation/spleen_segmentation_3d.ipynb) as an example. In order to quickly verify the MLflow function, each example will only run 10 epochs."
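Finally, as a generic hedged sketch (not the bundle-specific MLflow wiring this notebook configures), the core MLflow tracking calls involved in logging a short run; the experiment name, run name, and metric values are placeholders:

```python
import mlflow

mlflow.set_experiment("monai_bundle_demo")   # hypothetical experiment name

with mlflow.start_run(run_name="spleen_seg_10_epochs"):
    mlflow.log_param("max_epochs", 10)
    mlflow.log_param("lr", 1e-4)
    for epoch in range(10):
        fake_dice = 0.5 + 0.04 * epoch       # placeholder metric for illustration
        mlflow.log_metric("val_mean_dice", fake_dice, step=epoch)
```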