Commit e896eca

fix external links and duplicate license (#1211)
Signed-off-by: Mingxin Zheng <[email protected]>

Fixes #1210.

### Description

The MONAI Tutorials repo has version tags. When users navigate the repo at a given tag, hyperlinks should point to the files and folders in that same tag, rather than to another branch such as `main`.

- Fixed hyperlinks to local repo resources so that they use internal relative paths
- Removed duplicated license

### Checks

<!--- Put an `x` in all the boxes that apply, and remove the not applicable items -->

- [x] Avoid including large-size files in the PR.
- [x] Clean up long text outputs from code cells in the notebook.
- [x] For security purposes, please check the contents and remove any sensitive info such as user names and private key.
- [x] Ensure (1) hyperlinks and markdown anchors are working (2) use relative paths for tutorial repo files (3) put figure and graphs in the `./figure` folder
- [ ] Notebook runs automatically `./runner.sh -t <path to .ipynb file>`

---------

Signed-off-by: Mingxin Zheng <[email protected]>
1 parent 28d253b commit e896eca

18 files changed (+65, -73 lines)


3d_segmentation/challenge_baseline/README.md

Lines changed: 3 additions & 3 deletions
@@ -109,9 +109,9 @@ This baseline method achieves 0.6904 ± 0.1801 Dice score on the challenge valid
 
 - For MONAI technical documentation, please visit [docs.monai.io](https://docs.monai.io/).
 - Please visit [`Project-MONAI/tutorials`](https://github.com/Project-MONAI/tutorials) for more examples, including:
-- [`3D segmentation pipelines`](https://github.com/Project-MONAI/tutorials/tree/main/3d_segmentation),
-- [`Dynamic UNet`](https://github.com/Project-MONAI/tutorials/blob/main/modules/dynunet_tutorial.ipynb),
-- [`Training acceleration`](https://github.com/Project-MONAI/tutorials/tree/main/acceleration).
+- [`3D segmentation pipelines`](../),
+- [`Dynamic UNet`](../../modules/dynunet_pipeline/README.md),
+- [`Training acceleration`](../../acceleration).
 
 ## Submitting to the leaderboard
 

3d_segmentation/unet_segmentation_3d_catalyst.ipynb

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
 {
 "cells": [
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {
 "colab_type": "text",
@@ -33,7 +34,7 @@
 "* Sliding window inference method.\n",
 "* Deterministic training for reproducibility.\n",
 "\n",
-"This tutorial is based on [unet_training_dict.py](https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/torch/unet_training_dict.py) and [spleen_segmentation_3d.ipynb](https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/spleen_segmentation_3d.ipynb).\n",
+"This tutorial is based on [unet_training_dict.py](torch/unet_training_dict.py) and [spleen_segmentation_3d.ipynb](spleen_segmentation_3d.ipynb).\n",
 "\n",
 "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Project-MONAI/tutorials/blob/main/3d_segmentation/unet_segmentation_3d_catalyst.ipynb)"
 ]
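
(The cell above lists sliding window inference as one of the notebook's features. As a quick illustration, here is a minimal, hedged sketch of MONAI's `sliding_window_inference`; the network and tensor shapes below are placeholder assumptions, not taken from the notebook.)

```python
import torch
from monai.inferers import sliding_window_inference
from monai.networks.nets import UNet

# Placeholder 3D segmentation network; the notebook builds its own model with Catalyst.
model = UNet(
    spatial_dims=3, in_channels=1, out_channels=2,
    channels=(16, 32, 64, 128), strides=(2, 2, 2),
).eval()

# A single-channel volume larger than the ROI, so inference runs patch by patch.
val_image = torch.rand(1, 1, 160, 160, 160)

with torch.no_grad():
    # Slide a (96, 96, 96) window over the volume and stitch the per-patch predictions.
    val_output = sliding_window_inference(
        inputs=val_image, roi_size=(96, 96, 96), sw_batch_size=4, predictor=model
    )

print(val_output.shape)  # torch.Size([1, 2, 160, 160, 160])
```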

acceleration/fast_model_training_guide.md

Lines changed: 16 additions & 16 deletions
@@ -77,7 +77,7 @@ And the output `.json` file contains various aspects of GPU information.
 [NVIDIA Nsight™ Systems](https://developer.nvidia.com/nsight-systems) is a system-wide performance analysis tool designed to visualize algorithms, help to identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs.
 
 Nsight provides a great GUI to visualize the output database (`.qdrep` file) from the analysis results of DLProf. With necessary annotation inside the existing training scripts. The GPU utilization of each operation can be seen through the interface. Then, users understand better which components are the bottlenecks. The detailed example is shown in the following
-[performance profiling tutorial]( https://github.com/Project-MONAI/tutorials/blob/main/performance_profiling/).
+[performance profiling tutorial](../performance_profiling/).
 
 As shown in the following figure, each training epoch can be decomposed into multiple steps, including data loading (I/O), model forward/backward operation, optimization, etc. Then, necessary improvement can be conducted targeting certain steps. For example, if data loading (I/O) takes too much time during training, we can try to cache them into CPU/GPU bypassing data loading and pre-processing. After program optimization, users can re-run the analysis to compare the results before and after optimization.
 
@@ -126,7 +126,7 @@ with torch.cuda.amp.autocast():
 nvtx.end_range(rng_train_forward)
 ```
 
-The concrete examples can be found in the profiling tutorials of [radiology pipeline]( https://github.com/Project-MONAI/tutorials/blob/main/performance_profiling/radiology) and [pathology pipelines](https://github.com/Project-MONAI/tutorials/blob/main/pathology/tumor_detection/ignite/profiling_camelyon_pipeline.ipynb).
+The concrete examples can be found in the profiling tutorials of [radiology pipeline](../performance_profiling/radiology) and [pathology pipelines](../pathology/tumor_detection/ignite/profiling_camelyon_pipeline.ipynb).
 
 ### 4. NVIDIA Management Library (NVML)
 
@@ -176,12 +176,12 @@ With the tools described in the previous sections, we can identify the bottlenec
 
 Users often need to train the model with many (potentially thousands of) epochs over the training data to achieve decent model quality. A native PyTorch implementation may repeatedly load data and run the same pre-processing steps for every data point during training, which can be time-consuming and redundant.
 
-MONAI provides a multi-thread `CacheDataset` and `LMDBDataset` to accelerate these loading steps during training by storing the intermediate outcomes before the first randomized transform in the transform chain. Enabling this feature could potentially give 10x training speedups in the [Datasets experiment](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/dataset_type_performance.ipynb).
+MONAI provides a multi-thread `CacheDataset` and `LMDBDataset` to accelerate these loading steps during training by storing the intermediate outcomes before the first randomized transform in the transform chain. Enabling this feature could potentially give 10x training speedups in the [Datasets experiment](dataset_type_performance.ipynb).
 ![cache dataset](../figures/cache_dataset.png)
 
 ### 2. Cache intermediate outcomes into persistent storage
 
-`PersistentDataset` is similar to `CacheDataset`, where the caches are persisted to disk storage or LMDB for rapid retrieval across experimental runs (as is the case when tuning hyperparameters), or when the entire size of the dataset exceeds available memory. `PersistentDataset` could achieve similar performance when comparing to `CacheDataset` in [Datasets experiment](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/dataset_type_performance.ipynb).
+`PersistentDataset` is similar to `CacheDataset`, where the caches are persisted to disk storage or LMDB for rapid retrieval across experimental runs (as is the case when tuning hyperparameters), or when the entire size of the dataset exceeds available memory. `PersistentDataset` could achieve similar performance when comparing to `CacheDataset` in [Datasets experiment](dataset_type_performance.ipynb).
 
 ![cachedataset speed](../figures/datasets_speed.png)
 
@@ -199,13 +199,13 @@ epoch 3: [image3, image4, image5, image1]
 epoch 3: [image4, image5, image1, image2]
 epoch N: [image[N % 5] ...]
 ```
-Full example of `SmartCacheDataset` is available at [Distributed training with SmartCache](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/unet_training_smartcache.py).
+Full example of `SmartCacheDataset` is available at [Distributed training with SmartCache](distributed_training/unet_training_smartcache.py).
 
 ### 4. `ThreadDataLoader` vs. `DataLoader`
 
 If the transforms are light-weighted, especially when we cache all the data in RAM, the multiprocessing of PyTorch `DataLoader` may cause unnecessary IPC time and cause the drop of GPU utilization after every epoch. MONAI provides `ThreadDataLoader` which executes transforms in a separate thread:
 ![threaddataloader](../figures/threaddataloader.png)
-a `ThreadDataLoader` example is available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb).
+a `ThreadDataLoader` example is available at [Spleen fast training tutorial](fast_training_tutorial.ipynb).
 
 ## Algorithmic improvement
 
@@ -221,7 +221,7 @@ The following figure shows the great improvement in model convergence after we c
 ![diceceloss](../figures/diceceloss.png)
 
 Furthermore, we changed default optimizer to Novograd, modified learning rate related settings, and added other necessary improvements. The concrete examples are shown in
-[spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb) and [BraTS distributed training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py). Both are very typical applications in 3D medical image segmentation but with unique challenges. Spleen segmentation has very limited data but with large image size, and brain tumor segmentation has relatively small image samples but with a much larger data pool. Combing algorithmic improvement with computing improvement, our model training cost is significantly reduced when reaching the same level of performance as the existing pipeline.
+[spleen fast training tutorial](fast_training_tutorial.ipynb) and [BraTS distributed training tutorial](distributed_training/brats_training_ddp.py). Both are very typical applications in 3D medical image segmentation but with unique challenges. Spleen segmentation has very limited data but with large image size, and brain tumor segmentation has relatively small image samples but with a much larger data pool. Combing algorithmic improvement with computing improvement, our model training cost is significantly reduced when reaching the same level of performance as the existing pipeline.
 
 ## Optimizing GPU utilization
 
@@ -238,15 +238,15 @@ We tried to compare the training speed of the spleen segmentation task if AMP ON
 
 ![amp a100 results](../figures/amp_training_a100.png)
 
-AMP tutorial is available at [AMP tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/automatic_mixed_precision.ipynb).
+AMP tutorial is available at [AMP tutorial](automatic_mixed_precision.ipynb).
 
 ### 2. Execute transforms on GPU
 
 Running preprocessing transforms on CPU while keeping GPU busy by running the model training is a common practice and is an optimal resource distribution in many use cases.
 From MONAI v0.7 we introduced PyTorch `Tensor` based computation in transforms, many transforms already support both `NumPy array` and PyTorch `Tensor` as input types and computational backends. To get the supported backends of every transform, please execute: `python monai/transforms/utils.py`.
 
 To accelerate the high-computation transforms, users can first convert input data into GPU Tensor by `ToTensor` or `EnsureType` transform, then the following transforms can execute on GPU based on PyTorch `Tensor` APIs.
-GPU transform tutorial is available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb).
+GPU transform tutorial is available at [Spleen fast training tutorial](fast_training_tutorial.ipynb).
 
 ### 3. Adapt `cuCIM` to execute GPU transforms
 
@@ -283,7 +283,7 @@ train_transforms = [
 dataset = CacheDataset(..., transform=train_trans)
 ```
 Here we convert to PyTorch `Tensor` with `EnsureTyped` transform and move data to GPU with `ToDeviced` transform. `CacheDataset` caches the transform results until `ToDeviced`, so it is in GPU memory. Then in every epoch, the program fetches cached data from GPU memory and only execute the random transform `RandCropByPosNegLabeld` on GPU directly.
-GPU caching example is available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb).
+GPU caching example is available at [Spleen fast training tutorial](fast_training_tutorial.ipynb).
 
 ## Leveraging multi-GPU distributed training
 
@@ -293,19 +293,19 @@ Additionally, with more GPU devices, we can achieve more benefits:
 - Some training algorithms can converge faster with a larger batch size and the training progress is more stable.
 - If caching data in GPU memory, every GPU only needs to cache a partition, so we can use a larger cache rate to cache more data in total to accelerate training. Caching data to GPU can largely reduce CPU-based operations during model training. It can greatly improve the model training efficiency.
 
-For example, during the training of brain tumor segmentation task, with 8 GPUs, we can cache all the data in GPU memory directly and execute the following transforms on GPU device, so it's more than `10x` faster than single GPU training. More details are available at [BraTS distributed training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py).
+For example, during the training of brain tumor segmentation task, with 8 GPUs, we can cache all the data in GPU memory directly and execute the following transforms on GPU device, so it's more than `10x` faster than single GPU training. More details are available at [BraTS distributed training tutorial](distributed_training/brats_training_ddp.py).
 
 ## Leveraging multi-node distributed training
 
 Distributed data parallelism (DDP) is an important feature of PyTorch to connect multiple GPU devices in multiple nodes to train or evaluate models. It can further improve the training speed when we fully leveraged multiple GPUs on multiple nodes.
 
-The distributed data parallel APIs of MONAI are compatible with the native PyTorch distributed module, PyTorch-ignite distributed module, Horovod, XLA, and the SLURM platform. Here we provide [a real-world training example](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py) based on [Decathlon challenge](http://medicaldecathlon.com/) Task01 - Brain Tumor segmentation using the module `torch.distributed.launch`.
+The distributed data parallel APIs of MONAI are compatible with the native PyTorch distributed module, PyTorch-ignite distributed module, Horovod, XLA, and the SLURM platform. Here we provide [a real-world training example](distributed_training/brats_training_ddp.py) based on [Decathlon challenge](http://medicaldecathlon.com/) Task01 - Brain Tumor segmentation using the module `torch.distributed.launch`.
 
 For more details about the PyTorch distributed training setup, please refer to: https://pytorch.org/docs/stable/distributed.html.
 
 And if using [SLURM](https://developer.nvidia.com/slurm) workload manager, please refer to [SLURM + Singularity MONAI example](https://github.com/UFResearchComputing/MultiNode_MONAI_example).
 
-More details are available at [BraTS distributed training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py).
+More details are available at [BraTS distributed training tutorial](distributed_training/brats_training_ddp.py).
 
 ## Examples
 
@@ -323,7 +323,7 @@ With all the above strategies, in this section, we introduce how to apply them t
 In summary, with a A100 GPU and the target validation `mean dice = 0.94` of the `forground` channel only, it's more than `150x` speedup compared with the Pytorch regular implementation when achieving the same metric (validation accuracies). And every epoch is `50x` faster than regular training.
 ![spleen fast training](../figures/fast_training.png)
 
-More details are available at [Spleen fast training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb).
+More details are available at [Spleen fast training tutorial](fast_training_tutorial.ipynb).
 
 ### 2. Brain tumor segmentation
 
@@ -343,7 +343,7 @@ In summary, combining the optimization strategies, the training time of eight A1
 
 ![brats benchmark](../figures/brats_benchmark.png)
 
-More details are available at [BraTS distributed training tutorial](https://github.com/Project-MONAI/tutorials/blob/main/acceleration/distributed_training/brats_training_ddp.py).
+More details are available at [BraTS distributed training tutorial](distributed_training/brats_training_ddp.py).
 
 
 ### 3. Pathology metastasis detection task
@@ -356,4 +356,4 @@ In this way, we accelerated the data loading, data augmentation and preprocessin
 - In these two experiments, the corresponding best FROC achieved is 0.70 for baseline (Numpy) pipeline at epoch 6, and 0.69 for CuCIM pipeline at epoch 2. Please note that the epoch at which the best model is achieved, as well as its corresponding FROC, can have some variabilities across runs with different random seeds.
 ![pathology gpu utilization](../figures/train_loss_pathology.png)
 
-More details are available at [Profiling Pathology Metastasis Detection Pipeline](https://github.com/Project-MONAI/tutorials/blob/main/performance_profiling/pathology/profiling_train_base_nvtx.md).
+More details are available at [Profiling Pathology Metastasis Detection Pipeline](../performance_profiling/pathology/profiling_train_base_nvtx.md).
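
(As a companion to the guide edited above, here is a minimal, hedged sketch of the GPU-caching + `ThreadDataLoader` + AMP recipe it describes. The file list, network, and hyperparameters are placeholder assumptions; the linked fast training tutorial contains the full, tested pipeline.)

```python
import torch
from monai.data import CacheDataset, ThreadDataLoader
from monai.losses import DiceCELoss
from monai.networks.nets import UNet
from monai.transforms import (
    Compose,
    EnsureChannelFirstd,
    EnsureTyped,
    LoadImaged,
    RandCropByPosNegLabeld,
    ToDeviced,
)

device = torch.device("cuda:0")

# Placeholder file list: dicts of image/label paths, as used by MONAI dictionary transforms.
train_files = [{"image": "img_000.nii.gz", "label": "seg_000.nii.gz"}]

train_transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    # Deterministic part ends here: convert to Tensor and move to GPU so the cache lives on the device.
    EnsureTyped(keys=["image", "label"]),
    ToDeviced(keys=["image", "label"], device=device),
    # The random transform runs on the cached GPU tensors every epoch.
    RandCropByPosNegLabeld(
        keys=["image", "label"], label_key="label",
        spatial_size=(96, 96, 96), pos=1, neg=1, num_samples=4,
    ),
])

# Cache the outputs of the deterministic transforms (here directly in GPU memory).
train_ds = CacheDataset(data=train_files, transform=train_transforms, cache_rate=1.0)
# Data are already cached, so a thread-based loader avoids multiprocessing IPC overhead.
train_loader = ThreadDataLoader(train_ds, batch_size=1, num_workers=0, shuffle=True)

model = UNet(
    spatial_dims=3, in_channels=1, out_channels=2,
    channels=(16, 32, 64, 128, 256), strides=(2, 2, 2, 2),
).to(device)
loss_fn = DiceCELoss(to_onehot_y=True, softmax=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # the guide suggests Novograd instead
scaler = torch.cuda.amp.GradScaler()

for batch in train_loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # mixed-precision forward/backward
        loss = loss_fn(model(batch["image"]), batch["label"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```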

bundle/introducing_config/README.md

Lines changed: 1 addition & 1 deletion
@@ -155,5 +155,5 @@ plt.show()
 - Running customized Python components (made available on the `PYTHONPATH`, more examples [in the model_zoo](https://github.com/Project-MONAI/model-zoo)).
 - Overriding the component in `example.yaml` using, for example, `--id=new_value` in the command line.
 - Multiple configuration files and cross-file references.
-- Replacing in terms of plain texts instead of Python objects ([tutorial](https://github.com/Project-MONAI/tutorials/blob/main/bundle/get_started.md)).
+- Replacing in terms of plain texts instead of Python objects ([tutorial](../get_started.md)).
 - The debugging mode to investigate the intermediate variables and results.
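
(For context on the override mechanism listed above, a small hedged sketch of doing the same thing programmatically with `monai.bundle.ConfigParser`; the config ids below are hypothetical and not taken from `example.yaml`.)

```python
from monai.bundle import ConfigParser

parser = ConfigParser()
parser.read_config("example.yaml")  # the tutorial's config file

# Equivalent in spirit to passing `--id=new_value` on the command line:
# replace a config entry before any component is instantiated.
parser["dataset#batch_size"] = 4  # "dataset#batch_size" is a hypothetical nested id

# Instantiate the Python object described by the (hypothetical) "network" section.
network = parser.get_parsed_content("network")
print(type(network))
```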

deployment/bentoml/mednist_classifier_bentoml.ipynb

Lines changed: 2 additions & 1 deletion
@@ -17,12 +17,13 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Deploying a MedNIST Classifier with BentoML\n",
 "\n",
-"This notebook demos the process of packaging up a trained model using BentoML into an artifact which can be run as a local program performing inference, a web service doing the same, and a Docker containerized web service. BentoML provides various ways of deploying models with existing platforms like AWS or Azure but we'll focus on local deployment here since researchers are more likely to do this. This tutorial will train a MedNIST classifier like the [MONAI tutorial here](https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/mednist_tutorial.ipynb) and then do the packaging as described in this [BentoML tutorial](https://github.com/bentoml/gallery/blob/master/pytorch/fashion-mnist/pytorch-fashion-mnist.ipynb)."
+"This notebook demos the process of packaging up a trained model using BentoML into an artifact which can be run as a local program performing inference, a web service doing the same, and a Docker containerized web service. BentoML provides various ways of deploying models with existing platforms like AWS or Azure but we'll focus on local deployment here since researchers are more likely to do this. This tutorial will train a MedNIST classifier like the [MONAI tutorial here](../../2d_classification/mednist_tutorial.ipynb) and then do the packaging as described in this [BentoML tutorial](https://github.com/bentoml/gallery/blob/master/pytorch/fashion-mnist/pytorch-fashion-mnist.ipynb)."
 ]
 },
 {

experiment_management/bundle_integrate_mlflow.ipynb

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
 {
 "cells": [
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
@@ -22,7 +23,7 @@
 "2. Use MLflow in MONAI bundle with a settings JSON file.\n",
 "3. Use MLflow in parsed MONAI bundle with python code.\n",
 "\n",
-"This tutorial takes the [3D spleen segmentation task](https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/spleen_segmentation_3d.ipynb) as an example. In order to quickly verify the MLflow function, each example will only run 10 epochs."
+"This tutorial takes the [3D spleen segmentation task](../3d_segmentation/spleen_segmentation_3d.ipynb) as an example. In order to quickly verify the MLflow function, each example will only run 10 epochs."
 ]
 },
 {
