
Commit 0f290bc

dongyang0122, dongy, pre-commit-ci[bot], and mingxin-zheng authored
auto3dseg features: gpu optimization (#1187)
### Description

Added new auto3dseg features: gpu based hyper-parameter optimization.

### Checks

<!--- Put an `x` in all the boxes that apply, and remove the not applicable items -->

- [ ] Avoid including large-size files in the PR.
- [ ] Clean up long text outputs from code cells in the notebook.
- [ ] For security purposes, please check the contents and remove any sensitive info such as user names and private key.
- [ ] Ensure (1) hyperlinks and markdown anchors are working (2) use relative paths for tutorial repo files (3) put figure and graphs in the `./figure` folder
- [ ] Notebook runs automatically `./runner.sh -t <path to .ipynb file>`

Signed-off-by: dongy <[email protected]>
Signed-off-by: dongyang0122 <[email protected]>
Co-authored-by: dongy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mingxin Zheng <[email protected]>
1 parent 4b2771a commit 0f290bc

File tree

3 files changed: +62, -0 lines

auto3dseg/README.md

Lines changed: 7 additions & 0 deletions
@@ -92,6 +92,13 @@ Each module of **Auto3DSeg** in different steps can be individually used for dif
- Step 4: [Hyper-parameter optimization](docs/hpo.md)
- Step 5: [Model ensemble](docs/ensemble.md)

## GPU utilization optimization

Given the variety of GPU devices users have, we provide an automated way to optimize the GPU utilization (e.g., memory usage) of algorithms in Auto3DSeg.
During algorithm generation, users can enable the optimization option.
Auto3DSeg can then further automatically tune the hyper-parameters to fully utilize the available GPU capacity.
Concrete examples can be found [here](docs/gpu_opt.md).

## FAQ

Please refer to [FAQ](docs/faq.md) for frequently asked questions.
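As a minimal sketch of the option mentioned in the README addition above (assuming `bundle_generator` and `work_dir` are set up as in the Auto3DSeg tutorial notebooks), enabling the optimization is a single flag on `generate`; the full example with custom search ranges is in the new `docs/gpu_opt.md` below.

```python
# Minimal sketch (assumed setup from the Auto3DSeg tutorials): enable GPU-based
# hyper-parameter tuning while generating the algorithm bundles.
bundle_generator.generate(work_dir, num_fold=5, gpu_customization=True)
```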

auto3dseg/docs/gpu_opt.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
## GPU Utilization Optimization

### Introduction

We introduce an automated solution to optimize the GPU usage of the algorithms in Auto3DSeg.
Typically, the most time-consuming process in Auto3DSeg is model training.
GPU utilization is sometimes low because the available GPU capacity is not fully used under a fixed set of hyper-parameters.
Our solution automatically estimates the hyper-parameters in the model training configurations so that the available GPU capacity is utilized as fully as possible.
It leverages hyper-parameter optimization algorithms to search for the optimal hyper-parameters on any given GPU device.
The following hyper-parameters in the model training configurations are optimized in the process (a simplified sketch of the tuned values is shown after the list).

1. **num_images_per_batch:** The batch size determines how many images are in each mini-batch and how many training iterations are run per epoch. A large batch size can reduce the training time per epoch and increases GPU memory usage, provided the CPU has enough capacity for I/O;
2. **num_sw_batch_size:** The batch size in sliding-window inference determines how many patches are processed in one forward pass of the model. A large sliding-window batch size can reduce the overall inference time and increases GPU memory usage;
3. **validation_data_device:** The validation device indicates whether the volume is stored on GPU or CPU. Ideally, input volumes are stored on the GPU for fast inference. However, if the 3D volumes are very large and GPU memory is limited, the image arrays have to be stored on the CPU (instead of the GPU) and only patches of the volumes are moved to the GPU for inference;
4. **num_trials:** The number of trials defines how long the optimization process runs. The larger the number of trials, the longer the optimization process.
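For concreteness, a simplified sketch of the tuned entries is shown below. The values are taken from the `DiNTS` example reported in the Effect section; the flat dictionary is an illustrative simplification, not the exact layout of the training configuration files.

```python
# Simplified, illustrative view of the entries tuned by GPU optimization.
# Values follow the DiNTS / Task02_Heart example in the "Effect" section below;
# the flat dictionary is a simplification of the actual training configuration.
tuned_entries = {
    "num_images_per_batch": 8,         # mini-batch size per training iteration
    "num_sw_batch_size": 14,           # batch size used in sliding-window inference
    "validation_data_device": "gpu",   # keep validation volumes on GPU when memory allows
}
```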
### Usage

Users can follow the [tutorial](../notebooks/auto3dseg_autorunner_ref_api.ipynb) for experiments and modify the algorithm generation cell with the following scripts.
Instead of relying on the default settings, users can also define the scope of the hyper-parameter optimization process.
If the key in `gpu_customization_specs` is `universal`, the range settings apply to all algorithms; if the keys are algorithm names, the ranges apply to the corresponding algorithms respectively.

As shown in the following code snippet, the user can select which algorithms to use.
If `algos` is a dictionary, it outlines the algorithm to use.
If `algos` is a list or a string, it defines a subset of algorithm names to use, e.g. (`segresnet`, `dints`), out of the full set of algorithm templates provided by `templates_path_or_url`. The default value of `algos` is `None`, which means all available algorithms are used.
```python
from monai.apps.auto3dseg import BundleGen

# `work_dir`, `datastats_file`, and `input` (the data source config file) are
# defined in earlier cells of the tutorial notebook.
bundle_generator = BundleGen(
    algo_path=work_dir,
    algos="dints",
    data_stats_filename=datastats_file,
    data_src_cfg_name=input,
)

# Search ranges for GPU-based hyper-parameter optimization, applied to all algorithms.
gpu_customization_specs = {
    "universal": {"num_trials": 20, "range_num_images_per_batch": [1, 20], "range_num_sw_batch_size": [1, 40]}
}

bundle_generator.generate(
    work_dir,
    num_fold=5,
    gpu_customization=True,
    gpu_customization_specs=gpu_customization_specs,
)
```
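If different search ranges are desired for different algorithms, the keys of `gpu_customization_specs` can be algorithm names instead of `universal`, as described above; a sketch is shown below (the concrete ranges are illustrative assumptions, not recommended defaults). Similarly, `algos` may be given as a list, e.g. `algos=["segresnet", "dints"]`, to generate several algorithms at once.

```python
# Hypothetical per-algorithm ranges: algorithm names are used as keys instead of
# "universal". The concrete numbers are illustrative assumptions, not recommendations.
gpu_customization_specs = {
    "dints": {"num_trials": 10, "range_num_images_per_batch": [1, 8], "range_num_sw_batch_size": [1, 16]},
    "segresnet": {"num_trials": 10, "range_num_images_per_batch": [1, 12], "range_num_sw_batch_size": [1, 24]},
}
```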
### Effect

We take the `DiNTS` algorithm on `MSD Task02_Heart` as an example and compare the training process of one model before and after GPU optimization on an 80GB A100 GPU.
The training loss curves and validation accuracy curves are compared between the training processes with and without GPU optimization.
After GPU optimization, the batch size is increased from 1 to 8, the batch size in sliding-window inference is increased to 14, and the validation device is GPU.
From the following figure, we can see that the model converges faster with GPU optimization, and the validation accuracy becomes better on average.
The improvement comes primarily from taking advantage of the larger GPU capacity available on the A100.
<div align="center"> <img src="../figures/gpu_opt.png" width="960"/> </div>

auto3dseg/figures/gpu_opt.png

60.4 KB
