Project-MONAI · Nic-Ma · Nov 16, 2022 · Nov 15, 2022 · Nov 16, 2022
diff --git a/auto3dseg/notebooks/auto_runner.ipynb b/auto3dseg/notebooks/auto_runner.ipynb
@@ -8,24 +8,30 @@
     "\n",
     "This notebook will introduce `AutoRunner`, the interface to run the Auto3Dseg pipeline with minimal user inputs.\n",
     "\n",
-    "## 1. Set up environment, imports and datasets\n",
-    "### 1.1 Set up Environment"
+    "Specifically, it will show the features below:\n",
+    "1. How to use `AutoRunner` with an example input config file `input.yaml`\n",
+    "2. How to prepare an `input.yaml`\n",
+    "3. How to configure the input/output folders\n",
+    "4. How to set the internal parameters of **Auto3DSeg** components\n",
+    "5. How to apply hyperparameter optimization\n",
+    "\n",
+    "## Setup environment"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "!python -c \"import monai\" || pip install -q \"monai-weekly[nibabel]\""
+    "!python -c \"import monai\" || pip install -q \"monai-weekly[nibabel, nni, tqdm, cucim, yaml, optuna]\""
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1.2 Set up imports"
+    "## Setup imports"
    ]
   },
   {
@@ -44,10 +50,9 @@
    ],
    "source": [
     "import os\n",
+    "import tempfile\n",
     "import torch\n",
     "\n",
-    "from pathlib import Path\n",
-    "\n",
     "from monai.bundle.config_parser import ConfigParser\n",
     "from monai.apps import download_and_extract\n",
     "\n",
@@ -59,55 +64,35 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1.3 Download public datasets"
+    "## Download dataset"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "Task04_Hippocampus.tar: 27.1MB [00:15, 1.88MB/s]                              "
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "2022-10-18 08:11:37,235 - INFO - Downloaded: Task04_Hippocampus.tar\n",
-      "2022-10-18 08:11:37,235 - INFO - Expected md5 is None, skip md5 check for file Task04_Hippocampus.tar.\n",
-      "2022-10-18 08:11:37,236 - INFO - Writing into directory: ..\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
-    "root = str(Path(\".\"))\n",
+    "directory = os.environ.get(\"MONAI_DATA_DIRECTORY\")\n",
+    "root_dir = tempfile.mkdtemp() if directory is None else directory\n",
+    "print(root_dir)\n",
+    "\n",
     "msd_task = \"Task04_Hippocampus\"\n",
     "resource = \"https://msd-for-monai.s3-us-west-2.amazonaws.com/\" + msd_task + \".tar\"\n",
-    "compressed_file = os.path.join(root, msd_task + \".tar\")\n",
-    "if os.path.exists(root):\n",
-    "    download_and_extract(resource, compressed_file, root)\n",
     "\n",
-    "dataroot = os.path.join(root, msd_task)\n",
+    "compressed_file = os.path.join(root_dir, msd_task + \".tar\")\n",
+    "dataroot = os.path.join(root_dir, msd_task)\n",
+    "if not os.path.exists(dataroot):\n",
+    "    download_and_extract(resource, compressed_file, root_dir)\n",
+    "\n",
     "datalist_file = os.path.join(\"..\", \"tasks\", \"msd\", msd_task, \"msd_\" + msd_task.lower() + \"_folds.json\")"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1.4 Prepare a input YAML configuration"
+    "## Prepare an input YAML configuration"
    ]
   },
   {
@@ -117,8 +102,8 @@
    "outputs": [],
    "source": [
     "data_src_cfg = {\n",
-    "    \"name\": msd_task,  # optional\n",
-    "    \"task\": \"segmentation\",  # optional\n",
+    "    \"name\": msd_task,  # optional, it is only for your own record\n",
+    "    \"task\": \"segmentation\",  # optional, it is only for your own record\n",
     "    \"modality\": \"MRI\",  # required\n",
     "    \"datalist\": datalist_file,  # required\n",
     "    \"dataroot\": dataroot,  # required\n",
@@ -131,7 +116,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 2. Run the Auto3DSeg pipeline in a few lines of code\n",
+    "## Run the Auto3DSeg pipeline in a few lines of code\n",
     "\n",
     "Below is the typical usage of AutoRunner\n",
     "```python\n",
@@ -143,26 +128,14 @@
     "\n",
     "If the user would like to perform a full training in the tutorial, it is recommended to uncomment the `runner.run()` appended at the end of each code block.\n",
     "\n",
-    "### 2.1 Use the default setting"
+    "## Use the default setting with the input YAML file"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "2022-10-18 08:11:37,523 - INFO - ./work_dir does not exists. Creating...\n",
-      "2022-10-18 08:11:37,524 - INFO - ./work_dir created to save all results\n",
-      "2022-10-18 08:11:37,524 - INFO - Loading ./input.yaml for AutoRunner and making a copy in /workspace/monai/tutorials-in-dev/auto3dseg/notebooks/work_dir/input.yaml\n",
-      "2022-10-18 08:11:37,531 - INFO - The output_dir is not specified. /workspace/monai/tutorials-in-dev/auto3dseg/notebooks/work_dir/ensemble_output will be used to save ensemble predictions\n",
-      "2022-10-18 08:11:37,533 - INFO - Directory /workspace/monai/tutorials-in-dev/auto3dseg/notebooks/work_dir/ensemble_output is created to save ensemble predictions\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "runner = AutoRunner(input=input)\n",
     "# runner.run()"
@@ -172,23 +145,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 2.2 Use the dictionary instead of a YAML file as the input"
+    "## Use the default setting with the dictionary instead of the YAML file as the input"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "2022-10-18 08:11:37,674 - INFO - Work directory ./work_dir is used to save all results\n",
-      "2022-10-18 08:11:37,676 - INFO - The output_dir is not specified. /workspace/monai/tutorials-in-dev/auto3dseg/notebooks/work_dir/ensemble_output will be used to save ensemble predictions\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "runner = AutoRunner(input=data_src_cfg)\n",
     "# runner.run()"
@@ -198,8 +162,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 3 Customize and configure the Auto3Dseg\n",
-    "### 3.1 Set your working directory"
+    "## Customize working directory\n",
+    "`AutoRunner` provides an interface to save all the intermediate and final results in a user-specified location.\n",
+    "Here we use `./my_workspace` as an example."
    ]
   },
   {
@@ -228,9 +193,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 3.2 Use cached result to save computation time\n",
+    "## Customize result caching\n",
     "\n",
-    "AutoRunner saves intermediate results by default. The user can choose whether it uses the cached results or restart from scratch.\n",
+    "AutoRunner saves intermediate results by default so that later runs can reuse them and save computation time.\n",
+    "The user can choose whether to reuse the cached results or restart from scratch.\n",
     "\n",
     "If the users want to start from scratch, they can set `not_use_cache` to True"
    ]
@@ -265,7 +231,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 3.3 Output Ensemble Result\n",
+    "## Customize the output folder to save ensemble result\n",
     "\n",
     "AutoRunner will perform inference on the testing data specified by the `datalist` in the data source config input. The inference result will be written to the `ensemble_output` folder under the working directory in the form of `nii.gz`. The user can choose the format by adding keyword arguments to the AutoRunner. A list of argument can be found in [MONAI tranforms documentation](https://docs.monai.io/en/stable/transforms.html#saveimage)."
    ]
@@ -294,8 +260,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 4 Setting Auto3DSeg internal parameters\n",
-    "### 4.1 Change the number of folds for cross-validation"
+    "## Setting Auto3DSeg internal parameters\n",
+    "`Auto3DSeg` has four steps: data analysis, algorithm generation, training, and ensemble. Users can configure the internal parameters of the `AutoRunner` object to customize some steps in the pipeline.\n",
+    "\n",
+    "Below, we begin the experiments with a smaller number of cross-validation folds. The default is 5 in the algorithm but we set it to 2 here:"
    ]
   },
   {
@@ -323,41 +291,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 4.2 Customize traininig parameters by override the default values"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "2022-10-18 08:11:38,312 - INFO - Work directory ./work_dir is used to save all results\n",
-      "2022-10-18 08:11:38,314 - INFO - Loading ./input.yaml for AutoRunner and making a copy in /workspace/monai/tutorials-in-dev/auto3dseg/notebooks/work_dir/input.yaml\n",
-      "2022-10-18 08:11:38,320 - INFO - The output_dir is not specified. /workspace/monai/tutorials-in-dev/auto3dseg/notebooks/work_dir/ensemble_output will be used to save ensemble predictions\n"
-     ]
-    }
-   ],
-   "source": [
-    "runner = AutoRunner(input=input)\n",
-    "# Note: among the provided bundles, most networks takes \"num_iterations\" to control the training iterations except segresnet\n",
-    "train_param = {\"num_iterations\": 8}\n",
-    "runner.set_training_params(params=train_param)\n",
-    "# runner.run()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### 4.2.1 A common set of training parameter for all algorithm templates\n",
+    "## Customize training parameters by overriding the default values\n",
     "\n",
-    "Note: This is for demo purpose. The user doesn't need to specify this training params.\n",
+    "`set_training_params` in `AutoRunner` provides an interface to change all algorithms' training parameters in one line. \n",
     "\n",
-    "**Auto3DSeg** uses bundle templates to perform training, validation, and inference. The number of epochs/iterations of training is specified by the config files in each template. While we can override them, it is also noted that some bundle templates may use \"num_iterations\" and other may use \"num_epochs\" to iterate. Below is code-block to convert num_epoch to iteration style and override all algorithms with the same training parameters for 1-GPU/2-GPU machine. "
+    "Note: **Auto3DSeg** uses bundle templates to perform training, validation, and inference. The number of training epochs/iterations is specified by the config files in each template. These values can be overridden, but note that some bundle templates use `num_iterations` while others use `num_epochs` to control the training length.\n",
+    "\n",
+    "For demonstration purposes, the code block below converts `num_epoch` to iteration-style parameters and overrides all algorithms with the same training parameters for a 1-GPU or 2-GPU machine.\n"
    ]
   },
   {
@@ -384,6 +324,7 @@
     "    \"num_epochs\": num_epoch,\n",
     "    \"num_warmup_iterations\": n_iter_val,\n",
     "}\n",
+    "runner = AutoRunner(input=input)\n",
     "runner.set_training_params(params=train_param)\n",
     "# runner.run()\n"
    ]
@@ -392,7 +333,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 4.3 Customize the ensemble method (mean vs. majority voting)"
+    "## Customize the ensemble method\n",
+    "\n",
+    "There are two supported methods: \"AlgoEnsembleBestN\" and \"AlgoEnsembleBestByFold\"."
    ]
   },
   {
@@ -420,7 +363,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 4.4 Customize the inference parameters by override the default values"
+    "## Customize the inference parameters by overriding the default values"
    ]
   },
   {
@@ -454,12 +397,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 5 Train model with HPO (NNI Grid-search)\n",
-    "### 5.1 Apply HPO to search hyper-parameter in Auto3DSeg\n",
+    "## Train model with HPO (NNI Grid-search)\n",
     "\n",
-    "Note: Auto3DSeg supports hyper parameter optimization (HPO) via NNI and Optuna backends. Notebook of how to use these modules can be found in this directory.\n",
-    "AutoRunner supports NNI backend with a grid search method via automatically generating a the NNI config and run `nnictl` commands in subprocess.\n",
-    "Note: to run the HPO, you need to ensure the development environment has `nni` package. Please refer to the [MONAI Installation Guide](https://docs.monai.io/en/stable/installation.html#installing-the-recommended-dependencies) for how to install the recommended dependencies."
+    "**Auto3DSeg** supports hyperparameter optimization (HPO) via the `NNI` and `Optuna` backends.\n",
+    "`AutoRunner` supports the `NNI` backend with a grid search method by automatically generating the `NNI` config and running `nnictl` commands in a subprocess.\n",
+    "\n",
+    "Note: to run the HPO, you need to ensure the development environment has the `nni` package installed.\n",
+    "Please refer to the [MONAI Installation Guide](https://docs.monai.io/en/stable/installation.html#installing-the-recommended-dependencies) for how to install the recommended dependencies."
    ]
   },
   {
@@ -488,9 +432,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 5.2 Override the templated values\n",
+    "## Override the templated values\n",
+    "\n",
+    "The default `NNI` config that `AutoRunner` uses is shown below. Users can override some of the parameters via the `set_hpo_params` interface:\n",
     "\n",
-    "AutoRunner uses the following NNI config in its HPO module\n",
     "```python\n",
     "default_nni_config = {\n",
     "    \"trialCodeDirectory\": \".\",\n",
@@ -501,9 +446,7 @@
     "    \"tuner\": {\"name\": \"GridSearch\"},\n",
     "    \"trainingService\": {\"platform\": \"local\", \"useActiveGpu\": True},\n",
     "}\n",
-    "```\n",
-    "\n",
-    "It can be override by setting the hpo parameters"
+    "```"
    ]
   },
   {
@@ -534,7 +477,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 6 Conclusion\n",
+    "For more details about the usage of **Auto3DSeg** HPO features, please check the [Auto3DSeg NNI Notebook](./hpo_nni.ipynb) and [Auto3DSeg Optuna Notebook](./hpo_optuna.ipynb)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conclusion\n",
     "\n",
     "Here we demonstrate how to use the AutoRunner APIs to customize your **Auto3DSeg** pipeline with mininal inputs. Don't forget you need to execute the `run` command to start the training and make everything take effect.\n",
     "\n",
@@ -546,7 +496,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3.8.13 ('base')",
+   "display_name": "Python 3.8.10 64-bit",
    "language": "python",
    "name": "python3"
   },
@@ -560,12 +510,12 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.13"
+   "version": "3.8.10"
   },
   "orig_nbformat": 4,
   "vscode": {
    "interpreter": {
-    "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
+    "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
    }
   }
  },
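
The data-directory selection added in the "Download dataset" cell can be checked in isolation. The following is a minimal sketch, not part of the notebook: the helper name `resolve_root_dir` is hypothetical, and only the `MONAI_DATA_DIRECTORY` variable and the `tempfile.mkdtemp()` fallback come from the change above.

```python
import os
import tempfile


def resolve_root_dir(env_var: str = "MONAI_DATA_DIRECTORY") -> str:
    """Hypothetical helper mirroring the notebook cell.

    Returns the directory named by the environment variable if it is set,
    otherwise creates and returns a fresh temporary directory.
    """
    directory = os.environ.get(env_var)
    return tempfile.mkdtemp() if directory is None else directory


if __name__ == "__main__":
    # With the variable unset, every call creates a new temporary directory,
    # so downloaded data is not reused between sessions.
    print(resolve_root_dir())
```

Setting `MONAI_DATA_DIRECTORY` before launching the notebook keeps the dataset in a persistent location across runs.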