
Commit f444983

Authored by Can-Zhao, pre-commit-ci[bot], guopengf, mingxin-zheng, and KumoLiu
Maisi readme (#1743)
Fixes # .

### Description

Maisi readme.

### Checks

<!--- Put an `x` in all the boxes that apply, and remove the not applicable items -->

- [ ] Avoid including large-size files in the PR.
- [ ] Clean up long text outputs from code cells in the notebook.
- [ ] For security purposes, please check the contents and remove any sensitive info such as user names and private key.
- [ ] Ensure (1) hyperlinks and markdown anchors are working (2) use relative paths for tutorial repo files (3) put figure and graphs in the `./figure` folder
- [ ] Notebook runs automatically `./runner.sh -t <path to .ipynb file>`

---------

Signed-off-by: Can Zhao <[email protected]>
Signed-off-by: Pengfei Guo <[email protected]>
Signed-off-by: Can-Zhao <[email protected]>
Signed-off-by: Pengfei Guo <[email protected]>
Signed-off-by: dongyang0122 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pengfei Guo <[email protected]>
Co-authored-by: Pengfei Guo <[email protected]>
Co-authored-by: Mingxin Zheng <[email protected]>
Co-authored-by: YunLiu <[email protected]>
Co-authored-by: dongyang0122 <[email protected]>
1 parent 2ceaab5 commit f444983

File tree: 7 files changed (+267 −4 lines)


generative/maisi/LICENSE.weights

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
NVIDIA License

1. Definitions

“Licensor” means any person or entity that distributes its Work.

“Work” means (a) the original work of authorship made available under this license, which may include software, documentation, or other files, and (b) any additions to or derivative works thereof that are made available under this license.

The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S. copyright law; provided, however, that for the purposes of this license, derivative works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work.

Works are “made available” under this license by including in or with the Work either (a) a copyright notice referencing the applicability of this license to the Work, or (b) a copy of this license.

2. License Grant

2.1 Copyright Grant. Subject to the terms and conditions of this license, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to use, reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form.

3. Limitations

3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this license, (b) you include a complete copy of this license with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work.

3.2 Derivative Works. You may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding Your Terms, this license (including the redistribution requirements in Section 3.1) will continue to apply to the Work itself.

3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use non-commercially. Notwithstanding the foregoing, NVIDIA Corporation and its affiliates may use the Work and any derivative works commercially. As used herein, “non-commercially” means for research or evaluation purposes only.

3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this license from such Licensor (including the grant in Section 2.1) will terminate immediately.

3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks, except as necessary to reproduce the notices described in this license.

3.6 Termination. If you violate any term of this license, then your rights under this license (including the grant in Section 2.1) will terminate immediately.

4. Disclaimer of Warranty.

THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.

5. Limitation of Liability.

EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

generative/maisi/README.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Medical AI for Synthetic Imaging (MAISI)

This example demonstrates how to train and validate NVIDIA MAISI, a 3D Latent Diffusion Model (LDM) capable of generating large CT images accompanied by corresponding segmentation masks. It supports variable volume sizes and voxel spacings and allows precise control of organ/tumor size.

## MAISI Model Highlight

- A foundation Variational Auto-Encoder (VAE) model for latent feature compression that works for both CT and MRI, with flexible volume sizes and voxel sizes
- A foundation diffusion model that can generate large CT volumes up to 512 &times; 512 &times; 768 in size, with flexible volume sizes and voxel sizes
- A ControlNet to generate image/mask pairs that can improve downstream tasks, with controllable organ/tumor size

## Example Results and Evaluation

## MAISI Model Workflow

The training and inference workflows of MAISI are depicted in the figures below. It begins by training an autoencoder in pixel space to encode images into latent features. Following that, it trains a diffusion model in the latent space to denoise the noisy latent features. During inference, it first generates latent features from random noise by applying multiple denoising steps using the trained diffusion model, and finally decodes the denoised latent features into images using the trained autoencoder (a minimal code sketch follows Figure 2).

<p align="center">
  <img src="./figures/maisi_train.jpg" alt="MAISI training scheme">
  <br>
  <em>Figure 1: MAISI training scheme</em>
</p>

<p align="center">
  <img src="./figures/maisi_infer.jpg" alt="MAISI inference scheme">
  <br>
  <em>Figure 2: MAISI inference scheme</em>
</p>
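
To make the inference path concrete, below is a minimal sketch of the denoise-then-decode loop. It assumes the `DDPMScheduler` API from MONAI GenerativeModels (a MAISI dependency); `diffusion_unet` and `autoencoder` are placeholders for the trained MAISI networks, and the latent shape and step count are illustrative values, not the released model's configuration.

```python
# Minimal sketch (not the MAISI scripts): denoise random latents with the
# trained diffusion model, then decode them with the trained autoencoder.
# `diffusion_unet` and `autoencoder` are assumed to be the trained networks.
import torch
from generative.networks.schedulers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(num_inference_steps=1000)

latents = torch.randn(1, 4, 128, 128, 128)  # random noise in latent space (shape illustrative)
with torch.no_grad():
    for t in scheduler.timesteps:
        # predict the noise present in the current latents at timestep t
        noise_pred = diffusion_unet(latents, timesteps=torch.as_tensor((t,)))
        # take one reverse-diffusion step toward the clean latents
        latents, _ = scheduler.step(noise_pred, t, latents)
    image = autoencoder.decode(latents)  # map denoised latents back to a CT volume
```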
MAISI is based on the following papers:

[**Latent Diffusion:** Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." CVPR 2022.](https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf)

[**ControlNet:** Lvmin Zhang, Anyi Rao, Maneesh Agrawala. "Adding Conditional Control to Text-to-Image Diffusion Models." ICCV 2023.](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf)

### 1. Installation

Please refer to the [Installation of MONAI Generative Model](../README.md).

Note: MAISI depends on the [xFormers](https://github.com/facebookresearch/xformers) library.
ARM64 users can build xFormers from [source](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) if the available wheel does not meet their requirements.
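
If a one-line route is helpful, the PyPI packages are typically sufficient; the package name `monai-generative` is our assumption here, so defer to the linked installation guide if your setup differs.

```bash
# Assumed PyPI route; see ../README.md for the authoritative instructions.
pip install monai-generative
pip install xformers  # prebuilt wheels cover most x86-64 CUDA setups; ARM64 may need a source build
```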

### 2. Model inference and example outputs

Please refer to [maisi_inference_tutorial.ipynb](maisi_inference_tutorial.ipynb) for the MAISI model inference tutorial.

### 3. Training example

Training data preparation is covered in [./data/README.md](./data/README.md).

#### [3.1 3D Autoencoder Training](./maisi_train_vae_tutorial.ipynb)

Please refer to [maisi_train_vae_tutorial.ipynb](maisi_train_vae_tutorial.ipynb) for the MAISI VAE model training tutorial.

#### [3.2 3D Latent Diffusion Training](./scripts/diff_model_train.py)

Please refer to [maisi_diff_unet_training_tutorial.ipynb](maisi_diff_unet_training_tutorial.ipynb) for the MAISI diffusion model training tutorial.

#### [3.3 3D ControlNet Training](./scripts/train_controlnet.py)

We provide a [training config](./configs/config_maisi_controlnet_train.json) for fine-tuning the pretrained ControlNet with a new class (i.e., Kidney Tumor).
When fine-tuning with other new class names, please update `weighted_loss_label` in the training config
and [label_dict.json](./configs/label_dict.json) accordingly. The default `label_dict.json` contains 8 dummy labels as deletable placeholders that can be repurposed for fine-tuning. If more than 8 new labels are needed, users can freely define numeric label indices below 256; the current ControlNet implementation supports up to 256 labels (0-255).
The preprocessed dataset for ControlNet training, and more details about data preparation, can be found in the [README](./data/README.md).
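
Concretely, the two edits for the Kidney Tumor example might look as follows (index `129` comes from the preprocessed dataset described in [./data/README.md](./data/README.md); the surrounding keys in your configs may differ, so treat this as a sketch rather than verbatim file contents):

```python
# Sketch of the two config edits (JSON shown with explanatory comments).

# ./configs/label_dict.json: repurpose one of the 8 placeholder entries
{
    "kidney tumor": 129  # any free numeric index below 256 works
}

# ./configs/config_maisi_controlnet_train.json: weight the loss toward the new class
{
    "weighted_loss_label": [129]
}
```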

#### Training Configuration

The training was performed with the following:

- GPU: at least 60 GB of GPU memory for a 512 &times; 512 &times; 512 volume
- Actual model input (the size of the 3D image feature in latent space) for the latent diffusion model: 128 &times; 128 &times; 128 for a 512 &times; 512 &times; 512 volume (i.e., the autoencoder compresses each spatial dimension by a factor of 4)
- AMP: True

#### Execute Training

To train with a single GPU, please run:
```bash
python -m scripts.train_controlnet -c ./configs/config_maisi.json -t ./configs/config_maisi_controlnet_train.json -e ./configs/environment_maisi_controlnet_train.json -g 1
```

The training script also enables multi-GPU training. For instance, if you are using eight GPUs, you can run the training script with the following command:
```bash
export NUM_GPUS_PER_NODE=8
torchrun \
    --nproc_per_node=${NUM_GPUS_PER_NODE} \
    --nnodes=1 \
    --master_addr=localhost --master_port=1234 \
    -m scripts.train_controlnet -c ./configs/config_maisi.json -t ./configs/config_maisi_controlnet_train.json -e ./configs/environment_maisi_controlnet_train.json -g ${NUM_GPUS_PER_NODE}
```

Please also check [maisi_train_controlnet_tutorial.ipynb](./maisi_train_controlnet_tutorial.ipynb) for more details about data preparation and training parameters.

### 4. License

The code is released under the Apache 2.0 License.

The model weights are released under the [NSCLv1 License](./LICENSE.weights).

### 5. Questions and Bugs

- For questions relating to the use of MONAI, please use our [Discussions tab](https://github.com/Project-MONAI/MONAI/discussions) on the main repository of MONAI.
- For bugs relating to MONAI functionality, please create an issue on the [main repository](https://github.com/Project-MONAI/MONAI/issues).
- For bugs relating to the running of a tutorial, please create an issue in [this repository](https://github.com/Project-MONAI/Tutorials/issues).

generative/maisi/data/README.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
# Medical AI for Synthetic Imaging (MAISI) Data Preparation

Disclaimer: We are not the hosts of the data. Please make sure to read the requirements and usage policies of the data and give credit to the authors of the datasets!

### 1 VAE Training Data

For the released foundation autoencoder model weights in MAISI, we used 37243 CT training volumes and 1963 CT validation volumes from the chest, abdomen, and head-and-neck regions, and 17887 MRI training volumes and 940 MRI validation volumes from the brain, skull-stripped brain, chest, and below-abdomen regions. The training data come from [TCIA Covid 19 Chest CT](https://wiki.cancerimagingarchive.net/display/Public/CT+Images+in+COVID-19#70227107b92475d33ae7421a9b9c426f5bb7d5b3), [TCIA Colon Abdomen CT](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=3539213), [MSD03 Liver Abdomen CT](http://medicaldecathlon.com/), [LIDC chest CT](https://www.cancerimagingarchive.net/collection/lidc-idri/), [TCIA Stony Brook Covid Chest CT](https://www.cancerimagingarchive.net/collection/covid-19-ny-sbu/), [NLST Chest CT](https://www.cancerimagingarchive.net/collection/nlst/), [TCIA Upenn GBM Brain MR](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70225642), [Aomic Brain MR](https://openneuro.org/datasets/ds003097/versions/1.2.1), [QTIM Brain MR](https://openneuro.org/datasets/ds004169/versions/1.0.7), [TCIA Acrin Chest MR](https://www.cancerimagingarchive.net/collection/acrin-contralateral-breast-mr/), [TCIA Prostate MR Below-Abdomen MR](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=68550661#68550661a2c52df5969d435eae49b9669bea21a6).

In total, we included:

| Index | Dataset Name | Number of Training Data | Number of Validation Data |
|-------|------------------------------------------------|-------------------------|---------------------------|
| 1 | Covid 19 Chest CT | 722 | 49 |
| 2 | TCIA Colon Abdomen CT | 1522 | 77 |
| 3 | MSD03 Liver Abdomen CT | 104 | 0 |
| 4 | LIDC chest CT | 450 | 24 |
| 5 | TCIA Stony Brook Covid Chest CT | 2644 | 139 |
| 6 | NLST Chest CT | 31801 | 1674 |
| 7 | TCIA Upenn GBM Brain MR (skull-stripped) | 2550 | 134 |
| 8 | Aomic Brain MR | 2630 | 138 |
| 9 | QTIM Brain MR | 1275 | 67 |
| 10 | Acrin Chest MR | 6599 | 347 |
| 11 | TCIA Prostate MR Below-Abdomen MR | 928 | 49 |
| 12 | Aomic Brain MR, skull-stripped | 2630 | 138 |
| 13 | QTIM Brain MR, skull-stripped | 1275 | 67 |
| | Total CT | 37243 | 1963 |
| | Total MRI | 17887 | 940 |

### 2 Diffusion Model Training Data

The training dataset for the diffusion model used in MAISI comprises 10,277 CT volumes from 24 distinct datasets, encompassing various body regions and disease patterns.

The table below provides a summary of the number of volumes for each dataset.

| Index | Dataset name | Number of volumes |
|:-----|:-----|:-----|
| 1 | AbdomenCT-1K | 789 |
| 2 | AeroPath | 15 |
| 3 | AMOS22 | 240 |
| 4 | autoPET23 | 200 |
| 5 | Bone-Lesion | 223 |
| 6 | BTCV | 48 |
| 7 | COVID-19 | 524 |
| 8 | CRLM-CT | 158 |
| 9 | CT-ORG | 94 |
| 10 | CTPelvic1K-CLINIC | 94 |
| 11 | LIDC | 422 |
| 12 | MSD Task03 | 88 |
| 13 | MSD Task06 | 50 |
| 14 | MSD Task07 | 224 |
| 15 | MSD Task08 | 235 |
| 16 | MSD Task09 | 33 |
| 17 | MSD Task10 | 87 |
| 18 | Multi-organ-Abdominal-CT | 65 |
| 19 | NLST | 3109 |
| 20 | Pancreas-CT | 51 |
| 21 | StonyBrook-CT | 1258 |
| 22 | TCIA_Colon | 1437 |
| 23 | TotalSegmentatorV2 | 654 |
| 24 | VerSe | 179 |

### 3 ControlNet Model Training Data

#### 3.1 Example preprocessed dataset

We provide a preprocessed subset of the [C4KC-KiTS](https://www.cancerimagingarchive.net/collection/c4kc-kits/) dataset, used in the fine-tuning config `environment_maisi_controlnet_train.json`. The dataset and the corresponding JSON data list can be downloaded from [this link](https://drive.google.com/drive/folders/1iMStdYxcl26dEXgJEXOjkWvx-I2fYZ2u?usp=sharing) and should be saved in the `maisi/dataset/` folder.

The structure of an example folder in the preprocessed dataset is:

```
KiTS-000* --|-*arterial*.nii.gz            # original image
            |-*arterial_emb*.nii.gz        # encoded image embedding
            |-mask*.nii.gz                 # original labels
            |-mask_pseudo_label*.nii.gz    # pseudo labels
            |-mask_combined_label*.nii.gz  # combined mask of original and pseudo labels
```

An example combined mask of original and pseudo labels is shown below:
![example_combined_mask](../figures/example_combined_mask.png)

Please note that the Kidney Tumor label is mapped to index `129` in this preprocessed dataset. The encoded image embeddings are generated during preprocessing by the provided `Autoencoder` (`./models/autoencoder_epoch273.pt`) to reduce memory usage during training. The pseudo labels are generated by [VISTA 3D](https://github.com/Project-MONAI/VISTA). In addition, each volume and its corresponding pseudo label are resampled so that every dimension becomes the closest multiple of 128 (e.g., 128, 256, 384, 512, ...), as sketched below.
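
To illustrate that resampling rule (the helper below is ours for illustration, not part of the MAISI scripts):

```python
def nearest_multiple(dim: int, base: int = 128) -> int:
    """Round a dimension to the closest multiple of `base`, with a floor of one block."""
    return max(base, round(dim / base) * base)

# A 512 x 512 x 321 volume would be resampled to 512 x 512 x 384.
print([nearest_multiple(d) for d in (512, 512, 321)])  # -> [512, 512, 384]
```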

The training workflow requires a JSON file that specifies the image embedding and segmentation pairs. The example file is located at `maisi/dataset/C4KC-KiTS_subset.json`.

The JSON file has the following structure:
```python
{
    "training": [
        {
            "image": "*/*arterial_emb*.nii.gz",  # relative path to the image embedding file
            "label": "*/mask_combined_label*.nii.gz",  # relative path to the combined label file
            "dim": [512, 512, 512],  # the dimensions of the image
            "spacing": [1.0, 1.0, 1.0],  # the spacing of the image
            "top_region_index": [0, 1, 0, 0],  # the top region index of the image
            "bottom_region_index": [0, 0, 0, 1],  # the bottom region index of the image
            "fold": 0  # fold index for cross-validation; fold 0 is used for training
        },
        ...
    ]
}
```
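
A minimal sketch of consuming this data list follows (the file path matches the download instructions above; treating non-zero folds as validation is our illustrative choice, only the fold-0 training convention is stated in the source):

```python
import json

# Load the data list shipped with the preprocessed C4KC-KiTS subset.
with open("maisi/dataset/C4KC-KiTS_subset.json") as f:
    datalist = json.load(f)["training"]

# Fold 0 is used for training; here the remaining folds serve as validation.
train_files = [item for item in datalist if item["fold"] == 0]
val_files = [item for item in datalist if item["fold"] != 0]
print(f"{len(train_files)} training pairs, {len(val_files)} validation pairs")
```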

#### 3.2 ControlNet full training datasets

The ControlNet training dataset used in MAISI contains 6330 CT volumes (5058 for training and 1272 for validation) across 20 datasets, covering different body regions and diseases.

The table below summarizes the number of volumes for each dataset.

| Index | Dataset name | Number of volumes |
|:-----|:-----|:-----|
| 1 | AbdomenCT-1K | 789 |
| 2 | AeroPath | 15 |
| 3 | AMOS22 | 240 |
| 4 | Bone-Lesion | 237 |
| 5 | BTCV | 48 |
| 6 | CT-ORG | 94 |
| 7 | CTPelvic1K-CLINIC | 94 |
| 8 | LIDC | 422 |
| 9 | MSD Task03 | 105 |
| 10 | MSD Task06 | 50 |
| 11 | MSD Task07 | 225 |
| 12 | MSD Task08 | 235 |
| 13 | MSD Task09 | 33 |
| 14 | MSD Task10 | 101 |
| 15 | Multi-organ-Abdominal-CT | 64 |
| 16 | Pancreas-CT | 51 |
| 17 | StonyBrook-CT | 1258 |
| 18 | TCIA_Colon | 1436 |
| 19 | TotalSegmentatorV2 | 654 |
| 20 | VerSe | 179 |

### 4. Questions and bugs

- For questions relating to the use of MONAI, please use our [Discussions tab](https://github.com/Project-MONAI/MONAI/discussions) on the main repository of MONAI.
- For bugs relating to MONAI functionality, please create an issue on the [main repository](https://github.com/Project-MONAI/MONAI/issues).
- For bugs relating to the running of a tutorial, please create an issue in [this repository](https://github.com/Project-MONAI/Tutorials/issues).

### Reference

[1] [Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." CVPR 2022.](https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf)
generative/maisi/figures (binary image files, 142 KB and 183 KB; content not rendered in the text diff)

generative/maisi/maisi_train_vae_tutorial.ipynb

Lines changed: 4 additions & 4 deletions
```diff
@@ -872,6 +872,10 @@
     print(f"Epoch {epoch} train_vae_loss {loss_weighted_sum(train_epoch_losses)}: {train_epoch_losses}.")
     for loss_name, loss_value in train_epoch_losses.items():
         tensorboard_writer.add_scalar(f"train_{loss_name}_epoch", loss_value, epoch)
+    torch.save(autoencoder.state_dict(), trained_g_path)
+    torch.save(discriminator.state_dict(), trained_d_path)
+    print("Save trained autoencoder to", trained_g_path)
+    print("Save trained discriminator to", trained_d_path)

     # Validation
     if epoch % val_interval == 0:
@@ -891,12 +895,8 @@
         for key in val_epoch_losses:
             val_epoch_losses[key] /= len(dataloader_val)

-        torch.save(autoencoder.state_dict(), trained_g_path)
-        torch.save(discriminator.state_dict(), trained_d_path)
         val_loss_g = loss_weighted_sum(val_epoch_losses)
         print(f"Epoch {epoch} val_vae_loss {val_loss_g}: {val_epoch_losses}.")
-        print("Save trained autoencoder to", trained_g_path)
-        print("Save trained discriminator to", trained_d_path)

         if val_loss_g < best_val_recon_epoch_loss:
             best_val_recon_epoch_loss = val_loss_g
```
