Auto3dSeg CUDA out of memory issues. #1089
-
Hi, I've been experimenting with the Auto3dSeg pipeline for a few days, and I keep running into CUDA out-of-memory errors. Does anyone have any tips on how to avoid such problems? I've tried altering the patch sizes used by segresnet (they appear to be larger than those used by the other algorithms), but I'd like to avoid manually editing the hyperparameter YAML files if possible, as I'm trying to build an automated solution that requires minimal manual user intervention. Happy to provide more technical details and logs if that would be useful for the discussion. Many thanks, Peter
-
hi @peterhessey, there are two potential sources for OOM-type issues. The first is model training; reducing the batch size or patch size would resolve it. You can refer to this link to set up the parameters in the configuration. The second is model validation: the scripts load the entire image into GPU memory for sliding-window inference, so if the image is very large, the loading can cause an OOM issue. Could you please confirm whether the issue comes from the 1st or 2nd source? If it is the 2nd source, we will update the repo with a fix for the issue. Thanks!
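For the 2nd source, one way to keep very large volumes from exhausting GPU memory is to stitch the sliding-window output on the CPU. Below is a minimal sketch using MONAI's `SlidingWindowInferer` with the `sw_device`/`device` split; the `roi_size`, network settings, and dummy input are illustrative assumptions, not what Auto3dSeg ships with.

```python
# Sketch: run patches on the GPU, accumulate the stitched output on the CPU,
# so the full volume never has to fit in GPU memory at once.
import torch
from monai.inferers import SlidingWindowInferer
from monai.networks.nets import SegResNet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SegResNet(spatial_dims=3, in_channels=1, out_channels=2).to(device).eval()

inferer = SlidingWindowInferer(
    roi_size=(96, 96, 96),        # patch size fed to the network (assumed value)
    sw_batch_size=1,              # patches per forward pass
    overlap=0.25,
    sw_device=device,             # patch computation on the GPU
    device=torch.device("cpu"),   # stitched output accumulates on the CPU
)

with torch.no_grad():
    image = torch.rand(1, 1, 256, 256, 256)  # dummy large volume kept on the CPU
    logits = inferer(inputs=image, network=model)
```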
-
hi @peterhessey, you can refer to this README to train the SegResNet model. Could you please share the log once you start the training command? I can help you further resolve the issue.
-
hi @peterhessey, thank you for sharing the information! There is currently no easy way to automatically generate .yaml configurations with smaller patch sizes. Users need to either manually modify the .yaml file or add additional options when launching the model training commands.
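For the second option, a rough sketch of passing overrides programmatically through `AutoRunner` (rather than hand-editing the generated files) might look like the following; the work directory, task config path, and override keys are assumptions, and the exact keys that control patch size are defined by each algorithm template, so check the generated config for their names before relying on them.

```python
# Sketch: forward training overrides to every generated algorithm,
# instead of editing each generated .yaml by hand.
from monai.apps.auto3dseg import AutoRunner

runner = AutoRunner(
    work_dir="./auto3dseg_work_dir",  # assumed output directory
    input="./task_input.yaml",        # assumed Auto3dSeg task config
)

# Overrides are forwarded to each algorithm's training entry point.
# "num_epochs" is used here only as an illustrative key.
runner.set_training_params({"num_epochs": 2})

runner.run()
```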
-
I believe the OOM is caused by the model validation steps during the training process. It is highly possible that some volumes after data pre-processing are very large. We are currently working towards resolving the OOM issues during validation, and the update will be released this month. At the current stage, you can modify the target re-sampling resolution/spacing in the transform configuration .yaml files (e.g., from 1 x 1 x 1 to 1.5 x 1.5 x 1.5 or 2.0 x 2.0 x 2.0).
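If you want to script that change rather than edit the file by hand, a minimal sketch along these lines could work; the file path and key name are assumptions about where the generated config stores the target spacing, so inspect your own .yaml first.

```python
# Sketch: coarsen the target re-sampling spacing in a generated config file
# to shrink the volumes seen during validation.
import yaml

config_path = "auto3dseg_work_dir/segresnet_0/configs/hyper_parameters.yaml"  # assumed path

with open(config_path) as f:
    config = yaml.safe_load(f)

config["resample_resolution"] = [1.5, 1.5, 1.5]  # assumed key name for the target spacing

with open(config_path, "w") as f:
    yaml.safe_dump(config, f)
```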
-
Awesome, thank you for all your help @dongyang0122!