Replies: 2 comments
-
Hi @KumoLiu, could you please share some info on this question? Thanks in advance.
-
Some more info on the problem for anyone interested... I’m attempting to use FSDP for medical image segmentation to reduce GPU memory footprint during training. As a starting point, I’m trying to adapt the MONAI BRATS 3D segmentation tutorial to use FSDP with 2 GPUs. I’ve created a fork of the original tutorial that instead spawns 2 processes, each of which begins a training loop with a module wrapped with FSDP. I had to split the fsdp_main function out into its own file due to this error.
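For anyone who wants to follow along, the structure I'm using looks roughly like the sketch below. This is a hedged outline, not the full fork: `fsdp_main`, `WORLD_SIZE`, and `build_model` are my own placeholder names (in the real code, `build_model` would construct the SegResNet from the MONAI tutorial), and the training loop itself is elided.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

WORLD_SIZE = 2  # number of GPUs / processes


def build_model() -> nn.Module:
    # Placeholder for the MONAI SegResNet used in the BRATS tutorial.
    return nn.Sequential(nn.Conv3d(4, 8, 3), nn.ReLU(), nn.Conv3d(8, 3, 3))


def fsdp_main(rank: int, world_size: int) -> None:
    # One process per GPU; rank doubles as the CUDA device index.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "12355")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = build_model().to(rank)
    model = FSDP(model, device_id=rank)  # shard parameters across the ranks

    # ... training loop: move each batch to `rank`, forward, loss, backward ...

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(fsdp_main, args=(WORLD_SIZE,), nprocs=WORLD_SIZE, join=True)
```

Note that `fsdp_main` lives in its own file so that `mp.spawn` can pickle a top-level function, which is what forced the split mentioned above.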
I’m currently seeing this printed error message:
And the following stack trace:
This seems to be occurring when the FSDP module unshards the parameters before performing the forward pass. I have checked that all input/label data and model parameters are on the correct devices before the forward pass. The problem should be reproducible if you run the forked code with the following versions and update the directory to which the segmentation data is downloaded. Any ideas why there might be a device mismatch for these flattened parameters?
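For reference, the device check I'm doing before the forward pass is essentially the following. This is a minimal sketch with my own helper name (`param_devices`) and a toy stand-in network; the real model is the one from the tutorial.

```python
import torch
import torch.nn as nn


def param_devices(module: nn.Module) -> set:
    """Collect the set of devices that a module's parameters live on."""
    return {p.device for p in module.parameters()}


# Toy stand-in for the tutorial's segmentation network.
net = nn.Sequential(nn.Conv3d(1, 4, 3), nn.ReLU(), nn.Conv3d(4, 1, 3))

devices = param_devices(net)
assert len(devices) == 1, f"parameters are spread across devices: {devices}"
```

The same kind of check on the input and label tensors (`x.device`, `y.device`) comes back consistent as well, which is why the mismatch in the unsharding step is confusing me.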
Thanks,
-
Hi there,
I originally posted this question in the PyTorch forums because it relates to the use of `FullyShardedDataParallel`, but I thought it would be worthwhile reposting here. I'm trying to adapt the BRATS 3D segmentation tutorial to run on multiple GPUs, to reduce the per-GPU memory footprint and hence train with higher-resolution input images. Has anybody had any success with this?
Please see the original post for full details.
Thanks,
Brett