Is there a way to initialize a DDP process that does not calculate gradients? #6518
Unanswered
shamanez asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Let's say I have four GPUs and I want two of them to calculate the gradients and do the backpropagation, while the other two run an inference task that supports the training process.
At the moment, can I do this inside `training_step` by checking `self.trainer.global_rank`? Basically, on the latter two GPUs I would skip the loss calculation and only run the inference task (a sketch of this branching is below). Even if that works, there is another problem: during DDP initialization a copy of the entire model is created on every GPU. In my case this is redundant, because the inference task only needs a small part of the main model, so replicating everything wastes GPU memory.
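For concreteness, here is a minimal sketch of the branching I mean. The module names and the toy loss are just placeholders, and whether DDP's gradient synchronisation stays healthy when only some ranks return a loss is part of what I am unsure about:

```python
import torch
import pytorch_lightning as pl


class HybridModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Placeholder modules: "backbone" is the trained part,
        # "retriever" is the small inference-only helper.
        self.backbone = torch.nn.Linear(32, 32)
        self.retriever = torch.nn.Linear(32, 32)

    def training_step(self, batch, batch_idx):
        if self.trainer.global_rank < 2:
            # ranks 0 and 1: normal forward/backward with a toy loss
            loss = self.backbone(batch).pow(2).mean()
            return loss
        # ranks 2 and 3: run only the helper inference, no loss / no backward
        with torch.no_grad():
            _ = self.retriever(batch)
        return None  # Lightning skips the optimizer step when None is returned

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```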
So is there a way to have a DDP process that doesn't copy the entire model, one that I can initialize in a custom way?
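What I have in mind is something like the following raw `torch.distributed` sketch (not Lightning), where DDP is built over a sub-group of the training ranks and the remaining ranks only ever instantiate the small helper. `FullModel`, `SmallHelper`, and the toy data are hypothetical placeholders, and this assumes a single node with four GPUs:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


class FullModel(torch.nn.Module):      # placeholder for the full model
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 32)

    def forward(self, x):
        return self.layer(x)


class SmallHelper(torch.nn.Module):    # placeholder for the small inference-only part
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 32)

    def forward(self, x):
        return self.layer(x)


def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # new_group is a collective: every rank must call it, even ranks 2 and 3
    train_group = dist.new_group(ranks=[0, 1])

    if rank in (0, 1):
        # training ranks: hold the full model, gradients synced only within train_group
        model = DDP(FullModel().to(rank), device_ids=[rank], process_group=train_group)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        x = torch.randn(8, 32, device=rank)
        loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()
    else:
        # inference ranks: only the small helper is ever instantiated
        helper = SmallHelper().to(rank).eval()
        with torch.no_grad():
            _ = helper(torch.randn(8, 32, device=rank))
        # results would be shipped back to ranks 0/1 via collectives or
        # send/recv on the default process group

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(run, args=(4,), nprocs=4)
```

Is there a supported way to get this kind of per-rank initialization through the Lightning Trainer, rather than dropping down to raw `torch.distributed`?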