Is there a way to initialize a DDP process that does not calculate gradients? #6518
Unanswered
shamanez asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Let's say I have four GPUs and I want two of them to calculate the gradients and do the backpropagation, while the other two run an inference task that supports the training process.
At the moment, can I do this inside `training_step` by checking `self.trainer.global_rank`? Basically, on the latter two GPUs I would skip the loss calculation and only run the inference task (a sketch of this branching is below). Even if that works, there is another problem: during DDP initialization a copy of the entire model is created on every GPU. In my case this is redundant, because the inference task only needs a small part of the main model, so replicating everything wastes GPU memory.
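For concreteness, here is a minimal sketch of the branching I mean. The module names and the toy loss are just placeholders, and whether DDP's gradient synchronisation stays healthy when only some ranks return a loss is part of what I am unsure about:

```python
import torch
import pytorch_lightning as pl


class HybridModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Placeholder modules: "backbone" is the trained part,
        # "retriever" is the small inference-only helper.
        self.backbone = torch.nn.Linear(32, 32)
        self.retriever = torch.nn.Linear(32, 32)

    def training_step(self, batch, batch_idx):
        if self.trainer.global_rank < 2:
            # ranks 0 and 1: normal forward/backward with a toy loss
            loss = self.backbone(batch).pow(2).mean()
            return loss
        # ranks 2 and 3: run only the helper inference, no loss / no backward
        with torch.no_grad():
            _ = self.retriever(batch)
        return None  # Lightning skips the optimizer step when None is returned

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```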
So is there a way to have a DDP process that doesn't copy the entire model, one that I can initialize in a custom way?
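What I have in mind is something like the following raw `torch.distributed` sketch (not Lightning), where DDP is built over a sub-group of the training ranks and the remaining ranks only ever instantiate the small helper. `FullModel`, `SmallHelper`, and the toy data are hypothetical placeholders, and this assumes a single node with four GPUs:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


class FullModel(torch.nn.Module):      # placeholder for the full model
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 32)

    def forward(self, x):
        return self.layer(x)


class SmallHelper(torch.nn.Module):    # placeholder for the small inference-only part
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 32)

    def forward(self, x):
        return self.layer(x)


def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # new_group is a collective: every rank must call it, even ranks 2 and 3
    train_group = dist.new_group(ranks=[0, 1])

    if rank in (0, 1):
        # training ranks: hold the full model, gradients synced only within train_group
        model = DDP(FullModel().to(rank), device_ids=[rank], process_group=train_group)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        x = torch.randn(8, 32, device=rank)
        loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()
    else:
        # inference ranks: only the small helper is ever instantiated
        helper = SmallHelper().to(rank).eval()
        with torch.no_grad():
            _ = helper(torch.randn(8, 32, device=rank))
        # results would be shipped back to ranks 0/1 via collectives or
        # send/recv on the default process group

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(run, args=(4,), nprocs=4)
```

Is there a supported way to get this kind of per-rank initialization through the Lightning Trainer, rather than dropping down to raw `torch.distributed`?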