Sanity check fails when DDP is enabled #6559
Unanswered
edwardpwtsoi asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hi guys, I encounter this error when I enable DDP. Everything works fine when I am not using DDP. Could anyone tell me whether this is a bug that should be reported, or whether something is wrong in my code?
Restored states from the checkpoint file at lightning_logs/version_34/checkpoints/step=199.ckpt
INFO:lightning:Restored states from the checkpoint file at lightning_logs/version_34/checkpoints/step=199.ckpt
Traceback (most recent call last):
  File "/workspace/train.py", line 76, in <module>
    main(parser.parse_args())
  File "/workspace/train.py", line 68, in main
    trainer.fit(model, datamodule)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 498, in fit
    self.dispatch()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 545, in dispatch
    self.accelerator.start_training(self)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 606, in run_train
    self.run_sanity_check(self.lightning_module)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 855, in run_sanity_check
    _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 711, in run_evaluation
    for batch_idx, batch in enumerate(dataloader):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1179, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
TypeError: __init__() missing 4 required positional arguments: 'casting', 'from_', 'to', and 'i'
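A note on reading the last two lines of the traceback: the failing statement is `raise self.exc_type(msg)`, which rebuilds an exception that was caught in a DataLoader worker by calling its class with a single message string. If that class's constructor requires more than one positional argument, the rebuild itself raises a TypeError and the original error is hidden. The sketch below reproduces that mechanism in plain Python. `CastingError` and `reraise_like_worker` are hypothetical names invented for illustration; the parameter names are copied from the error message above, and which library actually raised the original exception is not confirmed by the post.

```python
class CastingError(TypeError):
    """Hypothetical stand-in for an exception whose constructor
    needs several positional arguments (names taken from the traceback)."""

    def __init__(self, ufunc, casting, from_, to, i):
        super().__init__(
            f"cannot cast input {i} of {ufunc} from {from_} to {to} "
            f"with casting rule {casting!r}"
        )


def reraise_like_worker(exc):
    """Mimic rebuilding a worker exception from its type and a message only."""
    exc_type = type(exc)
    msg = f"Caught {exc_type.__name__} in worker process:\n{exc}"
    # Only one argument is passed, so a constructor that requires five
    # positional arguments fails here before the real error is shown.
    raise exc_type(msg)


def demo():
    """Return the message of the error that masks the original one."""
    original = CastingError("add", "same_kind", "float64", "int64", 0)
    try:
        reraise_like_worker(original)
    except TypeError as err:
        return str(err)


if __name__ == "__main__":
    print(demo())
```

If this is what is happening here, the underlying error is raised inside the dataset or collate code running in a worker process. One way to see it directly, as a debugging step rather than a fix, is to set `num_workers=0` on the validation DataLoader so the data loading code runs in the main process and its exception propagates without being rebuilt.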