-
Hi @hw-ju , thanks for posting the questions.
Yes, these handlers only need to run on rank 0, and the result they report is based on the data from all ranks. I also checked the code of the dynunet pipeline, and currently these handlers are executed on all ranks; I will submit a PR to change that. For the synchronization question, @wyli @Nic-Ma @ericspod may be able to help answer it, thanks!
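To illustrate what "run only on rank 0, but based on all data" means, here is a minimal pure-Python simulation (no PyTorch or MONAI needed). It assumes the per-sample validation scores are all-gathered across ranks before the metric is reduced, so every rank computes the same global value and only rank 0 logs it. The names `all_gather` and `run_validation` are hypothetical stand-ins, not MONAI APIs; in a real DDP run the gather would be a torch.distributed/ignite collective and the logging would come from a StatsHandler attached only on rank 0.

```python
from typing import Dict, List


def all_gather(per_rank_scores: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for a collective all_gather.

    Every rank would receive the concatenation of all ranks' per-sample scores.
    """
    gathered: List[float] = []
    for scores in per_rank_scores:
        gathered.extend(scores)
    return gathered


def run_validation(per_rank_scores: List[List[float]], logs: List[str]) -> Dict[int, float]:
    """Simulate one validation round on len(per_rank_scores) ranks."""
    results: Dict[int, float] = {}
    for rank in range(len(per_rank_scores)):
        # Each rank reduces the metric over the gathered (global) data,
        # so every rank ends up with the same value.
        gathered = all_gather(per_rank_scores)
        metric = sum(gathered) / len(gathered)
        results[rank] = metric
        # Only rank 0 logs (and would save checkpoints), avoiding duplicate output.
        if rank == 0:
            logs.append(f"rank 0: mean_dice={metric:.3f}")
    return results
```

With shards `[[0.5, 0.25], [1.0, 0.75]]`, both ranks get 0.625 and a single log line is written; had rank 0 logged only its own shard, it would have reported 0.375 instead.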
-
Hi @yiheng-wang-nv! Could you help with two questions below?
For example, we can see them in output below
before lines https://github.com/Project-MONAI/tutorials/blob/main/modules/dynunet_pipeline/train.py#L100 and https://github.com/Project-MONAI/tutorials/blob/main/modules/dynunet_pipeline/train.py#L230?
-
Hi!
In the dynunet_pipeline tutorial using DDP, it looks like there is no explicit synchronization (e.g. dist.barrier) between training and validation. Does the monai.engines.Trainer class (and the Evaluator class) take care of such synchronization?
Are the CheckpointSaver and StatsHandler handlers only executed by rank 0? If so, does StatsHandler log metrics obtained from rank 0 alone, or aggregated metrics from all ranks?
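If an explicit synchronization point between training and validation is wanted, the idea behind dist.barrier can be shown with a thread-based simulation in which each thread plays one rank. This is a sketch under assumptions: threading.Barrier stands in for torch.distributed.barrier(), and run_ranks is a hypothetical helper, not part of MONAI. In a real DDP script the equivalent would be calling torch.distributed.barrier() after the training step; note that collective operations in general (such as DDP's gradient all-reduce) also block until every rank reaches them, so they already act as implicit synchronization points.

```python
import threading
from typing import List


def run_ranks(world_size: int, events: List[str]) -> None:
    """Simulate world_size ranks that train, synchronize, then validate."""
    barrier = threading.Barrier(world_size)  # stand-in for dist.barrier()
    lock = threading.Lock()  # protects the shared event list

    def worker(rank: int) -> None:
        with lock:
            events.append(f"train:{rank}")
        # No rank proceeds to validation until every rank has finished training.
        barrier.wait()
        with lock:
            events.append(f"validate:{rank}")

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because every thread records its training event before waiting, all "train" entries precede all "validate" entries regardless of thread scheduling.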