Missing Saved Checkpoints when using Multiple Loggers #6809
Hello Lightning gods! When using a single logger (without specifying a `logger` when creating the `Trainer`), the checkpoints are saved inside the TensorBoard log directory:

```python
checkpoint_callback = ModelCheckpoint(
    monitor="val_loss",
    save_top_k=5,
)
trainer = pl.Trainer(
    gpus=-1,
    accelerator="ddp",
    callbacks=[checkpoint_callback],
)
```

However, when I added a second logger, the checkpoints no longer appear in the TensorBoard log directory:

```python
from aim.pytorch_lightning import AimLogger
from pytorch_lightning.loggers import TensorBoardLogger

checkpoint_callback = ModelCheckpoint(
    monitor="val_loss",
    save_top_k=5,
)
tb_logger = TensorBoardLogger("lightning_logs", name="my_experiment")
aim_logger = AimLogger(experiment="my_experiment")
trainer = pl.Trainer(
    gpus=-1,
    accelerator="ddp",
    callbacks=[checkpoint_callback],
    logger=[aim_logger, tb_logger],
)
```

How can we configure Lightning to also include the checkpoint files in the TensorBoard log directories, just like in the original case?
Replies: 1 comment
Hi
When using multiple loggers, there is no canonical way to store the checkpoints in subdirectories. What Lightning currently does is put the checkpoints one level above, in a directory whose name is the concatenation of the logger names.
This is very reasonable, since both loggers were used to produce the same checkpoints.
There are alternatives, for example saving the checkpoints in the experiment dir of the first logger in the list, or copying the checkpoints to both subdirs, but these are not implemented.
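In the meantime, a possible workaround (just a sketch built on the public `ModelCheckpoint(dirpath=...)` argument and the `TensorBoardLogger.log_dir` property, not something Lightning does for you automatically) is to point the checkpoint callback at the TensorBoard logger's run directory explicitly:

```python
import os

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger
from aim.pytorch_lightning import AimLogger

tb_logger = TensorBoardLogger("lightning_logs", name="my_experiment")
aim_logger = AimLogger(experiment="my_experiment")

# tb_logger.log_dir resolves to lightning_logs/my_experiment/version_<N>,
# so the checkpoints land in a "checkpoints" subfolder of the TensorBoard
# run directory, mimicking the single-logger layout.
checkpoint_callback = ModelCheckpoint(
    dirpath=os.path.join(tb_logger.log_dir, "checkpoints"),
    monitor="val_loss",
    save_top_k=5,
)

trainer = pl.Trainer(
    gpus=-1,
    accelerator="ddp",
    callbacks=[checkpoint_callback],
    logger=[aim_logger, tb_logger],
)
```

One caveat: reading `log_dir` when constructing the callback pins the logger's version number before training starts, and if you also want the checkpoints next to the Aim run you would still need to copy them yourself, as described in the alternatives above.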