Missing Saved Checkpoints when using Multiple Loggers #6809
Hello Lightning gods! When using a single logger (without specifying a `logger` when creating the `Trainer`), the checkpoints are saved inside the TensorBoard log directory:

```python
checkpoint_callback = ModelCheckpoint(
    monitor="val_loss",
    save_top_k=5,
)
trainer = pl.Trainer(
    gpus=-1,
    accelerator="ddp",
    callbacks=[checkpoint_callback],
)
```

However, when I added a second logger, the checkpoints no longer appear in the TensorBoard log directory:

```python
from aim.pytorch_lightning import AimLogger
from pytorch_lightning.loggers import TensorBoardLogger

checkpoint_callback = ModelCheckpoint(
    monitor="val_loss",
    save_top_k=5,
)
tb_logger = TensorBoardLogger("lightning_logs", name="my_experiment")
aim_logger = AimLogger(experiment="my_experiment")
trainer = pl.Trainer(
    gpus=-1,
    accelerator="ddp",
    callbacks=[checkpoint_callback],
    logger=[aim_logger, tb_logger],
)
```

How can we configure Lightning to also include the checkpoint files in the TensorBoard log directories, just like in the original case?
Replies: 1 comment
Hi
When using multiple loggers, there is no canonical way to store the checkpoints in subdirectories. What Lightning currently does is put the checkpoints one level above, in a directory whose name is the concatenation of the logger names.
This is very reasonable, since both loggers were used to produce the same checkpoints.
There are alternatives, for example saving the checkpoints in the experiment dir of the first logger in the list, or copying the checkpoints to both subdirs, but these are not implemented.
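In the meantime, a possible workaround (just a sketch built on the public `ModelCheckpoint(dirpath=...)` argument and the `TensorBoardLogger.log_dir` property, not something Lightning does for you automatically) is to point the checkpoint callback at the TensorBoard logger's run directory explicitly:

```python
import os

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger
from aim.pytorch_lightning import AimLogger

tb_logger = TensorBoardLogger("lightning_logs", name="my_experiment")
aim_logger = AimLogger(experiment="my_experiment")

# tb_logger.log_dir resolves to lightning_logs/my_experiment/version_<N>,
# so the checkpoints land in a "checkpoints" subfolder of the TensorBoard
# run directory, mimicking the single-logger layout.
checkpoint_callback = ModelCheckpoint(
    dirpath=os.path.join(tb_logger.log_dir, "checkpoints"),
    monitor="val_loss",
    save_top_k=5,
)

trainer = pl.Trainer(
    gpus=-1,
    accelerator="ddp",
    callbacks=[checkpoint_callback],
    logger=[aim_logger, tb_logger],
)
```

One caveat: reading `log_dir` when constructing the callback pins the logger's version number before training starts, and if you also want the checkpoints next to the Aim run you would still need to copy them yourself, as described in the alternatives above.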