Releases: Lightning-AI/pytorch-lightning
Standard weekly patch release
[1.3.1] - 2021-05-11
Fixed
- Fixed DeepSpeed with IterableDatasets (#7362)
- Fixed `Trainer.current_epoch` not getting restored after tuning (#7434)
- Fixed local rank displayed in console log (#7395)
Contributors
@akihironitta @awaelchli @leezu
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Lightning CLI, PyTorch Profiler, Improved Early Stopping
Today we are excited to announce Lightning 1.3, containing highly anticipated new features including a new Lightning CLI, improved TPU support, integrations such as PyTorch profiler, new early stopping strategies, predict and validate trainer routines, and more.
[1.3.0] - 2021-05-06
Added
- Added support for the `EarlyStopping` callback to run at the end of the training epoch (#6944)
- Added synchronization points before and after `setup` hooks are run (#7202)
- Added a `teardown` hook to `ClusterEnvironment` (#6942)
- Added utils for metrics to scalar conversions (#7180)
- Added utils for NaN/Inf detection for gradients and parameters (#6834)
- Added more explicit exception message when trying to execute `trainer.test()` or `trainer.validate()` with `fast_dev_run=True` (#6667)
- Added `LightningCLI` class to provide simple reproducibility with minimum boilerplate training CLI (#4492, #6862, #7156, #7299)
- Added `gradient_clip_algorithm` argument to Trainer for gradient clipping by value (#6123)
- Added a way to print to terminal without breaking up the progress bar (#5470)
- Added support to checkpoint after training steps in `ModelCheckpoint` callback (#6146)
- Added `TrainerStatus.{INITIALIZING,RUNNING,FINISHED,INTERRUPTED}` (#7173)
- Added `Trainer.validate()` method to perform one evaluation epoch over the validation set (#4948)
- Added `LightningEnvironment` for Lightning-specific DDP (#5915)
- Added `teardown()` hook to LightningDataModule (#4673)
- Added `auto_insert_metric_name` parameter to `ModelCheckpoint` (#6277)
- Added arg to `self.log` that enables users to give custom names when dealing with multiple dataloaders (#6274)
- Added `teardown` method to `BaseProfiler` to enable subclasses defining post-profiling steps outside of `__del__` (#6370)
- Added `setup` method to `BaseProfiler` to enable subclasses defining pre-profiling steps for every process (#6633)
- Added no return warning to predict (#6139)
- Added `Trainer.predict` config validation (#6543)
- Added `AbstractProfiler` interface (#6621)
- Added support for including module names for forward in the autograd trace of `PyTorchProfiler` (#6349)
- Added support for the PyTorch 1.8.1 autograd profiler (#6618)
- Added `outputs` parameter to callback's `on_validation_epoch_end` & `on_test_epoch_end` hooks (#6120)
- Added `configure_sharded_model` hook (#6679)
- Added support for `precision=64`, enabling training with double precision (#6595)
- Added support for DDP communication hooks (#6736)
- Added `artifact_location` argument to `MLFlowLogger` which will be passed to the `MlflowClient.create_experiment` call (#6677)
- Added `model` parameter to precision plugins' `clip_gradients` signature (#6764, #7231)
- Added `is_last_batch` attribute to `Trainer` (#6825)
- Added `LightningModule.lr_schedulers()` for manual optimization (#6567)
- Added `MpModelWrapper` in TPU Spawn (#7045)
- Added `max_time` Trainer argument to limit training time (#6823)
- Added `on_predict_{batch,epoch}_{start,end}` hooks (#7141)
- Added new `EarlyStopping` parameters `stopping_threshold` and `divergence_threshold` (#6868)
- Added `debug` flag to TPU Training Plugins (PT_XLA_DEBUG) (#7219)
- Added new `UnrepeatedDistributedSampler` and `IndexBatchSamplerWrapper` for tracking distributed predictions (#7215)
- Added `trainer.predict(return_predictions=None|False|True)` (#7215)
- Added `BasePredictionWriter` callback to implement prediction saving (#7127)
- Added `trainer.tune(scale_batch_size_kwargs, lr_find_kwargs)` arguments to configure the tuning algorithms (#7258)
- Added `tpu_distributed` check for TPU Spawn barrier (#7241)
- Added device updates to TPU Spawn for Pod training (#7243)
- Added warning when missing `Callback` and using `resume_from_checkpoint` (#7254)
- DeepSpeed single file saving (#6900)
- Added Training type Plugins Registry (#6982, #7063, #7214, #7224)
- Add `ignore` param to `save_hyperparameters` (#6056)
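The new `EarlyStopping` parameters `stopping_threshold` and `divergence_threshold` listed above can be pictured with a small, library-free sketch. It assumes a monitored metric where lower is better. `should_stop` and its return messages are hypothetical names chosen for illustration. This is not Lightning's implementation.

```python
from typing import Optional, Tuple


def should_stop(
    current: float,
    stopping_threshold: Optional[float] = None,
    divergence_threshold: Optional[float] = None,
) -> Tuple[bool, str]:
    """Hypothetical helper: decide whether to stop, assuming lower metric values are better."""
    # Stop as soon as the metric is already good enough.
    if stopping_threshold is not None and current < stopping_threshold:
        return True, "stopping threshold reached"
    # Stop as soon as the metric has become worse than an acceptable bound.
    if divergence_threshold is not None and current > divergence_threshold:
        return True, "divergence threshold crossed"
    return False, "keep training"


# Example: a validation loss of 0.05 is below the 0.1 target, so training stops.
decision, reason = should_stop(0.05, stopping_threshold=0.1, divergence_threshold=10.0)
```

In this sketch the two thresholds are independent of any patience counter: either one ends training on the first check that crosses it.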
Changed
- Changed `LightningModule.truncated_bptt_steps` to be property (#7323)
- Changed `EarlyStopping` callback from by default running `EarlyStopping.on_validation_end` if only training is run. Set `check_on_train_epoch_end` to run the callback at the end of the train epoch instead of at the end of the validation epoch (#7069)
- Renamed `pytorch_lightning.callbacks.swa` to `pytorch_lightning.callbacks.stochastic_weight_avg` (#6259)
- Refactor `RunningStage` and `TrainerState` usage (#4945, #7173)
  - Added `RunningStage.SANITY_CHECKING`
  - Added `TrainerFn.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}`
  - Changed `trainer.evaluating` to return `True` if validating or testing
- Changed `setup()` and `teardown()` stage argument to take any of `{fit,validate,test,predict}` (#6386)
- Changed profilers to save separate report files per state and rank (#6621)
- The trainer no longer tries to save a checkpoint on exception or run callback's `on_train_end` functions (#6864)
- Changed `PyTorchProfiler` to use `torch.autograd.profiler.record_function` to record functions (#6349)
- Disabled `lr_scheduler.step()` in manual optimization (#6825)
- Changed warnings and recommendations for dataloaders in `ddp_spawn` (#6762)
- `pl.seed_everything` will now also set the seed on the `DistributedSampler` (#7024)
- Changed default setting for communication of multi-node training using `DDPShardedPlugin` (#6937)
- `trainer.tune()` now returns the tuning result (#7258)
- `LightningModule.from_datasets()` now accepts `IterableDataset` instances as training datasets. (#7503)
- Changed `resume_from_checkpoint` warning to an error when the checkpoint file does not exist (#7075)
- Automatically set `sync_batchnorm` for `training_type_plugin` (#6536)
- Allowed training type plugin to delay optimizer creation (#6331)
- Removed ModelSummary validation from train loop on_trainer_init (#6610)
- Moved `save_function` to accelerator (#6689)
- Updated DeepSpeed ZeRO (#6546, #6752, #6142, #6321)
- Improved verbose logging for `EarlyStopping` callback (#6811)
- Run ddp_spawn dataloader checks on Windows (#6930)
- Updated mlflow with using `resolve_tags` (#6746)
- Moved `save_hyperparameters` to its own function (#7119)
- Replaced `_DataModuleWrapper` with `__new__` (#7289)
- Reset `current_fx` properties on lightning module in teardown (#7247)
- Auto-set `DataLoader.worker_init_fn` with `seed_everything` (#6960)
- Remove `model.trainer` call inside of dataloading mixin (#7317)
- Split profilers module (#6261)
- Ensure accelerator is valid if running interactively (#5970)
- Disabled batch transfer in DP mode (#6098)
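The Changed list above notes that the `stage` argument of `setup()` and `teardown()` can now be any of `fit`, `validate`, `test`, or `predict` (#6386). The sketch below shows one way a data module might branch on it. `SketchDataModule` is a hypothetical stand-in class, not a Lightning class. Treating `stage=None` as "prepare every split" is an assumption of this example.

```python
from typing import List, Optional

# Stage names taken from the changelog entry above.
VALID_STAGES = ("fit", "validate", "test", "predict")


class SketchDataModule:
    """Hypothetical stand-in for a data module; not a Lightning class."""

    def __init__(self) -> None:
        self.prepared: List[str] = []

    def setup(self, stage: Optional[str] = None) -> None:
        # Assumption of this sketch: stage=None means "prepare every split".
        if stage is not None and stage not in VALID_STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        if stage in (None, "fit"):
            self.prepared.append("train")
        if stage in (None, "fit", "validate"):
            self.prepared.append("val")
        if stage in (None, "test"):
            self.prepared.append("test")
        if stage in (None, "predict"):
            self.prepared.append("predict")


dm = SketchDataModule()
dm.setup("validate")  # only the validation split is prepared
```

Branching this way lets a standalone validation run skip building the training split.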
Deprecated
- Deprecated `outputs` in both `LightningModule.on_train_epoch_end` and `Callback.on_train_epoch_end` hooks (#7339)
- Deprecated `Trainer.truncated_bptt_steps` in favor of `LightningModule.truncated_bptt_steps` (#7323)
- Deprecated `LightningModule.grad_norm` in favor of `pytorch_lightning.utilities.grads.grad_norm` (#7292)
- Deprecated the `save_function` property from the `ModelCheckpoint` callback (#7201)
- Deprecated `LightningModule.write_predictions` and `LightningModule.write_predictions_dict` (#7066)
- Deprecated `TrainerLoggingMixin` in favor of a separate utilities module for metric handling (#7180)
- Deprecated `TrainerTrainingTricksMixin` in favor of a separate utilities module for NaN/Inf detection for gradients and parameters (#6834)
- `period` has been deprecated in favor of `every_n_val_epochs` in the `ModelCheckpoint` callback (#6146)
- Deprecated `trainer.running_sanity_check` in favor of `trainer.sanity_checking` (#4945)
- Deprecated `Profiler(output_filename)` in favor of `dirpath` and `filename` (#6621)
- Deprecated `PytorchProfiler(profiled_functions)` in favor of `record_functions` (#6349)
- Deprecated `@auto_move_data` in favor of `trainer.predict` (#6993)
- Deprecated `Callback.on_load_checkpoint(checkpoint)` in favor of `Callback.on_load_checkpoint(trainer, pl_module, checkpoint)` (#7253)
- Deprecated metrics in favor of `torchmetrics` (#6505, #6530, #6540, #6547, #6515, #6572, #6573, #6584, #6636, #6637, #6649, #6659, #7131)
- Deprecated the `LightningModule.datamodule` getter and setter methods; access them through `Trainer.datamodule` instead (#7168)
- Deprecated the use of `Trainer(gpus="i")` (string) for selecting the i-th GPU; from v1.5 this will set the number of GPUs instead of the index (#6388)
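The last deprecation above changes what a single-number string passed as `gpus` means. The library-free sketch below contrasts the two readings. Both function names are hypothetical. The choice of the first `i` device indices in the count-based reading is an assumption of this example, and comma-separated strings are not covered.

```python
from typing import List


def gpus_from_string_index(gpus: str) -> List[int]:
    """Deprecated reading: the string "i" selects the single GPU with index i."""
    return [int(gpus)]


def gpus_from_string_count(gpus: str) -> List[int]:
    """Reading announced for v1.5: the string "i" is a GPU count.

    Assumption of this sketch: a count of i maps to device indices 0..i-1.
    """
    return list(range(int(gpus)))


# The same argument value selects different devices under the two readings.
old_selection = gpus_from_string_index("3")
new_selection = gpus_from_string_count("3")
```

Code that relies on the index-based reading can avoid the ambiguity by stating the intended devices explicitly instead of passing a bare number as a string.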
Removed
- Removed the `exp_save_path` property from the `LightningModule` (#7266)
- Removed training loop explicitly calling `EarlyStopping.on_validation_end` if no validation is run (#7069)
- Removed `automatic_optimization` as a property from the training loop in favor of `LightningModule.automatic_optimization` (#7130)
- Removed evaluation loop legacy returns for `*_epoch_end` hooks (#6973)
- Removed support for passing a bool value to `profiler` argument of Trainer (#6164)
- Removed no return warning from val/test step (#6139)
- Removed passing a `ModelCheckpoint` instance to `Trainer(checkpoint_callback)` (#6166)
- Removed deprecated Trainer argument `enable_pl_optimizer` and `automatic_optimization` (#6163)
- Removed deprecated metrics (#6161)
  - from `pytorch_lightning.metrics.functional.classification` removed `to_onehot`, `to_categorical`, `get_num_classes`, `roc`, `multiclass_roc`, `average_precision`, `precision_recall_curve`, `multiclass_precision_recall_curve`
  - from `pytorch_lightning.metrics.functional.reduction` removed `reduce`, `class_reduce`
- Removed deprecated `ModelCheckpoint` arguments `prefix`, `mode="auto"` (#6162)
- Removed `mode='auto'` from `EarlyStopping` (#6167)
- Removed `epoch` and `step` argume...
Quick patch release
Fixes the missing `packaging` package in the dependencies, which only affected installation on a very bare system.
Standard weekly patch release
Standard weekly patch release
[1.2.8] - 2021-04-14
Added
- Added TPUSpawn + IterableDataset error message (#6875)
Fixed
- Fixed process rank not being available right away after `Trainer` instantiation (#6941)
- Fixed `sync_dist` for tpus (#6950)
- Fixed `AttributeError` for `require_backward_grad_sync` when running manual optimization with sharded plugin (#6915)
- Fixed `--gpus` default for parser returned by `Trainer.add_argparse_args` (#6898)
- Fixed TPU Spawn all gather (#6896)
- Fixed `EarlyStopping` logic when `min_epochs` or `min_steps` requirement is not met (#6705)
- Fixed csv extension check (#6436)
- Fixed checkpoint issue when using Horovod distributed backend (#6958)
- Fixed tensorboard exception raising (#6901)
- Fixed setting the eval/train flag correctly on accelerator model (#6983)
- Fixed DDP_SPAWN compatibility with bug_report_model.py (#6892)
- Fixed bug where `BaseFinetuning.flatten_modules()` was duplicating leaf node parameters (#6879)
- Set better defaults for `rank_zero_only.rank` when training is launched with SLURM and torchelastic:
Contributors
@ananthsub @awaelchli @ethanwharris @justusschock @kandluis @kaushikb11 @liob @SeanNaren @skmatz
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.2.7] - 2021-04-06
Fixed
- Fixed a bug with omegaconf and `xm.save` (#6741)
- Fixed an issue with IterableDataset when `__len__` is not defined (#6828)
- Sanitize None params during pruning (#6836)
- Enforce an epoch scheduler interval when using SWA (#6588)
- Fixed TPU Colab hang issue, post training (#6816)
- Fixed a bug where `TensorBoardLogger` would give a warning and not log correctly to a symbolic link `save_dir` (#6730)
Contributors
@awaelchli, @ethanwharris, @karthikprasad, @kaushikb11, @mibaumgartner, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.2.6] - 2021-03-30
Changed
- Changed the behavior of `on_epoch_start` to run at the beginning of validation & test epoch (#6498)
Removed
- Removed legacy code to include `step` dictionary returns in `callback_metrics`. Use `self.log_dict` instead. (#6682)
Fixed
- Fixed `DummyLogger.log_hyperparams` raising a `TypeError` when running with `fast_dev_run=True` (#6398)
- Fixed error on TPUs when there was no `ModelCheckpoint` (#6654)
- Fixed `trainer.test` freeze on TPUs (#6654)
- Fixed a bug where gradients were disabled after calling `Trainer.predict` (#6657)
- Fixed bug where no TPUs were detected in a TPU pod env (#6719)
Contributors
@awaelchli, @carmocca, @ethanwharris, @kaushikb11, @rohitgr7, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Weekly patch release - torchmetrics compatibility
[1.2.5] - 2021-03-23
Changed
- Added Autocast in validation, test and predict modes for Native AMP (#6565)
- Update Gradient Clipping for the TPU Accelerator (#6576)
- Refactored setup to be typing friendly (#6590)
Fixed
- Fixed a bug where `all_gather` would not work correctly with `tpu_cores=8` (#6587)
- Fixed comparing required versions (#6434)
- Fixed duplicate logs appearing in console when using the python logging module (#6275)
Contributors
@awaelchli, @Borda, @ethanwharris, @justusschock, @kaushikb11
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.2.4] - 2021-03-16
Changed
- Changed the default of `find_unused_parameters` back to `True` in DDP and DDP Spawn (#6438)
Fixed
- Expose DeepSpeed loss parameters to allow users to fix loss instability (#6115)
- Fixed DP reduction with collection (#6324)
- Fixed an issue where the tuner would not tune the learning rate if also tuning the batch size (#4688)
- Fixed broadcast to use PyTorch `broadcast_object_list` and add `reduce_decision` (#6410)
- Fixed logger creating directory structure too early in DDP (#6380)
- Fixed DeepSpeed additional memory use on rank 0 when default device not set early enough (#6460)
- Fixed `DummyLogger.log_hyperparams` raising a `TypeError` when running with `fast_dev_run=True` (#6398)
- Fixed an issue with `Tuner.scale_batch_size` not finding the batch size attribute in the datamodule (#5968)
- Fixed an exception in the layer summary when the model contains torch.jit scripted submodules (#6511)
- Fixed an issue where the train loop config was run during `Trainer.predict` (#6541)
Contributors
@awaelchli, @kaushikb11, @Palzer, @SeanNaren, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.2.3] - 2021-03-09
Fixed
- Fixed `ModelPruning(make_pruning_permanent=True)` pruning buffers getting removed when saved during training (#6073)
- Fixed `_stable_1d_sort` to work when `n >= N` (#6177)
- Fixed `AttributeError` when `logger=None` on TPU (#6221)
- Fixed PyTorch Profiler with `emit_nvtx` (#6260)
- Fixed `trainer.test` from `best_path` hanging after calling `trainer.fit` (#6272)
- Fixed `SingleTPU` calling `all_gather` (#6296)
- Ensure we check deepspeed/sharded in multinode DDP (#6297)
- Check `LightningOptimizer` doesn't delete optimizer hooks (#6305)
- Resolve memory leak for evaluation (#6326)
- Ensure that clip gradients is only called if the value is greater than 0 (#6330)
- Fixed `Trainer` not resetting `lightning_optimizers` when calling `Trainer.fit()` multiple times (#6372)
Contributors
@awaelchli, @carmocca, @chizuchizu, @frankier, @SeanNaren, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]