Releases: Lightning-AI/pytorch-lightning

standard weekly patch release

15 Dec 23:32
748a74e

Overview

Detail changes

Added

  • Add a notebook example to reach a quick baseline of ~94% accuracy on CIFAR10 using Resnet in Lightning (#4818)

Changed

  • Simplify accelerator steps (#5015)
  • Refactor load in checkpoint connector (#4593)

Removed

  • Drop duplicate metrics (#5014)
  • Remove beta arg from F1 class and functional (#5076)

Fixed

  • Fixed trainer by default None in DDPAccelerator (#4915)
  • Fixed LightningOptimizer to expose optimizer attributes (#5095)
  • Do not warn when the name key is used in the lr_scheduler dict (#5057)
  • Check if optimizer supports closure (#4981)
  • Extend LightningOptimizer to expose underlying Optimizer attributes + update doc (#5095)
  • Add deprecated metric utility functions back to functional (#5067, #5068)
  • Allow any input in to_onnx and to_torchscript (#4378)

Contributors

@Borda, @carmocca, @hemildesai, @rohitgr7, @s-rog, @tarepan, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Model Parallelism Training and More Logging Options

10 Dec 01:05
cdbddbe

Overview

Lightning 1.1 is out! You can now train models with twice the parameters and zero code changes with the new sharded model training! We also have a new plugin for sequential model parallelism, more logging options, and a lot of improvements!
Release highlights: https://bit.ly/3gyLZpP

Learn more about sharded training: https://bit.ly/2W3hgI0

Detail changes

Added

  • Added "monitor" key to saved ModelCheckpoints (#4383)
  • Added ConfusionMatrix class interface (#4348)
  • Added multiclass AUROC metric (#4236)
  • Added global step indexing to the checkpoint name for a better sub-epoch checkpointing experience (#3807)
  • Added optimizer hooks in callbacks (#4379)
  • Added option to log momentum (#4384)
  • Added current_score to ModelCheckpoint.on_save_checkpoint (#4721)
  • Added logging using self.log in train and evaluation for epoch end hooks (#4913)
  • Added ability for DDP plugin to modify optimizer state saving (#4675)
  • Added casting to python types for NumPy scalars when logging hparams (#4647)
  • Added prefix argument in loggers (#4557)
  • Added printing of total num of params, trainable and non-trainable params in ModelSummary (#4521)
  • Added PrecisionRecallCurve, ROC, AveragePrecision class metric (#4549)
  • Added custom Apex and NativeAMP as Precision plugins (#4355)
  • Added DALI MNIST example (#3721)
  • Added sharded plugin for DDP for multi-GPU training memory optimizations (#4773)
  • Added experiment_id to the NeptuneLogger (#3462)
  • Added PyTorch Geometric integration example with Lightning (#4568)
  • Added all_gather method to LightningModule which allows gradient-based tensor synchronizations for use-cases such as negative sampling. (#5012)
  • Enabled self.log in most functions (#4969)
  • Added changeable extension variable for ModelCheckpoint (#4977)
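
To illustrate why indexing checkpoint names by global step helps with sub-epoch checkpointing, here is a minimal sketch. The helper `checkpoint_name` and its exact naming pattern are assumptions made for this example, not the library's implementation.

```python
def checkpoint_name(epoch: int, global_step: int, extension: str = ".ckpt") -> str:
    """Build a file name that stays unique across several saves in one epoch.

    Hypothetical helper: the pattern below is an assumption for illustration.
    """
    return f"epoch={epoch}-step={global_step}{extension}"


# Two saves inside the same epoch no longer collide, because the step differs.
names = [checkpoint_name(0, step) for step in (499, 999)]
print(names)

# The extension is a plain parameter here, mirroring the idea of a changeable extension.
print(checkpoint_name(1, 1499, extension=".pt"))
```

With an epoch-only name, the second save in an epoch would overwrite the first. Adding the step keeps both.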

Changed

  • Removed multiclass_roc and multiclass_precision_recall_curve, use roc and precision_recall_curve instead (#4549)
  • Tuner algorithms will be skipped if fast_dev_run=True (#3903)
  • WandbLogger does not force wandb reinit arg to True anymore and creates a run only when needed (#4648)
  • Changed automatic_optimization to be a model attribute (#4602)
  • Changed Simple Profiler report to order by percentage time spent + num calls (#4880)
  • Simplified optimization logic (#4984)
  • Classification metrics overhaul (#4837)
  • Updated fast_dev_run to accept integer representing num_batches (#4629)
  • Refactored optimizer (#4658)
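
The fast_dev_run change above means the flag can be either a boolean or a batch count. A minimal sketch of how such a flag could be normalized follows. The function `resolve_fast_dev_run` is hypothetical, and the mapping of True to a single batch is an assumption of this example rather than something stated in these notes.

```python
from typing import Union


def resolve_fast_dev_run(value: Union[bool, int]) -> int:
    """Return how many batches a quick debugging run should process.

    Hypothetical helper. A result of 0 means the debugging mode is off.
    """
    # Check bool first: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return 1 if value else 0  # assumption: True means one batch
    if value < 0:
        raise ValueError(f"fast_dev_run must be >= 0, got {value}")
    return value


for flag in (False, True, 7):
    print(flag, "->", resolve_fast_dev_run(flag))
```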

Deprecated

  • Deprecated prefix argument in ModelCheckpoint (#4765)
  • Deprecated the old way of assigning hyper-parameters through self.hparams = ... (#4813)
  • Deprecated mode='auto' from ModelCheckpoint and EarlyStopping (#4695)

Removed

  • Removed reorder parameter of the auc metric (#5004)

Fixed

  • Added feature to move tensors to CPU before saving (#4309)
  • Fixed LoggerConnector to have logged metrics on root device in DP (#4138)
  • Auto convert tensors to contiguous format when gather_all (#4907)
  • Fixed PYTHONPATH for DDP test model (#4528)
  • Fixed allowing logger to support indexing (#4595)
  • Fixed DDP and manual_optimization (#4976)

Contributors

@ananyahjha93, @awaelchli, @blatr, @Borda, @borisdayma, @carmocca, @ddrevicky, @george-gca, @gianscarpe, @irustandi, @janhenriklambrechts, @jeremyjordan, @justusschock, @lezwon, @rohitgr7, @s-rog, @SeanNaren, @SkafteNicki, @tadejsv, @tchaton, @williamFalcon, @zippeurfou

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

24 Nov 16:55

Detail changes

Added

  • Added casting to python types for numpy scalars when logging hparams (#4647)
  • Added warning when progress bar refresh rate is less than 20 on Google Colab to prevent crashing (#4654)
  • Added F1 class metric (#4656)

Changed

  • Consistently use step=trainer.global_step in LearningRateMonitor independently of logging_interval (#4376)
  • Metric states are no longer added to state_dict by default (#4685)
  • Renamed class metric Fbeta to FBeta (#4656)
  • Model summary: add 1 decimal place (#4745)
  • Do not override PYTHONWARNINGS (#4700)

Fixed

  • Fixed checkpoint hparams dict casting when omegaconf is available (#4770)
  • Fixed incomplete progress bars when total batches not divisible by refresh rate (#4577)
  • Updated SSIM metric (#4566, #4656)
  • Fixed batch_arg_name bug - add batch_arg_name to all calls to _adjust_batch_size (#4812)
  • Fixed torchtext data to GPU (#4785)
  • Fixed a crash bug in MLFlow logger (#4716)
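
The incomplete progress bar fix above concerns epochs whose batch count is not a multiple of the refresh rate. The sketch below shows one way to handle that case, assuming the bar is advanced in chunks of `refresh_rate`. The function `progress_updates` is a hypothetical illustration, not the code from the fix.

```python
from typing import List


def progress_updates(total_batches: int, refresh_rate: int) -> List[int]:
    """Return the increments a progress bar would receive over one epoch."""
    if refresh_rate <= 0:
        raise ValueError("refresh_rate must be positive")
    updates = []
    for batch_idx in range(1, total_batches + 1):
        # Refresh only every `refresh_rate` batches to limit redraws.
        if batch_idx % refresh_rate == 0:
            updates.append(refresh_rate)
    # Without this final partial update the bar would stop short of 100%.
    remainder = total_batches % refresh_rate
    if remainder:
        updates.append(remainder)
    return updates


updates = progress_updates(total_batches=10, refresh_rate=3)
print(updates, "total:", sum(updates))
```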

Contributors

@awaelchli, @jonashaag, @jungwhank, @M-Salti, @moi90, @pgagarinov, @s-rog, @Samyak2, @SkafteNicki, @teddykoker, @ydcjeff

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

17 Nov 21:57

Detail changes

Added

  • Added lambda closure to manual_optimizer_step (#4618)

Changed

  • Change Metrics persistent default mode to False (#4685)

Fixed

  • Prevent crash if sync_dist=True on CPU (#4626)
  • Fixed average pbar Metrics (#4534)
  • Fixed setup callback hook to correctly pass the LightningModule through (#4608)
  • Allowed decorating the model init when hparams are saved inside it (#4662)
  • Fixed split_idx set by LoggerConnector in on_trainer_init to Trainer (#4697)

Contributors

@ananthsub, @Borda, @SeanNaren, @SkafteNicki, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

11 Nov 13:16

Detail changes

Added

  • Added metrics aggregation in Horovod and fixed early stopping (#3775)
  • Added manual_optimizer_step, which works with AMP Native and accumulated_grad_batches (#4485)
  • Added persistent(mode) method to metrics, to enable and disable metric states being added to state_dict (#4482)
  • Added congratulations at the end of our notebooks (#4555)

Fixed

  • Fixed feature-lack in hpc_load (#4526)
  • Fixed metrics states being overridden in DDP mode (#4482)
  • Fixed lightning_getattr, lightning_hasattr not finding the correct attributes in datamodule (#4347)
  • Fixed automatic optimization AMP by manual_optimization_step (#4485)
  • Replace MisconfigurationException with warning in ModelCheckpoint Callback (#4560)
  • Fixed logged keys in mlflow logger (#4412)
  • Fixed is_picklable by catching AttributeError (#4508)
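
The is_picklable fix above refers to catching AttributeError during a pickling check. A minimal sketch of such a check follows. It is a self-contained illustration under that assumption and may differ from the library's helper.

```python
import pickle


def is_picklable(obj: object) -> bool:
    """Return True if `obj` can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Some objects fail with AttributeError rather than PicklingError,
        # so both are treated as "not picklable" here.
        return False


print(is_picklable({"lr": 0.1}))    # plain data
print(is_picklable(lambda x: x))    # anonymous function
```

Other failure types, such as a TypeError raised for some objects, would still propagate in this sketch.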

Contributors

@dscarmo, @jtamir, @kazhang, @maxjeblick, @rohitgr7, @SkafteNicki, @tarepan, @tchaton, @tgaddair, @williamFalcon

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

04 Nov 02:00

Detail changes

Added

  • Added PyTorch 1.7 Stable support (#3821)
  • Added timeout for tpu_device_exists to ensure process does not hang indefinitely (#4340)

Changed

  • W&B log in sync with Trainer step (#4405)
  • Hook on_after_backward is called only when optimizer_step is being called (#4439)
  • Moved track_and_norm_grad into training loop and called only when optimizer_step is being called (#4439)
  • Changed type checker with explicit cast of ref_model object (#4457)

Deprecated

  • Deprecated passing ModelCheckpoint instance to checkpoint_callback Trainer argument (#4336)

Fixed

  • Disable saving checkpoints if not trained (#4372)
  • Fixed error using auto_select_gpus=True with gpus=-1 (#4209)
  • Disabled training when limit_train_batches=0 (#4371)
  • Fixed metrics so that they no longer store the computational graph for all seen data (#4313)
  • Fixed AMP unscale for on_after_backward (#4439)
  • Fixed TorchScript export when module includes Metrics (#4428)
  • Fixed CSV logger warning (#4419)
  • Fixed skip DDP parameter sync (#4301)

Contributors

@ananthsub, @awaelchli, @borisdayma, @carmocca, @justusschock, @lezwon, @rohitgr7, @SeanNaren, @SkafteNicki, @ssaru, @tchaton, @ydcjeff

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

27 Oct 22:15
5d10a36

Detail changes

Added

  • Added dirpath and filename parameter in ModelCheckpoint (#4213)
  • Added plugins docs and DDPPlugin to customize ddp across all accelerators (#4258)
  • Added strict option to the scheduler dictionary (#3586)
  • Added fsspec support for profilers (#4162)
  • Added autogenerated helptext to Trainer.add_argparse_args (#4344)
  • Added support for string values in Trainer's profiler parameter (#3656)

Changed

  • Improved error messages for invalid configure_optimizers returns (#3587)
  • Allow changing the logged step value in validation_step (#4130)
  • Allow setting replace_sampler_ddp=True with a distributed sampler already added (#4273)
  • Fixed sanitized parameters for WandbLogger.log_hyperparams (#4320)

Deprecated

  • Deprecated filepath in ModelCheckpoint (#4213)
  • Deprecated reorder parameter of the auc metric (#4237)
  • Deprecated bool values in Trainer's profiler parameter (#3656)

Fixed

  • Fixed setting device ids in DDP (#4297)
  • Fixed synchronization of best model path in ddp_accelerator (#4323)
  • Fixed WandbLogger not uploading checkpoint artifacts at the end of training (#4341)

Contributors

@ananthsub, @awaelchli, @carmocca, @ddrevicky, @louis-she, @mauvilsa, @rohitgr7, @SeanNaren, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

20 Oct 23:12
e0e402d

Detail changes

Added

  • Added persistent flag to Metric.add_state (#4195)

Changed

  • Used checkpoint_connector.hpc_save in SLURM (#4217)
  • Moved base req. to root (#4219)

Fixed

  • Fixed hparams assign in init (#4189)
  • Fixed overwrite check for model hooks (#4010)

Contributors

@Borda, @EspenHa, @teddykoker

If we forgot someone due to not matching commit email with GitHub account, let us know :]

fixes a major logging bug for val in 1.0

15 Oct 14:59
5c153c2

Fixes the last major bugs for validation logging.
Also removes duplicate charts for metric / metric_loss.
Doing this minor release because correct validation metrics logging is critical.

Detail changes

Added

  • Added trace functionality to the function to_torchscript (#4142)

Changed

  • Called on_load_checkpoint before loading state_dict (#4057)

Removed

  • Removed duplicate metric vs step log for train loop (#4173)

Fixed

  • Fixed the self.log problem in validation_step() (#4169)
  • Fixed hparams saving - save the state when save_hyperparameters() is called [in __init__] (#4163)
  • Fixed runtime failure while exporting hparams to yaml (#4158)

Contributors

@Borda, @NumesSanguis, @rohitgr7, @williamFalcon

If we forgot someone due to not matching commit email with GitHub account, let us know :]

minor jit fixes

14 Oct 00:47
bbbc111

Obligatory post-1.0 minor release. The main fix makes the Lightning module fully compatible with JIT (there were some edge cases we had not covered).