Releases: Lightning-AI/pytorch-lightning
Weekly bugfix release
[0.5.5] - 2022-08-09
Deprecated
- Deprecated the sheety API (#14004)
Fixed
- Resolved a bug where the work statuses would grow quickly and be duplicated (#13970)
- Resolved a race condition when sending the work state through the caller_queue (#14074)
- Fixed starting a Lightning App on the cloud when the repo name begins with "Lightning" (#14025)
Contributors
If we forgot someone due to not matching commit email with GitHub account, let us know :]
PyTorch Lightning 1.7: Apple Silicon support, Native FSDP, Collaborative training, and multi-GPU support with Jupyter notebooks
The core team is excited to announce the release of PyTorch Lightning 1.7 ⚡
PyTorch Lightning 1.7 is the culmination of work from 106 contributors who have worked on features, bug-fixes, and documentation for a total of over 492 commits since 1.6.0.
Highlights
Apple Silicon Support
For those using PyTorch 1.12 on M1 or M2 Apple machines, we have created the `MPSAccelerator`. `MPSAccelerator` enables accelerated GPU training on Apple's Metal Performance Shaders (MPS) backend.
NOTE
Support for this accelerator is currently marked as experimental in PyTorch. Because many operators are still missing, you may run into a few rough edges.
```python
import pytorch_lightning as pl

# Selects the accelerator
trainer = pl.Trainer(accelerator="mps")

# Equivalent to
from pytorch_lightning.accelerators import MPSAccelerator
trainer = pl.Trainer(accelerator=MPSAccelerator())

# Defaults to "mps" when run on M1 or M2 Apple machines
# to avoid code changes when switching computers
trainer = pl.Trainer(accelerator="gpu")
```
Native Fully Sharded Data Parallel Strategy
PyTorch 1.12 also added native support for Fully Sharded Data Parallel (FSDP). Previously, PyTorch Lightning enabled this by using the `fairscale` project. You can now choose between both options.
NOTE
Support for this strategy is marked as beta in PyTorch.
```python
# Native PyTorch implementation
trainer = pl.Trainer(strategy="fsdp_native")

# Equivalent to
from pytorch_lightning.strategies import DDPFullyShardedNativeStrategy
trainer = pl.Trainer(strategy=DDPFullyShardedNativeStrategy())

# For reference, FairScale's implementation can be used with
trainer = pl.Trainer(strategy="fsdp")
```
A Collaborative Training strategy using Hivemind
Collaborative Training solves the need for top-tier multi-GPU servers by allowing you to train across unreliable machines such as local ones or even preemptible cloud compute across the Internet.
Under the hood, we use Hivemind, which provides decentralized training across the Internet.
```python
from pytorch_lightning.strategies import HivemindStrategy

trainer = pl.Trainer(
    strategy=HivemindStrategy(target_batch_size=8192),
    accelerator="gpu",
    devices=1,
)
```
For more information, check out the docs.
Distributed support in Jupyter Notebooks
So far, the only multi-GPU strategy supported in Jupyter notebooks (including Grid.ai, Google Colab, and Kaggle, for example) has been the Data-Parallel (DP) strategy (`strategy="dp"`). DP, however, has several limitations that often obstruct users' workflows: it can be slow, it is incompatible with TorchMetrics, it does not persist state changes on replicas, and it is difficult to use with non-primitive input and output structures.
In this release, we've added support for Distributed Data Parallel in Jupyter notebooks using the fork mechanism to address these shortcomings. This is only available for MacOS and Linux (sorry Windows!).
NOTE
This feature is experimental.
This is how you use multi-device in notebooks now:
```python
# Train on 2 GPUs in a Jupyter notebook
trainer = pl.Trainer(accelerator="gpu", devices=2)

# Can be set explicitly
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_notebook")

# Can also be used in non-interactive environments
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_fork")
```
By default, the Trainer detects the interactive environment and selects the right strategy for you. Learn more in the full documentation.
Versioning of "last" checkpoints
If a run is configured to save to the same directory as a previous run and `ModelCheckpoint(save_last=True)` is enabled, the "last" checkpoint is now versioned with a simple `-v1` suffix to avoid overwriting the existing "last" checkpoint. This mimics the behaviour for checkpoints that monitor a metric.
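For illustration, a minimal sketch of the behaviour under default `ModelCheckpoint` naming (the directory and `model` are placeholders):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Both runs save their "last" checkpoint into the same directory
checkpoint_callback = ModelCheckpoint(dirpath="checkpoints/", save_last=True)
trainer = pl.Trainer(max_epochs=1, callbacks=[checkpoint_callback])
trainer.fit(model)  # `model` is a user-defined LightningModule

# The first run writes checkpoints/last.ckpt; a second run pointed at the
# same directory writes checkpoints/last-v1.ckpt instead of overwriting it.
```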
Automatically reload the "last" checkpoint
In certain scenarios, like when running in a cloud spot instance with fault-tolerant training enabled, it is useful to load the latest available checkpoint. It is now possible to pass the string `ckpt_path="last"` in order to load the latest available checkpoint from the set of existing checkpoints.
```python
trainer = Trainer(...)
trainer.fit(..., ckpt_path="last")
```
Validation every N batches across epochs
In some cases, for example iteration-based training, it is useful to run validation after every N training batches without being limited by the epoch boundary. Now, you can enable validation based on total training batches.
```python
trainer = Trainer(..., val_check_interval=N, check_val_every_n_epoch=None)
trainer.fit(...)
```
For example, given 5 epochs of 10 batches, setting `N=25` would run validation in the 3rd and 5th epochs.
CPU stats monitoring
PyTorch Lightning provides the `DeviceStatsMonitor` callback to monitor the stats of the hardware currently used. However, users often also want to monitor the stats of other hardware. In this release, we have added an option to additionally monitor CPU stats:
```python
from pytorch_lightning.callbacks import DeviceStatsMonitor

# Log both CPU stats and GPU stats
trainer = pl.Trainer(callbacks=DeviceStatsMonitor(cpu_stats=True), accelerator="gpu")

# Log just the GPU stats
trainer = pl.Trainer(callbacks=DeviceStatsMonitor(cpu_stats=False), accelerator="gpu")

# Equivalent to `DeviceStatsMonitor()`
trainer = pl.Trainer(callbacks=DeviceStatsMonitor(cpu_stats=True), accelerator="cpu")
```
The CPU stats are gathered using the `psutil` package.
Automatic distributed samplers
It is now possible to use custom samplers in a distributed environment without the need to set `replace_sampler_ddp=False` and wrap your sampler manually with `DistributedSampler`.
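A hedged sketch of what this enables (the toy dataset, sampler, and `model` here are illustrative placeholders):

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# A custom sampler passed straight to the DataLoader
dataset = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))
sampler = WeightedRandomSampler(weights=torch.rand(100), num_samples=100)
train_loader = DataLoader(dataset, batch_size=8, sampler=sampler)

# With the default replace_sampler_ddp=True, Lightning now distributes the
# custom sampler across processes; no manual DistributedSampler wrapping.
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")
trainer.fit(model, train_loader)  # `model` is a user-defined LightningModule
```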
Inference mode support
PyTorch 1.9 introduced `torch.inference_mode`, which is a faster alternative to `torch.no_grad`. Lightning will now use `inference_mode` wherever possible during evaluation.
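For context, here is the difference in plain PyTorch (a minimal sketch; Lightning applies this internally, so no user-code changes are needed):

```python
import torch

model = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)

# torch.no_grad() only disables gradient tracking
with torch.no_grad():
    y = model(x)

# torch.inference_mode() additionally skips view and version-counter
# bookkeeping, making it faster; tensors created inside can never be
# used in autograd afterwards
with torch.inference_mode():
    y = model(x)
```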
Support for warn-level determinism
In PyTorch 1.11, operations that do not have a deterministic implementation can be set to throw a warning instead of an error when run in deterministic mode. This is now supported by our `Trainer`:
```python
trainer = pl.Trainer(deterministic="warn")
```
LightningCLI improvements
After the latest updates to `jsonargparse`, the library supporting the `LightningCLI`, there's now complete support for shorthand notation. This includes automatic support for shorthand notation for all arguments, not just the ones that are part of the registries, plus support inside configuration files.
```diff
+ # pytorch_lightning==1.7.0
  trainer:
    callbacks:
-     - class_path: pytorch_lightning.callbacks.EarlyStopping
+     - class_path: EarlyStopping
        init_args:
          monitor: "loss"
```
A header with the version that generated the config is now included.
All subclasses for a given base class can be specified by name, so there's no need to explicitly register them. The only requirement is that the module where the subclass is defined is imported prior to parsing.
```python
from pytorch_lightning.cli import LightningCLI

import my_code.models
import my_code.optimizers

cli = LightningCLI()

# Now use any of the classes:
# python trainer.py fit --model=Model1 --optimizer=CustomOptimizer
```
The new version renders the registries and the `auto_registry` flag, introduced in 1.6.0, unnecessary, so we have deprecated them.
Support was also added for list appending; for example, to add a callback to an existing list that might be already configured:
```diff
  $ python trainer.py fit \
-     --trainer.callbacks=EarlyStopping \
+     --trainer.callbacks+=EarlyStopping \
      --trainer.callbacks.patience=5 \
-     --trainer.callbacks=LearningRateMonitor \
+     --trainer.callbacks+=LearningRateMonitor \
      --trainer.callbacks.logging_interval=epoch
```
Callback registration through entry points
Entry Points are an advanced feature in Python's setuptools that allow packages to expose metadata to other packages. In Lightning, we ...
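Although the description above is cut short, the feature lets a third-party package register callbacks that Lightning adds to the `Trainer` automatically. A hedged sketch of what such a package's `setup.py` might look like (the entry-point group and factory names are assumptions based on the 1.7 docs):

```python
# setup.py of a hypothetical third-party package
from setuptools import setup

setup(
    name="my-package",
    entry_points={
        # Lightning scans this entry-point group on startup and appends the
        # callbacks returned by each registered factory to every Trainer.
        "pytorch_lightning.callbacks_factory": [
            "monitor_callbacks = my_package.factories:my_custom_callbacks_factory",
        ],
    },
)
```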
Built-in templates
Minor bug-fix release
Lightning App 0.5.2
PyTorch Lightning 1.6.5: Standard patch release
[1.6.5] - 2022-07-13
Fixed
- Fixed `estimated_stepping_batches` requiring distributed comms in `configure_optimizers` for the `DeepSpeedStrategy` (#13350)
- Fixed bug with Python version check that prevented use with development versions of Python (#13420)
- The loops now call `.set_epoch()` also on batch samplers if the dataloader has one wrapped in a distributed sampler (#13396)
- Fixed the restoration of log step during restart (#13467)
Contributors
@adamjstewart @akihironitta @awaelchli @Borda @martinosorb @rohitgr7 @SeanNaren
PyTorch Lightning 1.6.4: Standard patch release
[1.6.4] - 2022-06-01
Added
- Added all DDP params to be exposed through hpu parallel strategy (#13067)
Changed
- Keep `torch.backends.cudnn.benchmark=False` by default (unlike in v1.6.{0-4}) after speed and memory problems depending on the data used. Please consider tuning `Trainer(benchmark)` manually. (#13154)
- Prevent modification of `torch.backends.cudnn.benchmark` when `Trainer(benchmark=...)` is not set (#13154)
Fixed
- Fixed an issue causing zero-division error for empty dataloaders (#12885)
- Fixed mismatching default values for the types of some arguments in the DeepSpeed and Fully-Sharded strategies which made the CLI unable to use them (#12989)
- Avoid redundant callback restore warning while tuning (#13026)
- Fixed `Trainer(precision=64)` during evaluation which now uses the wrapped precision module (#12983)
- Fixed an issue to use wrapped `LightningModule` for evaluation during `trainer.fit` for `BaguaStrategy` (#12983)
- Fixed an issue wrt unnecessary usage of habana mixed precision package for fp32 types (#13028)
- Fixed the number of references of `LightningModule` so it can be deleted (#12897)
- Fixed `materialize_module` setting a module's child recursively (#12870)
- Fixed issue where the CLI could not pass a `Profiler` to the `Trainer` (#13084)
- Fixed torchelastic detection with non-distributed installations (#13142)
- Fixed logging's step values when multiple dataloaders are used during evaluation (#12184)
- Fixed epoch logging on train epoch end (#13025)
- Fixed `DDPStrategy` and `DDPSpawnStrategy` to initialize optimizers only after moving the module to the device (#11952)
Contributors
@akihironitta @ananthsub @ar90n @awaelchli @Borda @carmocca @dependabot @jerome-habana @mads-oestergaard @otaj @rohitgr7
PyTorch Lightning 1.6.3: Standard patch release
[1.6.3] - 2022-05-03
Fixed
- Use only a single instance of `rich.console.Console` throughout codebase (#12886)
- Fixed an issue to ensure all the checkpoint states are saved in a common filepath with `DeepspeedStrategy` (#12887)
- Fixed `trainer.logger` deprecation message (#12671)
- Fixed an issue where sharded grad scaler is passed in when using BF16 with the `ShardedStrategy` (#12915)
- Fixed an issue wrt recursive invocation of DDP configuration in hpu parallel plugin (#12912)
- Fixed printing of ragged dictionaries in `Trainer.validate` and `Trainer.test` (#12857)
- Fixed threading support for legacy loading of checkpoints (#12814)
- Fixed pickling of `KFoldLoop` (#12441)
- Stopped `optimizer_zero_grad` from being called after IPU execution (#12913)
- Fixed `fuse_modules` to be qat-aware for `torch>=1.11` (#12891)
- Enforced eval shuffle warning only for default samplers in DataLoader (#12653)
- Enable mixed precision in `DDPFullyShardedStrategy` when `precision=16` (#12965)
- Fixed `TQDMProgressBar` reset and update to show correct time estimation (#12889)
- Fixed fit loop restart logic to enable resume using the checkpoint (#12821)
Contributors
@akihironitta @carmocca @hmellor @jerome-habana @kaushikb11 @krshrimali @mauvilsa @niberger @ORippler @otaj @rohitgr7 @SeanNaren
PyTorch Lightning 1.6.2: Standard patch release
[1.6.2] - 2022-04-27
Fixed
- Fixed `ImportError` when `torch.distributed` is not available (#12794)
- When using custom DataLoaders in LightningDataModule, multiple inheritance is resolved properly (#12716)
- Fixed encoding issues on terminals that do not support unicode characters (#12828)
- Fixed support for `ModelCheckpoint` monitors with dots (#12783)
Contributors
@akihironitta @alvitawa @awaelchli @Borda @carmocca @code-review-doctor @ethanfurman @HenryLau0220 @krshrimali @otaj
PyTorch Lightning 1.6.1: Standard weekly patch release
[1.6.1] - 2022-04-13
Changed
- Support `strategy` argument being case insensitive (#12528)
Fixed
- Run main progress bar updates independent of val progress bar updates in `TQDMProgressBar` (#12563)
- Avoid calling `average_parameters` multiple times per optimizer step (#12452)
- Properly pass some Logger's parent's arguments to `super().__init__()` (#12609)
- Fixed an issue where incorrect type warnings appear when the overridden `LightningLite.run` method accepts user-defined arguments (#12629)
- Fixed `rank_zero_only` decorator in LSF environments (#12587)
- Don't raise a warning when `nn.Module` is not saved under hparams (#12669)
- Raise `MisconfigurationException` when the accelerator is available but the user passes invalid (`[]`/`0`/`"0"`) values to the `devices` flag (#12708)
- Support `auto_select_gpus` with the accelerator and devices API (#12608)
Contributors
@akihironitta @awaelchli @Borda @carmocca @kaushikb11 @krshrimali @mauvilsa @otaj @pre-commit-ci @rohitgr7 @semaphore-egg @tkonopka @wayi1
If we forgot someone due to not matching the commit email with the GitHub account, let us know :]