Commit 8c7f562

Merge 430f66f into c219aa4
2 parents c219aa4 + 430f66f commit 8c7f562


42 files changed (+804 / -553 lines)

.github/workflows/ci_test-conda.yml

Lines changed: 6 additions & 7 deletions

@@ -30,13 +30,6 @@ jobs:
         pip install --requirement requirements/devel.txt --upgrade-strategy only-if-needed
         pip list
 
-    - name: Cache datasets
-      # todo this probably does not work with docker images, rather cache dockers
-      uses: actions/cache@v2
-      with:
-        path: Datasets
-        key: pl-dataset
-
     - name: Pull checkpoints from S3
      # todo: consider adding coma caching, but ATM all models have less than 100KB
       run: |
@@ -46,6 +39,12 @@ jobs:
         unzip -o checkpoints.zip
         ls -l checkpoints/
 
+    # todo: require proper fix in docker image
+    - name: Hotfix dependency
+      run: |
+        pip install torchtext==0.6.0 -U
+      shell: bash
+
     - name: Tests
       run: |
         # NOTE: running coverage on tests does not propagate failure status for Win, https://github.com/nedbat/coveragepy/issues/1003

.github/workflows/ci_test-full.yml

Lines changed: 12 additions & 1 deletion

@@ -112,6 +112,12 @@ jobs:
         pip list
       shell: bash
 
+    # todo: require proper fix in docker image
+    - name: Hotfix dependency
+      run: |
+        pip install torchtext==0.6.0 -U
+      shell: bash
+
     - name: Reinstall Horovod if necessary
       if: runner.os != 'windows'
       env:
@@ -135,7 +141,12 @@ jobs:
     - name: Tests
       run: |
         # NOTE: do not include coverage report here, see: https://github.com/nedbat/coveragepy/issues/1003
-        coverage run --source pytorch_lightning -m pytest pytorch_lightning tests pl_examples -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}.xml
+        coverage run --source pytorch_lightning -m pytest pytorch_lightning tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}.xml
+
+    # todo: put this back just when TorchVision can download datasets
+    #- name: Examples
+    #  run: |
+    #    python -m pytest pl_examples -v --durations=10
 
     - name: Upload pytest test results
       uses: actions/upload-artifact@v2

CHANGELOG.md

Lines changed: 21 additions & 0 deletions

@@ -5,6 +5,27 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 
+## [1.2.2] - 2021-03-02
+
+### Added
+
+- Added `checkpoint` parameter to callback's `on_save_checkpoint` hook ([#6072](https://github.com/PyTorchLightning/pytorch-lightning/pull/6072))
+
+### Changed
+
+- Changed the order of `backward`, `step`, `zero_grad` to `zero_grad`, `backward`, `step` ([#6147](https://github.com/PyTorchLightning/pytorch-lightning/pull/6147))
+- Changed default for DeepSpeed CPU Offload to False, due to prohibitively slow speeds at smaller scale ([#6262](https://github.com/PyTorchLightning/pytorch-lightning/pull/6262))
+
+### Fixed
+
+- Fixed epoch level schedulers not being called when `val_check_interval < 1.0` ([#6075](https://github.com/PyTorchLightning/pytorch-lightning/pull/6075))
+- Fixed multiple early stopping callbacks ([#6197](https://github.com/PyTorchLightning/pytorch-lightning/pull/6197))
+- Fixed incorrect usage of `detach()`, `cpu()`, `to()` ([#6216](https://github.com/PyTorchLightning/pytorch-lightning/pull/6216))
+- Fixed LBFGS optimizer support which didn't converge in automatic optimization ([#6147](https://github.com/PyTorchLightning/pytorch-lightning/pull/6147))
+- Prevent `WandbLogger` from dropping values ([#5931](https://github.com/PyTorchLightning/pytorch-lightning/pull/5931))
+- Fixed error thrown when using valid distributed mode in multi node ([#6297](https://github.com/PyTorchLightning/pytorch-lightning/pull/6297))
+
+
 ## [1.2.1] - 2021-02-23
 
 ### Fixed
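
In plain PyTorch terms, the `zero_grad` / `backward` / `step` reordering referenced above (#6147) corresponds to a loop of the following shape. This is a minimal, hedged sketch with a placeholder model, data, and scheduler, not Lightning's actual trainer code:

```python
import torch
from torch import nn

# Placeholder model, optimizer, and scheduler purely for illustration.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)

for epoch in range(2):
    for _ in range(4):
        x, y = torch.randn(8, 10), torch.randn(8, 1)
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()   # gradients are now cleared *before* the backward pass
        loss.backward()
        optimizer.step()
    scheduler.step()            # epoch-level scheduler step
```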

azure-pipelines.yml

Lines changed: 21 additions & 13 deletions

@@ -23,7 +23,7 @@ jobs:
   # how much time to give 'run always even if cancelled tasks' before stopping them
   cancelTimeoutInMinutes: 2
 
-  pool: dsvm-spot-pool
+  pool: gridai-spot-pool
 
   #strategy:
   #  matrix:
@@ -58,25 +58,31 @@ jobs:
       export GIT_TERMINAL_PROMPT=1
       #sudo apt-get install -y cmake
      # python -m pip install "pip==20.1"
-      pip install --requirement requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html
+      pip install --requirement requirements.txt
       python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'fairscale' not in line] ; open(fname, 'w').writelines(lines)"
       python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'horovod' not in line] ; open(fname, 'w').writelines(lines)"
       pip install --requirement ./requirements/devel.txt --upgrade-strategy only-if-needed
       pip install git+https://$(AUTH_TOKEN)@github.com/PyTorchLightning/[email protected] --no-cache-dir
       pip list
     displayName: 'Install dependencies'
 
-  - script: |
+  - bash: |
       python tests/collect_env_details.py
+      python -c "import torch ; mgpu = torch.cuda.device_count() ; assert mgpu >= 2, f'GPU: {mgpu}'"
     displayName: 'Env details'
 
+  # todo: require proper fix in docker image
+  - bash: |
+      pip install torchtext==0.7 -U
+    displayName: 'HotFix'
+
   - bash: |
      wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip -P legacy/
      unzip -o legacy/checkpoints.zip -d legacy/
      ls -l legacy/checkpoints/
    displayName: 'Get legacy checkpoints'
 
-  - script: |
+  - bash: |
      python -m coverage run --source pytorch_lightning -m pytest pytorch_lightning tests -v --durations=50
    displayName: 'Testing: standard'
 
@@ -90,12 +96,14 @@ jobs:
      codecov --token=$(CODECOV_TOKEN) --flags=gpu,pytest --name="GPU-coverage" --env=linux,azure
    displayName: 'Statistics'
 
-  - script: |
-      python -m pytest benchmarks pl_examples -v --maxfail=2 --durations=0
-    displayName: 'Testing: extended'
-
-  - script: |
-      python setup.py install --user --quiet
-      bash pl_examples/run_ddp-example.sh
-      pip uninstall -y pytorch-lightning
-    displayName: 'Examples'
+  - bash: |
+      python -m pytest benchmarks -v --maxfail=2 --durations=0
+    displayName: 'Testing: benchmarks'
+
+  # todo: put this back just when TorchVision can download datasets
+  #- bash: |
+  #    python -m pytest pl_examples -v --maxfail=2 --durations=0
+  #    python setup.py install --user --quiet
+  #    bash pl_examples/run_ddp-example.sh
+  #    pip uninstall -y pytorch-lightning
+  #  displayName: 'Examples'

docs/source/common/lightning_module.rst

Lines changed: 4 additions & 4 deletions

@@ -946,7 +946,7 @@ When set to ``False``, Lightning does not automate the optimization process. Thi
             opt = self.optimizers(use_pl_optimizer=True)
 
             loss = ...
-            self.manual_backward(loss, opt)
+            self.manual_backward(loss)
             opt.step()
             opt.zero_grad()
 
@@ -961,16 +961,16 @@ In the multi-optimizer case, ignore the ``optimizer_idx`` argument and use the o
 
     def training_step(self, batch, batch_idx, optimizer_idx):
         # access your optimizers with use_pl_optimizer=False. Default is True
-        (opt_a, opt_b) = self.optimizers(use_pl_optimizer=True)
+        opt_a, opt_b = self.optimizers(use_pl_optimizer=True)
 
         gen_loss = ...
         opt_a.zero_grad()
-        self.manual_backward(gen_loss, opt_a)
+        self.manual_backward(gen_loss)
         opt_a.step()
 
         disc_loss = ...
         opt_b.zero_grad()
-        self.manual_backward(disc_loss, opt_b)
+        self.manual_backward(disc_loss)
         opt_b.step()
 
 --------------
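
Taken together, the updated multi-optimizer snippet above corresponds to a complete module roughly like the following sketch. The two linear layers, SGD optimizers, and loss expressions are placeholders for illustration; only the call pattern (``zero_grad`` → ``manual_backward(loss)`` → ``step``, with no optimizer argument to ``manual_backward``) comes from the diff:

```python
import torch
from torch import nn
import pytorch_lightning as pl


class ManualOptModule(pl.LightningModule):
    """Minimal sketch of manual optimization with two optimizers (placeholder losses)."""

    def __init__(self):
        super().__init__()
        self.gen = nn.Linear(16, 16)
        self.disc = nn.Linear(16, 1)

    @property
    def automatic_optimization(self) -> bool:
        # turn off automatic optimization, as the docs above describe
        return False

    def training_step(self, batch, batch_idx, optimizer_idx=None):
        opt_a, opt_b = self.optimizers(use_pl_optimizer=True)

        gen_loss = self.gen(batch).mean()
        opt_a.zero_grad()
        self.manual_backward(gen_loss)   # no optimizer argument anymore
        opt_a.step()

        disc_loss = self.disc(batch).mean()
        opt_b.zero_grad()
        self.manual_backward(disc_loss)
        opt_b.step()

    def configure_optimizers(self):
        return (
            torch.optim.SGD(self.gen.parameters(), lr=0.01),
            torch.optim.SGD(self.disc.parameters(), lr=0.01),
        )
```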

docs/source/common/optimizers.rst

Lines changed: 39 additions & 30 deletions

@@ -23,32 +23,31 @@ to manually manage the optimization process. To do so, do the following:
 
 * Override your LightningModule ``automatic_optimization`` property to return ``False``
 * Drop or ignore the optimizer_idx argument
-* Use `self.manual_backward(loss)` instead of `loss.backward()`.
+* Use ``self.manual_backward(loss)`` instead of ``loss.backward()``.
 
-.. note:: This is only recommended for experts who need ultimate flexibility. Lightning will handle only precision and accelerators logic. The users are left with zero_grad, accumulated_grad_batches, model toggling, etc.
+.. note:: This is only recommended for experts who need ultimate flexibility. Lightning will handle only precision and accelerators logic. The users are left with ``optimizer.zero_grad()``, gradient accumulation, model toggling, etc.
 
-.. warning:: Before 1.2, ``optimizer.step`` was calling ``zero_grad`` internally. From 1.2, it is left to the user's expertise.
+.. warning:: Before 1.2, ``optimizer.step`` was calling ``optimizer.zero_grad()`` internally. From 1.2, it is left to the user's expertise.
 
 .. tip:: To perform ``accumulate_grad_batches`` with one optimizer, you can do as such.
 
 .. tip:: ``self.optimizers()`` will return ``LightningOptimizer`` objects. You can access your own optimizer with ``optimizer.optimizer``. However, if you use your own optimizer to perform a step, Lightning won't be able to support accelerators and precision for you.
 
-
 .. code-block:: python
 
     def training_step(batch, batch_idx, optimizer_idx):
         opt = self.optimizers()
 
         loss = self.compute_loss(batch)
         self.manual_backward(loss)
-        opt.step()
 
         # accumulate gradient batches
         if batch_idx % 2 == 0:
+            opt.step()
             opt.zero_grad()
 
 
-.. tip:: It is a good practice to provide the optimizer with a ``closure`` function that performs a ``forward`` and ``backward`` pass of your model. It is optional for most optimizers, but makes your code compatible if you switch to an optimizer which requires a closure.
+.. tip:: It is a good practice to provide the optimizer with a ``closure`` function that performs a ``forward`` and ``backward`` pass of your model. It is optional for most optimizers, but makes your code compatible if you switch to an optimizer which requires a closure. See also `the PyTorch docs <https://pytorch.org/docs/stable/optim.html#optimizer-step-closure>`_.
 
 Here is the same example as above using a ``closure``.
 
@@ -71,7 +70,6 @@ Here is the same example as above using a ``closure``.
 .. code-block:: python
 
     # Scenario for a GAN.
-
     def training_step(...):
         opt_gen, opt_dis = self.optimizers()
 
@@ -137,8 +135,12 @@ Here is an example on how to use it:
 
 Automatic optimization
 ======================
-With Lightning most users don't have to think about when to call .backward(), .step(), .zero_grad(), since
-Lightning automates that for you.
+With Lightning most users don't have to think about when to call ``.zero_grad()``, ``.backward()`` and ``.step()``
+since Lightning automates that for you.
+
+.. warning::
+   Before 1.2.2, ``.zero_grad()`` was called after ``.backward()`` and ``.step()`` internally.
+   From 1.2.2, Lightning calls ``.zero_grad()`` before ``.backward()``.
 
 Under the hood Lightning does the following:
 
@@ -147,33 +149,33 @@ Under the hood Lightning does the following:
     for epoch in epochs:
         for batch in data:
             loss = model.training_step(batch, batch_idx, ...)
+            optimizer.zero_grad()
             loss.backward()
             optimizer.step()
-            optimizer.zero_grad()
 
-        for scheduler in schedulers:
-            scheduler.step()
+        for lr_scheduler in lr_schedulers:
+            lr_scheduler.step()
 
 In the case of multiple optimizers, Lightning does the following:
 
 .. code-block:: python
 
     for epoch in epochs:
-        for batch in data:
-            for opt in optimizers:
-                disable_grads_for_other_optimizers()
-                train_step(opt)
-                opt.step()
+        for batch in data:
+            for opt in optimizers:
+                loss = model.training_step(batch, batch_idx, optimizer_idx)
+                opt.zero_grad()
+                loss.backward()
+                opt.step()
 
-        for scheduler in schedulers:
-            scheduler.step()
+        for lr_scheduler in lr_schedulers:
+            lr_scheduler.step()
 
 
 Learning rate scheduling
 ------------------------
-Every optimizer you use can be paired with any `LearningRateScheduler <https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate>`_.
-In the basic use-case, the scheduler (or multiple schedulers) should be returned as the second output from the ``.configure_optimizers``
-method:
+Every optimizer you use can be paired with any `Learning Rate Scheduler <https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate>`_.
+In the basic use-case, the scheduler (or multiple schedulers) should be returned as the second output from the ``.configure_optimizers`` method:
 
 .. testcode::
 
@@ -262,7 +264,7 @@ returned as a dict which can contain the following keywords:
 
 Use multiple optimizers (like GANs)
 -----------------------------------
-To use multiple optimizers return > 1 optimizers from :meth:`pytorch_lightning.core.LightningModule.configure_optimizers`
+To use multiple optimizers, return two or more optimizers from :meth:`pytorch_lightning.core.LightningModule.configure_optimizers`
 
 .. testcode::
 
@@ -283,13 +285,15 @@ Lightning will call each optimizer sequentially:
 .. code-block:: python
 
     for epoch in epochs:
-        for batch in data:
-            for opt in optimizers:
-                train_step(opt)
-                opt.step()
+        for batch in data:
+            for opt in optimizers:
+                loss = train_step(batch, batch_idx, optimizer_idx)
+                opt.zero_grad()
+                loss.backward()
+                opt.step()
 
-        for scheduler in schedulers:
-            scheduler.step()
+        for lr_scheduler in lr_schedulers:
+            lr_scheduler.step()
 
 ----------
 
@@ -334,7 +338,7 @@ Here we add a learning-rate warm up
         # update params
         optimizer.step(closure=closure)
 
-.. note:: The default ``optimizer_step`` is relying on the internal ``LightningOptimizer`` to properly perform a step. It handles TPUs, AMP, accumulate_grad_batches, zero_grad, and much more ...
+.. note:: The default ``optimizer_step`` is relying on the internal ``LightningOptimizer`` to properly perform a step. It handles TPUs, AMP, accumulate_grad_batches and much more ...
 
 .. testcode::
 
@@ -364,6 +368,11 @@ Using the closure functions for optimization
 
 When using optimization schemes such as LBFGS, the `second_order_closure` needs to be enabled. By default, this function is defined by wrapping the `training_step` and the backward steps as follows
 
+.. warning::
+   Before 1.2.2, ``.zero_grad()`` was called outside the closure internally.
+   From 1.2.2, the closure calls ``.zero_grad()`` inside, so there is no need to define your own closure
+   when using optimizers similar to :class:`torch.optim.LBFGS`, which require reevaluation of the loss with the closure in ``optimizer.step()``.
+
 .. testcode::
 
     def second_order_closure(pl_module, split_batch, batch_idx, opt_idx, optimizer, hidden):
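
For the closure tip touched by this diff, the same pattern in plain PyTorch looks roughly like the sketch below (placeholder model and random data; ``LBFGS`` re-evaluates the loss via the closure inside ``optimizer.step``):

```python
import torch
from torch import nn

# Minimal closure example for optimizers such as LBFGS that re-evaluate the loss
# inside optimizer.step(closure). The model and data are placeholders.
model = nn.Linear(4, 1)
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.5)
x, y = torch.randn(32, 4), torch.randn(32, 1)

def closure():
    optimizer.zero_grad()                       # zero_grad happens inside the closure
    loss = nn.functional.mse_loss(model(x), y)  # forward pass
    loss.backward()                             # backward pass
    return loss

for _ in range(5):
    optimizer.step(closure)
```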

docs/source/starter/introduction_guide.rst

Lines changed: 2 additions & 2 deletions

@@ -361,9 +361,9 @@ The training step is what happens inside the training loop.
         # TRAINING STEP
         # ....
         # TRAINING STEP
+        optimizer.zero_grad()
         loss.backward()
         optimizer.step()
-        optimizer.zero_grad()
 
 In the case of MNIST, we do the following
 
@@ -377,9 +377,9 @@ In the case of MNIST, we do the following
         loss = F.nll_loss(logits, y)
         # ------ TRAINING STEP END ------
 
+        optimizer.zero_grad()
         loss.backward()
         optimizer.step()
-        optimizer.zero_grad()
 
 In Lightning, everything that is in the training step gets organized under the
 :func:`~pytorch_lightning.core.LightningModule.training_step` function in the LightningModule.

docs/source/starter/new-project.rst

Lines changed: 10 additions & 11 deletions

@@ -248,7 +248,7 @@ as long as you return a loss with an attached graph from the `training_step`, Li
 .. code-block:: python
 
     def training_step(self, batch, batch_idx):
-        loss = self.encoder(batch[0])
+        loss = self.encoder(batch)
         return loss
 
 .. _manual_opt:
@@ -267,19 +267,18 @@ Turn off automatic optimization and you control the train loop!
 
     def training_step(self, batch, batch_idx, optimizer_idx):
         # access your optimizers with use_pl_optimizer=False. Default is True
-        (opt_a, opt_b, opt_c) = self.optimizers(use_pl_optimizer=True)
+        opt_a, opt_b = self.optimizers(use_pl_optimizer=True)
 
-        loss_a = self.generator(batch[0])
-
-        # use this instead of loss.backward so we can automate half precision, etc...
-        self.manual_backward(loss_a, opt_a, retain_graph=True)
-        self.manual_backward(loss_a, opt_a)
-        opt_a.step()
+        loss_a = self.generator(batch)
         opt_a.zero_grad()
+        # use `manual_backward()` instead of `loss.backward` to automate half precision, etc...
+        self.manual_backward(loss_a)
+        opt_a.step()
 
-        loss_b = self.discriminator(batch[0])
-        self.manual_backward(loss_b, opt_b)
-        ...
+        loss_b = self.discriminator(batch)
+        opt_b.zero_grad()
+        self.manual_backward(loss_b)
+        opt_b.step()
 
 
 Predict or Deploy
