CHANGELOG.md: 5 additions & 0 deletions
@@ -15,6 +15,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
### Changed

+- Changed the order of `backward`, `step`, `zero_grad` to `zero_grad`, `backward`, `step` ([#6147](https://github.com/PyTorchLightning/pytorch-lightning/pull/6147))
+

### Deprecated
@@ -30,6 +32,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed multiple early stopping callbacks ([#6197](https://github.com/PyTorchLightning/pytorch-lightning/pull/6197))

+- Fixed LBFGS optimizer support which didn't converge in automatic optimization ([#6147](https://github.com/PyTorchLightning/pytorch-lightning/pull/6147))
+
- Prevent `WandbLogger` from dropping values ([#5931](https://github.com/PyTorchLightning/pytorch-lightning/pull/5931))
docs/source/common/optimizers.rst: 39 additions & 30 deletions
@@ -23,32 +23,31 @@ to manually manage the optimization process. To do so, do the following:
* Override your LightningModule ``automatic_optimization`` property to return ``False``
* Drop or ignore the optimizer_idx argument
-* Use `self.manual_backward(loss)` instead of `loss.backward()`.
+* Use ``self.manual_backward(loss)`` instead of ``loss.backward()``.

-.. note:: This is only recommended for experts who need ultimate flexibility. Lightning will handle only precision and accelerators logic. The users are left with zero_grad, accumulated_grad_batches, model toggling, etc..
+.. note:: This is only recommended for experts who need ultimate flexibility. Lightning will handle only precision and accelerator logic. Users are left with ``optimizer.zero_grad()``, gradient accumulation, model toggling, etc.

-.. warning:: Before 1.2, ``optimzer.step`` was calling ``zero_grad`` internally. From 1.2, it is left to the users expertize.
+.. warning:: Before 1.2, ``optimizer.step`` was calling ``optimizer.zero_grad()`` internally. From 1.2, it is left to the user's expertise.

.. tip:: To perform ``accumulate_grad_batches`` with one optimizer, you can do as such.
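The example that this tip refers to is not part of the hunk shown here. The following is a minimal sketch of manual gradient accumulation over two batches; ``compute_loss`` is a hypothetical helper standing in for your forward pass and loss, and the accumulation factor of 2 is illustrative:

.. code-block:: python

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()

        # compute_loss is a hypothetical helper (forward pass + loss)
        loss = self.compute_loss(batch)

        # scale the loss so the accumulated gradient matches an averaged 2-batch gradient
        self.manual_backward(loss / 2)

        # step and reset gradients only every second batch
        if (batch_idx + 1) % 2 == 0:
            opt.step()
            opt.zero_grad()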

.. tip:: ``self.optimizers()`` will return ``LightningOptimizer`` objects. You can access your own optimizer with ``optimizer.optimizer``. However, if you use your own optimizer to perform a step, Lightning won't be able to support accelerators and precision for you.

-.. tip:: It is a good practice to provide the optimizer with a ``closure`` function that performs a ``forward`` and ``backward`` pass of your model. It is optional for most optimizers, but makes your code compatible if you switch to an optimizer which requires a closure.
+.. tip:: It is a good practice to provide the optimizer with a ``closure`` function that performs a ``forward`` and ``backward`` pass of your model. It is optional for most optimizers, but makes your code compatible if you switch to an optimizer which requires a closure. See also `the PyTorch docs <https://pytorch.org/docs/stable/optim.html#optimizer-step-closure>`_.

Here is the same example as above using a ``closure``.
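The closure version of the example is likewise not included in this hunk; a minimal sketch, again using the hypothetical ``compute_loss`` helper, might look like:

.. code-block:: python

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()

        def closure():
            # forward + backward wrapped in a closure, as the tip above suggests
            loss = self.compute_loss(batch)  # hypothetical helper
            self.manual_backward(loss)
            return loss

        # the closure is re-evaluated by optimizers that need it (e.g. LBFGS)
        opt.step(closure=closure)
        opt.zero_grad()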
@@ -71,7 +70,6 @@ Here is the same example as above using a ``closure``.
.. code-block:: python

    # Scenario for a GAN.
-
    def training_step(...):
        opt_gen, opt_dis = self.optimizers()
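The GAN scenario above is truncated in the diff. A hedged sketch of how the two optimizers might be used under manual optimization, where ``generator_loss`` and ``discriminator_loss`` are hypothetical helpers:

.. code-block:: python

    def training_step(self, batch, batch_idx):
        opt_gen, opt_dis = self.optimizers()

        # generator update
        loss_gen = self.generator_loss(batch)      # hypothetical helper
        opt_gen.zero_grad()
        self.manual_backward(loss_gen)
        opt_gen.step()

        # discriminator update
        loss_dis = self.discriminator_loss(batch)  # hypothetical helper
        opt_dis.zero_grad()
        self.manual_backward(loss_dis)
        opt_dis.step()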
@@ -137,8 +135,12 @@ Here is an example on how to use it:
Automatic optimization
======================
-With Lightning most users don't have to think about when to call .backward(), .step(), .zero_grad(), since
-Lightning automates that for you.
+With Lightning most users don't have to think about when to call ``.zero_grad()``, ``.backward()`` and ``.step()``
+since Lightning automates that for you.
+
+.. warning::
+   Before 1.2.2, ``.zero_grad()`` was called after ``.backward()`` and ``.step()`` internally.
+   From 1.2.2, Lightning calls ``.zero_grad()`` before ``.backward()``.
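For instance, under automatic optimization a ``training_step`` typically just computes and returns the loss. The following is a minimal sketch; the module name, cross-entropy loss, and Adam settings are illustrative, and ``__init__``/``forward`` are omitted:

.. code-block:: python

    import pytorch_lightning as pl
    import torch
    import torch.nn.functional as F

    class LitModel(pl.LightningModule):
        # __init__ and forward are omitted from this sketch
        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = F.cross_entropy(self(x), y)
            # no zero_grad/backward/step calls here: Lightning handles them
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)
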
Under the hood Lightning does the following:
@@ -147,33 +149,33 @@ Under the hood Lightning does the following:
    for epoch in epochs:
        for batch in data:
            loss = model.training_step(batch, batch_idx, ...)
+            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
-            optimizer.zero_grad()

-        for scheduler in schedulers:
-            scheduler.step()
+        for lr_scheduler in lr_schedulers:
+            lr_scheduler.step()

In the case of multiple optimizers, Lightning does the following:

.. code-block:: python

    for epoch in epochs:
-        for batch in data:
-            for opt in optimizers:
-                disable_grads_for_other_optimizers()
-                train_step(opt)
-                opt.step()
+        for batch in data:
+            for opt in optimizers:
+                loss = model.training_step(batch, batch_idx, optimizer_idx)
+                opt.zero_grad()
+                loss.backward()
+                opt.step()

-        for scheduler in schedulers:
-            scheduler.step()
+        for lr_scheduler in lr_schedulers:
+            lr_scheduler.step()


Learning rate scheduling
------------------------
-Every optimizer you use can be paired with any `LearningRateScheduler <https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate>`_.
-In the basic use-case, the scheduler (or multiple schedulers) should be returned as the second output from the ``.configure_optimizers``
-method:
+Every optimizer you use can be paired with any `Learning Rate Scheduler <https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate>`_.
+In the basic use-case, the scheduler (or multiple schedulers) should be returned as the second output from the ``.configure_optimizers`` method:

.. testcode::
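The body of the ``testcode`` block is not part of this hunk. A minimal sketch of returning a scheduler as the second output; the SGD and ``StepLR`` choices below are illustrative:

.. code-block:: python

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
        # first list: optimizers, second list: LR schedulers
        return [optimizer], [lr_scheduler]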
@@ -262,7 +264,7 @@ returned as a dict which can contain the following keywords:
Use multiple optimizers (like GANs)
-----------------------------------
-To use multiple optimizers return > 1 optimizers from :meth:`pytorch_lightning.core.LightningModule.configure_optimizers`
+To use multiple optimizers, return two or more optimizers from :meth:`pytorch_lightning.core.LightningModule.configure_optimizers`

.. testcode::
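The ``testcode`` example itself is elided here. A hedged sketch of returning two optimizers, assuming the module defines ``self.generator`` and ``self.discriminator`` and that the learning rates are illustrative:

.. code-block:: python

    def configure_optimizers(self):
        opt_gen = torch.optim.Adam(self.generator.parameters(), lr=2e-4)
        opt_dis = torch.optim.Adam(self.discriminator.parameters(), lr=2e-4)
        # training_step will then receive an optimizer_idx argument (0 or 1)
        return opt_gen, opt_dis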
@@ -283,13 +285,15 @@ Lightning will call each optimizer sequentially:
.. code-block:: python

    for epoch in epochs:
-        for batch in data:
-            for opt in optimizers:
-                train_step(opt)
-                opt.step()
+        for batch in data:
+            for opt in optimizers:
+                loss = train_step(batch, batch_idx, optimizer_idx)
+                opt.zero_grad()
+                loss.backward()
+                opt.step()

-        for scheduler in schedulers:
-            scheduler.step()
+        for lr_scheduler in lr_schedulers:
+            lr_scheduler.step()

----------
@@ -334,7 +338,7 @@ Here we add a learning-rate warm up
        # update params
        optimizer.step(closure=closure)
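The warm-up example is truncated above. A hedged sketch of such an ``optimizer_step`` override; the exact signature varies across Lightning versions, and the 500-step warm-up and ``1e-3`` base learning rate are illustrative:

.. code-block:: python

    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                       optimizer_closure, on_tpu, using_native_amp, using_lbfgs):
        # linearly ramp the learning rate over the first 500 steps
        if self.trainer.global_step < 500:
            lr_scale = min(1.0, float(self.trainer.global_step + 1) / 500.0)
            for pg in optimizer.param_groups:
                pg["lr"] = lr_scale * 1e-3  # assumed base learning rate

        # update params
        optimizer.step(closure=optimizer_closure)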

-.. note:: The default ``optimizer_step`` is relying on the internal ``LightningOptimizer`` to properly perform a step. It handles TPUs, AMP, accumulate_grad_batches, zero_grad, and much more ...
+.. note:: The default ``optimizer_step`` relies on the internal ``LightningOptimizer`` to properly perform a step. It handles TPUs, AMP, ``accumulate_grad_batches`` and much more ...

.. testcode::
@@ -364,6 +368,11 @@ Using the closure functions for optimization
When using optimization schemes such as LBFGS, the `second_order_closure` needs to be enabled. By default, this function is defined by wrapping the `training_step` and the backward steps as follows

+.. warning::
+   Before 1.2.2, ``.zero_grad()`` was called outside the closure internally.
+   From 1.2.2, the closure calls ``.zero_grad()`` inside, so there is no need to define your own closure
+   when using optimizers similar to :class:`torch.optim.LBFGS`, which require re-evaluation of the loss with the closure in ``optimizer.step()``.
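As a hedged illustration of the warning above: under automatic optimization an LBFGS optimizer can be used without writing any closure yourself. ``compute_loss`` is a hypothetical helper, and the LBFGS settings are illustrative:

.. code-block:: python

    def training_step(self, batch, batch_idx):
        # Lightning wraps training_step and the backward call in a closure,
        # calls .zero_grad() inside it, and passes it to LBFGS's step()
        return self.compute_loss(batch)  # hypothetical helper

    def configure_optimizers(self):
        return torch.optim.LBFGS(self.parameters(), lr=0.1)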