
Commit f977964

Gary Miguel and dan-zheng authored
[docs] note ParameterOptimization is out of date (#528)
Also fix some minor errors. Co-authored-by: Dan Zheng <[email protected]>
1 parent 4f30774 commit f977964

File tree

1 file changed: +6 −2 lines changed


docs/ParameterOptimization.md

Lines changed: 6 additions & 2 deletions
@@ -4,6 +4,10 @@
 
 Last updated: March 2019
 
+**Note**: This document is outdated due to changes to the `Differentiable` protocol.
+
+In particular, `Differentiable.CotangentVector` was deprecated in <https://github.com/apple/swift/pull/24825> and `Differentiable.AllDifferentiableVariables` was deprecated in <https://github.com/tensorflow/swift-apis/pull/419>.
+
 ## Introduction
 
 The concept of parameter optimization is crucial for machine learning algorithms. This document explains the concept of parameters and parameter optimization, shows how TensorFlow (graph mode) and PyTorch handle parameter update, and describes the current design for Swift.
@@ -100,7 +104,7 @@ for (inout θ, dθ) in zip(parameters, gradients) {
 }
 ```
 
-We don't want to actually lower the for-loop or zip operation to TensorFlow (lowering wouldn't be straightforward or and lowered representation wouldn't be efficient). Instead, we want to fully unroll the loop into individual straight-line statements:
+We don't want to actually lower the for-loop or zip operation to TensorFlow (lowering wouldn't be straightforward and a lowered representation wouldn't be efficient). Instead, we want to fully unroll the loop into individual straight-line statements:
 
 ```swift
 // w1, w2, b1, b2: Tensor<Float>
@@ -135,7 +139,7 @@ optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
 train = optimizer.minimize(loss=loss)
 ```
 
-In the last line above: how does the optimizer determine which tensors are parameters to be minimized? This is done implicitly by examining the graph of `loss`: since the only Variables in the graph are `W` and `B`, they are determined to be the parameters and are minimized.
+In the last line above: how does the optimizer determine which tensors are parameters to be minimized? This is done implicitly by examining the graph of `loss`: since the only Variables in the graph are `W` and `b`, they are determined to be the parameters and are minimized.
 
 In Swift, TensorFlow graphs are an implementation detail and aren't visible to users: there's no way to inspect whether tensors are placeholders/constants/variables, so the TensorFlow style of implicit parameter analysis is not really suitable. With implicit parameters, it's difficult to work with parameters directly (e.g. to implement a custom optimizer for arbitrary parameters). The authors believe that parameter representation and parameter update are language-design problems and should be explicitly clear in Swift.