This repository was archived by the owner on Jul 1, 2023. It is now read-only.

Commit 9a47e3c

joaogui1, Brad Larson, and Bart Chrzaszcz authored
[WIP] Equations for losses (#579)
* Draft equations for common losses
* Better wording (Co-Authored-By: Brad Larson <[email protected]>)
* Fix typo (Co-Authored-By: Brad Larson <[email protected]>)
* Fix typo (Co-Authored-By: Brad Larson <[email protected]>)
* Fix typo (Co-Authored-By: Brad Larson <[email protected]>)
* Fix typo (Co-Authored-By: Brad Larson <[email protected]>)
* Fix typo (Co-Authored-By: Brad Larson <[email protected]>)
* All non-categorical losses
* Compatibility with tf2.x documentation
* hinge -> Hinge (Co-Authored-By: Bart Chrzaszcz <[email protected]>)
* Fix typo (Co-Authored-By: Bart Chrzaszcz <[email protected]>)
* Hinge -> hinge (Co-Authored-By: Bart Chrzaszcz <[email protected]>)
* Remove extra space (Co-Authored-By: Bart Chrzaszcz <[email protected]>)
* Add spaces for consistency (Co-Authored-By: Bart Chrzaszcz <[email protected]>)
* Add spaces for consistency (Co-Authored-By: Bart Chrzaszcz <[email protected]>)
* L losses draft and Hinge losses
* mean errors
* cosh and poisson
* cross-entropies and KL divergence
* L* losses and Huber
* Typo (Co-Authored-By: Brad Larson <[email protected]>)

Co-authored-by: Brad Larson <[email protected]>
Co-authored-by: Bart Chrzaszcz <[email protected]>
1 parent f2acd6c commit 9a47e3c

1 file changed (+45, −23 lines)

Sources/TensorFlow/Loss.swift

Lines changed: 45 additions & 23 deletions
@@ -12,7 +12,8 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
-/// Returns the L1 loss between predictions and expectations.
+/// Computes the L1 loss between `expected` and `predicted`.
+/// `loss = reduction(abs(expected - predicted))`
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
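For illustration, a minimal sketch of the `l1Loss` call documented above, passing the reduction explicitly as a closure (the tensor values are made up):

```swift
import TensorFlow

let predicted = Tensor<Float>([1.0, 2.0, 3.0])
let expected = Tensor<Float>([0.5, 2.5, 3.0])
// abs(expected - predicted) = [0.5, 0.5, 0.0]; summing gives 1.0.
let loss = l1Loss(predicted: predicted, expected: expected, reduction: { $0.sum() })
```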
@@ -27,7 +28,8 @@ public func l1Loss<Scalar: TensorFlowFloatingPoint>(
     reduction(abs(expected - predicted))
 }
 
-/// Returns the L2 loss between predictions and expectations.
+/// Computes the L2 loss between `expected` and `predicted`.
+/// `loss = reduction(square(expected - predicted))`
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
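The L2 variant squares the element-wise differences before reducing; a matching sketch with the same illustrative values:

```swift
import TensorFlow

let predicted = Tensor<Float>([1.0, 2.0, 3.0])
let expected = Tensor<Float>([0.5, 2.5, 3.0])
// square(expected - predicted) = [0.25, 0.25, 0.0]; summing gives 0.5.
let loss = l2Loss(predicted: predicted, expected: expected, reduction: { $0.sum() })
```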
@@ -42,7 +44,8 @@ public func l2Loss<Scalar: TensorFlowFloatingPoint>(
     reduction((expected - predicted).squared())
 }
 
-/// Returns the mean absolute error between predictions and expectations.
+/// Computes the mean of absolute difference between labels and predictions.
+/// `loss = mean(abs(expected - predicted))`
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
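`meanAbsoluteError` is `l1Loss` with the mean reduction baked in, so no `reduction` argument is needed (illustrative values):

```swift
import TensorFlow

let predicted = Tensor<Float>([1.0, 2.0, 3.0])
let expected = Tensor<Float>([0.5, 2.5, 3.0])
// mean(abs(expected - predicted)) = mean([0.5, 0.5, 0.0]) ≈ 0.3333
let loss = meanAbsoluteError(predicted: predicted, expected: expected)
```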
@@ -55,7 +58,8 @@ public func meanAbsoluteError<Scalar: TensorFlowFloatingPoint>(
     l1Loss(predicted: predicted, expected: expected, reduction: _mean)
 }
 
-/// Returns the mean squared error between predictions and expectations.
+/// Computes the mean of squares of errors between labels and predictions.
+/// `loss = mean(square(expected - predicted))`
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
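Likewise, `meanSquaredError` is `l2Loss` reduced by the mean (illustrative values):

```swift
import TensorFlow

let predicted = Tensor<Float>([1.0, 2.0, 3.0])
let expected = Tensor<Float>([0.5, 2.5, 3.0])
// mean(square(expected - predicted)) = mean([0.25, 0.25, 0.0]) ≈ 0.1667
let loss = meanSquaredError(predicted: predicted, expected: expected)
```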
@@ -68,7 +72,8 @@ public func meanSquaredError<Scalar: TensorFlowFloatingPoint>(
     l2Loss(predicted: predicted, expected: expected, reduction: _mean)
 }
 
-/// Returns the mean squared logarithmic error between predictions and expectations.
+/// Computes the mean squared logarithmic error between `predicted` and `expected`.
+/// `loss = square(log(expected) - log(predicted))`
 ///
 /// - Note: Negative tensor entries will be clamped at `0` to avoid undefined
 ///   logarithmic behavior, as `log(_:)` is undefined for negative reals.
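A sketch of `meanSquaredLogarithmicError` with non-negative entries; per the note above, negative entries would be clamped to `0` before the logarithm:

```swift
import TensorFlow

let predicted = Tensor<Float>([1.0, 3.0])
let expected = Tensor<Float>([2.0, 3.0])
// mean(square(log(expected) - log(predicted))) over the two elements.
let loss = meanSquaredLogarithmicError(predicted: predicted, expected: expected)
```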
@@ -86,7 +91,8 @@ public func meanSquaredLogarithmicError<Scalar: TensorFlowFloatingPoint>(
     return l2Loss(predicted: logPredicted, expected: logExpected, reduction: _mean)
 }
 
-/// Returns the mean absolute percentage error between predictions and expectations.
+/// Computes the mean absolute percentage error between `predicted` and `expected`.
+/// `loss = 100 * mean(abs((expected - predicted) / abs(expected)))`
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
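A sketch of `meanAbsolutePercentageError`; with these illustrative values each relative error is 0.1, so the loss is 10:

```swift
import TensorFlow

let predicted = Tensor<Float>([110.0, 90.0])
let expected = Tensor<Float>([100.0, 100.0])
// 100 * mean(abs((expected - predicted) / abs(expected))) = 100 * mean([0.1, 0.1]) = 10
let loss = meanAbsolutePercentageError(predicted: predicted, expected: expected)
```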
@@ -99,7 +105,9 @@ public func meanAbsolutePercentageError<Scalar: TensorFlowFloatingPoint>(
     100 * abs((expected - predicted) / abs(expected)).mean()
 }
 
-/// Returns the hinge loss between predictions and expectations.
+/// Computes the hinge loss between `predicted` and `expected`.
+/// `loss = reduction(max(0, 1 - predicted * expected))`
+/// `expected` values are expected to be -1 or 1.
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
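A hinge-loss sketch; the labels are -1 or 1 as the new doc line requires, and the library's default reduction is assumed (pass `reduction:` explicitly to control it):

```swift
import TensorFlow

let predicted = Tensor<Float>([0.8, -0.4, 1.2])
let expected = Tensor<Float>([1, -1, 1])
// max(0, 1 - predicted * expected) = [0.2, 0.6, 0.0], then reduced.
let loss = hingeLoss(predicted: predicted, expected: expected)
```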
@@ -114,7 +122,9 @@ public func hingeLoss<Scalar: TensorFlowFloatingPoint>(
     reduction(max(Tensor(0), Tensor(1) - expected * predicted))
 }
 
-/// Returns the squared hinge loss between predictions and expectations.
+/// Computes the squared hinge loss between `predicted` and `expected`.
+/// `loss = reduction(square(max(0, 1 - predicted * expected)))`
+/// `expected` values are expected to be -1 or 1.
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
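The squared variant simply squares each hinge term (same illustrative values, default reduction assumed):

```swift
import TensorFlow

let predicted = Tensor<Float>([0.8, -0.4, 1.2])
let expected = Tensor<Float>([1, -1, 1])
// square(max(0, 1 - predicted * expected)) = [0.04, 0.36, 0.0], then reduced.
let loss = squaredHingeLoss(predicted: predicted, expected: expected)
```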
@@ -129,7 +139,10 @@ public func squaredHingeLoss<Scalar: TensorFlowFloatingPoint>(
     reduction(hingeLoss(predicted: predicted, expected: expected).squared())
 }
 
-/// Returns the hinge loss between predictions and expectations.
+/// Computes the categorical hinge loss between `predicted` and `expected`.
+/// `loss = maximum(negative - positive + 1, 0)`,
+/// where `negative = max((1 - expected) * predicted)` and
+/// `positive = sum(predicted * expected)`.
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
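For the categorical hinge loss, `expected` is one-hot over the classes; a sketch with made-up scores for two examples and three classes:

```swift
import TensorFlow

// Shape [2, 3]: two examples, three classes (illustrative values).
let predicted = Tensor<Float>(shape: [2, 3], scalars: [0.3, 0.5, 0.2,
                                                       0.6, 0.1, 0.3])
let expected = Tensor<Float>(shape: [2, 3], scalars: [0, 1, 0,
                                                      1, 0, 0])
let loss = categoricalHingeLoss(predicted: predicted, expected: expected)
```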
@@ -146,8 +159,9 @@ public func categoricalHingeLoss<Scalar: TensorFlowFloatingPoint>(
     return reduction(max(Tensor(0), negative - positive + Tensor(1)))
 }
 
-/// Returns the logarithm of the hyperbolic cosine of the error between predictions and
-/// expectations.
+/// Computes the logarithm of the hyperbolic cosine of the prediction error.
+/// `logcosh = log((exp(x) + exp(-x))/2)`,
+/// where `x` is the error `predicted - expected`.
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
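A log-cosh sketch; the implementation in the next hunk uses the numerically stable form `x + softplus(-2x) - log(2)` rather than evaluating `cosh` directly (illustrative values, default reduction assumed):

```swift
import TensorFlow

let predicted = Tensor<Float>([1.0, 3.0])
let expected = Tensor<Float>([0.0, 3.0])
// log(cosh(x)) for x = predicted - expected = [1.0, 0.0], then reduced.
let loss = logCoshLoss(predicted: predicted, expected: expected)
```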
@@ -163,7 +177,9 @@ public func logCoshLoss<Scalar: TensorFlowFloatingPoint>(
     return reduction(x + softplus(Tensor(-2) * x) - log(Tensor(2)))
 }
 
-/// Returns the Poisson loss between predictions and expectations.
+/// Computes the Poisson loss between `predicted` and `expected`.
+/// The Poisson loss is the mean of the elements of the `Tensor`
+/// `predicted - expected * log(predicted)`.
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
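A Poisson-loss sketch; `predicted` must be positive since its logarithm is taken (illustrative values, default reduction assumed):

```swift
import TensorFlow

let predicted = Tensor<Float>([1.0, 2.0])
let expected = Tensor<Float>([1.0, 3.0])
// Element-wise: predicted - expected * log(predicted), then reduced.
let loss = poissonLoss(predicted: predicted, expected: expected)
```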
@@ -178,8 +194,8 @@ public func poissonLoss<Scalar: TensorFlowFloatingPoint>(
     reduction(predicted - expected * log(predicted))
 }
 
-/// Returns the Kullback-Leibler divergence (KL divergence) between between expectations and
-/// predictions. Given two distributions `p` and `q`, KL divergence computes `p * log(p / q)`.
+/// Computes the Kullback-Leibler divergence loss between `expected` and `predicted`.
+/// `loss = reduction(expected * log(expected / predicted))`
 ///
 /// - Parameters:
 ///   - predicted: Predicted outputs from a neural network.
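A KL-divergence sketch; both tensors are treated as already-normalized probability distributions (illustrative values, default reduction assumed):

```swift
import TensorFlow

let predicted = Tensor<Float>([0.25, 0.25, 0.5])
let expected = Tensor<Float>([0.5, 0.25, 0.25])
// Element-wise: expected * log(expected / predicted), then reduced.
let loss = kullbackLeiblerDivergence(predicted: predicted, expected: expected)
```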
@@ -194,7 +210,10 @@ public func kullbackLeiblerDivergence<Scalar: TensorFlowFloatingPoint>(
     reduction(expected * log(expected / predicted))
 }
 
-/// Returns the softmax cross entropy (categorical cross entropy) between logits and labels.
+/// Computes the sparse softmax cross entropy (categorical cross entropy) between logits and labels.
+/// Use this cross-entropy loss function when there are two or more label classes.
+/// We expect labels to be provided as integers. There should be `# classes`
+/// floating point values per feature for `logits` and a single floating point value per feature for `expected`.
 ///
 /// - Parameters:
 ///   - logits: One-hot encoded outputs from a neural network.
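A sketch of the sparse variant: integer class indices for the labels, one row of unnormalized scores per example (illustrative values; `Int32` labels assumed):

```swift
import TensorFlow

// Two examples over three classes.
let logits = Tensor<Float>(shape: [2, 3], scalars: [2.0, 1.0, 0.1,
                                                    0.4, 2.2, 0.6])
let labels = Tensor<Int32>([0, 1])  // Correct class index per example.
let loss = softmaxCrossEntropy(logits: logits, labels: labels)
```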
@@ -228,7 +247,10 @@ func _vjpSoftmaxCrossEntropyHelper<Scalar: TensorFlowFloatingPoint>(
     return (loss, { $0.expandingShape(at: -1) * grad })
 }
 
-/// Returns the softmax cross entropy (categorical cross entropy) between logits and labels.
+/// Computes the softmax cross entropy (categorical cross entropy) between logits and labels.
+/// Use this cross-entropy loss function when there are two or more label classes.
+/// We expect labels to be provided in a `one_hot` representation.
+/// There should be `# classes` floating point values per feature.
 ///
 /// - Parameters:
 ///   - logits: Unscaled log probabilities from a neural network.
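A sketch of the one-hot variant; the `probabilities:` argument label is assumed from the library's overload for this representation (illustrative values):

```swift
import TensorFlow

let logits = Tensor<Float>(shape: [2, 3], scalars: [2.0, 1.0, 0.1,
                                                    0.4, 2.2, 0.6])
// One-hot targets: one `# classes`-wide row per example.
let probabilities = Tensor<Float>(shape: [2, 3], scalars: [1, 0, 0,
                                                           0, 1, 0])
let loss = softmaxCrossEntropy(logits: logits, probabilities: probabilities)
```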
@@ -263,10 +285,10 @@ func _vjpSoftmaxCrossEntropyHelper<Scalar: TensorFlowFloatingPoint>(
     return (loss, { $0.expandingShape(at: -1) * grad })
 }
 
-/// Returns the sigmoid cross entropy (binary cross entropy) between logits and labels.
-///
-/// The reduction is reduced over all elements. If reduced over batch size is intended, please
-/// consider to scale the loss.
+/// Computes the sigmoid cross entropy (binary cross entropy) between logits and labels.
+/// Use this cross-entropy loss when there are only two label classes (assumed to
+/// be 0 and 1). For each example, there should be a single floating-point value
+/// per prediction.
 ///
 /// - Parameters:
 ///   - logits: The unscaled output of a neural network.
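A sigmoid cross-entropy sketch with binary targets in {0, 1} and one floating-point logit per prediction (illustrative values, default reduction assumed):

```swift
import TensorFlow

let logits = Tensor<Float>([1.2, -0.3, 2.4])
let labels = Tensor<Float>([1, 0, 1])  // Binary targets, one per logit.
let loss = sigmoidCrossEntropy(logits: logits, labels: labels)
```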
@@ -284,10 +306,10 @@ public func sigmoidCrossEntropy<Scalar: TensorFlowFloatingPoint>(
     return reduction(maxLogitsWithZero - logits * labels + log1p(exp(-negAbsLogits)))
 }
 
-/// Returns the Huber loss between predictions and expectations.
+/// Computes the Huber loss between `predicted` and `expected`.
 ///
-/// For each value `x` in the difference `expected - predicted`, the loss is:
-/// - `0.5 * x^2` if `abs(x) <= δ`.
+/// For each value `x` in `error = expected - predicted`:
+/// - `0.5 * x^2` if `|x| <= δ`.
 /// - `0.5 * δ^2 + δ * (|x| - δ)` otherwise.
 ///
 /// - Source: [Wikipedia article](https://en.wikipedia.org/wiki/Huber_loss).
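A Huber-loss sketch; the `delta:` parameter is assumed to correspond to δ in the piecewise definition above (illustrative values):

```swift
import TensorFlow

// δ = 1: quadratic for small errors, linear beyond |x| = 1.
let predicted = Tensor<Float>([0.9, 3.0])
let expected = Tensor<Float>([1.0, 0.0])
// Errors are [0.1, -3.0]: the first falls in the quadratic regime,
// the second in the linear one.
let loss = huberLoss(predicted: predicted, expected: expected, delta: 1)
```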
