Add `variance()` and `variance(squeezingAxes:)`. #23811
Conversation
// TODO: Fix variance derivative calculation.
// PyTorch has: `expected == [[-0.75, -0.25], [ 0.25, 0.75]]`.
let input: Tensor<Float> = [[1, 2], [3, 4]]
let expected = Tensor<Float>(repeating: -0.75, shape: [2, 2])
Our `variance` ops are implemented using `mean` and `squared` - no custom derivatives for `variance`.
However, the derivatives don't seem to match PyTorch:
import torch
import torch.nn.functional as F
x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
x.requires_grad = True
y = x.var(unbiased=False)
y.backward()
print(x)
# tensor([[1., 2.],
# [3., 4.]], requires_grad=True)
print(y)
# tensor(1.2500, grad_fn=<VarBackward0>)
print(x.grad)
# tensor([[-0.7500, -0.2500],
# [ 0.2500, 0.7500]])
# We are not consistent with this.
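The Swift for TensorFlow side of the same check would be something along these lines (a sketch using the `variance()` added in this PR):

```swift
let x: Tensor<Float> = [[1, 2], [3, 4]]
let xGrad = gradient(at: x) { $0.variance() }
// PyTorch gives [[-0.75, -0.25], [0.25, 0.75]];
// the test above currently expects Tensor(repeating: -0.75, shape: [2, 2]).
```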
Any ideas on fixing this? Is a custom `variance` derivative necessary?
(I would find it strange if a custom derivative is necessary for correctness.)
Seeking help: cc @jekbradbury @jph00 @sgugger
PyTorch has a custom `var_backward` implementation:
Tensor var_backward(const Tensor & grad, const Tensor & self, bool unbiased) {
  return (2.0 / (self.numel() - unbiased)) * grad * (self - self.mean());
}
Tensor var_backward(Tensor grad, const Tensor & self, IntArrayRef dim, bool unbiased, bool keepdim) {
  if (self.dim() == 0) {
    return var_backward(grad, self, unbiased);
  }
  if (!keepdim && self.dim() > 1) {
    grad = unsqueeze_multiple(grad, dim, self.sizes().size());
  }
  return (2.0 / (_safe_size(self.sizes(), dim) - unbiased)) * grad * (self - self.mean(dim, true));
}
I'll try to port it to Swift.
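For reference, a rough Swift sketch of the scalar-reduction case (the name `varianceGradient` is illustrative, not an existing API, and it assumes the biased/unbiased convention from the C++ above):

```swift
import TensorFlow

// Illustrative sketch of PyTorch's scalar-case `var_backward` in Swift.
// For the biased variance, d/dx_i variance(x) == (2 / n) * (x_i - mean(x)).
func varianceGradient(
    _ upstream: Tensor<Float>,
    _ x: Tensor<Float>,
    unbiased: Bool = false
) -> Tensor<Float> {
    let correction: Float = unbiased ? 1 : 0
    let count = Float(x.scalarCount) - correction
    return (2 / count) * upstream * (x - x.mean())
}
```

The axis-reducing overload would additionally need to re-expand `upstream` along the reduced axes (the `unsqueeze_multiple` step above) before broadcasting against `x - x.mean(alongAxes:)`.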
I'm still concerned about derivative correctness - a custom derivative shouldn't be necessary for correctness, especially because `variance` isn't super complicated (it just calls `mean` and `squared`).
Sorry, it's not something that @sgugger or I have looked at before, so we don't have any particular insights here.
Why don't we hand-calculate and verify a simple case? It'd be highly preferable to not make it a primitive differentiable function.
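For the 2×2 example above (n = 4, mean μ = 2.5), the hand calculation is:

∂/∂x_i [ (1/n) Σ_j (x_j − μ)² ] = (2/n) [ (x_i − μ) − (1/n) Σ_j (x_j − μ) ] = (2/n)(x_i − μ) = 0.5 · (x_i − 2.5),

using Σ_j (x_j − μ) = 0. That gives [[-0.75, -0.25], [0.25, 0.75]], which matches PyTorch, so the compositional derivative through `mean` and `squared` should already be correct; no custom `variance` derivative should be needed for correctness.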
PyTorch is correct, and doing
let xGrad = gradient(at: x) { x in (x - x.mean()).squared().mean() }
or
let xGrad = gradient(at: x) { x in x.variance(alongAxes: 0, 1) }
gives me the same result as PyTorch.
The PyTorch gradient is correct. This Swift version also gives the right results:
@differentiable
func variance(_ x: Tensor<Float>) -> Tensor<Float> {
    let mean = x.mean()
    let squaredDiff = (x - mean).squared()
    return squaredDiff.mean()
}
            x.product(squeezingAxes: 0).toHost(shape: []).array)
expectEqual(ShapedArray(shape: [1, 5], scalars: [1, 4, 9, 16, 25]),
            x.product(alongAxes: 0).toHost(shape: []).array)
expectEqual(Tensor(30), x.sum().toHost(shape: []))
Also, the arguments to `toHost(shape:)` don't seem quite right (it shouldn't always be `[]`).
Update: I'll remove the `toHost` calls to force us to re-evaluate this bug.
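For example, the visible assertions above might then read (a sketch that only drops the `toHost(shape:)` calls and assumes `Tensor.array` plus the existing `expectEqual` helper):

```swift
expectEqual(ShapedArray(shape: [1, 5], scalars: [1, 4, 9, 16, 25]),
            x.product(alongAxes: 0).array)
expectEqual(Tensor(30), x.sum())
```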
Reorganize reduction ops: `sum`, `product`, `mean`, `variance`.
Todo: fix variance derivatives. They are currently inconsistent with PyTorch.
Force-pushed from 2964e58 to d8aeafc.
@swift-ci Please test tensorflow
* Add `variance()` and `variance(squeezingAxes:)`, following the `squeezingAxes:` versions of `sum` and `mean`.
* Reorganize reduction ops: `sum`, `product`, `mean`, `variance`.
* Todos: fix the derivatives of the `variance` ops.