
Add variance() and variance(squeezingAxes:). #23811


Merged: 5 commits into swiftlang:tensorflow on Apr 9, 2019

Conversation

dan-zheng (Contributor)

  • Add variance() and variance(squeezingAxes:).
  • Add derivatives for squeezingAxes: versions of sum and mean.
  • Test variance.
  • Test variance derivatives.

Todos:

  • Add a "bias correction" flag to the variance ops (see the sketch below).
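
For the bias-correction todo: an unbiased variance divides the sum of squared deviations by n - 1 (Bessel's correction) instead of n. A minimal sketch of what that flag could look like, assuming a hypothetical `unbiased:` parameter rather than whatever name eventually lands:

func variance(_ x: Tensor<Float>, unbiased: Bool = false) -> Tensor<Float> {
    let squaredDeviations = (x - x.mean()).squared()
    let count = Float(x.scalarCount)
    // Bessel's correction: divide by (n - 1) instead of n when unbiased.
    return squaredDeviations.sum() / (unbiased ? count - 1 : count)
}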

dan-zheng added the tensorflow (This is for "tensorflow" branch PRs.) label on Apr 5, 2019
dan-zheng requested review from rxwei and jekbradbury on Apr 5, 2019 at 06:24
// TODO: Fix variance derivative calculation.
// PyTorch has: `expected == [[-0.75, -0.25], [ 0.25, 0.75]]`.
let input: Tensor<Float> = [[1, 2], [3, 4]]
let expected = Tensor<Float>(repeating: -0.75, shape: [2, 2])
dan-zheng (Contributor, Author)

Our variance ops are implemented using `mean` and `squared()`; there are no custom derivatives for variance.

However, the derivatives don't seem to match PyTorch:

import torch
import torch.nn.functional as F

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
x.requires_grad = True
y = x.var(unbiased=False)
y.backward()

print(x)
# tensor([[1., 2.],
#         [3., 4.]], requires_grad=True)

print(y)
# tensor(1.2500, grad_fn=<VarBackward0>)

print(x.grad)
# tensor([[-0.7500, -0.2500],
#         [ 0.2500,  0.7500]])
# We are not consistent with this.

Any ideas on fixing this? Is a custom variance derivative necessary?
(I would find it strange if a custom derivative is necessary for correctness.)
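
(For reference, the Swift side of this comparison is roughly the following; the exact argument labels for the squeezing variant are an assumption.)

let input: Tensor<Float> = [[1, 2], [3, 4]]
// Differentiate the new op on the same input PyTorch used above.
let inputGradient = gradient(at: input) { $0.variance(squeezingAxes: 0, 1) }
// PyTorch gives [[-0.75, -0.25], [0.25, 0.75]]; the test above currently
// expects -0.75 repeated everywhere, which is the inconsistency.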

Seeking help: cc @jekbradbury @jph00 @sgugger

dan-zheng (Contributor, Author)

PyTorch has a custom var_backward implementation:

Tensor var_backward(const Tensor & grad, const Tensor & self, bool unbiased) {
  return (2.0 / (self.numel() - unbiased)) * grad * (self - self.mean());
}

Tensor var_backward(Tensor grad, const Tensor & self, IntArrayRef dim, bool unbiased, bool keepdim) {
  if (self.dim() == 0) {
    return var_backward(grad, self, unbiased);
  }
  if (!keepdim && self.dim() > 1) {
    grad = unsqueeze_multiple(grad, dim, self.sizes().size());
  }
  return (2.0 / (_safe_size(self.sizes(), dim) - unbiased)) * grad * (self - self.mean(dim, true));
}

I'll try to port it to Swift.
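
For the whole-tensor, biased (`unbiased == false`) case, a rough port could look like the sketch below; the helper name `_vjpVariance` and the pullback-style signature are placeholders, not the code that eventually landed.

func _vjpVariance(_ x: Tensor<Float>) -> (Tensor<Float>, (Tensor<Float>) -> Tensor<Float>) {
    // Forward value: biased variance, i.e. the mean of squared deviations.
    let mean = x.mean()
    let value = (x - mean).squared().mean()
    // Pullback mirrors PyTorch's var_backward: (2 / n) * grad * (x - mean).
    let count = Float(x.scalarCount)
    return (value, { seed in seed * (2 / count) * (x - mean) })
}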

I'm still concerned about derivative correctness - a custom derivative shouldn't be necessary for correctness, especially because variance isn't super complicated (it just calls mean and squared).


Sorry, it's not something that @sgugger or I have looked at before, so we don't have any particular insights here.

@rxwei (Contributor), Apr 5, 2019

Why don't we hand-calculate and verify a simple case? It'd be highly preferable to not make it a primitive differentiable function.
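
For reference, hand-calculating the 2×2 test case: with x = [[1, 2], [3, 4]], the mean is 2.5, the biased variance is ((-1.5)^2 + (-0.5)^2 + 0.5^2 + 1.5^2) / 4 = 1.25, and d(variance)/dx_i = 2 * (x_i - 2.5) / 4, which gives [[-0.75, -0.25], [0.25, 0.75]]. That matches PyTorch's output above.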


PyTorch is correct and doing

let xGrad = gradient(at: x) {x in return (x - x.mean()).squared().mean()}

or

let xGrad = gradient(at: x) { x in x.variance(alongAxes: 0, 1) }

gives me the same result as PyTorch.


The PyTorch gradient is correct. This Swift version also gives the right results:

@differentiable
func variance(_ x: Tensor<Float>) -> Tensor<Float> {
    let mean = x.mean()
    let squaredDiff = (x - mean).squared()
    return squaredDiff.mean()
}
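
A quick gradient check of this function on the test's 2×2 input (a sketch):

let x: Tensor<Float> = [[1, 2], [3, 4]]
let xGrad = gradient(at: x) { variance($0) }
print(xGrad)
// Expected: [[-0.75, -0.25], [0.25, 0.75]], matching PyTorch.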

            x.product(squeezingAxes: 0).toHost(shape: []).array)
expectEqual(ShapedArray(shape: [1, 5], scalars: [1, 4, 9, 16, 25]),
            x.product(alongAxes: 0).toHost(shape: []).array)
expectEqual(Tensor(30), x.sum().toHost(shape: []))
dan-zheng (Contributor, Author)

I wonder if reduction ops still fail on TPU. The #if !TPU condition was added months ago (b/111815968).

cc @pschuh @bgogul: do you have any idea whether this test still fails on TPU, or whether `toHost(shape: ...)` is necessary everywhere?

dan-zheng (Contributor, Author)

Also, the arguments to toHost(shape:) don't seem quite right (it shouldn't always be []).

dan-zheng (Contributor, Author)

Update: I'll remove the toHost calls to force us to re-evaluate this bug.

Reorganize reduction ops: `sum`, `product`, `mean`, `variance`.
Todo: fix variance derivatives. They are currently inconsistent with PyTorch.
@dan-zheng (Contributor, Author)

@swift-ci Please test tensorflow

dan-zheng merged commit ec50086 into swiftlang:tensorflow on Apr 9, 2019
dan-zheng deleted the tensor-variance branch on April 9, 2019 at 10:20
rxwei pushed a commit to rxwei/swift that referenced this pull request May 11, 2019
* Add `variance()` and `variance(squeezingAxes:)`.
* Reorganize reduction ops: `sum`, `product`, `mean`, `variance`.