Improved derivative performance for broadcasted operations. #142
Conversation
We have been assuming that seeds are always shape-compatible with (if not the same shape as) the original result, and existing implementations have not been broadcasting to the shape of the original result. I think there are other reasons for it, which may lead to a different solution.
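To make the shape question concrete, here is a minimal sketch (not this PR's implementation) of a broadcast-aware pullback for addition. It assumes a reduction helper named unbroadcasted(to:) is available on Tensor; the pullback sums the incoming seed back down to each operand's shape, so it stays valid even when the forward op broadcast its inputs.

```swift
import TensorFlow

// Sketch only: a pullback for `lhs + rhs` that reduces the seed back to each
// operand's shape. `unbroadcasted(to:)` is assumed to sum away broadcast
// dimensions; substitute whatever reduction helper the library actually provides.
func addWithPullback<Scalar: TensorFlowFloatingPoint>(
  _ lhs: Tensor<Scalar>, _ rhs: Tensor<Scalar>
) -> (value: Tensor<Scalar>,
      pullback: (Tensor<Scalar>) -> (Tensor<Scalar>, Tensor<Scalar>)) {
  let value = lhs + rhs
  return (value, { seed in
    (seed.unbroadcasted(to: lhs.shape), seed.unbroadcasted(to: rhs.shape))
  })
}
```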
That's interesting! I actually traced the test failure and saw that the seed shape did not match the result shape when the add VJP was being called. Given that this was being directly invoked by the …
}
let x = Tensor<Float>(ones: [1, 2, 1, 4])
let y = Tensor<Float>(ones: [4, 1, 3, 1])
let (dx, dy) = gradient(at: x, y, in: foo)
In this test case, the use of gradient(at:in:) is not valid, because gradient is only mathematically defined for functions that return a scalar. We can either make foo(_:_:) end with a sum() or use pullback(at:in:).
That's what I thought. This was actually the failing case in the other PR, and that's why I copied it here, but I wasn't sure of the semantics of gradient. I'll add a call to sum() and remove the seed broadcasts then. :)
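A sketch of what that change could look like (the body of foo(_:_:) below is hypothetical, since the excerpt above only shows its closing brace): reducing with sum() makes foo scalar-valued, so gradient(at:_:in:) is well defined without any seed broadcasting.

```swift
import TensorFlow

// Hypothetical body for `foo`: the broadcasted result of `x + y` has shape
// [4, 2, 3, 4], and `sum()` reduces it to a scalar so the gradient is defined.
func foo(_ x: Tensor<Float>, _ y: Tensor<Float>) -> Tensor<Float> {
  return (x + y).sum()
}

let x = Tensor<Float>(ones: [1, 2, 1, 4])
let y = Tensor<Float>(ones: [4, 1, 3, 1])
let (dx, dy) = gradient(at: x, y, in: foo)
```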
AD is fully shape-agnostic, and there's some mathematical consistency to differential operators like valueWithGradient(at:in:):

@inlinable
public func valueWithGradient<T, R>(
  at x: T,
  in f: @differentiable (T) -> Tensor<R>
) -> (value: Tensor<R>, gradient: T.TangentVector)
  where T: Differentiable, R: TensorFlowFloatingPoint {
  let (y, pullback) = valueWithPullback(at: x, in: f)
  precondition(y.rank == 0)
  return (y, pullback(Tensor<R>(1)))
}
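As a usage sketch of the operator above: a rank-0 (scalar) result satisfies the precondition, so the Tensor<R>(1) seed is valid, while a tensor-valued closure would now trap instead of silently being seeded with a mismatched shape.

```swift
import TensorFlow

let x = Tensor<Float>(ones: [2, 3])

// Scalar-valued closure: the result has rank 0, so the precondition passes.
let (value, grad) = valueWithGradient(at: x) { $0.sum() }

// Tensor-valued closure: the result has rank 2, so `precondition(y.rank == 0)`
// would trap at runtime.
// let (badValue, badGrad) = valueWithGradient(at: x) { $0 * 2 }
```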
I also prefer that. What's the cost of preconditions? Are they removed when compiling with optimizations enabled?
From this blog post.
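For reference, and presumably what the linked post explains: assert is only evaluated in -Onone (debug) builds, precondition is still evaluated under -O, and only -Ounchecked removes precondition checks, so the rank check above survives (cheaply) in optimized builds.

```swift
func reciprocal(_ x: Float) -> Float {
  assert(x.isFinite)    // evaluated only in -Onone (debug) builds
  precondition(x != 0)  // evaluated in -Onone and -O; removed only under -Ounchecked
  return 1 / x
}
```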
Awesome, thanks! :) I'll be away for ~30 minutes but will make those changes once I get back.
I'm looking into this now. I can add the precondition for …
I made all necessary changes and all tests pass locally.
It actually does because it calls …
Sounds good. Done in the latest commit. :)
All tests pass. Thank you!
Re-implementation of swiftlang/swift#24408.
@rxwei The reason you were getting errors is that you were differentiating a tensor-valued function (as opposed to a scalar-valued one) and assuming that the gradient seed has the same shape as the operation result. I fixed that by broadcasting the gradient seeds before computing the gradients.
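Illustratively (not the PR's exact code), "broadcasting the gradient seeds" means expanding a seed up to the primal result's shape before handing it to the pullback. The sketch below assumes valueWithPullback(at:_:in:) and a broadcasted(to:) method are available, and foo(_:_:)'s body is hypothetical.

```swift
import TensorFlow

// Hypothetical tensor-valued function whose inputs get broadcast.
func foo(_ x: Tensor<Float>, _ y: Tensor<Float>) -> Tensor<Float> {
  return x + y  // result shape: [4, 2, 3, 4]
}

let x = Tensor<Float>(ones: [1, 2, 1, 4])
let y = Tensor<Float>(ones: [4, 1, 3, 1])
let (value, pb) = valueWithPullback(at: x, y, in: foo)

// Expand a unit seed to the result's shape before calling the pullback, so the
// seed and the primal result are shape-compatible.
// `broadcasted(to:)` is assumed to exist for this expansion.
let seed = Tensor<Float>(1).broadcasted(to: value.shape)
let (dx, dy) = pb(seed)
```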