
Improved derivative performance for broadcasted operations. #142

Merged (4 commits) on May 31, 2019

Conversation

eaplatanios
Contributor

Re-implementation of swiftlang/swift#24408.

@rxwei the reason you were getting errors is because you were differentiating a tensor-valued function (as opposed to scalar-valued) and you were assuming that the gradient seed has the same shape as the operation result. I fixed that by broadcasting the gradient seeds before computing the gradients.
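For reference, here is a minimal sketch of what "broadcasting the gradient seeds" could look like for a broadcasting add, written as a hypothetical free-standing VJP. This is illustrative only, not the PR's code, and the thread below ends up dropping the seed broadcasts in favor of a scalar-result precondition:

import TensorFlow

// Hypothetical VJP for a broadcasting add (illustration only).
func vjpBroadcastAdd<Scalar: TensorFlowFloatingPoint>(
    _ lhs: Tensor<Scalar>, _ rhs: Tensor<Scalar>
) -> (value: Tensor<Scalar>, pullback: (Tensor<Scalar>) -> (Tensor<Scalar>, Tensor<Scalar>)) {
    let result = lhs + rhs
    return (value: result, pullback: { seed in
        // Broadcast a shape-compatible (but not same-shape) seed up to the
        // result's shape, then reduce each gradient back down to the shape
        // of its operand.
        let broadcastSeed = seed.broadcasted(to: result.shape)
        return (broadcastSeed.unbroadcasted(to: lhs.shape),
                broadcastSeed.unbroadcasted(to: rhs.shape))
    })
}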

@rxwei
Contributor

rxwei commented May 30, 2019

the reason you were getting errors is because you were differentiating a tensor-valued function (as opposed to scalar-valued) and you were assuming that the gradient seed has the same shape as the operation result. I fixed that by broadcasting the gradient seeds before computing the gradients.

We have been assuming that seeds are always shape-compatible with (if not the same shape as) the original result, and existing implementations have not been broadcasting to the shape of the original result. I think there are other reasons for the failure, which may lead to a different solution.

@eaplatanios
Contributor Author

That's interesting! I actually traced the test failure and saw that the seed shape did not match the result shape when the add VJP was called. Given that this was invoked directly by the gradient function, could there be a bug somewhere more central to the AD support?

}
let x = Tensor<Float>(ones: [1, 2, 1, 4])
let y = Tensor<Float>(ones: [4, 1, 3, 1])
let (dx, dy) = gradient(at: x, y, in: foo)
Contributor

In this test case, the use of gradient(at:in:) is not valid because gradient is only mathematically defined for functions that return a scalar. We can either make foo(_:_:) return a scalar by calling sum(), or use pullback(at:in:).
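For concreteness, here is a sketch of the two options. It assumes foo(_:_:) simply adds its broadcasting operands (its body isn't shown above) and reuses x and y from the snippet; fooSummed, dxSeeded, and dySeeded are names made up for illustration:

import TensorFlow

let x = Tensor<Float>(ones: [1, 2, 1, 4])
let y = Tensor<Float>(ones: [4, 1, 3, 1])

// Option 1: make the differentiated function scalar-valued by summing,
// so that gradient(at:_:in:) is well-defined.
func fooSummed(_ a: Tensor<Float>, _ b: Tensor<Float>) -> Tensor<Float> {
    return (a + b).sum()
}
let (dx, dy) = gradient(at: x, y, in: fooSummed)

// Option 2: keep the function tensor-valued and use pullback(at:_:in:),
// supplying an explicit seed with the broadcasted result shape [4, 2, 3, 4].
let pb = pullback(at: x, y) { a, b in a + b }
let (dxSeeded, dySeeded) = pb(Tensor<Float>(ones: [4, 2, 3, 4]))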

Contributor Author

That's what I thought. This was actually the failing case in the other PR, which is why I copied it here, but I wasn't sure of the semantics of gradient. I'll add a call to sum() and remove the seed broadcasts then. :)

@rxwei
Contributor

rxwei commented May 30, 2019

Given that this was invoked directly by the gradient function, could there be a bug somewhere more central to the AD support?

AD is fully shape-agnostic, but for mathematical consistency a differential operator like gradient(at:in:) should fail when the function's result is not semantically a scalar. I think a proper solution is to change the definitions of gradient(at:in:), gradient(at:_:in:), Tensor.gradient(in:), valueWithGradient(at:in:), valueWithGradient(at:_:in:), and Tensor.valueWithGradient(in:) (all defined in DifferentialOperators.swift) to call precondition(_:_:file:line:) on the original function's result, for example:

@inlinable
public func valueWithGradient<T, R>(
    at x: T,
    in f: @differentiable (T) -> Tensor<R>
) -> (value: Tensor<R>, gradient: T.TangentVector)
    where T: Differentiable, R: TensorFlowFloatingPoint {
    let (y, pullback) = valueWithPullback(at: x, in: f)
    precondition(y.rank == 0, "The function being differentiated must return a scalar (rank-0 tensor).")
    return (y, pullback(Tensor<R>(1)))
}
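A quick illustration of the intended behavior under this change (a sketch; the closures are arbitrary examples):

import TensorFlow

let x = Tensor<Float>(ones: [2, 3])

// OK: the result is a scalar (rank 0), so the precondition passes.
let (value, grad) = valueWithGradient(at: x) { $0.sum() }

// Would now trap at runtime: the result has rank 2, so differentiating it
// with valueWithGradient(at:in:) is rejected.
// let (badValue, badGrad) = valueWithGradient(at: x) { $0 * 2 }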

@eaplatanios
Contributor Author

I also prefer that. What's the cost of preconditions? Are they removed when compiling with optimizations enabled?

@rxwei
Contributor

rxwei commented May 30, 2019

I also prefer that. What's the cost of preconditions? Are they removed when compiling with optimizations enabled?

Function                -Onone   -O     -Ounchecked
assert()                YES      NO     NO
assertionFailure()      YES      NO     NO**
precondition()          YES      YES    NO
preconditionFailure()   YES      YES    YES**
fatalError()*           YES      YES    YES

From this blog post.
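In practical terms for this PR (a sketch with a hypothetical checkScalar helper, just to illustrate the table): a precondition-based rank check stays active in release (-O) builds, whereas an assert would be compiled out:

import TensorFlow

// Hypothetical helper, for illustration only.
func checkScalar<Scalar: TensorFlowScalar>(_ t: Tensor<Scalar>) {
    assert(t.rank == 0)        // checked under -Onone only
    precondition(t.rank == 0)  // checked under -Onone and -O; removed under -Ounchecked
}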

@eaplatanios
Contributor Author

Awesome, thanks! :) I'll be away for ~30 minutes but will make those changes once I get back.

@eaplatanios
Contributor Author

I'm looking into this now. I can add the precondition for valueWithGradient, but not for gradient as I don't think it ever evaluates f directly.

@eaplatanios
Contributor Author

I made all necessary changes and all tests pass locally.

@rxwei
Contributor

rxwei commented May 31, 2019

I'm looking into this now. I can add the precondition for valueWithGradient, but not for gradient as I don't think it ever evaluates f directly.

It actually does, because it calls pullback(at:in:), which calls valueWithPullback(at:in:). You can change gradient(at:in:) to call valueWithGradient(at:in:) and drop the value after the precondition check. Don't worry about performance, since we can optimize those things away later.
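A minimal sketch of that change (not necessarily the exact merged code), following the valueWithGradient(at:in:) example above:

@inlinable
public func gradient<T, R>(
    at x: T,
    in f: @differentiable (T) -> Tensor<R>
) -> T.TangentVector
    where T: Differentiable, R: TensorFlowFloatingPoint {
    // Delegate to valueWithGradient(at:in:), which checks that the result is
    // a scalar, and drop the value.
    return valueWithGradient(at: x, in: f).gradient
}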

rxwei added the enhancement label on May 31, 2019
rxwei self-assigned this on May 31, 2019
@eaplatanios
Contributor Author

Sounds good. Done in the latest commit. :)

@rxwei
Contributor

rxwei commented May 31, 2019

All tests pass. Thank you!

rxwei merged commit f194110 into tensorflow:master on May 31, 2019