
Change momentum and epsilon properties to scalars. #525

Merged
saeta merged 1 commit into tensorflow:master from dan-zheng:batch-norm on Nov 7, 2019

Conversation

dan-zheng
Member

@dan-zheng dan-zheng commented Oct 2, 2019

Change the following properties from Tensor<Scalar> to Scalar.

  • BatchNorm.momentum
  • BatchNorm.epsilon
  • LayerNorm.epsilon

Semantically, these properties are always scalars.

Note: this will be an API breaking change in Swift for TensorFlow 0.6.
Deprecating the other BatchNorm and LayerNorm initializers is tricky
because it causes ambiguity problems.
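
For context, a minimal before/after sketch of an affected call site (the signatures are taken from the compiler output quoted later in this thread; the feature count is hypothetical):

import TensorFlow

// After this change: momentum/epsilon are plain scalars.
let bn = BatchNorm<Float>(featureCount: 64, momentum: 0.99, epsilon: 0.001)

// Before this change: they were Tensor<Scalar>, so call sites wrapped
// scalar literals in tensors:
// let bnOld = BatchNorm<Float>(
//   featureCount: 64, momentum: Tensor(0.99), epsilon: Tensor(0.001))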

@dan-zheng
Member Author

I will trigger a full Kokoro build to check if other repositories (tensorflow/swift-models, fastai/swiftai) are broken.

Contributor

@rxwei rxwei left a comment


Scalar is a generic parameter. This change will turn BatchNorm into an address-only type, which will be lowered as a SIL address type. Meanwhile, BatchNorm.TangentVector is loadable and will be lowered to a SIL object type. This will break the current assumptions in the differentiation transform.
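
As a rough illustration of the distinction (these structs are hypothetical, not the actual layer):

import TensorFlow

// A stored property of generic type makes the struct's layout depend on
// `Scalar`, so the type is address-only and lowers to a SIL address type.
struct HasGenericScalar<Scalar> {
  var epsilon: Scalar
}

// Storing only fixed-layout values (Tensor wraps a class reference) keeps
// the struct loadable, so it lowers to a SIL object type.
struct HasTensorOnly<Scalar: TensorFlowFloatingPoint> {
  var epsilon: Tensor<Scalar>
}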

@dan-zheng
Member Author

dan-zheng commented Oct 2, 2019

> Scalar is a generic parameter. This change will turn BatchNorm into an address-only type, which will be lowered as a SIL address type. Meanwhile, BatchNorm.TangentVector is loadable and will be lowered to a SIL object type. This will break the current assumptions in the differentiation transform.

I thought the same, but I couldn't reproduce a crash after trying a few tests.

import TensorFlow
// With the modified `BatchNorm` from this patch. The generic parameter needs
// the `TensorFlowFloatingPoint` constraint that `BatchNorm` requires.
func foo<T: TensorFlowFloatingPoint>(_ layer: BatchNorm<T>, _ x: Tensor<T>) -> Tensor<T> {
  layer(x)
}
func bar<T: TensorFlowFloatingPoint>(_ layer: BatchNorm<T>, x: Tensor<T>) {
  _ = pullback(at: layer, x, in: foo)
}

@rxwei
Contributor

rxwei commented Oct 2, 2019

> It is not clear why BatchNorm defined these properties as Tensor<Scalar> from the beginning: 6ca3813

It was originally Tensor<Scalar> exactly because of the type mismatch issue. This will be fixed by TF-625.
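
A hedged sketch of that mismatch (the function and its body are hypothetical, not the actual layer implementation): with epsilon typed as Scalar, the scalar must be promoted before it can participate in tensor arithmetic.

import TensorFlow

// Hypothetical normalization fragment.
func normalize<Scalar: TensorFlowFloatingPoint>(
  _ x: Tensor<Scalar>, mean: Tensor<Scalar>, variance: Tensor<Scalar>,
  epsilon: Scalar
) -> Tensor<Scalar> {
  // `Tensor(epsilon)` is the explicit Scalar-to-Tensor promotion.
  (x - mean) / sqrt(variance + Tensor(epsilon))
}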

@rxwei
Contributor

rxwei commented Oct 2, 2019

I don't yet have a reproducer, but I'll look at the swift-models and fast.ai notebook test results. I'd be wary of landing this just because no easy reproducer is found; it is technically pretty clear that the lowering isn't able to handle such cases.

@dan-zheng
Member Author

CI failed because tensorflow/swift-models requires changes:

/swift-models/MiniGo/Models/GoModel.swift:57:21: error: cannot invoke initializer for type 'BatchNorm<_>' with an argument list of type '(featureCount: Int, momentum: Tensor<Float>, epsilon: Tensor<Float>)'
        self.norm = BatchNorm(
                    ^
/swift-models/MiniGo/Models/GoModel.swift:57:21: note: expected an argument list of type '(featureCount: Int, axis: Int, momentum: Scalar, epsilon: Scalar)'
        self.norm = BatchNorm(
                    ^

Not pursuing this further for now due to low priority. swiftlang/swift#27298 will fix TF-625.
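
For reference, the swift-models fix is mechanical. A sketch, assuming a call site shaped like the error output above (the argument values are hypothetical):

import TensorFlow

// Hypothetical stand-in for the GoModel block that owns the norm layer.
struct ConvBlock {
  var norm: BatchNorm<Float>
  init(featureCount: Int) {
    // Before: tensor-wrapped arguments, rejected by the new initializer.
    // norm = BatchNorm(featureCount: featureCount,
    //                  momentum: Tensor(0.95), epsilon: Tensor(1e-5))
    // After: plain scalars matching (featureCount:axis:momentum:epsilon:).
    norm = BatchNorm(featureCount: featureCount, momentum: 0.95, epsilon: 1e-5)
  }
}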

@rxwei
Contributor

rxwei commented Oct 2, 2019

One possible reproducer at the SIL level is the following:

%f = function_ref @BatchNorm.callAsFunction(_:)
%f' = differentiable_function %f : $(*BatchNorm<Scalar>, Tensor<Scalar>) -> Tensor<Scalar>
%f_vjp = differentiable_function_extract [vjp] %f'

... after canonicalization:

%f = function_ref @BatchNorm.callAsFunction(_:)
%f' = differentiable_function %f : $(@in_guaranteed BatchNorm<Scalar>, @guaranteed Tensor<Scalar>) -> @owned Tensor<Scalar> with {
  %f_jvp : $(@in BatchNorm<Scalar>, @guaranteed Tensor<Scalar>) -> (@owned Tensor<Scalar>, (@guaranteed BatchNorm<Scalar>.TangentVector, @guaranteed Tensor<Scalar>) -> Tensor<Scalar>),
  %f_vjp : $(@in_guaranteed BatchNorm<Scalar>, @guaranteed Tensor<Scalar>) -> (@owned Tensor<Scalar>, (@guaranteed Tensor<Scalar>) -> (@owned BatchNorm<Scalar>.TangentVector, @owned Tensor<Scalar>)) }
%f_vjp = differentiable_function_extract [vjp] %f'

According to the differentiable_function_extract type calculation rule, %f_vjp will have type $(@in_guaranteed BatchNorm<Scalar>, @guaranteed Tensor<Scalar>) -> (@owned Tensor<Scalar>, (@guaranteed Tensor<Scalar>) -> (@out BatchNorm<Scalar>.TangentVector, @owned Tensor<Scalar>)), which doesn't match what's in the canonicalized differentiable_function instruction.

It's possible that the current transform happens to handle this case despite formal type mismatches. For example, the transform sometimes checks the type of the actual SIL values instead of formal types, which is very suboptimal.

If pullback-based tests are not failing, it's possible that a differential-based test will trigger this.
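
For concreteness, a sketch of such a test, mirroring the pullback-based one above. This assumes a forward-mode differential(at:in:) entry point, which was experimental and flag-gated at the time, so treat the API name as an assumption:

import TensorFlow

func foo<T: TensorFlowFloatingPoint>(_ layer: BatchNorm<T>, _ x: Tensor<T>) -> Tensor<T> {
  layer(x)
}

func baz<T: TensorFlowFloatingPoint>(_ layer: BatchNorm<T>, x: Tensor<T>) {
  // Assumed forward-mode counterpart of `pullback(at:in:)`.
  _ = differential(at: layer, x, in: foo)
}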

@saeta
Contributor

saeta commented Nov 7, 2019

Notes:

  • Change LayerNorm as well?
  • Be sure to include this in the v0.6 release notes as API breaking.
  • Also fix swift-models.

@dan-zheng
Member Author

LayerNorm.epsilon has also been updated.


Note: this will be an API breaking change in Swift for TensorFlow 0.6.

Deprecating the old BatchNorm and LayerNorm initializers is tricky because it causes ambiguity problems:

/Users/danielzheng/swift-apis/Tests/TensorFlowTests/LayerTests.swift:1171:23: error: ambiguous use of 'init(featureCount:axis:momentum:epsilon:)'
        let bnLayer = BatchNorm<Float>(featureCount: 5, axis: 0)
                      ^
TensorFlow.BatchNorm:12:12: note: found this candidate
    public init(featureCount: Int, axis: Int = -1, momentum: Scalar = 0.99, epsilon: Scalar = 0.001)
           ^
TensorFlow.BatchNorm:14:12: note: found this candidate
    public init(featureCount: Int, axis: Int = -1, momentum: TensorFlow.Tensor<Scalar> = Tensor(0.99), epsilon: TensorFlow.Tensor<Scalar> = Tensor(0.001))
           ^
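
For what it's worth, a sketch of a disambiguated call: supplying any momentum or epsilon argument explicitly selects one overload (assuming Tensor is not expressible by a float literal, a Float literal can only bind to the Scalar parameter):

import TensorFlow

// Ambiguous: with every remaining parameter defaulted, both candidates match.
// let bnLayer = BatchNorm<Float>(featureCount: 5, axis: 0)

// Unambiguous: the literal binds to `momentum: Scalar`.
let bnLayer = BatchNorm<Float>(featureCount: 5, axis: 0, momentum: 0.99)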

@saeta saeta merged commit b7ba0d5 into tensorflow:master Nov 7, 2019
@dan-zheng dan-zheng deleted the batch-norm branch November 7, 2019 20:50
@dan-zheng dan-zheng changed the title from "Change BatchNorm momentum and epsilon to scalars." to "Change momentum and epsilon properties to scalars." on Nov 7, 2019