Change momentum and epsilon properties to scalars. #525
Conversation
I will trigger a full Kokoro build to check if other repositories (tensorflow/swift-models, fastai/swiftai) are broken.
`Scalar` is a generic parameter. It will turn `BatchNorm` into an address-only type, which will be lowered as a SIL address type. Meanwhile, `BatchNorm.TangentVector` is loadable and will be lowered to a SIL object type. This will break the current assumptions in the differentiation transform.
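For illustration, here is a minimal sketch of the distinction (a hypothetical `MiniNorm`, not code from this patch; the address-only/loadable split follows from SIL type lowering of generic stored properties):

```swift
import TensorFlow

// Hypothetical layer mirroring the patched `BatchNorm`. Storing the generic
// `Scalar` directly makes the struct's layout depend on the generic parameter,
// so SIL lowers `MiniNorm` as an address-only type. The synthesized
// `TangentVector` contains only `Tensor` properties and stays loadable.
struct MiniNorm<Scalar: TensorFlowFloatingPoint>: Layer {
    var scale: Tensor<Scalar>          // differentiable; loadable
    @noDerivative let epsilon: Scalar  // raw generic scalar; forces address-only lowering

    @differentiable
    func callAsFunction(_ input: Tensor<Scalar>) -> Tensor<Scalar> {
        scale * input + Tensor(epsilon)
    }
}
```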
I thought the same, but I couldn't reproduce a crash after trying a few tests.

```swift
import TensorFlow

// With modified `BatchNorm` from this patch.
// (`T: TensorFlowFloatingPoint` is required by `BatchNorm`'s generic signature.)
func foo<T: TensorFlowFloatingPoint>(_ layer: BatchNorm<T>, _ x: Tensor<T>) -> Tensor<T> {
    layer(x)
}
func bar<T: TensorFlowFloatingPoint>(_ layer: BatchNorm<T>, x: Tensor<T>) {
    _ = pullback(at: layer, x, in: foo)
}
```
I don't yet have a reproducer, but I'll look at the swift-models and fast.ai notebook test results.

I'd be wary of turning this on just because no easy reproducer has been found. It is technically quite clear that lowering isn't able to handle such cases.
CI failed because tensorflow/swift-models requires changes:

Not pursuing further for now due to low priority. swiftlang/swift#27298 will fix TF-625.
One possible reproducer at the SIL level is the following:

```sil
%f = function_ref @BatchNorm.callAsFunction(_:)
%f' = differentiable_function %f : $(*BatchNorm<Scalar>, Tensor<Scalar>) -> Tensor<Scalar>
%f_vjp = differentiable_function_extract [vjp] %f'
```

After canonicalization:

```sil
%f = function_ref @BatchNorm.callAsFunction(_:)
%f' = differentiable_function %f : $(@in_guaranteed BatchNorm<Scalar>, @guaranteed Tensor<Scalar>) -> @owned Tensor<Scalar> with {
  %f_jvp : $(@in BatchNorm<Scalar>, @guaranteed Tensor<Scalar>) -> (@owned Tensor<Scalar>, (@guaranteed BatchNorm<Scalar>.TangentVector, @guaranteed Tensor<Scalar>) -> Tensor<Scalar>),
  %f_vjp : $(@in_guaranteed BatchNorm<Scalar>, @guaranteed Tensor<Scalar>) -> (@owned Tensor<Scalar>, (@guaranteed Tensor<Scalar>) -> (@owned BatchNorm<Scalar>.TangentVector, @owned Tensor<Scalar>))
}
%f_vjp = differentiable_function_extract [vjp] %f'
```

It's possible that the current transform happens to handle this case despite the formal type mismatches. For example, the transform sometimes checks the types of the actual SIL values instead of the formal types, which is very suboptimal. If pullback-based tests are not failing, it's possible that a differential-based test will trigger this.
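Such a differential-based test might look like the sketch below (forward-mode AD was experimental at the time and gated behind a compiler flag; the two-argument `valueWithDifferential(at:_:in:)` entry point is an assumption here):

```swift
import TensorFlow

// Hypothetical differential-based analogue of the pullback test above.
// Assumes the experimental forward-mode `valueWithDifferential(at:_:in:)`
// API is available.
func testDifferential<T: TensorFlowFloatingPoint>(_ layer: BatchNorm<T>, _ x: Tensor<T>) {
    let (value, differential) = valueWithDifferential(at: layer, x) { layer, x in
        layer(x)
    }
    _ = (value, differential)
}
```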
Notes:
Change the following properties from `Tensor<Scalar>` to `Scalar`:

- `BatchNorm.momentum`
- `BatchNorm.epsilon`
- `LayerNorm.epsilon`

Semantically, these properties are always scalars.

Note: this will be an API-breaking change in Swift for TensorFlow 0.6. Deprecating the old `BatchNorm` and `LayerNorm` initializers is tricky because it causes ambiguity problems.
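For context, a minimal sketch of the ambiguity (a toy type, not the actual swift-apis initializers; the problem hinges on `Tensor`'s float-literal conformance):

```swift
import TensorFlow

// Toy type illustrating why keeping a deprecated `Tensor<Scalar>` overload
// alongside the new `Scalar` one is problematic. Not the real initializers.
struct Toy<Scalar: TensorFlowFloatingPoint> {
    var epsilon: Tensor<Scalar>

    @available(*, deprecated, message: "Use init(epsilon: Scalar) instead.")
    init(epsilon: Tensor<Scalar>) { self.epsilon = epsilon }

    init(epsilon: Scalar) { self.epsilon = Tensor(epsilon) }
}

// A float literal satisfies both `Scalar` and `Tensor<Scalar>` (which is
// `ExpressibleByFloatLiteral`), so a call like the following is reported
// as ambiguous; deprecation does not affect overload ranking:
// let t = Toy<Float>(epsilon: 0.001)
```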