[AutoDiff] [stdlib] Made arrays differentiable #23183

Merged
merged 21 commits into from
Apr 17, 2019

Conversation

eaplatanios

This is a first attempt at making arrays differentiable.

@eaplatanios eaplatanios changed the base branch from master to tensorflow March 8, 2019 19:04
@rxwei rxwei self-requested a review March 8, 2019 19:34
@rxwei rxwei requested a review from marcrasi March 8, 2019 20:41
@rxwei rxwei added the tensorflow This is for "tensorflow" branch PRs. label Mar 8, 2019
@rxwei rxwei changed the title [TF] Made arrays differentiable [AutoDiff] [stdlib] Made arrays differentiable Mar 8, 2019
@eaplatanios
Author

Thanks Richard! I addressed all the feedback and tests seem to pass locally on my machine! :)

@rxwei
Contributor

rxwei commented Mar 8, 2019

Have you checked in those tests? I don’t see them in this PR. You can create a new file, test/AutoDiff/array.swift, for the tests.

@eaplatanios
Author

Sorry I haven't. I'll add a few more and push in a few minutes. Quick question: is there a way to recompile and run only the newly added tests? utils/build-script --preset=tensorflow_test seems to be recompiling a lot of things and running all tests, which is killing my MacBook and is also taking forever every time I want to test.

@marcrasi

marcrasi commented Mar 8, 2019

Quick question: is there a way to recompile and run only the newly added tests?

Yes!

To do an incremental compile, cd to build/<build dir name>/swift-<something> and run ninja. (build should be inside the parent directory of the swift repo. <build dir name> depends on what settings you used to build, and there's probably only one, unless you have built with multiple different settings.)

You can also run ninja swift to just recompile the compiler without recompiling the stdlib, to save time in cases where your changes don't affect the stdlib. But this PR is a change to the stdlib.

To run a specific test, you need to construct a lit invocation that runs that test. There's some documentation about it here: https://github.com/apple/swift/blob/master/docs/Testing.md. But doing it manually is pretty tedious, so I created a script where you can just pass it the test file and it'll automatically run lit in the correct way: https://gist.github.com/marcrasi/7493150a7247b0a43579126d3a6352de (I should really check this script into a repo and document it somewhere...)
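
As a rough sketch of that workflow (the directory and file names below are assumptions that depend on your build preset and platform; adjust them to your checkout):

```shell
# Incremental rebuild: enter the Ninja build directory for the swift repo.
# "Ninja-RelWithDebInfoAssert" and the "-macosx-x86_64" suffix are assumed
# names; yours depend on which preset you built with.
cd build/Ninja-RelWithDebInfoAssert/swift-macosx-x86_64
ninja            # incrementally rebuilds the compiler and the stdlib
ninja swift      # compiler only, when your change does not touch the stdlib

# Run just one test file via lit instead of the whole suite (the lit.py
# path and the test build directory are likewise assumptions).
python ../../../llvm/utils/lit/lit.py -sv \
  ./test-macosx-x86_64/AutoDiff/array.swift
```

These are environment-dependent commands, so treat them as a template rather than something to paste verbatim.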

@eaplatanios
Author

Thanks a lot Marc! That's extremely useful! :) I made a couple simple tests, including the example you mentioned in the discussion group. I'll make sure they pass and commit them.

@rxwei
Contributor

rxwei commented Mar 9, 2019

LGTM, thanks for pushing this through! If you can check in the tests, that'd be great!

@eaplatanios
Author

Thanks for all the help! I'm about to commit a couple simple tests, but I can't seem to run them using the litrun script by @marcrasi . Marc, I'm getting the following error: RuntimeError: Device tests are currently only supported when the swift_test_mode is "only_non_executable". Current swift_test_mode is optimize_none.. Do you have any experience with that? I'm trying to run them from the build directory by using: ./litrun -b Ninja-RelWithDebInfoAssert ../../../swift/test/AutoDiff/array.swift.

@saeta
Contributor

saeta commented Apr 16, 2019

Thanks for the update @marcrasi. Re #2: that's exactly the semantics I implemented for LayerList (which is a workaround for not having differentiable arrays).

@marcrasi

Here's an updated implementation that takes @rxwei's comments (above + some in-person discussion) into account: https://github.com/fastai/fastai_docs/blob/master/dev_swift/01c_array_differentiable.ipynb . (Also viewable as gist if the github ipynb viewer is broken: https://gist.github.com/marcrasi/b7618e3e7a8a8a920b73e662a40c4c7b#file-array-differentiable-swift)

@eaplatanios, what do you think about this approach?

@eaplatanios
Author

@marcrasi Thanks a lot for the update! I'll go through it later today, but a quick question that pops into my mind is: what would the VJP of the .append function look like?

@rxwei
Contributor

rxwei commented Apr 17, 2019

append will require AD support for mutating functions and inout arguments. Roughly, AD will treat an inout argument (or a mutating self) as both an input and an output.

@eaplatanios
Author

@rxwei assuming that AD supports mutating functions and inout arguments, what would be the array length for the gradient?

@marcrasi

We can think of concat as the immutable version of append, so a VJP for concat demonstrates how a VJP for append can eventually work:

// We should actually make the stdlib "+" differentiable instead of defining our own "concat". But it was
// easier to define my own "concat" for the example.
@differentiable(vjp: vjpConcat where E: Differentiable)
func concat<E>(_ lhs: [E], _ rhs: [E]) -> [E] {
    return lhs + rhs
}

func vjpConcat<E: Differentiable>(_ lhs: [E], _ rhs: [E])
    -> ([E], (Array<E>.CotangentVector) -> (Array<E>.CotangentVector, Array<E>.CotangentVector))
{
    func pullback(_ v: Array<E>.CotangentVector) -> (Array<E>.CotangentVector, Array<E>.CotangentVector) {
        precondition(
            v.base.count == lhs.count + rhs.count,
            "concat received gradient with wrong count. gradient: \(v.base.count), lhs: \(lhs.count), rhs: \(rhs.count)")
        // It would be more efficient if we find a way to keep the slices as ArraySlices.
        return (
            Array<E>.CotangentVector(Array(v.base[0..<lhs.count])),
            Array<E>.CotangentVector(Array(v.base[lhs.count...])))
    }
    return (concat(lhs, rhs), pullback)
}

// Similarly, we should actually make the stdlib subscript differentiable.
extension Array where Element : Differentiable {
  @differentiable(wrt: (self), vjp: vjpDifferentiableSubscript)
  func differentiableSubscript(_ i: Int) -> Element {
    return self[i]
  }

  func vjpDifferentiableSubscript(_ i: Int) -> (Element, (Element.CotangentVector) -> Array.CotangentVector) {
    let result = self[i]
    func pullback(_ v: Element.CotangentVector) -> Array.CotangentVector {
      var r = Array<Element.CotangentVector>(repeating: .zero, count: count)
      r[i] = v
      return Array.CotangentVector(r)
    }
    return (result, pullback)
  }
}

@differentiable
func sumFirstThreeConcatted(_ arrs: TwoArrs) -> Float {
    let c = concat(arrs.a, arrs.b)
    return c.differentiableSubscript(0) + c.differentiableSubscript(1) + c.differentiableSubscript(2)
}

struct TwoArrs: Differentiable {
    let a: [Float]
    let b: [Float]
    init(_ a: [Float], _ b: [Float]) {
        self.a = a
        self.b = b
    }
}

print(gradient(at: TwoArrs([0, 0], [0, 0]), in: sumFirstThreeConcatted))
// => (a: [1.0, 1.0], b: [1.0, 0.0])

print(gradient(at: TwoArrs([], [0, 0, 0]), in: sumFirstThreeConcatted))
// => (a: [], b: [1.0, 1.0, 1.0])

print(gradient(at: TwoArrs([0, 0, 0], []), in: sumFirstThreeConcatted))
// => (a: [1.0, 1.0, 1.0], b: [])

The append VJP should do something similar. Its pullback will take a gradient of length n + 1, "return" the first n elements as the gradient of the input array, and return the last element as the gradient of the appended element.
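
To make that concrete, here is a hand-written sketch (hypothetical names, and written as a free function rather than a mutating append, since AD support for inout arguments doesn't exist yet) of what an append pullback would compute:

```swift
// Hypothetical sketch: the pullback of appending `element` to `array`.
// It splits an incoming gradient of length n + 1 into the gradient for
// the original n-element array and the gradient for the appended element.
func vjpAppend(_ array: [Float], _ element: Float)
    -> ([Float], ([Float]) -> ([Float], Float)) {
    let n = array.count
    func pullback(_ v: [Float]) -> ([Float], Float) {
        precondition(v.count == n + 1,
            "append received gradient with wrong count: \(v.count), expected \(n + 1)")
        return (Array(v[0..<n]), v[n])
    }
    return (array + [element], pullback)
}

let (result, pullback) = vjpAppend([1, 2], 3)
// result == [1.0, 2.0, 3.0]
let (arrayGrad, elementGrad) = pullback([10, 20, 30])
// arrayGrad == [10.0, 20.0], elementGrad == 30.0
```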

@marcrasi

@eaplatanios, do you mind if I update this PR with these changes and try to get it merged soon? We're cutting a "v0.3" release of S4TF this evening, and we'd really like to have differentiable arrays in it!

@eaplatanios
Author

@marcrasi Sorry, it's been a super busy day so far. I took a look at your update; I agree it makes sense, and I like how we also avoid a lot of code duplication. Plus it makes more sense mathematically (I really like the last example you added in your earlier post). Feel free to update the PR and try to get it merged soon. I'm excited to see this making it into the 0.3 release. :)

@rxwei
Contributor

rxwei commented Apr 17, 2019

Let me know when it's ready to review!

@marcrasi

@rxwei This is ready to review!

@marcrasi

Note about something I ran into: I also got the crash that @eaplatanios had. It seems related to cross-module stuff, which is why I didn't get the crash when I was playing with the gists. I fixed it by adding @_fixed_layout to DifferentiableView. It seems like there is an underlying bug, so I filed one here: https://bugs.swift.org/browse/TF-442 . That bug doesn't block this PR, because @_fixed_layout makes it work.

// TODO: Determine if that is a bug, and fix.
public var array: [Element] {
@differentiable(wrt: self, vjp: _vjpArray)
get {

Use _read and _modify accessors instead of get/set to make it more efficient.

Suggested change
get {
_read { yield base }
_modify { yield &base }

The @differentiable attr does not like _read. It says:

/usr/local/google/home/marcrasi/swift-base/swift/stdlib/public/core/Array.swift:1888:8: error: cannot differentiate void function '_'
      @differentiable(wrt: self, vjp: _vjpBase)
       ^

So I will leave it as get / _modify for now, and this whole problem will go away once we fix the @differentiable stored property issue.

// I'm implementing this as a computed property instead of directly exposing `base` because the
// `@differentiable` annotation does not make the stored property actually differentiable. I
// think this is a bug. Maybe it's related to `@_fixed_layout`?
// TODO: Determine if that is a bug, and fix.

Interesting. How about renaming the current base to _base, and renaming array/_vjpArray to base/_vjpBase?

done


/// Construct a view of the given array.
@differentiable(wrt: array, vjp: _vjpInit)
public init(_ array: [Element]) { self.base = array }

Nit:

Suggested change
public init(_ array: [Element]) { self.base = array }
public init(_ base: [Element]) { self.base = base }

done

return (array, { $0 })
}

/// Construct a view of the given array.

Suggested change
/// Construct a view of the given array.
/// Creates a differentiable view of the given array.

done


// SWIFT_ENABLE_TENSORFLOW
extension Array where Element : Differentiable {
/// Views the array as the differentiable product manifold of `Element` multiplied with itself

Suggested change
/// Views the array as the differentiable product manifold of `Element` multiplied with itself
/// The view of an array as the differentiable product manifold of `Element` multiplied with itself

done

get {
return AllDifferentiableVariables(array.map { $0.allDifferentiableVariables })
}
set(newValue) {

Omit newValue since the name is implicitly bound.

Suggested change
set(newValue) {
set {

done

public func tangentVector(from cotangentVector: CotangentVector) -> TangentVector {
precondition(
array.count == cotangentVector.array.count,
"cannot use Array.DifferentiableView with count \(array.count) to get tangentVector from " +

Are these over 80 columns?

oops, I thought the limit was 100 based on some existing lines that go over 80

Yes sorry that was my mistake. I assumed it was 100 because that's in the Google Swift Style Guide.

get {
return DifferentiableView(self).allDifferentiableVariables
}
set(v) {

Suggested change
set(v) {
set {

Use newValue?

done

@@ -104,6 +104,13 @@ extension Array : KeyPathIterable {
}
}

extension Array.DifferentiableView : KeyPathIterable {

I don't think you need a special implementation here. Derived conformances should just do it for you when you declare : KeyPathIterable at DifferentiableView's declaration site.

done

}
}

public func _vjpArray() -> ([Element], (Array<Element>.CotangentVector) -> CotangentVector) {

Make all VJPs be @usableFromInline internal

done

@rxwei
Contributor

rxwei commented Apr 17, 2019

The crash could be caused by @differentiable function lowering / VJP type calculation that we have today.

@marcrasi

@swift-ci please test tensorflow

5 similar comments

@rxwei rxwei merged commit 9c34970 into swiftlang:tensorflow Apr 17, 2019
@rxwei
Contributor

rxwei commented Apr 23, 2019

@eaplatanios Conforming Array to Differentiable made a crazy cool thing possible: Autograd in Swift.

@eaplatanios
Author

@eaplatanios Conforming Array to Differentiable made a crazy cool thing possible: Autograd in Swift.

Wow, this is really cool! On a side note, I also ran into the array literal differentiability issue. I was wondering, how can we add a VJP for that?

@rxwei
Contributor

rxwei commented Apr 23, 2019

It requires tweaking the AD transform to recognize _allocateUninitializedArray and a few loads and stores.
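
Until the compiler gains that support, the intended semantics can be sketched by hand (hypothetical names; this only shows what the pullback of forming a two-element literal would compute, not the actual AD-transform change):

```swift
// Hypothetical sketch: forming [x, y] from two scalars has a pullback
// that splits the incoming array gradient back into scalar gradients.
func vjpArrayLiteral(_ x: Float, _ y: Float)
    -> ([Float], ([Float]) -> (Float, Float)) {
    func pullback(_ v: [Float]) -> (Float, Float) {
        precondition(v.count == 2, "expected a gradient of count 2, got \(v.count)")
        return (v[0], v[1])
    }
    return ([x, y], pullback)
}

let (value, pullback) = vjpArrayLiteral(1, 2)
// value == [1.0, 2.0]
let (dx, dy) = pullback([10, 20])
// dx == 10.0, dy == 20.0
```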

rxwei pushed a commit to rxwei/swift that referenced this pull request May 11, 2019