Rewrite of the code generation script. #26

Merged
merged 44 commits on May 2, 2019

Conversation

Contributor

@eaplatanios eaplatanios commented Apr 23, 2019

@rxwei @pschuh This is an attempt to rewrite the code generation script so that it supports the following features:

  1. There is a new flag called mode that you can set to either tfop, eager, or tfop-eager-fallback; it allows you to generate ops using either the #tfop operator or the eager-mode C API (based on @pschuh's previous implementation). tfop-eager-fallback uses tfop wherever possible and eager otherwise (e.g., for output lists).
  2. VariantHandle and ResourceHandle are now also supported, allowing us to replace many of the uses of #tfop in stdlib with calls to Raw functions.
  3. Output tensor lists are now supported when using eager mode.
  4. Function-valued attributes are now also supported in both tfop and eager modes. _tffunc is used to trace them.
  5. Added support for ops that can take either Tensor<T> or StringTensor. In this case, two functions are generated, one for each type.
  6. The code has been reorganized so that the tfop and eager modes share as much code as possible, and the generated API remains the same no matter which mode is used (stdlib can be compiled with bindings generated in either mode without any changes); a sketch of a generated eager-mode binding follows this list.
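
To make item 6 concrete, here is a hedged sketch of the kind of eager-mode binding the script might emit for a simple op such as Identity. The TFE_Op wrapper mirrors the restoreV2 example quoted later in this conversation; the exact constraint names and attribute-setting calls are assumptions, not the script's verbatim output.

// Hedged sketch only: the shape of an eager-mode binding the script might emit.
// The TFE_Op/addInput/setAttr/execute calls mirror the restoreV2 example later
// in this thread; the setAttr overload and constraint names are assumptions.
@inlinable @inline(__always)
public static func identity<T: TensorFlowScalar>(
  _ input: Tensor<T>
) -> Tensor<T> {
  let op = TFE_Op("Identity")
  let _ = op.addInput(input)
  op.setAttr("T", T.tensorFlowDataType)
  // Tensor<T> conforms to TensorGroup, so execute can unpack the single output.
  return op.execute(Int(Tensor<T>._typeList.count))
}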

Only list(func), ref-valued, and complex-valued types are not supported by the script now, which should be fine as they're not that common. The script now covers 1096/1286 ops. Of the remaining 190 ops, 131 are ref-valued, so this covers almost everything.

This also helps with cleaning up stdlib and transitioning functionality over to swift-apis.

Friend PRs: swiftlang/swift#24261 and tensorflow/swift-apis#109 .

@rxwei rxwei requested review from pschuh, rxwei and bgogul April 23, 2019 03:35
@eaplatanios
Contributor Author

@rxwei So, if we add the constructor TensorArrayProtocol.init(_owning:count:) then I can add support for output tensor arrays by using TensorArrayProtocol for the output type in these cases.

@rxwei
Contributor

rxwei commented Apr 23, 2019

@rxwei So, if we add the constructor TensorArrayProtocol.init(_owning:count:) then I can add support for output tensor arrays by using TensorArrayProtocol for the output type in these cases.

That makes sense to me!

@eaplatanios
Contributor Author

That makes sense to me!

Cool, I'll go ahead and implement that as an experiment to see if it works out.

@eaplatanios
Contributor Author

eaplatanios commented Apr 23, 2019

@rxwei Actually this makes a lot of sense and it revealed something interesting. The Array conformance should now look something like this:

extension Array : TensorArrayProtocol where Element : TensorGroup {
  public func _unpackTensorHandles(into address: UnsafeMutablePointer<CTensorHandle>?) {
    var ptr = address
    for elem in self {
      elem._unpackTensorHandles(into: ptr)
      ptr = ptr!.advanced(by: Int(elem._tensorHandleCount))
    }
  }

  public var _tensorHandleCount: Int32 {
    var count: Int32 = 0
    for elem in self { count += elem._tensorHandleCount }
    return count
  }

  public init(_owning tensorHandles: UnsafePointer<CTensorHandle>?, count: Int) {
    // Each Element owns a fixed number of handles, so the element count is
    // count / Element._tensorHandleCount.
    let size = count / Int(Element._tensorHandleCount)
    self = Array((0..<size).map { Element(
      _owning: tensorHandles?.advanced(by: $0 * Int(Element._tensorHandleCount)))
    })
  }
}

That is, Element has to conform to TensorGroup. I don't see how TensorArrayProtocol alone, without the count-aware initializer, is useful in any setting. For example, with the previous extension Array : TensorArrayProtocol where Element : TensorArrayProtocol, you can never construct the array without knowing the count, which makes it not very useful. Is it OK if I switch to the above implementation? Then I can easily add support for output lists in the bindings.
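
As a hedged usage sketch of why the count parameter matters (assuming the Array conformance above and that Tensor<Float> conforms to TensorGroup): with Element == Tensor<Float>, _tensorHandleCount is 1, so count handles reconstruct count tensors; without count, Array has no way to know how many elements to produce.

// Hedged usage sketch only; relies on the Array conformance sketched above.
func unpackFloatTensors(
  from handles: UnsafePointer<CTensorHandle>?,
  count: Int
) -> [Tensor<Float>] {
  // Calls the Array(_owning:count:) initializer from the conformance above.
  return [Tensor<Float>](_owning: handles, count: count)
}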

@rxwei
Contributor

rxwei commented Apr 23, 2019

Actually this makes a lot of sense and it revealed something interesting. It doesn't really make sense to have automatic derivations for TensorArrayProtocol.

Makes sense, though we had to derive TensorArrayProtocol requirements because TensorGroup refines TensorArrayProtocol. Adding an init(_owning:count:) sounds like breaking this inheritance relation apart. Do you feel this inheritance is still necessary? If so, what does init(_owning:count:) mean for a TensorGroup, which is supposed to have a fixed number of elements?

@eaplatanios
Contributor Author

In this case TensorGroup has a default implementation:

init(_owning tensorHandles: UnsafePointer<CTensorHandle>?, count: Int) {
  precondition(count == Int(Self._tensorHandleCount))
  self.init(_owning: tensorHandles)
}

@rxwei
Contributor

rxwei commented Apr 23, 2019

Yeah, makes sense. It would be very interesting to prototype this. I'm also curious to hear what @pschuh and @marcrasi think.

@rxwei
Contributor

rxwei commented Apr 23, 2019

Also, a random idea about generating boilerplate: since each of these ops calls the same TFE functions, would it make sense to define a tfop Swift function that dispatches an op and its arguments for us? Yeah, we will need variadic generics, but we can overload tfop up to a certain arity for now. This would remove a ton of code duplication and make optimizations easier.
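
To sketch the idea (a hedged, illustrative sketch only: the helper name, the fixed arity, and reusing the TFE_Op wrapper quoted later in this thread are assumptions, not a settled design):

// Illustrative sketch of the overload-to-a-fixed-arity idea. The helper name
// and the assumption that addInput accepts any TensorGroup are illustrative;
// the TFE_Op wrapper is the one quoted later in this thread. A two-input
// overload would repeat the pattern with one more addInput call.
@inlinable @inline(__always)
func dispatchOp<In0: TensorGroup, Out: TensorGroup>(
  _ name: String,
  _ input0: In0,
  typeListAttrs: [String: [TensorDataType]] = [:]
) -> Out {
  let op = TFE_Op(name)
  let _ = op.addInput(input0)
  for (attrName, typeList) in typeListAttrs {
    op.setAttr(attrName, typeList)
  }
  return op.execute(Int(Out._typeList.count))
}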

@eaplatanios
Contributor Author

Yes that would be great. I was hoping that we could make the current #tfop handle that and that's why I added the mode flag to support it. However, we would need to sort out the bug that causes the test failures when using #tfop and compiling with optimizations.

@rxwei
Contributor

rxwei commented Apr 23, 2019

#tfop is actually causing a lot of problems in our infrastructure because it's not maintained, and we need to deal with it often when we merge from master. Personally, I think it can be designed as a Builtin function, but that probably wouldn't work well with const attributes. @lattner what do you think?

@eaplatanios
Contributor Author

@rxwei I just pushed a change that implements this. The only changes required to compile this in stdlib are:

  • Adding init(_owning tensorHandles: UnsafePointer<CTensorHandle>?, count: Int) to TensorArrayProtocol.
  • Adding the following default implementation in a TensorGroup extension:
init(_owning tensorHandles: UnsafePointer<CTensorHandle>?, count: Int) {
  precondition(count == Int(Self._tensorHandleCount))
  self.init(_owning: tensorHandles)
}
  • Changing the Array conformance to TensorArrayProtocol to this:
extension Array : TensorArrayProtocol where Element : TensorGroup {
  public func _unpackTensorHandles(into address: UnsafeMutablePointer<CTensorHandle>?) {
    var ptr = address
    for elem in self {
      elem._unpackTensorHandles(into: ptr)
      ptr = ptr!.advanced(by: Int(elem._tensorHandleCount))
    }
  }

  public var _tensorHandleCount: Int32 {
    var count: Int32 = 0
    for elem in self { count += elem._tensorHandleCount }
    return count
  }

  public init(_owning tensorHandles: UnsafePointer<CTensorHandle>?, count: Int) {
    // Each Element owns a fixed number of handles, so the element count is
    // count / Element._tensorHandleCount.
    let size = count / Int(Element._tensorHandleCount)
    self = Array((0..<size).map { Element(
      _owning: tensorHandles?.advanced(by: $0 * Int(Element._tensorHandleCount)))
    })
  }
}

Regarding the Builtin function, I agree. It's currently nearly impossible for me to debug issues related to #tfop because it's so opaque about what's happening underneath.

@rxwei
Contributor

rxwei commented Apr 23, 2019

Nice!

@eaplatanios
Contributor Author

eaplatanios commented Apr 23, 2019

There are still some issues to sort out with respect to the data types of output lists, but I'll look into that tomorrow.

…handling of input tensor lists for 'eager' mode.
@eaplatanios
Contributor Author

This PR should be ready for review along with swiftlang/swift#24229 . All tests pass locally as the changes are all backwards compatible.

@dan-zheng
Member

This PR should be ready for review along with apple/swift#24229 . All tests pass locally as the changes are all backwards compatible.

Could you please fix the merge conflict in RawOpsGenerated.swift?

@eaplatanios
Contributor Author

Could you please fix the merge conflict in RawOpsGenerated.swift?

Done! :)

@eaplatanios
Contributor Author

@rxwei This should be ready for review as it is backwards compatible and should not break anything in the existing codebase.

@rxwei
Contributor

rxwei commented Apr 27, 2019

Great. I'd get @pschuh's opinions on this first as I'm less familiar with binding generation.

Contributor

@pschuh pschuh left a comment

Just some minor comments.

return self.op.inferred_counts[number_attr]
if number_attr:
return self.swift_name + 'Count'
if self.arg_def.type_list_attr:
Contributor

Can't comment on the raw ops, but these appear to be codegenning as #tfops. This is breaking "saveV2" and "restoreV2" because they no longer return anything. I think the original logic was bad here. They should probably return [AnyTensor] or be otherwise blacklisted.

Contributor Author

It depends on the mode you use. I currently set it to tfop-eager-fallback just for backwards compatibility, but the signature should be the same in eager mode. I don't understand, though, what is wrong with the following two ops: saveV2 does not return anything, as expected, and restoreV2 returns a value of type Dtypes that conforms to TensorGroup. This lets you save and restore, say, a struct of tensors and avoids the loss of type information incurred by using [AnyTensor]. Maybe I am missing something, though.

@inlinable @inline(__always)
public static func saveV2<Dtypes: TensorArrayProtocol>(
  prefix: StringTensor,
  tensorNames: StringTensor,
  shapeAndSlices: StringTensor,
  tensors: Dtypes
) {
  return #tfop("SaveV2",
    prefix,
    tensorNames,
    shapeAndSlices,
    tensors,
    dtypes$dtype: tensors._typeList)
}

@inlinable @inline(__always)
public static func restoreV2<Dtypes: TensorGroup>(
  prefix: StringTensor,
  tensorNames: StringTensor,
  shapeAndSlices: StringTensor
) -> Dtypes {
  let op = TFE_Op("RestoreV2")
  let _ = op.addInput(prefix)
  let _ = op.addInput(tensorNames)
  let _ = op.addInput(shapeAndSlices)
  op.setAttr("dtypes", Dtypes._typeList)
  return op.execute(Int(Dtypes._typeList.count))
}

Contributor

My bad, it was just restoreV2 that I had a problem with. You're constraining _typeList to be a static value. This is not useful in the plan that I have. I'll fix it later I guess, but I would prefer disabling it.

Contributor Author

Actually, that was a debate I had. I ended up with somewhat of a middle ground: if the type whose _typeList property we want appears only as an output arg, we constrain it to be a TensorGroup and use a static property; otherwise, we constrain it to be a TensorArrayProtocol and use an instance property. The point is that when it's an output arg we need to unpack the tensor handles in either case, and that's something TensorGroup allows us to do. We could disable it for now, but I'm curious which use case it doesn't work for so we can try to think of a better way to do it.

Contributor

I want to determine the type list at runtime. In this case, I will be serializing and deserializing a dynamic list of tensors.

Contributor Author

That sounds like a special use case that's not easy to generalize in the raw ops generation. Given that the current generation script produces somewhat type-safe code, how about we add an untyped overload for restoreV2 that offers the functionality you need?
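
Purely as a hypothetical sketch of what such an untyped overload could look like (the executeUnowned(outputCount:) helper is an assumption and is not part of the TFE_Op wrapper shown earlier in this thread):

// Hypothetical sketch only: the dtype list is supplied at runtime rather than
// derived from a static TensorGroup. executeUnowned(outputCount:) is an
// assumed helper that would return raw CTensorHandles.
@inlinable @inline(__always)
public static func restoreV2(
  prefix: StringTensor,
  tensorNames: StringTensor,
  shapeAndSlices: StringTensor,
  dtypes: [TensorDataType]
) -> [CTensorHandle] {
  let op = TFE_Op("RestoreV2")
  let _ = op.addInput(prefix)
  let _ = op.addInput(tensorNames)
  let _ = op.addInput(shapeAndSlices)
  op.setAttr("dtypes", dtypes)
  return op.executeUnowned(outputCount: dtypes.count)
}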

rxwei pushed a commit to swiftlang/swift that referenced this pull request Apr 27, 2019
* Changed 'TensorArrayProtocol' such that it can be used to support output tensor arrays in raw ops.

* Added a '_typeList' property to 'TensorArrayProtocol'.

Friend PR: tensorflow/swift-bindings#26 .
@eaplatanios
Contributor Author

@rxwei @pschuh @saeta This is now using eager mode by default, as we discussed. I realized I was actually using _TFCEagerExecute so no changes required there.

@eaplatanios
Contributor Author

@pschuh I made the couple of fixes you suggested. I'm currently trying to build and test swiftlang/swift#24425 with this version of the bindings.

@eaplatanios
Contributor Author

@pschuh I fixed the bug with the number attributes, but I still had to disable check(status) for addInputList because it was still failing for some reason, even though I set the number attribute before calling addInputList. I think this may be a bug in the C API, and we can ignore it for now since we are now inferring the number attribute on our own anyway.

I can also confirm that all tests pass on my machine now.

@eaplatanios
Contributor Author

I also just removed support for the tfop mode to keep things cleaner.

@pschuh pschuh merged commit ee5ded2 into tensorflow:master May 2, 2019