
move ad before mandatory inlining #21560


Merged

Conversation

marcrasi

Moves AD before mandatory inlining. Necessary fixes were:

  • Improve how the linkage of AD-generated functions is calculated. Details are in the comments. (AD now differentiates more functions, and the linkage calculation wasn't working for some of them.)
  • Store associated function names as identifiers, because the linkage changes mean the optimizer sometimes deletes associated functions, which frees the memory backing the names.
  • Delete generated function bodies on errors.
  • Adjust tests.

Unfortunately, this does not fix stored property differentiation, because SILGen generates direct struct_extract instructions instead of getter calls in some cases. It also does not fix tensor_autodiff_runtime.swift. I will look into these two problems next.
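For illustration only, here is a hypothetical sketch of the stored-property case described above; the struct, the names, and whether this compiles on the current branch are assumptions, not code from this PR:

struct Model {
  var weight: Float
}

func loss(_ m: Model) -> Float {
  // Reading the stored property `m.weight` is lowered by SILGen to a direct
  // struct_extract rather than a getter call in cases like this, which AD
  // does not yet handle.
  return m.weight * m.weight
}

// Once the struct_extract case is handled, something like the following
// should compute d(loss)/d(weight):
// let g = gradient(at: Model(weight: 3), in: loss)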

@marcrasi requested review from rxwei and dan-zheng on December 28, 2018 at 22:41
@marcrasi (Author)

@swift-ci please test tensorflow

@dan-zheng (Contributor) left a comment

Nice!

@marcrasi merged commit 317664d into swiftlang:tensorflow on December 29, 2018
@marcrasi deleted the ad-before-mandatory-inlining branch on December 29, 2018 at 00:43
// FIXME: Propagate wrt indices correctly so that this actually takes the
// gradient wrt only the first parameter, as intended.
// SupersetAdjointTests.test("CrossModule") {
// expectEqual(1, gradient(at: 1, 2, in: (+) as @autodiff (Float, @nondiff Float) -> Float))
A contributor left a comment on the snippet above:

Actually, this will not work semantically. The arity of the differential operator determines the number of results returned by the pullback. We can't expect to exhaustively define differential operators for arbitrary combinations of @nondiff and non-@nondiff parameters.

Therefore, gradient(at: 1, 2, in: ...) gives you a pullback that returns a tuple. To obtain VJPs w.r.t. only the first parameter, you need to pass just that single argument to at: and pass the non-diff arguments as constants, i.e. gradient(at: 1, in: { x in x + 2 }). The function passed to Builtin.autodiffApply will then not have @nondiff on any of its parameter types.
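A minimal sketch of that distinction, written against the gradient(at:in:) operator as it is spelled in this thread (an experimental API on this branch, so the exact spellings are an assumption):

// Differentiating w.r.t. both arguments: the result carries one partial
// derivative per differentiable parameter.
let both = gradient(at: 1, 2, in: { (x: Float, y: Float) in x + y })
// both == (1.0, 1.0)

// Differentiating w.r.t. only the first parameter: pass just that argument
// to `at:` and fold the other value into the closure as a constant.
let first = gradient(at: 1, in: { (x: Float) in x + 2 })
// first == 1.0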
