This repository was archived by the owner on Jul 1, 2023. It is now read-only.

[WIP] Equations for losses #579

Merged
merged 23 commits into from
Feb 5, 2020

Conversation

joaogui1
Contributor

As discussed in the Google group, I'm adding equations for the losses, but I'm afraid that without LaTeX support they may be a little unreadable. What do you think?
I have added equations for the most common losses so far; after feedback I can write the rest.

joaogui1 and others added 7 commits December 13, 2019 10:18
Co-Authored-By: Brad Larson <[email protected]>
Co-Authored-By: Brad Larson <[email protected]>
Co-Authored-By: Brad Larson <[email protected]>
Co-Authored-By: Brad Larson <[email protected]>
Co-Authored-By: Brad Larson <[email protected]>
Co-Authored-By: Brad Larson <[email protected]>
@joaogui1
Contributor Author

Ok, so I guess that's all of the non-categorical losses, but I'm a bit confused about how to describe the categorical hinge and categorical cross-entropy losses. Can someone shed some light on this?

@8bitmp3
Contributor

8bitmp3 commented Dec 13, 2019

Hi @joaogui1 @eaplatanios @BradLarson. I just discussed this thread, and the one from the Swift Google group about API docs, with @saeta at an event we're attending offline.

I think this is awesome, and I'd like to suggest something to improve the user experience for the growing community. A logical approach is to follow very closely, or word for word where applicable, both the descriptions and the equations in the TF 2.x API docs, where they already exist.

Otherwise, reinventing the wheel can introduce inconsistency, which TF/S4TF users will notice, since tensorflow.org has a single site-wide search.

Example:

In TF 2.0 tf.keras.losses docs, the squared hinge loss is defined as follows (source):

class SquaredHinge(LossFunctionWrapper):
"""Computes the squared hinge loss between `y_true` and `y_pred`.
  `loss = square(maximum(1 - y_true * y_pred, 0))`
  `y_true` values are expected to be -1 or 1. If binary (0 or 1) labels are
  provided we will convert them to -1 or 1.
...

vs what is in one of the latest commits here:

/// Returns the squared hinge loss between predictions and expectations.
/// Given the predicted and expected, the Hinge loss is computed as follows:
///  `reduction(max(0, 1 - predicted * expected)^2)` 
///
/// - Parameters:
/// ...
public func squaredHingeLoss<Scalar: TensorFlowFloatingPoint>(
...) ...
}
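
For reference, both formulations above describe the same computation. A minimal Python sketch of the math (illustrative only, not the Keras or S4TF source):

```python
def squared_hinge_loss(y_true, y_pred):
    """Mean over the batch of square(maximum(1 - y_true * y_pred, 0))."""
    per_element = [max(1.0 - t * p, 0.0) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(per_element) / len(per_element)

# y_true values are expected to be -1 or 1.
loss = squared_hinge_loss([1.0, -1.0], [0.8, -0.3])  # ≈ 0.265
```

Here the `reduction` in the Swift doc comment corresponds to taking the mean over the per-element losses.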

The use of y_true and y_pred is pretty consistent and understandable across the TF 2.x APIs, and because the S4TF docs are hosted on the same .org site and under /tensorflow/ on GitHub, I think we should reuse what's already been reviewed and edited by @lamberta and the rest of the docs team. If something is more Swifty than Pythonic, we can certainly amend it on a case-by-case basis.

Maybe I'm wrong, let me know what you all think. cc @dynamicwebpaige

Can I also draw the S4TF community's attention to the TensorFlow docs style guide (https://www.tensorflow.org/community/contribute/docs_style) and the Google developer documentation style guide (https://developers.google.com/style)?

@joaogui1
Contributor Author

I changed the names to be compatible (so it's y_true and y_pred now) but didn't copy the documentation. Do you think it would be better to have identical docs?
Also, there's another difference: since we have a combined softmax + cross-entropy loss, calling the logits y_pred would be misleading (the real y_pred should be softmax(logits)). Do you think the name should be changed anyway?
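
To illustrate the logits point: the combined loss consumes raw logits, and the actual y_pred would be softmax(logits). A minimal Python sketch of the math (function names here are hypothetical, not the S4TF API):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, label_index):
    # The combined op takes raw logits; the "real" y_pred is softmax(logits).
    probs = softmax(logits)
    return -math.log(probs[label_index])

logits = [2.0, 1.0, 0.1]
loss = softmax_cross_entropy(logits, 0)  # ≈ 0.417
```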

@joaogui1 joaogui1 changed the title from "[Draft] Equations for losses" to "[WIP] Equations for losses" on Dec 13, 2019
@joaogui1
Contributor Author

Pinging @bartchr808 @8bitmp3

@bartchr808
Contributor

Firstly thanks @joaogui1 for doing this and @8bitmp3 for bringing up the idea of "follow[ing] very closely...both descriptions and equations in the TF 2.x API docs"! 😄

do you guys think it would be better to have identical docs?

Personally, I agree with @8bitmp3: having as close to a 1:1 match between the documentation as possible would be ideal. As with code style, Python and Swift define functions in fairly different ways, so our API naming cannot be 1:1 if we want to stay Swifty. When it comes to writing the documentation, though, we could match the existing docs whenever possible, since they are generally TensorFlow-specific and language-agnostic. That way, anyone transitioning from Python to Swift (either directly or progressively through Python interop) will have an easier time recalling what certain functions do (e.g. you Google a sentence from the Python TensorFlow documentation and it matches the Swift for TensorFlow docs as well).

Additionally, I think this can make maintaining docs much easier for the community, @lamberta and the rest of the docs team.

As a small disclaimer: I've been busy with school since September, so I haven't been able to follow along with the progress being made in the community, and I may miss some details here and there! 😆

joaogui1 and others added 5 commits December 18, 2019 13:22
Co-Authored-By: Bart Chrzaszcz <[email protected]>
Co-Authored-By: Bart Chrzaszcz <[email protected]>
Co-Authored-By: Bart Chrzaszcz <[email protected]>
Co-Authored-By: Bart Chrzaszcz <[email protected]>
Co-Authored-By: Bart Chrzaszcz <[email protected]>
Co-Authored-By: Bart Chrzaszcz <[email protected]>
@BradLarson
Contributor

The general consensus seems to be that we would prefer to closely mirror the documentation from the Python implementations, but replace the variables such that they reflect the Swift code. For example, in the case that @8bitmp3 cites above, that would turn into:

/// Computes the squared hinge loss between `expected` and `predicted`.
///   `loss = square(maximum(1 - expected * predicted, 0))`

(I'm not sure if the subsequent two lines in the Keras implementation about the parameters apply to how we handle this, but if they do, we could bring those over too with parameter names replaced.)

@8bitmp3
Contributor

8bitmp3 commented Dec 20, 2019

Thumbs up and cheers @bartchr808 @BradLarson for your feedback and thanks @joaogui1 for kick-starting this process. To sum up, so that we're on the same page:

  • Mirror TensorFlow API docs as much as possible to help Python users and maintain consistency.
  • Amend variables and other bits, where applicable, to keep it Swifty.

And, for additional guidance, the Google docs and code style guides linked above should also be useful.

(I'll also conduct some deep exploration since the project is already at v0.6 and it's very cold outside.)


@8bitmp3
Contributor

8bitmp3 commented Dec 20, 2019

@saeta since you mentioned it in https://groups.google.com/a/tensorflow.org/forum/#!topic/swift/tQKFYL1Ykac

Update: I can add a few bits to CONTRIBUTING.md, since it already exists, if y'all don't mind. I'll raise it via a separate PR, fyi @BradLarson @bartchr808 @eaplatanios @dan-zheng.

And if you're all OK with it, this can be built upon and expanded in the existing CONTRIBUTING.md (https://github.com/tensorflow/swift-apis/blob/master/CONTRIBUTING.md):

An example of soon-to-be-proposed additions:

### Contribution guidelines and standards
...
#### General guidelines for Swift for TensorFlow API docs contribution

* Closely mirror Swift for TensorFlow API docs with the implementations in [tensorflow/tensorflow/python](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python) to maintain consistency across platforms.
  - If required, replace variable names from the Python docs to reflect the Swift code. For example, `y_true` and `y_pred` in the TF 2.x API docs become `expected` and `predicted`, respectively, in Swift for TensorFlow.
* When you contribute a new feature to Swift for TensorFlow, the maintenance burden is (by default) transferred to the Swift for TensorFlow team. This means that the benefit of the contribution must be compared against the cost of maintaining the feature.

For your reference, this is what it says in https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md:

### Contribution guidelines and standards
...
#### General guidelines and philosophy for contribution
...
* When you contribute a new feature to TensorFlow, the maintenance burden is (by default) transferred to the TensorFlow team. This means that the benefit of the contribution must be compared against the cost of maintaining the feature.

@BradLarson
Contributor

It's been a little while since we last discussed this. Are you still interested in working on these documentation additions? We're revisiting this after the holidays and wanted to check back in.

@joaogui1
Contributor Author

joaogui1 commented Jan 8, 2020

Hi, the idea was to finish it after the holidays, but I've been quite sick. If it's not a problem, I will finish it once I've recovered.

@BradLarson
Contributor

Please take your time, hope you feel better soon. We were just checking in on the list of pending pull requests to make sure we weren't neglecting anything. Whenever you feel ready to work on this, we'll be glad to review.

@RahulBhalley

Great to see that API documentation is now being discussed seriously! Very much 👍 for this effort.

@@ -262,6 +282,8 @@ func _vjpSoftmaxCrossEntropyHelper<Scalar: TensorFlowFloatingPoint>(
}

/// Returns the sigmoid cross entropy (binary cross entropy) between logits and labels.
/// Given the logits and probabilites, the sigmoid cross entropy computes `reduction(-sigmoid(logits) * log(p))`
/// Where sigmoid(x) = `1 / (1 + exp(-x))`
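
For comparison, the standard per-element sigmoid (binary) cross entropy from logits is `-y * log(sigmoid(z)) - (1 - y) * log(1 - sigmoid(z))`, usually computed in a numerically stable form. A minimal Python sketch of that math (illustrative only, not the S4TF implementation):

```python
import math

def sigmoid_cross_entropy(logit, label):
    # Numerically stable form of
    # -label * log(sigmoid(logit)) - (1 - label) * log(1 - sigmoid(logit)),
    # i.e. max(z, 0) - z * y + log(1 + exp(-|z|)).
    z, y = logit, label
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

loss = sigmoid_cross_entropy(2.0, 1.0)  # ≈ 0.1269, i.e. -log(sigmoid(2.0))
```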


Will equations like these be rendered as LaTeX?

Member


No, code enclosed in backticks appears with inline code formatting, just like in Markdown.
The Jazzy API documentation generator doesn't support LaTeX formatting.

Contributor


I think having no LaTeX, at least for now, is good for the user experience. @fchollet wrote an entire (best-selling) book on deep learning without any LaTeX:

"...we’ll steer away from mathematical notation, which can be off-putting for those without any mathematics background and isn’t strictly necessary to explain things well."


@8bitmp3 you just made me feel nervous 😬 because my book will include equations.

Contributor


#suntorytime

@joaogui1
Contributor Author

So, there are some points worth discussing about the hinge losses:

  1. In tf/keras, it seems the labels are converted from binary (0/1) to -1/1 if they are not already -1/1, and Swift doesn't do that. Should we?

  2. Our implementation of CategoricalHingeLoss seems to be both reversed and somewhat different from tf/keras: Swift's positive is Keras' negative, and Swift's negative is a little different from, but pretty similar to, Keras' positive. Should we worry?
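
For context on point 2, the tf.keras categorical hinge takes `pos` from the true-class scores and `neg` from the best score among the other classes. A minimal Python sketch of that formulation (illustrative only, not the Keras or S4TF source):

```python
def categorical_hinge(y_true, y_pred):
    # pos: score assigned to the true class(es); neg: best score among the rest.
    pos = sum(t * p for t, p in zip(y_true, y_pred))
    neg = max((1.0 - t) * p for t, p in zip(y_true, y_pred))
    return max(0.0, neg - pos + 1.0)

# One-hot y_true: pos = 0.9, neg = 0.1, so loss = max(0, 0.1 - 0.9 + 1) ≈ 0.2.
loss = categorical_hinge([0.0, 1.0, 0.0], [0.1, 0.9, 0.0])
```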

@joaogui1
Contributor Author

Ok, sorry again for the delay. I believe that's about it; feedback is welcome as always.

@joaogui1 joaogui1 requested a review from BradLarson January 21, 2020 08:08
@joaogui1
Contributor Author

Pinging @BradLarson

Co-Authored-By: Brad Larson <[email protected]>
@joaogui1
Contributor Author

joaogui1 commented Feb 3, 2020

So, there are some points worth discussing about the hinge losses:

  1. In tf/keras, it seems the labels are converted from binary (0/1) to -1/1 if they are not already -1/1, and Swift doesn't do that. Should we?
  2. Our implementation of CategoricalHingeLoss seems to be both reversed and somewhat different from tf/keras: Swift's positive is Keras' negative, and Swift's negative is a little different from, but pretty similar to, Keras' positive. Should we worry?

And what do you guys think of this @8bitmp3 @BradLarson ?

Contributor

@BradLarson BradLarson left a comment


Sorry it took me so long to review this; I wanted to get to the bottom of the categorical hinge loss question. As near as I can tell, our implementation matches that of both Keras and tf.keras:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/losses.py#L975

and it looks like the documentation for tf.keras might actually be wrong. If you submitted a PR correcting that on the tf.keras side, I bet they'd be happy to look into it.

Regarding converting binary values, let's pull that out as a separate issue to be discussed, so that the rest of these documentation improvements can be integrated. Thank you again for your work on this.
