
Removing softmax as terminal activation function from two models #232

Merged
BradLarson merged 1 commit into tensorflow:master from softmax on Dec 2, 2019

Conversation

BradLarson
Contributor

The LeNet-5 and DenseNet models were previously set up with a softmax activation on their final layer. This caused problems during training because their outputs were then passed to softmaxCrossEntropy(logits:labels:), which expects raw logits and applies softmax internally, so softmax ended up being applied twice. With this modification, the LeNet-5 model trained on MNIST matches the loss and accuracy of a reference Python TF 2.0 implementation of the same model at each step in training.
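A minimal sketch of the corrected pattern in Swift for TensorFlow; the layer shapes, and the `images`, `labels`, and `optimizer` names in the training step, are illustrative assumptions rather than the exact code in this PR:

```swift
import TensorFlow

// A LeNet-5-style model. The final Dense layer has no activation:
// it emits raw logits, since softmaxCrossEntropy(logits:labels:)
// applies softmax internally during loss computation.
struct LeNet: Layer {
    var conv1 = Conv2D<Float>(filterShape: (5, 5, 1, 6), padding: .same, activation: relu)
    var pool1 = AvgPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
    var conv2 = Conv2D<Float>(filterShape: (5, 5, 6, 16), activation: relu)
    var pool2 = AvgPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
    var flatten = Flatten<Float>()
    var fc1 = Dense<Float>(inputSize: 400, outputSize: 120, activation: relu)
    var fc2 = Dense<Float>(inputSize: 120, outputSize: 84, activation: relu)
    var fc3 = Dense<Float>(inputSize: 84, outputSize: 10)  // no softmax here

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        let convolved = input.sequenced(through: conv1, pool1, conv2, pool2)
        return convolved.sequenced(through: flatten, fc1, fc2, fc3)
    }
}

// Training step: the loss function receives raw logits, so softmax
// is applied exactly once, inside softmaxCrossEntropy.
var model = LeNet()
let optimizer = SGD(for: model, learningRate: 0.1)
let modelGradient = gradient(at: model) { model -> Tensor<Float> in
    softmaxCrossEntropy(logits: model(images), labels: labels)
}
optimizer.update(&model, along: modelGradient)
```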

Additionally, the LeNet-MNIST example was reporting the sum of the per-batch losses rather than the average loss. This has been corrected.
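A sketch of the corrected loss reporting; `dataset`, `batch.images`, and `batch.labels` are hypothetical names, not necessarily the example's actual identifiers:

```swift
var totalLoss: Float = 0
var batchCount = 0
for batch in dataset {
    let logits = model(batch.images)
    // softmaxCrossEntropy returns the mean loss over the batch;
    // dividing the accumulated value by the batch count reports the
    // average loss rather than the sum.
    totalLoss += softmaxCrossEntropy(logits: logits, labels: batch.labels).scalarized()
    batchCount += 1
}
print("Average loss: \(totalLoss / Float(batchCount))")
```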

Finally, leftover TODO comments from a completed fix have been removed.

@BradLarson BradLarson requested a review from marcrasi December 2, 2019 21:09
@BradLarson BradLarson merged commit 14e694d into tensorflow:master Dec 2, 2019
@BradLarson BradLarson deleted the softmax branch December 2, 2019 21:25