Support of Horovod and TF 1.12 for TensorFlow Script Mode. TFS 1.12 support #567

Merged
merged 69 commits into from
Jan 17, 2019

Conversation

Contributor

@mvsusp mvsusp commented Dec 19, 2018

Description of changes:

  • feature: support for TensorFlow 1.12
  • feature: support for TensorFlow Serving 1.12
  • feature: support for Horovod

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mvsusp mvsusp requested review from nadiaya and yangaws December 19, 2018 23:43
codecov-io commented Dec 19, 2018

Codecov Report

Merging #567 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #567      +/-   ##
=========================================
+ Coverage   92.69%   92.7%   +0.01%     
=========================================
  Files          71      71              
  Lines        5405    5418      +13     
=========================================
+ Hits         5010    5023      +13     
  Misses        395     395

Impacted Files                          Coverage Δ
src/sagemaker/tensorflow/estimator.py   94.94% <100%> (+0.26%) ⬆️
src/sagemaker/estimator.py              90.48% <100%> (+0.08%) ⬆️
src/sagemaker/__init__.py               100% <100%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f341864...177f37f.

Contributor

@nadiaya nadiaya left a comment

Update the tf_version fixture with the new version.

@@ -12,4 +12,4 @@
# language governing permissions and limitations under the License.
from __future__ import absolute_import

TF_VERSION = '1.11'
Contributor

Don't change default version.

Add LATEST_VERSION to the estimator instead, e.g.: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/mxnet/estimator.py#L33

Contributor

For MXNet, the default version in defaults.py and LATEST_VERSION in estimator.py are also explicitly defined.
MXNet estimator.py -> https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/mxnet/estimator.py#L33
MXNet defaults.py -> https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/mxnet/defaults.py
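
The MXNet pattern referenced in this thread can be sketched roughly as follows: the default version stays where it is, and the estimator carries a separate `LATEST_VERSION` constant. This is a simplified stand-in, not the SDK's actual code. `SketchEstimator` and `resolve_version` are hypothetical names, and the warning text is only illustrative.

```python
import warnings

# Default version, left unchanged as requested in the review.
TF_VERSION = "1.11"


class SketchEstimator:
    """Hypothetical stand-in for a framework estimator (not the SDK class)."""

    # Newest supported version, tracked separately from the default.
    LATEST_VERSION = "1.12"

    def __init__(self, framework_version=None):
        self.framework_version = self.resolve_version(framework_version)

    @classmethod
    def resolve_version(cls, framework_version):
        """Use the caller's version if given, otherwise fall back to the default."""
        if framework_version is not None:
            return framework_version
        if TF_VERSION != cls.LATEST_VERSION:
            # Illustrative message: tell the user a newer version exists.
            warnings.warn(
                "Defaulting to version {}; latest supported version is {}.".format(
                    TF_VERSION, cls.LATEST_VERSION
                )
            )
        return TF_VERSION
```

With this layout, bumping `LATEST_VERSION` announces a new release without silently changing what existing users get when they omit the version.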

Contributor Author

Don't change the version, @uditbhatia. Version is a required field starting with TF 1.12.

Contributor

I didn't change any field.

(default: None). Currently we only support distributed training with parameter servers. To enable it
use the following setup:
(default: None). Currently we support distributed training with parameter servers and MPI. To enable
parameter servers use the following setup:
Contributor

Follow-up:
We should have a test that checks that parameter server + MPI works.
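
For reference, the two `distributions` settings this docstring describes could be built roughly like this. The dictionary keys (`parameter_server`, `mpi`, `enabled`, `processes_per_host`, `custom_mpi_options`) are the ones the SageMaker Python SDK documents for TensorFlow script mode. The helper function names are hypothetical and the values are placeholders.

```python
def parameter_server_distribution():
    """Distributions dict that turns on parameter-server training."""
    return {"parameter_server": {"enabled": True}}


def mpi_distribution(processes_per_host=1, custom_mpi_options=None):
    """Distributions dict that turns on MPI training (the mode used for Horovod)."""
    mpi = {"enabled": True, "processes_per_host": processes_per_host}
    if custom_mpi_options:
        # Extra mpirun flags are only included when the caller supplies them.
        mpi["custom_mpi_options"] = custom_mpi_options
    return {"mpi": mpi}


def enabled_strategies(distributions):
    """Return the sorted names of strategies marked 'enabled': True."""
    return sorted(
        name
        for name, config in (distributions or {}).items()
        if config.get("enabled", False)
    )


# Combination a follow-up test might want to exercise.
combined = {}
combined.update(parameter_server_distribution())
combined.update(mpi_distribution(processes_per_host=2))
```

Either dict would be passed as the estimator's `distributions` argument. What should happen when both strategies are enabled at once is not settled in this thread, so a follow-up test would need to pin that behavior down first.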

@nadiaya nadiaya changed the base branch from master to t1.12-tests December 20, 2018 20:00
@nadiaya nadiaya changed the base branch from t1.12-tests to master December 20, 2018 20:09
@nadiaya nadiaya changed the base branch from master to t1.12-tests December 20, 2018 20:14
@nadiaya nadiaya changed the base branch from t1.12-tests to master December 20, 2018 20:15
Contributor Author

@mvsusp mvsusp left a comment

Small comments, we are almost there.

6 participants