Skip to content

Allow Local Mode to work with a local training script. #178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 10, 2018

Conversation

iquintero
Copy link
Contributor

@iquintero iquintero commented May 9, 2018

This change works when ~/.sagemaker/config.yaml has

local:
local_code: True

It depends on the container image supporting a local training script
instead of an s3 location.

Issue #, if available:

Description of changes:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@iquintero iquintero requested a review from owen-t May 9, 2018 23:22
This change works when ~/.sagemaker/config.yaml has

local:
   no_internet: True

It depends on the container image supporting a local training script
instead of an s3 location.
This fixes the unit tests that were broken.
Also adds the remaining work that allows serving for TF to work.
Pending: MXNet.
@codecov-io
Copy link

codecov-io commented May 10, 2018

Codecov Report

Merging #178 into master will decrease coverage by 0.02%.
The diff coverage is 84.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #178      +/-   ##
==========================================
- Coverage   90.57%   90.55%   -0.03%     
==========================================
  Files          37       38       +1     
  Lines        2504     2562      +58     
==========================================
+ Hits         2268     2320      +52     
- Misses        236      242       +6
Impacted Files Coverage Δ
src/sagemaker/tensorflow/model.py 100% <100%> (ø) ⬆️
src/sagemaker/estimator.py 86.1% <100%> (+1.54%) ⬆️
src/sagemaker/utils.py 87.5% <100%> (+4.89%) ⬆️
src/sagemaker/fw_utils.py 98.5% <100%> (-0.03%) ⬇️
src/sagemaker/local/__init__.py 100% <100%> (ø)
src/sagemaker/mxnet/estimator.py 100% <100%> (ø) ⬆️
src/sagemaker/local/local_session.py 79.8% <36.36%> (-2.85%) ⬇️
src/sagemaker/model.py 95% <72.72%> (-5%) ⬇️
src/sagemaker/session.py 87.41% <80%> (+0.12%) ⬆️
src/sagemaker/local/image.py 87.62% <81.81%> (+0.62%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1f1ab9...cc3ffc3. Read the comment docs.

if self.output_path is None:
self.output_path = 's3://{}/'.format(self.sagemaker_session.default_bucket())
no_internet = get_config_value('local.no_internet', self.sagemaker_session.config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property should be local.local_code and it should default to true. Customers can still call out to the internet if 'no_internet' is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better, I was not a fan of no_internet.

if self.output_path is None:
self.output_path = 's3://{}/'.format(self.sagemaker_session.default_bucket())
no_internet = get_config_value('local.no_internet', self.sagemaker_session.config)
if self.sagemaker_session.local_mode and no_internet:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we start throwing more configuration into local_mode, then we should have a local_mode configuration object in place of the local_mode boolean variable. I think this is okay now, but once we get one more piece of config, let's refactor this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

raise ValueError('"{}" does not exist.'.format(directory))
if not os.path.isdir(directory):
raise ValueError('"{}" is not a directory.'.format(directory))
if script not in os.listdir(directory):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very small point - but doing a stat on os.path.join(directory, script) would be better, because it avoids listing the entire directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be tempted to rewrite this entire thing as:

if not os.path.isfile(os.path.join(directory, script)): 
    raise ValueError('...')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. This is was just taken from the original tar_and_upload_dir() but this is a great time to improve it 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just realized I didnt do this. Let me do it now.

self._upload_code(deploy_key_prefix)

no_internet = get_config_value('local.no_internet', self.sagemaker_session.config)
print('no_internet: %s local_mode: %s' % (no_internet, self.sagemaker_session.local_mode))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

no_internet = get_config_value('local.no_internet', self.sagemaker_session.config)
print('no_internet: %s local_mode: %s' % (no_internet, self.sagemaker_session.local_mode))
if self.sagemaker_session.local_mode and no_internet:
self.uploaded_code = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a typo - should be self_uploaded_code, but you could probably remove this entire block anyway

if self.sagemaker_session.local_mode and no_internet:
self.uploaded_code = None
else:
self._upload_code(deploy_key_prefix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like something that could exist in the parent class.

In general boto_session is not None anymore. no_internet renamed to
local_code. Added more unit tests to bring coverage up.
@iquintero iquintero changed the title Allow Local Mode to work without a boto session. Allow Local Mode to work with a local training script. May 10, 2018
Raises:
ValueError: If ``directory`` does not exist, is not a directory, or does not contain ``script``.
"""
if directory:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this if statement

@iquintero iquintero merged commit fda0e0e into aws:master May 10, 2018
apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this pull request Nov 15, 2018
resolving the Redshift DNS name to a private IP
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants