-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Allow Local Mode to work with a local training script. #178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change works when ~/.sagemaker/config.yaml has local: no_internet: True It depends on the container image supporting a local training script instead of an s3 location.
This fixes the unit tests that were broken. Also adds the remaining work that allows serving for TF to work. Pending: MXNet.
Codecov Report
@@ Coverage Diff @@
## master #178 +/- ##
==========================================
- Coverage 90.57% 90.55% -0.03%
==========================================
Files 37 38 +1
Lines 2504 2562 +58
==========================================
+ Hits 2268 2320 +52
- Misses 236 242 +6
Continue to review full report at Codecov.
|
src/sagemaker/estimator.py
Outdated
if self.output_path is None: | ||
self.output_path = 's3://{}/'.format(self.sagemaker_session.default_bucket()) | ||
no_internet = get_config_value('local.no_internet', self.sagemaker_session.config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This property should be local.local_code and it should default to true. Customers can still call out to the internet if 'no_internet' is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much better, I was not a fan of no_internet.
src/sagemaker/estimator.py
Outdated
if self.output_path is None: | ||
self.output_path = 's3://{}/'.format(self.sagemaker_session.default_bucket()) | ||
no_internet = get_config_value('local.no_internet', self.sagemaker_session.config) | ||
if self.sagemaker_session.local_mode and no_internet: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we start throwing more configuration into local_mode, then we should have a local_mode configuration object in place of the local_mode boolean variable. I think this is okay now, but once we get one more piece of config, let's refactor this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
src/sagemaker/fw_utils.py
Outdated
raise ValueError('"{}" does not exist.'.format(directory)) | ||
if not os.path.isdir(directory): | ||
raise ValueError('"{}" is not a directory.'.format(directory)) | ||
if script not in os.listdir(directory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very small point - but doing a stat on os.path.join(directory, script) would be better, because it avoids listing the entire directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to rewrite this entire thing as:
if not os.path.isfile(os.path.join(directory, script)):
raise ValueError('...')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. This is was just taken from the original tar_and_upload_dir() but this is a great time to improve it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just realized I didnt do this. Let me do it now.
src/sagemaker/tensorflow/model.py
Outdated
self._upload_code(deploy_key_prefix) | ||
|
||
no_internet = get_config_value('local.no_internet', self.sagemaker_session.config) | ||
print('no_internet: %s local_mode: %s' % (no_internet, self.sagemaker_session.local_mode)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
src/sagemaker/tensorflow/model.py
Outdated
no_internet = get_config_value('local.no_internet', self.sagemaker_session.config) | ||
print('no_internet: %s local_mode: %s' % (no_internet, self.sagemaker_session.local_mode)) | ||
if self.sagemaker_session.local_mode and no_internet: | ||
self.uploaded_code = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a typo - should be self_uploaded_code, but you could probably remove this entire block anyway
src/sagemaker/tensorflow/model.py
Outdated
if self.sagemaker_session.local_mode and no_internet: | ||
self.uploaded_code = None | ||
else: | ||
self._upload_code(deploy_key_prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like something that could exist in the parent class.
In general boto_session is not None anymore. no_internet renamed to local_code. Added more unit tests to bring coverage up.
41a9ddb
to
ebdb0b7
Compare
Raises: | ||
ValueError: If ``directory`` does not exist, is not a directory, or does not contain ``script``. | ||
""" | ||
if directory: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this if statement
resolving the Redshift DNS name to a private IP
This change works when ~/.sagemaker/config.yaml has
local:
local_code: True
It depends on the container image supporting a local training script
instead of an s3 location.
Issue #, if available:
Description of changes:
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.