Allow Framework Estimators to use custom image #223

iquintero · 2018-06-11T00:57:18Z

Chainer, TensorFlow and MXNet estimators can now pass
an image_name argument to the constructor to use that image
instead of the default SageMaker images.

Issue #, if available:

Description of changes:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

I have read the CONTRIBUTING doc
I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have updated the changelog with a description of my changes (if appropriate)
I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov-io · 2018-06-11T01:00:08Z

Codecov Report

Merging #223 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #223      +/-   ##
==========================================
+ Coverage   92.27%   92.33%   +0.06%     
==========================================
  Files          49       49              
  Lines        3261     3274      +13     
==========================================
+ Hits         3009     3023      +14     
+ Misses        252      251       -1
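The percentages in the diff above follow from the hit and line counts. As a quick check, assuming coverage is simply hits divided by lines (the helper name `coverage_percent` is made up for this sketch):

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


# Counts copied from the Codecov diff: master first, then this PR.
before = coverage_percent(3009, 3261)
after = coverage_percent(3023, 3274)

print(f"{before:.2f}% -> {after:.2f}% ({after - before:+.2f}%)")
```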

Impacted Files                           Coverage Δ
src/sagemaker/tensorflow/estimator.py    96.29% <100%> (+0.05%)
src/sagemaker/pytorch/estimator.py       100% <100%> (ø)
src/sagemaker/estimator.py               86.08% <100%> (+0.3%)
src/sagemaker/chainer/estimator.py       100% <100%> (ø)
src/sagemaker/mxnet/estimator.py         100% <100%> (ø)
src/sagemaker/fw_utils.py                100% <0%> (+1.4%)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b458d3d...df57063.

iquintero · 2018-06-11T19:45:58Z

src/sagemaker/estimator.py

@@ -521,6 +526,9 @@ def __init__(self, entry_point, source_dir=None, hyperparameters=None, enable_cl
        self.container_log_level = container_log_level
        self._hyperparameters = hyperparameters or {}
        self.code_location = code_location
+        self.image_name = image_name
+        print(self.image_name)
+        print(kwargs)


iquintero · 2018-06-11T19:46:24Z

src/sagemaker/estimator.py

@@ -226,6 +227,7 @@ def attach(cls, training_job_name, sagemaker_session=None):
        job_details = sagemaker_session.sagemaker_client.describe_training_job(TrainingJobName=training_job_name)
        init_params = cls._prepare_init_params_from_job_description(job_details)

+        print(init_params)


winstonaws

Looks good!

Please add documentation to the README about this.
Add a test which verifies that the custom image is used for the training job when the estimator is fit.

winstonaws · 2018-06-11T20:01:03Z

src/sagemaker/estimator.py

@@ -226,6 +227,7 @@ def attach(cls, training_job_name, sagemaker_session=None):
        job_details = sagemaker_session.sagemaker_client.describe_training_job(TrainingJobName=training_job_name)
        init_params = cls._prepare_init_params_from_job_description(job_details)

+        print(init_params)


Remove debugging print?

winstonaws · 2018-06-11T20:08:08Z

src/sagemaker/estimator.py

@@ -521,6 +526,9 @@ def __init__(self, entry_point, source_dir=None, hyperparameters=None, enable_cl
        self.container_log_level = container_log_level
        self._hyperparameters = hyperparameters or {}
        self.code_location = code_location
+        self.image_name = image_name
+        print(self.image_name)


Remove debugging prints?

winstonaws · 2018-06-11T20:13:22Z

src/sagemaker/estimator.py

@@ -513,6 +515,9 @@ def __init__(self, entry_point, source_dir=None, hyperparameters=None, enable_cl
            code_location (str): Name of the S3 bucket where custom code is uploaded (default: None).
                If not specified, default bucket created by ``sagemaker.session.Session`` is used.
            **kwargs: Additional kwargs passed to the ``EstimatorBase`` constructor.
+            image_name (str): An alternate image name to use instead of the official Sagemaker image


I'd go with "image" instead of "image_name" here since I think the "_name" part of it doesn't add any descriptiveness (it might even be a little confusing).

In the docstring you should explain the valid formats (e.g. ECR URL, Docker Hub name + tag) and include examples.

Also state that this is used for both training and deployment.

I went with image_name because that is what the Estimator class uses, so I wanted to at least be consistent with that. I think just image is better, but I don't know if we should have a different parameter name across different estimators.

Oh okay, let's keep image_name then.

winstonaws · 2018-06-11T20:18:21Z

src/sagemaker/chainer/estimator.py

@@ -67,9 +67,12 @@ def __init__(self, entry_point, use_mpi=None, num_processes=None, process_slots_
                              One of 'py2' or 'py3'.
            framework_version (str): Chainer version you want to use for executing your model training code.
                List of supported versions https://github.com/aws/sagemaker-python-sdk#chainer-sagemaker-estimators
+            image_name (str): The container image to use for training. This will override py_version and


Instead of "this will override", how about something like: "If specified, the estimator will use this image for training and hosting, instead of selecting the appropriate SageMaker official image based on framework_version and py_version."

Applies to everywhere this wording is used.
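The fallback described in that wording can be sketched as below. This is a minimal illustration, not the SDK's implementation: `resolve_image` and `DEFAULT_IMAGE_TEMPLATE` are hypothetical names, and the default naming scheme is invented for the example.

```python
from typing import Optional

# Hypothetical naming scheme for the default image, used only for this sketch.
# The SDK's actual default image URIs are built elsewhere and look different.
DEFAULT_IMAGE_TEMPLATE = "sagemaker-{framework}:{framework_version}-{py_version}"


def resolve_image(framework: str, framework_version: str, py_version: str,
                  image_name: Optional[str] = None) -> str:
    """Return image_name when given, otherwise a default derived from the versions."""
    if image_name:
        # A custom image wins; framework_version and py_version are ignored.
        return image_name
    return DEFAULT_IMAGE_TEMPLATE.format(
        framework=framework,
        framework_version=framework_version,
        py_version=py_version,
    )


print(resolve_image("chainer", "4.0.0", "py3"))
print(resolve_image("chainer", "4.0.0", "py3", image_name="custom-image:latest"))
```

Because the same resolved value would feed both training and hosting, a single check like this keeps the two paths consistent.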

winstonaws · 2018-06-11T20:20:15Z

tests/unit/test_chainer.py

+    job_name = 'new_name'
+    chainer.fit(inputs='s3://mybucket/train', job_name='new_name')
+    model = chainer.create_model()
+    chainer.container_log_level


What's this line for?

This line was carried over from another test that I used as a template. I just realized it is present in other tests as well, but it's basically useless. I will get rid of it.

winstonaws · 2018-06-11T20:21:06Z

tests/unit/test_chainer.py

+    chainer.container_log_level
+
+    assert model.sagemaker_session == sagemaker_session
+    assert model.image == custom_image


I think I would keep the asserts minimal for this test, and just assert the image, assuming that other unit tests cover the other parameters already.
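A minimal-assert test along those lines might look like the sketch below. `FakeEstimator` is a hypothetical stand-in written for this example, not the SDK's Chainer class, and the `train(image=..., inputs=...)` call on the mocked session is an assumed shape rather than the real session API.

```python
from unittest.mock import Mock


class FakeEstimator:
    """Hypothetical stand-in for a framework estimator; not the SDK class."""

    def __init__(self, sagemaker_session, image_name=None):
        self.sagemaker_session = sagemaker_session
        self.image_name = image_name

    def train_image(self):
        # Fall back to a made-up default when no custom image is given.
        return self.image_name or "default-framework-image:latest"

    def fit(self, inputs):
        # Assumed call shape for the mocked session.
        self.sagemaker_session.train(image=self.train_image(), inputs=inputs)


def test_custom_image_is_used_for_training():
    session = Mock()
    custom_image = "custom-image:latest"

    FakeEstimator(session, image_name=custom_image).fit("s3://mybucket/train")

    # Assert only the image; other parameters are left to other tests.
    _, kwargs = session.train.call_args
    assert kwargs["image"] == custom_image


test_custom_image_is_used_for_training()
```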

winstonaws · 2018-06-11T20:22:13Z

tests/unit/test_chainer.py

+    assert estimator.hyperparameters()['training_steps'] == '100'
+    assert estimator.source_dir == 's3://some/sourcedir.tar.gz'
+    assert estimator.entry_point == 'iris-dnn-classifier.py'
+    assert estimator.train_image() == training_image


Same comment here about keeping minimal asserts (unless there's a specific reason that these interact with overriding the image.)

winstonaws · 2018-06-14T23:56:23Z

src/sagemaker/chainer/README.rst

@@ -175,6 +175,12 @@ The following are optional arguments. When you create a ``Chainer`` object, you
 -  ``job_name`` Name to assign for the training job that the fit()
   method launches. If not specified, the estimator generates a default
   job name, based on the training image name and current timestamp
+-  ``image_name`` An alternative docker image to use for training and


Formatting looks broken here - take a look in the rich view?

Applies to all the readmes changed.

winstonaws · 2018-06-14T23:59:09Z

src/sagemaker/chainer/estimator.py

-                framework_version. The image is expected to be a modification of the SageMaker Chainer image.
+            image_name (str): If specified, the estimator will use this image for training and hosting, instead of
+                selecting the appropriate SageMaker official image based on framework_version and py_version. It can
+                be an ECR url or dockerhub image and tag: 123.dkr.ecr.us-west-2.amazonaws.com/my-custom-image:1.0,


Let's make it more explicit that those are two separate examples. You could do something like:

"It can be an ECR url or dockerhub image and tag. Examples: 123.dkr.ecr.us-west-2.amazonaws.com/my-custom-image:1.0
custom-image:latest"
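To make the two accepted forms concrete, here is a rough sketch that tells an ECR URL apart from a plain image name and tag. `describe_image` and `ECR_PATTERN` are hypothetical helpers for illustration; the pattern is an approximation and the SDK is not known to validate image names this way.

```python
import re
from typing import Dict, Optional

# Approximate shape of an ECR image URL: <account>.dkr.ecr.<region>.amazonaws.com/<repo>[:<tag>]
ECR_PATTERN = re.compile(
    r"^(?P<account>\d+)\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+)(?::(?P<tag>.+))?$"
)


def describe_image(image_name: str) -> Dict[str, Optional[str]]:
    """Split an image reference into its parts (illustrative only)."""
    match = ECR_PATTERN.match(image_name)
    if match:
        return {"kind": "ecr", **match.groupdict()}
    # Anything else is treated as a Docker Hub style "name[:tag]".
    # Assumption: no registry host with a port, which would also contain ':'.
    repository, _, tag = image_name.partition(":")
    return {"kind": "dockerhub", "repository": repository, "tag": tag or None}


print(describe_image("123.dkr.ecr.us-west-2.amazonaws.com/my-custom-image:1.0"))
print(describe_image("custom-image:latest"))
```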

iquintero requested a review from winstonaws June 11, 2018 00:57

Allow Framework Estimators to use custom image

2fef9cd

Chainer, Tensorflow and MXNet estimators can now pass an image_name argument to the constructor to use that image instead of the default sagemaker ones.

iquintero force-pushed the custom_image_name branch from 6965ca1 to 2fef9cd Compare June 11, 2018 01:05

iquintero commented Jun 11, 2018

winstonaws suggested changes Jun 11, 2018

Ignacio Quintero and others added 3 commits June 14, 2018 13:53

Adding better docs

4926e10

Merge branch 'master' into custom_image_name

e3f6ab5

Cleanup tests

17d14d1

winstonaws suggested changes Jun 15, 2018

Ignacio Quintero and others added 2 commits June 15, 2018 10:43

Fix doc formatting

74775e4

Merge branch 'master' into custom_image_name

1ecf93d

winstonaws previously approved these changes Jun 15, 2018

Merge branch 'master' into custom_image_name

e6108b7

iquintero dismissed winstonaws’s stale review via e6108b7 June 22, 2018 18:23

laurenyu and others added 4 commits June 22, 2018 13:16

Merge branch 'master' into custom_image_name

471908d

Add support for PyTorch

f2cab5f

update changelog

31643bd

Merge branch 'master' into custom_image_name

df57063

laurenyu approved these changes Jun 25, 2018

iquintero merged commit 235e5c5 into aws:master Jun 25, 2018

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this pull request Nov 15, 2018

Merge pull request aws#223 from awslabs/arpin_linear_learner_kernel

98e3fb2

Fixed: Linear Learner's kernel to align with SageMaker Notebooks
