Update default versions to TensorFlow 1.6 and MXNet 1.1 #118

Merged
merged 6 commits into from
Apr 2, 2018
7 changes: 5 additions & 2 deletions CHANGELOG.rst
@@ -2,10 +2,13 @@
CHANGELOG
=========

1.1.dev4
1.2.0
========
* feature: Frameworks: Use more idiomatic ECR repository naming scheme

* feature: Add Support for Local Mode
* feature: Estimators: add support for TensorFlow 1.6.0
* feature: Estimators: add support for MXNet 1.1.0
* feature: Frameworks: Use more idiomatic ECR repository naming scheme

1.1.3
========
90 changes: 39 additions & 51 deletions README.rst
@@ -692,23 +692,23 @@ When training and deploying training scripts, SageMaker runs your Python script

SageMaker runs MXNet Estimator scripts in either Python 2.7 or Python 3.5. You can select the Python version by passing a ``py_version`` keyword arg to the MXNet Estimator constructor. Setting this to ``py2`` (the default) will cause your training script to be run on Python 2.7. Setting this to ``py3`` will cause your training script to be run on Python 3.5. This Python version applies to both the Training Job, created by fit, and the Endpoint, created by deploy.

Your MXNet training script will be run on version 1.0.0 (by default) or 0.12 of MXNet, built for either GPU or CPU use. The decision to use the GPU or CPU version of MXNet is made by the ``train_instance_type``, set on the MXNet constructor. If you choose a GPU instance type, your training job will be run on a GPU version of MXNet. If you choose a CPU instance type, your training job will be run on a CPU version of MXNet. Similarly, when you call deploy, specifying a GPU or CPU deploy_instance_type, will control which MXNet build your Endpoint runs.
Your MXNet training script will be run on version 1.1.0 by default. (See below for how to choose a different version and for the currently supported versions.) The decision to use the GPU or CPU version of MXNet is made by the ``train_instance_type``, set on the MXNet constructor. If you choose a GPU instance type, your training job will be run on a GPU version of MXNet. If you choose a CPU instance type, your training job will be run on a CPU version of MXNet. Similarly, when you call deploy, specifying a GPU or CPU deploy_instance_type will control which MXNet build your Endpoint runs.
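
The GPU-or-CPU choice described above can be illustrated with a minimal sketch. The function name ``device_type_for`` is hypothetical, and the rule it uses (instance families whose names start with ``ml.p`` or ``ml.g`` are GPU, everything else is CPU) is an assumption made for illustration, not something this change states; the SDK may decide differently.

```python
def device_type_for(instance_type):
    """Return 'gpu' or 'cpu' for a SageMaker instance type string.

    Assumption (not taken from the SDK): GPU instance families are the
    ones whose names start with 'ml.p' or 'ml.g'; all others are CPU.
    """
    family_prefix = instance_type[:4]
    return 'gpu' if family_prefix in ('ml.p', 'ml.g') else 'cpu'


# The instance types used in the tests of this change are CPU types.
print(device_type_for('ml.c4.xlarge'))
print(device_type_for('ml.p2.xlarge'))
```

Under that assumption, the same lookup would apply to both the training instance type and the instance type passed at deploy time.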

The Docker images have the following dependencies installed:

+-------------------------+--------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 |
+-------------------------+--------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  |
+-------------------------+--------------+-------------+
| CUDA                    | 9.0          | 9.0         |
+-------------------------+--------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      |
+-------------------------+--------------+-------------+
+-------------------------+--------------+-------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 |
+-------------------------+--------------+-------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  | 2.7 or 3.5  |
+-------------------------+--------------+-------------+-------------+
| CUDA                    | 9.0          | 9.0         | 9.0         |
+-------------------------+--------------+-------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      |
+-------------------------+--------------+-------------+-------------+

The Docker images extend Ubuntu 16.04.

You can select version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are ``1.0.0`` and ``0.12.1``. You can also set ``framework_version`` to ``1.0 (default)`` or ``0.12`` which will cause your training script to be run on the latest supported MXNet 1.0 or 0.12 versions respectively.
You can select the version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are listed in the table above. You can also set ``framework_version`` to only a major and minor version, e.g. ``1.1``, which will cause your training script to be run on the latest supported patch version of that minor version (in this example, 1.1.0).
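
The way a short ``framework_version`` such as ``1.1`` maps to a full version can be sketched as follows. This is an illustration only: ``resolve_framework_version`` and ``SUPPORTED_MXNET_VERSIONS`` are hypothetical names, and the logic is an assumption about the described behaviour rather than the SDK's own code. The version list is copied from the table above.

```python
# Full versions listed in the dependency table above.
SUPPORTED_MXNET_VERSIONS = ['0.12.1', '1.0.0', '1.1.0']


def resolve_framework_version(requested, supported=SUPPORTED_MXNET_VERSIONS):
    """Map a requested version to a supported full version (hypothetical helper).

    A full version such as '1.0.0' is returned unchanged if it is supported.
    A short version such as '1.1' resolves to the highest supported version
    that starts with '1.1.'.
    """
    def as_tuple(version):
        # Compare numerically so that, for example, 1.10.0 sorts above 1.9.0.
        return tuple(int(part) for part in version.split('.'))

    if requested in supported:
        return requested

    candidates = [v for v in supported if v.startswith(requested + '.')]
    if not candidates:
        raise ValueError('Unsupported framework version: {}'.format(requested))
    return max(candidates, key=as_tuple)


print(resolve_framework_version('1.1'))
print(resolve_framework_version('0.12'))
```

With only one patch release per minor version in the table, each short version resolves to exactly one full version; the ``max`` call only matters once a minor version gains several patch releases.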

TensorFlow SageMaker Estimators
-------------------------------
@@ -717,7 +717,7 @@ TensorFlow SageMaker Estimators allow you to run your own TensorFlow
training algorithms on SageMaker Learner, and to host your own TensorFlow
models on SageMaker Hosting.

Supported versions of TensorFlow: ``1.4.1``, ``1.5.0``.
Supported versions of TensorFlow: ``1.4.1``, ``1.5.0``, ``1.6.0``.

Training with TensorFlow
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -752,7 +752,7 @@ Preparing the TensorFlow training script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Your TensorFlow training script must be a **Python 2.7** source file. The current supported TensorFlow
versions are **1.5.0 (default)** and **1.4.1**. This training script **must contain** the following functions:
versions are **1.6.0 (default)**, **1.5.0**, and **1.4.1**. This training script **must contain** the following functions:

- ``model_fn``: defines the model that will be trained.
- ``train_input_fn``: preprocess and load training data.
@@ -1443,47 +1443,35 @@ SageMaker TensorFlow Docker containers

The TensorFlow Docker images support Python 2.7 and have the following Python modules installed:

+------------------------+------------------+------------------+
| Dependencies           | tensorflow 1.4.1 | tensorflow 1.5.0 |
+------------------------+------------------+------------------+
| awscli                 | 1.12.1           | 1.14.35          |
+------------------------+------------------+------------------+
| boto3                  | 1.4.7            | 1.5.22           |
+------------------------+------------------+------------------+
| botocore               | 1.5.92           | 1.8.36           |
+------------------------+------------------+------------------+
| futures                | 2.2.0            | 2.2.0            |
+------------------------+------------------+------------------+
| gevent                 | 1.2.2            | 1.2.2            |
+------------------------+------------------+------------------+
| grpcio                 | 1.7.0            | 1.9.0            |
+------------------------+------------------+------------------+
| numpy                  | 1.13.3           | 1.14.0           |
+------------------------+------------------+------------------+
| pandas                 | 0.21.0           | 0.22.0           |
+------------------------+------------------+------------------+
| protobuf               | 3.4.0            | 3.5.1            |
+------------------------+------------------+------------------+
| requests               | 2.14.2           | 2.18.4           |
+------------------------+------------------+------------------+
| scikit-learn           | 0.19.1           | 0.19.1           |
+------------------------+------------------+------------------+
| scipy                  | 1.0.0            | 1.0.0            |
+------------------------+------------------+------------------+
| six                    | 1.10.0           | 1.10.0           |
+------------------------+------------------+------------------+
| sklearn                | 0.0              | 0.0              |
+------------------------+------------------+------------------+
| tensorflow             | 1.4.1            | 1.5.0            |
+------------------------+------------------+------------------+
| tensorflow-serving-api | 1.4.0            | 1.5.0            |
+------------------------+------------------+------------------+
| tensorflow-tensorboard | 0.4.0            | 1.5.1            |
+------------------------+------------------+------------------+
+------------------------+------------------+------------------+------------------+
| Dependencies           | tensorflow 1.4.1 | tensorflow 1.5.0 | tensorflow 1.6.0 |
+------------------------+------------------+------------------+------------------+
| boto3                  | 1.4.7            | 1.5.22           | 1.6.21           |
+------------------------+------------------+------------------+------------------+
| botocore               | 1.5.92           | 1.8.36           | 1.9.21           |
+------------------------+------------------+------------------+------------------+
| grpcio                 | 1.7.0            | 1.9.0            | 1.10.0           |
+------------------------+------------------+------------------+------------------+
| numpy                  | 1.13.3           | 1.14.0           | 1.14.2           |
+------------------------+------------------+------------------+------------------+
| pandas                 | 0.21.0           | 0.22.0           | 0.22.0           |
+------------------------+------------------+------------------+------------------+
| protobuf               | 3.4.0            | 3.5.1            | 3.5.2            |
+------------------------+------------------+------------------+------------------+
| scikit-learn           | 0.19.1           | 0.19.1           | 0.19.1           |
+------------------------+------------------+------------------+------------------+
| scipy                  | 1.0.0            | 1.0.0            | 1.0.1            |
+------------------------+------------------+------------------+------------------+
| sklearn                | 0.0              | 0.0              | 0.0              |
+------------------------+------------------+------------------+------------------+
| tensorflow             | 1.4.1            | 1.5.0            | 1.6.0            |
+------------------------+------------------+------------------+------------------+
| tensorflow-serving-api | 1.4.0            | 1.5.0            | 1.5.0            |
+------------------------+------------------+------------------+------------------+

The Docker images extend Ubuntu 16.04.

You can select version of TensorFlow by passing a ``framework_version`` keyword arg to the TensorFlow Estimator constructor. Currently supported versions are ``1.5.0`` and ``1.4.1``. You can also set ``framework_version`` to ``1.5 (default)`` or ``1.4`` which will cause your training script to be run on the latest supported TensorFlow 1.5 or 1.4 versions respectively.
You can select the version of TensorFlow by passing a ``framework_version`` keyword arg to the TensorFlow Estimator constructor. Currently supported versions are listed in the table above. You can also set ``framework_version`` to only a major and minor version, e.g. ``1.6``, which will cause your training script to be run on the latest supported patch version of that minor version (in this example, 1.6.0).

AWS SageMaker Estimators
------------------------
2 changes: 1 addition & 1 deletion setup.py
@@ -11,7 +11,7 @@ def read(fname):


setup(name="sagemaker",
      version="1.1.3",
      version="1.2.0",
      description="Open source library for training and deploying models on Amazon SageMaker.",
      packages=find_packages('src'),
      package_dir={'': 'src'},
2 changes: 1 addition & 1 deletion src/sagemaker/mxnet/defaults.py
@@ -10,4 +10,4 @@
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
MXNET_VERSION = '1.0'
MXNET_VERSION = '1.1'
2 changes: 1 addition & 1 deletion src/sagemaker/tensorflow/defaults.py
@@ -10,4 +10,4 @@
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
TF_VERSION = '1.5'
TF_VERSION = '1.6'
8 changes: 4 additions & 4 deletions tests/conftest.py
@@ -56,21 +56,21 @@ def sagemaker_session(sagemaker_client_config, sagemaker_runtime_config, boto_co
sagemaker_runtime_client=runtime_client)


@pytest.fixture(scope='module', params=["1.4", "1.4.1", "1.5", "1.5.0"])
@pytest.fixture(scope='module', params=['1.4', '1.4.1', '1.5', '1.5.0', '1.6', '1.6.0'])
def tf_version(request):
    return request.param


@pytest.fixture(scope='module', params=["0.12", "0.12.1", "1.0", "1.0.0"])
@pytest.fixture(scope='module', params=['0.12', '0.12.1', '1.0', '1.0.0', '1.1', '1.1.0'])
def mxnet_version(request):
    return request.param


@pytest.fixture(scope='module', params=["1.4.1", "1.5.0"])
@pytest.fixture(scope='module', params=['1.4.1', '1.5.0', '1.6.0'])
def tf_full_version(request):
    return request.param


@pytest.fixture(scope='module', params=["0.12.1", "1.0.0"])
@pytest.fixture(scope='module', params=['0.12.1', '1.0.0', '1.1.0'])
def mxnet_full_version(request):
    return request.param
32 changes: 15 additions & 17 deletions tests/integ/test_mxnet_train.py
@@ -52,16 +52,28 @@ def test_attach_deploy(mxnet_training_job, sagemaker_session):
predictor.predict(data)


def test_async_fit(sagemaker_session, mxnet_full_version):
def test_deploy_model(mxnet_training_job, sagemaker_session):
    endpoint_name = 'test-mxnet-deploy-model-{}'.format(sagemaker_timestamp())

    with timeout_and_delete_endpoint_by_name(endpoint_name, sagemaker_session, minutes=20):
        desc = sagemaker_session.sagemaker_client.describe_training_job(TrainingJobName=mxnet_training_job)
        model_data = desc['ModelArtifacts']['S3ModelArtifacts']
        script_path = os.path.join(DATA_DIR, 'mxnet_mnist', 'mnist.py')
        model = MXNetModel(model_data, 'SageMakerRole', entry_point=script_path, sagemaker_session=sagemaker_session)
        predictor = model.deploy(1, 'ml.m4.xlarge', endpoint_name=endpoint_name)

    training_job_name = ""
        data = numpy.zeros(shape=(1, 1, 28, 28))
        predictor.predict(data)


def test_async_fit(sagemaker_session):
    endpoint_name = 'test-mxnet-attach-deploy-{}'.format(sagemaker_timestamp())

    with timeout(minutes=5):
        script_path = os.path.join(DATA_DIR, 'mxnet_mnist', 'mnist.py')
        data_path = os.path.join(DATA_DIR, 'mxnet_mnist')

        mx = MXNet(entry_point=script_path, role='SageMakerRole', framework_version=mxnet_full_version,
        mx = MXNet(entry_point=script_path, role='SageMakerRole',
                   train_instance_count=1, train_instance_type='ml.c4.xlarge',
                   sagemaker_session=sagemaker_session)

@@ -84,20 +96,6 @@ def test_async_fit(sagemaker_session, mxnet_full_version):
predictor.predict(data)


def test_deploy_model(mxnet_training_job, sagemaker_session):
    endpoint_name = 'test-mxnet-deploy-model-{}'.format(sagemaker_timestamp())

    with timeout_and_delete_endpoint_by_name(endpoint_name, sagemaker_session, minutes=20):
        desc = sagemaker_session.sagemaker_client.describe_training_job(TrainingJobName=mxnet_training_job)
        model_data = desc['ModelArtifacts']['S3ModelArtifacts']
        script_path = os.path.join(DATA_DIR, 'mxnet_mnist', 'mnist.py')
        model = MXNetModel(model_data, 'SageMakerRole', entry_point=script_path, sagemaker_session=sagemaker_session)
        predictor = model.deploy(1, 'ml.m4.xlarge', endpoint_name=endpoint_name)

        data = numpy.zeros(shape=(1, 1, 28, 28))
        predictor.predict(data)


def test_failed_training_job(sagemaker_session, mxnet_full_version):
    with timeout(minutes=15):
        script_path = os.path.join(DATA_DIR, 'mxnet_mnist', 'failure_script.py')
4 changes: 1 addition & 3 deletions tests/integ/test_tf.py
@@ -55,14 +55,12 @@ def test_tf(sagemaker_session, tf_full_version):
assert dict_result == list_result


def test_tf_async(sagemaker_session, tf_full_version):
    training_job_name = ""
def test_tf_async(sagemaker_session):
    with timeout(minutes=5):
        script_path = os.path.join(DATA_DIR, 'iris', 'iris-dnn-classifier.py')

        estimator = TensorFlow(entry_point=script_path,
                               role='SageMakerRole',
                               framework_version=tf_full_version,
                               training_steps=1,
                               evaluation_steps=1,
                               hyperparameters={'input_tensor_name': 'inputs'},