feature: support MXNet 1.4 with MMS #812

Merged
merged 10 commits into from
May 24, 2019
42 changes: 25 additions & 17 deletions doc/using_mxnet.rst
@@ -6,9 +6,9 @@ Using MXNet with the SageMaker Python SDK

With the SageMaker Python SDK, you can train and host MXNet models on Amazon SageMaker.

Supported versions of MXNet: ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.
Supported versions of MXNet: ``1.4.0``, ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.

Supported versions of MXNet for Elastic Inference: ``1.3.0``.
Supported versions of MXNet for Elastic Inference: ``1.4.0``, ``1.3.0``.

Training with MXNet
-------------------
@@ -38,7 +38,7 @@ Preparing the MXNet training script
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING |
+==========================================================================================================================================================+
| The structure for training scripts changed with MXNet version 1.3. |
| The structure for training scripts changed starting at MXNet version 1.3. |
| Make sure you refer to the correct section of this README when you prepare your script. |
| For information on how to upgrade an old script to the new format, see `"Updating your MXNet training script" <#updating-your-mxnet-training-script>`__. |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -700,6 +700,13 @@ Where ``model`` is the model object loaded by ``model_fn``, ``request_body`` i
This one function should handle processing the input, performing a prediction, and processing the output.
The return object should be one of the following:

For versions 1.4 and higher:
----------------------------
- a tuple with two items: the response data and ``accept_type`` (the content type of the response data), or
- the response data alone (the content type of the response is then set to the accept header of the initial request, or defaults to ``application/json``)

For versions 1.3 and lower:
---------------------------
- a tuple with two items: the response data and ``accept_type`` (the content type of the response data), or
- a Flask response object: http://flask.pocoo.org/docs/1.0/api/#response-objects
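
A minimal sketch of the two-item tuple form, which both lists above accept. The handler signature is an assumption (the hunk header truncates the parameter list after ``request_body``), and the ``double`` callable is a hypothetical stand-in for whatever ``model_fn`` would return:

```python
import json


def transform_fn(model, request_body, content_type, accept_type):
    """Hypothetical handler: parse the input, predict, and return (data, content type)."""
    # Process the input: this sketch only understands JSON request bodies.
    if content_type != "application/json":
        raise ValueError("unsupported content type: {}".format(content_type))
    inputs = json.loads(request_body)

    # Predict: a plain callable stands in for the object loaded by model_fn.
    outputs = model(inputs)

    # Process the output: the tuple form pairs the response data with its content type.
    return json.dumps(outputs), accept_type


def double(values):
    """Stand-in "model" (assumption) that doubles each input value."""
    return [2 * v for v in values]


response_body, response_type = transform_fn(
    double, "[1, 2, 3]", "application/json", "application/json")
```

Returning the tuple keeps a script portable across the version split, since the Flask response object is only listed for 1.3 and lower.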

@@ -802,23 +809,24 @@ Your MXNet training script will be run on version 1.2.1 by default. (See below f

The Docker images have the following dependencies installed:

+-------------------------+--------------+-------------+-------------+-------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 | MXNet 1.4.0 |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.6  |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         | 9.2         |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      | 1.16.3      |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       | 1.4.1       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       | 2.2.4.1     |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+

The Docker images extend Ubuntu 16.04.

You can select the version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are listed in the table above. You can also set ``framework_version`` to only the major and minor version, e.g. ``1.2``, in which case your training script runs on the latest supported patch version of that minor version (in this example, 1.2.1).
Alternatively, you can build your own image by following the instructions in the SageMaker MXNet containers repository, and passing ``image_name`` to the MXNet Estimator constructor.

You can visit the SageMaker MXNet containers repository here: https://github.com/aws/sagemaker-mxnet-container
You can visit the SageMaker MXNet training containers repository here: https://github.com/aws/sagemaker-mxnet-container
You can visit the SageMaker MXNet serving containers repository here: https://github.com/aws/sagemaker-mxnet-serving-container
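
The ``framework_version`` shorthand described above (``1.2`` resolving to ``1.2.1``) can be illustrated with a small sketch. ``resolve_framework_version`` is a hypothetical helper written for this note, not part of the SDK, and the version list is copied from the "Supported versions of MXNet" line:

```python
# Supported full versions, as listed in the updated documentation.
SUPPORTED_VERSIONS = ["1.4.0", "1.3.0", "1.2.1", "1.1.0", "1.0.0", "0.12.1"]


def resolve_framework_version(requested):
    """Hypothetical helper: map '1.2' to the newest supported '1.2.x'."""
    # A full version that is supported is used unchanged.
    if requested in SUPPORTED_VERSIONS:
        return requested
    # Otherwise treat the request as a major.minor prefix.
    prefix = requested + "."
    matches = [v for v in SUPPORTED_VERSIONS if v.startswith(prefix)]
    if not matches:
        raise ValueError("unsupported MXNet version: {}".format(requested))
    # Compare numerically so that a patch level of 10 would sort above 9.
    return max(matches, key=lambda v: tuple(int(part) for part in v.split(".")))
```

How the SDK and the containers actually perform this resolution is not shown in this diff; the sketch only mirrors the behaviour the paragraph describes.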
2 changes: 1 addition & 1 deletion src/sagemaker/fw_utils.py
@@ -45,7 +45,7 @@
'Please add framework_version={} to your constructor to avoid this error.'

VALID_PY_VERSIONS = ['py2', 'py3']
VALID_EIA_FRAMEWORKS = ['tensorflow', 'tensorflow-serving', 'mxnet']
VALID_EIA_FRAMEWORKS = ['tensorflow', 'tensorflow-serving', 'mxnet', 'mxnet-serving']
VALID_ACCOUNTS_BY_REGION = {'us-gov-west-1': '246785580436',
                            'us-iso-east-1': '744548109606'}

28 changes: 20 additions & 8 deletions src/sagemaker/model.py
@@ -14,9 +14,11 @@

import json
import logging
import os

import sagemaker
from sagemaker import fw_utils, local, session, utils
from sagemaker.fw_utils import UploadedCode
from sagemaker.transformer import Transformer

LOGGER = logging.getLogger('sagemaker')
@@ -408,6 +410,7 @@ def __init__(self, model_data, image, role, entry_point, source_dir=None, predic
        else:
            self.bucket, self.key_prefix = None, None
        self.uploaded_code = None
        self.repacked_model_data = None

    def prepare_container_def(self, instance_type, accelerator_type=None):  # pylint: disable=unused-argument
        """Return a container definition with framework configuration set in model environment variables.
@@ -428,18 +431,27 @@ def prepare_container_def(self, instance_type, accelerator_type=None): # pylint
        deploy_env.update(self._framework_env_vars())
        return sagemaker.container_def(self.image, self.model_data, deploy_env)

    def _upload_code(self, key_prefix):
    def _upload_code(self, key_prefix, repack=False):
        local_code = utils.get_config_value('local.local_code', self.sagemaker_session.config)
        if self.sagemaker_session.local_mode and local_code:
            self.uploaded_code = None
        else:
            bucket = self.bucket or self.sagemaker_session.default_bucket()
            self.uploaded_code = fw_utils.tar_and_upload_dir(session=self.sagemaker_session.boto_session,
                                                             bucket=bucket,
                                                             s3_key_prefix=key_prefix,
                                                             script=self.entry_point,
                                                             directory=self.source_dir,
                                                             dependencies=self.dependencies)
            if not repack:
                bucket = self.bucket or self.sagemaker_session.default_bucket()
                self.uploaded_code = fw_utils.tar_and_upload_dir(session=self.sagemaker_session.boto_session,
                                                                 bucket=bucket,
                                                                 s3_key_prefix=key_prefix,
                                                                 script=self.entry_point,
                                                                 directory=self.source_dir,
                                                                 dependencies=self.dependencies)

        if repack:
            self.repacked_model_data = utils.repack_model(inference_script=self.entry_point,
                                                          source_directory=self.source_dir,
                                                          model_uri=self.model_data,
                                                          sagemaker_session=self.sagemaker_session)
            self.uploaded_code = UploadedCode(s3_prefix=self.repacked_model_data,
                                              script_name=os.path.basename(self.entry_point))
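
For readers unfamiliar with the repack path, here is a local-only sketch of the idea: copy the model archive and add the inference script to it. ``repack_with_script`` is a hypothetical helper, and the ``code/`` destination is an assumption inferred from the ``code_dir`` variable in ``utils.repack_model``; the real function also downloads from and uploads to S3, which this sketch omits:

```python
import os
import tarfile
import tempfile


def repack_with_script(model_tar, script_path, output_tar):
    """Hypothetical local repack: copy model_tar and add script_path under code/."""
    script_name = os.path.basename(script_path)
    with tarfile.open(model_tar, mode="r:gz") as src, \
            tarfile.open(output_tar, mode="w:gz") as dst:
        # Copy every existing member of the model archive unchanged.
        for member in src.getmembers():
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
        # Add the inference script next to the model artifacts (assumed layout).
        dst.add(script_path, arcname="code/" + script_name)
    return script_name


# Usage: build a tiny stand-in model archive, then repack it with a script.
with tempfile.TemporaryDirectory() as tmp:
    params_path = os.path.join(tmp, "model.params")
    with open(params_path, "w") as f:
        f.write("weights")
    script_path = os.path.join(tmp, "inference.py")
    with open(script_path, "w") as f:
        f.write("def model_fn(model_dir):\n    return None\n")

    model_tar = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(model_tar, mode="w:gz") as archive:
        archive.add(params_path, arcname="model.params")

    repacked_tar = os.path.join(tmp, "repacked.tar.gz")
    script_name = repack_with_script(model_tar, script_path, repacked_tar)
    with tarfile.open(repacked_tar, mode="r:gz") as archive:
        repacked_names = sorted(archive.getnames())
```

The returned base name mirrors the ``script_name=os.path.basename(self.entry_point)`` argument passed to ``UploadedCode`` in the hunk above.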

    def _framework_env_vars(self):
        if self.uploaded_code:
37 changes: 20 additions & 17 deletions src/sagemaker/mxnet/README.rst
@@ -4,9 +4,9 @@ Using MXNet with the SageMaker Python SDK

With the SageMaker Python SDK, you can train and host MXNet models on Amazon SageMaker.

Supported versions of MXNet: ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.
Supported versions of MXNet: ``1.4.0``, ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.

Supported versions of MXNet for Elastic Inference: ``1.3.0``.
Supported versions of MXNet for Elastic Inference: ``1.4.0``, ``1.3.0``.

For information about using MXNet with the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/using_mxnet.html.

@@ -15,29 +15,32 @@ SageMaker MXNet Containers

When training and deploying training scripts, SageMaker runs your Python script in a Docker container with several libraries installed. When creating the Estimator and calling deploy to create the SageMaker Endpoint, you can control the environment your script runs in.

SageMaker runs MXNet Estimator scripts in either Python 2.7 or Python 3.5. You can select the Python version by passing a ``py_version`` keyword arg to the MXNet Estimator constructor. Setting this to ``py2`` (the default) will cause your training script to be run on Python 2.7. Setting this to ``py3`` will cause your training script to be run on Python 3.5. This Python version applies to both the Training Job, created by fit, and the Endpoint, created by deploy.
SageMaker runs MXNet scripts in either Python 2.7 or Python 3.6. You can select the Python version by passing a ``py_version`` keyword arg to the MXNet Estimator constructor. Setting this to ``py2`` (the default) will cause your training script to be run on Python 2.7. Setting this to ``py3`` will cause your training script to be run on Python 3.6. This Python version applies to both the Training Job, created by fit, and the Endpoint, created by deploy.

Your MXNet training script will be run on version 1.2.1 by default. (See below for how to choose a different version, and currently supported versions.) The decision to use the GPU or CPU version of MXNet is made by the ``train_instance_type``, set on the MXNet constructor. If you choose a GPU instance type, your training job will be run on a GPU version of MXNet. If you choose a CPU instance type, your training job will be run on a CPU version of MXNet. Similarly, when you call deploy, specifying a GPU or CPU deploy_instance_type controls which MXNet build your Endpoint runs.

The Docker images have the following dependencies installed:

+-------------------------+--------------+-------------+-------------+-------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 | MXNet 1.4.0 |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| Python                  | 2.7 or 3.5   | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.5  | 2.7 or 3.6  |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         | 9.2         |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      | 1.16.3      |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       | 1.4.1       |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       | 2.2.4.1     |
+-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+

The Docker images extend Ubuntu 16.04.

You can select the version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are listed in the table above. You can also set ``framework_version`` to only the major and minor version, e.g. ``1.2``, in which case your training script runs on the latest supported patch version of that minor version (in this example, 1.2.1).
Alternatively, you can build your own image by following the instructions in the SageMaker MXNet containers repository, and passing ``image_name`` to the MXNet Estimator constructor.

You can visit the SageMaker MXNet containers repository here: https://github.com/aws/sagemaker-mxnet-container
You can visit the SageMaker MXNet container repositories here:

- training: https://github.com/aws/sagemaker-mxnet-container
- serving: https://github.com/aws/sagemaker-mxnet-serving-container
2 changes: 1 addition & 1 deletion src/sagemaker/mxnet/estimator.py
@@ -30,7 +30,7 @@ class MXNet(Framework):
    __framework_name__ = 'mxnet'
    _LOWEST_SCRIPT_MODE_VERSION = ['1', '3']

    LATEST_VERSION = '1.3'
    LATEST_VERSION = '1.4'
    """The latest version of MXNet included in the SageMaker pre-built Docker images."""

    def __init__(self, entry_point, source_dir=None, hyperparameters=None, py_version='py2',
16 changes: 13 additions & 3 deletions src/sagemaker/mxnet/model.py
@@ -14,6 +14,8 @@

import logging

from pkg_resources import parse_version

import sagemaker
from sagemaker.fw_utils import create_image_uri, model_code_key_prefix, python_deprecation_warning
from sagemaker.model import FrameworkModel, MODEL_SERVER_WORKERS_PARAM_NAME
@@ -45,6 +47,7 @@ class MXNetModel(FrameworkModel):
    """An MXNet SageMaker ``Model`` that can be deployed to a SageMaker ``Endpoint``."""

    __framework_name__ = 'mxnet'
    _LOWEST_MMS_VERSION = '1.4'

    def __init__(self, model_data, role, entry_point, image=None, py_version='py2', framework_version=MXNET_VERSION,
                 predictor_cls=MXNetPredictor, model_server_workers=None, **kwargs):
@@ -89,17 +92,24 @@ def prepare_container_def(self, instance_type, accelerator_type=None):
        Returns:
            dict[str, str]: A container definition object usable with the CreateModel API.
        """
        mms_version = parse_version(self.framework_version) >= parse_version(self._LOWEST_MMS_VERSION)

        deploy_image = self.image
        if not deploy_image:
            region_name = self.sagemaker_session.boto_session.region_name
            deploy_image = create_image_uri(region_name, self.__framework_name__, instance_type,

            framework_name = self.__framework_name__
            if mms_version:
                framework_name += '-serving'

            deploy_image = create_image_uri(region_name, framework_name, instance_type,
                                            self.framework_version, self.py_version, accelerator_type=accelerator_type)

        deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
        self._upload_code(deploy_key_prefix)
        self._upload_code(deploy_key_prefix, mms_version)
        deploy_env = dict(self.env)
        deploy_env.update(self._framework_env_vars())

        if self.model_server_workers:
            deploy_env[MODEL_SERVER_WORKERS_PARAM_NAME.upper()] = str(self.model_server_workers)
        return sagemaker.container_def(deploy_image, self.model_data, deploy_env)
        return sagemaker.container_def(deploy_image, self.repacked_model_data or self.model_data, deploy_env)
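
The version gate in this hunk can be sketched without the SDK. The change itself uses ``pkg_resources.parse_version``; the tuple-based ``version_key`` below is a swapped-in stand-in so the example has no third-party dependency, and ``serving_framework_name`` is a hypothetical name:

```python
LOWEST_MMS_VERSION = "1.4"


def version_key(version):
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def serving_framework_name(framework_version, framework_name="mxnet"):
    """Hypothetical helper: pick the image family the way the hunk above does."""
    # Versions at or above the MMS threshold use the '-serving' image.
    uses_mms = version_key(framework_version) >= version_key(LOWEST_MMS_VERSION)
    return framework_name + "-serving" if uses_mms else framework_name
```

A numeric comparison matters here because a plain string comparison would rank a hypothetical ``1.10`` below ``1.4``.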
2 changes: 1 addition & 1 deletion src/sagemaker/utils.py
@@ -344,7 +344,7 @@ def repack_model(inference_script, source_directory, model_uri, sagemaker_sessio
            local_code_path = os.path.join(tmp, 'local_code.tar.gz')
            download_file_from_url(source_directory, local_code_path, sagemaker_session)

            with tarfile.open(name=local_model_path, mode='r:gz') as t:
            with tarfile.open(name=local_code_path, mode='r:gz') as t:
                t.extractall(path=code_dir)

        elif source_directory:
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -108,7 +108,7 @@ def chainer_version(request):


@pytest.fixture(scope='module', params=['0.12', '0.12.1', '1.0', '1.0.0', '1.1', '1.1.0', '1.2',
                                        '1.2.1', '1.3', '1.3.0'])
                                        '1.2.1', '1.3', '1.3.0', '1.4', '1.4.0'])
def mxnet_version(request):
    return request.param
