Commit 065ce30

Danpengk19
authored and committed
feature: support MXNet 1.4 with MMS (aws#812)
1 parent 8bbcd06 commit 065ce30

12 files changed, with 194 additions and 68 deletions.

doc/using_mxnet.rst

Lines changed: 25 additions & 17 deletions
@@ -6,9 +6,9 @@ Using MXNet with the SageMaker Python SDK
 
 With the SageMaker Python SDK, you can train and host MXNet models on Amazon SageMaker.
 
-Supported versions of MXNet: ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.
+Supported versions of MXNet: ``1.4.0``, ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.
 
-Supported versions of MXNet for Elastic Inference: ``1.3.0``.
+Supported versions of MXNet for Elastic Inference: ``1.4.0``, ``1.3.0``.
 
 Training with MXNet
 -------------------
@@ -38,7 +38,7 @@ Preparing the MXNet training script
 +----------------------------------------------------------------------------------------------------------------------------------------------------------+
 | WARNING |
 +==========================================================================================================================================================+
-| The structure for training scripts changed with MXNet version 1.3. |
+| The structure for training scripts changed starting at MXNet version 1.3. |
 | Make sure you refer to the correct section of this README when you prepare your script. |
 | For information on how to upgrade an old script to the new format, see `"Updating your MXNet training script" <#updating-your-mxnet-training-script>`__. |
 +----------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -700,6 +700,13 @@ Where ``model`` is the model objected loaded by ``model_fn``, ``request_body`` i
 This one function should handle processing the input, performing a prediction, and processing the output.
 The return object should be one of the following:
 
+For versions 1.4 and higher:
+----------------------------
+- a tuple with two items: the response data and ``accept_type`` (the content type of the response data), or
+- the response data: (the content type of the response will be set to either the accept header in the initial request or default to application/json)
+
+For versions 1.3 and lower:
+---------------------------
 - a tuple with two items: the response data and ``accept_type`` (the content type of the response data), or
 - a Flask response object: http://flask.pocoo.org/docs/1.0/api/#response-objects
 
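As a rough illustration of the return convention documented for versions 1.4 and higher, the sketch below shows a ``transform_fn`` that returns a tuple of response data and content type. The ``EchoModel`` class and the JSON payload shape are assumptions made for this example; they are not part of the SDK or the serving container.

```python
import json


class EchoModel:
    """Hypothetical stand-in for the object returned by model_fn."""

    def predict(self, values):
        # Double every input value so the output is easy to check.
        return [2 * v for v in values]


def transform_fn(model, request_body, content_type, accept_type):
    """Handle input parsing, prediction, and output serialization in one function."""
    if content_type != 'application/json':
        raise ValueError('Unsupported content type: {}'.format(content_type))

    values = json.loads(request_body)
    prediction = model.predict(values)

    # Return a tuple: (response data, content type of the response data).
    return json.dumps(prediction), accept_type or 'application/json'
```

Per the documentation in this hunk, returning only the response data should also be acceptable on 1.4 and higher, in which case the content type falls back to the request's accept header or ``application/json``.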

@@ -802,23 +809,24 @@ Your MXNet training script will be run on version 1.2.1 by default. (See below f
 
 The Docker images have the following dependencies installed:
 
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| Python                  | 2.7 or 3.5   |   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.5|
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 | MXNet 1.4.0 |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| Python                  | 2.7 or 3.5   |   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.6|
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         | 9.2         |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      | 1.16.3      |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       | 1.4.1       |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       | 2.2.4.1     |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
 
 The Docker images extend Ubuntu 16.04.
 
 You can select version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are listed in the above table. You can also set ``framework_version`` to only specify major and minor version, e.g ``1.2``, which will cause your training script to be run on the latest supported patch version of that minor version, which in this example would be 1.2.1.
 Alternatively, you can build your own image by following the instructions in the SageMaker MXNet containers repository, and passing ``image_name`` to the MXNet Estimator constructor.
 
-You can visit the SageMaker MXNet containers repository here: https://github.com/aws/sagemaker-mxnet-container
+You can visit the SageMaker MXNet training containers repository here: https://github.com/aws/sagemaker-mxnet-container
+You can visit the SageMaker MXNet serving containers repository here: https://github.com/aws/sagemaker-mxnet-serving-container
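
The documentation above says that a major.minor ``framework_version`` such as ``1.2`` runs on the latest supported patch release. The sketch below only illustrates that rule; ``resolve_framework_version`` is a hypothetical helper, not an SDK function, and the SDK and Docker images may resolve versions by a different mechanism.

```python
# Supported versions as listed in the documentation touched by this commit.
SUPPORTED_VERSIONS = ['1.4.0', '1.3.0', '1.2.1', '1.1.0', '1.0.0', '0.12.1']


def resolve_framework_version(requested, supported=SUPPORTED_VERSIONS):
    """Hypothetical helper: map a version prefix to the latest supported patch version."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split('.'))

    wanted = as_tuple(requested)
    # Keep versions whose leading components match the requested ones.
    matches = [v for v in supported if as_tuple(v)[:len(wanted)] == wanted]
    if not matches:
        raise ValueError('Unsupported MXNet version: {}'.format(requested))
    return max(matches, key=as_tuple)
```

Comparing integer tuples rather than raw strings keeps a version like ``0.12`` from being ordered incorrectly against ``0.9``-style versions.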

src/sagemaker/fw_utils.py

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@
                                 'Please add framework_version={} to your constructor to avoid this error.'
 
 VALID_PY_VERSIONS = ['py2', 'py3']
-VALID_EIA_FRAMEWORKS = ['tensorflow', 'tensorflow-serving', 'mxnet']
+VALID_EIA_FRAMEWORKS = ['tensorflow', 'tensorflow-serving', 'mxnet', 'mxnet-serving']
 VALID_ACCOUNTS_BY_REGION = {'us-gov-west-1': '246785580436',
                             'us-iso-east-1': '744548109606'}
 

src/sagemaker/model.py

Lines changed: 20 additions & 8 deletions
@@ -14,9 +14,11 @@
 
 import json
 import logging
+import os
 
 import sagemaker
 from sagemaker import fw_utils, local, session, utils
+from sagemaker.fw_utils import UploadedCode
 from sagemaker.transformer import Transformer
 
 LOGGER = logging.getLogger('sagemaker')
@@ -408,6 +410,7 @@ def __init__(self, model_data, image, role, entry_point, source_dir=None, predic
         else:
             self.bucket, self.key_prefix = None, None
         self.uploaded_code = None
+        self.repacked_model_data = None
 
     def prepare_container_def(self, instance_type, accelerator_type=None):  # pylint disable=unused-argument
         """Return a container definition with framework configuration set in model environment variables.
@@ -428,18 +431,27 @@ def prepare_container_def(self, instance_type, accelerator_type=None): # pylint
         deploy_env.update(self._framework_env_vars())
         return sagemaker.container_def(self.image, self.model_data, deploy_env)
 
-    def _upload_code(self, key_prefix):
+    def _upload_code(self, key_prefix, repack=False):
         local_code = utils.get_config_value('local.local_code', self.sagemaker_session.config)
         if self.sagemaker_session.local_mode and local_code:
             self.uploaded_code = None
         else:
-            bucket = self.bucket or self.sagemaker_session.default_bucket()
-            self.uploaded_code = fw_utils.tar_and_upload_dir(session=self.sagemaker_session.boto_session,
-                                                             bucket=bucket,
-                                                             s3_key_prefix=key_prefix,
-                                                             script=self.entry_point,
-                                                             directory=self.source_dir,
-                                                             dependencies=self.dependencies)
+            if not repack:
+                bucket = self.bucket or self.sagemaker_session.default_bucket()
+                self.uploaded_code = fw_utils.tar_and_upload_dir(session=self.sagemaker_session.boto_session,
+                                                                 bucket=bucket,
+                                                                 s3_key_prefix=key_prefix,
+                                                                 script=self.entry_point,
+                                                                 directory=self.source_dir,
+                                                                 dependencies=self.dependencies)
+
+        if repack:
+            self.repacked_model_data = utils.repack_model(inference_script=self.entry_point,
+                                                          source_directory=self.source_dir,
+                                                          model_uri=self.model_data,
+                                                          sagemaker_session=self.sagemaker_session)
+            self.uploaded_code = UploadedCode(s3_prefix=self.repacked_model_data,
+                                              script_name=os.path.basename(self.entry_point))
 
     def _framework_env_vars(self):
         if self.uploaded_code:
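
To make the repack branch of ``_upload_code`` easier to follow, here is a minimal, self-contained sketch of the record it builds. The ``UploadedCode`` namedtuple below is a local stand-in (assumed to carry the same two fields used in the hunk), ``describe_uploaded_code`` is a hypothetical helper name, and the S3 URI in the usage note is made up.

```python
import os
from collections import namedtuple

# Local stand-in for sagemaker.fw_utils.UploadedCode; the two field names are
# taken from the keyword arguments used in the hunk above.
UploadedCode = namedtuple('UploadedCode', ['s3_prefix', 'script_name'])


def describe_uploaded_code(entry_point, repacked_model_data):
    """Point at the repacked model archive and keep only the script's file name."""
    return UploadedCode(s3_prefix=repacked_model_data,
                        script_name=os.path.basename(entry_point))
```

For example, ``describe_uploaded_code('src/inference.py', 's3://example-bucket/model.tar.gz')`` yields a record whose ``script_name`` is ``inference.py``: when repacking, the code location and the model archive are the same object.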

src/sagemaker/mxnet/README.rst

Lines changed: 20 additions & 17 deletions
@@ -4,9 +4,9 @@ Using MXNet with the SageMaker Python SDK
 
 With the SageMaker Python SDK, you can train and host MXNet models on Amazon SageMaker.
 
-Supported versions of MXNet: ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.
+Supported versions of MXNet: ``1.4.0``, ``1.3.0``, ``1.2.1``, ``1.1.0``, ``1.0.0``, ``0.12.1``.
 
-Supported versions of MXNet for Elastic Inference: ``1.3.0``.
+Supported versions of MXNet for Elastic Inference: ``1.4.0``, ``1.3.0``.
 
 For information about using MXNet with the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/using_mxnet.html.
 

@@ -15,29 +15,32 @@ SageMaker MXNet Containers
 
 When training and deploying training scripts, SageMaker runs your Python script in a Docker container with several libraries installed. When creating the Estimator and calling deploy to create the SageMaker Endpoint, you can control the environment your script runs in.
 
-SageMaker runs MXNet Estimator scripts in either Python 2.7 or Python 3.5. You can select the Python version by passing a ``py_version`` keyword arg to the MXNet Estimator constructor. Setting this to ``py2`` (the default) will cause your training script to be run on Python 2.7. Setting this to ``py3`` will cause your training script to be run on Python 3.5. This Python version applies to both the Training Job, created by fit, and the Endpoint, created by deploy.
+SageMaker runs MXNet scripts in either Python 2.7 or Python 3.6. You can select the Python version by passing a ``py_version`` keyword arg to the MXNet Estimator constructor. Setting this to ``py2`` (the default) will cause your training script to be run on Python 2.7. Setting this to ``py3`` will cause your training script to be run on Python 3.6. This Python version applies to both the Training Job, created by fit, and the Endpoint, created by deploy.
 
 Your MXNet training script will be run on version 1.2.1 by default. (See below for how to choose a different version, and currently supported versions.) The decision to use the GPU or CPU version of MXNet is made by the ``train_instance_type``, set on the MXNet constructor. If you choose a GPU instance type, your training job will be run on a GPU version of MXNet. If you choose a CPU instance type, your training job will be run on a CPU version of MXNet. Similarly, when you call deploy, specifying a GPU or CPU deploy_instance_type, will control which MXNet build your Endpoint runs.
 
 The Docker images have the following dependencies installed:
 
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| Python                  | 2.7 or 3.5   |   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.5|
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
-| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       |
-+-------------------------+--------------+-------------+-------------+-------------+-------------+
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| Dependencies            | MXNet 0.12.1 | MXNet 1.0.0 | MXNet 1.1.0 | MXNet 1.2.1 | MXNet 1.3.0 | MXNet 1.4.0 |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| Python                  | 2.7 or 3.5   |   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.5|   2.7 or 3.6|
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| CUDA (GPU image only)   | 9.0          | 9.0         | 9.0         | 9.0         | 9.0         | 9.2         |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| numpy                   | 1.13.3       | 1.13.3      | 1.13.3      | 1.14.5      | 1.14.6      | 1.16.3      |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| onnx                    | N/A          | N/A         | N/A         | 1.2.1       | 1.2.1       | 1.4.1       |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
+| keras-mxnet             | N/A          | N/A         | N/A         | N/A         | 2.2.2       | 2.2.4.1     |
++-------------------------+--------------+-------------+-------------+-------------+-------------+-------------+
 
 The Docker images extend Ubuntu 16.04.
 
 You can select version of MXNet by passing a ``framework_version`` keyword arg to the MXNet Estimator constructor. Currently supported versions are listed in the above table. You can also set ``framework_version`` to only specify major and minor version, e.g ``1.2``, which will cause your training script to be run on the latest supported patch version of that minor version, which in this example would be 1.2.1.
 Alternatively, you can build your own image by following the instructions in the SageMaker MXNet containers repository, and passing ``image_name`` to the MXNet Estimator constructor.
 
-You can visit the SageMaker MXNet containers repository here: https://github.com/aws/sagemaker-mxnet-container
+You can visit the SageMaker MXNet container repositories here:
+
+- training: https://github.com/aws/sagemaker-mxnet-container
+- serving: https://github.com/aws/sagemaker-mxnet-serving-container

src/sagemaker/mxnet/estimator.py

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ class MXNet(Framework):
     __framework_name__ = 'mxnet'
     _LOWEST_SCRIPT_MODE_VERSION = ['1', '3']
 
-    LATEST_VERSION = '1.3'
+    LATEST_VERSION = '1.4'
     """The latest version of MXNet included in the SageMaker pre-built Docker images."""
 
     def __init__(self, entry_point, source_dir=None, hyperparameters=None, py_version='py2',

src/sagemaker/mxnet/model.py

Lines changed: 13 additions & 3 deletions
@@ -14,6 +14,8 @@
 
 import logging
 
+from pkg_resources import parse_version
+
 import sagemaker
 from sagemaker.fw_utils import create_image_uri, model_code_key_prefix, python_deprecation_warning
 from sagemaker.model import FrameworkModel, MODEL_SERVER_WORKERS_PARAM_NAME
@@ -45,6 +47,7 @@ class MXNetModel(FrameworkModel):
     """An MXNet SageMaker ``Model`` that can be deployed to a SageMaker ``Endpoint``."""
 
     __framework_name__ = 'mxnet'
+    _LOWEST_MMS_VERSION = '1.4'
 
     def __init__(self, model_data, role, entry_point, image=None, py_version='py2', framework_version=MXNET_VERSION,
                  predictor_cls=MXNetPredictor, model_server_workers=None, **kwargs):
@@ -89,17 +92,24 @@ def prepare_container_def(self, instance_type, accelerator_type=None):
         Returns:
             dict[str, str]: A container definition object usable with the CreateModel API.
         """
+        mms_version = parse_version(self.framework_version) >= parse_version(self._LOWEST_MMS_VERSION)
+
         deploy_image = self.image
         if not deploy_image:
             region_name = self.sagemaker_session.boto_session.region_name
-            deploy_image = create_image_uri(region_name, self.__framework_name__, instance_type,
+
+            framework_name = self.__framework_name__
+            if mms_version:
+                framework_name += '-serving'
+
+            deploy_image = create_image_uri(region_name, framework_name, instance_type,
                                             self.framework_version, self.py_version, accelerator_type=accelerator_type)
 
         deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
-        self._upload_code(deploy_key_prefix)
+        self._upload_code(deploy_key_prefix, mms_version)
         deploy_env = dict(self.env)
         deploy_env.update(self._framework_env_vars())
 
         if self.model_server_workers:
             deploy_env[MODEL_SERVER_WORKERS_PARAM_NAME.upper()] = str(self.model_server_workers)
-        return sagemaker.container_def(deploy_image, self.model_data, deploy_env)
+        return sagemaker.container_def(deploy_image, self.repacked_model_data or self.model_data, deploy_env)
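
The version gate in this hunk can be illustrated in isolation. The sketch below swaps ``pkg_resources.parse_version`` for a simplified tuple-based ``parse`` so that it runs without extra dependencies; that substitution, and the helper name ``serving_framework_name``, are assumptions made for the example and not what the commit itself uses.

```python
def parse(version):
    """Simplified stand-in for parse_version: handles numeric dotted versions only."""
    return tuple(int(part) for part in version.split('.'))


LOWEST_MMS_VERSION = '1.4'


def serving_framework_name(framework_name, framework_version):
    """Return the image framework name, adding '-serving' for MMS-based versions."""
    uses_mms = parse(framework_version) >= parse(LOWEST_MMS_VERSION)
    return framework_name + '-serving' if uses_mms else framework_name
```

Parsing before comparing matters because a plain string comparison would order a hypothetical ``1.10`` below ``1.4``.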

src/sagemaker/utils.py

Lines changed: 1 addition & 1 deletion
@@ -344,7 +344,7 @@ def repack_model(inference_script, source_directory, model_uri, sagemaker_sessio
             local_code_path = os.path.join(tmp, 'local_code.tar.gz')
             download_file_from_url(source_directory, local_code_path, sagemaker_session)
 
-            with tarfile.open(name=local_model_path, mode='r:gz') as t:
+            with tarfile.open(name=local_code_path, mode='r:gz') as t:
                 t.extractall(path=code_dir)
 
         elif source_directory:
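
The one-line fix above makes ``repack_model`` extract the downloaded code archive rather than the model archive. The sketch below shows the same ``tarfile`` extraction pattern in isolation; ``extract_code_archive`` and ``demo`` are hypothetical names, and the archive is built locally instead of being downloaded from S3.

```python
import os
import tarfile
import tempfile


def extract_code_archive(local_code_path, code_dir):
    """Extract a gzip-compressed tar archive of inference code into code_dir."""
    with tarfile.open(name=local_code_path, mode='r:gz') as t:
        t.extractall(path=code_dir)
    return sorted(os.listdir(code_dir))


def demo():
    """Build a tiny code archive in a temp dir, extract it, and return the extracted names."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, 'inference.py')
        with open(script, 'w') as f:
            f.write('print("hello")\n')

        archive = os.path.join(tmp, 'local_code.tar.gz')
        with tarfile.open(name=archive, mode='w:gz') as t:
            t.add(script, arcname='inference.py')

        code_dir = os.path.join(tmp, 'code')
        os.makedirs(code_dir)
        return extract_code_archive(archive, code_dir)
```

Opening the wrong archive here would silently unpack model artifacts into the code directory, which is the behavior the commit corrects.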

tests/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ def chainer_version(request):
 
 
 @pytest.fixture(scope='module', params=['0.12', '0.12.1', '1.0', '1.0.0', '1.1', '1.1.0', '1.2',
-                                        '1.2.1', '1.3', '1.3.0'])
+                                        '1.2.1', '1.3', '1.3.0', '1.4', '1.4.0'])
 def mxnet_version(request):
     return request.param
 
