This repository was archived by the owner on May 23, 2024. It is now read-only.

Add EI support to TFS container. #10

Merged: 6 commits, Mar 4, 2019
2 changes: 1 addition & 1 deletion README.md
@@ -110,7 +110,7 @@ tox

To test Elastic Inference with Accelerator, you will need an AWS account, publish your built image to ECR repository and run the following command:

Review comment: can you provide an example of the command needed to run the test?

Review comment (suggested wording): "To test against Elastic Inference, you will..."

Contributor Author: I'll update these as well as the above comments.


-pytest test/functional/test_elastic_inference.py --aws-id <aws_account> \
+pytest test/sagemaker/test_elastic_inference.py --aws-id <aws_account> \
     --docker-base-name <ECR_repository_name> \
     --instance-type <instance_type> \
     --accelerator-type <accelerator_type> \
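
For reference, a fully filled-in command might look like this; the account ID, repository name, instance type, and accelerator type below are placeholder values, and any options truncated from the diff above would still need to be appended:

pytest test/sagemaker/test_elastic_inference.py --aws-id 123456789012 \
    --docker-base-name sagemaker-tensorflow-serving-eia \
    --instance-type ml.m4.xlarge \
    --accelerator-type ml.eia1.medium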
4 changes: 2 additions & 2 deletions docker/Dockerfile.ei
@@ -3,7 +3,7 @@ LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true

ARG TFS_SHORT_VERSION

-COPY AmazonEI_Tensorflow_Serving_v${TFS_SHORT_VERSION}_v1 /usr/bin/tensorflow_model_server
+COPY AmazonEI_TensorFlow_Serving_v${TFS_SHORT_VERSION}_v1 /usr/bin/tensorflow_model_server

# downloaded 1.12 version is not executable
RUN chmod +x /usr/bin/tensorflow_model_server
@@ -19,7 +19,7 @@ RUN \
apt-get clean

COPY ./ /
-RUN rm AmazonEI_Tensorflow_Serving_v${TFS_SHORT_VERSION}_v1
+RUN rm AmazonEI_TensorFlow_Serving_v${TFS_SHORT_VERSION}_v1

ENV SAGEMAKER_TFS_VERSION "${TFS_SHORT_VERSION}"
ENV PATH "$PATH:/sagemaker"
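
For context, this image is built with the TFS version passed as a build arg. A hypothetical invocation (the tag and build context here are placeholders, and it assumes the AmazonEI_TensorFlow_Serving binary has already been fetched into the context, e.g. by get_tfs_executable in scripts/shared.sh):

docker build -f docker/Dockerfile.ei --build-arg TFS_SHORT_VERSION=1.12 -t sagemaker-tensorflow-serving:1.12-ei .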
16 changes: 6 additions & 10 deletions scripts/shared.sh
@@ -29,18 +29,14 @@ function get_aws_account() {
}

function get_tfs_executable() {

Review comment: Should this function utilize the parsed args below for the TF version?

It also looks like the naming scheme isn't consistent for the zip file for 1.11 and 1.12 already, which might require updating this file often.

Contributor Author: It only utilizes $version here, because the naming for v1.11 and v1.12 is:

v1.11 -> Ubuntu -> Ubuntu.zip: Ubuntu/executable
v1.12 -> Ubuntu -> tfs_ei_v1_12_ubuntu.zip: v1_12_Ubuntu_2/executable

Even v1.12's zip name and unzipped directory name are different. So I believe we will need to update this file in the future.

@ChoiByungWook (Feb 28, 2019): It seems like in S3 it is always going to follow the pattern of s3://amazonei-tensorflow/Tensorflow\ Serving/{version}/Ubuntu.

We can do a discovery on the item, for example with:

aws s3 ls s3://amazonei-tensorflow/Tensorflow\ Serving/v1.11/Ubuntu/ | awk '{print $4}'

which should return Ubuntu.zip.

Afterwards, I think we can probably also discover the unzipped name as well. For example:

find . -name AmazonEI_TensorFlow_Serving_v${version}_v1* -exec mv {} container/ \;

There are probably better commands than the ones I used above in my examples.

-    # default to v1.12 in accordance with defaults below
-    s3_object='tfs_ei_v1_12_ubuntu'
-    unzipped='v1_12_Ubuntu'
+    zip_file=$(aws s3 ls 's3://amazonei-tensorflow/Tensorflow Serving/v'${version}'/Ubuntu/' | awk '{print $4}')
+    aws s3 cp 's3://amazonei-tensorflow/Tensorflow Serving/v'${version}'/Ubuntu/'${zip_file} .

-    if [ ${version} = '1.11' ]; then
-        s3_object='Ubuntu'
-        unzipped='Ubuntu'
-    fi
+    mkdir exec_dir
+    unzip ${zip_file} -d exec_dir

-    aws s3 cp 's3://amazonei-tensorflow/Tensorflow Serving/v'${version}'/Ubuntu/'${s3_object}'.zip' .
-    unzip ${s3_object} && mv ${unzipped}/AmazonEI_Tensorflow_Serving_v${version}_v1 container/
-    rm ${s3_object}.zip && rm -rf ${unzipped}
+    find . -name AmazonEI_TensorFlow_Serving_v${version}_v1* -exec mv {} container/ \;
+    rm ${zip_file} && rm -rf exec_dir
}
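
Read straight through, the function after this commit (assembled from the added lines above) becomes:

function get_tfs_executable() {
    zip_file=$(aws s3 ls 's3://amazonei-tensorflow/Tensorflow Serving/v'${version}'/Ubuntu/' | awk '{print $4}')
    aws s3 cp 's3://amazonei-tensorflow/Tensorflow Serving/v'${version}'/Ubuntu/'${zip_file} .

    mkdir exec_dir
    unzip ${zip_file} -d exec_dir

    find . -name AmazonEI_TensorFlow_Serving_v${version}_v1* -exec mv {} container/ \;
    rm ${zip_file} && rm -rf exec_dir
}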

function parse_std_args() {
68 changes: 0 additions & 68 deletions test/functional/test_elastic_inference.py

This file was deleted.

6 changes: 1 addition & 5 deletions test/functional/conftest.py → test/sagemaker/conftest.py
@@ -28,7 +28,7 @@

def pytest_addoption(parser):
    parser.addoption('--aws-id')
-    parser.addoption('--docker-base-name', default='sagemaker-tensorflow-serving')
+    parser.addoption('--docker-base-name', default='functional-tensorflow-serving')

Review comment: is 'functional' meant to correspond to test/functional?

Contributor Author: It's meant to be sagemaker-tensorflow-serving ... I just accidentally changed it to 'functional' for some reason I don't remember.

    parser.addoption('--instance-type')
    parser.addoption('--accelerator-type', default=None)
    parser.addoption('--region', default='us-west-2')
@@ -94,7 +94,3 @@ def docker_image_uri(docker_registry, docker_image):
    uri = '{}/{}'.format(docker_registry, docker_image)
    return uri

-
-@pytest.fixture(scope='session')
-def sagemaker_session(region):
-    return Session(boto_session=boto3.Session(region_name=region))
114 changes: 114 additions & 0 deletions test/sagemaker/test_elastic_inference.py
@@ -0,0 +1,114 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
import io
import json
import logging
import time

import boto3
import numpy as np

import pytest

EI_SUPPORTED_REGIONS = ['us-east-1', 'us-east-2', 'us-west-2', 'eu-west-1', 'ap-northeast-1', 'ap-northeast-2']

logger = logging.getLogger(__name__)
logging.getLogger('boto3').setLevel(logging.INFO)
logging.getLogger('botocore').setLevel(logging.INFO)
logging.getLogger('factory.py').setLevel(logging.INFO)
logging.getLogger('auth.py').setLevel(logging.INFO)
logging.getLogger('connectionpool.py').setLevel(logging.INFO)
logging.getLogger('session.py').setLevel(logging.DEBUG)
logging.getLogger('functional').setLevel(logging.DEBUG)


@pytest.fixture(autouse=True)
def skip_if_no_accelerator(accelerator_type):
    if accelerator_type is None:
        pytest.skip('Skipping because accelerator type was not provided')


@pytest.fixture(autouse=True)
def skip_if_non_supported_ei_region(region):
    if region not in EI_SUPPORTED_REGIONS:
        pytest.skip('EI is not supported in {}'.format(region))


@pytest.fixture
def pretrained_model_data(region):
    return 's3://sagemaker-sample-data-{}/tensorflow/model/resnet/resnet_50_v2_fp32_NCHW.tar.gz'.format(region)


def _timestamp():
    return time.strftime("%Y-%m-%d-%H-%M-%S")


def _execution_role(session):
    return session.resource('iam').Role('SageMakerRole').arn

Review comment: should we consider adding an argparser for the role?
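
A minimal sketch of what that could look like, extending the existing pytest_addoption in conftest.py (the option name, default, and fixture below are hypothetical, not part of this PR):

# Hypothetical lines in test/sagemaker/conftest.py:
def pytest_addoption(parser):
    parser.addoption('--execution-role', default='SageMakerRole')  # assumed default


@pytest.fixture(scope='session')
def execution_role(request):
    return request.config.getoption('--execution-role')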



def _production_variants(model_name, instance_type, accelerator_type):
    production_variants = [{
        'VariantName': 'AllTraffic',
        'ModelName': model_name,
        'InitialInstanceCount': 1,
        'InstanceType': instance_type,
        'AcceleratorType': accelerator_type
    }]
    return production_variants


@pytest.mark.skip_if_non_supported_ei_region
@pytest.mark.skip_if_no_accelerator
def test_deploy_elastic_inference_with_pretrained_model(pretrained_model_data,
                                                        docker_image_uri,
                                                        instance_type,
                                                        accelerator_type):
    endpoint_name = 'test-tfs-ei-deploy-model-{}'.format(_timestamp())
    endpoint_config_name = 'test-tfs-endpoint-config-{}'.format(_timestamp())
    model_name = 'test-tfs-ei-model-{}'.format(_timestamp())

    session = boto3.Session()
    client = session.client('sagemaker')
    runtime_client = session.client('runtime.sagemaker')
    client.create_model(ModelName=model_name,
                        ExecutionRoleArn=_execution_role(session),
                        PrimaryContainer={
                            'Image': docker_image_uri,
                            'ModelDataUrl': pretrained_model_data
                        })

    logger.info('deploying model to endpoint: {}'.format(endpoint_name))

    client.create_endpoint_config(EndpointConfigName=endpoint_config_name,
                                  ProductionVariants=_production_variants(model_name, instance_type, accelerator_type))

    client.create_endpoint(EndpointName=endpoint_name,
                           EndpointConfigName=endpoint_config_name)

    try:
        client.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)

Review comment: You don't have to do this, however I can foresee us reusing this logic; maybe it would be better to refactor it to another file?

    finally:
        status = client.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
        if status != 'InService':
            raise Exception('Failed to create endpoint.')

    input_data = {'instances': np.random.rand(1, 1, 3, 3).tolist()}

    response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                              ContentType='application/json',
                                              Body=json.dumps(input_data))
    result = json.loads(response['Body'].read().decode())
    assert result['predictions'] is not None

    client.delete_endpoint(EndpointName=endpoint_name)
Contributor: delete_endpoint should clear the endpoint config too. It also needs to be in a finally block so the delete happens even if the test fails.
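
A sketch of the suggested restructuring, reusing the client and names defined in the test above (illustrative, not part of this commit; delete_endpoint_config is the corresponding boto3 call):

    try:
        client.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)
        status = client.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
        if status != 'InService':
            raise Exception('Failed to create endpoint.')
        # ... invoke the endpoint and assert on the result ...
    finally:
        # Runs even when the assertions above fail, and removes the config as well.
        client.delete_endpoint(EndpointName=endpoint_name)
        client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)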

2 changes: 1 addition & 1 deletion tox.ini
@@ -43,7 +43,7 @@ require-code = True
# Can be used to specify which tests to run, e.g.: tox -- -s
basepython = python3
commands =
-    python -m pytest {posargs} --ignore=test/functional
+    python -m pytest {posargs} --ignore=test/sagemaker
deps =
pytest
requests