-
Notifications
You must be signed in to change notification settings - Fork 101
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a EI specific integ test for this?
In addition, we should update the docs: https://github.com/aws/sagemaker-tensorflow-serving-container#building-your-image.
docker/Dockerfile.ei
Outdated
# Validate the argument is specified | ||
RUN test $TENSORFLOW_MODEL_SERVER || exit 1 | ||
|
||
COPY ./$TENSORFLOW_MODEL_SERVER /usr/bin/tensorflow_model_server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the ./ needed?
scripts/shared.sh
Outdated
@@ -32,6 +32,7 @@ function parse_std_args() { | |||
# defaults | |||
arch='cpu' | |||
version='1.12.0' | |||
model='AmazonEI_TensorFlow_Serving_v1.12_v1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file hosted in S3 is a zip, where does that get handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also 'model' isn't a good name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 the s3 fetch is missing, and there should be updates to the build.sh as well (or nothing is getting built).
also, if the S3 fetch normalized the file name when it downloaded, then there would likely be no need to pass the filename into the dockerfile as an ARG.
scripts/build.sh
Outdated
# logout 520713654638 account after building | ||
if [ "$arch" = "ei" ]; then | ||
docker logout https://520713654638.dkr.ecr.$aws_region.amazonaws.com &>/dev/null | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add nl @ eof
scripts/build.sh
Outdated
# Dockerfile.ei needs to login 520713654638 to pull the cpu image | ||
if [ "$arch" = "ei" ]; then | ||
echo "pulling cpu image..." | ||
$(aws ecr get-login --no-include-email --registry-id 520713654638) &>/dev/null || echo 'warning: ecr login failed' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no do not log in to prod repo. this breaks pr and release build flow.
docker/Dockerfile.ei
Outdated
@@ -0,0 +1,10 @@ | |||
ARG TFS_VERSION | |||
|
|||
FROM 520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-serving:${TFS_VERSION}-cpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no do not use the prod repo! this breaks pr and release build flow. the image will be built from the previously released production image instead of the current image and we will ship a stale container!
scripts/shared.sh
Outdated
@@ -55,6 +56,11 @@ function parse_std_args() { | |||
shift | |||
shift | |||
;; | |||
-m|--model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model
isn't a good name for something that points at a tfs executable
docker/Dockerfile.ei
Outdated
ARG TENSORFLOW_MODEL_SERVER | ||
|
||
# Validate the argument is specified | ||
RUN test $TENSORFLOW_MODEL_SERVER || exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think this check is needed. docker build should fail by itself it you don't specify, because the arg declaration has no default
scripts/build.sh
Outdated
|
||
echo "building image... " | ||
docker build \ | ||
--cache-from $aws_account.dkr.ecr.$aws_region.amazonaws.com/sagemaker-tensorflow-serving:$full_version-$arch \ | ||
--build-arg TFS_VERSION=$full_version \ | ||
--build-arg TFS_SHORT_VERSION=$short_version \ | ||
--build-arg TENSORFLOW_MODEL_SERVER=$model \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider EI_EXECUTABLE
or TFS_EI_EXECUTABLE
instead
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
I've updated the scripts to handle zip files from s3 bucket, changed the EI dockerfile to avoid using prod account, dropped unnecessary arguments. I also updated README and add a functional test. |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
|
||
@pytest.mark.skip_if_non_supported_ei_region | ||
@pytest.mark.skip_if_no_accelerator | ||
def test_deploy_elastic_inference_with_pretrained_model(pretrained_model_data, docker_image_uri, sagemaker_session, instance_type, accelerator_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- endpoint needs to be removed at end of test
- not excited about taking a sagemaker sdk dependency for this. can we change this to use boto3 only? it shouldn't be too complicated since the model is already in s3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohhh, gotcha!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job!
@@ -28,6 +28,21 @@ function get_aws_account() { | |||
aws sts get-caller-identity --query 'Account' --output text | |||
} | |||
|
|||
function get_tfs_executable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function utilize the parsed args below for the TF version?
It also looks like the naming scheme isn't consistent for the zip file for 1.11 and 1.12 already, which might require updating this file often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only utilizes $version here, coz the naming for v1.11 and v1.12 are:
v1.11 -> Ubuntu -> Ubuntu.zip: Unbuntu/executable
v1.12 -> Ubuntu -> tfs_ei_v1_12_ubuntu.zip: v1_12_Ubuntu_2/executable
Even the v1.12's zip name and unzipped directory name are different. So I believe we will need to up date this file in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like in S3 it is always going to follow the pattern of: s3://amazonei-tensorflow/Tensorflow\ Serving/{version}/Ubuntu.
We can do a discovery on the item for example with:
aws s3 ls s3://amazonei-tensorflow/Tensorflow\ Serving/v1.11/Ubuntu/ | awk '{print $4}'
which should return Ubuntu.zip
Afterwards, I think we can probably also discover the unzipped name as well. For example:
find . -name AmazonEI_TensorFlow_Serving_v${version}_v1* -exec mv {} container/ \;
There are probably better commands than the ones I used above in my examples.
test/functional/conftest.py
Outdated
@@ -0,0 +1,100 @@ | |||
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jesterhazy, do you care about this being under functional or integration/sagemaker?
I think most of our framework containers are under integration/sagemaker, except for the sagemaker-tensorflow-container repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it seems a bit funky to have both. the stuff in test/integration are local docker container tests, so renaming test/functional to test/sagemaker or renaming everything to test/integration/{local, sagemaker} would fit the convention better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rename it to test/sagemaker
|
||
@pytest.mark.skip_if_non_supported_ei_region | ||
@pytest.mark.skip_if_no_accelerator | ||
def test_deploy_elastic_inference_with_pretrained_model(pretrained_model_data, docker_image_uri, sagemaker_session, instance_type, accelerator_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/sagemaker/conftest.py
Outdated
|
||
def pytest_addoption(parser): | ||
parser.addoption('--aws-id') | ||
parser.addoption('--docker-base-name', default='functional-tensorflow-serving') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is functional meant to correspond to test/functional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's meant to be sagemaker-tensorflow-serving ... I just accidentally changed it to 'functional' for some reason I dont remember.
test/sagemaker/conftest.py
Outdated
@pytest.fixture(scope='session') | ||
def tag(request, framework_version, processor): | ||
provided_tag = request.config.getoption('--tag') | ||
default_tag = '{}-{}-py2'.format(framework_version, processor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we ever specify python within the tag for this specific container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope
EndpointConfigName=endpoint_config_name) | ||
|
||
try: | ||
client.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to do this, however I can foresee us reusing this logic, maybe it would be better to refactor to another file?
README.md
Outdated
@@ -106,6 +108,15 @@ checkers using `tox`: | |||
tox | |||
``` | |||
|
|||
To test Elastic Inference with Accelerator, you will need an AWS account, publish your built image to ECR repository and run the following command: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you provide an example of the command needed to run the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"To test against Elastic Inference, you will..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update these as well as the above comments.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
result = json.loads(response['Body'].read().decode()) | ||
assert result['predictions'] is not None | ||
|
||
client.delete_endpoint(EndpointName=endpoint_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete_endpoint should clear endpoint_config too
also needs to be in a finally
block so delete happens even if test fails
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Let's wait on approval from @jesterhazy
|
||
|
||
def _execution_role(session): | ||
return session.resource('iam').Role('SageMakerRole').arn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we consider adding an argparser for the role?
@@ -0,0 +1,96 @@ | |||
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be under test/integration/sagemaker
to match the other repos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renaming test/functional to test/sagemaker or renaming everything to test/integration/{local, sagemaker} would fit the convention better
I went with the option that involves fewer changes. but I'm also okay to put everything under test/integration/... 😺
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you do move it, do it in a different PR haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks great!
Issue #, if available:
aws/amazon-sagemaker-examples#613
aws/sagemaker-python-sdk@4732545#r32336726
Description of changes:
Introduce a Dockerfile.ei to build TFS container with EI support.
Modify scripts to build and run Dockerfile.ei:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.