feature: upgrade Neo MxNet to 1.7 #1934
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -58,9 +58,6 @@ def mxnet_training_job( | |
|
||
|
||
@pytest.mark.canary_quick | ||
- @pytest.mark.skip( | ||
- reason="This test is failing because the image uri and the training script format has changed." | ||
- ) | ||
def test_attach_deploy( | ||
mxnet_training_job, sagemaker_session, cpu_instance_type, cpu_instance_family | ||
): | ||
|
@@ -71,7 +68,7 @@ def test_attach_deploy( | |
|
||
estimator.compile_model( | ||
target_instance_family=cpu_instance_family, | ||
- input_shape={"data": [1, 1, 28, 28]}, | ||
+ input_shape={"data": [1, 1, 28, 28], "softmax_label": [1]}, | ||
output_path=estimator.output_path, | ||
) | ||
|
||
|
@@ -89,9 +86,6 @@ def test_attach_deploy( | |
predictor.predict(data) | ||
|
||
|
||
- @pytest.mark.skip( | ||
- reason="This test is failing because the image uri and the training script format has changed." | ||
- ) | ||
def test_deploy_model( | ||
mxnet_training_job, | ||
sagemaker_session, | ||
|
@@ -123,7 +117,7 @@ def test_deploy_model( | |
|
||
model.compile( | ||
target_instance_family=cpu_instance_family, | ||
- input_shape={"data": [1, 1, 28, 28]}, | ||
+ input_shape={"data": [1, 1, 28, 28], "softmax_label": [1]}, | ||
role=role, | ||
job_name=unique_name_from_base("test-deploy-model-compilation-job"), | ||
output_path="/".join(model_data.split("/")[:-1]), | ||
|
@@ -165,7 +159,7 @@ def test_inferentia_deploy_model( | |
|
||
model.compile( | ||
target_instance_family=inf_instance_family, | ||
- input_shape={"data": [1, 1, 28, 28]}, | ||
+ input_shape={"data": [1, 1, 28, 28], "softmax_label": [1]}, | ||
Review comment: This isn't needed for the Inferentia compilation flow, but if it works that's OK; it doesn't hurt anything. |
||
role=role, | ||
job_name=unique_name_from_base("test-deploy-model-compilation-job"), | ||
output_path="/".join(model_data.split("/")[:-1]), | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -175,7 +175,7 @@ def _create_compilation_job(input_shape, output_location): | |
|
||
Review comment: Are there similar tests for PyTorch?
Reply: No, we only have unit tests for PyTorch. It's debatable whether that is a gap, since the MXNet test already exercises the part of the logic that calls the Neo service to compile a model. We are planning to revamp our build system to handle more parallel integration testing. Once the system is more scalable, we will be in a better position to add integration tests. |
||
|
||
def _neo_inference_image(mxnet_version): | ||
- return "301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-neo-{}:{}-cpu-py3".format( | ||
+ return "301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-inference-{}:{}-cpu-py3".format( | ||
FRAMEWORK.lower(), mxnet_version | ||
) | ||
|
||
|
Now all versions will map to 1.7?
Yeah. The older version has a different repo now. If you compile a model with the Neo service now, the only container you can use to serve the model is the 1.7 one.
🤕
That's right, Neo doesn't support compiling with different versions of any framework (MX, TF, or PT) yet. Whatever version of the framework was used to train the model, only the one version in Neo (i.e., MX-1.7, PT-1.4, TF-1.15) will be used to load that model. Backward compatibility is assumed.
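
The version pinning described in this thread can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `NEO_FRAMEWORK_VERSIONS`, `resolve_neo_version`, and `neo_mxnet_inference_image` are hypothetical names, the version numbers are the ones quoted in the comment above, and the URI template is the second of the two `_neo_inference_image` return lines in the diff, assumed here to be the updated one.

```python
# Versions quoted in the review thread (MX-1.7, PT-1.4, TF-1.15).
# Treat this as a snapshot of the discussion, not a live list.
NEO_FRAMEWORK_VERSIONS = {"mxnet": "1.7", "pytorch": "1.4", "tensorflow": "1.15"}

# Template copied from the diff; assumed to be the updated repository name.
NEO_MXNET_IMAGE_TEMPLATE = (
    "301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-inference-{}:{}-cpu-py3"
)


def resolve_neo_version(framework, trained_version):
    """Return the single framework version Neo loads the model with.

    trained_version is accepted but deliberately ignored: per the thread,
    every training version maps to the one version Neo supports.
    """
    return NEO_FRAMEWORK_VERSIONS[framework.lower()]


def neo_mxnet_inference_image(trained_version):
    """Build the MXNet serving image URI for a Neo-compiled model (hypothetical helper)."""
    framework = "mxnet"
    return NEO_MXNET_IMAGE_TEMPLATE.format(
        framework, resolve_neo_version(framework, trained_version)
    )


if __name__ == "__main__":
    # Two different training versions resolve to the same serving image.
    print(neo_mxnet_inference_image("1.4.1"))
    print(neo_mxnet_inference_image("1.6.0"))
```

Both calls produce the same URI, which is the behavior the reviewer asked about ("Now all versions will map to 1.7?"). Only the MXNet image is built here because the diff shows no URI templates for PyTorch or TensorFlow.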