I resolved the problem. The issue was in sm_client.create_model():
I used the AWS Deep Learning Container images to train the model, so for PrimaryContainer['Image'] (in the question's code) I have to pass an inference image, not a training image. As long as my training image does not change, my inference image should not change either. I chose the matching inference image from the AWS SageMaker Deep Learning Container images and passed it to PrimaryContainer['Image'].
Problem resolved :)
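A minimal sketch of the fix, assuming the boto3 SageMaker client (boto3.client("sagemaker")): build the CreateModel request so that PrimaryContainer['Image'] holds an inference image. The helper names and the "-training" suffix check are illustrative assumptions, not the poster's code. The check relies on the Deep Learning Container repositories being named in pairs such as huggingface-pytorch-training / huggingface-pytorch-inference, so it is only a heuristic guard.

```python
def build_create_model_request(model_name, image_uri, model_data_url, role_arn,
                               environment=None):
    """Build keyword arguments for the SageMaker CreateModel call (hypothetical helper)."""
    # Heuristic guard (an assumption, not an AWS rule): DLC repositories are named
    # "...-training" or "...-inference", so a repository ending in "-training"
    # most likely means the training image was passed by mistake.
    repository = image_uri.split("/")[-1].split(":")[0]
    if repository.endswith("-training"):
        raise ValueError("PrimaryContainer['Image'] looks like a training image: " + image_uri)

    container = {"Image": image_uri, "ModelDataUrl": model_data_url}
    if environment:
        container["Environment"] = dict(environment)
    return {
        "ModelName": model_name,
        "PrimaryContainer": container,
        "ExecutionRoleArn": role_arn,
    }


def create_model(sm_client, **kwargs):
    """Call create_model on a boto3 SageMaker client (hypothetical wrapper)."""
    return sm_client.create_model(**build_create_model_request(**kwargs))
```

In practice you would call create_model(boto3.client("sagemaker"), model_name=..., image_uri=..., model_data_url=..., role_arn=...), taking image_uri from the inference image listed for your region and framework/transformers versions, i.e. the same versions as the training image.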
I trained a model and deployed it successfully just by running the notebook (https://github.com/huggingface/notebooks/blob/master/sagemaker/01_getting_started_pytorch/sagemaker-notebook.ipynb).
Now I am trying to rerun the training job and deploy the new model to the same endpoint. I can create a new training job whose specs are identical to those of the training job from the example notebook, and I can create a new model and a new endpoint configuration. However, I get the following error when I update the endpoint with the new endpoint configuration:
The primary container for production variant did not pass the ping health check
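For context, this error means SageMaker hosting sent GET /ping requests to the container on port 8080 and did not get a healthy (HTTP 200) response; a container that does not start a model server, such as a training image, would fail this check. A minimal sketch for pulling the failure details with boto3, assuming sm_client is boto3.client("sagemaker"). The helper name is hypothetical; EndpointStatus and FailureReason are fields of the DescribeEndpoint response, and container logs typically land in the CloudWatch log group /aws/sagemaker/Endpoints/<endpoint name>.

```python
def endpoint_diagnostics(sm_client, endpoint_name):
    """Summarize an endpoint's status for debugging (hypothetical helper)."""
    response = sm_client.describe_endpoint(EndpointName=endpoint_name)
    return {
        "status": response["EndpointStatus"],
        # FailureReason is only present when SageMaker reports a failure.
        "failure_reason": response.get("FailureReason"),
        # CloudWatch log group where the container's stdout/stderr usually appear.
        "log_group": "/aws/sagemaker/Endpoints/" + endpoint_name,
    }
```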

How I create the model and endpoint configuration and update the endpoint:
I have the following code in my AWS Lambda to trigger the creation process:
Using Python 3.6
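For reference, the create model / create endpoint configuration / update endpoint sequence described above can be sketched with boto3 as follows. This is not the poster's code (which is not shown here); the function name, timestamp naming scheme, "AllTraffic" variant name and ml.m5.xlarge default are assumptions. It sticks to Python 3.6-compatible syntax and uses only create_model, create_endpoint_config, update_endpoint, describe_endpoint and the endpoint_in_service waiter.

```python
import time


def redeploy(sm_client, endpoint_name, image_uri, model_data_url, role_arn,
             instance_type="ml.m5.xlarge", wait=True):
    """Create a new model and endpoint config, then switch an existing endpoint to them."""
    # Model and endpoint-config names must be unique; a UTC timestamp suffix keeps them
    # distinct. Assumes endpoint_name is short enough to stay within the 63-char limit.
    suffix = time.strftime("%Y%m%d-%H%M%S", time.gmtime())
    model_name = "{}-model-{}".format(endpoint_name, suffix)
    config_name = "{}-config-{}".format(endpoint_name, suffix)

    # image_uri must be an *inference* image, not the image used for training.
    sm_client.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    )
    sm_client.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)

    if wait:
        # Block until the endpoint is InService again.
        sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
        # A failed update may roll back to the previous config and still end InService,
        # so confirm that the new config is the one actually live.
        live = sm_client.describe_endpoint(EndpointName=endpoint_name)
        if live["EndpointConfigName"] != config_name:
            raise RuntimeError("Endpoint update did not take effect: {}".format(
                live.get("FailureReason")))
    return model_name, config_name
```

Inside AWS Lambda, whose maximum timeout is 15 minutes, wait=False may be the safer choice, with the endpoint status checked separately, since an endpoint update can take several minutes.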
Other approaches:
I received the same error on the endpoint when I created the model and endpoint configuration through the AWS SageMaker console (GUI) instead.
Kindly assist.