@@ -684,6 +684,63 @@ For more detailed explanations of the classes that this library provides for aut
- `API docs for HyperparameterTuner and parameter range classes <https://sagemaker.readthedocs.io/en/stable/tuner.html>`__
- `API docs for analytics classes <https://sagemaker.readthedocs.io/en/stable/analytics.html>`__

+ *******************************
+ SageMaker Serverless Inference
+ *******************************
+ Amazon SageMaker Serverless Inference enables you to deploy machine learning models for inference without having
+ to configure or manage the underlying infrastructure. After you have trained a model, you can deploy it to an Amazon SageMaker
+ serverless endpoint and then invoke the endpoint to get inference results back. More information about
+ SageMaker Serverless Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html>`__.
+
+ To deploy a serverless endpoint, you will need to create a ``ServerlessInferenceConfig``.
+ If you create a ``ServerlessInferenceConfig`` without specifying its arguments, the default ``MemorySizeInMB`` will be **2048** and
+ the default ``MaxConcurrency`` will be **5**:
+
+ .. code:: python
+
+     from sagemaker.serverless import ServerlessInferenceConfig
+
+     # Create an empty ServerlessInferenceConfig object to use default values
+     serverless_config = ServerlessInferenceConfig()
+
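+ Serverless endpoints accept only certain configuration values. As a hedged illustration (the allowed memory sizes and the
+ concurrency ceiling below are assumptions based on the SageMaker service limits at the time of writing and may change;
+ ``validate_serverless_config`` is a hypothetical helper, not part of the SDK), a client-side sanity check might look like:

```python
# Hypothetical client-side sanity check for serverless config values.
# The allowed memory sizes and concurrency ceiling are assumptions based
# on the SageMaker service limits at the time of writing.
ALLOWED_MEMORY_SIZES_MB = {1024, 2048, 3072, 4096, 5120, 6144}
MAX_ALLOWED_CONCURRENCY = 200

def validate_serverless_config(memory_size_in_mb: int, max_concurrency: int) -> None:
    """Raise ValueError if the values would be rejected by the service."""
    if memory_size_in_mb not in ALLOWED_MEMORY_SIZES_MB:
        raise ValueError(
            f"memory_size_in_mb must be one of {sorted(ALLOWED_MEMORY_SIZES_MB)}, "
            f"got {memory_size_in_mb}"
        )
    if not 1 <= max_concurrency <= MAX_ALLOWED_CONCURRENCY:
        raise ValueError(
            f"max_concurrency must be between 1 and {MAX_ALLOWED_CONCURRENCY}, "
            f"got {max_concurrency}"
        )

# Checks pass silently for valid values; invalid ones fail fast before deploy()
validate_serverless_config(memory_size_in_mb=4096, max_concurrency=10)
```

+ Failing fast on the client avoids a round trip to the service for configurations that would be rejected anyway.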
706
+ Or you can specify ``MemorySizeInMB `` and ``MaxConcurrency `` in ``ServerlessInferenceConfig `` (example shown below):
707
+
708
+ .. code :: python
709
+
710
+ # Specify MemorySizeInMB and MaxConcurrency in the serverless config object
711
+ serverless_config = new ServerlessInferenceConfig(
712
+ memory_size_in_mb = 4096 ,
713
+ max_concurrency = 10 ,
714
+ )
715
+
716
+ Then use the ``ServerlessInferenceConfig `` in the estimator's ``deploy() `` method to deploy a serverless endpoint:
717
+
718
+ .. code :: python
719
+
720
+ # Deploys the model that was generated by fit() to a SageMaker serverless endpoint
721
+ serverless_predictor = estimator.deploy(serverless_inference_config = serverless_config)
722
+
723
+ After deployment is complete, you can use predictor's ``predict() `` method to invoke the serverless endpoint just like
724
+ real-time endpoints:
725
+
726
+ .. code :: python
727
+
728
+ # Serializes data and makes a prediction request to the SageMaker serverless endpoint
729
+ response = serverless_predictor.predict(data)
730
+
731
+ Clean up the endpoint and model if needed after inference:
732
+
733
+ .. code :: python
734
+
735
+ # Tears down the SageMaker endpoint and endpoint configuration
736
+ serverless_predictor.delete_endpoint()
737
+
738
+ # Deletes the SageMaker model
739
+ serverless_predictor.delete_model()
740
+
741
+ For more details about ``ServerlessInferenceConfig ``,
742
+ see the API docs for `Serverless Inference <https://sagemaker.readthedocs.io/en/stable/api/inference/serverless.html >`__
743
+
*************************
SageMaker Batch Transform
*************************