
Commit 70308b1

bhaoz authored and Basil Beirouti committed
doc: more documentation for serverless inference (#2859)
1 parent 1b52edc commit 70308b1

2 files changed: 66 additions, 0 deletions

doc/api/inference/serverless.rst

Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
Serverless Inference
--------------------

This module contains classes related to Amazon SageMaker Serverless Inference.

.. automodule:: sagemaker.serverless.serverless_inference_config
    :members:
    :undoc-members:
    :show-inheritance:

doc/overview.rst

Lines changed: 57 additions & 0 deletions

@@ -684,6 +684,63 @@ For more detailed explanations of the classes that this library provides for aut

- `API docs for HyperparameterTuner and parameter range classes <https://sagemaker.readthedocs.io/en/stable/tuner.html>`__
- `API docs for analytics classes <https://sagemaker.readthedocs.io/en/stable/analytics.html>`__
*******************************
SageMaker Serverless Inference
*******************************
Amazon SageMaker Serverless Inference enables you to easily deploy machine learning models for inference without having
to configure or manage the underlying infrastructure. After you train a model, you can deploy it to an Amazon SageMaker
serverless endpoint and then invoke the endpoint to get inference results back. More information about
SageMaker Serverless Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html>`__.

To deploy a serverless endpoint, you will need to create a ``ServerlessInferenceConfig``.
If you create a ``ServerlessInferenceConfig`` without specifying its arguments, the default ``MemorySizeInMB`` will be **2048** and
the default ``MaxConcurrency`` will be **5**:

.. code:: python

    from sagemaker.serverless import ServerlessInferenceConfig

    # Create an empty ServerlessInferenceConfig object to use default values
    serverless_config = ServerlessInferenceConfig()

Or you can specify ``MemorySizeInMB`` and ``MaxConcurrency`` in ``ServerlessInferenceConfig`` (example shown below):

.. code:: python

    # Specify MemorySizeInMB and MaxConcurrency in the serverless config object
    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=10,
    )
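Serverless endpoints accept only specific values for these settings: per the AWS documentation, memory must be one of 1024, 2048, 3072, 4096, 5120, or 6144 MB (1 GB increments), and concurrency must be at least 1. As an illustrative sketch (the ``validate_serverless_settings`` helper below is hypothetical, not part of the SDK), a client-side check of a proposed configuration could look like:

```python
# Hypothetical helper (not part of the SageMaker SDK): checks a proposed
# serverless configuration against the limits documented by AWS.
VALID_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)  # 1 GB increments

def validate_serverless_settings(memory_size_in_mb, max_concurrency):
    """Raise ValueError if the settings fall outside the documented ranges."""
    if memory_size_in_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(
            "memory_size_in_mb must be one of %s, got %s"
            % (VALID_MEMORY_SIZES_MB, memory_size_in_mb)
        )
    if max_concurrency < 1:
        raise ValueError(
            "max_concurrency must be at least 1, got %s" % max_concurrency
        )

# The example configuration above (4096 MB, concurrency 10) passes the check
validate_serverless_settings(4096, 10)
```

Validating locally like this surfaces configuration mistakes before the deployment call reaches the service.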
Then use the ``ServerlessInferenceConfig`` in the estimator's ``deploy()`` method to deploy a serverless endpoint:

.. code:: python

    # Deploys the model that was generated by fit() to a SageMaker serverless endpoint
    serverless_predictor = estimator.deploy(serverless_inference_config=serverless_config)

After deployment is complete, you can use the predictor's ``predict()`` method to invoke the serverless endpoint just like
you would a real-time endpoint:

.. code:: python

    # Serializes data and makes a prediction request to the SageMaker serverless endpoint
    response = serverless_predictor.predict(data)

Clean up the endpoint and model if needed after inference:

.. code:: python

    # Tears down the SageMaker endpoint and endpoint configuration
    serverless_predictor.delete_endpoint()

    # Deletes the SageMaker model
    serverless_predictor.delete_model()

For more details about ``ServerlessInferenceConfig``,
see the API docs for `Serverless Inference <https://sagemaker.readthedocs.io/en/stable/api/inference/serverless.html>`__.
*************************
SageMaker Batch Transform
*************************
