
Commit 4f08a3d

add tensorflow serving docs (#468)
* add tensorflow serving docs
* add content_type to tensorflow.serving.Predictor
* support CustomAttributes in local mode
1 parent 576af44 commit 4f08a3d

8 files changed (+730, -190 lines)


CHANGELOG.rst

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ CHANGELOG
 1.14.2-dev
 ==========

+* bug-fix: support ``CustomAttributes`` argument in local mode ``invoke_endpoint`` requests
+* enhancement: add ``content_type`` parameter to ``sagemaker.tensorflow.serving.Predictor``
+* doc-fix: add TensorFlow Serving Container docs
 * doc-fix: fix rendering error in README.rst
 * enhancement: Local Mode: support optional input channels
 * build: added pylint
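
For context, a minimal usage sketch of the new ``content_type`` parameter on
``sagemaker.tensorflow.serving.Predictor``, assuming an already-deployed TensorFlow Serving endpoint
(the endpoint name and CSV payload below are placeholders, not part of this commit):

.. code:: python

  from sagemaker.tensorflow.serving import Predictor

  # Override the default JSON request format by passing the new content_type argument.
  # 'my-tfs-endpoint' is a hypothetical name of an endpoint that already exists.
  predictor = Predictor('my-tfs-endpoint', content_type='text/csv')

  # The request is sent with Content-Type: text/csv instead of application/json.
  result = predictor.predict('1.0,2.0,5.0')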

src/sagemaker/local/local_session.py

Lines changed: 10 additions & 4 deletions
@@ -164,14 +164,20 @@ def __init__(self, config=None):
         self.config = config
         self.serving_port = get_config_value('local.serving_port', config) or 8080

-    def invoke_endpoint(self, Body, EndpointName, ContentType, Accept):  # pylint: disable=unused-argument
+    def invoke_endpoint(self, Body, EndpointName,  # pylint: disable=unused-argument
+                        ContentType=None, Accept=None, CustomAttributes=None):
         url = "http://localhost:%s/invocations" % self.serving_port
-        headers = {
-            'Content-type': ContentType
-        }
+        headers = {}
+
+        if ContentType is not None:
+            headers['Content-type'] = ContentType
+
         if Accept is not None:
             headers['Accept'] = Accept

+        if CustomAttributes is not None:
+            headers['X-Amzn-SageMaker-Custom-Attributes'] = CustomAttributes
+
         r = self.http.request('POST', url, body=Body, preload_content=False,
                               headers=headers)
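
A rough sketch of what the ``CustomAttributes`` change enables in local mode, assuming a local endpoint
is already running and that ``LocalSession`` exposes this client as ``sagemaker_runtime_client`` (the
endpoint name and attribute value below are placeholders):

.. code:: python

  from sagemaker.local.local_session import LocalSession

  runtime = LocalSession().sagemaker_runtime_client

  # CustomAttributes is now forwarded to the container as the
  # X-Amzn-SageMaker-Custom-Attributes header; ContentType and Accept are optional.
  response = runtime.invoke_endpoint(
      Body=b'{"instances": [1.0, 2.0, 5.0]}',
      EndpointName='local-endpoint',                       # placeholder name
      ContentType='application/json',
      CustomAttributes='tfs-model-name=half_plus_three')   # placeholder value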

src/sagemaker/tensorflow/README.rst

Lines changed: 23 additions & 183 deletions
@@ -1,4 +1,3 @@
-==========================================
 TensorFlow SageMaker Estimators and Models
 ==========================================

@@ -59,7 +58,7 @@ In addition, it may optionally contain:

 - ``serving_input_fn``: Defines the features to be passed to the model during prediction. **Important:**
   this function is used only during training, but is required to deploy the model resulting from training
-  in a SageMaker endpoint.
+  to a SageMaker endpoint.

 Creating a ``model_fn``
 ^^^^^^^^^^^^^^^^^^^^^^^
@@ -229,9 +228,14 @@ More details on how to create input functions can be find in `Building Input Fun
 Creating a ``serving_input_fn``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-``serving_input_fn`` is used to define the shapes and types of the inputs the model accepts when the model is exported for Tensorflow Serving. It is optional, but required for deploying the trained model to a SageMaker endpoint.
+``serving_input_fn`` is used to define the shapes and types of the inputs the model accepts when
+the model is exported for Tensorflow Serving. This function is optional if you only want to
+train a model, but it is required if you want to create a SavedModel bundle that can be
+deployed to a SageMaker endpoint.

-``serving_input_fn`` is called at the end of model training and is **not** called during inference. (If you'd like to preprocess inference data, please see **Overriding input preprocessing with an input_fn**).
+``serving_input_fn`` is called at the end of model training and is **not** called during
+inference. (If you'd like to preprocess inference data, please see
+**Overriding input preprocessing with an input_fn**).

 The basic skeleton for the ``serving_input_fn`` looks like this:

@@ -558,14 +562,13 @@ For more information on training and evaluation process, see `tf.estimator.train

 For more information on fit, see `SageMaker Python SDK Overview <#sagemaker-python-sdk-overview>`_.

-TensorFlow serving models
+TensorFlow Serving models
 ^^^^^^^^^^^^^^^^^^^^^^^^^

 After your training job is complete in SageMaker and the ``fit`` call ends, the training job
-will generate a `TensorFlow serving <https://www.tensorflow.org/serving/serving_basic>`_
-model ready for deployment. Your TensorFlow serving model will be available in the S3 location
-``output_path`` that you specified when you created your `sagemaker.tensorflow.TensorFlow`
-estimator.
+will generate a `TensorFlow SavedModel <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md>`_
+bundle ready for deployment. Your model will be available in S3 at the ``output_path`` location
+that you specified when you created your ``sagemaker.tensorflow.TensorFlow`` estimator.

 Restoring from checkpoints
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -614,188 +617,25 @@ Note that TensorBoard is not supported when passing wait=False to ``fit``.
 Deploying TensorFlow Serving models
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-After a ``TensorFlow`` Estimator has been fit, it saves a ``TensorFlow Serving`` model in
-the S3 location defined by ``output_path``. You can call ``deploy`` on a ``TensorFlow``
+After a TensorFlow estimator has been fit, it saves a TensorFlow SavedModel in
+the S3 location defined by ``output_path``. You can call ``deploy`` on a TensorFlow
 estimator to create a SageMaker Endpoint.

-A common usage of the ``deploy`` method, after the ``TensorFlow`` estimator has been fit look
-like this:
-
-.. code:: python
-
-  from sagemaker.tensorflow import TensorFlow
-
-  estimator = TensorFlow(entry_point='tf-train.py', ..., train_instance_count=1,
-                         train_instance_type='ml.c4.xlarge', framework_version='1.10.0')
-
-  estimator.fit(inputs)
-
-  predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge')
-
-
-The code block above deploys a SageMaker Endpoint with one instance of the type 'ml.c4.xlarge'.
-
-What happens when deploy is called
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Calling ``deploy`` starts the process of creating a SageMaker Endpoint. This process includes the following steps.
-
-- Starts ``initial_instance_count`` EC2 instances of the type ``instance_type``.
-- On each instance, it will do the following steps:
-
-  - start a Docker container optimized for TensorFlow Serving, see `SageMaker TensorFlow Docker containers`_.
-  - start a production ready HTTP Server which supports protobuf, JSON and CSV content types, see `Making predictions against a SageMaker Endpoint`_.
-  - start a `TensorFlow Serving` process
-
-When the ``deploy`` call finishes, the created SageMaker Endpoint is ready for prediction requests. The next chapter will explain
-how to make predictions against the Endpoint, how to use different content-types in your requests, and how to extend the Web server
-functionality.
+SageMaker provides two different options for deploying TensorFlow models to a SageMaker
+Endpoint:

-Deploying directly from model artifacts
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+- The first option uses a Python-based server that allows you to specify your own custom
+  input and output handling functions in a Python script. This is the default option.

-If you already have existing model artifacts, you can skip training and deploy them directly to an endpoint:
-
-.. code:: python
-
-  from sagemaker.tensorflow import TensorFlowModel
-
-  tf_model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz',
-                             role='MySageMakerRole',
-                             entry_point='entry.py',
-                             name='model_name')
-
-  predictor = tf_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge')
-
-You can also optionally specify a pip `requirements file <https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format>`_ if you need to install additional packages into the deployed
-runtime environment by including it in your source_dir and specifying it in the ``'SAGEMAKER_REQUIREMENTS'`` env variable:
-
-.. code:: python
-
-  from sagemaker.tensorflow import TensorFlowModel
-
-  tf_model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz',
-                             role='MySageMakerRole',
-                             entry_point='entry.py',
-                             source_dir='my_src', # directory which contains entry_point script and requirements file
-                             name='model_name',
-                             env={'SAGEMAKER_REQUIREMENTS': 'requirements.txt'}) # path relative to source_dir
-
-  predictor = tf_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge')
-
-
-Making predictions against a SageMaker Endpoint
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The following code adds a prediction request to the previous code example:
-
-.. code:: python
+See `Deploying to Python-based Endpoints <deploying_python.rst>`_ to learn how to use this option.

-  estimator = TensorFlow(entry_point='tf-train.py', ..., train_instance_count=1,
-                         train_instance_type='ml.c4.xlarge', framework_version='1.10.0')
-
-  estimator.fit(inputs)
-
-  predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge')
-
-  result = predictor.predict([6.4, 3.2, 4.5, 1.5])
-
-The ``predictor.predict`` method call takes one parameter, the input ``data`` for which you want the ``SageMaker Endpoint``
-to provide inference. ``predict`` will serialize the input data, and send it in as request to the ``SageMaker Endpoint`` by
-an ``InvokeEndpoint`` SageMaker operation. ``InvokeEndpoint`` operation requests can be made by ``predictor.predict``, by
-boto3 ``SageMaker.runtime`` client or by AWS CLI.
-
-The ``SageMaker Endpoint`` web server will process the request, make an inference using the deployed model, and return a response.
-The ``result`` returned by ``predict`` is
-a Python dictionary with the model prediction. In the code example above, the prediction ``result`` looks like this:
-
-.. code:: python
-
-  {'result':
-    {'classifications': [
-      {'classes': [
-        {'label': '0', 'score': 0.0012890376383438706},
-        {'label': '1', 'score': 0.9814321994781494},
-        {'label': '2', 'score': 0.017278732731938362}
-      ]}
-    ]}
-  }
-
-Specifying the output of a prediction request
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The format of the prediction ``result`` is determined by the parameter ``export_outputs`` of the `tf.estimator.EstimatorSpec <https://www.tensorflow.org/api_docs/python/tf/estimator/EstimatorSpec>`_ that you returned when you created your ``model_fn``, see
-`Example of a complete model_fn`_ for an example of ``export_outputs``.
-
-More information on how to create ``export_outputs`` can find in `specifying the outputs of a custom model <https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/docs_src/programmers_guide/saved_model.md#specifying-the-outputs-of-a-custom-model>`_.
-
-Endpoint prediction request handling
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Whenever a prediction request is made to a SageMaker Endpoint via a ``InvokeEndpoint`` SageMaker operation, the request will
-be deserialized by the web server, sent to TensorFlow Serving, and serialized back to the client as response.
-
-The TensorFlow Web server breaks request handling into three steps:
-
-- input processing,
-- TensorFlow Serving prediction, and
-- output processing.
-
-The SageMaker Endpoint provides default input and output processing, which support by default JSON, CSV, and protobuf requests.
-This process looks like this:
-
-.. code:: python
-
-  # Deserialize the Invoke request body into an object we can perform prediction on
-  deserialized_input = input_fn(serialized_input, request_content_type)
-
-  # Perform prediction on the deserialized object, with the loaded model
-  prediction_result = make_tensorflow_serving_prediction(deserialized_input)
-
-  # Serialize the prediction result into the desired response content type
-  serialized_output = output_fn(prediction_result, accepts)
-
-The common functionality can be extended by the addiction of the following two functions to your training script:
-
-Overriding input preprocessing with an ``input_fn``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-An example of ``input_fn`` for the content-type "application/python-pickle" can be seen below:
-
-.. code:: python
-
-  import numpy as np
-
-  def input_fn(serialized_input, content_type):
-      """An input_fn that loads a pickled object"""
-      if request_content_type == "application/python-pickle":
-          deserialized_input = pickle.loads(serialized_input)
-          return deserialized_input
-      else:
-          # Handle other content-types here or raise an Exception
-          # if the content type is not supported.
-          pass
-
-Overriding output postprocessing with an ``output_fn``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-An example of ``output_fn`` for the accept type "application/python-pickle" can be seen below:
-
-.. code:: python

-  import numpy as np
+- The second option uses a TensorFlow Serving-based server to provide a super-set of the
+  `TensorFlow Serving REST API <https://www.tensorflow.org/serving/api_rest>`_. This option
+  does not require (or allow) a custom python script.

-  def output_fn(prediction_result, accepts):
-      """An output_fn that dumps a pickled object as response"""
-      if request_content_type == "application/python-pickle":
-          return np.dumps(prediction_result)
-      else:
-          # Handle other content-types here or raise an Exception
-          # if the content type is not supported.
-          pass
+See `Deploying to TensorFlow Serving Endpoints <deploying_tensorflow_serving.rst>`_ to learn how to use this option.

-A example with ``input_fn`` and ``output_fn`` above can be found in
-`here <https://github.com/aws/sagemaker-python-sdk/blob/master/tests/data/cifar_10/source/resnet_cifar_10.py#L143>`_.

 Training with Pipe Mode using PipeModeDataset
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
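
As a companion to the new ``deploying_tensorflow_serving.rst`` page referenced in the diff above, here is
a minimal sketch of the second (TensorFlow Serving-based) deployment option. It assumes
``sagemaker.tensorflow.serving.Model`` is the entry point described in those docs; the S3 path, IAM role,
and instance type are placeholders:

.. code:: python

  from sagemaker.tensorflow.serving import Model

  # Deploy an existing SavedModel bundle without a custom Python inference script.
  model = Model(model_data='s3://mybucket/model.tar.gz',   # placeholder S3 path
                role='MySageMakerRole')                    # placeholder IAM role
  predictor = model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge')

  # Requests follow the TensorFlow Serving REST API 'instances' format.
  result = predictor.predict({'instances': [1.0, 2.0, 5.0]})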
