documentation: TFS support for pre/processing functions #807

Merged (4 commits) on May 31, 2019
186 additions and 0 deletions in src/sagemaker/tensorflow/deploying_tensorflow_serving.rst
More information on how to create ``export_outputs`` can be found in `specifying
refer to TensorFlow's `Save and Restore <https://www.tensorflow.org/guide/saved_model>`_ documentation for other ways to control the
inference-time behavior of your SavedModels.

Providing Python scripts for pre/post-processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can provide customized Python code to process your input and output data by passing an inference script as the ``entry_point``:

.. code::

from sagemaker.tensorflow.serving import Model

model = Model(entry_point='inference.py',
model_data='s3://mybucket/model.tar.gz',
role='MySageMakerRole')
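
After the model object is created, you can deploy it in the usual way. A brief
example (the instance type and count below are only illustrative values):

.. code::

  predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')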

How to implement the pre- and/or post-processing handler(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Your entry point file should implement either a pair of ``input_handler``
and ``output_handler`` functions or a single ``handler`` function.
Note that if a ``handler`` function is implemented, ``input_handler``
and ``output_handler`` are ignored.

To implement pre- and/or post-processing handlers, use the ``Context``
object that the Python service creates and passes to your handlers. The ``Context`` object is a namedtuple with the following attributes (a short usage sketch follows this list):

- ``model_name (string)``: the name of the model to use for
  inference. For example, 'half_plus_three'

- ``model_version (string)``: the version of the model. For example, '5'

- ``method (string)``: the inference method. For example, 'predict',
  'classify', or 'regress'. For more information on these methods, see the
  `Classify and Regress
  API <https://www.tensorflow.org/tfx/serving/api_rest#classify_and_regress_api>`__
  and the `Predict
  API <https://www.tensorflow.org/tfx/serving/api_rest#predict_api>`__

- ``rest_uri (string)``: the TFS REST URI generated by the Python
  service. For example,
  'http://localhost:8501/v1/models/half_plus_three:predict'

- ``grpc_uri (string)``: the GRPC port number generated by the Python
  service. For example, '9000'

- ``custom_attributes (string)``: the content of the
  'X-Amzn-SageMaker-Custom-Attributes' header from the original
  request. For example,
  'tfs-model-name=half_plus_three,tfs-method=predict'

- ``request_content_type (string)``: the original request content type;
  defaults to 'application/json' if not provided

- ``accept_header (string)``: the original request accept type;
  defaults to 'application/json' if not provided

- ``content_length (int)``: the content length of the original request
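
As an illustration of how a handler might use these attributes, the following
sketch logs a few of them and parses ``custom_attributes`` into a dict. The
parsing helper is not part of the SDK; it is shown only as an example:

.. code::

  import logging

  def _log_context(context):
      """Illustrative helper: inspect the Context passed to a handler."""
      logging.info('model=%s version=%s method=%s',
                   context.model_name, context.model_version, context.method)
      # custom_attributes is a comma-separated string of 'key=value' pairs, e.g.
      # 'tfs-model-name=half_plus_three,tfs-method=predict'
      return dict(pair.split('=', 1)
                  for pair in (context.custom_attributes or '').split(',')
                  if '=' in pair)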

The following code example implements ``input_handler`` and
``output_handler``. When you provide these functions, the Python service posts the
request to the TFS REST URI with the data pre-processed by ``input_handler``
and passes the TFS response to ``output_handler`` for post-processing.

.. code::

import json

def input_handler(data, context):
""" Pre-process request input before it is sent to TensorFlow Serving REST API
Args:
data (obj): the request data, in format of dict or string
context (Context): an object containing request and configuration details
Returns:
(dict): a JSON-serializable dict that contains request body and headers
"""
if context.request_content_type == 'application/json':
# pass through json (assumes it's correctly formed)
d = data.read().decode('utf-8')
return d if len(d) else ''

if context.request_content_type == 'text/csv':
# very simple csv handler
return json.dumps({
'instances': [float(x) for x in data.read().decode('utf-8').split(',')]
})

raise ValueError('{{"error": "unsupported content type {}"}}'.format(
context.request_content_type or "unknown"))


def output_handler(data, context):
"""Post-process TensorFlow Serving output before it is returned to the client.
Args:
data (obj): the TensorFlow serving response
context (Context): an object containing request and configuration details
Returns:
(bytes, string): data to return to client, response content type
"""
if data.status_code != 200:
raise ValueError(data.content.decode('utf-8'))

response_content_type = context.accept_header
prediction = data.content
return prediction, response_content_type
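
Once the endpoint is deployed, a request with 'text/csv' content flows through
these handlers. For example, you could invoke it with the SageMaker runtime API
(the endpoint name below is a placeholder):

.. code::

  import boto3

  runtime = boto3.client('sagemaker-runtime')
  response = runtime.invoke_endpoint(EndpointName='my-tfs-endpoint',  # placeholder name
                                     ContentType='text/csv',
                                     Body='1.0,2.0,3.0')
  print(response['Body'].read().decode('utf-8'))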

You might want to have complete control over the request.
For example, you might want to make a TFS request (REST or GRPC) to the first model,
inspect the results, and then make a request to a second model. In this case, implement
the ``handler`` function instead of the ``input_handler`` and ``output_handler`` functions, as demonstrated
in the following code:


.. code::

import json
import requests


def handler(data, context):
"""Handle request.
Args:
data (obj): the request data
context (Context): an object containing request and configuration details
Returns:
(bytes, string): data to return to client, (optional) response content type
"""
processed_input = _process_input(data, context)
response = requests.post(context.rest_uri, data=processed_input)
return _process_output(response, context)


def _process_input(data, context):
if context.request_content_type == 'application/json':
# pass through json (assumes it's correctly formed)
d = data.read().decode('utf-8')
return d if len(d) else ''

if context.request_content_type == 'text/csv':
# very simple csv handler
return json.dumps({
'instances': [float(x) for x in data.read().decode('utf-8').split(',')]
})

raise ValueError('{{"error": "unsupported content type {}"}}'.format(
context.request_content_type or "unknown"))


def _process_output(data, context):
if data.status_code != 200:
raise ValueError(data.content.decode('utf-8'))

response_content_type = context.accept_header
prediction = data.content
return prediction, response_content_type
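
If you do chain two models as described above, one possible approach is to derive the
second model's REST URI from ``context.rest_uri`` by swapping in the other model's name.
The sketch below reuses the ``_process_input`` and ``_process_output`` helpers from the
example above; the model name ``'model_b'`` and the decision logic are hypothetical, and
the sketch assumes both models are served by the same TensorFlow Serving instance:

.. code::

  import json
  import requests


  def handler(data, context):
      """Illustrative two-model chain; not a complete implementation."""
      first = requests.post(context.rest_uri, data=_process_input(data, context))
      if first.status_code != 200:
          raise ValueError(first.content.decode('utf-8'))

      # hypothetical decision step: call the second model only if the first returned predictions
      if json.loads(first.content.decode('utf-8')).get('predictions'):
          second_uri = context.rest_uri.replace(context.model_name, 'model_b')
          second = requests.post(second_uri, data=first.content)
          return _process_output(second, context)

      return _process_output(first, context)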

You can also bring in external dependencies to help with your data
processing. There are two ways to do this:

1. If you include a ``requirements.txt`` file in your ``source_dir`` or in
   your ``dependencies``, the container installs the Python dependencies at runtime using ``pip install -r``:

.. code::

from sagemaker.tensorflow.serving import Model

model = Model(entry_point='inference.py',
dependencies=['requirements.txt'],
model_data='s3://mybucket/model.tar.gz',
role='MySageMakerRole')


2. If you are working in a network-isolation situation, or if you don't
   want to install dependencies at runtime every time your endpoint starts or a batch
   transform job runs, you can put pre-downloaded dependencies under a ``lib``
   directory and pass that directory as a dependency. The container adds the
   modules to the Python path, so your entry point can import them directly
   (see the sketch following this list). Note that if both ``lib`` and ``requirements.txt``
   are present in the model archive, ``requirements.txt`` is ignored:

.. code::

from sagemaker.tensorflow.serving import Model

model = Model(entry_point='inference.py',
dependencies=['/path/to/folder/named/lib'],
model_data='s3://mybucket/model.tar.gz',
role='MySageMakerRole')
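
Because the container adds the ``lib`` directory to the Python path, your entry point can
import the vendored packages directly. A minimal sketch, assuming (for illustration only)
that ``numpy`` was pre-downloaded into ``lib``:

.. code::

  # inference.py - assumes numpy was vendored into the lib/ directory (illustrative)
  import json

  import numpy as np


  def input_handler(data, context):
      if context.request_content_type == 'text/csv':
          values = np.asarray([float(x) for x in data.read().decode('utf-8').split(',')])
          return json.dumps({'instances': values.tolist()})
      raise ValueError('unsupported content type {}'.format(context.request_content_type))


  def output_handler(data, context):
      if data.status_code != 200:
          raise ValueError(data.content.decode('utf-8'))
      return data.content, context.accept_header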


Deploying more than one model to your Endpoint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down