Skip to content

documentation: update PyTorch BYOM topic #1457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
May 15, 2020
Merged
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
9659e05
doc: update PyTorch BYOM topic
eslesar-aws Apr 22, 2020
31c8ff1
doc: fix merge conflict
eslesar-aws May 5, 2020
7c4dfa2
doc: fix sphinx issues in using_pytorch.rst
eslesar-aws May 5, 2020
db84ea7
doc: fix sphinx issues in using_pytortch.rst
eslesar-aws May 5, 2020
92d4fec
Merge branch 'master' into pytorch-byom
ajaykarpur May 5, 2020
cd80ca7
Merge branch 'master' into pytorch-byom
ajaykarpur May 5, 2020
fa22a81
Merge branch 'master' into pytorch-byom
nadiaya May 5, 2020
e2cf89c
Apply suggestions from code review
eslesar-aws May 7, 2020
8f7c92a
Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk …
eslesar-aws May 7, 2020
ad29c6c
doc: address review comments
eslesar-aws May 8, 2020
596fc57
Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk …
eslesar-aws May 8, 2020
16a61fe
Merge branch 'master' into pytorch-byom
nadiaya May 11, 2020
20b21bc
doc: address feedback in using_pytorch.rst
eslesar-aws May 11, 2020
badad47
Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk …
eslesar-aws May 11, 2020
4fded1d
Merge branch 'pytorch-byom' of https://github.com/eslesar-aws/sagemak…
eslesar-aws May 11, 2020
6a9d152
Merge branch 'master' into pytorch-byom
chuyang-deng May 12, 2020
9695a03
Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk …
eslesar-aws May 14, 2020
8ea4170
doc: fix trailing space errors and address feedback
eslesar-aws May 14, 2020
009624f
Merge branch 'pytorch-byom' of https://github.com/eslesar-aws/sagemak…
eslesar-aws May 14, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
137 changes: 69 additions & 68 deletions doc/using_pytorch.rst
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ Note that SageMaker doesn't support argparse actions. If you want to use, for ex
you need to specify `type` as `bool` in your script and provide an explicit `True` or `False` value for this hyperparameter
when instantiating PyTorch Estimator.

For more on training environment variables, please visit `SageMaker Containers <https://github.com/aws/sagemaker-containers>`_.
For more on training environment variables, see the `SageMaker Training Toolkit <https://github.com/aws/sagemaker-training-toolkit/blob/master/ENVIRONMENT_VARIABLES.md>`_.

Save the Model
--------------
Expand All @@ -115,7 +115,7 @@ to a certain filesystem path called ``model_dir``. This value is accessible thro
with open(os.path.join(args.model_dir, 'model.pth'), 'wb') as f:
torch.save(model.state_dict(), f)

After your training job is complete, SageMaker will compress and upload the serialized model to S3, and your model data
After your training job is complete, SageMaker compresses and uploads the serialized model to S3, and your model data
will be available in the S3 ``output_path`` you specified when you created the PyTorch Estimator.

If you are using Elastic Inference, you must convert your models to the TorchScript format and use ``torch.jit.save`` to save the model.
Expand Down Expand Up @@ -566,12 +566,76 @@ The function should return a byte array of data serialized to content_type.
The default implementation expects ``prediction`` to be a torch.Tensor and can serialize the result to JSON, CSV, or NPY.
It accepts response content types of "application/json", "text/csv", and "application/x-npy".

Working with Existing Model Data and Training Jobs
==================================================

Attach to existing training jobs
Bring your own model
====================

You can deploy a PyTorch model that you trained outside of SageMaker by using the ``PyTorchModel`` class.
Typically, you save a PyTorch model as a file with extension ``.pt`` or ``.pth``.
To do this, you need to:

* Write an inference script.
* Create the directory structure for your model files.
* Create the ``PyTorchModel`` object.

Write an inference script
-------------------------

You must create an inference script that implements (at least) the ``model_fn`` function that calls the loaded model to get a prediction.

**Note**: If you use elastic inference with PyTorch, you can use the default ``model_fn`` implementation provided in the serving container.

Optionally, you can also implement ``input_fn`` and ``output_fn`` to process input and output,
and ``predict_fn`` to customize how the model server gets predictions from the loaded model.
For information about how to write an inference script, see `Serve a PyTorch Model <#serve-a-pytorch-model>`_.
Save the inference script in the same folder where you saved your PyTorch model.
Pass the filename of the inference script as the ``entry_point`` parameter when you create the ``PyTorchModel`` object.

Create the directory structure for your model files
---------------------------------------------------

You have to create a directory structure and place your model files in the correct location.
The ``PyTorchModel`` constructor packs the files into a ``tar.gz`` file and uploads it to S3.

The directory structure where you saved your PyTorch model should look something like the following:

**Note:** This directory struture is for PyTorch versions 1.2 and higher.
For the directory structure for versions 1.1 and lower,
see `For versions 1.1 and lower <#for-versions-1.1-and-lower>`_.

::

| my_model
| |--model.pth
|
| code
| |--inference.py
| |--requirements.txt

Where ``requirments.txt`` is an optional file that specifies dependencies on third-party libraries.

Create a ``PyTorchModel`` object
--------------------------------

Now call the :class:`sagemaker.pytorch.model.PyTorchModel` constructor to create a model object, and then call its ``deploy()`` method to deploy your model for inference.

.. code:: python

from sagemaker import get_execution_role
role = get_execution_role()

pytorch_model = PyTorchModel(model_data='s3://my-bucket/my-path/model.tar.gz', role=role,
entry_point='inference.py')

predictor = pytorch_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)


Now you can call the ``predict()`` method to get predictions from your deployed model.

***********************************************
Attach an estimator to an existing training job
***********************************************

You can attach a PyTorch Estimator to an existing training job using the
``attach`` method.

Expand All @@ -592,69 +656,6 @@ The ``attach`` method accepts the following arguments:
- ``sagemaker_session:`` The Session used
to interact with SageMaker

Deploy Endpoints from model data
--------------------------------

In addition to attaching to existing training jobs, you can deploy models directly from model data in S3.
The following code sample shows how to do this, using the ``PyTorchModel`` class.

.. code:: python

pytorch_model = PyTorchModel(model_data='s3://bucket/model.tar.gz', role='SageMakerRole',
entry_point='transform_script.py')

predictor = pytorch_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)

The PyTorchModel constructor takes the following arguments:

- ``model_dat:`` An S3 location of a SageMaker model data
.tar.gz file
- ``image:`` A Docker image URI
- ``role:`` An IAM role name or Arn for SageMaker to access AWS
resources on your behalf.
- ``predictor_cls:`` A function to
call to create a predictor. If not None, ``deploy`` will return the
result of invoking this function on the created endpoint name
- ``env:`` Environment variables to run with
``image`` when hosted in SageMaker.
- ``name:`` The model name. If None, a default model name will be
selected on each ``deploy.``
- ``entry_point:`` Path (absolute or relative) to the Python file
which should be executed as the entry point to model hosting.
- ``source_dir:`` Optional. Path (absolute or relative) to a
directory with any other training source code dependencies including
the entry point file. Structure within this directory will be
preserved when training on SageMaker.
- ``enable_cloudwatch_metrics:`` Optional. If true, training
and hosting containers will generate Cloudwatch metrics under the
AWS/SageMakerContainer namespace.
- ``container_log_level:`` Log level to use within the container.
Valid values are defined in the Python logging module.
- ``code_location:`` Optional. Name of the S3 bucket where your
custom code will be uploaded to. If not specified, will use the
SageMaker default bucket created by sagemaker.Session.
- ``sagemaker_session:`` The SageMaker Session
object, used for SageMaker interaction

Your model data must be a .tar.gz file in S3. SageMaker Training Job model data is saved to .tar.gz files in S3,
however if you have local data you want to deploy, you can prepare the data yourself.

Assuming you have a local directory containg your model data named "my_model" you can tar and gzip compress the file and
upload to S3 using the following commands:

::

tar -czf model.tar.gz my_model
aws s3 cp model.tar.gz s3://my-bucket/my-path/model.tar.gz

This uploads the contents of my_model to a gzip compressed tar file to S3 in the bucket "my-bucket", with the key
"my-path/model.tar.gz".

To run this command, you'll need the AWS CLI tool installed. Please refer to our `FAQ`_ for more information on
installing this.

.. _FAQ: ../../../README.rst#faq

*************************
PyTorch Training Examples
*************************
Expand Down