documentation: update PyTorch BYOM topic #1457

Merged
merged 19 commits into from
May 15, 2020
Changes from 11 commits
9659e05
doc: update PyTorch BYOM topic
eslesar-aws Apr 22, 2020
31c8ff1
doc: fix merge conflict
eslesar-aws May 5, 2020
7c4dfa2
doc: fix sphinx issues in using_pytorch.rst
eslesar-aws May 5, 2020
db84ea7
doc: fix sphinx issues in using_pytortch.rst
eslesar-aws May 5, 2020
92d4fec
Merge branch 'master' into pytorch-byom
ajaykarpur May 5, 2020
cd80ca7
Merge branch 'master' into pytorch-byom
ajaykarpur May 5, 2020
fa22a81
Merge branch 'master' into pytorch-byom
nadiaya May 5, 2020
e2cf89c
Apply suggestions from code review
eslesar-aws May 7, 2020
8f7c92a
Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk …
eslesar-aws May 7, 2020
ad29c6c
doc: address review comments
eslesar-aws May 8, 2020
596fc57
Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk …
eslesar-aws May 8, 2020
16a61fe
Merge branch 'master' into pytorch-byom
nadiaya May 11, 2020
20b21bc
doc: address feedback in using_pytorch.rst
eslesar-aws May 11, 2020
badad47
Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk …
eslesar-aws May 11, 2020
4fded1d
Merge branch 'pytorch-byom' of https://github.com/eslesar-aws/sagemak…
eslesar-aws May 11, 2020
6a9d152
Merge branch 'master' into pytorch-byom
chuyang-deng May 12, 2020
9695a03
Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk …
eslesar-aws May 14, 2020
8ea4170
doc: fix trailing space errors and address feedback
eslesar-aws May 14, 2020
009624f
Merge branch 'pytorch-byom' of https://github.com/eslesar-aws/sagemak…
eslesar-aws May 14, 2020
164 changes: 95 additions & 69 deletions doc/using_pytorch.rst
@@ -6,7 +6,7 @@ With PyTorch Estimators and Models, you can train and host PyTorch models on Ama

Supported versions of PyTorch: ``0.4.0``, ``1.0.0``, ``1.1.0``, ``1.2.0``, ``1.3.1``, ``1.4.0``, ``1.5.0``.

Supported versions of PyTorch for Elastic Inference: ``1.3.1``.
* Supported versions of PyTorch for Elastic Inference: ``1.3.1``.
Review comment (Contributor):
it might make sense to just remove the bullet point entirely (and make it a normal paragraph) since the previous line doesn't have a bullet point.


We recommend that you use the latest supported version because that's where we focus our development efforts.

@@ -90,7 +90,7 @@ Note that SageMaker doesn't support argparse actions. If you want to use, for ex
you need to specify `type` as `bool` in your script and provide an explicit `True` or `False` value for this hyperparameter
when instantiating PyTorch Estimator.
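
A minimal sketch of the boolean-hyperparameter case, assuming the hyperparameter reaches the training script as a command-line string such as ``--use-cuda False``. The hyperparameter name and the ``str2bool`` helper are hypothetical and not part of the SageMaker SDK. The helper is used here because plain ``bool`` as an argparse ``type`` treats any non-empty string, including ``'False'``, as true.

```python
import argparse


def str2bool(value):
    """Convert a command-line string such as 'True' or 'False' to a bool."""
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % (value,))


def parse_args(argv=None):
    """Parse training-script arguments; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameter name, used only for illustration.
    parser.add_argument("--use-cuda", type=str2bool, default=False)
    return parser.parse_args(argv)
```

For example, ``parse_args(['--use-cuda', 'False']).use_cuda`` evaluates to ``False``, whereas ``bool('False')`` evaluates to ``True``.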

For more on training environment variables, please visit `SageMaker Containers <https://github.com/aws/sagemaker-containers>`_.
For more on training environment variables, see the `SageMaker Training Toolkit <https://github.com/aws/sagemaker-training-toolkit/blob/master/ENVIRONMENT_VARIABLES.md>`_.

Save the Model
--------------
@@ -115,7 +115,7 @@ to a certain filesystem path called ``model_dir``. This value is accessible thro
with open(os.path.join(args.model_dir, 'model.pth'), 'wb') as f:
torch.save(model.state_dict(), f)

After your training job is complete, SageMaker will compress and upload the serialized model to S3, and your model data
After your training job is complete, SageMaker compresses and uploads the serialized model to S3, and your model data
will be available in the S3 ``output_path`` you specified when you created the PyTorch Estimator.

If you are using Elastic Inference, you must convert your models to the TorchScript format and use ``torch.jit.save`` to save the model.
@@ -566,94 +566,120 @@ The function should return a byte array of data serialized to content_type.
The default implementation expects ``prediction`` to be a torch.Tensor and can serialize the result to JSON, CSV, or NPY.
It accepts response content types of "application/json", "text/csv", and "application/x-npy".

Working with Existing Model Data and Training Jobs
==================================================

Attach to existing training jobs
--------------------------------
Bring your own model
====================

You can attach a PyTorch Estimator to an existing training job using the
``attach`` method.
You can deploy a PyTorch model that you trained outside of SageMaker by using the ``PyTorchModel`` class.
Typically, you save a PyTorch model as a file with extension ``.pt`` or ``.pth``.
To do this, you need to:

* Write an inference script.
* Package the model artifacts into a ``tar.gz`` file.
* Upload the ``tar.gz`` file to an S3 bucket.
Review comment (Contributor):
GH isn't letting me reply to #1457 (comment) directly, so starting a new comment thread here.

the only advantage I can think of for doing the packing oneself is better control over the S3 location and how the file is packed. however, I'm pretty sure everything is covered through the Python SDK with specifying an S3 location, etc.

Review comment (Contributor, Author):
So is this the _upload_code function in FrameworkModel? If so, does that mean that the user supplies the optional code_location param to the constructor to specify a bucket, or use the default session bucket?

And then the constructor uploads to that S3 location (either the default session bucket or `code_location`)?

Review comment (Contributor):
yes, PyTorchModel takes care of repacking the model and uploading it, and the location can be defined by code_location if the user doesn't want to use the default session bucket.

Review comment (Contributor):
@eslesar-aws do you still lean toward keeping those two steps in or do you think it'd be better to remove them since PyTorchModel can take care of repacking the model and uploading it?

* Create the ``PyTorchModel`` object.

Write an inference script
-------------------------

You must create an inference script that implements (at least) the ``model_fn`` function that calls the loaded model to get a prediction.

**Note**: If you use Elastic Inference with PyTorch, you can use the default ``model_fn`` implementation provided in the serving container.

Optionally, you can also implement ``input_fn`` and ``output_fn`` to process input and output.
Review comment (Contributor):
this paragraph should probably also include that one can also (optionally) implement predict_fn

For information about how to write an inference script, see `Serve a PyTorch Model <#serve-a-pytorch-model>`_.
Save the inference script as ``inference.py`` in the same folder where you saved your PyTorch model.
Review comment (Contributor):
it doesn't have to be named inference.py - it can be named anything, as the name is passed through entry_point and then communicated to the serving container (through an environment variable)
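
As an illustration of the optional handlers discussed in this section, here is a minimal sketch of ``input_fn`` and ``output_fn`` for JSON only. It assumes the serving container calls them as ``input_fn(request_body, request_content_type)`` and ``output_fn(prediction, content_type)``. It works on plain Python lists so that it runs without PyTorch; a real script would typically convert the input to a ``torch.Tensor``. ``model_fn``, which the section requires, is left out because it depends on PyTorch and on how the model was saved.

```python
import json

JSON_CONTENT_TYPE = "application/json"


def input_fn(request_body, request_content_type):
    """Deserialize a JSON request body into a Python object."""
    if request_content_type != JSON_CONTENT_TYPE:
        raise ValueError("Unsupported content type: %s" % request_content_type)
    # The body may arrive as bytes; decode it before parsing.
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    return json.loads(request_body)


def output_fn(prediction, content_type):
    """Serialize a prediction (assumed JSON-serializable) to bytes."""
    if content_type != JSON_CONTENT_TYPE:
        raise ValueError("Unsupported content type: %s" % content_type)
    return json.dumps(prediction).encode("utf-8")
```

The file that holds these functions is the one passed as ``entry_point``, whatever its name.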


Package model artifacts into a tar.gz file
------------------------------------------

The directory structure where you saved your PyTorch model should look something like the following:

**Note:** This directory structure is for PyTorch versions 1.2 and higher. For the directory structure for versions 1.1 and lower,
see `For versions 1.1 and lower <#for-versions-1.1-and-lower>`_.

::

| my_model
| |--model.pth
|
| code
| |--inference.py
| |--requirements.txt

Where ``requirements.txt`` is an optional file that specifies dependencies on third-party libraries.

With this file structure, run the following command to package your model as a ``tar.gz`` file:

``tar -czf model.tar.gz my_model code``
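
The same packaging step can be sketched in Python with the standard-library ``tarfile`` module. The ``package_model`` helper is hypothetical, not part of the SageMaker SDK, and assumes the ``my_model`` and ``code`` directories from the layout above already exist.

```python
import os
import tarfile


def package_model(output_path, paths):
    """Write the given files or directories into a gzip-compressed tar archive."""
    with tarfile.open(output_path, "w:gz") as archive:
        for path in paths:
            # Store each entry under its final path component, which mirrors
            # running `tar -czf model.tar.gz my_model code` from the parent directory.
            name = os.path.basename(os.path.normpath(path))
            archive.add(path, arcname=name)  # directories are added recursively
    return output_path
```

Calling ``package_model('model.tar.gz', ['my_model', 'code'])`` from the parent directory is intended to produce the same archive layout as the ``tar`` command.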

Upload model.tar.gz to S3
-------------------------

After you package your model into a ``tar.gz`` file, upload it to an S3 bucket by running the following Python code:

.. code:: python

my_training_job_name = 'MyAwesomePyTorchTrainingJob'
pytorch_estimator = PyTorch.attach(my_training_job_name)
import boto3
import sagemaker
s3 = boto3.client('s3')
Review comment (Contributor):
these look to be unused


After attaching, if the training job has finished with job status "Completed", it can be
``deploy``\ ed to create a SageMaker Endpoint and return a
``Predictor``. If the training job is in progress,
attach will block and display log messages from the training job, until the training job completes.
from sagemaker import get_execution_role
role = get_execution_role()
Review comment (Contributor):
is role used anywhere?

Review comment (Contributor, Author):
Moved this to the next section where role is passed to the model constructor.


The ``attach`` method accepts the following arguments:
response = s3.upload_file('model.tar.gz', 'my-bucket', '%s/%s' %('my-path', 'model.tar.gz'))

- ``training_job_name:`` The name of the training job to attach
to.
- ``sagemaker_session:`` The Session used
to interact with SageMaker
Where ``my-bucket`` is the name of your S3 bucket, and ``my-path`` is the folder where you want to store the model.
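
A sketch of how the object key and the resulting S3 URI fit together, assuming the same placeholder names (``my-bucket``, ``my-path``) as the snippet above. The helper functions are hypothetical. ``boto3`` is imported inside ``upload_model`` so that the two string helpers can be used without it installed.

```python
import os


def s3_key(prefix, filename):
    """Join a key prefix and a file name into an S3 object key."""
    return "%s/%s" % (prefix.strip("/"), filename)


def s3_uri(bucket, key):
    """Build the s3:// URI that is later passed as model_data."""
    return "s3://%s/%s" % (bucket, key)


def upload_model(local_path, bucket, prefix):
    """Upload a local archive and return its S3 URI."""
    import boto3  # imported lazily; requires AWS credentials at call time

    key = s3_key(prefix, os.path.basename(local_path))
    # upload_file returns None on success and raises an exception on failure,
    # so there is no response object worth keeping.
    boto3.client("s3").upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)
```

With the placeholder values, ``upload_model('model.tar.gz', 'my-bucket', 'my-path')`` would return ``s3://my-bucket/my-path/model.tar.gz``.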


You can also upload to S3 by using the AWS CLI:

.. code:: python

aws s3 cp model.tar.gz s3://my-bucket/my-path/model.tar.gz``
Review comment (Contributor):
remove the stray backticks at the end of the line


Deploy Endpoints from model data

To run this command, you'll need to have the AWS CLI tool installed. For information about installing the AWS CLI,
see `Installing the AWS CLI <https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html>`_.

Create a ``PyTorchModel`` object
--------------------------------

In addition to attaching to existing training jobs, you can deploy models directly from model data in S3.
The following code sample shows how to do this, using the ``PyTorchModel`` class.
Now call the :class:`sagemaker.pytorch.model.PyTorchModel` constructor to create a model object, and then call its ``deploy()`` method to deploy your model for inference.

.. code:: python

pytorch_model = PyTorchModel(model_data='s3://bucket/model.tar.gz', role='SageMakerRole',
entry_point='transform_script.py')
pytorch_model = PyTorchModel(model_data='s3://my-bucket/my-path/model.tar.gz', role=role,
entry_point='inference.py')

predictor = pytorch_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)

The PyTorchModel constructor takes the following arguments:

- ``model_data:`` An S3 location of a SageMaker model data
.tar.gz file
- ``image:`` A Docker image URI
- ``role:`` An IAM role name or Arn for SageMaker to access AWS
resources on your behalf.
- ``predictor_cls:`` A function to
call to create a predictor. If not None, ``deploy`` will return the
result of invoking this function on the created endpoint name
- ``env:`` Environment variables to run with
``image`` when hosted in SageMaker.
- ``name:`` The model name. If None, a default model name will be
selected on each ``deploy.``
- ``entry_point:`` Path (absolute or relative) to the Python file
which should be executed as the entry point to model hosting.
- ``source_dir:`` Optional. Path (absolute or relative) to a
directory with any other training source code dependencies including
the entry point file. Structure within this directory will be
preserved when training on SageMaker.
- ``enable_cloudwatch_metrics:`` Optional. If true, training
and hosting containers will generate Cloudwatch metrics under the
AWS/SageMakerContainer namespace.
- ``container_log_level:`` Log level to use within the container.
Valid values are defined in the Python logging module.
- ``code_location:`` Optional. Name of the S3 bucket where your
custom code will be uploaded to. If not specified, will use the
SageMaker default bucket created by sagemaker.Session.
- ``sagemaker_session:`` The SageMaker Session
object, used for SageMaker interaction

Your model data must be a .tar.gz file in S3. SageMaker Training Job model data is saved to .tar.gz files in S3,
however if you have local data you want to deploy, you can prepare the data yourself.

Assuming you have a local directory containing your model data named "my_model", you can tar and gzip compress the file and
upload to S3 using the following commands:

::
Now you can call the ``predict()`` method to get predictions from your deployed model.

tar -czf model.tar.gz my_model
aws s3 cp model.tar.gz s3://my-bucket/my-path/model.tar.gz
***********************************************
Attach an estimator to an existing training job
***********************************************

This uploads the contents of my_model to a gzip compressed tar file to S3 in the bucket "my-bucket", with the key
"my-path/model.tar.gz".
You can attach a PyTorch Estimator to an existing training job using the
``attach`` method.

To run this command, you'll need the AWS CLI tool installed. Please refer to our `FAQ`_ for more information on
installing this.
.. code:: python

.. _FAQ: ../../../README.rst#faq
my_training_job_name = 'MyAwesomePyTorchTrainingJob'
pytorch_estimator = PyTorch.attach(my_training_job_name)

After attaching, if the training job has finished with job status "Completed", it can be
``deploy``\ ed to create a SageMaker Endpoint and return a
``Predictor``. If the training job is in progress,
attach will block and display log messages from the training job, until the training job completes.

The ``attach`` method accepts the following arguments:

- ``training_job_name:`` The name of the training job to attach
to.
- ``sagemaker_session:`` The Session used
to interact with SageMaker

*************************
PyTorch Training Examples