
Commit 3c5655b

documentation: Add processing readthedocs (#1226)
Co-authored-by: Lauren Yu <[email protected]>
1 parent e857c91 commit 3c5655b

6 files changed: +168 -0 lines changed


README.rst

Lines changed: 11 additions & 0 deletions
@@ -66,6 +66,7 @@ Table of Contents
22. `SageMaker Autopilot <#sagemaker-autopilot>`__
23. `Model Monitoring <#amazon-sagemaker-model-monitoring>`__
24. `SageMaker Debugger <#amazon-sagemaker-debugger>`__
25. `SageMaker Processing <#amazon-sagemaker-processing>`__


Installing the SageMaker Python SDK
@@ -377,3 +378,13 @@ For more information, see `Amazon SageMaker Debugger`_.

.. _Amazon SageMaker Debugger: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_debugger.html


Amazon SageMaker Processing
---------------------------------

You can use Amazon SageMaker Processing to perform data processing tasks such as data pre- and post-processing, feature engineering, data validation, and model evaluation.

For more information, see `Amazon SageMaker Processing`_.

.. _Amazon SageMaker Processing: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html

doc/amazon_sagemaker_processing.rst

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
.. sectnum::

##############################
Amazon SageMaker Processing
##############################


Amazon SageMaker Processing lets you run data pre- or post-processing, feature engineering, data validation, and model evaluation workloads on Amazon SageMaker.

.. contents::

Background
==========

Amazon SageMaker lets developers and data scientists train and deploy machine learning models. With Amazon SageMaker Processing, you can run processing jobs for the data processing steps in your machine learning pipeline. A processing job reads its input data from Amazon S3 and writes its output data back to Amazon S3.
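
Each ``ProcessingInput`` pairs an S3 URI with a path inside the processing container, and each ``ProcessingOutput`` pairs a container path with an S3 location. The sketch below checks one such pair before a job is submitted. It is not part of the SageMaker Python SDK: ``check_mapping`` is a hypothetical helper, and the ``/opt/ml/processing/`` prefix rule is an assumption based on the ``CreateProcessingJob`` API reference, which requires input paths in the container to begin with that prefix.

```python
# Hypothetical pre-flight check (not SageMaker SDK code) for an S3 URI paired
# with a container path, as in a ProcessingInput or ProcessingOutput.
from urllib.parse import urlparse

# Assumed rule: container-side paths live under this prefix.
CONTAINER_ROOT = "/opt/ml/processing/"


def check_mapping(s3_uri, local_path):
    """Return (bucket, key_prefix) for a valid pair, else raise ValueError."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    if not local_path.startswith(CONTAINER_ROOT):
        raise ValueError(
            f"container path must start with {CONTAINER_ROOT}: {local_path!r}")
    # The S3 key prefix is the URI path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# The input used in the scikit-learn example further down this page:
bucket, prefix = check_mapping("s3://your-bucket/path/to/your/data",
                               "/opt/ml/processing/input")
print(bucket, prefix)
```
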

.. image:: ./amazon_sagemaker_processing_image1.png

Setup
=====

The fastest way to get started with Amazon SageMaker Processing is to run a Jupyter notebook. You can follow the `Getting Started with Amazon SageMaker`_ guide to start running notebooks on Amazon SageMaker.

.. _Getting Started with Amazon SageMaker: https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html

You can run notebooks on Amazon SageMaker that demonstrate end-to-end examples of using processing jobs to perform data pre-processing, feature engineering, and model evaluation steps. See `Learn More`_ at the bottom of this page for more in-depth information.


Data Pre-Processing and Model Evaluation with Scikit-Learn
==================================================================

You can run a Scikit-Learn script to do data processing on SageMaker using the `SKLearnProcessor`_ class.

.. _SKLearnProcessor: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor

First, create an ``SKLearnProcessor``:

.. code:: python

    from sagemaker.sklearn.processing import SKLearnProcessor

    sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                         role='[Your SageMaker-compatible IAM role]',
                                         instance_type='ml.m5.xlarge',
                                         instance_count=1)

Then you can run a Scikit-Learn script, ``preprocessing.py``, in a processing job. In this example, the script takes one input from S3 and one command-line argument, processes the data, and then splits it into two datasets for output. When the job is finished, you can retrieve the output from S3.

.. code:: python

    from sagemaker.processing import ProcessingInput, ProcessingOutput

    sklearn_processor.run(code='preprocessing.py',
                          # The S3 data is copied to this path inside the container
                          inputs=[ProcessingInput(
                              source='s3://your-bucket/path/to/your/data',
                              destination='/opt/ml/processing/input')],
                          # Files the script writes to these paths are uploaded to S3
                          outputs=[ProcessingOutput(output_name='train_data',
                                                    source='/opt/ml/processing/train'),
                                   ProcessingOutput(output_name='test_data',
                                                    source='/opt/ml/processing/test')],
                          arguments=['--train-test-split-ratio', '0.2']
                          )

    # Describe the most recent job run by this processor
    preprocessing_job_description = sklearn_processor.jobs[-1].describe()
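
``describe()`` returns the response of the ``DescribeProcessingJob`` API as a dictionary. Its ``ProcessingOutputConfig`` section lists every output by name together with the S3 URI it was uploaded to, which is one way to locate the train and test datasets once the job has finished. A minimal sketch, where ``get_output_uris`` is a hypothetical helper and the sample dictionary is a hand-trimmed stand-in with placeholder URIs, not real job output:

```python
# Hypothetical helper (not SDK code): map each output name to its S3 URI
# using a DescribeProcessingJob-style response dictionary.


def get_output_uris(job_description):
    """Return {OutputName: S3Uri} for every configured processing output."""
    outputs = job_description["ProcessingOutputConfig"]["Outputs"]
    return {out["OutputName"]: out["S3Output"]["S3Uri"] for out in outputs}


# Hand-written stand-in for preprocessing_job_description (placeholder URIs).
example_description = {
    "ProcessingOutputConfig": {
        "Outputs": [
            {"OutputName": "train_data",
             "S3Output": {"S3Uri": "s3://your-bucket/output/train",
                          "LocalPath": "/opt/ml/processing/train",
                          "S3UploadMode": "EndOfJob"}},
            {"OutputName": "test_data",
             "S3Output": {"S3Uri": "s3://your-bucket/output/test",
                          "LocalPath": "/opt/ml/processing/test",
                          "S3UploadMode": "EndOfJob"}},
        ]
    }
}

output_uris = get_output_uris(example_description)
print(output_uris["train_data"])
```

With a real job you would pass ``preprocessing_job_description`` instead of the stand-in dictionary.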

For an in-depth look, please see the `Scikit-Learn Data Processing and Model Evaluation`_ example notebook.

.. _Scikit-Learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
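
The ``preprocessing.py`` script itself is not shown on this page. The sketch below is one way such a script could be written, assuming the input is one or more headerless CSV files; it uses only the Python standard library, while a real Scikit-Learn script would more likely use pandas and ``sklearn.model_selection.train_test_split``. The output file names ``train.csv`` and ``test.csv`` and the helper ``split_rows`` are illustrative. The container paths match the ``ProcessingInput`` destination and ``ProcessingOutput`` sources in the ``run()`` call.

```python
# Hypothetical, stdlib-only sketch of a preprocessing.py for the run() call
# above. Assumptions: the input is one or more headerless CSV files, and the
# outputs are named train.csv and test.csv.
import argparse
import csv
import os
import random


def split_rows(rows, test_ratio, seed=0):
    """Shuffle rows with a fixed seed and split off a test fraction."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]


def main(argv=None,
         input_dir="/opt/ml/processing/input",
         train_dir="/opt/ml/processing/train",
         test_dir="/opt/ml/processing/test"):
    parser = argparse.ArgumentParser()
    # Matches arguments=['--train-test-split-ratio', '0.2'] in run().
    parser.add_argument("--train-test-split-ratio", type=float, required=True)
    args = parser.parse_args(argv)

    # ProcessingInput copied the S3 data into input_dir before the script started.
    rows = []
    for name in sorted(os.listdir(input_dir)):
        if name.endswith(".csv"):
            with open(os.path.join(input_dir, name), newline="") as f:
                rows.extend(csv.reader(f))

    train, test = split_rows(rows, args.train_test_split_ratio)

    # Each ProcessingOutput uploads its directory to S3 when the job ends.
    for out_dir, subset, filename in ((train_dir, train, "train.csv"),
                                      (test_dir, test, "test.csv")):
        os.makedirs(out_dir, exist_ok=True)
        with open(os.path.join(out_dir, filename), "w", newline="") as f:
            csv.writer(f).writerows(subset)
    return len(train), len(test)


# Run only inside a processing container, where the input directory exists.
if __name__ == "__main__" and os.path.isdir("/opt/ml/processing/input"):
    main()
```

The directory parameters default to the container paths but can be overridden, which makes the script easy to try locally on a small CSV file before launching a job.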


Data Pre-Processing with Spark
==============================

You can use the `ScriptProcessor`_ class to run a script in a processing container, including your own container.

.. _ScriptProcessor: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor

This example runs a processing job in a container that can execute a Spark script, ``preprocess.py``, by invoking the command ``/opt/program/submit`` inside the container.

.. code:: python

    from sagemaker.processing import ScriptProcessor

    # Placeholder values: replace these with your own IAM role, S3 bucket, and key prefixes
    role = '[Your SageMaker-compatible IAM role]'
    bucket = 'your-bucket'
    input_prefix = 'path/to/raw/input'
    input_preprocessed_prefix = 'path/to/preprocessed/output'

    spark_processor = ScriptProcessor(base_job_name='spark-preprocessor',
                                      image_uri='<ECR repository URI to your Spark processing image>',
                                      command=['/opt/program/submit'],
                                      role=role,
                                      instance_count=2,
                                      instance_type='ml.r5.xlarge',
                                      max_runtime_in_seconds=1200,
                                      env={'mode': 'python'})

    # No inputs or outputs are configured here; the S3 locations are
    # passed to the script as command-line arguments instead.
    spark_processor.run(code='preprocess.py',
                        arguments=['s3_input_bucket', bucket,
                                   's3_input_key_prefix', input_prefix,
                                   's3_output_bucket', bucket,
                                   's3_output_key_prefix', input_preprocessed_prefix],
                        logs=False)
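
To make the role of ``command`` concrete: as far as we understand the SDK, ``ScriptProcessor`` uploads the ``code`` file to S3, has it copied to ``/opt/ml/processing/input/code/`` in the container, and sets the container entrypoint to ``command`` followed by that script path, with ``arguments`` appended after it. The sketch below only reproduces that command line locally to show what ``/opt/program/submit`` receives. ``container_command`` is a hypothetical helper, and the code directory is an assumption to verify against your SDK version.

```python
# Hypothetical sketch (not SDK code) of the command line the Spark container
# would run for the ScriptProcessor example above. Assumption: the SDK copies
# the `code` file to /opt/ml/processing/input/code/ inside the container and
# sets the entrypoint to `command` followed by that script path, with
# `arguments` appended after it.
import posixpath
import shlex

CODE_DIR = "/opt/ml/processing/input/code"  # assumed container location


def container_command(command, code, arguments):
    """Return the full argv: command, in-container script path, then arguments."""
    script_path = posixpath.join(CODE_DIR, posixpath.basename(code))
    return list(command) + [script_path] + list(arguments)


argv = container_command(
    command=["/opt/program/submit"],
    code="preprocess.py",
    arguments=["s3_input_bucket", "your-bucket",
               "s3_input_key_prefix", "path/to/raw/input"],
)
print(shlex.join(argv))  # shlex.join requires Python 3.8+
```
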

For an in-depth look, please see the `Feature Transformation with Spark`_ example notebook.

.. _Feature Transformation with Spark: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/feature_transformation_with_sagemaker_processing/feature_transformation_with_sagemaker_processing.ipynb
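
In the Spark example, the ``arguments`` list alternates a name and a value, and those strings are presumably forwarded by ``/opt/program/submit`` to the script. How the example's ``preprocess.py`` actually parses them is not shown on this page; the following is one hypothetical way to do it with the standard library, where ``parse_pairs`` and ``s3_locations`` are illustrative names:

```python
# Hypothetical helper for a script like preprocess.py: read the alternating
# name/value arguments passed by spark_processor.run(). This is not the
# example notebook's code; it only illustrates the argument format.
import sys


def parse_pairs(args):
    """Turn ['name1', 'value1', 'name2', 'value2', ...] into a dict."""
    if len(args) % 2 != 0:
        raise ValueError("expected name/value pairs, got an odd number of arguments")
    return dict(zip(args[0::2], args[1::2]))


def s3_locations(argv=None):
    """Build the input and output S3 URIs from the parsed arguments."""
    # Assumes the pairs follow the script path in sys.argv.
    params = parse_pairs(sys.argv[1:] if argv is None else argv)
    input_uri = "s3://{}/{}".format(params["s3_input_bucket"],
                                    params["s3_input_key_prefix"])
    output_uri = "s3://{}/{}".format(params["s3_output_bucket"],
                                     params["s3_output_key_prefix"])
    return input_uri, output_uri


print(s3_locations(["s3_input_bucket", "your-bucket",
                    "s3_input_key_prefix", "raw",
                    "s3_output_bucket", "your-bucket",
                    "s3_output_key_prefix", "preprocessed"]))
```

A Spark job that reads these locations through Hadoop's S3A connector would typically use the ``s3a://`` scheme rather than ``s3://``.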


Learn More
==========

Processing class documentation
------------------------------

- ``Processor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.Processor
- ``ScriptProcessor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
- ``SKLearnProcessor``: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
- ``ProcessingInput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingInput
- ``ProcessingOutput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingOutput
- ``ProcessingJob``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingJob


Further documentation
---------------------

- Processing class documentation: https://sagemaker.readthedocs.io/en/stable/processing.html
- AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html
- AWS Notebook examples: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing
- Processing API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html
- Processing container specification: https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html

doc/index.rst

Lines changed: 10 additions & 0 deletions
@@ -219,3 +219,13 @@ You can use Amazon SageMaker Debugger to automatically detect anomalies while tr
   :maxdepth: 2

   amazon_sagemaker_debugger

***************************
Amazon SageMaker Processing
***************************

You can use Amazon SageMaker Processing to perform data processing tasks such as data pre- and post-processing, feature engineering, data validation, and model evaluation.

.. toctree::
   :maxdepth: 2

   amazon_sagemaker_processing

doc/overview.rst

Lines changed: 13 additions & 0 deletions
@@ -8,6 +8,8 @@ SageMaker Python SDK provides several high-level abstractions for working with A
- **Models**: Encapsulate built ML models.
- **Predictors**: Provide real-time inference and transformation using Python data-types against a SageMaker endpoint.
- **Session**: Provides a collection of methods for working with SageMaker resources.
- **Transformers**: Encapsulate batch transform jobs for inference on SageMaker.
- **Processors**: Encapsulate running processing jobs for data processing on SageMaker.

``Estimator`` and ``Model`` implementations for MXNet, TensorFlow, Chainer, PyTorch, scikit-learn, Amazon SageMaker built-in algorithms, and Reinforcement Learning are included.
There's also an ``Estimator`` that runs SageMaker-compatible custom Docker containers, enabling you to run your own ML algorithms by using the SageMaker Python SDK.
@@ -1057,6 +1059,17 @@ For more information, see `SageMaker Debugger`_.

.. _SageMaker Debugger: https://github.com/aws/sagemaker-python-sdk/blob/master/doc/amazon_sagemaker_debugger.rst

********************
SageMaker Processing
********************

You can use Amazon SageMaker Processing with "Processors" to perform data processing tasks such as data pre- and post-processing, feature engineering, data validation, and model evaluation.

.. toctree::
   :maxdepth: 2

   amazon_sagemaker_processing


***
FAQ
***

doc/sagemaker.sklearn.rst

Lines changed: 8 additions & 0 deletions
@@ -24,3 +24,11 @@ Scikit Learn Predictor
    :members:
    :undoc-members:
    :show-inheritance:

Scikit Learn Processor
----------------------

.. autoclass:: sagemaker.sklearn.processing.SKLearnProcessor
    :members:
    :undoc-members:
    :show-inheritance:
