aws · laurenyu · Jan 17, 2020 · Jan 11, 2020 · Jan 17, 2020 · laurenyu
@@ -66,6 +66,7 @@ Table of Contents
 22. `SageMaker Autopilot <#sagemaker-autopilot>`__
 23. `Model Monitoring <#amazon-sagemaker-model-monitoring>`__
 24. `SageMaker Debugger <#amazon-sagemaker-debugger>`__
+25. `SageMaker Processing <#amazon-sagemaker-processing>`__
 
 
 Installing the SageMaker Python SDK
@@ -377,3 +378,13 @@ For more information, see `Amazon SageMaker Debugger`_.
 
 .. _Amazon SageMaker Debugger: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_debugger.html
 
+
+Amazon SageMaker Processing
+---------------------------------
+
+You can use Amazon SageMaker Processing to perform data processing tasks such as data pre- and post-processing, feature engineering, data validation, and model evaluation.
+
+For more information, see `Amazon SageMaker Processing`_.
+
+.. _Amazon SageMaker Processing: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html
@@ -0,0 +1,126 @@
+.. sectnum::
+
+##############################
+Amazon SageMaker Processing
+##############################
+
+
+Amazon SageMaker Processing lets you run data pre- or post-processing, feature engineering, data validation, and model evaluation workloads on Amazon SageMaker.
+
+.. contents::
+
+Background
+==========
+
+Amazon SageMaker lets developers and data scientists train and deploy machine learning models. With Amazon SageMaker Processing, you can also run processing jobs for the data processing steps in your machine learning pipeline. Processing jobs take their input data from Amazon S3 and write their output data back to Amazon S3.
+
+.. image:: ./amazon_sagemaker_processing_image1.png
+
+Setup
+=====
+
+The fastest way to get started with Amazon SageMaker Processing is to run a Jupyter notebook. You can follow the `Getting Started with Amazon SageMaker`_ guide to start running notebooks on Amazon SageMaker.
+
+.. _Getting Started with Amazon SageMaker: https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html
+
+You can run notebooks on Amazon SageMaker that demonstrate end-to-end examples of using processing jobs to perform data pre-processing, feature engineering and model evaluation steps. See `Learn More`_ at the bottom of this page for more in-depth information.
+
+
+Data Pre-Processing and Model Evaluation with Scikit-Learn
+==================================================================
+
+You can run a Scikit-Learn script to do data processing on SageMaker using the `SKLearnProcessor`_ class.
+
+.. _SKLearnProcessor: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
+
+First, create an ``SKLearnProcessor``:
+
+.. code:: python
+
+    from sagemaker.sklearn.processing import SKLearnProcessor
+
+    sklearn_processor = SKLearnProcessor(
+        framework_version='0.20.0',  # scikit-learn version of the prebuilt container
+        role='[Your SageMaker-compatible IAM role]',  # replace with your IAM role name or ARN
+        instance_type='ml.m5.xlarge',
+        instance_count=1,
+    )
+
+Then you can run a Scikit-Learn script, ``preprocessing.py``, in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, and then splits it into two datasets for output. When the job is finished, we can retrieve the output from S3.
+
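+The page does not show ``preprocessing.py`` itself. The following is a hypothetical, minimal sketch of what such a script could look like, assuming the input is one or more headerless CSV files. To stay runnable without extra packages it uses only the Python standard library: a seeded shuffle and slice stands in for a helper such as scikit-learn's ``train_test_split``. The directory constants mirror the local paths used in the ``run`` call below, and ``main`` accepts them as parameters only so the script can be tried outside a processing job.
+
+.. code:: python
+
+    import argparse
+    import csv
+    import os
+    import random
+
+    # Local container paths; they match the ProcessingInput destination and the
+    # ProcessingOutput sources passed to run()
+    INPUT_DIR = '/opt/ml/processing/input'
+    TRAIN_DIR = '/opt/ml/processing/train'
+    TEST_DIR = '/opt/ml/processing/test'
+
+
+    def split_rows(rows, test_ratio, seed=0):
+        """Shuffle the rows reproducibly and return (train_rows, test_rows)."""
+        rows = list(rows)
+        random.Random(seed).shuffle(rows)
+        n_test = int(round(len(rows) * test_ratio))
+        return rows[n_test:], rows[:n_test]
+
+
+    def main(argv=None, input_dir=INPUT_DIR, train_dir=TRAIN_DIR, test_dir=TEST_DIR):
+        parser = argparse.ArgumentParser()
+        # Hypothetical default; the run() call passes 0.2 explicitly
+        parser.add_argument('--train-test-split-ratio', type=float, default=0.3)
+        args = parser.parse_args(argv)  # argv=None means "read sys.argv"
+
+        # Read every CSV file that the job copied from S3 into input_dir,
+        # skipping entries that are not CSV files
+        rows = []
+        for name in sorted(os.listdir(input_dir)):
+            if name.endswith('.csv'):
+                with open(os.path.join(input_dir, name), newline='') as f:
+                    rows.extend(csv.reader(f))
+
+        train_rows, test_rows = split_rows(rows, args.train_test_split_ratio)
+
+        # Files written under the output directories are uploaded to S3 by the job
+        outputs = ((train_dir, 'train.csv', train_rows), (test_dir, 'test.csv', test_rows))
+        for out_dir, file_name, data in outputs:
+            os.makedirs(out_dir, exist_ok=True)
+            with open(os.path.join(out_dir, file_name), 'w', newline='') as f:
+                csv.writer(f).writerows(data)
+        return len(train_rows), len(test_rows)
+
+
+    # As preprocessing.py, the file would end with the usual entry point:
+    #     if __name__ == '__main__':
+    #         main()
+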
+.. code:: python
+
+    from sagemaker.processing import ProcessingInput, ProcessingOutput
+
+    sklearn_processor.run(
+        code='preprocessing.py',
+        # Data is copied from S3 into this local path before the script starts
+        inputs=[ProcessingInput(source='s3://your-bucket/path/to/your/data',
+                                destination='/opt/ml/processing/input')],
+        # Files the script writes to these local paths are uploaded to S3 when the job ends
+        outputs=[ProcessingOutput(output_name='train_data',
+                                  source='/opt/ml/processing/train'),
+                 ProcessingOutput(output_name='test_data',
+                                  source='/opt/ml/processing/test')],
+        arguments=['--train-test-split-ratio', '0.2'],
+    )
+
+    preprocessing_job_description = sklearn_processor.jobs[-1].describe()
+
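+``describe()`` returns the job description as a dictionary. As a minimal sketch, assuming that dictionary follows the shape of the ``DescribeProcessingJob`` API response (each entry in ``ProcessingOutputConfig['Outputs']`` carries an ``OutputName`` and an ``S3Output['S3Uri']``), a small helper can map each output name to the S3 location the job wrote it to. The helper name ``output_s3_uris`` is hypothetical and not part of the SDK.
+
+.. code:: python
+
+    def output_s3_uris(job_description):
+        """Map each ProcessingOutput name to the S3 URI the job uploaded it to."""
+        outputs = job_description['ProcessingOutputConfig']['Outputs']
+        return {output['OutputName']: output['S3Output']['S3Uri'] for output in outputs}
+
+
+    # Usage with the job started above:
+    #     output_uris = output_s3_uris(preprocessing_job_description)
+    #     train_data_uri = output_uris['train_data']
+    #     test_data_uri = output_uris['test_data']
+
+Because the ``run`` call above does not set a ``destination`` on either ``ProcessingOutput``, the SDK chooses the S3 locations, so reading them back from the job description is the simplest way to find the processed data.
+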
+For an in-depth look, please see the `Scikit-Learn Data Processing and Model Evaluation`_ example notebook.
+
+.. _Scikit-Learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
+
+
+Data Pre-Processing with Spark
+==============================
+
+You can use the `ScriptProcessor`_ class to run a script in a processing container, including your own container.
+
+.. _ScriptProcessor: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
+
+This example runs a processing job in a container that can run a Spark script named ``preprocess.py``. The job starts the script by invoking the command ``/opt/program/submit`` inside the container.
+
+.. code:: python
+
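+    from sagemaker.processing import ScriptProcessor
+
+    # Placeholders: replace with your own IAM role, S3 bucket, and key prefixes
+    role = '[Your SageMaker-compatible IAM role]'
+    bucket = 'your-bucket'
+    input_prefix = 'path/to/raw/data'
+    input_preprocessed_prefix = 'path/to/preprocessed/data'
+
+    spark_processor = ScriptProcessor(
+        base_job_name='spark-preprocessor',
+        image_uri='<ECR repository URI to your Spark processing image>',
+        command=['/opt/program/submit'],  # command that launches the script in the container
+        role=role,
+        instance_count=2,
+        instance_type='ml.r5.xlarge',
+        max_runtime_in_seconds=1200,
+        env={'mode': 'python'},
+    )
+
+    # No ProcessingInput or ProcessingOutput is configured, so the script is expected to
+    # read from and write to S3 directly, using the bucket and prefixes passed as arguments
+    spark_processor.run(
+        code='preprocess.py',
+        arguments=['s3_input_bucket', bucket,
+                   's3_input_key_prefix', input_prefix,
+                   's3_output_bucket', bucket,
+                   's3_output_key_prefix', input_preprocessed_prefix],
+        logs=False,  # don't print the job's logs while waiting for it to finish
+    )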
+
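+The example does not show ``preprocess.py`` itself. Because ``run`` passes ``arguments`` to the script as a flat list of alternating names and values, a script like this would typically pair them up again before building its S3 locations. The helpers below are a hypothetical sketch of that step only: the names ``parse_pair_arguments`` and ``s3_uris`` are made up for illustration, the sketch assumes the ``/opt/program/submit`` command forwards the arguments to the script unchanged, and the Spark transformation itself is omitted.
+
+.. code:: python
+
+    def parse_pair_arguments(argv):
+        """Turn ['name1', 'value1', 'name2', 'value2', ...] into a dict."""
+        if len(argv) % 2 != 0:
+            raise ValueError('expected name/value pairs, got an odd number of arguments')
+        return dict(zip(argv[0::2], argv[1::2]))
+
+
+    def s3_uris(args):
+        """Build the input and output S3 URIs from the parsed arguments."""
+        input_uri = 's3://{}/{}'.format(args['s3_input_bucket'], args['s3_input_key_prefix'])
+        output_uri = 's3://{}/{}'.format(args['s3_output_bucket'], args['s3_output_key_prefix'])
+        return input_uri, output_uri
+
+
+    # Inside the container, the script could start with (after importing sys):
+    #     input_uri, output_uri = s3_uris(parse_pair_arguments(sys.argv[1:]))
+    # and then use Spark to read the input, transform it, and write the result.
+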
+For an in-depth look, please see the `Feature Transformation with Spark`_ example notebook.
+
+.. _Feature Transformation with Spark: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/feature_transformation_with_sagemaker_processing/feature_transformation_with_sagemaker_processing.ipynb
+
+
+Learn More
+==========
+
+Processing class documentation
+------------------------------
+
+- ``Processor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.Processor
+- ``ScriptProcessor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
+- ``SKLearnProcessor``: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
+- ``ProcessingInput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingInput
+- ``ProcessingOutput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingOutput
+- ``ProcessingJob``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingJob
+
+
+Further documentation
+---------------------
+
+- Processing class documentation: https://sagemaker.readthedocs.io/en/stable/processing.html
+- AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html
+- AWS Notebook examples: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing
+- Processing API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html
+- Processing container specification: https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html
@@ -219,3 +219,13 @@ You can use Amazon SageMaker Debugger to automatically detect anomalies while tr
     :maxdepth: 2
 
     amazon_sagemaker_debugger
+
+***************************
+Amazon SageMaker Processing
+***************************
+You can use Amazon SageMaker Processing to perform data processing tasks such as data pre- and post-processing, feature engineering, data validation, and model evaluation.
+
+.. toctree::
+    :maxdepth: 2
+
+    amazon_sagemaker_processing
@@ -8,6 +8,8 @@ SageMaker Python SDK provides several high-level abstractions for working with A
 - **Models**: Encapsulate built ML models.
 - **Predictors**: Provide real-time inference and transformation using Python data-types against a SageMaker endpoint.
 - **Session**: Provides a collection of methods for working with SageMaker resources.
+- **Transformers**: Encapsulate batch transform jobs for inference on SageMaker.
+- **Processors**: Encapsulate running processing jobs for data processing on SageMaker.
 
 ``Estimator`` and ``Model`` implementations for MXNet, TensorFlow, Chainer, PyTorch, scikit-learn, Amazon SageMaker built-in algorithms, Reinforcement Learning,  are included.
 There's also an ``Estimator`` that runs SageMaker compatible custom Docker containers, enabling you to run your own ML algorithms by using the SageMaker Python SDK.
@@ -1057,6 +1059,17 @@ For more information, see `SageMaker Debugger`_.
 
 .. _SageMaker Debugger: https://github.com/aws/sagemaker-python-sdk/blob/master/doc/amazon_sagemaker_debugger.rst
 
+********************
+SageMaker Processing
+********************
+You can use Amazon SageMaker Processing with "Processors" to perform data processing tasks such as data pre- and post-processing, feature engineering, data validation, and model evaluation.
+
+.. toctree::
+    :maxdepth: 2
+
+    amazon_sagemaker_processing
+
+
 ***
 FAQ
 ***

@@ -24,3 +24,11 @@ Scikit Learn Predictor
     :members:
     :undoc-members:
     :show-inheritance:
+
+Scikit Learn Processor
+----------------------
+
+.. autoclass:: sagemaker.sklearn.processing.SKLearnProcessor
+    :members:
+    :undoc-members:
+    :show-inheritance: