- ##############################
+ ###########################
Amazon SageMaker Processing
- ##############################
+ ###########################
Amazon SageMaker Processing allows you to run steps for data pre- or post-processing, feature engineering, data validation, or model evaluation workloads on Amazon SageMaker.
@@ -27,37 +27,38 @@ You can run notebooks on Amazon SageMaker that demonstrate end-to-end examples o
Data Pre-Processing and Model Evaluation with Scikit-Learn
==================================================================
- You can run a Scikit-Learn script to do data processing on SageMaker using the `SKLearnProcessor`_ class.
-
- .. _SKLearnProcessor: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
+ You can run a Scikit-Learn script to do data processing on SageMaker using the :class:`sagemaker.sklearn.processing.SKLearnProcessor` class.
You first create a ``SKLearnProcessor``:
.. code:: python
from sagemaker.sklearn.processing import SKLearnProcessor
- sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
-                                      role='[Your SageMaker-compatible IAM role]',
-                                      instance_type='ml.m5.xlarge',
-                                      instance_count=1)
+ sklearn_processor = SKLearnProcessor(
+     framework_version='0.20.0',
+     role='[Your SageMaker-compatible IAM role]',
+     instance_type='ml.m5.xlarge',
+     instance_count=1,
+ )
Then you can run a Scikit-Learn script ``preprocessing.py`` in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, then splits the data into two datasets for output. When the job is finished, we can retrieve the output from S3.
.. code:: python
from sagemaker.processing import ProcessingInput, ProcessingOutput
- sklearn_processor.run(code='preprocessing.py',
-                       inputs=[ProcessingInput(
-                           source='s3://your-bucket/path/to/your/data',
-                           destination='/opt/ml/processing/input')],
-                       outputs=[ProcessingOutput(output_name='train_data',
-                                                 source='/opt/ml/processing/train'),
-                                ProcessingOutput(output_name='test_data',
-                                                 source='/opt/ml/processing/test')],
-                       arguments=['--train-test-split-ratio', '0.2']
-                      )
+ sklearn_processor.run(
+     code='preprocessing.py',
+     inputs=[
+         ProcessingInput(source='s3://your-bucket/path/to/your/data', destination='/opt/ml/processing/input'),
+     ],
+     outputs=[
+         ProcessingOutput(output_name='train_data', source='/opt/ml/processing/train'),
+         ProcessingOutput(output_name='test_data', source='/opt/ml/processing/test'),
+     ],
+     arguments=['--train-test-split-ratio', '0.2'],
+ )
preprocessing_job_description = sklearn_processor.jobs[-1].describe()
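The ``describe()`` call returns the job's DescribeProcessingJob response as a dictionary, which is where the S3 locations of the ``train_data`` and ``test_data`` outputs can be found. A minimal sketch for collecting them, assuming the response lists outputs under ``ProcessingOutputConfig['Outputs']`` with ``OutputName`` and ``S3Output['S3Uri']`` keys; the helper name ``output_s3_uris`` and the sample dictionary (bucket and job names included) are hypothetical placeholders.

```python
from typing import Any, Dict


def output_s3_uris(job_description: Dict[str, Any]) -> Dict[str, str]:
    """Map each processing output name to the S3 URI it was uploaded to.

    Assumes the DescribeProcessingJob layout:
    {"ProcessingOutputConfig": {"Outputs": [{"OutputName": ..., "S3Output": {"S3Uri": ...}}]}}
    """
    outputs = job_description.get("ProcessingOutputConfig", {}).get("Outputs", [])
    return {
        output["OutputName"]: output["S3Output"]["S3Uri"]
        for output in outputs
        # Outputs without an S3 destination are skipped.
        if "S3Output" in output
    }


# Hand-written stand-in for preprocessing_job_description; the bucket and job
# names are placeholders, not values from a real job.
sample_description = {
    "ProcessingOutputConfig": {
        "Outputs": [
            {"OutputName": "train_data",
             "S3Output": {"S3Uri": "s3://your-bucket/example-job/output/train_data"}},
            {"OutputName": "test_data",
             "S3Output": {"S3Uri": "s3://your-bucket/example-job/output/test_data"}},
        ]
    }
}

uris = output_s3_uris(sample_description)
train_uri = uris["train_data"]
test_uri = uris["test_data"]
```

With a real job, pass ``preprocessing_job_description`` in place of the sample dictionary.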
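The ``preprocessing.py`` script itself is not shown here. The following is a minimal, standard-library-only sketch of a script with the same contract as the ``run()`` call: it reads from ``/opt/ml/processing/input``, accepts ``--train-test-split-ratio``, and writes the two splits to ``/opt/ml/processing/train`` and ``/opt/ml/processing/test``. It assumes the input is one or more header-less CSV files; the helper names, the ``train.csv``/``test.csv`` file names, and the default ratio are hypothetical, and a real script would more likely use pandas and scikit-learn.

```python
import argparse
import csv
import os
import random
from typing import List, Optional, Sequence, Tuple

# Container paths matching the ProcessingInput destination and ProcessingOutput sources.
INPUT_DIR = "/opt/ml/processing/input"
TRAIN_DIR = "/opt/ml/processing/train"
TEST_DIR = "/opt/ml/processing/test"

Row = List[str]


def read_csv_rows(input_dir: str) -> List[Row]:
    """Read every .csv file in input_dir (assumed header-less) into one list of rows."""
    rows: List[Row] = []
    for name in sorted(os.listdir(input_dir)):
        if name.endswith(".csv"):
            with open(os.path.join(input_dir, name), newline="") as f:
                rows.extend(csv.reader(f))
    return rows


def write_csv_rows(rows: Sequence[Row], out_dir: str, filename: str) -> None:
    """Write rows to out_dir/filename, creating the directory if needed."""
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, filename), "w", newline="") as f:
        csv.writer(f).writerows(rows)


def split_rows(rows: Sequence[Row], test_ratio: float, seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle with a fixed seed, then hold out round(len(rows) * test_ratio) rows as test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def main(argv: Optional[Sequence[str]] = None, input_dir: str = INPUT_DIR,
         train_dir: str = TRAIN_DIR, test_dir: str = TEST_DIR) -> Tuple[int, int]:
    parser = argparse.ArgumentParser()
    # Matches arguments=['--train-test-split-ratio', '0.2']; the 0.3 default is a placeholder.
    parser.add_argument("--train-test-split-ratio", type=float, default=0.3)
    args = parser.parse_args(argv)

    train, test = split_rows(read_csv_rows(input_dir), args.train_test_split_ratio)
    write_csv_rows(train, train_dir, "train.csv")
    write_csv_rows(test, test_dir, "test.csv")
    return len(train), len(test)


if __name__ == "__main__":
    # Only run when the processing input mount exists, i.e. inside the container.
    if os.path.isdir(INPUT_DIR):
        main()
```

Seeding the shuffle makes reruns of the same job produce the same split; drop the fixed seed if a fresh split per run is preferred.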
@@ -69,31 +70,39 @@ For an in-depth look, please see the `Scikit-Learn Data Processing and Model Eva
Data Pre-Processing with Spark
==============================
- You can use the `ScriptProcessor`_ class to run a script in a processing container, including your own container.
-
- .. _ScriptProcessor: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
+ You can use the :class:`sagemaker.processing.ScriptProcessor` class to run a script in a processing container, including your own container.
This example shows how you can run a processing job inside a container that can run a Spark script called ``preprocess.py`` by invoking the command ``/opt/program/submit`` inside the container.
.. code:: python
from sagemaker.processing import ScriptProcessor, ProcessingInput
- spark_processor = ScriptProcessor(base_job_name='spark-preprocessor',
-                                   image_uri='<ECR repository URI to your Spark processing image>',
-                                   command=['/opt/program/submit'],
-                                   role=role,
-                                   instance_count=2,
-                                   instance_type='ml.r5.xlarge',
-                                   max_runtime_in_seconds=1200,
-                                   env={'mode': 'python'})
-
- spark_processor.run(code='preprocess.py',
-                     arguments=['s3_input_bucket', bucket,
-                                's3_input_key_prefix', input_prefix,
-                                's3_output_bucket', bucket,
-                                's3_output_key_prefix', input_preprocessed_prefix],
-                     logs=False)
+ spark_processor = ScriptProcessor(
+     base_job_name='spark-preprocessor',
+     image_uri='<ECR repository URI to your Spark processing image>',
+     command=['/opt/program/submit'],
+     role=role,
+     instance_count=2,
+     instance_type='ml.r5.xlarge',
+     max_runtime_in_seconds=1200,
+     env={'mode': 'python'},
+ )
+
+ spark_processor.run(
+     code='preprocess.py',
+     arguments=[
+         's3_input_bucket',
+         bucket,
+         's3_input_key_prefix',
+         input_prefix,
+         's3_output_bucket',
+         bucket,
+         's3_output_key_prefix',
+         input_preprocessed_prefix,
+     ],
+     logs=False,
+ )
For an in-depth look, please see the `Feature Transformation with Spark`_ example notebook.
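Because ``arguments`` is a flat list of alternating keys and values, the Spark script has to pair them up itself. A minimal sketch of that parsing, assuming ``preprocess.py`` ends up receiving the list unchanged (for example as ``sys.argv[1:]``); how ``/opt/program/submit`` forwards arguments depends on your container, so check its submit script. ``parse_pairs``, ``s3_uri``, and the placeholder bucket and prefixes are hypothetical.

```python
from typing import Dict, Sequence

# Keys passed by the spark_processor.run() call.
EXPECTED_KEYS = (
    "s3_input_bucket",
    "s3_input_key_prefix",
    "s3_output_bucket",
    "s3_output_key_prefix",
)


def parse_pairs(argv: Sequence[str]) -> Dict[str, str]:
    """Turn ['key1', 'value1', 'key2', 'value2', ...] into a dict and check required keys."""
    if len(argv) % 2 != 0:
        raise ValueError(f"expected key/value pairs, got {len(argv)} arguments")
    it = iter(argv)
    parsed = dict(zip(it, it))  # zip pulls two items per step from the same iterator
    missing = [key for key in EXPECTED_KEYS if key not in parsed]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return parsed


def s3_uri(bucket: str, prefix: str) -> str:
    """Join a bucket and key prefix into an s3:// URI."""
    return f"s3://{bucket}/{prefix.strip('/')}"


if __name__ == "__main__":
    # In the container this would be parse_pairs(sys.argv[1:]); placeholder values here.
    args = parse_pairs([
        "s3_input_bucket", "your-bucket",
        "s3_input_key_prefix", "sagemaker/spark/input",
        "s3_output_bucket", "your-bucket",
        "s3_output_key_prefix", "sagemaker/spark/preprocessed",
    ])
    print(s3_uri(args["s3_input_bucket"], args["s3_input_key_prefix"]))
```

Switching to ``--key value`` flags with ``argparse`` would give clearer error messages, but it would also require changing the ``arguments`` list passed to ``run()``.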
@@ -106,19 +115,19 @@ Learn More
Processing class documentation
------------------------------
- - ``Processor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.Processor
- - ``ScriptProcessor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
- - ``SKLearnProcessor``: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
- - ``ProcessingInput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingInput
- - ``ProcessingOutput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingOutput
- - ``ProcessingJob``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingJob
+ - :class:`sagemaker.processing.Processor`
+ - :class:`sagemaker.processing.ScriptProcessor`
+ - :class:`sagemaker.sklearn.processing.SKLearnProcessor`
+ - :class:`sagemaker.processing.ProcessingInput`
+ - :class:`sagemaker.processing.ProcessingOutput`
+ - :class:`sagemaker.processing.ProcessingJob`
Further documentation
---------------------
- - Processing class documentation: https://sagemaker.readthedocs.io/en/stable/processing.html
- - AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html
- - AWS Notebook examples: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing
- - Processing API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html
- - Processing container specification: https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html
+ - `Processing class documentation <https://sagemaker.readthedocs.io/en/stable/processing.html>`_
+ - `AWS Documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html>`_
+ - `AWS Notebook examples <https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing>`_
+ - `Processing API documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html>`_
+ - `Processing container specification <https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html>`_