Commit 23b818b: Merge branch 'zwei' into support-multiple-accept
2 parents: 1f1f70d + c4bb695

56 files changed: 920 additions, 660 deletions

doc/v2.rst

Lines changed: 8 additions & 0 deletions

@@ -203,6 +203,12 @@ To view logs after attaching a training job to an estimator, use :func:`sagemake
 until the completion of the Hyperparameter Tuning Job or Batch Transform Job, respectively.
 To make the function non-blocking, use ``wait=False``.
 
+XGBoost Predictor
+-----------------
+
+The default serializer of ``sagemaker.xgboost.model.XGBoostPredictor`` has been changed from ``NumpySerializer`` to ``LibSVMSerializer``.
+
+
 Parameter and Class Name Changes
 ================================
 

@@ -263,6 +269,8 @@ The follow serializer/deserializer classes have been renamed and/or moved:
 | ``sagemaker.predictor._JsonDeserializer`` | ``sagemaker.deserializers.JSONDeserializer`` |
 +--------------------------------------------------------+-------------------------------------------------------+
 
+``sagemaker.serializers.LibSVMSerializer`` has been added in v2.0.
+
 ``distributions``
 ~~~~~~~~~~~~~~~~~
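The LibSVM wire format that the new default serializer emits is easy to sketch. The helper below is hypothetical (it is not part of the SDK); it only illustrates the ``<label> <index>:<value>`` layout, with 1-based indices and zero-valued features omitted:

```python
def to_libsvm(label, features):
    """Render one record in LibSVM format: "<label> <index>:<value> ...".

    Only non-zero features are emitted; indices are 1-based by convention.
    This is an illustrative sketch, not the SDK's LibSVMSerializer.
    """
    pairs = ["%d:%g" % (i + 1, v) for i, v in enumerate(features) if v != 0]
    return " ".join([str(label)] + pairs)

print(to_libsvm(1, [0.0, 2.5, 0.0, 3.0]))  # → 1 2:2.5 4:3
```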

doc/workflows/kubernetes/using_amazon_sagemaker_components.rst

Lines changed: 17 additions & 54 deletions
@@ -463,21 +463,24 @@ you can create your classification pipeline. To create your pipeline,
 you need to define and compile it. You then deploy it and use it to run
 workflows. You can define your pipeline in Python and use the KFP
 dashboard, KFP CLI, or Python SDK to compile, deploy, and run your
-workflows.
+workflows. The full code for the MNIST classification pipeline example is available in the
+`Kubeflow Github
+repository <https://github.com/kubeflow/pipelines/blob/master/samples/contrib/aws-samples/mnist-kmeans-sagemaker>`__.
+To use it, clone the example Python files to your gateway node.
 
 Prepare datasets
 ~~~~~~~~~~~~~~~~
 
-To run the pipelines, you need to have the datasets in an S3 bucket in
-your account. This bucket must be located in the region where you want
-to run Amazon SageMaker jobs. If you don’t have a bucket, create one
+To run the pipelines, you need to upload the data extraction pre-processing script to an S3 bucket. This bucket and all resources for this example must be located in the ``us-east-1`` Amazon Region. If you don’t have a bucket, create one
 using the steps in `Creating a
 bucket <https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html>`__.
 
-From your gateway node, run the `sample dataset
-creation <https://github.com/kubeflow/pipelines/tree/34615cb19edfacf9f4d9f2417e9254d52dd53474/samples/contrib/aws-samples/mnist-kmeans-sagemaker#the-sample-dataset>`__
-script to copy the datasets into your bucket. Change the bucket name in
-the script to the one you created.
+From the ``mnist-kmeans-sagemaker`` folder of the Kubeflow repository you cloned on your gateway node, run the following command to upload the ``kmeans_preprocessing.py`` file to your S3 bucket. Change ``<bucket-name>`` to the name of the S3 bucket you created.
+
+::
+
+    aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
+
 
 Create a Kubeflow Pipeline using Amazon SageMaker Components
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -496,54 +499,14 @@ parameters for each component of your pipeline. These parameters can
 also be updated when using other pipelines. We have provided default
 values for all parameters in the sample classification pipeline file.
 
-The following are the only parameters you may need to modify to run the
-sample pipelines. To modify these parameters, update their entries in
-the sample classification pipeline file.
+The following are the only parameters you need to pass to run the
+sample pipelines. To pass these parameters, update their entries when creating a new run.
 
 - **Role-ARN:** This must be the ARN of an IAM role that has full
   Amazon SageMaker access in your AWS account. Use the ARN
   of  ``kfp-example-pod-role``.
 
-- **The Dataset Buckets**: You must change the S3 bucket with the input
-  data for each of the components. Replace the following with the link
-  to your S3 bucket:
-
-  - **Train channel:** ``"S3Uri": "s3://<your-s3-bucket-name>/data"``
-
-  - **HPO channels for test/HPO channel for
-    train:** ``"S3Uri": "s3://<your-s3-bucket-name>/data"``
-
-  - **Batch
-    transform:** ``"batch-input": "s3://<your-s3-bucket-name>/data"``
-
-- **Output buckets:** Replace the output buckets with S3 buckets you
-  have write permission to. Replace the following with the link to your
-  S3 bucket:
-
-  - **Training/HPO**:
-    ``output_location='s3://<your-s3-bucket-name>/output'``
-
-  - **Batch Transform**:
-    ``batch_transform_ouput='s3://<your-s3-bucket-name>/output'``
-
-- **Region:**\ The default pipelines work in us-east-1. If your
-  cluster is in a different region, update the following:
-
-  - The ``region='us-east-1'`` Parameter in the input list.
-
-  - The algorithm images for Amazon SageMaker. If you use one of
-    the Amazon SageMaker built-in algorithm images, select the image
-    for your region. Construct the image name using the information
-    in `Common parameters for built-in
-    algorithms <https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html>`__.
-    For Example:
-
-    ::
-
-      382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:1
-
-  - The S3 buckets with the dataset. Use the steps in Prepare datasets
-    to copy the data to a bucket in the same region as the cluster.
+- **Bucket**: This is the name of the S3 bucket that you uploaded the ``kmeans_preprocessing.py`` file to.
 
 You can adjust any of the input parameters using the KFP UI and trigger
 your run again.
@@ -632,18 +595,18 @@ currently does not support specifying input parameters while creating
 the run. You need to update your parameters in the Python pipeline file
 before compiling. Replace ``<experiment-name>`` and ``<job-name>``
 with any names. Replace ``<pipeline-id>`` with the ID of your submitted
-pipeline.
+pipeline. Replace ``<your-role-arn>`` with the ARN of ``kfp-example-pod-role``. Replace ``<your-bucket-name>`` with the name of the S3 bucket you created.
 
 ::
 
-   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id>
+   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
 
 You can also directly submit a run using the compiled pipeline package
 created as the output of the ``dsl-compile`` command.
 
 ::
 
-   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output>
+   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
 
 Your output should look like the following:

src/sagemaker/algorithm.py

Lines changed: 2 additions & 2 deletions

@@ -229,13 +229,13 @@ def hyperparameters(self):
         """
         return self.hyperparam_dict
 
-    def train_image(self):
+    def training_image_uri(self):
         """Returns the docker image to use for training.
 
         The fit() method, that does the model training, calls this method to
         find the image to use for model training.
         """
-        raise RuntimeError("train_image is never meant to be called on Algorithm Estimators")
+        raise RuntimeError("training_image_uri is never meant to be called on Algorithm Estimators")
 
     def enable_network_isolation(self):
         """Return True if this Estimator will need network isolation to run.
src/sagemaker/amazon/amazon_estimator.py

Lines changed: 1 addition & 1 deletion

@@ -91,7 +91,7 @@ def __init__(
         )
         self._data_location = data_location
 
-    def train_image(self):
+    def training_image_uri(self):
         """Placeholder docstring"""
         return image_uris.retrieve(
             self.repo_name, self.sagemaker_session.boto_region_name, version=self.repo_version,

src/sagemaker/cli/compatibility/v2/ast_transformer.py

Lines changed: 2 additions & 0 deletions

@@ -35,6 +35,7 @@
     modifiers.renamed_params.SessionCreateEndpointImageURIRenamer(),
     modifiers.training_params.TrainPrefixRemover(),
     modifiers.training_input.TrainingInputConstructorRefactor(),
+    modifiers.training_input.ShuffleConfigModuleRenamer(),
     modifiers.serde.SerdeConstructorRenamer(),
 ]
 

@@ -51,6 +52,7 @@
     modifiers.predictors.PredictorImportFromRenamer(),
     modifiers.tfs.TensorFlowServingImportFromRenamer(),
     modifiers.training_input.TrainingInputImportFromRenamer(),
+    modifiers.training_input.ShuffleConfigImportFromRenamer(),
     modifiers.serde.SerdeImportFromAmazonCommonRenamer(),
     modifiers.serde.SerdeImportFromPredictorRenamer(),
 ]

src/sagemaker/cli/compatibility/v2/modifiers/training_input.py

Lines changed: 70 additions & 0 deletions

@@ -100,3 +100,73 @@ def modify_node(self, node):
         if node.module == "sagemaker.session":
             node.module = "sagemaker.inputs"
         return node
+
+
+class ShuffleConfigModuleRenamer(Modifier):
+    """A class to change ``ShuffleConfig`` usage to use ``sagemaker.inputs.ShuffleConfig``."""
+
+    def node_should_be_modified(self, node):
+        """Checks if the ``ast.Call`` node instantiates a class of interest.
+
+        This looks for the following calls:
+
+        - ``sagemaker.session.ShuffleConfig``
+        - ``session.ShuffleConfig``
+
+        Args:
+            node (ast.Call): a node that represents a function call. For more,
+                see https://docs.python.org/3/library/ast.html#abstract-grammar.
+
+        Returns:
+            bool: If the ``ast.Call`` instantiates a class of interest.
+        """
+        if isinstance(node.func, ast.Name):
+            return False
+
+        return matching.matches_name_or_namespaces(
+            node, "ShuffleConfig", ("sagemaker.session", "session")
+        )
+
+    def modify_node(self, node):
+        """Modifies the ``ast.Call`` node to call ``sagemaker.inputs.ShuffleConfig``.
+
+        Args:
+            node (ast.Call): a node that represents a ``sagemaker.session.ShuffleConfig``
+                constructor.
+
+        Returns:
+            ast.Call: the original node, with its namespace changed to use the ``inputs`` module.
+        """
+        _rename_namespace(node, "session")
+        return node
+
+
+class ShuffleConfigImportFromRenamer(Modifier):
+    """A class to update import statements of ``ShuffleConfig``."""
+
+    def node_should_be_modified(self, node):
+        """Checks if the import statement imports ``sagemaker.session.ShuffleConfig``.
+
+        Args:
+            node (ast.ImportFrom): a node that represents a ``from ... import ... `` statement.
+                For more, see https://docs.python.org/3/library/ast.html#abstract-grammar.
+
+        Returns:
+            bool: If the import statement imports ``sagemaker.session.ShuffleConfig``.
+        """
+        return node.module == "sagemaker.session" and any(
+            name.name == "ShuffleConfig" for name in node.names
+        )
+
+    def modify_node(self, node):
+        """Changes the ``ast.ImportFrom`` node's namespace to ``sagemaker.inputs``.
+
+        Args:
+            node (ast.ImportFrom): a node that represents a ``from ... import ... `` statement.
+                For more, see https://docs.python.org/3/library/ast.html#abstract-grammar.
+
+        Returns:
+            ast.ImportFrom: the original node, with its module modified to ``"sagemaker.inputs"``.
+        """
+        node.module = "sagemaker.inputs"
+        return node
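The import-renaming half of this change can be approximated with the standard library alone. A minimal sketch using ``ast.NodeTransformer`` (assuming Python 3.9+ for ``ast.unparse``; this bypasses the SDK's ``Modifier``/``matching`` machinery):

```python
import ast

class ShuffleConfigRenamer(ast.NodeTransformer):
    """Rewrite ``from sagemaker.session import ShuffleConfig`` so the name
    is imported from ``sagemaker.inputs`` instead."""

    def visit_ImportFrom(self, node):
        # Only touch imports of ShuffleConfig from sagemaker.session.
        if node.module == "sagemaker.session" and any(
            alias.name == "ShuffleConfig" for alias in node.names
        ):
            node.module = "sagemaker.inputs"
        return node

source = "from sagemaker.session import ShuffleConfig"
tree = ShuffleConfigRenamer().visit(ast.parse(source))
print(ast.unparse(tree))  # → from sagemaker.inputs import ShuffleConfig
```

The real tool walks every node with a list of such modifiers, so a single pass over a file applies all v2 renames at once.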

src/sagemaker/cli/framework_upgrade.py

Lines changed: 9 additions & 6 deletions

@@ -41,9 +41,9 @@ def get_latest_values(existing_content, scope=None):
         )
 
     latest_version = list(existing_content["versions"].keys())[-1]
-    registries = existing_content["versions"][latest_version]["registries"]
-    py_versions = existing_content["versions"][latest_version]["py_versions"]
-    repository = existing_content["versions"][latest_version]["repository"]
+    registries = existing_content["versions"][latest_version].get("registries", None)
+    py_versions = existing_content["versions"][latest_version].get("py_versions", None)
+    repository = existing_content["versions"][latest_version].get("repository", None)
 
     return registries, py_versions, repository
 

@@ -92,8 +92,9 @@ def add_dlc_framework_version(
     new_version = {
         "registries": registries,
         "repository": repository,
-        "py_versions": py_versions,
     }
+    if py_versions:
+        new_version["py_versions"] = py_versions
     existing_content[scope]["versions"][full_version] = new_version
 

@@ -128,10 +129,11 @@ def add_algo_version(
         existing_content["scope"].append(scope)
 
     new_version = {
-        "py_versions": py_versions,
         "registries": registries,
         "repository": repository,
     }
+    if py_versions:
+        new_version["py_versions"] = py_versions
     if tag_prefix:
         new_version["tag_prefix"] = tag_prefix
     existing_content["versions"][full_version] = new_version

@@ -171,7 +173,8 @@ def add_version(
         py_versions (str): Supported Python versions (e.g. "py3,py37").
         tag_prefix (str): Algorithm image's tag prefix.
     """
-    py_versions = py_versions.split(",")
+    if py_versions:
+        py_versions = py_versions.split(",")
     processors = processors.split(",")
     latest_registries, latest_py_versions, latest_repository = get_latest_values(
         existing_content, scope
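The pattern adopted above, ``.get(...)`` on read and conditional insertion on write, keeps ``py_versions`` optional from one end of the config to the other. A standalone sketch with a hypothetical helper name (not a function in the SDK):

```python
def build_version_entry(registries, repository, py_versions=None, tag_prefix=None):
    """Build a version entry for a framework config, omitting optional keys
    that are unset, mirroring the upgrade tool's handling of py_versions."""
    entry = {"registries": registries, "repository": repository}
    if py_versions:
        # Accept a comma-separated string like "py3,py37" and store a list.
        entry["py_versions"] = py_versions.split(",")
    if tag_prefix:
        entry["tag_prefix"] = tag_prefix
    return entry

print(build_version_entry({"us-east-1": "123456789012"}, "my-algo", py_versions="py3,py37"))
```

Readers of such entries then use ``entry.get("py_versions", None)`` rather than indexing, so frameworks without Python-version tags no longer raise ``KeyError``.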

src/sagemaker/estimator.py

Lines changed: 7 additions & 7 deletions

@@ -285,7 +285,7 @@ def __init__(
         self._enable_network_isolation = enable_network_isolation
 
     @abstractmethod
-    def train_image(self):
+    def training_image_uri(self):
         """Return the Docker image to use for training.
 
         The :meth:`~sagemaker.estimator.EstimatorBase.fit` method, which does

@@ -329,7 +329,7 @@ def _ensure_base_job_name(self):
         """Set ``self.base_job_name`` if it is not set already."""
         # honor supplied base_job_name or generate it
         if self.base_job_name is None:
-            self.base_job_name = base_name_from_image(self.train_image())
+            self.base_job_name = base_name_from_image(self.training_image_uri())
 
     def _get_or_create_name(self, name=None):
         """Generate a name based on the base job name or training image if needed.

@@ -507,7 +507,7 @@ def fit(self, inputs=None, wait=True, logs="All", job_name=None, experiment_conf
 
     def _compilation_job_name(self):
         """Placeholder docstring"""
-        base_name = self.base_job_name or base_name_from_image(self.train_image())
+        base_name = self.base_job_name or base_name_from_image(self.training_image_uri())
         return name_from_base("compilation-" + base_name)
 
     def compile_model(

@@ -1083,7 +1083,7 @@ def start_new(cls, estimator, inputs, experiment_config):
         if isinstance(estimator, sagemaker.algorithm.AlgorithmEstimator):
             train_args["algorithm_arn"] = estimator.algorithm_arn
         else:
-            train_args["image_uri"] = estimator.train_image()
+            train_args["image_uri"] = estimator.training_image_uri()
 
         if estimator.debugger_rule_configs:
             train_args["debugger_rule_configs"] = estimator.debugger_rule_configs

@@ -1350,7 +1350,7 @@ def __init__(
             enable_network_isolation=enable_network_isolation,
         )
 
-    def train_image(self):
+    def training_image_uri(self):
         """Returns the docker image to use for training.
 
         The fit() method, that does the model training, calls this method to

@@ -1424,7 +1424,7 @@ def predict_wrapper(endpoint, session):
         kwargs["enable_network_isolation"] = self.enable_network_isolation()
 
         return Model(
-            image_uri or self.train_image(),
+            image_uri or self.training_image_uri(),
             self.model_data,
             role,
             vpc_config=self.get_vpc_config(vpc_config_override),

@@ -1826,7 +1826,7 @@ class constructor
 
         return init_params
 
-    def train_image(self):
+    def training_image_uri(self):
         """Return the Docker image to use for training.
 
         The :meth:`~sagemaker.estimator.EstimatorBase.fit` method, which does
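``_ensure_base_job_name`` above derives the default job-name prefix from the image returned by ``training_image_uri``. A simplified sketch of that derivation (the SDK's actual ``base_name_from_image`` helper may differ in detail):

```python
import re

def base_name_from_image(image):
    """Extract the repository name from an image URI to use as a job-name base.

    E.g. "382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:1" -> "kmeans".
    Illustrative sketch only; not the SDK's implementation.
    """
    # Optional registry/namespace prefix, then the repo name, then an optional tag.
    m = re.match(r"^(.+/)?([^:/]+)(:[^:]+)?$", image)
    return m.group(2) if m else image

print(base_name_from_image("382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:1"))  # → kmeans
```

The generated training-job name then becomes this base plus a timestamp suffix, which is why jobs launched from a built-in algorithm image get names like ``kmeans-2020-...``.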
