Commit 23b818b: Merge branch 'zwei' into support-multiple-accept
2 parents: 1f1f70d + c4bb695

56 files changed: 920 additions, 660 deletions

doc/v2.rst

Lines changed: 8 additions & 0 deletions

@@ -203,6 +203,12 @@ To view logs after attaching a training job to an estimator, use :func:`sagemake
 until the completion of the Hyperparameter Tuning Job or Batch Transform Job, respectively.
 To make the function non-blocking, use ``wait=False``.
 
+XGBoost Predictor
+-----------------
+
+The default serializer of ``sagemaker.xgboost.model.XGBoostPredictor`` has been changed from ``NumpySerializer`` to ``LibSVMSerializer``.
+
+
 Parameter and Class Name Changes
 ================================
 

@@ -263,6 +269,8 @@ The follow serializer/deserializer classes have been renamed and/or moved:
 | ``sagemaker.predictor._JsonDeserializer`` | ``sagemaker.deserializers.JSONDeserializer`` |
 +--------------------------------------------------------+-------------------------------------------------------+
 
+``sagemaker.serializers.LibSVMSerializer`` has been added in v2.0.
+
 ``distributions``
 ~~~~~~~~~~~~~~~~~
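The LibSVM wire format that the new default serializer emits is easy to sketch. The helper below is hypothetical (it is not part of the SDK); it only illustrates the ``<label> <index>:<value>`` layout, with 1-based indices and zero-valued features omitted:

```python
def to_libsvm(label, features):
    """Render one record in LibSVM format: "<label> <index>:<value> ...".

    Only non-zero features are emitted; indices are 1-based by convention.
    This is an illustrative sketch, not the SDK's LibSVMSerializer.
    """
    pairs = ["%d:%g" % (i + 1, v) for i, v in enumerate(features) if v != 0]
    return " ".join([str(label)] + pairs)

print(to_libsvm(1, [0.0, 2.5, 0.0, 3.0]))  # → 1 2:2.5 4:3
```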

doc/workflows/kubernetes/using_amazon_sagemaker_components.rst

Lines changed: 17 additions & 54 deletions
@@ -463,21 +463,24 @@ you can create your classification pipeline. To create your pipeline,
 you need to define and compile it. You then deploy it and use it to run
 workflows. You can define your pipeline in Python and use the KFP
 dashboard, KFP CLI, or Python SDK to compile, deploy, and run your
-workflows.
+workflows. The full code for the MNIST classification pipeline example is available in the
+`Kubeflow Github
+repository <https://github.com/kubeflow/pipelines/blob/master/samples/contrib/aws-samples/mnist-kmeans-sagemaker>`__.
+To use it, clone the example Python files to your gateway node.
 
 Prepare datasets
 ~~~~~~~~~~~~~~~~
 
-To run the pipelines, you need to have the datasets in an S3 bucket in
-your account. This bucket must be located in the region where you want
-to run Amazon SageMaker jobs. If you don’t have a bucket, create one
+To run the pipelines, you need to upload the data extraction pre-processing script to an S3 bucket. This bucket and all resources for this example must be located in the ``us-east-1`` Amazon Region. If you don’t have a bucket, create one
 using the steps in `Creating a
 bucket <https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html>`__.
 
-From your gateway node, run the `sample dataset
-creation <https://github.com/kubeflow/pipelines/tree/34615cb19edfacf9f4d9f2417e9254d52dd53474/samples/contrib/aws-samples/mnist-kmeans-sagemaker#the-sample-dataset>`__
-script to copy the datasets into your bucket. Change the bucket name in
-the script to the one you created.
+From the ``mnist-kmeans-sagemaker`` folder of the Kubeflow repository you cloned on your gateway node, run the following command to upload the ``kmeans_preprocessing.py`` file to your S3 bucket. Change ``<bucket-name>`` to the name of the S3 bucket you created.
+
+::
+
+    aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
+
 
 Create a Kubeflow Pipeline using Amazon SageMaker Components
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -496,54 +499,14 @@ parameters for each component of your pipeline. These parameters can
 also be updated when using other pipelines. We have provided default
 values for all parameters in the sample classification pipeline file.
 
-The following are the only parameters you may need to modify to run the
-sample pipelines. To modify these parameters, update their entries in
-the sample classification pipeline file.
+The following are the only parameters you need to pass to run the
+sample pipelines. To pass these parameters, update their entries when creating a new run.
 
 - **Role-ARN:** This must be the ARN of an IAM role that has full
   Amazon SageMaker access in your AWS account. Use the ARN
   of  ``kfp-example-pod-role``.
 
-- **The Dataset Buckets**: You must change the S3 bucket with the input
-  data for each of the components. Replace the following with the link
-  to your S3 bucket:
-
-  - **Train channel:** ``"S3Uri": "s3://<your-s3-bucket-name>/data"``
-
-  - **HPO channels for test/HPO channel for
-    train:** ``"S3Uri": "s3://<your-s3-bucket-name>/data"``
-
-  - **Batch
-    transform:** ``"batch-input": "s3://<your-s3-bucket-name>/data"``
-
-- **Output buckets:** Replace the output buckets with S3 buckets you
-  have write permission to. Replace the following with the link to your
-  S3 bucket:
-
-  - **Training/HPO**:
-    ``output_location='s3://<your-s3-bucket-name>/output'``
-
-  - **Batch Transform**:
-    ``batch_transform_ouput='s3://<your-s3-bucket-name>/output'``
-
-- **Region:**\ The default pipelines work in us-east-1. If your
-  cluster is in a different region, update the following:
-
-  - The ``region='us-east-1'`` Parameter in the input list.
-
-  - The algorithm images for Amazon SageMaker. If you use one of
-    the Amazon SageMaker built-in algorithm images, select the image
-    for your region. Construct the image name using the information
-    in `Common parameters for built-in
-    algorithms <https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html>`__.
-    For Example:
-
-    ::
-
-      382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:1
-
-  - The S3 buckets with the dataset. Use the steps in Prepare datasets
-    to copy the data to a bucket in the same region as the cluster.
+- **Bucket**: This is the name of the S3 bucket that you uploaded the ``kmeans_preprocessing.py`` file to.
 
 You can adjust any of the input parameters using the KFP UI and trigger
 your run again.
@@ -632,18 +595,18 @@ currently does not support specifying input parameters while creating
 the run. You need to update your parameters in the Python pipeline file
 before compiling. Replace ``<experiment-name>`` and ``<job-name>``
 with any names. Replace ``<pipeline-id>`` with the ID of your submitted
-pipeline.
+pipeline. Replace ``<your-role-arn>`` with the ARN of ``kfp-example-pod-role``. Replace ``<your-bucket-name>`` with the name of the S3 bucket you created.
 
 ::
 
-   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id>
+   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
 
 You can also directly submit a run using the compiled pipeline package
 created as the output of the ``dsl-compile`` command.
 
 ::
 
-   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output>
+   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
 
 Your output should look like the following:

src/sagemaker/algorithm.py

Lines changed: 2 additions & 2 deletions

@@ -229,13 +229,13 @@ def hyperparameters(self):
         """
         return self.hyperparam_dict
 
-    def train_image(self):
+    def training_image_uri(self):
         """Returns the docker image to use for training.
 
         The fit() method, that does the model training, calls this method to
         find the image to use for model training.
         """
-        raise RuntimeError("train_image is never meant to be called on Algorithm Estimators")
+        raise RuntimeError("training_image_uri is never meant to be called on Algorithm Estimators")
 
     def enable_network_isolation(self):
         """Return True if this Estimator will need network isolation to run.
src/sagemaker/amazon/amazon_estimator.py

Lines changed: 1 addition & 1 deletion

@@ -91,7 +91,7 @@ def __init__(
         )
         self._data_location = data_location
 
-    def train_image(self):
+    def training_image_uri(self):
         """Placeholder docstring"""
         return image_uris.retrieve(
             self.repo_name, self.sagemaker_session.boto_region_name, version=self.repo_version,

src/sagemaker/cli/compatibility/v2/ast_transformer.py

Lines changed: 2 additions & 0 deletions

@@ -35,6 +35,7 @@
     modifiers.renamed_params.SessionCreateEndpointImageURIRenamer(),
     modifiers.training_params.TrainPrefixRemover(),
     modifiers.training_input.TrainingInputConstructorRefactor(),
+    modifiers.training_input.ShuffleConfigModuleRenamer(),
     modifiers.serde.SerdeConstructorRenamer(),
 ]
 

@@ -51,6 +52,7 @@
     modifiers.predictors.PredictorImportFromRenamer(),
     modifiers.tfs.TensorFlowServingImportFromRenamer(),
     modifiers.training_input.TrainingInputImportFromRenamer(),
+    modifiers.training_input.ShuffleConfigImportFromRenamer(),
     modifiers.serde.SerdeImportFromAmazonCommonRenamer(),
     modifiers.serde.SerdeImportFromPredictorRenamer(),
 ]

src/sagemaker/cli/compatibility/v2/modifiers/training_input.py

Lines changed: 70 additions & 0 deletions

@@ -100,3 +100,73 @@ def modify_node(self, node):
         if node.module == "sagemaker.session":
             node.module = "sagemaker.inputs"
         return node
+
+
+class ShuffleConfigModuleRenamer(Modifier):
+    """A class to change ``ShuffleConfig`` usage to use ``sagemaker.inputs.ShuffleConfig``."""
+
+    def node_should_be_modified(self, node):
+        """Checks if the ``ast.Call`` node instantiates a class of interest.
+
+        This looks for the following calls:
+
+        - ``sagemaker.session.ShuffleConfig``
+        - ``session.ShuffleConfig``
+
+        Args:
+            node (ast.Call): a node that represents a function call. For more,
+                see https://docs.python.org/3/library/ast.html#abstract-grammar.
+
+        Returns:
+            bool: If the ``ast.Call`` instantiates a class of interest.
+        """
+        if isinstance(node.func, ast.Name):
+            return False
+
+        return matching.matches_name_or_namespaces(
+            node, "ShuffleConfig", ("sagemaker.session", "session")
+        )
+
+    def modify_node(self, node):
+        """Modifies the ``ast.Call`` node to call ``sagemaker.inputs.ShuffleConfig``.
+
+        Args:
+            node (ast.Call): a node that represents a ``sagemaker.session.ShuffleConfig``
+                constructor.
+
+        Returns:
+            ast.Call: the original node, with its namespace changed to use the ``inputs`` module.
+        """
+        _rename_namespace(node, "session")
+        return node
+
+
+class ShuffleConfigImportFromRenamer(Modifier):
+    """A class to update import statements of ``ShuffleConfig``."""
+
+    def node_should_be_modified(self, node):
+        """Checks if the import statement imports ``sagemaker.session.ShuffleConfig``.
+
+        Args:
+            node (ast.ImportFrom): a node that represents a ``from ... import ... `` statement.
+                For more, see https://docs.python.org/3/library/ast.html#abstract-grammar.
+
+        Returns:
+            bool: If the import statement imports ``sagemaker.session.ShuffleConfig``.
+        """
+        return node.module == "sagemaker.session" and any(
+            name.name == "ShuffleConfig" for name in node.names
+        )
+
+    def modify_node(self, node):
+        """Changes the ``ast.ImportFrom`` node's namespace to ``sagemaker.inputs``.
+
+        Args:
+            node (ast.ImportFrom): a node that represents a ``from ... import ... `` statement.
+                For more, see https://docs.python.org/3/library/ast.html#abstract-grammar.
+
+        Returns:
+            ast.ImportFrom: the original node, with its module modified to ``"sagemaker.inputs"``.
+        """
+        node.module = "sagemaker.inputs"
+        return node
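The import-renaming half of this change can be approximated with the standard library alone. A minimal sketch using ``ast.NodeTransformer`` (assuming Python 3.9+ for ``ast.unparse``; this bypasses the SDK's ``Modifier``/``matching`` machinery):

```python
import ast

class ShuffleConfigRenamer(ast.NodeTransformer):
    """Rewrite ``from sagemaker.session import ShuffleConfig`` so the name
    is imported from ``sagemaker.inputs`` instead."""

    def visit_ImportFrom(self, node):
        # Only touch imports of ShuffleConfig from sagemaker.session.
        if node.module == "sagemaker.session" and any(
            alias.name == "ShuffleConfig" for alias in node.names
        ):
            node.module = "sagemaker.inputs"
        return node

source = "from sagemaker.session import ShuffleConfig"
tree = ShuffleConfigRenamer().visit(ast.parse(source))
print(ast.unparse(tree))  # → from sagemaker.inputs import ShuffleConfig
```

The real tool walks every node with a list of such modifiers, so a single pass over a file applies all v2 renames at once.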

src/sagemaker/cli/framework_upgrade.py

Lines changed: 9 additions & 6 deletions

@@ -41,9 +41,9 @@ def get_latest_values(existing_content, scope=None):
         )
 
     latest_version = list(existing_content["versions"].keys())[-1]
-    registries = existing_content["versions"][latest_version]["registries"]
-    py_versions = existing_content["versions"][latest_version]["py_versions"]
-    repository = existing_content["versions"][latest_version]["repository"]
+    registries = existing_content["versions"][latest_version].get("registries", None)
+    py_versions = existing_content["versions"][latest_version].get("py_versions", None)
+    repository = existing_content["versions"][latest_version].get("repository", None)
 
     return registries, py_versions, repository
 

@@ -92,8 +92,9 @@ def add_dlc_framework_version(
     new_version = {
         "registries": registries,
         "repository": repository,
-        "py_versions": py_versions,
     }
+    if py_versions:
+        new_version["py_versions"] = py_versions
     existing_content[scope]["versions"][full_version] = new_version
 

@@ -128,10 +129,11 @@ def add_algo_version(
         existing_content["scope"].append(scope)
 
     new_version = {
-        "py_versions": py_versions,
         "registries": registries,
         "repository": repository,
     }
+    if py_versions:
+        new_version["py_versions"] = py_versions
     if tag_prefix:
         new_version["tag_prefix"] = tag_prefix
     existing_content["versions"][full_version] = new_version

@@ -171,7 +173,8 @@ def add_version(
         py_versions (str): Supported Python versions (e.g. "py3,py37").
         tag_prefix (str): Algorithm image's tag prefix.
     """
-    py_versions = py_versions.split(",")
+    if py_versions:
+        py_versions = py_versions.split(",")
     processors = processors.split(",")
     latest_registries, latest_py_versions, latest_repository = get_latest_values(
         existing_content, scope
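The pattern adopted above, ``.get(...)`` on read and conditional insertion on write, keeps ``py_versions`` optional from one end of the config to the other. A standalone sketch with a hypothetical helper name (not a function in the SDK):

```python
def build_version_entry(registries, repository, py_versions=None, tag_prefix=None):
    """Build a version entry for a framework config, omitting optional keys
    that are unset, mirroring the upgrade tool's handling of py_versions."""
    entry = {"registries": registries, "repository": repository}
    if py_versions:
        # Accept a comma-separated string like "py3,py37" and store a list.
        entry["py_versions"] = py_versions.split(",")
    if tag_prefix:
        entry["tag_prefix"] = tag_prefix
    return entry

print(build_version_entry({"us-east-1": "123456789012"}, "my-algo", py_versions="py3,py37"))
```

Readers of such entries then use ``entry.get("py_versions", None)`` rather than indexing, so frameworks without Python-version tags no longer raise ``KeyError``.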

src/sagemaker/estimator.py

Lines changed: 7 additions & 7 deletions

@@ -285,7 +285,7 @@ def __init__(
         self._enable_network_isolation = enable_network_isolation
 
     @abstractmethod
-    def train_image(self):
+    def training_image_uri(self):
         """Return the Docker image to use for training.
 
         The :meth:`~sagemaker.estimator.EstimatorBase.fit` method, which does

@@ -329,7 +329,7 @@ def _ensure_base_job_name(self):
         """Set ``self.base_job_name`` if it is not set already."""
         # honor supplied base_job_name or generate it
         if self.base_job_name is None:
-            self.base_job_name = base_name_from_image(self.train_image())
+            self.base_job_name = base_name_from_image(self.training_image_uri())
 
     def _get_or_create_name(self, name=None):
         """Generate a name based on the base job name or training image if needed.

@@ -507,7 +507,7 @@ def fit(self, inputs=None, wait=True, logs="All", job_name=None, experiment_conf
 
     def _compilation_job_name(self):
         """Placeholder docstring"""
-        base_name = self.base_job_name or base_name_from_image(self.train_image())
+        base_name = self.base_job_name or base_name_from_image(self.training_image_uri())
         return name_from_base("compilation-" + base_name)
 
     def compile_model(

@@ -1083,7 +1083,7 @@ def start_new(cls, estimator, inputs, experiment_config):
         if isinstance(estimator, sagemaker.algorithm.AlgorithmEstimator):
             train_args["algorithm_arn"] = estimator.algorithm_arn
         else:
-            train_args["image_uri"] = estimator.train_image()
+            train_args["image_uri"] = estimator.training_image_uri()
 
         if estimator.debugger_rule_configs:
             train_args["debugger_rule_configs"] = estimator.debugger_rule_configs

@@ -1350,7 +1350,7 @@ def __init__(
             enable_network_isolation=enable_network_isolation,
         )
 
-    def train_image(self):
+    def training_image_uri(self):
         """Returns the docker image to use for training.
 
         The fit() method, that does the model training, calls this method to

@@ -1424,7 +1424,7 @@ def predict_wrapper(endpoint, session):
         kwargs["enable_network_isolation"] = self.enable_network_isolation()
 
         return Model(
-            image_uri or self.train_image(),
+            image_uri or self.training_image_uri(),
             self.model_data,
             role,
             vpc_config=self.get_vpc_config(vpc_config_override),

@@ -1826,7 +1826,7 @@ class constructor
 
         return init_params
 
-    def train_image(self):
+    def training_image_uri(self):
         """Return the Docker image to use for training.
 
         The :meth:`~sagemaker.estimator.EstimatorBase.fit` method, which does
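``_ensure_base_job_name`` above derives the default job-name prefix from the image returned by ``training_image_uri``. A simplified sketch of that derivation (the SDK's actual ``base_name_from_image`` helper may differ in detail):

```python
import re

def base_name_from_image(image):
    """Extract the repository name from an image URI to use as a job-name base.

    E.g. "382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:1" -> "kmeans".
    Illustrative sketch only; not the SDK's implementation.
    """
    # Optional registry/namespace prefix, then the repo name, then an optional tag.
    m = re.match(r"^(.+/)?([^:/]+)(:[^:]+)?$", image)
    return m.group(2) if m else image

print(base_name_from_image("382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:1"))  # → kmeans
```

The generated training-job name then becomes this base plus a timestamp suffix, which is why jobs launched from a built-in algorithm image get names like ``kmeans-2020-...``.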
