Commit c4bb695

doc: update KFP full pipeline (#1771)
1 parent 4293c26 commit c4bb695

File tree

1 file changed: +17 -54 lines changed

doc/workflows/kubernetes/using_amazon_sagemaker_components.rst

Lines changed: 17 additions & 54 deletions
@@ -463,21 +463,24 @@ you can create your classification pipeline. To create your pipeline,
 you need to define and compile it. You then deploy it and use it to run
 workflows. You can define your pipeline in Python and use the KFP
 dashboard, KFP CLI, or Python SDK to compile, deploy, and run your
-workflows.
+workflows. The full code for the MNIST classification pipeline example is available in the
+`Kubeflow Github
+repository <https://github.com/kubeflow/pipelines/blob/master/samples/contrib/aws-samples/mnist-kmeans-sagemaker>`__.
+To use it, clone the example Python files to your gateway node.
 
 Prepare datasets
 ~~~~~~~~~~~~~~~~
 
-To run the pipelines, you need to have the datasets in an S3 bucket in
-your account. This bucket must be located in the region where you want
-to run Amazon SageMaker jobs. If you don’t have a bucket, create one
+To run the pipelines, you need to upload the data extraction pre-processing script to an S3 bucket. This bucket and all resources for this example must be located in the ``us-east-1`` Amazon Region. If you don’t have a bucket, create one
 using the steps in `Creating a
 bucket <https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html>`__.
 
-From your gateway node, run the `sample dataset
-creation <https://github.com/kubeflow/pipelines/tree/34615cb19edfacf9f4d9f2417e9254d52dd53474/samples/contrib/aws-samples/mnist-kmeans-sagemaker#the-sample-dataset>`__
-script to copy the datasets into your bucket. Change the bucket name in
-the script to the one you created.
+From the ``mnist-kmeans-sagemaker`` folder of the Kubeflow repository you cloned on your gateway node, run the following command to upload the ``kmeans_preprocessing.py`` file to your S3 bucket. Change ``<bucket-name>`` to the name of the S3 bucket you created.
+
+::
+
+   aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
+
 
 Create a Kubeflow Pipeline using Amazon SageMaker Components
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
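
The ``aws s3 cp`` step added in the hunk above can also be scripted in Python. The following is a minimal, hedged sketch outside the scope of this commit: it assumes boto3 is installed and AWS credentials are configured on your gateway node, and ``<bucket-name>`` remains the placeholder for the bucket you created.

::

   # Hedged sketch: upload kmeans_preprocessing.py with boto3 instead of the AWS CLI.
   # Assumes boto3 is installed and credentials are configured on the gateway node.
   import boto3

   bucket = "<bucket-name>"  # placeholder: the us-east-1 bucket you created
   key = "mnist_kmeans_example/processing_code/kmeans_preprocessing.py"

   s3 = boto3.client("s3", region_name="us-east-1")
   s3.upload_file("mnist-kmeans-sagemaker/kmeans_preprocessing.py", bucket, key)
   print(f"Uploaded to s3://{bucket}/{key}")
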
@@ -496,54 +499,14 @@ parameters for each component of your pipeline. These parameters can
 also be updated when using other pipelines. We have provided default
 values for all parameters in the sample classification pipeline file.
 
-The following are the only parameters you may need to modify to run the
-sample pipelines. To modify these parameters, update their entries in
-the sample classification pipeline file.
+The following are the only parameters you need to pass to run the
+sample pipelines. To pass these parameters, update their entries when creating a new run.
 
 - **Role-ARN:** This must be the ARN of an IAM role that has full
   Amazon SageMaker access in your AWS account. Use the ARN
   of  ``kfp-example-pod-role``.
 
-- **The Dataset Buckets**: You must change the S3 bucket with the input
-  data for each of the components. Replace the following with the link
-  to your S3 bucket:
-
-  - **Train channel:** ``"S3Uri": "s3://<your-s3-bucket-name>/data"``
-
-  - **HPO channels for test/HPO channel for
-    train:** ``"S3Uri": "s3://<your-s3-bucket-name>/data"``
-
-  - **Batch
-    transform:** ``"batch-input": "s3://<your-s3-bucket-name>/data"``
-
-- **Output buckets:** Replace the output buckets with S3 buckets you
-  have write permission to. Replace the following with the link to your
-  S3 bucket:
-
-  - **Training/HPO**:
-    ``output_location='s3://<your-s3-bucket-name>/output'``
-
-  - **Batch Transform**:
-    ``batch_transform_ouput='s3://<your-s3-bucket-name>/output'``
-
-- **Region:**\ The default pipelines work in us-east-1. If your
-  cluster is in a different region, update the following:
-
-  - The ``region='us-east-1'`` Parameter in the input list.
-
-  - The algorithm images for Amazon SageMaker. If you use one of
-    the Amazon SageMaker built-in algorithm images, select the image
-    for your region. Construct the image name using the information
-    in `Common parameters for built-in
-    algorithms <https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html>`__.
-    For Example:
-
-    ::
-
-       382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:1
-
-  - The S3 buckets with the dataset. Use the steps in Prepare datasets
-    to copy the data to a bucket in the same region as the cluster.
+- **Bucket**: This is the name of the S3 bucket that you uploaded the ``kmeans_preprocessing.py`` file to.
 
 You can adjust any of the input parameters using the KFP UI and trigger
 your run again.
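
As context for the two run-time parameters above (**Role-ARN** and **Bucket**, passed as ``role_arn`` and ``bucket_name`` later on this page), the following is a hedged sketch of supplying them through the KFP Python SDK instead of the KFP UI. It assumes the ``kfp`` SDK is installed on your gateway node; the package file name is an assumption and should match the output of your ``dsl-compile`` command.

::

   # Hedged sketch: pass role_arn and bucket_name when creating a run via the KFP Python SDK.
   import kfp

   client = kfp.Client()  # uses the Kubeflow Pipelines endpoint configured for this node

   client.create_run_from_pipeline_package(
       "mnist-classification-pipeline.py.tar.gz",  # assumed name of the compiled package
       arguments={
           "role_arn": "<your-role-arn>",        # ARN of kfp-example-pod-role
           "bucket_name": "<your-bucket-name>",  # bucket holding kmeans_preprocessing.py
       },
       run_name="<job-name>",
       experiment_name="<experiment-name>",
   )
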
@@ -632,18 +595,18 @@ currently does not support specifying input parameters while creating
 the run. You need to update your parameters in the Python pipeline file
 before compiling. Replace ``<experiment-name>`` and ``<job-name>``
 with any names. Replace ``<pipeline-id>`` with the ID of your submitted
-pipeline.
+pipeline. Replace ``<your-role-arn>`` with the ARN of ``kfp-example-pod-role``. Replace ``<your-bucket-name>`` with the name of the S3 bucket you created.
 
 ::
 
-   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id>
+   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
 
 You can also directly submit a run using the compiled pipeline package
 created as the output of the ``dsl-compile`` command.
 
 ::
 
-   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output>
+   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
 
 Your output should look like the following:
 
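
The ``dsl-compile`` and ``kfp run submit`` flow shown in the last hunk has a programmatic counterpart in the KFP Python SDK. The skeleton below is illustrative only, not the contents of the sample file: it shows why ``role_arn`` and ``bucket_name`` can be accepted as run-time arguments, with the pipeline body left as a stand-in.

::

   # Illustrative, hedged skeleton: a KFP pipeline exposing role_arn and bucket_name as
   # run-time parameters, compiled into the package that `kfp run submit` consumes.
   # The real sample pipeline wires these values into the SageMaker components.
   from kfp import dsl, compiler

   @dsl.pipeline(
       name="MNIST Classification pipeline",
       description="Illustrative stand-in for the sample pipeline definition",
   )
   def mnist_classification(role_arn="", bucket_name=""):
       # SageMaker processing, training, HPO, and batch transform steps would go here.
       pass

   if __name__ == "__main__":
       compiler.Compiler().compile(mnist_classification, "mnist-classification-pipeline.py.tar.gz")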