-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feature: Support all processor types in ProcessingStep #2209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: Support all processor types in ProcessingStep #2209
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
…ub.com/staubhp/sagemaker-python-sdk into processing-step-support-all-processors
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I am mainly concerned about ensuring that extending classes implement their additional layer. I'm wondering if an abstractmethod might be the best solution for this?
However, looking at this base class, it looks like it isn't an abstract base class, so it might be a lot more work. So I am okay with this default method definition as well.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Co-authored-by: Payton Staub <[email protected]> Co-authored-by: Dan <[email protected]> Co-authored-by: Ahsan Khan <[email protected]>
Issue #, if available:
#2197
Description of changes:
SageMaker ML Pipelines was launched at reInvent 2020. These pipelines are composed of steps representing various SageMaker jobs/actions, and are built using the SageMaker Python SDK.
The ProcessingStep constructor mostly matches the
run()
method on theScriptProcessor
class. This works for processors that inheritScriptProcessor
directly likeSKLearnProcessor
, but presents a problem for processors that have an additional layer of abstraction, likePySparkProcessor
. The latter take special arguments in theirrun()
method which are transformed in that method and passed to their parent class. Because of this disconnect, users don't have an easy way of passing arguments for such processors toProcessingStep
.This change adds a
get_run_args()
method to theScriptProcessor
base class which subclasses can override. This method takes the same arguments as the processor'srun()
method and does the necessary transformation, returning aRunArgs
object with properties matching those on theProcessingStep
constructor.Testing done:
This change adds unit tests and integ tests for the new method. In addition, it was tested manually in a local notebook.
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_base
to create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.