Skip to content

feature: EMR step runtime role support #3703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 23, 2023

Conversation

thbrooks22
Copy link
Contributor

@thbrooks22 thbrooks22 commented Mar 8, 2023

Issue #, if available: N/A

Description of changes: Allow runtime_role parameter to EMRStep step in SageMaker Pipelines. Note: I'd like to get this reviewed now, but we cannot merge until the backend changes are available in all new regions. I will keep this PR updated with that info, and engage SDK team as needed with updates.

Testing done: Two new unit tests, and one modification to an existing test.

  1. [New]: Check that the pipeline JSON contains ExecutionRoleArn fields where appropriate if it's included in the EMRStep arguments.
  2. [New]: Check that SDK throws exception if cluster_config and execution_role_arn are both not None.
  3. [Existing]: Check that if ExecutionRoleArn is not None, then it's not serialized to pipelines JSON.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@thbrooks22 thbrooks22 requested a review from a team as a code owner March 8, 2023 17:44
@thbrooks22 thbrooks22 requested review from knikure and removed request for a team March 8, 2023 17:44
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: acfafd8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: acfafd8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: acfafd8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: acfafd8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@knikure knikure self-assigned this Mar 8, 2023
@knikure
Copy link
Contributor

knikure commented Mar 8, 2023

@thbrooks22 Can you fix the flake8 and black check issues in your code?
Run tox -e black-format and then ./.githooks/pre-push to run all the checks.

@@ -155,6 +163,7 @@ def __init__(
depends_on: Optional[List[Union[str, Step, StepCollection]]] = None,
cache_config: CacheConfig = None,
cluster_config: Dict[str, Any] = None,
execution_role_arn: str = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class init method is public facing, which means once we release this change, it's not able to update the parameters here until next big version bump up, e.g. v2 -> v3.
So, have we considered why this is a good place to add this execution_role_arn arg? Or in other words, why execution_role_arn is not appropriate to be added in step_config.

Also let's add @aoguo64 to review as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

step_config is mapped to StepConfig in EMR's AddJobFlowSteps, it's not a single-step configuration but applies to all steps to run in a single AddJobFlowSteps (though Pipelines does only run one job step per EMR step).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qidewenwhen are you suggesting using kwargs intead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then step_config is not a good place to go, in terms of execution_role_arn.

I'm just hoping to raise discussion and caution on this public facing interface change as it's not revertible.
And I do have a concern that if EMR side keeps adding new args, our EMRStep interface will keep expanding and eventually it may be hard to use, see example of the obsoleted RegisterModel interface which contains ~30 args: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/workflow/step_collections.py#L57-L94

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So **kwargs seems to be able to resolve my concern above but usually it may not be a good practice to include **kwargs.
Thus I'd suggest to have thorough consideration on this step interface and have more eyes on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced up offline: there's no better approach to avoid the call out issue. We will introduce a step_arg parameter to group all EMRStep argument related parameters in the future when the step interface grows too big.

@@ -198,9 +208,16 @@ def __init__(
if cluster_id is not None and cluster_config is not None:
raise ValueError(ERR_STR_WITH_BOTH_CLUSTER_ID_AND_CLUSTER_CFG.format(step_name=name))

if execution_role_arn is not None and cluster_config is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems execution_role_arn has to come along with cluster_id and the former is used to allow accessing the cluster_id right?

If it's true, then can we make the logic here more intuitive? e.g. check that execution_role_arn can be not None only if cluster_id is not None.

And it'd be good to improve the error message to make it more informative, e.g. "execution_role_arn should not be set if ... since it's used to ...", so that it can educate the users why the proper usage looks like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the two ways are equally intuitive here. There are two ways for the customer to make an EMR step: by specifying ID of existing cluster to add steps to, or by specifying config of new cluster to be created with steps. EMR runtime roles are only supported on the former request (where AddJobFlowSteps API is used) and not on the latter (where RunJobFlow API is used).

How much info should the error message include? Should it contain all the above background?

Copy link
Contributor

@qidewenwhen qidewenwhen Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it'd be better to include the following in the error message:

  • The context - what EMR runtime role is used for, i.e. to add step with the existing cluster.
    • at least we should include this in the sdk doc string as we don't expect customers to carefully explore all related docs, e.g. API docs
  • What's wrong - i.e. EMR runtime role should not be set if cluster_config is not None.

Given this context, that's why I feel it'd be better to group execution_role_arn validation with cluster_id. But as long as we put enough info in the doc string or error message to educate users and future developers, either way is fine.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: acfafd8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@qidewenwhen
Copy link
Contributor

As the PR is public facing and is retrievable after merge, let's avoid adding any internal link here:

Issue #, if available: Customer issue tracked here.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: acfafd8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: acfafd8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: acfafd8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: acfafd8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: acfafd8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@thbrooks22 thbrooks22 force-pushed the emr_step_runtime_role_support branch from acfafd8 to ea28e10 Compare March 9, 2023 20:56
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: ea28e10
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: ea28e10
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: ea28e10
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: ea28e10
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@codecov-commenter
Copy link

codecov-commenter commented Mar 9, 2023

Codecov Report

Merging #3703 (a0f11f8) into master (4fb7c01) will decrease coverage by 0.78%.
The diff coverage is 100.00%.

❗ Current head a0f11f8 differs from pull request most recent head 1e0790a. Consider uploading reports for the commit 1e0790a to get more accurate results

@@            Coverage Diff             @@
##           master    #3703      +/-   ##
==========================================
- Coverage   89.84%   89.06%   -0.78%     
==========================================
  Files         984      231     -753     
  Lines       92360    22835   -69525     
==========================================
- Hits        82984    20339   -62645     
+ Misses       9376     2496    -6880     
Impacted Files Coverage Δ
src/sagemaker/workflow/emr_step.py 100.00% <100.00%> (ø)

... and 1214 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: ea28e10
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@@ -198,9 +208,16 @@ def __init__(
if cluster_id is not None and cluster_config is not None:
raise ValueError(ERR_STR_WITH_BOTH_CLUSTER_ID_AND_CLUSTER_CFG.format(step_name=name))

if execution_role_arn is not None and cluster_config is not None:
raise ValueError(ERR_STR_WITH_BOTH_EXEC_ROLE_ARN_AND_CLUSTER_CFG.format(step_name=name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we need to validate? This seems like a requirement from EMR side right? Should we start the EMR job and let it do this validation instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this validation in the DSL. I used the same pattern that we are enforcing for cluster_id and cluster_config: for these, we are checking both in the SDK and in the DSL that they are not both set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curious, do you know is there any reason we need to do this checking in both places?
If the DSL validation already covers these and they happen during creating pipeline time, we don't need to repeat the same validation in SDK.
However, I'm fine to follow the same pattern that we already have for cluster_id and cluster_config in this PR.

@knikure knikure added the do-not-merge Use when PR needs to be marked as do not merge label Mar 14, 2023
@thbrooks22 thbrooks22 force-pushed the emr_step_runtime_role_support branch 2 times, most recently from 2caa265 to a0f11f8 Compare March 14, 2023 16:55
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: a0f11f8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: a0f11f8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: a0f11f8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: a0f11f8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: a0f11f8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@thbrooks22 thbrooks22 force-pushed the emr_step_runtime_role_support branch from a0f11f8 to 1e0790a Compare March 22, 2023 20:27
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 1e0790a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 1e0790a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 1e0790a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 1e0790a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 1e0790a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 1e0790a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@thbrooks22 thbrooks22 removed the do-not-merge Use when PR needs to be marked as do not merge label Mar 23, 2023
@knikure knikure merged commit 04e3f60 into aws:master Mar 23, 2023
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 1e0790a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

doddaspk-amzn pushed a commit to doddaspk-amzn/sagemaker-python-sdk that referenced this pull request Apr 6, 2023
nmadan pushed a commit to nmadan/sagemaker-python-sdk that referenced this pull request Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants