-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: disable profiler by default for regions not support it #2134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: disable profiler by default for regions not support it #2134
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
@@ -349,6 +349,9 @@ def __init__( | |||
self.profiler_config = profiler_config | |||
self.disable_profiler = disable_profiler | |||
|
|||
if not _region_supports_profiler(self.sagemaker_session.boto_region_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think a test exercising this branch is warranted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have been passed in Unit and integ test (mxnet using default profiler confit in test profiler) from my side. Do you think it's necessary to test the branch in environment of China regions?
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
@metrizable can you help check the failed test of './amazon-sagemaker-examples/sagemaker-python-sdk/mxnet_onnx_eia/mxnet_onnx_eia.ipynb' in sagemaker-python-sdk-notebook-tests? (container not passing health check), as this PR is not for deployment. I have tested 'amazon-sagemaker-examples/sagemaker-python-sdk/mxnet_onnx_eia/mxnet_onnx_eia.ipynb' on 2.24.5 in us-west-2, it's ok. |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to throw an exception on the client side if the user attempts to use the profiler explicitly in an unsupported region?
I would rather the user's job request fail fast if they attempt to use a feature in an unsupported region, instead of defaulting everyone off and having the job succeed. The user may get confused about why the profiler wasn't enabled in their job, even though they set the flag.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Thanks for the concern. I have received more than 10 cases from customers in China and the error log cannot directly show them how to handle this fail. I also worry about those who are new to SageMaker and we haven't touched directly in China and encounter this fail when start a very simple training job.(this may lead to user churn) It's better to change this parameter to 'True' for the region not supporting Profiler. I also agree with your concern about those who set this to 'False' but not enabled. So I recommend that we can add a print log inside this 'if' logic, to tell the user that this region is not supporting Profiler and we automatically set this to ‘True’, to eliminate their confusion. |
Is profiler on by default for all regions or is it because users are attempting to use it in an unsupported region? |
(I correct my above comment for the value of this parameter) |
Well... since it is on by default, I don't think you can discern easily whether or not a user wants to enable this feature in an unsupported region in the current if statement. You would have to do it where the defaults happen. At least how I understand it is, if we add print statement in here, everyone in an unsupported region will get this print region, even if they aren't aware of this feature. If the feature doesn't exist in that region, then disabling it by default seems fine. I am wondering why we enabled this feature by default and didn't make it an opt-in feature at release. |
I totally agree with you about why we enable this feature by default. I don't know the specific reason maybe have some other considerations. So in this PR I propose the commits after the default setting for unsupported regions, instead of changing the default parameter from 'False' to 'None' in the init function. Changing the default opt is an important issue I don't think we can do it in this PR. |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
In China regions (cn-north-1/cn-northwest-1), if sdk is updated from 1 to latest one and start a training job, an error occurrs:
Many customers have reported this error (No direct solution to it in the error log) and we have to help resolve it one by one, and this also brings potential customer's churn if we haven't covered them yet.
Description of changes:
add logic of judging whether region supports profiler, if not, set 'disable_profiler' to True.
Testing done:
Unit Testing
Integ Testing
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_base
to create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.