-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feature: support JSON for input dataset and model output #3687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Codecov Report
@@ Coverage Diff @@
## master #3687 +/- ##
==========================================
- Coverage 89.79% 89.02% -0.77%
==========================================
Files 980 230 -750
Lines 92208 22811 -69397
==========================================
- Hits 82796 20307 -62489
+ Misses 9412 2504 -6908
... and 1209 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -311,7 +312,7 @@ def __init__( | |||
s3_analysis_config_output_path: Optional[str] = None, | |||
label: Optional[str] = None, | |||
headers: Optional[List[str]] = None, | |||
features: Optional[List[str]] = None, | |||
features: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a backward compatible change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mypy type was incorrect. When schema validation is done, it's required to be a str
, so it would have failed anyways: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/clarify.py#L62
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chasing back the changes: 853e4d8#diff-5742d4f1aec0dedeca53accab9a77e0a63cb45e6044419975002be003357df82L68
It was accidentally being changed to List[str], which should be str.
Better re-examine all these types since it's not being added by our team. @spoorn
src/sagemaker/clarify.py
Outdated
if dataset_type not in [ | ||
"text/csv", | ||
"application/jsonlines", | ||
"application/json", | ||
"application/x-parquet", | ||
"application/x-image", | ||
]: | ||
raise ValueError( | ||
f"Invalid dataset_type '{dataset_type}'." | ||
f" Please check the API documentation for the supported dataset types." | ||
) | ||
# parameters for analysis on datasets without facets are only supported for CSV datasets | ||
if dataset_type != "text/csv": | ||
if dataset_type not in [ | ||
"text/csv", | ||
"application/jsonlines", | ||
"application/json", | ||
"application/x-parquet", | ||
]: | ||
if predicted_label: | ||
raise ValueError( | ||
f"The parameter 'predicted_label' is not supported" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check on line 394 do not make sense to me. Can you please explain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicted_label
is the model predicted labels that the user can optionally provide. excluded_columns
are the column(s) that are excluded from model inference and the Clarify report/computations. These are only supported for tabular datasets, thus we use an inclusive allowlist here and exclude other datasets such as images.
I added a comment to the code
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/bot run slow-tests
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
/bot run slow-tests |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/bot run pr,slow-tests
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Co-authored-by: Michael Trinh <[email protected]>
Issue #, if available: https://issues.amazon.com/issues/RAI-4252, https://issues.amazon.com/issues/RAI-4256
Description of changes: Add support for JSON for input dataset and model output for Clarify jobs. This includes enabling
application/json
type, updating documentation for fields' support for JSON, and removing some checks that blockedapplication/json
or some other types for functionality (predicted_label
andexcluded_columns
support).A follow up PR will be done to support JSON for model inputs.
Testing done: Ran the unit tests locally, manual testing using a notebook
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_base
to create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.