Added doc update for dataset builder #3539

Merged: 19 commits, Dec 15, 2022

mizanfiu (Contributor)

Issue #, if available:
Added doc update for dataset builder

Description of changes:
Added doc update for dataset builder

Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

imingtsou and others added 18 commits December 13, 2022 01:00

* Bug fixed - as_of, event_range, join, default behavior and duplicates and tests

Bugs:
1. as_of was not working properly on deleted events
2. Same for event_time_range
3. Join was not working when feature names were included
4. The default SQL was returning only the most recent record, whereas it should return all records excluding duplicates
5. Include duplicates was not returning all non-deleted data
6. The isinstance(dataframe) case was also applied to non-DataFrame cases during join
7. Include column was returning unnecessary columns.

* Fix on pylint error

* Fix on include_duplicated_records for pandas DataFrames

* Fix format issue for black

* Bug fixed related to line break

* Bug fix related to dataframe and include_deleted_records and include_duplicated_records

* Addressed comments and code refactored

* changed to_csv to to_csv_file and added error messages for query limit and recent record limit

* Revert a change which was not intended

* Resolved the leak of feature group deletion in integration test
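
The intended behaviour named in the bug list above (default output returns all records except duplicates and deleted ones, and as_of respects deletions) can be illustrated with a small pure-Python sketch. This is not the SDK's implementation or its SQL: `select_rows`, the row fields (`record_id`, `event_time`, `write_time`, `is_deleted`) and the rule "a record counts as deleted when its latest visible event is a tombstone" are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical offline-store rows; the field names are illustrative only.
ROWS = [
    {"record_id": "a", "event_time": datetime(2022, 12, 1),
     "write_time": datetime(2022, 12, 1, 0, 1), "value": 1, "is_deleted": False},
    # Same record and event time written again: a duplicate.
    {"record_id": "a", "event_time": datetime(2022, 12, 1),
     "write_time": datetime(2022, 12, 1, 0, 5), "value": 1, "is_deleted": False},
    {"record_id": "a", "event_time": datetime(2022, 12, 3),
     "write_time": datetime(2022, 12, 3, 0, 1), "value": 2, "is_deleted": False},
    {"record_id": "b", "event_time": datetime(2022, 12, 2),
     "write_time": datetime(2022, 12, 2, 0, 1), "value": 10, "is_deleted": False},
    # Tombstone row: record "b" is deleted on Dec 4.
    {"record_id": "b", "event_time": datetime(2022, 12, 4),
     "write_time": datetime(2022, 12, 4, 0, 1), "value": None, "is_deleted": True},
]


def select_rows(rows, include_duplicates=False, include_deleted=False, as_of=None):
    """Filter rows under the assumed semantics described in the lead-in."""
    # Step 1: as_of hides every event after the cut-off, tombstones included.
    visible = [r for r in rows if as_of is None or r["event_time"] <= as_of]

    # Step 2: drop records whose latest visible event is a tombstone.
    if not include_deleted:
        latest = {}
        for r in visible:
            cur = latest.get(r["record_id"])
            stamp = (r["event_time"], r["write_time"])
            if cur is None or stamp > (cur["event_time"], cur["write_time"]):
                latest[r["record_id"]] = r
        deleted_ids = {rid for rid, r in latest.items() if r["is_deleted"]}
        visible = [r for r in visible if r["record_id"] not in deleted_ids]

    # Step 3: keep only the last-written row per (record_id, event_time).
    if not include_duplicates:
        newest = {}
        for r in visible:
            key = (r["record_id"], r["event_time"])
            if key not in newest or r["write_time"] > newest[key]["write_time"]:
                newest[key] = r
        visible = list(newest.values())

    return sorted(visible, key=lambda r: (r["record_id"], r["event_time"], r["write_time"]))


default_rows = select_rows(ROWS)
as_of_rows = select_rows(ROWS, as_of=datetime(2022, 12, 3))
print(len(default_rows), len(as_of_rows))
```

Under these assumptions the default call keeps both events of record "a" rather than only the most recent one, and an as_of cut-off before the deletion of "b" still returns "b", which is the kind of behaviour items 1, 4 and 5 describe.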
@mizanfiu mizanfiu requested a review from a team as a code owner December 14, 2022 18:59
@mizanfiu mizanfiu requested review from claytonparnell and removed request for a team December 14, 2022 18:59
@mizanfiu (Contributor Author, posted twice)

/bot run all

@navinsoni (Contributor) left a comment

/bot run all

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Reports for commit ac8954a (build logs available for 30 days)

  • sagemaker-python-sdk-local-mode-tests: SUCCEEDED
  • sagemaker-python-sdk-pr: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests: SUCCEEDED
  • sagemaker-python-sdk-slow-tests: SUCCEEDED
  • sagemaker-python-sdk-unit-tests: FAILED

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@claytonparnell (Collaborator) left a comment

/bot run unit-tests

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Reports for commit c948ee8 (build logs available for 30 days)

  • sagemaker-python-sdk-local-mode-tests: SUCCEEDED
  • sagemaker-python-sdk-pr: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests: SUCCEEDED
  • sagemaker-python-sdk-slow-tests: SUCCEEDED
  • sagemaker-python-sdk-unit-tests: FAILED

codecov-commenter commented Dec 14, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.78%. Comparing base (1cbfc83) to head (c948ee8).
Report is 1447 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3539      +/-   ##
==========================================
- Coverage   89.57%   88.78%   -0.80%     
==========================================
  Files         960      226     -734     
  Lines       88744    21945   -66799     
==========================================
- Hits        79495    19483   -60012     
+ Misses       9249     2462    -6787     
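
The percentages in the diff above can be re-derived from its hits and lines counts. The sketch below assumes coverage is simply hits divided by lines (the counts in this report satisfy hits + misses = lines, so no partial lines are involved); the names `BASE`, `HEAD` and `coverage_percent` are illustrative.

```python
# Figures copied from the coverage diff: "master" column (base) and "#3539" column (head).
BASE = {"hits": 79495, "misses": 9249, "lines": 88744}
HEAD = {"hits": 19483, "misses": 2462, "lines": 21945}


def coverage_percent(hits, lines):
    """Return line coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


base_cov = coverage_percent(BASE["hits"], BASE["lines"])
head_cov = coverage_percent(HEAD["hits"], HEAD["lines"])
delta = head_cov - base_cov

print(f"base={base_cov:.3f}% head={head_cov:.3f}% delta={delta:+.3f}")
```

The exact base value is about 89.578%, so the displayed 89.57% looks truncated rather than rounded; that is an observation from these numbers only.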


@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: c948ee8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@claytonparnell claytonparnell merged commit eef679c into aws:master Dec 15, 2022
claytonparnell pushed a commit to claytonparnell/sagemaker-python-sdk that referenced this pull request Dec 16, 2022
* Add list_feature_groups API (aws#647)

* feat: Feature/get record api (aws#650)

Co-authored-by: Eric Zou

* Add delete_record API (aws#664)

* feat: Add DatasetBuilder class (aws#667)

Co-authored-by: Eric Zou

* feat: Add to_csv method in DatasetBuilder (aws#699)

* feat: Add pandas.Dataframe as base case (aws#708)

* feat: Add with_feature_group method in DatasetBuilder (aws#726)

* feat: Handle merge and timestamp filters (aws#727)

* feat: Add to_dataframe method in DatasetBuilder (aws#729)

* Address TODOs (aws#731)

* Unit test for DatasetBuilder (aws#734)

* fix: Fix list_feature_groups max_results (aws#744)

* Add integration tests for create_dataset (aws#743)

* feature: Aggregate commits

* fix: as_of, event_range, join, default behavior and duplicates… (aws#764)

* Bug fixed - as_of, event_range, join, default behavior and duplicates and tests

Bugs:
1. as_of was not working properly on deleted events
2. Same for event_time_range
3. Join was not working when feature names were included
4. The default SQL was returning only the most recent record, whereas it should return all records excluding duplicates
5. Include duplicates was not returning all non-deleted data
6. The isinstance(dataframe) case was also applied to non-DataFrame cases during join
7. Include column was returning unnecessary columns.

* Fix on pylint error

* Fix on include_duplicated_records for pandas DataFrames

* Fix format issue for black

* Bug fixed related to line break

* Bug fix related to dataframe and include_deleted_records and include_duplicated_records

* Addressed comments and code refactored

* changed to_csv to to_csv_file and added error messages for query limit and recent record limit

* Revert a change which was not intended

* Resolved the leak of feature group deletion in integration test

* Added doc update for dataset builder

* Fix the issue in doc

Co-authored-by: Yiming Zou
Co-authored-by: Brandon Chatham
Co-authored-by: Eric Zou
Co-authored-by: jiapinw
mufaddal-rohawala pushed a commit to mufaddal-rohawala/sagemaker-python-sdk that referenced this pull request Dec 19, 2022
mufaddal-rohawala pushed a commit that referenced this pull request Dec 20, 2022
JoseJuan98 pushed a commit to JoseJuan98/sagemaker-python-sdk that referenced this pull request Mar 4, 2023
JoseJuan98 pushed a commit to JoseJuan98/sagemaker-python-sdk that referenced this pull request Mar 4, 2023
nmadan pushed a commit to nmadan/sagemaker-python-sdk that referenced this pull request Apr 18, 2023
8 participants