breaking: require framework_version, py_version for pytorch #1568

Merged (9 commits) on Jun 11, 2020

metrizable (Contributor) commented:

require framework_version and py_version for PyTorch

Issue #, if available: #1465

Description of changes:

changes include:

  • framework_version, py_version required for framework PyTorch
  • framework_version, py_version required for framework PyTorchModel
  • re-order of args: convention to follow entry_point, push args with defaults below
  • testing updates
  • doc updates
  • ignore coverage results for py27 env due to v2 migration scripts
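
As an illustration of the signature change listed above, here is a minimal sketch; it is not the SDK's actual code. PyTorchEstimatorSketch and fails_without_versions are hypothetical names, and the version strings in the usage note are placeholder values. It shows the convention the description names: framework_version and py_version follow entry_point as arguments without defaults, and arguments with defaults come after them.

```python
class PyTorchEstimatorSketch:
    """Hypothetical stand-in used only to illustrate the argument order."""

    def __init__(self, entry_point, framework_version, py_version,
                 source_dir=None, hyperparameters=None):
        # Required arguments come first, directly after entry_point.
        self.entry_point = entry_point
        self.framework_version = framework_version
        self.py_version = py_version
        # Arguments with defaults are pushed below the required ones.
        self.source_dir = source_dir
        self.hyperparameters = hyperparameters or {}


def fails_without_versions(**kwargs):
    """Return True if construction fails because a required argument is missing."""
    try:
        PyTorchEstimatorSketch(**kwargs)
    except TypeError:
        return True
    return False
```

With this shape, fails_without_versions(entry_point="train.py") is true because Python rejects the call, while passing framework_version="1.4.0" and py_version="py3" as well lets construction succeed. Whether the real classes enforce the requirement through the signature or through explicit validation is not stated in this description.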

Testing done:

unit:

% tox --parallel 3 tests/unit
✔ OK black-format in 8.908 seconds
✔ OK twine in 13.317 seconds
✔ OK flake8 in 22.256 seconds
✔ OK pylint in 33.459 seconds
✔ OK doc8 in 18.439 seconds
✔ OK py36 in 41.879 seconds
✔ OK sphinx in 1 minute, 19.063 seconds
✔ OK py37 in 41.58 seconds
✔ OK py27 in 1 minute, 47.14 seconds

integ:

% export IGNORE_COVERAGE=- ; tox -e py37 -- -s -vv tests/integ/test_pytorch_train.py; unset IGNORE_COVERAGE
...
tests/integ/test_pytorch_train.py::test_region <- tests/integ/__init__.py PASSED
tests/integ/test_pytorch_train.py::test_sync_fit_deploy 2020-06-10 16:37:12 Starting - Starting the training job...
2020-06-10 16:37:15 Starting - Launching requested ML instances......
...
...
2020-06-10 16:49:30,774 [INFO ] pool-1-thread-2 ACCESS_LOG - /10.32.0.2:47254 "GET /ping HTTP/1.1" 200 0
PASSED
tests/integ/test_pytorch_train.py::test_fit_deploy
...
algo-1-ny85y_1  | 2020-06-10 16:58:01,649 [INFO ] W-9002-model ACCESS_LOG - /172.17.0.1:54396 "POST /invocations HTTP/1.1" 200 23
Gracefully stopping... (press Ctrl+C again to force)
PASSED
tests/integ/test_pytorch_train.py::test_deploy_model
...
2020-06-10 17:05:40,357 [INFO ] pool-1-thread-2 ACCESS_LOG - /10.32.0.2:60302 "GET /ping HTTP/1.1" 200 1
PASSED
tests/integ/test_pytorch_train.py::test_deploy_packed_model_with_entry_point_name
...

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to any/all clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@metrizable metrizable force-pushed the require-framework-version-pytorch branch from f2a13ef to d850210 Compare June 10, 2020 17:14
@sagemaker-bot (Collaborator): AWS CodeBuild CI reports

  • sagemaker-python-sdk-local-mode-tests, commit f2a13ef: FAILED
  • sagemaker-python-sdk-unit-tests, commit f2a13ef: FAILED
  • sagemaker-python-sdk-unit-tests, commit d850210: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit f2a13ef: SUCCEEDED
  • sagemaker-python-sdk-zwei-pr, commit d850210: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit d850210: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit d850210: SUCCEEDED
  • sagemaker-python-sdk-unit-tests, commit 5f4926e: SUCCEEDED
  • sagemaker-python-sdk-unit-tests, commit a1d4192: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit 5f4926e: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit a1d4192: SUCCEEDED

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@laurenyu (Contributor) left a comment:

one small comment. otherwise looks good.

@sagemaker-bot (Collaborator): AWS CodeBuild CI reports

  • sagemaker-python-sdk-zwei-pr, commit a1d4192: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit 5f4926e: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit a1d4192: FAILED
  • sagemaker-python-sdk-unit-tests, commit c55492e: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit c55492e: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit c55492e: FAILED
  • sagemaker-python-sdk-unit-tests, commit 8015936: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit 8015936: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit 8015936: FAILED

laurenyu previously approved these changes Jun 11, 2020
@sagemaker-bot (Collaborator): AWS CodeBuild CI reports

  • sagemaker-python-sdk-zwei-pr, commit 8015936: SUCCEEDED
  • sagemaker-python-sdk-unit-tests, commit 9b92c02: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit 9b92c02: SUCCEEDED
  • sagemaker-python-sdk-zwei-pr, commit 9b92c02: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit 9b92c02: SUCCEEDED

@laurenyu (Contributor) left a comment:

you can hardcode the PyTorch versions to whatever the default used to be for the ones that aren't easily upgradable to PyTorch 1.4

@sagemaker-bot (Collaborator): AWS CodeBuild CI reports

  • sagemaker-python-sdk-unit-tests, commit a9c51b5: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit a9c51b5: SUCCEEDED
  • sagemaker-python-sdk-zwei-pr, commit a9c51b5: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit a9c51b5: SUCCEEDED

@metrizable metrizable force-pushed the require-framework-version-pytorch branch from a9c51b5 to 8527ed4 Compare June 11, 2020 17:48
@sagemaker-bot (Collaborator): AWS CodeBuild CI reports

  • sagemaker-python-sdk-unit-tests, commit 8527ed4: SUCCEEDED
  • sagemaker-python-sdk-zwei-pr, commit 8527ed4: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit 8527ed4: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit 8527ed4: SUCCEEDED

@metrizable metrizable merged commit 7919331 into aws:zwei Jun 11, 2020
@metrizable metrizable deleted the require-framework-version-pytorch branch June 11, 2020 19:29
pintaoz-aws added a commit that referenced this pull request Dec 4, 2024
* Base model trainer (#1521)

* Base model trainer

* flake8

* add testing notebook

* add param validation & set defaults

* Implement simple train method

* feature: support script mode with local train.sh (#1523)

* feature: support script mode with local train.sh

* Stop tracking train.sh and add it to .gitignore

* update message

* make dir if not exist

* fix docs

* fix: docstyle

* Address comments

* fix hyperparams

* Revert pydantic custom error

* pylint

* Image Spec refactoring and updates (#1525)

* Image Spec refactoring and updates

* Unit tests and update function for Image Spec

* Fix hugging face test

* Fix Tests

* Add unit tests for ModelTrainer (#1527)

* Add unit tests for ModelTrainer

* Flake8

* format

* Add example notebook (#1528)

* Add testing notebook

* format

* use smaller data

* remove large dataset

* update

* pylint

* flake8

* ignore docstyle in directories with test

* format

* format

* Add environment variable bootstrapping script (#1530)

* Add environment variables scripts

* format

* fix comment

* add docstrings

* fix comment

* feature: add utility function to capture local snapshot (#1524)

* local snapshot

* Update pip list command

* Remove function calls

* Address comments

* Address comments

* Change to make Model Trainer return a Model Object

* Fix

* Cleanup

* Support intelligent parameters (#1540)

* Support intelligent parameters

* fix codestyle

* Revert Image Spec (#1541)

* Cleanup ModelTrainer (#1542)

* General image builder (#1546)

* General image builder

* General image builder

* Fix codestyle

* Fix codestyle

* Move location

* Add warnings

* Add integ tests

* Fix integ test

* Fix integ test

* Fix region error

* Add region

* Latest Container Image (#1545)

* Latest Container Image

* Test Fixes

* Parameterized tests and some logic updates

* Test fixes

* Move to Image URI

* Fixes for unit test

* Fixes for unit test

* Fix codestyle error checks

* Cleanup ModelTrainer code (#1552)

* Updates

* feat: add pre-processing and post-processing logic to inference_spec (#1560)

* add pre-processing and post-processing logic to inference_spec

* fix format

* make  accept_type and content_type optional

* remove accept_type and content_type from pre/post processing

* correct typo

* Add Distributed Training Support Model Trainer (#1536)

* Add path to set Additional Settings in ModelTrainer (#1555)

* Updates

* Mask Sensitive Env Logs in Container (#1568)

* Cleanup PR

* Codestyle fixes

* Update logic to use model parameter instead of model_path

* Fixes

* Fixes

* Tests

* Codestyle Fixes

* Codestyle Fixes

* Codestyle Fixes

* Codestyle Fixes

---------

Co-authored-by: Erick Benitez-Ramos
Co-authored-by: pintaoz-aws
Co-authored-by: Pravali Uppugunduri
pintaoz-aws added a commit that referenced this pull request Dec 4, 2024
* Base model trainer (#1521)

* Base model trainer

* flake8

* add testing notebook

* add param validation & set defaults

* Implement simple train method

* feature: support script mode with local train.sh (#1523)

* feature: support script mode with local train.sh

* Stop tracking train.sh and add it to .gitignore

* update message

* make dir if not exist

* fix docs

* fix: docstyle

* Address comments

* fix hyperparams

* Revert pydantic custom error

* pylint

* Image Spec refactoring and updates (#1525)

* Image Spec refactoring and updates

* Unit tests and update function for Image Spec

* Fix hugging face test

* Fix Tests

* Add unit tests for ModelTrainer (#1527)

* Add unit tests for ModelTrainer

* Flake8

* format

* Add example notebook (#1528)

* Add testing notebook

* format

* use smaller data

* remove large dataset

* update

* pylint

* flake8

* ignore docstyle in directories with test

* format

* format

* Add environment variable bootstrapping script (#1530)

* Add environment variables scripts

* format

* fix comment

* add docstrings

* fix comment

* feature: add utility function to capture local snapshot (#1524)

* local snapshot

* Update pip list command

* Remove function calls

* Address comments

* Address comments

* Support intelligent parameters (#1540)

* Support intelligent parameters

* fix codestyle

* Revert Image Spec (#1541)

* Cleanup ModelTrainer (#1542)

* General image builder (#1546)

* General image builder

* General image builder

* Fix codestyle

* Fix codestyle

* Move location

* Add warnings

* Add integ tests

* Fix integ test

* Fix integ test

* Fix region error

* Add region

* Latest Container Image (#1545)

* Latest Container Image

* Test Fixes

* Parameterized tests and some logic updates

* Test fixes

* Move to Image URI

* Fixes for unit test

* Fixes for unit test

* Fix codestyle error checks

* Cleanup ModelTrainer code (#1552)

* Single container local mode training

* Add wait argument

* Implement helper functions

* Add helper functions

* Fix bugs

* Fix codestyle

* feat: add pre-processing and post-processing logic to inference_spec (#1560)

* add pre-processing and post-processing logic to inference_spec

* fix format

* make  accept_type and content_type optional

* remove accept_type and content_type from pre/post processing

* correct typo

* Fix test and codestyle

* Add Distributed Training Support Model Trainer (#1536)

* Add tests

* Add path to set Additional Settings in ModelTrainer (#1555)

* Added example notebook

* Fix codestyle

* Address comments

* resolve merge conflict

* Support multi container local training (#1576)

* Fix codestyle

* Mask Sensitive Env Logs in Container (#1568)

* Fix bug in script mode setup ModelTrainer (#1575)

* Support multi container local training

* Merge branch 'single_container_local_training' into multi_container_local_training

* Update unit tests

---------

Co-authored-by: Erick Benitez-Ramos

* Remove LocalTrainingJob class

* Bypass pydantic check

* Add example

---------

Co-authored-by: Erick Benitez-Ramos
Co-authored-by: Gokul Anantha Narayanan
Co-authored-by: Pravali Uppugunduri
pintaoz-aws added a commit that referenced this pull request Dec 4, 2024
* Base model trainer (#1521)

* Base model trainer

* flake8

* add testing notebook

* add param validation & set defaults

* Implement simple train method

* feature: support script mode with local train.sh (#1523)

* feature: support script mode with local train.sh

* Stop tracking train.sh and add it to .gitignore

* update message

* make dir if not exist

* fix docs

* fix: docstyle

* Address comments

* fix hyperparams

* Revert pydantic custom error

* pylint

* Image Spec refactoring and updates (#1525)

* Image Spec refactoring and updates

* Unit tests and update function for Image Spec

* Fix hugging face test

* Fix Tests

* Add unit tests for ModelTrainer (#1527)

* Add unit tests for ModelTrainer

* Flake8

* format

* Add example notebook (#1528)

* Add testing notebook

* format

* use smaller data

* remove large dataset

* update

* pylint

* flake8

* ignore docstyle in directories with test

* format

* format

* Add environment variable bootstrapping script (#1530)

* Add environment variables scripts

* format

* fix comment

* add docstrings

* fix comment

* feature: add utility function to capture local snapshot (#1524)

* local snapshot

* Update pip list command

* Remove function calls

* Address comments

* Address comments

* Support intelligent parameters (#1540)

* Support intelligent parameters

* fix codestyle

* Revert Image Spec (#1541)

* Cleanup ModelTrainer (#1542)

* General image builder (#1546)

* General image builder

* General image builder

* Fix codestyle

* Fix codestyle

* Move location

* Add warnings

* Add integ tests

* Fix integ test

* Fix integ test

* Fix region error

* Add region

* Latest Container Image (#1545)

* Latest Container Image

* Test Fixes

* Parameterized tests and some logic updates

* Test fixes

* Move to Image URI

* Fixes for unit test

* Fixes for unit test

* Fix codestyle error checks

* Cleanup ModelTrainer code (#1552)

* feat: add pre-processing and post-processing logic to inference_spec (#1560)

* add pre-processing and post-processing logic to inference_spec

* fix format

* make  accept_type and content_type optional

* remove accept_type and content_type from pre/post processing

* correct typo

* Add Distributed Training Support Model Trainer (#1536)

* Add path to set Additional Settings in ModelTrainer (#1555)

* feature: support HuggingFace models with JumpStart configs

* Update bucket name for the model mapping

* Mask Sensitive Env Logs in Container (#1568)

* Fix unit test

* Fix bug in script mode setup ModelTrainer (#1575)

* Save mapping as attribute

* Fix style issues

* Fix style issues

* Fix: bypass jumpstart mapping when not in endpoint mode

* Skip JS model mapping with env vars or image URI provided

* Revert "Merge branch 'aws:master' into dev-morpheus"

This reverts commit 26a0b0bb37e0343b3287f5c5c484df22726fc858, reversing
changes made to d19d4e178442be4b6e1d07d55498dd76dfac50f0.

* Merge branch 'aws:master' into dev-morpheus

This reverts commit 076442bd83e5ca977bf5b6ce1b716474d2794feb.

* Rebase on master-morpheus

* Fix unit test description

* Fix TEI integ test

* Fix style issue

* Fix style issues

* Fix schema builder integ tests

* Fix TEI integ test

* Fix code style issue

---------

Co-authored-by: Erick Benitez-Ramos
Co-authored-by: Gokul Anantha Narayanan
Co-authored-by: pintaoz-aws
Co-authored-by: Pravali Uppugunduri
Co-authored-by: Xiong Zeng
Co-authored-by: Gary Wang