change: move script mode branch to master #234

Merged
merged 87 commits into from
Sep 19, 2019
c93d034
Scriptmode single machine training implementation (#78)
icywang86rui Sep 27, 2018
3763697
Add tox.ini and configure coverage and flake runs (#80)
icywang86rui Oct 2, 2018
99eaf6b
Add integration tests to run training jobs with sagemaker (#81)
icywang86rui Oct 5, 2018
1338820
Add Script Mode example (#83)
mvsusp Oct 9, 2018
a1916a8
Add benchmarking script (#86)
mvsusp Oct 23, 2018
7047101
Edited the tf script mode notebook (#90)
eslesar-aws Oct 27, 2018
032cf60
Add distributed training support (#98)
icywang86rui Nov 6, 2018
177773d
Add CI configuration files (#109)
mvsusp Nov 15, 2018
a897135
Set S3 environment variables (#112)
icywang86rui Nov 16, 2018
5913b17
GPU fix (#117)
mvsusp Nov 19, 2018
1fab499
Update sagemaker containers (#119)
mvsusp Nov 19, 2018
c4abcae
Set parameter process waiting to False (#120)
mvsusp Nov 20, 2018
378add5
Disable GPU for parameter process (#121)
icywang86rui Nov 21, 2018
534ffa7
Unset CUDA_VISIBLE_DEVICES for worker processes (#122)
icywang86rui Nov 21, 2018
e6bf988
Fix broken unit tests (#124)
icywang86rui Nov 23, 2018
49a0547
Add Keras support (#126)
mvsusp Nov 24, 2018
962f15b
Create parameter server in different thread (#129)
icywang86rui Nov 27, 2018
8e6c4f2
Fix Keras test (#132)
icywang86rui Dec 4, 2018
d2f9f48
Skip keras local mode test on gpu and use random port for serving in …
icywang86rui Dec 5, 2018
80aa735
Update script_mode_train_any_tf_script_in_sage_maker.ipynb (#110)
mvsusp Dec 21, 2018
441adb0
Add python-dev and build-essential to Dockerfiles (#141)
laurenyu Dec 21, 2018
a9e4359
Force parameter server to run on CPU (#143)
icywang86rui Jan 3, 2019
a4e6cfa
Deprecate get_marker. Use get_closest_marker instead (#146)
icywang86rui Jan 7, 2019
4f66042
TensorFlow 1.12 and Horovod support (#138)
mvsusp Jan 8, 2019
8be0efe
Skip horovod integration tests (#149)
icywang86rui Jan 8, 2019
070e5fb
Add Horovod tests (#151)
mvsusp Jan 10, 2019
658ec5a
Skip horovod local CPU test in GPU instances (#152)
mvsusp Jan 11, 2019
48507bb
Add S3 plugin tests (#155)
icywang86rui Jan 25, 2019
ec07c35
Fix broken test test_distributed_mnist_no_ps (#156)
icywang86rui Jan 28, 2019
f339949
Use the test argement framework_version in all tests (#158)
icywang86rui Jan 29, 2019
c4d6b85
Configure encoding to be utf-8 (#160)
yangaws Feb 11, 2019
a7c0aaf
Fix SageMaker Session handling in Horovod test (#165)
laurenyu Feb 15, 2019
269c9a1
Read framework version from Python SDK for integ test default (#167)
laurenyu Feb 15, 2019
e16a936
Fix instance_type fixture setup for tests (#168)
laurenyu Feb 18, 2019
b3cb548
Specify region when creating S3 resource in integ tests (#169)
laurenyu Feb 19, 2019
686ae25
Add model saving warning at end of training (#171)
icywang86rui Feb 28, 2019
c276dac
Skip the s3_plugin test before new binary released (#177)
yangaws Mar 26, 2019
c286f01
Tune test_s3_plugin test (#178)
icywang86rui Apr 3, 2019
00a7a0b
fix: change model_dir to training job name if it is for tuning. (#179)
chuyang-deng Apr 12, 2019
ce47c76
Fix model_dir adjustment for hyperparameter tuning jobs (#181)
laurenyu Apr 22, 2019
215179b
Add Horovod benchmark (#157)
mvsusp Apr 24, 2019
1b60209
Add SageMaker integ test for hyperparameter tuning model_dir logic (#…
laurenyu Apr 25, 2019
f40f010
Add mpi4py to pip installs (#185)
laurenyu Apr 30, 2019
c097ca1
Upgrade to TensorFlow 1.13.1 (#184)
icywang86rui May 8, 2019
85cded3
Update integ test for checking Python version (#189)
laurenyu May 13, 2019
2b7138d
Pull request to test codebuild trigger on TensorFlow script mode (#186)
yangaws May 14, 2019
fb1fbdf
Explicitly set lower-bound for botocore version (#187)
laurenyu May 15, 2019
4610af3
Add release build (#191)
icywang86rui May 20, 2019
d57e1ae
fix: use tar file name as framework_support_installable in build_all.…
icywang86rui May 21, 2019
5b81f42
fix: ignore coverage in release build tests (#193)
icywang86rui May 21, 2019
7bb4475
fix: remove setup file in release build gpu test (#194)
icywang86rui May 21, 2019
cfc4ac5
fix: add branch name to remote gpu test run command (#195)
icywang86rui May 21, 2019
8e97672
fix: add setup file back (#196)
icywang86rui May 21, 2019
c257cf1
fix: skip setup on second remote run (#197)
icywang86rui May 21, 2019
7b1faac
prepare release v0.1.0
May 22, 2019
29fd3c6
update development version to v0.1.1.dev0
May 22, 2019
92e5579
fix: skip gpu SageMaker test in regions with limited amount of p2/3 i…
icywang86rui May 23, 2019
0b0ddb7
fix: fix flake8 errors and add flake8 run in buildspec.yml (#200)
icywang86rui May 23, 2019
541a724
fix: use unique name for integration job hyperparameter tuning job (#…
laurenyu May 30, 2019
823b107
fix: Parameterize processor and py_version for test runs (#208)
icywang86rui Jun 3, 2019
c686cd3
prepare release v2.0.0
Jun 3, 2019
79fc3f4
update development version to v2.0.1.dev0
Jun 3, 2019
5af4a2a
fix: remove extra comma in buildspec-release.yml (#209)
icywang86rui Jun 3, 2019
2718e0c
fix: remove non-ascii character in CHANGELOG (#210)
icywang86rui Jun 3, 2019
32caf76
prepare release v2.0.1
Jun 3, 2019
369c67f
update development version to v2.0.2.dev0
Jun 3, 2019
06fbbb4
fix: resolve pluggy version conflict (#211)
icywang86rui Jun 4, 2019
f4ca5a0
prepare release v2.0.2
Jun 4, 2019
bddf48d
update development version to v2.0.3.dev0
Jun 4, 2019
73d41fc
fix: only run one test during deployment (#212)
icywang86rui Jun 5, 2019
c8d84bd
prepare release v2.0.3
Jun 6, 2019
90c7e07
update development version to v2.0.4.dev0
Jun 6, 2019
b0c8879
fix: fix integ test errors when running with py2 (#213)
icywang86rui Jun 6, 2019
8367056
prepare release v2.0.4
Jun 6, 2019
30ac4fd
update development version to v2.0.5.dev0
Jun 6, 2019
1aa7659
fix: add hyperparameter tuning test (#216)
icywang86rui Jun 14, 2019
4f334d0
fix: bump sagemaker-containers version to 2.4.10 (#217)
icywang86rui Jun 14, 2019
874f9fd
prepare release v2.0.5
Jun 17, 2019
3a20d42
update development version to v2.0.6.dev0
Jun 17, 2019
88b06d2
change: fix horovod mnist script (#224)
Richardwan7 Jul 20, 2019
1b86b56
prepare release v2.0.6
Aug 1, 2019
18496c5
update development version to v2.0.7.dev0
Aug 1, 2019
c977f5f
change: update no-p2 and no-p3 regions. (#230)
lianyiding Aug 14, 2019
9f4a224
prepare release v2.0.7
Aug 15, 2019
e3a7a65
update development version to v2.0.8.dev0
Aug 15, 2019
2757315
deleting master content
mvsusp Sep 18, 2019
0b4fd04
Merge remote-tracking branch 'origin/script-mode' into mvs-script-mod…
mvsusp Sep 18, 2019
20 changes: 20 additions & 0 deletions .coveragerc_py27
@@ -0,0 +1,20 @@
[run]
branch = True
timid = True

[report]
exclude_lines =
    pragma: no cover
    pragma: py2 no cover
    if six.PY3
    elif six.PY3

partial_branches =
    pragma: no cover
    pragma: py2 no cover
    if six.PY3
    elif six.PY3

show_missing = True

fail_under = 75
20 changes: 20 additions & 0 deletions .coveragerc_py36
@@ -0,0 +1,20 @@
[run]
branch = True
timid = True

[report]
exclude_lines =
    pragma: no cover
    pragma: py3 no cover
    if six.PY2
    elif six.PY2

partial_branches =
    pragma: no cover
    pragma: py3 no cover
    if six.PY3
    elif six.PY3

show_missing = True

fail_under = 90
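
Reviewer note on the two coverage configs above: each `exclude_lines` entry is a regular expression, and coverage.py leaves any source line containing a match out of the report (if the matched line introduces a block, the whole block is excluded). The sketch below approximates that matching with the standard library only. It does not call coverage.py itself. The embedded config is a trimmed copy of `.coveragerc_py27`, and the helper names `load_exclude_patterns` and `is_excluded` are hypothetical, introduced only for illustration.

```python
import configparser
import re

# Trimmed copy of .coveragerc_py27 (partial_branches omitted for brevity).
COVERAGERC_PY27 = """\
[run]
branch = True
timid = True

[report]
exclude_lines =
    pragma: no cover
    pragma: py2 no cover
    if six.PY3
    elif six.PY3

show_missing = True
fail_under = 75
"""


def load_exclude_patterns(rc_text):
    """Return the exclude_lines entries as a list of regex strings."""
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    raw = parser.get("report", "exclude_lines")
    # The multi-line value holds one pattern per indented line.
    return [line.strip() for line in raw.splitlines() if line.strip()]


def is_excluded(source_line, patterns):
    """Approximate the rule: a line is excluded if any pattern matches in it."""
    return any(re.search(pattern, source_line) for pattern in patterns)


patterns = load_exclude_patterns(COVERAGERC_PY27)

# Under the Python 2.7 config, Python 3 only branches are not counted.
print(is_excluded("    if six.PY3:", patterns))
print(is_excluded("    if six.PY2:", patterns))
```

Under these assumptions, the py27 config ignores `if six.PY3` branches and `pragma: py2 no cover` lines, while the py36 config mirrors this for `six.PY2` and `pragma: py3 no cover`. That lets each interpreter hold its own `fail_under` threshold (75 and 90 respectively) without being penalized for code it can never execute.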
3 changes: 3 additions & 0 deletions .flake8
@@ -0,0 +1,3 @@
[flake8]
application_import_names = sagemaker_tensorflow_container, test, timeout, utils
import-order-style = google
6 changes: 5 additions & 1 deletion .gitignore
@@ -3,4 +3,8 @@ dist
 **/*.egg-info
 .DS_Store
 .idea/
-*.iml
+*.iml
+**/.ipynb_checkpoints
+**/.python-version
+.tox
+*~
295 changes: 295 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,295 @@
# Changelog

## v2.0.7 (2019-08-15)

### Bug fixes and other changes

* update no-p2 and no-p3 regions.

## v2.0.6 (2019-08-01)

### Bug fixes and other changes

* fix horovod mnist script

## v2.0.5 (2019-06-17)

### Bug fixes and other changes

* bump sagemaker-containers version to 2.4.10
* add hyperparameter tuning test

## v2.0.4 (2019-06-06)

### Bug fixes and other changes

* fix integ test errors when running with py2

## v2.0.3 (2019-06-06)

### Bug fixes and other changes

* only run one test during deployment

## v2.0.2 (2019-06-04)

### Bug fixes and other changes

* resolve pluggy version conflict

## v2.0.1 (2019-06-03)

### Bug fixes and other changes

* remove non-ascii character in CHANGELOG
* remove extra comma in buildspec-release.yml

## v2.0.0 (2019-06-03)

### Bug fixes and other changes

* Parameterize processor and py_version for test runs
* use unique name for integration job hyperparameter tuning job
* fix flake8 errors and add flake8 run in buildspec.yml
* skip gpu SageMaker test in regions with limited amount of p2/3 instances
* skip setup on second remote run
* add setup file back
* add branch name to remote gpu test run command
* remove setup file in release build gpu test
* ignore coverage in release build tests
* use tar file name as framework_support_installable in build_all.py
* Add release build
* Explicitly set lower-bound for botocore version
* Pull request to test codebuild trigger on TensorFlow script mode
* Update integ test for checking Python version
* Upgrade to TensorFlow 1.13.1
* Add mpi4py to pip installs
* Add SageMaker integ test for hyperparameter tuning model_dir logic
* Add Horovod benchmark
* Fix model_dir adjustment for hyperparameter tuning jobs
* change model_dir to training job name if it is for tuning.
* Tune test_s3_plugin test
* Skip the s3_plugin test before new binary released
* Add model saving warning at end of training
* Specify region when creating S3 resource in integ tests
* Fix instance_type fixture setup for tests
* Read framework version from Python SDK for integ test default
* Fix SageMaker Session handling in Horovod test
* Configure encoding to be utf-8
* Use the test argement framework_version in all tests
* Fix broken test test_distributed_mnist_no_ps
* Add S3 plugin tests
* Skip horovod local CPU test in GPU instances
* Add Horovod tests
* Skip horovod integration tests
* TensorFlow 1.12 and Horovod support
* Deprecate get_marker. Use get_closest_marker instead
* Force parameter server to run on CPU
* Add python-dev and build-essential to Dockerfiles
* Update script_mode_train_any_tf_script_in_sage_maker.ipynb
* Skip keras local mode test on gpu and use random port for serving in the test
* Fix Keras test
* Create parameter server in different thread
* Add Keras support
* Fix broken unit tests
* Unset CUDA_VISIBLE_DEVICES for worker processes
* Disable GPU for parameter process
* Set parameter process waiting to False
* Update sagemaker containers
* GPU fix
* Set S3 environment variables
* Add CI configuration files
* Add distributed training support
* Edited the tf script mode notebook
* Add benchmarking script
* Add Script Mode example
* Add integration tests to run training jobs with sagemaker
* Add tox.ini and configure coverage and flake runs
* Scriptmode single machine training implementation
* Update region in s3 boto client in serve
* Update readme with instructions for 1.9.0 and above
* Fix deserialization of dicts for json predict requests
* Add dockerfile and update test for tensorflow 1.10.0
* Support tensorflow 1.9.0
* Add integ tests to verify that tensorflow in gpu-image can access gpu-devices.
* train on 3 epochs for pipe mode test
* Change error classes used by _default_input_fn() and _default_output_fn()
* Changing assertion to check only existence
* Install sagemaker-tensorflow from pypi. Add MKL environment variables for TF 1.8
* get most recent saved model to export
* pip install tensorflow 1.8 in 1.8 cpu image
* install tensorflow extensions
* upgrade cpu binaries in docker build
* Force upgrade of the framework binaries to make sure the right binaries are installed.
* Add Pillow to pip install list
* Increase train steps for cifar distributed test to mitigate race condition
* Add TensorFlow 1.8 dockerfiles
* Add TensorFlow 1.7 dockerfiles
* Explain how to download tf binaries from PyPI
* Allow training without S3
* Fix hyperparameter name for detecting a tuning job
* Checkout v1.4.1 tag instead of r1.4 branch
* Move processing of requirements file in.
* Generate checkpoint path using TRAINING_JOB_NAME environment variable if needed
* Wrap user-provided model_fn to pass arguments positionally (maintains compatibility with existing behavior)
* Add more unit tests for trainer, fix __all__ and rename train.py to avoid import conflict
* Use regional endpoint for S3 client
* Update README.rst
* Pass input_channels to eval_input_fn if defined
* Fix setup.py to refer to renamed README
* Add test and build instructions
* Fix year in license headers
* Add TensorFlow 1.6
* Add test instructions in README
* Add container support to install_requires
* Add Apache license headers
* Use wget to install tensorflow-model-server
* Fix file path for integ test
* Fix s3_prefix path in integ test
* Fix typo in path for integ test
* Add input_channels to train_input_fn interface.
* Update logging and make serving_input_fn optional.
* remove pip install in tensorflow training
* Modify integration tests to run nvidia-docker for gpu
* add h5py for keras models
* Add local integ tests & resources
* Restructure repo to use a directory per TF version for dockerfiles
* Rename "feature_map" variables to "feature_dict" to avoid overloading it with the ML term "feature map"
* Copying in changes from internal repo:
* Add functional test
* Fix FROM image names for final build dockerfiles
* Add dockerfiles for building our production images (TF 1.4)
* GPU Dockerfile and setup.py fixes
* Add base image Dockerfiles for 1.4
* Merge pull request #1 from aws/mvs-first-commit
* first commit
* Updating initial README.md from template
* Creating initial file from template
* Creating initial file from template
* Creating initial file from template
* Creating initial file from template
* Creating initial file from template
* Initial commit

## v0.1.0 (2019-05-22)

### Bug fixes and other changes

* skip setup on second remote run
* add setup file back
* add branch name to remote gpu test run command
* remove setup file in release build gpu test
* ignore coverage in release build tests
* use tar file name as framework_support_installable in build_all.py
* Add release build
* Explicitly set lower-bound for botocore version
* Pull request to test codebuild trigger on TensorFlow script mode
* Update integ test for checking Python version
* Upgrade to TensorFlow 1.13.1
* Add mpi4py to pip installs
* Add SageMaker integ test for hyperparameter tuning model_dir logic
* Add Horovod benchmark
* Fix model_dir adjustment for hyperparameter tuning jobs
* change model_dir to training job name if it is for tuning.
* Tune test_s3_plugin test
* Skip the s3_plugin test before new binary released
* Add model saving warning at end of training
* Specify region when creating S3 resource in integ tests
* Fix instance_type fixture setup for tests
* Read framework version from Python SDK for integ test default
* Fix SageMaker Session handling in Horovod test
* Configure encoding to be utf-8
* Use the test argement framework_version in all tests
* Fix broken test test_distributed_mnist_no_ps
* Add S3 plugin tests
* Skip horovod local CPU test in GPU instances
* Add Horovod tests
* Skip horovod integration tests
* TensorFlow 1.12 and Horovod support
* Deprecate get_marker. Use get_closest_marker instead
* Force parameter server to run on CPU
* Add python-dev and build-essential to Dockerfiles
* Update script_mode_train_any_tf_script_in_sage_maker.ipynb
* Skip keras local mode test on gpu and use random port for serving in the test
* Fix Keras test
* Create parameter server in different thread
* Add Keras support
* Fix broken unit tests
* Unset CUDA_VISIBLE_DEVICES for worker processes
* Disable GPU for parameter process
* Set parameter process waiting to False
* Update sagemaker containers
* GPU fix
* Set S3 environment variables
* Add CI configuration files
* Add distributed training support
* Edited the tf script mode notebook
* Add benchmarking script
* Add Script Mode example
* Add integration tests to run training jobs with sagemaker
* Add tox.ini and configure coverage and flake runs
* Scriptmode single machine training implementation
* Update region in s3 boto client in serve
* Update readme with instructions for 1.9.0 and above
* Fix deserialization of dicts for json predict requests
* Add dockerfile and update test for tensorflow 1.10.0
* Support tensorflow 1.9.0
* Add integ tests to verify that tensorflow in gpu-image can access gpu-devices.
* train on 3 epochs for pipe mode test
* Change error classes used by _default_input_fn() and _default_output_fn()
* Changing assertion to check only existence
* Install sagemaker-tensorflow from pypi. Add MKL environment variables for TF 1.8
* get most recent saved model to export
* pip install tensorflow 1.8 in 1.8 cpu image
* install tensorflow extensions
* upgrade cpu binaries in docker build
* Force upgrade of the framework binaries to make sure the right binaries are installed.
* Add Pillow to pip install list
* Increase train steps for cifar distributed test to mitigate race condition
* Add TensorFlow 1.8 dockerfiles
* Add TensorFlow 1.7 dockerfiles
* Explain how to download tf binaries from PyPI
* Allow training without S3
* Fix hyperparameter name for detecting a tuning job
* Checkout v1.4.1 tag instead of r1.4 branch
* Move processing of requirements file in.
* Generate checkpoint path using TRAINING_JOB_NAME environment variable if needed
* Wrap user-provided model_fn to pass arguments positionally (maintains compatibility with existing behavior)
* Add more unit tests for trainer, fix __all__ and rename train.py to avoid import conflict
* Use regional endpoint for S3 client
* Update README.rst
* Pass input_channels to eval_input_fn if defined
* Fix setup.py to refer to renamed README
* Add test and build instructions
* Fix year in license headers
* Add TensorFlow 1.6
* Add test instructions in README
* Add container support to install_requires
* Add Apache license headers
* Use wget to install tensorflow-model-server
* Fix file path for integ test
* Fix s3_prefix path in integ test
* Fix typo in path for integ test
* Add input_channels to train_input_fn interface.
* Update logging and make serving_input_fn optional.
* remove pip install in tensorflow training
* Modify integration tests to run nvidia-docker for gpu
* add h5py for keras models
* Add local integ tests & resources
* Restructure repo to use a directory per TF version for dockerfiles
* Rename "feature_map" variables to "feature_dict" to avoid overloading it with the ML term "feature map"
* Copying in changes from internal repo:
* Add functional test
* Fix FROM image names for final build dockerfiles
* Add dockerfiles for building our production images (TF 1.4)
* GPU Dockerfile and setup.py fixes
* Add base image Dockerfiles for 1.4
* Merge pull request #1 from aws/mvs-first-commit
* first commit
* Updating initial README.md from template
* Creating initial file from template
* Creating initial file from template
* Creating initial file from template
* Creating initial file from template
* Creating initial file from template
* Initial commit
4 changes: 0 additions & 4 deletions CODE_OF_CONDUCT.md

This file was deleted.
