support tuning step parameter range parameterization + support retry strategy in tuner #2551
Merged
Changes from all commits (48 commits)
868db55  add helper function to generate no-op (data ingestion only) recipe (jerrypeng7773)
21bedbb  Merge branch 'aws:master' into master (jerrypeng7773)
854dd10  separate flow generation by source input type + move generation helpe… (jerrypeng7773)
8798b65  Merge branch 'aws:master' into master (jerrypeng7773)
69ae4bd  create an internal helper function to generate output node (jerrypeng7773)
a6a8449  Merge branch 'master' of github.com:jerrypeng7773/sagemaker-python-sdk (jerrypeng7773)
2aa256e  Merge branch 'aws:master' into master (jerrypeng7773)
06557a8  add ingestion test using dw processor via pipeline execution (jerrypeng7773)
dcbfd13  Merge branch 'aws:master' into master (jerrypeng7773)
fc6522e  verify the fg query df (jerrypeng7773)
b6f9371  Merge branch 'master' into master (ahsan-z-khan)
86fa47d  fix tests (jerrypeng7773)
05ccfa6  Merge branch 'master' into master (ahsan-z-khan)
0716e9f  Merge branch 'aws:master' into master (jerrypeng7773)
7ca5af4  add tuning step support (jerrypeng7773)
8cf18b8  fix docstyle check (jerrypeng7773)
1f95b82  add helper function to get tuning step top performing model s3 uri (jerrypeng7773)
1b9d66b  Merge branch 'aws:master' into master (jerrypeng7773)
5bc47bd  allow step depends on pass in step instance (jerrypeng7773)
603b934  Merge branch 'aws:master' into master (jerrypeng7773)
664f2a8  Merge branch 'master' of github.com:jerrypeng7773/sagemaker-python-sdk (jerrypeng7773)
a8755ec  Merge branch 'master' into master (apogupta2018)
e25d36c  Merge branch 'aws:master' into master (jerrypeng7773)
a9cfab4  Merge branch 'master' into accept-step-object-in-dependson-list (jerrypeng7773)
c0066ea  resolve merge conflict (jerrypeng7773)
e9ac9fa  support passing step object to tuning step depends on list (jerrypeng7773)
eb6a523  fix test_workflow_with_clarify (jerrypeng7773)
c19c426  add tuning step to docs (jerrypeng7773)
450e4a5  allow step instance in depends on list for repack and reigster model … (jerrypeng7773)
cb7be4a  Merge branch 'master' into master (ahsan-z-khan)
2918765  add tuning step get_top_model_s3_uri to doc (jerrypeng7773)
fe9bd70  Merge branch 'aws:master' into master (jerrypeng7773)
378c868  Merge branch 'master' of github.com:jerrypeng7773/sagemaker-python-sdk (jerrypeng7773)
93cdb68  remove extra new line (jerrypeng7773)
24226f9  add callback step to doc (jerrypeng7773)
001cac5  switch order in doc (jerrypeng7773)
b5c00c1  Merge branch 'master' into master (ahsan-z-khan)
3b75821  Merge branch 'master' into accept-step-object-in-dependson-list (ahsan-z-khan)
e70ae34  Merge branch 'aws:master' into master (jerrypeng7773)
0eaf41b  Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk … (jerrypeng7773)
dad08c4  fix formatting (jerrypeng7773)
edf9cba  support parameterize tuning job parameter ranges (jerrypeng7773)
57bd90d  Merge branch 'aws:master' into master (jerrypeng7773)
597bb74  Merge branch 'aws:master' into master (jerrypeng7773)
ae55619  support tuning step parameter range parameterization + support retry … (jerrypeng7773)
5a6148a  Merge branch 'master' into master (ahsan-z-khan)
9b1d905  Merge branch 'master' into master (ahsan-z-khan)
282c9fe  Merge branch 'master' into master (ahsan-z-khan)
```diff
@@ -1075,7 +1075,7 @@ def test_conditional_pytorch_training_model_registration(
     pass
 
 
-def test_tuning(
+def test_tuning_single_algo(
     sagemaker_session,
     role,
     cpu_instance_type,
@@ -1098,14 +1098,17 @@ def test_tuning(
         role=role,
         framework_version="1.5.0",
         py_version="py3",
-        instance_count=1,
-        instance_type="ml.m5.xlarge",
+        instance_count=instance_count,
+        instance_type=instance_type,
         sagemaker_session=sagemaker_session,
         enable_sagemaker_metrics=True,
+        max_retry_attempts=3,
     )
 
+    min_batch_size = ParameterString(name="MinBatchSize", default_value="64")
+    max_batch_size = ParameterString(name="MaxBatchSize", default_value="128")
     hyperparameter_ranges = {
-        "batch-size": IntegerParameter(64, 128),
+        "batch-size": IntegerParameter(min_batch_size, max_batch_size),
     }
 
     tuner = HyperparameterTuner(
```

Review comment on `instance_count=instance_count`: Where are these coming from? Do we have fixtures for

Reply: Lines 1093 and 1094; they are the pipeline parameters.

```diff
@@ -1161,7 +1164,7 @@ def test_tuning(
 
     pipeline = Pipeline(
         name=pipeline_name,
-        parameters=[instance_count, instance_type],
+        parameters=[instance_count, instance_type, min_batch_size, max_batch_size],
         steps=[step_tune, step_best_model, step_second_best_model],
         sagemaker_session=sagemaker_session,
     )
@@ -1185,6 +1188,93 @@ def test_tuning(
     pass
 
 
+def test_tuning_multi_algos(
+    sagemaker_session,
+    role,
+    cpu_instance_type,
+    pipeline_name,
+    region_name,
+):
+    base_dir = os.path.join(DATA_DIR, "pytorch_mnist")
+    entry_point = os.path.join(base_dir, "mnist.py")
+    input_path = sagemaker_session.upload_data(
+        path=os.path.join(base_dir, "training"),
+        key_prefix="integ-test-data/pytorch_mnist/training",
+    )
+
+    instance_count = ParameterInteger(name="InstanceCount", default_value=1)
+    instance_type = ParameterString(name="InstanceType", default_value="ml.m5.xlarge")
+
+    pytorch_estimator = PyTorch(
+        entry_point=entry_point,
+        role=role,
+        framework_version="1.5.0",
+        py_version="py3",
+        instance_count=instance_count,
+        instance_type=instance_type,
+        sagemaker_session=sagemaker_session,
+        enable_sagemaker_metrics=True,
+        max_retry_attempts=3,
+    )
+
+    min_batch_size = ParameterString(name="MinBatchSize", default_value="64")
+    max_batch_size = ParameterString(name="MaxBatchSize", default_value="128")
+
+    tuner = HyperparameterTuner.create(
+        estimator_dict={
+            "estimator-1": pytorch_estimator,
+            "estimator-2": pytorch_estimator,
+        },
+        objective_metric_name_dict={
+            "estimator-1": "test:acc",
+            "estimator-2": "test:acc",
+        },
+        hyperparameter_ranges_dict={
+            "estimator-1": {"batch-size": IntegerParameter(min_batch_size, max_batch_size)},
+            "estimator-2": {"batch-size": IntegerParameter(min_batch_size, max_batch_size)},
+        },
+        metric_definitions_dict={
+            "estimator-1": [{"Name": "test:acc", "Regex": "Overall test accuracy: (.*?);"}],
+            "estimator-2": [{"Name": "test:acc", "Regex": "Overall test accuracy: (.*?);"}],
+        },
+    )
+    inputs = {
+        "estimator-1": TrainingInput(s3_data=input_path),
+        "estimator-2": TrainingInput(s3_data=input_path),
+    }
+
+    step_tune = TuningStep(
+        name="my-tuning-step",
+        tuner=tuner,
+        inputs=inputs,
+    )
+
+    pipeline = Pipeline(
+        name=pipeline_name,
+        parameters=[instance_count, instance_type, min_batch_size, max_batch_size],
+        steps=[step_tune],
+        sagemaker_session=sagemaker_session,
+    )
+
+    try:
+        response = pipeline.create(role)
+        create_arn = response["PipelineArn"]
+        assert re.match(
+            fr"arn:aws:sagemaker:{region_name}:\d{{12}}:pipeline/{pipeline_name}", create_arn
+        )
+
+        execution = pipeline.start(parameters={})
+        assert re.match(
+            fr"arn:aws:sagemaker:{region_name}:\d{{12}}:pipeline/{pipeline_name}/execution/",
+            execution.arn,
+        )
+    finally:
+        try:
+            pipeline.delete()
+        except Exception:
+            pass
+
+
 def test_mxnet_model_registration(
     sagemaker_session,
     role,
```
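The single-algo test's `step_best_model` and `step_second_best_model` (referenced in the `steps=[...]` list above) presumably build on the `get_top_model_s3_uri` helper this PR adds to `TuningStep`. A minimal sketch of that wiring, reusing `pytorch_estimator`, `step_tune`, `sagemaker_session`, and `role` from the tests above; the step name and `CreateModelInput` settings are placeholders, not taken from the diff:

```python
from sagemaker.inputs import CreateModelInput
from sagemaker.model import Model
from sagemaker.workflow.steps import CreateModelStep

# Pull the artifacts of the best-performing training job out of the tuning
# step; top_k=0 selects the top model, top_k=1 the runner-up.
best_model = Model(
    image_uri=pytorch_estimator.training_image_uri(),
    model_data=step_tune.get_top_model_s3_uri(
        top_k=0,
        s3_bucket=sagemaker_session.default_bucket(),
    ),
    sagemaker_session=sagemaker_session,
    role=role,
)

step_best_model = CreateModelStep(
    name="my-best-model",  # placeholder name
    model=best_model,
    inputs=CreateModelInput(instance_type="ml.m5.large"),
)
```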
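One way to verify the parameterization end to end is to inspect the pipeline definition the SDK generates. A hedged sketch, assuming the `pipeline` object from the tests above; the exact JSON layout of the tuning arguments is illustrative, but pipeline parameter references do render as `{"Get": "Parameters.<Name>"}`:

```python
import json

# Inspect the definition the Pipeline object would submit to SageMaker.
definition = json.loads(pipeline.definition())
tuning_step = next(step for step in definition["Steps"] if step["Type"] == "Tuning")

# Instead of literal bounds, the batch-size range should now carry
# references to the pipeline parameters, roughly:
#   {"Name": "batch-size",
#    "MinValue": {"Get": "Parameters.MinBatchSize"},
#    "MaxValue": {"Get": "Parameters.MaxBatchSize"}}
print(json.dumps(tuning_step["Arguments"], indent=2))
```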
Review comment: do we already have a standard mechanism to propagate API changes to pipeline steps? i.e., simultaneous release of new API features and pipeline support?

Reply: From the DSL level, yes; as long as the model has the field, we can support it simultaneously. In the Python SDK, however, the new fields must be added manually when translating the estimator or tuner into request arguments. For instance, the estimator supports `max_retry_attempts`, but when the estimator is passed to a tuner, this field is not included during the Python-object-to-request-arguments translation.
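To make that gap concrete, here is a hypothetical sketch of the kind of manual plumbing the reply describes. The request keys follow the public CreateHyperParameterTuningJob API shapes (`HyperParameterTrainingJobDefinition`, `RetryStrategy`); the helper itself is illustrative, not the SDK's actual translation code:

```python
def training_job_definition_from(estimator):
    """Illustrative: build the HyperParameterTrainingJobDefinition piece of a
    CreateHyperParameterTuningJob request from an estimator."""
    definition = {
        "AlgorithmSpecification": {
            "TrainingImage": estimator.training_image_uri(),
            "TrainingInputMode": "File",
        },
        "RoleArn": estimator.role,
    }
    # The point of the reply above: every newly added estimator field must be
    # copied over explicitly, or it silently never reaches the tuning request.
    max_retries = getattr(estimator, "max_retry_attempts", None)
    if max_retries is not None:
        definition["RetryStrategy"] = {"MaximumRetryAttempts": max_retries}
    return definition
```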