
support tuning step parameter range parameterization + support retry strategy in tuner #2551

Merged
48 commits merged on Aug 11, 2021
Commits (48)
868db55
add helper function to generate no-op (data ingestion only) recipe
jerrypeng7773 May 11, 2021
21bedbb
Merge branch 'aws:master' into master
jerrypeng7773 May 11, 2021
854dd10
separate flow generation by source input type + move generation helpe…
jerrypeng7773 May 11, 2021
8798b65
Merge branch 'aws:master' into master
jerrypeng7773 May 11, 2021
69ae4bd
create an internal helper function to generate output node
jerrypeng7773 May 12, 2021
a6a8449
Merge branch 'master' of github.com:jerrypeng7773/sagemaker-python-sdk
jerrypeng7773 May 12, 2021
2aa256e
Merge branch 'aws:master' into master
jerrypeng7773 May 18, 2021
06557a8
add ingestion test using dw processor via pipeline execution
jerrypeng7773 May 19, 2021
dcbfd13
Merge branch 'aws:master' into master
jerrypeng7773 May 19, 2021
fc6522e
verify the fg query df
jerrypeng7773 May 19, 2021
b6f9371
Merge branch 'master' into master
ahsan-z-khan May 19, 2021
86fa47d
fix tests
jerrypeng7773 May 19, 2021
05ccfa6
Merge branch 'master' into master
ahsan-z-khan May 20, 2021
0716e9f
Merge branch 'aws:master' into master
jerrypeng7773 Jun 14, 2021
7ca5af4
add tuning step support
jerrypeng7773 Jun 24, 2021
8cf18b8
fix docstyle check
jerrypeng7773 Jun 24, 2021
1f95b82
add helper function to get tuning step top performing model s3 uri
jerrypeng7773 Jun 29, 2021
1b9d66b
Merge branch 'aws:master' into master
jerrypeng7773 Jun 30, 2021
5bc47bd
allow step depends on pass in step instance
jerrypeng7773 Jun 30, 2021
603b934
Merge branch 'aws:master' into master
jerrypeng7773 Jun 30, 2021
664f2a8
Merge branch 'master' of github.com:jerrypeng7773/sagemaker-python-sdk
jerrypeng7773 Jun 30, 2021
a8755ec
Merge branch 'master' into master
apogupta2018 Jul 1, 2021
e25d36c
Merge branch 'aws:master' into master
jerrypeng7773 Jul 1, 2021
a9cfab4
Merge branch 'master' into accept-step-object-in-dependson-list
jerrypeng7773 Jul 1, 2021
c0066ea
resolve merge conflict
jerrypeng7773 Jul 1, 2021
e9ac9fa
support passing step object to tuning step depends on list
jerrypeng7773 Jul 1, 2021
eb6a523
fix test_workflow_with_clarify
jerrypeng7773 Jul 1, 2021
c19c426
add tuning step to docs
jerrypeng7773 Jul 6, 2021
450e4a5
allow step instance in depends on list for repack and reigster model …
jerrypeng7773 Jul 6, 2021
cb7be4a
Merge branch 'master' into master
ahsan-z-khan Jul 7, 2021
2918765
add tuning step get_top_model_s3_uri to doc
jerrypeng7773 Jul 9, 2021
fe9bd70
Merge branch 'aws:master' into master
jerrypeng7773 Jul 9, 2021
378c868
Merge branch 'master' of github.com:jerrypeng7773/sagemaker-python-sdk
jerrypeng7773 Jul 9, 2021
93cdb68
remove extra new line
jerrypeng7773 Jul 9, 2021
24226f9
add callback step to doc
jerrypeng7773 Jul 9, 2021
001cac5
switch order in doc
jerrypeng7773 Jul 9, 2021
b5c00c1
Merge branch 'master' into master
ahsan-z-khan Jul 12, 2021
3b75821
Merge branch 'master' into accept-step-object-in-dependson-list
ahsan-z-khan Jul 12, 2021
e70ae34
Merge branch 'aws:master' into master
jerrypeng7773 Jul 14, 2021
0eaf41b
Merge branch 'master' of https://github.com/aws/sagemaker-python-sdk …
jerrypeng7773 Jul 14, 2021
dad08c4
fix formatting
jerrypeng7773 Jul 14, 2021
edf9cba
support parameterize tuning job parameter ranges
jerrypeng7773 Aug 3, 2021
57bd90d
Merge branch 'aws:master' into master
jerrypeng7773 Aug 3, 2021
597bb74
Merge branch 'aws:master' into master
jerrypeng7773 Aug 3, 2021
ae55619
support tuning step parameter range parameterization + support retry …
jerrypeng7773 Aug 3, 2021
5a6148a
Merge branch 'master' into master
ahsan-z-khan Aug 9, 2021
9b1d905
Merge branch 'master' into master
ahsan-z-khan Aug 11, 2021
282c9fe
Merge branch 'master' into master
ahsan-z-khan Aug 11, 2021
1 change: 0 additions & 1 deletion doc/workflows/pipelines/sagemaker.workflow.pipelines.rst
@@ -5,7 +5,6 @@ ConditionStep
-------------

.. autoclass:: sagemaker.workflow.condition_step.ConditionStep

.. deprecated:: sagemaker.workflow.condition_step.JsonGet

Conditions
14 changes: 10 additions & 4 deletions src/sagemaker/parameter.py
@@ -12,7 +12,9 @@
# language governing permissions and limitations under the License.
"""Placeholder docstring"""
from __future__ import absolute_import

import json
from sagemaker.workflow.parameters import Parameter as PipelineParameter


class ParameterRange(object):
@@ -68,8 +70,12 @@ def as_tuning_range(self, name):
"""
return {
"Name": name,
"MinValue": str(self.min_value),
"MaxValue": str(self.max_value),
"MinValue": str(self.min_value)
if not isinstance(self.min_value, PipelineParameter)
else self.min_value,
"MaxValue": str(self.max_value)
if not isinstance(self.max_value, PipelineParameter)
else self.max_value,
"ScalingType": self.scaling_type,
}

@@ -103,9 +109,9 @@ def __init__(self, values): # pylint: disable=super-init-not-called
This input will be converted into a list of strings.
"""
if isinstance(values, list):
self.values = [str(v) for v in values]
self.values = [str(v) if not isinstance(v, PipelineParameter) else v for v in values]
else:
self.values = [str(values)]
self.values = [str(values) if not isinstance(values, PipelineParameter) else values]

def as_tuning_range(self, name):
"""Represent the parameter range as a dictionary.
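With this change, a hyperparameter range can be built from pipeline parameters instead of literal numbers: as_tuning_range only calls str() on plain values and leaves PipelineParameter objects intact so they can be interpolated when the pipeline definition is serialized. A minimal usage sketch (the parameter name here is illustrative, not taken from this PR):

    from sagemaker.parameter import ContinuousParameter
    from sagemaker.workflow.parameters import ParameterString

    # Pipeline parameter supplied at execution time (hypothetical name).
    min_lr = ParameterString(name="MinLearningRate", default_value="0.0001")

    # The minimum comes from the pipeline parameter, the maximum stays a literal.
    lr_range = ContinuousParameter(min_lr, 0.05)

    # The tuning range keeps the ParameterString object rather than str()-ing it;
    # it is only resolved to a {"Get": ...} expression when the pipeline is serialized.
    print(lr_range.as_tuning_range("learning_rate"))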
4 changes: 4 additions & 0 deletions src/sagemaker/session.py
@@ -2213,6 +2213,7 @@ def _map_training_config(
use_spot_instances=False,
checkpoint_s3_uri=None,
checkpoint_local_path=None,
max_retry_attempts=None,
):
"""Construct a dictionary of training job configuration from the arguments.

@@ -2266,6 +2267,7 @@
objective_metric_name (str): Name of the metric for evaluating training jobs.
parameter_ranges (dict): Dictionary of parameter ranges. These parameter ranges can
be one of three types: Continuous, Integer, or Categorical.
max_retry_attempts (int): The number of times to retry the job.

Returns:
A dictionary of training job configuration. For format details, please refer to
@@ -2322,6 +2324,8 @@
if parameter_ranges is not None:
training_job_definition["HyperParameterRanges"] = parameter_ranges

if max_retry_attempts is not None:
training_job_definition["RetryStrategy"] = {"MaximumRetryAttempts": max_retry_attempts}
return training_job_definition

def stop_tuning_job(self, name):
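The new max_retry_attempts argument maps directly onto the RetryStrategy block of the per-definition training config, which is the shape the CreateHyperParameterTuningJob API expects. A sketch of the resulting fragment (other keys omitted; check the API docs for the allowed retry range):

    # What _map_training_config adds when max_retry_attempts=3 is passed in.
    training_job_definition = {
        # ... "StaticHyperParameters", "AlgorithmSpecification", "RoleArn", etc. ...
        "RetryStrategy": {"MaximumRetryAttempts": 3},
    }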
8 changes: 7 additions & 1 deletion src/sagemaker/tuner.py
@@ -1507,7 +1507,10 @@ def _get_tuner_args(cls, tuner, inputs):

if tuner.estimator is not None:
tuner_args["training_config"] = cls._prepare_training_config(
inputs, tuner.estimator, tuner.static_hyperparameters, tuner.metric_definitions
inputs=inputs,
estimator=tuner.estimator,
static_hyperparameters=tuner.static_hyperparameters,
metric_definitions=tuner.metric_definitions,
)

if tuner.estimator_dict is not None:
@@ -1580,6 +1583,9 @@ def _prepare_training_config(
if parameter_ranges is not None:
training_config["parameter_ranges"] = parameter_ranges

if estimator.max_retry_attempts is not None:
training_config["max_retry_attempts"] = estimator.max_retry_attempts
Member:

Do we already have a standard mechanism to propagate API changes to pipeline steps, i.e., simultaneous release of new API features and pipeline support?

Contributor Author:

From the DSL level, yes: as long as the model has it, we can support it simultaneously. However, in the Python SDK, translating the estimator or tuner into request arguments requires manually adding the new fields. For instance, the estimator supports max_retry_attempts, but when the estimator is passed to the tuner, that field was not included in the Python-object-to-request-arguments translation.


return training_config

def stop(self):
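To make the reply above concrete: the retry setting lives on the estimator, and it only reaches the tuning request because _prepare_training_config now copies it into the per-estimator training config. A hedged sketch of that flow (image and role values are placeholders, not from this PR):

    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="<training-image-uri>",   # placeholder
        role="<execution-role-arn>",        # placeholder
        instance_count=1,
        instance_type="ml.m5.xlarge",
        max_retry_attempts=3,               # estimator-level retry setting
    )

    # With the change above, training_config gains {"max_retry_attempts": 3}, so the
    # setting survives the translation from Python objects to request arguments instead
    # of being silently dropped.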
1 change: 1 addition & 0 deletions src/sagemaker/workflow/pipeline.py
@@ -320,6 +320,7 @@ def _interpolate(
"""
if isinstance(obj, (Expression, Parameter, Properties)):
return obj.expr

if isinstance(obj, CallbackOutput):
step_name = callback_output_to_step_map[obj.output_name]
return obj.expr(step_name)
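This branch of _interpolate is what ultimately resolves the un-stringified pipeline parameters left in place by the parameter.py change: each Parameter (or Expression/Properties) is swapped for its expr, the JSON path that SageMaker Pipelines evaluates at execution time. A small illustration (parameter name borrowed from the integ test below):

    from sagemaker.workflow.parameters import ParameterString

    min_batch_size = ParameterString(name="MinBatchSize", default_value="64")

    # What _interpolate substitutes into the pipeline definition:
    print(min_batch_size.expr)  # {'Get': 'Parameters.MinBatchSize'}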
100 changes: 95 additions & 5 deletions tests/integ/test_workflow.py
@@ -1075,7 +1075,7 @@ def test_conditional_pytorch_training_model_registration(
pass


def test_tuning(
def test_tuning_single_algo(
sagemaker_session,
role,
cpu_instance_type,
@@ -1098,14 +1098,17 @@ def test_tuning(
role=role,
framework_version="1.5.0",
py_version="py3",
instance_count=1,
instance_type="ml.m5.xlarge",
instance_count=instance_count,
Member:

Where are these coming from? Do we have fixtures for instance_count and instance_type?

Contributor Author:

Lines 1093 and 1094; they are the pipeline parameters.

instance_type=instance_type,
sagemaker_session=sagemaker_session,
enable_sagemaker_metrics=True,
max_retry_attempts=3,
)

min_batch_size = ParameterString(name="MinBatchSize", default_value="64")
max_batch_size = ParameterString(name="MaxBatchSize", default_value="128")
hyperparameter_ranges = {
"batch-size": IntegerParameter(64, 128),
"batch-size": IntegerParameter(min_batch_size, max_batch_size),
}

tuner = HyperparameterTuner(
@@ -1161,7 +1164,7 @@

pipeline = Pipeline(
name=pipeline_name,
parameters=[instance_count, instance_type],
parameters=[instance_count, instance_type, min_batch_size, max_batch_size],
steps=[step_tune, step_best_model, step_second_best_model],
sagemaker_session=sagemaker_session,
)
@@ -1185,6 +1188,93 @@
pass
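The step_best_model and step_second_best_model steps in the pipeline above rely on the TuningStep helper this PR adds for pulling out the top-performing training job's artifact. A rough sketch of that usage (the bucket value is a placeholder; the test itself wires it to the session's default bucket):

    # step_tune is the TuningStep defined earlier in this test.
    best_model_s3_uri = step_tune.get_top_model_s3_uri(
        top_k=0,                        # 0 selects the best training job by objective
        s3_bucket="<default-bucket>",   # placeholder
    )

    # best_model_s3_uri can then be used as model_data when creating or registering
    # the best model in a downstream step.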


def test_tuning_multi_algos(
sagemaker_session,
role,
cpu_instance_type,
pipeline_name,
region_name,
):
base_dir = os.path.join(DATA_DIR, "pytorch_mnist")
entry_point = os.path.join(base_dir, "mnist.py")
input_path = sagemaker_session.upload_data(
path=os.path.join(base_dir, "training"),
key_prefix="integ-test-data/pytorch_mnist/training",
)

instance_count = ParameterInteger(name="InstanceCount", default_value=1)
instance_type = ParameterString(name="InstanceType", default_value="ml.m5.xlarge")

pytorch_estimator = PyTorch(
entry_point=entry_point,
role=role,
framework_version="1.5.0",
py_version="py3",
instance_count=instance_count,
instance_type=instance_type,
sagemaker_session=sagemaker_session,
enable_sagemaker_metrics=True,
max_retry_attempts=3,
)

min_batch_size = ParameterString(name="MinBatchSize", default_value="64")
max_batch_size = ParameterString(name="MaxBatchSize", default_value="128")

tuner = HyperparameterTuner.create(
estimator_dict={
"estimator-1": pytorch_estimator,
"estimator-2": pytorch_estimator,
},
objective_metric_name_dict={
"estimator-1": "test:acc",
"estimator-2": "test:acc",
},
hyperparameter_ranges_dict={
"estimator-1": {"batch-size": IntegerParameter(min_batch_size, max_batch_size)},
"estimator-2": {"batch-size": IntegerParameter(min_batch_size, max_batch_size)},
},
metric_definitions_dict={
"estimator-1": [{"Name": "test:acc", "Regex": "Overall test accuracy: (.*?);"}],
"estimator-2": [{"Name": "test:acc", "Regex": "Overall test accuracy: (.*?);"}],
},
)
inputs = {
"estimator-1": TrainingInput(s3_data=input_path),
"estimator-2": TrainingInput(s3_data=input_path),
}

step_tune = TuningStep(
name="my-tuning-step",
tuner=tuner,
inputs=inputs,
)

pipeline = Pipeline(
name=pipeline_name,
parameters=[instance_count, instance_type, min_batch_size, max_batch_size],
steps=[step_tune],
sagemaker_session=sagemaker_session,
)

try:
response = pipeline.create(role)
create_arn = response["PipelineArn"]
assert re.match(
fr"arn:aws:sagemaker:{region_name}:\d{{12}}:pipeline/{pipeline_name}", create_arn
)

execution = pipeline.start(parameters={})
assert re.match(
fr"arn:aws:sagemaker:{region_name}:\d{{12}}:pipeline/{pipeline_name}/execution/",
execution.arn,
)
finally:
try:
pipeline.delete()
except Exception:
pass


def test_mxnet_model_registration(
sagemaker_session,
role,
17 changes: 13 additions & 4 deletions tests/unit/sagemaker/workflow/test_steps.py
@@ -716,14 +716,16 @@ def test_multi_algo_tuning_step(sagemaker_session):
data_source_uri_parameter = ParameterString(
name="DataSourceS3Uri", default_value=f"s3://{BUCKET}/train_manifest"
)
instance_count = ParameterInteger(name="InstanceCount", default_value=1)
estimator = Estimator(
image_uri=IMAGE_URI,
role=ROLE,
instance_count=1,
instance_count=instance_count,
instance_type="ml.c5.4xlarge",
profiler_config=ProfilerConfig(system_monitor_interval_millis=500),
rules=[],
sagemaker_session=sagemaker_session,
max_retry_attempts=10,
)

estimator.set_hyperparameters(
@@ -739,8 +741,9 @@
augmentation_type="crop",
)

initial_lr_param = ParameterString(name="InitialLR", default_value="0.0001")
hyperparameter_ranges = {
"learning_rate": ContinuousParameter(0.0001, 0.05),
"learning_rate": ContinuousParameter(initial_lr_param, 0.05),
"momentum": ContinuousParameter(0.0, 0.99),
"weight_decay": ContinuousParameter(0.0, 0.99),
}
@@ -825,7 +828,7 @@ def test_multi_algo_tuning_step(sagemaker_session):
"ContinuousParameterRanges": [
{
"Name": "learning_rate",
"MinValue": "0.0001",
"MinValue": initial_lr_param,
"MaxValue": "0.05",
"ScalingType": "Auto",
},
@@ -845,6 +848,9 @@ def test_multi_algo_tuning_step(sagemaker_session):
"CategoricalParameterRanges": [],
"IntegerParameterRanges": [],
},
"RetryStrategy": {
"MaximumRetryAttempts": 10,
},
},
{
"StaticHyperParameters": {
@@ -889,7 +895,7 @@ def test_multi_algo_tuning_step(sagemaker_session):
"ContinuousParameterRanges": [
{
"Name": "learning_rate",
"MinValue": "0.0001",
"MinValue": initial_lr_param,
"MaxValue": "0.05",
"ScalingType": "Auto",
},
@@ -909,6 +915,9 @@ def test_multi_algo_tuning_step(sagemaker_session):
"CategoricalParameterRanges": [],
"IntegerParameterRanges": [],
},
"RetryStrategy": {
"MaximumRetryAttempts": 10,
},
},
],
},