Update hyperparameter tuning/analytics docstrings #215

Merged
merged 4 commits on Jun 7, 2018
17 changes: 17 additions & 0 deletions doc/analytics.rst
@@ -0,0 +1,17 @@
Analytics
---------

.. autoclass:: sagemaker.analytics.AnalyticsMetricsBase
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: sagemaker.analytics.HyperparameterTuningJobAnalytics
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: sagemaker.analytics.TrainingJobAnalytics
:members:
:undoc-members:
:show-inheritance:
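
For orientation, here is a minimal usage sketch of the classes documented above. The job names are placeholders, and the ``dataframe()`` accessor is assumed from ``AnalyticsMetricsBase`` (backed by the ``_fetch_dataframe`` hooks in the diff further down); only the class names, ``tuning_ranges``, ``description()``, and the constructor arguments come from this PR.

from sagemaker.analytics import HyperparameterTuningJobAnalytics, TrainingJobAnalytics

# Summarize every training job launched by a tuning job (placeholder job name).
tuner_analytics = HyperparameterTuningJobAnalytics('my-tuning-job')
jobs_df = tuner_analytics.dataframe()        # one row per training job: hyperparameters, results, metadata
ranges = tuner_analytics.tuning_ranges       # dict: hyperparameter name -> range
desc = tuner_analytics.description(force_refresh=True)  # raw DescribeHyperParameterTuningJob response

# Pull the training curve of a single training job from CloudWatch Metrics (placeholder names).
job_analytics = TrainingJobAnalytics('my-training-job', metric_names=['validation:accuracy'])
curve_df = job_analytics.dataframe()
print(jobs_df.head(), curve_df.head())
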
4 changes: 3 additions & 1 deletion doc/index.rst
@@ -4,7 +4,7 @@ Amazon SageMaker Python SDK is an open source library for training and deploying

With the SDK, you can train and deploy models using popular deep learning frameworks: **Apache MXNet** and **TensorFlow**. You can also train and deploy models with **algorithms provided by Amazon**, these are scalable implementations of core machine learning algorithms that are optimized for SageMaker and GPU training. If you have **your own algorithms** built into SageMaker-compatible Docker containers, you can train and host models using these as well.

Here you'll find API docs for SageMaker Python SDK. The project home-page is in Github: https://github.com/aws/sagemaker-python-sdk, there you can find the SDK source, installation instructions and a general overview of the library there.

Overview
----------
@@ -14,9 +14,11 @@ The SageMaker Python SDK consists of a few primary interfaces:
:maxdepth: 2

estimators
tuner
predictors
session
model
analytics

MXNet
----------
22 changes: 22 additions & 0 deletions doc/tuner.rst
@@ -0,0 +1,22 @@
HyperparameterTuner
-------------------

.. autoclass:: sagemaker.tuner.HyperparameterTuner
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: sagemaker.tuner.ContinuousParameter
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: sagemaker.tuner.IntegerParameter
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: sagemaker.tuner.CategoricalParameter
:members:
:undoc-members:
:show-inheritance:
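
To show how these classes fit together, here is a rough sketch of building and running a tuner. The estimator, image, role, metric name, regex, and S3 paths are placeholders, and the ``Estimator`` and ``HyperparameterTuner`` keyword arguments are assumed from the SDK of this era rather than taken from this PR.

from sagemaker.estimator import Estimator
from sagemaker.tuner import (HyperparameterTuner, ContinuousParameter,
                             IntegerParameter, CategoricalParameter)

# Placeholder estimator; the image, role, and instance settings are illustrative only.
estimator = Estimator(image_name='123456789012.dkr.ecr.us-west-2.amazonaws.com/my-image:latest',
                      role='SageMakerRole',
                      train_instance_count=1,
                      train_instance_type='ml.c4.xlarge')

# Hypothetical search space; the keys must match hyperparameters understood by the estimator.
hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(0.01, 0.2),
    'num_layers': IntegerParameter(2, 10),
    'optimizer': CategoricalParameter(['sgd', 'adam']),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name='validation:accuracy',   # placeholder metric name
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[{'Name': 'validation:accuracy',
                         'Regex': 'accuracy = ([0-9\\.]+)'}],
    max_jobs=10,
    max_parallel_jobs=2,
)
tuner.fit({'train': 's3://my-bucket/train'})       # placeholder S3 input channel
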
47 changes: 26 additions & 21 deletions src/sagemaker/analytics.py
@@ -64,25 +64,24 @@ def _fetch_dataframe(self):
pass

def clear_cache(self):
"""Clears the object of all local caches of API methods, so
"""Clear the object of all local caches of API methods, so
that the next time any properties are accessed they will be refreshed from
the service.
"""
self._dataframe = None


class HyperparameterTuningJobAnalytics(AnalyticsMetricsBase):
"""Fetches results about this tuning job and makes them accessible for analytics.
"""Fetch results about a hyperparameter tuning job and make them accessible for analytics.
"""

def __init__(self, hyperparameter_tuning_job_name, sagemaker_session=None):
"""Initialize an ``HyperparameterTuningJobAnalytics`` instance.
"""Initialize a ``HyperparameterTuningJobAnalytics`` instance.

Args:
hyperparameter_tuning_job_name (str): name of the HyperparameterTuningJob to
analyze.
hyperparameter_tuning_job_name (str): name of the HyperparameterTuningJob to analyze.
sagemaker_session (sagemaker.session.Session): Session object which manages interactions with
Amazon SageMaker APIs and any other AWS services needed. If not specified, the estimator creates one
Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created
using the default AWS configuration chain.
"""
sagemaker_session = sagemaker_session or Session()
@@ -100,16 +99,16 @@ def __repr__(self):
return "<sagemaker.HyperparameterTuningJobAnalytics for %s>" % self.name

def clear_cache(self):
"""Clears the object of all local caches of API methods.
"""Clear the object of all local caches of API methods.
"""
super(HyperparameterTuningJobAnalytics, self).clear_cache()
self._tuning_job_describe_result = None
self._training_job_summaries = None

def _fetch_dataframe(self):
"""Returns a pandas dataframe with all the training jobs, their
hyperparameters, results, and metadata about the training jobs.
Includes a column to indicate that any job was the best seen so far.
"""Return a pandas dataframe with all the training jobs, along with their
hyperparameters, results, and metadata. This also includes a column to indicate
if a training job was the best seen so far.
"""
def reshape(training_summary):
# Helper method to reshape a single training job summary into a dataframe record
@@ -139,8 +138,8 @@ def reshape(training_summary):

@property
def tuning_ranges(self):
"""A dict describing the ranges of all tuned hyperparameters.
Dict's key is the name of the hyper param. Dict's value is the range.
"""A dictionary describing the ranges of all tuned hyperparameters.
The keys are the names of the hyperparameters, and the values are the ranges.
"""
out = {}
for _, ranges in self.description()['HyperParameterTuningJobConfig']['ParameterRanges'].items():
@@ -149,10 +148,13 @@ def description(self, force_refresh=False):
return out

def description(self, force_refresh=False):
"""Response to DescribeHyperParameterTuningJob
"""Call ``DescribeHyperParameterTuningJob`` for the hyperparameter tuning job.

Args:
force_refresh (bool): Set to True to fetch the latest data from SageMaker API.

Returns:
dict: The Amazon SageMaker response for ``DescribeHyperParameterTuningJob``.
"""
if force_refresh:
self.clear_cache()
@@ -163,10 +165,13 @@ def training_job_summaries(self, force_refresh=False):
return self._tuning_job_describe_result

def training_job_summaries(self, force_refresh=False):
"""A list of everything (paginated) from ListTrainingJobsForTuningJob
"""A (paginated) list of everything from ``ListTrainingJobsForTuningJob``.

Args:
force_refresh (bool): Set to True to fetch the latest data from SageMaker API.

Returns:
list: The training job summaries from ``ListTrainingJobsForTuningJob``, across all pages.
"""
if force_refresh:
self.clear_cache()
@@ -191,19 +196,19 @@


class TrainingJobAnalytics(AnalyticsMetricsBase):
"""Fetches training curve data from CloudWatch Metrics for a specific training job.
"""Fetch training curve data from CloudWatch Metrics for a specific training job.
"""

CLOUDWATCH_NAMESPACE = '/aws/sagemaker/HyperParameterTuningJobs'

def __init__(self, training_job_name, metric_names, sagemaker_session=None):
"""Initialize an ``TrainingJobAnalytics`` instance.
"""Initialize a ``TrainingJobAnalytics`` instance.

Args:
training_job_name (str): name of the TrainingJob to analyze.
metric_names (list): string names of all the metrics to collect for this training job
sagemaker_session (sagemaker.session.Session): Session object which manages interactions with
Amazon SageMaker APIs and any other AWS services needed. If not specified, the estimator creates one
Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created
using the default AWS configuration chain.
"""
sagemaker_session = sagemaker_session or Session()
@@ -223,7 +228,7 @@ def __repr__(self):
return "<sagemaker.TrainingJobAnalytics for %s>" % self.name

def clear_cache(self):
"""Clears the object of all local caches of API methods, so
"""Clear the object of all local caches of API methods, so
that the next time any properties are accessed they will be refreshed from
the service.
"""
@@ -232,7 +237,7 @@ def clear_cache(self):
self._time_interval = self._determine_timeinterval()

def _determine_timeinterval(self):
"""Returns a dict with two datetime objects, start_time and end_time
"""Return a dictionary with two datetime objects, start_time and end_time,
covering the interval of the training job
"""
description = self._sage_client.describe_training_job(TrainingJobName=self.name)
@@ -249,7 +254,7 @@ def _fetch_dataframe(self):
return pd.DataFrame(self._data)

def _fetch_metric(self, metric_name):
"""Fetches all the values of a named metric, and adds them to _data
"""Fetch all the values of a named metric, and add them to _data
"""
request = {
'Namespace': self.CLOUDWATCH_NAMESPACE,
@@ -284,7 +289,7 @@ def _fetch_metric(self, metric_name):
self._add_single_metric(elapsed_seconds, metric_name, value)

def _add_single_metric(self, timestamp, metric_name, value):
"""Stores a single metric in the _data dict which can be
"""Store a single metric in the _data dict which can be
converted to a dataframe.
"""
# note that this method is built this way to make it possible to
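
For readers unfamiliar with what ``_fetch_metric`` wraps, the standalone boto3 call looks roughly like the sketch below. Only the namespace comes from ``CLOUDWATCH_NAMESPACE`` above; the dimension name, metric name, statistic, and time window are illustrative assumptions (the SDK derives the window from ``DescribeTrainingJob``, as ``_determine_timeinterval`` shows).

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client('cloudwatch')

# Roughly the query TrainingJobAnalytics issues: one named metric, restricted to
# the training job's time interval, sampled once per minute.
response = cloudwatch.get_metric_statistics(
    Namespace='/aws/sagemaker/HyperParameterTuningJobs',  # from CLOUDWATCH_NAMESPACE in the diff
    MetricName='validation:accuracy',                     # placeholder metric name
    Dimensions=[{'Name': 'TrainingJobName', 'Value': 'my-training-job'}],  # assumed dimension
    StartTime=datetime.utcnow() - timedelta(hours=1),     # placeholder window
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=['Average'],
)
for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'])
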
2 changes: 1 addition & 1 deletion src/sagemaker/estimator.py
@@ -319,7 +319,7 @@ def delete_endpoint(self):

@property
def training_job_analytics(self):
"""Returns a TrainingJobAnalytics object for the current training job.
"""Return a ``TrainingJobAnalytics`` object for the current training job.
"""
if self._current_job_name is None:
raise ValueError('Estimator is not associated with a TrainingJob')
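
A short sketch of the property documented in this hunk, assuming an estimator that has already started a training job via ``fit()``; the image, role, and instance settings are placeholders, and ``dataframe()`` is assumed from ``AnalyticsMetricsBase``.

from sagemaker.estimator import Estimator

# Placeholder estimator assumed to have launched a training job already.
estimator = Estimator(image_name='123456789012.dkr.ecr.us-west-2.amazonaws.com/my-image:latest',
                      role='SageMakerRole',
                      train_instance_count=1,
                      train_instance_type='ml.c4.xlarge')
# estimator.fit('s3://my-bucket/train')  # not shown here; must run before analytics exist

analytics = estimator.training_job_analytics   # a TrainingJobAnalytics for the current job
print(analytics.dataframe().head())            # training curve as a pandas dataframe
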
51 changes: 33 additions & 18 deletions src/sagemaker/session.py
@@ -222,10 +222,12 @@ def train(self, image, input_mode, input_config, role, job_name, output_config,
job_name (str): Name of the training job being created.
output_config (dict): The S3 URI where you want to store the training results and optional KMS key ID.
resource_config (dict): Contains values for ResourceConfig:

* instance_count (int): Number of EC2 instances to use for training.
The key in resource_config is 'InstanceCount'.
* instance_type (str): Type of EC2 instance to use for training, for example, 'ml.c4.xlarge'.
The key in resource_config is 'InstanceType'.

hyperparameters (dict): Hyperparameters for model training. The hyperparameters are made accessible as
a dict[str, str] to the training code on SageMaker. For convenience, this accepts other types for
keys and values, but ``str()`` will be called to convert them before training.
@@ -269,22 +271,28 @@ def tune(self, job_name, strategy, objective_type, objective_metric_name,

Args:
job_name (str): Name of the tuning job being created.
strategy (str): Strategy to be used.
objective_type (str): Minimize/Maximize
objective_metric_name (str): Name of the metric to use when evaluating training job.
max_jobs (int): Maximum total number of jobs to start.
max_parallel_jobs (int): Maximum number of parallel jobs to start.
parameter_ranges (dict): Parameter ranges in a dictionary of types: Continuous, Integer, Categorical
static_hyperparameters (dict): Hyperparameters for model training. The hyperparameters are made accessible
as a dict[str, str] to the training code on SageMaker. For convenience, this accepts other types for
keys and values, but ``str()`` will be called to convert them before training.
strategy (str): Strategy to be used for hyperparameter estimations.
objective_type (str): The type of the objective metric for evaluating training jobs. This value can be
either 'Minimize' or 'Maximize'.
objective_metric_name (str): Name of the metric for evaluating training jobs.
max_jobs (int): Maximum total number of training jobs to start for the hyperparameter tuning job.
max_parallel_jobs (int): Maximum number of parallel training jobs to start.
parameter_ranges (dict): Dictionary of parameter ranges. These parameter ranges can be one of three types:
Continuous, Integer, or Categorical.
static_hyperparameters (dict): Hyperparameters for model training. These hyperparameters remain
unchanged across all of the training jobs for the hyperparameter tuning job. The hyperparameters are
made accessible as a dictionary for the training code on SageMaker.
image (str): Docker image containing training code.
input_mode (str): The input mode that the algorithm supports. Valid modes:

* 'File' - Amazon SageMaker copies the training dataset from the S3 location to
a directory in the Docker container.
* 'Pipe' - Amazon SageMaker streams data directly from S3 to the container via a Unix-named pipe.
metric_definitions (list[dict]): Metrics definition with 'name' and 'regex' keys.

metric_definitions (list[dict]): A list of dictionaries that defines the metric(s) used to evaluate the
training jobs. Each dictionary contains two keys: 'Name' for the name of the metric, and 'Regex' for
the regular expression used to extract the metric from the logs. This should be defined only for
hyperparameter tuning jobs that don't use an Amazon algorithm.
role (str): An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs
that create Amazon SageMaker endpoints use this role to access training data and model artifacts.
You must grant sufficient permissions to this role.
@@ -293,11 +301,15 @@
https://botocore.readthedocs.io/en/latest/reference/services/sagemaker.html#SageMaker.Client.create_training_job
output_config (dict): The S3 URI where you want to store the training results and optional KMS key ID.
resource_config (dict): Contains values for ResourceConfig:
instance_count (int): Number of EC2 instances to use for training.
instance_type (str): Type of EC2 instance to use for training, for example, 'ml.c4.xlarge'.
stop_condition (dict): Defines when training shall finish. Contains entries that can be understood by the
service like ``MaxRuntimeInSeconds``.
tags (list[dict]): List of tags for labeling the tuning job.

* instance_count (int): Number of EC2 instances to use for training.
The key in resource_config is 'InstanceCount'.
* instance_type (str): Type of EC2 instance to use for training, for example, 'ml.c4.xlarge'.
The key in resource_config is 'InstanceType'.

stop_condition (dict): When training should finish, e.g. ``MaxRuntimeInSeconds``.
tags (list[dict]): List of tags for labeling the tuning job. For more, see
https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
"""
tune_request = {
'HyperParameterTuningJobName': job_name,
@@ -338,10 +350,13 @@ def tune(self, job_name, strategy, objective_type, objective_metric_name,
self.sagemaker_client.create_hyper_parameter_tuning_job(**tune_request)

def stop_tuning_job(self, name):
"""Attempts to stop tuning job on Amazon SageMaker with specified name.
"""Stop the Amazon SageMaker hyperparameter tuning job with the specified name.

Args:
name: Name of Amazon SageMaker tuning job.
name (str): Name of the Amazon SageMaker hyperparameter tuning job.

Raises:
ClientError: If an error occurs while trying to stop the hyperparameter tuning job.
"""
try:
LOGGER.info('Stopping tuning job: {}'.format(name))
@@ -491,7 +506,7 @@ def wait_for_job(self, job, poll=5):
return desc

def wait_for_tuning_job(self, job, poll=5):
"""Wait for an Amazon SageMaker tuning job to complete.
"""Wait for an Amazon SageMaker hyperparameter tuning job to complete.

Args:
job (str): Name of the tuning job to wait for.
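
Lastly, a sketch of the lower-level ``Session`` calls documented in this file. The tuning job name is a placeholder; in practice ``tune()`` is usually driven indirectly through ``HyperparameterTuner`` rather than called by hand.

from botocore.exceptions import ClientError

from sagemaker.session import Session

session = Session()
tuning_job_name = 'my-tuning-job'  # placeholder name

# Ask the service to stop the tuning job; per the docstring above, a failure
# surfaces as a botocore ClientError.
try:
    session.stop_tuning_job(name=tuning_job_name)
except ClientError as error:
    print('Could not stop tuning job: %s' % error)

# Or block until the tuning job reaches a terminal state, polling every 5 seconds
# (the documented default).
session.wait_for_tuning_job(tuning_job_name)
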