
Commit 226720a

Merge branch 'master' into remove_cw_metrics_arg
2 parents d6729b3 + 8051399 commit 226720a

26 files changed: 870 additions and 68 deletions

CHANGELOG.rst

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,19 @@
22
CHANGELOG
33
=========
44

5-
1.7.1dev
5+
1.8.0
6+
=====
7+
8+
* bug-fix: removing PCA from tuner
9+
* feature: Estimators: add support for Amazon k-nearest neighbors (KNN) algorithm
10+
11+
1.7.2
12+
=====
13+
14+
* bug-fix: Prediction output for the TF_JSON_SERIALIZER
15+
* enhancement: Add better training job status report
16+
17+
1.7.1
618
=====
719

820
* bug-fix: get_execution_role no longer fails if user can't call get_role

README.rst

Lines changed: 7 additions & 2 deletions
@@ -51,7 +51,12 @@ You can install from source by cloning this repository and issuing a pip install
5151

5252
git clone https://github.com/aws/sagemaker-python-sdk.git
5353
python setup.py sdist
54-
pip install dist/sagemaker-1.7.0.tar.gz
54+
pip install dist/sagemaker-1.8.0.tar.gz
55+
56+
Supported Operating Systems
57+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
58+
59+
SageMaker Python SDK supports Unix/Linux and Mac.
5560

5661
Supported Python versions
5762
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -290,7 +295,7 @@ Amazon SageMaker provides several built-in machine learning algorithms that you
290295

291296
The full list of algorithms is available on the AWS website: https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html
292297

293-
SageMaker Python SDK includes Estimator wrappers for the AWS K-means, Principal Components Analysis(PCA), Linear Learner, Factorization Machines, Latent Dirichlet Allocation(LDA), Neural Topic Model(NTM) and Random Cut Forest algorithms.
298+
SageMaker Python SDK includes Estimator wrappers for the AWS K-means, Principal Components Analysis(PCA), Linear Learner, Factorization Machines, Latent Dirichlet Allocation(LDA), Neural Topic Model(NTM), Random Cut Forest and k-nearest neighbors (k-NN) algorithms.
294299

295300
More details at `AWS SageMaker Estimators and Models`_.
296301

setup.py

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@ def read(fname):
2323

2424

2525
setup(name="sagemaker",
26-
version="1.7.0",
26+
version="1.8.0",
2727
description="Open source library for training and deploying models on Amazon SageMaker.",
2828
packages=find_packages('src'),
2929
package_dir={'': 'src'},
@@ -45,7 +45,7 @@ def read(fname):
4545

4646
# Declare minimal set for installation
4747
install_requires=['boto3>=1.4.8', 'numpy>=1.9.0', 'protobuf>=3.1', 'scipy>=0.19.0', 'urllib3>=1.2',
48-
'PyYAML>=3.2'],
48+
'PyYAML>=3.2', 'protobuf3-to-dict>=0.1.5'],
4949

5050
extras_require={
5151
'test': ['tox', 'flake8', 'pytest', 'pytest-cov', 'pytest-xdist',

src/sagemaker/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@
2222
from sagemaker.amazon.ntm import NTM, NTMModel, NTMPredictor # noqa: F401
2323
from sagemaker.amazon.randomcutforest import (RandomCutForest, RandomCutForestModel, # noqa: F401
2424
RandomCutForestPredictor)
25+
from sagemaker.amazon.knn import KNN, KNNModel, KNNPredictor # noqa: F401
2526

2627
from sagemaker.analytics import TrainingJobAnalytics, HyperparameterTuningJobAnalytics # noqa: F401
2728
from sagemaker.local.local_session import LocalSession # noqa: F401

src/sagemaker/amazon/README.rst

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ Amazon SageMaker provides several built-in machine learning algorithms that you
77

88
The full list of algorithms is available on the AWS website: https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html
99

10-
SageMaker Python SDK includes Estimator wrappers for the AWS K-means, Principal Components Analysis(PCA), Linear Learner, Factorization Machines, Latent Dirichlet Allocation(LDA), Neural Topic Model(NTM) and Random Cut Forest algorithms.
10+
SageMaker Python SDK includes Estimator wrappers for the AWS K-means, Principal Components Analysis(PCA), Linear Learner, Factorization Machines, Latent Dirichlet Allocation(LDA), Neural Topic Model(NTM), Random Cut Forest and k-nearest neighbors (k-NN) algorithms.
1111

1212
Definition and usage
1313
~~~~~~~~~~~~~~~~~~~~

src/sagemaker/amazon/amazon_estimator.py

Lines changed: 1 addition & 1 deletion
@@ -278,7 +278,7 @@ def registry(region_name, algorithm=None):
278278
https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/amazon
279279
"""
280280
if algorithm in [None, "pca", "kmeans", "linear-learner", "factorization-machines", "ntm",
281-
"randomcutforest"]:
281+
"randomcutforest", "knn"]:
282282
account_id = {
283283
"us-east-1": "382416733822",
284284
"us-east-2": "404615174143",

src/sagemaker/amazon/knn.py

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
1+
# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License"). You
4+
# may not use this file except in compliance with the License. A copy of
5+
# the License is located at
6+
#
7+
# http://aws.amazon.com/apache2.0/
8+
#
9+
# or in the "license" file accompanying this file. This file is
10+
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
11+
# ANY KIND, either express or implied. See the License for the specific
12+
# language governing permissions and limitations under the License.
13+
from __future__ import absolute_import
14+
15+
from sagemaker.amazon.amazon_estimator import AmazonAlgorithmEstimatorBase, registry
16+
from sagemaker.amazon.common import numpy_to_record_serializer, record_deserializer
17+
from sagemaker.amazon.hyperparameter import Hyperparameter as hp # noqa
18+
from sagemaker.amazon.validation import ge, isin
19+
from sagemaker.predictor import RealTimePredictor
20+
from sagemaker.model import Model
21+
from sagemaker.session import Session
22+
23+
24+
class KNN(AmazonAlgorithmEstimatorBase):
25+
repo_name = 'knn'
26+
repo_version = 1
27+
28+
k = hp('k', (ge(1)), 'An integer greater than 0', int)
29+
sample_size = hp('sample_size', (ge(1)), 'An integer greater than 0', int)
30+
predictor_type = hp('predictor_type', isin('classifier', 'regressor'),
31+
'One of "classifier" or "regressor"', str)
32+
dimension_reduction_target = hp('dimension_reduction_target', (ge(1)),
33+
'An integer greater than 0 and less than feature_dim', int)
34+
dimension_reduction_type = hp('dimension_reduction_type', isin('sign', 'fjlt'), 'One of "sign" or "fjlt"', str)
35+
index_metric = hp('index_metric', isin('COSINE', 'INNER_PRODUCT', 'L2'),
36+
'One of "COSINE", "INNER_PRODUCT", "L2"', str)
37+
index_type = hp('index_type', isin('faiss.Flat', 'faiss.IVFFlat', 'faiss.IVFPQ'),
38+
'One of "faiss.Flat", "faiss.IVFFlat", "faiss.IVFPQ"', str)
39+
faiss_index_ivf_nlists = hp('faiss_index_ivf_nlists', (), '"auto" or an integer greater than 0', str)
40+
faiss_index_pq_m = hp('faiss_index_pq_m', (ge(1)), 'An integer greater than 0', int)
41+
42+
def __init__(self, role, train_instance_count, train_instance_type, k, sample_size, predictor_type,
43+
dimension_reduction_type=None, dimension_reduction_target=None, index_type=None,
44+
index_metric=None, faiss_index_ivf_nlists=None, faiss_index_pq_m=None, **kwargs):
45+
"""k-nearest neighbors (KNN) is an :class:`Estimator` used for classification and regression.
46+
47+
This Estimator may be fit via calls to
48+
:meth:`~sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase.fit`. It requires Amazon
49+
:class:`~sagemaker.amazon.record_pb2.Record` protobuf serialized data to be stored in S3.
50+
There is a utility :meth:`~sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase.record_set` that
51+
can be used to upload data to S3 and create a :class:`~sagemaker.amazon.amazon_estimator.RecordSet` to be passed
52+
to the `fit` call.
53+
54+
To learn more about the Amazon protobuf Record class and how to prepare bulk data in this format, please
55+
consult AWS technical documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html
56+
57+
After this Estimator is fit, model data is stored in S3. The model may be deployed to an Amazon SageMaker
58+
Endpoint by invoking :meth:`~sagemaker.estimator.EstimatorBase.deploy`. As well as deploying an Endpoint,
59+
deploy returns a :class:`~sagemaker.amazon.knn.KNNPredictor` object that can be used
60+
for inference calls using the trained model hosted in the SageMaker Endpoint.
61+
62+
KNN Estimators can be configured by setting hyperparameters. The available hyperparameters for
63+
KNN are documented below.
64+
65+
For further information on the AWS KNN algorithm,
66+
please consult AWS technical documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/knn.html
67+
68+
Args:
69+
role (str): An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and
70+
APIs that create Amazon SageMaker endpoints use this role to access
71+
training data and model artifacts. After the endpoint is created,
72+
the inference code might use the IAM role if it accesses AWS resources.
73+
train_instance_type (str): Type of EC2 instance to use for training, for example, 'ml.c4.xlarge'.
74+
k (int): Required. Number of nearest neighbors.
75+
sample_size (int): Required. Number of data points to be sampled from the training data set.
76+
predictor_type (str): Required. Type of inference to use on the data's labels,
77+
allowed values are 'classifier' and 'regressor'.
78+
dimension_reduction_type (str): Optional. Type of dimension reduction technique to use.
79+
Valid values are "sign" and "fjlt".
80+
dimension_reduction_target (int): Optional. Target dimension to reduce to. Required when
81+
dimension_reduction_type is specified.
82+
index_type (str): Optional. Type of index to use. Valid values are
83+
"faiss.Flat", "faiss.IVFFlat" and "faiss.IVFPQ".
84+
index_metric (str): Optional. Distance metric to measure between points when finding nearest neighbors.
85+
Valid values are "COSINE", "INNER_PRODUCT" and "L2".
86+
faiss_index_ivf_nlists (str): Optional. Number of centroids to construct in the index if
87+
index_type is "faiss.IVFFlat" or "faiss.IVFPQ".
88+
faiss_index_pq_m (int): Optional. Number of vector sub-components to construct in the index,
89+
if index_type is "faiss.IVFPQ".
90+
**kwargs: base class keyword argument values.
91+
"""
92+
93+
super(KNN, self).__init__(role, train_instance_count, train_instance_type, **kwargs)
94+
self.k = k
95+
self.sample_size = sample_size
96+
self.predictor_type = predictor_type
97+
self.dimension_reduction_type = dimension_reduction_type
98+
self.dimension_reduction_target = dimension_reduction_target
99+
self.index_type = index_type
100+
self.index_metric = index_metric
101+
self.faiss_index_ivf_nlists = faiss_index_ivf_nlists
102+
self.faiss_index_pq_m = faiss_index_pq_m
103+
if dimension_reduction_type and not dimension_reduction_target:
104+
raise ValueError('"dimension_reduction_target" is required when "dimension_reduction_type" is set.')
105+
106+
def create_model(self):
107+
"""Return a :class:`~sagemaker.amazon.knn.KNNModel` referencing the latest
108+
s3 model data produced by this Estimator."""
109+
110+
return KNNModel(self.model_data, self.role, sagemaker_session=self.sagemaker_session)
111+
112+
def _prepare_for_training(self, records, mini_batch_size=None, job_name=None):
113+
super(KNN, self)._prepare_for_training(records, mini_batch_size=mini_batch_size, job_name=job_name)
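
The hyperparameter declarations and the constructor check above amount to a small set of validation rules. The following is a minimal standalone sketch of those rules, not the SDK's implementation: `validate_knn_hyperparameters` is a hypothetical helper that does not use `Hyperparameter`, `ge`, or `isin`, it covers only a subset of the parameters, and string-encoding the returned values is an assumption made for illustration.

```python
def validate_knn_hyperparameters(k, sample_size, predictor_type,
                                 dimension_reduction_type=None,
                                 dimension_reduction_target=None):
    """Hypothetical helper mirroring a subset of the KNN constraints."""
    # k and sample_size are required integers greater than 0.
    for name, value in (("k", k), ("sample_size", sample_size)):
        if not isinstance(value, int) or value < 1:
            raise ValueError("{} must be an integer greater than 0".format(name))

    # predictor_type is required and restricted to two values.
    if predictor_type not in ("classifier", "regressor"):
        raise ValueError('predictor_type must be "classifier" or "regressor"')

    # Dimension reduction is optional, but a type requires a target.
    if dimension_reduction_type is not None:
        if dimension_reduction_type not in ("sign", "fjlt"):
            raise ValueError('dimension_reduction_type must be "sign" or "fjlt"')
        if not dimension_reduction_target:
            raise ValueError('"dimension_reduction_target" is required when '
                             '"dimension_reduction_type" is set.')

    params = {
        "k": k,
        "sample_size": sample_size,
        "predictor_type": predictor_type,
        "dimension_reduction_type": dimension_reduction_type,
        "dimension_reduction_target": dimension_reduction_target,
    }
    # Drop unset values; string-encode the rest (an assumption of this sketch).
    return {name: str(value) for name, value in params.items() if value is not None}


if __name__ == "__main__":
    print(validate_knn_hyperparameters(k=10, sample_size=5000,
                                       predictor_type="classifier"))
```

Passing `dimension_reduction_type="sign"` without a target raises `ValueError`, which matches the check at the end of `KNN.__init__`.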
114+
115+
116+
class KNNPredictor(RealTimePredictor):
117+
"""Performs classification or regression prediction from input vectors.
118+
119+
The implementation of :meth:`~sagemaker.predictor.RealTimePredictor.predict` in this
120+
`RealTimePredictor` requires a numpy ``ndarray`` as input. The array should contain the
121+
same number of columns as the feature-dimension of the data used to fit the model this
122+
Predictor performs inference on.
123+
124+
:func:`predict` returns a list of :class:`~sagemaker.amazon.record_pb2.Record` objects, one
125+
for each row in the input ``ndarray``. The prediction is stored in the ``"predicted_label"``
126+
key of the ``Record.label`` field."""
127+
128+
def __init__(self, endpoint, sagemaker_session=None):
129+
super(KNNPredictor, self).__init__(endpoint, sagemaker_session, serializer=numpy_to_record_serializer(),
130+
deserializer=record_deserializer())
131+
132+
133+
class KNNModel(Model):
134+
"""Reference S3 model data created by KNN estimator. Calling :meth:`~sagemaker.model.Model.deploy`
135+
creates an Endpoint and returns :class:`KNNPredictor`."""
136+
137+
def __init__(self, model_data, role, sagemaker_session=None):
138+
sagemaker_session = sagemaker_session or Session()
139+
repo = '{}:{}'.format(KNN.repo_name, KNN.repo_version)
140+
image = '{}/{}'.format(registry(sagemaker_session.boto_session.region_name, KNN.repo_name), repo)
141+
super(KNNModel, self).__init__(model_data, image, role, predictor_cls=KNNPredictor,
142+
sagemaker_session=sagemaker_session)
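
`KNNModel.__init__` above builds the container image name from the registry host and `repo_name:repo_version`. A minimal standalone sketch of that composition follows. It assumes the registry host uses the standard Amazon ECR form `<account>.dkr.ecr.<region>.amazonaws.com`; the `ACCOUNT_IDS` table holds only the two entries visible in this diff, and `registry_host` and `image_uri` are hypothetical names, not SDK functions.

```python
# Only the two region entries visible in the amazon_estimator.py hunk.
ACCOUNT_IDS = {
    "us-east-1": "382416733822",
    "us-east-2": "404615174143",
}


def registry_host(region_name):
    """Return the registry hostname for a region (assumed ECR host format)."""
    try:
        account_id = ACCOUNT_IDS[region_name]
    except KeyError:
        raise ValueError("No account ID recorded for region {!r}".format(region_name))
    return "{}.dkr.ecr.{}.amazonaws.com".format(account_id, region_name)


def image_uri(region_name, repo_name="knn", repo_version=1):
    """Join the registry host and 'name:version', as KNNModel does."""
    repo = "{}:{}".format(repo_name, repo_version)
    return "{}/{}".format(registry_host(region_name), repo)


if __name__ == "__main__":
    print(image_uri("us-east-1"))
```

Keeping the repository name and version as class attributes, as `KNN.repo_name` and `KNN.repo_version` do, lets the estimator and the model agree on the image without repeating the string.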

src/sagemaker/amazon/linear_learner.py

Lines changed: 1 addition & 1 deletion
@@ -247,7 +247,7 @@ def __init__(self, role, train_instance_count, train_instance_type, predictor_ty
247247
"For predictor_type 'multiclass_classifier', 'num_classes' should be set to a value greater than 2.")
248248

249249
def create_model(self):
250-
"""Return a :class:`~sagemaker.amazon.kmeans.LinearLearnerModel` referencing the latest
250+
"""Return a :class:`~sagemaker.amazon.linear_learner.LinearLearnerModel` referencing the latest
251251
s3 model data produced by this Estimator."""
252252

253253
return LinearLearnerModel(self.model_data, self.role, self.sagemaker_session)

src/sagemaker/chainer/README.rst

Lines changed: 2 additions & 2 deletions
@@ -145,7 +145,7 @@ Optional arguments
145145
The following are optional arguments. When you create a ``Chainer`` object, you can specify these as keyword arguments.
146146

147147
- ``source_dir`` Path (absolute or relative) to a directory with any
148-
other training source code dependencies aside from the entry point
148+
other training source code dependencies including the entry point
149149
file. Structure within this directory will be preserved when training
150150
on SageMaker.
151151
- ``hyperparameters`` Hyperparameters that will be used for training.
@@ -574,7 +574,7 @@ The ChainerModel constructor takes the following arguments:
574574
- ``entry_point (str):`` Path (absolute or relative) to the Python file
575575
which should be executed as the entry point to model hosting.
576576
- ``source_dir (str):`` Optional. Path (absolute or relative) to a
577-
directory with any other training source code dependencies aside from
577+
directory with any other training source code dependencies including
578578
the entry point file. Structure within this directory will be
579579
preserved when training on SageMaker.
580580
- ``enable_cloudwatch_metrics (boolean):`` Optional. If true, training

src/sagemaker/mxnet/README.rst

Lines changed: 3 additions & 3 deletions
@@ -123,7 +123,7 @@ Optional arguments
123123
The following are optional arguments. When you create an ``MXNet`` object, you can specify these as keyword arguments.
124124

125125
- ``source_dir`` Path (absolute or relative) to a directory with any
126-
other training source code dependencies aside from the entry point
126+
other training source code dependencies including the entry point
127127
file. Structure within this directory will be preserved when training
128128
on SageMaker.
129129
- ``hyperparameters`` Hyperparameters that will be used for training.
@@ -171,7 +171,7 @@ Required argument
171171
- ``inputs``: This can take one of the following forms: A string
172172
s3 URI, for example ``s3://my-bucket/my-training-data``. In this
173173
case, the s3 objects rooted at the ``my-training-data`` prefix will
174-
be available in the default ``train`` channel. A dict from
174+
be available in the default ``training`` channel. A dict from
175175
string channel names to s3 URIs. In this case, the objects rooted at
176176
each s3 prefix will be available as files in each channel directory.
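
The two `inputs` forms described above can be pictured as one mapping from channel name to S3 URI. The following is a minimal sketch in Python 3, assuming a hypothetical helper `normalize_inputs` that is not part of the SDK, and using the default channel name `training` introduced by this change.

```python
DEFAULT_CHANNEL = "training"  # default channel name per the updated README text


def normalize_inputs(inputs):
    """Hypothetical helper: map either accepted form to {channel_name: s3_uri}."""
    if isinstance(inputs, str):
        # A single S3 URI goes to the default channel.
        channels = {DEFAULT_CHANNEL: inputs}
    elif isinstance(inputs, dict):
        # A dict already maps channel names to S3 URIs; copy it.
        channels = dict(inputs)
    else:
        raise TypeError("inputs must be an S3 URI string or a dict of channel names to S3 URIs")

    for name, uri in channels.items():
        if not isinstance(uri, str) or not uri.startswith("s3://"):
            raise ValueError("channel {!r} does not point at an s3:// URI".format(name))
    return channels


if __name__ == "__main__":
    print(normalize_inputs("s3://my-bucket/my-training-data"))
    print(normalize_inputs({"train": "s3://my-bucket/train",
                            "test": "s3://my-bucket/test"}))
```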
177177

@@ -540,7 +540,7 @@ The MXNetModel constructor takes the following arguments:
540540
- ``entry_point (str):`` Path (absolute or relative) to the Python file
541541
which should be executed as the entry point to model hosting.
542542
- ``source_dir (str):`` Optional. Path (absolute or relative) to a
543-
directory with any other training source code dependencies aside from
543+
directory with any other training source code dependencies including
544544
the entry point file. Structure within this directory will be
545545
preserved when training on SageMaker.
546546
- ``container_log_level (int):`` Log level to use within the container.

src/sagemaker/pytorch/README.rst

Lines changed: 2 additions & 2 deletions
@@ -171,7 +171,7 @@ Optional arguments
171171
The following are optional arguments. When you create a ``PyTorch`` object, you can specify these as keyword arguments.
172172

173173
- ``source_dir`` Path (absolute or relative) to a directory with any
174-
other training source code dependencies aside from the entry point
174+
other training source code dependencies including the entry point
175175
file. Structure within this directory will be preserved when training
176176
on SageMaker.
177177
- ``hyperparameters`` Hyperparameters that will be used for training.
@@ -606,7 +606,7 @@ The PyTorchModel constructor takes the following arguments:
606606
- ``entry_point:`` Path (absolute or relative) to the Python file
607607
which should be executed as the entry point to model hosting.
608608
- ``source_dir:`` Optional. Path (absolute or relative) to a
609-
directory with any other training source code dependencies aside from
609+
directory with any other training source code dependencies including
610610
the entry point file. Structure within this directory will be
611611
preserved when training on SageMaker.
612612
- ``enable_cloudwatch_metrics:`` Optional. If true, training
