Commit 4292f61

Merge branch 'update-hf-neuronx-dlcs' of https://github.com/JingyaHuang/sagemaker-python-sdk into update-hf-neuronx-dlcs

2 parents 44f4096 + 71c20dc

File tree: 230 files changed, +15511 -2483 lines changed

.pylintrc

Lines changed: 1 addition & 1 deletion

@@ -42,7 +42,7 @@ unsafe-load-any-extension=no
 # A comma-separated list of package or module names from where C extensions may
 # be loaded. Extensions are loading into the active Python interpreter and may
 # run arbitrary code
-extension-pkg-whitelist=numpy
+extension-pkg-allow-list=numpy,math,_struct,_hashlib
 
 # Allow optimization of some AST trees. This will activate a peephole AST
 # optimizer, which will apply various small optimizations. For instance, it can

CHANGELOG.md

Lines changed: 101 additions & 0 deletions

@@ -1,5 +1,106 @@
 # Changelog
 
+## v2.159.0 (2023-05-23)
+
+### Features
+
+ * Add TF Serving 2.12.1 images to the SM PySDK
+
+### Bug Fixes and Other Changes
+
+ * Update the list of extension packages pylint is allowed to load
+
+## v2.158.0 (2023-05-22)
+
+### Features
+
+ * Enable default role for Spark processors
+ * SDK Defaults - S3 Params for Session
+ * Bump up images for DJL transformers Neuronx DLCs
+
+### Bug Fixes and Other Changes
+
+ * Relax local-mode PyPI requirements on urllib3
+
+### Documentation Changes
+
+ * Fix Tensorflow and PyTorch supported version in HuggingFaceProcessor
+ * Update doc for model_server_workers param in PyTorchModel
+
+## v2.157.0 (2023-05-18)
+
+### Features
+
+ * Handle use case where endpoint is created outside of python …
+
+### Bug Fixes and Other Changes
+
+ * Make type annotation of UploadedCode consistent
+ * Add SELinux label to local docker volumes
+
+## v2.156.0 (2023-05-17)
+
+### Features
+
+ * Partition support for DJLModel using SM Training job
+ * Update run-notebook-test to consider skips failures
+
+### Bug Fixes and Other Changes
+
+ * Update apache airflow and update test requirements
+ * Perform integrity checks for remote function execution
+ * Add p2 instances to integ tests
+ * Fix typo in logging message within ir mixin
+ * double Run create on load_run
+ * Update dtype logic for huggingface backend for new containers
+
+### Documentation Changes
+
+ * Update container version for SKLearn
+ * Add description for parameters in TransformInput
+
+## v2.155.0 (2023-05-15)
+
+### Features
+
+ * Add support for SageMaker Serverless inference Provisioned Concurrency feature
+
+### Bug Fixes and Other Changes
+
+ * Revert "fix: make RemoteExecutor context manager non-blocking on pend…
+ * Add BOM to no No P2 Availability region list
+
+## v2.154.0 (2023-05-11)
+
+### Features
+
+ * Add integ tests for remote_function, auto_capture functionality
+ * jumpstart model estimator classes
+
+### Bug Fixes and Other Changes
+
+ * integs - pytorch transformer deps and add test retry
+ * adding .lower() so new Pandas dtypes will match the type lookup.
+ * Pass KMS value to create processing job
+
+## v2.153.0 (2023-05-09)
+
+### Features
+
+ * Support npz archives in NumpyDeserializer
+ * Add FasterTransformer DJL support
+ * support for Sample Weights for SageMaker Autopilot
+
+### Bug Fixes and Other Changes
+
+ * retry is_run assertion
+ * Avoid 'AttributeError' for endpoint_name, if deploy() is not yet called
+ * Fix LambdaStep Creation
+ * Fix error when instance_count>1 in remote_function
+ * Remove deprecated update_endpoint from deploy() args in TensorFlowModel
+ * Update DJL deepspeed and fastertransformer DLC image uris
+ * remote_function python version mismatch issue
+
 ## v2.152.0 (2023-05-04)
 
 ### Features

VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-2.152.1.dev0
+2.159.1.dev0

doc/frameworks/djl/using_djl.rst

Lines changed: 25 additions & 0 deletions

@@ -221,6 +221,31 @@ see the `DJL Serving Documentation on Python Mode. <https://docs.djl.ai/docs/ser
 
 For more information about DJL Serving, see the `DJL Serving documentation. <https://docs.djl.ai/docs/serving/index.html>`_
 
+**************************
+Ahead of time partitioning
+**************************
+
+To optimize the deployment of large models that do not fit in a single GPU, the model’s tensor weights are partitioned at
+runtime and each partition is loaded in individual GPU. But runtime partitioning takes significant amount of time and
+memory on model loading. So, DJLModel offers an ahead of time partitioning capability for DeepSpeed and FasterTransformer
+engines, which lets you partition your model weights and save them before deployment. HuggingFace does not support
+tensor parallelism, so ahead of time partitioning cannot be done for it. In our experiment with GPT-J model, loading
+this model with partitioned checkpoints increased the model loading time by 40%.
+
+`partition` method invokes an Amazon SageMaker Training job to partition the model and upload those partitioned
+checkpoints to S3 bucket. You can either provide your desired S3 bucket to upload the partitioned checkpoints or it will be
+uploaded to the default SageMaker S3 bucket. Please note that this S3 bucket will be remembered for deployment. When you
+call `deploy` method after partition, DJLServing downloads the partitioned model checkpoints directly from the uploaded
+s3 url, if available.
+
+.. code::
+
+    # partitions the model using Amazon Sagemaker Training Job.
+    djl_model.partition("ml.g5.12xlarge")
+
+    predictor = deepspeed_model.deploy("ml.g5.12xlarge",
+                                       initial_instance_count=1)
+
 ***********************
 SageMaker DJL Classes
 ***********************

doc/frameworks/sklearn/using_sklearn.rst

Lines changed: 5 additions & 5 deletions

@@ -7,7 +7,7 @@ With Scikit-learn Estimators, you can train and host Scikit-learn models on Amaz
 For information about supported versions of Scikit-learn, see the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/sklearn.html>`__.
 We recommend that you use the latest supported version because that's where we focus most of our development efforts.
 
-For more information about the framework, see the `Sciket-Learn <https://github.com/scikit-learn/scikit-learn>`_ repository.
+For more information about the framework, see the `Scikit-Learn <https://github.com/scikit-learn/scikit-learn>`_ repository.
 For general information about using the SageMaker Python SDK, see :ref:`overview:Using the SageMaker Python SDK`.
 
 .. contents::
@@ -31,7 +31,7 @@ To train a Scikit-learn model by using the SageMaker Python SDK:
 Prepare a Scikit-learn Training Script
 ======================================
 
-Your Scikit-learn training script must be a Python 3.6 compatible source file.
+Your Scikit-learn training script must be a Python 3.7 compatible source file.
 
 The training script is similar to a training script you might run outside of SageMaker, but you
 can access useful properties about the training environment through various environment variables.
@@ -140,7 +140,7 @@ directories ('train' and 'test').
 
     sklearn_estimator = SKLearn('sklearn-train.py',
                                 instance_type='ml.m4.xlarge',
-                                framework_version='0.20.0',
+                                framework_version='1.0-1',
                                 hyperparameters = {'epochs': 20, 'batch-size': 64, 'learning-rate': 0.1})
     sklearn_estimator.fit({'train': 's3://my-data-bucket/path/to/my/training/data',
                            'test': 's3://my-data-bucket/path/to/my/test/data'})
@@ -204,7 +204,7 @@ operation.
     # Train my estimator
     sklearn_estimator = SKLearn(entry_point='train_and_deploy.py',
                                 instance_type='ml.m4.xlarge',
-                                framework_version='0.20.0')
+                                framework_version='1.0-1')
     sklearn_estimator.fit('s3://my_bucket/my_training_data/')
 
     # Deploy my estimator to a SageMaker Endpoint and get a Predictor
@@ -478,7 +478,7 @@ The following code sample shows how to do this, using the ``SKLearnModel`` class
     sklearn_model = SKLearnModel(model_data="s3://bucket/model.tar.gz",
                                  role="SageMakerRole",
                                  entry_point="transform_script.py",
-                                 framework_version="0.20.0")
+                                 framework_version="1.0-1")
 
     predictor = sklearn_model.deploy(instance_type="ml.c4.xlarge", initial_instance_count=1)
Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-urllib3==1.26.8
+urllib3>=1.26.8,<1.26.15
 docker-compose==1.29.2
 docker>=5.0.2,<7.0.0
 PyYAML==5.4.1

requirements/extras/test_requirements.txt

Lines changed: 7 additions & 3 deletions

@@ -12,14 +12,18 @@ awslogs==0.14.0
 black==22.3.0
 stopit==1.1.2
 # Update tox.ini to have correct version of airflow constraints file
-apache-airflow==2.5.1
+apache-airflow==2.6.0
 apache-airflow-providers-amazon==7.2.1
-attrs==22.1.0
+attrs>=23.1.0,<24
 fabric==2.6.0
-requests==2.27.1
+requests==2.31.0
 sagemaker-experiments==0.1.35
 Jinja2==3.0.3
 pyvis==0.2.1
 pandas>=1.3.5,<1.5
 scikit-learn==1.0.2
 cloudpickle==2.2.1
+scipy==1.7.3
+urllib3>=1.26.8,<1.26.15
+docker>=5.0.2,<7.0.0
+PyYAML==6.0

setup.py

Lines changed: 4 additions & 4 deletions

@@ -47,8 +47,8 @@ def read_requirements(filename):
 
 # Declare minimal set for installation
 required_packages = [
-    "attrs>=20.3.0,<23",
-    "boto3>=1.26.28,<2.0",
+    "attrs>=23.1.0,<24",
+    "boto3>=1.26.131,<2.0",
     "cloudpickle==2.2.1",
     "google-pasta",
     "numpy>=1.9.0,<2.0",
@@ -60,7 +60,7 @@ def read_requirements(filename):
     "pandas",
     "pathos",
     "schema",
-    "PyYAML==5.4.1",
+    "PyYAML==6.0",
     "jsonschema",
     "platformdirs",
     "tblib==1.7.0",
@@ -75,7 +75,7 @@ def read_requirements(filename):
 # Meta dependency groups
 extras["all"] = [item for group in extras.values() for item in group]
 # Tests specific dependencies (do not need to be included in 'all')
-extras["test"] = (extras["all"] + read_requirements("requirements/extras/test_requirements.txt"),)
+extras["test"] = (read_requirements("requirements/extras/test_requirements.txt"),)
 
 setup(
     name="sagemaker",
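The `extras["test"]` change above stops concatenating the already-flattened `all` group with the test requirements file, which had been duplicating any pin listed in both. A minimal sketch of the effect, using illustrative stand-in requirement lists rather than the SDK's real ones:

```python
# Illustrative stand-ins for the SDK's extras groups (hypothetical contents).
extras = {
    "local": ["urllib3>=1.26.8,<1.26.15", "docker-compose==1.29.2"],
    "scientific": ["scipy==1.7.3"],
}

# "all" flattens every extras group into one list, as setup.py does.
extras["all"] = [item for group in extras.values() for item in group]

# Hypothetical contents of requirements/extras/test_requirements.txt.
test_requirements = ["scipy==1.7.3", "pytest"]

# Old behavior: "all" + the test file repeats any pin present in both.
old_test = extras["all"] + test_requirements
assert old_test.count("scipy==1.7.3") == 2

# New behavior: "test" comes only from the requirements file, so pins the
# tests still need (urllib3, docker, PyYAML, scipy) must be listed there
# explicitly -- which is what this commit adds to test_requirements.txt.
new_test = test_requirements
assert new_test.count("scipy==1.7.3") == 1
```
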

src/sagemaker/accept_types.py

Lines changed: 103 additions & 0 deletions

@@ -0,0 +1,103 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"). You
+# may not use this file except in compliance with the License. A copy of
+# the License is located at
+#
+#     http://aws.amazon.com/apache2.0/
+#
+# or in the "license" file accompanying this file. This file is
+# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+# ANY KIND, either express or implied. See the License for the specific
+# language governing permissions and limitations under the License.
+"""This module is for SageMaker accept types."""
+from __future__ import absolute_import
+from typing import List, Optional
+
+from sagemaker.jumpstart import artifacts, utils as jumpstart_utils
+
+
+def retrieve_options(
+    region: Optional[str] = None,
+    model_id: Optional[str] = None,
+    model_version: Optional[str] = None,
+    tolerate_vulnerable_model: bool = False,
+    tolerate_deprecated_model: bool = False,
+) -> List[str]:
+    """Retrieves the supported accept types for the model matching the given arguments.
+
+    Args:
+        region (str): The AWS Region for which to retrieve the supported accept types.
+            Defaults to ``None``.
+        model_id (str): The model ID of the model for which to
+            retrieve the supported accept types. (Default: None).
+        model_version (str): The version of the model for which to retrieve the
+            supported accept types. (Default: None).
+        tolerate_vulnerable_model (bool): True if vulnerable versions of model
+            specifications should be tolerated (exception not raised). If False, raises an
+            exception if the script used by this version of the model has dependencies with known
+            security vulnerabilities. (Default: False).
+        tolerate_deprecated_model (bool): True if deprecated models should be tolerated
+            (exception not raised). False if these models should raise an exception.
+            (Default: False).
+    Returns:
+        list: The supported accept types to use for the model.
+
+    Raises:
+        ValueError: If the combination of arguments specified is not supported.
+    """
+    if not jumpstart_utils.is_jumpstart_model_input(model_id, model_version):
+        raise ValueError(
+            "Must specify JumpStart `model_id` and `model_version` when retrieving accept types."
+        )
+
+    return artifacts._retrieve_supported_accept_types(
+        model_id,
+        model_version,
+        region,
+        tolerate_vulnerable_model,
+        tolerate_deprecated_model,
+    )
+
+
+def retrieve_default(
+    region: Optional[str] = None,
+    model_id: Optional[str] = None,
+    model_version: Optional[str] = None,
+    tolerate_vulnerable_model: bool = False,
+    tolerate_deprecated_model: bool = False,
+) -> str:
+    """Retrieves the default accept type for the model matching the given arguments.
+
+    Args:
+        region (str): The AWS Region for which to retrieve the default accept type.
+            Defaults to ``None``.
+        model_id (str): The model ID of the model for which to
+            retrieve the default accept type. (Default: None).
+        model_version (str): The version of the model for which to retrieve the
+            default accept type. (Default: None).
+        tolerate_vulnerable_model (bool): True if vulnerable versions of model
+            specifications should be tolerated (exception not raised). If False, raises an
+            exception if the script used by this version of the model has dependencies with known
+            security vulnerabilities. (Default: False).
+        tolerate_deprecated_model (bool): True if deprecated models should be tolerated
+            (exception not raised). False if these models should raise an exception.
+            (Default: False).
+    Returns:
+        str: The default accept type to use for the model.
+
+    Raises:
+        ValueError: If the combination of arguments specified is not supported.
+    """
+    if not jumpstart_utils.is_jumpstart_model_input(model_id, model_version):
+        raise ValueError(
+            "Must specify JumpStart `model_id` and `model_version` when retrieving accept types."
+        )
+
+    return artifacts._retrieve_default_accept_type(
+        model_id,
+        model_version,
+        region,
+        tolerate_vulnerable_model,
+        tolerate_deprecated_model,
+    )
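Both functions in the new module apply the same guard before delegating to the JumpStart artifact helpers: a lookup is only attempted when both `model_id` and `model_version` are supplied. A self-contained sketch of that pattern, with a hypothetical stand-in for `jumpstart_utils.is_jumpstart_model_input` and a placeholder return value in place of the real metadata lookup:

```python
from typing import Optional


def is_jumpstart_model_input(model_id: Optional[str], model_version: Optional[str]) -> bool:
    """Hypothetical stand-in: a JumpStart lookup needs both fields present."""
    return model_id is not None and model_version is not None


def retrieve_default(model_id: Optional[str] = None, model_version: Optional[str] = None) -> str:
    # Same guard the new accept_types module applies before calling
    # artifacts._retrieve_default_accept_type.
    if not is_jumpstart_model_input(model_id, model_version):
        raise ValueError(
            "Must specify JumpStart `model_id` and `model_version` when retrieving accept types."
        )
    # Placeholder; the real implementation queries JumpStart model metadata.
    return "application/json"


# A complete id/version pair passes the guard.
assert retrieve_default("model-x", "1.0.0") == "application/json"
```
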

src/sagemaker/amazon/amazon_estimator.py

Lines changed: 10 additions & 3 deletions

@@ -20,7 +20,7 @@
 
 from six.moves.urllib.parse import urlparse
 
-from sagemaker import image_uris
+from sagemaker import image_uris, s3_utils
 from sagemaker.amazon import validation
 from sagemaker.amazon.hyperparameter import Hyperparameter as hp  # noqa
 from sagemaker.amazon.common import write_numpy_to_dense_tensor
@@ -93,8 +93,15 @@ def __init__(
             enable_network_isolation=enable_network_isolation,
             **kwargs
         )
-        data_location = data_location or "s3://{}/sagemaker-record-sets/".format(
-            self.sagemaker_session.default_bucket()
+
+        data_location = data_location or (
+            s3_utils.s3_path_join(
+                "s3://",
+                self.sagemaker_session.default_bucket(),
+                self.sagemaker_session.default_bucket_prefix,
+                "sagemaker-record-sets",
+                with_end_slash=True,
+            )
         )
         self._data_location = data_location
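The change above builds the default `data_location` with `s3_utils.s3_path_join` so the session's `default_bucket_prefix`, when configured, is inserted into the path; when it is unset, the result matches the old format string. A minimal stand-in for the join semantics (my own helper written for illustration, not the SDK's implementation):

```python
def s3_path_join_sketch(*parts, with_end_slash=False):
    """Sketch of s3_utils.s3_path_join behavior: drop empty segments
    (e.g. an unset default_bucket_prefix) and normalize slashes."""
    cleaned = [str(p).strip("/") for p in parts if p not in (None, "")]
    path = "/".join(cleaned)
    # Re-attach the double slash that "s3://".strip("/") removed.
    if path.startswith("s3:/") and not path.startswith("s3://"):
        path = path.replace("s3:/", "s3://", 1)
    if with_end_slash:
        path += "/"
    return path


# With no bucket prefix configured, the new default matches the old one:
assert (
    s3_path_join_sketch("s3://", "my-bucket", None, "sagemaker-record-sets", with_end_slash=True)
    == "s3://my-bucket/sagemaker-record-sets/"
)

# With a default_bucket_prefix set, it now appears in the path:
assert (
    s3_path_join_sketch("s3://", "my-bucket", "team-prefix", "sagemaker-record-sets", with_end_slash=True)
    == "s3://my-bucket/team-prefix/sagemaker-record-sets/"
)
```
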