Commit a81e10e

Merge branch 'master' into add-torchrun-hf
2 parents: 8cd844a + 79cfb94

File tree: 70 files changed, +3822 −246 lines


CHANGELOG.md

Lines changed: 57 additions & 0 deletions

@@ -1,5 +1,62 @@
 # Changelog

+## v2.146.0 (2023-04-13)
+
+### Features
+
+ * Add support for JSON model inputs for Clarify Processor
+
+### Bug Fixes and Other Changes
+
+ * Feature/list collection
+ * improve reliability of Run integration test
+ * Add a comment that smdataparallel lib excludes tf 2.12 support
+
+### Documentation Changes
+
+ * Update reference to load run method in documentation
+
+## v2.145.0 (2023-04-06)
+
+### Features
+
+ * add support for async inline error notifications
+ * Add methods for feature group to list feature metadata parameters and tags
+ * Support huggingface hub model_id for DJL Models
+
+### Bug Fixes and Other Changes
+
+ * load_sagemaker_config should lazy initialize a default S3 resource
+
+## v2.144.0 (2023-04-05)
+
+### Features
+
+ * support create Clarify explainer enabled endpoint for Clarify Online Explainability
+ * Combined inference and training script artifact
+ * jumpstart instance types
+ * Deprecation warning for framework profiling for TF 2.12 and on, PT 2.0 and on
+
+### Bug Fixes and Other Changes
+
+ * always delete temporary directory even during exception
+ * Fixes the completion_criteria_config dict in the to_input_req method
+ * Update CHANGELOG.md
+
+### Documentation Changes
+
+ * Update SageMaker Debugger doc
+
+## v2.143.0 (2023-03-29)
+
+### Features
+
+ * Support for SageMaker SDK Defaults
+
+### Bug Fixes and Other Changes
+
+ * update feature store offline s3 path used in tests
+
 ## v2.142.0 (2023-03-27)

 ### Features
VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-2.142.1.dev0
+2.146.1.dev0

doc/amazon_sagemaker_debugger.rst

Lines changed: 6 additions & 0 deletions

@@ -4,6 +4,12 @@ Amazon SageMaker Debugger
 #########################


+.. warning::
+
+   This page is no longer maintained. The live documentation is at `Debug and Profile Training Jobs Using Amazon SageMaker Debugger <https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html>`_
+   and `Debugger API <https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html>`_.
+
+
 Amazon SageMaker Debugger allows you to detect anomalies while training your machine learning model by emitting relevant data during training, storing the data, and then analyzing it.

 .. contents::

doc/api/inference/explainer.rst

Lines changed: 16 additions & 0 deletions

@@ -0,0 +1,16 @@
+Online Explainability
+---------------------
+
+This module contains classes related to Amazon SageMaker Clarify Online Explainability.
+
+.. automodule:: sagemaker.explainer.explainer_config
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+.. automodule:: sagemaker.explainer.clarify_explainer_config
+    :members:
+    :undoc-members:
+    :show-inheritance:

doc/api/prep_data/feature_store.rst

Lines changed: 38 additions & 2 deletions

@@ -1,7 +1,7 @@
 Feature Store APIs
 ------------------

-Feature group
+Feature Group
 *************

 .. autoclass:: sagemaker.feature_store.feature_group.FeatureGroup
@@ -18,7 +18,7 @@ Feature group
     :show-inheritance:


-Feature definition
+Feature Definition
 ******************

 .. autoclass:: sagemaker.feature_store.feature_definition.FeatureDefinition
@@ -77,10 +77,46 @@ Inputs
     :members:
     :show-inheritance:

+.. autoclass:: sagemaker.feature_store.inputs.ResourceEnum
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.SearchOperatorEnum
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.SortOrderEnum
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.FilterOperatorEnum
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.Filter
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.Identifier
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.FeatureParameter
+    :members:
+    :show-inheritance:
+

 Dataset Builder
 ***************

 .. autoclass:: sagemaker.feature_store.dataset_builder.DatasetBuilder
     :members:
     :show-inheritance:
+
+
+Feature Store
+*************
+
+.. autoclass:: sagemaker.feature_store.feature_store.FeatureStore
+    :members:
+    :show-inheritance:
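
The new ``inputs`` classes above (``Filter``, ``FilterOperatorEnum``, and the other enums) are building blocks for feature-metadata search requests. A minimal standalone sketch of how such inputs might serialize into a request dict, using simplified shapes that are assumptions rather than the SDK's actual definitions:

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative sketch only: the real classes live in sagemaker.feature_store.inputs
# (listed in the rst above); these simplified shapes are assumptions, not the SDK's API.

class FilterOperatorEnum(Enum):
    EQUALS = "Equals"
    CONTAINS = "Contains"

@dataclass
class Filter:
    name: str
    value: str
    operator: FilterOperatorEnum = FilterOperatorEnum.EQUALS

def to_search_expression(filters):
    """Serialize a list of Filter objects into a search-request-style dict."""
    return {
        "Filters": [
            {"Name": f.name, "Value": f.value, "Operator": f.operator.value}
            for f in filters
        ]
    }

expr = to_search_expression([Filter(name="FeatureGroupName", value="my-group")])
print(expr["Filters"][0])
# {'Name': 'FeatureGroupName', 'Value': 'my-group', 'Operator': 'Equals'}
```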

doc/api/training/debugger.rst

Lines changed: 23 additions & 0 deletions

@@ -10,8 +10,13 @@ Configure the Debugger-specific parameters when constructing
 a SageMaker estimator to gain visibility and insights
 into your training job.

+.. contents::
+
 .. currentmodule:: sagemaker.debugger

+Debugger Rule APIs
+~~~~~~~~~~~~~~~~~~
+
 .. autoclass:: get_rule_container_image_uri
     :show-inheritance:

@@ -44,6 +49,9 @@ into your training job.
     :show-inheritance:
     :inherited-members:

+Debugger Configuration APIs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. autoclass:: CollectionConfig
     :show-inheritance:

@@ -56,6 +64,21 @@ into your training job.
 .. autoclass:: ProfilerConfig
     :show-inheritance:

+Debugger Configuration APIs for Framework Profiling (Deprecated)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. warning::
+
+   SageMaker Debugger deprecates the framework profiling feature starting from TensorFlow 2.11 and PyTorch 2.0. You can still use the feature in the previous versions of the frameworks and SDKs as follows.
+
+   * SageMaker Python SDK <= v2.130.0
+   * PyTorch >= v1.6.0, < v2.0
+   * TensorFlow >= v2.3.1, < v2.11
+
+   With the deprecation, SageMaker Debugger discontinues support for the APIs below this note.
+
+   See also `Amazon SageMaker Debugger Release Notes: March 16, 2023 <https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-release-notes.html#debugger-release-notes-20230315>`_.
+
 .. autoclass:: FrameworkProfile
     :show-inheritance:
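
The version bounds in the deprecation warning above can be expressed as a simple compatibility check. A hypothetical helper (not part of the SageMaker SDK) that encodes those bounds with version tuples:

```python
# Hypothetical helper, not part of the SageMaker SDK: checks whether the
# deprecated framework profiling feature is still usable, per the version
# bounds listed in the warning above.

def framework_profiling_supported(sdk_version, framework, framework_version):
    """Return True if framework profiling is available for this combination."""
    if sdk_version > (2, 130, 0):  # requires SageMaker Python SDK <= v2.130.0
        return False
    if framework == "pytorch":     # PyTorch >= v1.6.0, < v2.0
        return (1, 6, 0) <= framework_version < (2, 0, 0)
    if framework == "tensorflow":  # TensorFlow >= v2.3.1, < v2.11
        return (2, 3, 1) <= framework_version < (2, 11, 0)
    return False

print(framework_profiling_supported((2, 130, 0), "pytorch", (1, 13, 1)))     # True
print(framework_profiling_supported((2, 146, 0), "tensorflow", (2, 10, 0)))  # False
```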

doc/experiments/sagemaker.experiments.rst

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ Run
 .. autoclass:: sagemaker.experiments.Run
     :members:

-.. automethod:: sagemaker.experiments.load_run
+.. automethod:: sagemaker.experiments.run.load_run

 .. automethod:: sagemaker.experiments.list_runs

doc/frameworks/djl/using_djl.rst

Lines changed: 44 additions & 5 deletions

@@ -29,7 +29,7 @@ You can either deploy your model using DeepSpeed or HuggingFace Accelerate, or l

     # Create a DJL Model, backend is chosen automatically
     djl_model = DJLModel(
-        "s3://my_bucket/my_saved_model_artifacts/",
+        "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
         "my_sagemaker_role",
         data_type="fp16",
         task="text-generation",
@@ -46,7 +46,7 @@ If you want to use a specific backend, then you can create an instance of the co

     # Create a model using the DeepSpeed backend
     deepspeed_model = DeepSpeedModel(
-        "s3://my_bucket/my_saved_model_artifacts/",
+        "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
         "my_sagemaker_role",
         data_type="bf16",
         task="text-generation",
@@ -56,7 +56,7 @@ If you want to use a specific backend, then you can create an instance of the co

     # Create a model using the HuggingFace Accelerate backend
     hf_accelerate_model = HuggingFaceAccelerateModel(
-        "s3://my_bucket/my_saved_model_artifacts/",
+        "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
         "my_sagemaker_role",
         data_type="fp16",
         task="text-generation",
@@ -91,9 +91,37 @@ model server configuration.
 Model Artifacts
 ---------------

+DJL Serving supports two ways to load models for inference:
+
+1. A HuggingFace Hub model id.
+2. Uncompressed model artifacts stored in an S3 bucket.
+
+HuggingFace Hub model id
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Using a HuggingFace Hub model id is the easiest way to get started with deploying large models via DJL Serving on SageMaker.
+DJL Serving uses this model id to download the model at runtime via the HuggingFace Transformers ``from_pretrained`` API.
+This method makes it easy to deploy models quickly, but for very large models the download time can become unreasonable.
+
+For example, you can deploy the EleutherAI gpt-j-6B model like this:
+
+.. code::
+
+    model = DJLModel(
+        "EleutherAI/gpt-j-6B",
+        "my_sagemaker_role",
+        data_type="fp16",
+        number_of_partitions=2
+    )
+
+    predictor = model.deploy("ml.g5.12xlarge")
+
+Uncompressed model artifacts stored in an S3 bucket
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For models larger than 20GB (total checkpoint size), we recommend that you store the model in S3.
+Download times will be much faster compared to downloading from the HuggingFace Hub at runtime.
 DJL Serving Models expect a different model structure than most of the other frameworks in the SageMaker Python SDK.
 Specifically, DJLModels do not support loading models stored in tar.gz format.
-You must provide an Amazon S3 url pointing to uncompressed model artifacts (bucket and prefix).
 This is because DJL Serving is optimized for large models, and it implements a fast downloading mechanism for large models that requires the artifacts to be uncompressed.

 For example, let's say you want to deploy the EleutherAI/gpt-j-6B model available on the HuggingFace Hub.
@@ -107,7 +135,18 @@ You can download the model and upload to S3 like this:

     # Upload to S3
     aws s3 sync gpt-j-6B s3://my_bucket/gpt-j-6B

-You would then pass "s3://my_bucket/gpt-j-6B" as ``model_s3_uri`` to the ``DJLModel``.
+You would then pass "s3://my_bucket/gpt-j-6B" as ``model_id`` to the ``DJLModel`` like this:
+
+.. code::
+
+    model = DJLModel(
+        "s3://my_bucket/gpt-j-6B",
+        "my_sagemaker_role",
+        data_type="fp16",
+        number_of_partitions=2
+    )
+
+    predictor = model.deploy("ml.g5.12xlarge")

 For language models we expect that the model weights, model config, and tokenizer config are provided in S3. The model
 should be loadable from the HuggingFace Transformers AutoModelFor<Task>.from_pretrained API, where task
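
Since ``DJLModel`` now accepts either form of ``model_id``, a loader has to distinguish the two. A standalone sketch of that dispatch, reflecting the rules described in the diff above (S3 URIs point to uncompressed artifacts; tar.gz is rejected; everything else is treated as a Hub id). This is an illustration, not DJL Serving's actual implementation:

```python
# Illustrative sketch of how a loader might distinguish the two model_id
# forms described above; not DJL Serving's actual implementation.

def classify_model_id(model_id: str) -> str:
    """Return the loading strategy implied by a DJLModel model_id."""
    if model_id.endswith(".tar.gz"):
        # DJLModels do not support models stored in tar.gz format
        raise ValueError("DJLModels do not support tar.gz model artifacts")
    if model_id.startswith("s3://"):
        return "download-uncompressed-s3-artifacts"
    # Anything else is treated as a HuggingFace Hub model id and loaded
    # at runtime via the Transformers from_pretrained API.
    return "huggingface-hub-from_pretrained"

print(classify_model_id("s3://my_bucket/gpt-j-6B"))  # download-uncompressed-s3-artifacts
print(classify_model_id("EleutherAI/gpt-j-6B"))      # huggingface-hub-from_pretrained
```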

doc/overview.rst

Lines changed: 7 additions & 3 deletions

@@ -1164,7 +1164,8 @@ More information about SageMaker Asynchronous Inference can be found in the `AWS

 To deploy an asynchronous inference endpoint, you will need to create an ``AsyncInferenceConfig`` object.
 If you create ``AsyncInferenceConfig`` without specifying its arguments, the default ``S3OutputPath`` will
-be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-outputs/{UNIQUE-JOB-NAME}``. (example shown below):
+be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-outputs/{UNIQUE-JOB-NAME}`` and the default ``S3FailurePath`` will
+be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-failures/{UNIQUE-JOB-NAME}`` (example shown below):

 .. code:: python

@@ -1174,18 +1175,21 @@ be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-outputs/{UNIQUE-JOB-NAME

     async_config = AsyncInferenceConfig()

 Or you can specify configurations in ``AsyncInferenceConfig`` as you like. All of these configuration parameters
-are optional, but if you don't specify ``output_path``, Amazon SageMaker will use the default ``S3OutputPath``
+are optional, but if you don't specify ``output_path`` or ``failure_path``, Amazon SageMaker will use the
+default ``S3OutputPath`` or ``S3FailurePath``
 mentioned above (example shown below):

 .. code:: python

-    # Specify S3OutputPath, MaxConcurrentInvocationsPerInstance and NotificationConfig in the async config object
+    # Specify S3OutputPath, S3FailurePath, MaxConcurrentInvocationsPerInstance and NotificationConfig
+    # in the async config object
     async_config = AsyncInferenceConfig(
         output_path="s3://{s3_bucket}/{bucket_prefix}/output",
         max_concurrent_invocations_per_instance=10,
         notification_config = {
             "SuccessTopic": "arn:aws:sns:aws-region:account-id:topic-name",
             "ErrorTopic": "arn:aws:sns:aws-region:account-id:topic-name",
+            "IncludeInferenceResponseIn": ["SUCCESS_NOTIFICATION_TOPIC", "ERROR_NOTIFICATION_TOPIC"],
         }
     )
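
The two default path templates above follow the same shape. A small standalone sketch of that construction (how SageMaker builds the paths internally is an assumption here; only the templates themselves come from the doc text):

```python
# Sketch of the default async inference S3 path templates described above.
# The exact construction inside SageMaker is an assumption; the templates
# come from the documentation text.

def default_async_paths(region: str, account_id: str, job_name: str) -> dict:
    """Build the default S3OutputPath and S3FailurePath for an async endpoint."""
    base = f"s3://sagemaker-{region}-{account_id}"
    return {
        "S3OutputPath": f"{base}/async-endpoint-outputs/{job_name}",
        "S3FailurePath": f"{base}/async-endpoint-failures/{job_name}",
    }

paths = default_async_paths("us-west-2", "123456789012", "my-job")
print(paths["S3FailurePath"])
# s3://sagemaker-us-west-2-123456789012/async-endpoint-failures/my-job
```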

src/sagemaker/async_inference/async_inference_config.py

Lines changed: 11 additions & 0 deletions

@@ -31,6 +31,7 @@ def __init__(
         max_concurrent_invocations_per_instance=None,
         kms_key_id=None,
         notification_config=None,
+        failure_path=None,
     ):
         """Initialize an AsyncInferenceConfig object for async inference configuration.

@@ -45,6 +46,9 @@ def __init__(
             kms_key_id (str): Optional. The Amazon Web Services Key Management Service
                 (Amazon Web Services KMS) key that Amazon SageMaker uses to encrypt the
                 asynchronous inference output in Amazon S3. (Default: None)
+            failure_path (str): Optional. The Amazon S3 location where endpoints upload model
+                responses for failed requests. If no value is provided, Amazon SageMaker will
+                use the default Amazon S3 Async Inference failure path. (Default: None)
             notification_config (dict): Optional. Specifies the configuration for notifications
                 of inference results for asynchronous inference. Only one notification is generated
                 per invocation request (Default: None):

@@ -54,17 +58,24 @@ def __init__(
                 * error_topic (str): Amazon SNS topic to post a notification to when inference
                   fails. If no topic is provided, no notification is sent on failure.
                   The key in notification_config is 'ErrorTopic'.
+                * include_inference_response_in (list): Optional. When provided, the inference
+                  response will be included in the notification topics. If not provided,
+                  a notification will still be generated on success/error, but will not
+                  contain the inference response.
+                  Valid options are SUCCESS_NOTIFICATION_TOPIC, ERROR_NOTIFICATION_TOPIC
         """
         self.output_path = output_path
         self.max_concurrent_invocations_per_instance = max_concurrent_invocations_per_instance
         self.kms_key_id = kms_key_id
         self.notification_config = notification_config
+        self.failure_path = failure_path

     def _to_request_dict(self):
         """Generates a request dictionary using the parameters provided to the class."""
         request_dict = {
             "OutputConfig": {
                 "S3OutputPath": self.output_path,
+                "S3FailurePath": self.failure_path,
             },
         }