
Commit 564e454

2 parents 95b1056 + 8d84618 commit 564e454

35 files changed (+2987 / -98 lines)

CHANGELOG.md

Lines changed: 21 additions & 1 deletion
@@ -1,5 +1,25 @@
 # Changelog
 
+## v2.87.0 (2022-04-20)
+
+### Features
+
+* Add Jumpstart example notebooks
+* add Tensorflow and Pytorch version for SM Training Compiler and expand to regular regions
+
+### Bug Fixes and Other Changes
+
+* integs for training compiler in non-PDX regions
+* TrainingStep cache misses due to timestamp based job name
+* retry context delete
+* Add more logging when unexpected number of artifacts found
+
+## v2.86.2 (2022-04-14)
+
+### Bug Fixes and Other Changes
+
+* #using uuid to randomize, otherwise system timestamp is used
+
 ## v2.86.1 (2022-04-13)
 
 ### Bug Fixes and Other Changes
@@ -159,7 +179,7 @@
 ### Features
 
 * override jumpstart content bucket
-* jumpstart model id suggestions
+* jumpstart model ID suggestions
 * adding customer metadata support to registermodel step
 
 ### Bug Fixes and Other Changes

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.86.2.dev0
+2.87.1.dev0
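The VERSION bump from 2.86.2.dev0 to 2.87.1.dev0 lines up with the v2.87.0 entry added to CHANGELOG.md. After a release, the working tree appears to move to the next patch-level development version. The sketch below reproduces that rule using only the standard library. `next_dev_version` is a hypothetical helper, not part of the SDK's release tooling, and it assumes a plain MAJOR.MINOR.PATCH release string.

```python
def next_dev_version(released: str) -> str:
    """Return the next patch-level development version after a release.

    Hypothetical helper: assumes ``released`` is a plain MAJOR.MINOR.PATCH string.
    """
    major, minor, patch = (int(part) for part in released.split("."))
    # Bump the patch component and mark it as the first development pre-release.
    return f"{major}.{minor}.{patch + 1}.dev0"


print(next_dev_version("2.87.0"))  # 2.87.1.dev0
```

Under PEP 440 ordering, `2.87.1.dev0` sorts after `2.87.0` and before `2.87.1`. A build from the development tree therefore never compares equal to a published release.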

doc/api/training/distributed.rst

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ The SageMaker Distributed Data Parallel Library
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. toctree::
-   :maxdepth: 3
+   :maxdepth: 2
 
    smd_data_parallel
    sdp_versions/latest
@@ -23,7 +23,7 @@ The SageMaker Distributed Model Parallel Library
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. toctree::
-   :maxdepth: 3
+   :maxdepth: 2
 
    smd_model_parallel
    smp_versions/latest

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

Lines changed: 40 additions & 6 deletions
@@ -5,9 +5,48 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.
 
-SageMaker Distributed Model Parallel 1.7.0 Release Notes
+SageMaker Distributed Model Parallel 1.8.0 Release Notes
 ========================================================
 
+*Date: March. 23. 2022*
+
+**New Features**
+
+* Added tensor parallelism support for the `GPT-J model
+  <https://huggingface.co/docs/transformers/model_doc/gptj>`_.
+  When using the GPT-J model of Hugging Face Transformers v4.17.0 with
+  tensor parallelism, the SageMaker model parallel library automatically
+  replaces the model with a tensor parallel distributed GPT-J model.
+  For more information, see `Support for Hugging Face Transformer Models
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-hugging-face.html>`_
+  in the *Amazon SageMaker Model Parallel Training developer guide*.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers:
+
+* HuggingFace 4.17.0 DLC with PyTorch 1.10.2
+
+  .. code::
+
+    763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
+
+
+The binary file of this version of the library for custom container users:
+
+.. code::
+
+   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-03-12-00-33/smdistributed_modelparallel-1.8.0-cp38-cp38-linux_x86_64.whl
+
+
+----
+
+Release History
+===============
+
+SageMaker Distributed Model Parallel 1.7.0 Release Notes
+--------------------------------------------------------
+
 *Date: March. 07. 2022*
 
 **Currency Updates**
@@ -49,11 +88,6 @@ This version passed benchmark testing and is migrated to the following AWS Deep
     763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker
 
 
-----
-
-Release History
-===============
-
 SageMaker Distributed Model Parallel 1.6.0 Release Notes
 --------------------------------------------------------
 
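For custom containers, the 1.8.0 release notes point to a prebuilt wheel. Its file name follows the wheel naming convention from PEP 427 (`{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl`), so the name alone states that the wheel targets CPython 3.8 (`cp38`) on x86-64 Linux. The sketch below extracts those fields using only the standard library. It ignores the optional build tag. For real use, `packaging.utils.parse_wheel_filename` from the `packaging` library is the more complete option.

```python
import posixpath
from typing import NamedTuple
from urllib.parse import urlparse


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(url: str) -> WheelTags:
    """Split a wheel file name (PEP 427) into its components.

    Minimal sketch: does not handle the optional build tag.
    """
    # Only the last path component is the wheel name; the S3 prefix also contains hyphens.
    filename = posixpath.basename(urlparse(url).path)
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name layout: {filename}")
    return WheelTags(*parts)


WHEEL_URL = (
    "https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/"
    "pytorch-1.10.0/build-artifacts/2022-03-12-00-33/"
    "smdistributed_modelparallel-1.8.0-cp38-cp38-linux_x86_64.whl"
)

tags = parse_wheel_filename(WHEEL_URL)
print(tags.version, tags.python_tag, tags.platform_tag)  # 1.8.0 cp38 linux_x86_64
```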

doc/api/training/smp_versions/latest.rst

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ depending on which version of the library you need to use.
 To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.
 
-Version 1.7.0 (Latest)
-======================
+Version 1.7.0, 1.8.0 (Latest)
+=============================
 
 To use the library, reference the Common API documentation alongside the framework specific API documentation.
 
doc/doc_utils/jumpstart_doc_utils.py

Lines changed: 74 additions & 5 deletions
@@ -14,12 +14,73 @@
 from urllib import request
 import json
 from packaging.version import Version
+from enum import Enum
+
+
+class Tasks(str, Enum):
+    """The ML task name as referenced in the infix of the model ID."""
+
+    IC = "ic"
+    OD = "od"
+    OD1 = "od1"
+    SEMSEG = "semseg"
+    IS = "is"
+    TC = "tc"
+    SPC = "spc"
+    EQA = "eqa"
+    TEXT_GENERATION = "textgeneration"
+    IC_EMBEDDING = "icembedding"
+    TC_EMBEDDING = "tcembedding"
+    NER = "ner"
+    SUMMARIZATION = "summarization"
+    TRANSLATION = "translation"
+    TABULAR_REGRESSION = "regression"
+    TABULAR_CLASSIFICATION = "classification"
+
+
+class ProblemTypes(str, Enum):
+    """Possible problem types for JumpStart models."""
+
+    IMAGE_CLASSIFICATION = "Image Classification"
+    IMAGE_EMBEDDING = "Image Embedding"
+    OBJECT_DETECTION = "Object Detection"
+    SEMANTIC_SEGMENTATION = "Semantic Segmentation"
+    INSTANCE_SEGMENTATION = "Instance Segmentation"
+    TEXT_CLASSIFICATION = "Text Classification"
+    TEXT_EMBEDDING = "Text Embedding"
+    QUESTION_ANSWERING = "Question Answering"
+    SENTENCE_PAIR_CLASSIFICATION = "Sentence Pair Classification"
+    TEXT_GENERATION = "Text Generation"
+    TEXT_SUMMARIZATION = "Text Summarization"
+    MACHINE_TRANSLATION = "Machine Translation"
+    NAMED_ENTITY_RECOGNITION = "Named Entity Recognition"
+    TABULAR_REGRESSION = "Regression"
+    TABULAR_CLASSIFICATION = "Classification"
+
 
 JUMPSTART_REGION = "eu-west-2"
 SDK_MANIFEST_FILE = "models_manifest.json"
 JUMPSTART_BUCKET_BASE_URL = "https://jumpstart-cache-prod-{}.s3.{}.amazonaws.com".format(
     JUMPSTART_REGION, JUMPSTART_REGION
 )
+TASK_MAP = {
+    Tasks.IC: ProblemTypes.IMAGE_CLASSIFICATION,
+    Tasks.IC_EMBEDDING: ProblemTypes.IMAGE_EMBEDDING,
+    Tasks.OD: ProblemTypes.OBJECT_DETECTION,
+    Tasks.OD1: ProblemTypes.OBJECT_DETECTION,
+    Tasks.SEMSEG: ProblemTypes.SEMANTIC_SEGMENTATION,
+    Tasks.IS: ProblemTypes.INSTANCE_SEGMENTATION,
+    Tasks.TC: ProblemTypes.TEXT_CLASSIFICATION,
+    Tasks.TC_EMBEDDING: ProblemTypes.TEXT_EMBEDDING,
+    Tasks.EQA: ProblemTypes.QUESTION_ANSWERING,
+    Tasks.SPC: ProblemTypes.SENTENCE_PAIR_CLASSIFICATION,
+    Tasks.TEXT_GENERATION: ProblemTypes.TEXT_GENERATION,
+    Tasks.SUMMARIZATION: ProblemTypes.TEXT_SUMMARIZATION,
+    Tasks.TRANSLATION: ProblemTypes.MACHINE_TRANSLATION,
+    Tasks.NER: ProblemTypes.NAMED_ENTITY_RECOGNITION,
+    Tasks.TABULAR_REGRESSION: ProblemTypes.TABULAR_REGRESSION,
+    Tasks.TABULAR_CLASSIFICATION: ProblemTypes.TABULAR_CLASSIFICATION,
+}
 
 
 def get_jumpstart_sdk_manifest():
@@ -36,6 +97,11 @@ def get_jumpstart_sdk_spec(key):
     return json.loads(model_spec)
 
 
+def get_model_task(id):
+    task_short = id.split("-")[1]
+    return TASK_MAP[task_short] if task_short in TASK_MAP else "Source"
+
+
 def create_jumpstart_model_table():
     sdk_manifest = get_jumpstart_sdk_manifest()
     sdk_manifest_top_versions_for_models = {}
@@ -56,9 +122,9 @@ def create_jumpstart_model_table():
     file_content.append("==================================\n")
     file_content.append(
         """
-JumpStart for the SageMaker Python SDK uses model ids and model versions to access the necessary
+JumpStart for the SageMaker Python SDK uses model IDs and model versions to access the necessary
 utilities. This table serves to provide the core material plus some extra information that can be useful
-in selecting the correct model id and corresponding parameters.\n
+in selecting the correct model ID and corresponding parameters.\n
 """
     )
     file_content.append(
@@ -69,26 +135,29 @@ def create_jumpstart_model_table():
     )
     file_content.append(
         """
-Each model id is linked to an external page that describes the model.\n
+Click on the Problem Type to navigate to the source of the model.\n
 """
     )
     file_content.append("\n")
     file_content.append(".. list-table:: Available Models\n")
-    file_content.append("   :widths: 50 20 20 20\n")
+    file_content.append("   :widths: 50 20 20 20 30\n")
     file_content.append("   :header-rows: 1\n")
     file_content.append("   :class: datatable\n")
     file_content.append("\n")
     file_content.append("   * - Model ID\n")
     file_content.append("     - Fine Tunable?\n")
     file_content.append("     - Latest Version\n")
     file_content.append("     - Min SDK Version\n")
+    file_content.append("     - Problem Type/Source\n")
 
     for model in sdk_manifest_top_versions_for_models.values():
         model_spec = get_jumpstart_sdk_spec(model["spec_key"])
-        file_content.append("   * - `{} <{}>`_\n".format(model_spec["model_id"], model_spec["url"]))
+        model_task = get_model_task(model_spec["model_id"])
+        file_content.append("   * - {}\n".format(model_spec["model_id"]))
         file_content.append("     - {}\n".format(model_spec["training_supported"]))
         file_content.append("     - {}\n".format(model["version"]))
         file_content.append("     - {}\n".format(model["min_version"]))
+        file_content.append("     - `{} <{}>`__\n".format(model_task, model_spec["url"]))
 
     f = open("doc_utils/jumpstart.rst", "w")
     f.writelines(file_content)
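`get_model_task` treats the second hyphen-separated field of a model ID as the task infix, maps it through `TASK_MAP`, and falls back to `"Source"`. The sketch below reproduces that lookup on a trimmed copy of the enums. It includes two small hardening changes that are not in the commit:

* It returns the enum's `.value` explicitly, because the way `format()` renders a `str`-mixin enum member has changed across Python versions.
* It tolerates IDs with no hyphen instead of raising `IndexError`.

The example model IDs and the URL are illustrative placeholders, not entries from the JumpStart manifest.

```python
from enum import Enum
from typing import List


class Tasks(str, Enum):
    """Trimmed copy of the task infixes from the diff above."""

    IC = "ic"
    SPC = "spc"
    TABULAR_CLASSIFICATION = "classification"


class ProblemTypes(str, Enum):
    """Trimmed copy of the problem-type labels from the diff above."""

    IMAGE_CLASSIFICATION = "Image Classification"
    SENTENCE_PAIR_CLASSIFICATION = "Sentence Pair Classification"
    TABULAR_CLASSIFICATION = "Classification"


TASK_MAP = {
    Tasks.IC: ProblemTypes.IMAGE_CLASSIFICATION,
    Tasks.SPC: ProblemTypes.SENTENCE_PAIR_CLASSIFICATION,
    Tasks.TABULAR_CLASSIFICATION: ProblemTypes.TABULAR_CLASSIFICATION,
}


def get_model_task(model_id: str) -> str:
    """Map the task infix (second '-' field) of a model ID to a problem-type label."""
    parts = model_id.split("-")
    # A str-mixin Enum member hashes and compares like its value,
    # so the plain string infix works directly as a dict key.
    if len(parts) > 1 and parts[1] in TASK_MAP:
        return TASK_MAP[parts[1]].value
    return "Source"


def render_row(model_id: str, url: str) -> List[str]:
    """Render the Model ID and Problem Type/Source cells of one list-table row."""
    return [
        "   * - {}\n".format(model_id),
        "     - `{} <{}>`__\n".format(get_model_task(model_id), url),
    ]


print(get_model_task("pytorch-ic-mobilenet-v2"))  # Image Classification
print(get_model_task("custommodel"))  # Source
```

The row uses an anonymous hyperlink (`` `text <url>`__ ``) rather than a named one (`` `text <url>`_ ``). Many rows share the same Problem Type label but point at different URLs, and repeating a named reference with different targets makes docutils report a duplicate target name.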

doc/overview.rst

Lines changed: 1 addition & 1 deletion
@@ -670,7 +670,7 @@ the ``model_id`` and ``model_version`` needed to retrieve the URI.
 model. To use the latest version, enter ``"*"``. This is a
 required parameter.
 
-To retrieve a model, first select a ``model id`` and ``version`` from
+To retrieve a model, first select a ``model ID`` and ``version`` from
 the :doc:`available models <./doc_utils/jumpstart>`.
 
 .. code:: python
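The overview says ``model_version`` accepts ``"*"`` to mean the latest version. Below is a minimal sketch of how such a wildcard could be resolved against the versions listed for a model ID. `resolve_model_version` is hypothetical, not the SDK's implementation, and it assumes plain dotted-integer version strings.

```python
from typing import Iterable, Tuple


def _version_key(version: str) -> Tuple[int, ...]:
    # Assumes plain dotted-integer versions such as "1.2.0".
    return tuple(int(part) for part in version.split("."))


def resolve_model_version(available: Iterable[str], requested: str) -> str:
    """Return the concrete version to use for a model lookup.

    Hypothetical helper: "*" selects the highest available version; any
    other value must match one of the available versions exactly.
    """
    versions = list(available)
    if not versions:
        raise ValueError("no versions available")
    if requested == "*":
        return max(versions, key=_version_key)
    if requested not in versions:
        raise ValueError(f"version {requested!r} not found; choose from {versions}")
    return requested


print(resolve_model_version(["1.0.0", "1.10.0", "1.2.0"], "*"))  # 1.10.0
```

Versions are compared as integer tuples rather than as strings. Compared lexicographically, `"1.2.0"` would sort above `"1.10.0"` and the wildcard would pick the wrong release.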

src/sagemaker/estimator.py

Lines changed: 1 addition & 1 deletion
@@ -457,7 +457,7 @@ def __init__(
         self._hyperparameters = hyperparameters.copy() if hyperparameters else {}
         self.code_location = code_location
         self.entry_point = entry_point
-        self.dependencies = dependencies
+        self.dependencies = dependencies or []
         self.uploaded_code = None
         self.tags = add_jumpstart_tags(
             tags=tags, training_model_uri=self.model_uri, training_script_uri=self.source_dir
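The estimator change stores an empty list instead of `None` when no `dependencies` are passed. The commit does not state its motivation. A likely one is that code reading `self.dependencies` can then iterate over it without a separate `None` check. The class below is a hypothetical stand-in, not the real `Estimator`, that isolates this pattern.

```python
from typing import List, Optional


class DependenciesHolder:
    """Hypothetical stand-in for the constructor line changed in the diff."""

    def __init__(self, dependencies: Optional[List[str]] = None):
        # Normalize None (or any other falsy value) to a fresh empty list.
        self.dependencies = dependencies or []

    def normalized_paths(self) -> List[str]:
        # Iterating over None would raise TypeError; an empty list simply yields [].
        return [path.rstrip("/") for path in self.dependencies]


default = DependenciesHolder()
explicit = DependenciesHolder(["lib/", "src/utils"])
print(default.normalized_paths())   # []
print(explicit.normalized_paths())  # ['lib', 'src/utils']
```

Defaulting to `None` and normalizing inside `__init__` also gives each instance its own list. This avoids the shared-mutable-default pitfall of writing `dependencies=[]` in the signature.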

src/sagemaker/image_uri_config/huggingface-training-compiler.json

Lines changed: 64 additions & 1 deletion
@@ -2,7 +2,8 @@
     "training": {
         "processors": ["gpu"],
         "version_aliases": {
-            "4.11": "4.11.0"
+            "4.11": "4.11.0",
+            "4.17": "4.17.0"
         },
         "versions": {
             "4.11.0": {
@@ -32,6 +33,68 @@
                     "repository": "huggingface-tensorflow-trcomp-training",
                     "container_version": {"gpu":"cu112-ubuntu18.04"}
                 }
+            },
+            "4.17.0": {
+                "version_aliases": {
+                    "pytorch1.10": "pytorch1.10.2",
+                    "tensorflow2.6": "tensorflow2.6.3"
+                },
+                "pytorch1.10.2": {
+                    "py_versions": ["py38"],
+                    "registries": {
+                        "af-south-1": "626614931356",
+                        "ap-east-1": "871362719292",
+                        "ap-northeast-1": "763104351884",
+                        "ap-northeast-2": "763104351884",
+                        "ap-northeast-3": "364406365360",
+                        "ap-south-1": "763104351884",
+                        "ap-southeast-1": "763104351884",
+                        "ap-southeast-2": "763104351884",
+                        "ca-central-1": "763104351884",
+                        "eu-central-1": "763104351884",
+                        "eu-north-1": "763104351884",
+                        "eu-south-1": "692866216735",
+                        "eu-west-1": "763104351884",
+                        "eu-west-2": "763104351884",
+                        "eu-west-3": "763104351884",
+                        "me-south-1": "217643126080",
+                        "sa-east-1": "763104351884",
+                        "us-east-1": "763104351884",
+                        "us-east-2": "763104351884",
+                        "us-west-1": "763104351884",
+                        "us-west-2": "763104351884"
+                    },
+                    "repository": "huggingface-pytorch-trcomp-training",
+                    "container_version": {"gpu":"cu113-ubuntu20.04"}
+                },
+                "tensorflow2.6.3": {
+                    "py_versions": ["py38"],
+                    "registries": {
+                        "af-south-1": "626614931356",
+                        "ap-east-1": "871362719292",
+                        "ap-northeast-1": "763104351884",
+                        "ap-northeast-2": "763104351884",
+                        "ap-northeast-3": "364406365360",
+                        "ap-south-1": "763104351884",
+                        "ap-southeast-1": "763104351884",
+                        "ap-southeast-2": "763104351884",
+                        "ca-central-1": "763104351884",
+                        "eu-central-1": "763104351884",
+                        "eu-north-1": "763104351884",
+                        "eu-south-1": "692866216735",
+                        "eu-west-1": "763104351884",
+                        "eu-west-2": "763104351884",
+                        "eu-west-3": "763104351884",
+                        "me-south-1": "217643126080",
+                        "sa-east-1": "763104351884",
+                        "us-east-1": "763104351884",
+                        "us-east-2": "763104351884",
+                        "us-west-1": "763104351884",
+                        "us-west-2": "763104351884"
+                    },
+                    "repository": "huggingface-tensorflow-trcomp-training",
+                    "container_version": {"gpu":"cu112-ubuntu20.04"}
+                }
             }
         }
     }
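The new `4.17.0` entry is resolved in two alias steps: `"4.17"` becomes `"4.17.0"`, then `"pytorch1.10"` becomes `"pytorch1.10.2"`. The resulting entry supplies a per-region account, a repository, and a container suffix. The sketch below walks through those steps on a trimmed copy of the config and assembles an ECR image URI. In the SDK itself, this lookup belongs to the `sagemaker.image_uris` module. The tag layout used here is an assumption, copied from the Hugging Face DLC URI quoted in the model parallel release notes, not read from the SDK's code.

```python
import re
from typing import Dict

# Trimmed copy of the "training" section from the diff above (two regions only).
TRAINING_CONFIG: Dict = {
    "version_aliases": {"4.11": "4.11.0", "4.17": "4.17.0"},
    "versions": {
        "4.17.0": {
            "version_aliases": {
                "pytorch1.10": "pytorch1.10.2",
                "tensorflow2.6": "tensorflow2.6.3",
            },
            "pytorch1.10.2": {
                "py_versions": ["py38"],
                "registries": {"af-south-1": "626614931356", "us-west-2": "763104351884"},
                "repository": "huggingface-pytorch-trcomp-training",
                "container_version": {"gpu": "cu113-ubuntu20.04"},
            },
        }
    },
}


def resolve_image_uri(config: Dict, hf_version: str, base_framework: str, region: str,
                      py_version: str = "py38", processor: str = "gpu") -> str:
    """Resolve aliases in the config and assemble an ECR image URI.

    Assumption: the tag layout mirrors the DLC URIs quoted in the model
    parallel release notes; it is not taken from the SDK's own code.
    """
    # "4.17" -> "4.17.0" via the top-level aliases.
    hf_full = config["version_aliases"].get(hf_version, hf_version)
    entry = config["versions"][hf_full]
    # "pytorch1.10" -> "pytorch1.10.2" via the per-version aliases.
    base_full = entry["version_aliases"].get(base_framework, base_framework)
    spec = entry[base_full]
    if py_version not in spec["py_versions"]:
        raise ValueError(f"{py_version} is not supported for {base_full}")
    account = spec["registries"][region]  # KeyError for regions missing from the config
    # Split "pytorch1.10.2" into the framework name and its version number.
    framework_version = re.fullmatch(r"[a-z]+(\d[\d.]*)", base_full).group(1)
    tag = "-".join([
        framework_version,
        f"transformers{hf_full}",
        processor,
        py_version,
        spec["container_version"][processor],
    ])
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{spec['repository']}:{tag}"


print(resolve_image_uri(TRAINING_CONFIG, "4.17", "pytorch1.10", "us-west-2"))
```

Most regions share the account 763104351884. A few, such as af-south-1 and me-south-1, map to their own account IDs in the config, which is why the account is looked up per region instead of being a constant.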
