Commit ff5ec7f

Revert "change: resolve merge conflicts for PR #3118"
This reverts commit 654cac4, reversing changes made to 7952ca3.
1 parent 654cac4 commit ff5ec7f

45 files changed: 322 additions and 732 deletions

CHANGELOG.md

Lines changed: 0 additions & 16 deletions
@@ -1,21 +1,5 @@
 # Changelog
 
-## v2.91.0 (2022-05-19)
-
-### Features
-
-* Support Properties for StepCollection
-
-### Bug Fixes and Other Changes
-
-* Prevent passing PipelineVariable object into image_uris.retrieve
-* support image_uri being property ref for model
-* ResourceConflictException from AWS Lambda on pipeline upsert
-
-### Documentation Changes
-
-* release notes for SMDDP 1.4.1 and SMDMP 1.9.0
-
 ## v2.90.0 (2022-05-16)
 
 ### Features

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.91.1.dev0
+2.90.1.dev0

doc/api/training/sdp_versions/latest.rst

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@ depending on the version of the library you use.
 <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
 for more information.
 
-Version 1.4.0, 1.4.1 (Latest)
-=============================
+Version 1.4.0 (Latest)
+======================
 
 .. toctree::
    :maxdepth: 1

doc/api/training/sdp_versions/v1.2.x/smd_data_parallel_pytorch.rst

Lines changed: 1 addition & 1 deletion
@@ -266,7 +266,7 @@ PyTorch API
 .. note::
 
    The ``no_sync()`` context manager is available from smdistributed-dataparallel v1.2.2.
-   To find the release note, see :ref:`sdp_release_note`.
+   To find the release note, see :ref:`sdp_1.2.2_release_note`.
 
 **Example:**

doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.rst

Lines changed: 7 additions & 38 deletions
@@ -1,4 +1,4 @@
-.. _sdp_release_note:
+.. _sdp_1.2.2_release_note:
 
 #############
 Release Notes
@@ -7,44 +7,8 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed data parallel library.
 
-SageMaker Distributed Data Parallel 1.4.1 Release Notes
-=======================================================
-
-*Date: May. 3. 2022*
-
-**Currency Updates**
-
-* Added support for PyTorch 1.11.0
-
-**Known Issues**
-
-* The library currently does not support the PyTorch sub-process groups API (torch.distributed.new_group (https://pytorch.org/docs/stable/distributed.html#torch.distributed.new_group)).
-
-
-**Migration to AWS Deep Learning Containers**
-
-This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
-
-- PyTorch 1.11.0 DLC
-
-.. code::
-
-   763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
-
-Binary file of this version of the library for custom container users:
-
-.. code::
-
-   https://smdataparallel.s3.amazonaws.com/binary/pytorch/1.11.0/cu113/2022-04-14/smdistributed_dataparallel-1.4.1-cp38-cp38-linux_x86_64.whl
-
-
-----
-
-Release History
-===============
-
 SageMaker Distributed Data Parallel 1.4.0 Release Notes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+=======================================================
 
 *Date: Feb. 24. 2022*
 
@@ -108,6 +72,11 @@ This version passed benchmark testing and is migrated to the following AWS Deep
    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker
 
 
+----
+
+Release History
+===============
+
 SageMaker Distributed Data Parallel 1.2.2 Release Notes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

Lines changed: 7 additions & 34 deletions
@@ -5,40 +5,8 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.
 
-SageMaker Distributed Model Parallel 1.9.0 Release Notes
-========================================================
-
-*Date: May. 3. 2022*
-
-**Currency Updates**
-
-* Added support for PyTorch 1.11.0
-
-**Migration to AWS Deep Learning Containers**
-
-This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
-
-- PyTorch 1.11.0 DLC
-
-.. code::
-
-   763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
-
-Binary file of this version of the library for custom container users:
-
-.. code::
-
-   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
-
-
-
-----
-
-Release History
-===============
-
 SageMaker Distributed Model Parallel 1.8.1 Release Notes
---------------------------------------------------------
+========================================================
 
 *Date: April. 23. 2022*
 
@@ -91,6 +59,11 @@ This version passed benchmark testing and is migrated to the following AWS Deep
    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-04-14-03-58/smdistributed_modelparallel-1.8.1-cp38-cp38-linux_x86_64.whl
 
 
+----
+
+Release History
+===============
+
 SageMaker Distributed Model Parallel 1.8.0 Release Notes
 --------------------------------------------------------
 
@@ -118,7 +91,7 @@ This version passed benchmark testing and is migrated to the following AWS Deep
    763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
 
 
-The binary file of this version of the library for custom container users:
+* The binary file of this version of the library for custom container users
 
 .. code::

doc/api/training/smp_versions/latest.rst

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ depending on which version of the library you need to use.
 To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.
 
-Version 1.7.0, 1.8.0, 1.8.1, 1.9.0 (Latest)
-===========================================
+Version 1.7.0, 1.8.0, 1.8.1 (Latest)
+====================================
 
 To use the library, reference the Common API documentation alongside the framework specific API documentation.

src/sagemaker/fw_utils.py

Lines changed: 2 additions & 6 deletions
@@ -16,7 +16,6 @@
 import logging
 import os
 import re
-import time
 import shutil
 import tempfile
 from collections import namedtuple
@@ -25,7 +24,6 @@
 import sagemaker.image_uris
 from sagemaker.session_settings import SessionSettings
 import sagemaker.utils
-from sagemaker.workflow import is_pipeline_variable
 
 from sagemaker.deprecations import renamed_warning
 
@@ -397,10 +395,8 @@ def model_code_key_prefix(code_location_key_prefix, model_name, image):
     Returns:
         str: the key prefix to be used in uploading code
     """
-    name_from_image = f"/model_code/{int(time.time())}"
-    if not is_pipeline_variable(image):
-        name_from_image = sagemaker.utils.name_from_image(image)
-    return "/".join(filter(None, [code_location_key_prefix, model_name or name_from_image]))
+    training_job_name = sagemaker.utils.name_from_image(image)
+    return "/".join(filter(None, [code_location_key_prefix, model_name or training_job_name]))
 
 
 def warn_if_parameter_server_with_multi_gpu(training_instance_type, distribution):
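
Both versions of `model_code_key_prefix` build the key the same way: they drop empty parts and join the rest with `/`, using the image-derived name only when no model name is given. The sketch below isolates that joining step. `join_key_prefix` and its `fallback_name` parameter are hypothetical names for illustration, and the sample values are made up.

```python
def join_key_prefix(code_location_key_prefix, model_name, fallback_name):
    """Join the non-empty parts with "/", preferring model_name over fallback_name."""
    # filter(None, ...) drops None and empty strings, so a missing prefix
    # does not leave a leading "/" in the result.
    return "/".join(filter(None, [code_location_key_prefix, model_name or fallback_name]))


with_prefix = join_key_prefix("my-prefix", None, "image-2022-05-19")
without_prefix = join_key_prefix(None, "my-model", "image-2022-05-19")
print(with_prefix)     # my-prefix/image-2022-05-19
print(without_prefix)  # my-model
```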

src/sagemaker/image_uris.py

Lines changed: 1 addition & 8 deletions
@@ -23,7 +23,6 @@
 from sagemaker.jumpstart.utils import is_jumpstart_model_input
 from sagemaker.spark import defaults
 from sagemaker.jumpstart import artifacts
-from sagemaker.workflow import is_pipeline_variable
 
 logger = logging.getLogger(__name__)
 
@@ -105,17 +104,11 @@ def retrieve(
 
     Raises:
         NotImplementedError: If the scope is not supported.
-        ValueError: If the combination of arguments specified is not supported or
-            any PipelineVariable object is passed in.
+        ValueError: If the combination of arguments specified is not supported.
         VulnerableJumpStartModelError: If any of the dependencies required by the script have
             known security vulnerabilities.
         DeprecatedJumpStartModelError: If the version of the model is deprecated.
     """
-    args = dict(locals())
-    for name, val in args.items():
-        if is_pipeline_variable(val):
-            raise ValueError("%s should not be a pipeline variable (%s)" % (name, type(val)))
-
     if is_jumpstart_model_input(model_id, model_version):
         return artifacts._retrieve_image_uri(
             model_id,
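
The guard removed from `retrieve` snapshots the call's arguments with `locals()` and rejects any that is a pipeline variable. A minimal, self-contained sketch of that pattern follows. `FakePipelineVariable`, the local `is_pipeline_variable` helper, and the three-parameter `retrieve` are stand-ins written for this example, not the SDK's own definitions.

```python
class FakePipelineVariable:
    """Hypothetical stand-in for a value that is only resolved at pipeline run time."""


def is_pipeline_variable(value):
    # Stand-in check; the real helper lives in sagemaker.workflow.
    return isinstance(value, FakePipelineVariable)


def retrieve(framework, region, version=None):
    # Take the snapshot first, while locals() still holds only the parameters.
    args = dict(locals())
    for name, val in args.items():
        if is_pipeline_variable(val):
            raise ValueError("%s should not be a pipeline variable (%s)" % (name, type(val)))
    return "%s:%s:%s" % (framework, region, version)


ok = retrieve("pytorch", "us-west-2", "1.11.0")
try:
    retrieve("pytorch", "us-west-2", FakePipelineVariable())
    rejected = None
except ValueError as err:
    rejected = str(err)
```

Copying `locals()` into a dict before the loop matters: the loop introduces `name` and `val`, and iterating over a live view of the local namespace while it grows would be fragile.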

src/sagemaker/lambda_helper.py

Lines changed: 22 additions & 26 deletions
@@ -15,7 +15,6 @@
 
 from io import BytesIO
 import zipfile
-import time
 from botocore.exceptions import ClientError
 from sagemaker.session import Session
 
@@ -135,35 +134,32 @@ def update(self):
         Returns: boto3 response from Lambda's update_function method.
         """
         lambda_client = _get_lambda_client(self.session)
-        retry_attempts = 7
-        for i in range(retry_attempts):
+
+        if self.script is not None:
+            try:
+                response = lambda_client.update_function_code(
+                    FunctionName=self.function_name, ZipFile=_zip_lambda_code(self.script)
+                )
+                return response
+            except ClientError as e:
+                error = e.response["Error"]
+                raise ValueError(error)
+        else:
             try:
-                if self.script is not None:
-                    response = lambda_client.update_function_code(
-                        FunctionName=self.function_name, ZipFile=_zip_lambda_code(self.script)
-                    )
-                else:
-                    response = lambda_client.update_function_code(
-                        FunctionName=(self.function_name or self.function_arn),
-                        S3Bucket=self.s3_bucket,
-                        S3Key=_upload_to_s3(
-                            s3_client=_get_s3_client(self.session),
-                            function_name=self.function_name,
-                            zipped_code_dir=self.zipped_code_dir,
-                            s3_bucket=self.s3_bucket,
-                        ),
-                    )
+                response = lambda_client.update_function_code(
+                    FunctionName=(self.function_name or self.function_arn),
+                    S3Bucket=self.s3_bucket,
+                    S3Key=_upload_to_s3(
+                        s3_client=_get_s3_client(self.session),
+                        function_name=self.function_name,
+                        zipped_code_dir=self.zipped_code_dir,
+                        s3_bucket=self.s3_bucket,
+                    ),
+                )
                 return response
             except ClientError as e:
                 error = e.response["Error"]
-                code = error["Code"]
-                if code == "ResourceConflictException":
-                    if i == retry_attempts - 1:
-                        raise ValueError(error)
-                    # max wait time = 2**0 + 2**1 + .. + 2**6 = 127 seconds
-                    time.sleep(2**i)
-                else:
-                    raise ValueError(error)
+                raise ValueError(error)
 
     def upsert(self):
         """Method to create a lambda function or update it if it already exists
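
The loop removed from `update()` retried on `ResourceConflictException` with exponential backoff. The sketch below reproduces only that control flow so it can be run without AWS. `ConflictError`, `call_with_backoff`, and the injectable `sleep` parameter are hypothetical names introduced here, and the conflict is simulated rather than raised by boto3.

```python
import time


class ConflictError(Exception):
    """Hypothetical stand-in for a ClientError with code ResourceConflictException."""


def call_with_backoff(operation, retry_attempts=7, sleep=time.sleep):
    """Call operation(), retrying on ConflictError with exponential backoff."""
    for i in range(retry_attempts):
        try:
            return operation()
        except ConflictError as e:
            if i == retry_attempts - 1:
                # Out of attempts: surface the failure as ValueError, as update() did.
                raise ValueError(str(e)) from e
            # Wait 1, 2, 4, ... seconds before the next attempt.
            sleep(2**i)


# Usage: fail three times, then succeed; record delays instead of sleeping.
delays = []
attempts = {"count": 0}


def flaky_update():
    attempts["count"] += 1
    if attempts["count"] < 4:
        raise ConflictError("update in progress")
    return "updated"


result = call_with_backoff(flaky_update, sleep=delays.append)
```

In this structure the last attempt raises without sleeping, so seven attempts wait at most 2**0 + ... + 2**5 = 63 seconds in total. The removed comment's figure of 127 seconds would require a seventh sleep that the loop never performs.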

src/sagemaker/workflow/_utils.py

Lines changed: 7 additions & 12 deletions
@@ -17,7 +17,7 @@
 import shutil
 import tarfile
 import tempfile
-from typing import List, Union, Optional, TYPE_CHECKING
+from typing import List, Union, Optional
 from sagemaker import image_uris
 from sagemaker.inputs import TrainingInput
 from sagemaker.estimator import EstimatorBase
@@ -34,9 +34,6 @@
 from sagemaker.utils import _save_model, download_file_from_url
 from sagemaker.workflow.retry import RetryPolicy
 
-if TYPE_CHECKING:
-    from sagemaker.workflow.step_collections import StepCollection
-
 FRAMEWORK_VERSION = "0.23-1"
 INSTANCE_TYPE = "ml.m5.large"
 REPACK_SCRIPT = "_repack_model.py"
@@ -60,7 +57,7 @@ def __init__(
         description: str = None,
         source_dir: str = None,
         dependencies: List = None,
-        depends_on: Optional[List[Union[str, Step, "StepCollection"]]] = None,
+        depends_on: Union[List[str], List[Step]] = None,
         retry_policies: List[RetryPolicy] = None,
         subnets=None,
         security_group_ids=None,
@@ -127,9 +124,8 @@ def __init__(
                 >>> |------ virtual-env
 
                 This is not supported with "local code" in Local Mode.
-            depends_on (List[Union[str, Step, StepCollection]]): The list of `Step`/`StepCollection`
-                names or `Step` instances or `StepCollection` instances that the current `Step`
-                depends on (default: None).
+            depends_on (List[str] or List[Step]): A list of step names or instances
+                this step depends on (default: None).
             retry_policies (List[RetryPolicy]): The list of retry policies for the current step
                 (default: None).
             subnets (list[str]): List of subnet ids. If not specified, the re-packing
@@ -278,7 +274,7 @@ def __init__(
         compile_model_family=None,
         display_name: str = None,
         description=None,
-        depends_on: Optional[List[Union[str, Step, "StepCollection"]]] = None,
+        depends_on: Optional[Union[List[str], List[Step]]] = None,
         retry_policies: Optional[List[RetryPolicy]] = None,
         tags=None,
         container_def_list=None,
@@ -316,9 +312,8 @@ def __init__(
                 if specified, a compiled model will be used (default: None).
             display_name (str): The display name of this `_RegisterModelStep` step (default: None).
             description (str): Model Package description (default: None).
-            depends_on (List[Union[str, Step, StepCollection]]): The list of `Step`/`StepCollection`
-                names or `Step` instances or `StepCollection` instances that the current `Step`
-                depends on (default: None).
+            depends_on (List[str] or List[Step]): A list of step names or instances
+                this step depends on (default: None).
             retry_policies (List[RetryPolicy]): The list of retry policies for the current step
                 (default: None).
             tags (List[dict[str, str]]): A list of dictionaries containing key-value pairs used to
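
The reverted code imported `StepCollection` under `TYPE_CHECKING` and referred to it by a quoted name in the annotation, a common way to name a type without importing its module at runtime. The sketch below shows the pattern in isolation. `fractions.Fraction` is only a stand-in for a class whose import should be avoided at runtime, and `count_dependencies` is a hypothetical function.

```python
from typing import TYPE_CHECKING, List, Optional, Union

if TYPE_CHECKING:
    # Evaluated by static type checkers only; skipped when the module runs.
    from fractions import Fraction  # stand-in for a class that is costly or circular to import


def count_dependencies(depends_on: Optional[List[Union[str, "Fraction"]]] = None) -> int:
    """Return how many dependencies were passed, treating None as an empty list."""
    # The quoted "Fraction" is stored as a forward reference, so the name
    # does not need to exist at runtime.
    return len(depends_on or [])


two = count_dependencies(["step-a", "step-b"])
none = count_dependencies()
```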
