Commit 92e29d1

Merge branch 'master' into shortUri
2 parents f7c58a1 + e3c54e1 commit 92e29d1

5 files changed: +72 -5 lines changed
CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -1,5 +1,17 @@
 # Changelog
 
+## v2.23.6 (2021-01-20)
+
+### Bug Fixes and Other Changes
+
+* add artifact, action, context to visualizer
+
+## v2.23.5 (2021-01-18)
+
+### Bug Fixes and Other Changes
+
+* increase time allowed for trial components to index
+
 ## v2.23.4.post0 (2021-01-14)
 
 ### Documentation Changes

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.23.5.dev0
+2.23.7.dev0

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.md

Lines changed: 41 additions & 0 deletions
@@ -1,3 +1,44 @@
+# Sagemaker Distributed Model Parallel 1.2.0 Release Notes
+
+- New Features
+- Bug Fixes
+- Known Issues
+
+## New Features
+
+### PyTorch
+
+#### Add support for PyTorch 1.7
+
+- Adds support for `gradient_as_bucket_view` (PyTorch 1.7 only), `find_unused_parameters` (PyTorch 1.7 only) and `broadcast_buffers` options to `smp.DistributedModel`. These options behave the same as the corresponding options (with the same names) in
+`torch.nn.parallel.DistributedDataParallel` API. Please refer to the [SageMaker distributed model parallel API documentation](https://sagemaker.readthedocs.io/en/stable/api/training/smd_model_parallel_pytorch.html#smp.DistributedModel) for more information.
+
+- Adds support for `join` (PyTorch 1.7 only) context manager, which is to be used in conjunction with an instance of `smp.DistributedModel` to be able to train with uneven inputs across participating processes.
+
+- Adds support for `_register_comm_hook` (PyTorch 1.7 only) which will register the callable as a communication hook for DDP. NOTE: Like in DDP, this is an experimental API and subject to change.
+
+### Tensorflow
+
+- Adds support for Tensorflow 2.4
+
+## Bug Fixes
+
+### PyTorch
+
+- `Serialization`: Fix a bug with serialization/flattening where instances of subclasses of dict/OrderedDicts were serialized/deserialized or internally flattened/unflattened as
+regular dicts.
+
+### Tensorflow
+
+- Fix a bug that may cause a hang during evaluation when there is no model input for one partition.
+
+## Known Issues
+
+### PyTorch
+
+- A performance regression was observed when training on SMP with PyTorch 1.7.1 compared to 1.6. The root cause was found to be the slowdown in performance of `.grad` method calls in PyTorch 1.7.1 compared to 1.6. Please see the related discussion: https://github.com/pytorch/pytorch/issues/50636.
+
+
 # Sagemaker Distributed Model Parallel 1.1.0 Release Notes
 
 - New Features
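
The serialization fix in the release notes above concerns dict subclasses coming back as plain dicts after a flatten/unflatten round trip. The sketch below is not SMP's implementation. It is a minimal, standalone illustration of the idea, using hypothetical `flatten`/`unflatten` helpers and a made-up `Batch` class, and it assumes every container type can be constructed with no arguments.

```python
from collections import OrderedDict


def flatten(obj):
    """Return (leaves, spec); spec records each container's type and keys."""
    if isinstance(obj, dict):
        leaves, children = [], []
        for key, value in obj.items():
            sub_leaves, sub_spec = flatten(value)
            leaves.extend(sub_leaves)
            children.append((key, sub_spec))
        # Record type(obj) rather than dict so subclasses survive the round trip.
        return leaves, (type(obj), children)
    # Anything that is not a dict is treated as a leaf.
    return [obj], None


def unflatten(leaves, spec):
    """Rebuild the structure described by spec, consuming leaves in order."""
    remaining = iter(leaves)

    def build(node):
        if node is None:
            return next(remaining)
        container_type, children = node
        result = container_type()  # assumption: no-argument constructor
        for key, sub_spec in children:
            result[key] = build(sub_spec)
        return result

    return build(spec)


class Batch(dict):
    """Hypothetical dict subclass, used only for this illustration."""


nested = OrderedDict(a=1, b=Batch(x=2, y=3))
leaves, spec = flatten(nested)
restored = unflatten(leaves, spec)
```

If the spec stored `dict` instead of `type(obj)`, `restored["b"]` would come back as a plain dict, which is the kind of type loss the bug fix describes.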

src/sagemaker/lineage/visualizer.py

Lines changed: 12 additions & 0 deletions
@@ -37,6 +37,9 @@ def show(
         pipeline_execution_step=None,
         model_package_arn=None,
         endpoint_arn=None,
+        artifact_arn=None,
+        context_arn=None,
+        actions_arn=None,
     ):
         """Generate a dataframe containing all incoming and outgoing lineage entities.
 
@@ -55,6 +58,9 @@ def show(
             pipeline_execution_step (obj, optional): Pipeline execution step. Defaults to None.
             model_package_arn (str, optional): Model package arn. Defaults to None.
             endpoint_arn (str, optional): Endpoint arn. Defaults to None.
+            artifact_arn (str, optional): Artifact arn. Defaults to None.
+            context_arn (str, optional): Context arn. Defaults to None.
+            actions_arn (str, optional): Action arn. Defaults to None.
 
         Returns:
             DataFrame: Pandas dataframe containing lineage associations.
@@ -75,6 +81,12 @@ def show(
             start_arn = self._get_start_arn_from_model_package_arn(model_package_arn)
         elif endpoint_arn:
             start_arn = self._get_start_arn_from_endpoint_arn(endpoint_arn)
+        elif artifact_arn:
+            start_arn = artifact_arn
+        elif context_arn:
+            start_arn = context_arn
+        elif actions_arn:
+            start_arn = actions_arn
 
         return self._get_associations_dataframe(start_arn)
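
The three new branches use the supplied ARN directly as the start ARN, with no lookup step, and they are checked in a fixed order. The sketch below isolates only that ordering. `pick_start_arn` is a hypothetical stand-in, not a function in the SDK, and the ARN strings are made-up placeholders.

```python
def pick_start_arn(artifact_arn=None, context_arn=None, actions_arn=None):
    """Hypothetical helper mirroring the order of the three added branches."""
    if artifact_arn:
        return artifact_arn
    elif context_arn:
        return context_arn
    elif actions_arn:
        return actions_arn
    # Assumption for this sketch only: nothing supplied means no start ARN.
    return None


# Placeholder ARNs for illustration; they do not refer to real resources.
artifact = "arn:aws:sagemaker:us-west-2:123456789012:artifact/example"
context = "arn:aws:sagemaker:us-west-2:123456789012:context/example"

chosen = pick_start_arn(artifact_arn=artifact, context_arn=context)
```

Because the branches form one `elif` chain, passing several ARNs at once uses only the first match; here `chosen` is the artifact ARN and the context ARN is ignored.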

tests/integ/sagemaker/lineage/test_artifact.py

Lines changed: 6 additions & 4 deletions
@@ -78,10 +78,12 @@ def test_list(artifact_objs, sagemaker_session):
 
 
 def test_downstream_trials(trial_associated_artifact, trial_obj, sagemaker_session):
-    # wait for TC to index
-    time.sleep(3)
-
-    trials = trial_associated_artifact.downstream_trials(sagemaker_session=sagemaker_session)
+    # allow trial components to index, 30 seconds max
+    for i in range(3):
+        time.sleep(10)
+        trials = trial_associated_artifact.downstream_trials(sagemaker_session=sagemaker_session)
+        if len(trials) > 0:
+            break
 
     assert len(trials) == 1
     assert trial_obj.trial_name in trials
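
The test change replaces a single fixed sleep with a bounded retry loop: sleep, fetch, stop early once results appear. The same pattern can be factored into a small helper, sketched below. `poll_until` and its parameters are hypothetical names, not part of the SDK or the test suite, and the `sleep` argument is injectable only so the example runs without waiting.

```python
import time


def poll_until(fetch, is_ready, attempts=3, delay=10.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, sleeping `delay` seconds before each call.

    Returns the last fetched result, whether or not is_ready() accepted it.
    """
    result = None
    for _ in range(attempts):
        sleep(delay)
        result = fetch()
        if is_ready(result):
            break
    return result


# Simulated backend that returns nothing until the third query.
responses = iter([[], [], ["trial-a"]])
sleeps = []  # records requested delays instead of actually sleeping

result = poll_until(
    fetch=lambda: next(responses),
    is_ready=lambda trials: len(trials) > 0,
    sleep=sleeps.append,
)
```

As in the test, the helper sleeps before the first fetch, so the worst case is `attempts * delay` seconds (30 with the defaults), and the caller still asserts on the returned value afterwards.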
