documentation: adding details about mpi options, other small updates #2135

TEChopra1000 · 2021-02-11T18:13:30Z

Description of changes:

Adding a section that describes the MPI options.
Other small additions.
Fixing formatting in the join() description.

Testing done:
tox -e black-check,flake8,pylint,docstyle,sphinx,doc8 --parallel all

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

I have read the CONTRIBUTING doc
I used the commit message format described in CONTRIBUTING
I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have checked that my tests are not configured for a specific region or account (if appropriate)
I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot · 2021-02-11T18:15:20Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: 9ef1642
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-02-11T18:16:15Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: 9ef1642
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-02-11T18:16:43Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 9ef1642
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-02-11T18:22:29Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 9ef1642
Result: SUCCEEDED
Build Logs (available for 30 days)

karakusc · 2021-02-11T18:44:06Z

doc/api/training/smd_model_parallel_general.rst

+* ``"enabled"``: Set to ``True`` to launch the training job with MPI.
+
+* ``"processes_per_host"``: Specifies the number of processes MPI should launch on each host.
+  In SageMaker a host is a single Amazon EC2 ml instance. The SageMaker Python SDK maintains


We should say "SageMaker modelparallel library maintains ..." instead of "SageMaker Python SDK maintains ...".

karakusc · 2021-02-11T18:44:43Z

doc/api/training/smd_model_parallel_general.rst

+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-pt-16>`_.
+
+  .. important::
+   ``process_per_host`` must be less than the number of GPUs per instance, and typically will be equal to


"...must be less than or equal to the number of GPUs per instance"
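
The constraint above can be sketched as a small check. This is a minimal illustration, not part of the SDK: `validate_mpi_options` and the `GPUS_PER_INSTANCE` table are hypothetical names introduced here, and only the `"enabled"` and `"processes_per_host"` keys come from the documentation under review.

```python
# Hypothetical lookup table for illustration only; an ml.p3.16xlarge has 8 GPUs.
GPUS_PER_INSTANCE = {"ml.p3.16xlarge": 8}


def validate_mpi_options(mpi_options, instance_type):
    """Return the options unchanged if they satisfy the documented constraint.

    Hypothetical helper: raises ValueError when MPI is not enabled or when
    processes_per_host exceeds the number of GPUs on the instance.
    """
    if not mpi_options.get("enabled"):
        raise ValueError('"enabled" must be True to launch the training job with MPI')
    gpus = GPUS_PER_INSTANCE[instance_type]
    processes = mpi_options["processes_per_host"]
    # Less than OR EQUAL to the GPU count, as the review comment points out.
    if not 1 <= processes <= gpus:
        raise ValueError(
            f"processes_per_host={processes} must be between 1 and {gpus} "
            f"for {instance_type}"
        )
    return mpi_options


# Typical case: one process per GPU on an 8-GPU instance.
mpi_options = validate_mpi_options(
    {"enabled": True, "processes_per_host": 8}, "ml.p3.16xlarge"
)
```

With the original "less than" wording, the typical one-process-per-GPU setting would have been rejected, which is why the "or equal to" matters.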

karakusc · 2021-02-11T18:45:12Z

doc/api/training/smd_model_parallel_general.rst

+  such as an ml.p3.16xlarge.
+
+  The following image illustrates how 2-way data parallelism and 4-way model parallelism is distributed across 8 GPUs:
+  the models is partitioned across 4 GPUs, and each partition is added to 2 GPUs.


models -> model
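
For readers without the image, the arithmetic behind that sentence can be sketched as follows. The GPU-to-partition assignment used here (GPU index modulo the model-parallel degree) is an assumption for illustration; the library's actual placement may differ, and `layout` is a hypothetical helper.

```python
from collections import defaultdict

NUM_GPUS = 8               # e.g. one ml.p3.16xlarge
MODEL_PARALLEL_DEGREE = 4  # the model is split into 4 partitions
DATA_PARALLEL_DEGREE = NUM_GPUS // MODEL_PARALLEL_DEGREE  # 2 replicas


def layout(num_gpus, mp_degree):
    """Map each model partition to the GPUs that hold a copy of it.

    Assumed placement for illustration: GPU g holds partition g % mp_degree.
    """
    placement = defaultdict(list)
    for gpu in range(num_gpus):
        placement[gpu % mp_degree].append(gpu)
    return dict(placement)


placement = layout(NUM_GPUS, MODEL_PARALLEL_DEGREE)
for partition, gpus in sorted(placement.items()):
    print(f"partition {partition}: GPUs {gpus}")
```

Under this assumed layout, each of the 4 partitions lands on exactly 2 GPUs, which matches the 2-way data parallelism described in the text.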

karakusc · 2021-02-11T18:46:18Z

doc/api/training/smp_versions/v1.2.0/smd_model_parallel_pytorch.rst

+   Unlike the original DDP wrapper, when you use ``DistributedModel``,
+   model parameters and buffers are not immediately broadcast across
+   processes when the wrapper is called. Instead, the broadcast is deferred to the first call of the
+   ``smp.step-decorated`` function when the partition is done.


Only smp.step should be in code style; "-decorated" should sit outside the code markup.

sagemaker-bot · 2021-02-11T18:57:58Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: 2c161b2
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-02-11T18:58:15Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: 2c161b2
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-02-11T18:58:20Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 2c161b2
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-02-11T19:05:02Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 2c161b2
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-02-11T19:54:37Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: e1e8b5e
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-02-11T19:55:18Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: e1e8b5e
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-02-11T19:56:36Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: e1e8b5e
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-02-11T20:02:13Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: e1e8b5e
Result: SUCCEEDED
Build Logs (available for 30 days)

documentation: adding details about mpi options, other small updates

9ef1642

karakusc reviewed Feb 11, 2021

documentation: small fixes to sm dist. mp updates

2c161b2

karakusc approved these changes Feb 11, 2021

ngluna approved these changes Feb 11, 2021

Merge branch 'master' into master

e1e8b5e

TEChopra1000 merged commit 56d27d2 into aws:master Feb 11, 2021
