@@ -312,7 +312,7 @@ For the ``"mpi"`` key, a dict must be passed which contains:
* ``"enabled"``: Set to ``True`` to launch the training job with MPI.

* ``"processes_per_host"``: Specifies the number of processes MPI should launch on each host.
- In SageMaker a host is a single Amazon EC2 ml instance. The SageMaker Python SDK maintains
+ In SageMaker a host is a single Amazon EC2 ml instance. The SageMaker distributed model parallel library maintains
a one-to-one mapping between processes and GPUs across model and data parallelism.
This means that SageMaker schedules each process on a single, separate GPU and no GPU contains more than one process.
If you are using PyTorch, you must restrict each process to its own device using
@@ -321,15 +321,15 @@ For the ``"mpi"`` key, a dict must be passed which contains:
<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-pt-16>`_.

.. important::
- ``processes_per_host`` must be less than the number of GPUs per instance, and typically will be equal to
+ ``processes_per_host`` must be less than or equal to the number of GPUs per instance, and typically will be equal to
the number of GPUs per instance.

For example, if you use one instance with 4-way model parallelism and 2-way data parallelism,
then processes_per_host should be 2 x 4 = 8. Therefore, you must choose an instance that has at least 8 GPUs,
such as an ml.p3.16xlarge.

The following image illustrates how 2-way data parallelism and 4-way model parallelism are distributed across 8 GPUs:
- the models is partitioned across 4 GPUs, and each partition is added to 2 GPUs.
+ the model is partitioned across 4 GPUs, and each partition is added to 2 GPUs.

.. image:: smp_versions/model-data-parallel.png
:width: 650
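
A minimal sketch of the per-process device pinning mentioned above for PyTorch users, assuming the ``smdistributed.modelparallel.torch`` module and its ``local_rank()`` helper as described in the linked script-modification guide:

.. code-block:: python

    # Sketch: pin each MPI-launched process to its own GPU inside the training
    # script, assuming the smdistributed.modelparallel.torch API.
    import torch
    import smdistributed.modelparallel.torch as smp

    smp.init()
    # local_rank() is this process's index on the host; using it as the CUDA
    # device index keeps the one-process-per-GPU mapping described above.
    torch.cuda.set_device(smp.local_rank())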
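
A hypothetical sketch of how the ``"mpi"`` dict from this section could be passed to a PyTorch estimator for the worked example above (4-way model parallelism x 2-way data parallelism on one ml.p3.16xlarge); the ``smdistributed``/``modelparallel`` entry, its ``partitions`` parameter, and the script, role, and version values are placeholders not documented in this section:

.. code-block:: python

    # Hypothetical sketch: one ml.p3.16xlarge (8 GPUs), 4-way model parallelism
    # x 2-way data parallelism, so MPI launches 2 x 4 = 8 processes on the host.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        entry_point="train.py",            # placeholder training script
        role="<your-iam-role-arn>",        # placeholder IAM role
        instance_type="ml.p3.16xlarge",
        instance_count=1,
        framework_version="1.6.0",         # assumed PyTorch version
        py_version="py36",
        distribution={
            "smdistributed": {              # assumed companion config; this
                "modelparallel": {          # section documents only "mpi"
                    "enabled": True,
                    "parameters": {"partitions": 4},
                }
            },
            "mpi": {
                "enabled": True,            # launch the training job with MPI
                "processes_per_host": 8,    # 2 (data parallel) x 4 (model parallel)
            },
        },
    )

With this layout, each of the 8 MPI processes is scheduled on one of the instance's 8 GPUs, matching the one-to-one process-to-GPU mapping described above.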