 
 .. _sm-sdk-modelparallel-params:
 
-SageMaker Python SDK ``modelparallel`` parameters
-=================================================
+Required SageMaker Python SDK parameters
+========================================
 
 The TensorFlow and PyTorch ``Estimator`` objects contain a ``distribution`` parameter,
 which is used to enable and specify parameters for the
 initialization of the SageMaker distributed model parallel library. The library internally uses MPI,
-so in order to use model parallelism, MPI must be enabled using the ``distribution`` parameter.
+so in order to use model parallelism, MPI must also be enabled using the ``distribution`` parameter.
 
 The following is an example of how you can launch a new PyTorch training job with the library.
 
@@ -55,6 +55,9 @@ The following is an example of how you can launch a new PyTorch training job with the library.
 
     smd_mp_estimator.fit('s3://my_bucket/my_training_data/')
 
+``smdistributed`` Parameters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 You can use the following parameters to initialize the library using the ``parameters``
 key in the ``smdistributed`` section of ``distribution``.
 
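
For orientation, the ``distribution`` argument is a nested dictionary passed to the ``Estimator``. The following is a minimal sketch assuming the standard nested layout; the ``partitions`` and ``microbatches`` values are illustrative, and the table below lists the full set of options.

.. code:: python

    # Illustrative sketch of the smdistributed portion of ``distribution``.
    # Parameter values are examples; see the parameter table for all options.
    distribution = {
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {
                    "partitions": 2,     # number of model partitions
                    "microbatches": 4,   # microbatches per minibatch for pipelining
                },
            }
        },
        # MPI must also be enabled; see the "mpi" Parameters section below.
        "mpi": {"enabled": True},
    }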
@@ -302,6 +305,41 @@ table are optional.
 |                   |                         |                 | SageMaker.                        |
 +-------------------+-------------------------+-----------------+-----------------------------------+
 
+``mpi`` Parameters
+^^^^^^^^^^^^^^^^^^
+For the ``"mpi"`` key, a dict must be passed which contains:
+
+* ``"enabled"``: Set to ``True`` to launch the training job with MPI.
+
+* ``"processes_per_host"``: Specifies the number of processes MPI should launch on each host.
+  In SageMaker, a host is a single Amazon EC2 ML instance. The SageMaker Python SDK maintains
+  a one-to-one mapping between processes and GPUs across model and data parallelism.
+  This means that SageMaker schedules each process on a single, separate GPU, and no GPU contains more than one process.
+  If you are using PyTorch, you must restrict each process to its own device using
+  ``torch.cuda.set_device(smp.local_rank())``. To learn more, see
+  `Modify a PyTorch Training Script
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-pt-16>`_.
+
+  .. important::
+     ``processes_per_host`` must be less than or equal to the number of GPUs per instance, and typically
+     equals the number of GPUs per instance.
+
+  For example, if you use one instance with 4-way model parallelism and 2-way data parallelism,
+  then ``processes_per_host`` should be 2 x 4 = 8. Therefore, you must choose an instance that has at least 8 GPUs,
+  such as an ml.p3.16xlarge (see the configuration sketch after this list).
+
+  The following image illustrates how 2-way data parallelism and 4-way model parallelism are distributed across 8 GPUs:
+  the model is partitioned across 4 GPUs, and each partition is replicated on 2 GPUs.
+
+  .. image:: smp_versions/model-data-parallel.png
+     :width: 650
+     :alt: 2-way data parallelism and 4-way model parallelism distributed across 8 GPUs
+
+
+* ``"custom_mpi_options"``: Use this key to pass any custom MPI options you might need.
+  To avoid Docker warnings from contaminating your training logs, we recommend the following flag:
+  ``--mca btl_vader_single_copy_mechanism none``
+
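
Putting the ``smdistributed`` and ``mpi`` pieces together, the following is a minimal sketch of a ``distribution`` configuration for the example above (4-way model parallelism, 2-way data parallelism on a single 8-GPU ml.p3.16xlarge); the ``partitions`` and ``ddp`` values are illustrative.

.. code:: python

    # Sketch: 4-way model parallelism x 2-way data parallelism on one 8-GPU instance.
    distribution = {
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {
                    "partitions": 4,   # 4-way model parallelism
                    "ddp": True,       # replicate partitions for data parallelism
                },
            }
        },
        "mpi": {
            "enabled": True,
            "processes_per_host": 8,   # 4 partitions x 2 replicas = 8 processes, one per GPU
            "custom_mpi_options": "--mca btl_vader_single_copy_mechanism none",
        },
    }

If you are using PyTorch, each of these eight processes must also pin its own GPU in the training script with ``torch.cuda.set_device(smp.local_rank())``, as described above.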
 
 .. _ranking-basics:
 