-# Example Python programs that use Pytorch Distributed Data Parallel module
+# Example Python programs that use Pytorch DDP module

 This directory contains example python programs that make use of Pytorch
-Distributed Data Parallel (DDP) module and MPI to run on multiple MPI processes
-in parallel. Detailed information describing the example programs is provided
-at the beginning of each file.
+Distributed Data Parallel
+([DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)) module
+and [mpi4py](https://mpi4py.readthedocs.io/en/stable/) to run on multiple MPI
+processes in parallel. Detailed information describing the example programs is
+provided at the beginning of each file.

-## [torch_ddp_skeleton.py](./torch_ddp_skeleton.py) shows how to set up the MPI
-and DDP environment to run a program in parallel.
+* [torch_ddp_skeleton.py](#torch_ddp_skeleton_py) -- a template for using
+ Pytorch DDP

-Command usage:
-```sh
-% mpiexec -n 4 python ./torch_ddp_skeleton.py
-nprocs = 4 rank = 0 device = cpu
-nprocs = 4 rank = 1 device = cpu
-nprocs = 4 rank = 2 device = cpu
-nprocs = 4 rank = 3 device = cpu
-```
+---

+## torch_ddp_skeleton_py
+[torch_ddp_skeleton.py](./torch_ddp_skeleton.py) is a skeleton program showing
+how to set up the MPI and DDP environment to run a program in parallel.
+
+* Command usage and output on screen:
+ ```sh
+ % mpiexec -n 4 python ./torch_ddp_skeleton.py
+ nprocs = 4 rank = 0 device = cpu
+ nprocs = 4 rank = 1 device = cpu
+ nprocs = 4 rank = 2 device = cpu
+ nprocs = 4 rank = 3 device = cpu
+ ```

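The README says the skeleton sets up the MPI and DDP environment, but the diff does not show how. The block below is a minimal sketch of one way such a setup could look. It is not the contents of `torch_ddp_skeleton.py`. Assumptions: a single-node CPU run with the `gloo` backend, a rendezvous address of `127.0.0.1` on port `29500`, and a hypothetical helper named `ddp_env`. The MPI and PyTorch imports are deferred into `main()` so the helper can be used on its own.

```python
import os


def ddp_env(rank, nprocs, master_addr="127.0.0.1", master_port=29500):
    """Build the variables read by torch.distributed's env:// init method.

    Hypothetical helper for illustration; the default address and port are
    assumptions that only suit a single-node run.
    """
    if not 0 <= rank < nprocs:
        raise ValueError(f"rank {rank} is outside 0..{nprocs - 1}")
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "RANK": str(rank),
        "WORLD_SIZE": str(nprocs),
    }


def main():
    # Deferred imports: only needed when actually launched under MPI.
    from mpi4py import MPI
    import torch
    import torch.distributed as dist

    # MPI supplies each process's rank and the total process count.
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    nprocs = comm.Get_size()

    # Hand the MPI layout to torch.distributed through the environment.
    os.environ.update(ddp_env(rank, nprocs))

    # Assumption: CPU only, so the gloo backend is used.
    device = torch.device("cpu")
    dist.init_process_group(
        backend="gloo", init_method="env://", rank=rank, world_size=nprocs
    )

    print(f"nprocs = {nprocs} rank = {rank} device = {device}")

    # Model construction and DistributedDataParallel wrapping would go here.

    dist.destroy_process_group()
```

To try it, append a call to `main()` at the end of the file and launch it with the same `mpiexec -n 4 python ...` pattern shown in the README. A multi-node run would need rank 0's hostname instead of `127.0.0.1`, for example by broadcasting it over MPI before calling `ddp_env`.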
|