Commit da841d7

add README with directory structure and run instructions
1 parent d0836f9 commit da841d7

1 file changed: +46 -0 lines changed

examples/MNIST/README.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# PnetCDF-python MNIST example

This directory describes the MNIST example Python programs and explains how to run them. The programs use PnetCDF for file I/O and train on MNIST data in parallel.

## Directory Structure

- **MNIST_data**: This folder contains a mini MNIST test dataset stored in a NetCDF file (`mnist_images_mini.nc`). The file includes:
  - 60 training samples
  - 12 testing samples

- **MNIST_codes**: This folder contains the example MNIST training code. The example code is based on the [PyTorch MNIST example](https://github.com/pytorch/examples/tree/main/mnist) and uses `DistributedDataParallel` for parallel training.

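To make the dataset sizes concrete, here is a minimal sketch of how 60 training samples and 12 testing samples could be split across MPI processes. It assumes a strided, non-shuffled split in the style of PyTorch's `DistributedSampler`. The helper name `shard_indices` is hypothetical, and the example programs may partition the data differently.

```python
def shard_indices(num_samples, nprocs, rank):
    """Return the sample indices a given rank would process.

    Assumption: a strided split (rank, rank + nprocs, ...), with no
    shuffling and no padding. Hypothetical helper for illustration only.
    """
    if not 0 <= rank < nprocs:
        raise ValueError("rank must be in [0, nprocs)")
    return list(range(rank, num_samples, nprocs))


if __name__ == "__main__":
    nprocs = 4
    for rank in range(nprocs):
        train = shard_indices(60, nprocs, rank)
        test = shard_indices(12, nprocs, rank)
        print(f"rank {rank}: {len(train)} training, {len(test)} testing samples")
```

Under this assumption, 4 processes each handle 15 training samples and 3 testing samples, and every sample is assigned to exactly one rank.
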
## Running the MNIST Example Program

To run the MNIST example program, launch it with the `mpiexec` command. The example below runs the program on 4 MPI processes.

### Command:

```sh
mpiexec -n 4 python main.py
```

### Expected Output:

With 4 MPI processes, the output should be similar to the following (the per-rank lines may appear in a different order):

```sh
nprocs = 4 rank = 0 device = cpu mpi_size = 4 mpi_rank = 0
nprocs = 4 rank = 2 device = cpu mpi_size = 4 mpi_rank = 2
nprocs = 4 rank = 1 device = cpu mpi_size = 4 mpi_rank = 1
nprocs = 4 rank = 3 device = cpu mpi_size = 4 mpi_rank = 3

Train Epoch: 1 Average Loss: 2.288340
Test set: Average loss: 2.7425, Accuracy: 0/12 (0%)

Train Epoch: 2 Average Loss: 2.490800
Test set: Average loss: 1.9361, Accuracy: 6/12 (50%)

Train Epoch: 3 Average Loss: 2.216520
Test set: Average loss: 1.8703, Accuracy: 7/12 (58%)
```

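If you want to compare your own run against the expected output, a small parser can pull the numbers out of the log. The sketch below is a convenience script, not part of the example code. It assumes the `Test set:` lines follow exactly the format shown above, and the name `parse_test_lines` is hypothetical.

```python
import re

# Matches lines such as:
#   Test set: Average loss: 1.9361, Accuracy: 6/12 (50%)
TEST_LINE = re.compile(
    r"Test set: Average loss: (?P<loss>\d+\.\d+), "
    r"Accuracy: (?P<correct>\d+)/(?P<total>\d+)"
)


def parse_test_lines(log_text):
    """Extract (loss, correct, total) from every 'Test set' line in log_text."""
    results = []
    for match in TEST_LINE.finditer(log_text):
        results.append(
            (float(match["loss"]), int(match["correct"]), int(match["total"]))
        )
    return results


if __name__ == "__main__":
    sample_log = """\
Train Epoch: 1 Average Loss: 2.288340
Test set: Average loss: 2.7425, Accuracy: 0/12 (0%)

Train Epoch: 2 Average Loss: 2.490800
Test set: Average loss: 1.9361, Accuracy: 6/12 (50%)
"""
    for epoch, (loss, correct, total) in enumerate(parse_test_lines(sample_log), 1):
        print(f"epoch {epoch}: loss={loss:.4f}, {correct} of {total} correct")
```

Saving the program's output to a file and passing its contents to `parse_test_lines` gives one tuple per epoch, which makes it easy to check that the accuracy fraction matches the printed percentage.
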
### Notes:
- The test set accuracy may vary slightly depending on how the data is distributed across the MPI processes.
- The accuracy and loss reported after each epoch are averaged across all MPI processes.
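
The averaging mentioned in the notes can be illustrated without MPI. The sketch below keeps one value per rank in a plain list and takes the mean, which is what a sum reduction followed by division by the number of processes would produce. The function name and the per-rank numbers are made up for illustration. The example programs may instead weight by sample count, which gives the same result when every rank holds the same number of samples.

```python
def average_across_ranks(per_rank_values):
    """Mean of one value per rank, mimicking sum-reduce then divide by nprocs."""
    if not per_rank_values:
        raise ValueError("need at least one rank")
    return sum(per_rank_values) / len(per_rank_values)


if __name__ == "__main__":
    # Illustrative per-rank losses for 4 processes (not from a real run).
    local_losses = [2.0, 2.5, 1.5, 2.0]
    print(f"average loss: {average_across_ranks(local_losses):.4f}")

    # One plausible way to combine accuracy: sum the per-rank correct counts
    # and divide by the size of the full test set (illustrative counts).
    local_correct = [2, 1, 3, 1]
    total_samples = 12
    print(f"accuracy: {sum(local_correct)}/{total_samples}")
```
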
