initial reference implementation for Breast Density FL Challenge #680

Merged
merged 3 commits into from
May 4, 2022
3 changes: 3 additions & 0 deletions federated_learning/breast_density_challenge/.dockerignore
@@ -0,0 +1,3 @@
# Ignore the following files/folders during docker build

__pycache__/
12 changes: 12 additions & 0 deletions federated_learning/breast_density_challenge/.gitignore
@@ -0,0 +1,12 @@
# IDE
.idea/

# artifacts
poc/
*.pyc
result_*
*.pth
logs

# example data
*preprocessed*
36 changes: 36 additions & 0 deletions federated_learning/breast_density_challenge/Dockerfile
@@ -0,0 +1,36 @@
# use python base image
FROM python:3.8.10
ENV DEBIAN_FRONTEND=noninteractive

# specify the server FQDN as commandline argument
ARG server_fqdn
RUN echo "Setting up FL workspace with FQDN: ${server_fqdn}"

# add your code to container
COPY code /code

# add code to path
ENV PYTHONPATH=${PYTHONPATH}:"/code"

# install dependencies
# RUN python -m pip install --upgrade pip
RUN pip3 install tensorboard scikit-learn torchvision
RUN pip3 install monai==0.8.1
RUN pip3 install nvflare==2.0.16

# mount nvflare from source
#RUN pip install tenseal
#WORKDIR /code
#RUN git clone https://github.com/NVIDIA/NVFlare.git
#ENV PYTHONPATH=${PYTHONPATH}:"/code/NVFlare"

# download pretrained weights
ENV TORCH_HOME=/opt/torch
RUN python3 /code/pt/utils/download_model.py --model_url=https://download.pytorch.org/models/resnet18-f37072fd.pth

# prepare FL workspace
WORKDIR /code
RUN sed -i "s|{SERVER_FQDN}|${server_fqdn}|g" fl_project.yml
RUN python3 -m nvflare.lighter.provision -p fl_project.yml
RUN cp -r workspace/fl_project/prod_00 fl_workspace
RUN mv fl_workspace/${server_fqdn} fl_workspace/server
176 changes: 176 additions & 0 deletions federated_learning/breast_density_challenge/README.md
@@ -0,0 +1,176 @@
## MammoFL_MICCAI2022

Reference implementation for
[ACR-NVIDIA-NCI Breast Density FL challenge](http://BreastDensityFL.acr.org).

Held in conjunction with [MICCAI 2022](https://conferences.miccai.org/2022/en/).


------------------------------------------------
## 1. Run Training using [NVFlare](https://github.com/NVIDIA/NVFlare) reference implementation

We provide a minimal example of how to implement Federated Averaging using [NVFlare 2.0](https://github.com/NVIDIA/NVFlare) and [MONAI](https://monai.io/) to train
a breast density prediction model with ResNet18.
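
To make the aggregation step concrete, the following is a minimal pure-Python sketch of the weighted averaging behind Federated Averaging. It is not NVFlare's aggregator: `fed_avg`, the dict-of-lists weight layout, and the use of per-client sample counts as weights are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def fed_avg(
    client_weights: Sequence[Dict[str, List[float]]],
    client_sizes: Sequence[int],
) -> Dict[str, List[float]]:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = float(sum(client_sizes))
    averaged: Dict[str, List[float]] = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical updates from two sites holding 1 and 3 training samples.
global_weights = fed_avg([{"fc": [1.0, 2.0]}, {"fc": [3.0, 4.0]}], [1, 3])
print(global_weights)  # {'fc': [2.5, 3.5]}
```

The site with more samples pulls the average towards its own parameters, which is the core idea the reference configuration builds on.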

### 1.1 Download example data
Follow the steps described in [./data/README.md](./data/README.md) to download an example breast density mammography dataset.
Note that the data used in the actual challenge will be different. We do, however, follow the same preprocessing steps and
use the same four BI-RADS breast density classes for prediction. See [./code/pt/utils/preprocess_dicomdir.py](./code/pt/utils/preprocess_dicomdir.py) for details.

We provide a set of random data splits. Please download them using
```
python3 ./code/pt/utils/download_datalists_and_predictions.py
```
After download, they will be available as `./data/dataset_blinded_site-*.json`, which follow the same format as the data lists
used in the challenge.
Please do not modify the data list filenames in the configs, as they will be the same during the challenge.

Note, the location of the dataset and data lists will be given by the system.
Do not change the locations given in [config_fed_client.json](./code/configs/mammo_fedavg/config/config_fed_client.json):
```
"DATASET_ROOT": "/data/preprocessed",
"DATALIST_PREFIX": "/data/dataset_blinded_",
```
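
As an illustration of how these two settings might combine, the sketch below builds a per-site data list path from the prefix. The naming pattern (prefix, then site name, then `.json`) is inferred from the `dataset_blinded_site-*.json` filenames mentioned earlier; `datalist_path` is a hypothetical helper, not a function from the reference code.

```python
DATASET_ROOT = "/data/preprocessed"
DATALIST_PREFIX = "/data/dataset_blinded_"


def datalist_path(datalist_prefix: str, site_name: str) -> str:
    """Build the data list filename for one site (assumed naming pattern)."""
    return f"{datalist_prefix}{site_name}.json"


for site in ("site-1", "site-2", "site-3"):
    print(datalist_path(DATALIST_PREFIX, site))
```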

### 1.2 Build container
The argument specifies the FQDN (Fully Qualified Domain Name) of the FL server. Use `localhost` when simulating FL on your machine.
```
./build_docker.sh localhost
```
Note, all code and pretrained models need to be included in the docker image.
The virtual machines running the containers will not have public internet access during training.
For an example, please see the `download_model.py` used to download ImageNet pretrained weights in this example.
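
The general idea of fetching weights at build time can be sketched as follows. This is not the repository's `download_model.py`; it is a standard-library illustration that assumes the checkpoint should land in the `hub/checkpoints` folder under `TORCH_HOME`, which is where `torch.hub` looks for cached files. The function names are hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlparse


def checkpoint_path(url: str, torch_home: str) -> str:
    """Return the torch.hub cache location for a checkpoint URL."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(torch_home, "hub", "checkpoints", filename)


def download_checkpoint(url: str, torch_home: str) -> str:
    """Download the checkpoint once, so no network access is needed later."""
    target = checkpoint_path(url, torch_home)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target


# Only the path is computed here; call download_checkpoint() during the image build.
url = "https://download.pytorch.org/models/resnet18-f37072fd.pth"
print(checkpoint_path(url, "/opt/torch"))
```

Running the download during `docker build` means the file is baked into the image and is available when the container runs offline.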

The Dockerfile will be submitted using the [MedICI platform](https://www.medici-challenges.org).
For detailed instructions, see the [challenge website](http://BreastDensityFL.acr.org).

### 1.3 Run server and clients containers, and start training
Run all commands at once using the script below. Note that this also creates separate logs under `./logs`.
```
./run_all_fl.sh
```
Note, the GPU index to use for each client is specified inside `run_all_fl.sh`.
See the individual `run_docker_site-*.sh` commands described below.
Note, the server script will automatically kill all running containers used in this example,
and the final results will be placed under `./result_server`.

(Optional) Run each command in a separate terminal to get site-specific printouts in separate windows.

The argument for each shell script specifies the GPU index to be used.
```
./run_docker_server.sh
./run_docker_site-1.sh 0
./run_docker_site-2.sh 1
./run_docker_site-3.sh 0
```

### 1.4 (Optional) Visualize training using TensorBoard
After training has completed, the training curves can be visualized using
```
tensorboard --logdir=./result_server
```
A visualization of the global accuracy and [Kappa](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html) validation scores for each site with the provided example data is shown below.
The current setup runs on a machine with two NVIDIA GPUs with 12GB memory each.
The runtime for this experiment is about 45 minutes.
You can adjust the argument to the `run_docker_site-*.sh` scripts to specify different
GPU indices if needed in your environment.

![](./figs/example_data_val_global_acc_kappa.png)
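
For readers who want to reproduce the two metrics offline, here is a minimal sketch of accuracy and unweighted Cohen's kappa in plain Python. It is an illustration rather than the challenge's scoring code: whether a weighted kappa is used is not stated here, and the labels and probabilities in the example are made up.

```python
from collections import Counter
from typing import List, Sequence


def argmax(probs: Sequence[float]) -> int:
    """Index of the most probable class (first index wins on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels and class probabilities for four images.
labels = [0, 1, 2, 3]
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
preds = [argmax(p) for p in probs]
print(accuracy(labels, preds))               # 0.75
print(round(cohen_kappa(labels, preds), 3))  # 0.667
```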

### 1.5 (Optional) Kill all containers
If you didn't use `run_all_fl.sh`, all containers can be killed by running
```
docker kill server site-1 site-2 site-3
```


------------------------------------------------
## 2. Modify the FL algorithm

You can modify and extend the provided example code under [./code/pt](./code/pt).

You could use other components available at [NVFlare](https://github.com/NVIDIA/NVFlare)
or enhance the training pipeline using your custom code or features of other libraries.

See the [NVFlare examples](https://github.com/NVIDIA/NVFlare/tree/main/examples) for features that could be utilized in this challenge.

### 2.1 Debugging the learning algorithm

The example NVFlare `Learner` class is implemented at [./code/pt/learners/mammo_learner.py](./code/pt/learners/mammo_learner.py).
You can debug the file using the `MockClientEngine` as shown in the script by running
```
python3 code/pt/learners/mammo_learner.py
```
Furthermore, you can test it inside the container, by first running
```
./run_docker_debug.sh
```
Note, set `inside_container = True` to reflect the changed filepaths inside the container.


------------------------------------------------
## 3. Bring your own FL framework
If you would like to use your own FL framework to participate in the challenge,
please modify the Dockerfile accordingly to include all the dependencies.

Your container needs to provide the following scripts that implement the starting of server, clients, and finalizing of the server.
They will be executed by the system in the following order.

### 3.1 start server
```
/code/start_server.sh
```

### 3.2 start each client (in parallel)
```
/code/start_site-1.sh
/code/start_site-2.sh
/code/start_site-3.sh
```

### 3.3 finalize the server
```
/code/finalize_server.sh
```
For an example of how the challenge system will execute these commands, see the provided `run_docker*.sh` scripts.
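
The execution order described in 3.1 to 3.3 can be sketched in Python as below. This is a simplified, single-machine illustration under stated assumptions: the real system runs the scripts in separate containers, and `run_in_order` and its `popen` parameter are hypothetical names introduced here.

```python
import subprocess
from typing import Callable, List

SERVER_SCRIPT = "/code/start_server.sh"
CLIENT_SCRIPTS = [f"/code/start_site-{i}.sh" for i in (1, 2, 3)]
FINALIZE_SCRIPT = "/code/finalize_server.sh"


def run_in_order(popen: Callable = subprocess.Popen) -> List[int]:
    """Start the server, run all clients in parallel, then finalize the server."""
    popen([SERVER_SCRIPT])  # the server keeps running in the background
    clients = [popen([script]) for script in CLIENT_SCRIPTS]
    exit_codes = [client.wait() for client in clients]  # block until training ends
    popen([FINALIZE_SCRIPT]).wait()  # collect results once all clients are done
    return exit_codes


# Call run_in_order() only in an environment that provides the three kinds of scripts.
```

Passing the process launcher as a parameter keeps the ordering logic testable without actually starting any container.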

### 3.4 Communication
The communication channels for FL will be restricted to the ports specified in [fl_project.yml](./code/fl_project.yml).
Your FL framework will also need those ports for implementing the communication.

### 3.5 Results
Results will need to be written to `/result/predictions.json`.
Please follow the format produced by the reference implementation at [./result_server_example/predictions.json](./result_server_example/predictions.json)
(available after running `python3 ./code/pt/utils/download_datalists_and_predictions.py`).
The code is expected to return a JSON file containing at least a list of image names and prediction probabilities for each breast density class
for the global model (which should be named `SRV_best_FL_global_model.pt`).
```
{
"site-1": {
"SRV_best_FL_global_model.pt": {
...
"test_probs": [{
"image": "Calc-Test_P_00643_LEFT_MLO.npy",
"probs": [0.005602597258985043, 0.7612965703010559, 0.23040543496608734, 0.0026953918859362602]
}, {
...
},
"site-2": {
"SRV_best_FL_global_model.pt": {
...
"test_probs": [{
"image": "Calc-Test_P_00643_LEFT_MLO.npy",
"probs": [0.005602597258985043, 0.7612965703010559, 0.23040543496608734, 0.0026953918859362602]
}, {
...
},
"site-3": {
"SRV_best_FL_global_model.pt": {
...
"test_probs": [{
"image": "Calc-Test_P_00643_LEFT_MLO.npy",
"probs": [0.005602597258985043, 0.7612965703010559, 0.23040543496608734, 0.0026953918859362602]
}, {
...
}
```
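
Before submitting, it may help to sanity-check the layout of your own `predictions.json`. The sketch below checks only what the excerpt above shows: a `SRV_best_FL_global_model.pt` entry per site, a `test_probs` list, and four probabilities per image. `check_predictions` is a hypothetical helper, and the official evaluation may apply further checks.

```python
from typing import Any, Dict, List

GLOBAL_MODEL = "SRV_best_FL_global_model.pt"
NUM_CLASSES = 4  # four BI-RADS breast density classes


def check_predictions(predictions: Dict[str, Any]) -> List[str]:
    """Return a list of layout problems; an empty list means none were found."""
    problems: List[str] = []
    for site, models in predictions.items():
        model = models.get(GLOBAL_MODEL) if isinstance(models, dict) else None
        entries = model.get("test_probs") if isinstance(model, dict) else None
        if not isinstance(entries, list):
            problems.append(f"{site}: missing {GLOBAL_MODEL} / test_probs list")
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict) or not isinstance(entry.get("image"), str):
                problems.append(f"{site}[{i}]: missing image name")
                continue
            probs = entry.get("probs")
            if not isinstance(probs, list) or len(probs) != NUM_CLASSES:
                problems.append(f"{site}[{i}]: expected {NUM_CLASSES} probabilities")
    return problems


# Hypothetical in-memory example; in practice, load /result/predictions.json with json.load.
example = {
    "site-1": {
        GLOBAL_MODEL: {
            "test_probs": [
                {"image": "Calc-Test_P_00643_LEFT_MLO.npy", "probs": [0.1, 0.6, 0.2, 0.1]}
            ]
        }
    }
}
print(check_predictions(example))  # []
```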
15 changes: 15 additions & 0 deletions federated_learning/breast_density_challenge/build_docker.sh
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

#SERVER_FQDN="localhost"
SERVER_FQDN=$1

if test -z "${SERVER_FQDN}"
then
echo "Usage: ./build_docker.sh <SERVER_FQDN>, e.g. ./build_docker.sh localhost"
exit 1
fi

NEW_IMAGE=monai-nvflare:latest

# export so the setting reaches the docker process; disables BuildKit to show command outputs
export DOCKER_BUILDKIT=0
docker build --network=host -t "${NEW_IMAGE}" --build-arg server_fqdn="${SERVER_FQDN}" -f Dockerfile .
@@ -0,0 +1,51 @@
{
"format_version": 2,

"DATASET_ROOT": "/data/preprocessed",
"DATALIST_PREFIX": "/data/dataset_blinded_",

"executors": [
{
"tasks": [
"train", "submit_model", "validate"
],
"executor": {
"id": "Executor",
"path": "nvflare.app_common.executors.learner_executor.LearnerExecutor",
"args": {
"learner_id": "learner"
}
}
}
],

"task_result_filters": [
],
"task_data_filters": [
],

"components": [
{
"id": "learner",
"path": "pt.learners.mammo_learner.MammoLearner",
"args": {
"dataset_root": "{DATASET_ROOT}",
"datalist_prefix": "{DATALIST_PREFIX}",
"aggregation_epochs": 1,
"lr": 2e-3,
"batch_size": 64,
"val_frac": 0.1
}
},
{
"id": "analytic_sender",
"name": "AnalyticsSender",
"args": {}
},
{
"id": "event_to_fed",
"name": "ConvertToFedEvent",
"args": {"events_to_convert": ["analytix_log_stats"], "fed_event_prefix": "fed."}
}
]
}
@@ -0,0 +1,88 @@
{
"format_version": 2,

"min_clients": 3,
"num_rounds": 100,

"server": {
"heart_beat_timeout": 600
},
"task_data_filters": [],
"task_result_filters": [],
"components": [
{
"id": "persistor",
"name": "PTFileModelPersistor",
"args": {
"model": {
"path": "monai.networks.nets.TorchVisionFCModel",
"args": {
"model_name": "resnet18",
"n_classes": 4,
"use_conv": false,
"pretrained": true,
"pool": null
}
}
}
},
{
"id": "shareable_generator",
"name": "FullModelShareableGenerator",
"args": {}
},
{
"id": "aggregator",
"name": "InTimeAccumulateWeightedAggregator",
"args": {}
},
{
"id": "model_selector",
"name": "IntimeModelSelectionHandler",
"args": {}
},
{
"id": "model_locator",
"name": "PTFileModelLocator",
"args": {
"pt_persistor_id": "persistor"
}
},
{
"id": "json_generator",
"name": "ValidationJsonGenerator",
"args": {}
},
{
"id": "tb_analytics_receive",
"name": "TBAnalyticsReceiver",
"args": {"events": ["fed.analytix_log_stats"]}
}
],
"workflows": [
{
"id": "scatter_gather_ctl",
"name": "ScatterAndGather",
"args": {
"min_clients" : "{min_clients}",
"num_rounds" : "{num_rounds}",
"start_round": 0,
"wait_time_after_min_received": 10,
"aggregator_id": "aggregator",
"persistor_id": "persistor",
"shareable_generator_id": "shareable_generator",
"train_task_name": "train",
"train_timeout": 0
}
},
{
"id": "global_model_eval",
"name": "GlobalModelEval",
"args": {
"model_locator_id": "model_locator",
"validation_timeout": 6000,
"cleanup_models": true
}
}
]
}
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
SERVER="server"
echo "FINALIZING ${SERVER}"
cp -r ./fl_workspace/${SERVER}/run_1 /result/.
cp ./fl_workspace/${SERVER}/*.txt /result/.
cp ./fl_workspace/*_log.txt /result/.
cp ./fl_workspace/${SERVER}/run_1/cross_site_val/cross_val_results.json /result/predictions.json # only file required for leaderboard computation
# TODO: might need some more standardization of the result folder