initial reference implementation for Breast Density FL Challenge #680
Merged: wyli merged 3 commits into Project-MONAI:master from holgerroth:breast_density_fl_challenge on May 4, 2022
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Ignore the following files/folders during docker build | ||
|
||
__pycache__/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# IDE | ||
.idea/ | ||
|
||
# artifacts | ||
poc/ | ||
*.pyc | ||
result_* | ||
*.pth | ||
logs | ||
|
||
# example data | ||
*preprocessed* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# use python base image | ||
FROM python:3.8.10 | ||
ENV DEBIAN_FRONTEND=noninteractive | ||
|
||
# specify the server FQDN as commandline argument | ||
ARG server_fqdn | ||
RUN echo "Setting up FL workspace with FQDN: ${server_fqdn}" | ||
|
||
# add your code to container | ||
COPY code /code | ||
|
||
# add code to path | ||
ENV PYTHONPATH=${PYTHONPATH}:"/code" | ||
|
||
# install dependencies | ||
# RUN python -m pip install --upgrade pip | ||
RUN pip3 install tensorboard scikit-learn torchvision | ||
RUN pip3 install monai==0.8.1 | ||
RUN pip3 install nvflare==2.0.16 | ||
|
||
# mount nvflare from source | ||
#RUN pip install tenseal | ||
#WORKDIR /code | ||
#RUN git clone https://github.com/NVIDIA/NVFlare.git | ||
#ENV PYTHONPATH=${PYTHONPATH}:"/code/NVFlare" | ||
|
||
# download pretrained weights | ||
ENV TORCH_HOME=/opt/torch | ||
RUN python3 /code/pt/utils/download_model.py --model_url=https://download.pytorch.org/models/resnet18-f37072fd.pth | ||
|
||
# prepare FL workspace | ||
WORKDIR /code | ||
RUN sed -i "s|{SERVER_FQDN}|${server_fqdn}|g" fl_project.yml | ||
RUN python3 -m nvflare.lighter.provision -p fl_project.yml | ||
RUN cp -r workspace/fl_project/prod_00 fl_workspace | ||
RUN mv fl_workspace/${server_fqdn} fl_workspace/server |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,176 @@ | ||
## MammoFL_MICCAI2022 | ||
|
||
Reference implementation for | ||
[ACR-NVIDIA-NCI Breast Density FL challenge](http://BreastDensityFL.acr.org). | ||
|
||
Held in conjunction with [MICCAI 2022](https://conferences.miccai.org/2022/en/). | ||
|
||
|
||
------------------------------------------------ | ||
## 1. Run Training using [NVFlare](https://github.com/NVIDIA/NVFlare) reference implementation | ||
|
||
We provide a minimal example of how to implement Federated Averaging using [NVFlare 2.0](https://github.com/NVIDIA/NVFlare) and [MONAI](https://monai.io/) to train | ||
a breast density prediction model with ResNet18. | ||
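
As a rough illustration of what Federated Averaging computes on the server in each round, the sketch below averages client weights in proportion to the number of local training samples. It is a framework-free sketch: the function name `fed_avg` and the dict-of-lists weight format are assumptions made for illustration and are not NVFlare's aggregator API.

```python
from typing import Dict, List, Tuple

# Hypothetical weight format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fed_avg(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighted by each client's sample count."""
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    # Use the first client's parameter names as the reference layout.
    for name, values in client_updates[0][0].items():
        acc = [0.0] * len(values)
        for weights, n in client_updates:
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


if __name__ == "__main__":
    updates = [
        ({"fc.weight": [1.0, 2.0]}, 100),  # client with 100 samples
        ({"fc.weight": [3.0, 4.0]}, 300),  # client with 300 samples
    ]
    print(fed_avg(updates))
```

A client with three times as many samples pulls the average three times as strongly, which is the main design choice of sample-weighted averaging.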
|
||
### 1.1 Download example data | ||
Follow the steps described in [./data/README.md](./data/README.md) to download an example breast density mammography dataset. | ||
Note that the data used in the actual challenge will be different. However, we follow the same preprocessing steps and | ||
use the same four BI-RADS breast density classes for prediction. See [./code/pt/utils/preprocess_dicomdir.py](./code/pt/utils/preprocess_dicomdir.py) for details. | ||
|
||
We provide a set of random data splits. Please download them using | ||
``` | ||
python3 ./code/pt/utils/download_datalists_and_predictions.py | ||
``` | ||
After download, they will be available as `./data/dataset_blinded_site-*.json`, following the same format that | ||
will be used in the challenge. | ||
Please do not modify the data list filenames in the configs as they will be the same during the challenge. | ||
|
||
Note, the location of the dataset and data lists will be given by the system. | ||
Do not change the locations given in [config_fed_client.json](./code/configs/mammo_fedavg/config/config_fed_client.json): | ||
``` | ||
"DATASET_ROOT": "/data/preprocessed", | ||
"DATALIST_PREFIX": "/data/dataset_blinded_", | ||
``` | ||
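
To make the role of these two settings more concrete, the sketch below shows one way a client could resolve its data list and image paths from them. The helper names, and the assumption that the site name (for example `site-1`) plus `.json` is appended to `DATALIST_PREFIX`, are inferred from the `dataset_blinded_site-*.json` naming above; see `mammo_learner.py` for the actual logic.

```python
import posixpath

# Values copied from config_fed_client.json; paths are inside the container.
DATASET_ROOT = "/data/preprocessed"
DATALIST_PREFIX = "/data/dataset_blinded_"


def datalist_path(site_name: str, prefix: str = DATALIST_PREFIX) -> str:
    """Hypothetical helper: build the data list path for one site."""
    return f"{prefix}{site_name}.json"


def image_path(image_name: str, root: str = DATASET_ROOT) -> str:
    """Hypothetical helper: resolve an image name relative to the dataset root."""
    return posixpath.join(root, image_name)


if __name__ == "__main__":
    print(datalist_path("site-1"))
    print(image_path("Calc-Test_P_00643_LEFT_MLO.npy"))
```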
|
||
### 1.2 Build container | ||
The argument specifies the FQDN (Fully Qualified Domain Name) of the FL server. Use `localhost` when simulating FL on your machine. | ||
``` | ||
./build_docker.sh localhost | ||
``` | ||
Note, all code and pretrained models need to be included in the docker image. | ||
The virtual machines running the containers will not have public internet access during training. | ||
For an example, please see the `download_model.py` used to download ImageNet pretrained weights in this example. | ||
|
||
The Dockerfile will be submitted using the [MedICI platform](https://www.medici-challenges.org). | ||
For detailed instructions, see the [challenge website](http://BreastDensityFL.acr.org). | ||
|
||
### 1.3 Run server and clients containers, and start training | ||
Run all commands at once using the script below. Note that this will also create separate logs under `./logs`. | ||
``` | ||
./run_all_fl.sh | ||
``` | ||
Note, the GPU index to use for each client is specified inside `run_all_fl.sh`. | ||
See the individual `run_docker_site-*.sh` commands described below. | ||
Note, the server script will automatically kill all running containers used in this example | ||
and final results will be placed under `./result_server`. | ||
|
||
(Optional) Run each command in a separate terminal to get site-specific printouts in separate windows. | ||
|
||
The argument for each shell script specifies the GPU index to be used. | ||
``` | ||
./run_docker_server.sh | ||
./run_docker_site-1.sh 0 | ||
./run_docker_site-2.sh 1 | ||
./run_docker_site-3.sh 0 | ||
``` | ||
|
||
### 1.4 (Optional) Visualize training using TensorBoard | ||
After training has completed, the training curves can be visualized using | ||
``` | ||
tensorboard --logdir=./result_server | ||
``` | ||
A visualization of the global accuracy and [Kappa](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html) validation scores for each site with the provided example data is shown below. | ||
The current setup runs on a machine with two NVIDIA GPUs with 12GB memory each. | ||
The runtime for this experiment is about 45 minutes. | ||
You can adjust the argument to the `run_docker_site-*.sh` scripts to specify different | ||
GPU indices if needed in your environment. | ||
|
||
 | ||
|
||
### 1.5 (Optional) Kill all containers | ||
If you didn't use `run_all_fl.sh`, all containers can be killed by running | ||
``` | ||
docker kill server site-1 site-2 site-3 | ||
``` | ||
|
||
|
||
------------------------------------------------ | ||
## 2. Modify the FL algorithm | ||
|
||
You can modify and extend the provided example code under [./code/pt](./code/pt). | ||
|
||
You could use other components available at [NVFlare](https://github.com/NVIDIA/NVFlare) | ||
or enhance the training pipeline using your custom code or features of other libraries. | ||
|
||
See the [NVFlare examples](https://github.com/NVIDIA/NVFlare/tree/main/examples) for features that could be utilized in this challenge. | ||
|
||
### 2.1 Debugging the learning algorithm | ||
|
||
The example NVFlare `Learner` class is implemented at [./code/pt/learners/mammo_learner.py](./code/pt/learners/mammo_learner.py). | ||
You can debug the file using the `MockClientEngine` as shown in the script by running | ||
``` | ||
python3 code/pt/learners/mammo_learner.py | ||
``` | ||
Furthermore, you can test it inside the container, by first running | ||
``` | ||
./run_docker_debug.sh | ||
``` | ||
Note, set `inside_container = True` to reflect the changed filepaths inside the container. | ||
|
||
|
||
------------------------------------------------ | ||
## 3. Bring your own FL framework | ||
If you would like to use your own FL framework to participate in the challenge, | ||
please modify the Dockerfile accordingly to include all the dependencies. | ||
|
||
Your container needs to provide the following scripts that implement the starting of server, clients, and finalizing of the server. | ||
They will be executed by the system in the following order. | ||
|
||
### 3.1 start server | ||
``` | ||
/code/start_server.sh | ||
``` | ||
|
||
### 3.2 start each client (in parallel) | ||
``` | ||
/code/start_site-1.sh | ||
/code/start_site-2.sh | ||
/code/start_site-3.sh | ||
``` | ||
|
||
### 3.3 finalize the server | ||
``` | ||
/code/finalize_server.sh | ||
``` | ||
For an example of how the challenge system will execute these commands, see the provided `run_docker*.sh` scripts. | ||
|
||
### 3.4 Communication | ||
The communication channels for FL will be restricted to the ports specified in [fl_project.yml](./code/fl_project.yml). | ||
Your FL framework will also need those ports for implementing the communication. | ||
|
||
### 3.5 Results | ||
Results will need to be written to `/result/predictions.json`. | ||
Please follow the format produced by the reference implementation at [./result_server_example/predictions.json](./result_server_example/predictions.json) | ||
(available after running `python3 ./code/pt/utils/download_datalists_and_predictions.py`). | ||
The code is expected to return a JSON file containing at least a list of image names and prediction probabilities for each breast density class | ||
for the global model (should be named `SRV_best_FL_global_model.pt`). | ||
``` | ||
{ | ||
"site-1": { | ||
"SRV_best_FL_global_model.pt": { | ||
... | ||
"test_probs": [{ | ||
"image": "Calc-Test_P_00643_LEFT_MLO.npy", | ||
"probs": [0.005602597258985043, 0.7612965703010559, 0.23040543496608734, 0.0026953918859362602] | ||
}, { | ||
... | ||
}, | ||
"site-2": { | ||
"SRV_best_FL_global_model.pt": { | ||
... | ||
"test_probs": [{ | ||
"image": "Calc-Test_P_00643_LEFT_MLO.npy", | ||
"probs": [0.005602597258985043, 0.7612965703010559, 0.23040543496608734, 0.0026953918859362602] | ||
}, { | ||
... | ||
}, | ||
"site-3": { | ||
"SRV_best_FL_global_model.pt": { | ||
... | ||
"test_probs": [{ | ||
"image": "Calc-Test_P_00643_LEFT_MLO.npy", | ||
"probs": [0.005602597258985043, 0.7612965703010559, 0.23040543496608734, 0.0026953918859362602] | ||
}, { | ||
... | ||
} | ||
``` |
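
Before submitting, it may help to sanity-check the structure of the results file. The sketch below is a hypothetical checker, not part of the reference implementation: it only verifies the keys shown in the example above (`SRV_best_FL_global_model.pt`, `test_probs`, `image`, `probs`) and assumes that each `probs` list has four entries summing to roughly 1, as in the example values.

```python
import json
import math
from typing import Any, Dict, List

GLOBAL_MODEL_KEY = "SRV_best_FL_global_model.pt"
NUM_CLASSES = 4  # four BI-RADS breast density classes


def check_predictions(results: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems: List[str] = []
    for site, models in results.items():
        entry = models.get(GLOBAL_MODEL_KEY) if isinstance(models, dict) else None
        if not isinstance(entry, dict) or "test_probs" not in entry:
            problems.append(f"{site}: missing {GLOBAL_MODEL_KEY}/test_probs")
            continue
        for i, item in enumerate(entry["test_probs"]):
            if "image" not in item:
                problems.append(f"{site}[{i}]: missing 'image'")
            probs = item.get("probs", [])
            if len(probs) != NUM_CLASSES:
                problems.append(f"{site}[{i}]: expected {NUM_CLASSES} probs")
            elif not math.isclose(sum(probs), 1.0, abs_tol=1e-3):
                # Assumption: probabilities are normalized (e.g. softmax outputs).
                problems.append(f"{site}[{i}]: probs do not sum to 1")
    return problems


def check_file(path: str) -> List[str]:
    """Load a predictions file (e.g. /result/predictions.json) and check it."""
    with open(path, "r", encoding="utf-8") as f:
        return check_predictions(json.load(f))
```

Calling `check_file("/result/predictions.json")` inside the container would return an empty list when the file matches the layout checked here.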
15 changes: 15 additions & 0 deletions
federated_learning/breast_density_challenge/build_docker.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
#!/usr/bin/env bash | ||
|
||
#SERVER_FQDN="localhost" | ||
SERVER_FQDN=$1 | ||
|
||
if test -z "${SERVER_FQDN}" | ||
then | ||
echo "Usage: ./build_docker.sh [SERVER_FQDN], e.g. ./build_docker.sh localhost" | ||
exit 1 | ||
fi | ||
|
||
NEW_IMAGE=monai-nvflare:latest | ||
|
||
export DOCKER_BUILDKIT=0 # use the legacy builder to show command outputs | ||
docker build --network=host -t "${NEW_IMAGE}" --build-arg server_fqdn="${SERVER_FQDN}" -f Dockerfile . |
51 changes: 51 additions & 0 deletions
...learning/breast_density_challenge/code/configs/mammo_fedavg/config/config_fed_client.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
{ | ||
"format_version": 2, | ||
|
||
"DATASET_ROOT": "/data/preprocessed", | ||
"DATALIST_PREFIX": "/data/dataset_blinded_", | ||
|
||
"executors": [ | ||
{ | ||
"tasks": [ | ||
"train", "submit_model", "validate" | ||
], | ||
"executor": { | ||
"id": "Executor", | ||
"path": "nvflare.app_common.executors.learner_executor.LearnerExecutor", | ||
"args": { | ||
"learner_id": "learner" | ||
} | ||
} | ||
} | ||
], | ||
|
||
"task_result_filters": [ | ||
], | ||
"task_data_filters": [ | ||
], | ||
|
||
"components": [ | ||
{ | ||
"id": "learner", | ||
"path": "pt.learners.mammo_learner.MammoLearner", | ||
"args": { | ||
"dataset_root": "{DATASET_ROOT}", | ||
"datalist_prefix": "{DATALIST_PREFIX}", | ||
"aggregation_epochs": 1, | ||
"lr": 2e-3, | ||
"batch_size": 64, | ||
"val_frac": 0.1 | ||
} | ||
}, | ||
{ | ||
"id": "analytic_sender", | ||
"name": "AnalyticsSender", | ||
"args": {} | ||
}, | ||
{ | ||
"id": "event_to_fed", | ||
"name": "ConvertToFedEvent", | ||
"args": {"events_to_convert": ["analytix_log_stats"], "fed_event_prefix": "fed."} | ||
} | ||
] | ||
} |
88 changes: 88 additions & 0 deletions
...learning/breast_density_challenge/code/configs/mammo_fedavg/config/config_fed_server.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
{ | ||
"format_version": 2, | ||
|
||
"min_clients": 3, | ||
"num_rounds": 100, | ||
|
||
"server": { | ||
"heart_beat_timeout": 600 | ||
}, | ||
"task_data_filters": [], | ||
"task_result_filters": [], | ||
"components": [ | ||
{ | ||
"id": "persistor", | ||
"name": "PTFileModelPersistor", | ||
"args": { | ||
"model": { | ||
"path": "monai.networks.nets.TorchVisionFCModel", | ||
"args": { | ||
"model_name": "resnet18", | ||
"n_classes": 4, | ||
"use_conv": false, | ||
"pretrained": true, | ||
"pool": null | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"id": "shareable_generator", | ||
"name": "FullModelShareableGenerator", | ||
"args": {} | ||
}, | ||
{ | ||
"id": "aggregator", | ||
"name": "InTimeAccumulateWeightedAggregator", | ||
"args": {} | ||
}, | ||
{ | ||
"id": "model_selector", | ||
"name": "IntimeModelSelectionHandler", | ||
"args": {} | ||
}, | ||
{ | ||
"id": "model_locator", | ||
"name": "PTFileModelLocator", | ||
"args": { | ||
"pt_persistor_id": "persistor" | ||
} | ||
}, | ||
{ | ||
"id": "json_generator", | ||
"name": "ValidationJsonGenerator", | ||
"args": {} | ||
}, | ||
{ | ||
"id": "tb_analytics_receive", | ||
"name": "TBAnalyticsReceiver", | ||
"args": {"events": ["fed.analytix_log_stats"]} | ||
} | ||
], | ||
"workflows": [ | ||
{ | ||
"id": "scatter_gather_ctl", | ||
"name": "ScatterAndGather", | ||
"args": { | ||
"min_clients" : "{min_clients}", | ||
"num_rounds" : "{num_rounds}", | ||
"start_round": 0, | ||
"wait_time_after_min_received": 10, | ||
"aggregator_id": "aggregator", | ||
"persistor_id": "persistor", | ||
"shareable_generator_id": "shareable_generator", | ||
"train_task_name": "train", | ||
"train_timeout": 0 | ||
} | ||
}, | ||
{ | ||
"id": "global_model_eval", | ||
"name": "GlobalModelEval", | ||
"args": { | ||
"model_locator_id": "model_locator", | ||
"validation_timeout": 6000, | ||
"cleanup_models": true | ||
} | ||
} | ||
] | ||
} |
8 changes: 8 additions & 0 deletions
federated_learning/breast_density_challenge/code/finalize_server.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
#!/usr/bin/env bash | ||
SERVER="server" | ||
echo "FINALIZING ${SERVER}" | ||
cp -r ./fl_workspace/${SERVER}/run_1 /result/. | ||
cp ./fl_workspace/${SERVER}/*.txt /result/. | ||
cp ./fl_workspace/*_log.txt /result/. | ||
cp ./fl_workspace/${SERVER}/run_1/cross_site_val/cross_val_results.json /result/predictions.json # only file required for leaderboard computation | ||
# TODO: might need some more standardization of the result folder |