Update the fast training with profiling tool #757


Merged: 15 commits, Jul 5, 2022

Conversation

yuchen-xu
Contributor

@yuchen-xu yuchen-xu commented Jun 16, 2022

Signed-off-by: Yuchen Xu [email protected]

Fixes #729 .

Description

Made changes in acceleration/fast_training_tutorial.ipynb and added a profiling option. When profiling is off, the notebook runs as before. When profiling is on, the notebook must be run from the terminal as one piece (command provided), since Nsight Systems does not support being launched from inside the script.
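For illustration, the terminal invocation could look roughly like this (the flags and output name here are my assumption; the notebook itself documents the exact command to use):

```shell
# Execute the whole notebook under Nsight Systems (illustrative sketch only;
# see the notebook for the exact command it provides).
nsys profile -o fast_training_profile \
    jupyter nbconvert --to notebook --execute acceleration/fast_training_tutorial.ipynb
```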

Status

Ready

Checks

Signed-off-by: Yuchen Xu <[email protected]>

@yuchen-xu yuchen-xu changed the title draft for profiling [WIP] draft for profiling Jun 16, 2022
@yuchen-xu yuchen-xu marked this pull request as ready for review June 22, 2022 22:37
@Nic-Ma Nic-Ma requested a review from wyli June 23, 2022 02:07
@Nic-Ma
Contributor

Nic-Ma commented Jun 23, 2022

I think the outputs folder can be removed?
@wyli Could you please help take a look at this PR first? You raised the feature request.

Thanks in advance.

@wyli
Contributor

wyli commented Jun 23, 2022

thanks, it looks great. cc @AHarouni

some minor points perhaps beyond this profiling example:

@yuchen-xu
Contributor Author

@Nic-Ma I think the outputs/ folder makes outputs easier to manage (so that they don't live in the same directory as all the other tutorials), and allows users to customize where they want the outputs to go. Open to more discussion/opinions.

@yuchen-xu
Contributor Author

@wyli Thanks for the comments!

  • Could you clarify a little further what kinds of insights you are looking for?
  • The Thread Worker flag seems to be in development and not part of MONAI yet?
  • It would be interesting to try InstanceNorm indeed, but as you noted, they go beyond the profiling example - shall we open a new issue along the lines of "integrating latest features into fast training tutorial"?

@wyli
Contributor

wyli commented Jun 23, 2022

sure, they're probably out of the scope of this PR, but to clarify:

  • Could you clarify a little further what kinds of insights you are looking for?

such as which module is currently the main bottleneck and, if we look for further performance gains, where we should focus

  • The Thread Worker flag seems to be in development and not part of MONAI yet?

it's in v0.9

  • It would be interesting to try InstanceNorm indeed, but as you noted, they go beyond the profiling example - shall we open a new issue along the lines of "integrating latest features into fast training tutorial"?

perhaps if you use the latest docker it's a one-line change, @yiheng-wang-nv please advise.

@yuchen-xu
Contributor Author

@wyli Thanks. I'm working on those.

@yuchen-xu
Contributor Author

yuchen-xu commented Jun 24, 2022

Hi @wyli

  • The use_thread_workers argument does not seem to be available in the MONAI v0.9.0rc2 docker (see attached image). Also, in the fast version, the main bottleneck seems to be CacheDataset (~40 seconds) rather than ThreadDataLoader (~0.002 seconds).

[image: thread workers]

  • I was able to replace batch norm with instance norm and instance_nvfuser (the one you cited above). In the first few epochs, instance norm does seem to be faster than batch norm (~10% improvement from 10 seconds), and instance_nvfuser a little faster than that, but the two instance norms result in a worse metric. I will perform some more testing and include the results in the updated version.

@Nic-Ma
Contributor

Nic-Ma commented Jun 25, 2022

Hi @yuchen-xu ,

  1. Please try the MONAI v0.9.0 docker instead of RC2. And no need to worry about the cache-preparation time.
  2. If instance_nvfuser doesn't help, there is no need to update the tutorial for it. Please do some more experiments to confirm.

Thanks in advance.
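For reference, pulling and entering the suggested container could look like this (the image tag is my assumption based on Docker Hub naming conventions; please verify before use):

```shell
# Pull the MONAI v0.9.0 image and start it with GPU access (tag assumed, not confirmed here).
docker pull projectmonai/monai:0.9.0
docker run --gpus all -it --rm -v "$(pwd)":/workspace projectmonai/monai:0.9.0 bash
```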

@yuchen-xu
Contributor Author

@wyli @Nic-Ma I performed experiments on the two changes, here are the findings:

  1. Thread workers didn't seem to improve the speed of the tutorial. In the figure, "orig" refers to the original setup (use_thread_workers=False) and "thread1/2/3/4" refers to use_thread_workers=True with num_workers=1/2/3/4. "orig" is clearly the fastest. This is further confirmed by the second part of the image (tlpo = training load per operation), the average time taken by each next() call on an iterator over the train loader; again, "orig" is fastest. Curiously, even num_workers=1 makes it worse.
    [image]

  2. Keeping use_thread_workers=False, replacing batch norm with either instance norm or instance_nvfuser makes training faster. Loss and Dice metric end up at around the same level, although the trends differ; see the figures below. Also, instance_nvfuser has a significantly slow first iteration (~10 seconds under the fast regime) but is faster on later iterations. I'm using instance_nvfuser, but will note the option to use instance norm too.
    [fig1]
    [fig2]

Each experiment is run with 600 epochs and repeated 3 times to control variance.

The next version (which I expect to push tomorrow) will also fix a minor issue in the generated graph of Dice metrics, such that the Dice and loss graphs will both show epochs going to the same number (right now the Dice graph has wrong values on the x-axis).
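As an aside, the "tlpo" numbers above can be reproduced with a small timing loop. This is my own sketch (the helper name measure_load_time is hypothetical, not code from the tutorial): average the wall-clock time of each next() call on an iterator over the loader.

```python
# Hypothetical sketch of measuring average per-iteration load time ("tlpo").
import time

def measure_load_time(loader, n_iters=100):
    """Return the mean time per next() call over at most n_iters batches."""
    it = iter(loader)
    times = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break  # loader exhausted before n_iters batches
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)

# Example with a plain list standing in for a DataLoader:
mean_t = measure_load_time([object()] * 10, n_iters=5)
```

In the real tutorial one would pass the DataLoader or ThreadDataLoader instance instead of the placeholder list.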

@wyli
Contributor

wyli commented Jun 30, 2022

Nice analysis, thanks @yuchen-xu!

Nic-Ma
Nic-Ma previously approved these changes Jun 30, 2022
Contributor

@Nic-Ma Nic-Ma left a comment


Thanks for the enhancement, it overall looks good to me.
Just some minor comments inline.

Thanks.

@Nic-Ma Nic-Ma changed the title [WIP] draft for profiling Update the fast training with profiling tool Jun 30, 2022
@Nic-Ma
Contributor

Nic-Ma commented Jun 30, 2022

Hi @yuchen-xu ,

I think something may have gone wrong during your last round of training.
The epoch-time curve of regular training looks strange; it drops abruptly from ~60 s to ~40 s:
[image]
And the fast training takes much longer to reach mean_dice=0.94 than your previous round because it uses many more epochs (105 vs 60), so maybe we should not use instance norm?
[image]

Thanks.

@Nic-Ma Nic-Ma dismissed their stale review June 30, 2022 14:17

May need one more round of updates.

@yuchen-xu
Contributor Author

@Nic-Ma Thanks for the review!
The regular epoch time is indeed a little strange; I could see that happening if someone else was also using the GPU. I'll try running it again.
As noted in an earlier post, instance_nvfuser has the fastest overall runtime but is quite close to batch norm in metric trends, while instance norm is slightly slower (though still faster than batch norm) and may have better metric trends. I'll try again with instance norm and see what we get.

@yuchen-xu
Contributor Author

As it turns out, instance_nvfuser takes 105 epochs / 258 s to reach the target Dice of 0.94, compared to 95 epochs / 144 s for instance norm and 60 epochs / 106 s for batch norm (the original). While the two instance norms have faster per-epoch times, they take significantly longer to reach our goal metric, so we will continue using batch norm in the tutorial. @wyli

@wyli
Contributor

wyli commented Jul 4, 2022

the notebook testing script will modify the max_epochs variable to speed up the tests:

tutorials/runner.sh

Lines 84 to 85 in 2da31b6

echo "MONAI tutorials testing utilities. When running the notebooks, we first search for variables, such as"
echo "\"max_epochs\" and set them to 1 to reduce testing time."

Looks like this PR is not compatible with that; please help confirm.

@Nic-Ma
Contributor

Nic-Ma commented Jul 4, 2022

There seem to be 2 minor layout issues:

  1. I think the description should start from a new line:

[image]

  2. This `span` may not be necessary:

[image]

Others look good to me.

Thanks.

@yuchen-xu
Contributor Author

@wyli Could you elaborate on what you mean by "this PR is not compatible with that"? If you mean max_epochs being changed: runner.sh runs a copy of the notebook when replacing max_epochs (see notebook=$(cat "$filename") in runner.sh), so the values in the original notebook are not changed. If you mean the notebook taking forever to run when profiling = False (since it would still try to run for 600 epochs), I made an update to address that and ran it successfully locally.
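To illustrate the idea, here is my own sketch (the function patch_max_epochs is hypothetical, not the actual runner.sh logic) of rewriting max_epochs assignments in a copy of the notebook's text while leaving the original file untouched:

```python
# Hypothetical sketch: rewrite "max_epochs" assignments in notebook text.
import re

def patch_max_epochs(notebook_text: str, new_value: int = 1) -> str:
    """Replace assignments like `max_epochs = 600` with `max_epochs = <new_value>`."""
    return re.sub(r"max_epochs\s*=\s*\d+", f"max_epochs = {new_value}", notebook_text)

src = "max_epochs = 600\nfor epoch in range(max_epochs):\n    pass\n"
print(patch_max_epochs(src).splitlines()[0])  # → max_epochs = 1
```

Because the substitution is applied to a copy of the notebook's contents, the checked-in notebook keeps its original epoch count.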

@Nic-Ma Thanks for the comments, I have fixed them.

Contributor

@Nic-Ma Nic-Ma left a comment


Thanks for the quick update.
Looks good to me now, @wyli do you have any other comments?

Thanks.

@wyli wyli enabled auto-merge (squash) July 5, 2022 06:37
@wyli wyli merged commit 61dae66 into Project-MONAI:main Jul 5, 2022
@yuchen-xu yuchen-xu mentioned this pull request Jul 9, 2022
boneseva pushed a commit to boneseva/MONAI-tutorials that referenced this pull request Apr 21, 2024
* draft for profiling

Signed-off-by: Yuchen Xu <[email protected]>

* fixed coding style errors

* ready for discussion

* checkpoint missing nsys figures

* ready for review

* addressed Nic's comments

* fixed typo

* removed output files

* intermediate commit with experiments included

* ready for review

* various updates; ready for review

* addressed style comments

Co-authored-by: Yuchen Xu <[email protected]>
Successfully merging this pull request may close these issues: adding a profiling flag