LTO: Use -flto and -flto-partition only as needed #6436

Merged: 3 commits into adafruit:main, May 27, 2022

@dhalbert dhalbert commented May 26, 2022

-flto slows down a build by a factor of two or so. We were also always using -flto-partition=none, which saves more code space than the default -flto-partition=balanced. However, some builds that need -flto don't really need the extra savings of -flto-partition=none. Using the default -flto-partition reduces build times to 2/3-3/4 of the original.

I did a bunch of experiments to see what space savings were lost, and also tried the other -flto-partition= options.

I also tried -flto=auto, which parallelizes the LTO step, but that seems to be the default: leaving it off made no difference.

Here are some Metro M4 builds, done with -j12 on my Intel i7-8700T Linux dev system. The M4 builds are generally -O2, but need LTO for that to fit. Times are rounded to the nearest half second, because the variance between runs is about that much. In each pair, the first line is a clean en_US build; the second is a pt_BR build in the same build directory as en_US, to avoid unnecessary recompiles, as is done in the GitHub Actions.

| -flto-partition= | firmware bytes free | build time (secs) | translation |
|---|---|---|---|
| 1to1 | 32316 | 29 | en_US (clean) |
| 1to1 | 29568 | 16.5 | pt_BR (using en_US build) |
| balanced | 32296 | 28 | en_US (clean) |
| balanced | 29556 | 14.5 | pt_BR (using en_US build) |
| none | 33484 | 40 | en_US (clean) |
| none | 30728 | 27 | pt_BR (using en_US build) |
| one | 33684 | 40 | en_US (clean) |
| one | 30928 | 27 | pt_BR (using en_US build) |

So -flto-partition=one is slightly better than -flto-partition=none (200 more bytes free), and the time is the same. balanced is much faster than none or one, as you can see. 1to1 is slower than balanced and gains only 12-20 bytes.
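To make the comparison concrete, here is a small sketch that recomputes the differences from the measurements in the table above. The `RESULTS` dictionary and the `compare` helper are names made up for this illustration; they are not part of the build system.

```python
# Measurements copied from the Metro M4 table: (firmware bytes free, build seconds).
RESULTS = {
    "1to1": {"en_US": (32316, 29.0), "pt_BR": (29568, 16.5)},
    "balanced": {"en_US": (32296, 28.0), "pt_BR": (29556, 14.5)},
    "none": {"en_US": (33484, 40.0), "pt_BR": (30728, 27.0)},
    "one": {"en_US": (33684, 40.0), "pt_BR": (30928, 27.0)},
}


def compare(option, baseline, translation):
    """Return (extra bytes free, build-time ratio) of option relative to baseline."""
    free, secs = RESULTS[option][translation]
    base_free, base_secs = RESULTS[baseline][translation]
    return free - base_free, secs / base_secs


if __name__ == "__main__":
    for translation in ("en_US", "pt_BR"):
        for option in ("1to1", "balanced", "one"):
            extra, ratio = compare(option, "none", translation)
            print(f"{option:>8} vs none, {translation}: "
                  f"{extra:+d} bytes free, {ratio:.2f}x build time")
```

Relative to none, balanced gives up a bit over 1 kB of free flash on this board in exchange for the shorter build, which is the trade-off weighed per board in this PR.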

I also ran some trials on Trinket M0 and CPX. There, one is very slightly better than none (8-18 bytes), and one beats balanced by 92 bytes on Trinket M0 and 120 bytes on CPX. So I left -flto-partition=one on for all SAMD21 builds, and for CIRCUITPY_FULL_BUILD=0 SAMx5x builds.

We only use LTO now for atmel-samd and nrf builds. In this PR, I turned off LTO on roomy nrf builds, which saves a lot of time. Roomy is very roomy: those builds still have hundreds of kilobytes of unused space.
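The resulting policy can be summarized as a sketch. The `lto_cflags` function and its two parameters are hypothetical and only illustrate the decision described above; the actual change is made in the build configuration, not in Python. The flags themselves (-flto and -flto-partition=one) are the ones discussed in this PR.

```python
def lto_cflags(needs_lto: bool, tight_on_flash: bool) -> list[str]:
    """Return the LTO-related compiler flags for one board build.

    Hypothetical helper: needs_lto and tight_on_flash stand in for whatever
    per-board settings the real build configuration uses.
    """
    if not needs_lto:
        # Roomy builds: skip LTO entirely and get the fastest build.
        return []
    # With no -flto-partition given, the default (balanced) is used.
    flags = ["-flto"]
    if tight_on_flash:
        # Builds that need every byte: slower, but frees the most flash.
        flags.append("-flto-partition=one")
    return flags


if __name__ == "__main__":
    print(lto_cflags(needs_lto=False, tight_on_flash=False))  # e.g. a roomy nrf build
    print(lto_cflags(needs_lto=True, tight_on_flash=False))   # e.g. a Metro M4 build
    print(lto_cflags(needs_lto=True, tight_on_flash=True))    # e.g. a SAMD21 build
```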

I'll look at the total time savings after the CI runs.

@dhalbert dhalbert marked this pull request as ready for review May 27, 2022 15:21
@dhalbert dhalbert requested a review from tannewt May 27, 2022 15:21
@dhalbert
Collaborator Author

This is ready for review. The single build failure appears to be a CI problem.

@tannewt tannewt left a comment

Thank you!

@tannewt tannewt merged commit dc5565a into adafruit:main May 27, 2022
@dhalbert dhalbert deleted the judicious-lto branch May 27, 2022 17:31
@dhalbert
Collaborator Author

It's a little hard to tell, because the build duration times include waiting time, but I think this is saving about 15 minutes, reducing a full build from about 80 minutes to 65 minutes.
