LTO: Use -flto and -flto-partition only as needed #6436
Merged
`-flto` slows down a build by a factor of two or so. We were also always using `-flto-partition=none`, which saves more code space than the default `-flto-partition=balanced`. However, some builds that need `-flto` don't really need the extra savings of `-flto-partition=none`. Using the default `-flto-partition` reduces build times to 2/3-3/4 of the original.

I did a bunch of experiments to see what space savings were lost, and also tried the other
`-flto-partition=` options.

I also tried `-flto=auto`, which does parallel `make`, but that seems to be the default. Leaving it off did not make any difference.

Here are some Metro M4 builds, done with
`-j12` on my Intel i7-8700T Linux dev system. The M4 builds are generally `-O2`, but need LTO for that to fit. Times are rounded to half seconds, because the variance between runs can be about that. The first line in each pair is a clean `en_US` build; the second is a `pt_BR` build in the same build dir as `en_US`
, to avoid unnecessary recompiles, as is done in the GitHub Actions.

| `-flto-partition=` | Build |
| --- | --- |
| `1to1` | `en_US` (clean) |
| `1to1` | `pt_BR` (using `en_US` build) |
| `balanced` | `en_US` (clean) |
| `balanced` | `pt_BR` (using `en_US` build) |
| `none` | `en_US` (clean) |
| `none` | `pt_BR` (using `en_US` build) |
| `one` | `en_US` (clean) |
| `one` | `pt_BR` (using `en_US` build) |

So
`-flto-partition=one` is slightly better than `-flto-partition=none`, and the time is the same. `balanced` is much faster than `none` or `one`, as you can see. `1to1` is worse than `balanced`.

I also did some trials on Trinket M0 and CPX.
`one` is very slightly better than `none` (8-18 bytes). `one` vs `balanced` is 92 bytes better for Trinket M0 and 120 bytes better for CPX. So I left `-flto-partition=one` on for all SAMD21 builds, and for `CIRCUITPY_FULL_BUILD=0` SAMx5x builds.

We only use LTO now for
`atmel-samd` and `nrf` builds. In this PR, I turned off LTO on roomy `nrf` builds, which saves a lot of time. Roomy is very roomy - there are still 100k's of unused space.

I'll look at the total time savings after the CI runs.
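
As a rough illustration of the policy described above, here is a minimal shell sketch of how the flags could be chosen per build. The function name `lto_flags` and its two arguments are hypothetical, made up for this example; they are not the variables the port Makefiles use. It relies only on the fact that `balanced` is the default partitioning, so `-flto` alone selects it.

```shell
# Hypothetical helper (not from the actual Makefiles): print the LTO
# flags for one build.
#   $1: 1 if the build needs LTO to fit, anything else if it is roomy
#   $2: partition mode (balanced, one, none, 1to1); defaults to balanced
lto_flags() {
    needs_lto="$1"
    partition="${2:-balanced}"

    if [ "$needs_lto" != "1" ]; then
        # Roomy builds: skip LTO entirely for the fastest build.
        return 0
    fi

    if [ "$partition" = "balanced" ]; then
        # balanced is the default, so no -flto-partition is needed.
        printf '%s\n' "-flto"
    else
        # Tight builds trade build time for a few more bytes of flash.
        printf '%s\n' "-flto -flto-partition=$partition"
    fi
}
```

Under these assumptions, `lto_flags 1 one` would correspond to the SAMD21 and `CIRCUITPY_FULL_BUILD=0` SAMx5x builds, `lto_flags 1` to builds that need LTO but not the extra savings, and `lto_flags 0` (which prints nothing) to the roomy `nrf` builds.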