LTO build profile extension documentation #1219
08fb0fa
to
26a37d3
Compare
updated |
Do you think it would be useful to mention the issue https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966,
which causes the ISR handler to be removed? @fkjagodzinski @maciejbocianski
|
||
- There is no LTO available for IAR compiler. | ||
|
||
### GCC_ARM |
Do we have any known limitations that we need to mention for ARMC6 LTO? I can see a lot of places where we need to add the MBED_USED macro.
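For readers unfamiliar with this kind of macro, here is a minimal sketch of the idea. It assumes that a macro such as MBED_USED expands to the compiler's `used` attribute, which this thread does not confirm. `EXAMPLE_USED`, `example_tick_handler` and `example_tick_count` are hypothetical names chosen for illustration. Whether the attribute alone is enough to avoid a particular LTO issue depends on the toolchain.

```c
/* Hypothetical stand-in for a macro such as MBED_USED (assumption: it maps
 * to the GCC/Clang "used" attribute). */
#define EXAMPLE_USED __attribute__((used))

static volatile unsigned int tick_count;

/* No C code references this handler directly; on a real target it would
 * only be reached through the vector table. The "used" attribute asks the
 * compiler to emit the function even if it looks unreferenced. */
EXAMPLE_USED void example_tick_handler(void)
{
    tick_count++;
}

/* Small accessor so the effect of the handler can be observed. */
unsigned int example_tick_count(void)
{
    return tick_count;
}
```

On a host build the handler can simply be called directly to check that it is present and increments the counter.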
|
||
<span class="notes">**Note**: In LTO builds compiler produce bytecode/bitcode instead of regular object code. And it's hard to analyse this output by object code analysis tools.</span> | ||
|
||
## Limitations |
Also, I suggest that the Limitations section mention the impact on debuggability and the impact on symbol sizes, which disrupts memory map generation, even though we briefly mentioned this above. Here is an article that could be useful: https://interrupt.memfault.com/blog/best-and-worst-gcc-clang-compiler-flags#-flto
updated
``` | ||
|
||
<span class="notes">**Note**: In LTO builds compiler produce bytecode/bitcode instead of regular object code. And it's hard to analyse this output by object code analysis tools.</span> | ||
|
Can we say something about the good side of LTO as well? For example, how much ROM/RAM can be saved for the blinky example and the PDMC example? It doesn't need too many details; a table showing roughly how many bytes can be saved would be good enough.
updated
Good point. I will include it. The problem could emerge when using another build system.
Regarding the memory savings, feel free to use the updated Results section from ARMmbed/mbed-os#11856.
### GCC_ARM | ||
|
||
- The minimal required version of the `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used. | ||
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. `This may lead to a section ".section_name" will not fit in region "region_name"` type errors. |
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. `This may lead to a section ".section_name" will not fit in region "region_name"` type errors. | |
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to a `section ".section_name" will not fit in region "region_name"` type errors. |
(formatting of the last sentence)
fixed
26a37d3
to
f7578d2
Compare
@fkjagodzinski @jamesbeyond |
LGTM
@bulislaw @kjbracey-arm are you happy with this LTO documentation? |
@@ -0,0 +1,66 @@ | |||
# Link Time Optimization | |||
|
|||
Link Time Optimization (LTO) is a program memory usage optimization mechanism performed by compiler at link time. At compile time compiler creates special intermediate represention of all translation units and then optimize them as a single unit at link time resulting in much better optimization comparing to non-LTO builds. |
typo: represention -> representation
fixed
f7578d2
to
f368b74
Compare
👍
As mentioned earlier -- you may want to update the memory savings for the mbed-cloud-client-example
with the updated Results section from ARMmbed/mbed-os#11856.
I will update the memory savings tables right after the ARM compiler RAM stats fix (ARMmbed/mbed-os#12462) is merged.
updated |
Thanks for the PR. I've left some queries for you to address.
@@ -0,0 +1,66 @@ | |||
# Link Time Optimization | |||
|
|||
Link Time Optimization (LTO) is a program memory usage optimization mechanism that the compiler performs at link time. At compile time, the compiler creates a special intermediate representation of all translation units. It then optimizes them as a single unit at link time, which uses less memory than non-LTO builds. |
What is link time? What are translation units?
A translation unit is usually a single C/C++ source file (.c/.cpp), which is compiled (compile time) into a single object file (.o/.obj). The linker then links (link time) these object files into a single executable file.
|
||
## Using LTO in Mbed OS | ||
|
||
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so we don't recommend enabling it in debug or develop profiles. |
Would it make more sense to list the one profile we recommend it for, rather than the two we don't?
updated
|
||
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so we don't recommend enabling it in debug or develop profiles. | ||
|
||
To enable LTO, retype the `--profile` option with LTO file path `tools\profiles\extensions\lto.json`. |
When did we type it the first time?
The first `--profile` option is the one that selects the build profile:
`mbed compile -t TOOLCHAIN -m TARGET --profile release --profile mbed-os/tools/profiles/extensions/lto.json`
updated
|
||
Example LTO profile memory savings for [mbed-os-example-blinky](https://github.com/ARMmbed/mbed-os-example-blinky): | ||
|
||
|Build type|Total static RAM memory (data + BSS)|Total flash memory (text + data)| |
What's BSS?
`data`, `BSS` and `text` are all segments of memory that store different types of variables/data.
|
||
|Build type|Total static RAM memory (data + BSS)|Total flash memory (text + data)| | ||
| --- | --- | --- | | ||
| GCC_ARM - release - no LTO | 12,096B | 44,628B | |
Do we expect these numbers to change? If so, we should remove the table, so we don't have to keep updating it.
Frankly, only the amount of saved memory is important. But true, it will change slightly depending on the Mbed OS version used.
Maybe it would be better to round the numbers to kB, as in the example below?
What do you think, @jamesbeyond?
| Build type | Total Static RAM memory (data + bss) | Total Flash memory (text + data) |
| --- | --- | --- |
| GCC_ARM - release - no LTO | 12.1 kB | 44.6 kB |
| GCC_ARM - release - LTO | 11.8 kB | 41.1 kB |
| saved memory | 0.3 kB | 3.5 kB |
| ARM - release - no LTO | 10.36 kB | 35.5 kB |
| ARM - release - LTO | 10.15 kB | 31.5 kB |
| saved memory | 0.21 kB | 4 kB |
|
||
|Build type|Total Static RAM memory (data + BSS)|Total Flash memory (text + data)| | ||
| --- | --- | --- | | ||
| GCC_ARM - release - no LTO | 59,760B | 389,637B | |
Do we expect these numbers to change? If so, we should remove the table, so we don't have to keep updating it.
|
||
<span class="notes">**Note**: In LTO builds, the compiler produces bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.</span> | ||
|
||
## Limitations |
Could we move most or all of Limitations to our release notes? That may be a better place for this content.
ping @jamesbeyond
64ca488
to
ddcf40e
Compare
It looks like my changes were overwritten. Could you please fix this? |
Edit file, mostly for international spelling and active voice.
@AnotherButler fixed |
@AnotherButler @bulislaw, are you happy with this document? Shall we get it in?
I'd still like to move the Limitations section into the release notes. I'd also like to remove the table if the numbers are likely to change. Does anyone object to that?
@iriark01 This is still waiting on responses to my questions. This is related to the blog post you edited. |
@maciejbocianski I'm not sure what the status of this PR is. Are you still waiting on answers from @jamesbeyond? |
ping @bulislaw |
@bulislaw this seems to be waiting for answers |
|
||
## Using LTO in Mbed OS | ||
|
||
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so it's recommended to use it only with release profile. |
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so it's recommended to use it only with release profile. | |
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. | |
<span class="notes">**Note:** LTO performs heavy memory optimizations that break debugging, so we recommend using it only with the release profile, not debugging and develop.</span> |
|
||
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so it's recommended to use it only with release profile. | ||
|
||
To enable LTO add `--profile` option with LTO file path `tools\profiles\extensions\lto.json` to the build command. |
To enable LTO add `--profile` option with LTO file path `tools\profiles\extensions\lto.json` to the build command. | |
To enable LTO, add the `--profile` option with the LTO file path `tools\profiles\extensions\lto.json` to the build command. |
| ARM - release - LTO | 10,153B | 31,514B | | ||
|***saved memory***|212B|3,982B| | ||
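The "saved memory" row is just the difference between the no-LTO and LTO rows. A minimal sketch of that arithmetic follows, using the ARM figures from this table. The no-LTO values used in the comments (10,365 B and 35,496 B) are back-computed from the LTO and saved rows, so treat them as illustrative. The function names are hypothetical.

```c
#include <stdint.h>

/* Bytes saved by LTO: size without LTO minus size with LTO. */
uint32_t saved_bytes(uint32_t no_lto, uint32_t lto)
{
    return no_lto - lto;
}

/* Saving relative to the no-LTO size, in tenths of a percent (per mille),
 * using integer math that truncates towards zero. */
uint32_t saved_permille(uint32_t no_lto, uint32_t lto)
{
    return (uint32_t)(((uint64_t)(no_lto - lto) * 1000u) / no_lto);
}

/* Example with the ARM rows:
 *   flash: saved_bytes(35496, 31514) is 3982, about 11.2% of the no-LTO size
 *   RAM:   saved_bytes(10365, 10153) is 212, about 2.0% of the no-LTO size */
```

Expressing the saving as a fraction of the no-LTO size is one way to keep the figures meaningful even if the absolute numbers drift between Mbed OS versions.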
|
||
LTO profile build results for [mbed-cloud-client-example](https://github.com/ARMmbed/mbed-cloud-client-example) |
Are lines 19 and 30 the same thing for two different examples? The wording is so different that I'm not sure.
They're not the same. They show how much we gain for a simple application (line 19) and for a complex application (line 30).
|
||
## Limitations | ||
|
||
- LTO slows down build process. |
- LTO slows down build process. | |
- LTO slows down the build process. |
- LTO slows down build process. | ||
- It’s very hard to control memory placement when using LTO. | ||
- LTO performs heavy memory optimizations that break debugging. | ||
- In LTO builds, the compiler produce bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools. |
- In LTO builds, the compiler produce bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools. | |
- In LTO builds, the compiler produces bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools. |
We've said this already in the main text
- Arm recommends that link time optimization is only performed on code and data that does not require precise placement in the scatter file, with general input section selectors such as *(+RO) and .ANY(+RO) used to select sections generated by link time optimization. It is not possible to match bitcode in .llvmbc sections by name in a scatter file. | ||
- Bitcode objects are not guaranteed to be compatible across compiler versions. This means that you should ensure all your bitcode files are built using the same version of the compiler when linking with LTO. | ||
|
||
### IAR |
We don't support IAR in the 6 docs; you can remove this bit.
removed
- In LTO builds, the compiler produce bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools. | ||
- LTO could cause increases to the stack space needed due to cross-object inlining. | ||
|
||
### Arm |
### Arm | |
### Arm Compiler 6 |
If any of these apply only to ARMC5, please remove them - we don't support ARMC5 in Mbed OS 6
I think they all apply to Arm Compiler 6. @kjbracey-arm, can you confirm?
|
||
### GCC_ARM | ||
|
||
- The minimal required version of `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used, for example a platform-specific error during the final link stage on Windows hosts with GCC8. |
What does "now" mean? This is also the version listed for Mbed OS 6 (on the VPN: https://os.mbed.com/docs/mbed-os/development/build-tools/index.html) so it's really the only option. In other words: do we need "now"?
### GCC_ARM | ||
|
||
- The minimal required version of `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used, for example a platform-specific error during the final link stage on Windows hosts with GCC8. | ||
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors. |
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors. | |
- You must use the `noinline` attribute for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors. |
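As an illustration of the point above, here is a minimal sketch of combining `noinline` with a `section` attribute in GCC-style syntax. The macro `PLACE_IN_SECTION` and the functions `scale_sample` and `process` are hypothetical names. `.section_name` is the placeholder used in the text, not a section that any real linker script is known to define.

```c
#include <stdint.h>

/* Hypothetical helper macro: keep the function out of line and place it in
 * the named section. GCC and Clang both accept these attributes. */
#define PLACE_IN_SECTION(name) __attribute__((noinline, section(name)))

/* Without noinline, the link-time inliner may fold this body into its
 * callers, and the inlined copies would not live in ".section_name". */
PLACE_IN_SECTION(".section_name")
uint32_t scale_sample(uint32_t sample)
{
    return sample * 3u + 1u;
}

/* Ordinary caller, placed wherever the toolchain normally puts code. */
uint32_t process(uint32_t sample)
{
    /* This remains a real call into the function in ".section_name". */
    return scale_sample(sample) + 1u;
}
```

Wrapping both attributes in one macro is only a convenience, so that a function cannot be given a section without also being marked `noinline`.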
|
||
- The minimal required version of `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used, for example a platform-specific error during the final link stage on Windows hosts with GCC8. | ||
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors. | ||
- There is a bug in all GCC versions causing that LTO removes C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). The Mbed OS build system provides a fix for this. In case of exporting Mbed OS project to other build system the problem will emerge. This can be fixed by changing the order of object files in the linker command. Objects providing the weak symbols and compiled from assembly must be listed before the objects providing the strong symbols. |
- There is a bug in all GCC versions causing that LTO removes C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). The Mbed OS build system provides a fix for this. In case of exporting Mbed OS project to other build system the problem will emerge. This can be fixed by changing the order of object files in the linker command. Objects providing the weak symbols and compiled from assembly must be listed before the objects providing the strong symbols. | |
- In all GCC versions, LTO removes C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). For Mbed OS, the problem emerges when exporting Mbed OS projects to other build systems. You can fix this by changing the order of object files in the linker command: Objects providing the weak symbols and compiled from assembly must be listed before the objects providing the strong symbols. |
@maciejbocianski or @bulislaw I can't merge this without my comments being addressed |
@iriark01 @jamesbeyond @bulislaw I have made changes according to the review suggestions.
Sorry, I've been out. I'll try to work on this this week. |
Proofreading
Thank you, Elise. |
This is a draft of the LTO documentation and will be updated soon.
The LTO build profile extension is being introduced by the following PRs:
ARMmbed/mbed-os#11874
ARMmbed/mbed-os#11856