LTO build profile extension documentation #1219
Changes from 2 commits
@@ -0,0 +1,66 @@
# Link Time Optimization

Link Time Optimization (LTO) is an optimization mechanism that the compiler performs at link time; here it is used to reduce program memory usage. At compile time, the compiler creates a special intermediate representation of each translation unit (usually a single `.c` or `.cpp` source file) instead of regular object code. At link time, when the object files are combined into a single executable, it optimizes all translation units as a single unit. The resulting image typically uses less RAM and flash than a non-LTO build.
## Using LTO in Mbed OS

The Mbed OS build system implements LTO as an optional profile extension in `tools/profiles/extensions/lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so we recommend using it only with the `release` profile.

To enable LTO, add a `--profile` option with the path to the LTO extension file, `tools/profiles/extensions/lto.json`, to the build command.

<span class="notes">**Note**: For profile extensions, you must give the full path relative to the project's root folder.</span>

To enable LTO with the `release` profile:

```
mbed compile -t TOOLCHAIN -m TARGET --profile release --profile mbed-os/tools/profiles/extensions/lto.json
```
Example LTO profile memory savings for [mbed-os-example-blinky](https://github.com/ARMmbed/mbed-os-example-blinky):

| Build type | Total static RAM memory (data + BSS) | Total flash memory (text + data) |
| --- | --- | --- |
| GCC_ARM - release - no LTO | 12,096B | 44,628B |
| GCC_ARM - release - LTO | 11,800B | 41,088B |
| ***saved memory*** | 296B | 3,540B |
| ARM - release - no LTO | 10,365B | 35,496B |
| ARM - release - LTO | 10,153B | 31,514B |
| ***saved memory*** | 212B | 3,982B |

Review comment (on the table header): What's BSS?

Reply: All

Review comment (on the table values): Do we expect these numbers to change? If so, we should remove the table, so we don't have to keep updating it.

Reply: Frankly, only the amount of saved memory is important. But true, it will change slightly depending on the Mbed OS version used. Maybe it would be better to round the numbers to kB, like the example below?

LTO profile build results for [mbed-cloud-client-example](https://github.com/ARMmbed/mbed-cloud-client-example):

Review comment: Are lines 19 and 30 the same thing for two different examples? The wording is so different I'm not sure.

Reply: It's not the same. It shows how much we gain for a simple application (line 19) and a complex application (line 30).

| Build type | Total static RAM memory (data + BSS) | Total flash memory (text + data) |
| --- | --- | --- |
| GCC_ARM - release - no LTO | 59,760B | 389,637B |
| GCC_ARM - release - LTO | 59,432B | 354,167B |
| ***saved memory*** | 328B | 35,470B |
| ARM - release - no LTO | 58,099B | 353,849B |
| ARM - release - LTO | 57,150B | 322,500B |
| ***saved memory*** | 949B | 31,349B |

Review comment (on the table values): Do we expect these numbers to change? If so, we should remove the table, so we don't have to keep updating it.
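
The "saved memory" rows in the tables above are plain differences between the non-LTO and LTO sizes. As a minimal sketch of that arithmetic in C, the block below also computes the relative saving, which may be easier to compare between a small and a large application than raw byte counts. The names `saved_bytes`, `saved_percent` and `print_lto_savings` are hypothetical helpers for illustration only; they are not part of Mbed OS or its tools.

```c
#include <stddef.h>
#include <stdio.h>

/* Bytes saved by the LTO build relative to the non-LTO build. */
long saved_bytes(long no_lto, long lto)
{
    return no_lto - lto;
}

/* The same saving expressed as a percentage of the non-LTO size. */
double saved_percent(long no_lto, long lto)
{
    return 100.0 * (double)saved_bytes(no_lto, lto) / (double)no_lto;
}

/* Print the flash savings for the four builds listed in the tables
 * (sizes in bytes, copied from the tables). */
void print_lto_savings(void)
{
    static const struct {
        const char *build;
        long no_lto;
        long lto;
    } flash[] = {
        { "blinky, GCC_ARM",       44628,  41088  },
        { "blinky, ARM",           35496,  31514  },
        { "cloud client, GCC_ARM", 389637, 354167 },
        { "cloud client, ARM",     353849, 322500 },
    };

    for (size_t i = 0; i < sizeof flash / sizeof flash[0]; i++) {
        printf("%-22s %6ld B (%.1f%%)\n", flash[i].build,
               saved_bytes(flash[i].no_lto, flash[i].lto),
               saved_percent(flash[i].no_lto, flash[i].lto));
    }
}
```

Calling `print_lto_savings()` from a host-side `main` prints one line per build. The block has no `main` of its own so that it can be added to an existing program.
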

<span class="notes">**Note**: In LTO builds, the compiler produces bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.</span>

Review comment: Can we say something about the good side of LTO as well? For example, how much ROM and RAM can be saved for the blinky example and the PDMC example? It doesn't need too many details; a table showing roughly how many bytes can be saved would be good enough.

Reply: updated
## Limitations

Review comment: Also, I suggest that the Limitations section mention the impact on debuggability and the impact on symbol size, which disrupts memory map generation, even though we briefly mentioned it above. Here is an article that could be useful: https://interrupt.memfault.com/blog/best-and-worst-gcc-clang-compiler-flags#-flto

Reply: updated

Review comment: Could we move most or all of Limitations to our release notes? That may be a better place for this content.

Reply: ping @jamesbeyond

- LTO slows down the build process.
- It's very hard to control memory placement when using LTO.
- LTO performs heavy memory optimizations that break debugging.
- In LTO builds, the compiler produces bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.
- LTO can increase the stack space needed because of cross-object inlining.

Review comment (on both the debugging bullet and the bytecode/bitcode bullet): We've said this already in the main text.
### Arm

Review comment: If any of these apply only to ARMC5, please remove them. We don't support ARMC5 in Mbed OS 6.

Reply: I think they all apply to ARM 6. @kjbracey-arm, can you confirm?

- No bitcode libraries: armlink only supports bitcode objects on the command line. It does not support bitcode objects coming from libraries. armlink gives an error message if it encounters a file containing bitcode while loading from a library.
- Partial linking is not supported with LTO because it only works with ELF objects, not bitcode files.
- Arm recommends that link time optimization is only performed on code and data that does not require precise placement in the scatter file, with general input section selectors such as `*(+RO)` and `.ANY(+RO)` used to select sections generated by link time optimization. It is not possible to match bitcode in `.llvmbc` sections by name in a scatter file.
- Bitcode objects are not guaranteed to be compatible across compiler versions. Make sure all your bitcode files are built with the same version of the compiler when linking with LTO.

Review comment (on the scatter file bullet): Had to put the
### IAR

Review comment: We don't support IAR in the 6 docs; you can remove this bit.

Reply: removed

- There is no LTO available for the IAR compiler.
### GCC_ARM

Review comment: Do we have any known limitations that we need to mention for ARMC6 LTO? I can see a lot of places where we need to add the `MBED_USED` macro.

Review comment (on the first bullet): What does "now" mean? This is also the version listed for Mbed OS 6 (on the VPN: https://os.mbed.com/docs/mbed-os/development/build-tools/index.html), so it's really the only option. In other words: do we need "now"?

- The minimum required version of `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used, for example a platform-specific error during the final link stage on Windows hosts with GCC 8.
- The `noinline` attribute must be used for every function that has to be placed in a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. With the link-time optimizer enabled, the chances of inlining are much higher because the inliner works across multiple translation units. As a result, the sizes of the output sections change compared to a non-LTO build. This may lead to errors of the type `section ".section_name" will not fit in region "region_name"`.
- There is a bug in all GCC versions that causes LTO to remove C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 and https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). The Mbed OS build system provides a fix for this. If you export an Mbed OS project to another build system, the problem will emerge. You can fix it by changing the order of object files in the linker command: objects that provide the weak symbols and are compiled from assembly must be listed before the objects that provide the strong symbols.
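
The `noinline` and `section` combination described in the GCC_ARM list can be sketched as follows. This is a minimal illustration under stated assumptions, not code from Mbed OS: the section name `.fast_code` and the functions `checksum_step` and `checksum` are hypothetical, and a real project would use a section name that its linker script actually places.

```c
#include <stdint.h>

/* Hypothetical function that must live in a specific section.
 * noinline keeps a standalone copy of the function, so the section
 * attribute still applies even when LTO could inline the call
 * across translation units. */
__attribute__((noinline, section(".fast_code")))
uint32_t checksum_step(uint32_t acc, uint8_t byte)
{
    return (acc << 1) ^ byte;
}

/* Ordinary caller; it is placed in the default code section. */
uint32_t checksum(const uint8_t *data, uint32_t len)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < len; i++) {
        acc = checksum_step(acc, data[i]);
    }
    return acc;
}
```

Without `noinline`, the body of `checksum_step` could be inlined into `checksum`, and the inlined copy would not be placed according to the `section` attribute.
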

Review comment: What is link time? What are translation units?

Reply: A translation unit is usually a single C/C++ source file (`.c`/`.cpp`), which is compiled (compile time) into a single object file (`.o`/`.obj`). The linker then links (link time) these object files into a single executable file.