LTO build profile extension documentation #1219

Merged
merged 4 commits into from
Jul 8, 2020
Changes from 2 commits
66 changes: 66 additions & 0 deletions docs/api/memory/link_time_optimization.md
@@ -0,0 +1,66 @@
# Link Time Optimization

Link Time Optimization (LTO) is a program memory usage optimization that the compiler performs at link time, when the linker combines the compiled object files into a single executable. At compile time, the compiler creates a special intermediate representation of each translation unit (usually a single C/C++ source file). At link time, it optimizes them as a single unit, which produces a program that uses less memory than a non-LTO build.
Contributor

What is link time? What are translation units?

Contributor Author

A translation unit is usually a single C/C++ source file (`.c`/`.cpp`). It is compiled (compile time) into a single object file (`.o`/`.obj`), and the linker then links (link time) these object files into a single executable file.


## Using LTO in Mbed OS

The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so it's recommended to use it only with release profile.
Contributor

Suggested change
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so it's recommended to use it only with release profile.
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags.
<span class="notes">**Note:** LTO performs heavy memory optimizations that break debugging, so we recommend using it only with the release profile, not debugging and develop.</span>


To enable LTO add `--profile` option with LTO file path `tools\profiles\extensions\lto.json` to the build command.
Contributor

Suggested change
To enable LTO add `--profile` option with LTO file path `tools\profiles\extensions\lto.json` to the build command.
To enable LTO, add the `--profile` option with the LTO file path `tools\profiles\extensions\lto.json` to the build command.


<span class="notes">**Note**: For profile extensions, you must give the file's path relative to the project's root folder.</span>

To enable LTO with the `release` profile:

```
mbed compile -t TOOLCHAIN -m TARGET --profile release --profile mbed-os/tools/profiles/extensions/lto.json
```
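
When the build is driven from a script, the command above can be assembled programmatically. The following is a minimal sketch, not part of the Mbed OS tools: the helper name `lto_build_command` and the `K64F` target in the usage line are illustrative assumptions, and the extension path assumes that `mbed-os` sits directly under the project root, as in the command above.

```python
from typing import List

# Path copied from the command above; it is relative to the project root.
LTO_EXTENSION = "mbed-os/tools/profiles/extensions/lto.json"


def lto_build_command(toolchain: str, target: str,
                      base_profile: str = "release") -> List[str]:
    """Return the argument list for an LTO build (hypothetical helper).

    The base profile comes first and the LTO extension second,
    matching the order used in the command above.
    """
    return [
        "mbed", "compile",
        "-t", toolchain,
        "-m", target,
        "--profile", base_profile,
        "--profile", LTO_EXTENSION,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # without an Mbed CLI installation.
    print(" ".join(lto_build_command("GCC_ARM", "K64F")))
```

The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting issues with the profile path.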

Example LTO profile memory savings for [mbed-os-example-blinky](https://github.com/ARMmbed/mbed-os-example-blinky):

|Build type|Total static RAM memory (data + BSS)|Total flash memory (text + data)|
Contributor

What's BSS?

Contributor Author

Data, BSS and text are all segments of memory that store different types of variables and data.

| --- | --- | --- |
| GCC_ARM - release - no LTO | 12,096B | 44,628B |
Contributor

Do we expect these numbers to change? If so, we should remove the table, so we don't have to keep updating it.

Contributor Author

Frankly, only the amount of saved memory is important. But true, it will change slightly depending on the Mbed OS version used.

Maybe it would be better to round the numbers to kB like below example?
What do you think @jamesbeyond ?

| Build type | Total Static RAM memory (data + bss) | Total Flash memory (text + data) |
| --- | --- | --- |
| GCC_ARM - release - no LTO | 12.1 kB | 44.6 kB |
| GCC_ARM - release - LTO | 11.8 kB | 41.1 kB |
| saved memory | 0.3 kB | 3.5 kB |
| ARM - release - no LTO | 10.36 kB | 35.5 kB |
| ARM - release - LTO | 10.15 kB | 31.5 kB |
| saved memory | 0.21 kB | 4 kB |

| GCC_ARM - release - LTO | 11,800B | 41,088B |
|***saved memory***|296B|3,540B|
| ARM - release - no LTO | 10,365B | 35,496B |
| ARM - release - LTO | 10,153B | 31,514B |
|***saved memory***|212B|3,982B|

LTO profile build results for [mbed-cloud-client-example](https://github.com/ARMmbed/mbed-cloud-client-example)
Contributor

Are lines 19 and 30 the same thing for two different examples? The wording is so different that I'm not sure.

Contributor Author

It's not the same. It shows how much we gain for a simple application (line 19) and a complex application (line 30).


|Build type|Total Static RAM memory (data + BSS)|Total Flash memory (text + data)|
| --- | --- | --- |
| GCC_ARM - release - no LTO | 59,760B | 389,637B |
Contributor

Do we expect these numbers to change? If so, we should remove the table, so we don't have to keep updating it.

| GCC_ARM - release - LTO | 59,432B | 354,167B |
|***saved memory***| 328B | 35,470B|
| ARM - release - no LTO | 58,099B | 353,849B |
| ARM - release - LTO | 57,150B | 322,500B |
|***saved memory***| 949B | 31,349B|
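
The "saved memory" rows in both tables are the difference between the non-LTO and LTO sizes. The sketch below recomputes them and also expresses each saving as a percentage of the non-LTO size, which makes the simple and the complex application easier to compare. The byte counts are copied from the tables; the dictionary layout and function names are illustrative assumptions.

```python
# Sizes in bytes, copied from the two tables above: (static RAM, flash).
MEASUREMENTS = {
    ("blinky", "GCC_ARM"): {"no_lto": (12096, 44628), "lto": (11800, 41088)},
    ("blinky", "ARM"): {"no_lto": (10365, 35496), "lto": (10153, 31514)},
    ("cloud-client", "GCC_ARM"): {"no_lto": (59760, 389637), "lto": (59432, 354167)},
    ("cloud-client", "ARM"): {"no_lto": (58099, 353849), "lto": (57150, 322500)},
}


def saved_bytes(no_lto, lto):
    """Return (RAM saved, flash saved) in bytes."""
    return (no_lto[0] - lto[0], no_lto[1] - lto[1])


def saved_percent(no_lto, lto):
    """Return (RAM saved, flash saved) as a percentage of the non-LTO size."""
    ram, flash = saved_bytes(no_lto, lto)
    return (100.0 * ram / no_lto[0], 100.0 * flash / no_lto[1])


if __name__ == "__main__":
    for (app, toolchain), sizes in MEASUREMENTS.items():
        ram, flash = saved_bytes(sizes["no_lto"], sizes["lto"])
        ram_pct, flash_pct = saved_percent(sizes["no_lto"], sizes["lto"])
        print(f"{app} {toolchain}: RAM -{ram} B ({ram_pct:.1f}%), "
              f"flash -{flash} B ({flash_pct:.1f}%)")
```

For these measurements, the flash savings work out to roughly 8% to 11% and the static RAM savings to roughly 0.5% to 2.5%. The exact figures depend on the Mbed OS version used.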

<span class="notes">**Note**: In LTO builds, the compiler produces bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.</span>

Contributor

Can we say something about the good side of LTO as well, for example how much ROM/RAM can be saved for the blinky example and the PDMC example? It doesn't need too many details; a table showing roughly how many bytes can be saved would be good enough.

Contributor Author

updated

## Limitations
Contributor

Also, I suggest that the Limitations section mention the impact on debuggability and the impact on symbol size, which disrupts memory map generation, even though we briefly mentioned this above. Here is an article that could be useful: https://interrupt.memfault.com/blog/best-and-worst-gcc-clang-compiler-flags#-flto

Contributor Author

updated

Contributor

Could we move most or all of Limitations to our release notes? That may be a better place for this content.


- LTO slows down build process.
Contributor

Suggested change
- LTO slows down build process.
- LTO slows down the build process.

- It’s very hard to control memory placement when using LTO.
- LTO performs heavy memory optimizations that break debugging.
Contributor

We've said this already in the main text

- In LTO builds, the compiler produce bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.
Contributor

Suggested change
- In LTO builds, the compiler produce bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.
- In LTO builds, the compiler produces bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.

Contributor

We've said this already in the main text

- LTO can increase the stack space needed because of cross-object inlining.

### Arm
Contributor

Suggested change
### Arm
### Arm Compiler 6

Contributor

If any of these apply only to ARMC5, please remove them - we don't support ARMC5 in Mbed OS 6

Contributor Author

I think that they all apply to ARM 6. @kjbracey-arm, can you confirm?


- No bitcode libraries: armlink only supports bitcode objects on the command line. It does not support bitcode objects coming from libraries. armlink gives an error message if it encounters a file containing bitcode while loading from a library.
- Partial linking is not supported with LTO because it works only with ELF objects, not bitcode files.
- Arm recommends that link time optimization is only performed on code and data that does not require precise placement in the scatter file, with general input section selectors such as *(+RO) and .ANY(+RO) used to select sections generated by link time optimization. It is not possible to match bitcode in .llvmbc sections by name in a scatter file.
Contributor

Suggested change
- Arm recommends that link time optimization is only performed on code and data that does not require precise placement in the scatter file, with general input section selectors such as *(+RO) and .ANY(+RO) used to select sections generated by link time optimization. It is not possible to match bitcode in .llvmbc sections by name in a scatter file.
- Arm recommends that link time optimization is only performed on code and data that does not require precise placement in the scatter file, with general input section selectors such as `*(+RO)` and `.ANY(+RO)` used to select sections generated by link time optimization. It is not possible to match bitcode in `.llvmbc` sections by name in a scatter file.

Contributor

Had to put the * in code ears or it makes everything else italics.

- Bitcode objects are not guaranteed to be compatible across compiler versions. This means that you should ensure all your bitcode files are built using the same version of the compiler when linking with LTO.

### IAR
Contributor

We don't support IAR in the 6 docs; you can remove this bit.

Contributor Author

removed


- There is no LTO available for the IAR compiler.

### GCC_ARM
Contributor

Do we have any known limitations that we need to mention for ARMC6 LTO? I can see a lot of places where we need to add the MBED_USED macro.


- The minimal required version of `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used, for example a platform-specific error during the final link stage on Windows hosts with GCC8.
Contributor

What does "now" mean? This is also the version listed for Mbed OS 6 (on the VPN: https://os.mbed.com/docs/mbed-os/development/build-tools/index.html) so it's really the only option. In other words: do we need "now"?

- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors.
Contributor

Suggested change
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors.
- You must use the `noinline` attribute for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors.

- There is a bug in all GCC versions causing that LTO removes C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). The Mbed OS build system provides a fix for this. In case of exporting Mbed OS project to other build system the problem will emerge. This can be fixed by changing the order of object files in the linker command. Objects providing the weak symbols and compiled from assembly must be listed before the objects providing the strong symbols.
Contributor

Suggested change
- There is a bug in all GCC versions causing that LTO removes C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). The Mbed OS build system provides a fix for this. In case of exporting Mbed OS project to other build system the problem will emerge. This can be fixed by changing the order of object files in the linker command. Objects providing the weak symbols and compiled from assembly must be listed before the objects providing the strong symbols.
- In all GCC versions, LTO removes C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). For Mbed OS, the problem emerges when exporting Mbed OS projects to other build systems. You can fix this by changing the order of object files in the linker command: Objects providing the weak symbols and compiled from assembly must be listed before the objects providing the strong symbols.
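
For projects exported to another build system, the reordering described above could be applied to the object list before it is passed to the linker. This is a minimal sketch under stated assumptions: the function name and the file names in the usage are hypothetical, and the set of assembly-built objects that define weak symbols has to be supplied by the caller, because the sketch does not inspect the objects itself.

```python
from typing import Iterable, List, Set


def order_objects_for_lto(objects: Iterable[str],
                          weak_asm_objects: Set[str]) -> List[str]:
    """Move assembly-built objects that provide weak symbols to the front.

    The relative order inside each group is preserved, so the only change
    is that the weak-symbol providers precede every other object on the
    linker command line.
    """
    objects = list(objects)
    first = [obj for obj in objects if obj in weak_asm_objects]
    rest = [obj for obj in objects if obj not in weak_asm_objects]
    return first + rest


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    link_order = order_objects_for_lto(
        ["main.o", "startup.o", "irq_handlers.o"],
        weak_asm_objects={"startup.o"},
    )
    print(" ".join(link_order))
```

A stable partition is used rather than a full sort so that any other ordering constraints already present in the object list are left untouched.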