
LTO build profile extension documentation #1219

Merged 4 commits on Jul 8, 2020

maciejbocianski
Contributor

This is a draft of the LTO documentation and will be updated soon.

The LTO build profile extension is being introduced by the following PRs:
ARMmbed/mbed-os#11874
ARMmbed/mbed-os#11856

@maciejbocianski
Contributor Author

updated

Contributor

@jamesbeyond jamesbeyond left a comment

Do you think it would be useful to mention the issue https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966,
which causes the ISR handler to be removed? @fkjagodzinski @maciejbocianski


- There is no LTO available for IAR compiler.

### GCC_ARM
Contributor

Do we have any known limitations that we need to mention for ARMC6 LTO? I can see a lot of places where we need to add the MBED_USED macro.


<span class="notes">**Note**: In LTO builds compiler produce bytecode/bitcode instead of regular object code. And it's hard to analyse this output by object code analysis tools.</span>

## Limitations
Contributor

Also, I suggest that the Limitations section mention the impact on debuggability and on symbol sizes, which disrupts memory map generation, even though we briefly mentioned it above. Here is an article that could be useful: https://interrupt.memfault.com/blog/best-and-worst-gcc-clang-compiler-flags#-flto

Contributor Author

updated


<span class="notes">**Note**: In LTO builds compiler produce bytecode/bitcode instead of regular object code. And it's hard to analyse this output by object code analysis tools.</span>

Contributor

Can we say something about the good side of LTO as well, e.g. how much ROM/RAM can be saved for the blinky example and the PDMC example? It doesn't need too many details; a table showing roughly how many bytes can be saved would be good enough.

Contributor Author

updated

@maciejbocianski
Contributor Author

Do you think it would be useful to mention the issue https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966,
which causes the ISR handler to be removed? @fkjagodzinski @maciejbocianski

Good point. I will include it. The problem could emerge when using other build systems.

Member

@fkjagodzinski fkjagodzinski left a comment

Regarding the memory savings, feel free to use the updated Results section from ARMmbed/mbed-os#11856.

### GCC_ARM

- The minimal required version of the `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used.
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. `This may lead to a section ".section_name" will not fit in region "region_name"` type errors.
Member

Suggested change
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. `This may lead to a section ".section_name" will not fit in region "region_name"` type errors.
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to a `section ".section_name" will not fit in region "region_name"` type errors.

(formatting of the last sentence)

Contributor Author

fixed

@maciejbocianski
Contributor Author

@fkjagodzinski @jamesbeyond
Updated, please re-review.

Contributor

@jamesbeyond jamesbeyond left a comment

LGTM

@jamesbeyond
Contributor

@bulislaw @kjbracey-arm are you happy with this LTO documentation?

@@ -0,0 +1,66 @@
# Link Time Optimization

Link Time Optimization (LTO) is a program memory usage optimization mechanism performed by compiler at link time. At compile time compiler creates special intermediate represention of all translation units and then optimize them as a single unit at link time resulting in much better optimization comparing to non-LTO builds.
Contributor

typo: represention -> representation

Contributor Author

fixed

Member

@fkjagodzinski fkjagodzinski left a comment

👍
As mentioned earlier -- you may want to update the memory savings for the mbed-cloud-client-example with the updated Results section from ARMmbed/mbed-os#11856.

@maciejbocianski
Contributor Author

I will update the memory savings tables right after the ARM compiler RAM stats fix (ARMmbed/mbed-os#12462) is merged.

@maciejbocianski
Contributor Author

I will update the memory savings tables right after the ARM compiler RAM stats fix (ARMmbed/mbed-os#12462) is merged.

updated

Contributor

@AnotherButler AnotherButler left a comment

Thanks for the PR. I've left some queries for you to address.

@@ -0,0 +1,66 @@
# Link Time Optimization

Link Time Optimization (LTO) is a program memory usage optimization mechanism that the compiler performs at link time. At compile time, the compiler creates a special intermediate representation of all translation units. It then optimizes them as a single unit at link time, which uses less memory than non-LTO builds.
Contributor

What is link time? What are translation units?

Contributor Author

A translation unit is usually a single C/C++ source file (`.c`/`.cpp`), which is compiled (compile time) into a single object file (`.o`/`.obj`). The linker then links (link time) these object files into a single executable file.
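
To make the compile time versus link time distinction concrete, here is a minimal Python sketch that only assembles the command lines involved. It is an illustration, not how the Mbed OS build system works: the helper `build_commands`, the file names and the output name `app.elf` are hypothetical. It assumes a GCC-style driver, where `-flto` is passed both when compiling each translation unit and at the final link.

```python
from pathlib import Path


def build_commands(sources, output="app.elf", compiler="arm-none-eabi-gcc", lto=True):
    """Return (compile_commands, link_command) for a list of C source files.

    Hypothetical helper for illustration only; it builds argument lists
    and does not run the compiler.
    """
    flags = ["-flto"] if lto else []
    compile_commands = []
    objects = []
    for src in sources:
        obj = str(Path(src).with_suffix(".o"))
        # Compile time: one translation unit in, one object file out.
        compile_commands.append([compiler, *flags, "-c", src, "-o", obj])
        objects.append(obj)
    # Link time: all object files are combined into a single executable.
    # With LTO, this is where the units are optimized together.
    link_command = [compiler, *flags, *objects, "-o", output]
    return compile_commands, link_command


compile_cmds, link_cmd = build_commands(["main.c", "driver.c"])
for cmd in compile_cmds:
    print(" ".join(cmd))
print(" ".join(link_cmd))
```

Each source file produces its own compile command, while there is exactly one link command that sees every object file.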


## Using LTO in Mbed OS

The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so we don't recommend enabling it in debug or develop profiles.
Contributor

Would it make more sense to list the one profile we recommend it for, rather than the two we don't?

Contributor Author

updated


The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so we don't recommend enabling it in debug or develop profiles.

To enable LTO, retype the `--profile` option with LTO file path `tools\profiles\extensions\lto.json`.
Contributor

When did we type it the first time?

Contributor Author

First one is when selecting build profile.
mbed compile -t TOOLCHAIN -m TARGET --profile release --profile mbed-os/tools/profiles/extensions/lto.json
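
As a hedged illustration of the command above, this small Python sketch assembles the same argument list with the `--profile` option given twice: first the base profile, then the LTO extension. The function `lto_build_command` is a hypothetical helper, and `K64F` is only a placeholder target name.

```python
def lto_build_command(toolchain, target, base_profile="release",
                      lto_profile="mbed-os/tools/profiles/extensions/lto.json"):
    """Build the argument list for an LTO-enabled `mbed compile` call.

    Hypothetical helper: the first --profile selects the build profile,
    the second adds the LTO profile extension on top of it.
    """
    return [
        "mbed", "compile",
        "-t", toolchain,
        "-m", target,
        "--profile", base_profile,
        "--profile", lto_profile,
    ]


cmd = lto_build_command("GCC_ARM", "K64F")  # placeholder toolchain/target
print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run` if the command should be executed rather than printed.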

Contributor Author

updated


Example LTO profile memory savings for [mbed-os-example-blinky](https://github.com/ARMmbed/mbed-os-example-blinky):

|Build type|Total static RAM memory (data + BSS)|Total flash memory (text + data)|
Contributor

What's BSS?

Contributor Author

Data, BSS and text are all segments of memory that store different types of variables and data.


|Build type|Total static RAM memory (data + BSS)|Total flash memory (text + data)|
| --- | --- | --- |
| GCC_ARM - release - no LTO | 12,096B | 44,628B |
Contributor

Do we expect these numbers to change? If so, we should remove the table, so we don't have to keep updating it.

Contributor Author

Frankly, only the amount of saved memory is important. But true, it will change slightly depending on the Mbed OS version used.

Maybe it would be better to round the numbers to kB, as in the example below?
What do you think, @jamesbeyond?

|Build type|Total Static RAM memory (data + bss)|Total Flash memory (text + data)|
| --- | --- | --- |
| GCC_ARM - release - no LTO | 12.1 kB | 44.6 kB |
| GCC_ARM - release - LTO | 11.8 kB | 41.1 kB |
|***saved memory***| 0.3 kB | 3.5 kB |
| ARM - release - no LTO | 10.36 kB | 35.5 kB |
| ARM - release - LTO | 10.15 kB | 31.5 kB |
|***saved memory***| 0.21 kB | 4 kB |


|Build type|Total Static RAM memory (data + BSS)|Total Flash memory (text + data)|
| --- | --- | --- |
| GCC_ARM - release - no LTO | 59,760B | 389,637B |
Contributor

Do we expect these numbers to change? If so, we should remove the table, so we don't have to keep updating it.


<span class="notes">**Note**: In LTO builds, the compiler produces bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.</span>

## Limitations
Contributor

Could we move most or all of Limitations to our release notes? That may be a better place for this content.

Contributor Author

@AnotherButler
Contributor

It looks like my changes were overwritten. Could you please fix this?

Edit file, mostly for international spelling and active voice.
@maciejbocianski
Contributor Author

@AnotherButler fixed

@jamesbeyond
Contributor

@AnotherButler @bulislaw, are you happy with this document? Shall we get it in?

@AnotherButler
Contributor

AnotherButler commented Mar 19, 2020

I'd still like to move the Limitations section into the release notes. I'd also like to remove the table if the numbers are likely to change. Does anyone object to that?

@AnotherButler
Contributor

@iriark01 This is still waiting on responses to my questions. This is related to the blog post you edited.

@iriark01
Contributor

@maciejbocianski I'm not sure what the status of this PR is. Are you still waiting on answers from @jamesbeyond?

@maciejbocianski
Contributor Author

ping @bulislaw

@iriark01
Contributor

@bulislaw this seems to be waiting for answers


## Using LTO in Mbed OS

The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so it's recommended to use it only with release profile.
Contributor

Suggested change
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so it's recommended to use it only with release profile.
The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags.
<span class="notes">**Note:** LTO performs heavy memory optimizations that break debugging, so we recommend using it only with the release profile, not debugging and develop.</span>
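
One way to picture "amends the build profile with LTO flags" is the Python sketch below. It is not the Mbed OS implementation: the function `apply_extension`, the dictionary layout (toolchain, then flag group, then a list of flags) and the `-Os` flag in the sample profile are assumptions made for illustration. Only the `GCC_ARM` toolchain name and the `-flto` flag come from this document.

```python
from copy import deepcopy


def apply_extension(profile, extension):
    """Return a copy of `profile` with the flags from `extension` appended.

    Hypothetical merge rule: for every toolchain and flag group in the
    extension, append its flags to the matching list in the profile.
    """
    merged = deepcopy(profile)
    for toolchain, groups in extension.items():
        toolchain_flags = merged.setdefault(toolchain, {})
        for group, flags in groups.items():
            toolchain_flags.setdefault(group, []).extend(flags)
    return merged


# Illustrative data only; the real profile files contain more flags.
release = {"GCC_ARM": {"common": ["-Os"], "ld": []}}
lto_extension = {"GCC_ARM": {"common": ["-flto"]}}

merged = apply_extension(release, lto_extension)
print(merged["GCC_ARM"]["common"])
```

The base profile is left untouched, so the same release profile can still be used for a non-LTO build.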


The Mbed OS build system implements LTO as an optional profile extension in `tools\profiles\extensions\lto.json`. Enabling LTO amends the build profile with LTO flags. LTO performs heavy memory optimizations that break debugging, so it's recommended to use it only with release profile.

To enable LTO add `--profile` option with LTO file path `tools\profiles\extensions\lto.json` to the build command.
Contributor

Suggested change
To enable LTO add `--profile` option with LTO file path `tools\profiles\extensions\lto.json` to the build command.
To enable LTO, add the `--profile` option with the LTO file path `tools\profiles\extensions\lto.json` to the build command.

| ARM - release - LTO | 10,153B | 31,514B |
|***saved memory***|212B|3,982B|
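
As a quick check of the two rows above, this Python sketch derives the non-LTO size and the relative saving from the LTO size and the saved bytes. The helper name `savings` is hypothetical; the byte counts are the ones shown in the table rows.

```python
def savings(lto_bytes, saved_bytes):
    """Return (no_lto_bytes, percent_saved) for one memory type."""
    no_lto_bytes = lto_bytes + saved_bytes
    percent_saved = 100.0 * saved_bytes / no_lto_bytes
    return no_lto_bytes, percent_saved


# ARM - release - LTO row and the saved memory row from the table.
ram_no_lto, ram_pct = savings(10153, 212)
flash_no_lto, flash_pct = savings(31514, 3982)

print(f"RAM:   {ram_no_lto} B without LTO, {ram_pct:.1f}% saved")
print(f"Flash: {flash_no_lto} B without LTO, {flash_pct:.1f}% saved")
```

Expressing the saving as a percentage makes the result less sensitive to small changes in absolute sizes between Mbed OS versions.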

LTO profile build results for [mbed-cloud-client-example](https://github.com/ARMmbed/mbed-cloud-client-example)
Contributor

Are lines 19 and 30 the same thing for two different examples? The wording is so different that I'm not sure.

Contributor Author

It's not the same. It shows how much we gain for a simple application (line 19) and a complex application (line 30).


## Limitations

- LTO slows down build process.
Contributor

Suggested change
- LTO slows down build process.
- LTO slows down the build process.

- LTO slows down build process.
- It’s very hard to control memory placement when using LTO.
- LTO performs heavy memory optimizations that break debugging.
- In LTO builds, the compiler produce bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.
Contributor

Suggested change
- In LTO builds, the compiler produce bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.
- In LTO builds, the compiler produces bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.

Contributor

We've said this already in the main text

- Arm recommends that link time optimization is only performed on code and data that does not require precise placement in the scatter file, with general input section selectors such as *(+RO) and .ANY(+RO) used to select sections generated by link time optimization. It is not possible to match bitcode in .llvmbc sections by name in a scatter file.
- Bitcode objects are not guaranteed to be compatible across compiler versions. This means that you should ensure all your bitcode files are built using the same version of the compiler when linking with LTO.

### IAR
Contributor

We don't support IAR in the 6 docs; you can remove this bit.

Contributor Author

removed

- In LTO builds, the compiler produce bytecode/bitcode instead of regular object code. It's hard to analyze this output with object code analysis tools.
- LTO could cause increases to the stack space needed due to cross-object inlining.

### Arm
Contributor

Suggested change
### Arm
### Arm Compiler 6

Contributor

If any of these apply only to ARMC5, please remove them - we don't support ARMC5 in Mbed OS 6

Contributor Author

I think they all apply to Arm Compiler 6. @kjbracey-arm, can you confirm?


### GCC_ARM

- The minimal required version of `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used, for example a platform-specific error during the final link stage on Windows hosts with GCC8.
Contributor

What does "now" mean? This is also the version listed for Mbed OS 6 (on the VPN: https://os.mbed.com/docs/mbed-os/development/build-tools/index.html) so it's really the only option. In other words: do we need "now"?

### GCC_ARM

- The minimal required version of `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used, for example a platform-specific error during the final link stage on Windows hosts with GCC8.
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors.
Contributor

Suggested change
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors.
- You must use the `noinline` attribute for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors.


- The minimal required version of `GCC_ARM` is now the GNU Arm Embedded Toolchain Version 9-2019-q4-major. Earlier `GCC_ARM` versions can cause various issues when the `-flto` flag is used, for example a platform-specific error during the final link stage on Windows hosts with GCC8.
- The `noinline` attribute has to be used for every function that must be placed into a specific section (specified with a `section(".section_name")` attribute). In general, when a function is considered for inlining, the `section` attribute is always ignored. However, with the link-time optimizer enabled, the chances for inlining are much higher because the inliner works across multiple translation units. As a result, the output sections' sizes change compared to a non-lto build. This may lead to `section ".section_name" will not fit in region "region_name"` type errors.
- There is a bug in all GCC versions causing that LTO removes C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). The Mbed OS build system provides a fix for this. In case of exporting Mbed OS project to other build system the problem will emerge. This can be fixed by changing the order of object files in the linker command. Objects providing the weak symbols and compiled from assembly must be listed before the objects providing the strong symbols.
Contributor

Suggested change
- There is a bug in all GCC versions causing that LTO removes C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). The Mbed OS build system provides a fix for this. In case of exporting Mbed OS project to other build system the problem will emerge. This can be fixed by changing the order of object files in the linker command. Objects providing the weak symbols and compiled from assembly must be listed before the objects providing the strong symbols.
- In all GCC versions, LTO removes C functions declared as weak in assembler (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966). For Mbed OS, the problem emerges when exporting Mbed OS projects to other build systems. You can fix this by changing the order of object files in the linker command: Objects providing the weak symbols and compiled from assembly must be listed before the objects providing the strong symbols.
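
The link-order workaround can be sketched in Python as a stable reordering of the object list. This is an illustration under assumptions, not the Mbed OS build system's code: the function `order_objects_for_link` and the file names are hypothetical, and the caller is assumed to already know which objects were compiled from assembly and provide weak symbols.

```python
def order_objects_for_link(objects, asm_weak_objects):
    """Move assembly-built objects that provide weak symbols to the front.

    The relative order within each group is preserved, so only the
    position of the weak-symbol providers changes.
    """
    weak_first = [obj for obj in objects if obj in asm_weak_objects]
    rest = [obj for obj in objects if obj not in asm_weak_objects]
    return weak_first + rest


# Hypothetical object files; startup_device.o stands for an object built
# from an assembly file that declares weak interrupt handlers.
objects = ["main.o", "serial_api.o", "startup_device.o"]
ordered = order_objects_for_link(objects, {"startup_device.o"})
print(ordered)
```

A build script for another build system could apply such a reordering just before it assembles the linker command.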

@iriark01
Contributor

@maciejbocianski or @bulislaw I can't merge this without my comments being addressed

@maciejbocianski
Contributor Author

@iriark01 @jamesbeyond @bulislaw I have made changes according to the review suggestions.

@iriark01
Contributor

Sorry, I've been out. I'll try to work on this this week.

@iriark01
Contributor

iriark01 commented Jul 8, 2020

Thank you, Elise.

@iriark01 iriark01 merged commit c1bbcee into ARMmbed:development Jul 8, 2020