IAR: Enable linker optimizations for develop/release profile #11865

maciejbocianski · 2019-11-14T13:28:59Z

Description

Enabled linker optimizations in IAR toolchain for the release/develop profile.
See also #11856

IAR doesn't provide single link-time optimization flag (like --flto in gcc).
Instead IAR offers bunch of linker options:

--vfe
--inline
--merge_duplicate_sections

Test results:

Test results for IAR ANSI C/C++ Compiler V8.32.1.169/W32 for ARM and release profile
Tested on mbed-cloud-client-example

Added --vfe --inline --merge_duplicate_sections options to IAR linker

Memory type	no flags	with flags	yield
Total Static RAM memory (data + bss)	55078 bytes	55078 bytes	0 bytes
Total Flash memory (text + data)	353771 bytes	350529 bytes	‭3242‬ bytes

Compilation time is almost the same

Summary of change (What the change is for and why)

Added --vfe --inline --merge_duplicate_sections options to IAR linker

Documentation (Details of any document updates required)

Pull request type (required)

[] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[x] Feature update (New feature / Functionality change / New API)
[] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results (required)

[] No Tests required for this change (E.g docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[] Tests / results supplied as part of this PR

Reviewers (optional)

@jamesbeyond @kjbracey-arm @fkjagodzinski

Release Notes (required for feature/major PRs)

Summary of changes

Impact of changes

Migration actions required

ciarmcom · 2019-11-14T14:00:33Z

@maciejbocianski, thank you for your changes.
@jamesbeyond @fkjagodzinski @kjbracey-arm @ARMmbed/mbed-os-tools @ARMmbed/mbed-os-maintainers please review.

kjbracey · 2019-11-15T10:42:10Z

This isn't really the same link-time optimisation as we're talking about for GCC, so the "when to enable" reasoning is different.

vfe - I thought that was on by default. And there's no real downside, afaik. If it's not on by default, I'd suggest turning it on in all profiles.
inline - basically no downside that I can think of, except potentially interfering with breakpoints. I'd have it on for develop and release at least.
merge_duplicate_sections - that makes me nervous. It doesn't just affect debugging, it can affect program behaviour, and makes the compiler non-conformant. I can see it could help with templates though, and chances of it breaking a program are slim.

If there was a "safe" version of merge_duplicate_sections, I'd be happier (see the reference to --icf=safe in https://stackoverflow.com/questions/15168924/gcc-clang-merging-functions-with-identical-instructions-comdat-folding)

Can you show how much each option is saving in your build?

maciejbocianski · 2019-11-18T09:12:22Z

@kjbracey-arm
Below savings per option:

Memory type	old config	+vfe	+merge_duplicate_sections	+inline
Total Static RAM memory (data + bss) [B]	55078	55078(0)	55078(0)	55078(0)
Total Flash memory (text + data) [B]	353771	353771(0)	350715(-‭3056‬)	353536(-‭235‬)

Unfortunately most of the savings gives merge_duplicate_sections.
I didn't find any safer version of merge_duplicate_sections in official docs
http://supp.iar.com/FilesPublic/UPDINFO/013240/arm/doc/EWARM_DevelopmentGuide.ENU.pdf

You are right, this option can be harmful in some situations. Description below

--merge_duplicate_sections
Description: Use this option to keep only one copy of equivalent read-only sections. Note that this can cause different functions or constants to have the same address, so an application
that depends on the addresses being different will not work correctly with this option enabled.

kjbracey · 2019-11-18T10:08:27Z

I wouldn't totally rule out the merge option. I think the failure chances are slim, and the saving is significant. I'd certainly be inclined to try it at a pinch. If I was trying to fit my application into a teeny device, I'd try it, and not as the last resort.

But I don't think it's something we can unilaterally decide to enable in the toolchain for all applications. This could break application code.

And I know I've written code that could be broken by it. For example, in Nanostack I've done

const uint8_t ipv6_any[16] = { 0 };

int my_fn(const uint8_t *addr, ...)
{
    // not bothering with memcmp(addr, ipv6_any, 16) == 0
    // because I know no external user calls this, and all paths
    // leading to this with a null address are using our one true `::` literal.
    if (addr == ipv6_any) {
         special case
    }
}

Ah, here it is:

mbed-os/features/nanostack/sal-stack-nanostack/source/Common_Protocols/mld.c

Lines 152 to 162 in 2e96145

    
           buffer_t *mld_build(protocol_interface_info_entry_t *cur, uint8_t type, uint16_t max_response_delay, const uint8_t address[static 16]) 
        
           { 
        
               uint8_t *ptr; 
        
               /* Validity checks - must be multicast, scope >= link-local */ 
        
               if (!addr_is_ipv6_multicast(address) || addr_ipv6_multicast_scope(address) <= IPV6_SCOPE_INTERFACE_LOCAL) { 
        
                   /* Except we allow unspecified for general query */ 
        
                   if (!(type == ICMPV6_TYPE_INFO_MCAST_LIST_QUERY && address == ADDR_UNSPECIFIED)) { 
        
                       return NULL; 
        
                   } 
        
               }

If the compiler or linker decided to merge that 16-byte zero array with some other handy zeroes, that test no longer works.

maciejbocianski · 2019-11-19T09:36:10Z

I managed to reproduce this behavior, but had to put each variables in separate translation unit to get it finally merged into single memory location by linker
Also found that others had problem with this option: http://lwip.100.n7.nabble.com/lwip-has-errors-with-linker-optimization-enabled-td33792.html

kjbracey · 2019-11-19T10:35:44Z

Ah interesting find with that link. And doubly interesting that it's another network stack :)

I guess network stacks are one of the most complicated things you run on an embedded system - or at least most prone to clever trickery.

So if lwIP is known to fail, I think that totally rules out having it on by default. It is the sort of thing we might mention as a possibilty in a hypothetical "memory saving techniques" doc. With a "not with Nanostack or lwIP" and "don't blame us" proviso.

maciejbocianski · 2019-11-20T13:54:28Z

updated

merge_duplicate_sections was removed as it can be problematic
vfe is enabled by default, so it's no need for adding it (only forcing vfe requires adding it, but it's unsafe --vfe=forced)
inline added to both develop and release profile

Final savings is about 200B of flash

REL +"--inline"
Total Static RAM memory (data + bss): 55078(+0) bytes
Total Flash memory (text + data): 353536(-235) bytes

DEV +"--inline"
Total Static RAM memory (data + bss): 55078(+0) bytes
Total Flash memory (text + data): 373515(-220) bytes

kjbracey · 2019-11-21T07:44:32Z

Yep, this is fine - just need to update the GitHub description. (And get rid of the plurals in the Git commit - it's just the inline optimisation).

maciejbocianski · 2019-11-21T07:56:45Z

@kjbracey-arm
I noticed that IAR Embedded Workbench puts vfe even if forced is not enabled.
So I have added it though.
Commit description updated.

kjbracey · 2019-11-21T08:00:15Z

I noticed that IAR Embedded Workbench puts vfe even if forced is not enabled.
So I have added it though.

Documentation is clear that it's the default - I'm not convinced that it's worth copying the IDE. (And if you were, it should surely be a general policy for all options?) That could be some hold-over legacy from an older compiler version where vfe wasn't the default.

- add --inline option to linker flags Some routines are so small that they can fit in the space of the instruction that calls the routine. Use this option to make the linker replace the call of a routine with the body of the routine, where applicable.

maciejbocianski · 2019-11-21T08:07:45Z

@kjbracey-arm Updated - only inline included

fkjagodzinski

👍 Too bad no more savings can be achieved safely.

kjbracey

Change is fine, but GitHub description should match final outcome.

maciejbocianski · 2019-11-21T11:28:43Z

PR description updated

jamesbeyond

based on the original plan this is for 5.15. So now is this 5.15 or 6.0 @bulislaw @0xc0170

0xc0170 · 2019-11-22T14:58:42Z

Ci started

0xc0170 · 2019-11-22T15:06:34Z

Set to 5.15 as agreed

mbed-ci · 2019-11-22T17:41:13Z

Test run: FAILED

Summary: 1 of 11 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

jenkins-ci/mbed-os-ci_greentea-test

ciarmcom requested review from fkjagodzinski, jamesbeyond, kjbracey and a team November 14, 2019 14:00

ciarmcom added the needs: review label Nov 14, 2019

maciejbocianski force-pushed the iar_lto branch from 51e6354 to f68a121 Compare November 20, 2019 13:46

maciejbocianski force-pushed the iar_lto branch from f68a121 to fb8ccea Compare November 21, 2019 07:50

maciejbocianski force-pushed the iar_lto branch from fb8ccea to 683ba96 Compare November 21, 2019 08:06

fkjagodzinski approved these changes Nov 21, 2019

View reviewed changes

bulislaw approved these changes Nov 21, 2019

View reviewed changes

kjbracey approved these changes Nov 21, 2019

View reviewed changes

maciejbocianski changed the title ~~IAR: Enable link-time optimizer for release profile~~ IAR: Enable linker optimizations for develop/release profile Nov 21, 2019

jamesbeyond approved these changes Nov 22, 2019

View reviewed changes

0xc0170 approved these changes Nov 22, 2019

View reviewed changes

0xc0170 removed the needs: review label Nov 22, 2019

0xc0170 added needs: CI release-version: 5.15.0-rc1 labels Nov 22, 2019

0xc0170 removed the needs: CI label Nov 23, 2019

0xc0170 merged commit e03180d into ARMmbed:master Nov 23, 2019

IAR: Enable linker optimizations for develop/release profile #11865

IAR: Enable linker optimizations for develop/release profile #11865

Uh oh!

Conversation

maciejbocianski commented Nov 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Summary of change (What the change is for and why)

Documentation (Details of any document updates required)

Pull request type (required)

Test results (required)

Reviewers (optional)

Release Notes (required for feature/major PRs)

Summary of changes

Impact of changes

Migration actions required

Uh oh!

ciarmcom commented Nov 14, 2019

Uh oh!

kjbracey commented Nov 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maciejbocianski commented Nov 18, 2019

Uh oh!

kjbracey commented Nov 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maciejbocianski commented Nov 19, 2019

Uh oh!

kjbracey commented Nov 19, 2019

Uh oh!

maciejbocianski commented Nov 20, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kjbracey commented Nov 21, 2019

Uh oh!

maciejbocianski commented Nov 21, 2019

Uh oh!

kjbracey commented Nov 21, 2019

Uh oh!

maciejbocianski commented Nov 21, 2019

Uh oh!

fkjagodzinski left a comment

Choose a reason for hiding this comment

Uh oh!

kjbracey left a comment

Choose a reason for hiding this comment

Uh oh!

maciejbocianski commented Nov 21, 2019

Uh oh!

jamesbeyond left a comment

Choose a reason for hiding this comment

Uh oh!

0xc0170 commented Nov 22, 2019

Uh oh!

0xc0170 commented Nov 22, 2019

Uh oh!

mbed-ci commented Nov 22, 2019

Test run: FAILED

Uh oh!

Uh oh!

maciejbocianski commented Nov 14, 2019 •

edited

Loading

kjbracey commented Nov 15, 2019 •

edited

Loading

kjbracey commented Nov 18, 2019 •

edited

Loading

maciejbocianski commented Nov 20, 2019 •

edited

Loading