Skip to content

IAR: Enable linker optimizations for develop/release profile #11865

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 23, 2019

Conversation

maciejbocianski
Copy link
Contributor

@maciejbocianski maciejbocianski commented Nov 14, 2019

Description

Enabled linker optimizations in IAR toolchain for the release/develop profile.
See also #11856

IAR doesn't provide single link-time optimization flag (like --flto in gcc).
Instead IAR offers bunch of linker options:

  • --vfe
  • --inline
  • --merge_duplicate_sections

Test results:

Test results for IAR ANSI C/C++ Compiler V8.32.1.169/W32 for ARM and release profile
Tested on mbed-cloud-client-example

Added --vfe --inline --merge_duplicate_sections options to IAR linker

Memory type no flags with flags yield
Total Static RAM memory (data + bss) 55078 bytes 55078 bytes 0 bytes
Total Flash memory (text + data) 353771 bytes 350529 bytes ‭3242‬ bytes

Compilation time is almost the same

Summary of change (What the change is for and why)

Added --vfe --inline --merge_duplicate_sections options to IAR linker

Documentation (Details of any document updates required)

Pull request type (required)

[] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[x] Feature update (New feature / Functionality change / New API)
[] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results (required)

[] No Tests required for this change (E.g docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[] Tests / results supplied as part of this PR

Reviewers (optional)

@jamesbeyond @kjbracey-arm @fkjagodzinski


Release Notes (required for feature/major PRs)

Summary of changes
Impact of changes
Migration actions required

@ciarmcom
Copy link
Member

@maciejbocianski, thank you for your changes.
@jamesbeyond @fkjagodzinski @kjbracey-arm @ARMmbed/mbed-os-tools @ARMmbed/mbed-os-maintainers please review.

@kjbracey
Copy link
Contributor

kjbracey commented Nov 15, 2019

This isn't really the same link-time optimisation as we're talking about for GCC, so the "when to enable" reasoning is different.

  • vfe - I thought that was on by default. And there's no real downside, afaik. If it's not on by default, I'd suggest turning it on in all profiles.
  • inline - basically no downside that I can think of, except potentially interfering with breakpoints. I'd have it on for develop and release at least.
  • merge_duplicate_sections - that makes me nervous. It doesn't just affect debugging, it can affect program behaviour, and makes the compiler non-conformant. I can see it could help with templates though, and chances of it breaking a program are slim.

If there was a "safe" version of merge_duplicate_sections, I'd be happier (see the reference to --icf=safe in https://stackoverflow.com/questions/15168924/gcc-clang-merging-functions-with-identical-instructions-comdat-folding)

Can you show how much each option is saving in your build?

@maciejbocianski
Copy link
Contributor Author

@kjbracey-arm
Below savings per option:

Memory type old config +vfe +merge_duplicate_sections +inline
Total Static RAM memory (data + bss) [B] 55078 55078(0) 55078(0) 55078(0)
Total Flash memory (text + data) [B] 353771 353771(0) 350715(-‭3056‬) 353536(-‭235‬)

Unfortunately most of the savings gives merge_duplicate_sections.
I didn't find any safer version of merge_duplicate_sections in official docs
http://supp.iar.com/FilesPublic/UPDINFO/013240/arm/doc/EWARM_DevelopmentGuide.ENU.pdf

You are right, this option can be harmful in some situations. Description below

--merge_duplicate_sections
Description: Use this option to keep only one copy of equivalent read-only sections. Note that this can cause different functions or constants to have the same address, so an application
that depends on the addresses being different will not work correctly with this option enabled.

@kjbracey
Copy link
Contributor

kjbracey commented Nov 18, 2019

I wouldn't totally rule out the merge option. I think the failure chances are slim, and the saving is significant. I'd certainly be inclined to try it at a pinch. If I was trying to fit my application into a teeny device, I'd try it, and not as the last resort.

But I don't think it's something we can unilaterally decide to enable in the toolchain for all applications. This could break application code.

And I know I've written code that could be broken by it. For example, in Nanostack I've done

const uint8_t ipv6_any[16] = { 0 };

int my_fn(const uint8_t *addr, ...)
{
    // not bothering with memcmp(addr, ipv6_any, 16) == 0
    // because I know no external user calls this, and all paths
    // leading to this with a null address are using our one true `::` literal.
    if (addr == ipv6_any) {
         special case
    }
}

Ah, here it is:

buffer_t *mld_build(protocol_interface_info_entry_t *cur, uint8_t type, uint16_t max_response_delay, const uint8_t address[static 16])
{
uint8_t *ptr;
/* Validity checks - must be multicast, scope >= link-local */
if (!addr_is_ipv6_multicast(address) || addr_ipv6_multicast_scope(address) <= IPV6_SCOPE_INTERFACE_LOCAL) {
/* Except we allow unspecified for general query */
if (!(type == ICMPV6_TYPE_INFO_MCAST_LIST_QUERY && address == ADDR_UNSPECIFIED)) {
return NULL;
}
}

If the compiler or linker decided to merge that 16-byte zero array with some other handy zeroes, that test no longer works.

@maciejbocianski
Copy link
Contributor Author

I managed to reproduce this behavior, but had to put each variables in separate translation unit to get it finally merged into single memory location by linker
Also found that others had problem with this option: http://lwip.100.n7.nabble.com/lwip-has-errors-with-linker-optimization-enabled-td33792.html

@kjbracey
Copy link
Contributor

Ah interesting find with that link. And doubly interesting that it's another network stack :)

I guess network stacks are one of the most complicated things you run on an embedded system - or at least most prone to clever trickery.

So if lwIP is known to fail, I think that totally rules out having it on by default. It is the sort of thing we might mention as a possibilty in a hypothetical "memory saving techniques" doc. With a "not with Nanostack or lwIP" and "don't blame us" proviso.

@maciejbocianski
Copy link
Contributor Author

maciejbocianski commented Nov 20, 2019

updated

  • merge_duplicate_sections was removed as it can be problematic
  • vfe is enabled by default, so it's no need for adding it (only forcing vfe requires adding it, but it's unsafe --vfe=forced)
  • inline added to both develop and release profile

Final savings is about 200B of flash

REL +"--inline"
Total Static RAM memory (data + bss): 55078(+0) bytes
Total Flash memory (text + data): 353536(-235) bytes

DEV +"--inline"
Total Static RAM memory (data + bss): 55078(+0) bytes
Total Flash memory (text + data): 373515(-220) bytes

@kjbracey
Copy link
Contributor

Yep, this is fine - just need to update the GitHub description. (And get rid of the plurals in the Git commit - it's just the inline optimisation).

@maciejbocianski
Copy link
Contributor Author

@kjbracey-arm
I noticed that IAR Embedded Workbench puts vfe even if forced is not enabled.
So I have added it though.
Commit description updated.

@kjbracey
Copy link
Contributor

I noticed that IAR Embedded Workbench puts vfe even if forced is not enabled.
So I have added it though.

Documentation is clear that it's the default - I'm not convinced that it's worth copying the IDE. (And if you were, it should surely be a general policy for all options?) That could be some hold-over legacy from an older compiler version where vfe wasn't the default.

- add --inline option to linker flags
  Some routines are so small that they can fit in the space of the
  instruction that calls the routine. Use this option to make the
  linker replace the call of a routine with the body of the routine,
  where applicable.
@maciejbocianski
Copy link
Contributor Author

@kjbracey-arm Updated - only inline included

Copy link
Member

@fkjagodzinski fkjagodzinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Too bad no more savings can be achieved safely.

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change is fine, but GitHub description should match final outcome.

@maciejbocianski maciejbocianski changed the title IAR: Enable link-time optimizer for release profile IAR: Enable linker optimizations for develop/release profile Nov 21, 2019
@maciejbocianski
Copy link
Contributor Author

PR description updated

Copy link
Contributor

@jamesbeyond jamesbeyond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the original plan this is for 5.15. So now is this 5.15 or 6.0 @bulislaw @0xc0170

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 22, 2019

Ci started

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 22, 2019

Set to 5.15 as agreed

@mbed-ci
Copy link

mbed-ci commented Nov 22, 2019

Test run: FAILED

Summary: 1 of 11 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_greentea-test

@0xc0170 0xc0170 removed the needs: CI label Nov 23, 2019
@0xc0170 0xc0170 merged commit e03180d into ARMmbed:master Nov 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants