Skip to content

Replace microlib arithmetic #13289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

kjbracey
Copy link
Contributor

@kjbracey kjbracey commented Jul 14, 2020

Summary of changes

Microlib 64-bit multiply and 32/64-bit divide routines are extremely slow, and in the case of multiply, bigger too! The 64-bit divide routine is slow enough to cause interrupt latency problems.

Replace with local implementations extracted from the ARM Compiler 6 standard library.

Fixes #12221.

Impact of changes

Need to consider licensing - these routines were extracted from the ARM Compiler 6 c_p.l and c_w.l libraries, which have their own End User Licence. Can we republish them here under the Apache license? Assembly was hand-generated by converting the fromelf output.

They are exact copies of the standard library code, checked by diffing the fromelf output from both.

Migration actions required

Documentation

None.


Pull request type

[X] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[] Feature update (New feature / Functionality change / New API)
[] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[] No Tests required for this change (E.g docs only update)
[X] Covered by existing mbed-os tests (Greentea or Unittest)
[] Tests / results supplied as part of this PR

Reviewers


@ciarmcom ciarmcom requested review from a team July 14, 2020 11:00
@ciarmcom
Copy link
Member

@kjbracey-arm, thank you for your changes.
@ARMmbed/mbed-os-tools @ARMmbed/mbed-os-core @ARMmbed/mbed-os-maintainers please review.

kjbracey added 3 commits July 17, 2020 09:20
To make it easier to have Thumb-1 and Thumb-2 versions of assembly,
add labels corresponding to them. Naming mirrors that of ARMv8-M
variants, but "baseline" also covers ARMv6-M, so does not include
exclusive-access instructions, for example.

These two labels correspond to the "p" and "w" variants of the
ARM standard library.
Microlib 64-bit multiply and 32/64-bit divide routines are extremely
slow, and in the case of multiply, bigger too! The 64-bit divide routine
is slow enough to cause interrupt latency problems.

Replace with local implementations extracted from the ARM Compiler 6
standard library.
`FRAME PUSH {regs},offset` doesn't work the way I expected, or that
would be more useful - it puts the registers at the new stack
position, not below the old one. Use `FRAME SAVE` and `FRAME ADDRESS`
manually instead.
@kjbracey
Copy link
Contributor Author

Moved files to correct location - PR was cherry-picked from an old Mbed OS fork with different layout.

@adbridge adbridge added the release-type: patch Indentifies a PR as containing just a patch label Jul 17, 2020
Copy link
Contributor

@0xc0170 0xc0170 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why we do not fix the library itself ?
  2. The license - we need to check internally how to proceed if we decide this PR should get in (1st question)

@donatieng
Copy link
Contributor

Thanks @kjbracey-arm, I'll take it up with the AC6 team - I really see the problem but I'm worried about maintaining what is essentially a fork of Microlib within Mbed OS. As you noted I'm not sure we can introduce those changes under Apache 2.

@evedon
Copy link
Contributor

evedon commented Jul 28, 2020

  • Why we do not fix the library itself ?

A ticket was raised to the Compiler team. See internal reference SDCOMP-55606 and SDCOMP-55674

@donatieng
Copy link
Contributor

Hi @kjbracey-arm and all, the compiler team kindly agreed to fix the 64-bit multiply issue, once the fix is in and the relevant AC6 version is out, we can update the recommended compiler version.

For division, it looks like an acceptable trade-off (8 times smaller and slower) so I suggest we re-run benchmarks once the multiply fix is in to see if it actually is an issue.

@kjbracey-arm are you happy with this? If so I suggest we close this PR.

@kjbracey
Copy link
Contributor Author

kjbracey commented Sep 10, 2020

Good to hear they're looking at the multiply, at least.

The divides are still pretty rubbish. Those were the actual real issue we've seen in practice, rather than the multiply - being too slow for interrupt use in low-clock rate devices. Sure, they're not blatantly breaking the microlib concept by being bigger, but they're still a problem.

(Just urged someone to avoid % for that reason here: #13563 (review) )

Having extracted and examined this divide, which has a huge amount of unrolling, it would be nice to see a somewhere-inbetween version for microlib. The microlib one is a simple C loop, so it's double-hit by lack of unrolling and being clunky compiled C. How about a tight small hand-crafted assembler loop, maybe with a teeny bit of unrolling?

@kjbracey
Copy link
Contributor Author

Ah, just read the history more carefully and checked the compiler issues. Looks like a division improvement is incoming. Yay.

I guess we can close this now.

@kjbracey kjbracey closed this Sep 10, 2020
@mergify mergify bot removed do not merge needs: review release-type: patch Indentifies a PR as containing just a patch labels Sep 10, 2020
@kjbracey kjbracey deleted the microlib_replace branch September 10, 2020 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Division operation much slower with microlib than other libs on cortex-M0
6 participants