-
Notifications
You must be signed in to change notification settings - Fork 3k
Replace microlib arithmetic #13289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace microlib arithmetic #13289
Conversation
@kjbracey-arm, thank you for your changes. |
To make it easier to have Thumb-1 and Thumb-2 versions of assembly, add labels corresponding to them. Naming mirrors that of ARMv8-M variants, but "baseline" also covers ARMv6-M, so does not include exclusive-access instructions, for example. These two labels correspond to the "p" and "w" variants of the ARM standard library.
Microlib 64-bit multiply and 32/64-bit divide routines are extremely slow, and in the case of multiply, bigger too! The 64-bit divide routine is slow enough to cause interrupt latency problems. Replace with local implementations extracted from the ARM Compiler 6 standard library.
`FRAME PUSH {regs},offset` doesn't work the way I expected, or that would be more useful - it puts the registers at the new stack position, not below the old one. Use `FRAME SAVE` and `FRAME ADDRESS` manually instead.
366851c
to
f2d01b0
Compare
Moved files to correct location - PR was cherry-picked from an old Mbed OS fork with different layout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why we do not fix the library itself ?
- The license - we need to check internally how to proceed if we decide this PR should get in (1st question)
Thanks @kjbracey-arm, I'll take it up with the AC6 team - I really see the problem but I'm worried about maintaining what is essentially a fork of Microlib within Mbed OS. As you noted I'm not sure we can introduce those changes under Apache 2. |
A ticket was raised to the Compiler team. See internal reference SDCOMP-55606 and SDCOMP-55674 |
Hi @kjbracey-arm and all, the compiler team kindly agreed to fix the 64-bit multiply issue, once the fix is in and the relevant AC6 version is out, we can update the recommended compiler version. For division, it looks like an acceptable trade-off (8 times smaller and slower) so I suggest we re-run benchmarks once the multiply fix is in to see if it actually is an issue. @kjbracey-arm are you happy with this? If so I suggest we close this PR. |
Good to hear they're looking at the multiply, at least. The divides are still pretty rubbish. Those were the actual real issue we've seen in practice, rather than the multiply - being too slow for interrupt use in low-clock rate devices. Sure, they're not blatantly breaking the microlib concept by being bigger, but they're still a problem. (Just urged someone to avoid Having extracted and examined this divide, which has a huge amount of unrolling, it would be nice to see a somewhere-inbetween version for microlib. The microlib one is a simple C loop, so it's double-hit by lack of unrolling and being clunky compiled C. How about a tight small hand-crafted assembler loop, maybe with a teeny bit of unrolling? |
Ah, just read the history more carefully and checked the compiler issues. Looks like a division improvement is incoming. Yay. I guess we can close this now. |
Summary of changes
Microlib 64-bit multiply and 32/64-bit divide routines are extremely slow, and in the case of multiply, bigger too! The 64-bit divide routine is slow enough to cause interrupt latency problems.
Replace with local implementations extracted from the ARM Compiler 6 standard library.
Fixes #12221.
Impact of changes
Need to consider licensing - these routines were extracted from the ARM Compiler 6 c_p.l and c_w.l libraries, which have their own End User Licence. Can we republish them here under the Apache license? Assembly was hand-generated by converting the fromelf output.
They are exact copies of the standard library code, checked by diffing the fromelf output from both.
Migration actions required
Documentation
None.
Pull request type
Test results
Reviewers