Skip to content

Force inline Timer::attach() to get rid of floating-point instructions #11236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

hugueskamba
Copy link
Collaborator

@hugueskamba hugueskamba commented Aug 15, 2019

Description

Calls to Timer::attach() are inlined in order not to use floating-point
library functions calls given Timer::attach_us() expects an integer
for the callback interval.

Pull request type

[ ] Fix
[X] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@evedon @gpsimenos @jamesbeyond

Release Notes

@ciarmcom
Copy link
Member

@hugueskamba, thank you for your changes.
@evedon @jamesbeyond @gpsimenos @ARMmbed/mbed-os-core @ARMmbed/mbed-os-hal @ARMmbed/mbed-os-test @ARMmbed/mbed-os-maintainers please review.

@kjbracey
Copy link
Contributor

kjbracey commented Aug 15, 2019

Personally I'm not that strongly opposed to the float API existing, but there should be an integer one. We do have attach_us, but that's clunky if you just want 1s.

I'm a bit wary about making changes to any time-related APIs at this point, because we have just shifted to C++14.

That opens the door to doing all time-related APIs better using <chrono> - the type-safe time system with automatic resolution conversion. You could redefine (or overload) the APIs to take std::duration, so users can do

 ticker->attach(1s);
 ticker->attach(40ms);

That could work now, with ARMC6, GCC and IAR, but we don't have chrono for ARMC5 yet, and I don't want to totally break it.

But if we are making a change, I'd want the change to be to that, rather than some intermediate.

But given that attach(float) is actually inline anyway, and almost always used with a constant, is it actually ever really generating actual floating point instructions in practice?

attach(0.5) should be getting inlined to attach_us(500000) anyway.

@hugueskamba
Copy link
Collaborator Author

hugueskamba commented Aug 15, 2019

But given that attach(float) is actually inline anyway, and almost always used with a constant, is it actually ever really generating actual floating point instructions in practice?

attach(0.5) should be getting inlined to attach_us(500000) anyway.

Build the example project mbed-os-example-tls/benchmark with no floating point in minimal printf and have a look at the ELF file. There are floating point function calls prefixed with __aeabi_. The example project calls attach in main.cpp. With this change the floating point function calls disappear.

I'll let @evedon comment on the rest of your comment.
Thanks

@kjbracey
Copy link
Contributor

kjbracey commented Aug 15, 2019

Which toolchain(s) and profiles were you looking at for the ELF?

@hugueskamba
Copy link
Collaborator Author

Which toolchain(s) were you looking at for the ELF?

GCC_ARM

@evedon
Copy link
Contributor

evedon commented Aug 15, 2019

Personally I'm not that strongly opposed to the float API existing, but there should be an integer one. We do have attach_us, but that's clunky if you just want 1s.

We are trying to remove usage of floats In mbed-os where it is not necessary. I think that this attach float API could be removed without losing functionality so it is a good candidate.

@kjbracey
Copy link
Contributor

We are trying to remove usage of floats In mbed-os where it is not necessary.

Removing unnecessary usage is good, but I'm a bit wary about outright yanking APIs.

In this case, the problem does seem to be arising due to reluctance of the compiler to inline attach. I see both GCC and ARMC6 being a bit reluctant. That's possibly due to the complexity of the Callback bit.

But if you add MBED_FORCEINLINE to attach, then that seems to make it go away - an attach(X, 0.5) does get inlined to attach_us(X, 500000), dropping the call to the FP library.

I'd be happy with something like that as an interim measure (pending a <chrono> API extension) , given that the vast majority of such calls take constants, thus that will be sufficient to eliminate the vast majority of FP.

For other non-time-related unnecessarily-float driver APIs, like ADC, I'd be a bit less reticent to make changes - it's just that time is an overall system, and I'm not keen on seeing piecemeal changes that are not in the direction I ultimately expect it to go in (<chrono>).

@kjbracey
Copy link
Contributor

kjbracey commented Aug 16, 2019

Still, I can live with attach_s. Neither that nor the deprecation of attach(float) gets in the way of a future attach(std::duration) overload.

(But I would like to deprecate all non-chrono C++ time APIs, includingattach_s and attach_us, at that time).

But do consider whether just adding the MBED_FORCEINLINE achieves a good enough effect without any user disruption.

@hugueskamba
Copy link
Collaborator Author

Still, I can live with attach_s. Neither that nor the deprecation of attach(float) gets in the way of a future attach(std::duration) overload.

(But I would like to deprecate all non-chrono C++ time APIs, includingattach_s and attach_us, at that time).

But do consider whether just adding the MBED_FORCEINLINE achieves a good enough effect without any user disruption.

If indeed MBED_FORCEINLINE removes the FP calls I am in favour of this as it less code changes.

@hugueskamba hugueskamba changed the title Require integer to specify callback interval with Ticker API Force inline Timer::attach() to get rid of floating-point instruction Aug 16, 2019
@hugueskamba hugueskamba changed the title Force inline Timer::attach() to get rid of floating-point instruction Force inline Timer::attach() to get rid of floating-point instructions Aug 16, 2019
Copy link
Contributor

@evedon evedon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice solution indeed !

@kjbracey
Copy link
Contributor

kjbracey commented Aug 16, 2019

It is possible this has a negative effect of inlining an extra call to the Callback copy operator, making it a bit bigger than just calling attach_us directly. (Which may have been the reason for the compiler choosing not to inline).

A possible alternative would be to turn it into perfect-forwarding template:

template <typename F>
void attach(F&& func, float t)
{
    attach_us(std::forward<F>(func), t * 1000000.0f);
}

(That's the C++11 for "forward any F, preserving type/lifetime info by using an appropriate reference").

That version then might not actually need a MBED_FORCEINLINE, because the generated reference-using code should be simpler than the copy of a callback. It's definitely a better way of doing it than copying a Callback (which is not a trivial operation). It should be the case that the template version produces smaller code.

Only downside is binary compatibility - I don't think you could easily preserve the original attach for any pre-existing binaries that didn't inline it - the overloading with the template would go wrong. If you cared about that you'd have to have some macro magic to make the old method only visible for Ticker.cpp or something, so it could define it but no external code would see it.

@evedon
Copy link
Contributor

evedon commented Aug 16, 2019

In light of the last comment, I would rather go back to the previous solution i.e. introduce a new attach_s API and deprecate attach(float).

@kjbracey
Copy link
Contributor

Actually, just realised there is no binary compatibility issue - this is C++ inlining, so anyone using the old version non-inlined would have generated their own common out-of-line definition in their object file, they wouldn't be relying on us giving them one.

I was thinking in C99.

@hugueskamba
Copy link
Collaborator Author

It is possible this has a negative effect of inlining an extra call to the Callback copy operator, making it a bit bigger than just calling attach_us directly. (Which may have been the reason for the compiler choosing not to inline).

A possible alternative would be to turn it into perfect-forwarding template:

template <typename F>
void attach(F&& func, float t)
{
    attach_us(std::forward<F>(func), t * 1000000.0f);
}

(That's the C++11 for "forward any F, preserving type/lifetime info by using an appropriate reference").

That version then might not actually need a MBED_FORCEINLINE, because the generated reference-using code should be simpler than the copy of a callback. It's definitely a better way of doing it than copying a Callback (which is not a trivial operation). It should be the case that the template version produces smaller code.

Only downside is binary compatibility - I don't think you could easily preserve the original attach for any pre-existing binaries that didn't inline it - the overloading with the template would go wrong. If you cared about that you'd have to have some macro magic to make the old method only visible for Ticker.cpp or something, so it could define it but no external code would see it.

@evedon @kjbracey-arm
Given the above, I used Perfect forwarding to avoid the possible heavy callback copy. The forced inlining is still needed as the compiler chose not to inline.
7df6160

@kjbracey
Copy link
Contributor

Did you observe a size difference between those last two versions, at least?

@hugueskamba hugueskamba force-pushed the hk-iotcore-1315-remove-floating-point-ticker branch from 1f5b4db to 3373d78 Compare August 19, 2019 15:14
@hugueskamba
Copy link
Collaborator Author

Force pushed after rebasing from master because of intermittent CI failure.

@mbed-ci
Copy link

mbed-ci commented Aug 19, 2019

Test run: FAILED

Summary: 2 of 4 test jobs failed
Build number : 2
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_build-IAR
  • jenkins-ci/mbed-os-ci_build-GCC_ARM

@hugueskamba
Copy link
Collaborator Author

CI re-started

@mbed-ci
Copy link

mbed-ci commented Aug 20, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 3
Build artifacts

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 20, 2019

Add release notes, will be merged afterwards

@kjbracey
Copy link
Contributor

I would say this isn't actually a functionality change - it's a refactor.

@hugueskamba
Copy link
Collaborator Author

I would say this isn't actually a functionality change - it's a refactor.

@0xc0170 Indeed it is a refactor (description updated).

@0xc0170 0xc0170 merged commit 398515a into ARMmbed:master Aug 20, 2019
@hugueskamba hugueskamba deleted the hk-iotcore-1315-remove-floating-point-ticker branch September 10, 2019 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants