Skip to content

Optimise HAL_GetTick API #12915

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 11, 2020
Merged

Conversation

rajkan01
Copy link
Contributor

@rajkan01 rajkan01 commented May 4, 2020

Summary of changes

HAL_GetTick API has taken about 51us of one-time execution on bare metal profile with small c library (Microlib) on NUCLEO_F303RE target with ARMC6 toolchain and causes failures in lp ticker speed test case from common tickers, so optimized that API to improve overall performance to achieve optimal execution time and this also improves in other profiles. Refer PR#12880 for the more details on this issue.

Impact of changes

with these changes, HAL_GetTick API performance has improved in all profiles, and the below table shows improvement results

Ticker API Full profile(RTOS)
(HAL_GetTick without optimization)
Bare metal with std lib
(HAL_GetTick without optimization)
Bare metal profile with small C lib (HAL_GetTick without optimization) All profiles
(HAL_GetTick with optimization)
clear_interrupt 10us 10us 51us 5us
disable_interrupt 10us 10us 51us 5us

Migration actions required

None.

Documentation

None.


Pull request type

[x] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[] Feature update (New feature / Functionality change / New API)
[] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[] No Tests required for this change (E.g docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[] Tests / results supplied as part of this PR

Reviewers

@evedon @kjbracey-arm


@rajkan01 rajkan01 force-pushed the hal_gettick_api_optim branch from a22fc21 to 2bde715 Compare May 4, 2020 16:29
@ciarmcom ciarmcom requested review from evedon, kjbracey and a team May 4, 2020 17:00
@ciarmcom
Copy link
Member

ciarmcom commented May 4, 2020

@rajkan01, thank you for your changes.
@evedon @kjbracey-arm @ARMmbed/mbed-os-maintainers please review.

elapsed_time += (new_time - prev_time) & 0xFFFF; // Only use the lower 16 bits
prev_time = new_time;
return (elapsed_time / 1000);
elapsed_ticks = elapsed_ticks / 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be elapsed_time / 1000.

You would have got a warning about using an unset variable if you hadn't set elapsed_ticks to 0 at the start - good example of why bogus initialisation can cause errors. Personally I would do

prev_time = new_time
uint32_t elapsed_ticks;
if (TIM_MST_BIT_WIDTH <= 16 || elapsed_time < 65536) {
    elapsed_ticks = 0;
    while (elapsed_time >= 1000) {

to maximally localise declaration and minimise unnecessary sets, but others' taste no doubt differs. The current form is fine once corrected, and probably compiles to the same code.

// Variables also reset in us_ticker_init()
uint32_t total_ticks = 0;
Copy link
Contributor

@kjbracey kjbracey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If retaining these variables as global, please give them some sort of prefix to minimise chance of namespace collisions.

But you can make them static. As I said, I don't think they need manual init, but even then, this current code makes no sense. Why is init_16bit_timer setting them to zero, rather than HAL_InitTick itself? Do it there, and they can be static.

Ah comments suggest that us_ticker_init may have previously been initialising them (why?) But that's no longer true, it seems. Some start-up previous ordering problems have apparently been resolved. Clean up any such comments.

@kjbracey
Copy link
Contributor

kjbracey commented May 5, 2020

It's worth noting in the description that this is also a small optimisation for other C libraries - you had it going from 10us to 5us for ARM_STD, was it?

@rajkan01 rajkan01 force-pushed the hal_gettick_api_optim branch from 2bde715 to 54d00df Compare May 5, 2020 13:41
@mergify mergify bot dismissed kjbracey’s stale review May 5, 2020 13:41

Pull request has been modified.

@rajkan01
Copy link
Contributor Author

rajkan01 commented May 5, 2020

It's worth noting in the description that this is also a small optimisation for other C libraries - you had it going from 10us to 5us for ARM_STD, was it?

@kjbracey-arm Updated the PR description to show that information.

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. Only suggestion is just maybe a comment on the if about it being a speed optimisation for small time intervals, avoiding a potentially-slow C library divide,

(Otherwise the casual reader might be trying to figure out what the functional difference is between the if and the else)

@mergify mergify bot added needs: CI and removed needs: work labels May 5, 2020
@mergify mergify bot added needs: work and removed needs: CI labels May 5, 2020
@rajkan01 rajkan01 force-pushed the hal_gettick_api_optim branch from 54d00df to 45bebd1 Compare May 5, 2020 14:20
@mergify mergify bot dismissed kjbracey’s stale review May 5, 2020 14:21

Pull request has been modified.

…ptimized HAL_GetTick API to improve performance.
@rajkan01 rajkan01 force-pushed the hal_gettick_api_optim branch from 45bebd1 to 4ab794b Compare May 5, 2020 15:25
@mergify mergify bot added needs: CI and removed needs: work labels May 6, 2020
@0xc0170
Copy link
Contributor

0xc0170 commented May 6, 2020

CI started

@MarceloSalazar MarceloSalazar requested a review from a team May 6, 2020 09:23
@mbed-ci
Copy link

mbed-ci commented May 6, 2020

Test run: SUCCESS

Summary: 6 of 6 test jobs passed
Build number : 1
Build artifacts

Copy link
Contributor

@evedon evedon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great improvement by calling us_ticker_read directly!

@kjbracey
Copy link
Contributor

kjbracey commented May 6, 2020

I think in this case the direct call to us_ticker_read isn't the most important factor - using ticker_read_us it is actually quite fast on ST because its us_ticker is 1MHz, and the ticker_read_us already special-cases that to avoid doing a divisions.

It probably amounts to a few microseconds of the drop (as seen in the 10us->5us change). Still worthwhile. And we bothered to do it in #10609, as it really matters on platforms where it would have needed a division, and we end up avoiding any runtime division (eg wait for an 8MHz timer to increment by 8000, rather than waiting for "timer divided by 8" to increment by 1000).

Here the bulk of the gain is avoiding division, particularly the 64-bit division.

But do check out #12897, which tries to make the general code smaller, and will make it faster in some cases as a secondary benefit. (Smaller when compiled at least - all the ifdefs make the source bigger :( )

Copy link
Collaborator

@jeromecoutant jeromecoutant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ST CI OK
(except timing drift tests, see
#12920 (comment))

@0xc0170
Copy link
Contributor

0xc0170 commented May 8, 2020

Will be merged as soon as we have 6.0.0 beta1 released

@0xc0170 0xc0170 merged commit a707fd1 into ARMmbed:master May 11, 2020
@mergify mergify bot removed the ready for merge label May 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants