Add extra delta latency on common tickers lp ticker speed greentea test #12880

rajkan01 · 2020-04-28T20:01:50Z

Summary of changes

lp ticker speed test cases from common_tickers test suite were failed while running greentea test on bare metal profile with a small c library on NUCLEO_F303RE target with ARMC6 toolchain but works with bare metal profile with std c library.

Below table shows the execution time of lp ticker failed APIs in micro second on different profiles:

Ticker API	Full profile	Bare metal profile with std C lib	Bare metal profile with small C lib
clear_interrupt	10	10	51
disable_interrupt	10	10	51

As it is a problem with small C library (microlib) slow division limitation and this target slow clock behaviour on deep sleep as like similar issue faced with NUCLEO_L073RZ Cortex-M0 target on deep sleep green tea test

As a workaround, I have added the additional extra delta latency of 25us on expected execution time calculation to allow clear_interrupt and disable_interrupt test cases to pass.

Impact of changes

With these changes, lp ticker speed greentea test cases are passed.

Migration actions required

None.

Documentation

None.

Pull request type

[x] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[] Feature update (New feature / Functionality change / New API)
[] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[] No Tests required for this change (E.g docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[] Tests / results supplied as part of this PR

Reviewers

cc @MarceloSalazar
@evedon

ciarmcom · 2020-04-28T21:00:29Z

@rajkan01, thank you for your changes.
@evedon @MarceloSalazar @ARMmbed/mbed-os-hal @ARMmbed/mbed-os-test @ARMmbed/mbed-os-maintainers please review.

MarceloSalazar · 2020-04-29T06:37:01Z

@ARMmbed/team-st-mcd please review as it's related to our conversations in #12221

jeromecoutant

Note that STM32F303 is a cortex M4,
so maybe #12221 scope has to be updated ?

kjbracey · 2020-04-29T08:24:20Z

The problem is the test exists for a reason. The moment total timer interrupt execution time exceeds 80us, you're losing serial data (at 115200, if no FIFO, which is common, and true here, I think). A platform failing this test is likely to have comms issues.

0xc0170

Add details from the PR description to the commit to explain reasoning for this extra time for microlib.

0xc0170 · 2020-04-29T09:07:34Z

TESTS/mbed_hal/common_tickers/main.cpp

@@ -54,6 +54,11 @@ extern "C" {

 #define MAX_FUNC_EXEC_TIME_US 20
 #define DELTA_FUNC_EXEC_TIME_US 5
+#if defined(__MICROLIB)


we dont have small std rather but specific one (microlib) only?

Microlib slow division causes this issue of slow execution and perfectly works with newlib-nano

Would it be an option to disable microlib on this platform, pending resolution of the speed issue?

@evedon @MarceloSalazar please comment on your view to disable microlib on this platform

I'm not keen on adding specific target dependencies as the code becomes a bit of a mess and it's hard to maintain/scale to support other targets.
But we can detect the situation that prevents code from being executed and skip accordingly (for now). Thoughts?

I meant not support use of microlib on this target - I believe it can be flagged as unsupported in targets.json?

Probably a question for @jeromecoutant on how he thinks this could/should be handled.

I don't think I will "unsupport" some specific targets, before we understand the real reason!
Please correct first #12221

Another approach would be to optimise HAL_GetTick. Once the OS is running, it is going via the generic ticker_read_us, which is a fully-generalised routine to get microseconds.

And I believe it's only ever used to get small timing delays. A suggestion would be to use us_ticker_read directly, as you do when mbed_sdk_initted is false. I don't see what the if condition is gaining you there - why can't you always use the "uninited" code?

I'm busy space/speed optimising ticker_read_us, but still, you can just bypass it.

And then the next trick is to optimise the division for locality. When speed matters, you know little time has elapsed. So divide the difference by 1000, not the total. The standard divide routines should terminate fast for small inputs, but if they don't you could code them yourself. And it's a 32-bit division, not a 64-by-32 one.

(Were you using ticker_read_us to get a properly wrapping 32-bit tick by dividing down the 64-bit microseconds? I guess the non-initted code doesn't give you that because it's dividing down 32-bits, but the below code deals with it).

Something like

uint32_t prev_time, remainder; uint32_t new_time = us_ticker_read(); uint32_t elapsed_time = ((new_time - prev_time) & US_TICKER_MASK) + remainder; uint32_t elapsed_ticks = elapsed_time / 1000; // XXX compiler should terminate above line fast. Probably worth helping it with // if (elapsed_time < 1000) { elapsed_ticks = 0; } // else if (elapsed_time < 2000) { elapsed_ticks = 1; } // else { elapsed_ticks = elapsed_ticks / 1000; } // or even do the division totally manually? // remainder = elapsed_time; // elapsed_ticks = 0; // while (elapsed_time >= 1000) { // elapsed_ticks++; remainder -= 1000; // } // Or hybrid - do the loop if elapsed_time < 65536? (Which would be always if 16-bit) remainder = elapsed_time % 1000; // XXX compiler should optimise the above line when immediately following `/ 1000` // if it doesn't, or not immediately following `/ 1000` use this instead: // remainder = elapsed_time - elapsed_ticks * 1000; // could special-case remainder inside the ifs suggested above return prev_time += elapsed_ticks;

As Kevin suggestion, HAL_GetTick API source has optimized to achieve better performance on all profile and test results on one-time execution as below

Ticker API HAL_GetTick without optimization on bare metal profile with small C lib HAL_GetTick with optimization on bare metal profile with small C lib

clear_interrupt 51us 5us

disable_interrupt 51us 5us

Pull request has been modified.

evedon · 2020-04-29T12:56:02Z

The problem is the test exists for a reason. The moment total timer interrupt execution time exceeds 80us, you're losing serial data (at 115200, if no FIFO, which is common, and true here, I think). A platform failing this test is likely to have comms issues.

In that case I think we shoud disable microlib for this platform in targets.json by removing "small" for the ARM toolchain and not proceed with the extra latency delay.

"supported_c_libs": {
            "arm": ["std"],
            "gcc_arm": ["std", "small" ],
            "iar": ["std" ]
        }

And we should increase the scope #12221

kjbracey · 2020-05-04T12:45:07Z

targets/TARGET_STM/hal_tick_overrides.c

+    uint32_t elapsed_ticks = 0;
+    uint32_t elapsed_time = (((new_time - prev_time) & US_TICKER_MASK) + prev_tick_remainder);
+    prev_time = new_time;
+    if (elapsed_time < 1000) { // elapsed time less than 1ms


These two special cases seem to me to be pointless code bloat. I don't see how they're going to be noticeably faster than just going into the while loop. We generally want to optimise for space, except where we absolutely have to.

They'd make sense if you didn't have the while loop and the alternative was the standard C library division.

kjbracey · 2020-05-04T12:50:46Z

targets/TARGET_STM/us_ticker.c

@@ -130,8 +131,9 @@ void init_16bit_timer(void)
    __HAL_TIM_DISABLE_IT(&TimMasterHandle, TIM_IT_CC1);

    // Used by HAL_GetTick()
+    total_ticks = 0;


I wonder why any of these need to be initialised here.

And the fact they weren't in the 32-bit case would seem to confirm that.

kjbracey · 2020-05-04T12:51:41Z

targets/TARGET_STM/hal_tick_overrides.c

 // Variables also reset in us_ticker_init()
+uint32_t total_ticks = 0;


I think you can make these static - see other comment.

Just to potentially save 2 bytes of RAM, I think I'd have the type of prev_time conditional on TIM_MST_BIT_WIDTH <= 16

kjbracey · 2020-05-04T12:55:39Z

targets/TARGET_STM/hal_tick_overrides.c

+    } else if (elapsed_time < 2000) { // elapsed time less than 2ms
+        elapsed_ticks = 1;
+        prev_tick_remainder = elapsed_time - 1000;
+    } else if (elapsed_time < 65536) { // For 16-bit ticker max value of 2^16 = 65536


The maximum elapsed_time value here is actually 65535+999, but I don't think that's a relevant factor for the threshold.

We just want to not spend a huge time going round the while loop, when the C division would be faster.

That crossover point would be pretty low for most C libraries, but maybe not microlib.

I had a vague idea the compiler could optimise out the test for a 16-bit timer if it was set high though - a space saving rather than a speed saving.

Maybe that would work, if we help it:

if (TIM_MST_BIT_WIDTH <= 16 || elapsed_time < 65536)

That will basically force the compiler to leave out the /% case when 16-bit. Going 65 times round the loop is probably fine. (Will almost never be that high anyway).

I'm not sure which (if any) compilers could figure out the maximum possible value of elapsed_time and eliminate the test when a 16-bit timer couldn't possibly reach the threshold.

evedon · 2020-05-04T12:57:18Z

@rajkan01 It would be cleaner to raise a new PR for the improvment to HAL_GetTick (and close this PR)

- Add the extra us delta latency on lp ticker speed test in particularly on clear_interrupt and disable_interrupt APIs test case expected execution time calculation due to issue with microlib slow division and target slow clock issue.

rajkan01 · 2020-05-04T17:40:06Z

Improvement of HAL_GetTick API is done in the #12915 and closing this PR

rajkan01 changed the title ~~Add the extra us delta latency on lp ticker speed greentea test~~ Add the extra delta latency on common tickers lp ticker speed greentea test Apr 28, 2020

rajkan01 changed the title ~~Add the extra delta latency on common tickers lp ticker speed greentea test~~ Add extra delta latency on common tickers lp ticker speed greentea test Apr 28, 2020

ciarmcom requested review from evedon, MarceloSalazar and a team April 28, 2020 21:00

ciarmcom added the needs: review label Apr 28, 2020

jeromecoutant approved these changes Apr 29, 2020

View reviewed changes

0xc0170 previously requested changes Apr 29, 2020

View reviewed changes

mergify bot added needs: work and removed needs: review labels Apr 29, 2020

rajkan01 force-pushed the lpticker_speed_test branch from fccf649 to 140e6b2 Compare April 29, 2020 10:56

0xc0170 self-requested a review April 29, 2020 11:06

0xc0170 added needs: review and removed needs: work labels Apr 29, 2020

0xc0170 removed the request for review from a team April 29, 2020 11:06

evedon mentioned this pull request Apr 29, 2020

Division operation much slower with microlib than other libs on cortex-M0 #12221

Closed

jeromecoutant mentioned this pull request Apr 30, 2020

Optimise mbed_ticker_api.c #12897

Merged

rajkan01 force-pushed the lpticker_speed_test branch from 140e6b2 to 44dabb3 Compare May 4, 2020 12:35

mergify bot added needs: work and removed needs: review labels May 4, 2020

rajkan01 force-pushed the lpticker_speed_test branch from 44dabb3 to 24c1111 Compare May 4, 2020 12:47

kjbracey reviewed May 4, 2020

View reviewed changes

rajkan01 force-pushed the lpticker_speed_test branch from 24c1111 to b5e14b4 Compare May 4, 2020 14:38

rajkan01 mentioned this pull request May 4, 2020

Optimise HAL_GetTick API #12915

Merged

rajkan01 closed this May 4, 2020

mergify bot removed the needs: work label May 4, 2020

		// Variables also reset in us_ticker_init()
		uint32_t total_ticks = 0;

Add extra delta latency on common tickers lp ticker speed greentea test #12880

Add extra delta latency on common tickers lp ticker speed greentea test #12880

Uh oh!

Conversation

rajkan01 commented Apr 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary of changes

Impact of changes

Migration actions required

Documentation

Pull request type

Test results

Reviewers

Uh oh!

ciarmcom commented Apr 28, 2020

Uh oh!

MarceloSalazar commented Apr 29, 2020

Uh oh!

jeromecoutant left a comment

Choose a reason for hiding this comment

Uh oh!

kjbracey commented Apr 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

0xc0170 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rajkan01 Apr 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MarceloSalazar Apr 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kjbracey Apr 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

evedon commented Apr 29, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kjbracey May 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

evedon commented May 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rajkan01 commented May 4, 2020

Uh oh!

Uh oh!

rajkan01 commented Apr 28, 2020 •

edited

Loading

kjbracey commented Apr 29, 2020 •

edited

Loading

rajkan01 Apr 29, 2020 •

edited

Loading

MarceloSalazar Apr 29, 2020 •

edited

Loading

kjbracey Apr 30, 2020 •

edited

Loading

kjbracey May 4, 2020 •

edited

Loading

evedon commented May 4, 2020 •

edited

Loading