-
Notifications
You must be signed in to change notification settings - Fork 3k
Add extra delta latency on common tickers lp ticker speed greentea test #12880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@rajkan01, thank you for your changes. |
@ARMmbed/team-st-mcd please review as it's related to our conversations in #12221 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that STM32F303 is a cortex M4,
so maybe #12221 scope has to be updated ?
The problem is the test exists for a reason. The moment total timer interrupt execution time exceeds 80us, you're losing serial data (at 115200, if no FIFO, which is common, and true here, I think). A platform failing this test is likely to have comms issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add details from the PR description to the commit to explain reasoning for this extra time for microlib.
@@ -54,6 +54,11 @@ extern "C" { | |||
|
|||
#define MAX_FUNC_EXEC_TIME_US 20 | |||
#define DELTA_FUNC_EXEC_TIME_US 5 | |||
#if defined(__MICROLIB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont have small std rather but specific one (microlib) only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Microlib slow division causes this issue of slow execution and perfectly works with newlib-nano
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be an option to disable microlib on this platform, pending resolution of the speed issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@evedon @MarceloSalazar please comment on your view to disable microlib on this platform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not keen on adding specific target dependencies as the code becomes a bit of a mess and it's hard to maintain/scale to support other targets.
But we can detect the situation that prevents code from being executed and skip accordingly (for now). Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant not support use of microlib on this target - I believe it can be flagged as unsupported in targets.json?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a question for @jeromecoutant on how he thinks this could/should be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I will "unsupport" some specific targets, before we understand the real reason!
Please correct first #12221
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another approach would be to optimise HAL_GetTick
. Once the OS is running, it is going via the generic ticker_read_us
, which is a fully-generalised routine to get microseconds.
And I believe it's only ever used to get small timing delays. A suggestion would be to use us_ticker_read
directly, as you do when mbed_sdk_initted
is false. I don't see what the if
condition is gaining you there - why can't you always use the "uninited" code?
I'm busy space/speed optimising ticker_read_us
, but still, you can just bypass it.
And then the next trick is to optimise the division for locality. When speed matters, you know little time has elapsed. So divide the difference by 1000, not the total. The standard divide routines should terminate fast for small inputs, but if they don't you could code them yourself. And it's a 32-bit division, not a 64-by-32 one.
(Were you using ticker_read_us
to get a properly wrapping 32-bit tick by dividing down the 64-bit microseconds? I guess the non-initted code doesn't give you that because it's dividing down 32-bits, but the below code deals with it).
Something like
uint32_t prev_time, remainder;
uint32_t new_time = us_ticker_read();
uint32_t elapsed_time = ((new_time - prev_time) & US_TICKER_MASK) + remainder;
uint32_t elapsed_ticks = elapsed_time / 1000;
// XXX compiler should terminate above line fast. Probably worth helping it with
// if (elapsed_time < 1000) { elapsed_ticks = 0; }
// else if (elapsed_time < 2000) { elapsed_ticks = 1; }
// else { elapsed_ticks = elapsed_ticks / 1000; }
// or even do the division totally manually?
// remainder = elapsed_time;
// elapsed_ticks = 0;
// while (elapsed_time >= 1000) {
// elapsed_ticks++; remainder -= 1000;
// }
// Or hybrid - do the loop if elapsed_time < 65536? (Which would be always if 16-bit)
remainder = elapsed_time % 1000;
// XXX compiler should optimise the above line when immediately following `/ 1000`
// if it doesn't, or not immediately following `/ 1000` use this instead:
// remainder = elapsed_time - elapsed_ticks * 1000;
// could special-case remainder inside the ifs suggested above
return prev_time += elapsed_ticks;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Kevin suggestion, HAL_GetTick API source has optimized to achieve better performance on all profile and test results on one-time execution as below
Ticker API | HAL_GetTick without optimization on bare metal profile with small C lib | HAL_GetTick with optimization on bare metal profile with small C lib |
---|---|---|
clear_interrupt | 51us | 5us |
disable_interrupt | 51us | 5us |
fccf649
to
140e6b2
Compare
In that case I think we shoud disable microlib for this platform in
And we should increase the scope #12221 |
140e6b2
to
44dabb3
Compare
44dabb3
to
24c1111
Compare
uint32_t elapsed_ticks = 0; | ||
uint32_t elapsed_time = (((new_time - prev_time) & US_TICKER_MASK) + prev_tick_remainder); | ||
prev_time = new_time; | ||
if (elapsed_time < 1000) { // elapsed time less than 1ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two special cases seem to me to be pointless code bloat. I don't see how they're going to be noticeably faster than just going into the while loop. We generally want to optimise for space, except where we absolutely have to.
They'd make sense if you didn't have the while loop and the alternative was the standard C library division.
targets/TARGET_STM/us_ticker.c
Outdated
@@ -130,8 +131,9 @@ void init_16bit_timer(void) | |||
__HAL_TIM_DISABLE_IT(&TimMasterHandle, TIM_IT_CC1); | |||
|
|||
// Used by HAL_GetTick() | |||
total_ticks = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why any of these need to be initialised here.
And the fact they weren't in the 32-bit case would seem to confirm that.
// Variables also reset in us_ticker_init() | ||
uint32_t total_ticks = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can make these static
- see other comment.
Just to potentially save 2 bytes of RAM, I think I'd have the type of prev_time
conditional on TIM_MST_BIT_WIDTH <= 16
} else if (elapsed_time < 2000) { // elapsed time less than 2ms | ||
elapsed_ticks = 1; | ||
prev_tick_remainder = elapsed_time - 1000; | ||
} else if (elapsed_time < 65536) { // For 16-bit ticker max value of 2^16 = 65536 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The maximum elapsed_time
value here is actually 65535+999, but I don't think that's a relevant factor for the threshold.
We just want to not spend a huge time going round the while loop, when the C division would be faster.
That crossover point would be pretty low for most C libraries, but maybe not microlib.
I had a vague idea the compiler could optimise out the test for a 16-bit timer if it was set high though - a space saving rather than a speed saving.
Maybe that would work, if we help it:
if (TIM_MST_BIT_WIDTH <= 16 || elapsed_time < 65536)
That will basically force the compiler to leave out the /%
case when 16-bit. Going 65 times round the loop is probably fine. (Will almost never be that high anyway).
I'm not sure which (if any) compilers could figure out the maximum possible value of elapsed_time
and eliminate the test when a 16-bit timer couldn't possibly reach the threshold.
@rajkan01 It would be cleaner to raise a new PR for the improvment to HAL_GetTick (and close this PR) |
- Add the extra us delta latency on lp ticker speed test in particularly on clear_interrupt and disable_interrupt APIs test case expected execution time calculation due to issue with microlib slow division and target slow clock issue.
24c1111
to
b5e14b4
Compare
Improvement of HAL_GetTick API is done in the #12915 and closing this PR |
Summary of changes
lp ticker speed test cases from common_tickers test suite were failed while running greentea test on bare metal profile with a
small c library
on NUCLEO_F303RE target with ARMC6 toolchain but works with bare metal profile withstd c library
.Below table shows the execution time of lp ticker failed APIs in micro second on different profiles:
As it is a problem with small C library (microlib) slow division limitation and this target slow clock behaviour on deep sleep as like similar issue faced with NUCLEO_L073RZ Cortex-M0 target on deep sleep green tea test
As a workaround, I have added the additional extra delta latency of 25us on expected execution time calculation to allow clear_interrupt and disable_interrupt test cases to pass.
Impact of changes
With these changes, lp ticker speed greentea test cases are passed.
Migration actions required
None.
Documentation
None.
Pull request type
Test results
Reviewers
cc @MarceloSalazar
@evedon