Skip to content

Optimize tickless tick computation #9786

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 27, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 23 additions & 20 deletions rtos/TARGET_CORTEX/SysTimer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@

#include "hal/lp_ticker_api.h"
#include "mbed_critical.h"
#include "mbed_assert.h"
#if defined(TARGET_CORTEX_A)
#include "rtx_core_ca.h"
#else//Cortex-M
Expand All @@ -37,6 +38,9 @@ extern "C" {
#endif
}

#define US_IN_TICK (1000000 / OS_TICK_FREQ)
MBED_STATIC_ASSERT(1000000 % OS_TICK_FREQ == 0, "OS_TICK_FREQ must be a divisor of 1000000 for correct tick calculations");

#if (defined(NO_SYSTICK))
/**
* Return an IRQ number that can be used in the absence of SysTick
Expand All @@ -54,17 +58,17 @@ namespace rtos {
namespace internal {

SysTimer::SysTimer() :
TimerEvent(get_lp_ticker_data()), _start_time(0), _tick(0)
TimerEvent(get_lp_ticker_data()), _time_us(0), _tick(0)
{
_start_time = ticker_read_us(_ticker_data);
_time_us = ticker_read_us(_ticker_data);
_suspend_time_passed = true;
_suspended = false;
}

SysTimer::SysTimer(const ticker_data_t *data) :
TimerEvent(data), _start_time(0), _tick(0)
TimerEvent(data), _time_us(0), _tick(0)
{
_start_time = ticker_read_us(_ticker_data);
_time_us = ticker_read_us(_ticker_data);
}

void SysTimer::setup_irq()
Expand All @@ -83,14 +87,13 @@ void SysTimer::setup_irq()

void SysTimer::suspend(uint32_t ticks)
{
core_util_critical_section_enter();

// Remove ensures serialized access to SysTimer by stopping timer interrupt
remove();
schedule_tick(ticks);

_suspend_time_passed = false;
_suspended = true;

core_util_critical_section_exit();
schedule_tick(ticks);
}

bool SysTimer::suspend_time_passed()
Expand All @@ -100,43 +103,40 @@ bool SysTimer::suspend_time_passed()

uint32_t SysTimer::resume()
{
core_util_critical_section_enter();
// Remove ensures serialized access to SysTimer by stopping timer interrupt
remove();

_suspended = false;
_suspend_time_passed = true;
remove();

uint64_t new_tick = (ticker_read_us(_ticker_data) - _start_time) * OS_TICK_FREQ / 1000000;
if (new_tick > _tick) {
uint64_t elapsed_ticks = (ticker_read_us(_ticker_data) - _time_us) / US_IN_TICK;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah!!! I briefly wondered whether this might be a solution, but immediately shot it down because I assumed SysTimer::resume was being called in a critical section from outside. This is great.

I think it's still worth optimising for the common short case - if this is taking 20us every millisecond, that is 2% of CPU time just for the division. (On a 4MHz system I saw about 10% of CPU time being eaten up by the millisecond tickers.).

How about:

uint64_t elapsed_us = ticker_read_us(_ticker_data) - _time_us;
uint64_t elapsed_ticks;
if (elapsed_us > 0xFFFFFFFF) {
    // This will typically invoke full 64/64 software division
    elapsed_ticks = elapsed_us / US_IN_TICK;
} else {
    // This will will invoke 32/32 software division on ARMv6-M or ARMv8-M baseline, and
    // on ARMv7-M or ARMv8-M mainline can be optimised to a UMULL by 2^n/US_IN_TICK, or
    // at worst be a UDIV instruction. Thus it will be faster, maybe greatly.
    elapsed_ticks = (uint32_t) elapsed_us / US_IN_TICK;
}

Copy link
Contributor Author

@c1728p9 c1728p9 Feb 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed SysTimer::resume was being called in a critical section from outside.

It used to be in a critical section, but had to be changed when the RTX API it relied on became unavailable. I missed it at this time as well.

Initially I had it avoid division all together in the fast case - when only a single tick had elapsed:

if (elapsed_time < US_IN_TICK) {
    elapsed_ticks = 0;
} else if (elapsed_time < US_IN_TICK * 2) {
    elapsed_ticks = 1;
} else {
    elapsed_ticks = elapsed_time / US_IN_TICK;
}

I ended up going with the simpler implementation you see now since I was just worried about reducing worst case execution time. Also, I was only setup to profile only worst-case execution time, which did not change much from this.

It sounds like you have profiled the average execution time on a couple of devices. Would you be able to profile the different optimizations and create a PR for the one which looks the most promising?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to see if those extra cases help too. My feeling is that division tends to have early termination, so might not help much, but worth checking.

I was just worried about reducing worst case execution time.

But we don't care about worst-case any more if we're outside of the critical section, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we don't care about worst-case any more if we're outside of the critical section, right?

True, worst case doesn't matter anymore. At the time I was only setup to profile worst-case, so I wasn't able to evaluate the impact of this change.

if (elapsed_ticks > 0) {
// Don't update to the current tick. Instead, update to the
// previous tick and let the SysTick handler increment it
// to the current value. This allows scheduling restart
// successfully after the OS is resumed.
new_tick--;
elapsed_ticks--;
}
uint32_t elapsed_ticks = new_tick - _tick;
_tick = new_tick;
_time_us += elapsed_ticks * US_IN_TICK;
_tick += elapsed_ticks;

core_util_critical_section_exit();
return elapsed_ticks;
}

void SysTimer::schedule_tick(uint32_t delta)
{
core_util_critical_section_enter();

insert_absolute(_start_time + (_tick + delta) * 1000000ULL / OS_TICK_FREQ);
insert_absolute(_time_us + delta * US_IN_TICK);

core_util_critical_section_exit();
}

void SysTimer::cancel_tick()
{
core_util_critical_section_enter();
// Underlying call is interrupt safe

remove();

core_util_critical_section_exit();
}

uint32_t SysTimer::get_tick()
Expand All @@ -146,6 +146,8 @@ uint32_t SysTimer::get_tick()

us_timestamp_t SysTimer::get_time()
{
// Underlying call is interrupt safe

return ticker_read_us(_ticker_data);
}

Expand All @@ -171,6 +173,7 @@ void SysTimer::_increment_tick()
// Protected function synchronized externally

_tick++;
_time_us += US_IN_TICK;
}

void SysTimer::handler()
Expand Down
2 changes: 1 addition & 1 deletion rtos/TARGET_CORTEX/SysTimer.h
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@ class SysTimer: private mbed::TimerEvent, private mbed::NonCopyable<SysTimer> {
virtual void handler();
void _increment_tick();
static void _set_irq_pending();
us_timestamp_t _start_time;
us_timestamp_t _time_us;
uint64_t _tick;
bool _suspend_time_passed;
bool _suspended;
Expand Down