Update idle loop to reduce calls to suspend #6534

c1728p9 · 2018-04-03T21:22:09Z

Change tickless handling to use public RTX calls and to prevent unnecessary calls to suspend/resume by looping until the kernel needs to be resumed.

c1728p9 · 2018-04-03T21:56:38Z

This change also allows #6536 to work correctly, as the low power timer no longer needs to written to on every wakeup.

c1728p9 · 2018-04-03T22:05:26Z

CC @JonatanAntoni @bulislaw

kjbracey

Not familiar with the SysTimer, but I've looked at the RTX end.

kjbracey · 2018-04-04T08:06:55Z

rtos/TARGET_CORTEX/mbed_rtx_idle.cpp

+        if (!osRtxInfo.kernel.pendSV) { // check devices didn't run any interrupt handlers kicking the OS between suspend and critical section
+            sleep();                    // uses WFI - will wake as soon as any pending interrupts
+        }
+        core_util_critical_section_exit();


There might be call for an ISB here - you're wanting pending interrupts to trigger between this line and the next one, and an ISB should make sure that happens.

In theory I think you could redisable interrupts before any have a chance to trigger - formally the sequence "enable IRQs / disable IRQs" needs an ISB somewhere in the middle to guarantee they have time to trigger.

Relevant ARM ARM quote:

If execution of a CPS instruction:

Increases the execution priority, the CPS execution serializes that change to the instruction stream.

Decreases the execution priority, the architecture guarantees only that the new priority is visible to instructions executed after either executing an ISB, or performing an exception entry or exception return.

(In practice they're likely to trigger before you hit the next disable, which means you'd see pendSV set by the next time you hit line 104.)

kjbracey · 2018-04-04T08:07:15Z

rtos/TARGET_CORTEX/mbed_rtx_idle.cpp

-        os_timer->cancel_tick();
-        // calculate how long we slept
-        elapsed_ticks = os_timer->update_tick();
+        if (osRtxInfo.kernel.pendSV) {


This could just be part of the while condition? Could be true on first entry to the while

Change tickless handling to use public RTX calls and to prevent unnecessary calls to suspend/resume by looping until the kernel needs to be resumed.

JonatanAntoni · 2018-04-05T12:28:44Z

rtos/TARGET_CORTEX/mbed_rtx_idle.cpp

+    while (!os_timer->suspend_time_passed() && !event_pending) {
+
+        core_util_critical_section_enter();
+        if (osRtxInfo.kernel.pendSV) {


osRtxInfo.kernel.pendSV is not public API. Perhaps @RobertRostohar can justify if this solution is solid and a small API change would be valuable.

Might help with the review, details are here #6273 (comment)

Agree that there should be better way - pendSV peeking was the best I could think of with current APIs.

The "simple" API change I would suggest is to permit osKernelSuspend/Resume to be callable from critical section (which is what happens via direct svcKernelSuspend calls at the moment, and this replaces), or to have them or equivalents actually return from the suspend call such that the thread is now in a critical section.

I guess in general one also needs to think about priviliged/unprivileged operation, in case a system is using User mode - system suspension, critical sections and the WFI instruction are privileged operations anyway, so this whole "normal thread code does suspend" only works if the idle thread is privileged (or the suspend call makes you privileged?), or you have a WFE-only version (which I believe is less flexible and can reduce power saving opportunities).

@JonatanAntoni when I was initially working on tickless I proposed an extension to the RTOS to allow this, which you might be able to use:
https://github.com/ARMmbed/mbed-os/pull/5031/files#diff-415fd494349b5414064e0cc0d9f0d596R189

As for this PR, how do you want to proceed? Can we move forward using this internal API until RTX has a public API which supports this?

Sure, for the time being you can access this internal bit. But as it is not public API it might change in the future without being warned. So we should align the low power operation, soon. We need to address this privileged/unprivileged issue nonetheless talking about MPU protection.

c1728p9 · 2018-04-06T00:13:21Z

/morph build

mbed-ci · 2018-04-06T00:54:59Z

Build : FAILURE

Build number : 1668
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6534/

0xc0170 · 2018-04-06T15:19:04Z

mbedmicro-rtos-mbed needs an update?

cmonr · 2018-04-09T23:56:52Z

@0xc0170 Would appear to be the case.

Update the SysTimer test to match the updated API. Changes are: - increment_tick() renamed to _increment_tick() and explicitly synchronized -update_tick() replaced with resume() and a call to suspend() was added before this

c1728p9 · 2018-04-10T22:05:05Z

Even with the test update, this PR intermittently fails on test_update_tick on the nrf51. From profiling this test, I found that the call to suspend is taking at times +500us. I suspect that a tick is passing and that is why 1 is returned from suspend rather than 0 - FAIL: Expected 0 Was 1.

If I pull in #5629 suspend takes around 133us to complete, which is still a lot, but the test consistently passes. @fkjagodzinski you were the author of this test right? Do you have any recommendations on how I can get this test to pass consistently?

c1728p9 · 2018-04-10T22:05:38Z

/morph build

mbed-ci · 2018-04-10T22:34:06Z

Build : SUCCESS

Build number : 1714
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6534/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

mbed-ci · 2018-04-11T01:11:35Z

Exporter Build : SUCCESS

Build number : 1347
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/exporter/6534/

fkjagodzinski · 2018-04-11T10:34:31Z

@c1728p9 test_update_tick() is supposed to check just the update_tick() method with, and without a delay in between calls. Since update_tick() functionality was moved to resume() it's enough to update method name in the test. No calls to suspend() needed.

With this patch 5bfaf528d571de6ae67fe50c21ff25af9bb6fecf all tests-mbedmicro-rtos-mbed-systimer consistently pass on NRF51_DK.

0xc0170 · 2018-04-11T11:27:23Z

With this patch 5bfaf52 all tests-mbedmicro-rtos-mbed-systimer consistently pass on NRF51_DK.

Shall this be added here?

fkjagodzinski · 2018-04-17T09:09:54Z

My fault -- I tested that patch with GCC toolchain only. This commit fkjagodzinski@b5dc7f4 fixes ARM and IAR compilation errors.

c1728p9 · 2018-04-17T14:39:31Z

Thanks for the fix @fkjagodzinski. I pulled it into this PR.

cmonr · 2018-04-17T14:53:52Z

/morph build

mbed-ci · 2018-04-17T15:29:30Z

Build : SUCCESS

Build number : 1775
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6534/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

mbed-ci · 2018-04-17T22:27:47Z

Exporter Build : SUCCESS

Build number : 1416
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/exporter/6534/

mbed-ci · 2018-04-18T05:42:52Z

Test : FAILURE

Build number : 1585
Test logs :http://mbed-os-logs.s3-website-us-west-1.amazonaws.com/?prefix=logs/6534/1585

bulislaw

The idle loop and systimer changes looks good.

0xc0170 · 2018-04-18T12:04:59Z

Ah too slow to remove the comment to trigger another build :-) I invoked build instead of test to retest 😖

mbed-ci · 2018-04-18T12:33:28Z

Build : SUCCESS

Build number : 1789
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6534/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

mbed-ci · 2018-04-19T01:07:30Z

Exporter Build : SUCCESS

Build number : 1437
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/exporter/6534/

mbed-ci · 2018-04-19T02:26:19Z

Test : FAILURE

Build number : 1600
Test logs :http://mbed-os-logs.s3-website-us-west-1.amazonaws.com/?prefix=logs/6534/1600

fkjagodzinski · 2018-04-19T08:52:58Z

tests-mbed_drivers-timeout / 'Multiple timeouts running in parallel (attach)'test case compiled with ARM toolchain failed again on NRF52_DK board.

I'm investigating. So far, I can tell that the issue is also present on current master branch. #6515 might help here -- with this PR merged I couldn't reproduce the issue locally any more.

0xc0170 · 2018-04-19T09:07:58Z

I'm investigating. So far, I can tell that the issue is also present on current master branch. #6515 might help here -- with this PR merged I couldn't reproduce the issue locally any more.

Interesting, and that fix is blocked by another failure 😕

0xc0170 · 2018-04-19T14:18:28Z

I'm investigating. So far, I can tell that the issue is also present on current master branch.

@adbridge @cmonr Are you aware of this failure? I don't recall seeing this in CI, I'll look at the logs. @fkjagodzinski If this is present on master, and these tests were added not long time ago, were you aware of this issue?

fkjagodzinski · 2018-04-19T15:06:08Z

@0xc0170 No, not until I saw morph failures for this PR. It didn't look related to this PR, that's why I checked master. The issue does not show up very often, but can be reproduced.

cmonr · 2018-04-24T16:17:50Z

/morph test

fkjagodzinski · 2018-04-24T16:53:24Z

I've created an issue #6732 based on the failing test case tests-mbed_drivers-timeout / 'Multiple timeouts running in parallel (attach)'.

mbed-ci · 2018-04-24T21:47:53Z

Test : SUCCESS

Build number : 1646
Test logs :http://mbed-os-logs.s3-website-us-west-1.amazonaws.com/?prefix=logs/6534/1646

stevew817 · 2018-05-25T13:36:43Z

rtos/TARGET_CORTEX/SysTimer.cpp

-void SysTimer::cancel_tick()
-{
-    remove();
+    schedule_tick(ticks);


@c1728p9 Why is there no remove() call here? I'm seeing issues on my system because of this.

The SysTimer is based on a TimerEvent class, which has a static member for the timer's event data struct. Additionally, the call to suspend may happen before the previously set timer expires (for example when all threads go idle and the idle handler gets activated). At that point, the timer event data is still present in the timer queue. Calling schedule_tick here will insert the same event structure again in the queue. And since the queue is by reference, that'll lead to a recursive timer that breaks the system.

Hi @stevew817 I gave this a try but was unable to reproduce the failure. Do you have example code which reproduces the issue?

From both code analysis and running the code can I cant find a sequence which causes this. The basic flow of the idle loop should unconditionally call remove before each insert as indicated below:

while (true) { osKernelSuspend(); // removes event os_timer->suspend(); // inserts event sleep(); os_timer->resume(); // removes event osKernelResume(); // inserts event }

With a debugger I can confirm the following sequence is being met by setting a breakpoint in TimerEvent::remove, TimerEvent::insert and TimerEvent::insert_absolute. Below is the callstack from each:

osKernelSuspend() svcRtxKernelSuspend() KernelBlock() OS_Tick_Disable() os_timer->cancel_tick() remove() os_timer->suspend() schedule_tick() insert_absolute() os_timer->resume() remove() osKernelResume() svcRtxKernelResume() KernelUnblock() OS_Tick_Enable() schedule_tick() insert_absolute();

I ran across this when running cloud-client-example using Thread as a sleepy node in tickless mode. The code would hang, and attaching a debugger confirmed it was stuck in the tick handler IRQ because somehow the tick event had managed to add itself recursively (the next pointer of the tick event data was pointing to the same event data, causing the remove call to not do anything since the head just always ands up being the tick event data).

I fixed the issue by adding a cancel_tick() call before the schedule_ticks call in SysTimer::suspend. I didn't investigate further after solving my issue, so I'm afraid I don't have a reproducer ready.

@stevew817 I'm still unable to reproduce this problem, but I created #7027 just to be safe. Could you provide further details on how to replicate this failure (such as git shas used in cloud-client-example and mbed-os along with config, any local mods you made and the hardware you were using) or an example application?

cmonr · 2018-05-25T16:38:51Z

@c1728p9 Please take a look at this and if this needs a fix and/or update, we can take a fix in for RC2.

CC: @ARMmbed/mbed-os-maintainers

cmonr added the needs: review label Apr 3, 2018

This was referenced Apr 3, 2018

Proof of concept lp ticker fix #6473

Closed

Update cmsis/rtx to version 5.3 #6273

Merged

c1728p9 requested review from kjbracey, deepikabhavnani and 0xc0170 April 3, 2018 22:03

kjbracey reviewed Apr 4, 2018

View reviewed changes

c1728p9 force-pushed the rtos_suspend branch from 2521a6c to 2f61a93 Compare April 4, 2018 14:15

Update idle loop to reduce calls to suspend

953bbeb

Change tickless handling to use public RTX calls and to prevent unnecessary calls to suspend/resume by looping until the kernel needs to be resumed.

c1728p9 force-pushed the rtos_suspend branch from 2f61a93 to 953bbeb Compare April 4, 2018 14:17

0xc0170 previously approved these changes Apr 5, 2018

View reviewed changes

0xc0170 requested a review from bulislaw April 5, 2018 12:09

JonatanAntoni reviewed Apr 5, 2018

View reviewed changes

0xc0170 added needs: work and removed needs: review labels Apr 6, 2018

LMESTM mentioned this pull request Apr 9, 2018

Waking STM32L073 from sleep #6362

Closed

Update the SysTimer test

044eb1e

Update the SysTimer test to match the updated API. Changes are: - increment_tick() renamed to _increment_tick() and explicitly synchronized -update_tick() replaced with resume() and a call to suspend() was added before this

c1728p9 dismissed 0xc0170’s stale review via 044eb1e April 10, 2018 16:08

RTOS: SysTimer: Fix test compilation error on ARM & IAR

ec59cbb

bulislaw approved these changes Apr 18, 2018

View reviewed changes

cmonr added ready for merge and removed needs: work labels Apr 24, 2018

kjbracey approved these changes Apr 25, 2018

View reviewed changes

0xc0170 added the release-version: 5.9.0-rc1 label Apr 25, 2018

0xc0170 merged commit 675528b into ARMmbed:master Apr 25, 2018

0xc0170 removed the ready for merge label Apr 25, 2018

fkjagodzinski mentioned this pull request Apr 26, 2018

NRF52_DK: Not all callbacks called when multiple Timeout objects scheduled one by one #6732

Closed

stevew817 reviewed May 25, 2018

View reviewed changes

kjbracey mentioned this pull request Dec 17, 2018

Use of RTX layer in mbed_error #9125

Closed

embeddedteam103 mentioned this pull request Dec 18, 2018

Replace cmsis implementation #9139

Closed

Update idle loop to reduce calls to suspend #6534

Update idle loop to reduce calls to suspend #6534

Uh oh!

Conversation

c1728p9 commented Apr 3, 2018

Uh oh!

c1728p9 commented Apr 3, 2018

Uh oh!

c1728p9 commented Apr 3, 2018

Uh oh!

kjbracey left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JonatanAntoni Apr 5, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

c1728p9 Apr 5, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

c1728p9 commented Apr 6, 2018

Uh oh!

mbed-ci commented Apr 6, 2018

Uh oh!

0xc0170 commented Apr 6, 2018

Uh oh!

cmonr commented Apr 9, 2018

Uh oh!

c1728p9 commented Apr 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

c1728p9 commented Apr 10, 2018

Uh oh!

mbed-ci commented Apr 10, 2018

Triggering tests

Uh oh!

mbed-ci commented Apr 11, 2018

Uh oh!

fkjagodzinski commented Apr 11, 2018

Uh oh!

0xc0170 commented Apr 11, 2018

Uh oh!

fkjagodzinski commented Apr 17, 2018

Uh oh!

c1728p9 commented Apr 17, 2018

Uh oh!

cmonr commented Apr 17, 2018

Uh oh!

mbed-ci commented Apr 17, 2018

Triggering tests

Uh oh!

mbed-ci commented Apr 17, 2018

Uh oh!

mbed-ci commented Apr 18, 2018

Uh oh!

bulislaw left a comment

Choose a reason for hiding this comment

Uh oh!

0xc0170 commented Apr 18, 2018

Uh oh!

mbed-ci commented Apr 18, 2018

Triggering tests

Uh oh!

mbed-ci commented Apr 19, 2018

Uh oh!

mbed-ci commented Apr 19, 2018

Uh oh!

fkjagodzinski commented Apr 19, 2018

Uh oh!

JonatanAntoni Apr 5, 2018 •

edited

Loading

c1728p9 Apr 5, 2018 •

edited

Loading

c1728p9 commented Apr 10, 2018 •

edited

Loading

c1728p9 May 26, 2018 •

edited

Loading