[rtos] Fix irq-safe rtx calls when inside critical sections #2391

geky · 2016-08-08T18:40:53Z

Before, rtx calls would hard fault in critical sections when an svc instruction was attempted with interrupts disabled.

Code exhibiting the issue:

Semaphore sema;

void called_from_irq_or_thread() {
    core_util_critical_section_enter();
    stateful_stuff();
    sema.release();
    more_stateful_stuff();
    core_util_critical_section_exit();
}

Required changes:

Added check for CPSR I bit in Cortex-A rtx
Added check for PRIMASK in Cortex-M rtx
Modified critical sections in Cortex-M rtx to be recursive
(already recursive in Cortex-A)

cc @c1728p9, @0xc0170

c1728p9 · 2016-08-08T19:01:24Z

Looks good to me

bridadan · 2016-08-08T19:19:09Z

Hold off on the bot for just a bit, making a couple tweaks right now. I'll start it when I'm done

bridadan · 2016-08-08T20:08:57Z

/morph test

mbed-bot · 2016-08-08T20:38:02Z

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 612

Build failed!

geky · 2016-08-08T21:04:39Z

/morph test

mbed-bot · 2016-08-08T22:22:28Z

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 613

Test failed!

0xc0170 · 2016-08-09T10:20:08Z

This is the limitation if RTX (not using any RTX API with interrupts disabled) - reported here - #1841 (there's a comment about unprivileged mode, something to consider?)

Taken from the RTX sources:

Functions that cannot be called from an ISR are verifying the interrupt status and return in case that they are called
from an ISR context the status code \b osErrorISR. In some implementations this condition might be caught using the HARD FAULT vector.

The example you provided looks fine as it does not have any functional change, but lets replace semaphore by mutex. With this fix, the unlock would return ISR error status even in the thread mode.

I am not fully certain this is something we could fix, however having hard fault for use cases like this is not user friendly. @RobertRostohar - please have a look at this change-set, what do you think?

geky · 2016-08-09T17:07:57Z

@0xc0170, this pr isn't attempting to change the rtos behaviour in interrupt contexts (maybe I should rename this pr), but rather changes the rtos to treat critical sections where interrupts are disabled as though they are interrupt contexts.

Mutexes should still error in critical sections, since it's a context where the thread can't yield, and the processor would deadlock. However irq-safe functions, such as the semaphore release, are perfectly valid.

geky · 2016-08-09T18:04:51Z

/morph test

mbed-bot · 2016-08-09T19:42:13Z

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 621

All builds and test passed!

0xc0170 · 2016-08-12T07:11:53Z

@RobertRostohar @ReinhardKeil Any concerns for this changeset ? Shall this be also sent upstream to the RTX kernel?

RobertRostohar · 2016-08-12T09:14:28Z

I'm not thrilled about this solution and would not go this way.

The proposed changes make function calls from threads with disabled interrupts to be treated by RTX as if they were called from ISR. Therefore the functions will either return an error indicating that they cannot be executed from ISR which is not compliant with the API. Or they will be executed as if they were called from ISR. This will bypass the problematic SVC but can have also other unwanted side effects. Some of the functions perform slightly different operations when called from thread or ISR. Some of the functions also assume that they are executed in Privileged mode (achieved by SVC) which might lead to other issues.

Because of all this I'm not in favour of this approach, especially not since it is applied to all functions.

I would not consider this to go into upstream RTX or RTX5 and would discourage using it in mbed also.

What is the problem that you are actually trying to solve?

The initial problem description was that printf cannot be used inside critical sections. That was due to interrupts being disabled when entering a critical section and RTX functions (implicitly called by printf) are not allowed to be called when interrupts are disabled.

Critical section is typically implemented by disabling interrupts (global or partially). However it is also common that during critical sections RTOS functionality is limited.

RTX currently does not provide directly support for critical sections but can co-exist with above mentioned implementation of critical sections as long as RTOS functions are not called inside critical sections. Even FreeRTOS which explicitly includes support for critical sections (enter/exit functions) states that no other RTOS function is allowed to be called inside critical section.

Is it only printf (or other Clib functions) that needs to be called from critical sections? In that case only the Clib mutli-threading interface functions that RTX implements need to be changed.

geky · 2016-08-15T20:00:16Z

What is the problem that you are actually trying to solve?

Consider an irq-friendly data structure that signals users on state-changes. A flexible implementation may take a callback with context to call on state-changes, leaving it up to the user how to defer from interrupt context if needed.

The state associated with the user-provided callback must be synchronized from both foreground and interrupt contexts, leaving a critical section as the obvious solution. This results in the user-provided callback being called from a critical-section, but given that the callback may already by called from an interrupt context, I would expect no problems (and there are none with this patch).

However, rtx does not understand the critical section and hardfaults even in irq-safe functions. This is especially problematic since it prohibits signalling a semaphore, which is one of the most common methods of deferring execution from interrupt contexts.

The specific case where this issue became problematic would be in the mbed-events library, where hardware/software timers need to be updated in foreground/irq contexts. This is also a problem for callback APIs with multiple implementations, such as the network-socket API and the mbed hal.

Alternatively, I did manage to stitch together a workaround using an RCU lock, however that requires irq-safe dynamic memory or finite simultaneous updates, which is a whole other issue.

Even FreeRTOS which explicitly includes support for critical sections (enter/exit functions) states that no other RTOS function is allowed to be called inside critical section.

This is not true. It is poorly documented on the FreeRTOS site, but the *FromISR set of functions have well-defined behaviour in critical-sections: http://www.openrtos.net/message_passing_performance.html

Another example is the Linux kernel, which allows semaphore signalling in any context, including critical sections with disabled preemption: http://lxr.free-electrons.com/source/kernel/locking/semaphore.c#L171

The proposed changes make function calls from threads with disabled interrupts to be treated by RTX as if they were called from ISR. Therefore the functions will either return an error indicating that they cannot be executed from ISR which is not compliant with the API. Or they will be executed as if they were called from ISR. This will bypass the problematic SVC but can have also other unwanted side effects. Some of the functions perform slightly different operations when called from thread or ISR. Some of the functions also assume that they are executed in Privileged mode (achieved by SVC) which might lead to other issues.

Although critical-sections are poorly defined in the cmsis specification, I would argue returning osErrorISR is better than non-descriptively hard-faulting.

You make a good point about privileged mode. If an rtx operation requires hard-mode, this patch retains the original behaviour, which is to hard-fault.

0xc0170 · 2016-08-19T08:29:36Z

Even FreeRTOS which explicitly includes support for critical sections (enter/exit functions) states that no other RTOS function is allowed to be called inside critical section.
This is not true. It is poorly documented on the FreeRTOS site, but the *FromISR set of functions have well-defined behaviour in critical-sections: http://www.openrtos.net/message_passing_performance.html

I believe @RobertRostohar meant that within critical section there should be no FreeRTOS API call, "FreeRTOS API functions must not be called from within a critical section. ". taken from: http://www.freertos.org/taskENTER_CRITICAL_taskEXIT_CRITICAL.html. I assume this is a general rule, and some fromISR* might be used within critical section as the link you shared shows some use cases for that. I noticed critical section functions are without fromISR thus shall not be invoked from ISR. Lets focus on RTX, and how we can make less hard faults :-)

Although critical-sections are poorly defined in the cmsis specification, I would argue returning osErrorISR is better than non-descriptively hard-faulting.

if (__get_PRIMASK() != 0U || __get_IPSR() != 0U) {
     return osErrorISR;                          // Not allowed in ISR
   }

As there was a valid objections to the return code, shall we consider a new error status for not allowed with interrupts disabled ? osErrorInterruptDisabled , to mark that RTOS API requires interrupt enabled. Ideal would be to have osErrorCriticalSection but RTOS does not have critical sections yet, thus this needs to be more general.

Because of all this I'm not in favour of this approach, especially not since it is applied to all functions.

This will bypass the problematic SVC but can have also other unwanted side effects. Some of the functions perform slightly different operations when called from thread or ISR. Some of the functions also assume that they are executed in Privileged mode (achieved by SVC) which might lead to other issues.

@RobertRostohar Any other proposals for this use case? What functions are missing? What side affects ? How we can address them?

Hard fault is not ideal, thus I would go with a new error code, fix comments (ISR or privilged , add there + interrupt disabled). If there's anything else we have missed, let us know.

sg- · 2016-08-26T17:06:58Z

Any updates here?

geky · 2016-08-26T18:37:47Z

@sg- This would require a bit of work if we wanted to adopt the osErrorInterruptsDisabled, although I'm not sure we want to introduce a new error code. Waiting on response from @RobertRostohar.

RobertRostohar · 2016-09-02T11:41:53Z

Despite that we have not really heard about user complaints not being able to use RTX functions while interrupts are disabled, I do recognize that there are use cases (also mentioned in this thread) where such functionality is needed.

Therefore we should make this possible but only for functions where this makes sense (avoiding overhead for all functions and potential side effects). We should identify those functions which will probably be only those that can be called from IRQ: osSignalSet/Clear, osSemaphoreRelease, osMessagePut/Get.

I have analyzed the impact of privileged/unprivileged code on those functions and the initially proposed changes should not be an issue from that perspective (see below).

We do plan to add functions for Critical sections to CMSIS RTOS API2 and RTX5 at some point and also address privileged/unprivileged threads. Currently we will be focusing on ARMv8-M and TrustZone which introduces even more complexity.

At this point Critical sections can be externally implemented by disabling all interrupts (PRIMASK) or partially (BASEPRI). However this can be only done from code executing in privileged mode. Therefore it is safe to execute RTX isr* functions which require privileged access and blocked interrupts used by RTX.

Therefore the proposed changes (slightly modified to cover BASEPRI) should be ok for the limited set of functions. There is also no need for new error codes in this case.

Example:

osStatus osSemaphoreRelease (osSemaphoreId semaphore_id) {  
  if ((__get_PRIMASK() != 0U) || (__get_BASEPRI() != 0U) || (__get_IPSR() != 0U)) {
    return   isrSemaphoreRelease(semaphore_id);  
  } else { 
    return __svcSemaphoreRelease(semaphore_id);  
  }  
}

geky · 2016-09-07T17:28:53Z

Hi @RobertRostohar, will be looking forward to critical sections in RTX5. Thanks for looking a bit into the impact of critical sections on the rtos.

I have updated this patch to also check the BASEPRI register as you suggested. Let us know if you see any concerns with this implementation.

RobertRostohar · 2016-09-08T08:59:49Z

As mentioned in my previous comment I would not add this for all functions but rather just for a few relevant ones. Otherwise I don’t have any concerns anymore (even if it is applied to all functions).

Anyway we plan to address this in RTX5 in the future together with critical sections (also document which functions may be called when in critical section).

sg- · 2016-09-08T22:09:20Z

@geky were you going to update for specific functions?

geky · 2016-09-09T17:07:25Z

Sorry, @RobertRostohar, the only changes to the other functions should be to consistently return an error if called in a critical section (correct me if I missed something).

Do you want the erroring checks to be removed to avoid possible regressions? I can replace the errors with asserts or just remove them completely. Let me know which you prefer.

sg- · 2016-09-09T20:23:07Z

@mbed-bot: TEST

HOST_OSES=ALL
BUILD_TOOLCHAINS=ALL
TARGETS=ALL

sg- · 2016-09-09T23:04:30Z

/morph test

mbed-bot · 2016-09-10T06:42:34Z

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 809

Build failed!

mbed-bot · 2016-09-10T11:45:07Z

[Build 890]
FAILURE: Something went wrong when building and testing.

sg- · 2016-09-10T16:19:15Z

@geky Have a look at CI results. There are a few error such as this. Looks like all M0 targets are affected

[Error] rt_HAL_CM.h@65,0:  #1114: this feature not supported on target architecture/processor

RobertRostohar · 2016-09-12T06:57:59Z

@GKEY, I was trying to avoid additional overhead to all functions just to report errors when called from critical section (and the error code is anyway not consistent as discussed earlier).

Let’s keep the proposed solution for the time being as it is in the PR.

In the long term with CMSIS RTOS API2 and RTX5 we should:

introduce critical sections natively
more appropriate error code (existing or new) when function cannot be called from critical sections
implement a hard fault handler which handles errors rather than doing it in each function (less overhead)

geky · 2016-09-23T02:17:39Z

I've updated the pr to remove BASEPRI for now, since the register is absent on M0 devices and unused in the mbed OS codebase. Let us know if anyone sees an immediately important reason to accommodate the BASEPRI register.

Before, rtx calls would hard fault in critical sections when an svc instruction was attempted with interrupts disabled. Required changes: - Added check for CPSR I bit in cortex A rtx - Added check for PRIMASK in cortex M rtx - Modified critical sections in cortex M rtx to be recursive (already recursive in cortex A)

geky · 2016-09-23T06:00:19Z

/morph test

mbed-bot · 2016-09-23T07:21:54Z

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 931

All builds and test passed!

0xc0170

+1

geky force-pushed the fix-rtos-critical-sections branch from 9ad767b to 08ca38a Compare August 8, 2016 21:04

geky changed the title ~~[rtos] Fix rtx calls when inside critical sections~~ [rtos] Fix irq-safe rtx calls when inside critical sections Aug 9, 2016

geky force-pushed the fix-rtos-critical-sections branch from 08ca38a to 9fec1c4 Compare August 9, 2016 18:03

sg- added the needs: CI label Aug 23, 2016

geky force-pushed the fix-rtos-critical-sections branch from 9fec1c4 to e41b129 Compare September 7, 2016 17:22

c1728p9 mentioned this pull request Sep 12, 2016

GCC - hardfault in mktime with irq disabled #2647

Closed

sg- added needs: work and removed needs: CI labels Sep 16, 2016

geky force-pushed the fix-rtos-critical-sections branch from e41b129 to 4336283 Compare September 23, 2016 02:09

geky force-pushed the fix-rtos-critical-sections branch from 4336283 to 3e7b5ed Compare September 23, 2016 02:19

0xc0170 approved these changes Sep 23, 2016

View reviewed changes

0xc0170 added ready for merge and removed needs: work labels Sep 23, 2016

sg- merged commit e453b5e into ARMmbed:master Sep 24, 2016

sg- added release-version: 5.2.0 and removed ready for merge labels Sep 24, 2016

0xc0170 mentioned this pull request Jul 31, 2017

SPI driver reports EvrRtxMutexError -6 (Not allowed in ISR context) on STM32F412ZG #4834

Closed

[rtos] Fix irq-safe rtx calls when inside critical sections #2391

[rtos] Fix irq-safe rtx calls when inside critical sections #2391

Uh oh!

Conversation

geky commented Aug 8, 2016

Uh oh!

c1728p9 commented Aug 8, 2016

Uh oh!

bridadan commented Aug 8, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bridadan commented Aug 8, 2016

Uh oh!

mbed-bot commented Aug 8, 2016

Result: FAILURE

Output

Uh oh!

geky commented Aug 8, 2016

Uh oh!

mbed-bot commented Aug 8, 2016

Result: FAILURE

Output

Uh oh!

0xc0170 commented Aug 9, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

geky commented Aug 9, 2016

Uh oh!

geky commented Aug 9, 2016

Uh oh!

mbed-bot commented Aug 9, 2016

Result: SUCCESS

Output

Uh oh!

0xc0170 commented Aug 12, 2016

Uh oh!

RobertRostohar commented Aug 12, 2016

Uh oh!

geky commented Aug 15, 2016

Uh oh!

0xc0170 commented Aug 19, 2016

Uh oh!

sg- commented Aug 26, 2016

Uh oh!

geky commented Aug 26, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RobertRostohar commented Sep 2, 2016

Uh oh!

geky commented Sep 7, 2016

Uh oh!

RobertRostohar commented Sep 8, 2016

Uh oh!

sg- commented Sep 8, 2016

Uh oh!

geky commented Sep 9, 2016

Uh oh!

sg- commented Sep 9, 2016

Uh oh!

sg- commented Sep 9, 2016

Uh oh!

mbed-bot commented Sep 10, 2016

Result: FAILURE

Output

Uh oh!

mbed-bot commented Sep 10, 2016

Uh oh!

sg- commented Sep 10, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RobertRostohar commented Sep 12, 2016

Uh oh!

geky commented Sep 23, 2016

Uh oh!

geky commented Sep 23, 2016

Uh oh!

mbed-bot commented Sep 23, 2016

Result: SUCCESS

Output

Uh oh!

bridadan commented Aug 8, 2016 •

edited

Loading

0xc0170 commented Aug 9, 2016 •

edited

Loading

geky commented Aug 26, 2016 •

edited

Loading

sg- commented Sep 10, 2016 •

edited

Loading