Skip to content

[rtos] Fix irq-safe rtx calls when inside critical sections #2391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 24, 2016

Conversation

geky
Copy link
Contributor

@geky geky commented Aug 8, 2016

Before, rtx calls would hard fault in critical sections when an svc instruction was attempted with interrupts disabled.

Code exhibiting the issue:

Semaphore sema;

void called_from_irq_or_thread() {
    core_util_critical_section_enter();
    stateful_stuff();
    sema.release();
    more_stateful_stuff();
    core_util_critical_section_exit();
}

Required changes:

  • Added check for CPSR I bit in Cortex-A rtx
  • Added check for PRIMASK in Cortex-M rtx
  • Modified critical sections in Cortex-M rtx to be recursive
    (already recursive in Cortex-A)

cc @c1728p9, @0xc0170

@c1728p9
Copy link
Contributor

c1728p9 commented Aug 8, 2016

Looks good to me

@bridadan
Copy link
Contributor

bridadan commented Aug 8, 2016

Hold off on the bot for just a bit, making a couple tweaks right now. I'll start it when I'm done

@bridadan
Copy link
Contributor

bridadan commented Aug 8, 2016

/morph test

@mbed-bot
Copy link

mbed-bot commented Aug 8, 2016

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 612

Build failed!

@geky geky force-pushed the fix-rtos-critical-sections branch from 9ad767b to 08ca38a Compare August 8, 2016 21:04
@geky
Copy link
Contributor Author

geky commented Aug 8, 2016

/morph test

@mbed-bot
Copy link

mbed-bot commented Aug 8, 2016

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 613

Test failed!

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 9, 2016

This is the limitation if RTX (not using any RTX API with interrupts disabled) - reported here - #1841 (there's a comment about unprivileged mode, something to consider?)

Taken from the RTX sources:

Functions that cannot be called from an ISR are verifying the interrupt status and return in case that they are called
from an ISR context the status code \b osErrorISR. In some implementations this condition might be caught using the HARD FAULT vector.

The example you provided looks fine as it does not have any functional change, but lets replace semaphore by mutex. With this fix, the unlock would return ISR error status even in the thread mode.

I am not fully certain this is something we could fix, however having hard fault for use cases like this is not user friendly. @RobertRostohar - please have a look at this change-set, what do you think?

@geky
Copy link
Contributor Author

geky commented Aug 9, 2016

@0xc0170, this pr isn't attempting to change the rtos behaviour in interrupt contexts (maybe I should rename this pr), but rather changes the rtos to treat critical sections where interrupts are disabled as though they are interrupt contexts.

Mutexes should still error in critical sections, since it's a context where the thread can't yield, and the processor would deadlock. However irq-safe functions, such as the semaphore release, are perfectly valid.

@geky geky changed the title [rtos] Fix rtx calls when inside critical sections [rtos] Fix irq-safe rtx calls when inside critical sections Aug 9, 2016
@geky geky force-pushed the fix-rtos-critical-sections branch from 08ca38a to 9fec1c4 Compare August 9, 2016 18:03
@geky
Copy link
Contributor Author

geky commented Aug 9, 2016

/morph test

@mbed-bot
Copy link

mbed-bot commented Aug 9, 2016

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 621

All builds and test passed!

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 12, 2016

@RobertRostohar @ReinhardKeil Any concerns for this changeset ? Shall this be also sent upstream to the RTX kernel?

@RobertRostohar
Copy link
Contributor

I'm not thrilled about this solution and would not go this way.

The proposed changes make function calls from threads with disabled interrupts to be treated by RTX as if they were called from ISR. Therefore the functions will either return an error indicating that they cannot be executed from ISR which is not compliant with the API. Or they will be executed as if they were called from ISR. This will bypass the problematic SVC but can have also other unwanted side effects. Some of the functions perform slightly different operations when called from thread or ISR. Some of the functions also assume that they are executed in Privileged mode (achieved by SVC) which might lead to other issues.

Because of all this I'm not in favour of this approach, especially not since it is applied to all functions.

I would not consider this to go into upstream RTX or RTX5 and would discourage using it in mbed also.

What is the problem that you are actually trying to solve?

The initial problem description was that printf cannot be used inside critical sections. That was due to interrupts being disabled when entering a critical section and RTX functions (implicitly called by printf) are not allowed to be called when interrupts are disabled.

Critical section is typically implemented by disabling interrupts (global or partially). However it is also common that during critical sections RTOS functionality is limited.

RTX currently does not provide directly support for critical sections but can co-exist with above mentioned implementation of critical sections as long as RTOS functions are not called inside critical sections. Even FreeRTOS which explicitly includes support for critical sections (enter/exit functions) states that no other RTOS function is allowed to be called inside critical section.

Is it only printf (or other Clib functions) that needs to be called from critical sections? In that case only the Clib mutli-threading interface functions that RTX implements need to be changed.

@geky
Copy link
Contributor Author

geky commented Aug 15, 2016

What is the problem that you are actually trying to solve?

Consider an irq-friendly data structure that signals users on state-changes. A flexible implementation may take a callback with context to call on state-changes, leaving it up to the user how to defer from interrupt context if needed.

The state associated with the user-provided callback must be synchronized from both foreground and interrupt contexts, leaving a critical section as the obvious solution. This results in the user-provided callback being called from a critical-section, but given that the callback may already by called from an interrupt context, I would expect no problems (and there are none with this patch).

However, rtx does not understand the critical section and hardfaults even in irq-safe functions. This is especially problematic since it prohibits signalling a semaphore, which is one of the most common methods of deferring execution from interrupt contexts.

The specific case where this issue became problematic would be in the mbed-events library, where hardware/software timers need to be updated in foreground/irq contexts. This is also a problem for callback APIs with multiple implementations, such as the network-socket API and the mbed hal.

Alternatively, I did manage to stitch together a workaround using an RCU lock, however that requires irq-safe dynamic memory or finite simultaneous updates, which is a whole other issue.

Even FreeRTOS which explicitly includes support for critical sections (enter/exit functions) states that no other RTOS function is allowed to be called inside critical section.

This is not true. It is poorly documented on the FreeRTOS site, but the *FromISR set of functions have well-defined behaviour in critical-sections: http://www.openrtos.net/message_passing_performance.html

Another example is the Linux kernel, which allows semaphore signalling in any context, including critical sections with disabled preemption: http://lxr.free-electrons.com/source/kernel/locking/semaphore.c#L171

The proposed changes make function calls from threads with disabled interrupts to be treated by RTX as if they were called from ISR. Therefore the functions will either return an error indicating that they cannot be executed from ISR which is not compliant with the API. Or they will be executed as if they were called from ISR. This will bypass the problematic SVC but can have also other unwanted side effects. Some of the functions perform slightly different operations when called from thread or ISR. Some of the functions also assume that they are executed in Privileged mode (achieved by SVC) which might lead to other issues.

Although critical-sections are poorly defined in the cmsis specification, I would argue returning osErrorISR is better than non-descriptively hard-faulting.

You make a good point about privileged mode. If an rtx operation requires hard-mode, this patch retains the original behaviour, which is to hard-fault.

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 19, 2016

Even FreeRTOS which explicitly includes support for critical sections (enter/exit functions) states that no other RTOS function is allowed to be called inside critical section.

This is not true. It is poorly documented on the FreeRTOS site, but the *FromISR set of functions have well-defined behaviour in critical-sections: http://www.openrtos.net/message_passing_performance.html

I believe @RobertRostohar meant that within critical section there should be no FreeRTOS API call, "FreeRTOS API functions must not be called from within a critical section. ". taken from: http://www.freertos.org/taskENTER_CRITICAL_taskEXIT_CRITICAL.html. I assume this is a general rule, and some fromISR* might be used within critical section as the link you shared shows some use cases for that. I noticed critical section functions are without fromISR thus shall not be invoked from ISR. Lets focus on RTX, and how we can make less hard faults :-)

Although critical-sections are poorly defined in the cmsis specification, I would argue returning osErrorISR is better than non-descriptively hard-faulting.

if (__get_PRIMASK() != 0U || __get_IPSR() != 0U) {
     return osErrorISR;                          // Not allowed in ISR
   }

As there was a valid objections to the return code, shall we consider a new error status for not allowed with interrupts disabled ? osErrorInterruptDisabled , to mark that RTOS API requires interrupt enabled. Ideal would be to have osErrorCriticalSection but RTOS does not have critical sections yet, thus this needs to be more general.

Because of all this I'm not in favour of this approach, especially not since it is applied to all functions.

This will bypass the problematic SVC but can have also other unwanted side effects. Some of the functions perform slightly different operations when called from thread or ISR. Some of the functions also assume that they are executed in Privileged mode (achieved by SVC) which might lead to other issues.

@RobertRostohar Any other proposals for this use case? What functions are missing? What side affects ? How we can address them?

Hard fault is not ideal, thus I would go with a new error code, fix comments (ISR or privilged , add there + interrupt disabled). If there's anything else we have missed, let us know.

@sg- sg- added the needs: CI label Aug 23, 2016
@sg-
Copy link
Contributor

sg- commented Aug 26, 2016

Any updates here?

@geky
Copy link
Contributor Author

geky commented Aug 26, 2016

@sg- This would require a bit of work if we wanted to adopt the osErrorInterruptsDisabled, although I'm not sure we want to introduce a new error code. Waiting on response from @RobertRostohar.

@RobertRostohar
Copy link
Contributor

Despite that we have not really heard about user complaints not being able to use RTX functions while interrupts are disabled, I do recognize that there are use cases (also mentioned in this thread) where such functionality is needed.

Therefore we should make this possible but only for functions where this makes sense (avoiding overhead for all functions and potential side effects). We should identify those functions which will probably be only those that can be called from IRQ: osSignalSet/Clear, osSemaphoreRelease, osMessagePut/Get.

I have analyzed the impact of privileged/unprivileged code on those functions and the initially proposed changes should not be an issue from that perspective (see below).

We do plan to add functions for Critical sections to CMSIS RTOS API2 and RTX5 at some point and also address privileged/unprivileged threads. Currently we will be focusing on ARMv8-M and TrustZone which introduces even more complexity.

At this point Critical sections can be externally implemented by disabling all interrupts (PRIMASK) or partially (BASEPRI). However this can be only done from code executing in privileged mode. Therefore it is safe to execute RTX isr* functions which require privileged access and blocked interrupts used by RTX.

Therefore the proposed changes (slightly modified to cover BASEPRI) should be ok for the limited set of functions. There is also no need for new error codes in this case.

Example:

osStatus osSemaphoreRelease (osSemaphoreId semaphore_id) {  
  if ((__get_PRIMASK() != 0U) || (__get_BASEPRI() != 0U) || (__get_IPSR() != 0U)) {
    return   isrSemaphoreRelease(semaphore_id);  
  } else { 
    return __svcSemaphoreRelease(semaphore_id);  
  }  
}  

@geky geky force-pushed the fix-rtos-critical-sections branch from 9fec1c4 to e41b129 Compare September 7, 2016 17:22
@geky
Copy link
Contributor Author

geky commented Sep 7, 2016

Hi @RobertRostohar, will be looking forward to critical sections in RTX5. Thanks for looking a bit into the impact of critical sections on the rtos.

I have updated this patch to also check the BASEPRI register as you suggested. Let us know if you see any concerns with this implementation.

@RobertRostohar
Copy link
Contributor

As mentioned in my previous comment I would not add this for all functions but rather just for a few relevant ones. Otherwise I don’t have any concerns anymore (even if it is applied to all functions).

Anyway we plan to address this in RTX5 in the future together with critical sections (also document which functions may be called when in critical section).

@sg-
Copy link
Contributor

sg- commented Sep 8, 2016

@geky were you going to update for specific functions?

@geky
Copy link
Contributor Author

geky commented Sep 9, 2016

Sorry, @RobertRostohar, the only changes to the other functions should be to consistently return an error if called in a critical section (correct me if I missed something).

Do you want the erroring checks to be removed to avoid possible regressions? I can replace the errors with asserts or just remove them completely. Let me know which you prefer.

@sg-
Copy link
Contributor

sg- commented Sep 9, 2016

@mbed-bot: TEST

HOST_OSES=ALL
BUILD_TOOLCHAINS=ALL
TARGETS=ALL

@sg-
Copy link
Contributor

sg- commented Sep 9, 2016

/morph test

@mbed-bot
Copy link

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 809

Build failed!

@mbed-bot
Copy link

[Build 890]
FAILURE: Something went wrong when building and testing.

@sg-
Copy link
Contributor

sg- commented Sep 10, 2016

@geky Have a look at CI results. There are a few error such as this. Looks like all M0 targets are affected

[Error] rt_HAL_CM.h@65,0:  #1114: this feature not supported on target architecture/processor

@RobertRostohar
Copy link
Contributor

@GKEY, I was trying to avoid additional overhead to all functions just to report errors when called from critical section (and the error code is anyway not consistent as discussed earlier).

Let’s keep the proposed solution for the time being as it is in the PR.

In the long term with CMSIS RTOS API2 and RTX5 we should:

  • introduce critical sections natively
  • more appropriate error code (existing or new) when function cannot be called from critical sections
  • implement a hard fault handler which handles errors rather than doing it in each function (less overhead)

@geky
Copy link
Contributor Author

geky commented Sep 23, 2016

I've updated the pr to remove BASEPRI for now, since the register is absent on M0 devices and unused in the mbed OS codebase. Let us know if anyone sees an immediately important reason to accommodate the BASEPRI register.

Before, rtx calls would hard fault in critical sections when an svc
instruction was attempted with interrupts disabled.

Required changes:
- Added check for CPSR I bit in cortex A rtx
- Added check for PRIMASK in cortex M rtx
- Modified critical sections in cortex M rtx to be recursive
  (already recursive in cortex A)
@geky geky force-pushed the fix-rtos-critical-sections branch from 4336283 to 3e7b5ed Compare September 23, 2016 02:19
@geky
Copy link
Contributor Author

geky commented Sep 23, 2016

/morph test

@mbed-bot
Copy link

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 931

All builds and test passed!

Copy link
Contributor

@0xc0170 0xc0170 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants