-
Notifications
You must be signed in to change notification settings - Fork 3k
Cellular: fix ATHandler destructor possible crash on delete #10684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jarvte, thank you for your changes. |
set_file_handle(NULL); | ||
|
||
// cancel possible oob event from queue and wait if oob is ongoing | ||
_queue.cancel(_event_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the cancel
reports success, then you need to set oob_queued
to false, no? Maybe make this conditional on oob_queued
, to avoid stale event_id
.
(Could _oob_queued
be replaced by a null event_id
check? Don't need separate variables?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As cancel is void function we decided not to call it.
Replaced _oob_queued and _oob_pending with _event_id.
Force pushed changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
equeue should be handling stale event_ids.
Each event gets its own generation count which is packed into the id. If the generation count doesn't match equeue_cancel
is a noop.
equeue_cancel(0)
is also a noop, much like free(NULL)
.
4eb3665
to
2b2312d
Compare
set_file_handle(NULL); | ||
|
||
while (_event_id > 0) { | ||
#ifdef AT_HANDLER_MUTEX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are in a bit of a pickle here.
We can't tell if cancel worked, so if we do call it we don't know if we need to wait for event_id
to be cleared. That's why it got removed.
But if the OOBs are running on the same thread as this destructor, which I believe is quite possible in some event-driven apps, we must call cancel (and we can assume it would work in that case). We can't wait - we'd deadlock the event queue.
As a minimal change, this could be
#ifdef AT_HANDLER_MUTEX
while (_event_id > 0) {
_oobCv.wait();
}
#else
if (_event_id > 0) {
event_cancel(_event_id);
}
#endif
But that still isn't any good if using an event queue with AT_HANDLER_MUTEX defined.
Ideally I'd like a a success/error return from cancel. But I recall this was discussed before - @kivaisan was looking at it? I'm not sure if the conclusion was that it wasn't possible to get a solid return, or that a different solution was found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something that would be better fixed by using the event's destructor?
https://github.com/ARMmbed/mbed-os/blob/master/events/equeue/equeue.h#L161
This tells you when the event has expired, either by completing or being canceled.
It's indirectly exposed through the C++ API via function objects, though it's messy:
bool dead = false;
class DeathHook {
void operator()(void) {
// do stuff
}
~DeathHook() {
dead = true;
}
};
int id = queue.call(DeathHook());
queue.cancel(id);
printf("is dead? %d\n", dead);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that example work at present? You're passing a temporary by value, and that temporary gets copied then destroyed, so I think dead is set immediately after the call().
I think it could work if EventQueue was upgraded to use C++11 perfect forwarding. Or, more simply, if the constructor/destructor reference counted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, I actually have no idea. I think you're right this would need reference counting to work.
Alternatively we could poke an explicit destructor-register function through the C++ API like in the C one.
Or you might be able to extend the Event class.
@@ -212,7 +213,8 @@ class ATHandler { | |||
protected: | |||
void event(); | |||
#ifdef AT_HANDLER_MUTEX | |||
PlatformMutex _fileHandleMutex; | |||
rtos::Mutex _fileHandleMutex; | |||
rtos::ConditionVariable _oobCv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of _oobCv
is unclear to me, could add here some description. If _oobCv
is supposed to guard _event_id
then event()
and unlock()
need locks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oobCv notifies oob completion (it's notified when event_id
becomes -1).
I agree that it is assuming event_id
is protected by the _fileHandleMutex
. That is held as unlock
is entered, so access there is fine. If event()
can be called without that held, then yes it should have a lock
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
event
is actually called from IRQ context, which means mutex is not possible. But it also means it's pretty safe - this is no worse than the previous _oob_queued
version, at least. Suggest not changing at this point.
@jarvte Any update for this PR, is this for 5.14? |
Yes, for 5.14. No time to do this now. |
5.14 is closing soon, and this has not seen any update. Should we keep the label here or this shall be closed as "will be updated later and reopened" ? |
f318f60
to
40faf90
Compare
@mirelachirica @jarvte We need reviews here and the status. If no update made by later today, we'll remove 5.14 label and move this to the next minor version. We are close to the code freeze. |
40faf90
to
012e1a6
Compare
I have rebased the branch and it builds/passes now UT. What comes to rest of the content and changes to ATHandler, they need approval from @kjbracey-arm, @geky, @AnttiKauppila. |
Couple of my and @geky's comments are still outstanding - use the |
Removed 5.14 label, can move to the next patch release. |
@0xc0170 I made the original pr but @mirelachirica is working on this one now. @mirelachirica ? |
I have executed UT localy and they build and pass. I do not see what is the failure here. |
It is a valgrind error. I will check it. ==12455== Memcheck, a memory error detector ==12455== For counts of detected and suppressed errors, rerun with: -v |
Yes they are, via a |
d8aafd8
to
5b61b5c
Compare
Valgrind errors for athandler unittest are now fixed. |
CI restarted |
Test run: SUCCESSSummary: 11 of 11 test jobs passed |
@@ -0,0 +1,11 @@ | |||
#include "cmsis_os2.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code files need license on top of them, please add to all new files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added license where it was missing.
In some multithread cases there is possibility that process_oob function was called after ATHandler was deleted. Fix is to wait if oob processing is ongoing.
5b61b5c
to
753ba8c
Compare
CI started |
Test run: SUCCESSSummary: 11 of 11 test jobs passed |
Description
In some multithread cases there is possibility that process_oob function was called after ATHandler was deleted. Fix is to cancel eventqueue if it's holding an event and wait if oob processing is ongoing.
Pull request type
Reviewers
@kjbracey-arm @mirelachirica
Release Notes