Skip to content

EventQueue: Old pointers of sibling were not cleared #9144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 21, 2018

Conversation

deepikabhavnani
Copy link

Description

When adding sibling at the head of linked list, the head if pointing to something in linked list was not updated, hence a loop was formed in linked list

Element0 - First addition to linked list
Element1 - Has higher delay hence added to back
0 ->(next) 1
Element2 - Delay is same as Element0, hence should be sibling of 0, shall be added at head

Expected:
2 ------------->(next) 1
|(sibling)
0

Bug: (Resolved with this)
2 ------------->(next) 1
|(sibling)
0 ------------->(next) 1

If we add more elements and next pointer of sibling is updated, old references will cause issues
Element3 added

Expected:
2 ------------->(next) 3 ------------->(next) 1
|(sibling)
0

Bug: (Resolved with this)
2 ------------->(next) 3 ------------->(next) 1
|(sibling)
0 ------------->(next) 1
Both siblings here point to different next

Resolves: #8664

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

When adding sibling at the head of linked list, the head if pointing
to something in linked list was not updated, hence a loop was formed
in linked list

Element0 - First addition to linked list
Element1 - Has higher delay hence added to back
0 ->(next) 1
Element2 - Delay is same as Element0, hence should be sibling of 0
           Shall be added at head

Expected:
2    ------------->(next) 1
|(sibling)
0

Bug: (Resolved with this)
2    ------------->(next) 1
|(sibling)
0    ------------->(next) 1

If we add more elements and next pointer of sibling is updated, old
references will cause issues
Element3 added

Expected:
2    ------------->(next) 3  ------------->(next) 1
|(sibling)
0

Bug: (Resolved with this)
2    ------------->(next) 3  ------------->(next) 1
|(sibling)
0    ------------->(next) 1
***Both siblings here point to different next***
@cmonr cmonr requested a review from geky December 19, 2018 03:56
Copy link
Contributor

@cmonr cmonr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

@pan-
Copy link
Member

pan- commented Dec 19, 2018

@deepikabhavnani Would it be possible to extend the event unit tests to make sure the problem is solved ?

@deepikabhavnani
Copy link
Author

@deepikabhavnani Would it be possible to extend the event unit tests to make sure the problem is solved ?

Yes. Wanted to confirm first if next should be null in case of siblings, or all of them point to same next and update on head should traverse and update next of all siblings. Not sure how it was designed to be. @geky

@deepikabhavnani
Copy link
Author

Additional commit to correct traverse in destructor

In `equeue_destroy` the external loop was for main events linked
list and internal loop for siblings.
Siblings start was not initialized correctly for each main link
Copy link
Contributor

@0xc0170 0xc0170 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bugfix with the test 💯

@cmonr
Copy link
Contributor

cmonr commented Dec 20, 2018

CI started

@mbed-ci
Copy link

mbed-ci commented Dec 20, 2018

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 1
Build artifacts

@cmonr
Copy link
Contributor

cmonr commented Dec 21, 2018

Waiting on a final review by @geky, but otherwise should be good to go.

@geky
Copy link
Contributor

geky commented Dec 21, 2018

Sorry for such a late review, this looks like a perfect fix 👍

Looking into it, the equeue relies on sibling's next pointers being NULL here:
https://github.com/ARMmbed/mbed-os/blob/master/events/equeue/equeue.c#L294

Unfortunately it was a hard catch because it is only an issue when canceling siblings which also have a sibling. In this case the equeue tries to update the next event's reference when sibling is not NULL, which is wrong and corrupts the data structure. The is then made worse because events references can end up unused in a lot of use cases, making the corruption go unnoticed.

image

Thanks for the fix! After looking at the code again I'm embarrassed the equeue has been used for this long without some sort of design doc, sorry you guys had to reverse engineer all of that.

@cmonr
Copy link
Contributor

cmonr commented Dec 21, 2018

CI started

My mistake. Starting CI has become a reflex now. Updated status accordingly.

@cmonr cmonr merged commit 198a8bf into ARMmbed:master Dec 21, 2018
@deepikabhavnani deepikabhavnani deleted the equeue_cancel_issue branch December 21, 2018 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants