Skip to content

LWIP system mailbox overflow fix #11976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 4, 2019
Merged

LWIP system mailbox overflow fix #11976

merged 2 commits into from
Dec 4, 2019

Conversation

pstolarz
Copy link
Contributor

@pstolarz pstolarz commented Nov 28, 2019

Description

This is a bug fix for the LWIP mbed-os system mailbox overflow. The bug reveals during heavy network loads when a device is not able to handle all incoming network traffic in time, therefore the internal network buffers (including LWIP mailboxes and memory pools) work at their maximum capacities.

The bug was detected on K64F platform working as an UDP server flooded by large number of UDP packets for a long period of time. Description of encountered problems may be found here #11681.

Summary of change

Proposed patch constrains LWIP mbed-os system mailbox's post/fetch indexes value range from 0 to MB_SIZE-1 (that is mod MB_SIZE integers). In the current implementation indexes are not constrained and wraps up at their maximum integer type value (255 for uint8_t) causing mailbox corruption.

Bug root cause

sys_mbox_trypost() and sys_mbox_post() defined in features/lwipstack/lwip-sys/arch/lwip_sys_arch.c have a common code as follows:

1   int state = osKernelLock();
2
3   mbox->queue[mbox->post_idx % MB_SIZE] = msg;
4   mbox->post_idx += 1;
5
6   osEventFlagsSet(mbox->id, SYS_MBOX_FETCH_EVENT);
7   if (mbox->post_idx - mbox->fetch_idx == MB_SIZE-1)
8       osEventFlagsClear(mbox->id, SYS_MBOX_POST_EVENT);
9
10  osKernelRestoreLock(state);

Both post_idx and fetch_idx are uint8_t defined in sys_mbox_t (file features/lwipstack/lwip-sys/arch/sys_arch.h). Conditional check in line 7 protects the mailbox against overflow (anti-overflow check).

Issue 1

Let MB_SIZE == 8 (default mailbox size value), post_idx == 255, fetch_idx == 249, then post_idx - fetch_idx == 6. So there is possible to post only 1 more message before entirely filling up the mailbox. But, after posting the message we have post_idx == 0 (uint8_t wraps), so post_idx - fetch_idx == -249 (0xffffff07) (promotion from uint8_t to int in the anti-overflow check). This causes the anti-overflow check to return false and the mailbox will be filled with newly posted messages causing mailbox corruption (and memory leaks on the underlying memory pool where the messages are stored).

Issue 2

It will not happen for mailboxes where 255 mod MB_SIZE == MB_SIZE-1, so will not reveal if MB_SIZE is 8, 16, 32, 64.... Therefore the default mailbox (size 8) is safe regarding this issue.

Let MB_SIZE == 10, post_idx == 255, fetch_idx == 250. The mailbox looks like:

 0   1   2   3   4   5   6   7   8   9
[X] [X] [X] [X] [X] [ ] [ ] [ ] [ ] [ ]
 F                   P

F - fetch_idx
P - post_idx 
X - slot with a message

After posting a new message, post_idx == 0 (uint8_t wraps):

 0   1   2   3   4   5   6   7   8   9
[X] [X] [X] [X] [X] [X] [ ] [ ] [ ] [ ]
F,P

Next messages posted on the mailbox will overwrite the ones already stored in the mailbox (slots 0-5) causing mailbox corruption.


Pull request type

[X] Patch update (Bug fix)
[ ] Feature update (New feature / Functionality change / New API)
[ ] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[ ] No Tests required for this change (E.g docs only update)
[ ] Covered by existing mbed-os tests (Greentea or Unittest)
[X] Tests / results supplied as part of this PR
  • mbed-os unit tests passed (patch baseline on tag: mbed-os-5.14.2).
  • GreenTea tests - not able to check due to some compilation issues on my building environment (not related to this patch).
  • The fix was tested on the K64F setup, where the bug was detected. The tests have been performed for various sizes of the receiving mailbox and NETBUF / PBUF memory pools. After each test a memory pool statistics were verified against memory leaks and double frees. No problems found.

@ciarmcom ciarmcom requested review from a team November 28, 2019 14:00
@ciarmcom
Copy link
Member

@pstolarz, thank you for your changes.
@ARMmbed/mbed-os-ipcore @ARMmbed/mbed-os-maintainers please review.

Copy link
Member

@bulislaw bulislaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, @kjbracey-arm please review

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'm suggesting a slight simplification on the maths.

(Looking at the surrounding EventFlags code, I do wonder why is this doing it the hard way. Isn't this the canonical case of when a Semaphore would actually be a good idea? I normally loathe semaphores, cos I hardly ever want to count things, but here we do.)

@pstolarz
Copy link
Contributor Author

pstolarz commented Dec 3, 2019

As a side note (improvement).
The current implementation may be considered ineffective in terms of mailbox slots usage - the mailbox is considered full if only MB_SIZE-1 slots are occupied. It would be more effective to have all MB_SIZE slots occupied for fully filled mailbox. In the latter case we will have fetch_idx == post_idx for full as well as empty mailbox. To distinguish between these the mailbox fetch/post flags are used as follows:

FETCH_EVENT    POST_EVENT    Notes
     0             1         Mailbox empty; fetch_idx == post_idx; Initial state
     1             1         Mailbox partially occupied, fetch_idx != post_idx
     1             0         Mailbox full; fetch_idx == post_idx
     0             0         Invalid state

Since the current implementation has a proper code, which guards the mailbox against fetching from an empty mailbox and posting to the full one, basing on the above flags, the only change required to implement this improvement is to update the post/trypost anti-overflow check to:

if (mbox->post_idx == mbox->fetch_idx)
    osEventFlagsClear(mbox->id, SYS_MBOX_POST_EVENT);

So now, post/trypost check is fully symmetric to fetch/tryfetch check against empty mailbox condition. That's all.

EDIT: sys_mbox_free() needs also to be updated.

@kjbracey
Copy link
Contributor

kjbracey commented Dec 4, 2019

In a circular buffer scenario, yes, you can add a bunch of logic to ensure that you can cope with a totally-full buffer. But the logic is generally going to be bigger than just increasing the buffer by one entry. Unless your buffer entries are unusually large, at least.

I'd consider adding logic if these were 1.5K Ethernet frame buffers, but I don't think it's worth worrying about it for 4-byte void * entries.

Although, as you note, we already have a bunch of event flags complexity there so we've possibly already paid the cost. If you want to submit it as a separate PR, we'll consider it, but I'd need to stare at it quite a while.

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note that astyle isn't caring about the formatting here - we'd normally want more braces. I guess this is being treated as "lwIP" code, so using lwIP style is fine.

@pstolarz
Copy link
Contributor Author

pstolarz commented Dec 4, 2019

It was just proposal - do whatever you want with it.

I'd consider adding logic if these were 1.5K Ethernet frame buffers, but I don't think it's worth worrying about it for 4-byte void * entries.

I don't agree. It's not a matter 1 extra sizeof(void*) but SW architecture quality. From OS ver 5.14.2 MB_SIZE is externally configurable via mbed_lib.json so is a part of user configuration API. As a user I'd expect if I set MB_SIZE to 8 I'll be able to store max. 8 messages - not 7 (since for some reason it'd be tricky to implement / require extra logic or so). Implementation shouldn't drive good architecture, but good implementation should follow the architecture in most effective way. So for example. if for some mailbox case it'd be extra implementation overhead to store MB_SIZE elements comparing to MB_SIZE-1 on MB_SIZE long buffer, implementation should use MB_SIZE+1 buffer internally just to make sure user is able to store MB_SIZE of messages as one expects.

But it was just side note. I found more fundamental issue related to LWIP mailbox sizes. I'll raise issue for it soon

@kjbracey
Copy link
Contributor

kjbracey commented Dec 4, 2019

Hmm, yes, despite our docs saying nothing, looking at the lwIP headers, they do clearly say that the MBOX_SIZE things are supposed to be the number of storable elements, so we should either be allocating that + 1 entries or having the full/empty logic.

(But then with either of those changes I'd probably reduce the config settings down to 7 to compensate, as we don't actually need to store an extra entry, afaik. If it works fine as it is...)

@0xc0170
Copy link
Contributor

0xc0170 commented Dec 4, 2019

CI started

@mbed-ci
Copy link

mbed-ci commented Dec 4, 2019

Test run: FAILED

Summary: 1 of 11 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_dynamic-memory-usage

@0xc0170
Copy link
Contributor

0xc0170 commented Dec 4, 2019

jenkins-ci/mbed-os-ci_dynamic-memory-usage restarted

@0xc0170 0xc0170 changed the title [BUGFIX] LWIP system mailbox overflow fix LWIP system mailbox overflow fix Dec 4, 2019
@0xc0170 0xc0170 added release-version: 6.0.0-alpha-1 First pre-release version of 6.0.0 and removed needs: CI labels Dec 4, 2019
@0xc0170 0xc0170 merged commit 9248169 into ARMmbed:master Dec 4, 2019
se7ensong pushed a commit to DNANUDGELTD/mbed-os that referenced this pull request Mar 12, 2020
LWIP system mailbox overflow fix

(cherry picked from commit 9248169)
se7ensong pushed a commit to DNANUDGELTD/mbed-os that referenced this pull request Mar 12, 2020
LWIP system mailbox overflow fix

(cherry picked from commit 9248169)
(cherry picked from commit 00ee497)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-version: 6.0.0-alpha-1 First pre-release version of 6.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants