Skip to content

MIMXRT1050: Fix ENET issues #10299

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 4, 2019
Merged

Conversation

mmahadevan108
Copy link
Contributor

@mmahadevan108 mmahadevan108 commented Apr 2, 2019

Description

This is a fix for Issue#10239. The change aligns the receive buffer lengths

Pull request type

[x] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

Release Notes

@ciarmcom ciarmcom requested review from a team April 2, 2019 23:00
@ciarmcom
Copy link
Member

ciarmcom commented Apr 2, 2019

@mmahadevan108, thank you for your changes.
@ARMmbed/mbed-os-maintainers @ARMmbed/mbed-os-ipcore please review.

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves some things undone:

  • Alignment needs to be 32 bytes
  • You also need to invalidate the cache for a buffer before assigning it to a RX descriptor (in update _read_buffer?). This prevents writeback into the region during reception.
  • You need to clean the cache for a buffer when adding a TX descriptor.

I still don't really understand the original issue - it seems to me like it should be performing much worse due to the lack of invalidates and cleans, and the absence of them doesn't explain the crash - it would just lead to data corruption.

I'm suspicious that the caches have just been disabled somewhere (like other M7 platforms have done when turning on Ethernet...)

@@ -189,7 +192,8 @@ bool Kinetis_EMAC::low_level_init_successful()

/* Create buffers for each receive BD */
for (i = 0; i < ENET_RX_RING_LEN; i++) {
rx_buff[i] = memory_manager->alloc_heap(ENET_ETH_MAX_FLEN, ENET_BUFF_ALIGNMENT);
rx_buff[i] = memory_manager->alloc_heap(ENET_ALIGN(ENET_ETH_MAX_FLEN, ENET_BUFF_ALIGNMENT),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was inspecting it, I didn't think an extra ENET_ALIGN was necessary, as you're already passing in ENET_BUFF_ALIGNMENT, which should mean alloc_heap rounds up itself.

But because of the implementation, this will end up adding a couple more bytes, and maybe that makes the difference if we're overrunning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This analysis was incorrect, as per comment on #10239 - alloc_heap does not in fact round up the size.

@@ -278,8 +282,14 @@ emac_mem_buf_t *Kinetis_EMAC::low_level_input(int idx)
p = rx_buff[idx];
memory_manager->set_len(p, length);

#if defined(FSL_SDK_ENABLE_DRIVER_CACHE_CONTROL) && FSL_SDK_ENABLE_DRIVER_CACHE_CONTROL
/* Add the cache invalidate maintain. */
DCACHE_InvalidateByRange((uint32_t)bdPtr->buffer, g_handle.rxBuffSizeAlign[0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are going to be cache invalidating, I believe the RX buffer alignment will need to be increased from 4 to 32 (FSL_FEATURE_L1DCACHE_LINESIZE_BYTE)

@@ -1917,6 +1917,7 @@
"SKIP_SYSCLK_INIT",
"FSL_FEATURE_PHYKSZ8081_USE_RMII50M_MODE",
"SDRAM_IS_SHAREABLE",
"FSL_SDK_ENABLE_DRIVER_CACHE_CONTROL=1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about turning this on. This activates code in fsl_enet.c which mostly isn't used, but I can see one incorrect one at line 812:

       DCACHE_InvalidateByRange((uint32_t)rxBuffer,  (buffCfg->rxBdNumber * rxBuffSizeAlign));

That's assuming rxBuffer is a contiguous array of packets (as its pointer type suggests), when really it's pointing to an array of pointers to packets, and accessed via insane casting:

       for (count = 0; count < buffCfg->rxBdNumber; count++)
        {
            /* Set data buffer and the length. */
            curBuffDescrip->buffer = (uint8_t *)(*((uint32_t *)(rxBuffer + count * 4)));

So that invalidate could be trashing quite a lot of system workspace.

You actually want

       DCACHE_InvalidateByRange((uint32_t)curBuffDescrip->buffer,  rxBuffSizeAlign);

inside the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for highlighting this, I will raise a ticket against the SDK driver.

This is a fix for Issue 10239. The change aligns
the receive buffer lengths

Signed-off-by: Mahesh Mahadevan <[email protected]>
Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks correct to me now.

Follow-up tasks would be:

  1. check other drivers for the same issue - several have the same logic flaw but I guess get away with it because their alignment is only 4, which is the same as the heap.
  2. Put in the data cache cleans and invalidates and turn that shareable flag off again so you get your data cache back into use. The i.MX RT1050's data cache has been effectively disabled since the Ethernet driver was added to the tree.

@0xc0170
Copy link
Contributor

0xc0170 commented Apr 4, 2019

CI started

@mbed-ci
Copy link

mbed-ci commented Apr 4, 2019

Test run: FAILED

Summary: 1 of 13 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_cloud-client-test

@0xc0170
Copy link
Contributor

0xc0170 commented Apr 4, 2019

I restarted client error - k64f target error - not related

@0xc0170 0xc0170 merged commit 71c84e8 into ARMmbed:master Apr 4, 2019
@unsignedint
Copy link

@0xc0170 I'm not sure what your release plan is... but it would be ideal if this and #10314 could go into 5.11 as a maintenance release. For our project we already have custom hardware and to upgrade to 5.12 at this point is non-trivial. Of course we can take it as an independent patch, but potentially others may also benefit from a 5.11 point update at some point? :)

@0xc0170
Copy link
Contributor

0xc0170 commented Apr 5, 2019

Please take them as independent patches. Only patch releases for 5.12 at the moment.

Note, besides our release plan, 5.12 introduced non-backward compatible changes (toolchain updates), testing older releases would take quite a time (CI switch and so on). We will do some improvements there soon, don't have anything solid to provide.

@mmahadevan108 mmahadevan108 deleted the Fix_MXRT1050_ENET branch April 5, 2019 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants