-
Notifications
You must be signed in to change notification settings - Fork 3k
Don't send events on close() #3374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes #3367 |
These socket functions are tricky (just looking at the git blame has a different commit for almost every single line), and I'm a little hesitant on having the close event only trigger sometimes (it will still trigger when a socket is in a recv/send loop), but having the initial state of the socket = state after a successful receive is logically sound to me. This patch seems good to go 👍 |
Ah! just commented on #3367 There isn't a "close" event. The event generated by close is really a "recv/send" event to inform the user that the result of a recv call has changed (in this case erroring since the socket is closed). |
Excellent, sounds like we are all in agreement then! 😃 |
Actually, could this be updated to be consistent in all three socket classes? |
Done. I force-pushed the modified version that changes |
/morph test-nightly |
Result: FAILUREYour command has finished executing! Here's what you wrote!
OutputTest failed! |
@bogdanm The test reports Restarting , just to be certain. I checked nighly 3 days ago, was green. /morph test-nightly |
Please do not merge yet. There might still be issues with this PR. |
Let us know. If this is critical, it shall be integrated within today. Can you @bogdanm specify what are the issues? |
It's currently possible to generate a socket event when a non-blocking socket is closed: 1. _pending is set to 0 in https://github.com/ARMmbed/mbed-os/blob/master/features/netsocket/TCPSocket.cpp#L22 when the socket is created. 2. close() calls event() in https://github.com/ARMmbed/mbed-os/blob/master/features/netsocket/Socket.cpp#L66 3. event() increments _pending, and since _pending is 1 it will call _callback() in https://github.com/ARMmbed/mbed-os/blob/master/features/netsocket/TCPSocket.cpp#L167 However, if send() (for example) is called, this can happen: - send() is called and sets _pending to 0. - when the data is sent, event() is called, which sets _pending to 1 and calls _callback(). - if close() is called at this point, there won't be an event generated for close() anymore, since _pending will be set to 2. Same thing for recv. Also, same thing for TCPServer and UDPSocket. This PR changes the initial value of _pending to 1 instead of 0, so that events are never generated for close().
Alright, pushed yet another version that actually sets |
/morph test-nightly |
Result: FAILUREYour command has finished executing! Here's what you wrote!
|
/morph test-nightly |
I think I might've confused the CI by doing a force push ? |
It should handle force pushes correctly, I'm hoping this was just a spurious issue |
Result: SUCCESSYour command has finished executing! Here's what you wrote!
OutputAll builds and test passed! |
Yep looks like it, should be good to go! @0xc0170 |
Result: FAILUREYour command has finished executing! Here's what you wrote!
OutputTest failed! |
Looks like this failed because of the same problematic NIST test. |
mbed client guys are seeing networking issues with 5.3.2 and lwIP Ethernet, and analysis so far seems to suggest they're caused by one of this change or #3457 or #3458. Trying to get to the bottom of it, and can't actually see a way any of these could directly cause their problem, either in isolation or in combination - it's possible there's just a pre-existing unreliability, either in NSAPI or the app. But while going over all three PRs with a fine-tooth comb to investigate, this change stood out as wrong to me. By setting it to non-zero on start-up, you're suppressing events until they make a network call that sets _pending to zero. After that, original behaviour resumes. That will certainly have the effect of making an immediate close() after open not generate an callback, but it hits both more and less than you were intending. AFAICT, TCPServer sockets that just listen won't get a connection callback, UDPSockets that just bind and just wait for incoming data won't get a receive callback. And it doesn't totally suppress events from close - it's still possible that close could send an event. (eg close immediately after non-blocking send). I think the correct fix is to set _callback to NULL in the actual close call, before calling event(), to always suppress the close callback, without affecting the rest of the event system. Does this analysis sound correct? If it is, it must mean we don't have a test coverage hole - no non-blocking server-type use. (CC @mikaleppanen who's checking that) Still, I don't see how this change affects the observed problem. (And I'm also a bit wary that this _pending mechanism might be suppressing too many events, what with a wide variety of activity and events for both RX and TX directions. I kind of wonder whether we should be making sure _pending gets set to 0 on every single socket operation. It's vaguely possible something like that might be the mbed client problem.) |
That analysis sounds correct to me. My earlier comment on the initial state of the socket seems wrong now. On a leave from a recv, the _pending flag should be zero unless a packet was recved during the call. Either way, zeroing the callback is a much more robust solution, will make a pr when I can. |
I think we got down to the basic mbed-client update problem - they were inadvertently relying on lwIP's "connected" event being suppressed by this change. #3457 made the connect() call zero |
Ah! That's good to know. Given the netsocket api/implementations, spurious interrupts are still a very real possibility. Consider what would happen if a small packet was recieved between these two lines: Unfortunately, even if we strictly controlled when interrupts were let through, the netsocket api would need a bit of restructuring/testing before we could garuntee when interrupts occur. State transistions based off the socket's attach function are probably a programming error and should instead be based of the return value of the socket's send/recv. @kjbracey-arm, thanks for clearing this up. |
Thank you @kjbracey-arm ! 👍 |
Agreed - abstraction layers can't control when the underlying stack generates events, and any attempt to interfere to improve the situation can make the situation worse. As this change did. I did see another report of a bad pattern - calling send() an HTTP request, expecting a subsequent event from it, and hence not bothering to do anything when that event was received. Then when response data arrived, its event is suppressed because It' s arguable that _pending is being unhelpful rather than helpful in that case, but the application is definitely in the wrong - it can't expect distinct "sent" and "received" events. (I would argue you can't necessarily expect "sent" events at all, unless you know you filled the output buffer and got EWOULDBLOCK). |
The intention of the I was thinking of how to help users avoid these mistakes in the future. One option would be to change the |
That's what I kind of figured - the ESP8266 or similar attaching it straight to the serial interrupt. I can't help but feel that such stacks should be exercising a little more constraint. My gut feeling is that an abstraction layer choosing to suppress events from lower layers (by deducing which ones are spurious) is potentially error-prone, and ESP8266 should not rely on that behaviour. |
Maybe, though it wouldn't be the easiest change to the ESP8266 driver for it to understand what type of events it's recieving (there is currently no state, the driver is just reactive to interrupts). The ESP8266 could do the coalescing, but it's convenient to have coalesce in the mbed socket layer since the code can be reused accross any combination of network stack + user code. |
I'm also concerned we may be making a documentation error. It seems the name/usage of callbacks/similarity to other attach functions in mbed is causing quite a bit of confusion on what the attach function actually provides. This may be remedied by introducing higher-level stream based apis. Would a select function be more useful to the client team? (I can't say we're going to provide one soon, and the attach function probably won't go away, but I'm curious if it's worth consideration). |
As it happens, I am working on a revision of the FileHandle/Stream stuff for other reasons - there is certainly overlap here, including the possiblity of sockets being FileHandles, and I want to discuss that. Even with that though, it's not clear that majorly any improved event functionality is implementable in general on the sort of network stacks we have to deal with. I'm thinking we need a solid "poll" for FileHandle, and am sure we could do that for driver-based devices, but no idea how many network stacks could reliably provide that. I still think the extremely restricted callback semantics we've already specified are probably the lowest common denominator. The only real ESP8266 event is packet reception isn't it? The chip spontaneously sends data when a packet arrives. So a serial interrupt is fine for that, except you should possible be masking off serial interrupts except when idle, so issuing socket calls doesn't generate events. On the other hand, there is no "socket has become writable" event. I guess people just haven't noticed that one. |
Don't send events on close()
It's currently possible to generate a socket event when a non-blocking socket is closed:
when the socket is created.
However, if send() (for example) is called, this can happen:
since _pending will be set to 2.
(same thing for recv)
This PR changes the initial value of _pending to 1 instead of 0, so that
events are never generated for close().