bpo-29883: Add UDP support to Windows Proactor Event Loop #1067

ameily · 2017-04-09T21:35:46Z

This PR adds UDP support to the Windows Proactor Event Loop. Several changes were performed:

Added wrappers around Winsock UDP functions:
- WSAConnect - ConnectEx doesn't support binding a UDP endpoint to a socket so I had to add this. WSAConnect completes immediately for UDP sockets.
- WSARecvFrom
- WSASendTo
Implement _make_datagram_transport() and the corresponding UDP transport class _ProactorDatagramTransport.
- Bound UDP endpoints can call WSARecv and WSASend. Unbound UDP endpoints require calls to WSARecvFrom and WSASendTo.
Several new unit tests exercising UDP proactor support.

I've verified that the change can be cleanly applied to master, 3.5, and 3.6. So, back porting this feature will be straightforward, which I would like personally since I require this feature on a Python 3.5 project.

One of the SSL unit tests is failing and I've verified that the same test fails with the same error in master.

cpython\lib\asyncio\sslproto.py:335: ResourceWarning: unclosed transport
<asyncio.sslproto._SSLProtocolTransport object at 0x00000171EABA8A90>
  source=self)

https://bugs.python.org/issue29883

mention-bot · 2017-04-09T21:35:48Z

@ameily, thanks for your PR! By analyzing the history of the files in this pull request, we identified @1st1, @zooba and @birkenfeld to be potential reviewers.

njsmith · 2017-04-10T02:29:34Z

You might be able to remove some code by using socket.connect in place of _overlapped.WSAConnect. Not sure.

If I'm reading this correctly, your sendto implementation kicks off a WSASendTo, then waits for it to complete, kicks off another one, waits for that to complete, etc., just like you'd expect from TCP, right? It turns out that this can be really really inefficient for overlapped UDP. Chrome actually switched from doing UDP via IOCP to doing it via non-blocking sockets because of this (in one of the test reports in that thread they were getting 250 KB/s with IOCP, 8000 KB/s with non-blocking I/O; that's not a typo. And note that testing on loopback doesn't work; you need to test on a real network.). The hypothesis is that this is because overlapped UDP effectively disables buffering, like, the completion notification doesn't mean "this packet was queued to the send buffer" like it does for overlapped TCP, but rather "this packet was actually transmitted on the network". Or something.

I'm not sure what the best solution is. AFAICT libuv's solution is that it just keeps starting WSASendTo calls without waiting for previous ones to complete, though this must eventually go wrong somehow if you keep queueing send calls faster than the network card can process them...

You might also find this interested as a general overview of the challenges of making IOCP work well: python-trio/trio#52

njsmith · 2017-04-10T02:32:28Z

Oops, forgot to include this link to more discussion of the chromium thing: https://groups.google.com/a/chromium.org/forum/#!topic/net-dev/VTPH1D8M6Ds

ameily · 2017-04-10T13:22:17Z

@njsmith Thanks for the feedback and resources.

You might be able to remove some code by using socket.connect in place of _overlapped.WSAConnect.

I can take a look at socket.connect() and see if that'll work instead of WSAConnect.

If I'm reading this correctly, your sendto implementation kicks off a WSASendTo, then waits for it to complete, kicks off another one, waits for that to complete, etc., just like you'd expect from TCP, right?

You are correct. I didn't want to run into the problem you described where we just spam the network with UDP packets. I can change it to only allow a certain number of outstanding WSASendTo operations at one time. If you're okay with this solution, what is an acceptable default number of outstanding send operations?

testing on loopback doesn't work; you need to test on a real network

Yup, I've been testing on a real network. I wrote a simple script to query a network's Windows systems NetBIOS name, which operates over UDP. The loopback address is only used in the unit tests, which I modeled after the TCP unit tests where nothing is actually written to the network.

I understand that IOCP UDP performance may not be the best but I think it's something that should be in the main CPython if possible. I need it on a project where I'm performing asynchronous TCP and UDP operations and the Selector event loop is adding a substantial amount of overhead (~80% in some cases). I can add a disclaimer in the documentation that IOCP sockets aren't the best option for high performance/traffic UDP applications.

I'll also take a look at the resources you linked to see if there are other ways I can improve the implementation.

njsmith · 2017-04-10T17:56:49Z

I can change it to only allow a certain number of outstanding WSASendTo operations at one time. If you're okay with this solution, what is an acceptable default number of outstanding send operations?

I don't know. It's something I have been trying to figure out as well :-). Like I said, AFAICT from reading the source libuv doesn't appear to put any upper bound on the number of outstanding send operations, so maybe that works?

There's even an argument that one should handle UDP by making non-blocking sendto calls and if it returns EWOULDBLOCK, silently discard the packet. That way you don't have to worry about select :-)

ameily · 2017-04-20T14:23:30Z

I've done some benchmarking to help determine the best design for high efficiency UDP over IOCP.

Each test was sending 1,048,576 packets to a single IP address. The IP has a static ARP rule set to hopefully improve measurement accuracy.
Each test has two applications:
- loop.call_soon(): UDP packets are sent one at a time with loop.call_soon(). This is to emulate a simple server where a UDP packet is sent as a response based on an incoming request from the event loop.
- Immediate: UDP packets are sent within the same loop callback (for i in range(...): transport.sendto(...)). This is to blast the network and fill the socket's outgoing queue.
Each test has two time measurements, using time.perf_counter().
- Time in Python: amount of time spent, in seconds, within Python from the first call to sendto to the last call to sendto.
- Actual Capture Time: amount of time, in seconds, from the first packet written on the wire to the last packet written on the wire.

Design	Application	Time in Python	Actual Capture Time	Notes
IOCP with buffering	`loop.call_soon()`	34.243	34.242	The current design where only a single `sendto` operation is active
IOCP with buffering	Immediate	0.965	31.692	The current design where only a single `sendto` operation is active
IOCP no buffering	`loop.call_soon()`	30.640	30.640	The modified current design where an unlimited number of `sendto` operations are active
IOCP no buffering	Immediate	22.014	20.329	The modified current design where an unlimited number of `sendto` operations are active
Sync	`loop.call_soon()`	23.058	23.058	Synchronous call to `sendto`
Sync	Immediate	13.896	13.896	Synchronous call to `sendto`

I was surprised that the synchronous design did not drop any packets and sendto never returned EWOULDBLOCK.

The benchmark script I wrote is here: https://gist.github.com/ameily/3b801a32b0c896a2c4791abcf56cc3e1. Each design I tested requires some modification of the core Python source code but I used the same script for each design.

larryhastings · 2017-12-08T21:56:20Z

FYI, since this is a new feature, it's not eligible for a backport to 3.5, or even 3.6. New features can't go into any released version of Python.

(unless Guido asks for it, which is rare)

1st1 · 2017-12-08T22:12:36Z

@larryhastings Yes, this will be a 3.7-only thing.

1st1 · 2017-12-08T22:13:34Z

@ameily Somehow I missed this PR :( Let's work on it and get it merged asap. Could you please rebase it against the current master?

ameily · 2017-12-16T18:45:00Z

I've rebased onto upstream master. Now, one of the tests is timing out:

test_sock_client_ops (test.test_asyncio.test_events.ProactorEventLoopTests) ... Timeout (0:20:00)!

I've narrowed it down to a call to sock_recv_into() which I did not modify for this PR. So, I don't think it's my code causing the issue, but I could be wrong.

ameily · 2018-09-15T18:12:50Z

This PR is now rebased onto upstream master.

vstinner · 2018-09-15T19:36:16Z

Amazing, thanks!

@vladima: Would you be interested to review this PR?

@asvetlov: Here is the PR that we were talking about ;-)

asvetlov · 2018-09-17T11:25:34Z

Cool!
Is it ready to review?

ameily · 2018-09-18T12:09:53Z

@asvetlov from my perspective, yes, this PR is ready for review.

zooba

Made a couple of linter-style comments, but apart from the borrowed references, it looks pretty good to me.

Modules/overlapped.c

ameily · 2018-09-22T15:37:01Z

@zooba I believe I addressed your feedback and made my changes PEP 7 compliant.

Lib/asyncio/proactor_events.py

ameily · 2018-09-26T21:34:14Z

@asvetlov Sorry, force of habit. I changed my calls to just be super().

1st1

One thing I'd like to fix in this PR is to remove mock tests completely and replace them with proper functional tests. Most of existing mock tests in asyncio will either be removed before 3.8 or replaced with proper functional tests.

Mock tests are tightly coupled with the implementation and prevent us from freely refactoring it. IMO It's better to have one or two functional tests than an array of mock tests.

I recommend you to take a the uvloop project and copy the UDP tests from there and maybe come up with a couple more, if needed.

ameily · 2019-05-02T22:22:09Z

@vstinner I believe I fixed the memory leak. I also rebased onto upstream/master.

python  -m test test_asyncio -R 3:3 -m test.test_asyncio.test_events.ProactorEventLoopTests.test_create_datagram_endpoint

0:00:00 load avg: 0.00 [1/1] test_asyncio
beginning 6 repetitions
123456
......
test_asyncio passed

== Tests result: SUCCESS ==

1 test OK.

Total duration: 3 sec 282 ms
Tests result: SUCCESS

vstinner · 2019-05-06T14:30:26Z

@vstinner I believe I fixed the memory leak. I also rebased onto upstream/master.

Strange. I don't know why it leaks on my Windows VM.

This PR has other flaws:

protocol is not paused/resumed
_proactor.recv() is called with an hardcoded size: 4096 bytes. It should be possible to configure that. Jumbo frames allows up to 9000 bytes per network packet. By the way, how did you pick the number 4096? _SelectorDatagramTransport uses 256 KiB by default: maybe it's way too much, I don't know.
repr() uses len(self._buffer) which is wrong for a collections.deque
not sure if it's an issue: abort() method is missing. Is it part of asyncio API or not?

I have a WIP change on top of this PR, but I didn't have time to finish it yet :-(

asvetlov · 2019-05-07T16:24:35Z

@vstinner could you provide a link to your WIP?
It can help a lot.

I think we should be consistent.
Let's use 256 KiB for both Unix and Windows implementations.
Can change it later in the one place.
Putting this constant into asyncio/constants.py would be nice.

abort() is a part of DatagramTransport interface.

alexchandel · 2019-05-14T19:06:51Z

@vstinner which function originates the leak? afaict Overlapped_getresult does not leak.

alexchandel · 2019-05-24T23:12:21Z

@ameily looks like it failed some tests in test_asyncio, all similar to:

======================================================================
ERROR: test_fatal_error_connected (test.test_asyncio.test_proactor_events.ProactorDatagramTransportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/vsts/agent/2.150.3/work/1/s/Lib/unittest/mock.py", line 1226, in patched
    return func(*args, **keywargs)
  File "/Users/vsts/agent/2.150.3/work/1/s/Lib/test/test_asyncio/test_proactor_events.py", line 1008, in test_fatal_error_connected
    transport = self.datagram_transport(address=('0.0.0.0', 1))
  File "/Users/vsts/agent/2.150.3/work/1/s/Lib/test/test_asyncio/test_proactor_events.py", line 740, in datagram_transport
    transport = _SelectorDatagramTransport(self.loop, self.sock,
NameError: name '_SelectorDatagramTransport' is not defined

asvetlov · 2019-05-28T09:53:26Z

Closing.
The work is finished by #13440
Thank you very much @ameily and all reviewers

Follow-up for #1067 https://bugs.python.org/issue29883

Follow-up for python#1067 https://bugs.python.org/issue29883

the-knights-who-say-ni added the CLA signed label Apr 9, 2017

asvetlov added the topic-asyncio label Dec 8, 2017

1st1 self-requested a review December 8, 2017 22:12

1st1 requested review from asvetlov and vstinner December 8, 2017 22:13

ameily force-pushed the fix-issue-29883 branch from c979273 to 9fc145b Compare December 16, 2017 03:42

brettcannon added the awaiting review label Feb 21, 2018

ameily force-pushed the fix-issue-29883 branch from 53ab614 to ec3a18b Compare September 15, 2018 15:29

zooba reviewed Sep 19, 2018

View reviewed changes

Modules/overlapped.c Outdated Show resolved Hide resolved

Modules/overlapped.c Outdated Show resolved Hide resolved

Modules/overlapped.c Outdated Show resolved Hide resolved

Modules/overlapped.c Show resolved Hide resolved

asvetlov reviewed Sep 26, 2018

View reviewed changes

Lib/asyncio/proactor_events.py Outdated Show resolved Hide resolved

vstinner mentioned this pull request Sep 28, 2018

bpo-34687: Update asyncio doc for ProactorEventLoop #9623

Merged

1st1 self-assigned this Sep 28, 2018

1st1 requested changes Sep 28, 2018

View reviewed changes

bedevere-bot removed the awaiting review label Sep 28, 2018

ameily and others added 18 commits May 2, 2019 18:17

fixed UDP proactor tests

668d6d7

add missing cast to PyCFunction

05a704c

implement extra data for proactor UDP transport

685ed0f

enable UDP proactor unit tests on Windows

038739e

fix bad rebase of datagram proactor tests

ae14118

add news item for bpo29883

93ce3f8

fix bad rebase with missing finish_recv() in recv_into()

781e2ef

change WSAConnect to a module method

8761fcd

PEP 7 fixes and address PR feedback

5e3d4d3

fix build error

d25833d

simplify calls to super()

16d2ce7

address PR feedback

da40f28

Update asyncio-eventloop.rst

2962f5c

address feedback from PR

ce73a32

fix bad rebase

4211be9

address PR feedback

504ae83

fix memory leak

58e9f8d

fix memory leak when receiving on UDP endpoint

0c935b0

ameily force-pushed the fix-issue-29883 branch from 9c9f7d2 to 0c935b0 Compare May 2, 2019 22:17

eukreign mentioned this pull request May 8, 2019

[WIP] use Windows Pipes on Windows for more secure RPC lbryio/lbry-sdk#1955

Closed

asvetlov mentioned this pull request May 20, 2019

bpo-29883: Asyncio proactor udp #13440

Merged

asvetlov closed this May 28, 2019

miss-islington pushed a commit that referenced this pull request May 28, 2019

bpo-29883: Asyncio proactor udp (GH-13440)

bafd4b5

Follow-up for #1067 https://bugs.python.org/issue29883

DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020

bpo-29883: Asyncio proactor udp (pythonGH-13440)

8e8629a

Follow-up for python#1067 https://bugs.python.org/issue29883

Uh oh!

bpo-29883: Add UDP support to Windows Proactor Event Loop #1067

bpo-29883: Add UDP support to Windows Proactor Event Loop #1067

Uh oh!

Conversation

ameily commented Apr 9, 2017 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mention-bot commented Apr 9, 2017

Uh oh!

njsmith commented Apr 10, 2017

Uh oh!

njsmith commented Apr 10, 2017

Uh oh!

ameily commented Apr 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

njsmith commented Apr 10, 2017

Uh oh!

ameily commented Apr 20, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

larryhastings commented Dec 8, 2017

Uh oh!

1st1 commented Dec 8, 2017

Uh oh!

1st1 commented Dec 8, 2017

Uh oh!

ameily commented Dec 16, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ameily commented Sep 15, 2018

Uh oh!

vstinner commented Sep 15, 2018

Uh oh!

asvetlov commented Sep 17, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ameily commented Sep 18, 2018

Uh oh!

zooba left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ameily commented Sep 22, 2018

Uh oh!

Uh oh!

ameily commented Sep 26, 2018

Uh oh!

1st1 left a comment

Choose a reason for hiding this comment

Uh oh!

ameily commented May 2, 2019

Uh oh!

vstinner commented May 6, 2019

Uh oh!

asvetlov commented May 7, 2019

Uh oh!

alexchandel commented May 14, 2019

Uh oh!

alexchandel commented May 24, 2019

Uh oh!

asvetlov commented May 28, 2019

Uh oh!

Uh oh!

ameily commented Apr 9, 2017 •

edited by bedevere-bot

Loading

ameily commented Apr 10, 2017 •

edited

Loading

ameily commented Apr 20, 2017 •

edited

Loading

ameily commented Dec 16, 2017 •

edited

Loading

asvetlov commented Sep 17, 2018 •

edited

Loading