PYTHON-4669 - Update More APIs for Motor Compatibility #1815
LGTM!
@@ -1484,6 +1486,17 @@ def __init__(
    _file: Any
    _chunk_iter: Any

    async def __anext__(self) -> bytes:
        return super().__next__()
This is incorrect. We can't call super().__next__() because that does blocking I/O.
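To illustrate the concern, here is a minimal sketch that is not taken from the PR: `run_demo`, `reader`, and `other_task` are hypothetical names, and `time.sleep` stands in for the blocking I/O. It records the order in which two concurrently scheduled tasks make progress when one of them blocks and when it awaits instead.

```python
import asyncio
import time


async def run_demo(blocking: bool) -> list:
    """Return the order in which two concurrently scheduled tasks make progress."""
    events = []

    async def reader() -> None:
        events.append("read start")
        if blocking:
            time.sleep(0.05)  # blocking call: the event loop cannot switch tasks
        else:
            await asyncio.sleep(0.05)  # cooperative wait: other tasks may run
        events.append("read done")

    async def other_task() -> None:
        events.append("other task ran")

    # gather() schedules reader() first, then other_task().
    await asyncio.gather(reader(), other_task())
    return events


blocking_order = asyncio.run(run_demo(blocking=True))
cooperative_order = asyncio.run(run_demo(blocking=False))
```

In the blocking case the second task only gets to run after the read has finished. In the cooperative case it runs while the read is still waiting.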
Hmm, good catch. We don't have an async equivalent here unless we write one ourselves.
IOBase implements __next__ using readline. From the Python documentation:
IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings). See readline() below.
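A small sketch of that behavior, using a hypothetical `LineSource` class rather than GridOut itself: a subclass of `io.IOBase` that only overrides `readline()` already supports `for` loops, because the inherited `__next__` calls `readline()` and stops when it returns an empty result.

```python
import io


class LineSource(io.IOBase):
    """Hypothetical stream that serves pre-split lines from memory."""

    def __init__(self, lines) -> None:
        self._lines = list(lines)

    def readline(self, size: int = -1) -> bytes:
        # Return the next line, or b"" to signal end of stream.
        return self._lines.pop(0) if self._lines else b""


# Iteration works without defining __iter__ or __next__ ourselves.
collected = list(LineSource([b"first\n", b"second\n"]))
```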
There isn't an asyncio version of readline, so we'd need to write our own. The canonical way to do so appears to be with threads (https://stackoverflow.com/questions/34699948/does-asyncio-supports-asynchronous-i-o-for-file-operations), at which point I question if the performance gained by not blocking the loop is more than the cost of thread overhead. The official CPython forums have similar concerns at the OS level: https://discuss.python.org/t/asyncio-for-files/31077/15.
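For reference, the thread-based approach mentioned above might look roughly like the sketch below. `readline_in_thread` is a hypothetical helper, not something in PyMongo, and `io.BytesIO` stands in for a real file. It hands the blocking `readline()` call to the event loop's default executor, so each call pays the thread hand-off cost the comment is worried about.

```python
import asyncio
import io


async def readline_in_thread(stream) -> bytes:
    """Run the blocking stream.readline() in the loop's default thread pool."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, stream.readline)


async def main() -> list:
    stream = io.BytesIO(b"first\nsecond\n")  # stand-in for a real file object
    return [
        await readline_in_thread(stream),
        await readline_in_thread(stream),
        await readline_in_thread(stream),  # past EOF, readline() returns b""
    ]


result = asyncio.run(main())
```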
    def __next__(self) -> bytes:
        return super().__next__()

    def __next__(self) -> bytes:  # noqa: F811, RUF100
Any way to avoid the duplicate def __next__(self) definitions?
This is a limitation of the synchro script: it will translate the async __anext__ into __next__, but we want to have a separate __next__ for the async class that raises an error. That explicit __next__ will also get ported to the synchronous class unfortunately, giving us the duplicate defs.
Right, but can we work around that? The duplicate code is strange to read. There's also a runtime perf cost to overriding a method just to call the superclass.
There isn't a simple way to work around it, no. We could change the definition to be less confusing, like this:
async def __anext__(self) -> bytes:
    return super().__next__()

if not _IS_SYNC:

    def __next__(self) -> bytes:  # noqa: F811, RUF100
        raise TypeError(
            "AsyncGridOut does not support synchronous iteration. Use `async for` instead"
        )
Which would synchronize to
def __next__(self) -> bytes:
    return super().__next__()

if not _IS_SYNC:

    def __next__(self) -> bytes:  # noqa: F811, RUF100
        raise TypeError("GridOut does not support synchronous iteration. Use `for` instead")
We can also add a comment explaining why the duplicate def exists.
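The pattern relies on a class body being ordinary executable code, so a `def` inside an `if` only takes effect when the condition holds. A self-contained sketch follows, using a hypothetical `make_reader_class` factory and an `is_sync` parameter in place of the module-level `_IS_SYNC` flag so that both outcomes can be shown side by side.

```python
def make_reader_class(is_sync: bool) -> type:
    """Build a toy reader class; is_sync stands in for the _IS_SYNC flag."""

    class Reader:
        def __iter__(self):
            return self

        def __next__(self) -> bytes:
            return b"chunk"

        # The class body runs top to bottom, so this second def replaces
        # the first one only when the condition is true.
        if not is_sync:

            def __next__(self) -> bytes:  # noqa: F811
                raise TypeError("synchronous iteration is not supported")

    return Reader


SyncReader = make_reader_class(is_sync=True)
AsyncOnlyReader = make_reader_class(is_sync=False)
```

With `is_sync=True` the first `__next__` survives. With `is_sync=False` the guarded definition overrides it and `next()` raises `TypeError`.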
Good idea, but how about:
if not _IS_SYNC:

    async def __anext__(self) -> bytes:
        return await self.readline()

    def __next__(self) -> bytes:  # noqa: F811, RUF100
        raise TypeError(
            "AsyncGridOut does not support synchronous iteration. Use `async for` instead"
        )
I forgot we had our own async readline, good catch. This looks like we'd read a single line, or the entire file if it isn't line-delimited. Is that the intended behavior for iteration here?
Yeah, that's what IOBase is supposed to do, and AsyncGridOut iteration should match the sync version. We also need to remove IOBase from the async class.
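A rough sketch of line-based async iteration that mirrors the IOBase behavior, under stated assumptions: `AsyncLineReader` is a hypothetical stand-in for AsyncGridOut, and its `readline()` is assumed to return `b""` at end of file. The snippets in this thread do not show how iteration terminates, so the `StopAsyncIteration` check is an addition here, not something confirmed by the PR.

```python
import asyncio


class AsyncLineReader:
    """Hypothetical stand-in for AsyncGridOut, backed by in-memory bytes."""

    def __init__(self, data: bytes) -> None:
        self._lines = data.splitlines(keepends=True)

    async def readline(self) -> bytes:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return self._lines.pop(0) if self._lines else b""

    def __aiter__(self) -> "AsyncLineReader":
        return self

    async def __anext__(self) -> bytes:
        line = await self.readline()
        if not line:
            # Assumption: an empty read means EOF, matching IOBase.__next__.
            raise StopAsyncIteration
        return line

    def __next__(self) -> bytes:
        raise TypeError("synchronous iteration is not supported; use `async for`")


async def collect(reader: AsyncLineReader) -> list:
    return [line async for line in reader]


lines = asyncio.run(collect(AsyncLineReader(b"a\nb\nc")))
unbroken = asyncio.run(collect(AsyncLineReader(b"no newline here")))
```

A file without any newline comes back as a single item, which is the behavior questioned and confirmed in the comments above.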
Done, follow-up PR: #1821.