PYTHON-4669 - Update More APIs for Motor Compatibility #1815

Merged
merged 2 commits into from
Aug 27, 2024

NoahStapp
Contributor

No description provided.

@blink1073 blink1073 requested review from blink1073 and removed request for Jibola August 27, 2024 17:26
Member

@blink1073 blink1073 left a comment

LGTM!

@NoahStapp NoahStapp merged commit 81ea92b into mongodb:master Aug 27, 2024
34 checks passed
@NoahStapp NoahStapp deleted the PYTHON-4669 branch August 27, 2024 17:38
@@ -1484,6 +1486,17 @@ def __init__(
    _file: Any
    _chunk_iter: Any

    async def __anext__(self) -> bytes:
        return super().__next__()
Member

This is incorrect. We can't call super().__next__() because that does blocking I/O.

Contributor Author

Hmm, good catch. We don't have an async equivalent here unless we write one ourselves.

Member

IOBase implements __next__ using readline:

IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings). See readline() below.

https://docs.python.org/3/library/io.html#io.IOBase
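
As a small illustration of the quoted behavior, the sketch below iterates an in-memory binary stream. `io.BytesIO` is used only as a convenient IOBase subclass; it stands in for a GridFS file and is not part of this PR. Each iteration yields the same chunk `readline()` would return, so a stream with no newline comes back as a single item.

```python
import io

# A binary stream with two newline-terminated lines and a trailing fragment.
stream = io.BytesIO(b"first\nsecond\nlast")

# Iteration yields one line per step, newline included when present.
lines = list(stream)

# A stream with no newline at all yields its whole content as one "line".
single = list(io.BytesIO(b"no newline here"))
```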

Contributor Author

@NoahStapp NoahStapp Aug 29, 2024

There isn't an asyncio version of readline, so we'd need to write our own. The canonical way to do so appears to be with threads (https://stackoverflow.com/questions/34699948/does-asyncio-supports-asynchronous-i-o-for-file-operations), at which point I question whether the performance gained by not blocking the loop outweighs the cost of thread overhead. The official CPython forums raise similar concerns at the OS level: https://discuss.python.org/t/asyncio-for-files/31077/15.
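
For reference, the thread-based approach mentioned above might look roughly like the sketch below. It assumes Python 3.9+ for `asyncio.to_thread`; the helper names `readline_in_thread` and `read_all_lines` are hypothetical and do not exist in PyMongo. It shows only the shape of the technique and says nothing about whether the thread overhead is worth paying.

```python
import asyncio
import io


async def readline_in_thread(stream: io.BufferedIOBase) -> bytes:
    """Run the blocking readline() in a worker thread (hypothetical helper)."""
    return await asyncio.to_thread(stream.readline)


async def read_all_lines(stream: io.BufferedIOBase) -> list[bytes]:
    lines = []
    while True:
        line = await readline_in_thread(stream)
        if not line:  # empty bytes means end of stream
            break
        lines.append(line)
    return lines


# BytesIO stands in for a real file object here.
result = asyncio.run(read_all_lines(io.BytesIO(b"a\nb\n")))
```

Every call hops to the default thread pool and back, which is the per-read cost being weighed against blocking the event loop.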

    def __next__(self) -> bytes:
        return super().__next__()

    def __next__(self) -> bytes:  # noqa: F811, RUF100
Member

Any way to avoid the duplicate def __next__(self) definitions?

Contributor Author

This is a limitation of the synchro script: it will translate the async __anext__ into __next__, but we want to have a separate __next__ for the async class that raises an error. That explicit __next__ will also get ported to the synchronous class unfortunately, giving us the duplicate defs.
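
To make the mechanism concrete, here is a toy model of that translation. It is a heavily simplified assumption: `toy_synchro` and its replacement table are hypothetical, and the real synchro script is not shown in this thread and may work differently. The point is only that once `__anext__` is renamed to `__next__`, the hand-written `__next__` becomes a second definition with the same name.

```python
import re

ASYNC_SOURCE = '''\
class AsyncGridOut:
    async def __anext__(self) -> bytes:
        return super().__next__()

    def __next__(self) -> bytes:
        raise TypeError("use `async for` instead")
'''

# Hypothetical, simplified replacements; not the real synchro rules.
REPLACEMENTS = {
    "AsyncGridOut": "GridOut",
    "__anext__": "__next__",
    "async def": "def",
}


def toy_synchro(source: str) -> str:
    """Apply each textual replacement in order (toy model only)."""
    for old, new in REPLACEMENTS.items():
        source = source.replace(old, new)
    return source


sync_source = toy_synchro(ASYNC_SOURCE)
# Count how many methods named __next__ the generated class now defines.
duplicate_defs = len(re.findall(r"def __next__\(", sync_source))
```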

Member

Right, but can we work around that? The duplicate code is strange to read. There's also a runtime perf cost to overriding a method just to call the superclass.

Contributor Author

There isn't a simple way to work around it, no. We could change the definition to be less confusing, like this:

    async def __anext__(self) -> bytes:
        return super().__next__()

    if not _IS_SYNC:

        def __next__(self) -> bytes:  # noqa: F811, RUF100
            raise TypeError(
                "AsyncGridOut does not support synchronous iteration. Use `async for` instead"
            )

Which would synchronize to

    def __next__(self) -> bytes:
        return super().__next__()

    if not _IS_SYNC:

        def __next__(self) -> bytes:  # noqa: F811, RUF100
            raise TypeError("GridOut does not support synchronous iteration. Use `for` instead")

We can also add a comment explaining why the duplicate def exists.

Member

Good idea, but how about:

    if not _IS_SYNC:
        async def __anext__(self) -> bytes:
            return await self.readline()

        def __next__(self) -> bytes:  # noqa: F811, RUF100
            raise TypeError(
                "AsyncGridOut does not support synchronous iteration. Use `async for` instead"
            )

Contributor Author

I forgot we had our own async readline, good catch. With this, each iteration would read a single line, or the entire remaining file if it isn't line-delimited. Is that the intended behavior for iteration here?

Member

Yeah, that's what IOBase is supposed to do, and AsyncGridOut iteration should match the sync version. We also need to remove IOBase from the async class.
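
A minimal sketch of async iteration built on an async `readline`, matching the line-by-line behavior described above. `ToyAsyncGridOut` is a hypothetical in-memory stand-in, not the real AsyncGridOut, and the empty-bytes check that ends iteration is an assumption modeled on how IOBase stops when `readline()` returns `b""`; the follow-up PR may handle termination differently.

```python
import asyncio


class ToyAsyncGridOut:
    """Hypothetical stand-in for AsyncGridOut backed by an in-memory buffer."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    async def readline(self) -> bytes:
        # Return bytes up to and including the next newline, or the rest.
        end = self._data.find(b"\n", self._pos)
        end = len(self._data) if end == -1 else end + 1
        line = self._data[self._pos:end]
        self._pos = end
        return line

    def __aiter__(self) -> "ToyAsyncGridOut":
        return self

    async def __anext__(self) -> bytes:
        line = await self.readline()
        if not line:
            # Assumed termination rule: stop once readline() returns b"".
            raise StopAsyncIteration
        return line

    def __next__(self) -> bytes:
        # Synchronous iteration is rejected, as proposed in the thread.
        raise TypeError("Use `async for` instead")


async def collect(data: bytes) -> list[bytes]:
    return [line async for line in ToyAsyncGridOut(data)]


lines = asyncio.run(collect(b"one\ntwo\nthree"))
```

Because the class no longer inherits from IOBase in this sketch, nothing in it performs blocking reads on the event loop.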

Contributor Author

@NoahStapp NoahStapp Aug 29, 2024

Done, follow-up PR: #1821.

3 participants