
GH-125603: Don't count executing generators and coroutines as referrers in gc.get_referrers. #125640

Open
wants to merge 2 commits into
base: main
5 changes: 2 additions & 3 deletions Lib/test/test_asyncio/test_streams.py
@@ -1214,9 +1214,8 @@ async def main():
 # can't use assertRaises because that clears frames
 exc = excs.exceptions[0]
 self.assertIsNotNone(exc)
-self.assertListEqual(gc.get_referrers(exc), [main_coro])
-main_coro = main()
-asyncio.run(main_coro)
+self.assertListEqual(gc.get_referrers(exc), [])
+asyncio.run(main())


if __name__ == '__main__':
19 changes: 5 additions & 14 deletions Lib/test/test_asyncio/test_taskgroups.py
@@ -29,15 +29,6 @@ def get_error_types(eg):
     return {type(exc) for exc in eg.exceptions}


-def no_other_refs():
-    # due to gh-124392 coroutines now refer to their locals
-    coro = asyncio.current_task().get_coro()
-    frame = sys._getframe(1)
-    while coro.cr_frame != frame:
-        coro = coro.cr_await
-    return [coro]
-
-
 class TestTaskGroup(unittest.IsolatedAsyncioTestCase):

     async def test_taskgroup_01(self):
@@ -923,7 +914,7 @@ class _Done(Exception):
 exc = e

 self.assertIsNotNone(exc)
-self.assertListEqual(gc.get_referrers(exc), no_other_refs())
+self.assertListEqual(gc.get_referrers(exc), [])


async def test_exception_refcycles_errors(self):
@@ -941,7 +932,7 @@ class _Done(Exception):
 exc = excs.exceptions[0]

 self.assertIsInstance(exc, _Done)
-self.assertListEqual(gc.get_referrers(exc), no_other_refs())
+self.assertListEqual(gc.get_referrers(exc), [])


async def test_exception_refcycles_parent_task(self):
@@ -963,7 +954,7 @@ async def coro_fn():
 exc = excs.exceptions[0].exceptions[0]

 self.assertIsInstance(exc, _Done)
-self.assertListEqual(gc.get_referrers(exc), no_other_refs())
+self.assertListEqual(gc.get_referrers(exc), [])

async def test_exception_refcycles_propagate_cancellation_error(self):
"""Test that TaskGroup deletes propagate_cancellation_error"""
@@ -978,7 +969,7 @@ async def test_exception_refcycles_propagate_cancellation_error(self):
 exc = e.__cause__

 self.assertIsInstance(exc, asyncio.CancelledError)
-self.assertListEqual(gc.get_referrers(exc), no_other_refs())
+self.assertListEqual(gc.get_referrers(exc), [])

async def test_exception_refcycles_base_error(self):
"""Test that TaskGroup deletes self._base_error"""
@@ -995,7 +986,7 @@ class MyKeyboardInterrupt(KeyboardInterrupt):
 exc = e

 self.assertIsNotNone(exc)
-self.assertListEqual(gc.get_referrers(exc), no_other_refs())
+self.assertListEqual(gc.get_referrers(exc), [])


if __name__ == "__main__":
3 changes: 2 additions & 1 deletion Objects/genobject.c
@@ -58,7 +58,7 @@ gen_traverse(PyObject *self, visitproc visit, void *arg)
     PyGenObject *gen = _PyGen_CAST(self);
     Py_VISIT(gen->gi_name);
     Py_VISIT(gen->gi_qualname);
-    if (gen->gi_frame_state != FRAME_CLEARED) {
+    if (gen->gi_frame_state < FRAME_EXECUTING) {

Contributor

I think this is okay, but it also adds some additional constraints that will make the GC in the free threading build more fragile. I think we have to make sure that any gen/coro marked as FRAME_EXECUTING is on a PyThreadState's frame stack -- otherwise some deferred references may not be visible to the GC, and the objects they refer to may be collected while still in use.

  • There must not be any escaping calls between setting gi_frame_state = FRAME_EXECUTING and pushing the frame to the thread's stack.
  • There must not be any escaping calls between popping the frame from the stack and setting gi_frame_state to some other value. (i.e., exit_unwind -> clear_gen_frame)

Member Author

All those things are already true, because we rely on gi_frame_state == FRAME_EXECUTING to guard against sending to an already executing generator.
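
The guard mentioned here is observable from Python: resuming a generator that is already executing raises `ValueError`. The following is a minimal sketch of that behavior only; `make_self_resuming` and `holder` are illustrative names, not part of the PR.

```python
def make_self_resuming():
    """Build a generator whose body tries to resume itself."""
    holder = []  # lets the body reach its own generator object

    def body():
        try:
            # The generator is executing at this point, so resuming it
            # again is rejected by the interpreter.
            next(holder[0])
        except ValueError as err:
            yield str(err)

    gen = body()
    holder.append(gen)
    return gen


gen = make_self_resuming()
message = next(gen)
print(message)
```

On CPython the printed message is "generator already executing".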

Contributor

At shutdown we delete all PyThreadStates except for the main thread. Those threads could be running generators or coroutines stuck in some long running call (like a time.sleep()). We run the GC after that. That seems like it would be unsafe, because we're now hiding deferred _PyStackRef from the GC.
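
A small sketch of the situation described above, assuming nothing beyond the standard library: a generator blocked in `time.sleep()` on a worker thread is reported as running when inspected from the main thread. It only illustrates the state in question and does not attempt to reproduce a crash; `slow_generator` and the timing values are illustrative.

```python
import inspect
import threading
import time


def slow_generator(started):
    started.set()    # signal that the generator body is now executing
    time.sleep(0.5)  # long-running call made while the frame is executing
    yield "done"


started = threading.Event()
gen = slow_generator(started)
results = []

# Daemon thread, so it would not keep the interpreter alive at shutdown.
worker = threading.Thread(target=lambda: results.append(next(gen)), daemon=True)
worker.start()

started.wait()
state_while_blocked = inspect.getgeneratorstate(gen)

worker.join()
state_after_yield = inspect.getgeneratorstate(gen)

print(state_while_blocked, state_after_yield)
```

If the main thread exited while the worker was still inside `time.sleep()`, the generator would still be marked as executing when the interpreter shuts down, which is the case this comment is concerned with.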

Contributor

This test program segfaults in the free threading build with this PR:

https://gist.github.com/colesbury/11d59b9987e881a3c016b086bb4ba1ff

Member Author

If the generator is executing, it is part of the stack, not the heap.
So, I think tp_traverse should not be traversing executing generators.

@colesbury
How does the free-threading GC find references that are on the stack (in normal frames)?
We should do that for executing generators.

Contributor

Yes, during shutdown, we delete all PyThreadStates except for the main thread.

Member Author

Let me rephrase:
When deleting the thread state, how do we clean up the references?

Presumably, we should be changing the state of the generator when we clean up its frame but we aren't.

Member Author

To answer my own question: we don't clean up the references 🙁

Contributor

We call PyThreadState_Clear() on the deleted thread states, but that doesn't clean up current_frame. We don't call PyThreadState_Delete() for reasons that are not clear to me. Even if PyThreadState_Clear() cleaned up current_frame that wouldn't be sufficient because we unlink all the thread states before calling PyThreadState_Clear() and (by your definition) that already puts them in an invalid state.

This is all longstanding CPython behavior, as far as I can tell. Changing the shutdown behavior seems likely to cause different shutdown-related bugs than what we experience today.

I think we can work around this by encoding more knowledge about generators in gc_free_threading.c. I don't love that, but it seems less risky than messing with the shutdown behavior. Let me know if you want to go that route -- I can help with the gc_free_threading.c changes.

Member Author

I'd like to investigate fixing up thread state cleanup. While it may well risk introducing bugs in the short term, I think it would be better long term. It is hard to optimize anything if we can't trust our supposed invariants.

A quick fix, until we decide on how to handle this long term, would be to traverse the frame list when deleting threads and mark all generators as suspended.
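
The quick fix proposed above can be pictured with a toy model. This is a sketch in plain Python, not CPython code: `FrameState`, `ToyGenerator`, `ToyFrame`, and both functions are hypothetical, and the numeric state values are assumptions chosen only so that the ordering matches the `gi_frame_state < FRAME_EXECUTING` check in this PR.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class FrameState(IntEnum):
    # Assumed values; only the relative order matters for this sketch.
    CREATED = -2
    SUSPENDED = -1
    EXECUTING = 0
    COMPLETED = 1
    CLEARED = 2


@dataclass
class ToyGenerator:
    name: str
    state: FrameState


@dataclass
class ToyFrame:
    generator: Optional[ToyGenerator]  # None for an ordinary function frame
    previous: Optional["ToyFrame"]     # next frame down the thread's stack


def traverses_frame(gen):
    """Model of the patched guard: only non-executing, live frames are visited."""
    return gen.state < FrameState.EXECUTING


def mark_generators_suspended(top):
    """Walk a deleted thread's frame chain and mark executing generators suspended."""
    marked = []
    frame = top
    while frame is not None:
        gen = frame.generator
        if gen is not None and gen.state == FrameState.EXECUTING:
            gen.state = FrameState.SUSPENDED
            marked.append(gen.name)
        frame = frame.previous
    return marked


outer = ToyGenerator("outer", FrameState.EXECUTING)
inner = ToyGenerator("inner", FrameState.EXECUTING)
# Stack from top to bottom: inner generator, a plain function, outer generator.
stack = ToyFrame(inner, ToyFrame(None, ToyFrame(outer, None)))

before = [traverses_frame(g) for g in (outer, inner)]
marked = mark_generators_suspended(stack)
after = [traverses_frame(g) for g in (outer, inner)]
print(before, marked, after)
```

In this model, neither generator's frame is traversed while it is marked as executing; after the walk, both are marked suspended and become visible to traversal again.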

         _PyInterpreterFrame *frame = &gen->gi_iframe;
         assert(frame->frame_obj == NULL ||
                frame->frame_obj->f_frame->owner == FRAME_OWNED_BY_GENERATOR);
@@ -375,6 +375,7 @@ gen_close(PyObject *self, PyObject *args)

     if (gen->gi_frame_state == FRAME_CREATED) {
         gen->gi_frame_state = FRAME_COMPLETED;
+        _PyFrame_ClearLocals(&gen->gi_iframe);
         Py_RETURN_NONE;
     }
     if (FRAME_STATE_FINISHED(gen->gi_frame_state)) {