-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-17611: Move unwinding of stack from interpreter to compiler #4682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
+1 Overall, this looks like a net win. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing the eval loop changes, my main comment is that we can't safely make assertions about the expected eval stack state when a particular bytecode is executed, since synthetically constructed bytecode may fail to meet those expectations in weird and exciting ways.
Instead, we should use explicit expectation checks, and raise SystemError
when they're not met.
(Note: I haven't reviewed the compiler changes yet)
Lib/test/test_peepholer.py
Outdated
@@ -261,7 +261,7 @@ def f(cond1, cond2): | |||
self.assertNotInBytecode(f, 'JUMP_ABSOLUTE') | |||
returns = [instr for instr in dis.get_instructions(f) | |||
if instr.opname == 'RETURN_VALUE'] | |||
self.assertEqual(len(returns), 6) | |||
self.assertLessEqual(len(returns), 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this truly non-deterministic now? Or should the expected number of return opcodes be modified instead of making the check more permissive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looks to me like that check got changed in an old version of the patch and maybe isn't needed. I'm going to review all the unit test changes and see if they are still necessary with the final version of the patch. Antoine went through quite a few revisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this part of the patch can be reverted. self.assertEqual(len(returns), 6) works and I see no reason why the number of return statements should be indeterminate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the change below, "len(returns), 2", is the one that can be reverted. This one now returns 5. I believe the reason is that there is no longer a SETUP_LOOP opcode and the peephole op is now eliminating (correctly, as far as I see), the "return 6" at the end of the function body. So, should always be 5 return statements in bytecode now.
Lib/test/test_peepholer.py
Outdated
@@ -275,7 +275,7 @@ def f(cond1, cond2): | |||
self.assertEqual(len(returns), 1) | |||
returns = [instr for instr in dis.get_instructions(f) | |||
if instr.opname == 'RETURN_VALUE'] | |||
self.assertEqual(len(returns), 2) | |||
self.assertLessEqual(len(returns), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for previous comment - can we adjust the number of expected returns here instead of relaxing the check?
Python/ceval.c
Outdated
PyObject *tb = POP(); | ||
if (exc == NULL) { | ||
int i; | ||
assert(val == NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be regular validity checks, rather than assertions - bytecode editing may inject a RERAISE
at a point where there isn't a valid exception state on top of the stack, and that should give a Python exception rather than a C level assertion failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised to hear that. Do we really expect ceval to be resistant to crashing if fed arbitrary bytecode? That seems a tough standard to meet and would seem to imply a lot of overhead in checking that the bytecode doesn't do anything unsafe (e.g. re whats on the stack). If that is the case, I expect this patch needs a lot of fixing yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't always check directly in ceval - we'll often rely on the fact that ceval calls public CPython C APIs, and those already have their own error checking.
But we do have a lot of checks for "impossible" situations (search for SystemError in ceval.c), and the general principle of "bytecode hacking may cause an exception, but it won't crash the process" applies. (That said, there are some existing assert
calls related to the stack state for function calls, so this could be worth raising on python-dev)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kinds of bytecode changes are allowed? I wrote a very simple minded fuzzer for ceval, giving it randomly generated code. It crashes Python 3.6 immediately. I don't disagree with better checking in principle but it seems a very high standard to meet without having some significant performance affects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, given the existing assert() checks for function calls, I'll withdraw this comment for now.
Python/ceval.c
Outdated
Py_DECREF(status); | ||
goto error; | ||
else { | ||
assert(PyExceptionClass_Check(exc)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above - this should be a check that raises PyExc_SystemError
, not a C level assertion.
When you're done making the requested changes, leave the comment: |
Behavior is still the same after the bytecode and compiler changes.
The bytecode changes cause the number of expected returns to go from 6 to 5. The last return in the function now gets eliminated, as it is unreachable. I believe the SETUP_LOOP opcode was defeating the peephole optimizer previously. That's just a guess though, I don't understand how the peephole optimizer works.
I did a fast benchmark using pyperformance. Results are here. Roughly 2% faster so I would conclude that performance did not get worse. |
We have other assert() checks already, so a few more are fine for now
I wouldn't say "a lot", since bytecode parsing isn't the most complex part of a Python JIT, but I definitely think this is a welcome change (fixing Numba's handling of bytecode stack effect took a bit too much time for my taste :-)). |
See also other continuation of Antoine's PR: https://github.com/serhiy-storchaka/cpython/commits/unwind_stack . I'm working on slightly different approach to this issue (not published yet). |
This following function (simplified from gevent threadpool.py) causes the compiler to crash::
I'm investigating. Seems to infinite recursion in compiler_unwind_fblock(). |
I have an even simpler example::
The fb_unwind function for the HANDLER_CLEANUP fblocktype seems to be buggy. |
The argument is not actually used for anything but it makes more sense to pass 'final' rather than passing 'body'.
If the finally body contains an exit (break or return), don't unwind the finally body a second time. That would cause an infinite loop in the compiler. We temporarily pop the top block, emit code for the finally clause and then push the block back on. At some (hopefully correct) comments explaining what's going on.
The fblock unwind logic in finally bodies is still buggy. It unwinds all blocks, not just the blocks enclosing the finally body. I think the fix is not too difficult: allocate fblockinfo on the C stack, have an enclosing block pointer in the fblockinfo struct. When unwind happens, walk the enclosing block pointers. fblock_unwind_finally_try() will have to set the current fblockinfo in the 'c' struct so that unwinds coming from VISIT_SEQ() will unwind from the correct place. As I mentioned in the issue tracker, I think the finally body code duplication is the correct approach. I spent a good part of Sunday studying the patch. The ceval/bytecode changes look pretty solid. The compiler needs some fixing yet. |
When there is a 'return' inside the finally body of a try/finally, we need to clear the current exception. If the final body was entered due to an exception, we also need to pop the EXCEPT_HANDLER fbock.
All my tests are passing now. Still could use some polish before merging. |
""" | ||
self.check_stack_size(snippet) | ||
|
||
# This one unfortunately "leaks" a few stack slots for each snippet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any idea how to solve this one?
This is slightly more efficient. For computing the stack effect of opcodes, consider PUSH_NO_EXCEPT to use six stack slots so that the stack will be large enough to handle an exception. Update dis.rst (with content from PR 5006 by Serhiy).
Fixes problem pointed out by Serhiy: There is a gap between calling the __enter__ method and the SETUP_FINALLY instruction. If the exception is raised in the gap, the __exit__ method will be never called. For example: a = [] with CM() as a[0]: # IndexError ...
This PR still has issues with frame.set_lineno() and they don't appear to be trivial to fix. I'm closing the PR. Mark Shannon, who was the original author of this patch, has expressed interested in trying to resolve the issues. So, I think it best to leave it to him. |
This is a re-base and slightly more polished version of PR #2827. I think this change is a good idea as it reduces the size of the stacked used by Python frames.
Also, it makes the bytecode have an "ahead of time" computable effect on the stack. See the changes in PyCompile_OpcodeStackEffect(). E.g. before WITH_CLEANUP_FINISH had a "XXX" comment saying it dropped at least one stack item but at runtime maybe more. After this change, WITH_CLEANUP_FINISH always drops 6 items.
I understand that is important for tools that try to JIT compile Python bytecode. Having an ahead of time computable stack effect makes the JIT a lot simpler.
https://bugs.python.org/issue17611