Skip to content

bpo-17611: Move unwinding of stack from interpreter to compiler #4682

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 34 commits into from

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Dec 2, 2017

This is a re-base and slightly more polished version of PR #2827. I think this change is a good idea as it reduces the size of the stacked used by Python frames.

Also, it makes the bytecode have an "ahead of time" computable effect on the stack. See the changes in PyCompile_OpcodeStackEffect(). E.g. before WITH_CLEANUP_FINISH had a "XXX" comment saying it dropped at least one stack item but at runtime maybe more. After this change, WITH_CLEANUP_FINISH always drops 6 items.

I understand that is important for tools that try to JIT compile Python bytecode. Having an ahead of time computable stack effect makes the JIT a lot simpler.

https://bugs.python.org/issue17611

@rhettinger
Copy link
Contributor

+1 Overall, this looks like a net win.

Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing the eval loop changes, my main comment is that we can't safely make assertions about the expected eval stack state when a particular bytecode is executed, since synthetically constructed bytecode may fail to meet those expectations in weird and exciting ways.

Instead, we should use explicit expectation checks, and raise SystemError when they're not met.

(Note: I haven't reviewed the compiler changes yet)

@@ -261,7 +261,7 @@ def f(cond1, cond2):
self.assertNotInBytecode(f, 'JUMP_ABSOLUTE')
returns = [instr for instr in dis.get_instructions(f)
if instr.opname == 'RETURN_VALUE']
self.assertEqual(len(returns), 6)
self.assertLessEqual(len(returns), 6)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this truly non-deterministic now? Or should the expected number of return opcodes be modified instead of making the check more permissive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looks to me like that check got changed in an old version of the patch and maybe isn't needed. I'm going to review all the unit test changes and see if they are still necessary with the final version of the patch. Antoine went through quite a few revisions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this part of the patch can be reverted. self.assertEqual(len(returns), 6) works and I see no reason why the number of return statements should be indeterminate.

Copy link
Member Author

@nascheme nascheme Dec 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the change below, "len(returns), 2", is the one that can be reverted. This one now returns 5. I believe the reason is that there is no longer a SETUP_LOOP opcode and the peephole op is now eliminating (correctly, as far as I see), the "return 6" at the end of the function body. So, should always be 5 return statements in bytecode now.

@@ -275,7 +275,7 @@ def f(cond1, cond2):
self.assertEqual(len(returns), 1)
returns = [instr for instr in dis.get_instructions(f)
if instr.opname == 'RETURN_VALUE']
self.assertEqual(len(returns), 2)
self.assertLessEqual(len(returns), 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for previous comment - can we adjust the number of expected returns here instead of relaxing the check?

Python/ceval.c Outdated
PyObject *tb = POP();
if (exc == NULL) {
int i;
assert(val == NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be regular validity checks, rather than assertions - bytecode editing may inject a RERAISE at a point where there isn't a valid exception state on top of the stack, and that should give a Python exception rather than a C level assertion failure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised to hear that. Do we really expect ceval to be resistant to crashing if fed arbitrary bytecode? That seems a tough standard to meet and would seem to imply a lot of overhead in checking that the bytecode doesn't do anything unsafe (e.g. re whats on the stack). If that is the case, I expect this patch needs a lot of fixing yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't always check directly in ceval - we'll often rely on the fact that ceval calls public CPython C APIs, and those already have their own error checking.

But we do have a lot of checks for "impossible" situations (search for SystemError in ceval.c), and the general principle of "bytecode hacking may cause an exception, but it won't crash the process" applies. (That said, there are some existing assert calls related to the stack state for function calls, so this could be worth raising on python-dev)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kinds of bytecode changes are allowed? I wrote a very simple minded fuzzer for ceval, giving it randomly generated code. It crashes Python 3.6 immediately. I don't disagree with better checking in principle but it seems a very high standard to meet without having some significant performance affects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, given the existing assert() checks for function calls, I'll withdraw this comment for now.

Python/ceval.c Outdated
Py_DECREF(status);
goto error;
else {
assert(PyExceptionClass_Check(exc));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above - this should be a check that raises PyExc_SystemError, not a C level assertion.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Behavior is still the same after the bytecode and compiler
changes.
The bytecode changes cause the number of expected returns
to go from 6 to 5.  The last return in the function now gets
eliminated, as it is unreachable.  I believe the SETUP_LOOP
opcode was defeating the peephole optimizer previously.  That's just
a guess though, I don't understand how the peephole optimizer works.
@nascheme
Copy link
Member Author

nascheme commented Dec 3, 2017

I did a fast benchmark using pyperformance. Results are here. Roughly 2% faster so I would conclude that performance did not get worse.

@ncoghlan ncoghlan dismissed their stale review December 3, 2017 03:28

We have other assert() checks already, so a few more are fine for now

@pitrou
Copy link
Member

pitrou commented Dec 3, 2017

Having an ahead of time computable stack effect makes the JIT a lot simpler.

I wouldn't say "a lot", since bytecode parsing isn't the most complex part of a Python JIT, but I definitely think this is a welcome change (fixing Numba's handling of bytecode stack effect took a bit too much time for my taste :-)).

@serhiy-storchaka
Copy link
Member

See also other continuation of Antoine's PR: https://github.com/serhiy-storchaka/cpython/commits/unwind_stack .

I'm working on slightly different approach to this issue (not published yet).

@nascheme
Copy link
Member Author

nascheme commented Dec 3, 2017

This following function (simplified from gevent threadpool.py) causes the compiler to crash::

def func():
    try:
	try:
	    func()
	except:
	    return
	finally:
	    pass
    finally:
	return

I'm investigating. Seems to infinite recursion in compiler_unwind_fblock().

@nascheme
Copy link
Member Author

nascheme commented Dec 3, 2017

I have an even simpler example::

def func():
    try:
	try:
	    True
	except:
	    return
    finally:
	return

The fb_unwind function for the HANDLER_CLEANUP fblocktype seems to be buggy.
Still trying to figure out how this all works.

The argument is not actually used for anything but it makes
more sense to pass 'final' rather than passing 'body'.
If the finally body contains an exit (break or return), don't unwind
the finally body a second time.  That would cause an infinite loop
in the compiler.  We temporarily pop the top block, emit code for
the finally clause and then push the block back on.

At some (hopefully correct) comments explaining what's going on.
@nascheme
Copy link
Member Author

nascheme commented Dec 4, 2017

The fblock unwind logic in finally bodies is still buggy. It unwinds all blocks, not just the blocks enclosing the finally body. I think the fix is not too difficult: allocate fblockinfo on the C stack, have an enclosing block pointer in the fblockinfo struct. When unwind happens, walk the enclosing block pointers. fblock_unwind_finally_try() will have to set the current fblockinfo in the 'c' struct so that unwinds coming from VISIT_SEQ() will unwind from the correct place.

As I mentioned in the issue tracker, I think the finally body code duplication is the correct approach. I spent a good part of Sunday studying the patch. The ceval/bytecode changes look pretty solid. The compiler needs some fixing yet.

When there is a 'return' inside the finally body of a try/finally,
we need to clear the current exception.  If the final body was
entered due to an exception, we also need to pop the EXCEPT_HANDLER
fbock.
@nascheme
Copy link
Member Author

All my tests are passing now. Still could use some polish before merging.

"""
self.check_stack_size(snippet)

# This one unfortunately "leaks" a few stack slots for each snippet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any idea how to solve this one?

This is slightly more efficient.  For computing the stack effect
of opcodes, consider PUSH_NO_EXCEPT to use six stack slots so that
the stack will be large enough to handle an exception.

Update dis.rst (with content from PR 5006 by Serhiy).
Fixes problem pointed out by Serhiy:

There is a gap between calling the __enter__ method and the
SETUP_FINALLY instruction. If the exception is raised in the gap,
the __exit__ method will be never called. For example:

    a = []
    with CM() as a[0]: # IndexError
        ...
@nascheme
Copy link
Member Author

nascheme commented Jan 2, 2018

This PR still has issues with frame.set_lineno() and they don't appear to be trivial to fix. I'm closing the PR. Mark Shannon, who was the original author of this patch, has expressed interested in trying to resolve the issues. So, I think it best to leave it to him.

@nascheme nascheme closed this Jan 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants