bpo-17611: Move unwinding of stack from interpreter to compiler #4682

nascheme · 2017-12-02T22:16:48Z

This is a rebased and slightly more polished version of PR #2827. I think this change is a good idea as it reduces the size of the stack used by Python frames.

Also, it gives each bytecode an "ahead of time" computable effect on the stack. See the changes in PyCompile_OpcodeStackEffect(). For example, WITH_CLEANUP_FINISH previously had an "XXX" comment saying it dropped at least one stack item, but possibly more at runtime. After this change, WITH_CLEANUP_FINISH always drops 6 items.

I understand that this is important for tools that try to JIT compile Python bytecode. Having an ahead of time computable stack effect makes the JIT a lot simpler.
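To illustrate what a statically computable stack effect looks like from Python, here is a minimal sketch using `dis.stack_effect()` from the standard library. The helper name `stack_effect_of` is made up for this example, and POP_TOP is used only because its effect is the same across CPython versions; the opcodes touched by this PR differ between versions and are not shown.

```python
import dis

def stack_effect_of(opname, oparg=None):
    """Return the statically computed stack effect of the named opcode."""
    return dis.stack_effect(dis.opmap[opname], oparg)

# POP_TOP removes exactly one item, independent of any runtime state.
pop_top_effect = stack_effect_of("POP_TOP")
print(pop_top_effect)
```

A tool that walks the bytecode can sum such effects along each path to bound the stack depth without executing anything, which is only possible when no opcode has a runtime-dependent effect.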

https://bugs.python.org/issue17611

rhettinger · 2017-12-03T00:08:36Z

+1 Overall, this looks like a net win.

ncoghlan

Reviewing the eval loop changes, my main comment is that we can't safely make assertions about the expected eval stack state when a particular bytecode is executed, since synthetically constructed bytecode may fail to meet those expectations in weird and exciting ways.

Instead, we should use explicit expectation checks, and raise SystemError when they're not met.
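The difference between an assertion and an explicit expectation check can be sketched with a toy value stack in Python. This is not the code in ceval.c: the helper name, the pop order, and the error messages are all assumptions made for illustration.

```python
def pop_exception_state(stack):
    """Pop an (exc, val, tb) triple from a toy value stack.

    Hypothetical helper: layout and names are illustrative, not ceval.c's.
    Invalid state raises SystemError instead of tripping an assertion.
    """
    if len(stack) < 3:
        raise SystemError("bad bytecode: exception state missing from stack")
    exc, val, tb = stack.pop(), stack.pop(), stack.pop()
    if exc is None:
        if val is not None:
            raise SystemError("bad bytecode: value without an exception type")
    elif not (isinstance(exc, type) and issubclass(exc, BaseException)):
        raise SystemError("bad bytecode: top of stack is not an exception class")
    return exc, val, tb

# A stack left in a bogus state by hand-edited bytecode.
try:
    pop_exception_state([None, None, 42])
except SystemError as err:
    print("rejected:", err)
```

The cost is a few extra comparisons per call, which is the overhead concern raised later in this thread.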

(Note: I haven't reviewed the compiler changes yet)

ncoghlan · 2017-12-03T00:43:28Z

Lib/test/test_peepholer.py

@@ -261,7 +261,7 @@ def f(cond1, cond2):
        self.assertNotInBytecode(f, 'JUMP_ABSOLUTE')
        returns = [instr for instr in dis.get_instructions(f)
                          if instr.opname == 'RETURN_VALUE']
-        self.assertEqual(len(returns), 6)
+        self.assertLessEqual(len(returns), 6)


Is this truly non-deterministic now? Or should the expected number of return opcodes be modified instead of making the check more permissive?

It looks to me like that check got changed in an old version of the patch and maybe isn't needed. I'm going to review all the unit test changes and see if they are still necessary with the final version of the patch. Antoine went through quite a few revisions.

I believe this part of the patch can be reverted. self.assertEqual(len(returns), 6) works and I see no reason why the number of return statements should be indeterminate.

Sorry, the change below, "len(returns), 2", is the one that can be reverted. This one now returns 5. I believe the reason is that there is no longer a SETUP_LOOP opcode and the peephole optimizer is now eliminating (correctly, as far as I can see) the "return 6" at the end of the function body. So there should always be 5 return statements in the bytecode now.
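For anyone who wants to check such counts locally, the counting done in the test can be reproduced with `dis.get_instructions()`. This is a small sketch; `count_opname` and `sample` are names invented here, and the count for any given function depends on the interpreter version, so no expected number is given.

```python
import dis

def count_opname(func, opname):
    """Count instructions named `opname` in the compiled bytecode of `func`."""
    return sum(1 for instr in dis.get_instructions(func)
               if instr.opname == opname)

def sample(x):
    return x + 1

n_returns = count_opname(sample, "RETURN_VALUE")
print(n_returns)
```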

ncoghlan · 2017-12-03T00:44:45Z

Lib/test/test_peepholer.py

@@ -275,7 +275,7 @@ def f(cond1, cond2):
        self.assertEqual(len(returns), 1)
        returns = [instr for instr in dis.get_instructions(f)
                          if instr.opname == 'RETURN_VALUE']
-        self.assertEqual(len(returns), 2)
+        self.assertLessEqual(len(returns), 2)


As for the previous comment - can we adjust the number of expected returns here instead of relaxing the check?

ncoghlan · 2017-12-03T01:00:02Z

Python/ceval.c

+            PyObject *tb = POP();
+            if (exc == NULL) {
+                int i;
+                assert(val == NULL);


These should be regular validity checks, rather than assertions - bytecode editing may inject a RERAISE at a point where there isn't a valid exception state on top of the stack, and that should give a Python exception rather than a C level assertion failure.

I'm surprised to hear that. Do we really expect ceval to be resistant to crashing if fed arbitrary bytecode? That seems a tough standard to meet and would seem to imply a lot of overhead in checking that the bytecode doesn't do anything unsafe (e.g. regarding what's on the stack). If that is the case, I expect this patch still needs a lot of fixing.

We don't always check directly in ceval - we'll often rely on the fact that ceval calls public CPython C APIs, and those already have their own error checking.

But we do have a lot of checks for "impossible" situations (search for SystemError in ceval.c), and the general principle of "bytecode hacking may cause an exception, but it won't crash the process" applies. (That said, there are some existing assert calls related to the stack state for function calls, so this could be worth raising on python-dev)

What kinds of bytecode changes are allowed? I wrote a very simple-minded fuzzer for ceval, giving it randomly generated code. It crashes Python 3.6 immediately. I don't disagree with better checking in principle, but it seems a very high standard to meet without significant performance effects.

Yeah, given the existing assert() checks for function calls, I'll withdraw this comment for now.

ncoghlan · 2017-12-03T01:00:54Z

Python/ceval.c

-                Py_DECREF(status);
-                goto error;
+            else {
+                assert(PyExceptionClass_Check(exc));


As above - this should be a check that raises PyExc_SystemError, not a C level assertion.

bedevere-bot · 2017-12-03T01:07:34Z

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Behavior is still the same after the bytecode and compiler changes.

The bytecode changes cause the number of expected returns to go from 6 to 5. The last return in the function now gets eliminated, as it is unreachable. I believe the SETUP_LOOP opcode was defeating the peephole optimizer previously. That's just a guess though, I don't understand how the peephole optimizer works.

nascheme · 2017-12-03T02:40:23Z

I did a fast benchmark using pyperformance. Results are here. Roughly 2% faster so I would conclude that performance did not get worse.

We have other assert() checks already, so a few more are fine for now

pitrou · 2017-12-03T09:31:47Z

Having an ahead of time computable stack effect makes the JIT a lot simpler.

I wouldn't say "a lot", since bytecode parsing isn't the most complex part of a Python JIT, but I definitely think this is a welcome change (fixing Numba's handling of bytecode stack effect took a bit too much time for my taste :-)).

serhiy-storchaka · 2017-12-03T10:03:43Z

See also other continuation of Antoine's PR: https://github.com/serhiy-storchaka/cpython/commits/unwind_stack .

I'm working on slightly different approach to this issue (not published yet).

nascheme · 2017-12-03T19:06:26Z

The following function (simplified from gevent threadpool.py) causes the compiler to crash::

def func():
    try:
        try:
            func()
        except:
            return
        finally:
            pass
    finally:
        return

I'm investigating. It seems to be infinite recursion in compiler_unwind_fblock().

nascheme · 2017-12-03T23:11:20Z

I have an even simpler example::

def func():
    try:
        try:
            True
        except:
            return
    finally:
        return

The fb_unwind function for the HANDLER_CLEANUP fblocktype seems to be buggy.
Still trying to figure out how this all works.
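A minimal sketch of a regression check for the simpler example: compile the source from a string and call the function. The helper name `compile_and_call` is invented here. On an interpreter with the bug described above, the `compile()` call is where the crash would occur; on a working compiler the function just returns None.

```python
import warnings

SOURCE = """
def func():
    try:
        try:
            True
        except:
            return
    finally:
        return
"""

def compile_and_call(source, name="func"):
    """Compile `source`, then call the function called `name` that it defines."""
    namespace = {}
    with warnings.catch_warnings():
        # Newer interpreters may warn about 'return' inside 'finally'.
        warnings.simplefilter("ignore")
        exec(compile(source, "<regression>", "exec"), namespace)
    return namespace[name]()

result = compile_and_call(SOURCE)
print(result)
```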

The argument is not actually used for anything but it makes more sense to pass 'final' rather than passing 'body'.

If the finally body contains an exit (break or return), don't unwind the finally body a second time. That would cause an infinite loop in the compiler. We temporarily pop the top block, emit code for the finally clause and then push the block back on. Add some (hopefully correct) comments explaining what's going on.

nascheme · 2017-12-04T17:58:26Z

The fblock unwind logic in finally bodies is still buggy. It unwinds all blocks, not just the blocks enclosing the finally body. I think the fix is not too difficult: allocate fblockinfo on the C stack and add an enclosing-block pointer to the fblockinfo struct. When an unwind happens, walk the enclosing-block pointers. fblock_unwind_finally_try() will have to set the current fblockinfo in the 'c' struct so that unwinds coming from VISIT_SEQ() will unwind from the correct place.
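The linked-list idea can be sketched as a toy model in Python. This is not the compile.c implementation: the class name `FBlock`, the `enclosing` field, and the block kinds used below are stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FBlock:
    """Toy stand-in for the compiler's fblockinfo; field names are hypothetical."""
    kind: str
    enclosing: Optional["FBlock"] = None

def unwind_from(block: Optional[FBlock]) -> List[str]:
    """List the block kinds to unwind, innermost first, via enclosing links."""
    kinds = []
    while block is not None:
        kinds.append(block.kind)
        block = block.enclosing
    return kinds

loop = FBlock("LOOP")
outer_try = FBlock("FINALLY_TRY", enclosing=loop)
inner_try = FBlock("FINALLY_TRY", enclosing=outer_try)

# An exit inside outer_try's finally body must start unwinding from the
# block that encloses outer_try, not from the innermost block.
print(unwind_from(outer_try.enclosing))
```

Because each block records its own parent, the starting point of an unwind can be chosen per construct, which a single shared array index does not allow as directly.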

As I mentioned in the issue tracker, I think the finally body code duplication is the correct approach. I spent a good part of Sunday studying the patch. The ceval/bytecode changes look pretty solid. The compiler needs some fixing yet.

When there is a 'return' inside the finally body of a try/finally, we need to clear the current exception. If the finally body was entered due to an exception, we also need to pop the EXCEPT_HANDLER fblock.

nascheme · 2017-12-23T21:21:52Z

All my tests are passing now. Still could use some polish before merging.

pitrou · 2017-12-26T23:45:32Z

Lib/test/test_compile.py

+            """
+        self.check_stack_size(snippet)
+
+    # This one unfortunately "leaks" a few stack slots for each snippet


Do you have any idea how to solve this one?
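One way to observe a per-snippet "leak" is to compile a function whose body repeats a snippet n times and compare `co_stacksize` across values of n. The following is only a sketch of that idea, not the `check_stack_size` helper from the patch; `stack_sizes` and `SNIPPET` are names invented here. If the sizes grow with n, the computed stack size depends on the number of repetitions.

```python
import textwrap

def stack_sizes(snippet, repeats=(1, 10, 100)):
    """Return co_stacksize of a function whose body is `snippet` repeated n times."""
    sizes = []
    for n in repeats:
        body = textwrap.indent(snippet * n, "    ")
        namespace = {}
        exec(compile("def func():\n" + body, "<stack-size>", "exec"), namespace)
        sizes.append(namespace["func"].__code__.co_stacksize)
    return sizes

# The names 'a' and 'b' are never evaluated because func() is not called.
SNIPPET = "try:\n    a\nexcept ImportError:\n    b\n"
sizes = stack_sizes(SNIPPET)
print(sizes)
```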

This is slightly more efficient. For computing the stack effect of opcodes, consider PUSH_NO_EXCEPT to use six stack slots so that the stack will be large enough to handle an exception. Update dis.rst (with content from PR 5006 by Serhiy).

Fixes a problem pointed out by Serhiy: there is a gap between calling the __enter__ method and the SETUP_FINALLY instruction. If an exception is raised in the gap, the __exit__ method will never be called. For example::

a = []
with CM() as a[0]:  # IndexError
    ...
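The scenario can be turned into a small self-contained check. The `CM` class below is a hypothetical context manager written for this sketch; it only records whether `__exit__` ran. On released CPython versions, where a failing assignment to the target is treated like an error in the suite, `__exit__` should be reported as called.

```python
class CM:
    """Context manager that records whether __exit__ ran."""

    def __init__(self):
        self.exited = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.exited = True
        return False  # do not swallow the exception

cm = CM()
a = []
try:
    with cm as a[0]:  # assigning to a[0] raises IndexError
        pass
except IndexError:
    raised = True
else:
    raised = False

print(cm.exited, raised)
```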

nascheme · 2018-01-02T19:03:53Z

This PR still has issues with frame.set_lineno() and they don't appear to be trivial to fix. I'm closing the PR. Mark Shannon, who was the original author of this patch, has expressed interest in trying to resolve the issues. So, I think it is best to leave it to him.

pitrou and others added 22 commits December 2, 2017 14:06

bpo-17611: Move unwinding of stack from interpreter to compiler

50dfe22

Cleanup

f65e3e6

Improve stack size calculation for SETUP_{EXCEPT,FINALLY}

e9fc614

Update frozen importlib

4788ea1

Remove useless code

a42e21e

Don't predict FOR_ITER as it breaks tracing without computed gotos

071f986

Remove dead code and fix tracing tests

33bd715

Fix test_dis

710e2f1

Add JUMP_FINALLY to avoid finally block duplication

129798d

Remove finally block duplication

b34c641

More precise computation of code object stack size

72087e6

JUMP_FINALLY pushes 6 values to be consistent with the effect of an e…

fe0519f

…xception

Exact stack size computation by popping stale exception state in RERAISE

67d8382

Remove the now pointless POP_MANY

9560ad4

Add comment for RERAISE

456b35d

Remove JUMP_FINALLY

ec4f127

Remove last block duplication

8437c30

Get rid of END_ITER

b33101e

Fix comments in frameobject.c

ea6e3f9

Add stack size tests

9dee9ff

Fix tests in test_dis to match new bytecode.

1346a9c

Slightly more logical order for fblocktype enum.

ef5e4d6

nascheme requested a review from 1st1 as a code owner December 2, 2017 22:16

nascheme requested a review from a team December 2, 2017 22:16

the-knights-who-say-ni added the CLA signed label Dec 2, 2017

bedevere-bot added the awaiting merge label Dec 2, 2017

Add news file.

7b4ef3d

ncoghlan previously requested changes Dec 3, 2017

bedevere-bot added awaiting changes and removed awaiting merge labels Dec 3, 2017

nascheme added 2 commits December 2, 2017 18:08

Revert this change to test_peepholer.py.

92448dc

Behavior is still the same after the bytecode and compiler changes.

Simplify handling of 'for' loops in frame_setlineno() and fix it for …

3794016

…long opargs.

nascheme added 2 commits December 3, 2017 21:01

Pass 'final' block to compiler_push_finally_try().

442d30e

The argument is not actually used for anything but it makes more sense to pass 'final' rather than passing 'body'.

nascheme added 2 commits December 4, 2017 13:24

Make fblock stack into a singly linked list.

c65e9a6

Introduce POP_NO_EXCEPT opcode.

51746f6

When there is a 'return' inside the finally body of a try/finally, we need to clear the current exception. If the finally body was entered due to an exception, we also need to pop the EXCEPT_HANDLER fblock.

pitrou reviewed Dec 26, 2017

pitrou mentioned this pull request Dec 28, 2017

[WIP] bpo-17611: Move unwinding of stack from interpreter to compiler #2827

Closed

nascheme added 4 commits December 28, 2017 13:44

Make PUSH_NO_EXCEPT push a single NULL.

3234776

This is slightly more efficient. For computing the stack effect of opcodes, consider PUSH_NO_EXCEPT to use six stack slots so that the stack will be large enough to handle an exception. Update dis.rst (with content from PR 5006 by Serhiy).

Refactor test_dis to make regen of expected instructions cleaner.

992b824

Fix test_dis.

25db5df

nascheme closed this Jan 2, 2018
