unix/main: Check pending exception at the end of code block execution. #1667

pfalcon · 2015-12-01T22:40:01Z

Usually this checking is done by VM on jump instructions, but for linear
sequences of instructions and builtin functions this won't happen. Particular
target of this change is long-running builtin functions like time.sleep().

Usually this checking is done by VM on jump instructions, but for linear sequences of instructions and builtin functions this won't happen. Particular target of this change is long-running builtin functions like time.sleep().

THis is required to deal well with signals, signals being the closest analogue of hardware interrupts for POSIX. This is also CPython 3.5 compliant behavior (PEP 475). The main problem implementing this is to figure out how much time was spent in waiting so far/how much is remaining. It's well-known fact that Linux updates select()'s timeout value when returning with EINTR to the remaining wait time. Here's what POSIX-based standards say about this: (http://pubs.opengroup.org/onlinepubs/9699919799/functions/pselect.html): "Upon successful completion, the select() function may modify the object pointed to by the timeout argument." I.e. it allows to modify timeout value, but doesn't say how exactly it is modified. And actually, it allows such modification only "upon successful completion", which returning with EINTR error hardly is. POSIX also allows to request automatic EINTR restart for system calls using sigaction call with SA_RESTART flag, but here's what the same document says about it: "If SA_RESTART has been set for the interrupting signal, it is implementation-defined whether the function restarts or returns with [EINTR]." In other words, POSIX doesn't leave room for both portable and efficient handling of this matter, so the code just allows to manually select Linux-compatible behavior with MICROPY_SELECT_REMAINING_TIME option, or otherwise will just raise OSError. When systems with non-Linux behavior are found, they can be handled separately.

pfalcon · 2015-12-01T22:46:16Z

Proposed solution, based on discussion in 0bb57bf#commitcomment-14693152 .

Fairly speaking, to catch it all, we'd need to add checks to bunch of places (e.g. after returning from any builtin function). But that of course is losing performance. So, I guess putting it just there as patch does is a good compromise - it may be caught later than it could have been, but at least will be caught. Other issue is that this solution is unix-specific, but tricks stmhal does (PENDSV, etc.) are also stmhal-specific, and generally we know that this stuff (repl/file execution) could use some refactoring across ports.

dhylands · 2015-12-02T02:13:27Z

Actually, to do this properly, I think that any builtin function which loops for any significant amount of time, should call a function which checks for pending interrrupts.

Under linux, if the Control-C raises a signal then that will cause the sleep to return prematurely (assuming the signals aren't bklocked).

On stmhal, we have the same problem while calling say uart.getchar(), any of the sleep functions, or any peripheral I/O that can block for an indefinite amount of time (or even just as long time).

pfalcon · 2015-12-02T11:36:21Z

Let's start with saying that doing it properly requires just raising KeyboardInterrupt exception as soon as possible after it being detected, instead of keeping it "pending". That - calling nlr_jump from signal handling - is even possible under Linux, but POSIX doesn't give any guarantee like that, so I don't want to rely on that. But for stmhal, I thought the whole idea of using platform-specific PENDSV mechanism is to work like that. If that's not the case, then d'oh guys who like to deal with niche platform-specific features instead of doing in general way - what you were spending your time on?

I think that any builtin function which loops for any significant amount of time, should call a function which checks for pending interrrupts.

Let's see what you propose: to duplicate the same code in any==every function of vaguely defined subset, conservative clarification of which is "every function with loop". I personally don't want to go, as long as there's more technically sound alternative (for stmhal there's, even though it doesn't help other ports). Seeing what needs to be done to get signals work, I'd even skip doing that for unix port at all, but if there's aim to be able to more or less close emulate Hardware API on unix port (and I have such aim), then there's no other alternative to use signals, then I'd prefer to do minimal and incremental set of changes to support them (keeping in mind that usecase for supporting Hardware API on unix is testing, not production, unless proven otherwise).

dhylands · 2015-12-02T15:01:00Z

The issue that I see (with the stm port) is that you can't raise an exception in an interrupt handler, and have that seamlessly affect the main thread.

Which is what the whole pendsv thing is taking care of. The interrupt handler "pends" a raise until control gets back to the main thread, where the exception can be raised properly.

Let's see what you propose: to duplicate the same code in any==every function of vaguely defined subset, conservative clarification of which is "every function with loop".

Well, the "code" I was referring to was a simple function call. Let me rephrase my vagueness.

Currently, on the stm port, when you hit Control-C there are cases where it doesn't respond until control gets back to the VM. I think that any function that can take longer than X milliseconds (X should probably be something between 0-1000 milliseconds) should call a function which checks to see if it should return prematurely so that a pending exception (like a Control-C) can be dealt with. Whatever value of X that you come with determines the maximum time it takes to respond to the Control-C.

pfalcon · 2015-12-02T23:09:51Z

@dhylands , I don't see any disagreement with what I wrote above, so perhaps I should try to clarify what I meant.

Well, the "code" I was referring to was a simple function call.

Which is still to be applied to many places. If there's instead a chance to do it in one place, why not follow it?

Which is what the whole pendsv thing is taking care of. The interrupt handler "pends" a raise until control gets back to the main thread, where the exception can be raised properly.

Well, by definition, PENDSV is ARM exception which gets executed after all other exceptions/interrupts, right before returning to main thread. So, it should be possible to raise Python exception from it. Unless you right away can point at 100% race conditions or other issues, that's what I'd recommend to folks who want to use all opportunities of a particular platform.

I think that any function that can take longer than X milliseconds (X should probably be something between 0-1000 milliseconds)

Once again, that criteria is vague. Any function which waits (loops) on a hardware can potentially take arbitrary time, so you're looking into patching potentially large set of functions. Taking into account my constant argumentation for simple changes (formal criteria being number of files, lines, characters affected), you may imagine that I'm not fond of such solution and propose an alternative. (I don't have such alternative with unix port (without compromising more important requirements), so follow other approach, still "simple".)

dpgeorge · 2015-12-02T23:25:35Z

Maybe this can be fixed in a more general way by putting another check in the VM. Currently the VM checks for pending exceptions only at the end of a jump instruction. This will eventually catch anything in a "long running" program, because such a program must use jumps somewhere. But there are cases where the code can go for a long time just calling and returning from functions (Python and native functions). And this is the case when you do simple calls at the REPL (eg time.sleep(10)).

So to catch all (?) these cases where code is just called and there are no jumps, how about checking at the end of the RETURN opcode:

--- a/py/vm.c
+++ b/py/vm.c
@@ -1066,6 +1066,11 @@ unwind_return:
                         }
                         exc_sp--;
                     }
+                    if (MP_STATE_VM(mp_pending_exception) != MP_OBJ_NULL) {
+                        mp_obj_t obj = MP_STATE_VM(mp_pending_exception);
+                        MP_STATE_VM(mp_pending_exception) = MP_OBJ_NULL;
+                        RAISE(obj);
+                    }
                     nlr_pop();
                     code_state->sp = sp;
                     assert(exc_sp == exc_stack - 1);

I've checked that the above patch fixes the problem with breaking out of time.sleep(10).

pfalcon · 2015-12-02T23:39:33Z

Yes, sure it fixes it, as any REPL code is wrapped in function for execution. But it makes VM slower. So I didn't even consider such solution (for the title of patch). And it doesn't solve other issues raised by @dhylands .

dpgeorge · 2015-12-02T23:58:54Z

This "bug" is in all ports and would need to be patched in a similar way for them all, so VM patch is a more generic solution. RETURN opcode makes up 2.7% of static code, so performance hit won't be big (but will still be there).

Breaking out of long-running native code is doable on stmhal (press ctrl-C twice), but fixing that in a generic way is hard.

pfalcon · 2015-12-03T00:05:30Z

This "bug" is in all ports and would need to be patched in a similar way for them all

Yes, I mentioned that in the original description - there should be common function to execute a string buffer/file. VM solution doesn't add too much comparing to that.

RETURN opcode makes up 2.7% of static code, so performance hit won't be big (but will still be there).

For function like lambda x: x+1 run in a loop it's 30% or so of dynamic frequency. So, unless you want to follow up with a patch to inline functions... ;-))) (Btw, did you saw that message on python-dev where Victor Stinner experimented with inlining on CPython?)

but fixing that in a generic way is hard.

And nobody even asked, so can be left for later.

dpgeorge · 2015-12-03T00:21:08Z

Btw, did you saw that message on python-dev where Victor Stinner experimented with inlining on CPython?

Yes, I had a brief look. It seems that there's a lot of overhead to keep the original code + inlined code, and switch between the 2 depending on whether builtins have been redefined (or did I mix that with something else he was doing?).

pfalcon · 2015-12-03T00:26:50Z

I actually have reading it in my backlog still...

dpgeorge · 2015-12-04T01:18:44Z

I'm ok with original proposal of this PR. As said above, we need to factor out this whole parse/compile/execute function anyway at some later stage, and then the fix will be available for all ports.

pfalcon · 2015-12-04T17:17:13Z

Thanks, merged.

dhylands · 2015-12-16T23:01:23Z

@pfalcon suggested that discussion continue here about Control-C handling.

I was going to suggest that we expose a function in vm.c which was something like this (I just copied this from https://github.com/micropython/micropython/blob/master/py/vm.c#L1247-L1252 and haven't actually tried to compile it):

void handle_pending_exception(void) {
    if (MP_STATE_VM(mp_pending_exception) != MP_OBJ_NULL) {
        MARK_EXC_IP_SELECTIVE();
        mp_obj_t obj = MP_STATE_VM(mp_pending_exception);
        MP_STATE_VM(mp_pending_exception) = MP_OBJ_NULL;
        RAISE(obj);
    }
}

In the unix port, each time it makes a system call which can return EINTR then it would do something like:

if (errno == EINTR) {
    handle_pending_exception();
    ...if you get this far, then it wasn't a KeyboardInterrupt or other exception...
    ...either retry the operation or return an error - whichever is appropriate for the occaison...
}

In the other ports, any code which is potentially block (i.e. waiting for a timeout, waitiing for data to arrive) could call handle_pending_exception() inisde the loop.

This would allow Control-C to be acted on promptly.
When we get support for software ISRs it would also allow that functionality to be slotted in with a minimal of changes (just the guts of handle_pending_exception)

pfalcon · 2015-12-16T23:31:30Z

Well, I didn't mean to discuss here, at the closed PR, just keep in mind everything what was said here: specifically, that's the "best" plan to deal with Ctrl+C is to raise it straight from interrupt/signal handler where it's detected. The "worst" plan is to check it at the places where there's 0.00001% or so chance of getting it - while wasting resources on each such check. These 2 extremes are at least "general" and require minimal changes to code. Between them, there intermediate stages, where we identify several places where there's higher chance to get pending exception, and duplicate checking code in each place.

From this, it should be clear why "the best" solution is the best - it adds code in one place, where exception happens in 100% of cases.

Without this, a double free can occur when a display (and terminal) is released and then a crash occurs. Upon a second release, different memory is released (sometimes the heap). When this is followed by an allocation for the flash cache, the cache can overwrite the active heap causing crashes. Fixes micropython#1667

pfalcon closed this Dec 4, 2015

pfalcon deleted the unix-check-pending-exc branch December 4, 2015 17:17

dpgeorge mentioned this pull request Jul 15, 2019

"FATAL: uncaught exception" on KeyboardInterrupt during large printout at normal REPL #4921

Closed

mikewadsten mentioned this pull request Jan 5, 2020

ESP32: Crash after pushing Ctrl-C #5488

Closed

Uh oh!

unix/main: Check pending exception at the end of code block execution. #1667

unix/main: Check pending exception at the end of code block execution. #1667

Uh oh!

Conversation

pfalcon commented Dec 1, 2015

Uh oh!

pfalcon commented Dec 1, 2015

Uh oh!

dhylands commented Dec 2, 2015

Uh oh!

pfalcon commented Dec 2, 2015

Uh oh!

dhylands commented Dec 2, 2015

Uh oh!

pfalcon commented Dec 2, 2015

Uh oh!

dpgeorge commented Dec 2, 2015

Uh oh!

pfalcon commented Dec 2, 2015

Uh oh!

dpgeorge commented Dec 2, 2015

Uh oh!

pfalcon commented Dec 3, 2015

Uh oh!

dpgeorge commented Dec 3, 2015

Uh oh!

pfalcon commented Dec 3, 2015

Uh oh!

dpgeorge commented Dec 4, 2015

Uh oh!

pfalcon commented Dec 4, 2015

Uh oh!

dhylands commented Dec 16, 2015

Uh oh!

pfalcon commented Dec 16, 2015

Uh oh!

Uh oh!