Skip to content

Issue#3216 - Needed an alternate way to check if background tasks were run #3237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Sep 16, 2020
Merged

Issue#3216 - Needed an alternate way to check if background tasks were run #3237

merged 25 commits into from
Sep 16, 2020

Conversation

DavePutz
Copy link
Collaborator

@DavePutz DavePutz commented Aug 1, 2020

The redo of how background tasks are handled changed the accuracy of supervisor_background_tasks_ok() in the Pulsein interrupt handler. This resulted in the handler pausing the collecting of the pulsein stream, which effectively hung the application. This PR implements an alternate way of saving and checking if the background tasks have been recently run. Also, the hang caused by calling common_hal_pulseio_pulsein_pause() in the event of an input taking too long has now been turned into an error that will let the application know something went wrong rather than just hanging.
Note to reviewer: Similar code checking supervisor_background_tasks_ok() is in ports/atmel-samd/common-hal/frequencyio/FrequencyIn.c . I do not have any means of testing whether that needs to be changed as well, let me know if you believe a change there would be appropriate as well.

@jerryneedell
Copy link
Collaborator

Tested on CPX -- irremote_simpletest now works normally.
Not sure why all the build failures -- appear to be for nrf52840 boards

@jepler jepler self-requested a review August 1, 2020 20:52
Copy link

@jepler jepler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR unintentionally modifies the "protomatter" submodule, which is leading to failed builds.

Because this change introduces new work that is done whenever we're checking for background tasks, it may have an impact on performance. Can you please provide the performance of this simple code loop before and after your change so we can consider the trade-off (vs enabling ticks while these specific peripherals are in use)? The code is from #2879

import time
t0 = time.monotonic_ns()
k = 0
for i in range(1000):
    for j in range(100):
        k += j
t1 = time.monotonic_ns()
print(k)
print((t1-t0) / 1e9)

@jepler
Copy link

jepler commented Aug 1, 2020

Let us know if you need help with git to fix the protomatter thing.

@DavePutz
Copy link
Collaborator Author

DavePutz commented Aug 1, 2020

@jepler, I have had all sorts of issues with submodules. I tried to clear things up with a fresh clone and branch, but obviously there are still issues. I would appreciate any and all help to get things straight.
We can't just enable ticks during the pulsein. The problem is that, on the SAMD21 boards, the time it takes to re-synchronize the timer after setting a new interval can be up to 100us, which causes misses on the incoming pulses. See issue #3049 for details.
Unfortunately, there seems to be quite a performance hit using your test script. I will see if the PR can be re-worked to minimize that.

@jepler
Copy link

jepler commented Aug 1, 2020

I have added a commit to your branch to fix the protomatter submodule.

My top advice for NOT committing changes of this nature is to use a graphical tool to select only the files you intend to commit, and commit them. Some people suggest to "git commit -a" or graphical equivalents, which can commit files that you did not intend to modify.

Once a change to a submodule has been committed, the "submodule update" and similar commands will stubbornly update it to the committed† version. Changing into the submodule, manually checking out the intended ref (which can be nontrivial to discover via github or graphical tools), and then adding and committing the change in the submodule is the basic workflow to fix this problem. "git log -p -- lib/protomatter" is a reasonable way to find this out, as long as it's not a merge commit where the problem was introduced (it wasn't, this time). Reportedly the very modern git 2.28's "--show-pulls" may improve this, but I haven't got 2.28 yet to check: https://github.blog/2020-07-27-highlights-from-git-2-28/#tidbits

† Maybe the staged ref? The docs are unclear

@DavePutz
Copy link
Collaborator Author

DavePutz commented Aug 2, 2020

Performance issue has been addressed, at least to an increase of only .5%
From 6.0.0-alpha.2 on a CPX:

>>> exec(open('perf_test.py').read())
4950000
5.69202

With the modified changes:

>>> exec(open('perf_test.py').read())
4950000
5.73193

@jerryneedell
Copy link
Collaborator

I am sorry to report that this is not working reliably on my CPX. Running the irremote_simpletest, it works "sometimes" but eventually fails with am "Input taking too long" error. Once that happens it does not work at all after a soft reboot. A hard reset is needed to get it to start reading IR pulses again... until it fails.

I also had the same results on a pirkey_m0

Adafruit CircuitPython 6.0.0-alpha.2-136-g45dee6e77-dirty on 2020-08-02; Adafruit CircuitPlayground Express with samd21g18
>>> 
>>> import irremote_simpletest
Heard 67 Pulses: [9340, 4498, 635, 516, 633, 519, 639, 515, 632, 519, 629, 522, 636, 516, 633, 519, 639, 512, 633, 1632, 635, 1632, 634, 1630, 636, 1629, 638, 1628, 628, 1637, 630, 522, 636, 1630, 637, 515, 635, 1632, 633, 1631, 635, 517, 631, 521, 638, 514, 634, 519, 629, 523, 635, 1632, 636, 514, 632, 520, 638, 1629, 628, 1636, 632, 1634, 632, 1634, 633, 1633, 634]
Decoded: [255, 2, 159, 96]
----------------------------
Heard 67 Pulses: [9336, 4529, 606, 545, 572, 579, 579, 601, 547, 576, 573, 579, 579, 573, 605, 574, 545, 608, 550, 1688, 598, 1667, 570, 1695, 602, 1663, 575, 1691, 575, 1690, 577, 574, 574, 1693, 606, 545, 572, 1694, 573, 1691, 575, 605, 545, 579, 578, 574, 574, 578, 602, 550, 578, 1687, 578, 573, 575, 576, 572, 1693, 575, 1693, 572, 1690, 577, 1690, 577, 1687, 580]
Decoded: [255, 2, 159, 96]
----------------------------
Heard 3 Pulses: [165, 5882, 170]
Failed to decode:  ('10 pulses minimum',)
----------------------------
Heard 67 Pulses: [9338, 4498, 636, 515, 632, 519, 629, 523, 635, 517, 632, 519, 639, 514, 632, 519, 630, 522, 636, 1629, 638, 1627, 639, 1628, 629, 1636, 632, 1633, 633, 1632, 634, 518, 632, 1633, 632, 519, 629, 1639, 629, 1635, 633, 518, 640, 512, 635, 518, 630, 521, 637, 514, 635, 1632, 633, 517, 631, 520, 638, 1627, 629, 1637, 630, 1634, 633, 1632, 634, 1631, 636]
Decoded: [255, 2, 159, 96]
----------------------------
Heard 67 Pulses: [9330, 4505, 629, 520, 638, 514, 633, 518, 630, 521, 637, 515, 634, 518, 630, 522, 636, 516, 632, 1633, 634, 1632, 625, 1640, 637, 1631, 626, 1637, 630, 1636, 631, 521, 628, 1636, 632, 520, 639, 1627, 638, 1628, 639, 512, 636, 517, 631, 519, 639, 513, 635, 516, 633, 1633, 632, 519, 639, 513, 635, 1631, 636, 1628, 638, 1628, 629, 1636, 631, 1634, 632]
Decoded: [255, 2, 159, 96]
----------------------------
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "irremote_simpletest.py", line 12, in <module>
  File "adafruit_irremote.py", line 247, in read_pulses
RuntimeError: Input taking too long
>>> 

@DavePutz
Copy link
Collaborator Author

DavePutz commented Aug 2, 2020 via email

Copy link

@jepler jepler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are additional unintended submodule changes. What specific tool(s) are you using when preparing commits? You should investigate whether your tool can help you include only the files you intended to modify when making commits. I use gitk, which lets me select from a list of files. We can work together to figure out how to get your tool to only commit desired changes. If you like, I can push another commit to this branch to fix all the refs back to what they should be (no change compared to the circuitpython main branch you are based off of). Let me know if that would be helpful.

Raising an exception from an interrupt context seems even more dodgy than the early return, which was an initial concern you raised to me (it emulates exceptions using setjmp/longjmp or other platform trickery, but this will not cause the interrupt-handling context to be exited). Did your testing procedure check whether this worked properly? This could be at the root of the problem @jerryneedell reports.

It seems likely that USB activity, such as printing received pulses or just regular packets passing between PC and device, is probably causing background tasks to have "work" to do with some regularity, but not fully guaranteed. It will be tricky, but I suspect that testing without USB (or display with auto_refresh=True) would show that the new code didn't work. Possibly: upload a code.py that will blink a LED if everything continues to be OK. Unplug the device and power from a battery pack or using a charge-only cable. I suspect that, assuming this successfully causes no background tasks to be queued, your added code would not be hit and the pulsein interrupt code would conclude it was "too fast"/"taking too long".

@jerryneedell
Copy link
Collaborator

@jerryneedell https://github.com/jerryneedell - what kind of IR input are you using? I have a simple IR remote and ran it for over 5 minutes with no errors. I do see incomplete pulses sometimes; my guess is that it may be due to 'bounce' on the remotes buttons. Interestingly, I also see occasional 'stray' IR inputs - no idea where those are coming from. Note that once that runtime error occurs you can clear it by executing a pulsein.resume() - no reboot needed.

I’m using one of these. https://www.adafruit.com/product/389 what board are you testing on? I’ll try the error recovery later today.

@jerryneedell
Copy link
Collaborator

jerryneedell commented Aug 2, 2020

I tried catching the error but it is still hanging after an error and needs a RESET to function again.
It catches the error, but then hangs forever on the next call to read_pulses(pulses)


Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 6.0.0-alpha.2-136-g45dee6e77-dirty on 2020-08-02; Adafruit CircuitPlayground Express with samd21g18
>>> import irremote_simpletest
Input taking too long  resuming
Heard 23 Pulses: [9295, 4530, 633, 517, 629, 523, 635, 516, 632, 520, 658, 495, 603, 548, 630, 522, 627, 524, 632, 1634, 631, 1633, 633]
Decoded: [255, 0]
----------------------------

hangs here -- interrupted with control=C

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "irremote_simpletest.py", line 17, in <module>
  File "irremote_simpletest.py", line 13, in <module>
  File "adafruit_irremote.py", line 247, in read_pulses
KeyboardInterrupt: 
>>> 

here is the current code I am running

# Circuit Playground Express Demo Code
# Adjust the pulseio 'board.PIN' if using something else
import pulseio
import board
import adafruit_irremote

pulsein = pulseio.PulseIn(board.REMOTEIN, maxlen=120, idle_state=True)
decoder = adafruit_irremote.GenericDecode()


while True:
    try:
        pulses = decoder.read_pulses(pulsein)
    except RuntimeError as e:
        print(e," resuming")
        pulsein.resume()
        continue
    print("Heard", len(pulses), "Pulses:", pulses)
    try:
        code = decoder.decode_bits(pulses)
        print("Decoded:", code)
    except adafruit_irremote.IRNECRepeatException:  # unusual short code!
        print("NEC repeat!")
    except adafruit_irremote.IRDecodeException as e:  # failed to decode
        print("Failed to decode: ", e.args)
    print("----------------------------")

@DavePutz
Copy link
Collaborator Author

DavePutz commented Aug 3, 2020

@jerryneedell, are you able to test the latest patch? I am having problems with submodules getting a good build on github, so I have attached a zip file with a uf2 for CPX.
CPX-6.0.0-test.zip

@jerryneedell
Copy link
Collaborator

Same results with a local build or with your .uf2 :-(

Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 6.0.0-alpha.2-137-g9203a7780-dirty on 2020-08-03; Adafruit CircuitPlayground Express with samd21g18
>>> import irremote_simpletest
Input taking too long  resuming


hangs here -- hit control-c 

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "irremote_simpletest.py", line 17, in <module>
  File "irremote_simpletest.py", line 13, in <module>
  File "adafruit_irremote.py", line 247, in read_pulses
KeyboardInterrupt: 
>>> 

@DavePutz
Copy link
Collaborator Author

DavePutz commented Aug 5, 2020

@jerryneedell, I've attached another attempt. I don't think it's perfect, but it seems to be a lot closer for my tests. I'm still picking up quite a few stray interrupts on the CPX, though not when using an external IR receiver on a Metro M0. I've included a version of the test script that has some pulsein commands added that seem to help - I'd love to hear what you see with it.
CP600-test2.zip

@jerryneedell
Copy link
Collaborator

@jerryneedell, I've attached another attempt. I don't think it's perfect, but it seems to be a lot closer for my tests. I'm still picking up quite a few stray interrupts on the CPX, though not when using an external IR receiver on a Metro M0. I've included a version of the test script that has some pulsein commands added that seem to help - I'd love to hear what you see with it.
CP600-test2.zip

That works much better -- I still get quite a few bad reads but it seems to recover OK -- I am not seeing any evidence of the RuntimeError indicating it has trapped the "taking too long" euros as before. I just get some short messages and some spurious reads. I may have some IR interference locally.

another odd observation.
I do notice that after running test2_ir.py -- if I then do a soft Reboot (control-D) I have problems execting code that reads from the accelerometer (I2C) --- I had seen this before. It still seems like something is leaving things in a "bad state". Running the lis3dh_simpletest just hangs when trying to access the sensor ...

A hard RESET clears it all and I can access the lis3dh until the next time I reboot after running test2_py

Very odd -- I tried running the lis3dh_simpletest after running test2_ir (without reboot) and it works BUT if I then do a soft reboot (control-D) the lis3dh_simpletest hangs ..... a hard RESET is needed to get it back.

@jerryneedell
Copy link
Collaborator

I was planning got try the latest, but I am having a problem fetching this PR

[Jerry-desktop-mini:/Volumes/CircuitPythonBuild/circuitpython] jerryneedell% git fetch origin pull/3237/head:pr_3237
From https://github.com/adafruit/circuitpython
 * [new ref]             refs/pull/3237/head -> pr_3237
Fetching submodule lib/protomatter
fatal: remote error: upload-pack: not our ref 2408e9c033f5ec050967b1592b29a950a831d6c2
fatal: the remote end hung up unexpectedly

@DavePutz
Copy link
Collaborator Author

DavePutz commented Aug 10, 2020 via email

@DavePutz
Copy link
Collaborator Author

DavePutz commented Aug 19, 2020 via email

@tannewt
Copy link
Member

tannewt commented Aug 20, 2020

I'm totally OK with it! I think that'll be much simpler.

Ya, I think popleft is a good spot. Perhaps also having an explicit errored property would be good too.

@DavePutz
Copy link
Collaborator Author

DavePutz commented Aug 20, 2020

Made changes to remove relying on ticks to decide if pulsein is taking too long. If the input exceeds the given max buffer size it will now raise a RuntimeError when the call to popleft() is made. If uncaught this will result in a stack such as:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test2_ir.py", line 18, in <module>
  File "test2_ir.py", line 13, in <module>
  File "adafruit_irremote.py", line 243, in read_pulses
  File "adafruit_irremote.py", line 206, in _read_pulses_non_blocking
RuntimeError: Input taking too long

Or it can be handled in the application and the pulsein resumed. For example:

while True:
    try:
        pulses = decoder.read_pulses(pulsein)
    except RuntimeError as e:
        print(e," resuming")
        pulsein.deinit()
        pulsein = pulseio.PulseIn(board.REMOTEIN, maxlen=120, idle_state=True)
        continue

@deshipu
Copy link

deshipu commented Aug 21, 2020

Looking at the second example, perhaps it would make sense to have pulseio.reset() at this point.

@DavePutz
Copy link
Collaborator Author

DavePutz commented Aug 21, 2020 via email

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this PR still has a lot of background tick changes. Want to just apply the changes to PulseIn.c from 6eae7ce ?

@@ -300,12 +298,15 @@ uint16_t common_hal_pulseio_pulsein_popleft(pulseio_pulsein_obj_t* self) {
if (self->len == 0) {
mp_raise_IndexError(translate("pop from an empty PulseIn"));
}
if (self->errored_too_fast) {
self->errored_too_fast = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't clear this here because you could call it a second time by catching the exception. I think it should be left in resume.

@@ -114,15 +112,18 @@ void pulsein_interrupt_handler(uint8_t channel) {
}

uint16_t i = (self->start + self->len) % self->maxlen;
self->buffer[i] = duration;
if (self->len < self->maxlen) {
if (self->len <= self->maxlen) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this be <=? The original is <.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original is correct, I was thinking of self->len as a length rather than an index.

@DavePutz
Copy link
Collaborator Author

The background tick changes were left in because frequencyio/FrequencyIn.c still depends on them. Should we look at changing that as well? I don't have a FrequencyIn test, though.

@tannewt
Copy link
Member

tannewt commented Aug 25, 2020

The background tick changes were left in because frequencyio/FrequencyIn.c still depends on them. Should we look at changing that as well? I don't have a FrequencyIn test, though.

I think we should worry about FrequencyIn separately and keep this PR simpler.

FrequencyIn is pretty easy to test with a PWMOut from a second board. It's less critical if it stops as well because it will have done a measurement already.

@tannewt
Copy link
Member

tannewt commented Sep 10, 2020

@DavePutz Would you like me to finish this? I think it has more to remove to leave only the PulseIn changes. Thanks!

@DavePutz
Copy link
Collaborator Author

DavePutz commented Sep 12, 2020 via email

@DavePutz
Copy link
Collaborator Author

Reworked the check for pulsein. We couldn't use an input past the requested length as an indicator, as it seems that the DHT11 provides 83 bytes while the library code only looks for 81 bytes(1st two are ignored). Went to a check on the number of overflows on the 16-bit timer used for pulsein; 15 overflows works out to about 1 second. So, the current logic says that if there have been that many overflows since starting the current pulsein something is wrong and the RuntimeError "Input taking too long" is raised. All previous modifications to ticks have been removed.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thank you @DavePutz. Looks great.

@tannewt tannewt requested a review from jepler September 14, 2020 22:19
Copy link

@jepler jepler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The items I raised concerns about in the past have been addressed. No testing performed.

@jepler jepler merged commit 1fc7971 into adafruit:main Sep 16, 2020
@DavePutz DavePutz deleted the issue3216 branch December 30, 2020 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

possible pulseio issue -- irremote not working on SAMD21
6 participants