-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Issue#3216 - Needed an alternate way to check if background tasks were run #3237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tested on CPX -- irremote_simpletest now works normally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR unintentionally modifies the "protomatter" submodule, which is leading to failed builds.
Because this change introduces new work that is done whenever we're checking for background tasks, it may have an impact on performance. Can you please provide the performance of this simple code loop before and after your change so we can consider the trade-off (vs enabling ticks while these specific peripherals are in use)? The code is from #2879
import time
t0 = time.monotonic_ns()
k = 0
for i in range(1000):
for j in range(100):
k += j
t1 = time.monotonic_ns()
print(k)
print((t1-t0) / 1e9)
Let us know if you need help with git to fix the protomatter thing. |
@jepler, I have had all sorts of issues with submodules. I tried to clear things up with a fresh clone and branch, but obviously there are still issues. I would appreciate any and all help to get things straight. |
I have added a commit to your branch to fix the protomatter submodule. My top advice for NOT committing changes of this nature is to use a graphical tool to select only the files you intend to commit, and commit them. Some people suggest to "git commit -a" or graphical equivalents, which can commit files that you did not intend to modify. Once a change to a submodule has been committed, the "submodule update" and similar commands will stubbornly update it to the committed† version. Changing into the submodule, manually checking out the intended ref (which can be nontrivial to discover via github or graphical tools), and then adding and committing the change in the submodule is the basic workflow to fix this problem. "git log -p -- lib/protomatter" is a reasonable way to find this out, as long as it's not a merge commit where the problem was introduced (it wasn't, this time). Reportedly the very modern git 2.28's "--show-pulls" may improve this, but I haven't got 2.28 yet to check: https://github.blog/2020-07-27-highlights-from-git-2-28/#tidbits † Maybe the staged ref? The docs are unclear |
…into issue3216 Conflicts: lib/protomatter
Performance issue has been addressed, at least to an increase of only .5%
With the modified changes:
|
I am sorry to report that this is not working reliably on my CPX. Running the irremote_simpletest, it works "sometimes" but eventually fails with am "Input taking too long" error. Once that happens it does not work at all after a soft reboot. A hard reset is needed to get it to start reading IR pulses again... until it fails. I also had the same results on a pirkey_m0
|
@jerryneedell <https://github.com/jerryneedell> - what kind of IR input are
you using? I have a simple IR remote and ran it for over 5 minutes with no
errors.
I do see incomplete pulses sometimes; my guess is that it may be due to
'bounce' on the remotes buttons. Interestingly, I also
see occasional 'stray' IR inputs - no idea where those are coming from.
Note that once that runtime error occurs you can clear it by executing a
pulsein.resume() - no reboot needed.
…On Sun, Aug 2, 2020 at 4:56 AM jerryneedell ***@***.***> wrote:
I am sorry to report that this is not working reliably on my CPX. Running
the irremote_simpletest, it works "sometimes" but eventually fails with am
"Input taking too long" error. Once that happens it does not work at all
after a soft reboot. A hard reset is needed to get it to start reading IR
pulses again... until it fails.
I also had the same results on a pirkey_m0
Adafruit CircuitPython 6.0.0-alpha.2-136-g45dee6e77-dirty on 2020-08-02; Adafruit CircuitPlayground Express with samd21g18
>>>
>>> import irremote_simpletest
Heard 67 Pulses: [9340, 4498, 635, 516, 633, 519, 639, 515, 632, 519, 629, 522, 636, 516, 633, 519, 639, 512, 633, 1632, 635, 1632, 634, 1630, 636, 1629, 638, 1628, 628, 1637, 630, 522, 636, 1630, 637, 515, 635, 1632, 633, 1631, 635, 517, 631, 521, 638, 514, 634, 519, 629, 523, 635, 1632, 636, 514, 632, 520, 638, 1629, 628, 1636, 632, 1634, 632, 1634, 633, 1633, 634]
Decoded: [255, 2, 159, 96]
----------------------------
Heard 67 Pulses: [9336, 4529, 606, 545, 572, 579, 579, 601, 547, 576, 573, 579, 579, 573, 605, 574, 545, 608, 550, 1688, 598, 1667, 570, 1695, 602, 1663, 575, 1691, 575, 1690, 577, 574, 574, 1693, 606, 545, 572, 1694, 573, 1691, 575, 605, 545, 579, 578, 574, 574, 578, 602, 550, 578, 1687, 578, 573, 575, 576, 572, 1693, 575, 1693, 572, 1690, 577, 1690, 577, 1687, 580]
Decoded: [255, 2, 159, 96]
----------------------------
Heard 3 Pulses: [165, 5882, 170]
Failed to decode: ('10 pulses minimum',)
----------------------------
Heard 67 Pulses: [9338, 4498, 636, 515, 632, 519, 629, 523, 635, 517, 632, 519, 639, 514, 632, 519, 630, 522, 636, 1629, 638, 1627, 639, 1628, 629, 1636, 632, 1633, 633, 1632, 634, 518, 632, 1633, 632, 519, 629, 1639, 629, 1635, 633, 518, 640, 512, 635, 518, 630, 521, 637, 514, 635, 1632, 633, 517, 631, 520, 638, 1627, 629, 1637, 630, 1634, 633, 1632, 634, 1631, 636]
Decoded: [255, 2, 159, 96]
----------------------------
Heard 67 Pulses: [9330, 4505, 629, 520, 638, 514, 633, 518, 630, 521, 637, 515, 634, 518, 630, 522, 636, 516, 632, 1633, 634, 1632, 625, 1640, 637, 1631, 626, 1637, 630, 1636, 631, 521, 628, 1636, 632, 520, 639, 1627, 638, 1628, 639, 512, 636, 517, 631, 519, 639, 513, 635, 516, 633, 1633, 632, 519, 639, 513, 635, 1631, 636, 1628, 638, 1628, 629, 1636, 631, 1634, 632]
Decoded: [255, 2, 159, 96]
----------------------------
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "irremote_simpletest.py", line 12, in <module>
File "adafruit_irremote.py", line 247, in read_pulses
RuntimeError: Input taking too long
>>>
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#3237 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFNJKERJ6KIFARGGWBONFHTR6UZ3LANCNFSM4PR6ZFVA>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are additional unintended submodule changes. What specific tool(s) are you using when preparing commits? You should investigate whether your tool can help you include only the files you intended to modify when making commits. I use gitk, which lets me select from a list of files. We can work together to figure out how to get your tool to only commit desired changes. If you like, I can push another commit to this branch to fix all the refs back to what they should be (no change compared to the circuitpython main branch you are based off of). Let me know if that would be helpful.
Raising an exception from an interrupt context seems even more dodgy than the early return, which was an initial concern you raised to me (it emulates exceptions using setjmp/longjmp or other platform trickery, but this will not cause the interrupt-handling context to be exited). Did your testing procedure check whether this worked properly? This could be at the root of the problem @jerryneedell reports.
It seems likely that USB activity, such as printing received pulses or just regular packets passing between PC and device, is probably causing background tasks to have "work" to do with some regularity, but not fully guaranteed. It will be tricky, but I suspect that testing without USB (or display with auto_refresh=True) would show that the new code didn't work. Possibly: upload a code.py that will blink a LED if everything continues to be OK. Unplug the device and power from a battery pack or using a charge-only cable. I suspect that, assuming this successfully causes no background tasks to be queued, your added code would not be hit and the pulsein interrupt code would conclude it was "too fast"/"taking too long".
I’m using one of these. https://www.adafruit.com/product/389 what board are you testing on? I’ll try the error recovery later today. |
I tried catching the error but it is still hanging after an error and needs a RESET to function again.
here is the current code I am running
|
@jerryneedell, are you able to test the latest patch? I am having problems with submodules getting a good build on github, so I have attached a zip file with a uf2 for CPX. |
Same results with a local build or with your .uf2 :-(
|
Update from main
@jerryneedell, I've attached another attempt. I don't think it's perfect, but it seems to be a lot closer for my tests. I'm still picking up quite a few stray interrupts on the CPX, though not when using an external IR receiver on a Metro M0. I've included a version of the test script that has some pulsein commands added that seem to help - I'd love to hear what you see with it. |
That works much better -- I still get quite a few bad reads but it seems to recover OK -- I am not seeing any evidence of the RuntimeError indicating it has trapped the "taking too long" euros as before. I just get some short messages and some spurious reads. I may have some IR interference locally. another odd observation. A hard RESET clears it all and I can access the lis3dh until the next time I reboot after running test2_py Very odd -- I tried running the lis3dh_simpletest after running test2_ir (without reboot) and it works BUT if I then do a soft reboot (control-D) the lis3dh_simpletest hangs ..... a hard RESET is needed to get it back. |
I was planning got try the latest, but I am having a problem fetching this PR
|
I was having some submodule issues due to using an old version of git. I
had thought all that
was straightened out, though, as the PR passed all the build checks. If you
need to get a good
copy, you can clone my repository at
https://github.com/DavePutz/circuitpython.git and
checkout the issue3216 branch.
…On Mon, Aug 10, 2020 at 5:38 AM jerryneedell ***@***.***> wrote:
I was planning got try the latest, but I am having a problem fetching this
PR
[Jerry-desktop-mini:/Volumes/CircuitPythonBuild/circuitpython] jerryneedell% git fetch origin pull/3237/head:pr_3237
From https://github.com/adafruit/circuitpython
* [new ref] refs/pull/3237/head -> pr_3237
Fetching submodule lib/protomatter
fatal: remote error: upload-pack: not our ref 2408e9c033f5ec050967b1592b29a950a831d6c2
fatal: the remote end hung up unexpectedly
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#3237 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFNJKEWDGL2JDDDRPOZWYTLR77EY3ANCNFSM4PR6ZFVA>
.
|
If you are OK with it, I would be quite happy to get rid of all the tick
checking in the Pulsein code and just set
the self->errored_too_fast flag if the buffer overflows. An overwritten
buffer will not have valid data in any case.
I am not quite sure where to check the self->errored_too_fast flag and
raise an error though. Looking at the
library code for irremote and the dhts it looks like they are looping and
calling popleft() until the pulsein object has a zero
length. Maybe check & raise the error in the popleft code??
…On Wed, Aug 19, 2020 at 3:12 PM Scott Shawcroft ***@***.***> wrote:
***@***.**** requested changes on this pull request.
Thanks again for fixing this up @DavePutz <https://github.com/DavePutz>.
I think the core issue here is that we don't actually need to run
background tasks every second anymore because things are added to the queue
as needed. So, we need to modify the idea of what OK is to account for
stretches where no background stuff is needed. I've commented inline how to
do this by making last_background_tick mean the tick when the oldest
background task was queued.
Another approach to detecting when pulses come in too quickly would be to
simply stop capturing pulses when the buffer is full. The behavior is
sketchy if they overwrite anyway. That way the PulseIn code wouldn't need
to know about background tasks at all. The IR library would probably need
to be tweaked though so it knows how to restart reception or we could do it
automatically as the buffer is emptied.
Thoughts?
------------------------------
In ports/atmel-samd/common-hal/pulseio/PulseIn.c
<#3237 (comment)>
:
> @@ -88,10 +91,8 @@ void pulsein_interrupt_handler(uint8_t channel) {
uint32_t current_count = tc->COUNT16.COUNT.reg;
pulseio_pulsein_obj_t* self = get_eic_channel_data(channel);
- if (!supervisor_background_tasks_ok() || self->errored_too_fast) {
- self->errored_too_fast = true;
- common_hal_pulseio_pulsein_pause(self);
- return;
+ if (self->len == 0) {
+ update_background_ticks();
The background tick count shouldn't be updated outside of the code that
runs background tasks.
------------------------------
In ports/atmel-samd/common-hal/pulseio/PulseIn.c
<#3237 (comment)>
:
> @@ -121,8 +122,14 @@ void pulsein_interrupt_handler(uint8_t channel) {
self->start++;
}
}
+ if (!supervisor_background_tasks_ok() ) {
+ common_hal_mcu_enable_interrupts();
+ mp_raise_RuntimeError(translate("Input taking too long"));
Please don't raise an exception in an interrupt handler because it may
mess up an interrupts we're nested in and also may raise the exception with
an unrelated python stack trace. self->errored_too_fast was a way to signal
the issue and the calls into PulseIn could see it and raise an error when
called.
------------------------------
In supervisor/shared/tick.c
<#3237 (comment)>
:
> port_finish_background_task();
}
bool supervisor_background_tasks_ok(void) {
- return port_get_raw_ticks(NULL) - last_finished_tick < 1024;
+ return port_get_raw_ticks(NULL) - get_background_ticks() < 1024;
I think we need to add a bit more smarts between this and
background_callback. Lets have get_background_ticks be 0 when no tasks are
pending. That changes here so background tasks are ok if it's 0 or the
difference is < 1024.
------------------------------
In supervisor/shared/background_callback.c
<#3237 (comment)>
:
> @@ -67,6 +77,7 @@ void background_callback_run_all() {
if (!callback_head) {
return;
}
+ last_background_tick = port_get_raw_ticks(NULL);
Move this up into add_core and set it if last_background_tick is 0. We
want to track how long the currently queued task has been waiting.
Set it to 0 below when callback_head is set to NULL.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3237 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFNJKEXY6KQLTZEEFPA6Y3DSBQW4NANCNFSM4PR6ZFVA>
.
|
I'm totally OK with it! I think that'll be much simpler. Ya, I think |
Made changes to remove relying on ticks to decide if pulsein is taking too long. If the input exceeds the given max buffer size it will now raise a RuntimeError when the call to popleft() is made. If uncaught this will result in a stack such as:
Or it can be handled in the application and the pulsein resumed. For example:
|
Looking at the second example, perhaps it would make sense to have |
The second example might be a bit misleading. In order to test and hit the
error I created a
PulseIn object with a buffer that was intentionally too small. So, to
correct and keep
going I re-created one with a large enough buffer. If buffer size was not
the problem you
could do something like:
pulsein.pause()
pulsein.clear()
pulsein.resume(100)
…On Fri, Aug 21, 2020 at 2:37 AM Radomir Dopieralski < ***@***.***> wrote:
Looking at the second example, perhaps it would make sense to have
pulseio.reset() at this point.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3237 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFNJKEUPDDG7BNSM6VZW5ETSBYPZ3ANCNFSM4PR6ZFVA>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this PR still has a lot of background tick changes. Want to just apply the changes to PulseIn.c from 6eae7ce ?
@@ -300,12 +298,15 @@ uint16_t common_hal_pulseio_pulsein_popleft(pulseio_pulsein_obj_t* self) { | |||
if (self->len == 0) { | |||
mp_raise_IndexError(translate("pop from an empty PulseIn")); | |||
} | |||
if (self->errored_too_fast) { | |||
self->errored_too_fast = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't clear this here because you could call it a second time by catching the exception. I think it should be left in resume.
@@ -114,15 +112,18 @@ void pulsein_interrupt_handler(uint8_t channel) { | |||
} | |||
|
|||
uint16_t i = (self->start + self->len) % self->maxlen; | |||
self->buffer[i] = duration; | |||
if (self->len < self->maxlen) { | |||
if (self->len <= self->maxlen) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should this be <=
? The original is <
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original is correct, I was thinking of self->len as a length rather than an index.
The background tick changes were left in because frequencyio/FrequencyIn.c still depends on them. Should we look at changing that as well? I don't have a FrequencyIn test, though. |
I think we should worry about FrequencyIn separately and keep this PR simpler. FrequencyIn is pretty easy to test with a PWMOut from a second board. It's less critical if it stops as well because it will have done a measurement already. |
@DavePutz Would you like me to finish this? I think it has more to remove to leave only the PulseIn changes. Thanks! |
@tannewt - I had left this one for a while, getting back to it. The
pulseio changes are working fine on
an irremote, but the DHT11 is failing. We seem to be picking up 2 extra
values (83 instead of 81)
which is causing the input to fail. I will post a note when I've figured
out the issue, thanks!
…On Thu, Sep 10, 2020 at 2:55 PM Scott Shawcroft ***@***.***> wrote:
@DavePutz <https://github.com/DavePutz> Would you like me to finish this?
I think it has more to remove to leave only the PulseIn changes. Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3237 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFNJKEWNJ2LYISAM2ZYEKRDSFEVLLANCNFSM4PR6ZFVA>
.
|
Reworked the check for pulsein. We couldn't use an input past the requested length as an indicator, as it seems that the DHT11 provides 83 bytes while the library code only looks for 81 bytes(1st two are ignored). Went to a check on the number of overflows on the 16-bit timer used for pulsein; 15 overflows works out to about 1 second. So, the current logic says that if there have been that many overflows since starting the current pulsein something is wrong and the RuntimeError "Input taking too long" is raised. All previous modifications to ticks have been removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thank you @DavePutz. Looks great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The items I raised concerns about in the past have been addressed. No testing performed.
The redo of how background tasks are handled changed the accuracy of supervisor_background_tasks_ok() in the Pulsein interrupt handler. This resulted in the handler pausing the collecting of the pulsein stream, which effectively hung the application. This PR implements an alternate way of saving and checking if the background tasks have been recently run. Also, the hang caused by calling common_hal_pulseio_pulsein_pause() in the event of an input taking too long has now been turned into an error that will let the application know something went wrong rather than just hanging.
Note to reviewer: Similar code checking supervisor_background_tasks_ok() is in ports/atmel-samd/common-hal/frequencyio/FrequencyIn.c . I do not have any means of testing whether that needs to be changed as well, let me know if you believe a change there would be appropriate as well.