-
Notifications
You must be signed in to change notification settings - Fork 3k
Halt to enforce reboot limit once only #9544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The pre-main check of error_reboot_count was applied repeatedly on every boot, meaning that once the reboot limit was hit, every subsequent reset would halt before main, until something managed to clear or corrupt the error context. Set the is_error_processed flag before halting, so that when an external agent resets us while we're halted, we do not report the error and halt again.
CI started while the reviews are ongoing (critical to get the fix in) |
I think this still might not quite be perfect - there could be issues when changing configuration or when It's possible that I think But that interacts with the way |
Test run: SUCCESSSummary: 12 of 12 test jobs passed |
@SenRamakri @deepikabhavnani Please review. |
I understand the fix is critical but I have few queries related to implementation and fix
Please correct me if this is wrong understanding |
Line 297 in 899ea59
|
OK, Regarding the comment #9544 (comment) Reset always is intended if MBED_CONF_PLATFORM_FATAL_ERROR_AUTO_REBOOT_ENABLED is true. Don't bother to read my previous comments.. |
All the information is in RAM, so manual reset with new image will not have old RAM data. Does this scenario stand true in that case? |
report_error_ctx->is_error_processed = 1;//Set the flag that we already processed this error | ||
crc_val = compute_crc32(report_error_ctx, offsetof(mbed_error_ctx, crc_error_ctx)); | ||
report_error_ctx->crc_error_ctx = crc_val; | ||
|
||
//Enforce max-reboot only if auto reboot is enabled | ||
#if MBED_CONF_PLATFORM_FATAL_ERROR_AUTO_REBOOT_ENABLED | ||
if (report_error_ctx->error_reboot_count >= MBED_CONF_PLATFORM_ERROR_REBOOT_MAX) { | ||
mbed_halt_system(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we clear all RAM data here before halting? So next reset - which has to be manual since system is in halt will not have impact of this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably want to preserve the contents of the error RAM, in case someone wants to attach a debugger or something. Setting the "processed" flag should be sufficient to stop anything else happening on next reset.
@kjbracey-arm - Thanks for making this change, I think it should fix the case where a manual (warm)reset is intended to get the system executing the main (cold reset will reset RAM contents anyway), again. I was also thinking if we should reset the error_reboot_count as well so that the system can go through as many configured auto-reboots after a manual (warm)reset. That's something to think about. |
Halt to enforce reboot limit once only
If you've cleared One possible failure scenario as it is would be running an image without reboot enabled - that would set or increment the count on each run, then halt, but not check it itself. Next time you flash an image with reboot enabled and max 1 - it would not boot the first time. But you have some other code looking at the count as a sort of "valid" flag from the callback - that would need to be checked and adjusted. I'm not quite clear on whether the "reboot callback" is supposed to be called only if it is a reboot, or also if it is a "halt followed by manual reset". |
Manual resets (button or JTAG or serial break or whatever) do not reset the RAM any more than a The error block is preserved until power is off long enough for the RAM to lose data or some code deliberately or accidentally wipes it. Flashing the device might wipe it. If the error block is at or near the bottom of RAM, it's would be quite possible that the pyOCD or built-in flash programming algorithm uses the area as workspace, so it might get trashed. But it might not. Flashing algorithms wouldn't usually deliberately clear the RAM. |
Description
The pre-main check of
error_reboot_count
was applied repeatedly on every boot, meaning that once the reboot limit was hit, every subsequent reset would halt beforemain
, until something managed to clear or corrupt the error context.Set the
is_error_processed
flag before halting, so that when an external agent resets us while we're halted, we do not report the error and halt again.Pull request type
Reviewers
@SenRamakri, @TeemuKultala