Halt to enforce reboot limit once only #9544

kjbracey · 2019-01-30T11:05:18Z

Description

The pre-main check of error_reboot_count was applied repeatedly on every boot, meaning that once the reboot limit was hit, every subsequent reset would halt before main, until something managed to clear or corrupt the error context.

Set the is_error_processed flag before halting, so that when an external agent resets us while we're halted, we do not report the error and halt again.

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@SenRamakri, @TeemuKultala

The pre-main check of error_reboot_count was applied repeatedly on every boot, meaning that once the reboot limit was hit, every subsequent reset would halt before main, until something managed to clear or corrupt the error context. Set the is_error_processed flag before halting, so that when an external agent resets us while we're halted, we do not report the error and halt again.

0xc0170 · 2019-01-30T11:55:04Z

CI started while the reviews are ongoing (critical to get the fix in)

kjbracey · 2019-01-30T12:36:23Z

I think this still might not quite be perfect - there could be issues when changing configuration or when platform.error-reboot-max is 0.

It's possible that mbed_error chooses not to reset itself (because max is 0 or reboot is disabled altogether), but does still increment error_reboot_count. Then when someone causes a manual reset (possibly with a new image that now has reboot enabled), it may see reboot_count higher than max, and halt on entry. We don't want to halt on manual resets, only automatic ones.

I think error_reboot_count should only be being incremented or set to 1 if we are actually going to do a system_reset. We don't want any of the limit checks being activated if the reset was manual, so we should set it to 0 if just halting instead of rebooting.

But that interacts with the way error_reboot_count is used as a sort of flag elsewhere, so bit wary about changing anything further. I think we're probably okay in a setup where the config isn't changing and max > 0.

mbed-ci · 2019-01-30T14:33:01Z

Test run: SUCCESS

Summary: 12 of 12 test jobs passed
Build number : 1
Build artifacts

cmonr · 2019-01-30T15:00:48Z

@SenRamakri @deepikabhavnani Please review.

As a note, only #9544 or #9545 should come in.

deepikabhavnani · 2019-01-30T15:31:15Z

CI started while the reviews are ongoing (critical to get the fix in)

I understand the fix is critical but I have few queries related to implementation and fix

Config option - fatal-error-auto-reboot-enabled


        "fatal-error-auto-reboot-enabled": {
            "help": "Setting this to true enables auto-reboot on a fatal error.",
            "value": false
        },

* NOTE: If MBED_CONF_PLATFORM_FATAL_ERROR_AUTO_REBOOT_ENABLED is enabled and if the current reboot count exceeds MBED_CONF_PLATFORM_ERROR_REBOOT_MAX the system will halt when this function is called and in such cases the caller will not get the control back.

As per my understanding this config option should disable auto reboot completely when false, i.e. always halt if false for critical errors (default mbed_error behavior).
In case of true - It should allow auto reboot only for MBED_CONF_PLATFORM_ERROR_REBOOT_MAX times.

Please correct me if this is wrong understanding

deepikabhavnani · 2019-01-30T15:36:43Z

mbed-os/platform/mbed_error.c

Line 297 in 899ea59

    
           if (report_error_ctx->error_reboot_count >= MBED_CONF_PLATFORM_ERROR_REBOOT_MAX) {


#if MBED_CONF_PLATFORM_FATAL_ERROR_AUTO_REBOOT_ENABLED && (MBED_CONF_PLATFORM_ERROR_REBOOT_MAX > 0)

-- Reboot enabled and max should be checked

#ifndef NDEBUG
    mbed_error_printf("\n= System will be rebooted due to a fatal error =\n");
    if (report_error_ctx->error_reboot_count >= MBED_CONF_PLATFORM_ERROR_REBOOT_MAX) {
    
    -- Max checked for print
        //We have rebooted more than enough, hold the system here.
        mbed_error_printf("= Reboot count(=%ld) reached maximum, system will halt after rebooting =\n", report_error_ctx->error_reboot_count);
    }
#endif

    -- Reset done always???? Shouldnt it be if (report_error_ctx->error_reboot_count >= MBED_CONF_PLATFORM_ERROR_REBOOT_MAX) { system_reset(); }
    system_reset();//do a system reset to get the system rebooted
#endif
#endif
    mbed_halt_system();
}

platform/mbed_error.c

deepikabhavnani · 2019-01-30T17:02:07Z

OK, Regarding the comment #9544 (comment)

Reset always is intended if MBED_CONF_PLATFORM_FATAL_ERROR_AUTO_REBOOT_ENABLED is true. Don't bother to read my previous comments..

deepikabhavnani · 2019-01-30T17:04:07Z

It's possible that mbed_error chooses not to reset itself (because max is 0 or reboot is disabled altogether), but does still increment error_reboot_count. Then when someone causes a manual reset (possibly with a new image that now has reboot enabled), it may see reboot_count higher than max, and halt on entry. We don't want to halt on manual resets, only automatic ones.

All the information is in RAM, so manual reset with new image will not have old RAM data. Does this scenario stand true in that case?

deepikabhavnani · 2019-01-30T18:10:37Z

platform/mbed_error.c

+                report_error_ctx->is_error_processed = 1;//Set the flag that we already processed this error
+                crc_val = compute_crc32(report_error_ctx, offsetof(mbed_error_ctx, crc_error_ctx));
+                report_error_ctx->crc_error_ctx = crc_val;
+
                //Enforce max-reboot only if auto reboot is enabled
 #if MBED_CONF_PLATFORM_FATAL_ERROR_AUTO_REBOOT_ENABLED
                if (report_error_ctx->error_reboot_count >= MBED_CONF_PLATFORM_ERROR_REBOOT_MAX) {
                    mbed_halt_system();


Should we clear all RAM data here before halting? So next reset - which has to be manual since system is in halt will not have impact of this error.

I think we probably want to preserve the contents of the error RAM, in case someone wants to attach a debugger or something. Setting the "processed" flag should be sufficient to stop anything else happening on next reset.

SenRamakri · 2019-01-30T21:20:47Z

@kjbracey-arm - Thanks for making this change, I think it should fix the case where a manual (warm)reset is intended to get the system executing the main (cold reset will reset RAM contents anyway), again. I was also thinking if we should reset the error_reboot_count as well so that the system can go through as many configured auto-reboots after a manual (warm)reset. That's something to think about.
I also liked the solution @deepikabhavnani mentioned to clear the Crash-Data-RAM area before halting the system that would clear both is_error_processed and error_reboot_count. But with just setting is_error_processed to 1 we can still retrieve the reboot context just in case. So, there is pros/cons for both. I'll go ahead approve this now. By the way I don't think we should do the change in #9545.

Halt to enforce reboot limit once only

cmonr · 2019-01-30T23:39:16Z

#9508 (comment)

kjbracey · 2019-01-31T07:46:32Z

I was also thinking if we should reset the error_reboot_count as well so that the system can go through as many configured auto-reboots after a manual (warm)reset

If you've cleared error_processed, nothing more should be required. But I think it's important that mbed_error set reboot_count to 0 in the case where it prepares a new error context and is not calling system_reset itself. We only want the reboot count set if it actually was a reboot, not a halt.

One possible failure scenario as it is would be running an image without reboot enabled - that would set or increment the count on each run, then halt, but not check it itself. Next time you flash an image with reboot enabled and max 1 - it would not boot the first time.

But you have some other code looking at the count as a sort of "valid" flag from the callback - that would need to be checked and adjusted. I'm not quite clear on whether the "reboot callback" is supposed to be called only if it is a reboot, or also if it is a "halt followed by manual reset".

kjbracey · 2019-01-31T08:03:28Z

All the information is in RAM, so manual reset with new image will not have old RAM data.

Manual resets (button or JTAG or serial break or whatever) do not reset the RAM any more than a system_reset call does.

The error block is preserved until power is off long enough for the RAM to lose data or some code deliberately or accidentally wipes it.

Flashing the device might wipe it. If the error block is at or near the bottom of RAM, it's would be quite possible that the pyOCD or built-in flash programming algorithm uses the area as workspace, so it might get trashed. But it might not. Flashing algorithms wouldn't usually deliberately clear the RAM.

kjbracey mentioned this pull request Jan 30, 2019

Disable reboot limit #9545

Closed

0xc0170 requested review from SenRamakri and a user January 30, 2019 11:26

0xc0170 added the needs: review label Jan 30, 2019

0xc0170 added the release-version: 5.11.3 label Jan 30, 2019

bulislaw approved these changes Jan 30, 2019

View reviewed changes

adbridge added the Manually Patch label Jan 30, 2019

deepikabhavnani reviewed Jan 30, 2019

View reviewed changes

platform/mbed_error.c Show resolved Hide resolved

deepikabhavnani reviewed Jan 30, 2019

View reviewed changes

SenRamakri approved these changes Jan 30, 2019

View reviewed changes

cmonr added ready for merge and removed needs: review labels Jan 30, 2019

cmonr merged commit cc94690 into ARMmbed:master Jan 30, 2019

cmonr removed the ready for merge label Jan 30, 2019

cmonr pushed a commit that referenced this pull request Jan 30, 2019

Merge pull request #9544 from kjbracey-arm/reboot_limit_fix

36bb984

Halt to enforce reboot limit once only

kjbracey deleted the reboot_limit_fix branch January 31, 2019 08:07

adbridge removed the Manually Patch label Apr 24, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Halt to enforce reboot limit once only #9544

Halt to enforce reboot limit once only #9544

Uh oh!

kjbracey commented Jan 30, 2019 •

edited

Loading

Uh oh!

0xc0170 commented Jan 30, 2019

Uh oh!

kjbracey commented Jan 30, 2019

Uh oh!

mbed-ci commented Jan 30, 2019

Uh oh!

cmonr commented Jan 30, 2019

Uh oh!

deepikabhavnani commented Jan 30, 2019

Uh oh!

deepikabhavnani commented Jan 30, 2019

Uh oh!

Uh oh!

deepikabhavnani commented Jan 30, 2019

Uh oh!

deepikabhavnani commented Jan 30, 2019

Uh oh!

deepikabhavnani Jan 30, 2019

Uh oh!

kjbracey Jan 31, 2019

Uh oh!

SenRamakri commented Jan 30, 2019 •

edited

Loading

Uh oh!

cmonr commented Jan 30, 2019

Uh oh!

kjbracey commented Jan 31, 2019 •

edited

Loading

Uh oh!

kjbracey commented Jan 31, 2019 •

edited

Loading

Uh oh!

Uh oh!

Halt to enforce reboot limit once only #9544

Halt to enforce reboot limit once only #9544

Uh oh!

Conversation

kjbracey commented Jan 30, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Pull request type

Reviewers

Uh oh!

0xc0170 commented Jan 30, 2019

Uh oh!

kjbracey commented Jan 30, 2019

Uh oh!

mbed-ci commented Jan 30, 2019

Test run: SUCCESS

Uh oh!

cmonr commented Jan 30, 2019

Uh oh!

deepikabhavnani commented Jan 30, 2019

Uh oh!

deepikabhavnani commented Jan 30, 2019

Uh oh!

Uh oh!

deepikabhavnani commented Jan 30, 2019

Uh oh!

deepikabhavnani commented Jan 30, 2019

Uh oh!

deepikabhavnani Jan 30, 2019

Choose a reason for hiding this comment

Uh oh!

kjbracey Jan 31, 2019

Choose a reason for hiding this comment

Uh oh!

SenRamakri commented Jan 30, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmonr commented Jan 30, 2019

Uh oh!

kjbracey commented Jan 31, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kjbracey commented Jan 31, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

kjbracey commented Jan 30, 2019 •

edited

Loading

SenRamakri commented Jan 30, 2019 •

edited

Loading

kjbracey commented Jan 31, 2019 •

edited

Loading

kjbracey commented Jan 31, 2019 •

edited

Loading