Skip to content

Crash reporting documentation #831

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 27, 2018

Conversation

SenRamakri
Copy link
Contributor

@SenRamakri SenRamakri commented Nov 15, 2018

Documentation for crash reporting feature in Mbed OS. This is targeted for 5.11 release.

@SenRamakri SenRamakri force-pushed the sen_CrashReportingDocs branch from 4fd3b9b to 6715706 Compare November 15, 2018 17:29
@SenRamakri
Copy link
Contributor Author

@kegilbert @AnotherButler - Please review.

@kegilbert
Copy link
Contributor

Small nit that's not part of the included changes, but the very first line of Error.md uses the word 'error' 5 times in one sentence.

Mbed OS provides error code and error status definitions and error handling APIs for error construction, reporting and retrieving previously reported errors.

@@ -130,6 +130,37 @@ Mbed OS application and system developers may need to define error codes specifi

Some applications may want to do custom error handling when an error is reported using `MBED_ERROR()` or `MBED_WARNING()`. Applications can accomplish this by registering an error hook function with the Mbed OS error handling system using the **mbed_set_error_hook()** API. This function is called with error context information whenever the system handles an **MBED_ERROR()** or **MBED_WARNING()** invocation. This function should be implemented for re-entrancy because multiple threads may invoke `MBED_ERROR()` or `MBED_WARNING()`, which may cause the error hook to be called in parallel.

### Crash reporting and auto-reboot
Whenever a fatal error happens in the system, MbedOS error handling system collects key information such as error code, error location, register context(in the case of fault exceptions) etc. and stores them in a special area in RAM region called Crash-data-RAM. The error information stored in Crash-data-RAM is in binary format and follows the `mbed_error_ctx` structure defined in `mbed_error.h`. The system then triggers a warm-reset without losing the RAM contents where we have the error information collected. After the system reboots, during MbedOS initialization the Crash-data-RAM region is checked to find if there is valid error information captured. This is done by using a CRC value calculated over the stored error information and is appended as part of information stored in Crash-data-RAM. If the system detects that the reboot was triggered by a fatal error, it will invoke a callback function with a pointer to the error context structure stored in Crash-data-RAM. The default callback function is defined with `WEAK` attribute, which can be overridden by the application if required. Below is the signature for the callback:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit:
Move the following to the second line maybe? Sounds a little cleaner to me.
"...and stores them in a special area in RAM region called Crash-data-RAM." ->
"...and stores them in a reserved RAM region called Crash-data-RAM"


#### Adding Crash-data-RAM region for crash reporting
As mentioned above, the crash reporting feature requires a special memory region, called Crash-data-RAM to work. This region is 256 bytes in size and is allocated using linker scripts for the target for each toolchain. Although all platforms support crash reporting feature, not all targets are currently modified to allocate this Crash-data-RAM region.
See `mbed_lib.json` in platform directory to see which targets currently enabled with crash reporting. In order to enable crash reporting in other targets, you must modify the linker scripts for those targets to allocate the Crash-data-RAM region. You may refer the linker scripts for one of the targets already enabled with crash reporting to understand how the Crash-data-RAM region is allocated. Below are some guidelines to make the linker script changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
See `mbed_lib.json` in platform directory to see which targets currently enabled with crash reporting. In order to enable crash reporting in other targets, you must modify the linker scripts for those targets to allocate the Crash-data-RAM region. You may refer the linker scripts for one of the targets already enabled with crash reporting to understand how the Crash-data-RAM region is allocated. Below are some guidelines to make the linker script changes.
See `mbed_lib.json` in the platform directory to see which targets currently enable crash reporting. In order to enable crash reporting for other targets, you must modify the linker scripts for those targets to allocate the Crash-data-RAM region. You may refer to the linker scripts for one of the targets already enabled with crash reporting to understand how the Crash-data-RAM region is allocated. Below are some guidelines to make the linker script changes.

See `mbed_lib.json` in platform directory to see which targets currently enabled with crash reporting. In order to enable crash reporting in other targets, you must modify the linker scripts for those targets to allocate the Crash-data-RAM region. You may refer the linker scripts for one of the targets already enabled with crash reporting to understand how the Crash-data-RAM region is allocated. Below are some guidelines to make the linker script changes.

* The region size should be 256 bytes and aligned at 8-byte offset.
* If you are enabling the Crash-data-RAM for *ARM compiler*, linker script must export the following symbols:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If you are enabling the Crash-data-RAM for *ARM compiler*, linker script must export the following symbols:
* If you are enabling the Crash-data-RAM for the *ARM compiler*, linker script must export the following symbols:

See `mbed_lib.json` in platform directory to see which targets currently enabled with crash reporting. In order to enable crash reporting in other targets, you must modify the linker scripts for those targets to allocate the Crash-data-RAM region. You may refer the linker scripts for one of the targets already enabled with crash reporting to understand how the Crash-data-RAM region is allocated. Below are some guidelines to make the linker script changes.

* The region size should be 256 bytes and aligned at 8-byte offset.
* If you are enabling the Crash-data-RAM for *ARM compiler*, linker script must export the following symbols:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If you are enabling the Crash-data-RAM for *ARM compiler*, linker script must export the following symbols:
* If you are enabling the Crash-data-RAM for *ARM compiler*, linker scripts must export the following symbols:

__\_\_CRASH_DATA_RAM_START\_\___ - Indicates start address of Crash-data-RAM region.
__\_\_CRASH_DATA_RAM_END\_\___ - Indicates end address of Crash-data-RAM region.

It's important that these regions should be marked with appropriate attributes(based on toolchain) to mark them as uninitialized region. For example, for ARM Compiler Crash-data-RAM can be marked with attribute *EMPTY*. There is no hard requirement about the placement of this region. The only requirement is that it should be placed such that no other entity is overwriting this region when rebooted or at runtime. But in order to avoid fragmentation its best placed just after the vector table region, or if there is no vector table region, it can be placed at the bottom of RAM(lowest address).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It's important that these regions should be marked with appropriate attributes(based on toolchain) to mark them as uninitialized region. For example, for ARM Compiler Crash-data-RAM can be marked with attribute *EMPTY*. There is no hard requirement about the placement of this region. The only requirement is that it should be placed such that no other entity is overwriting this region when rebooted or at runtime. But in order to avoid fragmentation its best placed just after the vector table region, or if there is no vector table region, it can be placed at the bottom of RAM(lowest address).
It's important that these regions should be marked with appropriate attributes(based on toolchain) to mark them as uninitialized regions. For example, the ARM Compiler Crash-data-RAM region can be marked with attribute *EMPTY*. There is no hard requirement about the placement of this region. The only requirement is that it should be placed such that no other entity is overwriting this region when rebooted or at runtime. But in order to avoid fragmentation its best placed just after the vector table region, or if there is no vector table region, it can be placed at the bottom of RAM(lowest address).


#### Configuring crash reporting and auto-reboot
MbedOS crash reporting implementation provides many options to configure the crash reporting behavior.
Below is the list of new configuration options available to configure crash reporting functionality. These configuration options are defined in `mbed_lib.json` under platform directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Below is the list of new configuration options available to configure crash reporting functionality. These configuration options are defined in `mbed_lib.json` under platform directory.
Below is the list of new configuration options available to configure crash reporting functionality. These configuration options are defined in `mbed_lib.json` under the platform directory.

MbedOS crash reporting implementation provides many options to configure the crash reporting behavior.
Below is the list of new configuration options available to configure crash reporting functionality. These configuration options are defined in `mbed_lib.json` under platform directory.

`crash-capture-enabled` - Enables crash context capture when the system enters a fatal error/crash. When this is disabled it also disables other dependent options mentioned below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the configuration options laid out here into a list (Github proposed changes beta feature doesn't support multiline changes yet):

"Below is the list....."

  • crash-capture-enabled - Enables crash context capture when the system enters a fatal error/crash. When this is disabled it also disables other dependent options mentioned below.
  • fatal-error-auto-reboot-enabled - Setting this to true enables auto-reboot on fatal errors.
  • reboot-crash-report-enabled - Enables crash report printing over terminal when the system reboots after a fatal error. The format of this report is identical to error reporting structure mentioned in error report.
  • error-reboot-max - Maximum number of auto-reboots permitted on fatal errors. The system will stop auto-rebooting once the maximum limit is reached. Setting this to value 0 will disable auto-reboot.

@@ -298,12 +329,99 @@ void save_all_errors() {
mbed_clear_all_errors();
}
```

#### Using `mbed_get_reboot_error_info()` to retrieve the reboot error info
The error context captured by the error handling system can be retrieved using mbed_get_reboot_error_info() API. See the below code for example usage of that API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to alternative phrasing here, mainly trying to remove the API term if this sentence is only referencing a single function call (could have the usage of API in a sentence wrong as well, let me know if this is fine as is).

Suggested change
The error context captured by the error handling system can be retrieved using mbed_get_reboot_error_info() API. See the below code for example usage of that API.
The error context captured by the error handling system can be retrieved using `mbed_get_reboot_error_info()`. See the below code for an example using that function.

```

#### Using `mbed_get_reboot_fault_context()` to retrieve the fault context info
The fault context captured can be retrieved using mbed_get_reboot_fault_context() API. See the below code for example usage of that API. The example code below checks for error_status using the error context and then retrieves the fault context using mbed_get_reboot_fault_context() API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, open to alternative phrasing here, mainly trying to remove the API term if this sentence is only referencing a single function call (could have the usage of API in a sentence wrong as well, let me know if this is fine as is).

Suggested change
The fault context captured can be retrieved using mbed_get_reboot_fault_context() API. See the below code for example usage of that API. The example code below checks for error_status using the error context and then retrieves the fault context using mbed_get_reboot_fault_context() API.
The fault context captured can be retrieved using `mbed_get_reboot_fault_context()`. See the below code for an example using that function. The example code below checks for error_status using the error context and then retrieves the fault context using `mbed_get_reboot_fault_context()`.

The error context captured by the error handling system can be retrieved using mbed_get_reboot_error_info() API. See the below code for example usage of that API.
In the example below, a status variable reboot_error_detected has been used to track the presence of error context capture.

```CPP TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we missing for the snipping compilation script to run this so we can remove the TODO?

#### Using `mbed_get_reboot_fault_context()` to retrieve the fault context info
The fault context captured can be retrieved using mbed_get_reboot_fault_context() API. See the below code for example usage of that API. The example code below checks for error_status using the error context and then retrieves the fault context using mbed_get_reboot_fault_context() API.

```CPP TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we missing for the snipping compilation script to run this so we can remove the TODO?

```

#### Using `mbed_reset_reboot_count()` to reset the reboot count
`mbed_reset_reboot_error_info()` API can be used to specifically reset the reboot count stored as part error information stored in Crash-data-RAM.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, open to alternative phrasing here, mainly trying to remove the API term if this sentence is only referencing a single function call (could have the usage of API in a sentence wrong as well, let me know if this is fine as is).

Suggested change
`mbed_reset_reboot_error_info()` API can be used to specifically reset the reboot count stored as part error information stored in Crash-data-RAM.
`mbed_reset_reboot_error_info()` can be used to specifically reset the reboot count stored as part of the error information stored in Crash-data-RAM.

#### Using `mbed_reset_reboot_count()` to reset the reboot count
`mbed_reset_reboot_error_info()` API can be used to specifically reset the reboot count stored as part error information stored in Crash-data-RAM.
Calling this function will set the reboot count to 0.
```CPP TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we missing for the snipping compilation script to run this so we can remove the TODO?

@kegilbert
Copy link
Contributor

kegilbert commented Nov 15, 2018

@SenRamakri Overall looks great! Just minor nits and a small question above.

Question: I feel like the API examples are a bit far from the section explaining the crash reboot feature (and same of the error API sections). Is it worth adding some jump here for examples/API reference! links at the end of those sections?

@SenRamakri
Copy link
Contributor Author

@kegilbert - I fixed the review comments. Please take a look, and I have removed TODOs but lets see if it fails and we have to fix them accordingly.

@kegilbert
Copy link
Contributor

kegilbert commented Nov 15, 2018

Hey! @geky the snipping tool caught something :D This won't build properly until the reboot PR is merged in retrospect, but did catch a missing }

{
if (reboot_error_detected == 1) {
if (MBED_SUCCESS == mbed_get_reboot_error_info(&error_ctx)) {
printf("\nSuccessfully read error context\n");
Copy link
Contributor

@kegilbert kegilbert Nov 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a closing }

```

#### Using `mbed_reset_reboot_error_info()` to clear the reboot error info
`mbed_reset_reboot_error_info()` API can be used to clear the reboot error info if required by the application.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as before in reference to the API reference. In my mind

... using the Mbed OS Crash Reboot API

is correct,

...using the mbed_reset_reboot_error_info() API

should be rephrased from API to function as a member of the API.

Thoughts? I could be off the mark on this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, Im removing the API working.

@SenRamakri SenRamakri force-pushed the sen_CrashReportingDocs branch 2 times, most recently from 458cf8e to 815cbc9 Compare November 15, 2018 23:27
@SenRamakri
Copy link
Contributor Author

@kegilbert - I have fixed the bracket issue, but still need the TODO as you mentioned.

@kegilbert
Copy link
Contributor

Yeah we can put the TODOs back until ARMmbed/mbed-os#8702 is merged.

Copy link
Contributor

@kegilbert kegilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@SenRamakri
Copy link
Contributor Author

Do not merge this until ARMmbed/mbed-os#8702 is merged.

@AnotherButler
Copy link
Contributor

Please review my edits to make sure I didn't accidentally change the meaning of anything.

@SenRamakri SenRamakri force-pushed the sen_CrashReportingDocs branch from adf71a8 to 2c648c7 Compare November 19, 2018 04:08
@AnotherButler
Copy link
Contributor

This PR is only waiting on the code dependency.

@SenRamakri
Copy link
Contributor Author

@AnotherButler - The code has been merged to mbed-os, this docs changes can be merged now.

@SenRamakri SenRamakri force-pushed the sen_CrashReportingDocs branch from 2c648c7 to 6a0530a Compare November 26, 2018 16:46
@AnotherButler AnotherButler merged commit 8c263e6 into ARMmbed:development Nov 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants