Skip to content

Error Handling and error coding #6859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

SenRamakri
Copy link
Contributor

@SenRamakri SenRamakri commented May 10, 2018

This is the implementation for Standardized error coding and handling for mbed-os.
The changes with this pull request implements standardized error code handling and error code definitions/encoding. With this, error handling is implemented as a Core-os service which captures and
logs the errors with context information. Each error code is captured with error location, error message,
error file(this is configurable as this can cause increase in binary size), calling thread info, module generating the error etc. Relevant APIs are provided to log and retrieve errors from error logging system.
A build time flag has been provided to switch off error logging system on memory constrained platforms.
Error reporting itself is logically separated and implemented in mbed_error_reporting.c/h. By doing this both exception handling and error handling can use the same reporting infrastructure. A sample error report is as below. Note that error status value captures error type, entity and error code. Error reporting prints these values separately as well to enable developers to search the source tree with error code.
Error handling service also provides an API to save the error log in file system if required.

++ MbedOS Error Info ++
Error Status: 0x80040103
Error Code: 259
Error Entity: 04
Error Message: HAL Entity error
Error Location: 0x000067C7
Error Value: 0x00005566
Current Thread: Id: 0x200024A8 EntryFn: 0x0000FB0D StackSize: 0x00001000 StackMem: 0x200014A8 SP: 0x2002FFD8
-- MbedOS Error Info --

The implementation is tested using the Greentea tests under TESTS/mbed_platform/error_handling.
Note that exception handling and reporting testing cannot be done using Greentea and was manually tested.

Pull request type

[ ] Fix
[ ] Refactor
[ ] New target
[X] Feature
[ ] Breaking change

The APIs/data structures/macros are all documented using Doxygen. In addition handbook documentation will also be generated for release.

Overall impact of this feature is ~1K(ROM) including logging. Logging can be disabled and if disabled
the overall impact would be around 0.75K.

Copy link
Member

@bulislaw bulislaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that, it would be great to have error codes unified across the OS finally :)

static MbedErrorHook error_hook = NULL;

//Helper function to get the current SP
static unsigned int get_current_sp()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be useful as a public lib/platform function. Maybe creating platform/mbed_utils.c/h for this kind of utility functions would be a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bulislaw for taking time to review this.
Good suggestion, yes Ill look into if we have enough of these functions which can be filtered out to be separated into its own APIs.

}

//Helper function to halt the system
static void mbed_halt_system(void)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's good for production devices to spin when something goes wrong. I feel like we should try to dump some state somewhere and reset. Or at least provide user level "hard fault"/error hook API the same way we do it for idle hook.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, Thanks for bringing that out, in fact I was thinking the same and was planning to handle this using warm reset+recovery mode. But it does require lot more design/changes and is in plans as of now. The current implementation behaves just the same as before so at least we are not taking it backwards. I'll discuss this with you more offline.

By the way, we do provide an error handling for applications to handle the error if required. That should help I guess.

extern "C" {
#endif

//Define this macro to include filenames in error context, this can save memory. For release builds, do not include filename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not doxygen, it should be I guess. That's the same across this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we will modify it to be doxygen friendly. Originally it was not intended to be in doxygen, but I think its good idea to document that in doxygen.

#define MBED_ERROR_STATUS_CODE_POS (0)
#define MBED_ERROR_STATUS_CODE_FIELD_SIZE (16)

#define MBED_ERROR_STATUS_ENTITY_MASK (0x00FF0000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit pick, could we call it MODULE not ENTITY across the whole error handling? It actually took me a while to figure out what you mean by entity, I think more people will have the same, module is standard term in this context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll change it to module if its more context appropriate.

@@ -123,15 +123,15 @@ u32_t sys_now(void) {
*---------------------------------------------------------------------------*/
err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) {
if (queue_sz > MB_SIZE)
error("sys_mbox_new size error\n");
SET_ERROR_FATAL(MAKE_ERROR(ENTITY_NETWORK_STACK, ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of points as it's not very user friendly:

  • Why SET_ERROR_FATAL and not ERROR or MBED_ERROR? Is there not fatal error? Also the set doesn't add anything to the meaning, I know from the implementation point of view it makes sense, but from user point of view doesn't matter that it sets the error state, for them it's just 'I want to report error'. If the error is not fatal lets call it warning.
  • I don't like this construct SET_ERROR_FATAL(MAKE_ERROR( why do we need to put it here manually every single time? Why not MBED_ERROR(ENTITY_NETWORK_STACK, ERROR_CODE_INVALID_SIZE, "sys_mbox_new size error\n", queue_sz) and MBED_ERROR can call MAKE_ERROR on the two first parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I named the functions starting with "Set"(verb) because calling that API changes the state of the system and makes the system do something. Its not a function which is declaring anything and retrieving anything, that's why I thought it should be a verb, so I think it does add meaning, simply saying ERROR doesn't say if its declaring anything or making the system do anything.

Then, MAKE_ERROR is required only if you want to add more info to the error status code. Normally you can use, SET_ERROR( status code(=ERROR_INVALID_ARGUMENT), msg, ... ), for example. But I do see the convenience factor in what you mentioned. I'll try to work on adding more convenience macros if it doesn't make it more confusing. I will also look into the comments on changing non-fatal errors as warnings. Thanks for pointing that out.

#ifdef __cplusplus
extern "C" {
#endif
MbedErrorStatus mbed_log_put_error(mbed_error_ctx *error_ctx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No docs, and they are public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is not intended to be public, we don't want this to be called directly, public APIs are the ones in mbed_error.h

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the header, so I think even if consumer are other parts of Mbed OS (outside of file) we should get some simple docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, let me see how we can address this. I was not thinking of exposing it outside. But Ill think about it and if its useful we can expose it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We document public API's with doxygen, and internal global functions similar way just without double asterisk.
Reference: https://github.com/deepikabhavnani/mbed-os/blob/1bfb5b14b65fd32ffc2f58ffc144de859032a494/platform/logger_internal.h#L123

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deepikabhavnani - Good catch, I'll update the doxygen comments.

}
}

/* Limited print functionality which prints the string out to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not doxy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are not doxygen as they are not called publicly. So keeping it more private.

*/
void mbed_error_print(char *fmtstr, uint32_t *values)
{
#if DEVICE_SERIAL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about ITM and so on? I think we have nice abstraction there, so we can do it once and have it for all backends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but I found that we only have 2 devices(NRFs) with ITM support as of now, I think we need broader support so that it justifies adding more logic here to route it based on whats present and whats active.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the case because we only needed it on these 2, due to small number of UARTs. I can envision a production device with multiple UARTs that are being used by WiFi, GPS, whatever and not having enough for debug.

extern "C" {
#endif

/* routine to report the error */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not doxy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, :-). Its not really intended to be called directly. Its more for platform components to use so keeping non-doxy until we really find a need to expose it.

@@ -126,6 +126,7 @@ void remove_filehandle(FileHandle *file) {
#if DEVICE_SERIAL
extern int stdio_uart_inited;
extern serial_t stdio_uart;
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The calls below depend on serial, not sure if we can close the #if here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is not expected, I think its some sort of rebase/merge issue. Ill address it.

\endverbatim
*/

typedef enum _MbedErrorCode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's intimidating, we should also set some guidelines inside the system which errors should be used. POSIX errors for POSIX functions and system errors for the rest seems reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thats good suggestion, Ill document that here and in handbook.

Copy link
Contributor Author

@SenRamakri SenRamakri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bulislaw - Thanks for your review, please see my responses, meanwhile Im working on fixing the code with your feedbacks.

@@ -123,15 +123,15 @@ u32_t sys_now(void) {
*---------------------------------------------------------------------------*/
err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) {
if (queue_sz > MB_SIZE)
error("sys_mbox_new size error\n");
SET_ERROR_FATAL(MAKE_ERROR(ENTITY_NETWORK_STACK, ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I named the functions starting with "Set"(verb) because calling that API changes the state of the system and makes the system do something. Its not a function which is declaring anything and retrieving anything, that's why I thought it should be a verb, so I think it does add meaning, simply saying ERROR doesn't say if its declaring anything or making the system do anything.

Then, MAKE_ERROR is required only if you want to add more info to the error status code. Normally you can use, SET_ERROR( status code(=ERROR_INVALID_ARGUMENT), msg, ... ), for example. But I do see the convenience factor in what you mentioned. I'll try to work on adding more convenience macros if it doesn't make it more confusing. I will also look into the comments on changing non-fatal errors as warnings. Thanks for pointing that out.

static int error_count = 0;
static mbed_error_ctx first_error_ctx = {0};
static mbed_error_ctx current_error_ctx = {0};
static MbedErrorHook error_hook = NULL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix the naming.

}

//Helper function to halt the system
static void mbed_halt_system(void)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, Thanks for bringing that out, in fact I was thinking the same and was planning to handle this using warm reset+recovery mode. But it does require lot more design/changes and is in plans as of now. The current implementation behaves just the same as before so at least we are not taking it backwards. I'll discuss this with you more offline.

By the way, we do provide an error handling for applications to handle the error if required. That should help I guess.

@@ -57,14 +57,14 @@ int ReadOnlyBlockDevice::read(void *buffer, bd_addr_t addr, bd_size_t size)

int ReadOnlyBlockDevice::program(const void *buffer, bd_addr_t addr, bd_size_t size)
{
error("ReadOnlyBlockDevice::program() not allowed");
return 0;
SET_ERROR_FATAL(MAKE_ERROR(ENTITY_BLOCK_DEVICE, ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed", addr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the return is there only to make compiler happy, either before(with error()) or now its going to bring the system to halt.

}

//Use critsect here, as we don't want processing more than one error at the same time
core_util_critical_section_enter();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into if we can make the critical section shorter.

#ifdef __cplusplus
extern "C" {
#endif
MbedErrorStatus mbed_log_put_error(mbed_error_ctx *error_ctx);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is not intended to be public, we don't want this to be called directly, public APIs are the ones in mbed_error.h

}
}

/* Limited print functionality which prints the string out to
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are not doxygen as they are not called publicly. So keeping it more private.

*/
void mbed_error_print(char *fmtstr, uint32_t *values)
{
#if DEVICE_SERIAL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but I found that we only have 2 devices(NRFs) with ITM support as of now, I think we need broader support so that it justifies adding more logic here to route it based on whats present and whats active.

extern "C" {
#endif

/* routine to report the error */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, :-). Its not really intended to be called directly. Its more for platform components to use so keeping non-doxy until we really find a need to expose it.

rtos/Thread.cpp Outdated
#endif

void Thread::constructor(uint32_t tz_module, osPriority priority,
void Thread::constructor(osPriority priority,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that's unexpected, Ill look into that. Looks like rebase issue, good catch :-)

@SenRamakri
Copy link
Contributor Author

@bulislaw - Please see my new changes for error handling - Most of your feedbacks are taken care of except few. Doxygen comments are fixed, changed from set_error/fatal to set_warning/set_error, itm support added, amount of critical section code is reduced(using a local variable for some portions), moved error hook callback towards the end of error handling, fixed comments and other changes. I'm unable to move the doxygen comments on list of error codes as you mentioned as I see that its not getting rendered as I want to be and is messing up other portions when I try to do that. Also, Im not going to add more convenience macros as I see that its starting to look more confusing as I add more macros as I cant remove the old ones because those macros work well for generic error statuses(without having entity info). So, for now I'm thinking its better to keep it how it is now.

@SenRamakri SenRamakri force-pushed the sen_ErrorHandling branch 2 times, most recently from bcc30e8 to 7a4f6ce Compare May 14, 2018 17:11
@SenRamakri
Copy link
Contributor Author

/morph build

@mbed-ci
Copy link

mbed-ci commented May 15, 2018

Build : FAILURE

Build number : 2017
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

}

//Helper function to get the current SP
static unsigned int get_current_sp()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this one needs to take into account cortex-a (see the build failure).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for pointing that out, I'm looking into that. It appears there are no equivalent Mbed-os intrinsics for Cortex-A and so I'm looking into this.

@geky geky requested review from geky, c1728p9 and kjbracey May 15, 2018 20:41
@geky geky requested review from yossi2le and pan- May 15, 2018 23:24
Copy link
Contributor

@geky geky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell this is a mush of two new features that aren't all that related to each other (error logging and error codes). We shouldn't be creating PRs like this, it's harder to review and less likely to get in.

And sorry about putting a red X, but there are a few problems we need to take care of before this can go in:

  1. We should not have redundant error codes for the same error. Should I use ENOMEM or OUT_OF_MEMORY? Why are these two different error codes in our code base?

    Thinking about it, if you are encoding more information in the error codes, it would be better to just change the value of ENOMEM to whatever encoding you are using for OUT_OF_MEMORY. This would let users use both ENOMEM and OUT_OF_MEMORY without additional code size for translating between the two.

  2. Should this error logging be tied to @deepikabhavnani's logging pr, Unified logging - String Based  #5965? If Unified logging - String Based  #5965 does not make it in, I think we should hold this one back as well so that the API churn around logging is kept to a minimum.

  3. With the addition of the new error mechanisms, we should find out the memory consumption before and after with some example, and have those numbers available in this PR. The error code cannot be optimized out and is a cost to every application using mbed-os. @pan- did quite a bit of optimizing the previous error mechanism.

* @param error_code Error code value to be used, must be between 1 and 255(inclusive).
*
*/
#define DEFINE_POSIX_ERROR( error_name, error_code ) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New macros should have MBED_ prefixes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rename this to add MBED_ prefix.

* This is to enable easy injection of Posix error codes into MbedOS error handling system without altering the actual Posix error values.\n
* Accordingly, Posix error codes are represented as -1 to -255 under MbedOS error status representation.
*/
typedef int MbedErrorStatus;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typedefs be something like mbed_error_t?

Especially for C developers, this appears very verbose and heavy-weight, which will discourage its adoption.

But maybe that's fine if we're ok with users using int to represent errors?

I like that this is backwards-compatibile with our current error handling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does mbed_error_status_t work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of typing, why not mbed_error_t or mbed_status_t?

#ifdef MBED_CONF_ERROR_FILENAME_CAPTURE_ENABLED
#define SET_ERROR( error_status, error_msg, error_value ) set_error( error_status, (const char *)error_msg, (uint32_t)error_value, (const char *)MBED_FILENAME, __LINE__ )
#else
#define SET_ERROR( error_status, error_msg, error_value ) set_error( error_status, (const char *)error_msg, (uint32_t)error_value, NULL, 0 )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SET_ERROR sounds like it's setting an errno variable. Why not keep this verb "error"? Maybe make it match the style for macros as MBED_ERROR? Or if you need to differentiate between error logging and halting maybe MBED_FAIL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already discussed about this, please see my response to @bulislaw's comments earlier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why SET_ERROR_FATAL and not ERROR or MBED_ERROR? Is there not fatal error? Also the set doesn't add anything to the meaning, I know from the implementation point of view it makes sense, but from user point of view doesn't matter that it sets the error state, for them it's just 'I want to report error'. If the error is not fatal lets call it warning.

Let me know if I'm looking at the wrong conversation, but it looks like @bulislaw and I agree, the SET_* prefix is going to be confusing for users.

Consider @c1728p9's comment below, even we are getting confused by the goal of this function.


#ifdef __cplusplus
extern "C" {
#endif
void error(const char* format, ...);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Deprecating error! I think this is the wrong place for this.

How about we keep the new implementation but wait to deprecate until the logging changes? That way debug and error can be deprecated at the same time, and users will get less deprecation churn.

And even then we really shouldn't deprecate error or debug until we're sure logging is stable. There's no real rush to deprecate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging changes are still being reviewed/and other ideas being discussed. So, this is not getting deprecated as part of logging. So, no need to wait for logging/tracing to happen. Logging may introduce error-logging APIs which is different from error handling.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @geky here, we should not deprecate error. Current error API does both logging and handling of error, it is more for critical errors which will result in application termination.
However with error handling we would not terminate on all errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SET_ERROR does terminate the application, just as how error did. SET_WARNING does not terminate the application but will log the error context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current API does only the logging and termination, the goal here is to capture more actionable data which helps developers to debug the cause of app crashing etc. By the way, error logging is not same as logging or tracing implementation we are doing, they are two different things. But we can use some of the aspects of unified tracing in the error handling mechanism once tracing is in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not deprecate error until we deprecate debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, we cannot use mbed_trace as it uses too much resources+stdlibs, and the idea is to have an implementation to output to serial/uart without using much of system resources. If you have noticed the mbed_error_print is minimal implementation which was already there as part of exception handling, but we moved it under mbed_error_report to be used for error handling as well. By the error handling uses crit sect, so it will not corrupt other output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that nothing should be hard-coded to serial/uart - we've got full console abstraction now. There should be no serial_putcs in any of these paths, and one of the outstanding jobs is to remove them all in favour of write(STDERR_FILENO).

Copy link
Contributor Author

@SenRamakri SenRamakri May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm - I see your point, but the intention here is to use as much less resources as possible(without even using any stdlibs as allowed) as this is the same path for exception handling. So, using those high level abstracted calls may not work in all cases for example when we are dealing with an fatal exception. Just to give you more info, we are planning/thinking on a recovery mode feature where in we reset the system without re-initing the piece memory having the error info and then on reboot we can print that info out through "write" calls normally, as you mentioned. At that point, we can get rid of special handling when dealing with exception handling/errors handling. As of now, this looks to be the best option. If you noticed, this is actually not new, this piece of code has been moved from the exception handling path to be used for fatal error paths as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying we need to adopt mbed-trace right now, I'm saying we should not deprecate error until we deprecate debug. You have an implementation or error here, so just remove the deprecation.

There's no urgency to deprecate error and users don't like making multiple changes.

* @return void
*
*/
typedef void (*MbedErrorHook)(const mbed_error_ctx *error_ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typedefs should not be CamelCase

* @return int Number of errors reported.
*
*/
int get_error_count(void);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean errors are recoverable? I thought they would halt the CPU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, fatal errors are not recoverable. But eventually we need to extend this to implement a recovery mode and I hope we can recover from fatal errors by doing a warm reset and still keep the error info in tact and to be reported later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh interesting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we currently have a requirement to implement recovery mode.

* @note Filesystem support is required in order for this function to work.
*
*/
MbedErrorStatus save_error_log(const char *path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An error log? Is this opt-in? Do we know how much of a footprint this has?

@@ -123,15 +123,15 @@ u32_t sys_now(void) {
*---------------------------------------------------------------------------*/
err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) {
if (queue_sz > MB_SIZE)
error("sys_mbox_new size error\n");
SET_ERROR(MAKE_ERROR(MODULE_NETWORK_STACK, ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this, I think we should collapse these macros.

SET_ERROR(MODULE_NETWORK_STACK, ERROR_CODE_INVALID_SIZE, "sys_mbox_new size error\n", queue_sz);
// and
SET_CUSTOM_ERROR(MODULE_NETWORK_STACK, ERROR_CODE_INVALID_SIZE, "sys_mbox_new size error\n", queue_sz);

Is the extra macro name needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats good idea, if we collapse do you any suggestions for using the custom defined error?
If we use the same macro, how does the caller use a custom defined error?

Copy link
Contributor Author

@SenRamakri SenRamakri May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, error logging can be completely disabled using a config mechanism, but by default it is enabled. Note that we will still capture the first and last error in the system which is always enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does having two macros, SET_ERROR and SET_CUSTOM_ERROR not work?

* passed as the second argument in the DEFINE_SYSTEM_ERROR macro.\n\n
* Below are the Mbed System error codes and the description:
* \verbatim
UNKNOWN 256 Unknown error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already pretty well established as -1. Can we keep -1 as an unknown error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it but -1 maps to Posix Error Code. So, mapping -1 again to unknown error maybe confusing. And it also doesn't align with our overall error code definition/format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking more we should renumber the std errno codes (see above). This would free up -1 as general error.

Actually even if it conflicts, its just conflicting with EPERM (not EACCESS), which is pretty close to a generic error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is to include posix error codes as part of mbed error handling model so the modules already using posix still have the same meaning. The problem with conflicting error codes is that it becomes difficult if some one gets an error and wants to search source code for where that came from.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are far more modules returning -1 for errors than POSIX errors. Back in mbed 2 this was the correct thing to do:
https://os.mbed.com/users/mbed_official/code/SDFileSystem/file/8db0d3b02cec/SDFileSystem.cpp

The only thing worse than conflicting error codes are error codes with misleading names.

This also does not become a problem if we renumber the std errno codes.

MbedErrorStatus error = MAKE_ERROR(MODULE_APPLICATION, ERROR_CODE_UNKNOWN);
SET_WARNING(error, "Error Unknown", 0xAABBCCDD );
MbedErrorStatus lastError = get_last_error();
printf("\nlastError = 0x%08X", lastError );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this print be part of final test code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, we can remove those prints.

void test_system_errors()
{
MbedErrorStatus error = MAKE_ERROR(MODULE_APPLICATION, ERROR_CODE_UNKNOWN);
SET_WARNING(error, "Error Unknown", 0xAABBCCDD );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have SET_WARNING as an API? how is this different then tracing feature MBED_WARN or tr_warn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SET_WARNING captures error context, can call application hooks, stores the error error context etc but tracing does not(or rather should not). Error handling should be seen differently from tracing and both are needed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree error handling provides more functionality which is not same as tracing. My point islogs / print messages should not be handled in this feature. Log messages printed using SET_WARNING / SET_ERROR are not line interleaved and will corrupt data displayed on UART.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my reply above, unfortunately we cannot use mbed_trace APIs as of now due to its extensive use of resources. Note that this needs to work with handling of exceptions as well.

void fault_print_init(void);
void fault_print_str(char *fmtstr, uint32_t *values);
void hex_to_str(uint32_t value, char *hex_star);
//Functions Prototypes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this code restructuring and changes in fault handler should be a separate PR.

uint32_t SP;
uint32_t LR;
uint32_t PC;
uint32_t R0_reg;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fault handler changes / fixes should be separate PR please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, refactoring exception handling to use the new error handling framework is one of the goals with this feature.

Copy link

@deepikabhavnani deepikabhavnani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please break the PR, as it is difficult to review in current state.

  1. Error handling changes
  2. Refactor / fixes for fault handler
  3. Replace / Usage of error API in code base

Also I would agree with @geky that this is combination of error logging and error handling. I am fine with the error handling and standardizing error codes. But logging should not be part of this feature, we already have libraries for that. I agree with decisions pending on logging features, but lets not add more API's to make it more chaotic.

Copy link
Contributor

@c1728p9 c1728p9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will have a fairly large impact on the codebase. It is also making many errors that were considered fatal into non-fatal errors by changing error to SET_ERROR. There needs to be a good justification for changing these errors from fatal to non-fatal.

I would like to see a docs page explaining how to use these error codes in a driver or library and when to use what kind of errors. Without a clear idea of how this unified error handling should/will be used, this change will just add confusion. For example, block device an nsapi define their own error codes. Should libraries like this always define their own error codes or should this be moved into the unified error codes?

@mbed-ci
Copy link

mbed-ci commented May 21, 2018

Build : SUCCESS

Build number : 2077
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented May 21, 2018

@mbed-ci
Copy link

mbed-ci commented May 21, 2018

@SenRamakri SenRamakri force-pushed the sen_ErrorHandling branch from b0557ca to 1b8a9d3 Compare May 21, 2018 12:09
@SenRamakri
Copy link
Contributor Author

/morph build

@mbed-ci
Copy link

mbed-ci commented May 21, 2018

Build : FAILURE

Build number : 2080
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

@adbridge
Copy link
Contributor

@geky @c1728p9 @0xc0170 @bulislaw @deepikabhavnani Are you happy with the updates? Please approve review if this is the case.

@SenRamakri
Copy link
Contributor Author

/morph build

@SenRamakri
Copy link
Contributor Author

@geky @c1728p9 @0xc0170 @bulislaw @deepikabhavnani - Yes, I think I have addressed or answered all of your comments, please take a look at the latest updates.

Copy link
Contributor

@geky geky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • More things need MBED_ prefixes
  • SET_ERROR is not a good name
  • Duplicated error codes between errno and mbed
  • No support for -1
  • Now is not the time to deprecate error


#define GET_MBED_ERROR_TYPE( error_status ) ((error_status & MBED_ERROR_STATUS_TYPE_MASK) >> MBED_ERROR_STATUS_TYPE_POS)
#define GET_MBED_ERROR_MODULE( error_status ) ((error_status & MBED_ERROR_STATUS_MODULE_MASK) >> MBED_ERROR_STATUS_MODULE_POS)
#define GET_MBED_ERROR_CODE( error_status ) (int)((GET_MBED_ERROR_TYPE( error_status ) == ERROR_TYPE_POSIX)?(-error_status):((error_status & MBED_ERROR_STATUS_CODE_MASK) >> MBED_ERROR_STATUS_CODE_POS))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: MBED_ first as a prefix?

- GET_MBED_ERROR_TYPE
+ MBED_GET_ERROR_TYPE

#ifdef MBED_CONF_ERROR_FILENAME_CAPTURE_ENABLED
#define SET_ERROR( error_status, error_msg, error_value ) set_error( error_status, (const char *)error_msg, (uint32_t)error_value, (const char *)MBED_FILENAME, __LINE__ )
#else
#define SET_ERROR( error_status, error_msg, error_value ) set_error( error_status, (const char *)error_msg, (uint32_t)error_value, NULL, 0 )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SET_ERROR is still a confusing name for a function that halts the processor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a good name? I dont like calling it just error because it sounds like we are declaring/defining something. I want that to be clear that it will be acted upon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm coming in the middle of the review a bit so might have missed some of the nuanced/offline discussions. The current use of error(...) halts execution. To me calling this macro ERROR makes more sense as SET_ERROR implies setting the status of an error code. At least to me ERROR implies it will be acted on.


#ifdef __cplusplus
extern "C" {
#endif
void error(const char* format, ...);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still should not deprecate error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I know why? We are planning to raise requirements for other modules to adopt new error handling. And this needs to be deprecated now before they start doing it.

MBED_DEFINE_SYSTEM_ERROR( HARDFAULT_EXCEPTION ,61 ), /* 317 HardFault exception */
MBED_DEFINE_SYSTEM_ERROR( MEMMANAGE_EXCEPTION ,62 ), /* 318 MemManage exception */
MBED_DEFINE_SYSTEM_ERROR( BUSFAULT_EXCEPTION ,63 ), /* 319 BusFault exception */
MBED_DEFINE_SYSTEM_ERROR( USAGEFAULT_EXCEPTION ,64 ), /* 320 UsageFault exception*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we prefix error codes with MBED_?

These names are common, user libraries may have already defined names such as DEVICE_BUSY

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't see a need for prefix-ing all the error codes again with MBED_. Please note that its not "DEVICE_BUSY", we are actually defining error codes/status using that macro. So, I don't see conflicts there. User libraries can continue to have "DEVICE_BUSY" if they want to.

MBED_DEFINE_POSIX_ERROR( EBADF ,EBADF ), /* 9 Bad file number */
MBED_DEFINE_POSIX_ERROR( ECHILD ,ECHILD ), /* 10 No child processes */
MBED_DEFINE_POSIX_ERROR( EAGAIN ,EAGAIN ), /* 11 Try again */
MBED_DEFINE_POSIX_ERROR( ENOMEM ,ENOMEM ), /* 12 Out of memory */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have duplicated error codes

Copy link
Contributor Author

@SenRamakri SenRamakri May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't call them as duplicates, because Posix error codes are special cases and we cannot capture the module reporting the error. That's why we have to define some of them(common ones) defined under system errors as well for mbed modules to use.

//Below are MBED SYSTEM ERROR CODE definitions
//MBED SYSTEM ERROR CODE definitions starts at offset MBED_SYSTEM_ERROR_BASE, see above.
// Error Name Error Offset Error Code
MBED_DEFINE_SYSTEM_ERROR( UNKNOWN ,0 ), /* 256 Unknown error */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still no support for -1?

Copy link
Contributor Author

@SenRamakri SenRamakri May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, we are unable to do that, because -1 is an absolute value which we cannot encode under system error codes because it always will be Posix error(ERROR_EPERM) and also searching for error codes in source code will get messed up as well. So for now, -1 will not be encoded under system error, but its encoded as ERROR_EPERM. My suggestion is drivers should change to use more appropriate error codes instead of using -1 as a generic error which is unmanageable and tough to track.

@mbed-ci
Copy link

mbed-ci commented May 21, 2018

Build : SUCCESS

Build number : 2082
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented May 21, 2018

@mbed-ci
Copy link

mbed-ci commented May 21, 2018

@SenRamakri SenRamakri force-pushed the sen_ErrorHandling branch from 0561663 to 0d508a1 Compare May 22, 2018 04:15
@SenRamakri
Copy link
Contributor Author

/morph build

@SenRamakri SenRamakri force-pushed the sen_ErrorHandling branch from 0d508a1 to 7392504 Compare May 22, 2018 04:47
@SenRamakri
Copy link
Contributor Author

/morph build

@cmonr
Copy link
Contributor

cmonr commented May 22, 2018

@SenRamakri I would refrain from starting CI until at a minimum, pr-head and travis CI builds are green.

@mbed-ci
Copy link

mbed-ci commented May 22, 2018

Build : SUCCESS

Build number : 2094
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented May 22, 2018

Build : SUCCESS

Build number : 2095
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented May 22, 2018

@adbridge
Copy link
Contributor

@SenRamakri please do not kick off the ci yourself. CI should only ever be kicked off when the PR is in the needs ci state , otherwise this will mess up all the future metrics collection. Technically the review should also be approved before doing so...

@mbed-ci
Copy link

mbed-ci commented May 22, 2018

*/
typedef enum _mbed_module_type
{
MODULE_APPLICATION = 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These guys should also have the MBED_ prefix

* @note This macro generate mbed_error_status_t-es with error type set to MBED_ERROR_TYPE_SYSTEM
*
*/
#define MAKE_ERROR(module, error_code) MAKE_SYSTEM_ERROR(module, error_code)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These guys should also have the MBED_ prefix

* @return mbed_error_status_t code logged for the last error or MBED_SUCCESS if no errors are logged.
*
*/
mbed_error_status_t get_last_error(void);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These guys also need an mbed_ prefix

* @return 0 or MBED_ERROR_SUCCESS on success.
*
*/
mbed_error_status_t make_mbed_error(mbed_error_type_t error_type, mbed_module_type_t module, mbed_error_code_t error_code);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put the mbed_ at the front for consistency, makes it easy to parse for both humans and machines

* @note Filesystem support is required in order for this function to work.
*
*/
mbed_error_status_t save_error_log(const char *path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah, these all need mbed_ prefix

*
*/
#define MBED_DEFINE_SYSTEM_ERROR( error_name, error_code ) \
MBED_ERROR_CODE_##error_name = MBED_SYSTEM_ERROR_BASE + error_code, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though: I noticed these names have gotten long, maybe we don't need the full "ERROR_CODE"? Maybe we can just use MBED_CODE_ERROR_NAME? Up to you

@mbed-ci
Copy link

mbed-ci commented May 22, 2018

1 similar comment
@mbed-ci
Copy link

mbed-ci commented May 22, 2018

@SenRamakri
Copy link
Contributor Author

This PR has been moved to #6983 as most people are complaining on not able to access this(showing unicorns).

@SenRamakri SenRamakri closed this May 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.