-
Notifications
You must be signed in to change notification settings - Fork 3k
Error Handling and error coding #6859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that, it would be great to have error codes unified across the OS finally :)
platform/mbed_error.c
Outdated
static MbedErrorHook error_hook = NULL; | ||
|
||
//Helper function to get the current SP | ||
static unsigned int get_current_sp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be useful as a public lib/platform function. Maybe creating platform/mbed_utils.c/h for this kind of utility functions would be a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bulislaw for taking time to review this.
Good suggestion, yes Ill look into if we have enough of these functions which can be filtered out to be separated into its own APIs.
platform/mbed_error.c
Outdated
} | ||
|
||
//Helper function to halt the system | ||
static void mbed_halt_system(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's good for production devices to spin when something goes wrong. I feel like we should try to dump some state somewhere and reset. Or at least provide user level "hard fault"/error hook API the same way we do it for idle hook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point, Thanks for bringing that out, in fact I was thinking the same and was planning to handle this using warm reset+recovery mode. But it does require lot more design/changes and is in plans as of now. The current implementation behaves just the same as before so at least we are not taking it backwards. I'll discuss this with you more offline.
By the way, we do provide an error handling for applications to handle the error if required. That should help I guess.
platform/mbed_error.h
Outdated
extern "C" { | ||
#endif | ||
|
||
//Define this macro to include filenames in error context, this can save memory. For release builds, do not include filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not doxygen, it should be I guess. That's the same across this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we will modify it to be doxygen friendly. Originally it was not intended to be in doxygen, but I think its good idea to document that in doxygen.
platform/mbed_error.h
Outdated
#define MBED_ERROR_STATUS_CODE_POS (0) | ||
#define MBED_ERROR_STATUS_CODE_FIELD_SIZE (16) | ||
|
||
#define MBED_ERROR_STATUS_ENTITY_MASK (0x00FF0000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit pick, could we call it MODULE not ENTITY across the whole error handling? It actually took me a while to figure out what you mean by entity, I think more people will have the same, module is standard term in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll change it to module if its more context appropriate.
@@ -123,15 +123,15 @@ u32_t sys_now(void) { | |||
*---------------------------------------------------------------------------*/ | |||
err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) { | |||
if (queue_sz > MB_SIZE) | |||
error("sys_mbox_new size error\n"); | |||
SET_ERROR_FATAL(MAKE_ERROR(ENTITY_NETWORK_STACK, ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of points as it's not very user friendly:
- Why
SET_ERROR_FATAL
and notERROR
orMBED_ERROR
? Is there not fatal error? Also the set doesn't add anything to the meaning, I know from the implementation point of view it makes sense, but from user point of view doesn't matter that it sets the error state, for them it's just 'I want to report error'. If the error is not fatal lets call it warning. - I don't like this construct
SET_ERROR_FATAL(MAKE_ERROR(
why do we need to put it here manually every single time? Why notMBED_ERROR(ENTITY_NETWORK_STACK, ERROR_CODE_INVALID_SIZE, "sys_mbox_new size error\n", queue_sz)
and MBED_ERROR can callMAKE_ERROR
on the two first parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named the functions starting with "Set"(verb) because calling that API changes the state of the system and makes the system do something. Its not a function which is declaring anything and retrieving anything, that's why I thought it should be a verb, so I think it does add meaning, simply saying ERROR doesn't say if its declaring anything or making the system do anything.
Then, MAKE_ERROR is required only if you want to add more info to the error status code. Normally you can use, SET_ERROR( status code(=ERROR_INVALID_ARGUMENT), msg, ... ), for example. But I do see the convenience factor in what you mentioned. I'll try to work on adding more convenience macros if it doesn't make it more confusing. I will also look into the comments on changing non-fatal errors as warnings. Thanks for pointing that out.
platform/mbed_error_log.h
Outdated
#ifdef __cplusplus | ||
extern "C" { | ||
#endif | ||
MbedErrorStatus mbed_log_put_error(mbed_error_ctx *error_ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No docs, and they are public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is not intended to be public, we don't want this to be called directly, public APIs are the ones in mbed_error.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in the header, so I think even if consumer are other parts of Mbed OS (outside of file) we should get some simple docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, let me see how we can address this. I was not thinking of exposing it outside. But Ill think about it and if its useful we can expose it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We document public API's with doxygen, and internal global functions similar way just without double asterisk.
Reference: https://github.com/deepikabhavnani/mbed-os/blob/1bfb5b14b65fd32ffc2f58ffc144de859032a494/platform/logger_internal.h#L123
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deepikabhavnani - Good catch, I'll update the doxygen comments.
platform/mbed_error_report.c
Outdated
} | ||
} | ||
|
||
/* Limited print functionality which prints the string out to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not doxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are not doxygen as they are not called publicly. So keeping it more private.
platform/mbed_error_report.c
Outdated
*/ | ||
void mbed_error_print(char *fmtstr, uint32_t *values) | ||
{ | ||
#if DEVICE_SERIAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about ITM and so on? I think we have nice abstraction there, so we can do it once and have it for all backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, but I found that we only have 2 devices(NRFs) with ITM support as of now, I think we need broader support so that it justifies adding more logic here to route it based on whats present and whats active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the case because we only needed it on these 2, due to small number of UARTs. I can envision a production device with multiple UARTs that are being used by WiFi, GPS, whatever and not having enough for debug.
platform/mbed_error_report.h
Outdated
extern "C" { | ||
#endif | ||
|
||
/* routine to report the error */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not doxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, :-). Its not really intended to be called directly. Its more for platform components to use so keeping non-doxy until we really find a need to expose it.
platform/mbed_retarget.cpp
Outdated
@@ -126,6 +126,7 @@ void remove_filehandle(FileHandle *file) { | |||
#if DEVICE_SERIAL | |||
extern int stdio_uart_inited; | |||
extern serial_t stdio_uart; | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calls below depend on serial, not sure if we can close the #if
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is not expected, I think its some sort of rebase/merge issue. Ill address it.
platform/mbed_error.h
Outdated
\endverbatim | ||
*/ | ||
|
||
typedef enum _MbedErrorCode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's intimidating, we should also set some guidelines inside the system which errors should be used. POSIX errors for POSIX functions and system errors for the rest seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thats good suggestion, Ill document that here and in handbook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bulislaw - Thanks for your review, please see my responses, meanwhile Im working on fixing the code with your feedbacks.
@@ -123,15 +123,15 @@ u32_t sys_now(void) { | |||
*---------------------------------------------------------------------------*/ | |||
err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) { | |||
if (queue_sz > MB_SIZE) | |||
error("sys_mbox_new size error\n"); | |||
SET_ERROR_FATAL(MAKE_ERROR(ENTITY_NETWORK_STACK, ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named the functions starting with "Set"(verb) because calling that API changes the state of the system and makes the system do something. Its not a function which is declaring anything and retrieving anything, that's why I thought it should be a verb, so I think it does add meaning, simply saying ERROR doesn't say if its declaring anything or making the system do anything.
Then, MAKE_ERROR is required only if you want to add more info to the error status code. Normally you can use, SET_ERROR( status code(=ERROR_INVALID_ARGUMENT), msg, ... ), for example. But I do see the convenience factor in what you mentioned. I'll try to work on adding more convenience macros if it doesn't make it more confusing. I will also look into the comments on changing non-fatal errors as warnings. Thanks for pointing that out.
platform/mbed_error.c
Outdated
static int error_count = 0; | ||
static mbed_error_ctx first_error_ctx = {0}; | ||
static mbed_error_ctx current_error_ctx = {0}; | ||
static MbedErrorHook error_hook = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix the naming.
platform/mbed_error.c
Outdated
} | ||
|
||
//Helper function to halt the system | ||
static void mbed_halt_system(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point, Thanks for bringing that out, in fact I was thinking the same and was planning to handle this using warm reset+recovery mode. But it does require lot more design/changes and is in plans as of now. The current implementation behaves just the same as before so at least we are not taking it backwards. I'll discuss this with you more offline.
By the way, we do provide an error handling for applications to handle the error if required. That should help I guess.
@@ -57,14 +57,14 @@ int ReadOnlyBlockDevice::read(void *buffer, bd_addr_t addr, bd_size_t size) | |||
|
|||
int ReadOnlyBlockDevice::program(const void *buffer, bd_addr_t addr, bd_size_t size) | |||
{ | |||
error("ReadOnlyBlockDevice::program() not allowed"); | |||
return 0; | |||
SET_ERROR_FATAL(MAKE_ERROR(ENTITY_BLOCK_DEVICE, ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed", addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the return is there only to make compiler happy, either before(with error()) or now its going to bring the system to halt.
platform/mbed_error.c
Outdated
} | ||
|
||
//Use critsect here, as we don't want processing more than one error at the same time | ||
core_util_critical_section_enter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into if we can make the critical section shorter.
platform/mbed_error_log.h
Outdated
#ifdef __cplusplus | ||
extern "C" { | ||
#endif | ||
MbedErrorStatus mbed_log_put_error(mbed_error_ctx *error_ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is not intended to be public, we don't want this to be called directly, public APIs are the ones in mbed_error.h
platform/mbed_error_report.c
Outdated
} | ||
} | ||
|
||
/* Limited print functionality which prints the string out to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are not doxygen as they are not called publicly. So keeping it more private.
platform/mbed_error_report.c
Outdated
*/ | ||
void mbed_error_print(char *fmtstr, uint32_t *values) | ||
{ | ||
#if DEVICE_SERIAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, but I found that we only have 2 devices(NRFs) with ITM support as of now, I think we need broader support so that it justifies adding more logic here to route it based on whats present and whats active.
platform/mbed_error_report.h
Outdated
extern "C" { | ||
#endif | ||
|
||
/* routine to report the error */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, :-). Its not really intended to be called directly. Its more for platform components to use so keeping non-doxy until we really find a need to expose it.
rtos/Thread.cpp
Outdated
#endif | ||
|
||
void Thread::constructor(uint32_t tz_module, osPriority priority, | ||
void Thread::constructor(osPriority priority, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, that's unexpected, Ill look into that. Looks like rebase issue, good catch :-)
c440fa6
to
a4701f2
Compare
@bulislaw - Please see my new changes for error handling - Most of your feedbacks are taken care of except few. Doxygen comments are fixed, changed from set_error/fatal to set_warning/set_error, itm support added, amount of critical section code is reduced(using a local variable for some portions), moved error hook callback towards the end of error handling, fixed comments and other changes. I'm unable to move the doxygen comments on list of error codes as you mentioned as I see that its not getting rendered as I want to be and is messing up other portions when I try to do that. Also, Im not going to add more convenience macros as I see that its starting to look more confusing as I add more macros as I cant remove the old ones because those macros work well for generic error statuses(without having entity info). So, for now I'm thinking its better to keep it how it is now. |
bcc30e8
to
7a4f6ce
Compare
/morph build |
Build : FAILUREBuild number : 2017 |
platform/mbed_error_report.c
Outdated
} | ||
|
||
//Helper function to get the current SP | ||
static unsigned int get_current_sp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this one needs to take into account cortex-a (see the build failure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks for pointing that out, I'm looking into that. It appears there are no equivalent Mbed-os intrinsics for Cortex-A and so I'm looking into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell this is a mush of two new features that aren't all that related to each other (error logging and error codes). We shouldn't be creating PRs like this, it's harder to review and less likely to get in.
And sorry about putting a red X, but there are a few problems we need to take care of before this can go in:
-
We should not have redundant error codes for the same error. Should I use
ENOMEM
orOUT_OF_MEMORY
? Why are these two different error codes in our code base?Thinking about it, if you are encoding more information in the error codes, it would be better to just change the value of ENOMEM to whatever encoding you are using for OUT_OF_MEMORY. This would let users use both ENOMEM and OUT_OF_MEMORY without additional code size for translating between the two.
-
Should this error logging be tied to @deepikabhavnani's logging pr, Unified logging - String Based #5965? If Unified logging - String Based #5965 does not make it in, I think we should hold this one back as well so that the API churn around logging is kept to a minimum.
-
With the addition of the new error mechanisms, we should find out the memory consumption before and after with some example, and have those numbers available in this PR. The error code cannot be optimized out and is a cost to every application using mbed-os. @pan- did quite a bit of optimizing the previous error mechanism.
platform/mbed_error.h
Outdated
* @param error_code Error code value to be used, must be between 1 and 255(inclusive). | ||
* | ||
*/ | ||
#define DEFINE_POSIX_ERROR( error_name, error_code ) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New macros should have MBED_ prefixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can rename this to add MBED_ prefix.
platform/mbed_error.h
Outdated
* This is to enable easy injection of Posix error codes into MbedOS error handling system without altering the actual Posix error values.\n | ||
* Accordingly, Posix error codes are represented as -1 to -255 under MbedOS error status representation. | ||
*/ | ||
typedef int MbedErrorStatus; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typedefs be something like mbed_error_t
?
Especially for C developers, this appears very verbose and heavy-weight, which will discourage its adoption.
But maybe that's fine if we're ok with users using int
to represent errors?
I like that this is backwards-compatibile with our current error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does mbed_error_status_t work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a lot of typing, why not mbed_error_t
or mbed_status_t
?
platform/mbed_error.h
Outdated
#ifdef MBED_CONF_ERROR_FILENAME_CAPTURE_ENABLED | ||
#define SET_ERROR( error_status, error_msg, error_value ) set_error( error_status, (const char *)error_msg, (uint32_t)error_value, (const char *)MBED_FILENAME, __LINE__ ) | ||
#else | ||
#define SET_ERROR( error_status, error_msg, error_value ) set_error( error_status, (const char *)error_msg, (uint32_t)error_value, NULL, 0 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SET_ERROR sounds like it's setting an errno variable. Why not keep this verb "error"? Maybe make it match the style for macros as MBED_ERROR? Or if you need to differentiate between error logging and halting maybe MBED_FAIL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already discussed about this, please see my response to @bulislaw's comments earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why
SET_ERROR_FATAL
and notERROR
orMBED_ERROR
? Is there not fatal error? Also the set doesn't add anything to the meaning, I know from the implementation point of view it makes sense, but from user point of view doesn't matter that it sets the error state, for them it's just 'I want to report error'. If the error is not fatal lets call it warning.
Let me know if I'm looking at the wrong conversation, but it looks like @bulislaw and I agree, the SET_* prefix is going to be confusing for users.
Consider @c1728p9's comment below, even we are getting confused by the goal of this function.
|
||
#ifdef __cplusplus | ||
extern "C" { | ||
#endif | ||
void error(const char* format, ...); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! Deprecating error
! I think this is the wrong place for this.
How about we keep the new implementation but wait to deprecate until the logging changes? That way debug
and error
can be deprecated at the same time, and users will get less deprecation churn.
And even then we really shouldn't deprecate error
or debug
until we're sure logging is stable. There's no real rush to deprecate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging changes are still being reviewed/and other ideas being discussed. So, this is not getting deprecated as part of logging. So, no need to wait for logging/tracing to happen. Logging may introduce error-logging APIs which is different from error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @geky here, we should not deprecate error
. Current error
API does both logging and handling of error, it is more for critical errors which will result in application termination.
However with error handling we would not terminate on all errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SET_ERROR does terminate the application, just as how error did. SET_WARNING does not terminate the application but will log the error context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current API does only the logging and termination, the goal here is to capture more actionable data which helps developers to debug the cause of app crashing etc. By the way, error logging is not same as logging or tracing implementation we are doing, they are two different things. But we can use some of the aspects of unified tracing in the error handling mechanism once tracing is in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not deprecate error
until we deprecate debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we cannot use mbed_trace as it uses too much resources+stdlibs, and the idea is to have an implementation to output to serial/uart without using much of system resources. If you have noticed the mbed_error_print is minimal implementation which was already there as part of exception handling, but we moved it under mbed_error_report to be used for error handling as well. By the error handling uses crit sect, so it will not corrupt other output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that nothing should be hard-coded to serial/uart - we've got full console abstraction now. There should be no serial_putc
s in any of these paths, and one of the outstanding jobs is to remove them all in favour of write(STDERR_FILENO)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm - I see your point, but the intention here is to use as much less resources as possible(without even using any stdlibs as allowed) as this is the same path for exception handling. So, using those high level abstracted calls may not work in all cases for example when we are dealing with an fatal exception. Just to give you more info, we are planning/thinking on a recovery mode feature where in we reset the system without re-initing the piece memory having the error info and then on reboot we can print that info out through "write" calls normally, as you mentioned. At that point, we can get rid of special handling when dealing with exception handling/errors handling. As of now, this looks to be the best option. If you noticed, this is actually not new, this piece of code has been moved from the exception handling path to be used for fatal error paths as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying we need to adopt mbed-trace right now, I'm saying we should not deprecate error
until we deprecate debug
. You have an implementation or error
here, so just remove the deprecation.
There's no urgency to deprecate error
and users don't like making multiple changes.
platform/mbed_error.h
Outdated
* @return void | ||
* | ||
*/ | ||
typedef void (*MbedErrorHook)(const mbed_error_ctx *error_ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typedefs should not be CamelCase
* @return int Number of errors reported. | ||
* | ||
*/ | ||
int get_error_count(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean errors are recoverable? I thought they would halt the CPU?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, fatal errors are not recoverable. But eventually we need to extend this to implement a recovery mode and I hope we can recover from fatal errors by doing a warm reset and still keep the error info in tact and to be reported later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh interesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we currently have a requirement to implement recovery mode.
platform/mbed_error.h
Outdated
* @note Filesystem support is required in order for this function to work. | ||
* | ||
*/ | ||
MbedErrorStatus save_error_log(const char *path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An error log? Is this opt-in? Do we know how much of a footprint this has?
@@ -123,15 +123,15 @@ u32_t sys_now(void) { | |||
*---------------------------------------------------------------------------*/ | |||
err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) { | |||
if (queue_sz > MB_SIZE) | |||
error("sys_mbox_new size error\n"); | |||
SET_ERROR(MAKE_ERROR(MODULE_NETWORK_STACK, ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this, I think we should collapse these macros.
SET_ERROR(MODULE_NETWORK_STACK, ERROR_CODE_INVALID_SIZE, "sys_mbox_new size error\n", queue_sz);
// and
SET_CUSTOM_ERROR(MODULE_NETWORK_STACK, ERROR_CODE_INVALID_SIZE, "sys_mbox_new size error\n", queue_sz);
Is the extra macro name needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats good idea, if we collapse do you any suggestions for using the custom defined error?
If we use the same macro, how does the caller use a custom defined error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, error logging can be completely disabled using a config mechanism, but by default it is enabled. Note that we will still capture the first and last error in the system which is always enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does having two macros, SET_ERROR and SET_CUSTOM_ERROR not work?
platform/mbed_error.h
Outdated
* passed as the second argument in the DEFINE_SYSTEM_ERROR macro.\n\n | ||
* Below are the Mbed System error codes and the description: | ||
* \verbatim | ||
UNKNOWN 256 Unknown error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already pretty well established as -1. Can we keep -1 as an unknown error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it but -1 maps to Posix Error Code. So, mapping -1 again to unknown error maybe confusing. And it also doesn't align with our overall error code definition/format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking more we should renumber the std errno codes (see above). This would free up -1 as general error.
Actually even if it conflicts, its just conflicting with EPERM (not EACCESS), which is pretty close to a generic error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is to include posix error codes as part of mbed error handling model so the modules already using posix still have the same meaning. The problem with conflicting error codes is that it becomes difficult if some one gets an error and wants to search source code for where that came from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are far more modules returning -1 for errors than POSIX errors. Back in mbed 2 this was the correct thing to do:
https://os.mbed.com/users/mbed_official/code/SDFileSystem/file/8db0d3b02cec/SDFileSystem.cpp
The only thing worse than conflicting error codes are error codes with misleading names.
This also does not become a problem if we renumber the std errno codes.
MbedErrorStatus error = MAKE_ERROR(MODULE_APPLICATION, ERROR_CODE_UNKNOWN); | ||
SET_WARNING(error, "Error Unknown", 0xAABBCCDD ); | ||
MbedErrorStatus lastError = get_last_error(); | ||
printf("\nlastError = 0x%08X", lastError ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this print be part of final test code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, we can remove those prints.
void test_system_errors() | ||
{ | ||
MbedErrorStatus error = MAKE_ERROR(MODULE_APPLICATION, ERROR_CODE_UNKNOWN); | ||
SET_WARNING(error, "Error Unknown", 0xAABBCCDD ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have SET_WARNING as an API? how is this different then tracing feature MBED_WARN or tr_warn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SET_WARNING captures error context, can call application hooks, stores the error error context etc but tracing does not(or rather should not). Error handling should be seen differently from tracing and both are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree error handling provides more functionality which is not same as tracing. My point islogs / print messages should not be handled in this feature. Log messages printed using SET_WARNING / SET_ERROR are not line interleaved and will corrupt data displayed on UART.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my reply above, unfortunately we cannot use mbed_trace APIs as of now due to its extensive use of resources. Note that this needs to work with handling of exceptions as well.
void fault_print_init(void); | ||
void fault_print_str(char *fmtstr, uint32_t *values); | ||
void hex_to_str(uint32_t value, char *hex_star); | ||
//Functions Prototypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this code restructuring and changes in fault handler should be a separate PR.
uint32_t SP; | ||
uint32_t LR; | ||
uint32_t PC; | ||
uint32_t R0_reg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fault handler changes / fixes should be separate PR please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, refactoring exception handling to use the new error handling framework is one of the goals with this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please break the PR, as it is difficult to review in current state.
- Error handling changes
- Refactor / fixes for fault handler
- Replace / Usage of
error
API in code base
Also I would agree with @geky that this is combination of error logging and error handling. I am fine with the error handling and standardizing error codes. But logging should not be part of this feature, we already have libraries for that. I agree with decisions pending on logging features, but lets not add more API's to make it more chaotic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will have a fairly large impact on the codebase. It is also making many errors that were considered fatal into non-fatal errors by changing error
to SET_ERROR. There needs to be a good justification for changing these errors from fatal to non-fatal.
I would like to see a docs page explaining how to use these error codes in a driver or library and when to use what kind of errors. Without a clear idea of how this unified error handling should/will be used, this change will just add confusion. For example, block device an nsapi define their own error codes. Should libraries like this always define their own error codes or should this be moved into the unified error codes?
Build : SUCCESSBuild number : 2077 Triggering tests/morph test |
Test : FAILUREBuild number : 1887 |
Exporter Build : SUCCESSBuild number : 1722 |
b0557ca
to
1b8a9d3
Compare
/morph build |
Build : FAILUREBuild number : 2080 |
@geky @c1728p9 @0xc0170 @bulislaw @deepikabhavnani Are you happy with the updates? Please approve review if this is the case. |
/morph build |
@geky @c1728p9 @0xc0170 @bulislaw @deepikabhavnani - Yes, I think I have addressed or answered all of your comments, please take a look at the latest updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- More things need
MBED_
prefixes - SET_ERROR is not a good name
- Duplicated error codes between errno and mbed
- No support for -1
- Now is not the time to deprecate
error
platform/mbed_error.h
Outdated
|
||
#define GET_MBED_ERROR_TYPE( error_status ) ((error_status & MBED_ERROR_STATUS_TYPE_MASK) >> MBED_ERROR_STATUS_TYPE_POS) | ||
#define GET_MBED_ERROR_MODULE( error_status ) ((error_status & MBED_ERROR_STATUS_MODULE_MASK) >> MBED_ERROR_STATUS_MODULE_POS) | ||
#define GET_MBED_ERROR_CODE( error_status ) (int)((GET_MBED_ERROR_TYPE( error_status ) == ERROR_TYPE_POSIX)?(-error_status):((error_status & MBED_ERROR_STATUS_CODE_MASK) >> MBED_ERROR_STATUS_CODE_POS)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: MBED_ first as a prefix?
- GET_MBED_ERROR_TYPE
+ MBED_GET_ERROR_TYPE
platform/mbed_error.h
Outdated
#ifdef MBED_CONF_ERROR_FILENAME_CAPTURE_ENABLED | ||
#define SET_ERROR( error_status, error_msg, error_value ) set_error( error_status, (const char *)error_msg, (uint32_t)error_value, (const char *)MBED_FILENAME, __LINE__ ) | ||
#else | ||
#define SET_ERROR( error_status, error_msg, error_value ) set_error( error_status, (const char *)error_msg, (uint32_t)error_value, NULL, 0 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SET_ERROR is still a confusing name for a function that halts the processor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be a good name? I dont like calling it just error because it sounds like we are declaring/defining something. I want that to be clear that it will be acted upon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm coming in the middle of the review a bit so might have missed some of the nuanced/offline discussions. The current use of error(...)
halts execution. To me calling this macro ERROR
makes more sense as SET_ERROR
implies setting the status of an error code. At least to me ERROR
implies it will be acted on.
|
||
#ifdef __cplusplus | ||
extern "C" { | ||
#endif | ||
void error(const char* format, ...); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still should not deprecate error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I know why? We are planning to raise requirements for other modules to adopt new error handling. And this needs to be deprecated now before they start doing it.
MBED_DEFINE_SYSTEM_ERROR( HARDFAULT_EXCEPTION ,61 ), /* 317 HardFault exception */ | ||
MBED_DEFINE_SYSTEM_ERROR( MEMMANAGE_EXCEPTION ,62 ), /* 318 MemManage exception */ | ||
MBED_DEFINE_SYSTEM_ERROR( BUSFAULT_EXCEPTION ,63 ), /* 319 BusFault exception */ | ||
MBED_DEFINE_SYSTEM_ERROR( USAGEFAULT_EXCEPTION ,64 ), /* 320 UsageFault exception*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we prefix error codes with MBED_
?
These names are common, user libraries may have already defined names such as DEVICE_BUSY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't see a need for prefix-ing all the error codes again with MBED_. Please note that its not "DEVICE_BUSY", we are actually defining error codes/status using that macro. So, I don't see conflicts there. User libraries can continue to have "DEVICE_BUSY" if they want to.
MBED_DEFINE_POSIX_ERROR( EBADF ,EBADF ), /* 9 Bad file number */ | ||
MBED_DEFINE_POSIX_ERROR( ECHILD ,ECHILD ), /* 10 No child processes */ | ||
MBED_DEFINE_POSIX_ERROR( EAGAIN ,EAGAIN ), /* 11 Try again */ | ||
MBED_DEFINE_POSIX_ERROR( ENOMEM ,ENOMEM ), /* 12 Out of memory */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still have duplicated error codes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't call them as duplicates, because Posix error codes are special cases and we cannot capture the module reporting the error. That's why we have to define some of them(common ones) defined under system errors as well for mbed modules to use.
//Below are MBED SYSTEM ERROR CODE definitions | ||
//MBED SYSTEM ERROR CODE definitions starts at offset MBED_SYSTEM_ERROR_BASE, see above. | ||
// Error Name Error Offset Error Code | ||
MBED_DEFINE_SYSTEM_ERROR( UNKNOWN ,0 ), /* 256 Unknown error */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still no support for -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, we are unable to do that, because -1 is an absolute value which we cannot encode under system error codes because it always will be Posix error(ERROR_EPERM) and also searching for error codes in source code will get messed up as well. So for now, -1 will not be encoded under system error, but its encoded as ERROR_EPERM. My suggestion is drivers should change to use more appropriate error codes instead of using -1 as a generic error which is unmanageable and tough to track.
Build : SUCCESSBuild number : 2082 Triggering tests/morph test |
Test : FAILUREBuild number : 1892 |
Exporter Build : SUCCESSBuild number : 1726 |
0561663
to
0d508a1
Compare
/morph build |
…o changes and test fixes
0d508a1
to
7392504
Compare
/morph build |
@SenRamakri I would refrain from starting CI until at a minimum, pr-head and travis CI builds are green. |
Build : SUCCESSBuild number : 2094 Triggering tests/morph test |
Build : SUCCESSBuild number : 2095 Triggering tests/morph test |
Test : SUCCESSBuild number : 1904 |
@SenRamakri please do not kick off the ci yourself. CI should only ever be kicked off when the PR is in the needs ci state , otherwise this will mess up all the future metrics collection. Technically the review should also be approved before doing so... |
Test : FAILUREBuild number : 1905 |
*/ | ||
typedef enum _mbed_module_type | ||
{ | ||
MODULE_APPLICATION = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These guys should also have the MBED_ prefix
* @note This macro generate mbed_error_status_t-es with error type set to MBED_ERROR_TYPE_SYSTEM | ||
* | ||
*/ | ||
#define MAKE_ERROR(module, error_code) MAKE_SYSTEM_ERROR(module, error_code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These guys should also have the MBED_ prefix
* @return mbed_error_status_t code logged for the last error or MBED_SUCCESS if no errors are logged. | ||
* | ||
*/ | ||
mbed_error_status_t get_last_error(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These guys also need an mbed_ prefix
* @return 0 or MBED_ERROR_SUCCESS on success. | ||
* | ||
*/ | ||
mbed_error_status_t make_mbed_error(mbed_error_type_t error_type, mbed_module_type_t module, mbed_error_code_t error_code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put the mbed_ at the front for consistency, makes it easy to parse for both humans and machines
* @note Filesystem support is required in order for this function to work. | ||
* | ||
*/ | ||
mbed_error_status_t save_error_log(const char *path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yeah, these all need mbed_ prefix
* | ||
*/ | ||
#define MBED_DEFINE_SYSTEM_ERROR( error_name, error_code ) \ | ||
MBED_ERROR_CODE_##error_name = MBED_SYSTEM_ERROR_BASE + error_code, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though: I noticed these names have gotten long, maybe we don't need the full "ERROR_CODE"? Maybe we can just use MBED_CODE_ERROR_NAME? Up to you
Exporter Build : SUCCESSBuild number : 1736 |
1 similar comment
Exporter Build : SUCCESSBuild number : 1736 |
This PR has been moved to #6983 as most people are complaining on not able to access this(showing unicorns). |
This is the implementation for Standardized error coding and handling for mbed-os.
The changes with this pull request implements standardized error code handling and error code definitions/encoding. With this, error handling is implemented as a Core-os service which captures and
logs the errors with context information. Each error code is captured with error location, error message,
error file(this is configurable as this can cause increase in binary size), calling thread info, module generating the error etc. Relevant APIs are provided to log and retrieve errors from error logging system.
A build time flag has been provided to switch off error logging system on memory constrained platforms.
Error reporting itself is logically separated and implemented in mbed_error_reporting.c/h. By doing this both exception handling and error handling can use the same reporting infrastructure. A sample error report is as below. Note that error status value captures error type, entity and error code. Error reporting prints these values separately as well to enable developers to search the source tree with error code.
Error handling service also provides an API to save the error log in file system if required.
The implementation is tested using the Greentea tests under TESTS/mbed_platform/error_handling.
Note that exception handling and reporting testing cannot be done using Greentea and was manually tested.
Pull request type
The APIs/data structures/macros are all documented using Doxygen. In addition handbook documentation will also be generated for release.
Overall impact of this feature is ~1K(ROM) including logging. Logging can be disabled and if disabled
the overall impact would be around 0.75K.