Error Handling and error coding #6859

SenRamakri · 2018-05-10T03:12:33Z

This is the implementation for Standardized error coding and handling for mbed-os.
The changes with this pull request implements standardized error code handling and error code definitions/encoding. With this, error handling is implemented as a Core-os service which captures and
logs the errors with context information. Each error code is captured with error location, error message,
error file(this is configurable as this can cause increase in binary size), calling thread info, module generating the error etc. Relevant APIs are provided to log and retrieve errors from error logging system.
A build time flag has been provided to switch off error logging system on memory constrained platforms.
Error reporting itself is logically separated and implemented in mbed_error_reporting.c/h. By doing this both exception handling and error handling can use the same reporting infrastructure. A sample error report is as below. Note that error status value captures error type, entity and error code. Error reporting prints these values separately as well to enable developers to search the source tree with error code.
Error handling service also provides an API to save the error log in file system if required.

++ MbedOS Error Info ++
Error Status: 0x80040103
Error Code: 259
Error Entity: 04
Error Message: HAL Entity error
Error Location: 0x000067C7
Error Value: 0x00005566
Current Thread: Id: 0x200024A8 EntryFn: 0x0000FB0D StackSize: 0x00001000 StackMem: 0x200014A8 SP: 0x2002FFD8
-- MbedOS Error Info --

The implementation is tested using the Greentea tests under TESTS/mbed_platform/error_handling.
Note that exception handling and reporting testing cannot be done using Greentea and was manually tested.

Pull request type

[ ] Fix
[ ] Refactor
[ ] New target
[X] Feature
[ ] Breaking change

The APIs/data structures/macros are all documented using Doxygen. In addition handbook documentation will also be generated for release.

Overall impact of this feature is ~1K(ROM) including logging. Logging can be disabled and if disabled
the overall impact would be around 0.75K.

bulislaw

Thanks for that, it would be great to have error codes unified across the OS finally :)

bulislaw · 2018-05-10T08:50:31Z

platform/mbed_error.c

+static MbedErrorHook error_hook = NULL;
+
+//Helper function to get the current SP
+static unsigned int get_current_sp()


That could be useful as a public lib/platform function. Maybe creating platform/mbed_utils.c/h for this kind of utility functions would be a good idea.

Thanks @bulislaw for taking time to review this.
Good suggestion, yes Ill look into if we have enough of these functions which can be filtered out to be separated into its own APIs.

bulislaw · 2018-05-10T08:51:26Z

platform/mbed_error.c

+}
+
+//Helper function to halt the system
+static void mbed_halt_system(void)


I'm not sure it's good for production devices to spin when something goes wrong. I feel like we should try to dump some state somewhere and reset. Or at least provide user level "hard fault"/error hook API the same way we do it for idle hook.

Great point, Thanks for bringing that out, in fact I was thinking the same and was planning to handle this using warm reset+recovery mode. But it does require lot more design/changes and is in plans as of now. The current implementation behaves just the same as before so at least we are not taking it backwards. I'll discuss this with you more offline.

By the way, we do provide an error handling for applications to handle the error if required. That should help I guess.

bulislaw · 2018-05-10T08:57:21Z

platform/mbed_error.h

+extern "C" {
+#endif
+
+//Define this macro to include filenames in error context, this can save memory. For release builds, do not include filename    


That's not doxygen, it should be I guess. That's the same across this file.

Ok, we will modify it to be doxygen friendly. Originally it was not intended to be in doxygen, but I think its good idea to document that in doxygen.

bulislaw · 2018-05-10T08:59:55Z

platform/mbed_error.h

+#define MBED_ERROR_STATUS_CODE_POS                  (0)
+#define MBED_ERROR_STATUS_CODE_FIELD_SIZE           (16)
+
+#define MBED_ERROR_STATUS_ENTITY_MASK               (0x00FF0000)


Nit pick, could we call it MODULE not ENTITY across the whole error handling? It actually took me a while to figure out what you mean by entity, I think more people will have the same, module is standard term in this context.

Ok, I'll change it to module if its more context appropriate.

bulislaw · 2018-05-10T09:06:25Z

features/FEATURE_LWIP/lwip-interface/lwip-sys/arch/lwip_sys_arch.c

@@ -123,15 +123,15 @@ u32_t sys_now(void) {
 *---------------------------------------------------------------------------*/
 err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) {
    if (queue_sz > MB_SIZE)
-        error("sys_mbox_new size error\n");
+        SET_ERROR_FATAL(MAKE_ERROR(ENTITY_NETWORK_STACK, ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz);


Couple of points as it's not very user friendly:

Why SET_ERROR_FATAL and not ERROR or MBED_ERROR? Is there not fatal error? Also the set doesn't add anything to the meaning, I know from the implementation point of view it makes sense, but from user point of view doesn't matter that it sets the error state, for them it's just 'I want to report error'. If the error is not fatal lets call it warning.

I don't like this construct SET_ERROR_FATAL(MAKE_ERROR( why do we need to put it here manually every single time? Why not MBED_ERROR(ENTITY_NETWORK_STACK, ERROR_CODE_INVALID_SIZE, "sys_mbox_new size error\n", queue_sz) and MBED_ERROR can call MAKE_ERROR on the two first parameters.

I named the functions starting with "Set"(verb) because calling that API changes the state of the system and makes the system do something. Its not a function which is declaring anything and retrieving anything, that's why I thought it should be a verb, so I think it does add meaning, simply saying ERROR doesn't say if its declaring anything or making the system do anything.

Then, MAKE_ERROR is required only if you want to add more info to the error status code. Normally you can use, SET_ERROR( status code(=ERROR_INVALID_ARGUMENT), msg, ... ), for example. But I do see the convenience factor in what you mentioned. I'll try to work on adding more convenience macros if it doesn't make it more confusing. I will also look into the comments on changing non-fatal errors as warnings. Thanks for pointing that out.

bulislaw · 2018-05-10T11:01:48Z

platform/mbed_error_log.h

+#ifdef __cplusplus
+extern "C" {
+#endif
+MbedErrorStatus mbed_log_put_error(mbed_error_ctx *error_ctx);


No docs, and they are public.

Actually this is not intended to be public, we don't want this to be called directly, public APIs are the ones in mbed_error.h

It's in the header, so I think even if consumer are other parts of Mbed OS (outside of file) we should get some simple docs.

I see, let me see how we can address this. I was not thinking of exposing it outside. But Ill think about it and if its useful we can expose it.

We document public API's with doxygen, and internal global functions similar way just without double asterisk.
Reference: https://github.com/deepikabhavnani/mbed-os/blob/1bfb5b14b65fd32ffc2f58ffc144de859032a494/platform/logger_internal.h#L123

@deepikabhavnani - Good catch, I'll update the doxygen comments.

bulislaw · 2018-05-10T11:07:47Z

platform/mbed_error_report.c

+    }
+}
+
+/* Limited print functionality which prints the string out to 


That's not doxy.

Yes, they are not doxygen as they are not called publicly. So keeping it more private.

bulislaw · 2018-05-10T11:08:35Z

platform/mbed_error_report.c

+*/
+void mbed_error_print(char *fmtstr, uint32_t *values)
+{
+#if DEVICE_SERIAL


What about ITM and so on? I think we have nice abstraction there, so we can do it once and have it for all backends.

I thought about that, but I found that we only have 2 devices(NRFs) with ITM support as of now, I think we need broader support so that it justifies adding more logic here to route it based on whats present and whats active.

That's the case because we only needed it on these 2, due to small number of UARTs. I can envision a production device with multiple UARTs that are being used by WiFi, GPS, whatever and not having enough for debug.

bulislaw · 2018-05-10T11:09:28Z

platform/mbed_error_report.h

+extern "C" {
+#endif
+
+/* routine to report the error */


It's not doxy.

Yes, :-). Its not really intended to be called directly. Its more for platform components to use so keeping non-doxy until we really find a need to expose it.

bulislaw · 2018-05-10T11:10:28Z

platform/mbed_retarget.cpp

@@ -126,6 +126,7 @@ void remove_filehandle(FileHandle *file) {
 #if DEVICE_SERIAL
 extern int stdio_uart_inited;
 extern serial_t stdio_uart;
+#endif


The calls below depend on serial, not sure if we can close the #if here.

Oh, this is not expected, I think its some sort of rebase/merge issue. Ill address it.

bulislaw · 2018-05-10T11:15:56Z

platform/mbed_error.h

+    \endverbatim
+ */
+
+typedef enum _MbedErrorCode


That's intimidating, we should also set some guidelines inside the system which errors should be used. POSIX errors for POSIX functions and system errors for the rest seems reasonable.

Ok, thats good suggestion, Ill document that here and in handbook.

SenRamakri

@bulislaw - Thanks for your review, please see my responses, meanwhile Im working on fixing the code with your feedbacks.

SenRamakri · 2018-05-10T16:18:52Z

features/FEATURE_LWIP/lwip-interface/lwip-sys/arch/lwip_sys_arch.c

@@ -123,15 +123,15 @@ u32_t sys_now(void) {
 *---------------------------------------------------------------------------*/
 err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) {
    if (queue_sz > MB_SIZE)
-        error("sys_mbox_new size error\n");
+        SET_ERROR_FATAL(MAKE_ERROR(ENTITY_NETWORK_STACK, ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz);


I named the functions starting with "Set"(verb) because calling that API changes the state of the system and makes the system do something. Its not a function which is declaring anything and retrieving anything, that's why I thought it should be a verb, so I think it does add meaning, simply saying ERROR doesn't say if its declaring anything or making the system do anything.

Then, MAKE_ERROR is required only if you want to add more info to the error status code. Normally you can use, SET_ERROR( status code(=ERROR_INVALID_ARGUMENT), msg, ... ), for example. But I do see the convenience factor in what you mentioned. I'll try to work on adding more convenience macros if it doesn't make it more confusing. I will also look into the comments on changing non-fatal errors as warnings. Thanks for pointing that out.

SenRamakri · 2018-05-10T16:23:23Z

platform/mbed_error.c

+static int error_count = 0;
+static mbed_error_ctx first_error_ctx = {0};
+static mbed_error_ctx current_error_ctx = {0};
+static MbedErrorHook error_hook = NULL;


I'll fix the naming.

SenRamakri · 2018-05-10T16:25:46Z

platform/mbed_error.c

+}
+
+//Helper function to halt the system
+static void mbed_halt_system(void)


Great point, Thanks for bringing that out, in fact I was thinking the same and was planning to handle this using warm reset+recovery mode. But it does require lot more design/changes and is in plans as of now. The current implementation behaves just the same as before so at least we are not taking it backwards. I'll discuss this with you more offline.

By the way, we do provide an error handling for applications to handle the error if required. That should help I guess.

SenRamakri · 2018-05-10T16:42:12Z

features/filesystem/bd/ReadOnlyBlockDevice.cpp

@@ -57,14 +57,14 @@ int ReadOnlyBlockDevice::read(void *buffer, bd_addr_t addr, bd_size_t size)

 int ReadOnlyBlockDevice::program(const void *buffer, bd_addr_t addr, bd_size_t size)
 {
-    error("ReadOnlyBlockDevice::program() not allowed");
-    return 0;
+    SET_ERROR_FATAL(MAKE_ERROR(ENTITY_BLOCK_DEVICE, ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed", addr);


I think the return is there only to make compiler happy, either before(with error()) or now its going to bring the system to halt.

SenRamakri · 2018-05-10T16:43:34Z

platform/mbed_error.c

+    }
+
+    //Use critsect here, as we don't want processing more than one error at the same time
+    core_util_critical_section_enter();


I'll look into if we can make the critical section shorter.

SenRamakri · 2018-05-10T17:26:35Z

platform/mbed_error_log.h

+#ifdef __cplusplus
+extern "C" {
+#endif
+MbedErrorStatus mbed_log_put_error(mbed_error_ctx *error_ctx);


Actually this is not intended to be public, we don't want this to be called directly, public APIs are the ones in mbed_error.h

SenRamakri · 2018-05-10T17:27:23Z

platform/mbed_error_report.c

+    }
+}
+
+/* Limited print functionality which prints the string out to 


Yes, they are not doxygen as they are not called publicly. So keeping it more private.

SenRamakri · 2018-05-10T17:31:10Z

platform/mbed_error_report.c

+*/
+void mbed_error_print(char *fmtstr, uint32_t *values)
+{
+#if DEVICE_SERIAL


I thought about that, but I found that we only have 2 devices(NRFs) with ITM support as of now, I think we need broader support so that it justifies adding more logic here to route it based on whats present and whats active.

SenRamakri · 2018-05-10T17:32:13Z

platform/mbed_error_report.h

+extern "C" {
+#endif
+
+/* routine to report the error */


Yes, :-). Its not really intended to be called directly. Its more for platform components to use so keeping non-doxy until we really find a need to expose it.

SenRamakri · 2018-05-10T17:33:29Z

rtos/Thread.cpp

-#endif
-
-void Thread::constructor(uint32_t tz_module, osPriority priority,
+void Thread::constructor(osPriority priority,


Ah yes, that's unexpected, Ill look into that. Looks like rebase issue, good catch :-)

SenRamakri · 2018-05-12T15:46:04Z

@bulislaw - Please see my new changes for error handling - Most of your feedbacks are taken care of except few. Doxygen comments are fixed, changed from set_error/fatal to set_warning/set_error, itm support added, amount of critical section code is reduced(using a local variable for some portions), moved error hook callback towards the end of error handling, fixed comments and other changes. I'm unable to move the doxygen comments on list of error codes as you mentioned as I see that its not getting rendered as I want to be and is messing up other portions when I try to do that. Also, Im not going to add more convenience macros as I see that its starting to look more confusing as I add more macros as I cant remove the old ones because those macros work well for generic error statuses(without having entity info). So, for now I'm thinking its better to keep it how it is now.

SenRamakri · 2018-05-15T06:43:55Z

/morph build

mbed-ci · 2018-05-15T07:21:15Z

Build : FAILURE

Build number : 2017
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

0xc0170 · 2018-05-15T08:31:51Z

platform/mbed_error_report.c

+}
+
+//Helper function to get the current SP
+static unsigned int get_current_sp()


I believe this one needs to take into account cortex-a (see the build failure).

Yes, thanks for pointing that out, I'm looking into that. It appears there are no equivalent Mbed-os intrinsics for Cortex-A and so I'm looking into this.

geky

From what I can tell this is a mush of two new features that aren't all that related to each other (error logging and error codes). We shouldn't be creating PRs like this, it's harder to review and less likely to get in.

And sorry about putting a red X, but there are a few problems we need to take care of before this can go in:

We should not have redundant error codes for the same error. Should I use ENOMEM or OUT_OF_MEMORY? Why are these two different error codes in our code base?

Thinking about it, if you are encoding more information in the error codes, it would be better to just change the value of ENOMEM to whatever encoding you are using for OUT_OF_MEMORY. This would let users use both ENOMEM and OUT_OF_MEMORY without additional code size for translating between the two.
Should this error logging be tied to @deepikabhavnani's logging pr, Unified logging - String Based #5965? If Unified logging - String Based #5965 does not make it in, I think we should hold this one back as well so that the API churn around logging is kept to a minimum.
With the addition of the new error mechanisms, we should find out the memory consumption before and after with some example, and have those numbers available in this PR. The error code cannot be optimized out and is a cost to every application using mbed-os. @pan- did quite a bit of optimizing the previous error mechanism.

geky · 2018-05-14T22:18:27Z

platform/mbed_error.h

+ * @param  error_code       Error code value to be used, must be between 1 and 255(inclusive).
+ *
+ */
+#define DEFINE_POSIX_ERROR( error_name, error_code )   \


New macros should have MBED_ prefixes

I think we can rename this to add MBED_ prefix.

geky · 2018-05-15T23:27:35Z

platform/mbed_error.h

+ *       This is to enable easy injection of Posix error codes into MbedOS error handling system without altering the actual Posix error values.\n
+ *       Accordingly, Posix error codes are represented as -1 to -255 under MbedOS error status representation.
+ */
+typedef int MbedErrorStatus;


typedefs be something like mbed_error_t?

Especially for C developers, this appears very verbose and heavy-weight, which will discourage its adoption.

But maybe that's fine if we're ok with users using int to represent errors?

I like that this is backwards-compatibile with our current error handling.

Does mbed_error_status_t work?

That's a lot of typing, why not mbed_error_t or mbed_status_t?

geky · 2018-05-15T23:28:30Z

platform/mbed_error.h

+#ifdef MBED_CONF_ERROR_FILENAME_CAPTURE_ENABLED
+    #define SET_ERROR( error_status, error_msg, error_value )       set_error( error_status, (const char *)error_msg, (uint32_t)error_value, (const char *)MBED_FILENAME, __LINE__ )
+#else
+    #define SET_ERROR( error_status, error_msg, error_value )       set_error( error_status, (const char *)error_msg, (uint32_t)error_value, NULL, 0 )


SET_ERROR sounds like it's setting an errno variable. Why not keep this verb "error"? Maybe make it match the style for macros as MBED_ERROR? Or if you need to differentiate between error logging and halting maybe MBED_FAIL?

I think we already discussed about this, please see my response to @bulislaw's comments earlier.

Why SET_ERROR_FATAL and not ERROR or MBED_ERROR? Is there not fatal error? Also the set doesn't add anything to the meaning, I know from the implementation point of view it makes sense, but from user point of view doesn't matter that it sets the error state, for them it's just 'I want to report error'. If the error is not fatal lets call it warning.

Let me know if I'm looking at the wrong conversation, but it looks like @bulislaw and I agree, the SET_* prefix is going to be confusing for users.

Consider @c1728p9's comment below, even we are getting confused by the goal of this function.

geky · 2018-05-15T23:30:51Z

platform/mbed_error.h


-#ifdef __cplusplus
-extern "C" {
-#endif
 void error(const char* format, ...);


Wow! Deprecating error! I think this is the wrong place for this.

How about we keep the new implementation but wait to deprecate until the logging changes? That way debug and error can be deprecated at the same time, and users will get less deprecation churn.

And even then we really shouldn't deprecate error or debug until we're sure logging is stable. There's no real rush to deprecate.

Logging changes are still being reviewed/and other ideas being discussed. So, this is not getting deprecated as part of logging. So, no need to wait for logging/tracing to happen. Logging may introduce error-logging APIs which is different from error handling.

I agree with @geky here, we should not deprecate error. Current error API does both logging and handling of error, it is more for critical errors which will result in application termination.
However with error handling we would not terminate on all errors.

SET_ERROR does terminate the application, just as how error did. SET_WARNING does not terminate the application but will log the error context.

The current API does only the logging and termination, the goal here is to capture more actionable data which helps developers to debug the cause of app crashing etc. By the way, error logging is not same as logging or tracing implementation we are doing, they are two different things. But we can use some of the aspects of unified tracing in the error handling mechanism once tracing is in.

We should not deprecate error until we deprecate debug

Unfortunately, we cannot use mbed_trace as it uses too much resources+stdlibs, and the idea is to have an implementation to output to serial/uart without using much of system resources. If you have noticed the mbed_error_print is minimal implementation which was already there as part of exception handling, but we moved it under mbed_error_report to be used for error handling as well. By the error handling uses crit sect, so it will not corrupt other output.

Note that nothing should be hard-coded to serial/uart - we've got full console abstraction now. There should be no serial_putcs in any of these paths, and one of the outstanding jobs is to remove them all in favour of write(STDERR_FILENO).

@kjbracey-arm - I see your point, but the intention here is to use as much less resources as possible(without even using any stdlibs as allowed) as this is the same path for exception handling. So, using those high level abstracted calls may not work in all cases for example when we are dealing with an fatal exception. Just to give you more info, we are planning/thinking on a recovery mode feature where in we reset the system without re-initing the piece memory having the error info and then on reboot we can print that info out through "write" calls normally, as you mentioned. At that point, we can get rid of special handling when dealing with exception handling/errors handling. As of now, this looks to be the best option. If you noticed, this is actually not new, this piece of code has been moved from the exception handling path to be used for fatal error paths as well.

I'm not saying we need to adopt mbed-trace right now, I'm saying we should not deprecate error until we deprecate debug. You have an implementation or error here, so just remove the deprecation.

There's no urgency to deprecate error and users don't like making multiple changes.

geky · 2018-05-15T23:31:19Z

platform/mbed_error.h

+ * @return                  void
+ *
+ */
+typedef void (*MbedErrorHook)(const mbed_error_ctx *error_ctx);


typedefs should not be CamelCase

geky · 2018-05-15T23:32:40Z

platform/mbed_error.h

+ * @return                  int Number of errors reported.
+ *
+ */
+int get_error_count(void);


Does this mean errors are recoverable? I thought they would halt the CPU?

As of now, fatal errors are not recoverable. But eventually we need to extend this to implement a recovery mode and I hope we can recover from fatal errors by doing a warm reset and still keep the error info in tact and to be reported later.

oh interesting

Yes, we currently have a requirement to implement recovery mode.

geky · 2018-05-15T23:35:32Z

platform/mbed_error.h

+ * @note                        Filesystem support is required in order for this function to work.
+ *
+ */
+MbedErrorStatus save_error_log(const char *path);


An error log? Is this opt-in? Do we know how much of a footprint this has?

geky · 2018-05-15T23:43:46Z

features/FEATURE_LWIP/lwip-interface/lwip-sys/arch/lwip_sys_arch.c

@@ -123,15 +123,15 @@ u32_t sys_now(void) {
 *---------------------------------------------------------------------------*/
 err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) {
    if (queue_sz > MB_SIZE)
-        error("sys_mbox_new size error\n");
+        SET_ERROR(MAKE_ERROR(MODULE_NETWORK_STACK, ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz);


Reading this, I think we should collapse these macros.

SET_ERROR(MODULE_NETWORK_STACK, ERROR_CODE_INVALID_SIZE, "sys_mbox_new size error\n", queue_sz); // and SET_CUSTOM_ERROR(MODULE_NETWORK_STACK, ERROR_CODE_INVALID_SIZE, "sys_mbox_new size error\n", queue_sz);

Is the extra macro name needed?

Thats good idea, if we collapse do you any suggestions for using the custom defined error?
If we use the same macro, how does the caller use a custom defined error?

By the way, error logging can be completely disabled using a config mechanism, but by default it is enabled. Note that we will still capture the first and last error in the system which is always enabled.

Does having two macros, SET_ERROR and SET_CUSTOM_ERROR not work?

geky · 2018-05-15T23:48:15Z

platform/mbed_error.h

+ *  passed as the second argument in the DEFINE_SYSTEM_ERROR macro.\n\n
+ *  Below are the Mbed System error codes and the description:
+ *  \verbatim 
+    UNKNOWN                    256      Unknown error 


This is already pretty well established as -1. Can we keep -1 as an unknown error?

I was thinking about it but -1 maps to Posix Error Code. So, mapping -1 again to unknown error maybe confusing. And it also doesn't align with our overall error code definition/format.

I'm thinking more we should renumber the std errno codes (see above). This would free up -1 as general error.

Actually even if it conflicts, its just conflicting with EPERM (not EACCESS), which is pretty close to a generic error.

The idea here is to include posix error codes as part of mbed error handling model so the modules already using posix still have the same meaning. The problem with conflicting error codes is that it becomes difficult if some one gets an error and wants to search source code for where that came from.

There are far more modules returning -1 for errors than POSIX errors. Back in mbed 2 this was the correct thing to do:
https://os.mbed.com/users/mbed_official/code/SDFileSystem/file/8db0d3b02cec/SDFileSystem.cpp

The only thing worse than conflicting error codes are error codes with misleading names.

This also does not become a problem if we renumber the std errno codes.

deepikabhavnani · 2018-05-16T16:13:36Z

TESTS/mbed_platform/error_handling/main.cpp

+    MbedErrorStatus error = MAKE_ERROR(MODULE_APPLICATION, ERROR_CODE_UNKNOWN);
+    SET_WARNING(error, "Error Unknown", 0xAABBCCDD );
+    MbedErrorStatus lastError = get_last_error();
+    printf("\nlastError = 0x%08X", lastError );


Should this print be part of final test code?

Good point, we can remove those prints.

deepikabhavnani · 2018-05-16T16:14:59Z

TESTS/mbed_platform/error_handling/main.cpp

+void test_system_errors()
+{
+    MbedErrorStatus error = MAKE_ERROR(MODULE_APPLICATION, ERROR_CODE_UNKNOWN);
+    SET_WARNING(error, "Error Unknown", 0xAABBCCDD );


Why do we have SET_WARNING as an API? how is this different then tracing feature MBED_WARN or tr_warn?

SET_WARNING captures error context, can call application hooks, stores the error error context etc but tracing does not(or rather should not). Error handling should be seen differently from tracing and both are needed.

Agree error handling provides more functionality which is not same as tracing. My point islogs / print messages should not be handled in this feature. Log messages printed using SET_WARNING / SET_ERROR are not line interleaved and will corrupt data displayed on UART.

Please see my reply above, unfortunately we cannot use mbed_trace APIs as of now due to its extensive use of resources. Note that this needs to work with handling of exceptions as well.

deepikabhavnani · 2018-05-16T16:29:51Z

rtos/TARGET_CORTEX/TARGET_CORTEX_M/mbed_rtx_fault_handler.c

-void fault_print_init(void);
-void fault_print_str(char *fmtstr, uint32_t *values);
-void hex_to_str(uint32_t value, char *hex_star);
+//Functions Prototypes


All this code restructuring and changes in fault handler should be a separate PR.

deepikabhavnani · 2018-05-16T16:30:46Z

rtos/TARGET_CORTEX/TARGET_CORTEX_M/mbed_rtx_fault_handler.h

-  uint32_t SP;
-  uint32_t LR;
-  uint32_t PC;
+  uint32_t R0_reg;


Fault handler changes / fixes should be separate PR please

No, refactoring exception handling to use the new error handling framework is one of the goals with this feature.

deepikabhavnani

Please break the PR, as it is difficult to review in current state.

Error handling changes
Refactor / fixes for fault handler
Replace / Usage of error API in code base

Also I would agree with @geky that this is combination of error logging and error handling. I am fine with the error handling and standardizing error codes. But logging should not be part of this feature, we already have libraries for that. I agree with decisions pending on logging features, but lets not add more API's to make it more chaotic.

c1728p9

This change will have a fairly large impact on the codebase. It is also making many errors that were considered fatal into non-fatal errors by changing error to SET_ERROR. There needs to be a good justification for changing these errors from fatal to non-fatal.

I would like to see a docs page explaining how to use these error codes in a driver or library and when to use what kind of errors. Without a clear idea of how this unified error handling should/will be used, this change will just add confusion. For example, block device an nsapi define their own error codes. Should libraries like this always define their own error codes or should this be moved into the unified error codes?

mbed-ci · 2018-05-21T03:47:35Z

Build : SUCCESS

Build number : 2077
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

mbed-ci · 2018-05-21T05:36:13Z

Test : FAILURE

Build number : 1887
Test logs :http://mbed-os-logs.s3-website-us-west-1.amazonaws.com/?prefix=logs/6859/1887

mbed-ci · 2018-05-21T05:52:19Z

Exporter Build : SUCCESS

Build number : 1722
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/exporter/6859/

SenRamakri · 2018-05-21T12:10:12Z

/morph build

mbed-ci · 2018-05-21T12:42:24Z

Build : FAILURE

Build number : 2080
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

adbridge · 2018-05-21T13:27:50Z

@geky @c1728p9 @0xc0170 @bulislaw @deepikabhavnani Are you happy with the updates? Please approve review if this is the case.

SenRamakri · 2018-05-21T14:01:21Z

/morph build

SenRamakri · 2018-05-21T14:10:20Z

@geky @c1728p9 @0xc0170 @bulislaw @deepikabhavnani - Yes, I think I have addressed or answered all of your comments, please take a look at the latest updates.

geky

More things need MBED_ prefixes
SET_ERROR is not a good name
Duplicated error codes between errno and mbed
No support for -1
Now is not the time to deprecate error

geky · 2018-05-21T14:31:04Z

platform/mbed_error.h

+
+#define GET_MBED_ERROR_TYPE( error_status )         ((error_status & MBED_ERROR_STATUS_TYPE_MASK) >> MBED_ERROR_STATUS_TYPE_POS)
+#define GET_MBED_ERROR_MODULE( error_status )       ((error_status & MBED_ERROR_STATUS_MODULE_MASK) >> MBED_ERROR_STATUS_MODULE_POS)    
+#define GET_MBED_ERROR_CODE( error_status )         (int)((GET_MBED_ERROR_TYPE( error_status ) == ERROR_TYPE_POSIX)?(-error_status):((error_status & MBED_ERROR_STATUS_CODE_MASK) >> MBED_ERROR_STATUS_CODE_POS))


nit: MBED_ first as a prefix?

- GET_MBED_ERROR_TYPE + MBED_GET_ERROR_TYPE

geky · 2018-05-21T14:32:02Z

platform/mbed_error.h

+#ifdef MBED_CONF_ERROR_FILENAME_CAPTURE_ENABLED
+    #define SET_ERROR( error_status, error_msg, error_value )       set_error( error_status, (const char *)error_msg, (uint32_t)error_value, (const char *)MBED_FILENAME, __LINE__ )
+#else
+    #define SET_ERROR( error_status, error_msg, error_value )       set_error( error_status, (const char *)error_msg, (uint32_t)error_value, NULL, 0 )


SET_ERROR is still a confusing name for a function that halts the processor

What would be a good name? I dont like calling it just error because it sounds like we are declaring/defining something. I want that to be clear that it will be acted upon.

Sorry, I'm coming in the middle of the review a bit so might have missed some of the nuanced/offline discussions. The current use of error(...) halts execution. To me calling this macro ERROR makes more sense as SET_ERROR implies setting the status of an error code. At least to me ERROR implies it will be acted on.

geky · 2018-05-21T14:33:31Z

platform/mbed_error.h


-#ifdef __cplusplus
-extern "C" {
-#endif
 void error(const char* format, ...);


We still should not deprecate error

Can I know why? We are planning to raise requirements for other modules to adopt new error handling. And this needs to be deprecated now before they start doing it.

geky · 2018-05-21T14:34:35Z

platform/mbed_error.h

+    MBED_DEFINE_SYSTEM_ERROR( HARDFAULT_EXCEPTION        ,61 ),         /* 317      HardFault exception */
+    MBED_DEFINE_SYSTEM_ERROR( MEMMANAGE_EXCEPTION        ,62 ),         /* 318      MemManage exception */
+    MBED_DEFINE_SYSTEM_ERROR( BUSFAULT_EXCEPTION         ,63 ),         /* 319      BusFault exception */
+    MBED_DEFINE_SYSTEM_ERROR( USAGEFAULT_EXCEPTION       ,64 ),         /* 320      UsageFault exception*/


Should we prefix error codes with MBED_?

These names are common, user libraries may have already defined names such as DEVICE_BUSY

I really don't see a need for prefix-ing all the error codes again with MBED_. Please note that its not "DEVICE_BUSY", we are actually defining error codes/status using that macro. So, I don't see conflicts there. User libraries can continue to have "DEVICE_BUSY" if they want to.

geky · 2018-05-21T14:37:12Z

platform/mbed_error.h

+    MBED_DEFINE_POSIX_ERROR( EBADF                 ,EBADF           ),  /* 9       Bad file number */
+    MBED_DEFINE_POSIX_ERROR( ECHILD                ,ECHILD          ),  /* 10      No child processes */
+    MBED_DEFINE_POSIX_ERROR( EAGAIN                ,EAGAIN          ),  /* 11      Try again */
+    MBED_DEFINE_POSIX_ERROR( ENOMEM                ,ENOMEM          ),  /* 12      Out of memory */


We still have duplicated error codes

I wouldn't call them as duplicates, because Posix error codes are special cases and we cannot capture the module reporting the error. That's why we have to define some of them(common ones) defined under system errors as well for mbed modules to use.

geky · 2018-05-21T14:37:41Z

platform/mbed_error.h

+    //Below are MBED SYSTEM ERROR CODE definitions
+    //MBED SYSTEM ERROR CODE definitions starts at offset MBED_SYSTEM_ERROR_BASE, see above.
+    //                   Error Name                 Error Offset   Error Code
+    MBED_DEFINE_SYSTEM_ERROR( UNKNOWN                    ,0 ),          /* 256      Unknown error */


Still no support for -1?

Sorry, we are unable to do that, because -1 is an absolute value which we cannot encode under system error codes because it always will be Posix error(ERROR_EPERM) and also searching for error codes in source code will get messed up as well. So for now, -1 will not be encoded under system error, but its encoded as ERROR_EPERM. My suggestion is drivers should change to use more appropriate error codes instead of using -1 as a generic error which is unmanageable and tough to track.

mbed-ci · 2018-05-21T15:40:46Z

Build : SUCCESS

Build number : 2082
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

mbed-ci · 2018-05-21T17:54:29Z

Test : FAILURE

Build number : 1892
Test logs :http://mbed-os-logs.s3-website-us-west-1.amazonaws.com/?prefix=logs/6859/1892

mbed-ci · 2018-05-21T19:16:56Z

Exporter Build : SUCCESS

Build number : 1726
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/exporter/6859/

SenRamakri · 2018-05-22T04:17:06Z

/morph build

…o changes and test fixes

SenRamakri · 2018-05-22T05:19:09Z

/morph build

cmonr · 2018-05-22T05:26:17Z

@SenRamakri I would refrain from starting CI until at a minimum, pr-head and travis CI builds are green.

mbed-ci · 2018-05-22T07:11:45Z

Build : SUCCESS

Build number : 2094
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

mbed-ci · 2018-05-22T07:57:35Z

Build : SUCCESS

Build number : 2095
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6859/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

mbed-ci · 2018-05-22T10:52:32Z

Test : SUCCESS

Build number : 1904
Test logs :http://mbed-os-logs.s3-website-us-west-1.amazonaws.com/?prefix=logs/6859/1904

adbridge · 2018-05-22T11:27:19Z

@SenRamakri please do not kick off the ci yourself. CI should only ever be kicked off when the PR is in the needs ci state , otherwise this will mess up all the future metrics collection. Technically the review should also be approved before doing so...

mbed-ci · 2018-05-22T12:32:44Z

Test : FAILURE

Build number : 1905
Test logs :http://mbed-os-logs.s3-website-us-west-1.amazonaws.com/?prefix=logs/6859/1905

geky · 2018-05-22T15:13:24Z

platform/mbed_error.h

+ */
+typedef enum _mbed_module_type
+{
+    MODULE_APPLICATION = 0,


These guys should also have the MBED_ prefix

geky · 2018-05-22T15:14:08Z

platform/mbed_error.h

+ * @note This macro generate mbed_error_status_t-es with error type set to MBED_ERROR_TYPE_SYSTEM
+ *
+ */
+#define MAKE_ERROR(module, error_code)                          MAKE_SYSTEM_ERROR(module, error_code)


These guys should also have the MBED_ prefix

geky · 2018-05-22T15:14:52Z

platform/mbed_error.h

+ * @return                  mbed_error_status_t code logged for the last error or MBED_SUCCESS if no errors are logged.
+ *
+ */
+mbed_error_status_t get_last_error(void);


These guys also need an mbed_ prefix

geky · 2018-05-22T15:15:22Z

platform/mbed_error.h

+ * @return                      0 or MBED_ERROR_SUCCESS on success.
+ *
+ */
+mbed_error_status_t make_mbed_error(mbed_error_type_t error_type, mbed_module_type_t module, mbed_error_code_t error_code);


I would put the mbed_ at the front for consistency, makes it easy to parse for both humans and machines

geky · 2018-05-22T15:15:34Z

platform/mbed_error.h

+ * @note                        Filesystem support is required in order for this function to work.
+ *
+ */
+mbed_error_status_t save_error_log(const char *path);


But yeah, these all need mbed_ prefix

geky · 2018-05-22T15:17:06Z

platform/mbed_error.h

+ *
+ */
+#define MBED_DEFINE_SYSTEM_ERROR( error_name, error_code )   \
+                      MBED_ERROR_CODE_##error_name = MBED_SYSTEM_ERROR_BASE + error_code,  \


Though: I noticed these names have gotten long, maybe we don't need the full "ERROR_CODE"? Maybe we can just use MBED_CODE_ERROR_NAME? Up to you

mbed-ci · 2018-05-22T16:33:21Z

Exporter Build : SUCCESS

Build number : 1736
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/exporter/6859/

mbed-ci · 2018-05-22T16:33:21Z

Exporter Build : SUCCESS

Build number : 1736
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/exporter/6859/

SenRamakri · 2018-05-22T17:50:48Z

This PR has been moved to #6983 as most people are complaining on not able to access this(showing unicorns).

SenRamakri requested review from bulislaw, cmonr, sg-, kegilbert and 0xc0170 May 10, 2018 03:13

bulislaw suggested changes May 10, 2018

View reviewed changes

bulislaw reviewed May 10, 2018

View reviewed changes

SenRamakri commented May 10, 2018

View reviewed changes

cmonr added the needs: review label May 11, 2018

SenRamakri force-pushed the sen_ErrorHandling branch from c440fa6 to a4701f2 Compare May 12, 2018 15:36

cmonr added the release-version: 5.9.0-rc1 label May 14, 2018

SenRamakri force-pushed the sen_ErrorHandling branch 2 times, most recently from bcc30e8 to 7a4f6ce Compare May 14, 2018 17:11

0xc0170 requested changes May 15, 2018

View reviewed changes

geky requested review from geky, c1728p9 and kjbracey May 15, 2018 20:41

geky added the scope: feature label May 15, 2018

geky requested review from yossi2le and pan- May 15, 2018 23:24

geky suggested changes May 16, 2018

View reviewed changes

deepikabhavnani reviewed May 16, 2018

View reviewed changes

deepikabhavnani suggested changes May 16, 2018

View reviewed changes

c1728p9 suggested changes May 16, 2018

View reviewed changes

Fix test failures when trap errors are enabled and other fixes

1b8a9d3

SenRamakri force-pushed the sen_ErrorHandling branch from b0557ca to 1b8a9d3 Compare May 21, 2018 12:09

geky suggested changes May 21, 2018

View reviewed changes

SenRamakri force-pushed the sen_ErrorHandling branch from 0561663 to 0d508a1 Compare May 22, 2018 04:15

Test application/cases optimization for some low memory targets, macr…

7392504

…o changes and test fixes

SenRamakri force-pushed the sen_ErrorHandling branch from 0d508a1 to 7392504 Compare May 22, 2018 04:47

geky reviewed May 22, 2018

View reviewed changes

SenRamakri closed this May 22, 2018

Error Handling and error coding #6859

Error Handling and error coding #6859

Uh oh!

Conversation

SenRamakri commented May 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull request type

Uh oh!

bulislaw left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SenRamakri left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SenRamakri commented May 10, 2018 •

edited

Loading

geky left a comment •

edited

Loading