Skip to content

Standardized Error Handling and Error Codes #6983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 24, 2018

Conversation

SenRamakri
Copy link
Contributor

This is the implementation for Standardized error coding and handling for mbed-os.
The changes with this pull request implements standardized error code handling and error code definitions/encoding. With this, error handling is implemented as a Core-os service which captures and
logs the errors with context information. Each error code is captured with error location, error message,
error file(this is configurable as this can cause increase in binary size), calling thread info, module generating the error etc. Relevant APIs are provided to log and retrieve errors from error logging system.
A build time flag has been provided to switch off error logging system on memory constrained platforms.
Error reporting itself is logically separated and implemented in mbed_error_reporting.c/h. By doing this both exception handling and error handling can use the same reporting infrastructure. A sample error report is as below. Note that error status value captures error type, entity and error code. Error reporting prints these values separately as well to enable developers to search the source tree with error code.
Error handling service also provides an API to save the error log in file system if required.

++ MbedOS Error Info ++
Error Status: 0x80040103
Error Code: 259
Error Entity: 04
Error Message: HAL Entity error
Error Location: 0x000067C7
Error Value: 0x00005566
Current Thread: Id: 0x200024A8 EntryFn: 0x0000FB0D StackSize: 0x00001000 StackMem: 0x200014A8 SP: 0x2002FFD8
-- MbedOS Error Info --

The implementation is tested using the Greentea tests under TESTS/mbed_platform/error_handling.
Note that exception handling and reporting testing cannot be done using Greentea and was manually tested.

Pull request type

[ ] Fix
[ ] Refactor
[ ] New target
[X] Feature
[ ] Breaking change

The APIs/data structures/macros are all documented using Doxygen. In addition handbook documentation will also be generated for release.

Overall impact of this feature is ~1K(ROM) including logging. Logging can be disabled and if disabled
the overall impact would be around 0.75K.

//Error logger threads
void err_thread_func(mbed_error_status_t *error_status)
{
//printf("\nError Status = 0x%08X\n",*error_status);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the dead code and printf's from all tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we can remove it

}

i = mbed_get_error_log_count()-1;
//printf("\nError log count = %d\n", i+1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printf should be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed


error = MBED_TEST_FILESYSTEM::format(&fd);
if(error < 0) {
printf("Failed formatting");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printf's - should be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deepikabhavnani - These are printf in the case of errors, its helpful for debugging. Do you think it needs to be taken out? I don't think they are normally hit and shouldnt cause any adverse effects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im changing them to TEST_FAIL_MESSAGE, hope that works


utest::v1::status_t test_setup(const size_t number_of_cases)
{
GREENTEA_SETUP(300, "default_auto");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the test take this long (300 sec) to finish?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that it takes quite a long for some slow targets(I saw one of them took 80secs it could be some test delays as well) as I see. But I can cut that down to 2 mins or less. Ill do that.

@@ -85,6 +85,11 @@ void error(const char* format, ...)
{
(void) format;
}

mbed_error_status_t mbed_error(mbed_error_status_t error_status, const char *error_msg, unsigned int error_value, const char *filename, int line_number)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the comments above, it does not look like we need mbed_error here till we actually replace error with mbed_error in EvrRtxTimerError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is when the error happens we have more info to debug the issue. Thats the reason why we are replacing them. Hope that works since its a good direction.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code will not be called, till we actual update EvrRtxTimerError. We can add this later when we are making changes to CMSIS/RTX code. As it helps in keeping track of all CMSIS/RTX changes in single PR and dependencies as well when updating to higher versions.

@@ -55,6 +55,11 @@ struct Sync {
void error(const char* format, ...) {
(void) format;
}

mbed_error_status_t mbed_error(mbed_error_status_t error_status, const char *error_msg, unsigned int error_value, const char *filename, int line_number)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this code here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain it? Do you mean we can have it under macro?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was inline to previous one, till we replace error call from CMSIS/RTX lib, we should not add this code. Its should be part of same commit, to help in keeping track of CMSIS changes for updates.

@@ -123,15 +123,15 @@ u32_t sys_now(void) {
*---------------------------------------------------------------------------*/
err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) {
if (queue_sz > MB_SIZE)
error("sys_mbox_new size error\n");
MBED_ERROR(MBED_MAKE_ERROR(MBED_MODULE_NETWORK_STACK, MBED_ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend not to replace API in library/feature code (even when deprecating), let the feature owners do it after thorough testing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can share the samples and examples, but actual API usage should be done by feature owners.

Copy link
Contributor Author

@SenRamakri SenRamakri May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deepikabhavnani - Lot of these changes are already talked about during design review. In addition we are going to write separate requirements for each team once this is in. So, my understanding it is better to capture more info as long as they are not changing the behavior in any way. Do you still think we should remove them?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes Senthil, I believe it should be done and verified by feature owners

@@ -57,14 +57,14 @@ int ReadOnlyBlockDevice::read(void *buffer, bd_addr_t addr, bd_size_t size)

int ReadOnlyBlockDevice::program(const void *buffer, bd_addr_t addr, bd_size_t size)
{
error("ReadOnlyBlockDevice::program() not allowed");
return 0;
MBED_ERROR(MBED_MAKE_ERROR(MBED_MODULE_BLOCK_DEVICE, MBED_ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed", addr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment, please dont replace the API everywhere in Mbed OS code, it should not be part of API addition. API addition and API adoption should be separate PR's

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as above, we are only changing places where it doesnt cause any behavior change. We will be writing requirements for other teams as well to adopt this more broadly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to pass addr and size to the MBED_ERROR macro?

Should we just not take any arguments in MBED_ERROR?


//Prevent corruption by holding out other callers
//and we also need this until we remove the error call completely
while (error_in_progress == 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spinning is fine in case of critical error, but in case of warnings we are unnecessary holding the CPU here

Copy link
Contributor Author

@SenRamakri SenRamakri May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I thought about that. But how do we sync-up with the current "error" API without this. Until we remove that we have to do something like this. I was thinking of returning but thats not desirable to return from warning API when some code is already in distress and doesn't know how to handle the situation. Do you think we should call some sort of wait to free up the cpu?What would you suggest

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait will have its own side effects, but do we support these warning/error from interrupts? if no then wait is fine


/* Prints the error information */
void mbed_report_error(mbed_error_ctx *error_ctx, char *error_msg)
{
Copy link

@deepikabhavnani deepikabhavnani May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query: Will this function be called in case of warning/non-fatal errors as well? If yes then I would say it is lacking synchronization with existing mbed_trace library.
I know the prints in the call are under critical section, but prints from application and mbed_trace are not in critical section. Consider a scenario where thread1 is in between printing mbed_trace log and context switches to thread_2 which has non-fatal error. It will start printing on UART console without previous log message dumped completely. Logging in this module does not guarantee line interleaving with existing libraries

Copy link

@deepikabhavnani deepikabhavnani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added my comments at specific code section, major concern is line interleave (sync with other libraries) in case of warning messages and updating existing 'error` API's in feature and other libraries.

@SenRamakri
Copy link
Contributor Author

@deepikabhavnani - See my new changes. The serial print functions are removed and now we print the error report only in the case of fatal errors, so it shouldn't cause issues when mbed_trace is active.
I have removed the MBED_ERROR calls from lwip. For all other files which are changed to use MBED_ERROR we do have component experts already in this review. "error" function is no longer deprecated. It will internally call mbed_error to capture the error info. In future, we may deprecate "error" once we have other components using the new mbed error apis.

@SenRamakri SenRamakri requested a review from sg- May 23, 2018 05:27
kjbracey
kjbracey previously approved these changes May 23, 2018
Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do still need to remove all the hard-coded console device stuff from here (later), but in that regard this version is no worse than the existing error, and a teeny bit better in having ITM hard-coded. So I'm okay to approve this - no other concerns.

Open issue with my thoughts on that here: #6521

0xc0170
0xc0170 previously approved these changes May 23, 2018
c1728p9
c1728p9 previously approved these changes May 23, 2018
Copy link
Contributor

@c1728p9 c1728p9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SenRamakri you have addressed the concerns I had. I would have liked more visibility on this during the concept phase, but what you have here looks good to me and matches the proposed design.

@deepikabhavnani
Copy link

@SenRamakri - Changes look good. I will request just one more change in API to have it more aligned with other logging/tracing systems, const char string message as first arg.

--#define MBED_WARNING( error_status, error_msg, error_value ) 
++define MBED_WARNING( error_msg, error_status, error_value )

#ifdef MBED_CONF_ERROR_FILENAME_CAPTURE_ENABLED
#define MBED_ERROR( error_status, error_msg, error_value ) mbed_error( error_status, (const char *)error_msg, (uint32_t)error_value, (const char *)MBED_FILENAME, __LINE__ )
#else
#define MBED_ERROR( error_status, error_msg, error_value ) mbed_error( error_status, (const char *)error_msg, (uint32_t)error_value, NULL, 0 )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing error does not take arguments, so most calls will not know what to pass as an error_value. I do see it as useful, but a single error_value is both additional work and limiting.

Since we're stuck once this lands, can we change MBED_ERROR to take two parameters and add MBED_ERROR1 or similar to take the error_value argument?

This would allow use to increase the arg count in the future if we decide that we want to do that.

MBED_ERROR(MBED_MAKE_ERROR(MBED_MODULE_BLOCK_DEVICE, MBED_ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed");
MBED_ERROR1(MBED_MAKE_ERROR(MBED_MODULE_BLOCK_DEVICE, MBED_ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed", addr);
MBED_ERROR2(MBED_MAKE_ERROR(MBED_MODULE_BLOCK_DEVICE, MBED_ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed", addr, size);

@geky
Copy link
Contributor

geky commented May 23, 2018

@deepikabhavnani, interesting, I disagree, in this case error_status is not an argument, but the object as a target of the MBED_WARNING call:

Consider the order of arguments for sprintf:

#define MBED_WARNING(error_status, error_msg, error_value)
// vs
sprintf(buffer, msg, args);

@deepikabhavnani
Copy link

@SenRamakri @geky - As discussed internally will keep error_status as first arg

@SenRamakri SenRamakri dismissed stale reviews from deepikabhavnani, c1728p9, 0xc0170, and kjbracey via 5ef6728 May 23, 2018 17:41
@SenRamakri SenRamakri force-pushed the sen_ErrorHandling_Push2 branch from a9f2a0d to 5ef6728 Compare May 23, 2018 17:41
@SenRamakri
Copy link
Contributor Author

Pushed a minor change to split the MBED_ERROR macros into MBED_ERROR and MBED_ERROR1.

Copy link
Contributor

@geky geky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an external discussion:

I'm happy with this as long as we plan to unify POSIX and Mbed error codes in the future. Having them separate creates a challenge for users.

Thanks for responding to all my feedback 👍

@cmonr
Copy link
Contributor

cmonr commented May 24, 2018

Starting CI since comments indicate multiple reviewers are good with the changes.
Note: It's possible that export and test builds may get shuffled around tomorrow.

/morph build

@mbed-ci
Copy link

mbed-ci commented May 24, 2018

Build : SUCCESS

Build number : 2134
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6983/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented May 24, 2018

@adbridge
Copy link
Contributor

Test failed trying to gather gcov data on Nordic device.
Will try re-running the tests.
/morph test

@mbed-ci
Copy link

mbed-ci commented May 24, 2018

@0xc0170
Copy link
Contributor

0xc0170 commented May 24, 2018

Test aborted, this needs rebuild to have a fix in that was merged today

/morph build

@SenRamakri
Copy link
Contributor Author

@0xc0170 - Do we need to rebase to get the fix that merged today?

@mbed-ci
Copy link

mbed-ci commented May 24, 2018

Build : SUCCESS

Build number : 2136
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6983/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@0xc0170
Copy link
Contributor

0xc0170 commented May 24, 2018

@0xc0170 - Do we need to rebase to get the fix that merged today?

not needed. CI does merge. Thus I restarted it

@mbed-ci
Copy link

mbed-ci commented May 24, 2018

@mbed-ci
Copy link

mbed-ci commented May 24, 2018

@cmonr
Copy link
Contributor

cmonr commented May 24, 2018

Bringing it in!

@cmonr cmonr merged commit 527f9a1 into ARMmbed:master May 24, 2018
Copy link
Contributor

@0xc0170 0xc0170 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overlooked one commit change (suffixed 1 macros).

Where is docs PR ?

@@ -78,13 +78,13 @@ void Thread::constructor(Callback<void()> task,

switch (start(task)) {
case osErrorResource:
error("OS ran out of threads!\n");
MBED_ERROR1(MBED_MAKE_ERROR(MBED_MODULE_PLATFORM, MBED_ERROR_CODE_OUT_OF_RESOURCES), "OS ran out of threads!\n", task);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MBED_WARNING1 or this ERROR1 are confusing to me - what does 1 stands for, which one should I use. From the commit message and the changes I see there is error value, so we either have it or not, but having these 2 ?

@SenRamakri
Copy link
Contributor Author

@0xc0170 - I'm working on Examples and Docs, example has been completed and available. Docs are in progress, will land the PR in Handbook early next week. The code changes will be reflected in docs changes as well.


core_util_critical_section_enter();
error_log_count++;
memcpy(&mbed_error_ctx_log[error_log_count % MBED_CONF_ERROR_HIST_SIZE], error_ctx, sizeof(mbed_error_ctx) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IAR warning without including string.h
Warning[Pe223]: function "memcpy" declared implicitly ...mbed\platform\mbed_error_hist.c 39

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I sent a fix here #7136

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants