-
Notifications
You must be signed in to change notification settings - Fork 3k
Enable stack dump #10872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable stack dump #10872
Conversation
@andrewc-arm, thank you for your changes. |
Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. Could you add short description in our handbook in debugging section?
@andrewc-arm could you move the 'how to use this' to the release notes section please along with any additional information on how to use it that can go into the next release note. Thanks |
Done following items.
|
095f205
to
1217aae
Compare
platform/mbed_error.c
Outdated
@@ -74,14 +74,19 @@ static unsigned int compute_crc32(const void *data, int datalen) | |||
unsigned int crc = 0; /* CRC value is 32bit */ | |||
unsigned char *buf = (unsigned char *)data;//use a temp variable to make code readable and to avoid typecasting issues. | |||
|
|||
for (; datalen > 0; datalen--) { | |||
for (; datalen > 0; datalen--) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coding style requires 1 TBS for if, else, while, for statements, see https://os.mbed.com/teams/SDK-Development/wiki/mbed-sdk-coding-style
Please revert this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
astyle --style=allman applied to pass astyle test
Use astylerc file in the root, defines all rules needed - just run it on your changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@0xc0170
I ran atstyle based on the rc file and this is what I got.
andcho01@bc-a1-1-3:~/code/mbed-os$ astyle -n --options=.astylerc platform/mbed_error.c
Invalid option file options:
max-continuation-indent=120
For help on options type 'astyle -h'
Artistic Style has terminated
So I commented out max-continuation-indent
and got the astyle fixed. Please review d0ca786.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I saw this error earlier - what version of astyle you use? Update if older, they do not provide this option
platform/mbed_error.c
Outdated
@@ -556,6 +590,51 @@ static void print_error_report(const mbed_error_ctx *ctx, const char *error_msg, | |||
mbed_stats_sys_get(&sys_stats); | |||
mbed_error_printf("\nFor more info, visit: https://mbed.com/s/error?error=0x%08X&osver=%" PRId32 "&core=0x%08" PRIX32 "&comp=%d&ver=%" PRIu32 "&tgt=" GET_TARGET_NAME(TARGET_NAME), ctx->error_status, sys_stats.os_version, sys_stats.cpu_id, sys_stats.compiler_id, sys_stats.compiler_version); | |||
#endif | |||
|
|||
#if MBED_STACK_DUMP_ENABLED && defined(MBED_CONF_RTOS_PRESENT) | |||
#define STACK_END_MARK 0xCCCCCCCC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we use already RTX symbols (they are public) but not yet defined in CMSIS RTOS, lets use these here not redefine again. Use osRtxStackFillPattern
, same as we use in Thread.cpp file (add #if defined(MBED_OS_BACKEND_RTX5)
to be able to find these easy).
We do not care about watermark (if it is enabled)? There was an issue previously for this: #7545, we use both symbols in Thread object when checking stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used osRtxStackFillPattern
in the new commit d0ca786.
We do not care about watermark. Without the watermark, osRtxStackMagicWord
was not visible and this code does not depend on it.
platform/mbed_error.c
Outdated
|
||
#if MBED_STACK_DUMP_ENABLED && defined(MBED_CONF_RTOS_PRESENT) | ||
#define STACK_END_MARK 0xCCCCCCCC | ||
#define STACK_END_MARK_CNT 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this number do (why is it 3) ? line 613 is not that clear to me - might be worth adding a comment there what we do there ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment (https://github.com/ARMmbed/mbed-os/pull/10872/files#diff-ae57d671f3af44d01b722ed277444e27R561) to elaborate the stack end detection logic.
1217aae
to
d0ca786
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the update
d0ca786
to
4ed338a
Compare
// Find the stack end. | ||
int stack_end_cnt = 0; | ||
uint32_t st_end = ctx->thread_current_sp; | ||
for (; st_end >= ctx->thread_stack_mem; st_end -= sizeof(int)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the current stack pointer verified somewhere before stored to context? It might be worth checking that it is within RAM at least to avoid faulting again here. Also the value should be checked for misalingment to avoid accessing data from non-32bit aligned addresses. A bit of caution on the data collected from crashed thread will not hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Good point. I will add this feature in the next candidate code for this PR.
if (stack_end_cnt >= STACK_END_MARK_CNT) { | ||
st_end += (STACK_END_MARK_CNT - 1) * sizeof(int); | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stack size calculation logic differs from RTX one: https://github.com/ARMmbed/mbed-os/blob/master/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_thread.c#L901-L908. Why not just stop at the first non-osRtxStackFillPattern value? Nevermind, this makes more sense.
Not following the logic on the area you're dumping. If I follow correctly, you're dumping all ever-used stack space up to the current stack pointer. It's all stale data, and nothing from the current callstack. I would be inclined to just dump starting from the current stack pointer ( Possibly you could still choose to show used stuff below Although there's a further complication in that You should test this out with a hard fault, as well as the easier |
Another thing to consider to make any analysis tool's life easier is ensuring that we have a good (PC,SP) pair for it to lock on to. A precise backtrace uses PC->SP-offset mappings from the ELF/DWARF debug info, and getting that to work correctly means we need to start from a matched (PC,SP) readout. In the fault case we can get a precisely matched pair from |
Good feedback @kjbracey-arm 👍 |
Hi, @kjbracey-arm I think you are right! Thanks for pointing it out. I was dumping the other way around. You are right about that we need the region of The information I acquired was this one from my experimentation.
I will amend this PR and come back to you after the full verification.
I think the stack pointer of the failure is correct one since we are extracting it from the |
As far as I can see, |
Hi, BTW, I cannot reopen this PR. It's grayed out and it says (CC: @ohadhawk ) |
@kjbracey-arm Regarding your comment, I experimented and found out that our This is true value from gdb.
Please note that true This is acquired value from
Log: teraterm_20190827.1806.log Looks like we need some attention to |
I don't think you're going to get a solid answer from assert - there's an extra wrapper around the Register contents aren't as relevant when it's a high-level error. The SP will at least be good enough to indicate overflow. |
@kjbracey-arm I would like to share my recent experiment to double check the correctness of the current stack dump code. And it's exactly correct for thread-mode fault crashes. The true callstack acquired by gdb
The stack dump
I marked the important callback points with (CC: @ohadhawk ) |
Hi, |
Description
In the current mbed-os, there is no stack dump feature as far as I know.
With this code change, now we can have a stack dump as shown in the
Release Notes
.In the near future we will come up on how to parse the stack and reconstruct the back-trace and call arguments. But even without the parser, the assembly readers can manually decode the stack dump to find the failure.
Pull request type
Reviewers
Release Notes
For better post-crash debugging with Mbed-OS, you can enable stack dump.
In order to use, the application should add
mbed_app.json
this line.It is strongly recommended that you should save the *.elf file along with your *.bin file. In the future Mbed-OS debugging tutorial, we are planning to introduce how to debug with Mbed-OS stack dump and *.elf file. Without *.elf file, you will need to disassemble the code and manually decode the meaning of the stack.
This is the example of the stack dump from
MBED_ASSERT(false);
.This dump routine is smart and only dumps the critical area containing following information.
NOTE: Otherwise, if we dump too much, it will choke up the crash log.
Also, note that this does not require the application to build with
-funwind-tables
option or have GCC dependency.Reference
(CC: @ohadhawk, @JanneKiiskila , @kjbracey-arm , @TeroJaasko , @bulislaw , @kivaisan )