Don't count all the frames just to skip the current inlined ones. #80918
The algorithm to find the DW_OP_entry_value requires you to find the nearest non-inlined frame. It did that by counting the number of stack frames so that it could use that as a loop stopper. That is unnecessary and inefficient. Unnecessary because GetFrameAtIndex will return a null frame when you step past the oldest frame, so you already have the "got to the end" signal without counting all the stack frames. And counting all the stack frames can be expensive.
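The idea in the description can be illustrated with a small, self-contained sketch. It is not the LLDB code: `MockFrame`, `MockThread`, and `FindNearestNonInlinedParent` are hypothetical stand-ins invented for illustration, and the real loop in `Evaluate_DW_OP_entry_value` does more work per frame. The sketch only assumes, as the description states, that the frame lookup returns null once the index is past the oldest frame.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stack frame; not lldb's StackFrame.
struct MockFrame {
  uint32_t index;
  bool is_inlined;
};
using MockFrameSP = std::shared_ptr<MockFrame>;

// Hypothetical stand-in for a thread; not lldb's Thread.
class MockThread {
public:
  explicit MockThread(std::vector<bool> inlined_flags)
      : m_inlined(std::move(inlined_flags)) {}

  // Returns nullptr once idx is past the oldest frame, which is the
  // "got to the end" signal the description relies on.
  MockFrameSP GetStackFrameAtIndex(uint32_t idx) {
    ++m_fetches;
    if (idx >= m_inlined.size())
      return nullptr;
    return std::make_shared<MockFrame>(MockFrame{idx, m_inlined[idx]});
  }

  // Number of frame lookups performed so far (for the sketch only).
  uint32_t fetches() const { return m_fetches; }

private:
  std::vector<bool> m_inlined;
  uint32_t m_fetches = 0;
};

// Walk outward from current_idx and return the first non-inlined frame,
// or nullptr if the stack ends first. No total frame count is needed.
MockFrameSP FindNearestNonInlinedParent(MockThread &thread,
                                        uint32_t current_idx) {
  for (uint32_t idx = current_idx + 1;; ++idx) {
    MockFrameSP frame = thread.GetStackFrameAtIndex(idx);
    if (!frame)
      return nullptr; // Stepped past the oldest frame.
    if (!frame->is_inlined)
      return frame;
  }
}

// Small usage example: frames 1 and 2 are inlined, frame 3 is not.
inline void DemoFindParent() {
  MockThread thread({false, true, true, false, false, false});
  MockFrameSP parent = FindNearestNonInlinedParent(thread, 0);
  assert(parent && parent->index == 3);
}
```

In this sketch the walk touches only the frames between the current one and its nearest non-inlined parent, regardless of how deep the rest of the stack is.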
@llvm/pr-subscribers-lldb
Author: None (jimingham)
Full diff: https://github.com/llvm/llvm-project/pull/80918.diff
1 Files Affected:
diff --git a/lldb/source/Expression/DWARFExpression.cpp b/lldb/source/Expression/DWARFExpression.cpp
index fe4928d4f43a43..c061fd1140fff7 100644
--- a/lldb/source/Expression/DWARFExpression.cpp
+++ b/lldb/source/Expression/DWARFExpression.cpp
@@ -608,11 +608,10 @@ static bool Evaluate_DW_OP_entry_value(std::vector<Value> &stack,
   StackFrameSP parent_frame = nullptr;
   addr_t return_pc = LLDB_INVALID_ADDRESS;
   uint32_t current_frame_idx = current_frame->GetFrameIndex();
-  uint32_t num_frames = thread->GetStackFrameCount();
-  for (uint32_t parent_frame_idx = current_frame_idx + 1;
-       parent_frame_idx < num_frames; ++parent_frame_idx) {
+
+  for (uint32_t parent_frame_idx = current_frame_idx + 1;;parent_frame_idx++) {
     parent_frame = thread->GetStackFrameAtIndex(parent_frame_idx);
-    // Require a valid sequence of frames.
+    // If this is null, we're at the end of the stack.
     if (!parent_frame)
       break;
I found this while looking through a bunch of samples to see why some operation was slow, and a good bit of the time was spent on this unnecessary counting of the stack. Except for performance, this is not easily observable, so I couldn't figure out how to write a robust test.
You can test this locally with the following command:
git-clang-format --diff 2f490583c368627f552c71e340c39f2b55c0526c e8659a128f34b93469e9ad9b0ed013ff6764c5be -- lldb/include/lldb/Target/Thread.h lldb/source/Expression/DWARFExpression.cpp
View the diff from clang-format here.
diff --git a/lldb/include/lldb/Target/Thread.h b/lldb/include/lldb/Target/Thread.h
index 30863ad4c9..b764dbf3a9 100644
--- a/lldb/include/lldb/Target/Thread.h
+++ b/lldb/include/lldb/Target/Thread.h
@@ -391,7 +391,7 @@ public:
   virtual bool ThreadHasQueueInformation() const { return false; }
   /// GetStackFrameCount can be expensive. Stacks can get very deep, and they
-  /// require memory reads for each frame. So only use GetStackFrameCount when
+  /// require memory reads for each frame. So only use GetStackFrameCount when
   /// you need to know the depth of the stack. When iterating over frames, its
   /// better to generate the frames one by one with GetFrameAtIndex, and when
   /// that returns NULL, you are at the end of the stack. That way your loop
diff --git a/lldb/source/Expression/DWARFExpression.cpp b/lldb/source/Expression/DWARFExpression.cpp
index c061fd1140..e909c2c65d 100644
--- a/lldb/source/Expression/DWARFExpression.cpp
+++ b/lldb/source/Expression/DWARFExpression.cpp
@@ -609,7 +609,7 @@ static bool Evaluate_DW_OP_entry_value(std::vector<Value> &stack,
   addr_t return_pc = LLDB_INVALID_ADDRESS;
   uint32_t current_frame_idx = current_frame->GetFrameIndex();
-  for (uint32_t parent_frame_idx = current_frame_idx + 1;;parent_frame_idx++) {
+  for (uint32_t parent_frame_idx = current_frame_idx + 1;; parent_frame_idx++) {
     parent_frame = thread->GetStackFrameAtIndex(parent_frame_idx);
     // If this is null, we're at the end of the stack.
     if (!parent_frame)
for (uint32_t parent_frame_idx = current_frame_idx + 1;
     parent_frame_idx < num_frames; ++parent_frame_idx) {

for (uint32_t parent_frame_idx = current_frame_idx + 1;;parent_frame_idx++) {
Suggestion: If you initialize parent_frame to thread->GetStackFrameAtIndex(current_frame_idx + 1) and move the parent_frame = ... bit to the end of the loop, you can have the loop condition be parent_frame != nullptr instead of relying on a break statement.
yeah, I would second that suggestion to make the code easier to read.
I tried this, but I don't think it makes things any clearer. This isn't a simple loop: it has another break and a continue. You end up having to cache IsInline before you reset parent_frame, and that hides where the important "next frame" part of the code goes. That's just awkward.
I think this version is clearer. The first thing the loop does is fetch the next frame and check whether it is null, which is the signal that the stack walk is done. The way I wrote it keeps those operations right next to one another, which I think is easier to read.
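For readers comparing the two loop shapes discussed in this thread, here is a simplified side-by-side sketch. It is one possible reading of the suggestion, not code from the patch: `SketchFrame`, `SketchThread`, `WalkWithBreak`, and `WalkWithCondition` are hypothetical names, and the real loop has an additional break and more per-frame logic that this sketch leaves out, which is part of why the author found the second shape awkward in practice.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not lldb's StackFrame or Thread.
struct SketchFrame {
  uint32_t index;
  bool is_inlined;
};
using SketchFrameSP = std::shared_ptr<SketchFrame>;

class SketchThread {
public:
  explicit SketchThread(std::vector<bool> inlined_flags)
      : m_inlined(std::move(inlined_flags)) {}

  // Returns nullptr once idx is past the oldest frame.
  SketchFrameSP GetStackFrameAtIndex(uint32_t idx) const {
    if (idx >= m_inlined.size())
      return nullptr;
    return std::make_shared<SketchFrame>(SketchFrame{idx, m_inlined[idx]});
  }

private:
  std::vector<bool> m_inlined;
};

// Shape kept in the patch: fetch the next frame at the top of the body
// and test it for null immediately afterwards.
SketchFrameSP WalkWithBreak(const SketchThread &thread, uint32_t current_idx) {
  SketchFrameSP parent;
  for (uint32_t idx = current_idx + 1;; ++idx) {
    parent = thread.GetStackFrameAtIndex(idx);
    if (!parent)
      break; // End of the stack.
    if (parent->is_inlined)
      continue; // Skip inlined frames.
    break; // Found the nearest non-inlined frame.
  }
  return parent;
}

// Shape suggested in review: initialize before the loop, test the frame
// in the loop condition, and fetch the next frame at the end of the body.
SketchFrameSP WalkWithCondition(const SketchThread &thread,
                                uint32_t current_idx) {
  uint32_t idx = current_idx + 1;
  SketchFrameSP parent = thread.GetStackFrameAtIndex(idx);
  while (parent != nullptr) {
    if (!parent->is_inlined)
      break; // Found the nearest non-inlined frame.
    parent = thread.GetStackFrameAtIndex(++idx);
  }
  return parent;
}
```

In this reduced form both functions return the same frame; the disagreement in the thread is about which shape stays readable once the loop's other exits are added back.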
Should the Doxygen comment of GetStackFrameCount warn that this is an expensive operation?
It might be nice to add a "std::optional<uint32_t> max_frame_count" to this function to allow it to stop when it hits "max_frame_count". Like:
That seems an okay idea, but I wouldn't really want to use the new API in this patch. I know I'm only looking to get past all the inlined frames, which is pretty cheap, but I have no way of knowing how many there are. So it really wouldn't be a good idea to try to guess a max_frame_count.
I added something to that effect.
…vm#80918) The algorithm to find the DW_OP_entry_value requires you to find the nearest non-inlined frame. It did that by counting the number of stack frames so that it could use that as a loop stopper. That is unnecessary and inefficient. Unnecessary because GetFrameAtIndex will return a null frame when you step past the oldest frame, so you already have the "got to the end" signal without counting all the stack frames. And counting all the stack frames can be expensive. (cherry picked from commit a04c636)