Don't count all the frames just to skip the current inlined ones. #80918

Merged: jimingham merged 2 commits into llvm:main from the dont-count-frames branch on Feb 13, 2024

jimingham
Collaborator

The algorithm to find the DW_OP_entry_value requires you to find the nearest non-inlined frame. It did that by counting the number of stack frames so that it could use that as a loop stopper.

That is both unnecessary and inefficient. It is unnecessary because GetStackFrameAtIndex returns a null frame when you step past the oldest frame, so you already have the "got to the end" signal without counting all the stack frames. It is inefficient because counting all the stack frames can be expensive.
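
To make the cost argument concrete, here is a minimal sketch. It uses hypothetical stand-ins (`FakeThread`, `FakeFrame`, `frames_unwound`, `FindNearestNonInlinedParent`) rather than the real lldb classes, and it assumes that fetching a frame costs one unit of unwinding work. It only illustrates the null-terminated walk; the real function does more with the frame it finds.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for lldb's StackFrame and Thread; not the real API.
struct FakeFrame {
  uint32_t index;
  bool is_inlined;
};
using FakeFrameSP = std::shared_ptr<FakeFrame>;

class FakeThread {
public:
  explicit FakeThread(std::vector<bool> inlined_flags)
      : m_inlined(std::move(inlined_flags)) {}

  // Models the expensive call: every frame on the stack has to be unwound.
  uint32_t GetStackFrameCount() {
    m_frames_unwound += m_inlined.size();
    return static_cast<uint32_t>(m_inlined.size());
  }

  // Models lazy unwinding: returns null once idx is past the oldest frame.
  FakeFrameSP GetStackFrameAtIndex(uint32_t idx) {
    if (idx >= m_inlined.size())
      return nullptr;
    ++m_frames_unwound;
    const bool inlined = m_inlined[idx];
    return std::make_shared<FakeFrame>(FakeFrame{idx, inlined});
  }

  // Number of frames this model has had to unwind so far.
  size_t frames_unwound() const { return m_frames_unwound; }

private:
  std::vector<bool> m_inlined;
  size_t m_frames_unwound = 0;
};

// Walk outward from current_idx and return the nearest non-inlined frame,
// or null if the walk runs off the end of the stack first.
FakeFrameSP FindNearestNonInlinedParent(FakeThread &thread,
                                        uint32_t current_idx) {
  for (uint32_t idx = current_idx + 1;; ++idx) {
    FakeFrameSP frame = thread.GetStackFrameAtIndex(idx);
    if (!frame) // Null means we stepped past the oldest frame.
      return nullptr;
    if (!frame->is_inlined)
      return frame;
  }
}
```

In this model the walk only touches the frames between the current frame and its nearest non-inlined parent, no matter how deep the stack is, whereas calling `GetStackFrameCount` first would touch every frame.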

@llvmbot
Member

llvmbot commented Feb 7, 2024

@llvm/pr-subscribers-lldb

Author: None (jimingham)


Full diff: https://github.com/llvm/llvm-project/pull/80918.diff

1 Files Affected:

  • (modified) lldb/source/Expression/DWARFExpression.cpp (+3-4)
diff --git a/lldb/source/Expression/DWARFExpression.cpp b/lldb/source/Expression/DWARFExpression.cpp
index fe4928d4f43a43..c061fd1140fff7 100644
--- a/lldb/source/Expression/DWARFExpression.cpp
+++ b/lldb/source/Expression/DWARFExpression.cpp
@@ -608,11 +608,10 @@ static bool Evaluate_DW_OP_entry_value(std::vector<Value> &stack,
   StackFrameSP parent_frame = nullptr;
   addr_t return_pc = LLDB_INVALID_ADDRESS;
   uint32_t current_frame_idx = current_frame->GetFrameIndex();
-  uint32_t num_frames = thread->GetStackFrameCount();
-  for (uint32_t parent_frame_idx = current_frame_idx + 1;
-       parent_frame_idx < num_frames; ++parent_frame_idx) {
+
+  for (uint32_t parent_frame_idx = current_frame_idx + 1;;parent_frame_idx++) {
     parent_frame = thread->GetStackFrameAtIndex(parent_frame_idx);
-    // Require a valid sequence of frames.
+    // If this is null, we're at the end of the stack.
     if (!parent_frame)
       break;
 

@jimingham
Collaborator Author

I found this while looking through a bunch of samples to see why some operation was slow; a good bit of the time went to this unnecessary counting of the stack frames.

Except for performance, this is not easily observable, so I couldn't figure out how to write a robust test.

github-actions bot commented Feb 7, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 2f490583c368627f552c71e340c39f2b55c0526c e8659a128f34b93469e9ad9b0ed013ff6764c5be -- lldb/include/lldb/Target/Thread.h lldb/source/Expression/DWARFExpression.cpp
View the diff from clang-format here.
diff --git a/lldb/include/lldb/Target/Thread.h b/lldb/include/lldb/Target/Thread.h
index 30863ad4c9..b764dbf3a9 100644
--- a/lldb/include/lldb/Target/Thread.h
+++ b/lldb/include/lldb/Target/Thread.h
@@ -391,7 +391,7 @@ public:
   virtual bool ThreadHasQueueInformation() const { return false; }
 
   /// GetStackFrameCount can be expensive.  Stacks can get very deep, and they
-  /// require memory reads for each frame.  So only use GetStackFrameCount when 
+  /// require memory reads for each frame.  So only use GetStackFrameCount when
   /// you need to know the depth of the stack.  When iterating over frames, its
   /// better to generate the frames one by one with GetFrameAtIndex, and when
   /// that returns NULL, you are at the end of the stack.  That way your loop
diff --git a/lldb/source/Expression/DWARFExpression.cpp b/lldb/source/Expression/DWARFExpression.cpp
index c061fd1140..e909c2c65d 100644
--- a/lldb/source/Expression/DWARFExpression.cpp
+++ b/lldb/source/Expression/DWARFExpression.cpp
@@ -609,7 +609,7 @@ static bool Evaluate_DW_OP_entry_value(std::vector<Value> &stack,
   addr_t return_pc = LLDB_INVALID_ADDRESS;
   uint32_t current_frame_idx = current_frame->GetFrameIndex();
 
-  for (uint32_t parent_frame_idx = current_frame_idx + 1;;parent_frame_idx++) {
+  for (uint32_t parent_frame_idx = current_frame_idx + 1;; parent_frame_idx++) {
     parent_frame = thread->GetStackFrameAtIndex(parent_frame_idx);
     // If this is null, we're at the end of the stack.
     if (!parent_frame)

-  for (uint32_t parent_frame_idx = current_frame_idx + 1;
-       parent_frame_idx < num_frames; ++parent_frame_idx) {
+  for (uint32_t parent_frame_idx = current_frame_idx + 1;;parent_frame_idx++) {
Member

Suggestion: If you initialize parent_frame to thread->GetStackFrameAtIndex(current_frame_idx + 1) and move the parent_frame = ... bit to the end of the loop, you can have the loop condition be parent_frame != nullptr instead of relying on a break statement.

Collaborator

yeah, I would second that suggestion to make the code easier to read.

Collaborator Author

I tried this, but I don't think it makes things any clearer. This isn't a simple loop: it has another break and a continue. You end up having to cache IsInlined before you reset parent_frame, and that hides where the important "fetch the next frame" part of the code goes. That's just awkward.
I think this version is clearer. The first thing the loop does is fetch the next frame and check whether it is null, which is the signal that the stack walk is done. The way I wrote it keeps those operations right next to one another, which I think is easier to read.
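
For readers comparing the two loop shapes discussed in this thread, the following is a rough sketch using hypothetical stand-ins (`Frame`, `FrameAt`, `WalkWithBreak`, `WalkWithCondition`); it is not the code from the patch and leaves out the work the real loop does after finding the frame. Both functions return the index of the nearest non-inlined parent, or -1 if the stack ends first.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a stack frame.
struct Frame {
  bool is_inlined;
};

// Hypothetical stand-in for GetStackFrameAtIndex: null past the oldest frame.
const Frame *FrameAt(const std::vector<Frame> &stack, uint32_t idx) {
  return idx < stack.size() ? &stack[idx] : nullptr;
}

// Shape used in the patch: fetch and null-check sit together at the top.
int64_t WalkWithBreak(const std::vector<Frame> &stack, uint32_t current_idx) {
  for (uint32_t idx = current_idx + 1;; ++idx) {
    const Frame *parent = FrameAt(stack, idx);
    if (!parent) // End of the stack.
      break;
    if (parent->is_inlined)
      continue;
    return idx;
  }
  return -1;
}

// Shape along the lines of the review suggestion: the null check is the loop
// condition and the refetch lives in the increment clause.
int64_t WalkWithCondition(const std::vector<Frame> &stack,
                          uint32_t current_idx) {
  uint32_t idx = current_idx + 1;
  for (const Frame *parent = FrameAt(stack, idx); parent != nullptr;
       parent = FrameAt(stack, ++idx)) {
    if (parent->is_inlined)
      continue;
    return idx;
  }
  return -1;
}
```

One detail worth noting: if the refetch were placed at the end of the loop body instead of in the increment clause, a `continue` would skip it and the loop would never advance, which is part of why the restructured form needs extra care.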

@adrian-prantl
Collaborator

Should the Doxygen comment of GetStackFrameCount warn that this is an expensive operation?
https://lldb.llvm.org/cpp_reference/classlldb__private_1_1Thread.html#afc54feef950a58b625bbb198dc4cf57c

@clayborg
Collaborator

clayborg commented Feb 7, 2024

> Should the Doxygen comment of GetStackFrameCount warn that this is an expensive operation? https://lldb.llvm.org/cpp_reference/classlldb__private_1_1Thread.html#afc54feef950a58b625bbb198dc4cf57c

It might be nice to add a "std::optional<uint32_t> max_frame_count" to this function to allow it to stop when it hits "max_frame_count". Like:

/// Get the number of frames in a thread.
///
/// If \a max_frame_count is valid, return a number that is less than or equal 
/// to max_frame_count, else calculate the true number of frames. Calculating
/// the total number of frames can be expensive.
virtual uint32_t lldb_private::Thread::GetStackFrameCount(std::optional<uint32_t> max_frame_count);
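
A capped count along these lines could look roughly like the sketch below. Everything here is hypothetical: `FakeUnwinder` and its `UnwindOneMore` method are invented for illustration, and the free function only mirrors the proposed signature; none of this is existing lldb API.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical unwinder: each successful UnwindOneMore() call stands for the
// memory reads needed to discover one more frame.
class FakeUnwinder {
public:
  explicit FakeUnwinder(uint32_t depth) : m_depth(depth) {}

  // Returns false once the oldest frame has already been unwound.
  bool UnwindOneMore() {
    if (m_unwound >= m_depth)
      return false;
    ++m_unwound;
    return true;
  }

  // Number of frames unwound so far.
  uint32_t unwound() const { return m_unwound; }

private:
  uint32_t m_depth;
  uint32_t m_unwound = 0;
};

// Count frames, stopping early once max_frame_count is reached. With no cap,
// this walks the whole stack and returns the true depth.
uint32_t GetStackFrameCount(
    FakeUnwinder &unwinder,
    std::optional<uint32_t> max_frame_count = std::nullopt) {
  uint32_t count = 0;
  // Check the cap first so no frame beyond the cap is ever unwound.
  while ((!max_frame_count || count < *max_frame_count) &&
         unwinder.UnwindOneMore())
    ++count;
  return count;
}
```

With a cap of 5 on a very deep stack, this sketch unwinds only five frames; on a stack shallower than the cap it returns the true depth.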

@jimingham
Collaborator Author

> > Should the Doxygen comment of GetStackFrameCount warn that this is an expensive operation? https://lldb.llvm.org/cpp_reference/classlldb__private_1_1Thread.html#afc54feef950a58b625bbb198dc4cf57c
>
> It might be nice to add a "std::optional<uint32_t> max_frame_count" to this function to allow it to stop when it hits "max_frame_count". Like:
>
> /// Get the number of frames in a thread.
> ///
> /// If \a max_frame_count is valid, return a number that is less than or equal
> /// to max_frame_count, else calculate the true number of frames. Calculating
> /// the total number of frames can be expensive.
> virtual uint32_t lldb_private::Thread::GetStackFrameCount(std::optional<uint32_t> max_frame_count);

That seems an okay idea, but I wouldn't really want to use the new API in this patch. I know I'm only looking to get past all the inlined frames, which is pretty cheap, but I have no way of knowing how many there are. So it really wouldn't be a good idea to try to guess a max_frame_count.
I'd rather not add an API to a patch that doesn't actually use the API...

@jimingham
Collaborator Author

> Should the Doxygen comment of GetStackFrameCount warn that this is an expensive operation? https://lldb.llvm.org/cpp_reference/classlldb__private_1_1Thread.html#afc54feef950a58b625bbb198dc4cf57c

I added something to that effect.

@jimingham jimingham merged commit a04c636 into llvm:main Feb 13, 2024
@jimingham jimingham deleted the dont-count-frames branch February 13, 2024 19:06
jimingham added a commit to jimingham/from-apple-llvm-project that referenced this pull request Feb 29, 2024
…vm#80918)

(cherry picked from commit a04c636)