[lldb] Fix RangeDataVector::CombineConsecutiveEntriesWithEqualData #127059

Merged (2 commits) on Feb 20, 2025

Conversation

labath
Collaborator

@labath labath commented Feb 13, 2025

The function was merging entries with equal data even if they weren't adjacent. This caused a problem in the command-disassemble.s test because the two ranges describing the function would be merged and "swallow" the function between them.

This PR copies/adapts the algorithm from RangeVector::CombineConsecutiveEntries (which does not have the same problem) and also adds a call to ComputeUpperBounds, as moving entries around invalidates the binary tree. (The lack of this call wasn't noticed until now either because we were not calling methods which rely on upper bounds (right now, it's only the ill-named FindEntryIndexes method), or because we weren't merging anything.)
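For readers who want to try the merging rule outside of LLDB, here is a minimal standalone sketch of the same idea. It is not the RangeMap.h code: `Entry`, `End`, `SetEnd`, `AdjoinsOrIntersects` and `CombineAdjacentEqual` are hypothetical stand-ins, and ranges are assumed to be half-open `[base, base + size)` and already sorted by base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a RangeDataVector entry: a half-open range
// [base, base + size) tagged with a data value.
struct Entry {
  uint64_t base;
  uint64_t size;
  int data;

  uint64_t End() const { return base + size; }
  void SetEnd(uint64_t end) { size = end - base; }
  // True if the two ranges overlap or touch end-to-start.
  bool AdjoinsOrIntersects(const Entry &rhs) const {
    return base <= rhs.End() && End() >= rhs.base;
  }
};

// Merges neighbouring entries only when they carry equal data AND touch or
// overlap. Expects `entries` to be sorted by base address.
inline void CombineAdjacentEqual(std::vector<Entry> &entries) {
  assert(std::is_sorted(
      entries.begin(), entries.end(),
      [](const Entry &a, const Entry &b) { return a.base < b.base; }));

  auto mergeable = [](const Entry &a, const Entry &b) {
    return a.AdjoinsOrIntersects(b) && a.data == b.data;
  };

  // Find the first mergeable pair; if there is none, leave the vector alone.
  auto first = std::adjacent_find(entries.begin(), entries.end(), mergeable);
  if (first == entries.end())
    return;

  // Copy everything up to and including the first element of that pair,
  // then fold each later entry into the last output entry or append it.
  auto pos = std::next(first);
  std::vector<Entry> merged(entries.begin(), pos);
  for (; pos != entries.end(); ++pos) {
    Entry &back = merged.back();
    if (mergeable(back, *pos))
      back.SetEnd(std::max(back.End(), pos->End()));
    else
      merged.push_back(*pos);
  }
  entries.swap(merged);
}
```

With this sketch, `{0, 10, 7}` followed by `{10, 10, 7}` collapses into a single entry covering `[0, 20)`, while `{0, 10, 7}` followed by `{20, 10, 7}` stays as two entries because of the gap between them. The sketch deliberately leaves out the upper-bound recomputation that the real class needs after merging.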

@labath labath requested a review from JDevlieghere as a code owner February 13, 2025 13:21
@llvmbot llvmbot added the lldb label Feb 13, 2025
@llvmbot
Member

llvmbot commented Feb 13, 2025

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes

The function was merging entries with equal data even if they weren't adjacent. This caused a problem in the command-disassemble.s test because the two ranges describing the function would be merged and "swallow" the function between them.

This PR copies/adapts the algorithm from RangeVector::CombineConsecutiveEntries (which does not have the same problem) and also adds a call to ComputeUpperBounds, as moving entries around invalidates the binary tree. (The lack of this call wasn't noticed until now either because we were not calling methods which rely on upper bounds (right now, it's only the ill-named FindEntryIndexes method), or because we weren't merging anything.)


Full diff: https://github.com/llvm/llvm-project/pull/127059.diff

3 Files Affected:

  • (modified) lldb/include/lldb/Utility/RangeMap.h (+19-28)
  • (modified) lldb/test/Shell/Commands/command-disassemble.s (+1-2)
  • (modified) lldb/unittests/Utility/RangeMapTest.cpp (+21)
diff --git a/lldb/include/lldb/Utility/RangeMap.h b/lldb/include/lldb/Utility/RangeMap.h
index 433466eebced8..2bebe74cc4cfe 100644
--- a/lldb/include/lldb/Utility/RangeMap.h
+++ b/lldb/include/lldb/Utility/RangeMap.h
@@ -493,36 +493,27 @@ class RangeDataVector {
 #ifdef ASSERT_RANGEMAP_ARE_SORTED
     assert(IsSorted());
 #endif
-    typename Collection::iterator pos;
-    typename Collection::iterator end;
-    typename Collection::iterator prev;
-    bool can_combine = false;
-    // First we determine if we can combine any of the Entry objects so we
-    // don't end up allocating and making a new collection for no reason
-    for (pos = m_entries.begin(), end = m_entries.end(), prev = end; pos != end;
-         prev = pos++) {
-      if (prev != end && prev->data == pos->data) {
-        can_combine = true;
-        break;
-      }
-    }
+    auto first_intersect = std::adjacent_find(
+        m_entries.begin(), m_entries.end(), [](const Entry &a, const Entry &b) {
+          return a.DoesAdjoinOrIntersect(b) && a.data == b.data;
+        });
+    if (first_intersect == m_entries.end())
+      return;
 
-    // We can combine at least one entry, then we make a new collection and
-    // populate it accordingly, and then swap it into place.
-    if (can_combine) {
-      Collection minimal_ranges;
-      for (pos = m_entries.begin(), end = m_entries.end(), prev = end;
-           pos != end; prev = pos++) {
-        if (prev != end && prev->data == pos->data)
-          minimal_ranges.back().SetRangeEnd(pos->GetRangeEnd());
-        else
-          minimal_ranges.push_back(*pos);
-      }
-      // Use the swap technique in case our new vector is much smaller. We must
-      // swap when using the STL because std::vector objects never release or
-      // reduce the memory once it has been allocated/reserved.
-      m_entries.swap(minimal_ranges);
+    // We can combine at least one entry. Make a new collection and populate it
+    // accordingly, and then swap it into place.
+    auto pos = std::next(first_intersect);
+    Collection minimal_ranges(m_entries.begin(), pos);
+    for (; pos != m_entries.end(); ++pos) {
+      Entry &back = minimal_ranges.back();
+      if (back.DoesAdjoinOrIntersect(*pos) && back.data == pos->data)
+        back.SetRangeEnd(std::max(back.GetRangeEnd(), pos->GetRangeEnd()));
+      else
+        minimal_ranges.push_back(*pos);
     }
+    m_entries.swap(minimal_ranges);
+    if (!m_entries.empty())
+      ComputeUpperBounds(0, m_entries.size());
   }
 
   void Clear() { m_entries.clear(); }
diff --git a/lldb/test/Shell/Commands/command-disassemble.s b/lldb/test/Shell/Commands/command-disassemble.s
index eb84a9ce39d4a..14f416d221231 100644
--- a/lldb/test/Shell/Commands/command-disassemble.s
+++ b/lldb/test/Shell/Commands/command-disassemble.s
@@ -94,8 +94,7 @@
 # CHECK-EMPTY:
 # CHECK-NEXT: command-disassemble.s.tmp`n2::case3:
 # CHECK-NEXT: command-disassemble.s.tmp[0x9046] <+0>: jmp 0x6046 ; <-12288>
-## FIXME: This should resolve to `middle_of_case3`
-# CHECK-NEXT: command-disassemble.s.tmp[0x904b] <+5>: jmp 0x7046 ; n2::case3 - 8192
+# CHECK-NEXT: command-disassemble.s.tmp[0x904b] <+5>: jmp 0x7046 ; middle_of_case3
 # CHECK-NEXT: command-disassemble.s.tmp[0x9050] <+10>: int    $0x2a
 # CHECK-EMPTY:
 # CHECK-NEXT: command-disassemble.s.tmp`n1::case3:
diff --git a/lldb/unittests/Utility/RangeMapTest.cpp b/lldb/unittests/Utility/RangeMapTest.cpp
index 981fa2a7d1c34..337e74a0e3d8c 100644
--- a/lldb/unittests/Utility/RangeMapTest.cpp
+++ b/lldb/unittests/Utility/RangeMapTest.cpp
@@ -238,3 +238,24 @@ TEST(RangeDataVector, FindEntryIndexesThatContain_Overlap) {
   EXPECT_THAT(FindEntryIndexes(39, Map), testing::ElementsAre(10));
   EXPECT_THAT(FindEntryIndexes(40, Map), testing::ElementsAre());
 }
+
+TEST(RangeDataVector, CombineConsecutiveEntriesWithEqualData) {
+  RangeDataVectorT Map;
+  Map.Append(EntryT(0, 10, 10));
+  Map.Append(EntryT(10, 10, 10));
+  Map.Sort();
+  Map.CombineConsecutiveEntriesWithEqualData();
+  EXPECT_THAT(FindEntryIndexes(5, Map), testing::ElementsAre(10));
+  EXPECT_THAT(FindEntryIndexes(15, Map), testing::ElementsAre(10));
+  EXPECT_THAT(FindEntryIndexes(25, Map), testing::ElementsAre());
+
+  Map.Clear();
+  Map.Append(EntryT(0, 10, 10));
+  Map.Append(EntryT(20, 10, 10));
+  Map.Sort();
+  Map.CombineConsecutiveEntriesWithEqualData();
+  EXPECT_THAT(FindEntryIndexes(5, Map), testing::ElementsAre(10));
+  EXPECT_THAT(FindEntryIndexes(15, Map), testing::ElementsAre());
+  EXPECT_THAT(FindEntryIndexes(25, Map), testing::ElementsAre(10));
+  EXPECT_THAT(FindEntryIndexes(35, Map), testing::ElementsAre());
+}

}
auto first_intersect = std::adjacent_find(
m_entries.begin(), m_entries.end(), [](const Entry &a, const Entry &b) {
return a.DoesAdjoinOrIntersect(b) && a.data == b.data;
Contributor

Is the equality check cheaper than the intersect-or-adjacency check? If so, we should short-circuit on equality before checking for an intersection.

Collaborator Author

Both of them are integer comparisons (in practice -- technically, this is a template so it could be whatever), so it really comes down to "which one is more likely to be false". I'm not sure about that, but I doubt this code is hot enough for it to matter. If we wanted to optimize this we could change the DoesAdjoinOrIntersect call to b.GetRangeBase() <= a.GetRangeEnd() (since the other check inside that function is guaranteed to be true due to sorting).
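To illustrate the simplification mentioned above, here is a small sketch. It assumes the full check is a symmetric pair of comparisons on half-open ranges; the `Range` struct and both function names are hypothetical and only mirror the shape of the real helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open range [base, end) with base <= end.
struct Range {
  uint64_t base;
  uint64_t end;
};

// Symmetric check: works for any ordering of a and b.
inline bool AdjoinsOrIntersects(const Range &a, const Range &b) {
  return a.base <= b.end && a.end >= b.base;
}

// One-comparison variant, valid only when a sorts before b. Because
// a.base <= b.base <= b.end, the first half of the symmetric check is
// always true and only the second half remains.
inline bool AdjoinsOrIntersectsSorted(const Range &a, const Range &b) {
  assert(a.base <= b.base && "caller must pass entries in sorted order");
  return b.base <= a.end;
}
```

For neighbours in a sorted vector, both functions should agree; the second one just drops the comparison that sorting already guarantees.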

auto first_intersect = std::adjacent_find(
m_entries.begin(), m_entries.end(), [](const Entry &a, const Entry &b) {
return a.DoesAdjoinOrIntersect(b) && a.data == b.data;
});
Contributor

nit: an empty line before the if is in my opinion nicer to read

Collaborator Author

works for me.

}
m_entries.swap(minimal_ranges);
if (!m_entries.empty())
Contributor

Is it possible for this to be empty? We would have at least one entry to pass the intersection check, and I don't see how we would empty the collection via merging ranges

Collaborator Author

Good catch. I am usually all about deleting these.

I was actually thinking about following this up with a patch to move the check inside the function, as it's currently repeated at all call sites (including the recursive ones).
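A rough sketch of what moving the check inside could look like. This is not the RangeMap.h implementation: `AugmentedEntry` and this free-function `ComputeUpperBounds` are hypothetical, and the sketch assumes the sorted slice `[lo, hi)` is treated as an implicit binary tree rooted at its midpoint, with each node caching the largest range end found in its subtree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entry: half-open range [base, base + size) plus the cached
// maximum range end of the subtree rooted at this entry.
struct AugmentedEntry {
  uint64_t base;
  uint64_t size;
  uint64_t upper_bound;
};

// Recomputes upper_bound for every entry in [lo, hi). The empty-slice check
// lives here, so neither external callers nor the recursive calls need to
// guard against lo == hi themselves.
inline uint64_t ComputeUpperBounds(std::vector<AugmentedEntry> &entries,
                                   size_t lo, size_t hi) {
  assert(hi <= entries.size());
  if (lo >= hi)
    return 0; // An empty subtree contributes nothing to the maximum.

  size_t mid = lo + (hi - lo) / 2;
  AugmentedEntry &entry = entries[mid];
  entry.upper_bound = std::max({entry.base + entry.size,
                                ComputeUpperBounds(entries, lo, mid),
                                ComputeUpperBounds(entries, mid + 1, hi)});
  return entry.upper_bound;
}
```

A caller would then write `ComputeUpperBounds(entries, 0, entries.size())` unconditionally, including on an empty vector.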

@labath labath merged commit e264317 into llvm:main Feb 20, 2025
7 checks passed
@labath labath deleted the disasm branch February 20, 2025 09:26
labath added a commit that referenced this pull request Feb 20, 2025
I suspect it was fixed by #127059. aarch64 is the only Windows bot we have now, so I can't be certain it's fixed everywhere, but I also have no reason to believe otherwise.

Fixes #43774.