[lldb] (Prepare to) speed up dwarf indexing #118657
Conversation
Indexing a single DWARF unit is a fairly small task, which means the overhead of enqueueing a task for each unit is not negligible (mainly because it introduces a lot of synchronization points for queue management, memory allocation, etc.). This is particularly true if the binary was built with type units, as these are usually very small.

This essentially brings us back to the state before https://reviews.llvm.org/D78337, but the new implementation is built on the llvm ThreadPool, and I've added a small improvement -- we now construct one "index set" per thread instead of one per unit, which should lower the memory usage (fewer small allocations) and make the subsequent merge step faster.

On its own this patch doesn't actually change the performance characteristics because we still have one choke point -- progress reporting. I'm leaving that for a separate patch, but I've found that simply removing the progress reporting gives us about a 30-60% speed boost.
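A minimal, self-contained sketch of the scheme described above, for readers who want the pattern in isolation: a fixed number of workers pull unit indices from a shared atomic counter, and each worker writes into its own per-thread result set. The `Unit` and `IndexSet` types here are hypothetical stand-ins, and plain `std::thread` is used for self-containedness; the actual patch operates on `DWARFUnit`, LLDB's `IndexSet`, and the shared llvm ThreadPool, as shown in the diff below.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

struct Unit {};     // stand-in for DWARFUnit
struct IndexSet {}; // stand-in for the per-thread index being built

void IndexUnit(Unit &unit, IndexSet &set) {
  (void)unit;
  (void)set; // real code would populate `set` from the unit's DIEs
}

void IndexAllUnits(std::vector<Unit> &units) {
  const size_t num_threads =
      std::max<size_t>(1, std::thread::hardware_concurrency());

  // One result set per worker, not per unit: many small (e.g. type) units no
  // longer cause a flood of tiny allocations, and the final merge has fewer
  // sets to combine.
  std::vector<IndexSet> sets(num_threads);
  std::atomic<size_t> next_idx{0};

  std::vector<std::thread> workers;
  for (size_t worker_id = 0; worker_id < num_threads; ++worker_id) {
    workers.emplace_back([&units, &sets, &next_idx, worker_id] {
      // Each worker claims the next unit with a single relaxed fetch_add,
      // instead of paying for one queued task per unit.
      size_t idx;
      while ((idx = next_idx.fetch_add(1, std::memory_order_relaxed)) <
             units.size())
        IndexUnit(units[idx], sets[worker_id]);
    });
  }
  for (std::thread &t : workers)
    t.join();

  // Merging the per-thread sets into the final index would follow here.
}
```

The per-unit cost of scheduling drops to one atomic increment, and the work in the later merge step scales with the thread count rather than the unit count.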
@llvm/pr-subscribers-lldb Author: Pavel Labath (labath). Changes: see the PR description above. Full diff: https://github.com/llvm/llvm-project/pull/118657.diff 1 File Affected:
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
index 0be19ab29ef082..03a031626ab154 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
@@ -23,6 +23,7 @@
#include "lldb/Utility/Timer.h"
#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/ThreadPool.h"
+#include <atomic>
#include <optional>
using namespace lldb_private;
@@ -81,44 +82,54 @@ void ManualDWARFIndex::Index() {
Progress progress("Manually indexing DWARF", module_desc.GetData(),
total_progress);
- std::vector<IndexSet> sets(units_to_index.size());
-
- // Keep memory down by clearing DIEs for any units if indexing
- // caused us to load the unit's DIEs.
- std::vector<std::optional<DWARFUnit::ScopedExtractDIEs>> clear_cu_dies(
- units_to_index.size());
- auto parser_fn = [&](size_t cu_idx) {
- IndexUnit(*units_to_index[cu_idx], dwp_dwarf, sets[cu_idx]);
- progress.Increment();
- };
-
- auto extract_fn = [&](size_t cu_idx) {
- clear_cu_dies[cu_idx] = units_to_index[cu_idx]->ExtractDIEsScoped();
- progress.Increment();
- };
// Share one thread pool across operations to avoid the overhead of
// recreating the threads.
llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+ const size_t num_threads = Debugger::GetThreadPool().getMaxConcurrency();
+
+ // Run a function for each compile unit in parallel using as many threads as
+ // are available. This is significantly faster than submitting a new task for
+ // each unit.
+ auto for_each_unit = [&](auto &&fn) {
+ std::atomic<size_t> next_cu_idx = 0;
+ auto wrapper = [&fn, &next_cu_idx, &units_to_index,
+ &progress](size_t worker_id) {
+ size_t cu_idx;
+ while ((cu_idx = next_cu_idx.fetch_add(1, std::memory_order_relaxed)) <
+ units_to_index.size()) {
+ fn(worker_id, cu_idx, units_to_index[cu_idx]);
+ progress.Increment();
+ }
+ };
- // Create a task runner that extracts dies for each DWARF unit in a
- // separate thread.
- // First figure out which units didn't have their DIEs already
- // parsed and remember this. If no DIEs were parsed prior to this index
- // function call, we are going to want to clear the CU dies after we are
- // done indexing to make sure we don't pull in all DWARF dies, but we need
- // to wait until all units have been indexed in case a DIE in one
- // unit refers to another and the indexes accesses those DIEs.
- for (size_t i = 0; i < units_to_index.size(); ++i)
- task_group.async(extract_fn, i);
- task_group.wait();
+ for (size_t i = 0; i < num_threads; ++i)
+ task_group.async(wrapper, i);
- // Now create a task runner that can index each DWARF unit in a
- // separate thread so we can index quickly.
- for (size_t i = 0; i < units_to_index.size(); ++i)
- task_group.async(parser_fn, i);
- task_group.wait();
+ task_group.wait();
+ };
+ // Extract DIEs for all DWARF units in parallel. Figure out which units
+ // didn't have their DIEs already parsed and remember this. If no DIEs were
+ // parsed prior to this index function call, we are going to want to clear the
+ // CU dies after we are done indexing to make sure we don't pull in all DWARF
+ // dies, but we need to wait until all units have been indexed in case a DIE
+ // in one unit refers to another and the index accesses those DIEs.
+ std::vector<std::optional<DWARFUnit::ScopedExtractDIEs>> clear_cu_dies(
+ units_to_index.size());
+ for_each_unit([&clear_cu_dies](size_t, size_t idx, DWARFUnit *unit) {
+ clear_cu_dies[idx] = unit->ExtractDIEsScoped();
+ });
+
+ // Now index all DWARF units in parallel.
+ std::vector<IndexSet> sets(num_threads);
+ for_each_unit(
+ [this, dwp_dwarf, &sets](size_t worker_id, size_t, DWARFUnit *unit) {
+ IndexUnit(*unit, dwp_dwarf, sets[worker_id]);
+ });
+
+ // Merge partial indexes into a single index. Process each index in a set in
+ // parallel.
auto finalize_fn = [this, &sets, &progress](NameToDIE(IndexSet::*index)) {
NameToDIE &result = m_set.*index;
for (auto &set : sets)
✅ With the latest revision this PR passed the C/C++ code formatter.
Makes sense, LGTM.
If the amount of work here is so small that the granularity was too fine for creating separate tasks, I'd argue that it's too fine for progress reporting as well. I think that's what you're saying too. There are probably multiple ways of dealing with that, so I'm looking forward to the PR that addresses it.
It is, and there are. I have a prototype which deals with that by updating progress after every X bytes of DWARF have been indexed. The thing I like about that is that it should give a relatively constant rate of updates regardless of whether you're using type units (many small units) or not (fewer large units). The part I don't like is that it gives a wildly different rate for split vs. non-split DWARF (in the former case, the main DWARF unit is just a handful of bytes). I'm still thinking about alternatives...
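A rough sketch of what the byte-based throttling mentioned above could look like; this is not part of the PR, and `ProgressThrottle`, `kBytesPerUpdate`, and the report callback are hypothetical names. Each worker keeps a private byte count and only touches the shared progress object after crossing a size threshold, so the update rate tracks the amount of DWARF indexed rather than the number of units.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Assumed threshold; in practice it would be tuned so updates stay frequent
// enough to be useful without becoming a synchronization hot spot.
constexpr uint64_t kBytesPerUpdate = 16 * 1024 * 1024;

class ProgressThrottle {
public:
  // `report` stands in for something like incrementing the shared Progress.
  explicit ProgressThrottle(std::function<void()> report)
      : report_(std::move(report)) {}

  // Called by a worker after indexing one unit of `unit_size` bytes.
  void Add(uint64_t unit_size) {
    pending_ += unit_size;
    while (pending_ >= kBytesPerUpdate) {
      pending_ -= kBytesPerUpdate;
      report_(); // the only point that needs cross-thread synchronization
    }
  }

  ~ProgressThrottle() {
    if (pending_ > 0)
      report_(); // flush the remainder when the worker finishes
  }

private:
  std::function<void()> report_;
  uint64_t pending_ = 0; // bytes indexed since the last report
};
```

One such object per worker keeps progress updates off the per-unit fast path, at the cost of the reported count meaning "bytes of DWARF" rather than "units", which is where the split vs. non-split DWARF discrepancy mentioned above comes from.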