[lldb] Reduce the frequency of DWARF index progress reporting #118953

Closed

Conversation

labath
Collaborator

@labath labath commented Dec 6, 2024

Indexing a single DWARF unit is a relatively fast operation, particularly if it's a type unit, which can be very small. Reporting progress takes a mutex (and allocates memory, etc.), which creates a lot of contention and slows down indexing noticeably.

This patch makes us report progress only once per 10 milliseconds (on average), which speeds up indexing by up to 55%. It achieves this by checking the elapsed time after indexing each unit. This creates the possibility that a particularly large unit could cause us to stop reporting progress for a while (even for units that have already been indexed), but I don't think this is likely to happen, because:

  • Even the largest units don't take that long to index. The largest unit in lldb (4MB of .debug_info) was indexed in "only" 200ms.
  • The time is being checked and reported by all worker threads, which means that in order to stall, we'd have to be very unfortunate and pick up an extremely large compile unit on all indexing threads simultaneously.

Even if that does happen, the only negative consequence is some jitteriness in a progress bar, which is why I prefer this over alternative implementations which, e.g., involve reporting progress from a dedicated thread.
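
For anyone who wants to experiment with the scheme outside of lldb, here is a minimal self-contained sketch of the idea described above. Progress and index_unit are illustrative stand-ins rather than the real lldb types; the actual change is in the diff below.

```cpp
// Minimal, self-contained sketch of the throttling scheme described above.
// Progress and index_unit are illustrative stand-ins, not the lldb types the
// patch actually touches (see the diff below for the real change).
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

struct Progress {
  std::atomic<size_t> total{0};
  void Increment(size_t n) { total.fetch_add(n); } // stands in for the mutex-taking call
};

void index_unit(size_t /*unit_idx*/) { /* pretend to index one DWARF unit */ }

int main() {
  constexpr size_t num_units = 100000;
  const size_t num_threads = std::max(1u, std::thread::hardware_concurrency());
  Progress progress;
  std::atomic<size_t> next_unit{0};
  std::atomic<size_t> units_indexed{0};

  auto worker = [&](size_t worker_id) {
    constexpr auto interval = std::chrono::milliseconds(10);
    // Stagger each thread's first report so that, collectively, the threads
    // produce roughly one report every 10ms.
    auto next_report =
        std::chrono::steady_clock::now() + interval * (1 + worker_id);
    size_t idx;
    while ((idx = next_unit.fetch_add(1, std::memory_order_relaxed)) <
           num_units) {
      index_unit(idx);
      units_indexed.fetch_add(1, std::memory_order_acq_rel);
      if (auto now = std::chrono::steady_clock::now(); now >= next_report) {
        // Report everything batched since the last report, then back off for
        // num_threads * interval so the threads keep taking turns.
        progress.Increment(units_indexed.exchange(0, std::memory_order_acq_rel));
        next_report = now + num_threads * interval;
      }
    }
  };

  std::vector<std::thread> pool;
  for (size_t i = 0; i < num_threads; ++i)
    pool.emplace_back(worker, i);
  for (auto &t : pool)
    t.join();
  // Flush whatever the workers batched but never reported.
  progress.Increment(units_indexed.load(std::memory_order_acquire));
  std::printf("reported %zu of %zu units\n", progress.total.load(), num_units);
}
```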

@labath labath requested a review from JDevlieghere as a code owner December 6, 2024 10:49
@llvmbot llvmbot added the lldb label Dec 6, 2024
@llvmbot
Member

llvmbot commented Dec 6, 2024

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/118953.diff

1 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp (+23-8)
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
index 5b325e30bef430..a3e595d0194eb9 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
@@ -24,6 +24,7 @@
 #include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/ThreadPool.h"
 #include <atomic>
+#include <chrono>
 #include <optional>
 
 using namespace lldb_private;
@@ -91,14 +92,27 @@ void ManualDWARFIndex::Index() {
   // are available. This is significantly faster than submiting a new task for
   // each unit.
   auto for_each_unit = [&](auto &&fn) {
-    std::atomic<size_t> next_cu_idx = 0;
-    auto wrapper = [&fn, &next_cu_idx, &units_to_index,
-                    &progress](size_t worker_id) {
-      size_t cu_idx;
-      while ((cu_idx = next_cu_idx.fetch_add(1, std::memory_order_relaxed)) <
-             units_to_index.size()) {
-        fn(worker_id, cu_idx, units_to_index[cu_idx]);
-        progress.Increment();
+    std::atomic<size_t> next_unit_idx = 0;
+    std::atomic<size_t> units_indexed = 0;
+    auto wrapper = [&fn, &next_unit_idx, &units_indexed, &units_to_index,
+                    &progress, num_threads](size_t worker_id) {
+      constexpr auto progress_interval = std::chrono::milliseconds(10);
+
+      // Stagger the reports for different threads so we get a steady stream of
+      // one report per ~10ms.
+      auto next_report = std::chrono::steady_clock::now() +
+                         progress_interval * (1 + worker_id);
+      size_t unit_idx;
+      while ((unit_idx = next_unit_idx.fetch_add(
+                  1, std::memory_order_relaxed)) < units_to_index.size()) {
+        fn(worker_id, unit_idx, units_to_index[unit_idx]);
+
+        units_indexed.fetch_add(1, std::memory_order_acq_rel);
+        if (auto now = std::chrono::steady_clock::now(); now >= next_report) {
+          progress.Increment(
+              units_indexed.exchange(0, std::memory_order_acq_rel));
+          next_report = now + num_threads * progress_interval;
+        }
       }
     };
 
@@ -106,6 +120,7 @@ void ManualDWARFIndex::Index() {
       task_group.async(wrapper, i);
 
     task_group.wait();
+    progress.Increment(units_indexed.load(std::memory_order_acquire));
   };
 
   // Extract dies for all DWARFs unit in parallel.  Figure out which units

@SingleAccretion
Contributor

SingleAccretion commented Dec 6, 2024

I would like to say I would love for this improvement to land :).

I have observed that on Windows with a fair number (4K+) of small compile units the progress reporting completely dominates indexing time due to contention (presumably not just in the lock but also the IO layers), to the point that disabling it resulted in very large speedups (an operation which previously took 5s+ now was almost instant).

Member

@JDevlieghere JDevlieghere left a comment


🚢

Collaborator

@clayborg clayborg left a comment


Can we build this feature into the Progress class by calling an accessor? Something like:

Progress progress("Manually indexing DWARF", module_desc.GetData(), total_progress);
progress.SetMinimumNotificationTime(std::chrono::milliseconds(10));

Then any busy progress dialogs can take advantage of this timing feature?

@clayborg
Copy link
Collaborator

clayborg commented Dec 6, 2024

And the Progress class can check if an optional instance variable that contains the minimum time has a value, and avoid taking the mutex, to keep things faster even when building this into the Progress class?
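
For concreteness, a rough sketch of what such an accessor might look like. SetMinimumNotificationTime is the API being proposed here, not something that exists in lldb today, and the member names are invented for illustration; the idea is to batch increments in an atomic and only take the mutex once the minimum interval has elapsed.

```cpp
// Hypothetical sketch only: SetMinimumNotificationTime() is the accessor being
// proposed above, not an existing lldb Progress API, and the members here are
// invented for illustration.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <optional>

class Progress {
public:
  void SetMinimumNotificationTime(std::chrono::milliseconds min_interval) {
    m_min_interval = min_interval;
  }

  void Increment(uint64_t amount = 1) {
    if (m_min_interval) {
      using clock = std::chrono::steady_clock;
      m_pending.fetch_add(amount, std::memory_order_relaxed);
      int64_t now = clock::now().time_since_epoch().count();
      int64_t last = m_last_report.load(std::memory_order_relaxed);
      int64_t interval =
          std::chrono::duration_cast<clock::duration>(*m_min_interval).count();
      // Too soon since the last report, or another thread just won the race to
      // report: skip the mutex entirely. (Increments batched in m_pending stay
      // unreported until the next Increment() call -- the caveat raised below.)
      if (now - last < interval ||
          !m_last_report.compare_exchange_strong(last, now,
                                                 std::memory_order_relaxed))
        return;
      amount = m_pending.exchange(0, std::memory_order_relaxed);
    }
    std::lock_guard<std::mutex> guard(m_mutex); // the contended part
    m_completed += amount;
    // ... broadcast a progress event to any listeners ...
  }

private:
  std::optional<std::chrono::milliseconds> m_min_interval;
  std::atomic<int64_t> m_last_report{0};
  std::atomic<uint64_t> m_pending{0};
  std::mutex m_mutex;
  uint64_t m_completed = 0;
};
```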

@labath
Copy link
Collaborator Author

labath commented Dec 9, 2024

Can we build this feature into the Progress class by calling an accessor? Something like:

Progress progress("Manually indexing DWARF", module_desc.GetData(), total_progress);
progress.SetMinimumNotificationTime(std::chrono::milliseconds(10));

Then any busy progress dialogs can take advantage of this timing feature?

That's possible, but there are a couple of caveats:

  • in the Progress class implementation, we'd only have a single "time of last progress report" member, which would mean that (in order for it to be lock-free) it would have to be atomic. As there's no atomic<duration>, this would have to be a raw "time since epoch" timestamp. It's fine, but slightly uglier, and it's also one more atomic. In this version I was able to get away with it by essentially making the timestamp thread-local, which is sort of nice, but also less correct than the hypothetical version with a single variable, so maybe that's okay.
  • this implementation only works if the progress updates come at a fairly fast and steady rate. This is the case here, but may not be true for all usages. With this implementation, if you have something that tries to send e.g. 9 (out of 10) progress updates very quickly (within the 10ms interval), but then spends a lot of time (many seconds) on the last part, the progress bar will stay stuck at 1/10 because we've ignored updates 2–9. Attempting to schedule some sort of callback to send the 9/10 event after a timeout would probably involve spinning up another thread, and all of this management would make the code significantly more complicated (and slower).

This is the reason I did not want to make this a general feature, but it's not a strong opinion, so if you're fine with these trade-offs, I can move the code into the Progress class. I definitely don't want to get into the business of managing another thread, though.

@clayborg
Collaborator

What is the resolution if we switch to "time since epoch" in an atomic value? Can we use a cast to store a double in a uint64_t in an atomic?

It would be nice to not require everyone to roll their own timeout if possible by just calling an accessor. I am ok with the caveats if everyone else is.

@labath
Collaborator Author

labath commented Dec 10, 2024

What is the resolution if we switch to "time since epoch" in an atomic value? Can we use some casting to cast a double back into a uint64_t for storage in an atomic?

That's not a problem. I was referring to the time_point::time_since_epoch() (which returns arbitrary precision), not the unix "seconds since epoch" concept. I've created an alternative implementation in #119377
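
To make that concrete, a small illustration (not code from either patch): steady_clock's time_since_epoch() is an integral tick count (nanoseconds on common implementations), so it can be stored in a std::atomic<int64_t> directly, with no double involved.

```cpp
// Illustration only (not code from either patch): steady_clock's
// time_since_epoch() is an integral tick count, so it fits in an atomic
// integer without any floating-point round-tripping.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>

int main() {
  using clock = std::chrono::steady_clock;
  std::atomic<int64_t> last_report_ticks{
      clock::now().time_since_epoch().count()};

  // Later, on any worker thread: rebuild a duration from the stored ticks and
  // compare it against the 10ms reporting interval.
  int64_t now_ticks = clock::now().time_since_epoch().count();
  auto elapsed = clock::duration(
      now_ticks - last_report_ticks.load(std::memory_order_relaxed));
  if (elapsed >= std::chrono::milliseconds(10))
    std::puts("time for another progress report");
  else
    std::puts("too soon, skip this report");
}
```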

@labath
Collaborator Author

labath commented Dec 10, 2024

I would like to say I would love for this improvement to land :).

I have observed that on Windows with a fair number (4K+) of small compile units the progress reporting completely dominates indexing time due to contention (presumably not just in the lock but also the IO layers), to the point that disabling it resulted in very large speedups (an operation which previously took 5s+ now was almost instant).

Thanks. I think that in your case we're limited by the speed of console updates rather than the progress mutex. We've gotten similar reports for high ("high") latency ssh connections and unusual console setups, and that's the angle I started working on this from, before I realized it was bottlenecking the parallelism. If you have some time, I'd be grateful if you could test this patch out. The thing I want to see is whether it completely resolves your issue or whether you still see a difference between disabled and enabled progress reports. The reason I'm asking is that I have a prototype which should specifically help with the slow console issue, and I'm wondering whether that is still necessary, or whether this patch is sufficient.

@labath
Collaborator Author

labath commented Dec 10, 2024

this patch

or the other patch (#119377). Either is fine for the thing I want to see.

@SingleAccretion
Contributor

I think that in your case we're limited by the speed of console updates rather than the progress mutex.

Agreed.

If you have some time, I'd be grateful if you could test this patch out. The thing I want to see is whether it completely resolves your issue or whether you still see a difference between disabled and enabled progress reports.

I can confirm it does!

Without the patch, I see a ~25x slowdown when enabling progress reporting ("LLDB time" goes from under 1 second to ~20 seconds); with this patch, the difference is not measurable.

@labath
Collaborator Author

labath commented Dec 13, 2024

That's great. Thanks for checking this out.

Closing this patch in favor of #119377.

@labath labath closed this Dec 13, 2024
@labath labath deleted the fast branch December 17, 2024 10:14