Reduce llvm-gsymutil memory usage #91023
@llvm/pr-subscribers-debuginfo

Author: Kevin Frei (kevinfrei)

Changes

llvm-gsymutil eats a lot of RAM. On some large binaries, it causes OOMs on smaller hardware, consuming well over 64GB of RAM. This change frees line tables once we're done with them, and frees each DWARFUnit's DIEs when we finish processing that DU, though they may get reconstituted if other DUs reference them during processing. Once the conversion is complete, all DIEs are freed. The reduction in peak memory usage from these changes was between 7% and 12% in my tests. The double-checked locking around the creation & freeing of the data structures was tested on a 166-core system, validating that it trivially malfunctioned without the locks (or with stupid reordering of the locks) and worked reliably with them.

Full diff: https://github.com/llvm/llvm-project/pull/91023.diff

3 Files Affected:
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
index 80c27aea893123..9614aab8bb9b50 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
@@ -22,6 +22,7 @@
#include "llvm/DebugInfo/DWARF/DWARFLocationExpression.h"
#include "llvm/DebugInfo/DWARF/DWARFUnitIndex.h"
#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/RWMutex.h"
#include <cassert>
#include <cstddef>
#include <cstdint>
@@ -257,6 +258,9 @@ class DWARFUnit {
std::shared_ptr<DWARFUnit> DWO;
+ mutable llvm::sys::RWMutex m_cu_die_array_mutex;
+ mutable llvm::sys::RWMutex m_all_die_array_mutex;
+
protected:
friend dwarf_linker::parallel::CompileUnit;
@@ -566,6 +570,9 @@ class DWARFUnit {
Error tryExtractDIEsIfNeeded(bool CUDieOnly);
+ /// clearDIEs - Clear parsed DIEs to keep memory usage low.
+ void clearDIEs(bool KeepCUDie);
+
private:
/// Size in bytes of the .debug_info data associated with this compile unit.
size_t getDebugInfoSize() const {
@@ -581,9 +588,6 @@ class DWARFUnit {
void extractDIEsToVector(bool AppendCUDie, bool AppendNonCUDIEs,
std::vector<DWARFDebugInfoEntry> &DIEs) const;
- /// clearDIEs - Clear parsed DIEs to keep memory usage low.
- void clearDIEs(bool KeepCUDie);
-
/// parseDWO - Parses .dwo file for current compile unit. Returns true if
/// it was actually constructed.
/// The \p AlternativeLocation specifies an alternative location to get
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp b/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
index bdd04b00f557bd..cc79d9ec7130c9 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
@@ -496,13 +496,29 @@ void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {
}
Error DWARFUnit::tryExtractDIEsIfNeeded(bool CUDieOnly) {
- if ((CUDieOnly && !DieArray.empty()) ||
- DieArray.size() > 1)
- return Error::success(); // Already parsed.
-
- bool HasCUDie = !DieArray.empty();
- extractDIEsToVector(!HasCUDie, !CUDieOnly, DieArray);
-
+ {
+ llvm::sys::ScopedReader Lock(m_cu_die_array_mutex);
+ if ((CUDieOnly && !DieArray.empty()) || DieArray.size() > 1)
+ return Error::success(); // Already parsed.
+ }
+ bool HasCUDie = false;
+ {
+ llvm::sys::ScopedWriter Lock(m_cu_die_array_mutex);
+ if ((CUDieOnly && !DieArray.empty()) || DieArray.size() > 1)
+ return Error::success(); // Already parsed.
+ HasCUDie = !DieArray.empty();
+ extractDIEsToVector(!HasCUDie, !CUDieOnly, DieArray);
+ }
+ {
+ llvm::sys::ScopedReader Lock(m_all_die_array_mutex);
+ if (DieArray.empty())
+ return Error::success();
+
+ // If CU DIE was just parsed, copy several attribute values from it.
+ if (HasCUDie)
+ return Error::success();
+ }
+ llvm::sys::ScopedWriter Lock(m_all_die_array_mutex);
if (DieArray.empty())
return Error::success();
@@ -658,6 +674,8 @@ void DWARFUnit::clearDIEs(bool KeepCUDie) {
// It depends on the implementation whether the request is fulfilled.
// Create a new vector with a small capacity and assign it to the DieArray to
// have previous contents freed.
+ llvm::sys::ScopedWriter CULock(m_cu_die_array_mutex);
+ llvm::sys::ScopedWriter AllLock(m_all_die_array_mutex);
DieArray = (KeepCUDie && !DieArray.empty())
? std::vector<DWARFDebugInfoEntry>({DieArray[0]})
: std::vector<DWARFDebugInfoEntry>();
diff --git a/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp b/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
index 601686fdd3dd51..4a1ed91a7244f2 100644
--- a/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
+++ b/llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
@@ -587,6 +587,11 @@ Error DwarfTransformer::convert(uint32_t NumThreads, OutputAggregator &Out) {
DWARFDie Die = getDie(*CU);
CUInfo CUI(DICtx, dyn_cast<DWARFCompileUnit>(CU.get()));
handleDie(Out, CUI, Die);
+ // Release the line table, once we're done.
+ DICtx.clearLineTableForUnit(CU.get());
+ // Free any DIEs that were allocated by the DWARF parser.
+ // If/when they're needed by other CU's, they'll be recreated.
+ CU->clearDIEs(false);
}
} else {
// LLVM Dwarf parser is not thread-safe and we need to parse all DWARF up
@@ -612,11 +617,16 @@ Error DwarfTransformer::convert(uint32_t NumThreads, OutputAggregator &Out) {
DWARFDie Die = getDie(*CU);
if (Die) {
CUInfo CUI(DICtx, dyn_cast<DWARFCompileUnit>(CU.get()));
- pool.async([this, CUI, &LogMutex, &Out, Die]() mutable {
+ pool.async([this, CUI, &CU, &LogMutex, &Out, Die]() mutable {
std::string storage;
raw_string_ostream StrStream(storage);
OutputAggregator ThreadOut(Out.GetOS() ? &StrStream : nullptr);
handleDie(ThreadOut, CUI, Die);
+ // Release the line table once we're done.
+ DICtx.clearLineTableForUnit(CU.get());
+ // Free any DIEs that were allocated by the DWARF parser.
+ // If/when they're needed by other CU's, they'll be recreated.
+ CU->clearDIEs(false);
// Print ThreadLogStorage lines into an actual stream under a lock
std::lock_guard<std::mutex> guard(LogMutex);
if (Out.GetOS()) {
@@ -629,6 +639,9 @@ Error DwarfTransformer::convert(uint32_t NumThreads, OutputAggregator &Out) {
}
pool.wait();
}
+ // Now get rid of all the DIEs that may have been recreated
+ for (const auto &CU : DICtx.compile_units())
+ CU->clearDIEs(false);
size_t FunctionsAddedCount = Gsym.getNumFunctionInfos() - NumBefore;
Out << "Loaded " << FunctionsAddedCount << " functions from DWARF.\n";
return Error::success();
@clayborg @jeffreytan81 could one of you please take a look?
Per my understanding, these two mutexes are required to protect any read/write of the DieArray field? If so, there seem to be many more accesses of this field that are not protected?
They're there to protect the allocation and population, or the destruction, of the DieArray, not general reads/writes of those fields. There need to be two (one for just the first CU DIE, and one for the whole array) because populating the whole array can trigger a request to populate the first CU DIE.
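To make that re-entrancy concern concrete, here is a stripped-down model of the two-phase extraction. It is illustrative only: it uses plain std::mutex instead of llvm::sys::RWMutex, and FakeUnit, extractCUDie, and extractAllDies are hypothetical names, not the DWARFUnit API.

```cpp
#include <mutex>
#include <vector>

struct FakeUnit {
  std::vector<int> DieArray; // stand-in for the parsed DIEs
  std::mutex CUDieMutex;     // guards creation of element 0 only
  std::mutex AllDieMutex;    // guards creation of the rest

  // Populate just the unit (CU) DIE. It can be reached re-entrantly from
  // extractAllDies(), which is why it must not share AllDieMutex.
  void extractCUDie() {
    std::lock_guard<std::mutex> Lock(CUDieMutex);
    if (!DieArray.empty())
      return;              // double-checked: someone already did it
    DieArray.push_back(0); // "parse" the CU DIE
  }

  // Populate every DIE. Mid-population it needs the CU DIE, mimicking the
  // nested request described above.
  void extractAllDies() {
    std::lock_guard<std::mutex> Lock(AllDieMutex);
    if (DieArray.size() > 1)
      return;              // already fully parsed
    extractCUDie();        // fine: different mutex; with one shared
                           // non-recursive mutex this nested acquire
                           // would deadlock
    for (int I = 1; I < 4; ++I)
      DieArray.push_back(I); // "parse" the remaining DIEs
  }
};

int main() {
  FakeUnit U;
  U.extractAllDies();
  return U.DieArray.size() == 4 ? 0 : 1;
}
```

Collapsing the two mutexes into a single plain mutex turns the nested extractCUDie() call into a self-deadlock, which is the hazard the split avoids.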
I am fine with the GSYM parts. I will let the LLVM DWARF experts accept the changes in DWARFUnit.h/.cpp
What is the code path for that?
Sorry: I wrote that backwards (fixed the original comment). Populating the full array can trigger a request to populate the CuDie through determineStringOffsetsTableContribution.
Glanced at the code. I think I am missing something. How can code going through determineStringOffsetsTableContribution trigger the path? It just accesses the StrOffsets section.
Yeah, I didn't see it by looking at the code, because the recursion occurs inside getUnitDIE() (in the .h file), which calls extractDIEsIfNeeded(...) |
ping @ayermolo @dwblaikie This is blocking some dwarf-health tracking work I'm trying to make headway on. The only 'stylistic' adjustment I can see making sense is to have some sort of double-checked locking abstraction that takes a lock and a couple of lambdas to make the duplicated code cleaner, which I'm happy to do if someone wants it, but I haven't seen something like that elsewhere.
OK. I think this two-lock thing makes sense in my head.
Oooh! You're right: there's a spot in there where bad things could happen. I'll get that fixed.
@ayermolo I used an rwlock model to prevent the issue you identified. Good catch!
To alleviate copy-pasta problems, I've added a local DoubleCheckedRWLocker helper function that takes the "check if it's already been initialized" lambda and the initializer lambda (and the lock). I feel like the result is a much cleaner version of the original diff, plus it has the third lock to prevent frees from occurring between the single-CU and array-CU initialization code that @ayermolo saw. @dwblaikie they tell me you're the guy who needs to approve this, now...
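The helper itself isn't quoted in this thread, so here is a minimal sketch of what a check-then-initialize wrapper around a reader-writer lock might look like. It uses std::shared_mutex to stay self-contained (the patch itself uses llvm::sys::RWMutex with ScopedReader/ScopedWriter), and the name and signature below are illustrative, not necessarily what landed.

```cpp
#include <shared_mutex>

// Take the shared lock to run the "already done?" check, and only fall through
// to the exclusive lock (re-checking under it) when the initializer actually
// has to run.
template <typename CheckFn, typename InitFn>
void doubleCheckedRWLock(std::shared_mutex &M, CheckFn IsDone, InitFn Init) {
  {
    std::shared_lock<std::shared_mutex> Read(M); // cheap, concurrent path
    if (IsDone())
      return;
  }
  std::unique_lock<std::shared_mutex> Write(M); // exclusive path
  if (IsDone()) // re-check: another thread may have won the race
    return;
  Init();
}
```

A caller would pass something like `[&] { return DieArray.size() > 1; }` as the check and the body of the extraction as the initializer.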
Ah, I thought this was all in gsym, so hadn't given the review much attention. Are these independent changes - the line table clearing stuff presumably is independent, for instance? I guess the unit DIE clearing is dependent on the extra locking being added? The extra locking is unfortunate/more difficult to just straight up approve, as we'd previously restricted the multithreading support to the DWARFContext by having distinct contexts - so that non-multithreaded code didn't take the hit or complexity for the multithreaded code. But this new codepath and locking is present for single and multithreaded uses alike, by the looks of it? Not sure what to do about that...
I'm only doing one "real" thing here, so no, the changes can't be split into smaller, still useful/sensible pieces (except that I could delay the gsymutil change until after the dwarfunit support for releasing the memory is in, which doesn't seem worthwhile)
I spent about 3 days staring at how to accomplish this in a way that wouldn't increase the cost of single-threaded code, and couldn't come up with anything other than copying the entire class and making a multi-threaded 'freeable' duplicate, which seemed like an awful lot of code duplication to save a few hundred CPU cycles in code that is egregiously I/O bound. (I have a 166-core server: gsymutil works fastest when it's run on 6-10 cores; any more swamps the NVMe drive.)
All that said, apparently something broke a bunch of tests (I hadn't run them since my last couple of changes), so I'm not blocked on your review just yet :/
Is it worth having 3 separate mutexes? One answer would be to go fully coarse-grained - just have 1 that covers the whole array. Does that have noticeably worse performance for your use case? The other option would be 2 - you have one covering the first element of the array, and another covering the rest - I guess then you have problems when reallocating the array, though?
I started with what you're describing, and it deadlocks, as there's a path from the non-CU DIE creation that eventually recurses into the function (which I called out in the review above)
Yup. That's what @ayermolo caught as well. You need 3 locks. No amount of wishful thinking is gonna change that (I tried. Many different ways. Several times. 🫤)
Thanks, sorry I'm rehashing things. A recursive mutex would apply here, then? (& we even have this thing: https://llvm.org/doxygen/Mutex_8h_source.html - which would reduce the overhead (presumably?) to the non-multithreaded users (well, I guess only to users building all of LLVM without multithreading, not users building with multithreading enabled and just not using multiple threads to interact with libDebugInfoDWARF)?) Does a single recursive mutex substantially affect the performance of the situations you're interested in?
Yes. You need a reader-writer mutex, as the reader scenario is very common but conflicts enough to matter quite a bit. Any other performance doesn't matter at all. Even in single-threaded mode, the locking overhead isn't likely to register, as the work protected by the write lock is deeply I/O bound (and the existing lock is already one of the LLVM smart locks, so it shouldn't be a cost). If you're reading this a whole lot, it could potentially slow things down, but I'm not sure what that scenario is, outside of the one I'm looking at in GSYM. How common is it to try to extract the DIEs multiple times in a scenario where it's not processing an entire gigantic dependency graph from storage? I do not claim to understand all potential uses/scenarios of this code, but with my current understanding, that seems like a very unlikely scenario. I'm happy to learn of a counter-example.
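A tiny illustration of the point about readers, using standard C++ types rather than the LLVM wrappers: once the DIEs are parsed, every later caller only needs the shared (reader) side of the lock, so the "already parsed?" checks from many threads proceed concurrently, whereas a plain or recursive mutex would serialize them.

```cpp
#include <shared_mutex>
#include <thread>
#include <vector>

std::shared_mutex M;
std::vector<int> Parsed = {0}; // pretend extraction already happened

bool alreadyParsed() {
  std::shared_lock<std::shared_mutex> Read(M); // many threads hold this at once
  return !Parsed.empty();
}

int main() {
  std::vector<std::thread> Workers;
  for (int I = 0; I < 8; ++I)
    Workers.emplace_back([] {
      for (int J = 0; J < 1000; ++J)
        (void)alreadyParsed();
    });
  for (auto &T : Workers)
    T.join();
  return 0;
}
```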
Fair enough - thanks for your patience walking me through it
// Helper to safely check if the Compile-Unit DIE has been extracted already.
// If not, then extract it, and return false, indicating that it was *not*
// already extracted.
bool DWARFUnit::extractCUDieIfNeeded(bool CUDieOnly, bool &HasCUDie) { |
Is this function misnamed? It looks like it extracts more than only the CUDie (if CUDieOnly is false, it extracts all the DIEs, right?) So maybe this should be called "extractDiesIfNeeded"?
Yup, you're correct: this is allocating (and extracting) all the DIEs. What's protected by the second lock appears to be extracting the string (offset), range, and location tables for the DIEs that were just extracted. Would renaming things to extractDIEsIfNeeded / extractTablesForDIEsIfNeeded seem reasonable?
(I'm also going to see if I can free those tables in a later diff, as I'd love to drive memory usage lower than it currently is even with this change)
It seems this commit caused a build failure on the clang-ppc64le-linux-multistage build no. 583. Please take a look at it and post a fix, or revert your patch if the fix will take too long to implement. Let me know if there is any way I can help or extra information I can offer.
Please revert it. Looks like the PPC64le build doesn't like lambdas very much. I'll try to refactor it without the lambdas, but I won't be able to do it in the next few hours :/
Reverts #91023 Build break found in clang-ppc64le-linux-multistage build no. 583.
llvm-gsymutil eats a lot of RAM. On some large binaries, it causes OOMs on smaller hardware, consuming well over 64GB of RAM. This change frees line tables once we're done with them, and frees each DWARFUnit's DIEs when we finish processing that DU, though they may get reconstituted if other DUs reference them during processing. Once the conversion is complete, all DIEs are freed. The reduction in peak memory usage from these changes was between 7% and 12% in my tests. The double-checked locking around the creation & freeing of the data structures was tested on a 166-core system. I validated that it trivially malfunctioned without the locks (and with stupid reordering of the locks) and worked reliably with them. --------- Co-authored-by: Kevin Frei <[email protected]>
llvm-gsymutil eats a lot of RAM. On some large binaries, it causes OOMs on smaller hardware, consuming well over 64GB of RAM. This change frees line tables once we're done with them, and frees each DWARFUnit's DIEs when we finish processing that DU, though they may get reconstituted if other DUs reference them during processing. Once the conversion is complete, all DIEs are freed. The reduction in peak memory usage from these changes was between 7% and 12% in my tests. A recursive mutex is necessary to prevent accidental freeing of the DIE arrays while they're being extracted. It needs to be recursive because there's a recursive path through the final section of the code (determineStringOffsetsTableContribution) that may result in a call back into this function.
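Going by the wording of that re-landed commit message, extraction and freeing appear to share one recursive mutex, so the recursive path back into extraction just re-acquires the lock and clearDIEs() can't free the array mid-extraction. A rough model of that shape, with illustrative names rather than the actual DWARFUnit code:

```cpp
#include <mutex>
#include <vector>

struct UnitModel {
  std::recursive_mutex FreeDIEsMutex;
  std::vector<int> DieArray;

  void extractDIEsIfNeeded() {
    std::lock_guard<std::recursive_mutex> Lock(FreeDIEsMutex);
    if (!DieArray.empty())
      return;                 // already parsed
    DieArray.push_back(0);    // "parse" the CU DIE
    determineStringOffsets(); // may call back into this function
  }

  void determineStringOffsets() {
    // Models the recursive path named in the commit message: it can ask for
    // the unit DIE again, re-entering extractDIEsIfNeeded(), which simply
    // re-acquires the recursive mutex and returns early.
    extractDIEsIfNeeded();
  }

  void clearDIEs() {
    // Same mutex: the DIE array can't be freed out from under a thread that
    // is still extracting.
    std::lock_guard<std::recursive_mutex> Lock(FreeDIEsMutex);
    std::vector<int>().swap(DieArray); // actually release the storage
  }
};

int main() {
  UnitModel U;
  U.extractDIEsIfNeeded();
  U.clearDIEs();
  return U.DieArray.empty() ? 0 : 1;
}
```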