Skip to content

[InstrRef][nfc] Remove usage of unique_ptrs of arrays #74203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 14, 2023

Conversation

felipepiovezan
Copy link
Contributor

These are usually difficult to reason about, and they were being used to pass raw pointers around with array semantic (i.e., we were using operator [] on raw pointers). To put it in InstrRef terminology: we were passing a pointer to a ValueTable but using it as if it were a FuncValueTable.

These could have easily been SmallVectors, which now allow us to have reference semantics in some places, as well as simpler initialization.

In the future, we can use even more pass-by-reference with some extra changes in the code.

@llvmbot
Copy link
Member

llvmbot commented Dec 2, 2023

@llvm/pr-subscribers-debuginfo

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

These are usually difficult to reason about, and they were being used to pass raw pointers around with array semantic (i.e., we were using operator [] on raw pointers). To put it in InstrRef terminology: we were passing a pointer to a ValueTable but using it as if it were a FuncValueTable.

These could have easily been SmallVectors, which now allow us to have reference semantics in some places, as well as simpler initialization.

In the future, we can use even more pass-by-reference with some extra changes in the code.


Full diff: https://github.com/llvm/llvm-project/pull/74203.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp (+23-29)
  • (modified) llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h (+12-12)
  • (modified) llvm/unittests/CodeGen/InstrRefLDVTest.cpp (+2-9)
diff --git a/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp b/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
index b2c2b40139eda..cb0954dc544eb 100644
--- a/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
+++ b/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
@@ -1429,7 +1429,7 @@ bool InstrRefBasedLDV::transferDebugValue(const MachineInstr &MI) {
 
 std::optional<ValueIDNum> InstrRefBasedLDV::getValueForInstrRef(
     unsigned InstNo, unsigned OpNo, MachineInstr &MI,
-    const ValueTable *MLiveOuts, const ValueTable *MLiveIns) {
+    const FuncValueTable *MLiveOuts, const FuncValueTable *MLiveIns) {
   // Various optimizations may have happened to the value during codegen,
   // recorded in the value substitution table. Apply any substitutions to
   // the instruction / operand number in this DBG_INSTR_REF, and collect
@@ -1495,7 +1495,8 @@ std::optional<ValueIDNum> InstrRefBasedLDV::getValueForInstrRef(
   } else if (PHIIt != DebugPHINumToValue.end() && PHIIt->InstrNum == InstNo) {
     // It's actually a PHI value. Which value it is might not be obvious, use
     // the resolver helper to find out.
-    NewID = resolveDbgPHIs(*MI.getParent()->getParent(), MLiveOuts, MLiveIns,
+    assert(MLiveOuts && MLiveIns);
+    NewID = resolveDbgPHIs(*MI.getParent()->getParent(), *MLiveOuts, *MLiveIns,
                            MI, InstNo);
   }
 
@@ -1574,8 +1575,8 @@ std::optional<ValueIDNum> InstrRefBasedLDV::getValueForInstrRef(
 }
 
 bool InstrRefBasedLDV::transferDebugInstrRef(MachineInstr &MI,
-                                             const ValueTable *MLiveOuts,
-                                             const ValueTable *MLiveIns) {
+                                             const FuncValueTable *MLiveOuts,
+                                             const FuncValueTable *MLiveIns) {
   if (!MI.isDebugRef())
     return false;
 
@@ -2245,8 +2246,8 @@ void InstrRefBasedLDV::accumulateFragmentMap(MachineInstr &MI) {
   AllSeenFragments.insert(ThisFragment);
 }
 
-void InstrRefBasedLDV::process(MachineInstr &MI, const ValueTable *MLiveOuts,
-                               const ValueTable *MLiveIns) {
+void InstrRefBasedLDV::process(MachineInstr &MI, const FuncValueTable *MLiveOuts,
+                               const FuncValueTable *MLiveIns) {
   // Try to interpret an MI as a debug or transfer instruction. Only if it's
   // none of these should we interpret it's register defs as new value
   // definitions.
@@ -3503,7 +3504,10 @@ bool InstrRefBasedLDV::depthFirstVLocAndEmit(
   // Helper lambda for ejecting a block -- if nothing is going to use the block,
   // we can translate the variable location information into DBG_VALUEs and then
   // free all of InstrRefBasedLDV's data structures.
+  SmallPtrSet<const MachineBasicBlock *, 8> EjectedBBs;
   auto EjectBlock = [&](MachineBasicBlock &MBB) -> void {
+    if (EjectedBBs.insert(&MBB).second == false)
+      return;
     unsigned BBNum = MBB.getNumber();
     AllTheVLocs[BBNum].clear();
 
@@ -3517,14 +3521,14 @@ bool InstrRefBasedLDV::depthFirstVLocAndEmit(
     CurBB = BBNum;
     CurInst = 1;
     for (auto &MI : MBB) {
-      process(MI, MOutLocs.get(), MInLocs.get());
+      process(MI, &MOutLocs, &MInLocs);
       TTracker->checkInstForNewValues(CurInst, MI.getIterator());
       ++CurInst;
     }
 
     // Free machine-location tables for this block.
-    MInLocs[BBNum].reset();
-    MOutLocs[BBNum].reset();
+    MInLocs[BBNum] = ValueTable();
+    MOutLocs[BBNum] = ValueTable();
     // We don't need live-in variable values for this block either.
     Output[BBNum].clear();
     AllTheVLocs[BBNum].clear();
@@ -3589,8 +3593,7 @@ bool InstrRefBasedLDV::depthFirstVLocAndEmit(
   // anything for such out-of-scope blocks, but for the sake of being similar
   // to VarLocBasedLDV, eject these too.
   for (auto *MBB : ArtificialBlocks)
-    if (MOutLocs[MBB->getNumber()])
-      EjectBlock(*MBB);
+    EjectBlock(*MBB);
 
   return emitTransfers(AllVarsNumbering);
 }
@@ -3688,14 +3691,9 @@ bool InstrRefBasedLDV::ExtendRanges(MachineFunction &MF,
   // Allocate and initialize two array-of-arrays for the live-in and live-out
   // machine values. The outer dimension is the block number; while the inner
   // dimension is a LocIdx from MLocTracker.
-  FuncValueTable MOutLocs = std::make_unique<ValueTable[]>(MaxNumBlocks);
-  FuncValueTable MInLocs = std::make_unique<ValueTable[]>(MaxNumBlocks);
   unsigned NumLocs = MTracker->getNumLocs();
-  for (int i = 0; i < MaxNumBlocks; ++i) {
-    // These all auto-initialize to ValueIDNum::EmptyValue
-    MOutLocs[i] = std::make_unique<ValueIDNum[]>(NumLocs);
-    MInLocs[i] = std::make_unique<ValueIDNum[]>(NumLocs);
-  }
+  FuncValueTable MOutLocs(MaxNumBlocks, ValueTable(NumLocs));
+  FuncValueTable MInLocs(MaxNumBlocks, ValueTable(NumLocs));
 
   // Solve the machine value dataflow problem using the MLocTransfer function,
   // storing the computed live-ins / live-outs into the array-of-arrays. We use
@@ -3736,7 +3734,7 @@ bool InstrRefBasedLDV::ExtendRanges(MachineFunction &MF,
     MTracker->loadFromArray(MInLocs[CurBB], CurBB);
     CurInst = 1;
     for (auto &MI : MBB) {
-      process(MI, MOutLocs.get(), MInLocs.get());
+      process(MI, &MOutLocs, &MInLocs);
       ++CurInst;
     }
     MTracker->reset();
@@ -3917,9 +3915,9 @@ class LDVSSAUpdater {
   /// Machine location where any PHI must occur.
   LocIdx Loc;
   /// Table of live-in machine value numbers for blocks / locations.
-  const ValueTable *MLiveIns;
+  const FuncValueTable &MLiveIns;
 
-  LDVSSAUpdater(LocIdx L, const ValueTable *MLiveIns)
+  LDVSSAUpdater(LocIdx L, const FuncValueTable &MLiveIns)
       : Loc(L), MLiveIns(MLiveIns) {}
 
   void reset() {
@@ -4075,12 +4073,8 @@ template <> class SSAUpdaterTraits<LDVSSAUpdater> {
 } // end namespace llvm
 
 std::optional<ValueIDNum> InstrRefBasedLDV::resolveDbgPHIs(
-    MachineFunction &MF, const ValueTable *MLiveOuts,
-    const ValueTable *MLiveIns, MachineInstr &Here, uint64_t InstrNum) {
-  assert(MLiveOuts && MLiveIns &&
-         "Tried to resolve DBG_PHI before location "
-         "tables allocated?");
-
+    MachineFunction &MF, const FuncValueTable &MLiveOuts,
+    const FuncValueTable &MLiveIns, MachineInstr &Here, uint64_t InstrNum) {
   // This function will be called twice per DBG_INSTR_REF, and might end up
   // computing lots of SSA information: memoize it.
   auto SeenDbgPHIIt = SeenDbgPHIs.find(std::make_pair(&Here, InstrNum));
@@ -4094,8 +4088,8 @@ std::optional<ValueIDNum> InstrRefBasedLDV::resolveDbgPHIs(
 }
 
 std::optional<ValueIDNum> InstrRefBasedLDV::resolveDbgPHIsImpl(
-    MachineFunction &MF, const ValueTable *MLiveOuts,
-    const ValueTable *MLiveIns, MachineInstr &Here, uint64_t InstrNum) {
+    MachineFunction &MF, const FuncValueTable &MLiveOuts,
+    const FuncValueTable &MLiveIns, MachineInstr &Here, uint64_t InstrNum) {
   // Pick out records of DBG_PHI instructions that have been observed. If there
   // are none, then we cannot compute a value number.
   auto RangePair = std::equal_range(DebugPHINumToValue.begin(),
diff --git a/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h b/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h
index 5e94962f9d7e6..d6dbb1feda3e8 100644
--- a/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h
+++ b/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h
@@ -205,11 +205,11 @@ namespace LiveDebugValues {
 using namespace llvm;
 
 /// Type for a table of values in a block.
-using ValueTable = std::unique_ptr<ValueIDNum[]>;
+using ValueTable = SmallVector<ValueIDNum, 0>;
 
 /// Type for a table-of-table-of-values, i.e., the collection of either
 /// live-in or live-out values for each block in the function.
-using FuncValueTable = std::unique_ptr<ValueTable[]>;
+using FuncValueTable = SmallVector<ValueTable, 0>;
 
 /// Thin wrapper around an integer -- designed to give more type safety to
 /// spill location numbers.
@@ -1200,12 +1200,12 @@ class InstrRefBasedLDV : public LDVImpl {
   /// exists, otherwise returns std::nullopt.
   std::optional<ValueIDNum> getValueForInstrRef(unsigned InstNo, unsigned OpNo,
                                                 MachineInstr &MI,
-                                                const ValueTable *MLiveOuts,
-                                                const ValueTable *MLiveIns);
+                                                const FuncValueTable *MLiveOuts,
+                                                const FuncValueTable *MLiveIns);
 
   /// Observe a single instruction while stepping through a block.
-  void process(MachineInstr &MI, const ValueTable *MLiveOuts,
-               const ValueTable *MLiveIns);
+  void process(MachineInstr &MI, const FuncValueTable *MLiveOuts,
+               const FuncValueTable *MLiveIns);
 
   /// Examines whether \p MI is a DBG_VALUE and notifies trackers.
   /// \returns true if MI was recognized and processed.
@@ -1213,8 +1213,8 @@ class InstrRefBasedLDV : public LDVImpl {
 
   /// Examines whether \p MI is a DBG_INSTR_REF and notifies trackers.
   /// \returns true if MI was recognized and processed.
-  bool transferDebugInstrRef(MachineInstr &MI, const ValueTable *MLiveOuts,
-                             const ValueTable *MLiveIns);
+  bool transferDebugInstrRef(MachineInstr &MI, const FuncValueTable *MLiveOuts,
+                             const FuncValueTable *MLiveIns);
 
   /// Stores value-information about where this PHI occurred, and what
   /// instruction number is associated with it.
@@ -1246,14 +1246,14 @@ class InstrRefBasedLDV : public LDVImpl {
   /// \p InstrNum Debug instruction number defined by DBG_PHI instructions.
   /// \returns The machine value number at position Here, or std::nullopt.
   std::optional<ValueIDNum> resolveDbgPHIs(MachineFunction &MF,
-                                           const ValueTable *MLiveOuts,
-                                           const ValueTable *MLiveIns,
+                                           const FuncValueTable &MLiveOuts,
+                                           const FuncValueTable &MLiveIns,
                                            MachineInstr &Here,
                                            uint64_t InstrNum);
 
   std::optional<ValueIDNum> resolveDbgPHIsImpl(MachineFunction &MF,
-                                               const ValueTable *MLiveOuts,
-                                               const ValueTable *MLiveIns,
+                                               const FuncValueTable &MLiveOuts,
+                                               const FuncValueTable &MLiveIns,
                                                MachineInstr &Here,
                                                uint64_t InstrNum);
 
diff --git a/llvm/unittests/CodeGen/InstrRefLDVTest.cpp b/llvm/unittests/CodeGen/InstrRefLDVTest.cpp
index 5a050bc520362..acbcd247fa9a0 100644
--- a/llvm/unittests/CodeGen/InstrRefLDVTest.cpp
+++ b/llvm/unittests/CodeGen/InstrRefLDVTest.cpp
@@ -497,15 +497,8 @@ body:  |
 
   std::pair<FuncValueTable, FuncValueTable>
   allocValueTables(unsigned Blocks, unsigned Locs) {
-    FuncValueTable MOutLocs = std::make_unique<ValueTable[]>(Blocks);
-    FuncValueTable MInLocs = std::make_unique<ValueTable[]>(Blocks);
-
-    for (unsigned int I = 0; I < Blocks; ++I) {
-      MOutLocs[I] = std::make_unique<ValueIDNum[]>(Locs);
-      MInLocs[I] = std::make_unique<ValueIDNum[]>(Locs);
-    }
-
-    return std::make_pair(std::move(MOutLocs), std::move(MInLocs));
+    return {FuncValueTable(Blocks, ValueTable(Locs)),
+            FuncValueTable(Blocks, ValueTable(Locs))};
   }
 };
 

Copy link

github-actions bot commented Dec 2, 2023

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.

using FuncValueTable = std::unique_ptr<ValueTable[]>;
using FuncValueTable = SmallVector<ValueTable, 0>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(personally, I find code easier to read when it doesn't obscure these outer templates like unique_ptr or SmallVector (ie: I'd rather avoid/remove these aliases and have code refer to SmallVector<ValueTable, 0> directly) - and the name might be a bit misleading too, if we're going to keep it, FuncValueTable - it isn't a single value table, it's a sequence of them - so like ValueTableVector? (at least that hints at it being a sequence, which is helpful - I guess the old name couldn't do that because unique_ptr<T[]> doesn't have begin/end or even a length, etc))

Copy link
Contributor Author

@felipepiovezan felipepiovezan Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.
@jmorse would you be ok if I removed the FuncValueTable typedef and only kept ValueTable?
This would also allow us to pass MutableArrayRefs around instead of SmallVectors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me and is probably clearer than the array-allocation situation. I've added a question about a potentially functional change / code-smell inline.

(I wrote the unique_ptr stuff in D118774 after having been bitten by asan too many times, in a bid to escape further manual memory allocation).

Comment on lines +3507 to +3511
SmallPtrSet<const MachineBasicBlock *, 8> EjectedBBs;
auto EjectBlock = [&](MachineBasicBlock &MBB) -> void {
if (EjectedBBs.insert(&MBB).second == false)
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recollection of this (which may be faulty) is that blocks should only get ejected once, in the order determined just above this block (it reduces the working-set size during compilation). If the added code fires I'd suggest there's a bug somewhere else, is there a reproducer for that? I'd imagine it'd only reduce performance a bit.

Copy link
Contributor Author

@felipepiovezan felipepiovezan Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale for this code is because I removed the check on line 3596 of the "green" side of the diff.

I.e., before:

 for (auto *MBB : ArtificialBlocks)
     if (MOutLocs[MBB->getNumber()])
       EjectBlock(*MBB);

After:

  for (auto *MBB : ArtificialBlocks)
     EjectBlock(*MBB);

Prior to this change, we were testing whether something had been ejected by querying whether we had called "reset" on the unique pointer. There are no unique pointers anymore, and checking for whether the length of the vector is 0 not the equivalent, so the only solution was to add this set.

So, with this mind, the patch is 100% intended to be NFC. Do you agree with that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, I see the rationale. I'm struggling to remember where that came from -- I thought it should be the case that the set of ArtificialBlocks and the set of blocks produced by makeDepthFirstEjectionMap shouldn't overlap anyway because the artificial blocks shouldn't be in any lexical scopes? But maybe they can, as LexicalScopes only stores a start/end block-number pair.

Either way, I'm not confident enough to say the is-null test isn't double-ejection prevention so it should be kept. For performance, could you use a SmallDenseMap instead of SmallPtrSet with a call to reserve as we'll know the maximum number of blocks that might get inserted. That limits the worst-case map-growth to a single allocation while providing a local scratchpad for small functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confident enough to say the is-null test isn't double-ejection prevention so it should be kept

I think there may be some miscommunication going here: the check is kept, it just takes a different form.
I moved it inside the lambda, and instead of checking for a nullptr (pointers don't exist anymore, so we can't check that), I am using an "has been ejected" check.

Does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes perfect sense and you're correct; I'm just second-guessing whether the check had a different purpose, or was something I'd gratuitously added in the past without reason.

using FuncValueTable = std::unique_ptr<ValueTable[]>;
using FuncValueTable = SmallVector<ValueTable, 0>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with a preference for using a fixed-allocation set for EjectedBBs, rather than one that grows incrementally.

@felipepiovezan
Copy link
Contributor Author

LGTM, with a preference for using a fixed-allocation set for EjectedBBs, rather than one that grows incrementally.

I'll add the reserve call!

These are usually difficult to reason about, and they were being used to pass
raw pointers around with array semantic (i.e., we were using operator [] on raw
pointers). To put it in InstrRef terminology: we were passing a pointer to a
ValueTable but using it as if it were a FuncValueTable.

These could have easily been SmallVectors, which now allow us to have
reference semantics in some places, as well as simpler initialization.

In the future, we can use even more pass-by-reference with some extra changes in
the code.
@felipepiovezan
Copy link
Contributor Author

LGTM, with a preference for using a fixed-allocation set for EjectedBBs, rather than one that grows incrementally.

So it felt a bit wrong to use a SmallDenseMap for something that should have been a set, and it felt extra wrong to use that when we have a data structure designed specifically to be a set of pointers. This motivated me to look into the implementation of said data structure, and indeed it is designed exactly to be very conservative with memory allocation: it uses the same "Small" strategy of all "Small" containers, but when it needs to malloc, it will grab 128 elements right away. This should be more than enough for the vast majority of functions.

If more than N insertions are performed, a single quadratically probed hash table is allocated and grows as needed, providing extremely efficient access (constant time insertion/deleting/queries with low constant factors) and is very stingy with malloc traffic.

@felipepiovezan felipepiovezan force-pushed the felipe/remove_unique_ptrs branch from 0e36c60 to b7f8032 Compare December 13, 2023 19:36
@felipepiovezan
Copy link
Contributor Author

Fixed formatter issue

@felipepiovezan
Copy link
Contributor Author

I'll go ahead and merge this, but happy to address the reserve issue afterwards if you disagree with my assessment from above.

@felipepiovezan felipepiovezan merged commit 1b531d5 into llvm:main Dec 14, 2023
@felipepiovezan felipepiovezan deleted the felipe/remove_unique_ptrs branch December 14, 2023 16:22
@jmorse
Copy link
Member

jmorse commented Dec 15, 2023

Ah, apologies, looks like I intended on writing "SmallSenseSet" but wrote "SmallDenseMap" by mistake -- apologies. Whichever set is probably fine, if there's any serious problems it'll show up in a profiler.

@nikic
Copy link
Contributor

nikic commented Dec 15, 2023

FYI this seems to significantly increase memory usage: http://llvm-compile-time-tracker.com/compare.php?from=0d02ecc6bddc5e8c8bacbae3ab72a10467e177b4&to=1b531d54f6234488dec048fec1c5aca331bf8f3d&stat=max-rss&linkStats=on (look at stage1-ReleaseLTO-g (link only) only). mafft and SPASS go up by 10%.

@felipepiovezan
Copy link
Contributor Author

FYI this seems to significantly increase memory usage: http://llvm-compile-time-tracker.com/compare.php?from=0d02ecc6bddc5e8c8bacbae3ab72a10467e177b4&to=1b531d54f6234488dec048fec1c5aca331bf8f3d&stat=max-rss&linkStats=on (look at stage1-ReleaseLTO-g (link only) only). mafft and SPASS go up by 10%.

That's very surprising. I'll investigate

@felipepiovezan
Copy link
Contributor Author

felipepiovezan commented Dec 15, 2023

FYI this seems to significantly increase memory usage: http://llvm-compile-time-tracker.com/compare.php?from=0d02ecc6bddc5e8c8bacbae3ab72a10467e177b4&to=1b531d54f6234488dec048fec1c5aca331bf8f3d&stat=max-rss&linkStats=on (look at stage1-ReleaseLTO-g (link only) only). mafft and SPASS go up by 10%.

How is the compiler itself built for this? IIUC the configurations refer to how the tests are built, but not how the compiler itself is built.

Also, the regressions seem to happen in the link stage?

SPASS.link	288MiB	319MiB (+10.88%)
pairlocalalign.link	119MiB	131MiB (+10.41%)

edit: Ah, this is LTO, so probably one huge module at the end of the compilation flow

@felipepiovezan
Copy link
Contributor Author

@nikic could you show me how the memory usage is being measured? The tools I used, namely time -l and Instruments (macOS) both failed to show any difference with and without this patch.

@nikic
Copy link
Contributor

nikic commented Dec 15, 2023

max-rss is measured using time -f "%M;%e", which should produce the same data as time -l. Note that this is only for the final linker invocation, so you'd want to run ninja -v to get the final clang invocation that performs LTO and run that under time. (For actual analysis, I would run under massif.)

It's possible that there is some platform-specific difference here (libstdc++ vs libc++), but it seems unlikely that that this causes a 10% regression on Linux and no change on MacOS.

@felipepiovezan
Copy link
Contributor Author

Note that this is only for the final linker invocation, so you'd want to run ninja -v to get the final clang invocation that performs LTO and run that under time

This is what I tried when repro-ing that experiment.
I will try to use a vm and run it under valgrind

@felipepiovezan
Copy link
Contributor Author

max-rss is measured using time -f "%M;%e", which should produce the same data as time -l. Note that this is only for the final linker invocation, so you'd want to run ninja -v to get the final clang invocation that performs LTO and run that under time. (For actual analysis, I would run under massif.)

It's possible that there is some platform-specific difference here (libstdc++ vs libc++), but it seems unlikely that that this causes a 10% regression on Linux and no change on MacOS.

Are you sure time -f %M is reliable in the presence of a forking program like clang?
I used an Ubuntu VM and measured this with and without the patch and there is a ~5% difference:

166,580  -> 157,924

But then running this under massif I get no measured difference at all between runs and the peak memory usage is quite different from the time output, and the graph is identical for both runs, which makes me think this is also not measuring the right process (even though the docs say "If your program forks, the child will inherit all the profiling data that has been gathered for the parent.").

Command:            /home/build-user/llvm-project//build_RelWithDebInfo/bin/clang -O3 -fomit-frame-pointer -flto -DNDEBUG -g -fuse-ld=/home/build-user/llvm-project//build_RelWithDebInfo/bin/ld.lld CMakeFiles/pairlocalalign.dir/Calignm1.c.o CMakeFiles/pairlocalalign.dir/constants.c.o CMakeFiles/pairlocalalign.dir/defs.c.o CMakeFiles/pairlocalalign.dir/Falign.c.o CMakeFiles/pairlocalalign.dir/fft.c.o CMakeFiles/pairlocalalign.dir/fftFunctions.c.o CMakeFiles/pairlocalalign.dir/Galign11.c.o CMakeFiles/pairlocalalign.dir/genalign11.c.o CMakeFiles/pairlocalalign.dir/genGalign11.c.o CMakeFiles/pairlocalalign.dir/Halignmm.c.o CMakeFiles/pairlocalalign.dir/io.c.o CMakeFiles/pairlocalalign.dir/Lalign11.c.o CMakeFiles/pairlocalalign.dir/Lalignmm.c.o CMakeFiles/pairlocalalign.dir/mltaln9.c.o CMakeFiles/pairlocalalign.dir/MSalign11.c.o CMakeFiles/pairlocalalign.dir/MSalignmm.c.o CMakeFiles/pairlocalalign.dir/mtxutl.c.o CMakeFiles/pairlocalalign.dir/pairlocalalign.c.o CMakeFiles/pairlocalalign.dir/partQalignmm.c.o CMakeFiles/pairlocalalign.dir/partSalignmm.c.o CMakeFiles/pairlocalalign.dir/Qalignmm.c.o CMakeFiles/pairlocalalign.dir/Ralignmm.c.o CMakeFiles/pairlocalalign.dir/rna.c.o CMakeFiles/pairlocalalign.dir/SAalignmm.c.o CMakeFiles/pairlocalalign.dir/Salignmm.c.o CMakeFiles/pairlocalalign.dir/suboptalign11.c.o CMakeFiles/pairlocalalign.dir/tddis.c.o -o pairlocalalign -lm
Massif arguments:   --time-unit=B --massif-out-file=/home/build-user/test-suite/stats_with_patch.massif.out
ms_print arguments: /home/build-user/test-suite/stats_with_patch.massif.out
--------------------------------------------------------------------------------


    KB
436.1^                                     @@  ::  ::
     |                                    #@   :   @    :::
     |                               @::::#@ ::: ::@ ::::: :
     |                             @:@::: #@ : : : @ ::::: ::
     |                           ::@:@::: #@ : : : @ ::::: :::
     |                           : @:@::: #@ : : : @ ::::: :::::
     |                           : @:@::: #@ : : : @ ::::: :::: :
     |                     : ::::: @:@::: #@ : : : @ ::::: :::: ::::
     |                     :::   : @:@::: #@ : : : @ ::::: :::: ::
     |                  ::::::   : @:@::: #@ : : : @ ::::: :::: ::  @
     |                 @:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:
     |                :@:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:::
     |             ::::@:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:: :
     |           :::: :@:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:: :@
     |         @:: :: :@:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:: :@:::
     |       @:@:: :: :@:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:: :@
     |     ::@:@:: :: :@:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:: :@
     |    :: @:@:: :: :@:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:: :@   :
     |    :: @:@:: :: :@:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:: :@   :
     |    :: @:@:: :: :@:  :::   : @:@::: #@ : : : @ ::::: :::: ::  @:: :@   :
   0 +----------------------------------------------------------------------->MB
     0                                                                   1.203
 Detailed snapshots: [3, 5, 11, 17, 19, 24 (peak), 25, 32, 45, 51, 61]

I also tried manually linking all the LTO object files with llvm-link and then running them through llc, and here I see a small memory difference. With patch:

    MB
137.2^                                                                     #
     |                                                                  @@:#
     |                                                             @@@::@@:#
     |                                                          @ @@@@::@@:#
     |                                                       @  @:@@@@::@@:#
     |                                                       @::@:@@@@::@@:#
     |                                                      @@::@:@@@@::@@:#
     |                                                   ::@@@::@:@@@@::@@:#:
     |                                                :::: @@@::@:@@@@::@@:#:
     |                                             @:::::: @@@::@:@@@@::@@:#:
     |                                         @:::@: :::: @@@::@:@@@@::@@:#::
     |                                   @    :@:: @: :::: @@@::@:@@@@::@@:#::
     |                                   @:::::@:: @: :::: @@@::@:@@@@::@@:#::
     |               @@@::::::::::::: :::@:::::@:: @: :::: @@@::@:@@@@::@@:#::
     |       @ ::::::@@ :: : : :::: :::::@:::::@:: @: :::: @@@::@:@@@@::@@:#::
     |:::::::@:::: : @@ :: : : :::: :::::@:::::@:: @: :::: @@@::@:@@@@::@@:#::
     |:: ::::@:::: : @@ :: : : :::: :::::@:::::@:: @: :::: @@@::@:@@@@::@@:#::
     |:: ::::@:::: : @@ :: : : :::: :::::@:::::@:: @: :::: @@@::@:@@@@::@@:#::
     |:: ::::@:::: : @@ :: : : :::: :::::@:::::@:: @: :::: @@@::@:@@@@::@@:#::
     |:: ::::@:::: : @@ :: : : :::: :::::@:::::@:: @: :::: @@@::@:@@@@::@@:#::
   0 +----------------------------------------------------------------------->GB
     0                                                                   7.103

Without patch:

    MB
136.0^                                                                    #
     |                                                               @  @@#
     |                                                             @@@::@@#:
     |                                                          @@:@@@::@@#:
     |                                                          @@:@@@::@@#:
     |                                                       :::@@:@@@::@@#:
     |                                                      ::: @@:@@@::@@#:
     |                                                   :@@::: @@:@@@::@@#:
     |                                              @:::::@ ::: @@:@@@::@@#::
     |                                              @::: :@ ::: @@:@@@::@@#::
     |                                         @::::@::: :@ ::: @@:@@@::@@#:::
     |                                  @     :@::: @::: :@ ::: @@:@@@::@@#:::
     |                                  @  ::::@::: @::: :@ ::: @@:@@@::@@#:::
     |                @::::@@::@:::  :@:@::: ::@::: @::: :@ ::: @@:@@@::@@#:::
     |     : @::::::::@::: @ : @::::::@:@: : ::@::: @::: :@ ::: @@:@@@::@@#:::
     |:::::::@:::: :: @::: @ : @:::: :@:@: : ::@::: @::: :@ ::: @@:@@@::@@#:::
     |:: ::::@:::: :: @::: @ : @:::: :@:@: : ::@::: @::: :@ ::: @@:@@@::@@#:::
     |:: ::::@:::: :: @::: @ : @:::: :@:@: : ::@::: @::: :@ ::: @@:@@@::@@#:::
     |:: ::::@:::: :: @::: @ : @:::: :@:@: : ::@::: @::: :@ ::: @@:@@@::@@#:::
     |:: ::::@:::: :: @::: @ : @:::: :@:@: : ::@::: @::: :@ ::: @@:@@@::@@#:::
   0 +----------------------------------------------------------------------->GB
     0                                                                   7.107

In these traces I see InstrRefBasedImpl.cpp show up quite a bit (unlike the clang ones). Also note how here the peak memory is around 100MB instead of KBs. The difference is quite small between the two.

@nikic
Copy link
Contributor

nikic commented Dec 19, 2023

You need to pass --trace-children=yes to valgrind to follow fork.

I just tested this myself locally, and I can reproduce the difference. The time output is 158696 before and 171124 after.

Here are the massif results for the two runs: https://gist.github.com/nikic/bfdd7686c0d6c1ddc076d527148da851 These show a peak of 72MB vs 84MB.

@felipepiovezan
Copy link
Contributor Author

You need to pass --trace-children=yes to valgrind to follow fork.

I just tested this myself locally, and I can reproduce the difference. The time output is 158696 before and 171124 after.

Here are the massif results for the two runs: https://gist.github.com/nikic/bfdd7686c0d6c1ddc076d527148da851 These show a peak of 72MB vs 84MB.

Thank you! With --trace-children, I've been able to see an increase as well. In my setup, it goes from 87MB to 94 MB

felipepiovezan added a commit to felipepiovezan/llvm-project that referenced this pull request Dec 20, 2023
Commit 1b531d5 (llvm#74203) removed the usage of unique_ptrs of arrays in
favour of using vectors, but inadvertently increased peak memory usage by
removing the ability to deallocate vector memory that was no longer needed
mid-LDV.

In that same review, it was pointed out that `FuncValueTable` typedef could be
removed, since it was "just a vector".

This commit addresses both issues by making `FuncValueTable` a real data
structure, capable of mapping BBs to ValueTables and able to free ValueTables as
needed.

This reduces peak memory usage in the compiler by 10% in the benchmarks flagged
by the original review.

As a consequence, we had to remove a handful of instances of the
"declare-then-initialize" antipattern in unittests, as the FuncValueTable class
is no longer default-constructible.
@felipepiovezan
Copy link
Contributor Author

Fixes in #76051

felipepiovezan added a commit that referenced this pull request Dec 23, 2023
Commit 1b531d5 (#74203) removed the usage of unique_ptrs of arrays
in favour of using vectors, but inadvertently increased peak memory
usage by removing the ability to deallocate vector memory that was no
longer needed mid-LDV.

In that same review, it was pointed out that `FuncValueTable` typedef
could be removed, since it was "just a vector".

This commit addresses both issues by making `FuncValueTable` a real data
structure, capable of mapping BBs to ValueTables and able to free
ValueTables as needed.

This reduces peak memory usage in the compiler by 10% in the benchmarks
flagged by the original review.

As a consequence, we had to remove a handful of instances of the
"declare-then-initialize" antipattern in unittests, as the
FuncValueTable class is no longer default-constructible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants