Skip to content

Linker: Remove dropTriviallyDeadConstantArrays(). #137081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pcc
Copy link
Contributor

@pcc pcc commented Apr 23, 2025

By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.

This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):

FullLTO before 22362512 22383524 22387456
         after 22383496 22365356 22364352
ThinLTO before  4391404  4478192  4383468
         after  4399220  4363100  4417688

As you can see, any max RSS differences are in the noise.

Created using spr 1.3.6-beta.1
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:ir labels Apr 23, 2025
@pcc pcc requested a review from teresajohnson April 23, 2025 23:15
@llvmbot
Copy link
Member

llvmbot commented Apr 23, 2025

@llvm/pr-subscribers-lto

Author: Peter Collingbourne (pcc)

Changes

By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.

This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):

FullLTO before 22362512 22383524 22387456
         after 22383496 22365356 22364352
ThinLTO before  4391404  4478192  4383468
         after  4399220  4363100  4417688

As you can see, any max RSS differences are in the noise.


Full diff: https://github.com/llvm/llvm-project/pull/137081.diff

4 Files Affected:

  • (modified) llvm/include/llvm/IR/Module.h (-9)
  • (modified) llvm/lib/IR/LLVMContextImpl.cpp (-27)
  • (modified) llvm/lib/IR/LLVMContextImpl.h (-3)
  • (modified) llvm/lib/Linker/IRMover.cpp (+1-3)
diff --git a/llvm/include/llvm/IR/Module.h b/llvm/include/llvm/IR/Module.h
index 91ccd76c41e07..6d06c12a77759 100644
--- a/llvm/include/llvm/IR/Module.h
+++ b/llvm/include/llvm/IR/Module.h
@@ -877,15 +877,6 @@ class LLVM_ABI Module {
   }
 /// @}
 
-  /// Destroy ConstantArrays in LLVMContext if they are not used.
-  /// ConstantArrays constructed during linking can cause quadratic memory
-  /// explosion. Releasing all unused constants can cause a 20% LTO compile-time
-  /// slowdown for a large application.
-  ///
-  /// NOTE: Constants are currently owned by LLVMContext. This can then only
-  /// be called where all uses of the LLVMContext are understood.
-  void dropTriviallyDeadConstantArrays();
-
 /// @name Utility functions for printing and dumping Module objects
 /// @{
 
diff --git a/llvm/lib/IR/LLVMContextImpl.cpp b/llvm/lib/IR/LLVMContextImpl.cpp
index d8cdbf8370984..dfeabb4cbc7c1 100644
--- a/llvm/lib/IR/LLVMContextImpl.cpp
+++ b/llvm/lib/IR/LLVMContextImpl.cpp
@@ -147,33 +147,6 @@ LLVMContextImpl::~LLVMContextImpl() {
     delete Pair.second;
 }
 
-void LLVMContextImpl::dropTriviallyDeadConstantArrays() {
-  SmallSetVector<ConstantArray *, 4> WorkList;
-
-  // When ArrayConstants are of substantial size and only a few in them are
-  // dead, starting WorkList with all elements of ArrayConstants can be
-  // wasteful. Instead, starting WorkList with only elements that have empty
-  // uses.
-  for (ConstantArray *C : ArrayConstants)
-    if (C->use_empty())
-      WorkList.insert(C);
-
-  while (!WorkList.empty()) {
-    ConstantArray *C = WorkList.pop_back_val();
-    if (C->use_empty()) {
-      for (const Use &Op : C->operands()) {
-        if (auto *COp = dyn_cast<ConstantArray>(Op))
-          WorkList.insert(COp);
-      }
-      C->destroyConstant();
-    }
-  }
-}
-
-void Module::dropTriviallyDeadConstantArrays() {
-  Context.pImpl->dropTriviallyDeadConstantArrays();
-}
-
 namespace llvm {
 
 /// Make MDOperand transparent for hashing.
diff --git a/llvm/lib/IR/LLVMContextImpl.h b/llvm/lib/IR/LLVMContextImpl.h
index 9b60b57e64130..71369772f4529 100644
--- a/llvm/lib/IR/LLVMContextImpl.h
+++ b/llvm/lib/IR/LLVMContextImpl.h
@@ -1808,9 +1808,6 @@ class LLVMContextImpl {
   LLVMContextImpl(LLVMContext &C);
   ~LLVMContextImpl();
 
-  /// Destroy the ConstantArrays if they are not used.
-  void dropTriviallyDeadConstantArrays();
-
   mutable OptPassGate *OPG = nullptr;
 
   /// Access the object which can disable optional passes and individual
diff --git a/llvm/lib/Linker/IRMover.cpp b/llvm/lib/Linker/IRMover.cpp
index 0f32e9e1a05a5..af2ac13e3665b 100644
--- a/llvm/lib/Linker/IRMover.cpp
+++ b/llvm/lib/Linker/IRMover.cpp
@@ -1766,7 +1766,5 @@ Error IRMover::move(std::unique_ptr<Module> Src,
   IRLinker TheIRLinker(Composite, SharedMDs, IdentifiedStructTypes,
                        std::move(Src), ValuesToLink, std::move(AddLazyFor),
                        IsPerformingImport);
-  Error E = TheIRLinker.run();
-  Composite.dropTriviallyDeadConstantArrays();
-  return E;
+  return TheIRLinker.run();
 }

@llvmbot
Copy link
Member

llvmbot commented Apr 23, 2025

@llvm/pr-subscribers-llvm-ir

Author: Peter Collingbourne (pcc)

Changes

By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.

This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):

FullLTO before 22362512 22383524 22387456
         after 22383496 22365356 22364352
ThinLTO before  4391404  4478192  4383468
         after  4399220  4363100  4417688

As you can see, any max RSS differences are in the noise.


Full diff: https://github.com/llvm/llvm-project/pull/137081.diff

4 Files Affected:

  • (modified) llvm/include/llvm/IR/Module.h (-9)
  • (modified) llvm/lib/IR/LLVMContextImpl.cpp (-27)
  • (modified) llvm/lib/IR/LLVMContextImpl.h (-3)
  • (modified) llvm/lib/Linker/IRMover.cpp (+1-3)
diff --git a/llvm/include/llvm/IR/Module.h b/llvm/include/llvm/IR/Module.h
index 91ccd76c41e07..6d06c12a77759 100644
--- a/llvm/include/llvm/IR/Module.h
+++ b/llvm/include/llvm/IR/Module.h
@@ -877,15 +877,6 @@ class LLVM_ABI Module {
   }
 /// @}
 
-  /// Destroy ConstantArrays in LLVMContext if they are not used.
-  /// ConstantArrays constructed during linking can cause quadratic memory
-  /// explosion. Releasing all unused constants can cause a 20% LTO compile-time
-  /// slowdown for a large application.
-  ///
-  /// NOTE: Constants are currently owned by LLVMContext. This can then only
-  /// be called where all uses of the LLVMContext are understood.
-  void dropTriviallyDeadConstantArrays();
-
 /// @name Utility functions for printing and dumping Module objects
 /// @{
 
diff --git a/llvm/lib/IR/LLVMContextImpl.cpp b/llvm/lib/IR/LLVMContextImpl.cpp
index d8cdbf8370984..dfeabb4cbc7c1 100644
--- a/llvm/lib/IR/LLVMContextImpl.cpp
+++ b/llvm/lib/IR/LLVMContextImpl.cpp
@@ -147,33 +147,6 @@ LLVMContextImpl::~LLVMContextImpl() {
     delete Pair.second;
 }
 
-void LLVMContextImpl::dropTriviallyDeadConstantArrays() {
-  SmallSetVector<ConstantArray *, 4> WorkList;
-
-  // When ArrayConstants are of substantial size and only a few in them are
-  // dead, starting WorkList with all elements of ArrayConstants can be
-  // wasteful. Instead, starting WorkList with only elements that have empty
-  // uses.
-  for (ConstantArray *C : ArrayConstants)
-    if (C->use_empty())
-      WorkList.insert(C);
-
-  while (!WorkList.empty()) {
-    ConstantArray *C = WorkList.pop_back_val();
-    if (C->use_empty()) {
-      for (const Use &Op : C->operands()) {
-        if (auto *COp = dyn_cast<ConstantArray>(Op))
-          WorkList.insert(COp);
-      }
-      C->destroyConstant();
-    }
-  }
-}
-
-void Module::dropTriviallyDeadConstantArrays() {
-  Context.pImpl->dropTriviallyDeadConstantArrays();
-}
-
 namespace llvm {
 
 /// Make MDOperand transparent for hashing.
diff --git a/llvm/lib/IR/LLVMContextImpl.h b/llvm/lib/IR/LLVMContextImpl.h
index 9b60b57e64130..71369772f4529 100644
--- a/llvm/lib/IR/LLVMContextImpl.h
+++ b/llvm/lib/IR/LLVMContextImpl.h
@@ -1808,9 +1808,6 @@ class LLVMContextImpl {
   LLVMContextImpl(LLVMContext &C);
   ~LLVMContextImpl();
 
-  /// Destroy the ConstantArrays if they are not used.
-  void dropTriviallyDeadConstantArrays();
-
   mutable OptPassGate *OPG = nullptr;
 
   /// Access the object which can disable optional passes and individual
diff --git a/llvm/lib/Linker/IRMover.cpp b/llvm/lib/Linker/IRMover.cpp
index 0f32e9e1a05a5..af2ac13e3665b 100644
--- a/llvm/lib/Linker/IRMover.cpp
+++ b/llvm/lib/Linker/IRMover.cpp
@@ -1766,7 +1766,5 @@ Error IRMover::move(std::unique_ptr<Module> Src,
   IRLinker TheIRLinker(Composite, SharedMDs, IdentifiedStructTypes,
                        std::move(Src), ValuesToLink, std::move(AddLazyFor),
                        IsPerformingImport);
-  Error E = TheIRLinker.run();
-  Composite.dropTriviallyDeadConstantArrays();
-  return E;
+  return TheIRLinker.run();
 }

@nikic
Copy link
Contributor

nikic commented Apr 25, 2025

I'd expect this optimization is mainly targeting linking many modules that have globals with appending linkage -- not sure if clang exercises that. It should be possible to address memory usage for that case more directly though, by tracking the old ConstantArrays for the appending linkage link and only trying to clean them up.

@pcc
Copy link
Contributor Author

pcc commented Apr 26, 2025

Each of our translation units with a cl::opt will have an llvm.global_ctors entry so I would expect clang to exercise this case fairly realistically given the number of translation units we have that declare flags. I would also expect that clang overrepresents this case; most large C++ programs that I'm aware of try to avoid global constructors (see e.g. Google's style guide).

I probably wouldn't try to optimize this preemptively in any case; even if the issue somehow still exists it's possible that anyone who previously cared about it has moved to ThinLTO (the patch that introduced dropTriviallyDeadConstantArrays predates ThinLTO) which doesn't import appending arrays so I reckon we should make the change and see if anyone complains.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the wrong place to do this optimization anyway - is there no optimization pass that removes dead constant arrays, e.g. GlobalDCE?

@pcc
Copy link
Contributor Author

pcc commented Apr 28, 2025

This seems like the wrong place to do this optimization anyway - is there no optimization pass that removes dead constant arrays, e.g. GlobalDCE?

GlobalDCE only deletes dead globals as far as I can tell. Although I suppose we could have passes (and other code like the IR linker) try to delete constants that have likely become unused (e.g. GlobalDCE can try to delete initializers of globals it deletes), that should only be done if the benefits outweigh the costs, which is not the case here.

@pcc pcc merged commit 95795ab into main Apr 28, 2025
14 checks passed
@pcc pcc deleted the users/pcc/spr/linker-remove-droptriviallydeadconstantarrays branch April 28, 2025 19:23
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.

This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):

```
FullLTO before 22362512 22383524 22387456
         after 22383496 22365356 22364352
ThinLTO before  4391404  4478192  4383468
         after  4399220  4363100  4417688
```

As you can see, any max RSS differences are in the noise.

Reviewers: nikic, teresajohnson

Reviewed By: teresajohnson, nikic

Pull Request: llvm#137081
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.

This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):

```
FullLTO before 22362512 22383524 22387456
         after 22383496 22365356 22364352
ThinLTO before  4391404  4478192  4383468
         after  4399220  4363100  4417688
```

As you can see, any max RSS differences are in the noise.

Reviewers: nikic, teresajohnson

Reviewed By: teresajohnson, nikic

Pull Request: llvm#137081
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.

This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):

```
FullLTO before 22362512 22383524 22387456
         after 22383496 22365356 22364352
ThinLTO before  4391404  4478192  4383468
         after  4399220  4363100  4417688
```

As you can see, any max RSS differences are in the noise.

Reviewers: nikic, teresajohnson

Reviewed By: teresajohnson, nikic

Pull Request: llvm#137081
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 6, 2025
By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.

This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):

```
FullLTO before 22362512 22383524 22387456
         after 22383496 22365356 22364352
ThinLTO before  4391404  4478192  4383468
         after  4399220  4363100  4417688
```

As you can see, any max RSS differences are in the noise.

Reviewers: nikic, teresajohnson

Reviewed By: teresajohnson, nikic

Pull Request: llvm/llvm-project#137081
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.

This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):

```
FullLTO before 22362512 22383524 22387456
         after 22383496 22365356 22364352
ThinLTO before  4391404  4478192  4383468
         after  4399220  4363100  4417688
```

As you can see, any max RSS differences are in the noise.

Reviewers: nikic, teresajohnson

Reviewed By: teresajohnson, nikic

Pull Request: llvm#137081
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.

This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):

```
FullLTO before 22362512 22383524 22387456
         after 22383496 22365356 22364352
ThinLTO before  4391404  4478192  4383468
         after  4399220  4363100  4417688
```

As you can see, any max RSS differences are in the noise.

Reviewers: nikic, teresajohnson

Reviewed By: teresajohnson, nikic

Pull Request: llvm#137081
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:ir LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants