-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Linker: Remove dropTriviallyDeadConstantArrays(). #137081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linker: Remove dropTriviallyDeadConstantArrays(). #137081
Conversation
Created using spr 1.3.6-beta.1
@llvm/pr-subscribers-lto Author: Peter Collingbourne (pcc) ChangesBy calling this function after every link, we introduce quadratic This call was originally introduced to reduce memory consumption during
As you can see, any max RSS differences are in the noise. Full diff: https://github.com/llvm/llvm-project/pull/137081.diff 4 Files Affected:
diff --git a/llvm/include/llvm/IR/Module.h b/llvm/include/llvm/IR/Module.h
index 91ccd76c41e07..6d06c12a77759 100644
--- a/llvm/include/llvm/IR/Module.h
+++ b/llvm/include/llvm/IR/Module.h
@@ -877,15 +877,6 @@ class LLVM_ABI Module {
}
/// @}
- /// Destroy ConstantArrays in LLVMContext if they are not used.
- /// ConstantArrays constructed during linking can cause quadratic memory
- /// explosion. Releasing all unused constants can cause a 20% LTO compile-time
- /// slowdown for a large application.
- ///
- /// NOTE: Constants are currently owned by LLVMContext. This can then only
- /// be called where all uses of the LLVMContext are understood.
- void dropTriviallyDeadConstantArrays();
-
/// @name Utility functions for printing and dumping Module objects
/// @{
diff --git a/llvm/lib/IR/LLVMContextImpl.cpp b/llvm/lib/IR/LLVMContextImpl.cpp
index d8cdbf8370984..dfeabb4cbc7c1 100644
--- a/llvm/lib/IR/LLVMContextImpl.cpp
+++ b/llvm/lib/IR/LLVMContextImpl.cpp
@@ -147,33 +147,6 @@ LLVMContextImpl::~LLVMContextImpl() {
delete Pair.second;
}
-void LLVMContextImpl::dropTriviallyDeadConstantArrays() {
- SmallSetVector<ConstantArray *, 4> WorkList;
-
- // When ArrayConstants are of substantial size and only a few in them are
- // dead, starting WorkList with all elements of ArrayConstants can be
- // wasteful. Instead, starting WorkList with only elements that have empty
- // uses.
- for (ConstantArray *C : ArrayConstants)
- if (C->use_empty())
- WorkList.insert(C);
-
- while (!WorkList.empty()) {
- ConstantArray *C = WorkList.pop_back_val();
- if (C->use_empty()) {
- for (const Use &Op : C->operands()) {
- if (auto *COp = dyn_cast<ConstantArray>(Op))
- WorkList.insert(COp);
- }
- C->destroyConstant();
- }
- }
-}
-
-void Module::dropTriviallyDeadConstantArrays() {
- Context.pImpl->dropTriviallyDeadConstantArrays();
-}
-
namespace llvm {
/// Make MDOperand transparent for hashing.
diff --git a/llvm/lib/IR/LLVMContextImpl.h b/llvm/lib/IR/LLVMContextImpl.h
index 9b60b57e64130..71369772f4529 100644
--- a/llvm/lib/IR/LLVMContextImpl.h
+++ b/llvm/lib/IR/LLVMContextImpl.h
@@ -1808,9 +1808,6 @@ class LLVMContextImpl {
LLVMContextImpl(LLVMContext &C);
~LLVMContextImpl();
- /// Destroy the ConstantArrays if they are not used.
- void dropTriviallyDeadConstantArrays();
-
mutable OptPassGate *OPG = nullptr;
/// Access the object which can disable optional passes and individual
diff --git a/llvm/lib/Linker/IRMover.cpp b/llvm/lib/Linker/IRMover.cpp
index 0f32e9e1a05a5..af2ac13e3665b 100644
--- a/llvm/lib/Linker/IRMover.cpp
+++ b/llvm/lib/Linker/IRMover.cpp
@@ -1766,7 +1766,5 @@ Error IRMover::move(std::unique_ptr<Module> Src,
IRLinker TheIRLinker(Composite, SharedMDs, IdentifiedStructTypes,
std::move(Src), ValuesToLink, std::move(AddLazyFor),
IsPerformingImport);
- Error E = TheIRLinker.run();
- Composite.dropTriviallyDeadConstantArrays();
- return E;
+ return TheIRLinker.run();
}
|
@llvm/pr-subscribers-llvm-ir Author: Peter Collingbourne (pcc) ChangesBy calling this function after every link, we introduce quadratic This call was originally introduced to reduce memory consumption during
As you can see, any max RSS differences are in the noise. Full diff: https://github.com/llvm/llvm-project/pull/137081.diff 4 Files Affected:
diff --git a/llvm/include/llvm/IR/Module.h b/llvm/include/llvm/IR/Module.h
index 91ccd76c41e07..6d06c12a77759 100644
--- a/llvm/include/llvm/IR/Module.h
+++ b/llvm/include/llvm/IR/Module.h
@@ -877,15 +877,6 @@ class LLVM_ABI Module {
}
/// @}
- /// Destroy ConstantArrays in LLVMContext if they are not used.
- /// ConstantArrays constructed during linking can cause quadratic memory
- /// explosion. Releasing all unused constants can cause a 20% LTO compile-time
- /// slowdown for a large application.
- ///
- /// NOTE: Constants are currently owned by LLVMContext. This can then only
- /// be called where all uses of the LLVMContext are understood.
- void dropTriviallyDeadConstantArrays();
-
/// @name Utility functions for printing and dumping Module objects
/// @{
diff --git a/llvm/lib/IR/LLVMContextImpl.cpp b/llvm/lib/IR/LLVMContextImpl.cpp
index d8cdbf8370984..dfeabb4cbc7c1 100644
--- a/llvm/lib/IR/LLVMContextImpl.cpp
+++ b/llvm/lib/IR/LLVMContextImpl.cpp
@@ -147,33 +147,6 @@ LLVMContextImpl::~LLVMContextImpl() {
delete Pair.second;
}
-void LLVMContextImpl::dropTriviallyDeadConstantArrays() {
- SmallSetVector<ConstantArray *, 4> WorkList;
-
- // When ArrayConstants are of substantial size and only a few in them are
- // dead, starting WorkList with all elements of ArrayConstants can be
- // wasteful. Instead, starting WorkList with only elements that have empty
- // uses.
- for (ConstantArray *C : ArrayConstants)
- if (C->use_empty())
- WorkList.insert(C);
-
- while (!WorkList.empty()) {
- ConstantArray *C = WorkList.pop_back_val();
- if (C->use_empty()) {
- for (const Use &Op : C->operands()) {
- if (auto *COp = dyn_cast<ConstantArray>(Op))
- WorkList.insert(COp);
- }
- C->destroyConstant();
- }
- }
-}
-
-void Module::dropTriviallyDeadConstantArrays() {
- Context.pImpl->dropTriviallyDeadConstantArrays();
-}
-
namespace llvm {
/// Make MDOperand transparent for hashing.
diff --git a/llvm/lib/IR/LLVMContextImpl.h b/llvm/lib/IR/LLVMContextImpl.h
index 9b60b57e64130..71369772f4529 100644
--- a/llvm/lib/IR/LLVMContextImpl.h
+++ b/llvm/lib/IR/LLVMContextImpl.h
@@ -1808,9 +1808,6 @@ class LLVMContextImpl {
LLVMContextImpl(LLVMContext &C);
~LLVMContextImpl();
- /// Destroy the ConstantArrays if they are not used.
- void dropTriviallyDeadConstantArrays();
-
mutable OptPassGate *OPG = nullptr;
/// Access the object which can disable optional passes and individual
diff --git a/llvm/lib/Linker/IRMover.cpp b/llvm/lib/Linker/IRMover.cpp
index 0f32e9e1a05a5..af2ac13e3665b 100644
--- a/llvm/lib/Linker/IRMover.cpp
+++ b/llvm/lib/Linker/IRMover.cpp
@@ -1766,7 +1766,5 @@ Error IRMover::move(std::unique_ptr<Module> Src,
IRLinker TheIRLinker(Composite, SharedMDs, IdentifiedStructTypes,
std::move(Src), ValuesToLink, std::move(AddLazyFor),
IsPerformingImport);
- Error E = TheIRLinker.run();
- Composite.dropTriviallyDeadConstantArrays();
- return E;
+ return TheIRLinker.run();
}
|
I'd expect this optimization is mainly targeting linking many modules that have globals with appending linkage -- not sure if clang exercises that. It should be possible to address memory usage for that case more directly though, by tracking the old ConstantArrays for the appending linkage link and only trying to clean them up. |
Each of our translation units with a I probably wouldn't try to optimize this preemptively in any case; even if the issue somehow still exists it's possible that anyone who previously cared about it has moved to ThinLTO (the patch that introduced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the wrong place to do this optimization anyway - is there no optimization pass that removes dead constant arrays, e.g. GlobalDCE?
GlobalDCE only deletes dead globals as far as I can tell. Although I suppose we could have passes (and other code like the IR linker) try to delete constants that have likely become unused (e.g. GlobalDCE can try to delete initializers of globals it deletes), that should only be done if the benefits outweigh the costs, which is not the case here. |
By calling this function after every link, we introduce quadratic behavior during full LTO links that primarily affects links involving large numbers of constant arrays, such as the full LTO part of a ThinLTO link which will involve large numbers of vtable constants. Removing this call resulted in a 1.3x speedup in the indexing phase of a large internal Google binary. This call was originally introduced to reduce memory consumption during full LTO (see commit dab999d), but it doesn't seem to be the case that this helps any more. I ran 3 stage2 links of clang with full LTO and ThinLTO with and without this change and measured the max RSS. The results were as follows (all in KB): ``` FullLTO before 22362512 22383524 22387456 after 22383496 22365356 22364352 ThinLTO before 4391404 4478192 4383468 after 4399220 4363100 4417688 ``` As you can see, any max RSS differences are in the noise. Reviewers: nikic, teresajohnson Reviewed By: teresajohnson, nikic Pull Request: llvm#137081
By calling this function after every link, we introduce quadratic behavior during full LTO links that primarily affects links involving large numbers of constant arrays, such as the full LTO part of a ThinLTO link which will involve large numbers of vtable constants. Removing this call resulted in a 1.3x speedup in the indexing phase of a large internal Google binary. This call was originally introduced to reduce memory consumption during full LTO (see commit dab999d), but it doesn't seem to be the case that this helps any more. I ran 3 stage2 links of clang with full LTO and ThinLTO with and without this change and measured the max RSS. The results were as follows (all in KB): ``` FullLTO before 22362512 22383524 22387456 after 22383496 22365356 22364352 ThinLTO before 4391404 4478192 4383468 after 4399220 4363100 4417688 ``` As you can see, any max RSS differences are in the noise. Reviewers: nikic, teresajohnson Reviewed By: teresajohnson, nikic Pull Request: llvm#137081
By calling this function after every link, we introduce quadratic behavior during full LTO links that primarily affects links involving large numbers of constant arrays, such as the full LTO part of a ThinLTO link which will involve large numbers of vtable constants. Removing this call resulted in a 1.3x speedup in the indexing phase of a large internal Google binary. This call was originally introduced to reduce memory consumption during full LTO (see commit dab999d), but it doesn't seem to be the case that this helps any more. I ran 3 stage2 links of clang with full LTO and ThinLTO with and without this change and measured the max RSS. The results were as follows (all in KB): ``` FullLTO before 22362512 22383524 22387456 after 22383496 22365356 22364352 ThinLTO before 4391404 4478192 4383468 after 4399220 4363100 4417688 ``` As you can see, any max RSS differences are in the noise. Reviewers: nikic, teresajohnson Reviewed By: teresajohnson, nikic Pull Request: llvm#137081
By calling this function after every link, we introduce quadratic behavior during full LTO links that primarily affects links involving large numbers of constant arrays, such as the full LTO part of a ThinLTO link which will involve large numbers of vtable constants. Removing this call resulted in a 1.3x speedup in the indexing phase of a large internal Google binary. This call was originally introduced to reduce memory consumption during full LTO (see commit dab999d), but it doesn't seem to be the case that this helps any more. I ran 3 stage2 links of clang with full LTO and ThinLTO with and without this change and measured the max RSS. The results were as follows (all in KB): ``` FullLTO before 22362512 22383524 22387456 after 22383496 22365356 22364352 ThinLTO before 4391404 4478192 4383468 after 4399220 4363100 4417688 ``` As you can see, any max RSS differences are in the noise. Reviewers: nikic, teresajohnson Reviewed By: teresajohnson, nikic Pull Request: llvm/llvm-project#137081
By calling this function after every link, we introduce quadratic behavior during full LTO links that primarily affects links involving large numbers of constant arrays, such as the full LTO part of a ThinLTO link which will involve large numbers of vtable constants. Removing this call resulted in a 1.3x speedup in the indexing phase of a large internal Google binary. This call was originally introduced to reduce memory consumption during full LTO (see commit dab999d), but it doesn't seem to be the case that this helps any more. I ran 3 stage2 links of clang with full LTO and ThinLTO with and without this change and measured the max RSS. The results were as follows (all in KB): ``` FullLTO before 22362512 22383524 22387456 after 22383496 22365356 22364352 ThinLTO before 4391404 4478192 4383468 after 4399220 4363100 4417688 ``` As you can see, any max RSS differences are in the noise. Reviewers: nikic, teresajohnson Reviewed By: teresajohnson, nikic Pull Request: llvm#137081
By calling this function after every link, we introduce quadratic behavior during full LTO links that primarily affects links involving large numbers of constant arrays, such as the full LTO part of a ThinLTO link which will involve large numbers of vtable constants. Removing this call resulted in a 1.3x speedup in the indexing phase of a large internal Google binary. This call was originally introduced to reduce memory consumption during full LTO (see commit dab999d), but it doesn't seem to be the case that this helps any more. I ran 3 stage2 links of clang with full LTO and ThinLTO with and without this change and measured the max RSS. The results were as follows (all in KB): ``` FullLTO before 22362512 22383524 22387456 after 22383496 22365356 22364352 ThinLTO before 4391404 4478192 4383468 after 4399220 4363100 4417688 ``` As you can see, any max RSS differences are in the noise. Reviewers: nikic, teresajohnson Reviewed By: teresajohnson, nikic Pull Request: llvm#137081
By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.
This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):
As you can see, any max RSS differences are in the noise.