-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF #106193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lto @llvm/pr-subscribers-llvm-support Author: Mingming Liu (minglotus-6) Changes
This change proposes to optimize away string copies inside LTO::GlobalResolutions, which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats but I'm not familiar with them enough or have enough testing capability. The optimization takes place for lld (ELF) only. For the rest of use cases (gold plugin, Together with @kazutakahirata's work to make [1] ELF llvm-project/lld/ELF/InputFiles.cpp Line 1748 in 1cea5c2
Full diff: https://github.com/llvm/llvm-project/pull/106193.diff 5 Files Affected:
diff --git a/lld/ELF/LTO.cpp b/lld/ELF/LTO.cpp
index 935d0a9eab9ee0..f339f1c2c0ec21 100644
--- a/lld/ELF/LTO.cpp
+++ b/lld/ELF/LTO.cpp
@@ -135,6 +135,7 @@ static lto::Config createConfig() {
config->ltoValidateAllVtablesHaveTypeInfos;
c.AllVtablesHaveTypeInfos = ctx.ltoAllVtablesHaveTypeInfos;
c.AlwaysEmitRegularLTOObj = !config->ltoObjPath.empty();
+ c.KeepSymbolNameCopies = false;
for (const llvm::StringRef &name : config->thinLTOModulesToCompile)
c.ThinLTOModulesToCompile.emplace_back(name);
diff --git a/llvm/include/llvm/LTO/Config.h b/llvm/include/llvm/LTO/Config.h
index 482b6e55a19d35..a49cce9f30e20c 100644
--- a/llvm/include/llvm/LTO/Config.h
+++ b/llvm/include/llvm/LTO/Config.h
@@ -88,6 +88,11 @@ struct Config {
/// want to know a priori all possible output files.
bool AlwaysEmitRegularLTOObj = false;
+ /// If true, the LTO instance creates copies of the symbol names for LTO::run.
+ /// The lld linker uses string saver to keep symbol names alive and doesn't
+ /// need to create copies, so it can set this field to false.
+ bool KeepSymbolNameCopies = true;
+
/// Allows non-imported definitions to get the potentially more constraining
/// visibility from the prevailing definition. FromPrevailing is the default
/// because it works for many binary formats. ELF can use the more optimized
diff --git a/llvm/include/llvm/LTO/LTO.h b/llvm/include/llvm/LTO/LTO.h
index 30eda34cd7ba54..7282e7bbcf451c 100644
--- a/llvm/include/llvm/LTO/LTO.h
+++ b/llvm/include/llvm/LTO/LTO.h
@@ -15,6 +15,9 @@
#ifndef LLVM_LTO_LTO_H
#define LLVM_LTO_LTO_H
+#include <memory>
+
+#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/Bitcode/BitcodeReader.h"
@@ -23,6 +26,7 @@
#include "llvm/Object/IRSymtab.h"
#include "llvm/Support/Caching.h"
#include "llvm/Support/Error.h"
+#include "llvm/Support/StringSaver.h"
#include "llvm/Support/thread.h"
#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/IPO/FunctionImport.h"
@@ -36,6 +40,7 @@ class MemoryBufferRef;
class Module;
class raw_pwrite_stream;
class ToolOutputFile;
+class UniqueStringSaver;
/// Resolve linkage for prevailing symbols in the \p Index. Linkage changes
/// recorded in the index and the ThinLTO backends must apply the changes to
@@ -348,6 +353,11 @@ class LTO {
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
} ThinLTO;
+ std::unique_ptr<llvm::BumpPtrAllocator> Alloc;
+
+ // Symbol saver for global resolution map.
+ std::unique_ptr<llvm::StringSaver> GlobalResolutionSymbolSaver;
+
// The global resolution for a particular (mangled) symbol name. This is in
// particular necessary to track whether each symbol can be internalized.
// Because any input file may introduce a new cross-partition reference, we
@@ -406,9 +416,10 @@ class LTO {
};
// Global mapping from mangled symbol names to resolutions.
- // Make this an optional to guard against accessing after it has been reset
+ // Make this an unique_ptr to guard against accessing after it has been reset
// (to reduce memory after we're done with it).
- std::optional<StringMap<GlobalResolution>> GlobalResolutions;
+ std::unique_ptr<llvm::DenseMap<StringRef, GlobalResolution>>
+ GlobalResolutions;
void addModuleToGlobalRes(ArrayRef<InputFile::Symbol> Syms,
ArrayRef<SymbolResolution> Res, unsigned Partition,
diff --git a/llvm/include/llvm/Support/Allocator.h b/llvm/include/llvm/Support/Allocator.h
index 568f0d34032fa2..3c54556bfefd34 100644
--- a/llvm/include/llvm/Support/Allocator.h
+++ b/llvm/include/llvm/Support/Allocator.h
@@ -22,6 +22,7 @@
#include "llvm/Support/AllocatorBase.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/MathExtras.h"
+#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <cassert>
#include <cstddef>
diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index bb3c9f7acdb8e5..dcba01d4cc05ee 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -78,6 +78,10 @@ cl::opt<bool> EnableLTOInternalization(
"enable-lto-internalization", cl::init(true), cl::Hidden,
cl::desc("Enable global value internalization in LTO"));
+static cl::opt<bool>
+ LTOKeepSymbolCopies("lto-keep-symbol-copies", cl::init(false), cl::Hidden,
+ cl::desc("Keep copies of symbols in LTO indexing"));
+
/// Indicate we are linking with an allocator that supports hot/cold operator
/// new interfaces.
extern cl::opt<bool> SupportsHotColdNew;
@@ -607,8 +611,14 @@ LTO::LTO(Config Conf, ThinBackend Backend,
: Conf(std::move(Conf)),
RegularLTO(ParallelCodeGenParallelismLevel, this->Conf),
ThinLTO(std::move(Backend)),
- GlobalResolutions(std::make_optional<StringMap<GlobalResolution>>()),
- LTOMode(LTOMode) {}
+ GlobalResolutions(
+ std::make_unique<DenseMap<StringRef, GlobalResolution>>()),
+ LTOMode(LTOMode) {
+ if (Conf.KeepSymbolNameCopies || LTOKeepSymbolCopies) {
+ Alloc = std::make_unique<BumpPtrAllocator>();
+ GlobalResolutionSymbolSaver = std::make_unique<llvm::StringSaver>(*Alloc);
+ }
+}
// Requires a destructor for MapVector<BitcodeModule>.
LTO::~LTO() = default;
@@ -626,7 +636,12 @@ void LTO::addModuleToGlobalRes(ArrayRef<InputFile::Symbol> Syms,
assert(ResI != ResE);
SymbolResolution Res = *ResI++;
- auto &GlobalRes = (*GlobalResolutions)[Sym.getName()];
+ StringRef SymbolName = Sym.getName();
+ // Keep copies of symbols if the client of LTO says so.
+ if (GlobalResolutionSymbolSaver && !GlobalResolutions->contains(SymbolName))
+ SymbolName = GlobalResolutionSymbolSaver->save(SymbolName);
+
+ auto &GlobalRes = (*GlobalResolutions)[SymbolName];
GlobalRes.UnnamedAddr &= Sym.isUnnamedAddr();
if (Res.Prevailing) {
assert(!GlobalRes.Prevailing &&
@@ -1797,6 +1812,9 @@ Error LTO::runThinLTO(AddStreamFn AddStream, FileCache Cache,
// cross module importing, which adds to peak memory via the computed import
// and export lists.
GlobalResolutions.reset();
+ // Reset the bump pointer allocator to release its memory.
+ GlobalResolutionSymbolSaver.reset();
+ Alloc.reset();
if (Conf.OptLevel > 0)
ComputeCrossModuleImport(ThinLTO.CombinedIndex, ModuleToDefinedGVSummaries,
|
@@ -22,6 +22,7 @@ | |||
#include "llvm/Support/AllocatorBase.h" | |||
#include "llvm/Support/Compiler.h" | |||
#include "llvm/Support/MathExtras.h" | |||
#include "llvm/Support/raw_ostream.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this needed or was it added earlier for debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added for debugging. I removed it.
When a bitcode file is parsed lazily [1], undefined symbols in a module won't get saved. In a LTO unit, a symbol is saved as long as one module defines it. I'll add some logs in LTO class to see how undefined symbols are resolved. If LTO cannot rely on string savers' copies for undefined symbols, it needs to keep copies of undefined symbols itself (still a subset). [1] llvm-project/lld/ELF/InputFiles.h Lines 142 to 144 in 2c7e1b8
|
StringRef SymbolName = Sym.getName(); | ||
// Keep copies of symbols if the client of LTO says so. | ||
if (GlobalResolutionSymbolSaver && !GlobalResolutions->contains(SymbolName)) | ||
SymbolName = GlobalResolutionSymbolSaver->save(SymbolName); | ||
|
||
auto &GlobalRes = (*GlobalResolutions)[SymbolName]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A somewhat nit picky suggestion, but should we create a wrapper class for llvm::DenseMap<StringRef, GlobalResolution>
? This way, we can intercept its operator()[]
and put all the deduplication logic there.
This way, the application logic should stay clear. That is:
auto &GlobalRes = (*GlobalResolutions)[Sym.getName()];
staying as is.
We should be able to put members like:
std::unique_ptr<llvm::BumpPtrAllocator> Alloc;
// Symbol saver for global resolution map.
std::unique_ptr<llvm::StringSaver> GlobalResolutionSymbolSaver;
into the wrapper class and probably strip away std::unique_ptr
from them, which in turn should let you remove:
GlobalResolutionSymbolSaver.reset();
Alloc.reset();
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a helper method LTO::releaseGlobalResolutionsMemory
which releases the memory used by GlobalResolutions together. I'm not sure if having a wrapper class has the correct ROI for code simplicity here as reader also need to parse the added logic of wrapper class.
@@ -78,6 +78,10 @@ cl::opt<bool> EnableLTOInternalization( | |||
"enable-lto-internalization", cl::init(true), cl::Hidden, | |||
cl::desc("Enable global value internalization in LTO")); | |||
|
|||
static cl::opt<bool> | |||
LTOKeepSymbolCopies("lto-keep-symbol-copies", cl::init(false), cl::Hidden, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is an option needed? I think this can just be set through the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a back-up option in case this change hit any corner case (missed a symbol) but not caught in release. Though all symbols used by GlobalResolutions should be covered by looking at the code.
In the updated PR, Experiment data are collected on two benchmarks, stacking this PR on top of #106670
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs an update after #106670 being merged? I don't have a strong feeling on the wrapper class suggested by @kazutakahirata but lgtm otherwise with a small suggestion below.
lld/ELF/InputFiles.cpp
Outdated
} else { | ||
// Keep copies of per-module undefined symbols for LTO::GlobalResolutions | ||
// usage. | ||
[[maybe_unused]] StringRef SymbolRef = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than the maybe_unused, can you just not assign the return value to a variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG too aside from a small suggestion
Nice memory savings!
llvm/include/llvm/LTO/LTO.h
Outdated
@@ -348,6 +352,11 @@ class LTO { | |||
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID; | |||
} ThinLTO; | |||
|
|||
std::unique_ptr<llvm::BumpPtrAllocator> Alloc; | |||
|
|||
// Symbol saver for global resolution map. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to group these closer to GlobalResolutions map? The comment helps refer to it, but right now is separated by the "struct GlobalResolution" definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/5169 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/2989 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/2121 Here is the relevant piece of the build log for the reference
|
…l resolution in ELF" (#107788) Reverts #106193 while investigating bot failures https://lab.llvm.org/buildbot/#/builders/169/builds/2989/steps/9/logs/stdio
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/164/builds/2685 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/1867 Here is the relevant piece of the build log for the reference
|
Reverted the change in 1cc4c87 and working on a fix (to store the saved StringRef in |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/1358 Here is the relevant piece of the build log for the reference
|
…bal resolution in ELF" (#107792) Fix the use-after-free bug and re-apply #106193 * Without the fix, the string referenced by `objSym.Name` could be destroyed even if string saver keeps a copy of the referenced string. This caused use-after-free. * The fix ([latest commit](9776ed4)) updates `objSym.Name` to reference (via `StringRef`) the string saver's copy. Test: 1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible with `-DLLVM_USE_SANITIZER=Address` and gone with the fix. 3. Run all tests by following https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes. * Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at `@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined [here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30) * With the fix, the [multi-stage test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh) pass stage2 {asan, ubsan, masan}. This is also the test used by https://lab.llvm.org/buildbot/#/builders/169 **Original commit message** `StringMap<T>` creates a [copy of the string](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMapEntry.h#L55-L58) for entry insertions and intentionally keep copies [since the implementation optimizes string memory usage](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMap.h#L124). On the other hand, linker keeps copies of symbol names [1] in `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3]. This change proposes to optimize away string copies inside [LTO::GlobalResolutions](https://github.com/llvm/llvm-project/blob/24e791b4164986a1ca7776e3ae0292ef20d20c47/llvm/include/llvm/LTO/LTO.h#L409), which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats. The optimization takes place for lld (ELF) only. For the rest of use cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep copies and use global resolution key for de-duplication. Together with @kazutakahirata's work to make `ComputeCrossModuleImport` more memory efficient, we see a ~20% peak memory usage reduction in a binary where peak memory usage needs to go down. Thanks to the optimization in 329ba52, the max (as opposed to the sum) of `ComputeCrossModuleImport` or `GlobalResolution` shows up in peak memory usage. * Regarding correctness, the set of [resolved](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/lib/LTO/LTO.cpp#L739) [per-module symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L188-L191) is a subset of [llvm::lto::InputFile::Symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L120). And bitcode symbol parsing saves symbol name when iterating `obj->symbols` in `BitcodeFile::parse` already. This change updates `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols. * Presumably the undefined symbols in a LTO unit (copied in this patch in linker unique saver) is a small set compared with the set of symbols in global-resolution (copied before this patch), making this a worthwhile trade-off. Benchmarking this change alone shows measurable memory savings across various benchmarks. [1] ELF https://github.com/llvm/llvm-project/blob/1cea5c2138bef3d8fec75508df6dbb858e6e3560/lld/ELF/InputFiles.cpp#L1748 [2] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2863 [3] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2995
StringMap<T>
creates a copy of the string for entry insertions and intentionally keep copies since the implementation optimizes string memory usage. On the other hand, linker keeps copies of symbol names [1] inlld::elf::parseFiles
[2] before invokingcompileBitcodeFiles
[3].This change proposes to optimize away string copies inside LTO::GlobalResolutions, which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats.
The optimization takes place for lld (ELF) only. For the rest of use cases (gold plugin,
llvm-lto2
, etc), LTO owns a string saver to keep copies and use global resolution key for de-duplication.Together with @kazutakahirata's work to make
ComputeCrossModuleImport
more memory efficient, we see a ~20% peak memory usage reduction in a binary where peak memory usage needs to go down. Thanks to the optimization in 329ba52, the max (as opposed to the sum) ofComputeCrossModuleImport
orGlobalResolution
shows up in peak memory usage.obj->symbols
inBitcodeFile::parse
already. This change updatesBitcodeFile::parseLazy
to keep copies of per-module undefined symbols.[1] ELF
llvm-project/lld/ELF/InputFiles.cpp
Line 1748 in 1cea5c2
[2]
llvm-project/lld/ELF/Driver.cpp
Line 2863 in ef7b18a
[3]
llvm-project/lld/ELF/Driver.cpp
Line 2995 in ef7b18a