-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lld-macho] Implement symbol string deduplication #123874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-macho Author: None (alx32) ChangesThe symbol string table does not have deduplication. We enable deduplication by default and add a flag to disable it ( Full diff: https://github.com/llvm/llvm-project/pull/123874.diff 6 Files Affected:
diff --git a/lld/MachO/Config.h b/lld/MachO/Config.h
index d41ca5382c692a2..edc1246acd3e42d 100644
--- a/lld/MachO/Config.h
+++ b/lld/MachO/Config.h
@@ -256,6 +256,8 @@ struct Configuration {
llvm::MachO::PlatformType platform() const {
return platformInfo.target.Platform;
}
+
+ bool deduplicateSymbolStrings = true;
};
extern std::unique_ptr<Configuration> config;
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index 31630ba7d69de28..9c63ab0739a8a5c 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -1806,6 +1806,8 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
config->keepICFStabs = args.hasArg(OPT_keep_icf_stabs);
config->dedupStrings =
args.hasFlag(OPT_deduplicate_strings, OPT_no_deduplicate_strings, true);
+ config->deduplicateSymbolStrings =
+ !args.hasArg(OPT_no_deduplicate_symbol_strings);
config->deadStripDuplicates = args.hasArg(OPT_dead_strip_duplicates);
config->warnDylibInstallName = args.hasFlag(
OPT_warn_dylib_install_name, OPT_no_warn_dylib_install_name, false);
diff --git a/lld/MachO/Options.td b/lld/MachO/Options.td
index 4c89f96c3ebaadf..9001e85582c1244 100644
--- a/lld/MachO/Options.td
+++ b/lld/MachO/Options.td
@@ -1476,3 +1476,8 @@ def no_warn_duplicate_libraries : Flag<["-"], "no_warn_duplicate_libraries">,
HelpText<"Do not warn if the input contains duplicate library options.">,
Flags<[HelpHidden]>,
Group<grp_ignored_silently>;
+
+// Add this with the other flags in the rare options group
+def no_deduplicate_symbol_strings : Flag<["-"], "no-deduplicate-symbol-strings">,
+ HelpText<"Do not deduplicate strings in the symbol string table. Might result in larger binaries but slightly faster link times.">,
+ Group<grp_rare>;
diff --git a/lld/MachO/SyntheticSections.cpp b/lld/MachO/SyntheticSections.cpp
index 417b7cf93efa7db..313860128bcafb7 100644
--- a/lld/MachO/SyntheticSections.cpp
+++ b/lld/MachO/SyntheticSections.cpp
@@ -1540,9 +1540,24 @@ StringTableSection::StringTableSection()
: LinkEditSection(segment_names::linkEdit, section_names::stringTable) {}
uint32_t StringTableSection::addString(StringRef str) {
+ // If deduplication is disabled, just add the string
+ if (!config->deduplicateSymbolStrings) {
+ uint32_t strx = size;
+ strings.push_back(str);
+ size += str.size() + 1; // +1 for null terminator
+ return strx;
+ }
+
+ // Deduplicate strings
+ llvm::CachedHashStringRef hashedStr(str);
+ auto it = stringMap.find(hashedStr);
+ if (it != stringMap.end())
+ return it->second;
+
uint32_t strx = size;
- strings.push_back(str); // TODO: consider deduplicating strings
- size += str.size() + 1; // account for null terminator
+ stringMap[hashedStr] = strx;
+ strings.push_back(str);
+ size += str.size() + 1;
return strx;
}
diff --git a/lld/MachO/SyntheticSections.h b/lld/MachO/SyntheticSections.h
index af99f22788d6e99..5796b0790c83a09 100644
--- a/lld/MachO/SyntheticSections.h
+++ b/lld/MachO/SyntheticSections.h
@@ -447,6 +447,7 @@ class StringTableSection final : public LinkEditSection {
// match its behavior here since some tools depend on it.
// Consequently, the empty string will be at index 1, not zero.
std::vector<StringRef> strings{" "};
+ llvm::DenseMap<llvm::CachedHashStringRef, uint32_t> stringMap;
size_t size = 2;
};
diff --git a/lld/test/MachO/cfstring-dedup.s b/lld/test/MachO/cfstring-dedup.s
index fb121cde3e9585c..7e1772600b427f2 100644
--- a/lld/test/MachO/cfstring-dedup.s
+++ b/lld/test/MachO/cfstring-dedup.s
@@ -7,6 +7,14 @@
# RUN: %lld -dylib -framework CoreFoundation %t/foo1.o %t/foo2.o -o %t/foo
# RUN: llvm-objdump --no-print-imm-hex --macho --rebase --bind --syms -d %t/foo | FileCheck %s --check-prefix=LITERALS
+# Check that string deduplication for symbol names is working
+# RUN: %lld -dylib -framework CoreFoundation %t/foo1.o %t/foo2.o -o %t/foo_no_dedup -no-deduplicate-symbol-strings
+# RUN: count1=$((`llvm-strings %t/foo | grep _named_cfstring | wc -l`))
+# RUN: test $count1 -eq 1
+# RUN: count2=$((`llvm-strings %t/foo_no_dedup | grep _named_cfstring | wc -l`))
+# RUN: test $count2 -eq 2
+
+
# CHECK: (__TEXT,__text) section
# CHECK-NEXT: _foo1:
# CHECK-NEXT: _foo2:
|
Any reason why this needs a dedicated flag and isn't folded with the original |
✅ With the latest revision this PR passed the C/C++ code formatter. |
@thevinster - I think of program string literals and metadata strings differently. For example you might want to disable string deduplication if using string pointers as unique keys. Since symbol strings are program metadata, they have different reasons for enabling / disabling deduplication. So I was thinking we would want different flags to control enabling/disabling the deduplication. |
Okay, if there's a reason to disable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as soon as the question of having a separate flag is resolved.
lld/MachO/Config.h
Outdated
@@ -256,6 +256,8 @@ struct Configuration { | |||
llvm::MachO::PlatformType platform() const { | |||
return platformInfo.target.Platform; | |||
} | |||
|
|||
bool deduplicateSymbolStrings = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably makes sense to put this next to
llvm-project/lld/MachO/Config.h
Line 145 in 2a51a0d
bool dedupStrings = true; |
|
||
// Add this with the other flags in the rare options group | ||
def no_deduplicate_symbol_strings : Flag<["-"], "no-deduplicate-symbol-strings">, | ||
HelpText<"Do not deduplicate strings in the symbol string table. Might result in larger binaries but slightly faster link times.">, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to also note that "no_dedup..." takes higher priority if both flags are present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which flags are you referring to ? This is enabled by default with only a flag to disable it - there is no flag to enable. Similar to -no_deduplicate
in apple linker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh NVM, I thought OPT_deduplicate_strings
needs to be explicitly turned on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted offline. The two flags are fine. Apple's -no_deduplicate
seems to be orthogonal to symbol string deduplication, and combining it under one flag could create confusion. Two flags also provides more flexibility.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/12286 Here is the relevant piece of the build log for the reference
|
Do you have numbers on the performance cost of this? |
@nico - here are perf numbers for a
So a |
lld20 introduces a symbol string deduplication pass, reducing binary sizes on macOS (llvm/llvm-project#123874). For debug binaries with linker optimizations turned on, binary size improves by 930KB to 235.4MB, gains for release binaries should be similar. Run `brew update` on CI to make this version available.
lld20 introduces a symbol string deduplication pass, reducing binary sizes on macOS (llvm/llvm-project#123874). For debug binaries with linker optimizations turned on, binary size improves by 930KB to 235.4MB, gains for release binaries should be similar. Run `brew update` on CI to make this version available.
lld20 introduces a symbol string deduplication pass, reducing binary sizes on macOS (llvm/llvm-project#123874). For debug binaries with linker optimizations turned on, binary size improves by 930KB to 235.4MB, gains for release binaries should be similar. Run `brew update` on CI to make this version available.
The symbol string table does not have deduplication.
Here we add code to deduplicate the symbol string table.
This has a rather large size impact (20-30%) on unstripped binaries (typically debug binaries) but no size impact on stripped binaries(typically release binaries).
We enable deduplication by default and add a flag to disable it (
-no-deduplicate-symbol-strings
).