Skip to content

[lld][macho] Support order cstrings with -order_file #140307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 5, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 63 additions & 10 deletions lld/MachO/SectionPriorities.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -247,15 +247,13 @@ DenseMap<const InputSection *, int> CallGraphSort::run() {
}

std::optional<int>
macho::PriorityBuilder::getSymbolPriority(const Defined *sym) {
if (sym->isAbsolute())
return std::nullopt;
macho::PriorityBuilder::getSymbolOrCStringPriority(const StringRef key,
InputFile *f) {

auto it = priorities.find(utils::getRootSymbol(sym->getName()));
auto it = priorities.find(key);
if (it == priorities.end())
return std::nullopt;
const SymbolPriorityEntry &entry = it->second;
const InputFile *f = sym->isec()->getFile();
if (!f)
return entry.anyObjectFile;
// We don't use toString(InputFile *) here because it returns the full path
Expand All @@ -269,6 +267,14 @@ macho::PriorityBuilder::getSymbolPriority(const Defined *sym) {
return std::min(entry.objectFiles.lookup(filename), entry.anyObjectFile);
}

std::optional<int>
macho::PriorityBuilder::getSymbolPriority(const Defined *sym) {
if (sym->isAbsolute())
return std::nullopt;
return getSymbolOrCStringPriority(utils::getRootSymbol(sym->getName()),
sym->isec()->getFile());
}

void macho::PriorityBuilder::extractCallGraphProfile() {
TimeTraceScope timeScope("Extract call graph profile");
bool hasOrderFile = !priorities.empty();
Expand Down Expand Up @@ -301,7 +307,7 @@ void macho::PriorityBuilder::parseOrderFile(StringRef path) {
int prio = std::numeric_limits<int>::min();
MemoryBufferRef mbref = *buffer;
for (StringRef line : args::getLines(mbref)) {
StringRef objectFile, symbol;
StringRef objectFile, symbolOrCStrHash;
line = line.take_until([](char c) { return c == '#'; }); // ignore comments
line = line.ltrim();

Expand All @@ -316,7 +322,6 @@ void macho::PriorityBuilder::parseOrderFile(StringRef path) {

if (cpuType != CPU_TYPE_ANY && cpuType != target->cpuType)
continue;

// Drop the CPU type as well as the colon
if (cpuType != CPU_TYPE_ANY)
line = line.drop_until([](char c) { return c == ':'; }).drop_front();
Expand All @@ -331,10 +336,20 @@ void macho::PriorityBuilder::parseOrderFile(StringRef path) {
break;
}
}
symbol = utils::getRootSymbol(line.trim());

if (!symbol.empty()) {
SymbolPriorityEntry &entry = priorities[symbol];
// The rest of the line is either <symbol name> or
// CStringEntryPrefix<cstring hash>
line = line.trim();
if (line.starts_with(CStringEntryPrefix)) {
StringRef possibleHash = line.drop_front(CStringEntryPrefix.size());
uint32_t hash = 0;
if (to_integer(possibleHash, hash))
symbolOrCStrHash = possibleHash;
} else
symbolOrCStrHash = utils::getRootSymbol(line);

if (!symbolOrCStrHash.empty()) {
SymbolPriorityEntry &entry = priorities[symbolOrCStrHash];
if (!objectFile.empty())
entry.objectFiles.insert(std::make_pair(objectFile, prio));
else
Expand Down Expand Up @@ -389,3 +404,41 @@ macho::PriorityBuilder::buildInputSectionPriorities() {

return sectionPriorities;
}

std::vector<StringPiecePair> macho::PriorityBuilder::buildCStringPriorities(
ArrayRef<CStringInputSection *> inputs) {
// Split the input strings into hold and cold sets.
// Order hot set based on -order_file_cstring for performance improvement;
// TODO: Order cold set of cstrings for compression via BP.
std::vector<std::pair<int, StringPiecePair>>
hotStringPrioritiesAndStringPieces;
std::vector<StringPiecePair> coldStringPieces;
std::vector<StringPiecePair> orderedStringPieces;

for (CStringInputSection *isec : inputs) {
for (const auto &[stringPieceIdx, piece] : llvm::enumerate(isec->pieces)) {
if (!piece.live)
continue;

std::optional<int> priority = getSymbolOrCStringPriority(
std::to_string(piece.hash), isec->getFile());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we convert the string hash in the orderfile into an integer (but we don't use the result), then we also convert each hash in the object file to a string to look it up in the priority map. It would be more efficient to create a new priority map, cstrPriorities that maps integer hashes to priorities.

Copy link
Contributor Author

@SharonXSharon SharonXSharon May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if ; is an allowed character in a symbol, should we use that?

@ellishg Do you mean just for CSTR; or do you mean all the other places using : currently should be changed to something like ;?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would use ; in place of :, but this is a breaking change. Maybe in the future we can support both and then eventually drop support for :, but that would definitely require an RFC since it would break existing orderfiles.

For now, I think we can use the CSTR; format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bikeshedding

I believe the only character disallowed in the symbolname would be the null character (at least for ELF). More practical constraints on symbol names arise from the programming language used to generate the executable. So in this case I believe most c-like languages that we care about disallow : and ; in identifier names so they are equal in this regard. Given the existing usage of ':' I'd prefer we continue using the same prefix delimiter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe most c-like languages that we care about disallow : and ; in identifier names

Actually we still support Objective-C which commonly has : in the symbol names. I don't think we should land as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM to proceed with a different delimiter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still support Objective-C which commonly has : in the symbol names.

oh you're right, forgot about the objC symbol names. but on the other hand, all the existing : for CPU and files have already been causing this issue for ObjC symbols names...
maybe right now the best trade-off is to use CSTR;?

if (!priority)
coldStringPieces.emplace_back(isec, stringPieceIdx);
else
hotStringPrioritiesAndStringPieces.emplace_back(
*priority, std::make_pair(isec, stringPieceIdx));
}
}

// Order hot set for perf
llvm::stable_sort(hotStringPrioritiesAndStringPieces);
for (auto &[priority, stringPiecePair] : hotStringPrioritiesAndStringPieces)
orderedStringPieces.push_back(stringPiecePair);

// TODO: Order cold set for compression

orderedStringPieces.insert(orderedStringPieces.end(),
coldStringPieces.begin(), coldStringPieces.end());

return orderedStringPieces;
}
29 changes: 23 additions & 6 deletions lld/MachO/SectionPriorities.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
namespace lld::macho {

using SectionPair = std::pair<const InputSection *, const InputSection *>;
using StringPiecePair = std::pair<CStringInputSection *, size_t>;

class PriorityBuilder {
public:
Expand All @@ -28,17 +29,28 @@ class PriorityBuilder {
//
// An order file has one entry per line, in the following format:
//
// <cpu>:<object file>:<symbol name>
// <cpu>:<object file>:[<symbol name> | CStringEntryPrefix <cstring hash>]
//
// <cpu> and <object file> are optional. If not specified, then that entry
// matches any symbol of that name. Parsing this format is not quite
// straightforward because the symbol name itself can contain colons, so when
// encountering a colon, we consider the preceding characters to decide if it
// can be a valid CPU type or file path.
// <cpu> and <object file> are optional.
// If not specified, then that entry tries to match either,
//
// 1) any symbol of the <symbol name>;
// Parsing this format is not quite straightforward because the symbol name
// itself can contain colons, so when encountering a colon, we consider the
// preceding characters to decide if it can be a valid CPU type or file path.
// If a symbol is matched by multiple entries, then it takes the
// lowest-ordered entry (the one nearest to the front of the list.)
//
// or 2) any cstring literal with the given hash, if the entry has the
// CStringEntryPrefix prefix defined below in the file. <cstring hash> is the
// hash of cstring literal content.
//
// Cstring literals are not symbolized, we can't identify them by name
// However, cstrings are deduplicated, hence unique, so we use the hash of
// the content of cstring literals to identify them and assign priority to it.
// We use the same hash as used in StringPiece, i.e. 31 bit:
// xxh3_64bits(string) & 0x7fffffff
//
// The file can also have line comments that start with '#'.
void parseOrderFile(StringRef path);

Expand All @@ -54,6 +66,8 @@ class PriorityBuilder {
// Each section gets assigned the priority of the highest-priority symbol it
// contains.
llvm::DenseMap<const InputSection *, int> buildInputSectionPriorities();
std::vector<StringPiecePair>
buildCStringPriorities(ArrayRef<CStringInputSection *>);

private:
// The symbol with the smallest priority should be ordered first in the output
Expand All @@ -65,8 +79,11 @@ class PriorityBuilder {
// The priority given to a matching symbol from a particular object file.
llvm::DenseMap<llvm::StringRef, int> objectFiles;
};
const llvm::StringRef CStringEntryPrefix = "CSTR;";

std::optional<int> getSymbolPriority(const Defined *sym);
std::optional<int> getSymbolOrCStringPriority(const StringRef key,
InputFile *f);
llvm::DenseMap<llvm::StringRef, SymbolPriorityEntry> priorities;
llvm::MapVector<SectionPair, uint64_t> callGraphProfile;
};
Expand Down
34 changes: 17 additions & 17 deletions lld/MachO/SyntheticSections.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
#include "InputFiles.h"
#include "ObjC.h"
#include "OutputSegment.h"
#include "SectionPriorities.h"
#include "SymbolTable.h"
#include "Symbols.h"

Expand Down Expand Up @@ -1760,26 +1761,25 @@ void DeduplicatedCStringSection::finalizeContents() {
}
}

// Assign an offset for each string and save it to the corresponding
// Sort the strings for performance and compression size win, and then
// assign an offset for each string and save it to the corresponding
// StringPieces for easy access.
for (CStringInputSection *isec : inputs) {
for (const auto &[i, piece] : llvm::enumerate(isec->pieces)) {
if (!piece.live)
continue;
auto s = isec->getCachedHashStringRef(i);
auto it = stringOffsetMap.find(s);
assert(it != stringOffsetMap.end());
StringOffset &offsetInfo = it->second;
if (offsetInfo.outSecOff == UINT64_MAX) {
offsetInfo.outSecOff =
alignToPowerOf2(size, 1ULL << offsetInfo.trailingZeros);
size =
offsetInfo.outSecOff + s.size() + 1; // account for null terminator
}
piece.outSecOff = offsetInfo.outSecOff;
for (auto &[isec, i] : priorityBuilder.buildCStringPriorities(inputs)) {
auto &piece = isec->pieces[i];
auto s = isec->getCachedHashStringRef(i);
auto it = stringOffsetMap.find(s);
assert(it != stringOffsetMap.end());
lld::macho::DeduplicatedCStringSection::StringOffset &offsetInfo =
it->second;
if (offsetInfo.outSecOff == UINT64_MAX) {
offsetInfo.outSecOff =
alignToPowerOf2(size, 1ULL << offsetInfo.trailingZeros);
size = offsetInfo.outSecOff + s.size() + 1; // account for null terminator
}
isec->isFinal = true;
piece.outSecOff = offsetInfo.outSecOff;
}
for (CStringInputSection *isec : inputs)
isec->isFinal = true;
}

void DeduplicatedCStringSection::writeTo(uint8_t *buf) const {
Expand Down
Loading