ELF: Add branch-to-branch optimization. #138366

Closed
59 changes: 59 additions & 0 deletions lld/ELF/Arch/AArch64.cpp
@@ -11,6 +11,7 @@
#include "Symbols.h"
#include "SyntheticSections.h"
#include "Target.h"
#include "TargetImpl.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/Support/Endian.h"

@@ -82,6 +83,7 @@ class AArch64 : public TargetInfo {
uint64_t val) const override;
RelExpr adjustTlsExpr(RelType type, RelExpr expr) const override;
void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
void applyBranchToBranchOpt() const override;

private:
void relaxTlsGdToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
@@ -974,6 +976,63 @@ void AArch64::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
}
}

static std::optional<uint64_t> getControlTransferAddend(InputSection &is,
Relocation &r) {
// Identify a control transfer relocation for the branch-to-branch
// optimization. A "control transfer relocation" usually means a B or BL
// target but it also includes relative vtable relocations for example.
//
// We require the relocation type to be JUMP26, CALL26 or PLT32. With a
// relocation type of PLT32 the value may be assumed to be used for branching
// directly to the symbol and the addend is only used to produce the relocated
// value (hence the effective addend is always 0). This is because if a PLT is
// needed the addend will be added to the address of the PLT, and it doesn't
// make sense to branch into the middle of a PLT. For example, relative vtable
// relocations use PLT32 and 0 or a positive value as the addend but still are
// used to branch to the symbol.
//
// With JUMP26 or CALL26 the only reasonable interpretation of a non-zero
// addend is that we are branching to symbol+addend so that becomes the
// effective addend.
if (r.type == R_AARCH64_PLT32)
return 0;
if (r.type == R_AARCH64_JUMP26 || r.type == R_AARCH64_CALL26)
return r.addend;
return std::nullopt;
}

static std::pair<Relocation *, uint64_t>
getBranchInfoAtTarget(InputSection &is, uint64_t offset) {
auto *i =
std::partition_point(is.relocations.begin(), is.relocations.end(),
[&](Relocation &r) { return r.offset < offset; });
if (i != is.relocations.end() && i->offset == offset &&
i->type == R_AARCH64_JUMP26) {
return {i, i->addend};
}
return {nullptr, 0};
}

static void redirectControlTransferRelocations(Relocation &r1,
const Relocation &r2) {
r1.expr = r2.expr;
r1.sym = r2.sym;
// With PLT32 we must respect the original addend as that affects the value's
// interpretation. With the other relocation types the original addend is
// irrelevant because it referred to an offset within the original target
// section so we overwrite it.
if (r1.type == R_AARCH64_PLT32)
r1.addend += r2.addend;
else
r1.addend = r2.addend;
}

void AArch64::applyBranchToBranchOpt() const {
applyBranchToBranchOptImpl(ctx, getControlTransferAddend,
getBranchInfoAtTarget,
redirectControlTransferRelocations);
}
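
The lookup in getBranchInfoAtTarget depends on the section's relocations being sorted by offset, which this PR arranges in RelocationScanner::scan when branch-to-branch is enabled. A minimal standalone sketch of that exact-offset lookup follows. `Reloc`, `byOffset` and `findRelocAt` are hypothetical simplified stand-ins and not lld types; the real function additionally checks that the relocation type is R_AARCH64_JUMP26.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for lld's Relocation.
struct Reloc {
  uint64_t offset; // offset of the relocated field within the section
  int type;        // relocation type (opaque in this sketch)
  int64_t addend;
};

// Ordering used both for the precondition check and conceptually for sorting.
inline bool byOffset(const Reloc &a, const Reloc &b) {
  return a.offset < b.offset;
}

// Returns the relocation located exactly at `offset`, or nullptr if none.
// Precondition: `relocs` is sorted by offset.
inline const Reloc *findRelocAt(const std::vector<Reloc> &relocs,
                                uint64_t offset) {
  assert(std::is_sorted(relocs.begin(), relocs.end(), byOffset));
  // partition_point returns the first element for which the predicate is
  // false, i.e. the first relocation whose offset is >= `offset`.
  auto it = std::partition_point(
      relocs.begin(), relocs.end(),
      [&](const Reloc &r) { return r.offset < offset; });
  if (it != relocs.end() && it->offset == offset)
    return &*it;
  return nullptr;
}
```

Because the search is binary, each lookup costs O(log n) in the number of relocations of the target section, which is presumably why the sort was extended to cover this optimization.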

// AArch64 may use security features in variant PLT sequences. These are:
// Pointer Authentication (PAC), introduced in armv8.3-a and Branch Target
// Indicator (BTI) introduced in armv8.5-a. The additional instructions used
93 changes: 93 additions & 0 deletions lld/ELF/Arch/TargetImpl.h
@@ -0,0 +1,93 @@
//===- TargetImpl.h -------------------------------------------------------===//
Member

Nit: switch to the new style as well

//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLD_ELF_ARCH_TARGETIMPL_H
#define LLD_ELF_ARCH_TARGETIMPL_H

#include "InputFiles.h"
#include "InputSection.h"
#include "Relocations.h"
#include "Symbols.h"
#include "llvm/BinaryFormat/ELF.h"

Member

namespace lld::elf

Contributor Author

Done

namespace lld::elf {

// getControlTransferAddend: If this relocation is used for control transfer
// instructions (e.g. branch, branch-link or call) or code references (e.g.
// virtual function pointers) and indicates an address-insignificant reference,
// return the effective addend for the relocation, otherwise return
// std::nullopt. The effective addend for a relocation is the addend that is
// used to determine its branch destination.
//
// getBranchInfoAtTarget: If a control transfer relocation referring to
// is+offset directly transfers control to a relocated branch instruction in the
// specified section, return the relocation for the branch target as well as its
// effective addend (see above). Otherwise return {nullptr, 0}.
//
// redirectControlTransferRelocations: Given r1, a relocation for which
// getControlTransferAddend() returned a value, and r2, a relocation returned by
// getBranchInfoAtTarget(), modify r1 so that it branches directly to the target
// of r2.
template <typename GetControlTransferAddend, typename GetBranchInfoAtTarget,
typename RedirectControlTransferRelocations>
inline void applyBranchToBranchOptImpl(
Ctx &ctx, GetControlTransferAddend getControlTransferAddend,
GetBranchInfoAtTarget getBranchInfoAtTarget,
RedirectControlTransferRelocations redirectControlTransferRelocations) {
// Needs to run serially because it writes to the relocations array as well as
// reading relocations of other sections.
for (ELFFileBase *f : ctx.objectFiles) {
auto getRelocBranchInfo =
[&getBranchInfoAtTarget](
Relocation &r,
uint64_t addend) -> std::pair<Relocation *, uint64_t> {
auto *target = dyn_cast_or_null<Defined>(r.sym);
// We don't allow preemptible symbols or ifuncs (may go somewhere else),
// absolute symbols (runtime behavior unknown), non-executable or writable
// memory (ditto) or non-regular sections (no section data).
if (!target || target->isPreemptible || target->isGnuIFunc() ||
Collaborator

While uncommon, it is possible to have SHT_REL relocs, which may have a non-zero addend. I know of at least one tool that can generate them. I don't think these need to be supported, but it could be worth skipping any that are encountered.

Contributor Author

Shouldn't SHT_REL just work already because we read the implicit addend when producing the Relocation object?

I wanted to add a test case for this but it looks like llvm-mc doesn't have an option to write SHT_REL and instead SHT_REL is tested with yaml2obj hacks, e.g. lld/test/ELF/aarch64-reloc-implicit-addend.test. I think that test is already providing enough coverage of the SHT_REL path (otherwise we would need duplicate and difficult to maintain tests of every feature that processes Relocations).

Collaborator

Yes, just checked and it does copy the relocation addend.

I agree that this wouldn't need a test case.

As an aside, when checking where the addends were read in, I ran into this bit of copyRelocations: https://github.com/llvm/llvm-project/blob/main/lld/ELF/InputSection.cpp#L433

  if (ctx.arg.relax && !ctx.arg.relocatable &&
      (ctx.arg.emachine == EM_RISCV || ctx.arg.emachine == EM_LOONGARCH)) {
    // On LoongArch and RISC-V, relaxation might change relocations: copy
    // from internal ones that are updated by relaxation.
    InputSectionBase *sec = getRelocatedSection();
    copyRelocations<ELFT, RelTy>(
        ctx, buf,
        llvm::make_range(sec->relocations.begin(), sec->relocations.end()));

I think I mentioned in a previous comment that BOLT uses --emit-relocs, so it may be worth following suit here when the transformation is applied.

I suspect that if BOLT trusts the original relocation, then in the worst case the transformation is undone.

Member

I agree that there should be a --emit-relocs test

Contributor Author

Added --emit-relocs test, also switched to the other code path in InputSection.cpp

!target->section ||
!(target->section->flags & llvm::ELF::SHF_EXECINSTR) ||
(target->section->flags & llvm::ELF::SHF_WRITE) ||
target->section->kind() != SectionBase::Regular)
return {nullptr, 0};
return getBranchInfoAtTarget(*cast<InputSection>(target->section),
target->value + addend);
};
for (InputSectionBase *s : f->getSections()) {
if (!s)
continue;
for (Relocation &r : s->relocations) {
std::optional<uint64_t> addend =
getControlTransferAddend(*cast<InputSection>(s), r);
if (!addend)
continue;
std::pair<Relocation *, uint64_t> targetAndAddend =
getRelocBranchInfo(r, *addend);
if (!targetAndAddend.first)
continue;
// Avoid getting stuck in an infinite loop if we encounter a branch
// that (possibly indirectly) branches to itself. It is unlikely
// that more than 5 iterations will ever be needed in practice.
size_t iterations = 5;
while (iterations--) {
std::pair<Relocation *, uint64_t> nextTargetAndAddend =
getRelocBranchInfo(*targetAndAddend.first,
targetAndAddend.second);
if (!nextTargetAndAddend.first)
break;
targetAndAddend = nextTargetAndAddend;
}
redirectControlTransferRelocations(r, *targetAndAddend.first);
}
}
}
}

} // namespace lld::elf

#endif
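
The chain-following loop in applyBranchToBranchOptImpl can be illustrated in isolation. The sketch below is not lld code: it models unconditional branches as a hypothetical address-to-address map (`BranchMap`) and ignores symbols, addends and the section eligibility checks. It only aims to show how the first lookup plus at most 5 further hops resolves a chain while still terminating on cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical model: maps the address of an unconditional branch to the
// address it jumps to. Addresses absent from the map are not branches.
using BranchMap = std::map<uint64_t, uint64_t>;

// Returns the destination of the branch at `addr`, if there is one.
inline std::optional<uint64_t> branchTargetAt(const BranchMap &branches,
                                              uint64_t addr) {
  auto it = branches.find(addr);
  if (it == branches.end())
    return std::nullopt;
  return it->second;
}

// Returns where a control transfer to `dest` should point after the
// optimization: `dest` itself if it is not a branch, otherwise the end of the
// branch chain starting there (bounded, so cycles cannot loop forever).
inline uint64_t resolveDestination(const BranchMap &branches, uint64_t dest) {
  std::optional<uint64_t> target = branchTargetAt(branches, dest);
  if (!target)
    return dest; // Not a branch-to-branch: leave the reference alone.
  // Same shape as the loop in the PR: at most 5 additional hops.
  std::size_t iterations = 5;
  while (iterations--) {
    std::optional<uint64_t> next = branchTargetAt(branches, *target);
    if (!next)
      break;
    target = next;
  }
  return *target;
}
```

With this model a self-branch resolves to itself after the cap is exhausted, and a chain longer than six branches is only partially collapsed, which matches the "unlikely that more than 5 iterations will ever be needed" trade-off described in the comment.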
69 changes: 69 additions & 0 deletions lld/ELF/Arch/X86_64.cpp
@@ -11,6 +11,7 @@
#include "Symbols.h"
#include "SyntheticSections.h"
#include "Target.h"
#include "TargetImpl.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/Support/Endian.h"
#include "llvm/Support/MathExtras.h"
@@ -49,6 +50,7 @@ class X86_64 : public TargetInfo {
bool deleteFallThruJmpInsn(InputSection &is, InputFile *file,
InputSection *nextIS) const override;
bool relaxOnce(int pass) const override;
void applyBranchToBranchOpt() const override;

private:
void relaxTlsGdToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
@@ -1161,6 +1163,73 @@ void X86_64::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
}
}

static std::optional<uint64_t> getControlTransferAddend(InputSection &is,
Relocation &r) {
// Identify a control transfer relocation for the branch-to-branch
// optimization. A "control transfer relocation" usually means a CALL or JMP
// target but it also includes relative vtable relocations for example.
//
// We require the relocation type to be PLT32. With a relocation type of PLT32
// the value may be assumed to be used for branching directly to the symbol
// and the addend is only used to produce the relocated value (hence the
// effective addend is always 0). This is because if a PLT is needed the
// addend will be added to the address of the PLT, and it doesn't make sense
// to branch into the middle of a PLT. For example, relative vtable
// relocations use PLT32 and 0 or a positive value as the addend but still are
// used to branch to the symbol.
//
// STT_SECTION symbols are a special case on x86 because the LLVM assembler
// uses them for branches to local symbols which are assembled as referring to
// the section symbol with the addend equal to the symbol value - 4.
if (r.type == R_X86_64_PLT32) {
if (r.sym->isSection())
return r.addend + 4;
return 0;
}
return std::nullopt;
}

static std::pair<Relocation *, uint64_t>
getBranchInfoAtTarget(InputSection &is, uint64_t offset) {
auto content = is.contentMaybeDecompress();
if (content.size() > offset && content[offset] == 0xe9) { // JMP immediate
auto *i = std::partition_point(
is.relocations.begin(), is.relocations.end(),
[&](Relocation &r) { return r.offset < offset + 1; });
// Unlike with getControlTransferAddend() it is valid to accept a PC32
// relocation here because we know that this is actually a JMP and not some
// other reference, so the interpretation is that we add 4 to the addend and
// use that as the effective addend.
if (i != is.relocations.end() && i->offset == offset + 1 &&
(i->type == R_X86_64_PC32 || i->type == R_X86_64_PLT32)) {
return {i, i->addend + 4};
}
}
return {nullptr, 0};
}

static void redirectControlTransferRelocations(Relocation &r1,
const Relocation &r2) {
// The isSection() check handles the STT_SECTION case described above.
// In that case the original addend is irrelevant because it referred to an
// offset within the original target section so we overwrite it.
//
// The +4 is here to compensate for r2.addend which will likely be -4,
// but may also be addend-4 in case of a PC32 branch to symbol+addend.
if (r1.sym->isSection())
r1.addend = r2.addend;
else
r1.addend += r2.addend + 4;
r1.expr = r2.expr;
r1.sym = r2.sym;
}

void X86_64::applyBranchToBranchOpt() const {
applyBranchToBranchOptImpl(ctx, getControlTransferAddend,
getBranchInfoAtTarget,
redirectControlTransferRelocations);
}
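
The +4 adjustments above follow from how a rel32 branch is encoded: the relocated field holds S + A - P (P being the address of the 4-byte field), and the CPU adds that displacement to the address of the next instruction, P + 4, so the branch lands at S + A + 4. The sketch below works through the addend arithmetic under that assumption. `Reloc`, `branchDestination` and `redirect` are hypothetical simplifications (a symbol is modeled by its address only) and are not lld's types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified relocation: the symbol is just an address plus a
// flag saying whether it is an STT_SECTION symbol.
struct Reloc {
  uint64_t symAddr;
  int64_t addend;
  bool symIsSection;
};

// Destination of a JMP/CALL rel32 whose displacement field is relocated by
// `r`: (S + A - P) + (P + 4) = S + A + 4. Unsigned wraparound makes the
// mixed signed/unsigned addition behave as two's complement arithmetic.
inline uint64_t branchDestination(const Reloc &r) {
  return r.symAddr + static_cast<uint64_t>(r.addend) + 4;
}

// Mirrors the addend handling of redirectControlTransferRelocations for
// x86-64: overwrite the addend for section symbols, otherwise fold in the
// effective addend of r2 (r2.addend + 4).
inline void redirect(Reloc &r1, const Reloc &r2) {
  if (r1.symIsSection)
    r1.addend = r2.addend;
  else
    r1.addend += r2.addend + 4;
  r1.symAddr = r2.symAddr;
  r1.symIsSection = r2.symIsSection;
}
```

For the common case of `call thunk` where the thunk is `jmp real`, both addends are -4, so the redirected addend stays -4 and the call lands on `real`. If the thunk instead jumps to `real+8` through a PC32 relocation with addend 4, the redirected call gets addend 4 and lands on `real+8`.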

// If Intel Indirect Branch Tracking is enabled, we have to emit special PLT
// entries containing endbr64 instructions. A PLT entry will be split into two
// parts, one in .plt.sec (writePlt), and the other in .plt (writeIBTPlt).
1 change: 1 addition & 0 deletions lld/ELF/Config.h
@@ -302,6 +302,7 @@ struct Config {
bool bpFunctionOrderForCompression = false;
bool bpDataOrderForCompression = false;
bool bpVerboseSectionOrderer = false;
bool branchToBranch = false;
bool checkSections;
bool checkDynamicRelocs;
std::optional<llvm::DebugCompressionType> compressDebugSections;
2 changes: 2 additions & 0 deletions lld/ELF/Driver.cpp
@@ -1644,6 +1644,8 @@ static void readConfigs(Ctx &ctx, opt::InputArgList &args) {
ctx.arg.zWxneeded = hasZOption(args, "wxneeded");
setUnresolvedSymbolPolicy(ctx, args);
ctx.arg.power10Stubs = args.getLastArgValue(OPT_power10_stubs_eq) != "no";
ctx.arg.branchToBranch = args.hasFlag(
OPT_branch_to_branch, OPT_no_branch_to_branch, ctx.arg.optimize >= 2);

if (opt::Arg *arg = args.getLastArg(OPT_eb, OPT_el)) {
if (arg->getOption().matches(OPT_eb))
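
The `hasFlag` call in this hunk gives the option three-way behavior: the last of `--branch-to-branch` / `--no-branch-to-branch` on the command line wins, and when neither is present the default is `ctx.arg.optimize >= 2`. A rough standalone model of that resolution is sketched here. `resolveBranchToBranch` is a hypothetical helper operating on raw strings; it does not use LLVM's option library and ignores everything else lld's driver does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of how the setting is derived: start from the -O based
// default, then let each explicit flag override it in command-line order.
inline bool resolveBranchToBranch(const std::vector<std::string> &args,
                                  int optLevel) {
  assert(optLevel >= 0 && "optimization level is expected to be non-negative");
  bool enabled = optLevel >= 2; // default when neither flag is given
  for (const std::string &arg : args) {
    if (arg == "--branch-to-branch")
      enabled = true;
    else if (arg == "--no-branch-to-branch")
      enabled = false;
  }
  return enabled;
}
```

Under this model `-O2` alone enables the optimization, `-O2 --no-branch-to-branch` disables it, and `--branch-to-branch` enables it at any optimization level.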
6 changes: 4 additions & 2 deletions lld/ELF/InputSection.cpp
@@ -430,8 +430,10 @@ InputSectionBase *InputSection::getRelocatedSection() const {

template <class ELFT, class RelTy>
void InputSection::copyRelocations(Ctx &ctx, uint8_t *buf) {
- if (ctx.arg.relax && !ctx.arg.relocatable &&
- (ctx.arg.emachine == EM_RISCV || ctx.arg.emachine == EM_LOONGARCH)) {
+ if (!ctx.arg.relocatable &&
Member

The condition is now complex... Perhaps define a variable for linker relaxation targets (RISCV,LoongArch)?

bool linkerRelax = ctx.arg.relax && is_contained({EM_RISCV, EM_LOONGARCH}, ctx.arg.emachine);
...
(linkerRelax || ctx.arg.branchToBranch)

+ ((ctx.arg.relax &&
+ (ctx.arg.emachine == EM_RISCV || ctx.arg.emachine == EM_LOONGARCH)) ||
+ ctx.arg.branchToBranch)) {
// On LoongArch and RISC-V, relaxation might change relocations: copy
// from internal ones that are updated by relaxation.
InputSectionBase *sec = getRelocatedSection();
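
The reviewer's suggestion to name the linker-relaxation part of the condition can be checked for equivalence with the condition as written in the diff. The sketch below does this over a hypothetical `Args` struct and `Machine` enum that only mimic the relevant fields of lld's configuration; the function names are illustrative and not part of lld.

```cpp
#include <cassert>

// Hypothetical stand-ins for the relevant configuration fields.
enum class Machine { AArch64, X86_64, RISCV, LoongArch };

struct Args {
  bool relax;
  bool relocatable;
  bool branchToBranch;
  Machine emachine;
};

// The condition as written in the diff.
inline bool usesInternalRelocsOriginal(const Args &a) {
  return !a.relocatable &&
         ((a.relax && (a.emachine == Machine::RISCV ||
                       a.emachine == Machine::LoongArch)) ||
          a.branchToBranch);
}

// The condition with the suggested named intermediate.
inline bool usesInternalRelocsRefactored(const Args &a) {
  bool linkerRelax = a.relax && (a.emachine == Machine::RISCV ||
                                 a.emachine == Machine::LoongArch);
  return !a.relocatable && (linkerRelax || a.branchToBranch);
}

// Exhaustively compares both forms over every combination of inputs.
inline bool conditionsAgree() {
  const Machine machines[] = {Machine::AArch64, Machine::X86_64,
                              Machine::RISCV, Machine::LoongArch};
  for (Machine m : machines) {
    for (int bits = 0; bits < 8; ++bits) {
      Args a{(bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0, m};
      if (usesInternalRelocsOriginal(a) != usesInternalRelocsRefactored(a))
        return false;
    }
  }
  return true;
}
```

Either form means that, for non-relocatable links, the relocations emitted by `--emit-relocs` come from the internal (possibly rewritten) relocation list whenever branch-to-branch is on, regardless of target.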
4 changes: 4 additions & 0 deletions lld/ELF/Options.td
@@ -59,6 +59,10 @@ def build_id: J<"build-id=">, HelpText<"Generate build ID note">,
MetaVarName<"[fast,md5,sha1,uuid,0x<hexstring>]">;
def : F<"build-id">, Alias<build_id>, AliasArgs<["sha1"]>, HelpText<"Alias for --build-id=sha1">;

defm branch_to_branch: BB<"branch-to-branch",
"Enable branch-to-branch optimization (default at -O2)",
"Disable branch-to-branch optimization (default at -O0 and -O1)">;

defm check_sections: B<"check-sections",
"Check section addresses for overlaps (default)",
"Do not check section addresses for overlaps">;
8 changes: 6 additions & 2 deletions lld/ELF/Relocations.cpp
@@ -1665,9 +1665,10 @@ void RelocationScanner::scan(Relocs<RelTy> rels) {
}

// Sort relocations by offset for more efficient searching for
- // R_RISCV_PCREL_HI20 and R_PPC64_ADDR64.
+ // R_RISCV_PCREL_HI20, R_PPC64_ADDR64 and the branch-to-branch optimization.
if (ctx.arg.emachine == EM_RISCV ||
- (ctx.arg.emachine == EM_PPC64 && sec->name == ".toc"))
+ (ctx.arg.emachine == EM_PPC64 && sec->name == ".toc") ||
+ ctx.arg.branchToBranch)
llvm::stable_sort(sec->relocs(),
[](const Relocation &lhs, const Relocation &rhs) {
return lhs.offset < rhs.offset;
@@ -1958,6 +1959,9 @@ void elf::postScanRelocations(Ctx &ctx) {
for (ELFFileBase *file : ctx.objectFiles)
for (Symbol *sym : file->getLocalSymbols())
fn(*sym);

if (ctx.arg.branchToBranch)
ctx.target->applyBranchToBranchOpt();
}

static bool mergeCmp(const InputSection *a, const InputSection *b) {
1 change: 1 addition & 0 deletions lld/ELF/Target.h
@@ -101,6 +101,7 @@ class TargetInfo {

virtual void applyJumpInstrMod(uint8_t *loc, JumpModType type,
JumpModType val) const {}
virtual void applyBranchToBranchOpt() const {}

virtual ~TargetInfo();

8 changes: 6 additions & 2 deletions lld/docs/ld.lld.1
@@ -93,6 +93,10 @@ Bind default visibility defined STB_GLOBAL function symbols locally for
.Fl shared.
.It Fl -be8
Write a Big Endian ELF File using BE8 format(AArch32 only)
.It Fl -branch-to-branch
Enable the branch-to-branch optimization: a branch whose target is
another branch instruction is rewritten to point to the latter branch's
target (AArch64 and X86_64 only). Enabled by default at -O2.
Member

-O2 should use .Fl O2. (unfortunately the dot after -O2 has to be highlighted as well...)

Collaborator

.Fl O2 Ns .?

.It Fl -build-id Ns = Ns Ar value
Generate a build ID note.
.Ar value
@@ -414,7 +418,7 @@ If not specified,
.Dv a.out
is used as a default.
.It Fl O Ns Ar value
- Optimize output file size.
+ Optimize output file.
.Ar value
may be:
.Pp
@@ -424,7 +428,7 @@ Disable string merging.
.It Cm 1
Enable string merging.
.It Cm 2
- Enable string tail merging.
+ Enable string tail merging and branch-to-branch optimization.
.El
.Pp
.Fl O Ns Cm 1