-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[BOLT] Skip out-of-range pending relocations #116964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BOLT] Skip out-of-range pending relocations #116964
Conversation
e9b0066
to
b3902db
Compare
@llvm/pr-subscribers-bolt Author: Paschalis Mpeis (paschalis-mpeis) ChangesWhen a pending relocation is created it is also marked whether it is When bolt tries to Background: BOLT can assert if any of these pending relocations cannot be encoded. This patch does not disable this optimization but instead selectively Stacked PR on top of:
Full diff: https://github.com/llvm/llvm-project/pull/116964.diff 6 Files Affected:
diff --git a/bolt/include/bolt/Core/BinarySection.h b/bolt/include/bolt/Core/BinarySection.h
index dedee361882497..37d0932a5c142e 100644
--- a/bolt/include/bolt/Core/BinarySection.h
+++ b/bolt/include/bolt/Core/BinarySection.h
@@ -66,8 +66,10 @@ class BinarySection {
// from the original section address.
RelocationSetType DynamicRelocations;
- // Pending relocations for this section.
- std::vector<Relocation> PendingRelocations;
+ /// Pending relocations for this section and whether they are optional, i.e.,
+ /// added as part of an optimization. In that case they can be safely omitted
+ /// if flushPendingRelocations discovers they cannot be encoded.
+ std::vector<std::pair<Relocation, bool>> PendingRelocations;
struct BinaryPatch {
uint64_t Offset;
@@ -374,9 +376,10 @@ class BinarySection {
DynamicRelocations.emplace(Reloc);
}
- /// Add relocation against the original contents of this section.
- void addPendingRelocation(const Relocation &Rel) {
- PendingRelocations.push_back(Rel);
+ /// Add relocation against the original contents of this section. When added
+ /// as part of an optimization it is marked as \p Optional.
+ void addPendingRelocation(const Relocation &Rel, bool Optional = false) {
+ PendingRelocations.push_back({Rel, Optional});
}
/// Add patch to the input contents of this section.
diff --git a/bolt/include/bolt/Core/Relocation.h b/bolt/include/bolt/Core/Relocation.h
index 933f62a31f8fd7..177cc0c70431f4 100644
--- a/bolt/include/bolt/Core/Relocation.h
+++ b/bolt/include/bolt/Core/Relocation.h
@@ -64,9 +64,12 @@ struct Relocation {
/// and \P Type mismatch occurred.
static bool skipRelocationProcess(uint64_t &Type, uint64_t Contents);
- // Adjust value depending on relocation type (make it PC relative or not)
+ /// Adjust value depending on relocation type (make it PC relative or not).
static uint64_t encodeValue(uint64_t Type, uint64_t Value, uint64_t PC);
+ /// Return true if there are enough bits to encode the relocation value.
+ static bool canEncodeValue(uint64_t Type, uint64_t Value, uint64_t PC);
+
/// Extract current relocated value from binary contents. This is used for
/// RISC architectures where values are encoded in specific bits depending
/// on the relocation value. For X86, we limit to sign extending the value
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 5da777411ba7a1..5d1e5ca92ca131 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1672,7 +1672,7 @@ bool BinaryFunction::scanExternalRefs() {
// Add relocations unless disassembly failed for this function.
if (!DisassemblyFailed)
for (Relocation &Rel : FunctionRelocations)
- getOriginSection()->addPendingRelocation(Rel);
+ getOriginSection()->addPendingRelocation(Rel, /*Optional*/ true);
// Inform BinaryContext that this function symbols will not be defined and
// relocations should not be created against them.
diff --git a/bolt/lib/Core/BinarySection.cpp b/bolt/lib/Core/BinarySection.cpp
index 9ad49ca1b3a038..a37cc7603df285 100644
--- a/bolt/lib/Core/BinarySection.cpp
+++ b/bolt/lib/Core/BinarySection.cpp
@@ -165,11 +165,19 @@ void BinarySection::flushPendingRelocations(raw_pwrite_stream &OS,
OS.pwrite(Patch.Bytes.data(), Patch.Bytes.size(),
SectionFileOffset + Patch.Offset);
- for (Relocation &Reloc : PendingRelocations) {
+ uint64_t SkippedPendingRelocations = 0;
+ for (auto &[Reloc, Optional] : PendingRelocations) {
uint64_t Value = Reloc.Addend;
if (Reloc.Symbol)
Value += Resolver(Reloc.Symbol);
+ // Safely skip any pending relocation that cannot be encoded and was added
+ // as part of an optimization.
+ if (Optional && !Relocation::canEncodeValue(
+ Reloc.Type, Value, SectionAddress + Reloc.Offset)) {
+ ++SkippedPendingRelocations;
+ continue;
+ }
Value = Relocation::encodeValue(Reloc.Type, Value,
SectionAddress + Reloc.Offset);
@@ -188,6 +196,10 @@ void BinarySection::flushPendingRelocations(raw_pwrite_stream &OS,
}
clearList(PendingRelocations);
+
+ if (SkippedPendingRelocations > 0)
+ LLVM_DEBUG(dbgs() << "BOLT-INFO: Skipped " << SkippedPendingRelocations
+ << " pending relocations as they were out of range\n");
}
BinarySection::~BinarySection() { updateContents(nullptr, 0); }
diff --git a/bolt/lib/Core/Relocation.cpp b/bolt/lib/Core/Relocation.cpp
index 4e888a5b147aca..3bc5c20ceb6739 100644
--- a/bolt/lib/Core/Relocation.cpp
+++ b/bolt/lib/Core/Relocation.cpp
@@ -361,6 +361,16 @@ static uint64_t encodeValueX86(uint64_t Type, uint64_t Value, uint64_t PC) {
return Value;
}
+static bool canEncodeValueAArch64(uint64_t Type, uint64_t Value, uint64_t PC) {
+ switch (Type) {
+ default:
+ llvm_unreachable("unsupported relocation");
+ case ELF::R_AARCH64_CALL26:
+ case ELF::R_AARCH64_JUMP26:
+ return isInt<28>(Value - PC);
+ }
+}
+
static uint64_t encodeValueAArch64(uint64_t Type, uint64_t Value, uint64_t PC) {
switch (Type) {
default:
@@ -393,6 +403,16 @@ static uint64_t encodeValueAArch64(uint64_t Type, uint64_t Value, uint64_t PC) {
return Value;
}
+static uint64_t canEncodeValueRISCV(uint64_t Type, uint64_t Value,
+ uint64_t PC) {
+ switch (Type) {
+ default:
+ llvm_unreachable("unsupported relocation");
+ case ELF::R_RISCV_64:
+ return true;
+ }
+}
+
static uint64_t encodeValueRISCV(uint64_t Type, uint64_t Value, uint64_t PC) {
switch (Type) {
default:
@@ -838,6 +858,19 @@ uint64_t Relocation::encodeValue(uint64_t Type, uint64_t Value, uint64_t PC) {
}
}
+bool Relocation::canEncodeValue(uint64_t Type, uint64_t Value, uint64_t PC) {
+ switch (Arch) {
+ default:
+ llvm_unreachable("Unsupported architecture");
+ case Triple::aarch64:
+ return canEncodeValueAArch64(Type, Value, PC);
+ case Triple::riscv64:
+ return canEncodeValueRISCV(Type, Value, PC);
+ case Triple::x86_64:
+ return true;
+ }
+}
+
uint64_t Relocation::extractValue(uint64_t Type, uint64_t Contents,
uint64_t PC) {
switch (Arch) {
diff --git a/bolt/unittests/Core/BinaryContext.cpp b/bolt/unittests/Core/BinaryContext.cpp
index 6e8ad4f62baeff..9ba0bba16c9f35 100644
--- a/bolt/unittests/Core/BinaryContext.cpp
+++ b/bolt/unittests/Core/BinaryContext.cpp
@@ -157,6 +157,38 @@ TEST_P(BinaryContextTester, FlushPendingRelocJUMP26) {
<< "Wrong forward branch value\n";
}
+TEST_P(BinaryContextTester, FlushOptionalOutOfRangePendingRelocCALL26) {
+ if (GetParam() != Triple::aarch64)
+ GTEST_SKIP();
+
+ // This test checks that flushPendingRelocations skips flushing any optional
+ // pending relocations that cannot be encoded.
+
+ bool DebugFlagPrev = ::llvm::DebugFlag;
+ ::llvm::DebugFlag = true;
+ testing::internal::CaptureStderr();
+
+ BinarySection &BS = BC->registerOrUpdateSection(
+ ".text", ELF::SHT_PROGBITS, ELF::SHF_EXECINSTR | ELF::SHF_ALLOC);
+ MCSymbol *RelSymbol = BC->getOrCreateGlobalSymbol(4, "Func");
+ ASSERT_TRUE(RelSymbol);
+ BS.addPendingRelocation(Relocation{8, RelSymbol, ELF::R_AARCH64_CALL26, 0, 0},
+ /*Optional*/ true);
+
+ SmallVector<char> Vect;
+ raw_svector_ostream OS(Vect);
+
+ // Resolve relocation symbol to a high value so encoding will be out of range.
+ BS.flushPendingRelocations(OS, [&](const MCSymbol *S) { return 0x800000F; });
+
+ std::string CapturedStdErr = testing::internal::GetCapturedStderr();
+ EXPECT_TRUE(CapturedStdErr.find("BOLT-INFO: Skipped 1 pending relocations as "
+ "they were out of range") !=
+ std::string::npos);
+
+ ::llvm::DebugFlag = DebugFlagPrev;
+}
+
#endif
TEST_P(BinaryContextTester, BaseAddress) {
|
b3902db
to
d46158c
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Overall, looks good to me.
Let's add Optional
flag to the Relocation
. We can make Relocation::Type
uint32_t
and then the new flag will come with zero overhead.
Hey Maksim, Extending Relocations is even better. Thanks for the suggestion and the review. Before proceeding, and regarding the size overheads, I want to highlight an inconsistency with LLVM’s ObjectFile, where the type is 64 bits (see here). Note that in other places in LLVM codebase relocation types are 32 bits. In our case, BOLT only has 3 inlined sites of this in |
I've put up a separate patch that makes Relocation type 32b and adds an |
This patch converts `Relocations` from a struct to a class, and introduces the `Optional` field. Patch #116964 will use it. Some optimizations, like `scanExternalRefs`, create relocations that patch the old code. Under certain circumstances these may be skipped without correctness implications.
In lite mode, we only emit code for a subset of functions while preserving the original code in .bolt.org.text. This requires updating code references in non-emitted functions to ensure that: * Non-optimized versions of the optimized code never execute. * Function pointer comparison semantics is preserved. On x86-64, we can update code references in-place using "pending relocations" added in scanExternalRefs(). However, on AArch64, this is not always possible due to address range limitations and linker address "relaxation". There are two types of code-to-code references: control transfer (e.g., calls and branches) and function pointer materialization. AArch64-specific control transfer instructions are covered by llvm#116964. For function pointer materialization, simply changing the immediate field of an instruction is not always sufficient. In some cases, we need to modify a pair of instructions, such as undoing linker relaxation and converting NOP+ADR into ADRP+ADD sequence. To achieve this, we use the instruction patch mechanism instead of pending relocations. Instruction patches are emitted via the regular MC layer, just like regular functions. However, they have a fixed address and do not have an associated symbol table entry. This allows us to make more complex changes to the code, ensuring that function pointers are correctly updated. Such mechanism should also be portable to RISC-V and other architectures. To summarize, for AArch64, we extend the scanExternalRefs() process to undo linker relaxation and use instruction patches to partially overwrite unoptimized code.
d46158c
to
b4e906c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Minor nits.
In lite mode, we only emit code for a subset of functions while preserving the original code in .bolt.org.text. This requires updating code references in non-emitted functions to ensure that: * Non-optimized versions of the optimized code never execute. * Function pointer comparison semantics is preserved. On x86-64, we can update code references in-place using "pending relocations" added in scanExternalRefs(). However, on AArch64, this is not always possible due to address range limitations and linker address "relaxation". There are two types of code-to-code references: control transfer (e.g., calls and branches) and function pointer materialization. AArch64-specific control transfer instructions are covered by #116964. For function pointer materialization, simply changing the immediate field of an instruction is not always sufficient. In some cases, we need to modify a pair of instructions, such as undoing linker relaxation and converting NOP+ADR into ADRP+ADD sequence. To achieve this, we use the instruction patch mechanism instead of pending relocations. Instruction patches are emitted via the regular MC layer, just like regular functions. However, they have a fixed address and do not have an associated symbol table entry. This allows us to make more complex changes to the code, ensuring that function pointers are correctly updated. Such mechanism should also be portable to RISC-V and other architectures. To summarize, for AArch64, we extend the scanExternalRefs() process to undo linker relaxation and use instruction patches to partially overwrite unoptimized code.
Hey Maksim, Thanks for your review. Let me know if you have any further suggestions. |
When a pending relocation is created it is also marked whether it is optional or not. It can be optional when such relocation is added as part of an optimization (i.e., `scanExternalRefs`). When bolt tries to `flushPendingRelocations`, it safely skips any optional relocations that cannot be encoded. Background: BOLT, as part of scanExternalRefs, identifies external references from calls and creates some pending relocations for them. Those when flushed will update references to point to the optimized functions. This optimization can be disabled using `--no-scan`. BOLT can assert if any of these pending relocations cannot be encoded. This patch does not disable this optimization but instead selectively applies it given that a pending relocation is optional.
Allow skipping pending relocations only when `-force-patch` is set. Otherwise, exit with a relevant message.
8798806
to
1bb9af5
Compare
forced push to rebase to latest main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Paschalis. Looks great! Please address a tiny nit before commit.
Great, thanks. I'll wait a bit for other reviewers and merge by the EOD. |
When a pending relocation is created it is also marked whether it is
optional or not. It can be optional when such relocation is added as
part of an optimization (i.e.,
scanExternalRefs
).When bolt tries to
flushPendingRelocations
, it safely skips anyoptional relocations that cannot be encoded.
Background:
BOLT, as part of scanExternalRefs, identifies external references from
calls and creates some pending relocations for them. Those when flushed
will update references to point to the optimized functions. This
optimization can be disabled using
--no-scan
.BOLT can assert if any of these pending relocations cannot be encoded.
This patch does not disable this optimization but instead selectively
applies it given that a pending relocation is optional.