Skip to content

[BOLT] Skip out-of-range pending relocations #116964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 4, 2025

Conversation

paschalis-mpeis
Copy link
Member

@paschalis-mpeis paschalis-mpeis commented Nov 20, 2024

When a pending relocation is created it is also marked whether it is
optional or not. It can be optional when such relocation is added as
part of an optimization (i.e., scanExternalRefs).

When bolt tries to flushPendingRelocations, it safely skips any
optional relocations that cannot be encoded.

Background:
BOLT, as part of scanExternalRefs, identifies external references from
calls and creates some pending relocations for them. Those when flushed
will update references to point to the optimized functions. This
optimization can be disabled using --no-scan.

BOLT can assert if any of these pending relocations cannot be encoded.

This patch does not disable this optimization but instead selectively
applies it given that a pending relocation is optional.

@paschalis-mpeis paschalis-mpeis force-pushed the users/paschalis-mpeis/skip-oob-pending-relocs branch from e9b0066 to b3902db Compare January 20, 2025 16:27
@paschalis-mpeis paschalis-mpeis changed the base branch from main to users/paschalis-mpeis/bolt-addrelocs-nopending January 20, 2025 16:27
@paschalis-mpeis paschalis-mpeis marked this pull request as ready for review January 20, 2025 16:34
@llvmbot llvmbot added the BOLT label Jan 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 20, 2025

@llvm/pr-subscribers-bolt

Author: Paschalis Mpeis (paschalis-mpeis)

Changes

When a pending relocation is created it is also marked whether it is
optional or not. It can be optional when such relocation is added as
part of an optimization (i.e., scanExternalRefs).

When bolt tries to flushPendingRelocations, it safely skips any
optional relocations that cannot be encoded.

Background:
BOLT, as part of scanExternalRefs, identifies external references from
calls and creates some pending relocations for them. Those when flushed
will update references to point to the optimized functions. This
optimization can be disabled using --no-scan.

BOLT can assert if any of these pending relocations cannot be encoded.

This patch does not disable this optimization but instead selectively
applies it given that a pending relocation is optional.


Stacked PR on top of:

  • #123635

Full diff: https://github.com/llvm/llvm-project/pull/116964.diff

6 Files Affected:

  • (modified) bolt/include/bolt/Core/BinarySection.h (+8-5)
  • (modified) bolt/include/bolt/Core/Relocation.h (+4-1)
  • (modified) bolt/lib/Core/BinaryFunction.cpp (+1-1)
  • (modified) bolt/lib/Core/BinarySection.cpp (+13-1)
  • (modified) bolt/lib/Core/Relocation.cpp (+33)
  • (modified) bolt/unittests/Core/BinaryContext.cpp (+32)
diff --git a/bolt/include/bolt/Core/BinarySection.h b/bolt/include/bolt/Core/BinarySection.h
index dedee361882497..37d0932a5c142e 100644
--- a/bolt/include/bolt/Core/BinarySection.h
+++ b/bolt/include/bolt/Core/BinarySection.h
@@ -66,8 +66,10 @@ class BinarySection {
   // from the original section address.
   RelocationSetType DynamicRelocations;
 
-  // Pending relocations for this section.
-  std::vector<Relocation> PendingRelocations;
+  /// Pending relocations for this section and whether they are optional, i.e.,
+  /// added as part of an optimization. In that case they can be safely omitted
+  /// if flushPendingRelocations discovers they cannot be encoded.
+  std::vector<std::pair<Relocation, bool>> PendingRelocations;
 
   struct BinaryPatch {
     uint64_t Offset;
@@ -374,9 +376,10 @@ class BinarySection {
     DynamicRelocations.emplace(Reloc);
   }
 
-  /// Add relocation against the original contents of this section.
-  void addPendingRelocation(const Relocation &Rel) {
-    PendingRelocations.push_back(Rel);
+  /// Add relocation against the original contents of this section. When added
+  /// as part of an optimization it is marked as \p Optional.
+  void addPendingRelocation(const Relocation &Rel, bool Optional = false) {
+    PendingRelocations.push_back({Rel, Optional});
   }
 
   /// Add patch to the input contents of this section.
diff --git a/bolt/include/bolt/Core/Relocation.h b/bolt/include/bolt/Core/Relocation.h
index 933f62a31f8fd7..177cc0c70431f4 100644
--- a/bolt/include/bolt/Core/Relocation.h
+++ b/bolt/include/bolt/Core/Relocation.h
@@ -64,9 +64,12 @@ struct Relocation {
   /// and \P Type mismatch occurred.
   static bool skipRelocationProcess(uint64_t &Type, uint64_t Contents);
 
-  // Adjust value depending on relocation type (make it PC relative or not)
+  /// Adjust value depending on relocation type (make it PC relative or not).
   static uint64_t encodeValue(uint64_t Type, uint64_t Value, uint64_t PC);
 
+  /// Return true if there are enough bits to encode the relocation value.
+  static bool canEncodeValue(uint64_t Type, uint64_t Value, uint64_t PC);
+
   /// Extract current relocated value from binary contents. This is used for
   /// RISC architectures where values are encoded in specific bits depending
   /// on the relocation value. For X86, we limit to sign extending the value
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 5da777411ba7a1..5d1e5ca92ca131 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1672,7 +1672,7 @@ bool BinaryFunction::scanExternalRefs() {
   // Add relocations unless disassembly failed for this function.
   if (!DisassemblyFailed)
     for (Relocation &Rel : FunctionRelocations)
-      getOriginSection()->addPendingRelocation(Rel);
+      getOriginSection()->addPendingRelocation(Rel, /*Optional*/ true);
 
   // Inform BinaryContext that this function symbols will not be defined and
   // relocations should not be created against them.
diff --git a/bolt/lib/Core/BinarySection.cpp b/bolt/lib/Core/BinarySection.cpp
index 9ad49ca1b3a038..a37cc7603df285 100644
--- a/bolt/lib/Core/BinarySection.cpp
+++ b/bolt/lib/Core/BinarySection.cpp
@@ -165,11 +165,19 @@ void BinarySection::flushPendingRelocations(raw_pwrite_stream &OS,
     OS.pwrite(Patch.Bytes.data(), Patch.Bytes.size(),
               SectionFileOffset + Patch.Offset);
 
-  for (Relocation &Reloc : PendingRelocations) {
+  uint64_t SkippedPendingRelocations = 0;
+  for (auto &[Reloc, Optional] : PendingRelocations) {
     uint64_t Value = Reloc.Addend;
     if (Reloc.Symbol)
       Value += Resolver(Reloc.Symbol);
 
+    // Safely skip any pending relocation that cannot be encoded and was added
+    // as part of an optimization.
+    if (Optional && !Relocation::canEncodeValue(
+                        Reloc.Type, Value, SectionAddress + Reloc.Offset)) {
+      ++SkippedPendingRelocations;
+      continue;
+    }
     Value = Relocation::encodeValue(Reloc.Type, Value,
                                     SectionAddress + Reloc.Offset);
 
@@ -188,6 +196,10 @@ void BinarySection::flushPendingRelocations(raw_pwrite_stream &OS,
   }
 
   clearList(PendingRelocations);
+
+  if (SkippedPendingRelocations > 0)
+    LLVM_DEBUG(dbgs() << "BOLT-INFO: Skipped " << SkippedPendingRelocations
+                      << " pending relocations as they were out of range\n");
 }
 
 BinarySection::~BinarySection() { updateContents(nullptr, 0); }
diff --git a/bolt/lib/Core/Relocation.cpp b/bolt/lib/Core/Relocation.cpp
index 4e888a5b147aca..3bc5c20ceb6739 100644
--- a/bolt/lib/Core/Relocation.cpp
+++ b/bolt/lib/Core/Relocation.cpp
@@ -361,6 +361,16 @@ static uint64_t encodeValueX86(uint64_t Type, uint64_t Value, uint64_t PC) {
   return Value;
 }
 
+static bool canEncodeValueAArch64(uint64_t Type, uint64_t Value, uint64_t PC) {
+  switch (Type) {
+  default:
+    llvm_unreachable("unsupported relocation");
+  case ELF::R_AARCH64_CALL26:
+  case ELF::R_AARCH64_JUMP26:
+    return isInt<28>(Value - PC);
+  }
+}
+
 static uint64_t encodeValueAArch64(uint64_t Type, uint64_t Value, uint64_t PC) {
   switch (Type) {
   default:
@@ -393,6 +403,16 @@ static uint64_t encodeValueAArch64(uint64_t Type, uint64_t Value, uint64_t PC) {
   return Value;
 }
 
+static uint64_t canEncodeValueRISCV(uint64_t Type, uint64_t Value,
+                                    uint64_t PC) {
+  switch (Type) {
+  default:
+    llvm_unreachable("unsupported relocation");
+  case ELF::R_RISCV_64:
+    return true;
+  }
+}
+
 static uint64_t encodeValueRISCV(uint64_t Type, uint64_t Value, uint64_t PC) {
   switch (Type) {
   default:
@@ -838,6 +858,19 @@ uint64_t Relocation::encodeValue(uint64_t Type, uint64_t Value, uint64_t PC) {
   }
 }
 
+bool Relocation::canEncodeValue(uint64_t Type, uint64_t Value, uint64_t PC) {
+  switch (Arch) {
+  default:
+    llvm_unreachable("Unsupported architecture");
+  case Triple::aarch64:
+    return canEncodeValueAArch64(Type, Value, PC);
+  case Triple::riscv64:
+    return canEncodeValueRISCV(Type, Value, PC);
+  case Triple::x86_64:
+    return true;
+  }
+}
+
 uint64_t Relocation::extractValue(uint64_t Type, uint64_t Contents,
                                   uint64_t PC) {
   switch (Arch) {
diff --git a/bolt/unittests/Core/BinaryContext.cpp b/bolt/unittests/Core/BinaryContext.cpp
index 6e8ad4f62baeff..9ba0bba16c9f35 100644
--- a/bolt/unittests/Core/BinaryContext.cpp
+++ b/bolt/unittests/Core/BinaryContext.cpp
@@ -157,6 +157,38 @@ TEST_P(BinaryContextTester, FlushPendingRelocJUMP26) {
       << "Wrong forward branch value\n";
 }
 
+TEST_P(BinaryContextTester, FlushOptionalOutOfRangePendingRelocCALL26) {
+  if (GetParam() != Triple::aarch64)
+    GTEST_SKIP();
+
+  // This test checks that flushPendingRelocations skips flushing any optional
+  // pending relocations that cannot be encoded.
+
+  bool DebugFlagPrev = ::llvm::DebugFlag;
+  ::llvm::DebugFlag = true;
+  testing::internal::CaptureStderr();
+
+  BinarySection &BS = BC->registerOrUpdateSection(
+      ".text", ELF::SHT_PROGBITS, ELF::SHF_EXECINSTR | ELF::SHF_ALLOC);
+  MCSymbol *RelSymbol = BC->getOrCreateGlobalSymbol(4, "Func");
+  ASSERT_TRUE(RelSymbol);
+  BS.addPendingRelocation(Relocation{8, RelSymbol, ELF::R_AARCH64_CALL26, 0, 0},
+                          /*Optional*/ true);
+
+  SmallVector<char> Vect;
+  raw_svector_ostream OS(Vect);
+
+  // Resolve relocation symbol to a high value so encoding will be out of range.
+  BS.flushPendingRelocations(OS, [&](const MCSymbol *S) { return 0x800000F; });
+
+  std::string CapturedStdErr = testing::internal::GetCapturedStderr();
+  EXPECT_TRUE(CapturedStdErr.find("BOLT-INFO: Skipped 1 pending relocations as "
+                                  "they were out of range") !=
+              std::string::npos);
+
+  ::llvm::DebugFlag = DebugFlagPrev;
+}
+
 #endif
 
 TEST_P(BinaryContextTester, BaseAddress) {

@paschalis-mpeis
Copy link
Member Author

Forced-pushed to add the missing code. Also, this PR on now stacked top of #123635.
Thanks for the comments @maks. I am not sure if your concern on the issue still stands or not.

Base automatically changed from users/paschalis-mpeis/bolt-addrelocs-nopending to main February 12, 2025 15:24
@paschalis-mpeis paschalis-mpeis force-pushed the users/paschalis-mpeis/skip-oob-pending-relocs branch from b3902db to d46158c Compare February 20, 2025 16:45
@paschalis-mpeis paschalis-mpeis changed the base branch from main to users/paschalis-mpeis/move-force-path-to-cliopts February 20, 2025 16:46
@paschalis-mpeis
Copy link
Member Author

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Overall, looks good to me.

Let's add Optional flag to the Relocation. We can make Relocation::Type uint32_t and then the new flag will come with zero overhead.

@paschalis-mpeis
Copy link
Member Author

paschalis-mpeis commented Mar 6, 2025

Hey Maksim,

Extending Relocations is even better. Thanks for the suggestion and the review.

Before proceeding, and regarding the size overheads, I want to highlight an inconsistency with LLVM’s ObjectFile, where the type is 64 bits (see here). Note that in other places in LLVM codebase relocation types are 32 bits.

In our case, BOLT only has 3 inlined sites of this in RewriteInstance (eg one is here). If you agree, I'll proceed with an NFCI change, adding assertion overflow checks at these 3 sites.

@paschalis-mpeis
Copy link
Member Author

I've put up a separate patch that makes Relocation type 32b and adds an Optional field:

paschalis-mpeis added a commit that referenced this pull request Mar 20, 2025
This patch converts `Relocations` from a struct to a class, and
introduces the `Optional` field. Patch #116964 will use it.

Some optimizations, like `scanExternalRefs`, create relocations that
patch the old code. Under certain circumstances these may be skipped
without correctness implications.
@paschalis-mpeis paschalis-mpeis marked this pull request as draft March 20, 2025 18:05
Base automatically changed from users/paschalis-mpeis/move-force-path-to-cliopts to main March 21, 2025 15:55
maksfb added a commit to maksfb/llvm-project that referenced this pull request Mar 25, 2025
In lite mode, we only emit code for a subset of functions while
preserving the original code in .bolt.org.text. This requires updating
code references in non-emitted functions to ensure that:

* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.

On x86-64, we can update code references in-place using "pending
relocations" added in scanExternalRefs(). However, on AArch64, this is
not always possible due to address range limitations and linker address
"relaxation".

There are two types of code-to-code references: control transfer (e.g.,
calls and branches) and function pointer materialization.
AArch64-specific control transfer instructions are covered by llvm#116964.

For function pointer materialization, simply changing the immediate
field of an instruction is not always sufficient. In some cases, we need
to modify a pair of instructions, such as undoing linker relaxation and
converting NOP+ADR into ADRP+ADD sequence.

To achieve this, we use the instruction patch mechanism instead of
pending relocations. Instruction patches are emitted via the regular MC
layer, just like regular functions. However, they have a fixed address
and do not have an associated symbol table entry. This allows us to make
more complex changes to the code, ensuring that function pointers are
correctly updated. Such mechanism should also be portable to RISC-V and
other architectures.

To summarize, for AArch64, we extend the scanExternalRefs() process to
undo linker relaxation and use instruction patches to partially
overwrite unoptimized code.
@paschalis-mpeis paschalis-mpeis force-pushed the users/paschalis-mpeis/skip-oob-pending-relocs branch from d46158c to b4e906c Compare March 27, 2025 12:47
@paschalis-mpeis paschalis-mpeis changed the base branch from main to users/paschalis-mpeis/relocation-optional-getter March 27, 2025 12:48
@paschalis-mpeis
Copy link
Member Author

Thanks for your review Maksim.

Forced-push to rebase on top of #133085 and add commit b4e906c to address reviewers.

@paschalis-mpeis paschalis-mpeis marked this pull request as ready for review March 27, 2025 13:05
@paschalis-mpeis paschalis-mpeis requested a review from maksfb March 27, 2025 13:05
Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Minor nits.

maksfb added a commit that referenced this pull request Mar 28, 2025
In lite mode, we only emit code for a subset of functions while
preserving the original code in .bolt.org.text. This requires updating
code references in non-emitted functions to ensure that:

* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.

On x86-64, we can update code references in-place using "pending
relocations" added in scanExternalRefs(). However, on AArch64, this is
not always possible due to address range limitations and linker address
"relaxation".

There are two types of code-to-code references: control transfer (e.g.,
calls and branches) and function pointer materialization.
AArch64-specific control transfer instructions are covered by #116964.

For function pointer materialization, simply changing the immediate
field of an instruction is not always sufficient. In some cases, we need
to modify a pair of instructions, such as undoing linker relaxation and
converting NOP+ADR into ADRP+ADD sequence.

To achieve this, we use the instruction patch mechanism instead of
pending relocations. Instruction patches are emitted via the regular MC
layer, just like regular functions. However, they have a fixed address
and do not have an associated symbol table entry. This allows us to make
more complex changes to the code, ensuring that function pointers are
correctly updated. Such mechanism should also be portable to RISC-V and
other architectures.

To summarize, for AArch64, we extend the scanExternalRefs() process to
undo linker relaxation and use instruction patches to partially
overwrite unoptimized code.
Base automatically changed from users/paschalis-mpeis/relocation-optional-getter to main March 28, 2025 14:07
@paschalis-mpeis
Copy link
Member Author

Hey Maksim,

Thanks for your review. Let me know if you have any further suggestions.

@paschalis-mpeis paschalis-mpeis requested a review from maksfb April 3, 2025 15:14
When a pending relocation is created it is also marked whether it is
optional or not. It can be optional when such relocation is added as
part of an optimization (i.e., `scanExternalRefs`).

When bolt tries to `flushPendingRelocations`, it safely skips any
optional relocations that cannot be encoded.

Background:
BOLT, as part of scanExternalRefs, identifies external references from
calls and creates some pending relocations for them. Those when flushed
will update references to point to the optimized functions. This
optimization can be disabled using `--no-scan`.

BOLT can assert if any of these pending relocations cannot be encoded.

This patch does not disable this optimization but instead selectively
applies it given that a pending relocation is optional.
Allow skipping pending relocations only when `-force-patch` is set.
Otherwise, exit with a relevant message.
@paschalis-mpeis paschalis-mpeis force-pushed the users/paschalis-mpeis/skip-oob-pending-relocs branch from 8798806 to 1bb9af5 Compare April 3, 2025 15:45
@paschalis-mpeis
Copy link
Member Author

forced push to rebase to latest main.

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Paschalis. Looks great! Please address a tiny nit before commit.

@paschalis-mpeis
Copy link
Member Author

Great, thanks. I'll wait a bit for other reviewers and merge by the EOD.

@paschalis-mpeis paschalis-mpeis merged commit 3d24046 into main Apr 4, 2025
10 checks passed
@paschalis-mpeis paschalis-mpeis deleted the users/paschalis-mpeis/skip-oob-pending-relocs branch April 4, 2025 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants