Skip to content

[BOLT] Validate secondary entry point #135731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 15, 2025
Merged

[BOLT] Validate secondary entry point #135731

merged 1 commit into from
Apr 15, 2025

Conversation

yozhu
Copy link
Contributor

@yozhu yozhu commented Apr 15, 2025

Some functions have their sizes as zero in input binary's symbol table, like
those compiled by assembler. When figuring out function sizes, we may create
label symbol if it doesn't point to any constant island. However, before
function size is known, marker symbol cannot be correctly associated to a
function and therefore all such checks would fail and we could end up adding
a code label pointing to constant island as secondary entry point and later
mistakenly marking the function as not simple.

Querying the global marker symbol array has big throughput overhead. Instead
we can run an extra check when post processing entry points to identify such
label symbols that actually point to constant islands.

Some functions have their sizes as zero in input binary's symbol table, like
those compiled by assembler. When figuring out function sizes, we may create
label symbol if it doesn't point to any constant island. However, before
function size is known, marker symbol cannot be correctly associated to a
function and therefore all such checks would fail and we could end up adding
a code label pointing to constant island as secondary entry point and later
mistakenly marking the function as not simple.

Querying the global marker symbol array has big throughput overhead. Instead
we can run an extra check when post processing entry points to identify such
label symbols that actually point to constant islands.
@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-bolt

Author: YongKang Zhu (yozhu)

Changes

Some functions have their sizes as zero in input binary's symbol table, like
those compiled by assembler. When figuring out function sizes, we may create
label symbol if it doesn't point to any constant island. However, before
function size is known, marker symbol cannot be correctly associated to a
function and therefore all such checks would fail and we could end up adding
a code label pointing to constant island as secondary entry point and later
mistakenly marking the function as not simple.

Querying the global marker symbol array has big throughput overhead. Instead
we can run an extra check when post processing entry points to identify such
label symbols that actually point to constant islands.


Full diff: https://github.com/llvm/llvm-project/pull/135731.diff

4 Files Affected:

  • (modified) bolt/include/bolt/Core/BinaryFunction.h (+9)
  • (modified) bolt/lib/Core/BinaryFunction.cpp (+9)
  • (added) bolt/test/AArch64/validate-secondary-entry-point.s (+34)
  • (added) bolt/test/RISCV/validate-secondary-entry-point.s (+34)
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index d3d11f8c5fb73..a52998564ee1b 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -1174,6 +1174,11 @@ class BinaryFunction {
     return getSecondaryEntryPointSymbol(BB.getLabel());
   }
 
+  /// Remove a label from the secondary entry point map.
+  void removeSymbolFromSecondaryEntryPointMap(const MCSymbol *Label) {
+    SecondaryEntryPoints.erase(Label);
+  }
+
   /// Return true if the basic block is an entry point into the function
   /// (either primary or secondary).
   bool isEntryPoint(const BinaryBasicBlock &BB) const {
@@ -2126,6 +2131,10 @@ class BinaryFunction {
     return Islands && !Islands->DataOffsets.empty();
   }
 
+  bool isStartOfConstantIsland(uint64_t Offset) const {
+    return hasConstantIsland() && Islands->DataOffsets.count(Offset);
+  }
+
   /// Return true iff the symbol could be seen inside this function otherwise
   /// it is probably another function.
   bool isSymbolValidInScope(const SymbolRef &Symbol, uint64_t SymbolSize) const;
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index c4f4d234b30c0..184a4462b356a 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1896,6 +1896,15 @@ void BinaryFunction::postProcessEntryPoints() {
     if (BC.isAArch64() && Offset == getSize())
       continue;
 
+    // If we have grabbed a wrong code label which actually points to some
+    // constant island inside the function, ignore this label and remove it
+    // from the secondary entry point map.
+    if (isStartOfConstantIsland(Offset)) {
+      BC.SymbolToFunctionMap.erase(Label);
+      removeSymbolFromSecondaryEntryPointMap(Label);
+      continue;
+    }
+
     BC.errs() << "BOLT-WARNING: reference in the middle of instruction "
                  "detected in function "
               << *this << " at offset 0x" << Twine::utohexstr(Offset) << '\n';
diff --git a/bolt/test/AArch64/validate-secondary-entry-point.s b/bolt/test/AArch64/validate-secondary-entry-point.s
new file mode 100644
index 0000000000000..0099a0ee4fe99
--- /dev/null
+++ b/bolt/test/AArch64/validate-secondary-entry-point.s
@@ -0,0 +1,34 @@
+# This test is to verify that BOLT won't take a label pointing to constant
+# island as a secondary entry point (function `_start` doesn't have ELF size
+# set originally) and the function won't otherwise be mistaken as non-simple.
+
+# RUN: %clang %cflags -pie %s -o %t.so -Wl,-q -Wl,--init=_foo -Wl,--fini=_foo
+# RUN: llvm-bolt %t.so -o %t.bolt.so --print-cfg 2>&1 | FileCheck %s
+# CHECK-NOT: BOLT-WARNING: reference in the middle of instruction detected \
+# CHECK-NOT:   function _start at offset 0x{{[0-9a-f]+}}
+# CHECK: Binary Function "_start" after building cfg
+
+  .text
+
+  .global _foo
+  .type _foo, %function
+_foo:
+  ret
+
+  .global _start
+  .type _start, %function
+_start:
+  b _foo
+
+  .balign 16
+_random_consts:
+  .long 0x12345678
+  .long 0x90abcdef
+
+  .global _bar
+  .type _bar, %function
+_bar:
+  ret
+
+  # Dummy relocation to force relocation mode
+  .reloc 0, R_AARCH64_NONE
diff --git a/bolt/test/RISCV/validate-secondary-entry-point.s b/bolt/test/RISCV/validate-secondary-entry-point.s
new file mode 100644
index 0000000000000..0c29f5c97c689
--- /dev/null
+++ b/bolt/test/RISCV/validate-secondary-entry-point.s
@@ -0,0 +1,34 @@
+# This test is to verify that BOLT won't take a label pointing to constant
+# island as a secondary entry point (function `_start` doesn't have ELF size
+# set originally) and the function won't otherwise be mistaken as non-simple.
+
+# RUN: %clang %cflags -pie %s -o %t.so -Wl,-q -Wl,--init=_foo -Wl,--fini=_foo
+# RUN: llvm-bolt %t.so -o %t.bolt.so --print-cfg 2>&1 | FileCheck %s
+# CHECK-NOT: BOLT-WARNING: reference in the middle of instruction detected \
+# CHECK-NOT:   function _start at offset 0x{{[0-9a-f]+}}
+# CHECK: Binary Function "_start" after building cfg
+
+  .text
+
+  .global _foo
+  .type _foo, %function
+_foo:
+  ret
+
+  .global _start
+  .type _start, %function
+_start:
+  j _foo
+
+  .balign 16
+_random_consts:
+  .long 0x12345678
+  .long 0x90abcdef
+
+  .global _bar
+  .type _bar, %function
+_bar:
+  ret
+
+  # Dummy relocation to force relocation mode
+  .reloc 0, R_RISCV_NONE

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@yozhu yozhu merged commit 823adc7 into llvm:main Apr 15, 2025
12 checks passed
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
Some functions have their sizes as zero in input binary's symbol
table, like those compiled by assembler. When figuring out function
sizes, we may create label symbol if it doesn't point to any constant
island. However, before function size is known, marker symbol can
not be correctly associated to a function and therefore all such
checks would fail and we could end up adding a code label pointing
to constant island as secondary entry point and later mistakenly
marking the function as not simple.

Querying the global marker symbol array has big throughput overhead.
Instead we can run an extra check when post processing entry points
to identify such label symbols that actually point to constant islands.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants