-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[BOLT] Validate secondary entry point #135731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Some functions have their sizes as zero in input binary's symbol table, like those compiled by assembler. When figuring out function sizes, we may create label symbol if it doesn't point to any constant island. However, before function size is known, marker symbol cannot be correctly associated to a function and therefore all such checks would fail and we could end up adding a code label pointing to constant island as secondary entry point and later mistakenly marking the function as not simple. Querying the global marker symbol array has big throughput overhead. Instead we can run an extra check when post processing entry points to identify such label symbols that actually point to constant islands.
@llvm/pr-subscribers-bolt Author: YongKang Zhu (yozhu) ChangesSome functions have their sizes as zero in input binary's symbol table, like Querying the global marker symbol array has big throughput overhead. Instead Full diff: https://github.com/llvm/llvm-project/pull/135731.diff 4 Files Affected:
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index d3d11f8c5fb73..a52998564ee1b 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -1174,6 +1174,11 @@ class BinaryFunction {
return getSecondaryEntryPointSymbol(BB.getLabel());
}
+ /// Remove a label from the secondary entry point map.
+ void removeSymbolFromSecondaryEntryPointMap(const MCSymbol *Label) {
+ SecondaryEntryPoints.erase(Label);
+ }
+
/// Return true if the basic block is an entry point into the function
/// (either primary or secondary).
bool isEntryPoint(const BinaryBasicBlock &BB) const {
@@ -2126,6 +2131,10 @@ class BinaryFunction {
return Islands && !Islands->DataOffsets.empty();
}
+ bool isStartOfConstantIsland(uint64_t Offset) const {
+ return hasConstantIsland() && Islands->DataOffsets.count(Offset);
+ }
+
/// Return true iff the symbol could be seen inside this function otherwise
/// it is probably another function.
bool isSymbolValidInScope(const SymbolRef &Symbol, uint64_t SymbolSize) const;
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index c4f4d234b30c0..184a4462b356a 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1896,6 +1896,15 @@ void BinaryFunction::postProcessEntryPoints() {
if (BC.isAArch64() && Offset == getSize())
continue;
+ // If we have grabbed a wrong code label which actually points to some
+ // constant island inside the function, ignore this label and remove it
+ // from the secondary entry point map.
+ if (isStartOfConstantIsland(Offset)) {
+ BC.SymbolToFunctionMap.erase(Label);
+ removeSymbolFromSecondaryEntryPointMap(Label);
+ continue;
+ }
+
BC.errs() << "BOLT-WARNING: reference in the middle of instruction "
"detected in function "
<< *this << " at offset 0x" << Twine::utohexstr(Offset) << '\n';
diff --git a/bolt/test/AArch64/validate-secondary-entry-point.s b/bolt/test/AArch64/validate-secondary-entry-point.s
new file mode 100644
index 0000000000000..0099a0ee4fe99
--- /dev/null
+++ b/bolt/test/AArch64/validate-secondary-entry-point.s
@@ -0,0 +1,34 @@
+# This test is to verify that BOLT won't take a label pointing to constant
+# island as a secondary entry point (function `_start` doesn't have ELF size
+# set originally) and the function won't otherwise be mistaken as non-simple.
+
+# RUN: %clang %cflags -pie %s -o %t.so -Wl,-q -Wl,--init=_foo -Wl,--fini=_foo
+# RUN: llvm-bolt %t.so -o %t.bolt.so --print-cfg 2>&1 | FileCheck %s
+# CHECK-NOT: BOLT-WARNING: reference in the middle of instruction detected \
+# CHECK-NOT: function _start at offset 0x{{[0-9a-f]+}}
+# CHECK: Binary Function "_start" after building cfg
+
+ .text
+
+ .global _foo
+ .type _foo, %function
+_foo:
+ ret
+
+ .global _start
+ .type _start, %function
+_start:
+ b _foo
+
+ .balign 16
+_random_consts:
+ .long 0x12345678
+ .long 0x90abcdef
+
+ .global _bar
+ .type _bar, %function
+_bar:
+ ret
+
+ # Dummy relocation to force relocation mode
+ .reloc 0, R_AARCH64_NONE
diff --git a/bolt/test/RISCV/validate-secondary-entry-point.s b/bolt/test/RISCV/validate-secondary-entry-point.s
new file mode 100644
index 0000000000000..0c29f5c97c689
--- /dev/null
+++ b/bolt/test/RISCV/validate-secondary-entry-point.s
@@ -0,0 +1,34 @@
+# This test is to verify that BOLT won't take a label pointing to constant
+# island as a secondary entry point (function `_start` doesn't have ELF size
+# set originally) and the function won't otherwise be mistaken as non-simple.
+
+# RUN: %clang %cflags -pie %s -o %t.so -Wl,-q -Wl,--init=_foo -Wl,--fini=_foo
+# RUN: llvm-bolt %t.so -o %t.bolt.so --print-cfg 2>&1 | FileCheck %s
+# CHECK-NOT: BOLT-WARNING: reference in the middle of instruction detected \
+# CHECK-NOT: function _start at offset 0x{{[0-9a-f]+}}
+# CHECK: Binary Function "_start" after building cfg
+
+ .text
+
+ .global _foo
+ .type _foo, %function
+_foo:
+ ret
+
+ .global _start
+ .type _start, %function
+_start:
+ j _foo
+
+ .balign 16
+_random_consts:
+ .long 0x12345678
+ .long 0x90abcdef
+
+ .global _bar
+ .type _bar, %function
+_bar:
+ ret
+
+ # Dummy relocation to force relocation mode
+ .reloc 0, R_RISCV_NONE
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
Some functions have their sizes as zero in input binary's symbol table, like those compiled by assembler. When figuring out function sizes, we may create label symbol if it doesn't point to any constant island. However, before function size is known, marker symbol can not be correctly associated to a function and therefore all such checks would fail and we could end up adding a code label pointing to constant island as secondary entry point and later mistakenly marking the function as not simple. Querying the global marker symbol array has big throughput overhead. Instead we can run an extra check when post processing entry points to identify such label symbols that actually point to constant islands.
Some functions have their sizes as zero in input binary's symbol table, like
those compiled by assembler. When figuring out function sizes, we may create
label symbol if it doesn't point to any constant island. However, before
function size is known, marker symbol cannot be correctly associated to a
function and therefore all such checks would fail and we could end up adding
a code label pointing to constant island as secondary entry point and later
mistakenly marking the function as not simple.
Querying the global marker symbol array has big throughput overhead. Instead
we can run an extra check when post processing entry points to identify such
label symbols that actually point to constant islands.