-
Notifications
You must be signed in to change notification settings - Fork 14.3k
PowerPC32:PIC: Update to bcl to fix branch prediction mis-predict issue #134140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-powerpc Author: Lei Huang (lei137) ChangesUpdate fixes: #128644 Full diff: https://github.com/llvm/llvm-project/pull/134140.diff 5 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
index ba6653e10bdc5..0aca8ec7567af 100644
--- a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
@@ -955,8 +955,8 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
OutContext),
MCConstantExpr::create(4, OutContext), OutContext);
- // Emit the 'bl'.
- EmitToStreamer(*OutStreamer, MCInstBuilder(PPC::BL).addExpr(OffsExpr));
+ // Emit 'bcl 20,31,.+4' so the link stack is not corrupted.
+ EmitToStreamer(*OutStreamer, MCInstBuilder(PPC::BCLalways).addExpr(OffsExpr));
return;
}
case PPC::MovePCtoLR:
@@ -967,9 +967,9 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
// L1$pb:
MCSymbol *PICBase = MF->getPICBaseSymbol();
- // Emit the 'bl'.
+ // Emit 'bcl 20,31,.+4' so the link stack is not corrupted.
EmitToStreamer(*OutStreamer,
- MCInstBuilder(PPC::BL)
+ MCInstBuilder(PPC::BCLalways)
// FIXME: We would like an efficient form for this, so we
// don't have to do a lot of extra uniquing.
.addExpr(MCSymbolRefExpr::create(PICBase, OutContext)));
diff --git a/llvm/test/CodeGen/PowerPC/mcm-5.ll b/llvm/test/CodeGen/PowerPC/mcm-5.ll
index b88f405211b05..f9629b5710d72 100644
--- a/llvm/test/CodeGen/PowerPC/mcm-5.ll
+++ b/llvm/test/CodeGen/PowerPC/mcm-5.ll
@@ -51,7 +51,7 @@ sw.epilog: ; preds = %sw.bb3, %sw.default
ret i32 %5
}
; CHECK-LABEL: test_jump_table:
-; CHECK-NOT: bl .L0$pb
+; CHECK-NOT: bcl 20, 31, .L0$pb
; CHECK: addis [[REG1:[0-9]+]], 2, .LC[[TOCNUM:[0-9]+]]@toc@ha
; CHECK: ld [[REG2:[0-9]+]], .LC[[TOCNUM]]@toc@l([[REG1]])
@@ -64,7 +64,7 @@ sw.epilog: ; preds = %sw.bb3, %sw.default
; CHECK-NEXT: .long .LBB0_{{[0-9]+}}-.LJTI0_0
; LARGE-LABEL: test_jump_table:
-; LARGE: bl .L0$pb
+; LARGE: bcl 20, 31, .L0$pb
; LARGE-NEXT: .L0$pb:
; LARGE: mflr [[REGBASE:[0-9]+]]
diff --git a/llvm/test/CodeGen/PowerPC/ppc32-pic-bcl.ll b/llvm/test/CodeGen/PowerPC/ppc32-pic-bcl.ll
new file mode 100644
index 0000000000000..1e938b135fbfd
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/ppc32-pic-bcl.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=powerpc -relocation-model=pic | \
+; RUN: FileCheck -check-prefixes=SMALL %s
+
+@val = global i8 0, align 1
+
+define zeroext i8 @testbcl() nounwind {
+; SMALL-LABEL: testbcl:
+; SMALL: # %bb.0: # %entry
+; SMALL-NEXT: mflr 0
+; SMALL-NEXT: stwu 1, -16(1)
+; SMALL-NEXT: stw 30, 8(1)
+; SMALL-NEXT: stw 0, 20(1)
+; SMALL-NEXT: bcl 20, 31, .L0$pb
+; SMALL-NEXT: .L0$pb:
+; SMALL-NEXT: mflr 30
+; SMALL-NEXT: lwz 3, .L0$poff-.L0$pb(30)
+; SMALL-NEXT: add 30, 3, 30
+; SMALL-NEXT: lwz 3, .LC0-.LTOC(30)
+; SMALL-NEXT: lbz 3, 0(3)
+; SMALL-NEXT: lwz 0, 20(1)
+; SMALL-NEXT: lwz 30, 8(1)
+; SMALL-NEXT: addi 1, 1, 16
+; SMALL-NEXT: mtlr 0
+; SMALL-NEXT: blr
+entry:
+ %0 = load i8, ptr @val, align 1
+ ret i8 %0
+}
diff --git a/llvm/test/CodeGen/PowerPC/ppc32-pic-large.ll b/llvm/test/CodeGen/PowerPC/ppc32-pic-large.ll
index 2f0b92964c13b..7be1a80b7af43 100644
--- a/llvm/test/CodeGen/PowerPC/ppc32-pic-large.ll
+++ b/llvm/test/CodeGen/PowerPC/ppc32-pic-large.ll
@@ -55,7 +55,7 @@ entry:
; LARGE-BSS-NEXT: foo:
; LARGE-BSS: stwu 1, -32(1)
; LARGE-BSS: stw 30, 24(1)
-; LARGE-BSS: bl [[PB]]
+; LARGE-BSS: bcl 20, 31, [[PB]]
; LARGE-BSS-NEXT: [[PB]]:
; LARGE-BSS: mflr 30
; LARGE-BSS: lwz [[REG:[0-9]+]], [[POFF]]-[[PB]](30)
diff --git a/llvm/test/CodeGen/PowerPC/ppc32-pic.ll b/llvm/test/CodeGen/PowerPC/ppc32-pic.ll
index aed994144940c..9ffd7395ffeb5 100644
--- a/llvm/test/CodeGen/PowerPC/ppc32-pic.ll
+++ b/llvm/test/CodeGen/PowerPC/ppc32-pic.ll
@@ -18,8 +18,8 @@ entry:
; SMALL-LABEL: foo:
; SMALL: stwu 1, -32(1)
; SMALL: stw 30, 24(1)
-; SMALL-BSS: bl _GLOBAL_OFFSET_TABLE_@local-4
-; SMALL-SECURE: bl .L0$pb
+; SMALL-BSS: bcl 20, 31, _GLOBAL_OFFSET_TABLE_@local-4
+; SMALL-SECURE: bcl 20, 31, .L0$pb
; SMALL: mflr 30
; SMALL-SECURE: addis 30, 30, _GLOBAL_OFFSET_TABLE_-.L0$pb@ha
; SMALL-SECURE: addi 30, 30, _GLOBAL_OFFSET_TABLE_-.L0$pb@l
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
@@ -51,7 +51,7 @@ sw.epilog: ; preds = %sw.bb3, %sw.default | |||
ret i32 %5 | |||
} | |||
; CHECK-LABEL: test_jump_table: | |||
; CHECK-NOT: bl .L0$pb | |||
; CHECK-NOT: bcl 20, 31, .L0$pb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silly question - the .L0$pb
is supposed to evaluate to .+4
, I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If you look at the LARGE code gen, .L0$pb:
is the instruction right after bcl
so it's +4
from the bcl
instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! Yes, good point. That makes sense. Thank you for pointing that out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, FWIW.
@@ -18,8 +18,8 @@ entry: | |||
; SMALL-LABEL: foo: | |||
; SMALL: stwu 1, -32(1) | |||
; SMALL: stw 30, 24(1) | |||
; SMALL-BSS: bl _GLOBAL_OFFSET_TABLE_@local-4 | |||
; SMALL-SECURE: bl .L0$pb | |||
; SMALL-BSS: bcl 20, 31, _GLOBAL_OFFSET_TABLE_@local-4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change is not valid. Allthough it will likely work, it is expected to use 'bl' for unconditionnal calls, as there will be a return instruction at the called address.
Only 'bl +.4' is to be replaced by 'bcl 20,31,+.4', nothing else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, IIUC the linker inserts a blrl
at this address. So we branch and link to it, then return and link to get the pointer to the .got and the link stack should stay in sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Update `bl` to `bcl 20, 31, .+4` for 32bit PIC code gen so we don't corrupt the link stack and cause mis-predict for the branch predictor. fixes: llvm#128644
Update
bl
tobcl 20, 31, .+4
for 32bit PIC code gen so the link stack isnot corrupted and cause mis-predict for the branch predictor.
fixes: #128644