-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[ARM] Fix llvm.returnaddress for Thumb1 with R11 frame-pointer #117735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-arm Author: Oliver Stannard (ostannard) ChangesWhen using R11 as the frame pointer for a Thumb1 target, we need to use LR as a temporary in the prologue when setting up the frame record, like this:
This means that the value in LR is clobbered by the prologue, so we can't use it as a live-in value to the function to implement the llvm.returnaddress intrinsic. I don't think there's any way to avoid this use if LR without making the prologue code even more complicated. Instead, we can load the return address from the frame record, as we do when the intrinsic is used with a non-zero depth. Full diff: https://github.com/llvm/llvm-project/pull/117735.diff 3 Files Affected:
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 6b290135c5bcba..6032cac473c137 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -6196,18 +6196,28 @@ SDValue ARMTargetLowering::LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) const {
return DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, Lo, Hi);
}
-SDValue ARMTargetLowering::LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const{
+SDValue
+ARMTargetLowering::LowerRETURNADDR(SDValue Op, SelectionDAG &DAG,
+ const ARMSubtarget *Subtarget) const {
MachineFunction &MF = DAG.getMachineFunction();
MachineFrameInfo &MFI = MF.getFrameInfo();
+ const ARMBaseRegisterInfo &ARI =
+ *static_cast<const ARMBaseRegisterInfo*>(RegInfo);
MFI.setReturnAddressIsTaken(true);
if (verifyReturnAddressArgumentIsConstant(Op, DAG))
return SDValue();
+ // When the frame register is R11 for a Thumb1-only target, we need to use LR
+ // as a temporary while setting up the frame record, so it can not be used as
+ // a live-in to the function.
+ bool LRAvailable =
+ !(Subtarget->isThumb1Only() && ARI.getFrameRegister(MF) == ARM::R11);
+
EVT VT = Op.getValueType();
SDLoc dl(Op);
unsigned Depth = Op.getConstantOperandVal(0);
- if (Depth) {
+ if (Depth || !LRAvailable) {
SDValue FrameAddr = LowerFRAMEADDR(Op, DAG);
SDValue Offset = DAG.getConstant(4, dl, MVT::i32);
return DAG.getLoad(VT, dl, DAG.getEntryNode(),
@@ -10667,7 +10677,7 @@ SDValue ARMTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::FP_TO_SINT_SAT:
case ISD::FP_TO_UINT_SAT: return LowerFP_TO_INT_SAT(Op, DAG, Subtarget);
case ISD::FCOPYSIGN: return LowerFCOPYSIGN(Op, DAG);
- case ISD::RETURNADDR: return LowerRETURNADDR(Op, DAG);
+ case ISD::RETURNADDR: return LowerRETURNADDR(Op, DAG, Subtarget);
case ISD::FRAMEADDR: return LowerFRAMEADDR(Op, DAG);
case ISD::EH_SJLJ_SETJMP: return LowerEH_SJLJ_SETJMP(Op, DAG);
case ISD::EH_SJLJ_LONGJMP: return LowerEH_SJLJ_LONGJMP(Op, DAG);
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.h b/llvm/lib/Target/ARM/ARMISelLowering.h
index 344a0ad91e5178..54f9c643fc87d7 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.h
+++ b/llvm/lib/Target/ARM/ARMISelLowering.h
@@ -854,7 +854,8 @@ class VectorType;
SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) const;
- SDValue LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerRETURNADDR(SDValue Op, SelectionDAG &DAG,
+ const ARMSubtarget *Subtarget) const;
SDValue LowerFRAMEADDR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/Thumb/returnaddress.ll b/llvm/test/CodeGen/Thumb/returnaddress.ll
new file mode 100644
index 00000000000000..15cd138bda78b4
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb/returnaddress.ll
@@ -0,0 +1,309 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=thumbv6m-none-eabi -frame-pointer=none -verify-machineinstrs | FileCheck %s --check-prefix=FP-NONE
+; RUN: llc < %s -mtriple=thumbv6m-none-eabi -frame-pointer=all -verify-machineinstrs | FileCheck %s --check-prefix=FP-ALL
+; RUN: llc < %s -mtriple=thumbv6m-none-eabi -frame-pointer=all -mattr=+aapcs-frame-chain -verify-machineinstrs | FileCheck %s --check-prefix=FP-AAPCS
+
+define void @ra_call() {
+; FP-NONE-LABEL: ra_call:
+; FP-NONE: @ %bb.0: @ %entry
+; FP-NONE-NEXT: .save {r7, lr}
+; FP-NONE-NEXT: push {r7, lr}
+; FP-NONE-NEXT: mov r0, lr
+; FP-NONE-NEXT: bl sink_ptr
+; FP-NONE-NEXT: pop {r7, pc}
+;
+; FP-ALL-LABEL: ra_call:
+; FP-ALL: @ %bb.0: @ %entry
+; FP-ALL-NEXT: .save {r7, lr}
+; FP-ALL-NEXT: push {r7, lr}
+; FP-ALL-NEXT: .setfp r7, sp
+; FP-ALL-NEXT: add r7, sp, #0
+; FP-ALL-NEXT: mov r0, lr
+; FP-ALL-NEXT: bl sink_ptr
+; FP-ALL-NEXT: pop {r7, pc}
+;
+; FP-AAPCS-LABEL: ra_call:
+; FP-AAPCS: @ %bb.0: @ %entry
+; FP-AAPCS-NEXT: .save {lr}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: mov lr, r11
+; FP-AAPCS-NEXT: .save {r11}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: .setfp r11, sp
+; FP-AAPCS-NEXT: mov r11, sp
+; FP-AAPCS-NEXT: mov r0, r11
+; FP-AAPCS-NEXT: ldr r0, [r0, #4]
+; FP-AAPCS-NEXT: bl sink_ptr
+; FP-AAPCS-NEXT: pop {r0}
+; FP-AAPCS-NEXT: mov r11, r0
+; FP-AAPCS-NEXT: pop {pc}
+entry:
+ %r = tail call ptr @llvm.returnaddress(i32 0)
+ tail call void @sink_ptr(ptr %r)
+ ret void
+}
+
+define ptr @ra_return() {
+; FP-NONE-LABEL: ra_return:
+; FP-NONE: @ %bb.0: @ %entry
+; FP-NONE-NEXT: mov r0, lr
+; FP-NONE-NEXT: bx lr
+;
+; FP-ALL-LABEL: ra_return:
+; FP-ALL: @ %bb.0: @ %entry
+; FP-ALL-NEXT: .save {r7, lr}
+; FP-ALL-NEXT: push {r7, lr}
+; FP-ALL-NEXT: .setfp r7, sp
+; FP-ALL-NEXT: add r7, sp, #0
+; FP-ALL-NEXT: mov r0, lr
+; FP-ALL-NEXT: pop {r7, pc}
+;
+; FP-AAPCS-LABEL: ra_return:
+; FP-AAPCS: @ %bb.0: @ %entry
+; FP-AAPCS-NEXT: .save {lr}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: mov lr, r11
+; FP-AAPCS-NEXT: .save {r11}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: .setfp r11, sp
+; FP-AAPCS-NEXT: mov r11, sp
+; FP-AAPCS-NEXT: mov r0, r11
+; FP-AAPCS-NEXT: ldr r0, [r0, #4]
+; FP-AAPCS-NEXT: pop {r1}
+; FP-AAPCS-NEXT: mov r11, r1
+; FP-AAPCS-NEXT: pop {pc}
+entry:
+ %r = tail call ptr @llvm.returnaddress(i32 0)
+ ret ptr %r
+}
+
+define ptr @callee_saved_low() {
+; FP-NONE-LABEL: callee_saved_low:
+; FP-NONE: @ %bb.0: @ %entry
+; FP-NONE-NEXT: .save {r4, r5, r7, lr}
+; FP-NONE-NEXT: push {r4, r5, r7, lr}
+; FP-NONE-NEXT: mov r0, lr
+; FP-NONE-NEXT: @APP
+; FP-NONE-NEXT: @NO_APP
+; FP-NONE-NEXT: pop {r4, r5, r7, pc}
+;
+; FP-ALL-LABEL: callee_saved_low:
+; FP-ALL: @ %bb.0: @ %entry
+; FP-ALL-NEXT: .save {r4, r5, r7, lr}
+; FP-ALL-NEXT: push {r4, r5, r7, lr}
+; FP-ALL-NEXT: .setfp r7, sp, #8
+; FP-ALL-NEXT: add r7, sp, #8
+; FP-ALL-NEXT: mov r0, lr
+; FP-ALL-NEXT: @APP
+; FP-ALL-NEXT: @NO_APP
+; FP-ALL-NEXT: pop {r4, r5, r7, pc}
+;
+; FP-AAPCS-LABEL: callee_saved_low:
+; FP-AAPCS: @ %bb.0: @ %entry
+; FP-AAPCS-NEXT: .save {lr}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: mov lr, r11
+; FP-AAPCS-NEXT: .save {r11}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: .setfp r11, sp
+; FP-AAPCS-NEXT: mov r11, sp
+; FP-AAPCS-NEXT: .save {r4, r5}
+; FP-AAPCS-NEXT: push {r4, r5}
+; FP-AAPCS-NEXT: mov r0, r11
+; FP-AAPCS-NEXT: ldr r0, [r0, #4]
+; FP-AAPCS-NEXT: @APP
+; FP-AAPCS-NEXT: @NO_APP
+; FP-AAPCS-NEXT: pop {r4, r5}
+; FP-AAPCS-NEXT: pop {r1}
+; FP-AAPCS-NEXT: mov r11, r1
+; FP-AAPCS-NEXT: pop {pc}
+entry:
+ call void asm sideeffect "", "~{r4},~{r5}"()
+ %r = tail call ptr @llvm.returnaddress(i32 0)
+ ret ptr %r
+}
+
+define ptr @callee_saved_high() {
+; FP-NONE-LABEL: callee_saved_high:
+; FP-NONE: @ %bb.0: @ %entry
+; FP-NONE-NEXT: mov r3, r9
+; FP-NONE-NEXT: mov r2, r8
+; FP-NONE-NEXT: .save {r8, r9}
+; FP-NONE-NEXT: push {r2, r3}
+; FP-NONE-NEXT: mov r0, lr
+; FP-NONE-NEXT: @APP
+; FP-NONE-NEXT: @NO_APP
+; FP-NONE-NEXT: pop {r1, r2}
+; FP-NONE-NEXT: mov r8, r1
+; FP-NONE-NEXT: mov r9, r2
+; FP-NONE-NEXT: bx lr
+;
+; FP-ALL-LABEL: callee_saved_high:
+; FP-ALL: @ %bb.0: @ %entry
+; FP-ALL-NEXT: .save {r7, lr}
+; FP-ALL-NEXT: push {r7, lr}
+; FP-ALL-NEXT: .setfp r7, sp
+; FP-ALL-NEXT: add r7, sp, #0
+; FP-ALL-NEXT: mov r3, r9
+; FP-ALL-NEXT: mov r2, r8
+; FP-ALL-NEXT: .save {r8, r9}
+; FP-ALL-NEXT: push {r2, r3}
+; FP-ALL-NEXT: mov r0, lr
+; FP-ALL-NEXT: @APP
+; FP-ALL-NEXT: @NO_APP
+; FP-ALL-NEXT: pop {r1, r2}
+; FP-ALL-NEXT: mov r8, r1
+; FP-ALL-NEXT: mov r9, r2
+; FP-ALL-NEXT: pop {r7, pc}
+;
+; FP-AAPCS-LABEL: callee_saved_high:
+; FP-AAPCS: @ %bb.0: @ %entry
+; FP-AAPCS-NEXT: .save {lr}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: mov lr, r11
+; FP-AAPCS-NEXT: .save {r11}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: .setfp r11, sp
+; FP-AAPCS-NEXT: mov r11, sp
+; FP-AAPCS-NEXT: mov r3, r9
+; FP-AAPCS-NEXT: mov r2, r8
+; FP-AAPCS-NEXT: .save {r8, r9}
+; FP-AAPCS-NEXT: push {r2, r3}
+; FP-AAPCS-NEXT: mov r0, r11
+; FP-AAPCS-NEXT: ldr r0, [r0, #4]
+; FP-AAPCS-NEXT: @APP
+; FP-AAPCS-NEXT: @NO_APP
+; FP-AAPCS-NEXT: pop {r1, r2}
+; FP-AAPCS-NEXT: mov r8, r1
+; FP-AAPCS-NEXT: mov r9, r2
+; FP-AAPCS-NEXT: pop {r1}
+; FP-AAPCS-NEXT: mov r11, r1
+; FP-AAPCS-NEXT: pop {pc}
+entry:
+ call void asm sideeffect "", "~{r8},~{r9}"()
+ %r = tail call ptr @llvm.returnaddress(i32 0)
+ ret ptr %r
+}
+
+define ptr @large_alloca() {
+; FP-NONE-LABEL: large_alloca:
+; FP-NONE: @ %bb.0: @ %entry
+; FP-NONE-NEXT: .save {r4, r5, r6, lr}
+; FP-NONE-NEXT: push {r4, r5, r6, lr}
+; FP-NONE-NEXT: ldr r6, .LCPI4_0
+; FP-NONE-NEXT: .pad #2000
+; FP-NONE-NEXT: add sp, r6
+; FP-NONE-NEXT: mov r4, lr
+; FP-NONE-NEXT: mov r0, sp
+; FP-NONE-NEXT: bl sink_ptr
+; FP-NONE-NEXT: mov r0, r4
+; FP-NONE-NEXT: ldr r6, .LCPI4_1
+; FP-NONE-NEXT: add sp, r6
+; FP-NONE-NEXT: pop {r4, r5, r6, pc}
+; FP-NONE-NEXT: .p2align 2
+; FP-NONE-NEXT: @ %bb.1:
+; FP-NONE-NEXT: .LCPI4_0:
+; FP-NONE-NEXT: .long 4294965296 @ 0xfffff830
+; FP-NONE-NEXT: .LCPI4_1:
+; FP-NONE-NEXT: .long 2000 @ 0x7d0
+;
+; FP-ALL-LABEL: large_alloca:
+; FP-ALL: @ %bb.0: @ %entry
+; FP-ALL-NEXT: .save {r4, r6, r7, lr}
+; FP-ALL-NEXT: push {r4, r6, r7, lr}
+; FP-ALL-NEXT: .setfp r7, sp, #8
+; FP-ALL-NEXT: add r7, sp, #8
+; FP-ALL-NEXT: ldr r6, .LCPI4_0
+; FP-ALL-NEXT: .pad #2000
+; FP-ALL-NEXT: add sp, r6
+; FP-ALL-NEXT: mov r4, lr
+; FP-ALL-NEXT: mov r0, sp
+; FP-ALL-NEXT: bl sink_ptr
+; FP-ALL-NEXT: mov r0, r4
+; FP-ALL-NEXT: subs r6, r7, #7
+; FP-ALL-NEXT: subs r6, #1
+; FP-ALL-NEXT: mov sp, r6
+; FP-ALL-NEXT: pop {r4, r6, r7, pc}
+; FP-ALL-NEXT: .p2align 2
+; FP-ALL-NEXT: @ %bb.1:
+; FP-ALL-NEXT: .LCPI4_0:
+; FP-ALL-NEXT: .long 4294965296 @ 0xfffff830
+;
+; FP-AAPCS-LABEL: large_alloca:
+; FP-AAPCS: @ %bb.0: @ %entry
+; FP-AAPCS-NEXT: .save {lr}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: mov lr, r11
+; FP-AAPCS-NEXT: .save {r11}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: .setfp r11, sp
+; FP-AAPCS-NEXT: mov r11, sp
+; FP-AAPCS-NEXT: .save {r4, r7}
+; FP-AAPCS-NEXT: push {r4, r7}
+; FP-AAPCS-NEXT: ldr r7, .LCPI4_0
+; FP-AAPCS-NEXT: .pad #2000
+; FP-AAPCS-NEXT: add sp, r7
+; FP-AAPCS-NEXT: mov r0, sp
+; FP-AAPCS-NEXT: bl sink_ptr
+; FP-AAPCS-NEXT: mov r0, r11
+; FP-AAPCS-NEXT: ldr r0, [r0, #4]
+; FP-AAPCS-NEXT: mov r7, r11
+; FP-AAPCS-NEXT: subs r7, #8
+; FP-AAPCS-NEXT: mov sp, r7
+; FP-AAPCS-NEXT: pop {r4, r7}
+; FP-AAPCS-NEXT: pop {r1}
+; FP-AAPCS-NEXT: mov r11, r1
+; FP-AAPCS-NEXT: pop {pc}
+; FP-AAPCS-NEXT: .p2align 2
+; FP-AAPCS-NEXT: @ %bb.1:
+; FP-AAPCS-NEXT: .LCPI4_0:
+; FP-AAPCS-NEXT: .long 4294965296 @ 0xfffff830
+entry:
+ %big = alloca i8, i32 2000
+ tail call void @sink_ptr(ptr %big)
+ %r = tail call ptr @llvm.returnaddress(i32 0)
+ ret ptr %r
+}
+
+define ptr @ra_depth_1() {
+; FP-NONE-LABEL: ra_depth_1:
+; FP-NONE: @ %bb.0: @ %entry
+; FP-NONE-NEXT: .save {r7, lr}
+; FP-NONE-NEXT: push {r7, lr}
+; FP-NONE-NEXT: .setfp r7, sp
+; FP-NONE-NEXT: add r7, sp, #0
+; FP-NONE-NEXT: ldr r0, [r7]
+; FP-NONE-NEXT: ldr r0, [r0, #4]
+; FP-NONE-NEXT: pop {r7, pc}
+;
+; FP-ALL-LABEL: ra_depth_1:
+; FP-ALL: @ %bb.0: @ %entry
+; FP-ALL-NEXT: .save {r7, lr}
+; FP-ALL-NEXT: push {r7, lr}
+; FP-ALL-NEXT: .setfp r7, sp
+; FP-ALL-NEXT: add r7, sp, #0
+; FP-ALL-NEXT: ldr r0, [r7]
+; FP-ALL-NEXT: ldr r0, [r0, #4]
+; FP-ALL-NEXT: pop {r7, pc}
+;
+; FP-AAPCS-LABEL: ra_depth_1:
+; FP-AAPCS: @ %bb.0: @ %entry
+; FP-AAPCS-NEXT: .save {lr}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: mov lr, r11
+; FP-AAPCS-NEXT: .save {r11}
+; FP-AAPCS-NEXT: push {lr}
+; FP-AAPCS-NEXT: .setfp r11, sp
+; FP-AAPCS-NEXT: mov r11, sp
+; FP-AAPCS-NEXT: mov r0, r11
+; FP-AAPCS-NEXT: ldr r0, [r0]
+; FP-AAPCS-NEXT: ldr r0, [r0, #4]
+; FP-AAPCS-NEXT: pop {r1}
+; FP-AAPCS-NEXT: mov r11, r1
+; FP-AAPCS-NEXT: pop {pc}
+entry:
+ %r = tail call ptr @llvm.returnaddress(i32 1)
+ ret ptr %r
+}
+
+declare void @sink_ptr(ptr)
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
I'm concerned that this is fragile: isel is guessing what framelowering is going to do, and if it guesses wrong, you get a miscompile. And there's no obvious connection between the bits of code in question. Either framelowering should preserve LR by using a different code sequence if isReturnAddressTaken() is set, or the decision on how to lower returnaddress should be deferred to framelowering (using a pseudo-instruction). |
When the llvm.returnaddress intrinsic is used, the LR is marked as live-in to the function, so it must be preserved through the prologue. This is normally fine, but there is one case for Thumb1 where we use LR as a temporary in the prologue to set up a frame chain using r11 as the frame pointer. There are no other registers guaranteed to be free to do this, so we have to re-load LR from the stack after pushing the callee saved registers.
When setting up frame records using r11 as the frame pointer for Thumb1, we currently use LR as a temporary. This requires us to re-load it in cases where LR is live into the function, so it is better to use an argument register if one is free.
1f6e523
to
75a3296
Compare
I don't think the decision can be deferred to frame lowering time, because the two choices differ in whether a register value needs to be preserved from entry to the Instead I've gone with preserving the value of LR in the prologue, which sometimes requires re-loading it from the stack, but that can be optimised by picking a different register in most cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the pseudo, I was thinking it would be sort of opportunistic: if the return address happens to be available in LR, then mov from LR, else load it from the stack. (If it's not in LR, it's almost always going to be somewhere on the stack.) I guess that's potentially less efficient in some cases, but probably fine for Thumb1.
In any case, LGTM.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/171/builds/11255 Here is the relevant piece of the build log for the reference
|
That buildbot has been failing intermittently in the same way for the last 6 hours, so ignoring it. |
When using R11 as the frame pointer for a Thumb1 target, we need to use LR as a temporary in the prologue when setting up the frame record, like this:
This means that the value in LR is clobbered by the prologue, so we can't use it as a live-in value to the function to implement the llvm.returnaddress intrinsic. I don't think there's any way to avoid this use if LR without making the prologue code even more complicated. Instead, we can load the return address from the frame record, as we do when the intrinsic is used with a non-zero depth.