Skip to content

[ARM] Fix llvm.returnaddress for Thumb1 with R11 frame-pointer #117735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 28, 2024

Conversation

ostannard
Copy link
Collaborator

When using R11 as the frame pointer for a Thumb1 target, we need to use LR as a temporary in the prologue when setting up the frame record, like this:

  push {lr}
  mov lr, r11
  push {r11}
  mov r11, sp

This means that the value in LR is clobbered by the prologue, so we can't use it as a live-in value to the function to implement the llvm.returnaddress intrinsic. I don't think there's any way to avoid this use if LR without making the prologue code even more complicated. Instead, we can load the return address from the frame record, as we do when the intrinsic is used with a non-zero depth.

@llvmbot
Copy link
Member

llvmbot commented Nov 26, 2024

@llvm/pr-subscribers-backend-arm

Author: Oliver Stannard (ostannard)

Changes

When using R11 as the frame pointer for a Thumb1 target, we need to use LR as a temporary in the prologue when setting up the frame record, like this:

  push {lr}
  mov lr, r11
  push {r11}
  mov r11, sp

This means that the value in LR is clobbered by the prologue, so we can't use it as a live-in value to the function to implement the llvm.returnaddress intrinsic. I don't think there's any way to avoid this use if LR without making the prologue code even more complicated. Instead, we can load the return address from the frame record, as we do when the intrinsic is used with a non-zero depth.


Full diff: https://github.com/llvm/llvm-project/pull/117735.diff

3 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMISelLowering.cpp (+13-3)
  • (modified) llvm/lib/Target/ARM/ARMISelLowering.h (+2-1)
  • (added) llvm/test/CodeGen/Thumb/returnaddress.ll (+309)
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 6b290135c5bcba..6032cac473c137 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -6196,18 +6196,28 @@ SDValue ARMTargetLowering::LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, Lo, Hi);
 }
 
-SDValue ARMTargetLowering::LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const{
+SDValue
+ARMTargetLowering::LowerRETURNADDR(SDValue Op, SelectionDAG &DAG,
+                                   const ARMSubtarget *Subtarget) const {
   MachineFunction &MF = DAG.getMachineFunction();
   MachineFrameInfo &MFI = MF.getFrameInfo();
+  const ARMBaseRegisterInfo &ARI =
+    *static_cast<const ARMBaseRegisterInfo*>(RegInfo);
   MFI.setReturnAddressIsTaken(true);
 
   if (verifyReturnAddressArgumentIsConstant(Op, DAG))
     return SDValue();
 
+  // When the frame register is R11 for a Thumb1-only target, we need to use LR
+  // as a temporary while setting up the frame record, so it can not be used as
+  // a live-in to the function.
+  bool LRAvailable =
+      !(Subtarget->isThumb1Only() && ARI.getFrameRegister(MF) == ARM::R11);
+
   EVT VT = Op.getValueType();
   SDLoc dl(Op);
   unsigned Depth = Op.getConstantOperandVal(0);
-  if (Depth) {
+  if (Depth || !LRAvailable) {
     SDValue FrameAddr = LowerFRAMEADDR(Op, DAG);
     SDValue Offset = DAG.getConstant(4, dl, MVT::i32);
     return DAG.getLoad(VT, dl, DAG.getEntryNode(),
@@ -10667,7 +10677,7 @@ SDValue ARMTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
   case ISD::FP_TO_SINT_SAT:
   case ISD::FP_TO_UINT_SAT: return LowerFP_TO_INT_SAT(Op, DAG, Subtarget);
   case ISD::FCOPYSIGN:     return LowerFCOPYSIGN(Op, DAG);
-  case ISD::RETURNADDR:    return LowerRETURNADDR(Op, DAG);
+  case ISD::RETURNADDR:    return LowerRETURNADDR(Op, DAG, Subtarget);
   case ISD::FRAMEADDR:     return LowerFRAMEADDR(Op, DAG);
   case ISD::EH_SJLJ_SETJMP: return LowerEH_SJLJ_SETJMP(Op, DAG);
   case ISD::EH_SJLJ_LONGJMP: return LowerEH_SJLJ_LONGJMP(Op, DAG);
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.h b/llvm/lib/Target/ARM/ARMISelLowering.h
index 344a0ad91e5178..54f9c643fc87d7 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.h
+++ b/llvm/lib/Target/ARM/ARMISelLowering.h
@@ -854,7 +854,8 @@ class VectorType;
     SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;
     SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
     SDValue LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) const;
-    SDValue LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
+    SDValue LowerRETURNADDR(SDValue Op, SelectionDAG &DAG,
+                            const ARMSubtarget *Subtarget) const;
     SDValue LowerFRAMEADDR(SDValue Op, SelectionDAG &DAG) const;
     SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const;
     SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/Thumb/returnaddress.ll b/llvm/test/CodeGen/Thumb/returnaddress.ll
new file mode 100644
index 00000000000000..15cd138bda78b4
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb/returnaddress.ll
@@ -0,0 +1,309 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=thumbv6m-none-eabi -frame-pointer=none                          -verify-machineinstrs | FileCheck %s --check-prefix=FP-NONE
+; RUN: llc < %s -mtriple=thumbv6m-none-eabi -frame-pointer=all                           -verify-machineinstrs | FileCheck %s --check-prefix=FP-ALL
+; RUN: llc < %s -mtriple=thumbv6m-none-eabi -frame-pointer=all -mattr=+aapcs-frame-chain -verify-machineinstrs | FileCheck %s --check-prefix=FP-AAPCS
+
+define void @ra_call() {
+; FP-NONE-LABEL: ra_call:
+; FP-NONE:       @ %bb.0: @ %entry
+; FP-NONE-NEXT:    .save {r7, lr}
+; FP-NONE-NEXT:    push {r7, lr}
+; FP-NONE-NEXT:    mov r0, lr
+; FP-NONE-NEXT:    bl sink_ptr
+; FP-NONE-NEXT:    pop {r7, pc}
+;
+; FP-ALL-LABEL: ra_call:
+; FP-ALL:       @ %bb.0: @ %entry
+; FP-ALL-NEXT:    .save {r7, lr}
+; FP-ALL-NEXT:    push {r7, lr}
+; FP-ALL-NEXT:    .setfp r7, sp
+; FP-ALL-NEXT:    add r7, sp, #0
+; FP-ALL-NEXT:    mov r0, lr
+; FP-ALL-NEXT:    bl sink_ptr
+; FP-ALL-NEXT:    pop {r7, pc}
+;
+; FP-AAPCS-LABEL: ra_call:
+; FP-AAPCS:       @ %bb.0: @ %entry
+; FP-AAPCS-NEXT:    .save {lr}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    mov lr, r11
+; FP-AAPCS-NEXT:    .save {r11}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    .setfp r11, sp
+; FP-AAPCS-NEXT:    mov r11, sp
+; FP-AAPCS-NEXT:    mov r0, r11
+; FP-AAPCS-NEXT:    ldr r0, [r0, #4]
+; FP-AAPCS-NEXT:    bl sink_ptr
+; FP-AAPCS-NEXT:    pop {r0}
+; FP-AAPCS-NEXT:    mov r11, r0
+; FP-AAPCS-NEXT:    pop {pc}
+entry:
+  %r = tail call ptr @llvm.returnaddress(i32 0)
+  tail call void @sink_ptr(ptr %r)
+  ret void
+}
+
+define ptr @ra_return() {
+; FP-NONE-LABEL: ra_return:
+; FP-NONE:       @ %bb.0: @ %entry
+; FP-NONE-NEXT:    mov r0, lr
+; FP-NONE-NEXT:    bx lr
+;
+; FP-ALL-LABEL: ra_return:
+; FP-ALL:       @ %bb.0: @ %entry
+; FP-ALL-NEXT:    .save {r7, lr}
+; FP-ALL-NEXT:    push {r7, lr}
+; FP-ALL-NEXT:    .setfp r7, sp
+; FP-ALL-NEXT:    add r7, sp, #0
+; FP-ALL-NEXT:    mov r0, lr
+; FP-ALL-NEXT:    pop {r7, pc}
+;
+; FP-AAPCS-LABEL: ra_return:
+; FP-AAPCS:       @ %bb.0: @ %entry
+; FP-AAPCS-NEXT:    .save {lr}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    mov lr, r11
+; FP-AAPCS-NEXT:    .save {r11}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    .setfp r11, sp
+; FP-AAPCS-NEXT:    mov r11, sp
+; FP-AAPCS-NEXT:    mov r0, r11
+; FP-AAPCS-NEXT:    ldr r0, [r0, #4]
+; FP-AAPCS-NEXT:    pop {r1}
+; FP-AAPCS-NEXT:    mov r11, r1
+; FP-AAPCS-NEXT:    pop {pc}
+entry:
+  %r = tail call ptr @llvm.returnaddress(i32 0)
+  ret ptr %r
+}
+
+define ptr @callee_saved_low() {
+; FP-NONE-LABEL: callee_saved_low:
+; FP-NONE:       @ %bb.0: @ %entry
+; FP-NONE-NEXT:    .save {r4, r5, r7, lr}
+; FP-NONE-NEXT:    push {r4, r5, r7, lr}
+; FP-NONE-NEXT:    mov r0, lr
+; FP-NONE-NEXT:    @APP
+; FP-NONE-NEXT:    @NO_APP
+; FP-NONE-NEXT:    pop {r4, r5, r7, pc}
+;
+; FP-ALL-LABEL: callee_saved_low:
+; FP-ALL:       @ %bb.0: @ %entry
+; FP-ALL-NEXT:    .save {r4, r5, r7, lr}
+; FP-ALL-NEXT:    push {r4, r5, r7, lr}
+; FP-ALL-NEXT:    .setfp r7, sp, #8
+; FP-ALL-NEXT:    add r7, sp, #8
+; FP-ALL-NEXT:    mov r0, lr
+; FP-ALL-NEXT:    @APP
+; FP-ALL-NEXT:    @NO_APP
+; FP-ALL-NEXT:    pop {r4, r5, r7, pc}
+;
+; FP-AAPCS-LABEL: callee_saved_low:
+; FP-AAPCS:       @ %bb.0: @ %entry
+; FP-AAPCS-NEXT:    .save {lr}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    mov lr, r11
+; FP-AAPCS-NEXT:    .save {r11}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    .setfp r11, sp
+; FP-AAPCS-NEXT:    mov r11, sp
+; FP-AAPCS-NEXT:    .save {r4, r5}
+; FP-AAPCS-NEXT:    push {r4, r5}
+; FP-AAPCS-NEXT:    mov r0, r11
+; FP-AAPCS-NEXT:    ldr r0, [r0, #4]
+; FP-AAPCS-NEXT:    @APP
+; FP-AAPCS-NEXT:    @NO_APP
+; FP-AAPCS-NEXT:    pop {r4, r5}
+; FP-AAPCS-NEXT:    pop {r1}
+; FP-AAPCS-NEXT:    mov r11, r1
+; FP-AAPCS-NEXT:    pop {pc}
+entry:
+  call void asm sideeffect "", "~{r4},~{r5}"()
+  %r = tail call ptr @llvm.returnaddress(i32 0)
+  ret ptr %r
+}
+
+define ptr @callee_saved_high() {
+; FP-NONE-LABEL: callee_saved_high:
+; FP-NONE:       @ %bb.0: @ %entry
+; FP-NONE-NEXT:    mov r3, r9
+; FP-NONE-NEXT:    mov r2, r8
+; FP-NONE-NEXT:    .save {r8, r9}
+; FP-NONE-NEXT:    push {r2, r3}
+; FP-NONE-NEXT:    mov r0, lr
+; FP-NONE-NEXT:    @APP
+; FP-NONE-NEXT:    @NO_APP
+; FP-NONE-NEXT:    pop {r1, r2}
+; FP-NONE-NEXT:    mov r8, r1
+; FP-NONE-NEXT:    mov r9, r2
+; FP-NONE-NEXT:    bx lr
+;
+; FP-ALL-LABEL: callee_saved_high:
+; FP-ALL:       @ %bb.0: @ %entry
+; FP-ALL-NEXT:    .save {r7, lr}
+; FP-ALL-NEXT:    push {r7, lr}
+; FP-ALL-NEXT:    .setfp r7, sp
+; FP-ALL-NEXT:    add r7, sp, #0
+; FP-ALL-NEXT:    mov r3, r9
+; FP-ALL-NEXT:    mov r2, r8
+; FP-ALL-NEXT:    .save {r8, r9}
+; FP-ALL-NEXT:    push {r2, r3}
+; FP-ALL-NEXT:    mov r0, lr
+; FP-ALL-NEXT:    @APP
+; FP-ALL-NEXT:    @NO_APP
+; FP-ALL-NEXT:    pop {r1, r2}
+; FP-ALL-NEXT:    mov r8, r1
+; FP-ALL-NEXT:    mov r9, r2
+; FP-ALL-NEXT:    pop {r7, pc}
+;
+; FP-AAPCS-LABEL: callee_saved_high:
+; FP-AAPCS:       @ %bb.0: @ %entry
+; FP-AAPCS-NEXT:    .save {lr}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    mov lr, r11
+; FP-AAPCS-NEXT:    .save {r11}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    .setfp r11, sp
+; FP-AAPCS-NEXT:    mov r11, sp
+; FP-AAPCS-NEXT:    mov r3, r9
+; FP-AAPCS-NEXT:    mov r2, r8
+; FP-AAPCS-NEXT:    .save {r8, r9}
+; FP-AAPCS-NEXT:    push {r2, r3}
+; FP-AAPCS-NEXT:    mov r0, r11
+; FP-AAPCS-NEXT:    ldr r0, [r0, #4]
+; FP-AAPCS-NEXT:    @APP
+; FP-AAPCS-NEXT:    @NO_APP
+; FP-AAPCS-NEXT:    pop {r1, r2}
+; FP-AAPCS-NEXT:    mov r8, r1
+; FP-AAPCS-NEXT:    mov r9, r2
+; FP-AAPCS-NEXT:    pop {r1}
+; FP-AAPCS-NEXT:    mov r11, r1
+; FP-AAPCS-NEXT:    pop {pc}
+entry:
+  call void asm sideeffect "", "~{r8},~{r9}"()
+  %r = tail call ptr @llvm.returnaddress(i32 0)
+  ret ptr %r
+}
+
+define ptr @large_alloca() {
+; FP-NONE-LABEL: large_alloca:
+; FP-NONE:       @ %bb.0: @ %entry
+; FP-NONE-NEXT:    .save {r4, r5, r6, lr}
+; FP-NONE-NEXT:    push {r4, r5, r6, lr}
+; FP-NONE-NEXT:    ldr r6, .LCPI4_0
+; FP-NONE-NEXT:    .pad #2000
+; FP-NONE-NEXT:    add sp, r6
+; FP-NONE-NEXT:    mov r4, lr
+; FP-NONE-NEXT:    mov r0, sp
+; FP-NONE-NEXT:    bl sink_ptr
+; FP-NONE-NEXT:    mov r0, r4
+; FP-NONE-NEXT:    ldr r6, .LCPI4_1
+; FP-NONE-NEXT:    add sp, r6
+; FP-NONE-NEXT:    pop {r4, r5, r6, pc}
+; FP-NONE-NEXT:    .p2align 2
+; FP-NONE-NEXT:  @ %bb.1:
+; FP-NONE-NEXT:  .LCPI4_0:
+; FP-NONE-NEXT:    .long 4294965296 @ 0xfffff830
+; FP-NONE-NEXT:  .LCPI4_1:
+; FP-NONE-NEXT:    .long 2000 @ 0x7d0
+;
+; FP-ALL-LABEL: large_alloca:
+; FP-ALL:       @ %bb.0: @ %entry
+; FP-ALL-NEXT:    .save {r4, r6, r7, lr}
+; FP-ALL-NEXT:    push {r4, r6, r7, lr}
+; FP-ALL-NEXT:    .setfp r7, sp, #8
+; FP-ALL-NEXT:    add r7, sp, #8
+; FP-ALL-NEXT:    ldr r6, .LCPI4_0
+; FP-ALL-NEXT:    .pad #2000
+; FP-ALL-NEXT:    add sp, r6
+; FP-ALL-NEXT:    mov r4, lr
+; FP-ALL-NEXT:    mov r0, sp
+; FP-ALL-NEXT:    bl sink_ptr
+; FP-ALL-NEXT:    mov r0, r4
+; FP-ALL-NEXT:    subs r6, r7, #7
+; FP-ALL-NEXT:    subs r6, #1
+; FP-ALL-NEXT:    mov sp, r6
+; FP-ALL-NEXT:    pop {r4, r6, r7, pc}
+; FP-ALL-NEXT:    .p2align 2
+; FP-ALL-NEXT:  @ %bb.1:
+; FP-ALL-NEXT:  .LCPI4_0:
+; FP-ALL-NEXT:    .long 4294965296 @ 0xfffff830
+;
+; FP-AAPCS-LABEL: large_alloca:
+; FP-AAPCS:       @ %bb.0: @ %entry
+; FP-AAPCS-NEXT:    .save {lr}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    mov lr, r11
+; FP-AAPCS-NEXT:    .save {r11}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    .setfp r11, sp
+; FP-AAPCS-NEXT:    mov r11, sp
+; FP-AAPCS-NEXT:    .save {r4, r7}
+; FP-AAPCS-NEXT:    push {r4, r7}
+; FP-AAPCS-NEXT:    ldr r7, .LCPI4_0
+; FP-AAPCS-NEXT:    .pad #2000
+; FP-AAPCS-NEXT:    add sp, r7
+; FP-AAPCS-NEXT:    mov r0, sp
+; FP-AAPCS-NEXT:    bl sink_ptr
+; FP-AAPCS-NEXT:    mov r0, r11
+; FP-AAPCS-NEXT:    ldr r0, [r0, #4]
+; FP-AAPCS-NEXT:    mov r7, r11
+; FP-AAPCS-NEXT:    subs r7, #8
+; FP-AAPCS-NEXT:    mov sp, r7
+; FP-AAPCS-NEXT:    pop {r4, r7}
+; FP-AAPCS-NEXT:    pop {r1}
+; FP-AAPCS-NEXT:    mov r11, r1
+; FP-AAPCS-NEXT:    pop {pc}
+; FP-AAPCS-NEXT:    .p2align 2
+; FP-AAPCS-NEXT:  @ %bb.1:
+; FP-AAPCS-NEXT:  .LCPI4_0:
+; FP-AAPCS-NEXT:    .long 4294965296 @ 0xfffff830
+entry:
+  %big = alloca i8, i32 2000
+  tail call void @sink_ptr(ptr %big)
+  %r = tail call ptr @llvm.returnaddress(i32 0)
+  ret ptr %r
+}
+
+define ptr @ra_depth_1() {
+; FP-NONE-LABEL: ra_depth_1:
+; FP-NONE:       @ %bb.0: @ %entry
+; FP-NONE-NEXT:    .save {r7, lr}
+; FP-NONE-NEXT:    push {r7, lr}
+; FP-NONE-NEXT:    .setfp r7, sp
+; FP-NONE-NEXT:    add r7, sp, #0
+; FP-NONE-NEXT:    ldr r0, [r7]
+; FP-NONE-NEXT:    ldr r0, [r0, #4]
+; FP-NONE-NEXT:    pop {r7, pc}
+;
+; FP-ALL-LABEL: ra_depth_1:
+; FP-ALL:       @ %bb.0: @ %entry
+; FP-ALL-NEXT:    .save {r7, lr}
+; FP-ALL-NEXT:    push {r7, lr}
+; FP-ALL-NEXT:    .setfp r7, sp
+; FP-ALL-NEXT:    add r7, sp, #0
+; FP-ALL-NEXT:    ldr r0, [r7]
+; FP-ALL-NEXT:    ldr r0, [r0, #4]
+; FP-ALL-NEXT:    pop {r7, pc}
+;
+; FP-AAPCS-LABEL: ra_depth_1:
+; FP-AAPCS:       @ %bb.0: @ %entry
+; FP-AAPCS-NEXT:    .save {lr}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    mov lr, r11
+; FP-AAPCS-NEXT:    .save {r11}
+; FP-AAPCS-NEXT:    push {lr}
+; FP-AAPCS-NEXT:    .setfp r11, sp
+; FP-AAPCS-NEXT:    mov r11, sp
+; FP-AAPCS-NEXT:    mov r0, r11
+; FP-AAPCS-NEXT:    ldr r0, [r0]
+; FP-AAPCS-NEXT:    ldr r0, [r0, #4]
+; FP-AAPCS-NEXT:    pop {r1}
+; FP-AAPCS-NEXT:    mov r11, r1
+; FP-AAPCS-NEXT:    pop {pc}
+entry:
+  %r = tail call ptr @llvm.returnaddress(i32 1)
+  ret ptr %r
+}
+
+declare void @sink_ptr(ptr)

Copy link

github-actions bot commented Nov 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@efriedma-quic
Copy link
Collaborator

I'm concerned that this is fragile: isel is guessing what framelowering is going to do, and if it guesses wrong, you get a miscompile. And there's no obvious connection between the bits of code in question.

Either framelowering should preserve LR by using a different code sequence if isReturnAddressTaken() is set, or the decision on how to lower returnaddress should be deferred to framelowering (using a pseudo-instruction).

When the llvm.returnaddress intrinsic is used, the LR is marked as
live-in to the function, so it must be preserved through the prologue.
This is normally fine, but there is one case for Thumb1 where we use LR
as a temporary in the prologue to set up a frame chain using r11 as the
frame pointer. There are no other registers guaranteed to be free to do
this, so we have to re-load LR from the stack after pushing the callee
saved registers.
When setting up frame records using r11 as the frame pointer for Thumb1,
we currently use LR as a temporary. This requires us to re-load it in
cases where LR is live into the function, so it is better to use an
argument register if one is free.
@ostannard
Copy link
Collaborator Author

I don't think the decision can be deferred to frame lowering time, because the two choices differ in whether a register value needs to be preserved from entry to the llvm.returnaddr call, and frame lowering happens after register allocation. We could preserve and register-allocate that value and ignore it if the prologue can't preserve LR, but that seems wasteful if it doesn't end up being used.

Instead I've gone with preserving the value of LR in the prologue, which sometimes requires re-loading it from the stack, but that can be optimised by picking a different register in most cases.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the pseudo, I was thinking it would be sort of opportunistic: if the return address happens to be available in LR, then mov from LR, else load it from the stack. (If it's not in LR, it's almost always going to be somewhere on the stack.) I guess that's potentially less efficient in some cases, but probably fine for Thumb1.

In any case, LGTM.

@ostannard ostannard merged commit 0c0f765 into llvm:main Nov 28, 2024
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 28, 2024

LLVM Buildbot has detected a new failure on builder libc-x86_64-debian-fullbuild-dbg-asan running on libc-x86_64-debian-fullbuild while building llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/171/builds/11255

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcHashTest.SanityCheck
[       OK ] LlvmLibcHashTest.SanityCheck (17 ms)
[ RUN      ] LlvmLibcHashTest.Avalanche
[       OK ] LlvmLibcHashTest.Avalanche (2125 ms)
[ RUN      ] LlvmLibcHashTest.UniformLSB
[       OK ] LlvmLibcHashTest.UniformLSB (202 ms)
[ RUN      ] LlvmLibcHashTest.UniformMSB
[       OK ] LlvmLibcHashTest.UniformMSB (143 us)
Ran 4 tests.  PASS: 4  FAIL: 0
command timed out: 1200 seconds without output running [b'python', b'../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py', b'--debug', b'--asan'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1258.220034
Step 8 (libc-unit-tests) failure: libc-unit-tests (failure)
...
[ RUN      ] LlvmLibcStrtoint64Test.InvalidBase
[       OK ] LlvmLibcStrtoint64Test.InvalidBase (20 us)
[ RUN      ] LlvmLibcStrtoint64Test.CleanBaseTenDecode
[       OK ] LlvmLibcStrtoint64Test.CleanBaseTenDecode (20 us)
[ RUN      ] LlvmLibcStrtoint64Test.MessyBaseTenDecode
[       OK ] LlvmLibcStrtoint64Test.MessyBaseTenDecode (8 us)
[ RUN      ] LlvmLibcStrtoint64Test.DecodeInOtherBases
[       OK ] LlvmLibcStrtoint64Test.DecodeInOtherBases (504 ms)
[ RUN      ] LlvmLibcStrtoint64Test.CleanBaseSixteenDecode
[       OK ] LlvmLibcStrtoint64Test.CleanBaseSixteenDecode (9 us)
[ RUN      ] LlvmLibcStrtoint64Test.MessyBaseSixteenDecode
[       OK ] LlvmLibcStrtoint64Test.MessyBaseSixteenDecode (4 us)
[ RUN      ] LlvmLibcStrtoint64Test.AutomaticBaseSelection
[       OK ] LlvmLibcStrtoint64Test.AutomaticBaseSelection (5 us)
[ RUN      ] LlvmLibcStrtouint64Test.InvalidBase
[       OK ] LlvmLibcStrtouint64Test.InvalidBase (2 us)
[ RUN      ] LlvmLibcStrtouint64Test.CleanBaseTenDecode
[       OK ] LlvmLibcStrtouint64Test.CleanBaseTenDecode (7 us)
[ RUN      ] LlvmLibcStrtouint64Test.MessyBaseTenDecode
[       OK ] LlvmLibcStrtouint64Test.MessyBaseTenDecode (6 us)
[ RUN      ] LlvmLibcStrtouint64Test.DecodeInOtherBases
[       OK ] LlvmLibcStrtouint64Test.DecodeInOtherBases (234 ms)
[ RUN      ] LlvmLibcStrtouint64Test.CleanBaseSixteenDecode
[       OK ] LlvmLibcStrtouint64Test.CleanBaseSixteenDecode (9 us)
[ RUN      ] LlvmLibcStrtouint64Test.MessyBaseSixteenDecode
[       OK ] LlvmLibcStrtouint64Test.MessyBaseSixteenDecode (4 us)
[ RUN      ] LlvmLibcStrtouint64Test.AutomaticBaseSelection
[       OK ] LlvmLibcStrtouint64Test.AutomaticBaseSelection (5 us)
Ran 14 tests.  PASS: 14  FAIL: 0
[1096/1098] Running unit test libc.test.src.time.nanosleep_test.__unit__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcNanosleep.SmokeTest
[       OK ] LlvmLibcNanosleep.SmokeTest (56 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[1097/1098] Running unit test libc.test.src.__support.hash_test.__unit__
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcHashTest.SanityCheck
[       OK ] LlvmLibcHashTest.SanityCheck (17 ms)
[ RUN      ] LlvmLibcHashTest.Avalanche
[       OK ] LlvmLibcHashTest.Avalanche (2125 ms)
[ RUN      ] LlvmLibcHashTest.UniformLSB
[       OK ] LlvmLibcHashTest.UniformLSB (202 ms)
[ RUN      ] LlvmLibcHashTest.UniformMSB
[       OK ] LlvmLibcHashTest.UniformMSB (143 us)
Ran 4 tests.  PASS: 4  FAIL: 0

command timed out: 1200 seconds without output running [b'python', b'../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py', b'--debug', b'--asan'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1258.220034

@ostannard
Copy link
Collaborator Author

That buildbot has been failing intermittently in the same way for the last 6 hours, so ignoring it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants