Skip to content

[LoongArch] Ensure pcaddu18i and jirl adjacency in tail calls for correct relocation #113932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 31, 2024

Conversation

heiher
Copy link
Member

@heiher heiher commented Oct 28, 2024

Prior to this patch, both pcaddu18i and jirl were marked as scheduling boundaries to prevent instruction reordering that would disrupt their adjacency. However, in certain cases, epilogues were still being inserted between these two instructions, breaking the required proximity. This patch ensures that pcaddu18i and jirl remain adjacent even in the presence of epilogues, maintaining correct relocation behavior for tail calls on LoongArch.

…rect relocation

Prior to this patch, both `pcaddu18i` and `jirl` were marked as scheduling
boundaries to prevent instruction reordering that would disrupt their
adjacency. However, in certain cases, epilogues were still being inserted
between these two instructions, breaking the required proximity. This patch
ensures that `pcaddu18i` and `jirl` remain adjacent even in the presence of
epilogues, maintaining correct relocation behavior for tail calls on LoongArch.
@llvmbot
Copy link
Member

llvmbot commented Oct 28, 2024

@llvm/pr-subscribers-backend-loongarch

Author: hev (heiher)

Changes

Prior to this patch, both pcaddu18i and jirl were marked as scheduling boundaries to prevent instruction reordering that would disrupt their adjacency. However, in certain cases, epilogues were still being inserted between these two instructions, breaking the required proximity. This patch ensures that pcaddu18i and jirl remain adjacent even in the presence of epilogues, maintaining correct relocation behavior for tail calls on LoongArch.


Full diff: https://github.com/llvm/llvm-project/pull/113932.diff

4 Files Affected:

  • (modified) llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp (+56-27)
  • (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.cpp (-7)
  • (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.td (+3-2)
  • (modified) llvm/test/CodeGen/LoongArch/code-models.ll (+59-2)
diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
index e872ec443f87b1..e289fc120e14c0 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
@@ -165,11 +165,9 @@ bool LoongArchPreRAExpandPseudo::expandMI(
   case LoongArch::PseudoLA_TLS_DESC_LARGE:
     return expandLoadAddressTLSDesc(MBB, MBBI, NextMBBI, /*Large=*/true);
   case LoongArch::PseudoCALL:
-  case LoongArch::PseudoCALL_MEDIUM:
   case LoongArch::PseudoCALL_LARGE:
     return expandFunctionCALL(MBB, MBBI, NextMBBI, /*IsTailCall=*/false);
   case LoongArch::PseudoTAIL:
-  case LoongArch::PseudoTAIL_MEDIUM:
   case LoongArch::PseudoTAIL_LARGE:
     return expandFunctionCALL(MBB, MBBI, NextMBBI, /*IsTailCall=*/true);
   case LoongArch::PseudoBRIND:
@@ -556,31 +554,6 @@ bool LoongArchPreRAExpandPseudo::expandFunctionCALL(
     CALL = BuildMI(MBB, MBBI, DL, TII->get(Opcode)).add(Func);
     break;
   }
-  case CodeModel::Medium: {
-    // CALL:
-    // pcaddu18i $ra, %call36(func)
-    // jirl      $ra, $ra, 0
-    // TAIL:
-    // pcaddu18i $scratch, %call36(func)
-    // jirl      $r0, $scratch, 0
-    Opcode =
-        IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL;
-    Register ScratchReg =
-        IsTailCall
-            ? MF->getRegInfo().createVirtualRegister(&LoongArch::GPRRegClass)
-            : LoongArch::R1;
-    MachineInstrBuilder MIB =
-        BuildMI(MBB, MBBI, DL, TII->get(LoongArch::PCADDU18I), ScratchReg);
-
-    CALL =
-        BuildMI(MBB, MBBI, DL, TII->get(Opcode)).addReg(ScratchReg).addImm(0);
-
-    if (Func.isSymbol())
-      MIB.addExternalSymbol(Func.getSymbolName(), LoongArchII::MO_CALL36);
-    else
-      MIB.addDisp(Func, 0, LoongArchII::MO_CALL36);
-    break;
-  }
   case CodeModel::Large: {
     // Emit the 5-insn large address load sequence, either directly or
     // indirectly in case of going through the GOT, then JIRL_TAIL or
@@ -671,6 +644,10 @@ class LoongArchExpandPseudo : public MachineFunctionPass {
                 MachineBasicBlock::iterator &NextMBBI);
   bool expandCopyCFR(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
                      MachineBasicBlock::iterator &NextMBBI);
+  bool expandFunctionCALL(MachineBasicBlock &MBB,
+                          MachineBasicBlock::iterator MBBI,
+                          MachineBasicBlock::iterator &NextMBBI,
+                          bool IsTailCall);
 };
 
 char LoongArchExpandPseudo::ID = 0;
@@ -705,6 +682,10 @@ bool LoongArchExpandPseudo::expandMI(MachineBasicBlock &MBB,
   switch (MBBI->getOpcode()) {
   case LoongArch::PseudoCopyCFR:
     return expandCopyCFR(MBB, MBBI, NextMBBI);
+  case LoongArch::PseudoCALL_MEDIUM:
+    return expandFunctionCALL(MBB, MBBI, NextMBBI, /*IsTailCall=*/false);
+  case LoongArch::PseudoTAIL_MEDIUM:
+    return expandFunctionCALL(MBB, MBBI, NextMBBI, /*IsTailCall=*/true);
   }
 
   return false;
@@ -763,6 +744,54 @@ bool LoongArchExpandPseudo::expandCopyCFR(
   return true;
 }
 
+bool LoongArchExpandPseudo::expandFunctionCALL(
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
+    MachineBasicBlock::iterator &NextMBBI, bool IsTailCall) {
+  MachineFunction *MF = MBB.getParent();
+  MachineInstr &MI = *MBBI;
+  DebugLoc DL = MI.getDebugLoc();
+  const MachineOperand &Func = MI.getOperand(0);
+  MachineInstrBuilder CALL;
+  unsigned Opcode;
+
+  switch (MF->getTarget().getCodeModel()) {
+  default:
+    report_fatal_error("Unsupported code model");
+    break;
+  case CodeModel::Medium: {
+    // CALL:
+    // pcaddu18i $ra, %call36(func)
+    // jirl      $ra, $ra, 0
+    // TAIL:
+    // pcaddu18i $t8, %call36(func)
+    // jirl      $r0, $t8, 0
+    Opcode =
+        IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL;
+    Register ScratchReg = IsTailCall ? LoongArch::R20 : LoongArch::R1;
+    MachineInstrBuilder MIB =
+        BuildMI(MBB, MBBI, DL, TII->get(LoongArch::PCADDU18I), ScratchReg);
+
+    CALL =
+        BuildMI(MBB, MBBI, DL, TII->get(Opcode)).addReg(ScratchReg).addImm(0);
+
+    if (Func.isSymbol())
+      MIB.addExternalSymbol(Func.getSymbolName(), LoongArchII::MO_CALL36);
+    else
+      MIB.addDisp(Func, 0, LoongArchII::MO_CALL36);
+    break;
+  }
+  }
+
+  // Transfer implicit operands.
+  CALL.copyImplicitOps(MI);
+
+  // Transfer MI flags.
+  CALL.setMIFlags(MI.getFlags());
+
+  MI.eraseFromParent();
+  return true;
+}
+
 } // end namespace
 
 INITIALIZE_PASS(LoongArchPreRAExpandPseudo, "loongarch-prera-expand-pseudo",
diff --git a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.cpp b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.cpp
index a01f2ed3b5880f..363cacf726c9ce 100644
--- a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.cpp
@@ -391,9 +391,6 @@ bool LoongArchInstrInfo::isSchedulingBoundary(const MachineInstr &MI,
   //
   // The following instruction patterns are prohibited from being reordered:
   //
-  // * pcaddu18 $ra, %call36(s)
-  //   jirl     $ra, $ra, 0
-  //
   // * pcalau12i $a0, %pc_hi20(s)
   //   addi.d $a1, $zero, %pc_lo12(s)
   //   lu32i.d $a1, %pc64_lo20(s)
@@ -413,10 +410,6 @@ bool LoongArchInstrInfo::isSchedulingBoundary(const MachineInstr &MI,
   // boundaries, and the instructions between them are guaranteed to be
   // ordered according to data dependencies.
   switch (MI.getOpcode()) {
-  case LoongArch::PCADDU18I:
-    if (MI.getOperand(1).getTargetFlags() == LoongArchII::MO_CALL36)
-      return true;
-    break;
   case LoongArch::PCALAU12I: {
     auto AddI = std::next(MII);
     if (AddI == MIE || AddI->getOpcode() != LoongArch::ADDI_D)
diff --git a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
index 671b8cc6ffe1b1..b8c302333c0552 100644
--- a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
@@ -1484,7 +1484,7 @@ def : Pat<(loongarch_call tglobaladdr:$func), (PseudoCALL tglobaladdr:$func)>;
 def : Pat<(loongarch_call texternalsym:$func), (PseudoCALL texternalsym:$func)>;
 
 // Function call with 'Medium' code model.
-let isCall = 1, Defs = [R1] in
+let isCall = 1, Defs = [R1, R20] in
 def PseudoCALL_MEDIUM : Pseudo<(outs), (ins bare_symbol:$func)>;
 
 let Predicates = [IsLA64] in {
@@ -1533,7 +1533,8 @@ def : Pat<(loongarch_tail (iPTR texternalsym:$dst)),
           (PseudoTAIL texternalsym:$dst)>;
 
 // Tail call with 'Medium' code model.
-let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1, Uses = [R3] in
+let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
+    Uses = [R3], Defs = [R20] in
 def PseudoTAIL_MEDIUM : Pseudo<(outs), (ins bare_symbol:$dst)>;
 
 let Predicates = [IsLA64] in {
diff --git a/llvm/test/CodeGen/LoongArch/code-models.ll b/llvm/test/CodeGen/LoongArch/code-models.ll
index 7bc7a982db86d4..9dc7985ca52b10 100644
--- a/llvm/test/CodeGen/LoongArch/code-models.ll
+++ b/llvm/test/CodeGen/LoongArch/code-models.ll
@@ -105,8 +105,8 @@ define i32 @caller_tail(i32 %i) nounwind {
 ;
 ; MEDIUM-LABEL: caller_tail:
 ; MEDIUM:       # %bb.0: # %entry
-; MEDIUM-NEXT:    pcaddu18i $a1, %call36(callee_tail)
-; MEDIUM-NEXT:    jr $a1
+; MEDIUM-NEXT:    pcaddu18i $t8, %call36(callee_tail)
+; MEDIUM-NEXT:    jr $t8
 ;
 ; LARGE-LABEL: caller_tail:
 ; LARGE:       # %bb.0: # %entry
@@ -120,3 +120,60 @@ entry:
   %r = tail call i32 @callee_tail(i32 %i)
   ret i32 %r
 }
+
+define i32 @caller_call_tail(i32 %i) nounwind {
+; SMALL-LABEL: caller_call_tail:
+; SMALL:       # %bb.0: # %entry
+; SMALL-NEXT:    addi.d $sp, $sp, -16
+; SMALL-NEXT:    st.d $ra, $sp, 8 # 8-byte Folded Spill
+; SMALL-NEXT:    st.d $fp, $sp, 0 # 8-byte Folded Spill
+; SMALL-NEXT:    move $fp, $a0
+; SMALL-NEXT:    bl %plt(callee_tail)
+; SMALL-NEXT:    move $a0, $fp
+; SMALL-NEXT:    ld.d $fp, $sp, 0 # 8-byte Folded Reload
+; SMALL-NEXT:    ld.d $ra, $sp, 8 # 8-byte Folded Reload
+; SMALL-NEXT:    addi.d $sp, $sp, 16
+; SMALL-NEXT:    b %plt(callee_tail)
+;
+; MEDIUM-LABEL: caller_call_tail:
+; MEDIUM:       # %bb.0: # %entry
+; MEDIUM-NEXT:    addi.d $sp, $sp, -16
+; MEDIUM-NEXT:    st.d $ra, $sp, 8 # 8-byte Folded Spill
+; MEDIUM-NEXT:    st.d $fp, $sp, 0 # 8-byte Folded Spill
+; MEDIUM-NEXT:    move $fp, $a0
+; MEDIUM-NEXT:    pcaddu18i $ra, %call36(callee_tail)
+; MEDIUM-NEXT:    jirl $ra, $ra, 0
+; MEDIUM-NEXT:    move $a0, $fp
+; MEDIUM-NEXT:    ld.d $fp, $sp, 0 # 8-byte Folded Reload
+; MEDIUM-NEXT:    ld.d $ra, $sp, 8 # 8-byte Folded Reload
+; MEDIUM-NEXT:    addi.d $sp, $sp, 16
+; MEDIUM-NEXT:    pcaddu18i $t8, %call36(callee_tail)
+; MEDIUM-NEXT:    jr $t8
+;
+; LARGE-LABEL: caller_call_tail:
+; LARGE:       # %bb.0: # %entry
+; LARGE-NEXT:    addi.d $sp, $sp, -16
+; LARGE-NEXT:    st.d $ra, $sp, 8 # 8-byte Folded Spill
+; LARGE-NEXT:    st.d $fp, $sp, 0 # 8-byte Folded Spill
+; LARGE-NEXT:    move $fp, $a0
+; LARGE-NEXT:    pcalau12i $a1, %got_pc_hi20(callee_tail)
+; LARGE-NEXT:    addi.d $ra, $zero, %got_pc_lo12(callee_tail)
+; LARGE-NEXT:    lu32i.d $ra, %got64_pc_lo20(callee_tail)
+; LARGE-NEXT:    lu52i.d $ra, $ra, %got64_pc_hi12(callee_tail)
+; LARGE-NEXT:    ldx.d $ra, $ra, $a1
+; LARGE-NEXT:    jirl $ra, $ra, 0
+; LARGE-NEXT:    move $a0, $fp
+; LARGE-NEXT:    pcalau12i $a1, %got_pc_hi20(callee_tail)
+; LARGE-NEXT:    addi.d $a2, $zero, %got_pc_lo12(callee_tail)
+; LARGE-NEXT:    lu32i.d $a2, %got64_pc_lo20(callee_tail)
+; LARGE-NEXT:    lu52i.d $a2, $a2, %got64_pc_hi12(callee_tail)
+; LARGE-NEXT:    ldx.d $a1, $a2, $a1
+; LARGE-NEXT:    ld.d $fp, $sp, 0 # 8-byte Folded Reload
+; LARGE-NEXT:    ld.d $ra, $sp, 8 # 8-byte Folded Reload
+; LARGE-NEXT:    addi.d $sp, $sp, 16
+; LARGE-NEXT:    jr $a1
+entry:
+  call i32 @callee_tail(i32 %i)
+  %r = tail call i32 @callee_tail(i32 %i)
+  ret i32 %r
+}

@xen0n
Copy link
Contributor

xen0n commented Oct 30, 2024

Also, in case this PR was authored for a regression noticed in the wild, would it be possible to minimize a test case from the triggering code so we can ensure this wouldn't happen again under similar circumstances?

Copy link
Contributor

@xen0n xen0n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some excerpt from an unpatched LLVM compiling the caller_call_tail reproducer:

declare i32 @callee_tail(i32 %i)

define i32 @caller_call_tail(i32 %i) nounwind {
; MEDIUM-LABEL: caller_call_tail:
; MEDIUM:       # %bb.0: # %entry
; MEDIUM-NEXT:    addi.d $sp, $sp, -16
; MEDIUM-NEXT:    st.d $ra, $sp, 8 # 8-byte Folded Spill
; MEDIUM-NEXT:    st.d $fp, $sp, 0 # 8-byte Folded Spill
; MEDIUM-NEXT:    move $fp, $a0
; MEDIUM-NEXT:    pcaddu18i $ra, %call36(callee_tail)      ; 1/2
; MEDIUM-NEXT:    jirl $ra, $ra, 0                         ; 2/2 -- OK
; MEDIUM-NEXT:    move $a0, $fp
; MEDIUM-NEXT:    pcaddu18i $a1, %call36(callee_tail)      ; 1/2
; MEDIUM-NEXT:    ld.d $fp, $sp, 0 # 8-byte Folded Reload
; MEDIUM-NEXT:    ld.d $ra, $sp, 8 # 8-byte Folded Reload
; MEDIUM-NEXT:    addi.d $sp, $sp, 16
; MEDIUM-NEXT:    jr $a1                                   ; 2/2 -- NOT OK!
entry:
  call i32 @callee_tail(i32 %i)
  %r = tail call i32 @callee_tail(i32 %i)
  ret i32 %r
}

So it's actually very trivial to trigger (!!), and now we can be really confident that those failure modes would never happen again with this change. Thanks!

@heiher heiher merged commit f7a96dc into llvm:main Oct 31, 2024
8 checks passed
@heiher heiher deleted the fix-tailcall-medium branch October 31, 2024 16:08
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…rect relocation (llvm#113932)

Prior to this patch, both `pcaddu18i` and `jirl` were marked as
scheduling boundaries to prevent instruction reordering that would
disrupt their adjacency. However, in certain cases, epilogues were still
being inserted between these two instructions, breaking the required
proximity. This patch ensures that `pcaddu18i` and `jirl` remain
adjacent even in the presence of epilogues, maintaining correct
relocation behavior for tail calls on LoongArch.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…rect relocation (llvm#113932)

Prior to this patch, both `pcaddu18i` and `jirl` were marked as
scheduling boundaries to prevent instruction reordering that would
disrupt their adjacency. However, in certain cases, epilogues were still
being inserted between these two instructions, breaking the required
proximity. This patch ensures that `pcaddu18i` and `jirl` remain
adjacent even in the presence of epilogues, maintaining correct
relocation behavior for tail calls on LoongArch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants