Skip to content

Reapply "[RISCV] Implement tail call optimization in machine outliner" #117700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 26, 2024

Conversation

mga-sc
Copy link
Contributor

@mga-sc mga-sc commented Nov 26, 2024

This MR fixes failed test CodeGen/RISCV/compress-opt-select.ll.

It was failed due to previously merged commit [TTI][RISCV] Unconditionally break critical edges to sink ADDI (PR #108889).

So, regenerated compress-opt-select test.

@mga-sc
Copy link
Contributor Author

mga-sc commented Nov 26, 2024

@ilovepi , @wangpc-pp , please, quick review this and merge.
Today's 8 hours earlier commit broke this test for my optimization.

@joker-eph
Copy link
Collaborator

Hi @mga-sc : I reverted the original PR to unbreak the CI. In general we revert unless we can land a fix-forward immediately.
This removes pressure on reviewing the right fix, feel free to cherry-pick the original commit, rebase (or squash) your fix here on top of it, and get it all landed again.

@wangpc-pp
Copy link
Contributor

Don't worry and please cherry-pick the reverted commit here and reland it with your fix:

  1. cherry-pick the reverted commit.
  2. Fix the test.
  3. Change the PR title to Reapply "..." and describe the reason why it was reverted.
  4. Then we can give a LGTM after the success of CI.

Following up issue llvm#89822, this patch adds opportunity to use tail call
in machine outliner pass.
Also it enables outline patterns with X5(T0) register.
@llvmbot
Copy link
Member

llvmbot commented Nov 26, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Mark Goncharov (mga-sc)

Changes

This test has been failed after commit
[RISCV] Implement tail call optimization in machine outliner (PR #115297).

Changes to the same file were merged today earlier
[TTI][RISCV] Unconditionally break critical edges to sink ADDI (PR #108889).


Patch is 42.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117700.diff

11 Files Affected:

  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h (+6)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp (+1-5)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+110-33)
  • (modified) llvm/test/CodeGen/RISCV/compress-opt-select.ll (+48-96)
  • (added) llvm/test/CodeGen/RISCV/machine-outliner-call.ll (+70)
  • (modified) llvm/test/CodeGen/RISCV/machine-outliner-cfi.mir (+8-14)
  • (modified) llvm/test/CodeGen/RISCV/machine-outliner-leaf-descendants.ll (+8-5)
  • (modified) llvm/test/CodeGen/RISCV/machine-outliner-patchable.ll (+20-4)
  • (modified) llvm/test/CodeGen/RISCV/machine-outliner-position.mir (+9-12)
  • (added) llvm/test/CodeGen/RISCV/machineoutliner-x5.mir (+54)
  • (modified) llvm/test/CodeGen/RISCV/machineoutliner.mir (+12-6)
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
index 19103e219cb800..ca2f868cd4e764 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
@@ -208,6 +208,12 @@ static inline unsigned getVLOpNum(const MCInstrDesc &Desc) {
   return Desc.getNumOperands() - Offset;
 }
 
+static inline unsigned getTailExpandUseRegNo(const FeatureBitset &FeatureBits) {
+  // For Zicfilp, PseudoTAIL should be expanded to a software guarded branch.
+  // It means to use t2(x7) as rs1 of JALR to expand PseudoTAIL.
+  return FeatureBits[RISCV::FeatureStdExtZicfilp] ? RISCV::X7 : RISCV::X6;
+}
+
 static inline unsigned getSEWOpNum(const MCInstrDesc &Desc) {
   const uint64_t TSFlags = Desc.TSFlags;
   assert(hasSEWOp(TSFlags));
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
index 6bb49e2bb85fe1..a28bf1186589d9 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
@@ -124,11 +124,7 @@ void RISCVMCCodeEmitter::expandFunctionCall(const MCInst &MI,
   MCRegister Ra;
   if (MI.getOpcode() == RISCV::PseudoTAIL) {
     Func = MI.getOperand(0);
-    Ra = RISCV::X6;
-    // For Zicfilp, PseudoTAIL should be expanded to a software guarded branch.
-    // It means to use t2(x7) as rs1 of JALR to expand PseudoTAIL.
-    if (STI.hasFeature(RISCV::FeatureStdExtZicfilp))
-      Ra = RISCV::X7;
+    Ra = RISCVII::getTailExpandUseRegNo(STI.getFeatureBits());
   } else if (MI.getOpcode() == RISCV::PseudoCALLReg) {
     Func = MI.getOperand(1);
     Ra = MI.getOperand(0).getReg();
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 933e776da47404..47273d6bc06d65 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -11,6 +11,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "RISCVInstrInfo.h"
+#include "MCTargetDesc/RISCVBaseInfo.h"
 #include "MCTargetDesc/RISCVMatInt.h"
 #include "RISCV.h"
 #include "RISCVMachineFunctionInfo.h"
@@ -2927,6 +2928,7 @@ bool RISCVInstrInfo::isMBBSafeToOutlineFrom(MachineBasicBlock &MBB,
 
 // Enum values indicating how an outlined call should be constructed.
 enum MachineOutlinerConstructionID {
+  MachineOutlinerTailCall,
   MachineOutlinerDefault
 };
 
@@ -2935,46 +2937,118 @@ bool RISCVInstrInfo::shouldOutlineFromFunctionByDefault(
   return MF.getFunction().hasMinSize();
 }
 
+static bool isCandidatePatchable(const MachineBasicBlock &MBB) {
+  const MachineFunction *MF = MBB.getParent();
+  const Function &F = MF->getFunction();
+  return F.getFnAttribute("fentry-call").getValueAsBool() ||
+         F.hasFnAttribute("patchable-function-entry");
+}
+
+static bool isMIReadsReg(const MachineInstr &MI, const TargetRegisterInfo *TRI,
+                         unsigned RegNo) {
+  return MI.readsRegister(RegNo, TRI) ||
+         MI.getDesc().hasImplicitUseOfPhysReg(RegNo);
+}
+
+static bool isMIModifiesReg(const MachineInstr &MI,
+                            const TargetRegisterInfo *TRI, unsigned RegNo) {
+  return MI.modifiesRegister(RegNo, TRI) ||
+         MI.getDesc().hasImplicitDefOfPhysReg(RegNo);
+}
+
+static bool cannotInsertTailCall(const MachineBasicBlock &MBB) {
+  if (!MBB.back().isReturn())
+    return true;
+  if (isCandidatePatchable(MBB))
+    return true;
+
+  // If the candidate reads the pre-set register
+  // that can be used for expanding PseudoTAIL instruction,
+  // then we cannot insert tail call.
+  const TargetSubtargetInfo &STI = MBB.getParent()->getSubtarget();
+  unsigned TailExpandUseRegNo =
+      RISCVII::getTailExpandUseRegNo(STI.getFeatureBits());
+  for (const MachineInstr &MI : MBB) {
+    if (isMIReadsReg(MI, STI.getRegisterInfo(), TailExpandUseRegNo))
+      return true;
+    if (isMIModifiesReg(MI, STI.getRegisterInfo(), TailExpandUseRegNo))
+      break;
+  }
+  return false;
+}
+
+static std::optional<MachineOutlinerConstructionID>
+analyzeCandidate(outliner::Candidate &C) {
+  // If last instruction is return then we can rely on
+  // the verification already performed in the getOutliningTypeImpl.
+  if (C.back().isReturn()) {
+    assert(!cannotInsertTailCall(*C.getMBB()) &&
+           "The candidate who uses return instruction must be outlined "
+           "using tail call");
+    return MachineOutlinerTailCall;
+  }
+
+  auto CandidateUsesX5 = [](outliner::Candidate &C) {
+    const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo();
+    if (std::any_of(C.begin(), C.end(), [TRI](const MachineInstr &MI) {
+          return isMIModifiesReg(MI, TRI, RISCV::X5);
+        }))
+      return true;
+    return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI);
+  };
+
+  if (!CandidateUsesX5(C))
+    return MachineOutlinerDefault;
+
+  return std::nullopt;
+}
+
 std::optional<std::unique_ptr<outliner::OutlinedFunction>>
 RISCVInstrInfo::getOutliningCandidateInfo(
     const MachineModuleInfo &MMI,
     std::vector<outliner::Candidate> &RepeatedSequenceLocs,
     unsigned MinRepeats) const {
 
-  // First we need to filter out candidates where the X5 register (IE t0) can't
-  // be used to setup the function call.
-  auto CannotInsertCall = [](outliner::Candidate &C) {
-    const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo();
-    return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI);
-  };
-
-  llvm::erase_if(RepeatedSequenceLocs, CannotInsertCall);
+  // Each RepeatedSequenceLoc is identical.
+  outliner::Candidate &Candidate = RepeatedSequenceLocs[0];
+  auto CandidateInfo = analyzeCandidate(Candidate);
+  if (!CandidateInfo)
+    RepeatedSequenceLocs.clear();
 
   // If the sequence doesn't have enough candidates left, then we're done.
   if (RepeatedSequenceLocs.size() < MinRepeats)
     return std::nullopt;
 
-  unsigned SequenceSize = 0;
-
-  for (auto &MI : RepeatedSequenceLocs[0])
-    SequenceSize += getInstSizeInBytes(MI);
+  unsigned InstrSizeCExt =
+      Candidate.getMF()->getSubtarget<RISCVSubtarget>().hasStdExtCOrZca() ? 2
+                                                                          : 4;
+  unsigned CallOverhead = 0, FrameOverhead = 0;
+
+  MachineOutlinerConstructionID MOCI = CandidateInfo.value();
+  switch (MOCI) {
+  case MachineOutlinerDefault:
+    // call t0, function = 8 bytes.
+    CallOverhead = 8;
+    // jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled.
+    FrameOverhead = InstrSizeCExt;
+    break;
+  case MachineOutlinerTailCall:
+    // tail call = auipc + jalr in the worst case without linker relaxation.
+    CallOverhead = 4 + InstrSizeCExt;
+    // Using tail call we move ret instruction from caller to callee.
+    FrameOverhead = 0;
+    break;
+  }
 
-  // call t0, function = 8 bytes.
-  unsigned CallOverhead = 8;
   for (auto &C : RepeatedSequenceLocs)
-    C.setCallInfo(MachineOutlinerDefault, CallOverhead);
+    C.setCallInfo(MOCI, CallOverhead);
 
-  // jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled.
-  unsigned FrameOverhead = 4;
-  if (RepeatedSequenceLocs[0]
-          .getMF()
-          ->getSubtarget<RISCVSubtarget>()
-          .hasStdExtCOrZca())
-    FrameOverhead = 2;
+  unsigned SequenceSize = 0;
+  for (auto &MI : Candidate)
+    SequenceSize += getInstSizeInBytes(MI);
 
   return std::make_unique<outliner::OutlinedFunction>(
-      RepeatedSequenceLocs, SequenceSize, FrameOverhead,
-      MachineOutlinerDefault);
+      RepeatedSequenceLocs, SequenceSize, FrameOverhead, MOCI);
 }
 
 outliner::InstrType
@@ -2995,15 +3069,8 @@ RISCVInstrInfo::getOutliningTypeImpl(const MachineModuleInfo &MMI,
     return F.needsUnwindTableEntry() ? outliner::InstrType::Illegal
                                      : outliner::InstrType::Invisible;
 
-  // We need support for tail calls to outlined functions before return
-  // statements can be allowed.
-  if (MI.isReturn())
-    return outliner::InstrType::Illegal;
-
-  // Don't allow modifying the X5 register which we use for return addresses for
-  // these outlined functions.
-  if (MI.modifiesRegister(RISCV::X5, TRI) ||
-      MI.getDesc().hasImplicitDefOfPhysReg(RISCV::X5))
+  if (cannotInsertTailCall(*MBB) &&
+      (MI.isReturn() || isMIModifiesReg(MI, TRI, RISCV::X5)))
     return outliner::InstrType::Illegal;
 
   // Make sure the operands don't reference something unsafe.
@@ -3039,6 +3106,9 @@ void RISCVInstrInfo::buildOutlinedFrame(
     }
   }
 
+  if (OF.FrameConstructionID == MachineOutlinerTailCall)
+    return;
+
   MBB.addLiveIn(RISCV::X5);
 
   // Add in a return instruction to the end of the outlined frame.
@@ -3052,6 +3122,13 @@ MachineBasicBlock::iterator RISCVInstrInfo::insertOutlinedCall(
     Module &M, MachineBasicBlock &MBB, MachineBasicBlock::iterator &It,
     MachineFunction &MF, outliner::Candidate &C) const {
 
+  if (C.CallConstructionID == MachineOutlinerTailCall) {
+    It = MBB.insert(It, BuildMI(MF, DebugLoc(), get(RISCV::PseudoTAIL))
+                            .addGlobalAddress(M.getNamedValue(MF.getName()),
+                                              /*Offset=*/0, RISCVII::MO_CALL));
+    return It;
+  }
+
   // Add in a call instruction to the outlined function at the given location.
   It = MBB.insert(It,
                   BuildMI(MF, DebugLoc(), get(RISCV::PseudoCALLReg), RISCV::X5)
diff --git a/llvm/test/CodeGen/RISCV/compress-opt-select.ll b/llvm/test/CodeGen/RISCV/compress-opt-select.ll
index f9333a45016a06..733c84ac236133 100644
--- a/llvm/test/CodeGen/RISCV/compress-opt-select.ll
+++ b/llvm/test/CodeGen/RISCV/compress-opt-select.ll
@@ -13,11 +13,9 @@ define i32 @ne_small_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    c.li a1, 20
 ; RV32IFDC-NEXT:    bne a0, a1, .LBB0_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB0_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_small_pos:
 ; RV32IFD:       # %bb.0:
@@ -41,11 +39,9 @@ define i32 @ne_small_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    c.li a1, -20
 ; RV32IFDC-NEXT:    bne a0, a1, .LBB1_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB1_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_small_neg:
 ; RV32IFD:       # %bb.0:
@@ -69,11 +65,9 @@ define i32 @ne_small_edge_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    c.li a1, 31
 ; RV32IFDC-NEXT:    bne a0, a1, .LBB2_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB2_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_small_edge_pos:
 ; RV32IFD:       # %bb.0:
@@ -97,11 +91,9 @@ define i32 @ne_small_edge_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    c.li a1, -32
 ; RV32IFDC-NEXT:    bne a0, a1, .LBB3_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB3_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_small_edge_neg:
 ; RV32IFD:       # %bb.0:
@@ -126,11 +118,9 @@ define i32 @ne_medium_ledge_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, -33
 ; RV32IFDC-NEXT:    c.bnez a0, .LBB4_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB4_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_medium_ledge_pos:
 ; RV32IFD:       # %bb.0:
@@ -155,11 +145,9 @@ define i32 @ne_medium_ledge_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, 33
 ; RV32IFDC-NEXT:    c.bnez a0, .LBB5_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB5_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_medium_ledge_neg:
 ; RV32IFD:       # %bb.0:
@@ -184,11 +172,9 @@ define i32 @ne_medium_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, -63
 ; RV32IFDC-NEXT:    c.bnez a0, .LBB6_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB6_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_medium_pos:
 ; RV32IFD:       # %bb.0:
@@ -213,11 +199,9 @@ define i32 @ne_medium_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, 63
 ; RV32IFDC-NEXT:    c.bnez a0, .LBB7_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB7_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_medium_neg:
 ; RV32IFD:       # %bb.0:
@@ -242,11 +226,9 @@ define i32 @ne_medium_bedge_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, -2047
 ; RV32IFDC-NEXT:    c.bnez a0, .LBB8_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB8_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_medium_bedge_pos:
 ; RV32IFD:       # %bb.0:
@@ -271,11 +253,9 @@ define i32 @ne_medium_bedge_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, 2047
 ; RV32IFDC-NEXT:    c.bnez a0, .LBB9_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB9_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_medium_bedge_neg:
 ; RV32IFD:       # %bb.0:
@@ -300,11 +280,9 @@ define i32 @ne_big_ledge_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    c.slli a1, 11
 ; RV32IFDC-NEXT:    bne a0, a1, .LBB10_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB10_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_big_ledge_pos:
 ; RV32IFD:       # %bb.0:
@@ -329,11 +307,9 @@ define i32 @ne_big_ledge_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a1, zero, -2048
 ; RV32IFDC-NEXT:    bne a0, a1, .LBB11_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB11_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: ne_big_ledge_neg:
 ; RV32IFD:       # %bb.0:
@@ -360,11 +336,9 @@ define i32 @eq_small_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    c.li a1, 20
 ; RV32IFDC-NEXT:    beq a0, a1, .LBB12_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB12_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: eq_small_pos:
 ; RV32IFD:       # %bb.0:
@@ -388,11 +362,9 @@ define i32 @eq_small_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    c.li a1, -20
 ; RV32IFDC-NEXT:    beq a0, a1, .LBB13_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB13_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: eq_small_neg:
 ; RV32IFD:       # %bb.0:
@@ -416,11 +388,9 @@ define i32 @eq_small_edge_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    c.li a1, 31
 ; RV32IFDC-NEXT:    beq a0, a1, .LBB14_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB14_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: eq_small_edge_pos:
 ; RV32IFD:       # %bb.0:
@@ -444,11 +414,9 @@ define i32 @eq_small_edge_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    c.li a1, -32
 ; RV32IFDC-NEXT:    beq a0, a1, .LBB15_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB15_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: eq_small_edge_neg:
 ; RV32IFD:       # %bb.0:
@@ -473,11 +441,9 @@ define i32 @eq_medium_ledge_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, -33
 ; RV32IFDC-NEXT:    c.beqz a0, .LBB16_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB16_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: eq_medium_ledge_pos:
 ; RV32IFD:       # %bb.0:
@@ -502,11 +468,9 @@ define i32 @eq_medium_ledge_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, 33
 ; RV32IFDC-NEXT:    c.beqz a0, .LBB17_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB17_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: eq_medium_ledge_neg:
 ; RV32IFD:       # %bb.0:
@@ -531,11 +495,9 @@ define i32 @eq_medium_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, -63
 ; RV32IFDC-NEXT:    c.beqz a0, .LBB18_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB18_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: eq_medium_pos:
 ; RV32IFD:       # %bb.0:
@@ -560,11 +522,9 @@ define i32 @eq_medium_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, 63
 ; RV32IFDC-NEXT:    c.beqz a0, .LBB19_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB19_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: eq_medium_neg:
 ; RV32IFD:       # %bb.0:
@@ -589,11 +549,9 @@ define i32 @eq_medium_bedge_pos(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, -2047
 ; RV32IFDC-NEXT:    c.beqz a0, .LBB20_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a0, zero, 42
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_1
 ; RV32IFDC-NEXT:  .LBB20_2:
-; RV32IFDC-NEXT:    addi a0, zero, -99
-; RV32IFDC-NEXT:    c.jr ra
+; RV32IFDC-NEXT:    tail OUTLINED_FUNCTION_0
 ;
 ; RV32IFD-LABEL: eq_medium_bedge_pos:
 ; RV32IFD:       # %bb.0:
@@ -618,11 +576,9 @@ define i32 @eq_medium_bedge_neg(i32 %in0) minsize {
 ; RV32IFDC-NEXT:    addi a0, a0, 2047
 ; RV32IFDC-NEXT:    c.beqz a0, .LBB21_2
 ; RV32IFDC-NEXT:  # %bb.1:
-; RV32IFDC-NEXT:    addi a...
[truncated]

@mga-sc mga-sc changed the title Fix compress-opt-select test after merge #115297 Reapply "[RISCV] Implement tail call optimization in machine outliner" Nov 26, 2024
@mga-sc
Copy link
Contributor Author

mga-sc commented Nov 26, 2024

@wangpc-pp , done, CI has been passed.

Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@mga-sc
Copy link
Contributor Author

mga-sc commented Nov 26, 2024

@wangpc-pp , could you please merge in this case

@wangpc-pp wangpc-pp merged commit 80df56e into llvm:main Nov 26, 2024
8 checks passed
svs-quic added a commit that referenced this pull request Feb 21, 2025
#117700 made a change from analyzing all the candidates to analyzing
just the first candidate before deciding to either delete or keep all of
them.

Even though the candidates all have the same instructions, the basic
blocks in which they are present are different and we will need to check
each of them before deciding whether to keep or erase them.
Particularly, `isAvailableAcrossAndOutOfSeq` checks to see if the
register (x5 in this case) is available from the end of the MBB to the
beginning of the candidate and not checking this for each candidate led
to incorrect candidates being outlined resulting in correctness issues
in a few downstream benchmarks.

Similarly, deleting all the candidates if the first one is not viable
will result in missed outlining opportunities.
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Feb 21, 2025
llvm#117700 made a change from analyzing all the candidates to analyzing
just the first candidate before deciding to either delete or keep all of
them.

Even though the candidates all have the same instructions, the basic
blocks in which they are present are different and we will need to check
each of them before deciding whether to keep or erase them.
Particularly, `isAvailableAcrossAndOutOfSeq` checks to see if the
register (x5 in this case) is available from the end of the MBB to the
beginning of the candidate and not checking this for each candidate led
to incorrect candidates being outlined resulting in correctness issues
in a few downstream benchmarks.

Similarly, deleting all the candidates if the first one is not viable
will result in missed outlining opportunities.

(cherry picked from commit 6757cf4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants