
[RISCV] Move performCombineVMergeAndVOps to RISCVVectorPeephole #144076

Open
lukel97 wants to merge 4 commits into main

Conversation

@lukel97 (Contributor) commented Jun 13, 2025

This moves the peephole that folds vmerges into their operands (performCombineVMergeAndVOps) into RISCVVectorPeephole. This will also eventually allow us to commute instructions so that more vmerges can be folded; see #141885 and #70042.

Most of the test diffs are due to the slight change in instruction ordering.

For now doPeepholeMaskedRVV is kept, even though it's a duplicate of RISCVVectorPeephole::convertToUnmasked, to minimize the diff. I plan on removing it in a separate patch, since removing it shuffles some instructions around.

Similarly, this runs foldVMergeToMask before the other peepholes to minimize the diff for now.

rvv-peephole-vmerge-vops-mir.ll was replaced with a dedicated vmerge-peephole.mir test.
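
For reference, a minimal before/after sketch of the fold this peephole performs (it mirrors the doc comment on foldVMergeToMask; operand lists are elided rather than exact):

    %true = PseudoVADD_VV ...
    %x = PseudoVMERGE_VVM_M1 %false, %false, %true, %mask
    ->
    %x = PseudoVADD_VV_M1_MASK %false, ..., %mask

The fold requires that vmerge's passthru, its false operand, and %true's passthru (if present) all match, and the resulting VL is the minimum of the two VLs.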

@llvmbot (Member) commented Jun 13, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)


Patch is 103.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144076.diff

15 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp (-214)
  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h (-1)
  • (modified) llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp (+148-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll (+34-33)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+198-198)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-mask-buildvec.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+133-133)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave2.ll (+96-96)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int-interleave.ll (+34-21)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll (+11-12)
  • (removed) llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops-mir.ll (-45)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll (+10-12)
  • (added) llvm/test/CodeGen/RISCV/rvv/vmerge-peephole.mir (+57)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwadd-sdnode.ll (+6-8)
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 4539efd591c8b..604b924e94b66 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -163,8 +163,6 @@ void RISCVDAGToDAGISel::PostprocessISelDAG() {
 
   CurDAG->setRoot(Dummy.getValue());
 
-  MadeChange |= doPeepholeMergeVVMFold();
-
   // After we're done with everything else, convert IMPLICIT_DEF
   // passthru operands to NoRegister.  This is required to workaround
   // an optimization deficiency in MachineCSE.  This really should
@@ -4069,218 +4067,6 @@ bool RISCVDAGToDAGISel::doPeepholeMaskedRVV(MachineSDNode *N) {
   return true;
 }
 
-static bool IsVMerge(SDNode *N) {
-  return RISCV::getRVVMCOpcode(N->getMachineOpcode()) == RISCV::VMERGE_VVM;
-}
-
-// Try to fold away VMERGE_VVM instructions into their true operands:
-//
-// %true = PseudoVADD_VV ...
-// %x = PseudoVMERGE_VVM %false, %false, %true, %mask
-// ->
-// %x = PseudoVADD_VV_MASK %false, ..., %mask
-//
-// We can only fold if vmerge's passthru operand, vmerge's false operand and
-// %true's passthru operand (if it has one) are the same. This is because we
-// have to consolidate them into one passthru operand in the result.
-//
-// If %true is masked, then we can use its mask instead of vmerge's if vmerge's
-// mask is all ones.
-//
-// The resulting VL is the minimum of the two VLs.
-//
-// The resulting policy is the effective policy the vmerge would have had,
-// i.e. whether or not it's passthru operand was implicit-def.
-bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
-  SDValue Passthru, False, True, VL, Mask;
-  assert(IsVMerge(N));
-  Passthru = N->getOperand(0);
-  False = N->getOperand(1);
-  True = N->getOperand(2);
-  Mask = N->getOperand(3);
-  VL = N->getOperand(4);
-
-  // If the EEW of True is different from vmerge's SEW, then we can't fold.
-  if (True.getSimpleValueType() != N->getSimpleValueType(0))
-    return false;
-
-  // We require that either passthru and false are the same, or that passthru
-  // is undefined.
-  if (Passthru != False && !isImplicitDef(Passthru))
-    return false;
-
-  assert(True.getResNo() == 0 &&
-         "Expect True is the first output of an instruction.");
-
-  // Need N is the exactly one using True.
-  if (!True.hasOneUse())
-    return false;
-
-  if (!True.isMachineOpcode())
-    return false;
-
-  unsigned TrueOpc = True.getMachineOpcode();
-  const MCInstrDesc &TrueMCID = TII->get(TrueOpc);
-  uint64_t TrueTSFlags = TrueMCID.TSFlags;
-  bool HasTiedDest = RISCVII::isFirstDefTiedToFirstUse(TrueMCID);
-
-  const RISCV::RISCVMaskedPseudoInfo *Info =
-      RISCV::lookupMaskedIntrinsicByUnmasked(TrueOpc);
-  if (!Info)
-    return false;
-
-  // If True has a passthru operand then it needs to be the same as vmerge's
-  // False, since False will be used for the result's passthru operand.
-  if (HasTiedDest && !isImplicitDef(True->getOperand(0))) {
-    SDValue PassthruOpTrue = True->getOperand(0);
-    if (False != PassthruOpTrue)
-      return false;
-  }
-
-  // Skip if True has side effect.
-  if (TII->get(TrueOpc).hasUnmodeledSideEffects())
-    return false;
-
-  unsigned TrueChainOpIdx = True.getNumOperands() - 1;
-  bool HasChainOp =
-      True.getOperand(TrueChainOpIdx).getValueType() == MVT::Other;
-
-  if (HasChainOp) {
-    // Avoid creating cycles in the DAG. We must ensure that none of the other
-    // operands depend on True through it's Chain.
-    SmallVector<const SDNode *, 4> LoopWorklist;
-    SmallPtrSet<const SDNode *, 16> Visited;
-    LoopWorklist.push_back(False.getNode());
-    LoopWorklist.push_back(Mask.getNode());
-    LoopWorklist.push_back(VL.getNode());
-    if (SDNode::hasPredecessorHelper(True.getNode(), Visited, LoopWorklist))
-      return false;
-  }
-
-  // The vector policy operand may be present for masked intrinsics
-  bool HasVecPolicyOp = RISCVII::hasVecPolicyOp(TrueTSFlags);
-  unsigned TrueVLIndex =
-      True.getNumOperands() - HasVecPolicyOp - HasChainOp - 2;
-  SDValue TrueVL = True.getOperand(TrueVLIndex);
-  SDValue SEW = True.getOperand(TrueVLIndex + 1);
-
-  auto GetMinVL = [](SDValue LHS, SDValue RHS) {
-    if (LHS == RHS)
-      return LHS;
-    if (isAllOnesConstant(LHS))
-      return RHS;
-    if (isAllOnesConstant(RHS))
-      return LHS;
-    auto *CLHS = dyn_cast<ConstantSDNode>(LHS);
-    auto *CRHS = dyn_cast<ConstantSDNode>(RHS);
-    if (!CLHS || !CRHS)
-      return SDValue();
-    return CLHS->getZExtValue() <= CRHS->getZExtValue() ? LHS : RHS;
-  };
-
-  // Because N and True must have the same passthru operand (or True's operand
-  // is implicit_def), the "effective" body is the minimum of their VLs.
-  SDValue OrigVL = VL;
-  VL = GetMinVL(TrueVL, VL);
-  if (!VL)
-    return false;
-
-  // Some operations produce different elementwise results depending on the
-  // active elements, like viota.m or vredsum. This transformation is illegal
-  // for these if we change the active elements (i.e. mask or VL).
-  const MCInstrDesc &TrueBaseMCID = TII->get(RISCV::getRVVMCOpcode(TrueOpc));
-  if (RISCVII::elementsDependOnVL(TrueBaseMCID.TSFlags) && (TrueVL != VL))
-    return false;
-  if (RISCVII::elementsDependOnMask(TrueBaseMCID.TSFlags) &&
-      (Mask && !usesAllOnesMask(Mask)))
-    return false;
-
-  // Make sure it doesn't raise any observable fp exceptions, since changing the
-  // active elements will affect how fflags is set.
-  if (mayRaiseFPException(True.getNode()) && !True->getFlags().hasNoFPExcept())
-    return false;
-
-  SDLoc DL(N);
-
-  unsigned MaskedOpc = Info->MaskedPseudo;
-#ifndef NDEBUG
-  const MCInstrDesc &MaskedMCID = TII->get(MaskedOpc);
-  assert(RISCVII::hasVecPolicyOp(MaskedMCID.TSFlags) &&
-         "Expected instructions with mask have policy operand.");
-  assert(MaskedMCID.getOperandConstraint(MaskedMCID.getNumDefs(),
-                                         MCOI::TIED_TO) == 0 &&
-         "Expected instructions with mask have a tied dest.");
-#endif
-
-  // Use a tumu policy, relaxing it to tail agnostic provided that the passthru
-  // operand is undefined.
-  //
-  // However, if the VL became smaller than what the vmerge had originally, then
-  // elements past VL that were previously in the vmerge's body will have moved
-  // to the tail. In that case we always need to use tail undisturbed to
-  // preserve them.
-  bool MergeVLShrunk = VL != OrigVL;
-  uint64_t Policy = (isImplicitDef(Passthru) && !MergeVLShrunk)
-                        ? RISCVVType::TAIL_AGNOSTIC
-                        : /*TUMU*/ 0;
-  SDValue PolicyOp =
-    CurDAG->getTargetConstant(Policy, DL, Subtarget->getXLenVT());
-
-
-  SmallVector<SDValue, 8> Ops;
-  Ops.push_back(False);
-
-  const bool HasRoundingMode = RISCVII::hasRoundModeOp(TrueTSFlags);
-  const unsigned NormalOpsEnd = TrueVLIndex - HasRoundingMode;
-  Ops.append(True->op_begin() + HasTiedDest, True->op_begin() + NormalOpsEnd);
-
-  Ops.push_back(Mask);
-
-  // For unmasked "VOp" with rounding mode operand, that is interfaces like
-  // (..., rm, vl) or (..., rm, vl, policy).
-  // Its masked version is (..., vm, rm, vl, policy).
-  // Check the rounding mode pseudo nodes under RISCVInstrInfoVPseudos.td
-  if (HasRoundingMode)
-    Ops.push_back(True->getOperand(TrueVLIndex - 1));
-
-  Ops.append({VL, SEW, PolicyOp});
-
-  // Result node should have chain operand of True.
-  if (HasChainOp)
-    Ops.push_back(True.getOperand(TrueChainOpIdx));
-
-  MachineSDNode *Result =
-      CurDAG->getMachineNode(MaskedOpc, DL, True->getVTList(), Ops);
-  Result->setFlags(True->getFlags());
-
-  if (!cast<MachineSDNode>(True)->memoperands_empty())
-    CurDAG->setNodeMemRefs(Result, cast<MachineSDNode>(True)->memoperands());
-
-  // Replace vmerge.vvm node by Result.
-  ReplaceUses(SDValue(N, 0), SDValue(Result, 0));
-
-  // Replace another value of True. E.g. chain and VL.
-  for (unsigned Idx = 1; Idx < True->getNumValues(); ++Idx)
-    ReplaceUses(True.getValue(Idx), SDValue(Result, Idx));
-
-  return true;
-}
-
-bool RISCVDAGToDAGISel::doPeepholeMergeVVMFold() {
-  bool MadeChange = false;
-  SelectionDAG::allnodes_iterator Position = CurDAG->allnodes_end();
-
-  while (Position != CurDAG->allnodes_begin()) {
-    SDNode *N = &*--Position;
-    if (N->use_empty() || !N->isMachineOpcode())
-      continue;
-
-    if (IsVMerge(N))
-      MadeChange |= performCombineVMergeAndVOps(N);
-  }
-  return MadeChange;
-}
-
 /// If our passthru is an implicit_def, use noreg instead.  This side
 /// steps issues with MachineCSE not being able to CSE expressions with
 /// IMPLICIT_DEF operands while preserving the semantic intent. See
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
index cb63c21fd8fc9..59c1b48282f5b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
@@ -199,7 +199,6 @@ class RISCVDAGToDAGISel : public SelectionDAGISel {
 private:
   bool doPeepholeSExtW(SDNode *Node);
   bool doPeepholeMaskedRVV(MachineSDNode *Node);
-  bool doPeepholeMergeVVMFold();
   bool doPeepholeNoRegPassThru();
   bool performCombineVMergeAndVOps(SDNode *N);
   bool selectImm64IfCheaper(int64_t Imm, int64_t OrigImm, SDValue N,
diff --git a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
index c9c2413d009b7..69c288c7a2e7a 100644
--- a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
@@ -67,12 +67,13 @@ class RISCVVectorPeephole : public MachineFunctionPass {
   bool convertSameMaskVMergeToVMv(MachineInstr &MI);
   bool foldUndefPassthruVMV_V_V(MachineInstr &MI);
   bool foldVMV_V_V(MachineInstr &MI);
+  bool foldVMergeToMask(MachineInstr &MI) const;
 
   bool hasSameEEW(const MachineInstr &User, const MachineInstr &Src) const;
   bool isAllOnesMask(const MachineInstr *MaskDef) const;
   std::optional<unsigned> getConstant(const MachineOperand &VL) const;
   bool ensureDominates(const MachineOperand &Use, MachineInstr &Src) const;
-  bool isKnownSameDefs(const MachineOperand &A, const MachineOperand &B) const;
+  bool isKnownSameDefs(Register A, Register B) const;
 };
 
 } // namespace
@@ -380,13 +381,23 @@ bool RISCVVectorPeephole::convertAllOnesVMergeToVMv(MachineInstr &MI) const {
   return true;
 }
 
-bool RISCVVectorPeephole::isKnownSameDefs(const MachineOperand &A,
-                                          const MachineOperand &B) const {
-  if (A.getReg().isPhysical() || B.getReg().isPhysical())
+bool RISCVVectorPeephole::isKnownSameDefs(Register A, Register B) const {
+  if (A.isPhysical() || B.isPhysical())
     return false;
 
-  return TRI->lookThruCopyLike(A.getReg(), MRI) ==
-         TRI->lookThruCopyLike(B.getReg(), MRI);
+  auto LookThruVirtRegCopies = [this](Register Reg) {
+    while (MachineInstr *Def = MRI->getUniqueVRegDef(Reg)) {
+      if (!Def->isFullCopy())
+        break;
+      Register Src = Def->getOperand(1).getReg();
+      if (!Src.isVirtual())
+        break;
+      Reg = Src;
+    }
+    return Reg;
+  };
+
+  return LookThruVirtRegCopies(A) == LookThruVirtRegCopies(B);
 }
 
 /// If a PseudoVMERGE_VVM's true operand is a masked pseudo and both have the
@@ -414,7 +425,7 @@ bool RISCVVectorPeephole::convertSameMaskVMergeToVMv(MachineInstr &MI) {
   const MachineOperand &TrueMask =
       True->getOperand(TrueMaskedInfo->MaskOpIdx + True->getNumExplicitDefs());
   const MachineOperand &MIMask = MI.getOperand(4);
-  if (!isKnownSameDefs(TrueMask, MIMask))
+  if (!isKnownSameDefs(TrueMask.getReg(), MIMask.getReg()))
     return false;
 
   // True's passthru needs to be equivalent to False
@@ -663,6 +674,133 @@ bool RISCVVectorPeephole::foldVMV_V_V(MachineInstr &MI) {
   return true;
 }
 
+/// Try to fold away VMERGE_VVM instructions into their operands:
+///
+/// %true = PseudoVADD_VV ...
+/// %x = PseudoVMERGE_VVM_M1 %false, %false, %true, %mask
+/// ->
+/// %x = PseudoVADD_VV_M1_MASK %false, ..., %mask
+///
+/// We can only fold if vmerge's passthru operand, vmerge's false operand and
+/// %true's passthru operand (if it has one) are the same. This is because we
+/// have to consolidate them into one passthru operand in the result.
+///
+/// If %true is masked, then we can use its mask instead of vmerge's if vmerge's
+/// mask is all ones.
+///
+/// The resulting VL is the minimum of the two VLs.
+///
+/// The resulting policy is the effective policy the vmerge would have had,
+/// i.e. whether or not it's passthru operand was implicit-def.
+bool RISCVVectorPeephole::foldVMergeToMask(MachineInstr &MI) const {
+  if (RISCV::getRVVMCOpcode(MI.getOpcode()) != RISCV::VMERGE_VVM)
+    return false;
+
+  Register PassthruReg = MI.getOperand(1).getReg();
+  Register FalseReg = MI.getOperand(2).getReg();
+  Register TrueReg = MI.getOperand(3).getReg();
+  if (!TrueReg.isVirtual() || !MRI->hasOneUse(TrueReg))
+    return false;
+  MachineInstr &True = *MRI->getUniqueVRegDef(TrueReg);
+  if (True.getParent() != MI.getParent())
+    return false;
+  const MachineOperand &MaskOp = MI.getOperand(4);
+  MachineInstr *Mask = MRI->getUniqueVRegDef(MaskOp.getReg());
+  assert(Mask);
+
+  const RISCV::RISCVMaskedPseudoInfo *Info =
+      RISCV::lookupMaskedIntrinsicByUnmasked(True.getOpcode());
+  if (!Info)
+    return false;
+
+  // If the EEW of True is different from vmerge's SEW, then we can't fold.
+  if (!hasSameEEW(MI, True))
+    return false;
+
+  // We require that either passthru and false are the same, or that passthru
+  // is undefined.
+  if (PassthruReg != RISCV::NoRegister &&
+      !isKnownSameDefs(PassthruReg, FalseReg))
+    return false;
+
+  // If True has a passthru operand then it needs to be the same as vmerge's
+  // False, since False will be used for the result's passthru operand.
+  Register TruePassthru = True.getOperand(True.getNumExplicitDefs()).getReg();
+  if (RISCVII::isFirstDefTiedToFirstUse(True.getDesc()) &&
+      TruePassthru != RISCV::NoRegister &&
+      !isKnownSameDefs(TruePassthru, FalseReg))
+    return false;
+
+  // Make sure it doesn't raise any observable fp exceptions, since changing the
+  // active elements will affect how fflags is set.
+  if (True.hasUnmodeledSideEffects() || True.mayRaiseFPException())
+    return false;
+
+  const MachineOperand &VMergeVL =
+      MI.getOperand(RISCVII::getVLOpNum(MI.getDesc()));
+  const MachineOperand &TrueVL =
+      True.getOperand(RISCVII::getVLOpNum(True.getDesc()));
+
+  MachineOperand MinVL = MachineOperand::CreateImm(0);
+  if (RISCV::isVLKnownLE(TrueVL, VMergeVL))
+    MinVL = TrueVL;
+  else if (RISCV::isVLKnownLE(VMergeVL, TrueVL))
+    MinVL = VMergeVL;
+  else
+    return false;
+
+  unsigned RVVTSFlags =
+      TII->get(RISCV::getRVVMCOpcode(True.getOpcode())).TSFlags;
+  if (RISCVII::elementsDependOnVL(RVVTSFlags) && !TrueVL.isIdenticalTo(MinVL))
+    return false;
+  if (RISCVII::elementsDependOnMask(RVVTSFlags) && !isAllOnesMask(Mask))
+    return false;
+
+  // Use a tumu policy, relaxing it to tail agnostic provided that the passthru
+  // operand is undefined.
+  //
+  // However, if the VL became smaller than what the vmerge had originally, then
+  // elements past VL that were previously in the vmerge's body will have moved
+  // to the tail. In that case we always need to use tail undisturbed to
+  // preserve them.
+  uint64_t Policy = RISCVVType::TAIL_UNDISTURBED_MASK_UNDISTURBED;
+  if (PassthruReg == RISCV::NoRegister && RISCV::isVLKnownLE(VMergeVL, MinVL))
+    Policy |= RISCVVType::TAIL_AGNOSTIC;
+
+  assert(RISCVII::hasVecPolicyOp(True.getDesc().TSFlags) &&
+         "Foldable unmasked pseudo should have a policy op already");
+
+  // Make sure the mask dominates True, otherwise move down True so it does.
+  // VL will always dominate because if its a register they need to be the same.
+  if (!ensureDominates(MaskOp, True))
+    return false;
+
+  True.setDesc(TII->get(Info->MaskedPseudo));
+
+  // Insert the mask operand.
+  // TODO: Increment MaskOpIdx by number of explicit defs?
+  True.insert(&True.getOperand(Info->MaskOpIdx + True.getNumExplicitDefs()),
+              MachineOperand::CreateReg(MaskOp.getReg(), false));
+
+  // Update the passthru, AVL and policy.
+  True.getOperand(True.getNumExplicitDefs()).setReg(FalseReg);
+  True.removeOperand(RISCVII::getVLOpNum(True.getDesc()));
+  True.insert(&True.getOperand(RISCVII::getVLOpNum(True.getDesc())), MinVL);
+  True.getOperand(RISCVII::getVecPolicyOpNum(True.getDesc())).setImm(Policy);
+
+  MRI->replaceRegWith(True.getOperand(0).getReg(), MI.getOperand(0).getReg());
+  // Now that True is masked, constrain its operands from vr -> vrnov0.
+  for (MachineOperand &MO : True.explicit_operands()) {
+    if (!MO.isReg() || !MO.getReg().isVirtual())
+      continue;
+    MRI->constrainRegClass(
+        MO.getReg(), True.getRegClassConstraint(MO.getOperandNo(), TII, TRI));
+  }
+  MI.eraseFromParent();
+
+  return true;
+}
+
 bool RISCVVectorPeephole::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
@@ -679,6 +817,9 @@ bool RISCVVectorPeephole::runOnMachineFunction(MachineFunction &MF) {
   bool Changed = false;
 
   for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : make_early_inc_range(MBB))
+      Changed |= foldVMergeToMask(MI);
+
     for (MachineInstr &MI : make_early_inc_range(MBB)) {
       Changed |= convertToVLMAX(MI);
       Changed |= tryToReduceVL(MI);
diff --git a/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll b/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll
index 5dc532273b770..0d8aff306252e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll
@@ -313,12 +313,12 @@ define i32 @test_nxv128i1(<vscale x 128 x i1> %x) {
 ; CHECK-NEXT:    vslidedown.vx v0, v6, a0
 ; CHECK-NEXT:    vsetvli a2, zero, e8, m1, ta, ma
 ; CHECK-NEXT:    vslidedown.vx v6, v7, a1
-; CHECK-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
-; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
 ; CHECK-NEXT:    vsetvli a1, zero, e8, mf2, ta, ma
-; CHECK-NEXT:    vslidedown.vx v0, v7, a0
 ; CHECK-NEXT:    vslidedown.vx v5, v6, a0
+; CHECK-NEXT:    vslidedown.vx v4, v7, a0
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, mu
+; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-NEXT:    vmv1r.v v0, v4
 ; CHECK-NEXT:    vadd.vi v8, v8, 1, v0.t
 ; CHECK-NEXT:    vmv1r.v v0, v5
 ; CHECK-NEXT:    vadd.vi v16, v16, 1, v0.t
@@ -364,9 +364,9 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    vmv1r.v v7, v9
 ; CHECK-NEXT:    vmv1r.v v5, v8
 ; CHECK-NEXT:    vmv1r.v v4, v0
-; CHECK-NEXT:    vmv.v.i v24, 0
+; CHECK-NEXT:    vmv.v.i v16, 0
 ; CHECK-NEXT:    csrr a1, vlenb
-; CHECK-NEXT:    vmerge.vim v8, v24, 1, v0
+; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    mv a2, a0
@@ -376,7 +376,7 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    addi a0, a0, 16
 ; CHECK-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
 ; CHECK-NEXT:    vmv1r.v v0, v5
-; CHECK-NEXT:    vmerge.vim v8, v24, 1, v0
+; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 5
 ; CHECK-NEXT:    add a0, sp, a0
@@ -389,7 +389,7 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    vslidedown.vx v2, v5, a0
 ; CHECK-NEXT:    vmv.v.v v0, v3
 ; CHECK-NEXT:    vsetvli a2, zero, e32, m8, ta, ma
-; CHECK-NEXT:    vmerge.vim v8, v24, 1, v0
+; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
 ; CHECK-NEXT:    csrr a2, vlenb
 ; CHECK-NEXT:    slli a2, a2, 3
 ; CHECK-NEXT:    mv a3, a2
@@ -399,41 +399,41 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    addi a2, a2, 16
 ; CHECK-NEXT:    vs8r.v v8, (a2) # vscale x 64-byte Folded Spill
 ; CHECK-NEXT:    vmv1r.v v0, v2
-; CHECK-NEXT:    vmerge.vim v8, v24, 1, v0
+; CHECK-NEXT:    vmv8r.v v8, v16
+; CHECK-NEXT:    vmerge.vim v16, v16, 1, v0
 ; CHECK-NEXT:    csrr a2, vlenb
 ; CHECK-NEXT:    slli a2, a2, 4
 ; CHECK-NEXT:    add a2, sp, a2
 ; CHECK-NEXT:    addi a2, a2, 16
-; CHECK-NEXT:    vs8r.v v8, (a2) # vscale x 64-byte Folded Spill
+; CHECK-NEXT:    vs8r.v v16, (a2) # vscale x 64-byte Folded Spill
 ; CHECK-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
 ; CHECK...
[truncated]

// to the tail. In that case we always need to use tail undisturbed to
// preserve them.
uint64_t Policy = RISCVVType::TAIL_UNDISTURBED_MASK_UNDISTURBED;
if (PassthruReg == RISCV::NoRegister && RISCV::isVLKnownLE(VMergeVL, MinVL))
Collaborator

!PassthruReg.isValid() or !PassthruReg
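
A sketch of the suggested form, keeping the surrounding logic unchanged (Register::isValid() tests against the no-register sentinel, so this is equivalent to the RISCV::NoRegister compare; the !PassthruReg spelling would behave the same):

    // Suggested: use Register's validity check instead of comparing against the sentinel.
    if (!PassthruReg.isValid() && RISCV::isVLKnownLE(VMergeVL, MinVL))
      Policy |= RISCVVType::TAIL_AGNOSTIC;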

Comment on lines +388 to +400
auto LookThruVirtRegCopies = [this](Register Reg) {
while (MachineInstr *Def = MRI->getUniqueVRegDef(Reg)) {
if (!Def->isFullCopy())
break;
Register Src = Def->getOperand(1).getReg();
if (!Src.isVirtual())
break;
Reg = Src;
}
return Reg;
};

return LookThruVirtRegCopies(A) == LookThruVirtRegCopies(B);
Contributor Author

I forgot to mention, when plugging in isKnownSameDefs I noticed that lookThruCopyLike can actually return a physical register if it's the source of a copy chain, and we can't compare physical registers. So I've reworked it here to only look through virtual register COPYs.

Happy to split this off into a separate PR if preferred

@wangpc-pp (Contributor) left a comment

LGTM in general.

I see this adds some extra vmvs in some cases and also removes some vtype toggles.

; CHECK-NEXT: vsetvli a0, zero, e32, m8, ta, mu
; CHECK-NEXT: vmerge.vim v8, v8, 1, v0
; CHECK-NEXT: vmv1r.v v0, v4
Contributor

Extra vmv.

@@ -399,41 +399,41 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
; CHECK-NEXT: addi a2, a2, 16
; CHECK-NEXT: vs8r.v v8, (a2) # vscale x 64-byte Folded Spill
; CHECK-NEXT: vmv1r.v v0, v2
; CHECK-NEXT: vmerge.vim v8, v24, 1, v0
; CHECK-NEXT: vmv8r.v v8, v16
Contributor

Extra vmv.
