
[RISCV] Move performCombineVMergeAndVOps to RISCVVectorPeephole #144076

Open
lukel97 wants to merge 4 commits into main

Conversation

@lukel97 (Contributor) commented Jun 13, 2025

This moves the peephole that folds vmerges into their operands (performCombineVMergeAndVOps) into RISCVVectorPeephole. This will also eventually allow us to commute instructions so that more vmerges can be folded; see #141885 and #70042.

Most of the test diffs are due to the slight change in instruction ordering.

For now doPeepholeMaskedRVV is kept, even though it's a duplicate of RISCVVectorPeephole::convertToUnmasked, to minimize the diff. I plan on removing it in a separate patch, since removing it shuffles some instructions around.

Similarly, this runs foldVMergeToMask before the other peepholes to minimize the diff for now.

rvv-peephole-vmerge-vops-mir.ll was replaced with a dedicated vmerge-peephole.mir test.
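
For reference, a minimal before/after sketch of the fold this peephole performs (it mirrors the doc comment on foldVMergeToMask; operand lists are elided rather than exact):

    %true = PseudoVADD_VV ...
    %x = PseudoVMERGE_VVM_M1 %false, %false, %true, %mask
    ->
    %x = PseudoVADD_VV_M1_MASK %false, ..., %mask

The fold requires that vmerge's passthru, its false operand, and %true's passthru (if present) all match, and the resulting VL is the minimum of the two VLs.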

@llvmbot (Member) commented Jun 13, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)


Patch is 103.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144076.diff

15 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp (-214)
  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h (-1)
  • (modified) llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp (+148-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll (+34-33)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+198-198)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-mask-buildvec.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+133-133)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave2.ll (+96-96)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int-interleave.ll (+34-21)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll (+11-12)
  • (removed) llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops-mir.ll (-45)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll (+10-12)
  • (added) llvm/test/CodeGen/RISCV/rvv/vmerge-peephole.mir (+57)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwadd-sdnode.ll (+6-8)
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 4539efd591c8b..604b924e94b66 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -163,8 +163,6 @@ void RISCVDAGToDAGISel::PostprocessISelDAG() {
 
   CurDAG->setRoot(Dummy.getValue());
 
-  MadeChange |= doPeepholeMergeVVMFold();
-
   // After we're done with everything else, convert IMPLICIT_DEF
   // passthru operands to NoRegister.  This is required to workaround
   // an optimization deficiency in MachineCSE.  This really should
@@ -4069,218 +4067,6 @@ bool RISCVDAGToDAGISel::doPeepholeMaskedRVV(MachineSDNode *N) {
   return true;
 }
 
-static bool IsVMerge(SDNode *N) {
-  return RISCV::getRVVMCOpcode(N->getMachineOpcode()) == RISCV::VMERGE_VVM;
-}
-
-// Try to fold away VMERGE_VVM instructions into their true operands:
-//
-// %true = PseudoVADD_VV ...
-// %x = PseudoVMERGE_VVM %false, %false, %true, %mask
-// ->
-// %x = PseudoVADD_VV_MASK %false, ..., %mask
-//
-// We can only fold if vmerge's passthru operand, vmerge's false operand and
-// %true's passthru operand (if it has one) are the same. This is because we
-// have to consolidate them into one passthru operand in the result.
-//
-// If %true is masked, then we can use its mask instead of vmerge's if vmerge's
-// mask is all ones.
-//
-// The resulting VL is the minimum of the two VLs.
-//
-// The resulting policy is the effective policy the vmerge would have had,
-// i.e. whether or not it's passthru operand was implicit-def.
-bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
-  SDValue Passthru, False, True, VL, Mask;
-  assert(IsVMerge(N));
-  Passthru = N->getOperand(0);
-  False = N->getOperand(1);
-  True = N->getOperand(2);
-  Mask = N->getOperand(3);
-  VL = N->getOperand(4);
-
-  // If the EEW of True is different from vmerge's SEW, then we can't fold.
-  if (True.getSimpleValueType() != N->getSimpleValueType(0))
-    return false;
-
-  // We require that either passthru and false are the same, or that passthru
-  // is undefined.
-  if (Passthru != False && !isImplicitDef(Passthru))
-    return false;
-
-  assert(True.getResNo() == 0 &&
-         "Expect True is the first output of an instruction.");
-
-  // Need N is the exactly one using True.
-  if (!True.hasOneUse())
-    return false;
-
-  if (!True.isMachineOpcode())
-    return false;
-
-  unsigned TrueOpc = True.getMachineOpcode();
-  const MCInstrDesc &TrueMCID = TII->get(TrueOpc);
-  uint64_t TrueTSFlags = TrueMCID.TSFlags;
-  bool HasTiedDest = RISCVII::isFirstDefTiedToFirstUse(TrueMCID);
-
-  const RISCV::RISCVMaskedPseudoInfo *Info =
-      RISCV::lookupMaskedIntrinsicByUnmasked(TrueOpc);
-  if (!Info)
-    return false;
-
-  // If True has a passthru operand then it needs to be the same as vmerge's
-  // False, since False will be used for the result's passthru operand.
-  if (HasTiedDest && !isImplicitDef(True->getOperand(0))) {
-    SDValue PassthruOpTrue = True->getOperand(0);
-    if (False != PassthruOpTrue)
-      return false;
-  }
-
-  // Skip if True has side effect.
-  if (TII->get(TrueOpc).hasUnmodeledSideEffects())
-    return false;
-
-  unsigned TrueChainOpIdx = True.getNumOperands() - 1;
-  bool HasChainOp =
-      True.getOperand(TrueChainOpIdx).getValueType() == MVT::Other;
-
-  if (HasChainOp) {
-    // Avoid creating cycles in the DAG. We must ensure that none of the other
-    // operands depend on True through it's Chain.
-    SmallVector<const SDNode *, 4> LoopWorklist;
-    SmallPtrSet<const SDNode *, 16> Visited;
-    LoopWorklist.push_back(False.getNode());
-    LoopWorklist.push_back(Mask.getNode());
-    LoopWorklist.push_back(VL.getNode());
-    if (SDNode::hasPredecessorHelper(True.getNode(), Visited, LoopWorklist))
-      return false;
-  }
-
-  // The vector policy operand may be present for masked intrinsics
-  bool HasVecPolicyOp = RISCVII::hasVecPolicyOp(TrueTSFlags);
-  unsigned TrueVLIndex =
-      True.getNumOperands() - HasVecPolicyOp - HasChainOp - 2;
-  SDValue TrueVL = True.getOperand(TrueVLIndex);
-  SDValue SEW = True.getOperand(TrueVLIndex + 1);
-
-  auto GetMinVL = [](SDValue LHS, SDValue RHS) {
-    if (LHS == RHS)
-      return LHS;
-    if (isAllOnesConstant(LHS))
-      return RHS;
-    if (isAllOnesConstant(RHS))
-      return LHS;
-    auto *CLHS = dyn_cast<ConstantSDNode>(LHS);
-    auto *CRHS = dyn_cast<ConstantSDNode>(RHS);
-    if (!CLHS || !CRHS)
-      return SDValue();
-    return CLHS->getZExtValue() <= CRHS->getZExtValue() ? LHS : RHS;
-  };
-
-  // Because N and True must have the same passthru operand (or True's operand
-  // is implicit_def), the "effective" body is the minimum of their VLs.
-  SDValue OrigVL = VL;
-  VL = GetMinVL(TrueVL, VL);
-  if (!VL)
-    return false;
-
-  // Some operations produce different elementwise results depending on the
-  // active elements, like viota.m or vredsum. This transformation is illegal
-  // for these if we change the active elements (i.e. mask or VL).
-  const MCInstrDesc &TrueBaseMCID = TII->get(RISCV::getRVVMCOpcode(TrueOpc));
-  if (RISCVII::elementsDependOnVL(TrueBaseMCID.TSFlags) && (TrueVL != VL))
-    return false;
-  if (RISCVII::elementsDependOnMask(TrueBaseMCID.TSFlags) &&
-      (Mask && !usesAllOnesMask(Mask)))
-    return false;
-
-  // Make sure it doesn't raise any observable fp exceptions, since changing the
-  // active elements will affect how fflags is set.
-  if (mayRaiseFPException(True.getNode()) && !True->getFlags().hasNoFPExcept())
-    return false;
-
-  SDLoc DL(N);
-
-  unsigned MaskedOpc = Info->MaskedPseudo;
-#ifndef NDEBUG
-  const MCInstrDesc &MaskedMCID = TII->get(MaskedOpc);
-  assert(RISCVII::hasVecPolicyOp(MaskedMCID.TSFlags) &&
-         "Expected instructions with mask have policy operand.");
-  assert(MaskedMCID.getOperandConstraint(MaskedMCID.getNumDefs(),
-                                         MCOI::TIED_TO) == 0 &&
-         "Expected instructions with mask have a tied dest.");
-#endif
-
-  // Use a tumu policy, relaxing it to tail agnostic provided that the passthru
-  // operand is undefined.
-  //
-  // However, if the VL became smaller than what the vmerge had originally, then
-  // elements past VL that were previously in the vmerge's body will have moved
-  // to the tail. In that case we always need to use tail undisturbed to
-  // preserve them.
-  bool MergeVLShrunk = VL != OrigVL;
-  uint64_t Policy = (isImplicitDef(Passthru) && !MergeVLShrunk)
-                        ? RISCVVType::TAIL_AGNOSTIC
-                        : /*TUMU*/ 0;
-  SDValue PolicyOp =
-    CurDAG->getTargetConstant(Policy, DL, Subtarget->getXLenVT());
-
-
-  SmallVector<SDValue, 8> Ops;
-  Ops.push_back(False);
-
-  const bool HasRoundingMode = RISCVII::hasRoundModeOp(TrueTSFlags);
-  const unsigned NormalOpsEnd = TrueVLIndex - HasRoundingMode;
-  Ops.append(True->op_begin() + HasTiedDest, True->op_begin() + NormalOpsEnd);
-
-  Ops.push_back(Mask);
-
-  // For unmasked "VOp" with rounding mode operand, that is interfaces like
-  // (..., rm, vl) or (..., rm, vl, policy).
-  // Its masked version is (..., vm, rm, vl, policy).
-  // Check the rounding mode pseudo nodes under RISCVInstrInfoVPseudos.td
-  if (HasRoundingMode)
-    Ops.push_back(True->getOperand(TrueVLIndex - 1));
-
-  Ops.append({VL, SEW, PolicyOp});
-
-  // Result node should have chain operand of True.
-  if (HasChainOp)
-    Ops.push_back(True.getOperand(TrueChainOpIdx));
-
-  MachineSDNode *Result =
-      CurDAG->getMachineNode(MaskedOpc, DL, True->getVTList(), Ops);
-  Result->setFlags(True->getFlags());
-
-  if (!cast<MachineSDNode>(True)->memoperands_empty())
-    CurDAG->setNodeMemRefs(Result, cast<MachineSDNode>(True)->memoperands());
-
-  // Replace vmerge.vvm node by Result.
-  ReplaceUses(SDValue(N, 0), SDValue(Result, 0));
-
-  // Replace another value of True. E.g. chain and VL.
-  for (unsigned Idx = 1; Idx < True->getNumValues(); ++Idx)
-    ReplaceUses(True.getValue(Idx), SDValue(Result, Idx));
-
-  return true;
-}
-
-bool RISCVDAGToDAGISel::doPeepholeMergeVVMFold() {
-  bool MadeChange = false;
-  SelectionDAG::allnodes_iterator Position = CurDAG->allnodes_end();
-
-  while (Position != CurDAG->allnodes_begin()) {
-    SDNode *N = &*--Position;
-    if (N->use_empty() || !N->isMachineOpcode())
-      continue;
-
-    if (IsVMerge(N))
-      MadeChange |= performCombineVMergeAndVOps(N);
-  }
-  return MadeChange;
-}
-
 /// If our passthru is an implicit_def, use noreg instead.  This side
 /// steps issues with MachineCSE not being able to CSE expressions with
 /// IMPLICIT_DEF operands while preserving the semantic intent. See
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
index cb63c21fd8fc9..59c1b48282f5b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
@@ -199,7 +199,6 @@ class RISCVDAGToDAGISel : public SelectionDAGISel {
 private:
   bool doPeepholeSExtW(SDNode *Node);
   bool doPeepholeMaskedRVV(MachineSDNode *Node);
-  bool doPeepholeMergeVVMFold();
   bool doPeepholeNoRegPassThru();
   bool performCombineVMergeAndVOps(SDNode *N);
   bool selectImm64IfCheaper(int64_t Imm, int64_t OrigImm, SDValue N,
diff --git a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
index c9c2413d009b7..69c288c7a2e7a 100644
--- a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
@@ -67,12 +67,13 @@ class RISCVVectorPeephole : public MachineFunctionPass {
   bool convertSameMaskVMergeToVMv(MachineInstr &MI);
   bool foldUndefPassthruVMV_V_V(MachineInstr &MI);
   bool foldVMV_V_V(MachineInstr &MI);
+  bool foldVMergeToMask(MachineInstr &MI) const;
 
   bool hasSameEEW(const MachineInstr &User, const MachineInstr &Src) const;
   bool isAllOnesMask(const MachineInstr *MaskDef) const;
   std::optional<unsigned> getConstant(const MachineOperand &VL) const;
   bool ensureDominates(const MachineOperand &Use, MachineInstr &Src) const;
-  bool isKnownSameDefs(const MachineOperand &A, const MachineOperand &B) const;
+  bool isKnownSameDefs(Register A, Register B) const;
 };
 
 } // namespace
@@ -380,13 +381,23 @@ bool RISCVVectorPeephole::convertAllOnesVMergeToVMv(MachineInstr &MI) const {
   return true;
 }
 
-bool RISCVVectorPeephole::isKnownSameDefs(const MachineOperand &A,
-                                          const MachineOperand &B) const {
-  if (A.getReg().isPhysical() || B.getReg().isPhysical())
+bool RISCVVectorPeephole::isKnownSameDefs(Register A, Register B) const {
+  if (A.isPhysical() || B.isPhysical())
     return false;
 
-  return TRI->lookThruCopyLike(A.getReg(), MRI) ==
-         TRI->lookThruCopyLike(B.getReg(), MRI);
+  auto LookThruVirtRegCopies = [this](Register Reg) {
+    while (MachineInstr *Def = MRI->getUniqueVRegDef(Reg)) {
+      if (!Def->isFullCopy())
+        break;
+      Register Src = Def->getOperand(1).getReg();
+      if (!Src.isVirtual())
+        break;
+      Reg = Src;
+    }
+    return Reg;
+  };
+
+  return LookThruVirtRegCopies(A) == LookThruVirtRegCopies(B);
 }
 
 /// If a PseudoVMERGE_VVM's true operand is a masked pseudo and both have the
@@ -414,7 +425,7 @@ bool RISCVVectorPeephole::convertSameMaskVMergeToVMv(MachineInstr &MI) {
   const MachineOperand &TrueMask =
       True->getOperand(TrueMaskedInfo->MaskOpIdx + True->getNumExplicitDefs());
   const MachineOperand &MIMask = MI.getOperand(4);
-  if (!isKnownSameDefs(TrueMask, MIMask))
+  if (!isKnownSameDefs(TrueMask.getReg(), MIMask.getReg()))
     return false;
 
   // True's passthru needs to be equivalent to False
@@ -663,6 +674,133 @@ bool RISCVVectorPeephole::foldVMV_V_V(MachineInstr &MI) {
   return true;
 }
 
+/// Try to fold away VMERGE_VVM instructions into their operands:
+///
+/// %true = PseudoVADD_VV ...
+/// %x = PseudoVMERGE_VVM_M1 %false, %false, %true, %mask
+/// ->
+/// %x = PseudoVADD_VV_M1_MASK %false, ..., %mask
+///
+/// We can only fold if vmerge's passthru operand, vmerge's false operand and
+/// %true's passthru operand (if it has one) are the same. This is because we
+/// have to consolidate them into one passthru operand in the result.
+///
+/// If %true is masked, then we can use its mask instead of vmerge's if vmerge's
+/// mask is all ones.
+///
+/// The resulting VL is the minimum of the two VLs.
+///
+/// The resulting policy is the effective policy the vmerge would have had,
+/// i.e. whether or not it's passthru operand was implicit-def.
+bool RISCVVectorPeephole::foldVMergeToMask(MachineInstr &MI) const {
+  if (RISCV::getRVVMCOpcode(MI.getOpcode()) != RISCV::VMERGE_VVM)
+    return false;
+
+  Register PassthruReg = MI.getOperand(1).getReg();
+  Register FalseReg = MI.getOperand(2).getReg();
+  Register TrueReg = MI.getOperand(3).getReg();
+  if (!TrueReg.isVirtual() || !MRI->hasOneUse(TrueReg))
+    return false;
+  MachineInstr &True = *MRI->getUniqueVRegDef(TrueReg);
+  if (True.getParent() != MI.getParent())
+    return false;
+  const MachineOperand &MaskOp = MI.getOperand(4);
+  MachineInstr *Mask = MRI->getUniqueVRegDef(MaskOp.getReg());
+  assert(Mask);
+
+  const RISCV::RISCVMaskedPseudoInfo *Info =
+      RISCV::lookupMaskedIntrinsicByUnmasked(True.getOpcode());
+  if (!Info)
+    return false;
+
+  // If the EEW of True is different from vmerge's SEW, then we can't fold.
+  if (!hasSameEEW(MI, True))
+    return false;
+
+  // We require that either passthru and false are the same, or that passthru
+  // is undefined.
+  if (PassthruReg != RISCV::NoRegister &&
+      !isKnownSameDefs(PassthruReg, FalseReg))
+    return false;
+
+  // If True has a passthru operand then it needs to be the same as vmerge's
+  // False, since False will be used for the result's passthru operand.
+  Register TruePassthru = True.getOperand(True.getNumExplicitDefs()).getReg();
+  if (RISCVII::isFirstDefTiedToFirstUse(True.getDesc()) &&
+      TruePassthru != RISCV::NoRegister &&
+      !isKnownSameDefs(TruePassthru, FalseReg))
+    return false;
+
+  // Make sure it doesn't raise any observable fp exceptions, since changing the
+  // active elements will affect how fflags is set.
+  if (True.hasUnmodeledSideEffects() || True.mayRaiseFPException())
+    return false;
+
+  const MachineOperand &VMergeVL =
+      MI.getOperand(RISCVII::getVLOpNum(MI.getDesc()));
+  const MachineOperand &TrueVL =
+      True.getOperand(RISCVII::getVLOpNum(True.getDesc()));
+
+  MachineOperand MinVL = MachineOperand::CreateImm(0);
+  if (RISCV::isVLKnownLE(TrueVL, VMergeVL))
+    MinVL = TrueVL;
+  else if (RISCV::isVLKnownLE(VMergeVL, TrueVL))
+    MinVL = VMergeVL;
+  else
+    return false;
+
+  unsigned RVVTSFlags =
+      TII->get(RISCV::getRVVMCOpcode(True.getOpcode())).TSFlags;
+  if (RISCVII::elementsDependOnVL(RVVTSFlags) && !TrueVL.isIdenticalTo(MinVL))
+    return false;
+  if (RISCVII::elementsDependOnMask(RVVTSFlags) && !isAllOnesMask(Mask))
+    return false;
+
+  // Use a tumu policy, relaxing it to tail agnostic provided that the passthru
+  // operand is undefined.
+  //
+  // However, if the VL became smaller than what the vmerge had originally, then
+  // elements past VL that were previously in the vmerge's body will have moved
+  // to the tail. In that case we always need to use tail undisturbed to
+  // preserve them.
+  uint64_t Policy = RISCVVType::TAIL_UNDISTURBED_MASK_UNDISTURBED;
+  if (PassthruReg == RISCV::NoRegister && RISCV::isVLKnownLE(VMergeVL, MinVL))
+    Policy |= RISCVVType::TAIL_AGNOSTIC;
+
+  assert(RISCVII::hasVecPolicyOp(True.getDesc().TSFlags) &&
+         "Foldable unmasked pseudo should have a policy op already");
+
+  // Make sure the mask dominates True, otherwise move down True so it does.
+  // VL will always dominate because if its a register they need to be the same.
+  if (!ensureDominates(MaskOp, True))
+    return false;
+
+  True.setDesc(TII->get(Info->MaskedPseudo));
+
+  // Insert the mask operand.
+  // TODO: Increment MaskOpIdx by number of explicit defs?
+  True.insert(&True.getOperand(Info->MaskOpIdx + True.getNumExplicitDefs()),
+              MachineOperand::CreateReg(MaskOp.getReg(), false));
+
+  // Update the passthru, AVL and policy.
+  True.getOperand(True.getNumExplicitDefs()).setReg(FalseReg);
+  True.removeOperand(RISCVII::getVLOpNum(True.getDesc()));
+  True.insert(&True.getOperand(RISCVII::getVLOpNum(True.getDesc())), MinVL);
+  True.getOperand(RISCVII::getVecPolicyOpNum(True.getDesc())).setImm(Policy);
+
+  MRI->replaceRegWith(True.getOperand(0).getReg(), MI.getOperand(0).getReg());
+  // Now that True is masked, constrain its operands from vr -> vrnov0.
+  for (MachineOperand &MO : True.explicit_operands()) {
+    if (!MO.isReg() || !MO.getReg().isVirtual())
+      continue;
+    MRI->constrainRegClass(
+        MO.getReg(), True.getRegClassConstraint(MO.getOperandNo(), TII, TRI));
+  }
+  MI.eraseFromParent();
+
+  return true;
+}
+
 bool RISCVVectorPeephole::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
@@ -679,6 +817,9 @@ bool RISCVVectorPeephole::runOnMachineFunction(MachineFunction &MF) {
   bool Changed = false;
 
   for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : make_early_inc_range(MBB))
+      Changed |= foldVMergeToMask(MI);
+
     for (MachineInstr &MI : make_early_inc_range(MBB)) {
       Changed |= convertToVLMAX(MI);
       Changed |= tryToReduceVL(MI);
diff --git a/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll b/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll
index 5dc532273b770..0d8aff306252e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/combine-reduce-add-to-vcpop.ll
@@ -313,12 +313,12 @@ define i32 @test_nxv128i1(<vscale x 128 x i1> %x) {
 ; CHECK-NEXT:    vslidedown.vx v0, v6, a0
 ; CHECK-NEXT:    vsetvli a2, zero, e8, m1, ta, ma
 ; CHECK-NEXT:    vslidedown.vx v6, v7, a1
-; CHECK-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
-; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
 ; CHECK-NEXT:    vsetvli a1, zero, e8, mf2, ta, ma
-; CHECK-NEXT:    vslidedown.vx v0, v7, a0
 ; CHECK-NEXT:    vslidedown.vx v5, v6, a0
+; CHECK-NEXT:    vslidedown.vx v4, v7, a0
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, mu
+; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-NEXT:    vmv1r.v v0, v4
 ; CHECK-NEXT:    vadd.vi v8, v8, 1, v0.t
 ; CHECK-NEXT:    vmv1r.v v0, v5
 ; CHECK-NEXT:    vadd.vi v16, v16, 1, v0.t
@@ -364,9 +364,9 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    vmv1r.v v7, v9
 ; CHECK-NEXT:    vmv1r.v v5, v8
 ; CHECK-NEXT:    vmv1r.v v4, v0
-; CHECK-NEXT:    vmv.v.i v24, 0
+; CHECK-NEXT:    vmv.v.i v16, 0
 ; CHECK-NEXT:    csrr a1, vlenb
-; CHECK-NEXT:    vmerge.vim v8, v24, 1, v0
+; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    mv a2, a0
@@ -376,7 +376,7 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    addi a0, a0, 16
 ; CHECK-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
 ; CHECK-NEXT:    vmv1r.v v0, v5
-; CHECK-NEXT:    vmerge.vim v8, v24, 1, v0
+; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 5
 ; CHECK-NEXT:    add a0, sp, a0
@@ -389,7 +389,7 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    vslidedown.vx v2, v5, a0
 ; CHECK-NEXT:    vmv.v.v v0, v3
 ; CHECK-NEXT:    vsetvli a2, zero, e32, m8, ta, ma
-; CHECK-NEXT:    vmerge.vim v8, v24, 1, v0
+; CHECK-NEXT:    vmerge.vim v8, v16, 1, v0
 ; CHECK-NEXT:    csrr a2, vlenb
 ; CHECK-NEXT:    slli a2, a2, 3
 ; CHECK-NEXT:    mv a3, a2
@@ -399,41 +399,41 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
 ; CHECK-NEXT:    addi a2, a2, 16
 ; CHECK-NEXT:    vs8r.v v8, (a2) # vscale x 64-byte Folded Spill
 ; CHECK-NEXT:    vmv1r.v v0, v2
-; CHECK-NEXT:    vmerge.vim v8, v24, 1, v0
+; CHECK-NEXT:    vmv8r.v v8, v16
+; CHECK-NEXT:    vmerge.vim v16, v16, 1, v0
 ; CHECK-NEXT:    csrr a2, vlenb
 ; CHECK-NEXT:    slli a2, a2, 4
 ; CHECK-NEXT:    add a2, sp, a2
 ; CHECK-NEXT:    addi a2, a2, 16
-; CHECK-NEXT:    vs8r.v v8, (a2) # vscale x 64-byte Folded Spill
+; CHECK-NEXT:    vs8r.v v16, (a2) # vscale x 64-byte Folded Spill
 ; CHECK-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
 ; CHECK...
[truncated]

// to the tail. In that case we always need to use tail undisturbed to
// preserve them.
uint64_t Policy = RISCVVType::TAIL_UNDISTURBED_MASK_UNDISTURBED;
if (PassthruReg == RISCV::NoRegister && RISCV::isVLKnownLE(VMergeVL, MinVL))
Collaborator

!PassthruReg.isValid() or !PassthruReg
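
A sketch of the suggested form, keeping the surrounding logic unchanged (Register::isValid() tests against the no-register sentinel, so this is equivalent to the RISCV::NoRegister compare; the !PassthruReg spelling would behave the same):

    // Suggested: use Register's validity check instead of comparing against the sentinel.
    if (!PassthruReg.isValid() && RISCV::isVLKnownLE(VMergeVL, MinVL))
      Policy |= RISCVVType::TAIL_AGNOSTIC;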

Comment on lines +388 to +400
auto LookThruVirtRegCopies = [this](Register Reg) {
while (MachineInstr *Def = MRI->getUniqueVRegDef(Reg)) {
if (!Def->isFullCopy())
break;
Register Src = Def->getOperand(1).getReg();
if (!Src.isVirtual())
break;
Reg = Src;
}
return Reg;
};

return LookThruVirtRegCopies(A) == LookThruVirtRegCopies(B);
Contributor Author

I forgot to mention, when plugging in isKnownSameDefs I noticed that lookThruCopyLike can actually return a physical register if it's the source of a copy chain, and we can't compare physical registers. So I've reworked it here to only look through virtual register COPYs.

Happy to split this off into a separate PR if preferred

@wangpc-pp (Contributor) left a comment

LGTM in general.

I see this adds some extra vmvs in some cases and also removes some vtype toggles.

; CHECK-NEXT: vsetvli a0, zero, e32, m8, ta, mu
; CHECK-NEXT: vmerge.vim v8, v8, 1, v0
; CHECK-NEXT: vmv1r.v v0, v4
Contributor

Extra vmv.

@@ -399,41 +399,41 @@ define i32 @test_nxv256i1(<vscale x 256 x i1> %x) {
; CHECK-NEXT: addi a2, a2, 16
; CHECK-NEXT: vs8r.v v8, (a2) # vscale x 64-byte Folded Spill
; CHECK-NEXT: vmv1r.v v0, v2
; CHECK-NEXT: vmerge.vim v8, v24, 1, v0
; CHECK-NEXT: vmv8r.v v8, v16
Contributor

Extra vmv.
