Skip to content

[RISCV] Implement cross basic block VXRM write insertion. #70382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 2, 2023

Conversation

topperc
Copy link
Collaborator

@topperc topperc commented Oct 26, 2023

This adds a new pass to insert VXRM writes for vector instructions. With the goal of avoiding redundant writes.

The pass does 2 dataflow algorithms. The first is a forward data flow to calculate where a VXRM value is available. The second is a backwards dataflow to determine where a VXRM value is anticipated.

Finally, we use the results of these two dataflows to insert VXRM writes where a value is anticipated, but not available.

The pass does not split critical edges so we aren't always able to eliminate all redundancy.

The pass will only insert vxrm writes on paths that always require it.

This adds a new pass to insert VXRM writes for vector instructions.
With the goal of avoiding redundant writes.

The pass does 2 dataflow algorithms. The first is a forward data flow
to calculate where a VXRM value is available. The second is a backwards
dataflow to determine where a VXRM value is anticipated.

Finally, we use the results of these two dataflows to insert VXRM writes
where a value is anticipated, but not available.

The pass does not split critical edges so we aren't always able
to eliminate all redundancy.

The pass will only insert vxrm writes on paths that always require
it.
@llvmbot
Copy link
Member

llvmbot commented Oct 26, 2023

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes

This adds a new pass to insert VXRM writes for vector instructions. With the goal of avoiding redundant writes.

The pass does 2 dataflow algorithms. The first is a forward data flow to calculate where a VXRM value is available. The second is a backwards dataflow to determine where a VXRM value is anticipated.

Finally, we use the results of these two dataflows to insert VXRM writes where a value is anticipated, but not available.

The pass does not split critical edges so we aren't always able to eliminate all redundancy.

The pass will only insert vxrm writes on paths that always require it.


Patch is 588.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70382.diff

29 Files Affected:

  • (modified) llvm/lib/Target/RISCV/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/RISCV/RISCV.h (+3)
  • (modified) llvm/lib/Target/RISCV/RISCVInsertReadWriteCSR.cpp (+2-17)
  • (added) llvm/lib/Target/RISCV/RISCVInsertWriteVXRM.cpp (+457)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+2)
  • (modified) llvm/test/CodeGen/RISCV/O0-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/copyprop.mir (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/masked-tama.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/masked-tamu.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/masked-tuma.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/masked-tumu.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/mutate-prior-vsetvli-avl.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/unmasked-tu.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vaadd.ll (+88-88)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vaaddu.ll (+88-88)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vasub.ll (+88-88)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vasubu.ll (+88-88)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vnclip.ll (+90-90)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vnclipu.ll (+90-90)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsmul-rv32.ll (+80-80)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsmul-rv64.ll (+88-88)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vssra-rv32.ll (+88-88)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vssra-rv64.ll (+88-88)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vssrl-rv32.ll (+88-88)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vssrl-rv64.ll (+88-88)
  • (added) llvm/test/CodeGen/RISCV/rvv/vxrm-insert.ll (+473)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vxrm.mir (+2-2)
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index 4d5fa79389ea68b..5b94ce3bea451a0 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -37,6 +37,7 @@ add_llvm_target(RISCVCodeGen
   RISCVGatherScatterLowering.cpp
   RISCVInsertVSETVLI.cpp
   RISCVInsertReadWriteCSR.cpp
+  RISCVInsertWriteVXRM.cpp
   RISCVInstrInfo.cpp
   RISCVISelDAGToDAG.cpp
   RISCVISelLowering.cpp
diff --git a/llvm/lib/Target/RISCV/RISCV.h b/llvm/lib/Target/RISCV/RISCV.h
index 3d8e33dc716ea44..6092d312e4f98b0 100644
--- a/llvm/lib/Target/RISCV/RISCV.h
+++ b/llvm/lib/Target/RISCV/RISCV.h
@@ -68,6 +68,9 @@ void initializeRISCVPostRAExpandPseudoPass(PassRegistry &);
 FunctionPass *createRISCVInsertReadWriteCSRPass();
 void initializeRISCVInsertReadWriteCSRPass(PassRegistry &);
 
+FunctionPass *createRISCVInsertWriteVXRMPass();
+void initializeRISCVInsertWriteVXRMPass(PassRegistry &);
+
 FunctionPass *createRISCVRedundantCopyEliminationPass();
 void initializeRISCVRedundantCopyEliminationPass(PassRegistry &);
 
diff --git a/llvm/lib/Target/RISCV/RISCVInsertReadWriteCSR.cpp b/llvm/lib/Target/RISCV/RISCVInsertReadWriteCSR.cpp
index 75f5ac3fbe0dd55..acd19bf1b8a162e 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertReadWriteCSR.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertReadWriteCSR.cpp
@@ -9,7 +9,6 @@
 // of the RISC-V instructions.
 //
 // Currently the pass implements:
-// -Naive insertion of a write to vxrm before an RVV fixed-point instruction.
 // -Writing and saving frm before an RVV floating-point instruction with a
 //  static rounding mode and restores the value after.
 //
@@ -58,25 +57,11 @@ char RISCVInsertReadWriteCSR::ID = 0;
 INITIALIZE_PASS(RISCVInsertReadWriteCSR, DEBUG_TYPE,
                 RISCV_INSERT_READ_WRITE_CSR_NAME, false, false)
 
-// This function inserts a write to vxrm when encountering an RVV fixed-point
-// instruction. This function also swaps frm and restores it when encountering
-// an RVV floating point instruction with a static rounding mode.
+// This function also swaps frm and restores it when encountering an RVV
+// floating point instruction with a static rounding mode.
 bool RISCVInsertReadWriteCSR::emitWriteRoundingMode(MachineBasicBlock &MBB) {
   bool Changed = false;
   for (MachineInstr &MI : MBB) {
-    int VXRMIdx = RISCVII::getVXRMOpNum(MI.getDesc());
-    if (VXRMIdx >= 0) {
-      unsigned VXRMImm = MI.getOperand(VXRMIdx).getImm();
-
-      Changed = true;
-
-      BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(RISCV::WriteVXRMImm))
-          .addImm(VXRMImm);
-      MI.addOperand(MachineOperand::CreateReg(RISCV::VXRM, /*IsDef*/ false,
-                                              /*IsImp*/ true));
-      continue;
-    }
-
     int FRMIdx = RISCVII::getFRMOpNum(MI.getDesc());
     if (FRMIdx < 0)
       continue;
diff --git a/llvm/lib/Target/RISCV/RISCVInsertWriteVXRM.cpp b/llvm/lib/Target/RISCV/RISCVInsertWriteVXRM.cpp
new file mode 100644
index 000000000000000..8313a809b84cbcb
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVInsertWriteVXRM.cpp
@@ -0,0 +1,457 @@
+//===-- RISCVInsertWriteVXRM.cpp - Insert Write of RISC-V VXRM CSR --------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass inserts writes to the VXRM CSR as needed by vector instructions.
+// Each instruction that uses VXRM carries an operand that contains its required
+// VXRM value. This pass tries to optimize placement to avoid redundant writes
+// to VXRM.
+//
+// This is done using 2 dataflow algorithms. The first is a forward data flow
+// to calculate where a VXRM value is available. The second is a backwards
+// dataflow to determine where a VXRM value is anticipated.
+//
+// Finally, we use the results of these two dataflows to insert VXRM writes
+// where a value is anticipated, but not available.
+//
+// FIXME: This pass does not split critical edges, so there can still be some
+// redundancy.
+//
+// FIXME: If we are willing to have writes that aren't always needed, we could
+// reduce the number of VXRM writes in some cases.
+//===----------------------------------------------------------------------===//
+
+#include "MCTargetDesc/RISCVBaseInfo.h"
+#include "RISCV.h"
+#include "RISCVSubtarget.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include <queue>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "riscv-insert-write-vxrm"
+#define RISCV_INSERT_WRITE_VXRM_NAME "RISC-V Insert Write VXRM Pass"
+
+namespace {
+
+class VXRMInfo {
+  uint8_t VXRMImm = 0;
+
+  enum : uint8_t {
+    Uninitialized,
+    Static,
+    Unknown,
+  } State = Uninitialized;
+
+public:
+  VXRMInfo() {}
+
+  static VXRMInfo getUnknown() {
+    VXRMInfo Info;
+    Info.setUnknown();
+    return Info;
+  }
+
+  bool isValid() const { return State != Uninitialized; }
+  void setUnknown() { State = Unknown; }
+  bool isUnknown() const { return State == Unknown; }
+
+  bool isStatic() const { return State == Static; }
+
+  void setVXRMImm(unsigned Imm) {
+    assert(Imm <= 3 && "Unexpected VXRM value");
+    VXRMImm = Imm;
+    State = Static;
+  }
+  unsigned getVXRMImm() const {
+    assert(isStatic() && VXRMImm <= 3 && "Unexpected state");
+    return VXRMImm;
+  }
+
+  bool operator==(const VXRMInfo &Other) const {
+    // Uninitialized is only equal to another Uninitialized.
+    if (!isValid())
+      return !Other.isValid();
+    if (!Other.isValid())
+      return !isValid();
+
+    // Unknown is only equal to another Unknown.
+    if (isUnknown())
+      return Other.isUnknown();
+    if (Other.isUnknown())
+      return isUnknown();
+
+    return VXRMImm == Other.VXRMImm;
+  }
+
+  bool operator!=(const VXRMInfo &Other) const { return !(*this == Other); }
+
+  // Calculate the VXRMInfo visible to a block assuming this and Other are
+  // both predecessors.
+  VXRMInfo intersect(const VXRMInfo &Other) const {
+    // If the new value isn't valid, ignore it.
+    if (!Other.isValid())
+      return *this;
+
+    // If this value isn't valid, this must be the first predecessor, use it.
+    if (!isValid())
+      return Other;
+
+    // If either is unknown, the result is unknown.
+    if (isUnknown() || Other.isUnknown())
+      return VXRMInfo::getUnknown();
+
+    // If we have an exact match, return this.
+    if (*this == Other)
+      return *this;
+
+    // Otherwise the result is unknown.
+    return VXRMInfo::getUnknown();
+  }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Support for debugging, callable in GDB: V->dump()
+  LLVM_DUMP_METHOD void dump() const {
+    print(dbgs());
+    dbgs() << "\n";
+  }
+
+  void print(raw_ostream &OS) const {
+    OS << '{';
+    if (!isValid())
+      OS << "Uninitialized";
+    else if (isUnknown())
+      OS << "Unknown";
+    else
+      OS << getVXRMImm();
+    OS << '}';
+  }
+#endif
+};
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+LLVM_ATTRIBUTE_USED
+inline raw_ostream &operator<<(raw_ostream &OS, const VXRMInfo &V) {
+  V.print(OS);
+  return OS;
+}
+#endif
+
+struct BlockData {
+  // Indicates if the block uses VXRM. Uninitialized means no use.
+  VXRMInfo VXRMUse;
+
+  // Indicates the VXRM output from the block. Unitialized means transparent.
+  VXRMInfo VXRMOut;
+
+  // Keeps track of the available VXRM value at the start of the basic bloc.
+  VXRMInfo AvailableIn;
+
+  // Keeps track of the available VXRM value at the end of the basic block.
+  VXRMInfo AvailableOut;
+
+  // Keeps track of what VXRM is anticipated at the start of the basic block.
+  VXRMInfo AnticipatedIn;
+
+  // Keeps track of what VXRM is anticipated at the end of the basic block.
+  VXRMInfo AnticipatedOut;
+
+  // Keeps track of whether the block is already in the queue.
+  bool InQueue;
+
+  BlockData() = default;
+};
+
+class RISCVInsertWriteVXRM : public MachineFunctionPass {
+  const TargetInstrInfo *TII;
+
+  std::vector<BlockData> BlockInfo;
+  std::queue<const MachineBasicBlock *> WorkList;
+
+public:
+  static char ID;
+
+  RISCVInsertWriteVXRM() : MachineFunctionPass(ID) {
+    initializeRISCVInsertWriteVXRMPass(*PassRegistry::getPassRegistry());
+  }
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.setPreservesCFG();
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  StringRef getPassName() const override {
+    return RISCV_INSERT_WRITE_VXRM_NAME;
+  }
+
+private:
+  bool computeVXRMChanges(const MachineBasicBlock &MBB);
+  void computeAvailable(const MachineBasicBlock &MBB);
+  void computeAnticipated(const MachineBasicBlock &MBB);
+  void emitWriteVXRM(MachineBasicBlock &MBB);
+};
+
+} // end anonymous namespace
+
+char RISCVInsertWriteVXRM::ID = 0;
+
+INITIALIZE_PASS(RISCVInsertWriteVXRM, DEBUG_TYPE, RISCV_INSERT_WRITE_VXRM_NAME,
+                false, false)
+
+bool RISCVInsertWriteVXRM::computeVXRMChanges(const MachineBasicBlock &MBB) {
+  BlockData &BBInfo = BlockInfo[MBB.getNumber()];
+
+  bool NeedVXRMWrite = false;
+  for (const MachineInstr &MI : MBB) {
+    int VXRMIdx = RISCVII::getVXRMOpNum(MI.getDesc());
+    if (VXRMIdx >= 0) {
+      unsigned NewVXRMImm = MI.getOperand(VXRMIdx).getImm();
+
+      if (!BBInfo.VXRMUse.isValid())
+        BBInfo.VXRMUse.setVXRMImm(NewVXRMImm);
+
+      BBInfo.VXRMOut.setVXRMImm(NewVXRMImm);
+      NeedVXRMWrite = true;
+    }
+
+    if (MI.isCall() || MI.isInlineAsm()) {
+      if (!BBInfo.VXRMUse.isValid())
+        BBInfo.VXRMUse.setUnknown();
+
+      BBInfo.VXRMOut.setUnknown();
+    }
+  }
+
+  return NeedVXRMWrite;
+}
+
+void RISCVInsertWriteVXRM::computeAvailable(const MachineBasicBlock &MBB) {
+  BlockData &BBInfo = BlockInfo[MBB.getNumber()];
+
+  BBInfo.InQueue = false;
+
+  VXRMInfo Available;
+  if (MBB.pred_empty()) {
+    Available.setUnknown();
+  } else {
+    for (const MachineBasicBlock *P : MBB.predecessors())
+      Available = Available.intersect(BlockInfo[P->getNumber()].AvailableOut);
+  }
+
+  // If we don't have any valid available info, wait until we do.
+  if (!Available.isValid())
+    return;
+
+  if (Available != BBInfo.AvailableIn) {
+    BBInfo.AvailableIn = Available;
+    LLVM_DEBUG(dbgs() << "AvailableIn state of " << printMBBReference(MBB)
+                      << " changed to " << BBInfo.AvailableIn << "\n");
+  }
+
+  if (BBInfo.VXRMOut.isValid())
+    Available = BBInfo.VXRMOut;
+
+  if (Available == BBInfo.AvailableOut)
+    return;
+
+  BBInfo.AvailableOut = Available;
+  LLVM_DEBUG(dbgs() << "AvailableOut state of " << printMBBReference(MBB)
+                    << " changed to " << BBInfo.AvailableOut << "\n");
+
+  // Add the successors to the work list so that we can propagate.
+  for (MachineBasicBlock *S : MBB.successors()) {
+    if (!BlockInfo[S->getNumber()].InQueue) {
+      BlockInfo[S->getNumber()].InQueue = true;
+      WorkList.push(S);
+    }
+  }
+}
+
+void RISCVInsertWriteVXRM::computeAnticipated(const MachineBasicBlock &MBB) {
+  BlockData &BBInfo = BlockInfo[MBB.getNumber()];
+
+  BBInfo.InQueue = false;
+
+  VXRMInfo Anticipated;
+  if (MBB.succ_empty()) {
+    Anticipated.setUnknown();
+  } else {
+    for (const MachineBasicBlock *S : MBB.successors())
+      Anticipated =
+          Anticipated.intersect(BlockInfo[S->getNumber()].AnticipatedIn);
+  }
+
+  // If we don't have any valid anticipated info, wait until we do.
+  if (!Anticipated.isValid())
+    return;
+
+  if (Anticipated != BBInfo.AnticipatedOut) {
+    BBInfo.AnticipatedOut = Anticipated;
+    LLVM_DEBUG(dbgs() << "AnticipatedOut state of " << printMBBReference(MBB)
+                      << " changed to " << BBInfo.AnticipatedOut << "\n");
+  }
+
+  // If this block reads VXRM, copy it.
+  if (BBInfo.VXRMUse.isValid())
+    Anticipated = BBInfo.VXRMUse;
+
+  if (Anticipated == BBInfo.AnticipatedIn)
+    return;
+
+  BBInfo.AnticipatedIn = Anticipated;
+  LLVM_DEBUG(dbgs() << "AnticipatedIn state of " << printMBBReference(MBB)
+                    << " changed to " << BBInfo.AnticipatedIn << "\n");
+
+  // Add the predecessors to the work list so that we can propagate.
+  for (MachineBasicBlock *P : MBB.predecessors()) {
+    if (!BlockInfo[P->getNumber()].InQueue) {
+      BlockInfo[P->getNumber()].InQueue = true;
+      WorkList.push(P);
+    }
+  }
+}
+
+void RISCVInsertWriteVXRM::emitWriteVXRM(MachineBasicBlock &MBB) {
+  const BlockData &BBInfo = BlockInfo[MBB.getNumber()];
+
+  VXRMInfo Info = BBInfo.AvailableIn;
+
+  // Insert VXRM write if anticipated and not available.
+  if (BBInfo.AnticipatedIn.isStatic()) {
+    bool NeedInsert = false;
+    // If there no predecessors and the value is anticipated, insert.
+    if (MBB.pred_empty()) {
+      NeedInsert = true;
+    } else {
+      // Search for any predecessors that wouldn't satisfy our requirement and
+      // insert a write VXRM if needed.
+      // NOTE: If one predecessor is able to provide the requirement, but
+      // another isn't, it means we have a critical edge. The better placement
+      // would be to split the critical edge.
+      for (MachineBasicBlock *P : MBB.predecessors()) {
+        const BlockData &PInfo = BlockInfo[P->getNumber()];
+        // If it's available out of the predecessor, then we're ok.
+        if (PInfo.AvailableOut.isStatic() &&
+            PInfo.AvailableOut.getVXRMImm() ==
+                BBInfo.AnticipatedIn.getVXRMImm())
+          continue;
+        // If the predecessor anticipated this value for all its succesors,
+        // then it should have already inserted.
+        if (PInfo.AnticipatedOut.isStatic() &&
+            PInfo.AnticipatedOut.getVXRMImm() ==
+                BBInfo.AnticipatedIn.getVXRMImm())
+          continue;
+        NeedInsert = true;
+        break;
+      }
+    }
+
+    if (NeedInsert) {
+      BuildMI(MBB, MBB.getFirstNonPHI(), DebugLoc(),
+              TII->get(RISCV::WriteVXRMImm))
+          .addImm(BBInfo.AnticipatedIn.getVXRMImm());
+      Info.setVXRMImm(BBInfo.AnticipatedIn.getVXRMImm());
+    }
+    Info = BBInfo.AnticipatedIn;
+  }
+
+  for (MachineInstr &MI : MBB) {
+    int VXRMIdx = RISCVII::getVXRMOpNum(MI.getDesc());
+    if (VXRMIdx >= 0) {
+      unsigned NewVXRMImm = MI.getOperand(VXRMIdx).getImm();
+
+      if (!Info.isStatic() || Info.getVXRMImm() != NewVXRMImm) {
+        LLVM_DEBUG(dbgs() << "Inserting before "; MI.print(dbgs()));
+        BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(RISCV::WriteVXRMImm))
+            .addImm(NewVXRMImm);
+      }
+
+      MI.addOperand(MachineOperand::CreateReg(RISCV::VXRM, /*IsDef*/ false,
+                                              /*IsImp*/ true));
+      Info.setVXRMImm(NewVXRMImm);
+      continue;
+    }
+
+    if (MI.getOpcode() == RISCV::WriteVXRMImm) {
+      Info.setVXRMImm(MI.getOperand(0).getImm());
+      continue;
+    }
+
+    if (MI.isCall() || MI.isInlineAsm() || MI.modifiesRegister(RISCV::VXRM))
+      Info.setUnknown();
+  }
+
+  // If all our successors anticipate a value, do the insert.
+  // NOTE: It's possible that not all predecessors of our successor provide the
+  // correct value. This can occur on critical edges. If we don't split the
+  // critical edge we'll also have a write vxrm in the succesor that is
+  // redundant with this one.
+  if (BBInfo.AnticipatedOut.isStatic() &&
+      (!Info.isStatic() ||
+       Info.getVXRMImm() != BBInfo.AnticipatedOut.getVXRMImm())) {
+    LLVM_DEBUG(dbgs() << "Inserting at end of " << printMBBReference(MBB)
+                      << " changing to " << BBInfo.AnticipatedOut << "\n");
+    BuildMI(MBB, MBB.getFirstTerminator(), DebugLoc(),
+            TII->get(RISCV::WriteVXRMImm))
+        .addImm(BBInfo.AnticipatedOut.getVXRMImm());
+  }
+}
+
+bool RISCVInsertWriteVXRM::runOnMachineFunction(MachineFunction &MF) {
+  // Skip if the vector extension is not enabled.
+  const RISCVSubtarget &ST = MF.getSubtarget<RISCVSubtarget>();
+  if (!ST.hasVInstructions())
+    return false;
+
+  TII = ST.getInstrInfo();
+
+  assert(BlockInfo.empty() && "Expect empty block infos");
+  BlockInfo.resize(MF.getNumBlockIDs());
+
+  // Phase 1 - collect block information.
+  bool NeedVXRMChange = false;
+  for (const MachineBasicBlock &MBB : MF)
+    NeedVXRMChange |= computeVXRMChanges(MBB);
+
+  if (NeedVXRMChange) {
+    // Phase 2 - Compute available VXRM using a forward walk.
+    for (const MachineBasicBlock &MBB : MF) {
+      WorkList.push(&MBB);
+      BlockInfo[MBB.getNumber()].InQueue = true;
+    }
+    while (!WorkList.empty()) {
+      const MachineBasicBlock &MBB = *WorkList.front();
+      WorkList.pop();
+      computeAvailable(MBB);
+    }
+
+    // Phase 3 - Compute anticipated VXRM using a backwards walk.
+    for (const MachineBasicBlock &MBB : llvm::reverse(MF)) {
+      WorkList.push(&MBB);
+      BlockInfo[MBB.getNumber()].InQueue = true;
+    }
+    while (!WorkList.empty()) {
+      const MachineBasicBlock &MBB = *WorkList.front();
+      WorkList.pop();
+      computeAnticipated(MBB);
+    }
+
+    // Phase 4 - Emit VXRM writes at the earliest place possible.
+    for (MachineBasicBlock &MBB : MF)
+      emitWriteVXRM(MBB);
+  }
+
+  BlockInfo.clear();
+
+  return NeedVXRMChange;
+}
+
+FunctionPass *llvm::createRISCVInsertWriteVXRMPass() {
+  return new RISCVInsertWriteVXRM();
+}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 953ac097b915044..137d82505d152db 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -103,6 +103,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget() {
   initializeRISCVExpandPseudoPass(*PR);
   initializeRISCVInsertVSETVLIPass(*PR);
   initializeRISCVInsertReadWriteCSRPass(*PR);
+  initializeRISCVInsertWriteVXRMPass(*PR);
   initializeRISCVDAGToDAGISelPass(*PR);
   initializeRISCVInitUndefPass(*PR);
   initializeRISCVMoveMergePass(*PR);
@@ -432,6 +433,7 @@ void RISCVPassConfig::addPreRegAlloc() {
       EnableRISCVDeadRegisterElimination)
     addPass(createRISCVDeadRegisterDefinitionsPass());
   addPass(createRISCVInsertReadWriteCSRPass());
+  addPass(createRISCVInsertWriteVXRMPass());
 }
 
 void RISCVPassConfig::addOptimizedRegAlloc() {
diff --git a/llvm/test/CodeGen/RISCV/O0-pipeline.ll b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
index 1d9af9df2f718f0..e01d2d45263434e 100644
--- a/llvm/test/CodeGen/RISCV/O0-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
@@ -42,6 +42,7 @@
 ; CHECK-NEXT:       RISC-V Pre-RA pseudo instruction expansion pass
 ; CHECK-NEXT:       RISC-V Insert VSETVLI pass
 ; CHECK-NEXT:       RISC-V Insert Read/Write CSR Pass
+; CHECK-NEXT:       RISC-V Insert Write VXRM Pass
 ; CHECK-NEXT:       RISC-V init undef pass
 ; CHECK-NEXT:       Eliminate PHI nodes for register allocation
 ; CHECK-NEXT:       Two-Address instruction pass
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index cf0826096bd41f8..f1d56e9e18e2bae 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -110,6 +110,7 @@
 ; CHECK-NEXT:       RISC-V Insert VSETVLI pass
 ; CHECK-NEXT:       RISC-V Dead register definitions
 ; CHECK-NEXT:       RISC-V Insert Read/Write CSR Pass
+; CHECK-NEXT:       RISC-V Insert Write VXRM Pass
 ; CHECK-NEXT:       Detect Dead Lanes
 ; CHECK-NEXT:       RISC-V init undef pass
 ; CHECK-NEXT:       Process Implicit Definitions
diff --git a/llvm/test/CodeGen/RISCV/rvv/copyprop.mir b/llvm/test/CodeGen/RISCV/rvv/copyprop.mir
index eb4c8bfdd67f9a6..7c9a943bd50b662 100644
--- a/llvm/test/CodeGen/RISCV/rvv/copyprop.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/copyprop.mir
@@ -5,11 +5,11 @@
   define void @foo() {
   ; CHECK-LABEL: foo:
   ; CHECK:       # %bb.0: # %entry
+  ; CHECK-NEXT:    csrwi vxrm, 0
   ; CHECK-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
   ; CHECK-NEXT:    vmsne.vi v0, v8, 0
   ; CHECK-NEXT:    vsll.vi v9, v8, 5
   ; CHECK-NEXT:    vmerge.vim v9, v9, -1, v0
-  ; CHECK-NEXT:    csrwi vxrm, 0
   ; CHECK-NEXT:    vssra.vi v8, v8, 2
   ; CHECK-NEXT:    bgeu a0, zero, .LBB0_3
   ; CHECK-NEXT:  # %bb.1: # %entry
diff --git a/llvm/test/CodeGen/RISCV/rvv/masked-tama.ll b/llvm/test/CodeGen/RISCV/rvv/ma...
[truncated]

Copy link
Collaborator

@rofirrim rofirrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I admit dataflow isn't always my forte and I had to reach my compiler books for the anticipation part :)

The approach looks sensible to me within limits already acknowledged in the notes and fixmes.

I fail to see a test in vxrm-insert.ll that shows the redundant write case caused by a critical edge. If there isn't one, would it make sense to add it?

}

if (NeedInsert) {
BuildMI(MBB, MBB.getFirstNonPHI(), DebugLoc(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You emit a debug message for the other insertions, perhaps it makes sense to do it here too?

// Insert VXRM write if anticipated and not available.
if (BBInfo.AnticipatedIn.isStatic()) {
bool NeedInsert = false;
// If there no predecessors and the value is anticipated, insert.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "If there are no…"

@topperc
Copy link
Collaborator Author

topperc commented Oct 27, 2023

LGTM. I admit dataflow isn't always my forte and I had to reach my compiler books for the anticipation part :)

The approach looks sensible to me within limits already acknowledged in the notes and fixmes.

Thanks for the review.

I fail to see a test in vxrm-insert.ll that shows the redundant write case caused by a critical edge. If there isn't one, would it make sense to add it?

I will add one.

Copy link
Member

@kito-cheng kito-cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question in my mind is should we keep naive one for O0?

// Insert VXRM write if anticipated and not available.
if (BBInfo.AnticipatedIn.isStatic()) {
bool NeedInsert = false;
// If there no predecessors and the value is anticipated, insert.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is by definition an unreachable block, so why do we need to insert anything?

Ah, this can validly be the function entry block. Maybe check for that explicitly?

continue;
}

if (MI.getOpcode() == RISCV::WriteVXRMImm) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is missing from computeVXRMChanges. You really want that code to be in sync. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect a WriteVXRMImm to exist before this pass runs, but I'll add it. It's needed here because we can insert a WriteVXRM at the beginning of the block before we start this loop.


VXRMInfo Info = BBInfo.AvailableIn;

// Insert VXRM write if anticipated and not available.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The eager insert at beginning of the block causes a bunch of spurious test diffs. The usual problem of "maybe there's something not spurious" lurking in their applies.

I think you could compute the start value, but defer the actual emission to first instruction using vxrm and get a lot fewer diffs in the existing test files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be not be a using instruction in the block since the algorithm calculate the earliest insertion point. So we'd still need to emit at the end in the worst case.

I'll try it.

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

; CHECK-NEXT: csrwi vxrm, 0
; CHECK-NEXT: beqz a1, .LBB5_2
; CHECK-NEXT: # %bb.1: # %trueblock
; CHECK-NEXT: vsetvli zero, a0, e8, mf8, ta, ma
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pure aside, but this test case demonstrates that we could probably benefit from an anticipation based extension to vsetvli insertion, and maybe a more aggressive flattening or tail commoning.

@topperc topperc merged commit 014390d into llvm:main Nov 2, 2023
@topperc topperc deleted the pr/vxrm-pre branch November 2, 2023 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants