Skip to content

[AMDGPU] Create an AMDGPUIfConverter pass #106415

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

jmmartinez
Copy link
Contributor

This is a draft to get some early feedback about this work.

Currently the code is mined with TODOs. Some thing that are missing:

  • Cost model - (shouldConvert function)
  • Factorize more code from EarlyIfConverter since the 3 passes share a lot in common
  • Proper implementation of getReversedVCMPXOpcode
  • Do not remove EarlyIfConverter from the pipeline
  • Test! Test! Test!
  • Handle more conditionals
  • The needsPredication function is probably wrong

This work is related to SWDEV-477895.

@llvmbot
Copy link
Member

llvmbot commented Aug 28, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Juan Manuel Martinez Caamaño (jmmartinez)

Changes

This is a draft to get some early feedback about this work.

Currently the code is mined with TODOs. Some thing that are missing:

  • Cost model - (shouldConvert function)
  • Factorize more code from EarlyIfConverter since the 3 passes share a lot in common
  • Proper implementation of getReversedVCMPXOpcode
  • Do not remove EarlyIfConverter from the pipeline
  • Test! Test! Test!
  • Handle more conditionals
  • The needsPredication function is probably wrong

This work is related to SWDEV-477895.


Patch is 70.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/106415.diff

9 Files Affected:

  • (added) llvm/include/llvm/CodeGen/SSAIfConv.h (+170)
  • (modified) llvm/lib/CodeGen/CMakeLists.txt (+1)
  • (modified) llvm/lib/CodeGen/EarlyIfConversion.cpp (+100-706)
  • (added) llvm/lib/CodeGen/SSAIfConv.cpp (+481)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+3)
  • (added) llvm/lib/Target/AMDGPU/AMDGPUIfConverter.cpp (+279)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+2-8)
  • (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-if-cvt.ll (+65)
diff --git a/llvm/include/llvm/CodeGen/SSAIfConv.h b/llvm/include/llvm/CodeGen/SSAIfConv.h
new file mode 100644
index 00000000000000..9bba2f69ad4718
--- /dev/null
+++ b/llvm/include/llvm/CodeGen/SSAIfConv.h
@@ -0,0 +1,170 @@
+#ifndef LLVM_SSA_IF_CONV_H
+#define LLVM_SSA_IF_CONV_H
+
+#include "llvm/ADT/SparseSet.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+
+//===----------------------------------------------------------------------===//
+//                                 SSAIfConv
+//===----------------------------------------------------------------------===//
+//
+// The SSAIfConv class performs if-conversion on SSA form machine code after
+// determining if it is possible. The class contains no heuristics; external
+// code should be used to determine when if-conversion is a good idea.
+//
+// SSAIfConv can convert both triangles and diamonds:
+//
+//   Triangle: Head              Diamond: Head
+//              | \                       /  \_
+//              |  \                     /    |
+//              |  [TF]BB              FBB    TBB
+//              |  /                     \    /
+//              | /                       \  /
+//             Tail                       Tail
+//
+// Instructions in the conditional blocks TBB and/or FBB are spliced into the
+// Head block, and phis in the Tail block are converted to select instructions.
+//
+namespace llvm {
+class SSAIfConv;
+
+namespace ifcvt {
+struct PredicationStrategy {
+  virtual bool canConvertIf(MachineBasicBlock *Head, MachineBasicBlock *TBB,
+                            MachineBasicBlock *FBB, MachineBasicBlock *Tail,
+                            ArrayRef<MachineOperand> Cond) {
+    return true;
+  }
+  virtual bool canPredicate(const MachineInstr &I) = 0;
+  virtual bool predicateBlock(MachineBasicBlock *Succ,
+                              ArrayRef<MachineOperand> Cond, bool Reverse) = 0;
+  virtual ~PredicationStrategy() = default;
+};
+} // namespace ifcvt
+
+class SSAIfConv {
+  const TargetInstrInfo *TII;
+  const TargetRegisterInfo *TRI;
+  MachineRegisterInfo *MRI;
+
+  // TODO INITIALIZE
+  const char *DEBUG_TYPE;
+  unsigned BlockInstrLimit;
+  bool Stress;
+
+  struct Statistics {
+    unsigned NumTrianglesSeen = 0;
+    unsigned NumDiamondsSeen = 0;
+    unsigned NumTrianglesConv = 0;
+    unsigned NumDiamondsConv = 0;
+  };
+
+  Statistics S;
+
+public:
+  SSAIfConv(const char *DEBUG_TYPE, unsigned BlockInstrLimit, bool Stress)
+      : DEBUG_TYPE(DEBUG_TYPE), BlockInstrLimit(BlockInstrLimit),
+        Stress(Stress) {}
+
+  template <typename CounterTy>
+  void updateStatistics(CounterTy &NumDiamondsSeen, CounterTy &NumDiamondsConv,
+                        CounterTy &NumTrianglesSeen,
+                        CounterTy &NumTrianglesConv) const {
+    NumDiamondsSeen += S.NumDiamondsSeen;
+    NumDiamondsConv += S.NumDiamondsConv;
+    NumTrianglesSeen += S.NumTrianglesSeen;
+    NumTrianglesConv += S.NumTrianglesConv;
+  }
+
+  /// The block containing the conditional branch.
+  MachineBasicBlock *Head;
+
+  /// The block containing phis after the if-then-else.
+  MachineBasicBlock *Tail;
+
+  /// The 'true' conditional block as determined by analyzeBranch.
+  MachineBasicBlock *TBB;
+
+  /// The 'false' conditional block as determined by analyzeBranch.
+  MachineBasicBlock *FBB;
+
+  /// isTriangle - When there is no 'else' block, either TBB or FBB will be
+  /// equal to Tail.
+  bool isTriangle() const { return TBB == Tail || FBB == Tail; }
+
+  /// Returns the Tail predecessor for the True side.
+  MachineBasicBlock *getTPred() const { return TBB == Tail ? Head : TBB; }
+
+  /// Returns the Tail predecessor for the  False side.
+  MachineBasicBlock *getFPred() const { return FBB == Tail ? Head : FBB; }
+
+  /// Information about each phi in the Tail block.
+  struct PHIInfo {
+    MachineInstr *PHI;
+    unsigned TReg = 0, FReg = 0;
+    // Latencies from Cond+Branch, TReg, and FReg to DstReg.
+    int CondCycles = 0, TCycles = 0, FCycles = 0;
+
+    PHIInfo(MachineInstr *phi) : PHI(phi) {}
+  };
+
+  SmallVector<PHIInfo, 8> PHIs;
+
+  /// The branch condition determined by analyzeBranch.
+  SmallVector<MachineOperand, 4> Cond;
+
+private:
+  /// Instructions in Head that define values used by the conditional blocks.
+  /// The hoisted instructions must be inserted after these instructions.
+  SmallPtrSet<MachineInstr *, 8> InsertAfter;
+
+  /// Register units clobbered by the conditional blocks.
+  BitVector ClobberedRegUnits;
+
+  // Scratch pad for findInsertionPoint.
+  SparseSet<unsigned> LiveRegUnits;
+
+  /// Insertion point in Head for speculatively executed instructions form TBB
+  /// and FBB.
+  MachineBasicBlock::iterator InsertionPoint;
+
+  /// Return true if all non-terminator instructions in MBB can be safely
+  /// predicated.
+  bool canPredicateInstrs(MachineBasicBlock *MBB,
+                          ifcvt::PredicationStrategy &Predicate);
+
+  /// Scan through instruction dependencies and update InsertAfter array.
+  /// Return false if any dependency is incompatible with if conversion.
+  bool InstrDependenciesAllowIfConv(MachineInstr *I);
+
+  /// Predicate all instructions of the basic block with current condition
+  /// except for terminators. Reverse the condition if ReversePredicate is set.
+  void PredicateBlock(MachineBasicBlock *MBB, bool ReversePredicate);
+
+  /// Find a valid insertion point in Head.
+  bool findInsertionPoint();
+
+  /// Replace PHI instructions in Tail with selects.
+  void replacePHIInstrs();
+
+  /// Insert selects and rewrite PHI operands to use them.
+  void rewritePHIOperands();
+
+public:
+  /// runOnMachineFunction - Initialize per-function data structures.
+  void runOnMachineFunction(MachineFunction &MF);
+
+  /// canConvertIf - If the sub-CFG headed by MBB can be if-converted,
+  /// initialize the internal state, and return true.
+  /// If predicate is set try to predicate the block otherwise try to
+  /// speculatively execute it.
+  bool canConvertIf(MachineBasicBlock *MBB, ifcvt::PredicationStrategy &S);
+
+  /// convertIf - If-convert the last block passed to canConvertIf(), assuming
+  /// it is possible. Add any blocks that are to be erased to RemoveBlocks.
+  void convertIf(SmallVectorImpl<MachineBasicBlock *> &RemoveBlocks,
+                 ifcvt::PredicationStrategy &S);
+};
+} // namespace llvm
+
+#endif
diff --git a/llvm/lib/CodeGen/CMakeLists.txt b/llvm/lib/CodeGen/CMakeLists.txt
index f1607f85c5b319..c322ca975170e7 100644
--- a/llvm/lib/CodeGen/CMakeLists.txt
+++ b/llvm/lib/CodeGen/CMakeLists.txt
@@ -190,6 +190,7 @@ add_llvm_component_library(LLVMCodeGen
   RegisterCoalescer.cpp
   RegisterPressure.cpp
   RegisterScavenging.cpp
+  SSAIfConv.cpp
   GCEmptyBasicBlocks.cpp
   RemoveRedundantDebugValues.cpp
   RenameIndependentSubregs.cpp
diff --git a/llvm/lib/CodeGen/EarlyIfConversion.cpp b/llvm/lib/CodeGen/EarlyIfConversion.cpp
index 0de8112fb72c89..27df4dea2c3a68 100644
--- a/llvm/lib/CodeGen/EarlyIfConversion.cpp
+++ b/llvm/lib/CodeGen/EarlyIfConversion.cpp
@@ -18,7 +18,6 @@
 #include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/SmallPtrSet.h"
-#include "llvm/ADT/SparseSet.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
@@ -30,6 +29,7 @@
 #include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/MachineTraceMetrics.h"
+#include "llvm/CodeGen/SSAIfConv.h"
 #include "llvm/CodeGen/TargetInstrInfo.h"
 #include "llvm/CodeGen/TargetRegisterInfo.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
@@ -57,705 +57,6 @@ STATISTIC(NumDiamondsConv,  "Number of diamonds converted");
 STATISTIC(NumTrianglesSeen, "Number of triangles");
 STATISTIC(NumTrianglesConv, "Number of triangles converted");
 
-//===----------------------------------------------------------------------===//
-//                                 SSAIfConv
-//===----------------------------------------------------------------------===//
-//
-// The SSAIfConv class performs if-conversion on SSA form machine code after
-// determining if it is possible. The class contains no heuristics; external
-// code should be used to determine when if-conversion is a good idea.
-//
-// SSAIfConv can convert both triangles and diamonds:
-//
-//   Triangle: Head              Diamond: Head
-//              | \                       /  \_
-//              |  \                     /    |
-//              |  [TF]BB              FBB    TBB
-//              |  /                     \    /
-//              | /                       \  /
-//             Tail                       Tail
-//
-// Instructions in the conditional blocks TBB and/or FBB are spliced into the
-// Head block, and phis in the Tail block are converted to select instructions.
-//
-namespace {
-class SSAIfConv {
-  const TargetInstrInfo *TII;
-  const TargetRegisterInfo *TRI;
-  MachineRegisterInfo *MRI;
-
-public:
-  /// The block containing the conditional branch.
-  MachineBasicBlock *Head;
-
-  /// The block containing phis after the if-then-else.
-  MachineBasicBlock *Tail;
-
-  /// The 'true' conditional block as determined by analyzeBranch.
-  MachineBasicBlock *TBB;
-
-  /// The 'false' conditional block as determined by analyzeBranch.
-  MachineBasicBlock *FBB;
-
-  /// isTriangle - When there is no 'else' block, either TBB or FBB will be
-  /// equal to Tail.
-  bool isTriangle() const { return TBB == Tail || FBB == Tail; }
-
-  /// Returns the Tail predecessor for the True side.
-  MachineBasicBlock *getTPred() const { return TBB == Tail ? Head : TBB; }
-
-  /// Returns the Tail predecessor for the  False side.
-  MachineBasicBlock *getFPred() const { return FBB == Tail ? Head : FBB; }
-
-  /// Information about each phi in the Tail block.
-  struct PHIInfo {
-    MachineInstr *PHI;
-    unsigned TReg = 0, FReg = 0;
-    // Latencies from Cond+Branch, TReg, and FReg to DstReg.
-    int CondCycles = 0, TCycles = 0, FCycles = 0;
-
-    PHIInfo(MachineInstr *phi) : PHI(phi) {}
-  };
-
-  SmallVector<PHIInfo, 8> PHIs;
-
-  /// The branch condition determined by analyzeBranch.
-  SmallVector<MachineOperand, 4> Cond;
-
-private:
-  /// Instructions in Head that define values used by the conditional blocks.
-  /// The hoisted instructions must be inserted after these instructions.
-  SmallPtrSet<MachineInstr*, 8> InsertAfter;
-
-  /// Register units clobbered by the conditional blocks.
-  BitVector ClobberedRegUnits;
-
-  // Scratch pad for findInsertionPoint.
-  SparseSet<unsigned> LiveRegUnits;
-
-  /// Insertion point in Head for speculatively executed instructions form TBB
-  /// and FBB.
-  MachineBasicBlock::iterator InsertionPoint;
-
-  /// Return true if all non-terminator instructions in MBB can be safely
-  /// speculated.
-  bool canSpeculateInstrs(MachineBasicBlock *MBB);
-
-  /// Return true if all non-terminator instructions in MBB can be safely
-  /// predicated.
-  bool canPredicateInstrs(MachineBasicBlock *MBB);
-
-  /// Scan through instruction dependencies and update InsertAfter array.
-  /// Return false if any dependency is incompatible with if conversion.
-  bool InstrDependenciesAllowIfConv(MachineInstr *I);
-
-  /// Predicate all instructions of the basic block with current condition
-  /// except for terminators. Reverse the condition if ReversePredicate is set.
-  void PredicateBlock(MachineBasicBlock *MBB, bool ReversePredicate);
-
-  /// Find a valid insertion point in Head.
-  bool findInsertionPoint();
-
-  /// Replace PHI instructions in Tail with selects.
-  void replacePHIInstrs();
-
-  /// Insert selects and rewrite PHI operands to use them.
-  void rewritePHIOperands();
-
-public:
-  /// runOnMachineFunction - Initialize per-function data structures.
-  void runOnMachineFunction(MachineFunction &MF) {
-    TII = MF.getSubtarget().getInstrInfo();
-    TRI = MF.getSubtarget().getRegisterInfo();
-    MRI = &MF.getRegInfo();
-    LiveRegUnits.clear();
-    LiveRegUnits.setUniverse(TRI->getNumRegUnits());
-    ClobberedRegUnits.clear();
-    ClobberedRegUnits.resize(TRI->getNumRegUnits());
-  }
-
-  /// canConvertIf - If the sub-CFG headed by MBB can be if-converted,
-  /// initialize the internal state, and return true.
-  /// If predicate is set try to predicate the block otherwise try to
-  /// speculatively execute it.
-  bool canConvertIf(MachineBasicBlock *MBB, bool Predicate = false);
-
-  /// convertIf - If-convert the last block passed to canConvertIf(), assuming
-  /// it is possible. Add any blocks that are to be erased to RemoveBlocks.
-  void convertIf(SmallVectorImpl<MachineBasicBlock *> &RemoveBlocks,
-                 bool Predicate = false);
-};
-} // end anonymous namespace
-
-
-/// canSpeculateInstrs - Returns true if all the instructions in MBB can safely
-/// be speculated. The terminators are not considered.
-///
-/// If instructions use any values that are defined in the head basic block,
-/// the defining instructions are added to InsertAfter.
-///
-/// Any clobbered regunits are added to ClobberedRegUnits.
-///
-bool SSAIfConv::canSpeculateInstrs(MachineBasicBlock *MBB) {
-  // Reject any live-in physregs. It's probably CPSR/EFLAGS, and very hard to
-  // get right.
-  if (!MBB->livein_empty()) {
-    LLVM_DEBUG(dbgs() << printMBBReference(*MBB) << " has live-ins.\n");
-    return false;
-  }
-
-  unsigned InstrCount = 0;
-
-  // Check all instructions, except the terminators. It is assumed that
-  // terminators never have side effects or define any used register values.
-  for (MachineInstr &MI :
-       llvm::make_range(MBB->begin(), MBB->getFirstTerminator())) {
-    if (MI.isDebugInstr())
-      continue;
-
-    if (++InstrCount > BlockInstrLimit && !Stress) {
-      LLVM_DEBUG(dbgs() << printMBBReference(*MBB) << " has more than "
-                        << BlockInstrLimit << " instructions.\n");
-      return false;
-    }
-
-    // There shouldn't normally be any phis in a single-predecessor block.
-    if (MI.isPHI()) {
-      LLVM_DEBUG(dbgs() << "Can't hoist: " << MI);
-      return false;
-    }
-
-    // Don't speculate loads. Note that it may be possible and desirable to
-    // speculate GOT or constant pool loads that are guaranteed not to trap,
-    // but we don't support that for now.
-    if (MI.mayLoad()) {
-      LLVM_DEBUG(dbgs() << "Won't speculate load: " << MI);
-      return false;
-    }
-
-    // We never speculate stores, so an AA pointer isn't necessary.
-    bool DontMoveAcrossStore = true;
-    if (!MI.isSafeToMove(DontMoveAcrossStore)) {
-      LLVM_DEBUG(dbgs() << "Can't speculate: " << MI);
-      return false;
-    }
-
-    // Check for any dependencies on Head instructions.
-    if (!InstrDependenciesAllowIfConv(&MI))
-      return false;
-  }
-  return true;
-}
-
-/// Check that there is no dependencies preventing if conversion.
-///
-/// If instruction uses any values that are defined in the head basic block,
-/// the defining instructions are added to InsertAfter.
-bool SSAIfConv::InstrDependenciesAllowIfConv(MachineInstr *I) {
-  for (const MachineOperand &MO : I->operands()) {
-    if (MO.isRegMask()) {
-      LLVM_DEBUG(dbgs() << "Won't speculate regmask: " << *I);
-      return false;
-    }
-    if (!MO.isReg())
-      continue;
-    Register Reg = MO.getReg();
-
-    // Remember clobbered regunits.
-    if (MO.isDef() && Reg.isPhysical())
-      for (MCRegUnit Unit : TRI->regunits(Reg.asMCReg()))
-        ClobberedRegUnits.set(Unit);
-
-    if (!MO.readsReg() || !Reg.isVirtual())
-      continue;
-    MachineInstr *DefMI = MRI->getVRegDef(Reg);
-    if (!DefMI || DefMI->getParent() != Head)
-      continue;
-    if (InsertAfter.insert(DefMI).second)
-      LLVM_DEBUG(dbgs() << printMBBReference(*I->getParent()) << " depends on "
-                        << *DefMI);
-    if (DefMI->isTerminator()) {
-      LLVM_DEBUG(dbgs() << "Can't insert instructions below terminator.\n");
-      return false;
-    }
-  }
-  return true;
-}
-
-/// canPredicateInstrs - Returns true if all the instructions in MBB can safely
-/// be predicates. The terminators are not considered.
-///
-/// If instructions use any values that are defined in the head basic block,
-/// the defining instructions are added to InsertAfter.
-///
-/// Any clobbered regunits are added to ClobberedRegUnits.
-///
-bool SSAIfConv::canPredicateInstrs(MachineBasicBlock *MBB) {
-  // Reject any live-in physregs. It's probably CPSR/EFLAGS, and very hard to
-  // get right.
-  if (!MBB->livein_empty()) {
-    LLVM_DEBUG(dbgs() << printMBBReference(*MBB) << " has live-ins.\n");
-    return false;
-  }
-
-  unsigned InstrCount = 0;
-
-  // Check all instructions, except the terminators. It is assumed that
-  // terminators never have side effects or define any used register values.
-  for (MachineBasicBlock::iterator I = MBB->begin(),
-                                   E = MBB->getFirstTerminator();
-       I != E; ++I) {
-    if (I->isDebugInstr())
-      continue;
-
-    if (++InstrCount > BlockInstrLimit && !Stress) {
-      LLVM_DEBUG(dbgs() << printMBBReference(*MBB) << " has more than "
-                        << BlockInstrLimit << " instructions.\n");
-      return false;
-    }
-
-    // There shouldn't normally be any phis in a single-predecessor block.
-    if (I->isPHI()) {
-      LLVM_DEBUG(dbgs() << "Can't predicate: " << *I);
-      return false;
-    }
-
-    // Check that instruction is predicable
-    if (!TII->isPredicable(*I)) {
-      LLVM_DEBUG(dbgs() << "Isn't predicable: " << *I);
-      return false;
-    }
-
-    // Check that instruction is not already predicated.
-    if (TII->isPredicated(*I) && !TII->canPredicatePredicatedInstr(*I)) {
-      LLVM_DEBUG(dbgs() << "Is already predicated: " << *I);
-      return false;
-    }
-
-    // Check for any dependencies on Head instructions.
-    if (!InstrDependenciesAllowIfConv(&(*I)))
-      return false;
-  }
-  return true;
-}
-
-// Apply predicate to all instructions in the machine block.
-void SSAIfConv::PredicateBlock(MachineBasicBlock *MBB, bool ReversePredicate) {
-  auto Condition = Cond;
-  if (ReversePredicate) {
-    bool CanRevCond = !TII->reverseBranchCondition(Condition);
-    assert(CanRevCond && "Reversed predicate is not supported");
-    (void)CanRevCond;
-  }
-  // Terminators don't need to be predicated as they will be removed.
-  for (MachineBasicBlock::iterator I = MBB->begin(),
-                                   E = MBB->getFirstTerminator();
-       I != E; ++I) {
-    if (I->isDebugInstr())
-      continue;
-    TII->PredicateInstruction(*I, Condition);
-  }
-}
-
-/// Find an insertion point in Head for the speculated instructions. The
-/// insertion point must be:
-///
-/// 1. Before any terminators.
-/// 2. After any instructions in InsertAfter.
-/// 3. Not have any clobbered regunits live.
-///
-/// This function sets InsertionPoint and returns true when successful, it
-/// returns false if no valid insertion point could be found.
-///
-bool SSAIfConv::findInsertionPoint() {
-  // Keep track of live regunits before the current position.
-  // Only track RegUnits that are also in ClobberedRegUnits.
-  LiveRegUnits.clear();
-  SmallVector<MCRegister, 8> Reads;
-  MachineBasicBlock::iterator FirstTerm = Head->getFirstTerminator();
-  MachineBasicBlock::iterator I = Head->end();
-  MachineBasicBlock::iterator B = Head->begin();
-  while (I != B) {
-    --I;
-    // Some of the conditional code depends in I.
-    if (InsertAfter.count(&*I)) {
-      LLVM_DEBUG(dbgs() << "Can't insert code after " << *I);
-      return false;
-    }
-
-    // Update live regunits.
-    for (const MachineOperand &MO : I->operands()) {
-      // We're ignoring regmask operands. That is conservatively correct.
-      if (!MO.isReg())
-        continue;
-      Register Reg = MO.getReg();
-      if (!Reg.isPhysical())
-        continue;
-      // I clobbers Reg, so it isn't live before I.
-      if (MO.isDef())
-        for (MCRegUnit Unit : TRI->regunits(Reg.asMCReg()))
-          LiveRegUnits.erase(Unit);
-      // Unless I reads Reg.
-      if (MO.readsReg())
-        Reads.push_back(Reg.asMCReg());
-    }
-    // Anything read by I is live before I.
-    while (!Reads.empty())
-      for (MCRegUnit Unit : TRI->regunits(Reads.pop_back_val()))
-        if (ClobberedRegUnits.test(Unit))...
[truncated]

@jmmartinez jmmartinez self-assigned this Aug 28, 2024
static cl::opt<bool>
EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
cl::desc("Run early if-conversion"),
cl::init(false));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect most if conversion to be done on the IR level. What cases do we need more aggressive MIR conversion for?

Can this be done in 2 steps? One showing the benefit of turning this one, and then another with the replacement pass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one main concern about doing this at IR level: How to handle instructions accessing memory.
I guess it could be done through masked load/store operations. But what if we have some other built-in?

I'm also not sure about how to implement the cost model in the LLVM-IR, but I guess it could be possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: For the moment this pass hasn't covered the cases the early-ifcvt pass covers. I should add it back.


namespace {
#define DEBUG_TYPE "amdgpu-if-cvt"
const char PassName[] = "AMDGPU if conversion";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the vpassname variable

#define DEBUG_TYPE "amdgpu-if-cvt"
const char PassName[] = "AMDGPU if conversion";

class AMDGPUIfConverter : public MachineFunctionPass {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well start with this factored to be friendly to new PM

INITIALIZE_PASS_BEGIN(AMDGPUIfConverter, DEBUG_TYPE, PassName, false, false)
INITIALIZE_PASS_DEPENDENCY(MachineDominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfoWrapperPass)
INITIALIZE_PASS_END(AMDGPUIfConverter, DEBUG_TYPE, PassName, false, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing line at end of file

@@ -0,0 +1,65 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need -verify-machineinstrs

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

define amdgpu_kernel void @scalar_cmp(i32 noundef %value, ptr addrspace(8) nocapture writeonly %res, i32 noundef %v_offset, i32 noundef %0, i32 noundef %flag) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer regular functions with inreg arguments for SGPRS over kernels. Also precommit tests

@nvjle
Copy link
Contributor

nvjle commented Aug 31, 2024

I realize this is a draft patch, but this refactoring of the early if-conversion code is a great idea!

As this matures, does it makes sense to separate the target-independent portion into an isolated patch? Or at least provide a more generic title so other downstream users don't overlook the target-independent changes.

FWIW, I'm using it downstream in an NVIDIA backend-- specifically, the EarlyIfPredicator-- and I know of at least one other downstream backend using it. I'd be happy to review the target-independent code and try it locally on our tree. I believe the EarlyIfPredicator is not currently used in-tree , but the EarlyIfConverter is.

Thanks again for spearheading this work.

@jayfoad
Copy link
Contributor

jayfoad commented Sep 2, 2024

Could you always generate v_cmp in this pass, and rely on SIOptimizeExecMasking::tryRecordVCmpxAndSaveexecSequence to convert it to v_cmpx if it's profitable? On GFX10.1 it's generally not a good idea to use v_cmpx because FeatureVcmpxExecWARHazard means you have to insert an s_wait_depctr instruction before it, which negates any performance imrpovement.

@jmmartinez
Copy link
Contributor Author

jmmartinez commented Sep 3, 2024

As this matures, does it makes sense to separate the target-independent portion into an isolated patch? Or at least provide a more generic title so other downstream users don't overlook the target-independent changes.

Yes, I will do that. I wanted to get some quick validation of this work before being dug in.

I think I'm going to start by:

  1. moving EarlyIfConverter and EarlyIfPredicator to the new pass manager. -> not really needed. This can be done independently of these modifications
  2. factorizing the boolean argument into the PredicationStrategy -> done
  3. move the shouldIfConvert into the PredicationStrategy -> done
  4. move the post-dominator tree traversal and the tryIfConvert function into SSAIfConv. -> done

Only then I can start with the AMDGPU target specific stuff.

Thanks for the feedback !

@jmmartinez
Copy link
Contributor Author

@jayfoad @arsenm

If Conversion update

I'm having second thoughts on the implementation of this optimization as a stand-alone optimization, instead of spreading the transformation across other passes.

Problems

As it is now, this pass doesn't compose nicely with SIOptimizeExecMasking:
To promote vcmp+s_and_saveexec into vcmpx, there should be no other instruction modifying exec after the comparison.
This collides with the restauration of exec that we have to do at the end of the Then block.
We can work around this by generating directly the right vcmpx in this transformation directly.

When I try to generalize this pass from s_cbranch_scc0/1 to also support s_cbranch_execz/nz:

  • Depending on where I put the pass in the pipeline, exec branches may appear as SI_IF. To overcome this, we have to do this transformation after SILowerControlFlow.
  • The first terminator of the basic-block may not be the s_cbranch_execz/nz, but the s_mov_b32_term before the s_cbranch.
    This doesn't play well with the SSAIfConv pass that drops the old terminators when it moves the instructions from the Then block into the Head.
    For example:
    $exec_lo = S_MOV_B32_term killed %32:sreg_32 <- the terminator starts here
    S_CBRANCH_EXECZ %bb.2, implicit $exec
    S_BRANCH %bb.1

Alternative solution

I'm thinking that it would be better to spread this transformation in 3 parts:

  1. Transform s_cbranch_scc0/1 into vcmp + s_and_saveexec + s_cbranch_execz
  2. Let SIOptimizeExecMasking optimize vcmp + s_and_saveexec pairs into vcmpx if applicable
  3. If-Convert s_cbranch_execz

1. Transform s_cbranch_scc0/1 example

Transform this:

bb.0.entry:
  ...
  S_CMP_LT_I32 killed %14:sgpr_32, 1, implicit-def $scc
  S_CBRANCH_SCC1 %bb.2, implicit killed $scc
  S_BRANCH %bb.1

bb.1.if.then:
  ...

bb.2.if.end:
  ...

Into this:

bb.0.entry:
  ...
  $vcc_lo = V_CMP_LT_I32_e64 killed %14:sgpr_32, 1, implicit $exec
  28:sreg_32 = S_AND_SAVEEXEC_B32 killed $vcc_lo, implicit-def $exec, implicit-def dead $scc, implicit $exec # backup exec and mask it
  S_CBRANCH_EXECZ %bb.2, implicit killed $scc
  S_BRANCH %bb.1

bb.1.if.then:
  ...

bb.2.if.end:
  $exec_lo = S_MOV_B32 killed %28:sreg_32 # restore exec
  ...

For s_cbranch_scc0 we would have to reverse the comparison.

3. If-Convert s_cbranch_execz example

Transform this:

bb.0.entry:
  ...
  S_CBRANCH_EXECZ %bb.2, implicit killed $scc
  S_BRANCH %bb.1

bb.1.if.then:
  ..Then..

bb.2.if.end:
  ..End..

Into this:

bb.0.entry:
  ...
  ..Then..
  ..End..

Keep in mind that the instructions in the Then block should be either speculatable or masked by exec, and should not modify exec.

Where to go from here?

I'd appreciate some feedback if you think this is a better/worse idea; where to place this transformations (e.g. some existing pass or where in the pipeline); other potential issues that I've might have missed.

@jayfoad
Copy link
Contributor

jayfoad commented Sep 18, 2024

What are you trying to achieve? Do you have a motivating example? The patch description only says what is missing from the patch.

@jmmartinez
Copy link
Contributor Author

@jayfoad An extract of the motivating example is this code: https://godbolt.org/z/dPdf7r381 .

This extract comes from SWDEV-477895, where CK used inline-assembly to work around code that doesn't optimize to a v_cmpx instruction.

@jayfoad
Copy link
Contributor

jayfoad commented Sep 18, 2024

An extract of the motivating example is this code: https://godbolt.org/z/dPdf7r381 .

        s_cmp_lt_i32 s0, 1
        s_cbranch_scc1 .LBB0_2
        s_clause 0x1
        s_load_dwordx2 s[6:7], s[4:5], 0x0
        s_load_dwordx4 s[0:3], s[4:5], 0x10
        s_waitcnt lgkmcnt(0)
        s_load_dword s6, s[6:7], 0x0
        s_load_dword s4, s[4:5], 0x20
        s_waitcnt lgkmcnt(0)
        v_mov_b32_e32 v0, s6
        v_mov_b32_e32 v1, s4
        buffer_store_dword v0, v1, s[0:3], 0 offen
.LBB0_2:

So you want to remove the s_cbranch_scc1 and instead rely on setting EXEC to 0, even though the branch condition is uniform? Is that really viable, given that the s_load instructions can't be predicated with EXEC? What codegen would you like to see for this example?

@jmmartinez
Copy link
Contributor Author

Something like this:

        v_mov_b32_e32 exec_backup, exec
        v_cmpx_lt_i32 exec, s0, 1
        s_load_dwordx2 s[6:7], s[4:5], 0x0
        s_load_dwordx4 s[0:3], s[4:5], 0x10
        s_waitcnt lgkmcnt(0)
        s_load_dword s6, s[6:7], 0x0
        s_load_dword s4, s[4:5], 0x20
        s_waitcnt lgkmcnt(0)
        v_mov_b32_e32 v0, s6
        v_mov_b32_e32 v1, s4
        buffer_store_dword v0, v1, s[0:3], 0 offen
        v_mov_b32_e32 exec, exec_backup

Is that really viable, given that the s_load instructions can't be predicated with EXEC?

This depends if the loaded pointers are marked as dereferenceable. If they are we can safely execute them.

@jayfoad
Copy link
Contributor

jayfoad commented Sep 18, 2024

I think this is roughly the codegen you would get if the branch condition was divergent and SIPreEmitPeephole::removeExeczBranch successfully removed the s_cbranch_execz, right? Maybe there is some way to implement this by selectively treating some uniform branch conditions as if they were divergent? (Or to put it another way, handling uniform branches specially is just an optimization, and it should be conservatively correct to handle all branches as if they were divergent. So maybe you could selectively disable that optimization.)

@jmmartinez
Copy link
Contributor Author

Thanks ! I didn't know about SIPreEmitPeephole::removeExeczBranch. I'll try to approach this problem from a different angle.

@jmmartinez
Copy link
Contributor Author

I've started an alternative implementation in main...jmmartinez:llvm-project:amd/if-cvt-v2

I'm starting to submit the patches separately.

These new series of patches consists of 3 parts:

  1. Make StructurizeCFG maintain branch weight metadata (e.g. that comes from the [[likely/unlikely]] user annotations)
  2. Make mustRetainExeczBranch use a cost model based on BranchProbability and the TargetSchedmodel
  3. Implement a demote s_cbranch_scc into vcmp pass (but this transfromation could be placed elsewhere).

@jmmartinez jmmartinez closed this Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants