-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Create an AMDGPUIfConverter pass #106415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Juan Manuel Martinez Caamaño (jmmartinez) ChangesThis is a draft to get some early feedback about this work. Currently the code is mined with TODOs. Some thing that are missing:
This work is related to Patch is 70.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/106415.diff 9 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/SSAIfConv.h b/llvm/include/llvm/CodeGen/SSAIfConv.h
new file mode 100644
index 00000000000000..9bba2f69ad4718
--- /dev/null
+++ b/llvm/include/llvm/CodeGen/SSAIfConv.h
@@ -0,0 +1,170 @@
+#ifndef LLVM_SSA_IF_CONV_H
+#define LLVM_SSA_IF_CONV_H
+
+#include "llvm/ADT/SparseSet.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+
+//===----------------------------------------------------------------------===//
+// SSAIfConv
+//===----------------------------------------------------------------------===//
+//
+// The SSAIfConv class performs if-conversion on SSA form machine code after
+// determining if it is possible. The class contains no heuristics; external
+// code should be used to determine when if-conversion is a good idea.
+//
+// SSAIfConv can convert both triangles and diamonds:
+//
+// Triangle: Head Diamond: Head
+// | \ / \_
+// | \ / |
+// | [TF]BB FBB TBB
+// | / \ /
+// | / \ /
+// Tail Tail
+//
+// Instructions in the conditional blocks TBB and/or FBB are spliced into the
+// Head block, and phis in the Tail block are converted to select instructions.
+//
+namespace llvm {
+class SSAIfConv;
+
+namespace ifcvt {
+struct PredicationStrategy {
+ virtual bool canConvertIf(MachineBasicBlock *Head, MachineBasicBlock *TBB,
+ MachineBasicBlock *FBB, MachineBasicBlock *Tail,
+ ArrayRef<MachineOperand> Cond) {
+ return true;
+ }
+ virtual bool canPredicate(const MachineInstr &I) = 0;
+ virtual bool predicateBlock(MachineBasicBlock *Succ,
+ ArrayRef<MachineOperand> Cond, bool Reverse) = 0;
+ virtual ~PredicationStrategy() = default;
+};
+} // namespace ifcvt
+
+class SSAIfConv {
+ const TargetInstrInfo *TII;
+ const TargetRegisterInfo *TRI;
+ MachineRegisterInfo *MRI;
+
+ // TODO INITIALIZE
+ const char *DEBUG_TYPE;
+ unsigned BlockInstrLimit;
+ bool Stress;
+
+ struct Statistics {
+ unsigned NumTrianglesSeen = 0;
+ unsigned NumDiamondsSeen = 0;
+ unsigned NumTrianglesConv = 0;
+ unsigned NumDiamondsConv = 0;
+ };
+
+ Statistics S;
+
+public:
+ SSAIfConv(const char *DEBUG_TYPE, unsigned BlockInstrLimit, bool Stress)
+ : DEBUG_TYPE(DEBUG_TYPE), BlockInstrLimit(BlockInstrLimit),
+ Stress(Stress) {}
+
+ template <typename CounterTy>
+ void updateStatistics(CounterTy &NumDiamondsSeen, CounterTy &NumDiamondsConv,
+ CounterTy &NumTrianglesSeen,
+ CounterTy &NumTrianglesConv) const {
+ NumDiamondsSeen += S.NumDiamondsSeen;
+ NumDiamondsConv += S.NumDiamondsConv;
+ NumTrianglesSeen += S.NumTrianglesSeen;
+ NumTrianglesConv += S.NumTrianglesConv;
+ }
+
+ /// The block containing the conditional branch.
+ MachineBasicBlock *Head;
+
+ /// The block containing phis after the if-then-else.
+ MachineBasicBlock *Tail;
+
+ /// The 'true' conditional block as determined by analyzeBranch.
+ MachineBasicBlock *TBB;
+
+ /// The 'false' conditional block as determined by analyzeBranch.
+ MachineBasicBlock *FBB;
+
+ /// isTriangle - When there is no 'else' block, either TBB or FBB will be
+ /// equal to Tail.
+ bool isTriangle() const { return TBB == Tail || FBB == Tail; }
+
+ /// Returns the Tail predecessor for the True side.
+ MachineBasicBlock *getTPred() const { return TBB == Tail ? Head : TBB; }
+
+ /// Returns the Tail predecessor for the False side.
+ MachineBasicBlock *getFPred() const { return FBB == Tail ? Head : FBB; }
+
+ /// Information about each phi in the Tail block.
+ struct PHIInfo {
+ MachineInstr *PHI;
+ unsigned TReg = 0, FReg = 0;
+ // Latencies from Cond+Branch, TReg, and FReg to DstReg.
+ int CondCycles = 0, TCycles = 0, FCycles = 0;
+
+ PHIInfo(MachineInstr *phi) : PHI(phi) {}
+ };
+
+ SmallVector<PHIInfo, 8> PHIs;
+
+ /// The branch condition determined by analyzeBranch.
+ SmallVector<MachineOperand, 4> Cond;
+
+private:
+ /// Instructions in Head that define values used by the conditional blocks.
+ /// The hoisted instructions must be inserted after these instructions.
+ SmallPtrSet<MachineInstr *, 8> InsertAfter;
+
+ /// Register units clobbered by the conditional blocks.
+ BitVector ClobberedRegUnits;
+
+ // Scratch pad for findInsertionPoint.
+ SparseSet<unsigned> LiveRegUnits;
+
+ /// Insertion point in Head for speculatively executed instructions form TBB
+ /// and FBB.
+ MachineBasicBlock::iterator InsertionPoint;
+
+ /// Return true if all non-terminator instructions in MBB can be safely
+ /// predicated.
+ bool canPredicateInstrs(MachineBasicBlock *MBB,
+ ifcvt::PredicationStrategy &Predicate);
+
+ /// Scan through instruction dependencies and update InsertAfter array.
+ /// Return false if any dependency is incompatible with if conversion.
+ bool InstrDependenciesAllowIfConv(MachineInstr *I);
+
+ /// Predicate all instructions of the basic block with current condition
+ /// except for terminators. Reverse the condition if ReversePredicate is set.
+ void PredicateBlock(MachineBasicBlock *MBB, bool ReversePredicate);
+
+ /// Find a valid insertion point in Head.
+ bool findInsertionPoint();
+
+ /// Replace PHI instructions in Tail with selects.
+ void replacePHIInstrs();
+
+ /// Insert selects and rewrite PHI operands to use them.
+ void rewritePHIOperands();
+
+public:
+ /// runOnMachineFunction - Initialize per-function data structures.
+ void runOnMachineFunction(MachineFunction &MF);
+
+ /// canConvertIf - If the sub-CFG headed by MBB can be if-converted,
+ /// initialize the internal state, and return true.
+ /// If predicate is set try to predicate the block otherwise try to
+ /// speculatively execute it.
+ bool canConvertIf(MachineBasicBlock *MBB, ifcvt::PredicationStrategy &S);
+
+ /// convertIf - If-convert the last block passed to canConvertIf(), assuming
+ /// it is possible. Add any blocks that are to be erased to RemoveBlocks.
+ void convertIf(SmallVectorImpl<MachineBasicBlock *> &RemoveBlocks,
+ ifcvt::PredicationStrategy &S);
+};
+} // namespace llvm
+
+#endif
diff --git a/llvm/lib/CodeGen/CMakeLists.txt b/llvm/lib/CodeGen/CMakeLists.txt
index f1607f85c5b319..c322ca975170e7 100644
--- a/llvm/lib/CodeGen/CMakeLists.txt
+++ b/llvm/lib/CodeGen/CMakeLists.txt
@@ -190,6 +190,7 @@ add_llvm_component_library(LLVMCodeGen
RegisterCoalescer.cpp
RegisterPressure.cpp
RegisterScavenging.cpp
+ SSAIfConv.cpp
GCEmptyBasicBlocks.cpp
RemoveRedundantDebugValues.cpp
RenameIndependentSubregs.cpp
diff --git a/llvm/lib/CodeGen/EarlyIfConversion.cpp b/llvm/lib/CodeGen/EarlyIfConversion.cpp
index 0de8112fb72c89..27df4dea2c3a68 100644
--- a/llvm/lib/CodeGen/EarlyIfConversion.cpp
+++ b/llvm/lib/CodeGen/EarlyIfConversion.cpp
@@ -18,7 +18,6 @@
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SmallPtrSet.h"
-#include "llvm/ADT/SparseSet.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
@@ -30,6 +29,7 @@
#include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/MachineTraceMetrics.h"
+#include "llvm/CodeGen/SSAIfConv.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
@@ -57,705 +57,6 @@ STATISTIC(NumDiamondsConv, "Number of diamonds converted");
STATISTIC(NumTrianglesSeen, "Number of triangles");
STATISTIC(NumTrianglesConv, "Number of triangles converted");
-//===----------------------------------------------------------------------===//
-// SSAIfConv
-//===----------------------------------------------------------------------===//
-//
-// The SSAIfConv class performs if-conversion on SSA form machine code after
-// determining if it is possible. The class contains no heuristics; external
-// code should be used to determine when if-conversion is a good idea.
-//
-// SSAIfConv can convert both triangles and diamonds:
-//
-// Triangle: Head Diamond: Head
-// | \ / \_
-// | \ / |
-// | [TF]BB FBB TBB
-// | / \ /
-// | / \ /
-// Tail Tail
-//
-// Instructions in the conditional blocks TBB and/or FBB are spliced into the
-// Head block, and phis in the Tail block are converted to select instructions.
-//
-namespace {
-class SSAIfConv {
- const TargetInstrInfo *TII;
- const TargetRegisterInfo *TRI;
- MachineRegisterInfo *MRI;
-
-public:
- /// The block containing the conditional branch.
- MachineBasicBlock *Head;
-
- /// The block containing phis after the if-then-else.
- MachineBasicBlock *Tail;
-
- /// The 'true' conditional block as determined by analyzeBranch.
- MachineBasicBlock *TBB;
-
- /// The 'false' conditional block as determined by analyzeBranch.
- MachineBasicBlock *FBB;
-
- /// isTriangle - When there is no 'else' block, either TBB or FBB will be
- /// equal to Tail.
- bool isTriangle() const { return TBB == Tail || FBB == Tail; }
-
- /// Returns the Tail predecessor for the True side.
- MachineBasicBlock *getTPred() const { return TBB == Tail ? Head : TBB; }
-
- /// Returns the Tail predecessor for the False side.
- MachineBasicBlock *getFPred() const { return FBB == Tail ? Head : FBB; }
-
- /// Information about each phi in the Tail block.
- struct PHIInfo {
- MachineInstr *PHI;
- unsigned TReg = 0, FReg = 0;
- // Latencies from Cond+Branch, TReg, and FReg to DstReg.
- int CondCycles = 0, TCycles = 0, FCycles = 0;
-
- PHIInfo(MachineInstr *phi) : PHI(phi) {}
- };
-
- SmallVector<PHIInfo, 8> PHIs;
-
- /// The branch condition determined by analyzeBranch.
- SmallVector<MachineOperand, 4> Cond;
-
-private:
- /// Instructions in Head that define values used by the conditional blocks.
- /// The hoisted instructions must be inserted after these instructions.
- SmallPtrSet<MachineInstr*, 8> InsertAfter;
-
- /// Register units clobbered by the conditional blocks.
- BitVector ClobberedRegUnits;
-
- // Scratch pad for findInsertionPoint.
- SparseSet<unsigned> LiveRegUnits;
-
- /// Insertion point in Head for speculatively executed instructions form TBB
- /// and FBB.
- MachineBasicBlock::iterator InsertionPoint;
-
- /// Return true if all non-terminator instructions in MBB can be safely
- /// speculated.
- bool canSpeculateInstrs(MachineBasicBlock *MBB);
-
- /// Return true if all non-terminator instructions in MBB can be safely
- /// predicated.
- bool canPredicateInstrs(MachineBasicBlock *MBB);
-
- /// Scan through instruction dependencies and update InsertAfter array.
- /// Return false if any dependency is incompatible with if conversion.
- bool InstrDependenciesAllowIfConv(MachineInstr *I);
-
- /// Predicate all instructions of the basic block with current condition
- /// except for terminators. Reverse the condition if ReversePredicate is set.
- void PredicateBlock(MachineBasicBlock *MBB, bool ReversePredicate);
-
- /// Find a valid insertion point in Head.
- bool findInsertionPoint();
-
- /// Replace PHI instructions in Tail with selects.
- void replacePHIInstrs();
-
- /// Insert selects and rewrite PHI operands to use them.
- void rewritePHIOperands();
-
-public:
- /// runOnMachineFunction - Initialize per-function data structures.
- void runOnMachineFunction(MachineFunction &MF) {
- TII = MF.getSubtarget().getInstrInfo();
- TRI = MF.getSubtarget().getRegisterInfo();
- MRI = &MF.getRegInfo();
- LiveRegUnits.clear();
- LiveRegUnits.setUniverse(TRI->getNumRegUnits());
- ClobberedRegUnits.clear();
- ClobberedRegUnits.resize(TRI->getNumRegUnits());
- }
-
- /// canConvertIf - If the sub-CFG headed by MBB can be if-converted,
- /// initialize the internal state, and return true.
- /// If predicate is set try to predicate the block otherwise try to
- /// speculatively execute it.
- bool canConvertIf(MachineBasicBlock *MBB, bool Predicate = false);
-
- /// convertIf - If-convert the last block passed to canConvertIf(), assuming
- /// it is possible. Add any blocks that are to be erased to RemoveBlocks.
- void convertIf(SmallVectorImpl<MachineBasicBlock *> &RemoveBlocks,
- bool Predicate = false);
-};
-} // end anonymous namespace
-
-
-/// canSpeculateInstrs - Returns true if all the instructions in MBB can safely
-/// be speculated. The terminators are not considered.
-///
-/// If instructions use any values that are defined in the head basic block,
-/// the defining instructions are added to InsertAfter.
-///
-/// Any clobbered regunits are added to ClobberedRegUnits.
-///
-bool SSAIfConv::canSpeculateInstrs(MachineBasicBlock *MBB) {
- // Reject any live-in physregs. It's probably CPSR/EFLAGS, and very hard to
- // get right.
- if (!MBB->livein_empty()) {
- LLVM_DEBUG(dbgs() << printMBBReference(*MBB) << " has live-ins.\n");
- return false;
- }
-
- unsigned InstrCount = 0;
-
- // Check all instructions, except the terminators. It is assumed that
- // terminators never have side effects or define any used register values.
- for (MachineInstr &MI :
- llvm::make_range(MBB->begin(), MBB->getFirstTerminator())) {
- if (MI.isDebugInstr())
- continue;
-
- if (++InstrCount > BlockInstrLimit && !Stress) {
- LLVM_DEBUG(dbgs() << printMBBReference(*MBB) << " has more than "
- << BlockInstrLimit << " instructions.\n");
- return false;
- }
-
- // There shouldn't normally be any phis in a single-predecessor block.
- if (MI.isPHI()) {
- LLVM_DEBUG(dbgs() << "Can't hoist: " << MI);
- return false;
- }
-
- // Don't speculate loads. Note that it may be possible and desirable to
- // speculate GOT or constant pool loads that are guaranteed not to trap,
- // but we don't support that for now.
- if (MI.mayLoad()) {
- LLVM_DEBUG(dbgs() << "Won't speculate load: " << MI);
- return false;
- }
-
- // We never speculate stores, so an AA pointer isn't necessary.
- bool DontMoveAcrossStore = true;
- if (!MI.isSafeToMove(DontMoveAcrossStore)) {
- LLVM_DEBUG(dbgs() << "Can't speculate: " << MI);
- return false;
- }
-
- // Check for any dependencies on Head instructions.
- if (!InstrDependenciesAllowIfConv(&MI))
- return false;
- }
- return true;
-}
-
-/// Check that there is no dependencies preventing if conversion.
-///
-/// If instruction uses any values that are defined in the head basic block,
-/// the defining instructions are added to InsertAfter.
-bool SSAIfConv::InstrDependenciesAllowIfConv(MachineInstr *I) {
- for (const MachineOperand &MO : I->operands()) {
- if (MO.isRegMask()) {
- LLVM_DEBUG(dbgs() << "Won't speculate regmask: " << *I);
- return false;
- }
- if (!MO.isReg())
- continue;
- Register Reg = MO.getReg();
-
- // Remember clobbered regunits.
- if (MO.isDef() && Reg.isPhysical())
- for (MCRegUnit Unit : TRI->regunits(Reg.asMCReg()))
- ClobberedRegUnits.set(Unit);
-
- if (!MO.readsReg() || !Reg.isVirtual())
- continue;
- MachineInstr *DefMI = MRI->getVRegDef(Reg);
- if (!DefMI || DefMI->getParent() != Head)
- continue;
- if (InsertAfter.insert(DefMI).second)
- LLVM_DEBUG(dbgs() << printMBBReference(*I->getParent()) << " depends on "
- << *DefMI);
- if (DefMI->isTerminator()) {
- LLVM_DEBUG(dbgs() << "Can't insert instructions below terminator.\n");
- return false;
- }
- }
- return true;
-}
-
-/// canPredicateInstrs - Returns true if all the instructions in MBB can safely
-/// be predicates. The terminators are not considered.
-///
-/// If instructions use any values that are defined in the head basic block,
-/// the defining instructions are added to InsertAfter.
-///
-/// Any clobbered regunits are added to ClobberedRegUnits.
-///
-bool SSAIfConv::canPredicateInstrs(MachineBasicBlock *MBB) {
- // Reject any live-in physregs. It's probably CPSR/EFLAGS, and very hard to
- // get right.
- if (!MBB->livein_empty()) {
- LLVM_DEBUG(dbgs() << printMBBReference(*MBB) << " has live-ins.\n");
- return false;
- }
-
- unsigned InstrCount = 0;
-
- // Check all instructions, except the terminators. It is assumed that
- // terminators never have side effects or define any used register values.
- for (MachineBasicBlock::iterator I = MBB->begin(),
- E = MBB->getFirstTerminator();
- I != E; ++I) {
- if (I->isDebugInstr())
- continue;
-
- if (++InstrCount > BlockInstrLimit && !Stress) {
- LLVM_DEBUG(dbgs() << printMBBReference(*MBB) << " has more than "
- << BlockInstrLimit << " instructions.\n");
- return false;
- }
-
- // There shouldn't normally be any phis in a single-predecessor block.
- if (I->isPHI()) {
- LLVM_DEBUG(dbgs() << "Can't predicate: " << *I);
- return false;
- }
-
- // Check that instruction is predicable
- if (!TII->isPredicable(*I)) {
- LLVM_DEBUG(dbgs() << "Isn't predicable: " << *I);
- return false;
- }
-
- // Check that instruction is not already predicated.
- if (TII->isPredicated(*I) && !TII->canPredicatePredicatedInstr(*I)) {
- LLVM_DEBUG(dbgs() << "Is already predicated: " << *I);
- return false;
- }
-
- // Check for any dependencies on Head instructions.
- if (!InstrDependenciesAllowIfConv(&(*I)))
- return false;
- }
- return true;
-}
-
-// Apply predicate to all instructions in the machine block.
-void SSAIfConv::PredicateBlock(MachineBasicBlock *MBB, bool ReversePredicate) {
- auto Condition = Cond;
- if (ReversePredicate) {
- bool CanRevCond = !TII->reverseBranchCondition(Condition);
- assert(CanRevCond && "Reversed predicate is not supported");
- (void)CanRevCond;
- }
- // Terminators don't need to be predicated as they will be removed.
- for (MachineBasicBlock::iterator I = MBB->begin(),
- E = MBB->getFirstTerminator();
- I != E; ++I) {
- if (I->isDebugInstr())
- continue;
- TII->PredicateInstruction(*I, Condition);
- }
-}
-
-/// Find an insertion point in Head for the speculated instructions. The
-/// insertion point must be:
-///
-/// 1. Before any terminators.
-/// 2. After any instructions in InsertAfter.
-/// 3. Not have any clobbered regunits live.
-///
-/// This function sets InsertionPoint and returns true when successful, it
-/// returns false if no valid insertion point could be found.
-///
-bool SSAIfConv::findInsertionPoint() {
- // Keep track of live regunits before the current position.
- // Only track RegUnits that are also in ClobberedRegUnits.
- LiveRegUnits.clear();
- SmallVector<MCRegister, 8> Reads;
- MachineBasicBlock::iterator FirstTerm = Head->getFirstTerminator();
- MachineBasicBlock::iterator I = Head->end();
- MachineBasicBlock::iterator B = Head->begin();
- while (I != B) {
- --I;
- // Some of the conditional code depends in I.
- if (InsertAfter.count(&*I)) {
- LLVM_DEBUG(dbgs() << "Can't insert code after " << *I);
- return false;
- }
-
- // Update live regunits.
- for (const MachineOperand &MO : I->operands()) {
- // We're ignoring regmask operands. That is conservatively correct.
- if (!MO.isReg())
- continue;
- Register Reg = MO.getReg();
- if (!Reg.isPhysical())
- continue;
- // I clobbers Reg, so it isn't live before I.
- if (MO.isDef())
- for (MCRegUnit Unit : TRI->regunits(Reg.asMCReg()))
- LiveRegUnits.erase(Unit);
- // Unless I reads Reg.
- if (MO.readsReg())
- Reads.push_back(Reg.asMCReg());
- }
- // Anything read by I is live before I.
- while (!Reads.empty())
- for (MCRegUnit Unit : TRI->regunits(Reads.pop_back_val()))
- if (ClobberedRegUnits.test(Unit))...
[truncated]
|
static cl::opt<bool> | ||
EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden, | ||
cl::desc("Run early if-conversion"), | ||
cl::init(false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect most if conversion to be done on the IR level. What cases do we need more aggressive MIR conversion for?
Can this be done in 2 steps? One showing the benefit of turning this one, and then another with the replacement pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one main concern about doing this at IR level: How to handle instructions accessing memory.
I guess it could be done through masked load/store operations. But what if we have some other built-in?
I'm also not sure about how to implement the cost model in the LLVM-IR, but I guess it could be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: For the moment this pass hasn't covered the cases the early-ifcvt pass covers. I should add it back.
|
||
namespace { | ||
#define DEBUG_TYPE "amdgpu-if-cvt" | ||
const char PassName[] = "AMDGPU if conversion"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the vpassname variable
#define DEBUG_TYPE "amdgpu-if-cvt" | ||
const char PassName[] = "AMDGPU if conversion"; | ||
|
||
class AMDGPUIfConverter : public MachineFunctionPass { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well start with this factored to be friendly to new PM
INITIALIZE_PASS_BEGIN(AMDGPUIfConverter, DEBUG_TYPE, PassName, false, false) | ||
INITIALIZE_PASS_DEPENDENCY(MachineDominatorTreeWrapperPass) | ||
INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfoWrapperPass) | ||
INITIALIZE_PASS_END(AMDGPUIfConverter, DEBUG_TYPE, PassName, false, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing line at end of file
@@ -0,0 +1,65 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 | |||
; RUN: llc -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need -verify-machineinstrs
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 | ||
; RUN: llc -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s | ||
|
||
define amdgpu_kernel void @scalar_cmp(i32 noundef %value, ptr addrspace(8) nocapture writeonly %res, i32 noundef %v_offset, i32 noundef %0, i32 noundef %flag) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer regular functions with inreg arguments for SGPRS over kernels. Also precommit tests
I realize this is a draft patch, but this refactoring of the early if-conversion code is a great idea! As this matures, does it makes sense to separate the target-independent portion into an isolated patch? Or at least provide a more generic title so other downstream users don't overlook the target-independent changes. FWIW, I'm using it downstream in an NVIDIA backend-- specifically, the Thanks again for spearheading this work. |
Could you always generate v_cmp in this pass, and rely on |
Yes, I will do that. I wanted to get some quick validation of this work before being dug in. I think I'm going to start by:
Only then I can start with the AMDGPU target specific stuff. Thanks for the feedback ! |
This will simplify refactoring later.
This will help when adding new EarlyIfConverter transformations (other than the current 2)
9531c0e
to
5b51a3c
Compare
5b51a3c
to
13034d1
Compare
If Conversion updateI'm having second thoughts on the implementation of this optimization as a stand-alone optimization, instead of spreading the transformation across other passes. ProblemsAs it is now, this pass doesn't compose nicely with When I try to generalize this pass from
Alternative solutionI'm thinking that it would be better to spread this transformation in 3 parts:
1. Transform
|
What are you trying to achieve? Do you have a motivating example? The patch description only says what is missing from the patch. |
@jayfoad An extract of the motivating example is this code: https://godbolt.org/z/dPdf7r381 . This extract comes from SWDEV-477895, where CK used inline-assembly to work around code that doesn't optimize to a |
So you want to remove the |
Something like this: v_mov_b32_e32 exec_backup, exec
v_cmpx_lt_i32 exec, s0, 1
s_load_dwordx2 s[6:7], s[4:5], 0x0
s_load_dwordx4 s[0:3], s[4:5], 0x10
s_waitcnt lgkmcnt(0)
s_load_dword s6, s[6:7], 0x0
s_load_dword s4, s[4:5], 0x20
s_waitcnt lgkmcnt(0)
v_mov_b32_e32 v0, s6
v_mov_b32_e32 v1, s4
buffer_store_dword v0, v1, s[0:3], 0 offen
v_mov_b32_e32 exec, exec_backup
This depends if the loaded pointers are marked as dereferenceable. If they are we can safely execute them. |
I think this is roughly the codegen you would get if the branch condition was divergent and |
Thanks ! I didn't know about |
I've started an alternative implementation in main...jmmartinez:llvm-project:amd/if-cvt-v2 I'm starting to submit the patches separately. These new series of patches consists of 3 parts:
|
This is a draft to get some early feedback about this work.
Currently the code is mined with TODOs. Some thing that are missing:
shouldConvert
function)EarlyIfConverter
since the 3 passes share a lot in commongetReversedVCMPXOpcode
EarlyIfConverter
from the pipelineneedsPredication
function is probably wrongThis work is related to
SWDEV-477895
.