-
Notifications
You must be signed in to change notification settings - Fork 14.3k
AMDGPU: Minor improvement and cleanup for waterfall loop generation #111886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
First, ReadlanePieces should be in the scope of each MachineOperand. It is not corret if we declare in a outer scope without clearing after the use for a MachineOperand. Additionally, we do not need the OrigBB for emitLoadScalarOpsFromVGPRLoop, since MachineFunction can be obtained from LoopBB (or BodyBB).
@llvm/pr-subscribers-backend-amdgpu Author: Changpeng Fang (changpeng) ChangesFirst, ReadlanePieces should be in the scope of each MachineOperand. It is not correct if we declare in a outer scope without clearing after the use for a MachineOperand. Full diff: https://github.com/llvm/llvm-project/pull/111886.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 0c2ae382f53a19..d676d561d08180 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -6302,11 +6302,14 @@ void SIInstrInfo::legalizeGenericOperand(MachineBasicBlock &InsertMBB,
// Emit the actual waterfall loop, executing the wrapped instruction for each
// unique value of \p ScalarOps across all lanes. In the best case we execute 1
// iteration, in the worst case we execute 64 (once per lane).
-static void emitLoadScalarOpsFromVGPRLoop(
- const SIInstrInfo &TII, MachineRegisterInfo &MRI, MachineBasicBlock &OrigBB,
- MachineBasicBlock &LoopBB, MachineBasicBlock &BodyBB, const DebugLoc &DL,
- ArrayRef<MachineOperand *> ScalarOps) {
- MachineFunction &MF = *OrigBB.getParent();
+static void
+emitLoadScalarOpsFromVGPRLoop(const SIInstrInfo &TII,
+ MachineRegisterInfo &MRI,
+ MachineBasicBlock &LoopBB,
+ MachineBasicBlock &BodyBB,
+ const DebugLoc &DL,
+ ArrayRef<MachineOperand *> ScalarOps) {
+ MachineFunction &MF = *LoopBB.getParent();
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIRegisterInfo *TRI = ST.getRegisterInfo();
unsigned Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
@@ -6319,8 +6322,6 @@ static void emitLoadScalarOpsFromVGPRLoop(
const auto *BoolXExecRC = TRI->getWaveMaskRegClass();
MachineBasicBlock::iterator I = LoopBB.begin();
-
- SmallVector<Register, 8> ReadlanePieces;
Register CondReg;
for (MachineOperand *ScalarOp : ScalarOps) {
@@ -6355,6 +6356,7 @@ static void emitLoadScalarOpsFromVGPRLoop(
ScalarOp->setReg(CurReg);
ScalarOp->setIsKill();
} else {
+ SmallVector<Register, 8> ReadlanePieces;
unsigned VScalarOpUndef = getUndefRegState(ScalarOp->isUndef());
assert(NumSubRegs % 2 == 0 && NumSubRegs <= 32 &&
"Unhandled register size");
@@ -6535,7 +6537,7 @@ loadMBUFScalarOperandsFromVGPR(const SIInstrInfo &TII, MachineInstr &MI,
}
}
- emitLoadScalarOpsFromVGPRLoop(TII, MRI, MBB, *LoopBB, *BodyBB, DL, ScalarOps);
+ emitLoadScalarOpsFromVGPRLoop(TII, MRI, *LoopBB, *BodyBB, DL, ScalarOps);
MachineBasicBlock::iterator First = RemainderBB->begin();
// Restore SCC
|
You can test this locally with the following command:git-clang-format --diff 13cd43aa6fa1dc5bfb96119db43b8c549386a86e a1c6365fc5060374b4a1dfeb9b5e54c3a9630e46 --extensions cpp -- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp View the diff from clang-format here.diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index d676d561d0..84c9462d5f 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -6303,11 +6303,9 @@ void SIInstrInfo::legalizeGenericOperand(MachineBasicBlock &InsertMBB,
// unique value of \p ScalarOps across all lanes. In the best case we execute 1
// iteration, in the worst case we execute 64 (once per lane).
static void
-emitLoadScalarOpsFromVGPRLoop(const SIInstrInfo &TII,
- MachineRegisterInfo &MRI,
+emitLoadScalarOpsFromVGPRLoop(const SIInstrInfo &TII, MachineRegisterInfo &MRI,
MachineBasicBlock &LoopBB,
- MachineBasicBlock &BodyBB,
- const DebugLoc &DL,
+ MachineBasicBlock &BodyBB, const DebugLoc &DL,
ArrayRef<MachineOperand *> ScalarOps) {
MachineFunction &MF = *LoopBB.getParent();
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…lvm#111886) First, ReadlanePieces should be in the scope of each MachineOperand. It is not correct if we declare in a outer scope without clearing after the use for a MachineOperand. Additionally, we do not need the OrigBB argyment for emitLoadScalarOpsFromVGPRLoop, since MachineFunction (the only use) can be obtained from LoopBB (or BodyBB).
First, ReadlanePieces should be in the scope of each MachineOperand. It is not correct if we declare in a outer scope without clearing after the use for a MachineOperand.
Additionally, we do not need the OrigBB argyment for emitLoadScalarOpsFromVGPRLoop, since MachineFunction (the only use) can be obtained from LoopBB (or BodyBB).