
AMDGPU: Minor improvement and cleanup for waterfall loop generation #111886


Merged
merged 1 commit into llvm:main on Oct 10, 2024

Conversation

changpeng
Contributor

First, ReadlanePieces should be scoped to each MachineOperand. It is not correct to declare it in an outer scope without clearing it after each MachineOperand is processed.
Additionally, we do not need the OrigBB argument for emitLoadScalarOpsFromVGPRLoop, since the MachineFunction (its only use) can be obtained from LoopBB (or BodyBB).

  First, ReadlanePieces should be scoped to each MachineOperand.
It is not correct to declare it in an outer scope without clearing
it after each MachineOperand is processed.
  Additionally, we do not need the OrigBB argument for emitLoadScalarOpsFromVGPRLoop,
since the MachineFunction can be obtained from LoopBB (or BodyBB).
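
For context, the sketch below is a minimal, self-contained C++ illustration of the scoping pitfall the first change addresses. It uses std::vector and invented names in place of llvm::SmallVector and the real code, purely to show why declaring ReadlanePieces inside the per-operand branch (rather than reusing one vector declared outside the loop) avoids stale entries leaking between operands.

#include <iostream>
#include <vector>

int main() {
  // Buggy shape: one container shared across iterations and never cleared,
  // so entries from earlier operands leak into later ones.
  std::vector<int> SharedPieces;
  for (int Operand = 0; Operand < 3; ++Operand) {
    SharedPieces.push_back(Operand);
    std::cout << "shared container size after operand " << Operand << ": "
              << SharedPieces.size() << "\n"; // prints 1, 2, 3
  }

  // Fixed shape: scope the container to each iteration, mirroring the PR's
  // move of ReadlanePieces into the per-MachineOperand branch.
  for (int Operand = 0; Operand < 3; ++Operand) {
    std::vector<int> PerOperandPieces;
    PerOperandPieces.push_back(Operand);
    std::cout << "scoped container size after operand " << Operand << ": "
              << PerOperandPieces.size() << "\n"; // always prints 1
  }
  return 0;
}

Calling clear() at the top of each iteration would also avoid the leak; moving the declaration into the branch is simply the tighter-scoped fix the PR opts for.
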
@llvmbot
Member

llvmbot commented Oct 10, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Changpeng Fang (changpeng)

Changes

First, ReadlanePieces should be scoped to each MachineOperand. It is not correct to declare it in an outer scope without clearing it after each MachineOperand is processed.
Additionally, we do not need the OrigBB argument for emitLoadScalarOpsFromVGPRLoop, since the MachineFunction (its only use) can be obtained from LoopBB (or BodyBB).


Full diff: https://github.com/llvm/llvm-project/pull/111886.diff

1 file affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+10-8)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 0c2ae382f53a19..d676d561d08180 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -6302,11 +6302,14 @@ void SIInstrInfo::legalizeGenericOperand(MachineBasicBlock &InsertMBB,
 // Emit the actual waterfall loop, executing the wrapped instruction for each
 // unique value of \p ScalarOps across all lanes. In the best case we execute 1
 // iteration, in the worst case we execute 64 (once per lane).
-static void emitLoadScalarOpsFromVGPRLoop(
-    const SIInstrInfo &TII, MachineRegisterInfo &MRI, MachineBasicBlock &OrigBB,
-    MachineBasicBlock &LoopBB, MachineBasicBlock &BodyBB, const DebugLoc &DL,
-    ArrayRef<MachineOperand *> ScalarOps) {
-  MachineFunction &MF = *OrigBB.getParent();
+static void
+emitLoadScalarOpsFromVGPRLoop(const SIInstrInfo &TII,
+                              MachineRegisterInfo &MRI,
+                              MachineBasicBlock &LoopBB,
+                              MachineBasicBlock &BodyBB,
+                              const DebugLoc &DL,
+                              ArrayRef<MachineOperand *> ScalarOps) {
+  MachineFunction &MF = *LoopBB.getParent();
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
   const SIRegisterInfo *TRI = ST.getRegisterInfo();
   unsigned Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
@@ -6319,8 +6322,6 @@ static void emitLoadScalarOpsFromVGPRLoop(
   const auto *BoolXExecRC = TRI->getWaveMaskRegClass();
 
   MachineBasicBlock::iterator I = LoopBB.begin();
-
-  SmallVector<Register, 8> ReadlanePieces;
   Register CondReg;
 
   for (MachineOperand *ScalarOp : ScalarOps) {
@@ -6355,6 +6356,7 @@ static void emitLoadScalarOpsFromVGPRLoop(
       ScalarOp->setReg(CurReg);
       ScalarOp->setIsKill();
     } else {
+      SmallVector<Register, 8> ReadlanePieces;
       unsigned VScalarOpUndef = getUndefRegState(ScalarOp->isUndef());
       assert(NumSubRegs % 2 == 0 && NumSubRegs <= 32 &&
              "Unhandled register size");
@@ -6535,7 +6537,7 @@ loadMBUFScalarOperandsFromVGPR(const SIInstrInfo &TII, MachineInstr &MI,
     }
   }
 
-  emitLoadScalarOpsFromVGPRLoop(TII, MRI, MBB, *LoopBB, *BodyBB, DL, ScalarOps);
+  emitLoadScalarOpsFromVGPRLoop(TII, MRI, *LoopBB, *BodyBB, DL, ScalarOps);
 
   MachineBasicBlock::iterator First = RemainderBB->begin();
   // Restore SCC
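
The dropped OrigBB parameter was only ever used to reach the MachineFunction, and any MachineBasicBlock can supply that through getParent(), exactly as the updated code does with LoopBB. Below is a small illustrative sketch of the same refactor using stand-in types rather than the real LLVM classes; only the getParent() accessor mirrors the actual MachineBasicBlock API, everything else is invented for the example.

#include <iostream>

struct Function;

struct Block {
  Function *Parent;
  // Mirrors MachineBasicBlock::getParent(): every block knows its owning function.
  Function *getParent() const { return Parent; }
};

struct Function {
  const char *Name;
};

// Before the cleanup, the caller also passed an OrigBB argument that was used
// only for this lookup; deriving the function from LoopBB makes it redundant.
static void emitLoop(Block &LoopBB, Block &BodyBB) {
  Function &F = *LoopBB.getParent(); // same Function for LoopBB, BodyBB, and OrigBB
  std::cout << "emitting waterfall loop in " << F.Name << "\n";
  (void)BodyBB; // kept to show the surviving parameters
}

int main() {
  Function F{"demo_function"};
  Block Loop{&F}, Body{&F};
  emitLoop(Loop, Body);
  return 0;
}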

@changpeng changpeng requested review from rampitec and jayfoad October 10, 2024 18:30

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 13cd43aa6fa1dc5bfb96119db43b8c549386a86e a1c6365fc5060374b4a1dfeb9b5e54c3a9630e46 --extensions cpp -- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index d676d561d0..84c9462d5f 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -6303,11 +6303,9 @@ void SIInstrInfo::legalizeGenericOperand(MachineBasicBlock &InsertMBB,
 // unique value of \p ScalarOps across all lanes. In the best case we execute 1
 // iteration, in the worst case we execute 64 (once per lane).
 static void
-emitLoadScalarOpsFromVGPRLoop(const SIInstrInfo &TII,
-                              MachineRegisterInfo &MRI,
+emitLoadScalarOpsFromVGPRLoop(const SIInstrInfo &TII, MachineRegisterInfo &MRI,
                               MachineBasicBlock &LoopBB,
-                              MachineBasicBlock &BodyBB,
-                              const DebugLoc &DL,
+                              MachineBasicBlock &BodyBB, const DebugLoc &DL,
                               ArrayRef<MachineOperand *> ScalarOps) {
   MachineFunction &MF = *LoopBB.getParent();
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();

Collaborator

@rampitec rampitec left a comment


LGTM

@changpeng changpeng merged commit f6e93b8 into llvm:main Oct 10, 2024
6 of 9 checks passed
@changpeng changpeng deleted the water branch October 11, 2024 05:16
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
AMDGPU: Minor improvement and cleanup for waterfall loop generation (llvm#111886)

First, ReadlanePieces should be scoped to each MachineOperand. It is
not correct to declare it in an outer scope without clearing it after
each MachineOperand is processed.
Additionally, we do not need the OrigBB argument for
emitLoadScalarOpsFromVGPRLoop, since the MachineFunction (its only use)
can be obtained from LoopBB (or BodyBB).