-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan #145450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan #145450
Conversation
16cef3a
to
38e2154
Compare
38e2154
to
9b3841a
Compare
@llvm/pr-subscribers-backend-amdgpu Author: Brox Chen (broxigarchen) ChangesThe legalize t16 operand function could insert a reg_sequence which modify the user list of the targetted register, and we should not call it in the middle of an user list iteration Full diff: https://github.com/llvm/llvm-project/pull/145450.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 333e91bf37df5..ca29c71030942 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -8716,6 +8716,8 @@ void SIInstrInfo::splitScalar64BitCountOp(SIInstrWorklist &Worklist,
void SIInstrInfo::addUsersToMoveToVALUWorklist(
Register DstReg, MachineRegisterInfo &MRI,
SIInstrWorklist &Worklist) const {
+ SmallVector<std::pair<MachineInstr *, unsigned>, 4> LegalizeList;
+
for (MachineRegisterInfo::use_iterator I = MRI.use_begin(DstReg),
E = MRI.use_end(); I != E;) {
MachineInstr &UseMI = *I->getParent();
@@ -8744,11 +8746,14 @@ void SIInstrInfo::addUsersToMoveToVALUWorklist(
++I;
} while (I != E && I->getParent() == &UseMI);
} else {
- legalizeOperandsVALUt16(UseMI, OpNo, MRI);
+ LegalizeList.push_back(std::make_pair(&UseMI, OpNo));
++I;
}
}
+
+ for (auto &MI : LegalizeList)
+ legalizeOperandsVALUt16(*MI.first, MI.second, MRI);
}
void SIInstrInfo::movePackToVALU(SIInstrWorklist &Worklist,
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
index 9b6a2f3a1aa1e..59184499de2f0 100644
--- a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
@@ -108,6 +108,32 @@ body: |
%4:sreg_32 = S_FMAC_F16 %3:sreg_32, %3:sreg_32, %2:sreg_32, implicit $mode
...
+---
+name: legalize_with_multi_user
+body: |
+ bb.0:
+ ; GCN-LABEL: name: legalize_with_multi_user
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+ ; GCN-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[DEF]], %subreg.lo16, [[DEF1]], %subreg.hi16
+ ; GCN-NEXT: [[V_ADD_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_ADD_F16_t16_e64 0, [[REG_SEQUENCE]].lo16, 0, 1, 0, 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 32768
+ ; GCN-NEXT: [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[DEF3:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+ ; GCN-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_ADD_F16_t16_e64_]], %subreg.lo16, [[DEF3]], %subreg.hi16
+ ; GCN-NEXT: [[V_PK_FMA_F16_:%[0-9]+]]:vgpr_32 = V_PK_FMA_F16 11, [[S_MOV_B32_]], 0, [[REG_SEQUENCE1]], 8, [[DEF2]], 0, 0, 0, 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF4:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+ ; GCN-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_ADD_F16_t16_e64_]], %subreg.lo16, [[DEF4]], %subreg.hi16
+ ; GCN-NEXT: [[V_XOR_B32_e64_:%[0-9]+]]:vgpr_32 = V_XOR_B32_e64 [[REG_SEQUENCE2]], [[S_MOV_B32_]], implicit $exec
+ %0:vgpr_16 = IMPLICIT_DEF
+ %1:sreg_32 = COPY %0:vgpr_16
+ %2:sreg_32 = S_ADD_F16 %1:sreg_32, 1, implicit $mode
+ %3:sreg_32 = S_MOV_B32 32768
+ %4:vgpr_32 = IMPLICIT_DEF
+ %5:vgpr_32 = V_PK_FMA_F16 11, %3:sreg_32, 0, %2:sreg_32, 8, %4:vgpr_32, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
+ %6:sreg_32 = S_XOR_B32 %2:sreg_32, %3:sreg_32, implicit-def dead $scc
+...
+
---
name: vgpr16_to_spgr32
body: |
|
9b3841a
to
f53b5ca
Compare
; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 32768 | ||
; GCN-NEXT: [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF | ||
; GCN-NEXT: [[DEF3:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF | ||
; GCN-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_ADD_F16_t16_e64_]], %subreg.lo16, [[DEF3]], %subreg.hi16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it reasonable to have one REG_SEQUENCE per use? I guess most or all of them will eventually fold out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reg_sequence should be fold out in the two-address, rewrite virtual reg and coalescer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the noted cosmetic issues, LGTM
56e2b07
to
75ce40d
Compare
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/195/builds/10898 Here is the relevant piece of the build log for the reference
|
But you are still calling |
Hi Jay. The problem is mainly from the mismatched register size in the MIR after we introduce VGPR16 but no SGPR16. There are a few places that are generating mismatched register size in the moveToVALU sequence. i.e. A SALU32 used by SALU16. Before moved to VALU this is legal as they all use sgpr32. When this def-use is moved to VALU. SALU32 becomes a VALU32 (sgpr32 to vpgr32), and SALU16 becomes a VALU16 (sgpr32 to vgpr16). The register size is mismatched in the def-use chain. Thus, in moveToVALU sequence, we need to legalize these operands. We cannot simply legalize the operands of the current instruction, but also need to legalize the users of the current instruction, because the iteration order of the moveToVALU sequence is bottom up (see #138734 (comment)). For this reason, we need to add user list scan in multiple places in the movetoVALU function to do the legalization which it's pretty ugly. Thus, I think since the |
llvm#145450) The legalize t16 operand function could insert a reg_sequence which modify the user list of the targetted register, and we should not call it in the middle of an user list iteration
The legalize t16 operand function could insert a reg_sequence which modify the user list of the targetted register, and we should not call it in the middle of an user list iteration