Skip to content

AMDGPU: Remove redundant operand folding checks #140587

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 0 additions & 18 deletions llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -777,24 +777,6 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
return true;
}

// Check the case where we might introduce a second constant operand to a
// scalar instruction
if (TII->isSALU(MI->getOpcode())) {
const MCInstrDesc &InstDesc = MI->getDesc();
const MCOperandInfo &OpInfo = InstDesc.operands()[OpNo];

// Fine if the operand can be encoded as an inline constant
if (!OpToFold->isReg() && !TII->isInlineConstant(*OpToFold, OpInfo)) {
// Otherwise check for another constant
for (unsigned i = 0, e = InstDesc.getNumOperands(); i != e; ++i) {
auto &Op = MI->getOperand(i);
if (OpNo != i && !Op.isReg() &&
!TII->isInlineConstant(Op, InstDesc.operands()[i]))
return false;
}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am seeing a regression because of the removal of the above code:
For the following instructions:
%333:sreg_32 = S_MOV_B32 96
%334:sreg_32 = S_OR_B32 %stack.35.argument10.addr, %333:sreg_32, implicit-def dead $scc

After folding:
renamable $sgpr26 = S_OR_B32 896, 96, implicit-def dead $scc

@arsenm : can you explain why this pre-filtering is not necessary? Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because isOperandLegal is checked before the actual fold. This case is the special case with a frame index which we don't know the value of at this point which this avoids

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because isOperandLegal is checked before the actual fold. This case is the special case with a frame index which we don't know the value of at this point which this avoids

So, should we put the pre-filtering code back, or fix something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix something else. I'm pretty sure this exact case is already fixed in this patch

appendFoldCandidate(FoldList, MI, OpNo, OpToFold);
return true;
}
Expand Down
6 changes: 6 additions & 0 deletions llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6063,6 +6063,12 @@ bool SIInstrInfo::isOperandLegal(const MachineInstr &MI, unsigned OpIdx,
!isInlineConstant(Op, InstDesc.operands()[i]) &&
!Op.isIdenticalTo(*MO))
return false;

// Do not fold a frame index into an instruction that already has a frame
// index. The frame index handling code doesn't handle fixing up operand
// constraints if there are multiple indexes.
if (Op.isFI() && MO->isFI())
return false;
}
} else if (IsInlineConst && ST.hasNoF16PseudoScalarTransInlineConstants() &&
isF16PseudoScalarTrans(MI.getOpcode())) {
Expand Down
101 changes: 101 additions & 0 deletions llvm/test/CodeGen/AMDGPU/fold-operands-frame-index.mir
Original file line number Diff line number Diff line change
Expand Up @@ -573,3 +573,104 @@ body: |
S_ENDPGM 0, implicit %2

...

---
name: no_fold_multiple_fi_s_cselect_b32
tracksRegLiveness: true
stack:
- { id: 0, size: 64, alignment: 4 }
- { id: 1, size: 32, alignment: 4 }
body: |
bb.0:
; CHECK-LABEL: name: no_fold_multiple_fi_s_cselect_b32
; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 %stack.1
; CHECK-NEXT: [[S_CSELECT_B32_:%[0-9]+]]:sreg_32 = S_CSELECT_B32 killed [[S_MOV_B32_]], %stack.0, implicit undef $scc
; CHECK-NEXT: S_ENDPGM 0, implicit [[S_CSELECT_B32_]]
%0:sreg_32 = S_MOV_B32 %stack.0
%1:sreg_32 = S_MOV_B32 %stack.1
%2:sreg_32 = S_CSELECT_B32 killed %1, killed %0, implicit undef $scc
S_ENDPGM 0, implicit %2

...

---
name: no_fold_multiple_fi_v_cndmask_b32_e64
tracksRegLiveness: true
stack:
- { id: 0, size: 64, alignment: 4 }
- { id: 1, size: 32, alignment: 4 }
body: |
bb.0:
liveins: $sgpr8_sgpr9
; GFX9-LABEL: name: no_fold_multiple_fi_v_cndmask_b32_e64
; GFX9: liveins: $sgpr8_sgpr9
; GFX9-NEXT: {{ $}}
; GFX9-NEXT: [[COPY:%[0-9]+]]:sreg_64_xexec = COPY $sgpr8_sgpr9
; GFX9-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
; GFX9-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.1, implicit $exec
; GFX9-NEXT: [[V_CNDMASK_B32_e64_:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_e64 0, killed [[V_MOV_B32_e32_]], 0, killed [[V_MOV_B32_e32_1]], [[COPY]], implicit $exec
; GFX9-NEXT: S_ENDPGM 0, implicit [[V_CNDMASK_B32_e64_]]
;
; GFX10-LABEL: name: no_fold_multiple_fi_v_cndmask_b32_e64
; GFX10: liveins: $sgpr8_sgpr9
; GFX10-NEXT: {{ $}}
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64_xexec = COPY $sgpr8_sgpr9
; GFX10-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.1, implicit $exec
; GFX10-NEXT: [[V_CNDMASK_B32_e64_:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_e64 0, %stack.0, 0, killed [[V_MOV_B32_e32_]], [[COPY]], implicit $exec
; GFX10-NEXT: S_ENDPGM 0, implicit [[V_CNDMASK_B32_e64_]]
;
; GFX12-LABEL: name: no_fold_multiple_fi_v_cndmask_b32_e64
; GFX12: liveins: $sgpr8_sgpr9
; GFX12-NEXT: {{ $}}
; GFX12-NEXT: [[COPY:%[0-9]+]]:sreg_64_xexec = COPY $sgpr8_sgpr9
; GFX12-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.1, implicit $exec
; GFX12-NEXT: [[V_CNDMASK_B32_e64_:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_e64 0, %stack.0, 0, killed [[V_MOV_B32_e32_]], [[COPY]], implicit $exec
; GFX12-NEXT: S_ENDPGM 0, implicit [[V_CNDMASK_B32_e64_]]
%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
%2:vgpr_32 = V_MOV_B32_e32 %stack.1, implicit $exec
%3:vgpr_32 = V_CNDMASK_B32_e64 0, killed %1, 0, killed %2, %0, implicit $exec
S_ENDPGM 0, implicit %3

...

---
name: no_fold_multiple_fi_v_cndmask_b32_e32
tracksRegLiveness: true
stack:
- { id: 0, size: 64, alignment: 4 }
- { id: 1, size: 32, alignment: 4 }
body: |
bb.0:
liveins: $sgpr8_sgpr9
; GFX9-LABEL: name: no_fold_multiple_fi_v_cndmask_b32_e32
; GFX9: liveins: $sgpr8_sgpr9
; GFX9-NEXT: {{ $}}
; GFX9-NEXT: $vcc = COPY $sgpr8_sgpr9
; GFX9-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
; GFX9-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.1, implicit $exec
; GFX9-NEXT: [[V_CNDMASK_B32_e32_:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_e32 killed [[V_MOV_B32_e32_]], killed [[V_MOV_B32_e32_1]], implicit $vcc, implicit $exec
; GFX9-NEXT: S_ENDPGM 0, implicit [[V_CNDMASK_B32_e32_]]
;
; GFX10-LABEL: name: no_fold_multiple_fi_v_cndmask_b32_e32
; GFX10: liveins: $sgpr8_sgpr9
; GFX10-NEXT: {{ $}}
; GFX10-NEXT: $vcc = COPY $sgpr8_sgpr9
; GFX10-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.1, implicit $exec
; GFX10-NEXT: [[V_CNDMASK_B32_e32_:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_e32 %stack.0, killed [[V_MOV_B32_e32_]], implicit $vcc, implicit $exec
; GFX10-NEXT: S_ENDPGM 0, implicit [[V_CNDMASK_B32_e32_]]
;
; GFX12-LABEL: name: no_fold_multiple_fi_v_cndmask_b32_e32
; GFX12: liveins: $sgpr8_sgpr9
; GFX12-NEXT: {{ $}}
; GFX12-NEXT: $vcc = COPY $sgpr8_sgpr9
; GFX12-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.1, implicit $exec
; GFX12-NEXT: [[V_CNDMASK_B32_e32_:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_e32 %stack.0, killed [[V_MOV_B32_e32_]], implicit $vcc, implicit $exec
; GFX12-NEXT: S_ENDPGM 0, implicit [[V_CNDMASK_B32_e32_]]
$vcc = COPY $sgpr8_sgpr9
%1:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
%2:vgpr_32 = V_MOV_B32_e32 %stack.1, implicit $exec
%3:vgpr_32 = V_CNDMASK_B32_e32 killed %1, killed %2, implicit $vcc, implicit $exec
S_ENDPGM 0, implicit %3

...
Loading