Commit 70f013f

[AMDGPU] Fix isReallyTriviallyReMaterializable for V_MOV_*

D57708 changed SIInstrInfo::isReallyTriviallyReMaterializable to reject V_MOVs with extra implicit operands, but it accidentally rejected all V_MOVs because of their implicit use of exec. Fix it but avoid adding a moderately expensive call to MI.getDesc().getNumImplicitUses().

In real graphics shaders this changes quite a few vgpr copies into move-immediates, which is good for avoiding stalls on GFX10.

Differential Revision: https://reviews.llvm.org/D98347

1 parent e64f3cc commit 70f013f

2 files changed: +49 -2 lines changed

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Lines changed: 5 additions & 2 deletions
@@ -116,8 +116,11 @@ bool SIInstrInfo::isReallyTriviallyReMaterializable(const MachineInstr &MI,
   case AMDGPU::V_MOV_B64_PSEUDO:
   case AMDGPU::V_ACCVGPR_READ_B32_e64:
   case AMDGPU::V_ACCVGPR_WRITE_B32_e64:
-    // No implicit operands.
-    return MI.getNumOperands() == MI.getDesc().getNumOperands();
+    // No non-standard implicit operands.
+    assert(MI.getDesc().getNumOperands() == 2);
+    assert(MI.getDesc().getNumImplicitDefs() == 0);
+    assert(MI.getDesc().getNumImplicitUses() == 1);
+    return MI.getNumOperands() == 3;
   default:
     return false;
   }
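
The operand-count reasoning behind this change can be sketched in isolation. The block below is only an illustrative model, not LLVM code: `ToyDesc`, `ToyInstr`, `oldCheck`, and `newCheck` are hypothetical names, and the structs merely stand in for the counts that `MI.getDesc()` and `MI.getNumOperands()` report. It assumes, as the new asserts in the diff state, that these opcodes have 2 explicit operands, 0 implicit defs, and 1 implicit use (exec).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the instruction descriptor; only the counts
// relevant to the check are modelled.
struct ToyDesc {
  std::size_t NumExplicitOperands; // models MI.getDesc().getNumOperands()
  std::size_t NumImplicitUses;     // models MI.getDesc().getNumImplicitUses()
  std::size_t NumImplicitDefs;     // models MI.getDesc().getNumImplicitDefs()
};

// Hypothetical stand-in for a machine instruction.
struct ToyInstr {
  ToyDesc Desc;
  std::size_t NumOperands; // models MI.getNumOperands(): explicit + implicit
};

// Check before the commit: the total operand count must equal the explicit
// operand count, so the standard implicit use of exec already fails it.
bool oldCheck(const ToyInstr &MI) {
  return MI.NumOperands == MI.Desc.NumExplicitOperands;
}

// Check after the commit: the descriptor counts are verified with asserts
// (which compile away when NDEBUG is defined), and the comparison uses the
// constant 3 = 2 explicit operands + 1 implicit use.
bool newCheck(const ToyInstr &MI) {
  assert(MI.Desc.NumExplicitOperands == 2);
  assert(MI.Desc.NumImplicitDefs == 0);
  assert(MI.Desc.NumImplicitUses == 1);
  return MI.NumOperands == 3;
}
```

In this model, a plain V_MOV with operands (dst, src, implicit $exec) is `ToyInstr{ToyDesc{2, 1, 0}, 3}`: `oldCheck` returns false for it and `newCheck` returns true. An instruction carrying one extra implicit operand, `ToyInstr{ToyDesc{2, 1, 0}, 4}`, is rejected by both checks, which matches the intent stated in the commit message.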
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=simple-register-coalescing -o - %s | FileCheck %s
+
+# Check that we get two move-immediates into %1 and %2, instead of a copy from
+# %1 to %2, because that would introduce a dependency and maybe a stall.
+---
+name: f
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: f
+  ; CHECK: bb.0:
+  ; CHECK:   successors: %bb.2(0x40000000), %bb.1(0x40000000)
+  ; CHECK:   liveins: $sgpr0
+  ; CHECK:   undef %4.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+  ; CHECK:   %4.sub1:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+  ; CHECK:   [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0
+  ; CHECK:   $exec = S_MOV_B64_term [[COPY]]
+  ; CHECK:   S_CBRANCH_EXECZ %bb.2, implicit $exec
+  ; CHECK:   S_BRANCH %bb.1
+  ; CHECK: bb.1:
+  ; CHECK:   successors: %bb.2(0x80000000)
+  ; CHECK:   %4.sub0:vreg_96 = V_MUL_F32_e32 %4.sub0, %4.sub0, implicit $mode, implicit $exec
+  ; CHECK:   %4.sub1:vreg_96 = V_MUL_F32_e32 %4.sub1, %4.sub1, implicit $mode, implicit $exec
+  ; CHECK: bb.2:
+  ; CHECK:   S_ENDPGM 0, implicit %4
+  bb.0:
+    liveins: $sgpr0
+    %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+    %1:vgpr_32 = COPY %0:vgpr_32
+    %2:vgpr_32 = COPY %0:vgpr_32
+    %3:sreg_64 = COPY $sgpr0
+    $exec = S_MOV_B64_term %3:sreg_64
+    S_CBRANCH_EXECZ %bb.2, implicit $exec
+    S_BRANCH %bb.1
+
+  bb.1:
+    %1:vgpr_32 = V_MUL_F32_e32 %1:vgpr_32, %1:vgpr_32, implicit $mode, implicit $exec
+    %2:vgpr_32 = V_MUL_F32_e32 %2:vgpr_32, %2:vgpr_32, implicit $mode, implicit $exec
+
+  bb.2:
+    undef %4.sub0:vreg_96 = COPY %1:vgpr_32
+    %4.sub1:vreg_96 = COPY %2:vgpr_32
+    S_ENDPGM 0, implicit %4
+...
