Commit 3309219

[AMDGPU] Select AGPR in PHI operand legalization
If a PHI defines an AGPR, legalize its operands to AGPR as well. At the moment we can get an AGPR PHI with VGPR operands. I am not aware of any problems, as this seems to be handled gracefully in RA, but it is not right anyway. It also slightly decreases VGPR pressure in some cases because we do not have to copy via a VGPR.

Differential Revision: https://reviews.llvm.org/D69206

llvm-svn: 375446
1 parent 7c15c4f commit 3309219

2 files changed: +56 -1 lines changed

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Lines changed: 4 additions & 0 deletions
@@ -4576,6 +4576,10 @@ void SIInstrInfo::legalizeOperands(MachineInstr &MI,
           VRC = RI.hasAGPRs(getOpRegClass(MI, 0))
                     ? RI.getEquivalentAGPRClass(SRC)
                     : RI.getEquivalentVGPRClass(SRC);
+      } else {
+        VRC = RI.hasAGPRs(getOpRegClass(MI, 0))
+                  ? RI.getEquivalentAGPRClass(VRC)
+                  : RI.getEquivalentVGPRClass(VRC);
       }
       RC = VRC;
     } else {
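The selection rule in this hunk can be modeled outside LLVM. The following is a minimal sketch, not the actual `SIInstrInfo` code: `Bank`, `RegClass`, `withBank`, and `selectPhiOperandClass` are hypothetical names introduced for illustration, and it assumes the branch not shown in the hunk is the one taken when no vector class has been found among the inputs yet. It shows that the target bank comes from the PHI result (operand 0), while the width comes from whichever input class is available.

```cpp
#include <cassert>

// Hypothetical stand-ins for LLVM register classes; these are NOT the real
// TargetRegisterClass objects used by SIInstrInfo.
enum class Bank { SGPR, VGPR, AGPR };

struct RegClass {
  Bank bank;
  unsigned bits; // register width in bits
};

// Models the idea behind getEquivalentAGPRClass / getEquivalentVGPRClass:
// keep the width, change the bank.
static RegClass withBank(const RegClass &rc, Bank bank) {
  return RegClass{bank, rc.bits};
}

// Picks the class all PHI inputs are legalized to.
//   def: class of the PHI result (operand 0)
//   vrc: vector class already found among the inputs, or nullptr if none
//   src: scalar class found among the inputs, used only when vrc is null
static RegClass selectPhiOperandClass(const RegClass &def, const RegClass *vrc,
                                      const RegClass &src) {
  // An AGPR result pulls the operands into AGPRs; otherwise use VGPRs.
  const Bank target = (def.bank == Bank::AGPR) ? Bank::AGPR : Bank::VGPR;
  // The added else-branch corresponds to the vrc != nullptr case.
  const RegClass &base = vrc ? *vrc : src;
  return withBank(base, target);
}
```

Under this model, a PHI whose result is a 1024-bit AGPR class and whose inputs were previously classified as a 1024-bit VGPR class now yields a 1024-bit AGPR class for the operands, which is the case the commit message describes.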

llvm/test/CodeGen/AMDGPU/mfma-loop.ll

Lines changed: 52 additions & 1 deletion
@@ -1,13 +1,64 @@
 ; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
 
 ; GCN-LABEL: {{^}}test_mfma_loop_zeroinit:
-; GCN-COUNT32: v_accvgpr_write_b32
+
+; Check that we do not use 32 temp vgprs, but rotate 3 vgprs only.
+; 3 vgprs are needed to avoid wait states between writes.
+
+; FIXME: We should not be using and temporary registers at all.
+; At the moment we initialize an sgpr, then copy it via vgprs.
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2:v[0-9]+]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3:v[0-9]+]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1:v[0-9]+]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; Check that we do not copy agprs to vgprs and back inside the loop.
+
 ; GCN: [[LOOP:BB[0-9_]+]]:
 ; GCN-NOT: v_accvgpr
 ; GCN: v_mfma_f32_32x32x1f32
 ; GCN-NOT: v_accvgpr
 ; GCN: s_cbranch_scc1 [[LOOP]]
+
+; Final result should be read only once after the loop.
+
 ; GCN-COUNT32: v_accvgpr_read_b32
+
 define amdgpu_kernel void @test_mfma_loop_zeroinit(<32 x float> addrspace(1)* %arg) {
 entry:
   br label %for.cond.preheader
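The added check lines capture three VGPR names (`TMP1`, `TMP2`, `TMP3`) and require every later `v_accvgpr_write_b32` to reuse one of them. As a rough illustration of the property being tested, the sketch below counts the distinct VGPR sources of such writes in a list of assembly lines. `distinctWriteSources` is a hypothetical helper written for this note; it is not part of LLVM or FileCheck, and the assumed operand syntax (`aN, vM`) is taken only from the patterns in the test.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of LLVM or FileCheck: collects the distinct
// VGPR names used as the source operand of v_accvgpr_write_b32.
static std::set<std::string>
distinctWriteSources(const std::vector<std::string> &asmLines) {
  // Matches e.g. "v_accvgpr_write_b32 a5, v2" and captures "v2".
  static const std::regex writeRe(
      R"(v_accvgpr_write_b32\s+a[0-9]+,\s*(v[0-9]+))");
  std::set<std::string> sources;
  for (const std::string &line : asmLines) {
    std::smatch m;
    if (std::regex_search(line, m, writeRe))
      sources.insert(m[1].str());
  }
  return sources;
}
```

If the rotation described in the test comments holds, calling this on the instructions before the loop would return a set of size 3, however many writes there are. Unlike the FileCheck patterns, this sketch does not verify the order in which the temporaries are used.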
