Commit 58ec754

AMDGPU: Clear kill flags after FoldZeroHighBits (llvm#99582)
After folding, every use of the result register is replaced by the operand register. The kill flags on the uses of the result and operand registers may no longer be valid after the replacement and need to be cleared. The one exception is when the kill flag is set on the operand register at the folded instruction: the last use of the result register then becomes the new last use of the operand register, so the kill flags can safely be kept.

Change-Id: I60dfe5d031d6a86d41f41113c284e6944faa4e02
1 parent d61b25f commit 58ec754

2 files changed, +57 -1 lines changed

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Lines changed: 3 additions & 1 deletion
@@ -1346,7 +1346,9 @@ bool SIFoldOperands::tryFoldZeroHighBits(MachineInstr &MI) const {
     return false;
 
   Register Dst = MI.getOperand(0).getReg();
-  MRI->replaceRegWith(Dst, SrcDef->getOperand(0).getReg());
+  MRI->replaceRegWith(Dst, Src1);
+  if (!MI.getOperand(2).isKill())
+    MRI->clearKillFlags(Src1);
   MI.eraseFromParent();
   return true;
 }
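
The rule in the hunk above can be tried out in isolation. What follows is a minimal sketch, not LLVM code: `Use`, `Inst`, and the free functions are hypothetical stand-ins that only mimic what `MachineRegisterInfo::replaceRegWith` and `MachineRegisterInfo::clearKillFlags` do to kill flags, and each instruction is reduced to the list of registers it reads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a register use operand: which virtual register
// is read and whether this use is flagged as the last one ("killed").
struct Use {
  unsigned Reg;
  bool Kill;
};

// One instruction, reduced to the registers it reads.
struct Inst {
  std::vector<Use> Uses;
};

// Toy analogue of replaceRegWith: rename every use of From to To.
// Kill flags travel with the operands unchanged.
void replaceRegWith(std::vector<Inst> &Block, unsigned From, unsigned To) {
  assert(From != To && "replacing a register with itself");
  for (Inst &I : Block)
    for (Use &U : I.Uses)
      if (U.Reg == From)
        U.Reg = To;
}

// Toy analogue of clearKillFlags: drop the kill flag from every use of Reg.
void clearKillFlags(std::vector<Inst> &Block, unsigned Reg) {
  for (Inst &I : Block)
    for (Use &U : I.Uses)
      if (U.Reg == Reg)
        U.Kill = false;
}

// Fold away the masking instruction at AndIdx. Its result register is Dst
// and its first (only modelled) use is the masked operand Src1.
void foldZeroHighBits(std::vector<Inst> &Block, std::size_t AndIdx,
                      unsigned Dst) {
  const Use Src1 = Block[AndIdx].Uses.at(0); // copy before erasing
  replaceRegWith(Block, Dst, Src1.Reg);
  // Kill flags inherited from Dst are only trustworthy if the folded
  // instruction itself was the last use of Src1.
  if (!Src1.Kill)
    clearKillFlags(Block, Src1.Reg);
  Block.erase(Block.begin() + static_cast<std::ptrdiff_t>(AndIdx));
}
```

In this model, folding a mask whose operand is still read later leaves no kill flag on that operand, while folding a mask that was itself the operand's last use keeps the kill flag inherited from the result register.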
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx940 -verify-machineinstrs -run-pass si-fold-operands -o - %s | FileCheck -enable-var-scope -check-prefix=GCN %s
+
+---
+name: fold_zero_high_bits_src1_alive
+tracksRegLiveness: true
+
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+    ; GCN-LABEL: name: fold_zero_high_bits_src1_alive
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{ $}}
+    ; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[V_ADD_U16_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U16_e64 [[COPY]], 1, 0, implicit $exec
+    ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
+    ; GCN-NEXT: [[V_MUL_U32_U24_e64_:%[0-9]+]]:vgpr_32 = V_MUL_U32_U24_e64 [[V_ADD_U16_e64_]], 1, 0, implicit $exec
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: [[V_SUB_U16_e64_:%[0-9]+]]:vgpr_32 = V_SUB_U16_e64 [[V_ADD_U16_e64_]], [[COPY1]], 0, implicit $exec
+    %0:vgpr_32 = COPY $vgpr0
+    %1:sreg_32 = S_MOV_B32 1
+    %2:vgpr_32 = V_ADD_U16_e64 %0:vgpr_32, %1:sreg_32, 0, implicit $exec
+    %3:sreg_32 = S_MOV_B32 65535
+    %4:vgpr_32 = V_AND_B32_e64 %3:sreg_32, %2:vgpr_32, implicit $exec
+    %5:vgpr_32 = V_MUL_U32_U24_e64 killed %4:vgpr_32, %1:sreg_32, 0, implicit $exec
+    %6:vgpr_32 = COPY $vgpr1
+    %7:vgpr_32 = V_SUB_U16_e64 %2:vgpr_32, %6:vgpr_32, 0, implicit $exec
+...
+
+---
+name: fold_zero_high_bits_src1_killed
+tracksRegLiveness: true
+
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+    ; GCN-LABEL: name: fold_zero_high_bits_src1_killed
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{ $}}
+    ; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[V_ADD_U16_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U16_e64 [[COPY]], 1, 0, implicit $exec
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: [[V_SUB_U16_e64_:%[0-9]+]]:vgpr_32 = V_SUB_U16_e64 [[V_ADD_U16_e64_]], [[COPY1]], 0, implicit $exec
+    ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
+    ; GCN-NEXT: [[V_MUL_U32_U24_e64_:%[0-9]+]]:vgpr_32 = V_MUL_U32_U24_e64 killed [[V_ADD_U16_e64_]], 1, 0, implicit $exec
+    %0:vgpr_32 = COPY $vgpr0
+    %1:sreg_32 = S_MOV_B32 1
+    %2:vgpr_32 = V_ADD_U16_e64 %0:vgpr_32, %1:sreg_32, 0, implicit $exec
+    %6:vgpr_32 = COPY $vgpr1
+    %7:vgpr_32 = V_SUB_U16_e64 %2:vgpr_32, %6:vgpr_32, 0, implicit $exec
+    %3:sreg_32 = S_MOV_B32 65535
+    %4:vgpr_32 = V_AND_B32_e64 %3:sreg_32, killed %2:vgpr_32, implicit $exec
+    %5:vgpr_32 = V_MUL_U32_U24_e64 killed %4:vgpr_32, %1:sreg_32, 0, implicit $exec
+...
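
The two tests differ only in whether `%2` is read again after the masked value's last use. The sketch below shows roughly the kind of inconsistency the first test guards against: a register being read after an earlier use was flagged `killed`. It is a deliberately simplified, straight-line check written for illustration. The names `Use`, `Inst`, and `firstUseAfterKill` are hypothetical, and the checks that `-verify-machineinstrs` actually performs are considerably more involved.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical operand model: a register read plus its kill flag.
struct Use {
  unsigned Reg;
  bool Kill;
};

// One instruction, reduced to the registers it reads.
struct Inst {
  std::vector<Use> Uses;
};

// Returns the index of the first instruction that reads a register after an
// earlier instruction flagged that register as killed, or -1 if the kill
// flags in this straight-line block are consistent.
int firstUseAfterKill(const std::vector<Inst> &Block) {
  std::set<unsigned> Killed;
  for (std::size_t Idx = 0; Idx < Block.size(); ++Idx) {
    // A read of an already-killed register is the inconsistency we look for.
    for (const Use &U : Block[Idx].Uses)
      if (Killed.count(U.Reg))
        return static_cast<int>(Idx);
    // Record kills only after checking, so a killing use is itself legal.
    for (const Use &U : Block[Idx].Uses)
      if (U.Kill)
        Killed.insert(U.Reg);
  }
  return -1;
}
```

Applied to `fold_zero_high_bits_src1_alive`, a fold that renamed `%4` to `%2` but kept the `killed` flag on the `V_MUL_U32_U24_e64` operand would make the later `V_SUB_U16_e64` read a killed register, which this check reports. With the flag cleared, or in the `src1_killed` ordering where the multiply comes last, the check finds nothing.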
