
Commit 09feb6c

changpeng authored and yuxuanchen1997 committed
AMDGPU: Clear kill flags after FoldZeroHighBits (#99582)
Summary:
After folding, all uses of the result register are replaced by the operand register. The kill flags on the uses of the result and operand registers may no longer be valid after the replacement, so they need to be cleared. The one exception: if the kill flag is set on the operand register in the folded instruction, the last use of the result register becomes the new last use of the operand register, so the kill flags can safely be kept.

Differential Revision: https://phabricator.intern.facebook.com/D60251304
1 parent 6c9a6ab commit 09feb6c

2 files changed: 57 additions, 1 deletion

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Lines changed: 3 additions & 1 deletion
@@ -1361,7 +1361,9 @@ bool SIFoldOperands::tryFoldZeroHighBits(MachineInstr &MI) const {
     return false;

   Register Dst = MI.getOperand(0).getReg();
-  MRI->replaceRegWith(Dst, SrcDef->getOperand(0).getReg());
+  MRI->replaceRegWith(Dst, Src1);
+  if (!MI.getOperand(2).isKill())
+    MRI->clearKillFlags(Src1);
   MI.eraseFromParent();
   return true;
 }
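
The effect of the three added lines can be sketched outside LLVM with a toy model. This is only an illustration under stated assumptions: `Use`, `replaceRegWith`, `clearKillFlags`, and `foldZeroHighBits` below are hypothetical stand-ins written for this sketch, not the real `MachineOperand` or `MachineRegisterInfo` API, and the `Uses` vector holds only the operand uses that remain once the AND has been erased.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a register operand use (hypothetical, not MachineOperand).
struct Use {
  int Reg;     // virtual register number
  bool IsKill; // true if this use is marked as the last use of Reg
};

// Toy counterpart of replaceRegWith: rewrite every use of From to To.
// The kill flag stays on the operand, which is what makes it go stale.
void replaceRegWith(std::vector<Use> &Uses, int From, int To) {
  for (Use &U : Uses)
    if (U.Reg == From)
      U.Reg = To;
}

// Toy counterpart of clearKillFlags: drop the kill flag on every use of Reg.
void clearKillFlags(std::vector<Use> &Uses, int Reg) {
  for (Use &U : Uses)
    if (U.Reg == Reg)
      U.IsKill = false;
}

// Mirrors the shape of the patched code: replace Dst with Src1, then clear
// kill flags unless the folded AND itself killed Src1.
void foldZeroHighBits(std::vector<Use> &Uses, int Dst, int Src1,
                      bool AndKillsSrc1) {
  assert(Dst != Src1 && "result and operand must be distinct registers");
  replaceRegWith(Uses, Dst, Src1);
  if (!AndKillsSrc1)
    clearKillFlags(Uses, Src1);
}
```

With the uses `{killed %4, %2}` and an AND that does not kill `%2`, the model ends with two unflagged uses of `%2`. With the uses `{%2, killed %4}` and an AND that does kill `%2`, the flag survives on the final use.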
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx940 -verify-machineinstrs -run-pass si-fold-operands -o - %s | FileCheck -enable-var-scope -check-prefix=GCN %s
+
+---
+name: fold_zero_high_bits_src1_alive
+tracksRegLiveness: true
+
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+    ; GCN-LABEL: name: fold_zero_high_bits_src1_alive
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{ $}}
+    ; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[V_ADD_U16_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U16_e64 [[COPY]], 1, 0, implicit $exec
+    ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
+    ; GCN-NEXT: [[V_MUL_U32_U24_e64_:%[0-9]+]]:vgpr_32 = V_MUL_U32_U24_e64 [[V_ADD_U16_e64_]], 1, 0, implicit $exec
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: [[V_SUB_U16_e64_:%[0-9]+]]:vgpr_32 = V_SUB_U16_e64 [[V_ADD_U16_e64_]], [[COPY1]], 0, implicit $exec
+    %0:vgpr_32 = COPY $vgpr0
+    %1:sreg_32 = S_MOV_B32 1
+    %2:vgpr_32 = V_ADD_U16_e64 %0:vgpr_32, %1:sreg_32, 0, implicit $exec
+    %3:sreg_32 = S_MOV_B32 65535
+    %4:vgpr_32 = V_AND_B32_e64 %3:sreg_32, %2:vgpr_32, implicit $exec
+    %5:vgpr_32 = V_MUL_U32_U24_e64 killed %4:vgpr_32, %1:sreg_32, 0, implicit $exec
+    %6:vgpr_32 = COPY $vgpr1
+    %7:vgpr_32 = V_SUB_U16_e64 %2:vgpr_32, %6:vgpr_32, 0, implicit $exec
+...
+
+---
+name: fold_zero_high_bits_src1_killed
+tracksRegLiveness: true
+
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+    ; GCN-LABEL: name: fold_zero_high_bits_src1_killed
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{ $}}
+    ; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[V_ADD_U16_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U16_e64 [[COPY]], 1, 0, implicit $exec
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: [[V_SUB_U16_e64_:%[0-9]+]]:vgpr_32 = V_SUB_U16_e64 [[V_ADD_U16_e64_]], [[COPY1]], 0, implicit $exec
+    ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
+    ; GCN-NEXT: [[V_MUL_U32_U24_e64_:%[0-9]+]]:vgpr_32 = V_MUL_U32_U24_e64 killed [[V_ADD_U16_e64_]], 1, 0, implicit $exec
+    %0:vgpr_32 = COPY $vgpr0
+    %1:sreg_32 = S_MOV_B32 1
+    %2:vgpr_32 = V_ADD_U16_e64 %0:vgpr_32, %1:sreg_32, 0, implicit $exec
+    %6:vgpr_32 = COPY $vgpr1
+    %7:vgpr_32 = V_SUB_U16_e64 %2:vgpr_32, %6:vgpr_32, 0, implicit $exec
+    %3:sreg_32 = S_MOV_B32 65535
+    %4:vgpr_32 = V_AND_B32_e64 %3:sreg_32, killed %2:vgpr_32, implicit $exec
+    %5:vgpr_32 = V_MUL_U32_U24_e64 killed %4:vgpr_32, %1:sreg_32, 0, implicit $exec
+...
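
Why the first test would be invalid without the fix can be checked with a small consistency rule: within one basic block, no register may be read after a use that marked it killed. The sketch below is a simplified illustration of that rule only. `Use` and `killFlagsAreConsistent` are hypothetical names for this example, it assumes each register is defined once (as the virtual registers in these tests are), and it is not how LLVM's machine verifier is implemented.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for a register operand use (hypothetical, not MachineOperand).
struct Use {
  int Reg;     // virtual register number
  bool IsKill; // true if this use is marked as the last use of Reg
};

// Walks the uses of a single basic block in program order and reports whether
// every kill flag is honored: once a use kills a register, no later use may
// read that register. Assumes single definitions, so a killed register is
// never redefined later in the block.
bool killFlagsAreConsistent(const std::vector<Use> &Uses) {
  std::set<int> Dead;
  for (const Use &U : Uses) {
    if (Dead.count(U.Reg))
      return false; // read after kill
    if (U.IsKill)
      Dead.insert(U.Reg);
  }
  return true;
}
```

For fold_zero_high_bits_src1_alive, the uses of `%2` after folding are the multiply followed by the subtract. Keeping the stale flag from `killed %4` gives `{killed %2, %2}`, which the check rejects; clearing it gives `{%2, %2}`, which passes. For fold_zero_high_bits_src1_killed the order is subtract then multiply, so `{%2, killed %2}` passes with the flag kept.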
