Skip to content

AMDGPU: Clear kill flags after FoldZeroHighBits #99582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 19, 2024
Merged

Conversation

changpeng
Copy link
Contributor

@changpeng changpeng commented Jul 18, 2024

After folding, all uses of the result register are going to be replaced by the operand register. The kill flags on the uses of the result and operand registers are no longer valid after the replacement, and need to be cleared.
The only exception is, however, if the kill flag is set for the operand register, we are sure the last use of the result register is the new last use of the operand register, and thus we are safe to keep the kill flags.

  After the folding, all the uses of the result register are going to be replaced
by the operand register. The kill flags on the uses of the result regster are no
longer valid after the replacement.

  The only exception is, however, if the kill flag is set for the operand register,
we are sure the last use of the result register is the new last use of the operand
register, and thus we are safe to keep the kill flags.
@llvmbot
Copy link
Member

llvmbot commented Jul 18, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Changpeng Fang (changpeng)

Changes

After the folding, all the uses of the result register are going to be replaced by the operand register. The kill flags on the uses of the result register are no longer valid after the replacement.
The only exception is, however, if the kill flag is set for the operand register, we are sure the last use of the result register is the new last use of the operand register, and thus we are safe to keep the kill flags.


Full diff: https://github.com/llvm/llvm-project/pull/99582.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+2)
  • (added) llvm/test/CodeGen/AMDGPU/fold-zero-high-bits-clear-kill-flags.mir (+54)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 0e8c96625b221..a1524a2c051f7 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -1361,6 +1361,8 @@ bool SIFoldOperands::tryFoldZeroHighBits(MachineInstr &MI) const {
     return false;
 
   Register Dst = MI.getOperand(0).getReg();
+  if (!MI.getOperand(2).isKill())
+    MRI->clearKillFlags(Dst);
   MRI->replaceRegWith(Dst, SrcDef->getOperand(0).getReg());
   MI.eraseFromParent();
   return true;
diff --git a/llvm/test/CodeGen/AMDGPU/fold-zero-high-bits-clear-kill-flags.mir b/llvm/test/CodeGen/AMDGPU/fold-zero-high-bits-clear-kill-flags.mir
new file mode 100644
index 0000000000000..baaca76bfd8a8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fold-zero-high-bits-clear-kill-flags.mir
@@ -0,0 +1,54 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx940 -verify-machineinstrs -run-pass si-fold-operands -o - %s | FileCheck -enable-var-scope -check-prefix=GCN %s
+
+---
+name:            fold_zero_high_bits_src1_alive
+tracksRegLiveness: true
+
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+    ; GCN-LABEL: name: fold_zero_high_bits_src1_alive
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[V_ADD_U16_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U16_e64 [[COPY]], 1, 0, implicit $exec
+    ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
+    ; GCN-NEXT: [[V_MUL_U32_U24_e64_:%[0-9]+]]:vgpr_32 = V_MUL_U32_U24_e64 [[V_ADD_U16_e64_]], 1, 0, implicit $exec
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: [[V_SUB_U16_e64_:%[0-9]+]]:vgpr_32 = V_SUB_U16_e64 [[V_ADD_U16_e64_]], [[COPY1]], 0, implicit $exec
+    %0:vgpr_32 = COPY $vgpr0
+    %1:sreg_32 = S_MOV_B32 1
+    %2:vgpr_32 = V_ADD_U16_e64 %0:vgpr_32, %1:sreg_32, 0, implicit $exec
+    %3:sreg_32 = S_MOV_B32 65535
+    %4:vgpr_32 = V_AND_B32_e64 %3:sreg_32, %2:vgpr_32, implicit $exec
+    %5:vgpr_32 = V_MUL_U32_U24_e64 killed %4:vgpr_32, %1:sreg_32, 0, implicit $exec
+    %6:vgpr_32 = COPY $vgpr1
+    %7:vgpr_32 = V_SUB_U16_e64 %2:vgpr_32, %6:vgpr_32, 0, implicit $exec
+...
+
+---
+name:            fold_zero_high_bits_src1_killed
+tracksRegLiveness: true
+
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+    ; GCN-LABEL: name: fold_zero_high_bits_src1_killed
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[V_ADD_U16_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U16_e64 [[COPY]], 1, 0, implicit $exec
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: [[V_SUB_U16_e64_:%[0-9]+]]:vgpr_32 = V_SUB_U16_e64 [[V_ADD_U16_e64_]], [[COPY1]], 0, implicit $exec
+    ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
+    ; GCN-NEXT: [[V_MUL_U32_U24_e64_:%[0-9]+]]:vgpr_32 = V_MUL_U32_U24_e64 killed [[V_ADD_U16_e64_]], 1, 0, implicit $exec
+    %0:vgpr_32 = COPY $vgpr0
+    %1:sreg_32 = S_MOV_B32 1
+    %2:vgpr_32 = V_ADD_U16_e64 %0:vgpr_32, %1:sreg_32, 0, implicit $exec
+    %6:vgpr_32 = COPY $vgpr1
+    %7:vgpr_32 = V_SUB_U16_e64 %2:vgpr_32, %6:vgpr_32, 0, implicit $exec
+    %3:sreg_32 = S_MOV_B32 65535
+    %4:vgpr_32 = V_AND_B32_e64 %3:sreg_32, killed %2:vgpr_32, implicit $exec
+    %5:vgpr_32 = V_MUL_U32_U24_e64 killed %4:vgpr_32, %1:sreg_32, 0, implicit $exec
+...

@changpeng changpeng requested a review from rampitec July 18, 2024 22:23
@changpeng changpeng merged commit 25f4bd8 into llvm:main Jul 19, 2024
7 checks passed
@changpeng changpeng deleted the folding branch July 19, 2024 07:46
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 22, 2024
After folding, all uses of the result register are going to be replaced
by the operand register. The kill flags on the uses of the result and
operand registers are no longer valid after the replacement, and need to
be cleared.
The only exception is, however, if the kill flag is set for the operand
register, we are sure the last use of the result register is the new
last use of the operand register, and thus we are safe to keep the kill
flags.

Change-Id: I60dfe5d031d6a86d41f41113c284e6944faa4e02
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
After folding, all uses of the result register are going to be replaced
by the operand register. The kill flags on the uses of the result and
operand registers are no longer valid after the replacement, and need to
be cleared.
The only exception is, however, if the kill flag is set for the operand
register, we are sure the last use of the result register is the new
last use of the operand register, and thus we are safe to keep the kill
flags.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251304
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants