[AMDGPU][True16][Codegen] keep srcmod/clamp/omod from v_s_xxx_f16 when moved to VALU #144781
✅ With the latest revision this PR passed the C/C++ code formatter.
@llvm/pr-subscribers-backend-amdgpu
Author: Brox Chen (broxigarchen)
Changes
#141152 causes an issue in v_s_xxx_f16 lowering in both the true16 and fake16 flows. The V_S_XXX_F16 instructions are special: they take scalar input and produce scalar output, but they are encoded in the VALU VOP3 format. When such an instruction is lowered to its corresponding VALU instruction with vector input/output, its srcmod/clamp/omod operands must be kept.
Full diff: https://github.com/llvm/llvm-project/pull/144781.diff
3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 2ebf8b99e9d7b..2b4d6a35d6b5b 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -7744,11 +7744,11 @@ void SIInstrInfo::moveToVALUImpl(SIInstrWorklist &Worklist,
                           ? &AMDGPU::VGPR_16RegClass
                           : &AMDGPU::VGPR_32RegClass);
       auto NewInstr = BuildMI(*MBB, Inst, DL, get(NewOpcode), NewDst)
-                          .addImm(0) // src0_modifiers
+                          .add(Inst.getOperand(1)) // src0_modifiers
                           .add(Inst.getOperand(2))
-                          .addImm(0) // clamp
-                          .addImm(0); // omod
-      if (ST.useRealTrue16Insts())
+                          .add(Inst.getOperand(3)) // clamp
+                          .add(Inst.getOperand(4)); // omod
+      if (AMDGPU::hasNamedOperand(NewOpcode, AMDGPU::OpName::op_sel))
         NewInstr.addImm(0); // opsel0
       MRI.replaceRegWith(Inst.getOperand(0).getReg(), NewDst);
       legalizeOperandsVALUt16(*NewInstr, MRI);
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-fake16.mir b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-fake16.mir
new file mode 100644
index 0000000000000..4030b54c1b5cf
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-fake16.mir
@@ -0,0 +1,78 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -mattr=-real-true16 -run-pass=si-fix-sgpr-copies -verify-machineinstrs -o - %s | FileCheck --check-prefixes=GCN %s
+
+---
+name: v_s_exp_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_exp_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_EXP_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_EXP_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_EXP_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name: v_s_log_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_log_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_LOG_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_LOG_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_LOG_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name: v_s_rcp_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_rcp_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_RCP_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_RCP_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_RCP_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name: v_s_rsq_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_rsq_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_RSQ_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_RSQ_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_RSQ_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name: v_s_sqrt_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_sqrt_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_SQRT_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_SQRT_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_SQRT_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-true16.mir b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-true16.mir
new file mode 100644
index 0000000000000..4f4b98c0419cc
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-true16.mir
@@ -0,0 +1,78 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -mattr=+real-true16 -run-pass=si-fix-sgpr-copies -verify-machineinstrs -o - %s | FileCheck --check-prefixes=GCN %s
+
+---
+name: v_s_exp_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_exp_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_EXP_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_EXP_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_EXP_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name: v_s_log_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_log_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_LOG_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_LOG_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_LOG_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name: v_s_rcp_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_rcp_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_RCP_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_RCP_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_RCP_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name: v_s_rsq_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_rsq_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_RSQ_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_RSQ_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_RSQ_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name: v_s_sqrt_f16
+body: |
+ bb.0.entry:
+ ; GCN-LABEL: name: v_s_sqrt_f16
+ ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[V_SQRT_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_SQRT_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+ %2:sreg_32 = COPY %1:vgpr_32
+ %3:sreg_32_xexec = V_S_SQRT_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
LGTM
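For readers unfamiliar with the operand layout, the forwarding done by the SIInstrInfo.cpp change can be sketched as a small standalone model. This is only an illustration under stated assumptions: `ToyInst`, `ToyOpcodeDesc`, and `lowerToVALU` are hypothetical names that do not exist in LLVM, and the operand order (dst, src0_modifiers, src0, clamp, omod) is inferred from the diff and the MIR tests in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction: an opcode name plus a
// flat operand list. Assumed layout for V_S_XXX_F16_e64:
//   [0] dst, [1] src0_modifiers, [2] src0, [3] clamp, [4] omod
struct ToyInst {
  std::string Opcode;
  std::vector<int64_t> Operands;
};

// Hypothetical description of the target VALU opcode. HasOpSel models the
// result of asking whether the opcode has a named op_sel operand.
struct ToyOpcodeDesc {
  std::string Name;
  bool HasOpSel;
};

// Build the VALU form, forwarding the modifier operands of the scalar
// instruction instead of resetting them to zero.
ToyInst lowerToVALU(const ToyInst &Scalar, const ToyOpcodeDesc &Desc,
                    int64_t NewDst) {
  assert(Scalar.Operands.size() == 5 && "unexpected operand count");
  ToyInst Vector;
  Vector.Opcode = Desc.Name;
  Vector.Operands.push_back(NewDst);
  Vector.Operands.push_back(Scalar.Operands[1]); // src0_modifiers (kept)
  Vector.Operands.push_back(Scalar.Operands[2]); // src0
  Vector.Operands.push_back(Scalar.Operands[3]); // clamp (kept)
  Vector.Operands.push_back(Scalar.Operands[4]); // omod (kept)
  // Append op_sel only when the target opcode actually has that operand,
  // rather than keying the decision off a subtarget feature.
  if (Desc.HasOpSel)
    Vector.Operands.push_back(0); // opsel0
  return Vector;
}
```

In this model, lowering an instruction with operands `{dst, 1, src, 1, 0}` to a fake16-style opcode without op_sel keeps the `1, src, 1, 0` tail, while a t16-style opcode with op_sel gets one extra trailing `0`, which mirrors the operand lists checked in the two MIR tests.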