Skip to content

[AMDGPU][True16][Codegen] keep srcmod/clamp/omod from v_s_xxx_f16 when moved to VALU #144781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 19, 2025

Conversation

broxigarchen
Copy link
Contributor

@broxigarchen broxigarchen commented Jun 18, 2025

#141152 causes an issue in v_s_xxx_f16 lowering in both true16/fake16 flow.

V_S_XXX_F16 are special insts which has scalar input/output but in VALU VOP3 format. Need to keep the srcmod/clamp/omod when lower it to its corresponding VALU inst with vector input/output.

Copy link

github-actions bot commented Jun 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@broxigarchen broxigarchen changed the title keep src mod and clamp for v_s_xxx_f16 lowering [AMDGPU][True16][Codegen] keep srcmod/clamp/omod for v_s_xxx_f16 when moved to VALU Jun 18, 2025
@broxigarchen broxigarchen marked this pull request as ready for review June 18, 2025 19:03
@broxigarchen broxigarchen requested review from Sisyph and kosarev June 18, 2025 19:03
@broxigarchen broxigarchen requested a review from arsenm June 18, 2025 19:04
@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

#141152 causes an issue in v_s_xxx_f16 lowering in both true16/fake16 flow.

V_S_XXX_F16 are special insts which has scalar input/output but in VALU VOP3 format. Need to keep the srcmod/clamp/omod when lower it to its corresponding VALU inst with vector input/output.


Full diff: https://github.com/llvm/llvm-project/pull/144781.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+4-4)
  • (added) llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-fake16.mir (+78)
  • (added) llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-true16.mir (+78)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 2ebf8b99e9d7b..2b4d6a35d6b5b 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -7744,11 +7744,11 @@ void SIInstrInfo::moveToVALUImpl(SIInstrWorklist &Worklist,
                                                     ? &AMDGPU::VGPR_16RegClass
                                                     : &AMDGPU::VGPR_32RegClass);
     auto NewInstr = BuildMI(*MBB, Inst, DL, get(NewOpcode), NewDst)
-                        .addImm(0) // src0_modifiers
+                        .add(Inst.getOperand(1)) // src0_modifiers
                         .add(Inst.getOperand(2))
-                        .addImm(0)  // clamp
-                        .addImm(0); // omod
-    if (ST.useRealTrue16Insts())
+                        .add(Inst.getOperand(3))  // clamp
+                        .add(Inst.getOperand(4)); // omod
+    if (AMDGPU::hasNamedOperand(NewOpcode, AMDGPU::OpName::op_sel))
       NewInstr.addImm(0); // opsel0
     MRI.replaceRegWith(Inst.getOperand(0).getReg(), NewDst);
     legalizeOperandsVALUt16(*NewInstr, MRI);
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-fake16.mir b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-fake16.mir
new file mode 100644
index 0000000000000..4030b54c1b5cf
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-fake16.mir
@@ -0,0 +1,78 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -mattr=-real-true16 -run-pass=si-fix-sgpr-copies -verify-machineinstrs -o - %s | FileCheck --check-prefixes=GCN %s
+
+---
+name:            v_s_exp_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_exp_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_EXP_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_EXP_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_EXP_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name:            v_s_log_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_log_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_LOG_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_LOG_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_LOG_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name:            v_s_rcp_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_rcp_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_RCP_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_RCP_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_RCP_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name:            v_s_rsq_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_rsq_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_RSQ_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_RSQ_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_RSQ_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name:            v_s_sqrt_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_sqrt_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_SQRT_F16_fake16_e64_:%[0-9]+]]:vgpr_32 = V_SQRT_F16_fake16_e64 1, [[V_CVT_F32_U32_e64_]], 1, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_SQRT_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-true16.mir b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-true16.mir
new file mode 100644
index 0000000000000..4f4b98c0419cc
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-gfx12-true16.mir
@@ -0,0 +1,78 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -mattr=+real-true16 -run-pass=si-fix-sgpr-copies -verify-machineinstrs -o - %s | FileCheck --check-prefixes=GCN %s
+
+---
+name:            v_s_exp_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_exp_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_EXP_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_EXP_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_EXP_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name:            v_s_log_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_log_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_LOG_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_LOG_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_LOG_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name:            v_s_rcp_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_rcp_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_RCP_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_RCP_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_RCP_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name:            v_s_rsq_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_rsq_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_RSQ_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_RSQ_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_RSQ_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+
+---
+name:            v_s_sqrt_f16
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: v_s_sqrt_f16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_CVT_F32_U32_e64_:%[0-9]+]]:vgpr_32 = V_CVT_F32_U32_e64 [[DEF]], 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_SQRT_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_SQRT_F16_t16_e64 1, [[V_CVT_F32_U32_e64_]].lo16, 1, 0, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0, implicit $mode, implicit $exec
+    %2:sreg_32 = COPY %1:vgpr_32
+    %3:sreg_32_xexec = V_S_SQRT_F16_e64 1, %2:sreg_32, 1, 0, implicit $mode, implicit $exec
+...
+

@broxigarchen broxigarchen force-pushed the main-fix-v-s-xx branch 3 times, most recently from 2579ac6 to 5b1ecb7 Compare June 18, 2025 19:17
Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@broxigarchen broxigarchen changed the title [AMDGPU][True16][Codegen] keep srcmod/clamp/omod for v_s_xxx_f16 when moved to VALU [AMDGPU][True16][Codegen] keep srcmod/clamp/omod from v_s_xxx_f16 when moved to VALU Jun 18, 2025
@broxigarchen broxigarchen merged commit e75e248 into llvm:main Jun 19, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants