[AMDGPU][True16][CodeGen] fold clamp update for true16 #128919

broxigarchen · 2025-02-26T17:32:12Z

Check through COPY for possible clamp folding for v_mad_mixhi_f16 isel

llvmbot · 2025-03-04T02:10:47Z

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

Check through COPY for possible clamp folding for v_mad_mixhi_f16 isel

Full diff: https://github.com/llvm/llvm-project/pull/128919.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+5-1)
(modified) llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll (+7-22)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index eb9aabf8b6317..9d4a7f30baa29 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -1721,6 +1721,10 @@ bool SIFoldOperandsImpl::tryFoldClamp(MachineInstr &MI) {
     return false;
 
   MachineInstr *Def = MRI->getVRegDef(ClampSrc->getReg());
+  MachineInstr *OrigDef = Def;
+  // Look through COPY. COPY only observed with True16.
+  if (Def->isCopy() && Def->getOperand(1).getReg().isVirtual())
+    Def = MRI->getVRegDef(Def->getOperand(1).getReg());
 
   // The type of clamp must be compatible.
   if (TII->getClampMask(*Def) != TII->getClampMask(MI))
@@ -1738,7 +1742,7 @@ bool SIFoldOperandsImpl::tryFoldClamp(MachineInstr &MI) {
   // Clamp is applied after omod, so it is OK if omod is set.
   DefClamp->setImm(1);
 
-  Register DefReg = Def->getOperand(0).getReg();
+  Register DefReg = OrigDef->getOperand(0).getReg();
   Register MIDstReg = MI.getOperand(0).getReg();
   if (TRI->isSGPRReg(*MRI, DefReg)) {
     // Pseudo scalar instructions have a SGPR for dst and clamp is a v_max*
diff --git a/llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll b/llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
index 9949b823dfec1..51965acec9573 100644
--- a/llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
+++ b/llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
@@ -269,19 +269,11 @@ define half @v_mad_mixlo_f16_f16lo_f16lo_f32(half %src0, half %src1, float %src2
 }
 
 define half @v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_post_cvt(half %src0, half %src1, float %src2) #0 {
-; SDAG-GFX1100-TRUE16-LABEL: v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_post_cvt:
-; SDAG-GFX1100-TRUE16:       ; %bb.0:
-; SDAG-GFX1100-TRUE16-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SDAG-GFX1100-TRUE16-NEXT:    v_fma_mixlo_f16 v0, v0, v1, v2 op_sel_hi:[1,1,0]
-; SDAG-GFX1100-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; SDAG-GFX1100-TRUE16-NEXT:    v_max_f16_e64 v0.l, v0.l, v0.l clamp
-; SDAG-GFX1100-TRUE16-NEXT:    s_setpc_b64 s[30:31]
-;
-; SDAG-GFX1100-FAKE16-LABEL: v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_post_cvt:
-; SDAG-GFX1100-FAKE16:       ; %bb.0:
-; SDAG-GFX1100-FAKE16-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SDAG-GFX1100-FAKE16-NEXT:    v_fma_mixlo_f16 v0, v0, v1, v2 op_sel_hi:[1,1,0] clamp
-; SDAG-GFX1100-FAKE16-NEXT:    s_setpc_b64 s[30:31]
+; GFX1100-LABEL: v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_post_cvt:
+; GFX1100:       ; %bb.0:
+; GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1100-NEXT:    v_fma_mixlo_f16 v0, v0, v1, v2 op_sel_hi:[1,1,0] clamp
+; GFX1100-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_post_cvt:
 ; GFX900:       ; %bb.0:
@@ -312,12 +304,6 @@ define half @v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_post_cvt(half %src0, half %sr
 ; SDAG-CI-NEXT:    v_cvt_f32_f16_e64 v0, v0 clamp
 ; SDAG-CI-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GISEL-GFX1100-LABEL: v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_post_cvt:
-; GISEL-GFX1100:       ; %bb.0:
-; GISEL-GFX1100-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GISEL-GFX1100-NEXT:    v_fma_mixlo_f16 v0, v0, v1, v2 op_sel_hi:[1,1,0] clamp
-; GISEL-GFX1100-NEXT:    s_setpc_b64 s[30:31]
-;
 ; GISEL-CI-LABEL: v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_post_cvt:
 ; GISEL-CI:       ; %bb.0:
 ; GISEL-CI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
@@ -1546,10 +1532,9 @@ define <2 x half> @v_mad_mix_v2f32_clamp_postcvt_lo(<2 x half> %src0, <2 x half>
 ; SDAG-GFX1100-TRUE16-NEXT:    v_mov_b16_e32 v4.l, v1.l
 ; SDAG-GFX1100-TRUE16-NEXT:    v_mov_b16_e32 v5.l, v2.l
 ; SDAG-GFX1100-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; SDAG-GFX1100-TRUE16-NEXT:    v_fma_mixlo_f16 v3, v3, v4, v5 op_sel_hi:[1,1,1]
-; SDAG-GFX1100-TRUE16-NEXT:    v_max_f16_e64 v3.l, v3.l, v3.l clamp
-; SDAG-GFX1100-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; SDAG-GFX1100-TRUE16-NEXT:    v_fma_mixlo_f16 v3, v3, v4, v5 op_sel_hi:[1,1,1] clamp
 ; SDAG-GFX1100-TRUE16-NEXT:    v_fma_mixhi_f16 v3, v0, v1, v2 op_sel:[1,1,1] op_sel_hi:[1,1,1]
+; SDAG-GFX1100-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; SDAG-GFX1100-TRUE16-NEXT:    v_mov_b32_e32 v0, v3
 ; SDAG-GFX1100-TRUE16-NEXT:    s_setpc_b64 s[30:31]
 ;

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

broxigarchen · 2025-03-06T17:27:16Z

ping!

llvm/test/CodeGen/AMDGPU/true16-fold.mir

github-actions · 2025-03-09T02:43:22Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/test/CodeGen/AMDGPU/true16-fold.mir

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

broxigarchen · 2025-03-17T03:01:04Z

CI error is not related

broxigarchen · 2025-03-18T20:09:22Z

ping!

broxigarchen · 2025-03-24T14:45:11Z

rebased

broxigarchen · 2025-03-31T18:20:15Z

All comments on this PR has been address. If there is no other comments on it I will merge this tomorrow. Thanks!

broxigarchen · 2025-04-01T15:03:55Z

rebased

Sisyph · 2025-04-01T18:56:27Z

llvm/test/CodeGen/AMDGPU/true16-fold.mir

+    %3:sreg_32 = IMPLICIT_DEF
+    %4:vgpr_32 = COPY %3
+    %5:vgpr_32 = nofpexcept V_FMA_MIXLO_F16 8, %2, 8, %1, 0, %0, 0, %4, 0, 0, implicit $mode, implicit $exec
+    %6:vgpr_16 = nofpexcept V_MAX_F16_t16_e64 0, %5.lo16, 0, %5.lo16, -1, 0, 0, implicit $mode, implicit $exec


It would be nice to make this case and the other subreg case fold, as this is a more correct MIR than in fold_16bit_madmix_clamp (COPY of different size src and dst in that one)

I think we can keep it like this as we should not hit this case anyway.

We currently have the problem of different size COPY and thus we actually hit the case in line 5. We will need to support the true16 version of V_FMA_MIXLO_F16 to fix the COPY issue and thus we will have vpgr16 return without subreg.

llvm/test/CodeGen/AMDGPU/true16-fold.mir

Sisyph

LGTM

broxigarchen · 2025-04-02T21:10:47Z

rebased and resolved conflicts

broxigarchen force-pushed the fold-clamp branch from cec89d2 to 15912e6 Compare March 4, 2025 02:06

broxigarchen changed the title ~~true16 for fold clamp~~ AMDGPU][True16][CodeGen] fold clamp update for true16 Mar 4, 2025

broxigarchen requested review from mbrkusanin, arsenm, kosarev and Sisyph March 4, 2025 02:09

broxigarchen marked this pull request as ready for review March 4, 2025 02:10

llvmbot added the backend:AMDGPU label Mar 4, 2025

arsenm reviewed Mar 4, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp Outdated Show resolved Hide resolved

broxigarchen force-pushed the fold-clamp branch from 15912e6 to 91c8d4f Compare March 4, 2025 20:43

broxigarchen marked this pull request as draft March 4, 2025 20:46

broxigarchen force-pushed the fold-clamp branch from 91c8d4f to d57518f Compare March 4, 2025 21:51

broxigarchen marked this pull request as ready for review March 4, 2025 22:03

broxigarchen requested a review from arsenm March 4, 2025 22:06

broxigarchen mentioned this pull request Mar 4, 2025

[AMDGPU][True16][CodeGen] optimize codegen for mad-mix in true16 #124995

Merged

broxigarchen requested a review from jayfoad March 5, 2025 20:04

arsenm reviewed Mar 7, 2025

View reviewed changes

llvm/test/CodeGen/AMDGPU/true16-fold.mir Show resolved Hide resolved

broxigarchen force-pushed the fold-clamp branch from d57518f to 0789fcf Compare March 9, 2025 02:40

broxigarchen force-pushed the fold-clamp branch from 0789fcf to 68325c5 Compare March 9, 2025 03:42

broxigarchen requested a review from arsenm March 10, 2025 13:57

arsenm reviewed Mar 13, 2025

View reviewed changes

broxigarchen force-pushed the fold-clamp branch 2 times, most recently from 2982268 to 2cf303b Compare March 13, 2025 23:48

broxigarchen requested a review from arsenm March 14, 2025 18:09

broxigarchen changed the title ~~AMDGPU][True16][CodeGen] fold clamp update for true16~~ [AMDGPU][True16][CodeGen] fold clamp update for true16 Mar 15, 2025

broxigarchen force-pushed the fold-clamp branch from 2cf303b to 3dad0c3 Compare March 24, 2025 14:45

broxigarchen force-pushed the fold-clamp branch 2 times, most recently from 12a300c to dc5dbec Compare March 26, 2025 22:18

broxigarchen force-pushed the fold-clamp branch from dc5dbec to e9d0543 Compare April 1, 2025 15:03

Sisyph reviewed Apr 1, 2025

View reviewed changes

broxigarchen force-pushed the fold-clamp branch 2 times, most recently from d750f87 to 4842f7c Compare April 2, 2025 13:52

Sisyph approved these changes Apr 2, 2025

View reviewed changes

broxigarchen added 4 commits April 2, 2025 16:44

true16 for fold clamp

5d5dbdd

address comment

506a798

address comment

50498a6

use lookthrucopylike

a7d130c

broxigarchen force-pushed the fold-clamp branch from 4842f7c to a7d130c Compare April 2, 2025 21:10

broxigarchen merged commit 066787b into llvm:main Apr 2, 2025
6 of 9 checks passed

[AMDGPU][True16][CodeGen] fold clamp update for true16 #128919

[AMDGPU][True16][CodeGen] fold clamp update for true16 #128919

Uh oh!

Conversation

broxigarchen commented Feb 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 4, 2025

Uh oh!

Uh oh!

broxigarchen commented Mar 6, 2025

Uh oh!

Uh oh!

github-actions bot commented Mar 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

broxigarchen commented Mar 17, 2025

Uh oh!

broxigarchen commented Mar 18, 2025

Uh oh!

broxigarchen commented Mar 24, 2025

Uh oh!

broxigarchen commented Mar 31, 2025

Uh oh!

broxigarchen commented Apr 1, 2025

Uh oh!

Sisyph Apr 1, 2025

Choose a reason for hiding this comment

Uh oh!

broxigarchen Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Sisyph left a comment

Choose a reason for hiding this comment

Uh oh!

broxigarchen commented Apr 2, 2025

Uh oh!

Uh oh!

Uh oh!

broxigarchen commented Feb 26, 2025 •

edited

Loading

github-actions bot commented Mar 9, 2025 •

edited

Loading