[AMDGPU][True16][CodeGen] readfirstlane for vgpr16 copy to sgpr32 #137848

broxigarchen · 2025-04-29T17:30:13Z

i16 can be selected into sgpr32 or vgpr16 in isel lowering in true16 mode. And thus, it creates cases that we copy from vgpr16 to sgpr32 in ext selection and this seems inevitable without sgpr16 support.

legalize the src/dst reg when we decide to lower this special copy to a readfirstlane in fix-sgpr-copy pass and add a lit test

llvmbot · 2025-05-02T14:11:58Z

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

i16 can be selected into sgpr32 or vgpr16 in isel lowering in true16 mode. And thus, it creates cases that we copy from vgpr16 to sgpr32 in ext selection.

Add legalization when we decide to lower this copy to readfirstlane in fix-sgpr-copy pass and add a lit test

Full diff: https://github.com/llvm/llvm-project/pull/137848.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp (+17-4)
(modified) llvm/lib/Target/AMDGPU/VOP1Instructions.td (+1-8)
(added) llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-vgpr16-to-spgr32.ll (+41)

diff --git a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
index bb8e9a092e07c..47052a80bf125 100644
--- a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
@@ -1086,10 +1086,23 @@ void SIFixSGPRCopies::lowerVGPR2SGPRCopies(MachineFunction &MF) {
         TRI->getRegClassForOperandReg(*MRI, MI->getOperand(1));
     size_t SrcSize = TRI->getRegSizeInBits(*SrcRC);
     if (SrcSize == 16) {
-      // HACK to handle possible 16bit VGPR source
-      auto MIB = BuildMI(*MBB, MI, MI->getDebugLoc(),
-                         TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg);
-      MIB.addReg(SrcReg, 0, AMDGPU::NoSubRegister);
+      assert(MF.getSubtarget<GCNSubtarget>().useRealTrue16Insts() &&
+             "We do not expect to see 16-bit copies from VGPR to SGPR unless "
+             "we have 16-bit VGPRs");
+      assert(MRI->getRegClass(DstReg) == &AMDGPU::SGPR_LO16RegClass ||
+             MRI->getRegClass(DstReg) == &AMDGPU::SReg_32RegClass ||
+             MRI->getRegClass(DstReg) == &AMDGPU::SReg_32_XM0RegClass);
+      // There is no V_READFIRSTLANE_B16, so legalize the dst/src reg to 32 bits
+      if (MRI->getRegClass(DstReg) == &AMDGPU::SGPR_LO16RegClass)
+        MRI->setRegClass(DstReg, &AMDGPU::SReg_32RegClass);
+      Register VReg32 = MRI->createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+      const DebugLoc &DL = MI->getDebugLoc();
+      BuildMI(*MBB, MI, DL, TII->get(TargetOpcode::SUBREG_TO_REG), VReg32)
+          .addImm(0)
+          .addReg(SrcReg, 0)
+          .addImm(AMDGPU::lo16);
+      BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg)
+          .addReg(VReg32);
     } else if (SrcSize == 32) {
       auto MIB = BuildMI(*MBB, MI, MI->getDebugLoc(),
                          TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg);
diff --git a/llvm/lib/Target/AMDGPU/VOP1Instructions.td b/llvm/lib/Target/AMDGPU/VOP1Instructions.td
index 071f55ce16403..352a3f9c2d27f 100644
--- a/llvm/lib/Target/AMDGPU/VOP1Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP1Instructions.td
@@ -1472,16 +1472,9 @@ def : GCNPat <
 
 } // End OtherPredicates = [isGFX8Plus, p]
 
-let True16Predicate = UseFakeTrue16Insts in {
-def : GCNPat<
-  (i32 (DivergentUnaryFrag<anyext> i16:$src)),
-  (COPY $src)
->;
-} // End True16Predicate = UseFakeTrue16Insts
-
 let True16Predicate = UseRealTrue16Insts in {
 def : GCNPat<
-  (i32 (UniformUnaryFrag<anyext> (i16 SReg_32:$src))),
+  (i32 (UniformUnaryFrag<anyext> i16:$src)),
   (COPY $src)
 >;
 
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-vgpr16-to-spgr32.ll b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-vgpr16-to-spgr32.ll
new file mode 100644
index 0000000000000..0b42274f9553d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-vgpr16-to-spgr32.ll
@@ -0,0 +1,41 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+real-true16 < %s | FileCheck %s
+
+; expect readfirstlane to pick the 32bit register
+define amdgpu_gs i32 @vgpr16_copyto_sgpr(ptr addrspace(3) %a, i32 %b, ptr addrspace(1) %out) {
+; CHECK-LABEL: vgpr16_copyto_sgpr:
+; CHECK:       ; %bb.0: ; %entry
+; CHECK-NEXT:    ds_load_2addr_b32 v[0:1], v0 offset1:1
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    v_cvt_f16_f32_e32 v0.l, v0
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
+; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:    s_and_b32 s0, 0xffff, s0
+; CHECK-NEXT:    s_mul_i32 s0, s0, 5
+; CHECK-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
+; CHECK-NEXT:    s_cmp_lg_u32 s0, 2
+; CHECK-NEXT:    s_cbranch_scc1 .LBB0_2
+; CHECK-NEXT:  ; %bb.1: ; %a1
+; CHECK-NEXT:    s_mov_b32 s0, 1
+; CHECK-NEXT:    s_branch .LBB0_3
+; CHECK-NEXT:  .LBB0_2: ; %a2
+; CHECK-NEXT:    s_mov_b32 s0, 2
+; CHECK-NEXT:    s_branch .LBB0_3
+; CHECK-NEXT:  .LBB0_3:
+entry:
+  %1 = load <4 x float>, ptr addrspace(3) poison, align 4
+  %2 = extractelement <4 x float> %1, i32 0
+  %3 = fptrunc float %2 to half
+  %4 = bitcast half %3 to i16
+  %5 = zext i16 %4 to i32
+  %6 = add i32 %5, 1
+  %7 = mul i32 %6, 5
+  %8 = icmp eq i32 %7, 7
+  br i1 %8, label %a1, label %a2
+
+a1:
+  ret i32 1
+
+a2:
+  ret i32 2
+}

broxigarchen · 2025-05-02T14:22:24Z

There is a related PR #118037 created before. Trim the patch and reopened here

Sisyph · 2025-05-02T16:21:42Z

I would recommend not opening a new PR for a new version of a patch. It only serves to make finding previous discussion harder, and messing up people's subscribe/unsubscribe selections. The previous PR should be updated.

broxigarchen · 2025-05-02T16:33:19Z

I would recommend not opening a new PR for a new version of a patch. It only serves to make finding previous discussion harder, and messing up people's subscribe/unsubscribe selections. The previous PR should be updated.

as Joe suggested, close this and use the original PR

broxigarchen force-pushed the main-merge-true16-codegen-fix-readfirstlane-2 branch 3 times, most recently from 4e657dc to e945541 Compare May 2, 2025 05:19

broxigarchen changed the title ~~tmp~~ [AMDGPU][True16][MC] readfirstlane for vgpr16 copy to sgpr32 May 2, 2025

broxigarchen changed the title ~~[AMDGPU][True16][MC] readfirstlane for vgpr16 copy to sgpr32~~ [AMDGPU][True16][CodeGen] readfirstlane for vgpr16 copy to sgpr32 May 2, 2025

broxigarchen force-pushed the main-merge-true16-codegen-fix-readfirstlane-2 branch from e945541 to 9adcefe Compare May 2, 2025 14:11

broxigarchen marked this pull request as ready for review May 2, 2025 14:11

broxigarchen requested review from arsenm and Sisyph May 2, 2025 14:11

llvmbot added the backend:AMDGPU label May 2, 2025

broxigarchen requested a review from kosarev May 2, 2025 14:11

fix vgpr16 copy to sgpr32

2973ed7

broxigarchen force-pushed the main-merge-true16-codegen-fix-readfirstlane-2 branch from 9adcefe to 2973ed7 Compare May 2, 2025 15:33

broxigarchen closed this May 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU][True16][CodeGen] readfirstlane for vgpr16 copy to sgpr32 #137848

[AMDGPU][True16][CodeGen] readfirstlane for vgpr16 copy to sgpr32 #137848

Uh oh!

broxigarchen commented Apr 29, 2025 •

edited

Loading

Uh oh!

llvmbot commented May 2, 2025

Uh oh!

broxigarchen commented May 2, 2025 •

edited

Loading

Uh oh!

Sisyph commented May 2, 2025

Uh oh!

broxigarchen commented May 2, 2025

Uh oh!

Uh oh!

[AMDGPU][True16][CodeGen] readfirstlane for vgpr16 copy to sgpr32 #137848

[AMDGPU][True16][CodeGen] readfirstlane for vgpr16 copy to sgpr32 #137848

Uh oh!

Conversation

broxigarchen commented Apr 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented May 2, 2025

Uh oh!

broxigarchen commented May 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Sisyph commented May 2, 2025

Uh oh!

broxigarchen commented May 2, 2025

Uh oh!

Uh oh!

broxigarchen commented Apr 29, 2025 •

edited

Loading

broxigarchen commented May 2, 2025 •

edited

Loading