[NVPTX] Avoid introducing unnecessary ProxyRegs and Movs in ISel #120486


Merged
2 commits merged into llvm:main on Dec 19, 2024

Conversation

@AlexMaclean (Member)

Avoid introducing ProxyReg and MOV nodes during ISel when lowering bitconvert or similar operations. These nodes are all erased by a later pass, but not introducing them in the first place is simpler and likely saves compile time.

Also remove redundant MOV instruction definitions.

@llvmbot (Member) commented Dec 18, 2024

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)


Patch is 201.12 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/120486.diff

30 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (+14-25)
  • (modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm70.ll (+13-13)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm90.ll (+24-24)
  • (modified) llvm/test/CodeGen/NVPTX/bf16-instructions.ll (+68-68)
  • (modified) llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll (+20-20)
  • (modified) llvm/test/CodeGen/NVPTX/chain-different-as.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/compute-ptx-value-vts.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/demote-vars.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/extractelement.ll (+21-21)
  • (modified) llvm/test/CodeGen/NVPTX/f16x2-instructions.ll (+69-69)
  • (modified) llvm/test/CodeGen/NVPTX/fma-relu-contract.ll (+84-84)
  • (modified) llvm/test/CodeGen/NVPTX/fma-relu-fma-intrinsic.ll (+53-53)
  • (modified) llvm/test/CodeGen/NVPTX/fma-relu-instruction-flag.ll (+122-122)
  • (modified) llvm/test/CodeGen/NVPTX/i1-load-lower.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/i128.ll (+9-9)
  • (modified) llvm/test/CodeGen/NVPTX/i16x2-instructions.ll (+36-36)
  • (modified) llvm/test/CodeGen/NVPTX/i8x4-instructions.ll (+86-86)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test1.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test2.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test3.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins.ll (+88-88)
  • (modified) llvm/test/CodeGen/NVPTX/misched_func_call.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/pr13291-i1-store.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/reg-types.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/unfold-masked-merge-vector-variablemask.ll (+130-130)
  • (modified) llvm/test/CodeGen/NVPTX/vaargs.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/variadics-backend.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/vector-returns.ll (+25-25)
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index abaf8e0b0ec1f8..eb4918c43f0dce 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -1994,22 +1994,15 @@ let IsSimpleMove=1, hasSideEffects=0 in {
 def IMOV1ri : NVPTXInst<(outs Int1Regs:$dst), (ins i1imm:$src),
                         "mov.pred \t$dst, $src;",
                         [(set i1:$dst, imm:$src)]>;
-def IMOV16ri : NVPTXInst<(outs Int16Regs:$dst), (ins i16imm:$src),
-                         "mov.u16 \t$dst, $src;",
-                         [(set i16:$dst, imm:$src)]>;
-def IMOV32ri : NVPTXInst<(outs Int32Regs:$dst), (ins i32imm:$src),
-                         "mov.u32 \t$dst, $src;",
-                         [(set i32:$dst, imm:$src)]>;
-def IMOV64ri : NVPTXInst<(outs Int64Regs:$dst), (ins i64imm:$src),
-                        "mov.u64 \t$dst, $src;",
-                        [(set i64:$dst, imm:$src)]>;
-
 def IMOVB16ri : NVPTXInst<(outs Int16Regs:$dst), (ins i16imm:$src),
-                         "mov.b16 \t$dst, $src;", []>;
+                         "mov.b16 \t$dst, $src;",
+                         [(set i16:$dst, imm:$src)]>;
 def IMOVB32ri : NVPTXInst<(outs Int32Regs:$dst), (ins i32imm:$src),
-                         "mov.b32 \t$dst, $src;", []>;
+                         "mov.b32 \t$dst, $src;",
+                         [(set i32:$dst, imm:$src)]>;
 def IMOVB64ri : NVPTXInst<(outs Int64Regs:$dst), (ins i64imm:$src),
-                        "mov.b64 \t$dst, $src;", []>;
+                        "mov.b64 \t$dst, $src;",
+                        [(set i64:$dst, imm:$src)]>;
 
 def FMOV32ri : NVPTXInst<(outs Float32Regs:$dst), (ins f32imm:$src),
                          "mov.f32 \t$dst, $src;",
@@ -2018,8 +2011,8 @@ def FMOV64ri : NVPTXInst<(outs Float64Regs:$dst), (ins f64imm:$src),
                          "mov.f64 \t$dst, $src;",
                          [(set f64:$dst, fpimm:$src)]>;
 
-def : Pat<(i32 (Wrapper texternalsym:$dst)), (IMOV32ri texternalsym:$dst)>;
-def : Pat<(i64 (Wrapper texternalsym:$dst)), (IMOV64ri texternalsym:$dst)>;
+def : Pat<(i32 (Wrapper texternalsym:$dst)), (IMOVB32ri texternalsym:$dst)>;
+def : Pat<(i64 (Wrapper texternalsym:$dst)), (IMOVB64ri texternalsym:$dst)>;
 
 //---- Copy Frame Index ----
 def LEA_ADDRi :   NVPTXInst<(outs Int32Regs:$dst), (ins MEMri:$addr),
@@ -3104,21 +3097,17 @@ def: Pat<(f32 (bitconvert vt:$a)),
          (BITCONVERT_32_I2F Int32Regs:$a)>;
 }
 foreach vt = [f16, bf16] in {
-def: Pat<(vt (bitconvert (i16 UInt16Const:$a))),
-         (IMOVB16ri UInt16Const:$a)>;
-def: Pat<(vt (bitconvert i16:$a)),
-         (ProxyRegI16 Int16Regs:$a)>;
-def: Pat<(i16 (bitconvert vt:$a)),
-         (ProxyRegI16 Int16Regs:$a)>;
+  def: Pat<(vt (bitconvert i16:$a)),
+           (vt Int16Regs:$a)>;
+  def: Pat<(i16 (bitconvert vt:$a)),
+           (i16 Int16Regs:$a)>;
 }
 
 foreach ta = [v2f16, v2bf16, v2i16, v4i8, i32] in {
-  def: Pat<(ta (bitconvert (i32 UInt32Const:$a))),
-           (IMOVB32ri UInt32Const:$a)>;
   foreach tb = [v2f16, v2bf16, v2i16, v4i8, i32] in {
     if !ne(ta, tb) then {
-      def: Pat<(ta (bitconvert (tb Int32Regs:$a))),
-             (ProxyRegI32 Int32Regs:$a)>;
+      def: Pat<(ta (bitconvert tb:$a)),
+               (ta Int32Regs:$a)>;
     }
   }
 }
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 256161d5d79c77..6d4a56f191825b 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -2803,10 +2803,10 @@ def : Pat<(int_nvvm_ptr_param_to_gen i64:$src),
 
 // nvvm.ptr.gen.to.param
 def : Pat<(int_nvvm_ptr_gen_to_param i32:$src),
-          (IMOV32rr Int32Regs:$src)>;
+          (i32 Int32Regs:$src)>;
 
 def : Pat<(int_nvvm_ptr_gen_to_param i64:$src),
-          (IMOV64rr Int64Regs:$src)>;
+          (i64 Int64Regs:$src)>;
 
 // nvvm.move intrinsicc
 def nvvm_move_i16 : NVPTXInst<(outs Int16Regs:$r), (ins Int16Regs:$s),
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
index bae7109288b99f..05f466f2138ec1 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
@@ -46,7 +46,7 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
 ; CHECKPTX62-LABEL: test(
 ; CHECKPTX62:       {
 ; CHECKPTX62-NEXT:    .reg .pred %p<5>;
-; CHECKPTX62-NEXT:    .reg .b16 %rs<19>;
+; CHECKPTX62-NEXT:    .reg .b16 %rs<11>;
 ; CHECKPTX62-NEXT:    .reg .b32 %r<58>;
 ; CHECKPTX62-EMPTY:
 ; CHECKPTX62-NEXT:  // %bb.0:
@@ -65,8 +65,8 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
 ; CHECKPTX62-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECKPTX62-NEXT:    shr.u32 %r28, %r54, %r2;
 ; CHECKPTX62-NEXT:    cvt.u16.u32 %rs2, %r28;
-; CHECKPTX62-NEXT:    add.rn.f16 %rs4, %rs2, %rs1;
-; CHECKPTX62-NEXT:    cvt.u32.u16 %r29, %rs4;
+; CHECKPTX62-NEXT:    add.rn.f16 %rs3, %rs2, %rs1;
+; CHECKPTX62-NEXT:    cvt.u32.u16 %r29, %rs3;
 ; CHECKPTX62-NEXT:    shl.b32 %r30, %r29, %r2;
 ; CHECKPTX62-NEXT:    and.b32 %r31, %r54, %r3;
 ; CHECKPTX62-NEXT:    or.b32 %r32, %r31, %r30;
@@ -79,10 +79,10 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
 ; CHECKPTX62-NEXT:  $L__BB0_3: // %atomicrmw.start27
 ; CHECKPTX62-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECKPTX62-NEXT:    shr.u32 %r33, %r55, %r2;
-; CHECKPTX62-NEXT:    cvt.u16.u32 %rs6, %r33;
-; CHECKPTX62-NEXT:    mov.b16 %rs8, 0x3C00;
-; CHECKPTX62-NEXT:    add.rn.f16 %rs9, %rs6, %rs8;
-; CHECKPTX62-NEXT:    cvt.u32.u16 %r34, %rs9;
+; CHECKPTX62-NEXT:    cvt.u16.u32 %rs4, %r33;
+; CHECKPTX62-NEXT:    mov.b16 %rs5, 0x3C00;
+; CHECKPTX62-NEXT:    add.rn.f16 %rs6, %rs4, %rs5;
+; CHECKPTX62-NEXT:    cvt.u32.u16 %r34, %rs6;
 ; CHECKPTX62-NEXT:    shl.b32 %r35, %r34, %r2;
 ; CHECKPTX62-NEXT:    and.b32 %r36, %r55, %r3;
 ; CHECKPTX62-NEXT:    or.b32 %r37, %r36, %r35;
@@ -100,9 +100,9 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
 ; CHECKPTX62-NEXT:  $L__BB0_5: // %atomicrmw.start9
 ; CHECKPTX62-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECKPTX62-NEXT:    shr.u32 %r41, %r56, %r11;
-; CHECKPTX62-NEXT:    cvt.u16.u32 %rs11, %r41;
-; CHECKPTX62-NEXT:    add.rn.f16 %rs13, %rs11, %rs1;
-; CHECKPTX62-NEXT:    cvt.u32.u16 %r42, %rs13;
+; CHECKPTX62-NEXT:    cvt.u16.u32 %rs7, %r41;
+; CHECKPTX62-NEXT:    add.rn.f16 %rs8, %rs7, %rs1;
+; CHECKPTX62-NEXT:    cvt.u32.u16 %r42, %rs8;
 ; CHECKPTX62-NEXT:    shl.b32 %r43, %r42, %r11;
 ; CHECKPTX62-NEXT:    and.b32 %r44, %r56, %r12;
 ; CHECKPTX62-NEXT:    or.b32 %r45, %r44, %r43;
@@ -120,9 +120,9 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
 ; CHECKPTX62-NEXT:  $L__BB0_7: // %atomicrmw.start
 ; CHECKPTX62-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECKPTX62-NEXT:    shr.u32 %r49, %r57, %r17;
-; CHECKPTX62-NEXT:    cvt.u16.u32 %rs15, %r49;
-; CHECKPTX62-NEXT:    add.rn.f16 %rs17, %rs15, %rs1;
-; CHECKPTX62-NEXT:    cvt.u32.u16 %r50, %rs17;
+; CHECKPTX62-NEXT:    cvt.u16.u32 %rs9, %r49;
+; CHECKPTX62-NEXT:    add.rn.f16 %rs10, %rs9, %rs1;
+; CHECKPTX62-NEXT:    cvt.u32.u16 %r50, %rs10;
 ; CHECKPTX62-NEXT:    shl.b32 %r51, %r50, %r17;
 ; CHECKPTX62-NEXT:    and.b32 %r52, %r57, %r18;
 ; CHECKPTX62-NEXT:    or.b32 %r53, %r52, %r51;
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
index 9ddb82321b4ea2..f81b785f13225c 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
@@ -46,7 +46,7 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, bfloat
 ; CHECKPTX71-LABEL: test(
 ; CHECKPTX71:       {
 ; CHECKPTX71-NEXT:    .reg .pred %p<5>;
-; CHECKPTX71-NEXT:    .reg .b16 %rs<34>;
+; CHECKPTX71-NEXT:    .reg .b16 %rs<22>;
 ; CHECKPTX71-NEXT:    .reg .b32 %r<4>;
 ; CHECKPTX71-NEXT:    .reg .f32 %f<12>;
 ; CHECKPTX71-EMPTY:
@@ -55,49 +55,49 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, bfloat
 ; CHECKPTX71-NEXT:    ld.param.u32 %r3, [test_param_2];
 ; CHECKPTX71-NEXT:    ld.param.u32 %r2, [test_param_1];
 ; CHECKPTX71-NEXT:    ld.param.u32 %r1, [test_param_0];
-; CHECKPTX71-NEXT:    ld.b16 %rs30, [%r1];
+; CHECKPTX71-NEXT:    ld.b16 %rs18, [%r1];
 ; CHECKPTX71-NEXT:    cvt.f32.bf16 %f1, %rs13;
 ; CHECKPTX71-NEXT:  $L__BB0_1: // %atomicrmw.start14
 ; CHECKPTX71-NEXT:    // =>This Inner Loop Header: Depth=1
-; CHECKPTX71-NEXT:    cvt.f32.bf16 %f2, %rs30;
+; CHECKPTX71-NEXT:    cvt.f32.bf16 %f2, %rs18;
 ; CHECKPTX71-NEXT:    add.rn.f32 %f3, %f2, %f1;
 ; CHECKPTX71-NEXT:    cvt.rn.bf16.f32 %rs14, %f3;
-; CHECKPTX71-NEXT:    atom.cas.b16 %rs17, [%r1], %rs30, %rs14;
-; CHECKPTX71-NEXT:    setp.ne.s16 %p1, %rs17, %rs30;
-; CHECKPTX71-NEXT:    mov.u16 %rs30, %rs17;
+; CHECKPTX71-NEXT:    atom.cas.b16 %rs3, [%r1], %rs18, %rs14;
+; CHECKPTX71-NEXT:    setp.ne.s16 %p1, %rs3, %rs18;
+; CHECKPTX71-NEXT:    mov.u16 %rs18, %rs3;
 ; CHECKPTX71-NEXT:    @%p1 bra $L__BB0_1;
 ; CHECKPTX71-NEXT:  // %bb.2: // %atomicrmw.end13
-; CHECKPTX71-NEXT:    ld.b16 %rs31, [%r1];
+; CHECKPTX71-NEXT:    ld.b16 %rs19, [%r1];
 ; CHECKPTX71-NEXT:  $L__BB0_3: // %atomicrmw.start8
 ; CHECKPTX71-NEXT:    // =>This Inner Loop Header: Depth=1
-; CHECKPTX71-NEXT:    cvt.f32.bf16 %f4, %rs31;
+; CHECKPTX71-NEXT:    cvt.f32.bf16 %f4, %rs19;
 ; CHECKPTX71-NEXT:    add.rn.f32 %f5, %f4, 0f3F800000;
-; CHECKPTX71-NEXT:    cvt.rn.bf16.f32 %rs18, %f5;
-; CHECKPTX71-NEXT:    atom.cas.b16 %rs21, [%r1], %rs31, %rs18;
-; CHECKPTX71-NEXT:    setp.ne.s16 %p2, %rs21, %rs31;
-; CHECKPTX71-NEXT:    mov.u16 %rs31, %rs21;
+; CHECKPTX71-NEXT:    cvt.rn.bf16.f32 %rs15, %f5;
+; CHECKPTX71-NEXT:    atom.cas.b16 %rs6, [%r1], %rs19, %rs15;
+; CHECKPTX71-NEXT:    setp.ne.s16 %p2, %rs6, %rs19;
+; CHECKPTX71-NEXT:    mov.u16 %rs19, %rs6;
 ; CHECKPTX71-NEXT:    @%p2 bra $L__BB0_3;
 ; CHECKPTX71-NEXT:  // %bb.4: // %atomicrmw.end7
-; CHECKPTX71-NEXT:    ld.global.b16 %rs32, [%r2];
+; CHECKPTX71-NEXT:    ld.global.b16 %rs20, [%r2];
 ; CHECKPTX71-NEXT:  $L__BB0_5: // %atomicrmw.start2
 ; CHECKPTX71-NEXT:    // =>This Inner Loop Header: Depth=1
-; CHECKPTX71-NEXT:    cvt.f32.bf16 %f7, %rs32;
+; CHECKPTX71-NEXT:    cvt.f32.bf16 %f7, %rs20;
 ; CHECKPTX71-NEXT:    add.rn.f32 %f8, %f7, %f1;
-; CHECKPTX71-NEXT:    cvt.rn.bf16.f32 %rs22, %f8;
-; CHECKPTX71-NEXT:    atom.global.cas.b16 %rs25, [%r2], %rs32, %rs22;
-; CHECKPTX71-NEXT:    setp.ne.s16 %p3, %rs25, %rs32;
-; CHECKPTX71-NEXT:    mov.u16 %rs32, %rs25;
+; CHECKPTX71-NEXT:    cvt.rn.bf16.f32 %rs16, %f8;
+; CHECKPTX71-NEXT:    atom.global.cas.b16 %rs9, [%r2], %rs20, %rs16;
+; CHECKPTX71-NEXT:    setp.ne.s16 %p3, %rs9, %rs20;
+; CHECKPTX71-NEXT:    mov.u16 %rs20, %rs9;
 ; CHECKPTX71-NEXT:    @%p3 bra $L__BB0_5;
 ; CHECKPTX71-NEXT:  // %bb.6: // %atomicrmw.end1
-; CHECKPTX71-NEXT:    ld.shared.b16 %rs33, [%r3];
+; CHECKPTX71-NEXT:    ld.shared.b16 %rs21, [%r3];
 ; CHECKPTX71-NEXT:  $L__BB0_7: // %atomicrmw.start
 ; CHECKPTX71-NEXT:    // =>This Inner Loop Header: Depth=1
-; CHECKPTX71-NEXT:    cvt.f32.bf16 %f10, %rs33;
+; CHECKPTX71-NEXT:    cvt.f32.bf16 %f10, %rs21;
 ; CHECKPTX71-NEXT:    add.rn.f32 %f11, %f10, %f1;
-; CHECKPTX71-NEXT:    cvt.rn.bf16.f32 %rs26, %f11;
-; CHECKPTX71-NEXT:    atom.shared.cas.b16 %rs29, [%r3], %rs33, %rs26;
-; CHECKPTX71-NEXT:    setp.ne.s16 %p4, %rs29, %rs33;
-; CHECKPTX71-NEXT:    mov.u16 %rs33, %rs29;
+; CHECKPTX71-NEXT:    cvt.rn.bf16.f32 %rs17, %f11;
+; CHECKPTX71-NEXT:    atom.shared.cas.b16 %rs12, [%r3], %rs21, %rs17;
+; CHECKPTX71-NEXT:    setp.ne.s16 %p4, %rs12, %rs21;
+; CHECKPTX71-NEXT:    mov.u16 %rs21, %rs12;
 ; CHECKPTX71-NEXT:    @%p4 bra $L__BB0_7;
 ; CHECKPTX71-NEXT:  // %bb.8: // %atomicrmw.end
 ; CHECKPTX71-NEXT:    ret;
diff --git a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
index 08ed317ef93007..6828bac18cad7f 100644
--- a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
@@ -17,7 +17,7 @@ define bfloat @test_fadd(bfloat %0, bfloat %1) {
 ; SM70-LABEL: test_fadd(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<2>;
-; SM70-NEXT:    .reg .b16 %rs<3>;
+; SM70-NEXT:    .reg .b16 %rs<2>;
 ; SM70-NEXT:    .reg .b32 %r<11>;
 ; SM70-NEXT:    .reg .f32 %f<4>;
 ; SM70-EMPTY:
@@ -88,7 +88,7 @@ define bfloat @test_fsub(bfloat %0, bfloat %1) {
 ; SM70-LABEL: test_fsub(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<2>;
-; SM70-NEXT:    .reg .b16 %rs<3>;
+; SM70-NEXT:    .reg .b16 %rs<2>;
 ; SM70-NEXT:    .reg .b32 %r<11>;
 ; SM70-NEXT:    .reg .f32 %f<4>;
 ; SM70-EMPTY:
@@ -159,8 +159,8 @@ define <2 x bfloat> @test_faddx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-LABEL: test_faddx2(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<3>;
-; SM70-NEXT:    .reg .b16 %rs<9>;
-; SM70-NEXT:    .reg .b32 %r<25>;
+; SM70-NEXT:    .reg .b16 %rs<5>;
+; SM70-NEXT:    .reg .b32 %r<24>;
 ; SM70-NEXT:    .reg .f32 %f<7>;
 ; SM70-EMPTY:
 ; SM70-NEXT:  // %bb.0:
@@ -170,8 +170,8 @@ define <2 x bfloat> @test_faddx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-NEXT:    cvt.u32.u16 %r3, %rs2;
 ; SM70-NEXT:    shl.b32 %r4, %r3, 16;
 ; SM70-NEXT:    mov.b32 %f1, %r4;
-; SM70-NEXT:    mov.b32 {%rs4, %rs5}, %r1;
-; SM70-NEXT:    cvt.u32.u16 %r5, %rs5;
+; SM70-NEXT:    mov.b32 {%rs3, %rs4}, %r1;
+; SM70-NEXT:    cvt.u32.u16 %r5, %rs4;
 ; SM70-NEXT:    shl.b32 %r6, %r5, 16;
 ; SM70-NEXT:    mov.b32 %f2, %r6;
 ; SM70-NEXT:    add.rn.f32 %f3, %f2, %f1;
@@ -185,7 +185,7 @@ define <2 x bfloat> @test_faddx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-NEXT:    cvt.u32.u16 %r13, %rs1;
 ; SM70-NEXT:    shl.b32 %r14, %r13, 16;
 ; SM70-NEXT:    mov.b32 %f4, %r14;
-; SM70-NEXT:    cvt.u32.u16 %r15, %rs4;
+; SM70-NEXT:    cvt.u32.u16 %r15, %rs3;
 ; SM70-NEXT:    shl.b32 %r16, %r15, 16;
 ; SM70-NEXT:    mov.b32 %f5, %r16;
 ; SM70-NEXT:    add.rn.f32 %f6, %f5, %f4;
@@ -260,8 +260,8 @@ define <2 x bfloat> @test_fsubx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-LABEL: test_fsubx2(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<3>;
-; SM70-NEXT:    .reg .b16 %rs<9>;
-; SM70-NEXT:    .reg .b32 %r<25>;
+; SM70-NEXT:    .reg .b16 %rs<5>;
+; SM70-NEXT:    .reg .b32 %r<24>;
 ; SM70-NEXT:    .reg .f32 %f<7>;
 ; SM70-EMPTY:
 ; SM70-NEXT:  // %bb.0:
@@ -271,8 +271,8 @@ define <2 x bfloat> @test_fsubx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-NEXT:    cvt.u32.u16 %r3, %rs2;
 ; SM70-NEXT:    shl.b32 %r4, %r3, 16;
 ; SM70-NEXT:    mov.b32 %f1, %r4;
-; SM70-NEXT:    mov.b32 {%rs4, %rs5}, %r1;
-; SM70-NEXT:    cvt.u32.u16 %r5, %rs5;
+; SM70-NEXT:    mov.b32 {%rs3, %rs4}, %r1;
+; SM70-NEXT:    cvt.u32.u16 %r5, %rs4;
 ; SM70-NEXT:    shl.b32 %r6, %r5, 16;
 ; SM70-NEXT:    mov.b32 %f2, %r6;
 ; SM70-NEXT:    sub.rn.f32 %f3, %f2, %f1;
@@ -286,7 +286,7 @@ define <2 x bfloat> @test_fsubx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-NEXT:    cvt.u32.u16 %r13, %rs1;
 ; SM70-NEXT:    shl.b32 %r14, %r13, 16;
 ; SM70-NEXT:    mov.b32 %f4, %r14;
-; SM70-NEXT:    cvt.u32.u16 %r15, %rs4;
+; SM70-NEXT:    cvt.u32.u16 %r15, %rs3;
 ; SM70-NEXT:    shl.b32 %r16, %r15, 16;
 ; SM70-NEXT:    mov.b32 %f5, %r16;
 ; SM70-NEXT:    sub.rn.f32 %f6, %f5, %f4;
@@ -361,8 +361,8 @@ define <2 x bfloat> @test_fmulx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-LABEL: test_fmulx2(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<3>;
-; SM70-NEXT:    .reg .b16 %rs<9>;
-; SM70-NEXT:    .reg .b32 %r<25>;
+; SM70-NEXT:    .reg .b16 %rs<5>;
+; SM70-NEXT:    .reg .b32 %r<24>;
 ; SM70-NEXT:    .reg .f32 %f<7>;
 ; SM70-EMPTY:
 ; SM70-NEXT:  // %bb.0:
@@ -372,8 +372,8 @@ define <2 x bfloat> @test_fmulx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-NEXT:    cvt.u32.u16 %r3, %rs2;
 ; SM70-NEXT:    shl.b32 %r4, %r3, 16;
 ; SM70-NEXT:    mov.b32 %f1, %r4;
-; SM70-NEXT:    mov.b32 {%rs4, %rs5}, %r1;
-; SM70-NEXT:    cvt.u32.u16 %r5, %rs5;
+; SM70-NEXT:    mov.b32 {%rs3, %rs4}, %r1;
+; SM70-NEXT:    cvt.u32.u16 %r5, %rs4;
 ; SM70-NEXT:    shl.b32 %r6, %r5, 16;
 ; SM70-NEXT:    mov.b32 %f2, %r6;
 ; SM70-NEXT:    mul.rn.f32 %f3, %f2, %f1;
@@ -387,7 +387,7 @@ define <2 x bfloat> @test_fmulx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-NEXT:    cvt.u32.u16 %r13, %rs1;
 ; SM70-NEXT:    shl.b32 %r14, %r13, 16;
 ; SM70-NEXT:    mov.b32 %f4, %r14;
-; SM70-NEXT:    cvt.u32.u16 %r15, %rs4;
+; SM70-NEXT:    cvt.u32.u16 %r15, %rs3;
 ; SM70-NEXT:    shl.b32 %r16, %r15, 16;
 ; SM70-NEXT:    mov.b32 %f5, %r16;
 ; SM70-NEXT:    mul.rn.f32 %f6, %f5, %f4;
@@ -462,8 +462,8 @@ define <2 x bfloat> @test_fdiv(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-LABEL: test_fdiv(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<3>;
-; SM70-NEXT:    .reg .b16 %rs<9>;
-; SM70-NEXT:    .reg .b32 %r<25>;
+; SM70-NEXT:    .reg .b16 %rs<5>;
+; SM70-NEXT:    .reg .b32 %r<24>;
 ; SM70-NEXT:    .reg .f32 %f<7>;
 ; SM70-EMPTY:
 ; SM70-NEXT:  // %bb.0:
@@ -473,8 +473,8 @@ define <2 x bfloat> @test_fdiv(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-NEXT:    cvt.u32.u16 %r3, %rs2;
 ; SM70-NEXT:    shl.b32 %r4, %r3, 16;
 ; SM70-NEXT:    mov.b32 %f1, %r4;
-; SM70-NEXT:    mov.b32 {%rs4, %rs5}, %r1;
-; SM70-NEXT:    cvt.u32.u16 %r5, %rs5;
+; SM70-NEXT:    mov.b32 {%rs3, %rs4}, %r1;
+; SM70-NEXT:    cvt.u32.u16 %r5, %rs4;
 ; SM70-NEXT:    shl.b32 %r6, %r5, 16;
 ; SM70-NEXT:    mov.b32 %f2, %r6;
 ; SM70-NEXT:    div.rn.f32 %f3, %f2, %f1;
@@ -488,7 +488,7 @@ define <2 x bfloat> @test_fdiv(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; SM70-NEXT:    cvt.u32.u16 %r13, %rs1;
 ; SM70-NEXT:    shl.b32 %r14, %r13, 16;
 ; SM70-NEXT:    mov.b32 %f4, %r14;
-; SM70-NEXT:    cvt.u32.u16 %r15, %rs4;
+; SM70-NEXT:    cvt.u32.u16 %r15, %rs3;
 ; SM70-NEXT:    shl.b32 %r16, %r15, 16;
 ; SM70-NEXT:    mov.b32 %f5, %r16;
 ; SM70-NEXT:    div.rn.f32 %f6, %f5, %f4;
@@ -648,7 +648,7 @@ define bfloat @test_fptrunc_float(float %a) #0 {
 ; SM70-LABEL: test_fptrunc_float(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<2>;
-; SM70-NEXT:    .reg .b16 %rs<3>;
+; SM70-NEXT:    .reg .b16 %rs<2>;
 ; SM70-NEXT:    .reg .b32 %r<7>;
 ; SM70-NEXT:    .reg .f32 %f<2>;
 ; SM70-EMPTY:
@@ -705,7 +705,7 @@ define bfloat @test_fadd_imm_1(bfloat %a) #0 {
 ; SM70-LABEL: test_fadd_imm_1(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<2>;
-; SM70-NEXT:    .reg .b16 %rs<3>;
+; SM70-NEXT:    .reg .b16 %rs<2>;
 ; SM70-NEXT:    .reg .b32 %r<9>;
 ; SM70-NEXT:    .reg .f32 %f<3>;
 ; SM70-EMPTY:
@@ -789,7 +789,7 @@ define bfloat @test_select_cc_bf16_f64(double %a, double %b, bfloat %c, bfloat %
 define <8 x float> @test_extload_bf16x8(ptr addrspace(3) noundef %arg) #0 {
 ; SM70-LABEL: test_extload_bf16x8(
 ; SM70:       {
-; SM70-NEXT:    .reg .b16 %rs<17>;
+; SM70-NEXT:    .reg .b16 %rs<9>;
 ; SM70-NEXT:    .reg .b32 %r<21>;
 ; SM70-NEXT:    .reg .f32 %f<9>;
 ; SM70-NEXT:    .reg .b64 %rd<2>;
@@ -1033,7 +1033,7 @@ define bfloat @test_sitofp_i16(i16 %a) {
 ; SM70-LABEL: test_sitofp_i16(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<2>;
-; SM70-NEXT:    .reg .b16 %rs<4>;
+; SM70-NEXT:    .reg .b16 %rs<3>;
 ; SM70-NEXT:    .reg .b32 %r<7>;
 ; SM70-NEXT:    .reg .f32 %f<2>;
 ; SM70-EMPTY:
@@ -1092,7 +1092,7 @@ define bfloat @test_uitofp_i8(i8 %a) {
 ; SM70-LABEL: test_uitofp_i8(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<2>;
-; SM70-NEXT:    .reg .b16 %rs<4>;
+; SM70-NEXT:    .reg .b16 %rs<3>;
 ; SM70-NEXT:    .reg .b32 %r<7>;
 ; SM70-NEXT:    .reg .f32 %f<2>;
 ; SM70-EMPTY:
@@ -1151,7 +1151,7 @@ define bfloat @test_uitofp_i1(i1 %a) {
 ; SM70-LABEL: test_uitofp_i1(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<3>;
-; SM70-NEXT:    .reg .b16 %rs<5>;
+; SM70-NEXT:    .reg .b16 %rs<4>;
 ; SM70-NEXT:    .reg .b32 %r<8>;
 ; SM70-NEXT:    .reg .f32 %f<2>;
 ; SM70-EMPTY:
@@ -1228,7 +1228,7 @@ define bfloat @test_uitofp_i16(i16 %a) {
 ; SM70-LABEL: test_uitofp_i16(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<2>;
-; SM70-NEXT:    .reg .b16 %rs<4>;
+; SM70-NEXT:    .reg .b16 %rs<3>;
 ; SM70-NEXT:    .reg .b32 %r<7>;
 ; SM70-NEXT:    .reg .f32 %f<2>;
 ; SM70-EMPTY:
@@ -1287,7 +1287,7 @@ define bfloat @test_uitofp_i32(i32 %a) {
 ; SM70-LABEL: test_uitofp_i32(
 ; SM70:       {
 ; SM70-NEXT:    .reg .pred %p<2>;
-; SM70-NEXT:    .reg .b16 %rs<3>;
+; SM70-NEXT:    .reg .b16 %rs<2>;
 ; SM70-NEXT:    .reg .b32 %r<8>;
 ; SM70-NEXT:    .reg .f32 %f<2>;
 ; SM70-EMPTY:
@@ -1349,7 +1349,7 @@ define bfloat @test_uitofp_i64(i64 %a) {
 ; SM70-LABEL: test_uitofp_i64(
 ; SM70:       ...
[truncated]

@justinfargnoli (Contributor) left a comment

These nodes are all erased by a later pass

What pass does this?

@justinfargnoli (Contributor) left a comment

LGTM :)

@AlexMaclean (Member, Author)

These nodes are all erased by a later pass

What pass does this?

NVPTXProxyRegErasure
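
For illustration only, a toy sketch of what a proxy-register-erasure step does (plain Python over an invented tuple "IR" — the real NVPTXProxyRegErasure pass operates on MachineInstrs in the NVPTX backend): rewrite every use of an identity copy to its underlying source, then drop the copy.

```python
# Toy model of proxy-register erasure. The tuple IR here is made up
# purely to show the idea; it is not LLVM's representation.

def erase_proxy_copies(instrs):
    """instrs: list of (dst, opcode, srcs). A 'proxy' op is an identity copy.

    Returns the program with every proxy removed and its uses rewritten
    to the copy's ultimate source (chains of proxies are followed)."""
    replacement = {}
    for dst, op, srcs in instrs:
        if op == "proxy":
            src = srcs[0]
            replacement[dst] = replacement.get(src, src)
    out = []
    for dst, op, srcs in instrs:
        if op == "proxy":
            continue  # the copy itself is erased
        out.append((dst, op, [replacement.get(s, s) for s in srcs]))
    return out
```

The point of this PR is that for bitconvert-style lowerings the proxy never needed to exist, so there is nothing for the later pass to clean up.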

@Artem-B (Member) left a comment

Nice.
LGTM.

Comment on lines +165 to +166
; CHECK-NEXT: { .reg .b32 tmp; mov.b64 {tmp, %r1}, %rd1; }
; CHECK-NEXT: cvt.u32.u64 %r2, %rd1;
Member

Future improvement opportunity: Coalesce partial mov and cvt into mov.b64 {%r2, %r1}, %rd1.
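
As a hedged, string-level sketch of that suggested peephole (nothing like how it would actually be implemented in the backend): when a 64-to-2x32 unpack discards its low half into a scratch register and the next instruction truncates the same 64-bit source, the truncation result can take the unpack's low lane.

```python
import re

# Illustrative only: coalesce
#   { .reg .b32 tmp; mov.b64 {tmp, %rN}, %rdM; }
#   cvt.u32.u64 %rK, %rdM;
# into
#   mov.b64 {%rK, %rN}, %rdM;
# A real implementation would be a MachineInstr peephole, not text matching.

UNPACK = re.compile(r"\{ \.reg \.b32 tmp; mov\.b64 \{tmp, (%r\d+)\}, (%rd\d+); \}")

def coalesce_unpack_cvt(lines):
    out, i = [], 0
    while i < len(lines):
        m = UNPACK.fullmatch(lines[i].strip())
        if m and i + 1 < len(lines):
            hi, src = m.groups()
            m2 = re.fullmatch(r"cvt\.u32\.u64 (%r\d+), " + re.escape(src) + ";",
                              lines[i + 1].strip())
            if m2:
                # cvt.u32.u64 keeps the low 32 bits, i.e. the unpack's
                # first lane, so both results come from one mov.b64.
                out.append("mov.b64 {%s, %s}, %s;" % (m2.group(1), hi, src))
                i += 2
                continue
        out.append(lines[i])
        i += 1
    return out
```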

@AlexMaclean (Member, Author)

Absolutely! I've been thinking about this for a while.

@AlexMaclean (Member, Author)

@Artem-B, @justinfargnoli heads up, I've done some further cleanup in 2756e29 while addressing Justin's request. If you have a minute please take another look.

Comment on lines -215 to -216
// There's no way to specify FP16 and BF16 immediates in .(b)f16 ops, so we
// have to load them into an .(b)f16 register first.
Contributor

We're deleting this code because the comment is false, right? Or am I misunderstanding something.

Contributor

If the source code comment is false, then the update LGTM.

@AlexMaclean (Member, Author)

The comment is sort of true, but it doesn't justify keeping things the way they were. It is simpler and cleaner to replace these with normally named MOV instructions and to use TableGen to generate the ISel logic, the same way we handle any other instruction that does not support an immediate operand.

Member

I'm not sure it's been fixed in ptxas: https://godbolt.org/z/d8EcMevc8

If anything, recent versions seem to be more restrictive than the older ones.

@Artem-B (Member), Dec 19, 2024

I'm fine with removing this code as long as we can still handle constants correctly. This should already be covered by the original tests in llvm/test/CodeGen/NVPTX/f16-instructions.ll

If LLVM tests are happy after the run with ptxas enabled, we should be fine.

@AlexMaclean AlexMaclean merged commit 310e798 into llvm:main Dec 19, 2024
8 checks passed