[NVPTX] Avoid introducing unnecessary ProxyRegs and Movs in ISel #120486
Conversation
@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Avoid introducing ProxyReg and MOV nodes during ISel when lowering bitconvert or similar operations. These nodes are all erased by a later pass, but not introducing them in the first place is simpler and likely saves compile time.

Also remove redundant MOV instruction definitions.

Patch is 201.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/120486.diff

30 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index abaf8e0b0ec1f8..eb4918c43f0dce 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -1994,22 +1994,15 @@ let IsSimpleMove=1, hasSideEffects=0 in {
def IMOV1ri : NVPTXInst<(outs Int1Regs:$dst), (ins i1imm:$src),
"mov.pred \t$dst, $src;",
[(set i1:$dst, imm:$src)]>;
-def IMOV16ri : NVPTXInst<(outs Int16Regs:$dst), (ins i16imm:$src),
- "mov.u16 \t$dst, $src;",
- [(set i16:$dst, imm:$src)]>;
-def IMOV32ri : NVPTXInst<(outs Int32Regs:$dst), (ins i32imm:$src),
- "mov.u32 \t$dst, $src;",
- [(set i32:$dst, imm:$src)]>;
-def IMOV64ri : NVPTXInst<(outs Int64Regs:$dst), (ins i64imm:$src),
- "mov.u64 \t$dst, $src;",
- [(set i64:$dst, imm:$src)]>;
-
def IMOVB16ri : NVPTXInst<(outs Int16Regs:$dst), (ins i16imm:$src),
- "mov.b16 \t$dst, $src;", []>;
+ "mov.b16 \t$dst, $src;",
+ [(set i16:$dst, imm:$src)]>;
def IMOVB32ri : NVPTXInst<(outs Int32Regs:$dst), (ins i32imm:$src),
- "mov.b32 \t$dst, $src;", []>;
+ "mov.b32 \t$dst, $src;",
+ [(set i32:$dst, imm:$src)]>;
def IMOVB64ri : NVPTXInst<(outs Int64Regs:$dst), (ins i64imm:$src),
- "mov.b64 \t$dst, $src;", []>;
+ "mov.b64 \t$dst, $src;",
+ [(set i64:$dst, imm:$src)]>;
def FMOV32ri : NVPTXInst<(outs Float32Regs:$dst), (ins f32imm:$src),
"mov.f32 \t$dst, $src;",
@@ -2018,8 +2011,8 @@ def FMOV64ri : NVPTXInst<(outs Float64Regs:$dst), (ins f64imm:$src),
"mov.f64 \t$dst, $src;",
[(set f64:$dst, fpimm:$src)]>;
-def : Pat<(i32 (Wrapper texternalsym:$dst)), (IMOV32ri texternalsym:$dst)>;
-def : Pat<(i64 (Wrapper texternalsym:$dst)), (IMOV64ri texternalsym:$dst)>;
+def : Pat<(i32 (Wrapper texternalsym:$dst)), (IMOVB32ri texternalsym:$dst)>;
+def : Pat<(i64 (Wrapper texternalsym:$dst)), (IMOVB64ri texternalsym:$dst)>;
//---- Copy Frame Index ----
def LEA_ADDRi : NVPTXInst<(outs Int32Regs:$dst), (ins MEMri:$addr),
@@ -3104,21 +3097,17 @@ def: Pat<(f32 (bitconvert vt:$a)),
(BITCONVERT_32_I2F Int32Regs:$a)>;
}
foreach vt = [f16, bf16] in {
-def: Pat<(vt (bitconvert (i16 UInt16Const:$a))),
- (IMOVB16ri UInt16Const:$a)>;
-def: Pat<(vt (bitconvert i16:$a)),
- (ProxyRegI16 Int16Regs:$a)>;
-def: Pat<(i16 (bitconvert vt:$a)),
- (ProxyRegI16 Int16Regs:$a)>;
+ def: Pat<(vt (bitconvert i16:$a)),
+ (vt Int16Regs:$a)>;
+ def: Pat<(i16 (bitconvert vt:$a)),
+ (i16 Int16Regs:$a)>;
}
foreach ta = [v2f16, v2bf16, v2i16, v4i8, i32] in {
- def: Pat<(ta (bitconvert (i32 UInt32Const:$a))),
- (IMOVB32ri UInt32Const:$a)>;
foreach tb = [v2f16, v2bf16, v2i16, v4i8, i32] in {
if !ne(ta, tb) then {
- def: Pat<(ta (bitconvert (tb Int32Regs:$a))),
- (ProxyRegI32 Int32Regs:$a)>;
+ def: Pat<(ta (bitconvert tb:$a)),
+ (ta Int32Regs:$a)>;
}
}
}
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 256161d5d79c77..6d4a56f191825b 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -2803,10 +2803,10 @@ def : Pat<(int_nvvm_ptr_param_to_gen i64:$src),
// nvvm.ptr.gen.to.param
def : Pat<(int_nvvm_ptr_gen_to_param i32:$src),
- (IMOV32rr Int32Regs:$src)>;
+ (i32 Int32Regs:$src)>;
def : Pat<(int_nvvm_ptr_gen_to_param i64:$src),
- (IMOV64rr Int64Regs:$src)>;
+ (i64 Int64Regs:$src)>;
// nvvm.move intrinsicc
def nvvm_move_i16 : NVPTXInst<(outs Int16Regs:$r), (ins Int16Regs:$s),
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
index bae7109288b99f..05f466f2138ec1 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
@@ -46,7 +46,7 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-LABEL: test(
; CHECKPTX62: {
; CHECKPTX62-NEXT: .reg .pred %p<5>;
-; CHECKPTX62-NEXT: .reg .b16 %rs<19>;
+; CHECKPTX62-NEXT: .reg .b16 %rs<11>;
; CHECKPTX62-NEXT: .reg .b32 %r<58>;
; CHECKPTX62-EMPTY:
; CHECKPTX62-NEXT: // %bb.0:
@@ -65,8 +65,8 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-NEXT: // =>This Inner Loop Header: Depth=1
; CHECKPTX62-NEXT: shr.u32 %r28, %r54, %r2;
; CHECKPTX62-NEXT: cvt.u16.u32 %rs2, %r28;
-; CHECKPTX62-NEXT: add.rn.f16 %rs4, %rs2, %rs1;
-; CHECKPTX62-NEXT: cvt.u32.u16 %r29, %rs4;
+; CHECKPTX62-NEXT: add.rn.f16 %rs3, %rs2, %rs1;
+; CHECKPTX62-NEXT: cvt.u32.u16 %r29, %rs3;
; CHECKPTX62-NEXT: shl.b32 %r30, %r29, %r2;
; CHECKPTX62-NEXT: and.b32 %r31, %r54, %r3;
; CHECKPTX62-NEXT: or.b32 %r32, %r31, %r30;
@@ -79,10 +79,10 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-NEXT: $L__BB0_3: // %atomicrmw.start27
; CHECKPTX62-NEXT: // =>This Inner Loop Header: Depth=1
; CHECKPTX62-NEXT: shr.u32 %r33, %r55, %r2;
-; CHECKPTX62-NEXT: cvt.u16.u32 %rs6, %r33;
-; CHECKPTX62-NEXT: mov.b16 %rs8, 0x3C00;
-; CHECKPTX62-NEXT: add.rn.f16 %rs9, %rs6, %rs8;
-; CHECKPTX62-NEXT: cvt.u32.u16 %r34, %rs9;
+; CHECKPTX62-NEXT: cvt.u16.u32 %rs4, %r33;
+; CHECKPTX62-NEXT: mov.b16 %rs5, 0x3C00;
+; CHECKPTX62-NEXT: add.rn.f16 %rs6, %rs4, %rs5;
+; CHECKPTX62-NEXT: cvt.u32.u16 %r34, %rs6;
; CHECKPTX62-NEXT: shl.b32 %r35, %r34, %r2;
; CHECKPTX62-NEXT: and.b32 %r36, %r55, %r3;
; CHECKPTX62-NEXT: or.b32 %r37, %r36, %r35;
@@ -100,9 +100,9 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-NEXT: $L__BB0_5: // %atomicrmw.start9
; CHECKPTX62-NEXT: // =>This Inner Loop Header: Depth=1
; CHECKPTX62-NEXT: shr.u32 %r41, %r56, %r11;
-; CHECKPTX62-NEXT: cvt.u16.u32 %rs11, %r41;
-; CHECKPTX62-NEXT: add.rn.f16 %rs13, %rs11, %rs1;
-; CHECKPTX62-NEXT: cvt.u32.u16 %r42, %rs13;
+; CHECKPTX62-NEXT: cvt.u16.u32 %rs7, %r41;
+; CHECKPTX62-NEXT: add.rn.f16 %rs8, %rs7, %rs1;
+; CHECKPTX62-NEXT: cvt.u32.u16 %r42, %rs8;
; CHECKPTX62-NEXT: shl.b32 %r43, %r42, %r11;
; CHECKPTX62-NEXT: and.b32 %r44, %r56, %r12;
; CHECKPTX62-NEXT: or.b32 %r45, %r44, %r43;
@@ -120,9 +120,9 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-NEXT: $L__BB0_7: // %atomicrmw.start
; CHECKPTX62-NEXT: // =>This Inner Loop Header: Depth=1
; CHECKPTX62-NEXT: shr.u32 %r49, %r57, %r17;
-; CHECKPTX62-NEXT: cvt.u16.u32 %rs15, %r49;
-; CHECKPTX62-NEXT: add.rn.f16 %rs17, %rs15, %rs1;
-; CHECKPTX62-NEXT: cvt.u32.u16 %r50, %rs17;
+; CHECKPTX62-NEXT: cvt.u16.u32 %rs9, %r49;
+; CHECKPTX62-NEXT: add.rn.f16 %rs10, %rs9, %rs1;
+; CHECKPTX62-NEXT: cvt.u32.u16 %r50, %rs10;
; CHECKPTX62-NEXT: shl.b32 %r51, %r50, %r17;
; CHECKPTX62-NEXT: and.b32 %r52, %r57, %r18;
; CHECKPTX62-NEXT: or.b32 %r53, %r52, %r51;
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
index 9ddb82321b4ea2..f81b785f13225c 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
@@ -46,7 +46,7 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, bfloat
; CHECKPTX71-LABEL: test(
; CHECKPTX71: {
; CHECKPTX71-NEXT: .reg .pred %p<5>;
-; CHECKPTX71-NEXT: .reg .b16 %rs<34>;
+; CHECKPTX71-NEXT: .reg .b16 %rs<22>;
; CHECKPTX71-NEXT: .reg .b32 %r<4>;
; CHECKPTX71-NEXT: .reg .f32 %f<12>;
; CHECKPTX71-EMPTY:
@@ -55,49 +55,49 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, bfloat
; CHECKPTX71-NEXT: ld.param.u32 %r3, [test_param_2];
; CHECKPTX71-NEXT: ld.param.u32 %r2, [test_param_1];
; CHECKPTX71-NEXT: ld.param.u32 %r1, [test_param_0];
-; CHECKPTX71-NEXT: ld.b16 %rs30, [%r1];
+; CHECKPTX71-NEXT: ld.b16 %rs18, [%r1];
; CHECKPTX71-NEXT: cvt.f32.bf16 %f1, %rs13;
; CHECKPTX71-NEXT: $L__BB0_1: // %atomicrmw.start14
; CHECKPTX71-NEXT: // =>This Inner Loop Header: Depth=1
-; CHECKPTX71-NEXT: cvt.f32.bf16 %f2, %rs30;
+; CHECKPTX71-NEXT: cvt.f32.bf16 %f2, %rs18;
; CHECKPTX71-NEXT: add.rn.f32 %f3, %f2, %f1;
; CHECKPTX71-NEXT: cvt.rn.bf16.f32 %rs14, %f3;
-; CHECKPTX71-NEXT: atom.cas.b16 %rs17, [%r1], %rs30, %rs14;
-; CHECKPTX71-NEXT: setp.ne.s16 %p1, %rs17, %rs30;
-; CHECKPTX71-NEXT: mov.u16 %rs30, %rs17;
+; CHECKPTX71-NEXT: atom.cas.b16 %rs3, [%r1], %rs18, %rs14;
+; CHECKPTX71-NEXT: setp.ne.s16 %p1, %rs3, %rs18;
+; CHECKPTX71-NEXT: mov.u16 %rs18, %rs3;
; CHECKPTX71-NEXT: @%p1 bra $L__BB0_1;
; CHECKPTX71-NEXT: // %bb.2: // %atomicrmw.end13
-; CHECKPTX71-NEXT: ld.b16 %rs31, [%r1];
+; CHECKPTX71-NEXT: ld.b16 %rs19, [%r1];
; CHECKPTX71-NEXT: $L__BB0_3: // %atomicrmw.start8
; CHECKPTX71-NEXT: // =>This Inner Loop Header: Depth=1
-; CHECKPTX71-NEXT: cvt.f32.bf16 %f4, %rs31;
+; CHECKPTX71-NEXT: cvt.f32.bf16 %f4, %rs19;
; CHECKPTX71-NEXT: add.rn.f32 %f5, %f4, 0f3F800000;
-; CHECKPTX71-NEXT: cvt.rn.bf16.f32 %rs18, %f5;
-; CHECKPTX71-NEXT: atom.cas.b16 %rs21, [%r1], %rs31, %rs18;
-; CHECKPTX71-NEXT: setp.ne.s16 %p2, %rs21, %rs31;
-; CHECKPTX71-NEXT: mov.u16 %rs31, %rs21;
+; CHECKPTX71-NEXT: cvt.rn.bf16.f32 %rs15, %f5;
+; CHECKPTX71-NEXT: atom.cas.b16 %rs6, [%r1], %rs19, %rs15;
+; CHECKPTX71-NEXT: setp.ne.s16 %p2, %rs6, %rs19;
+; CHECKPTX71-NEXT: mov.u16 %rs19, %rs6;
; CHECKPTX71-NEXT: @%p2 bra $L__BB0_3;
; CHECKPTX71-NEXT: // %bb.4: // %atomicrmw.end7
-; CHECKPTX71-NEXT: ld.global.b16 %rs32, [%r2];
+; CHECKPTX71-NEXT: ld.global.b16 %rs20, [%r2];
; CHECKPTX71-NEXT: $L__BB0_5: // %atomicrmw.start2
; CHECKPTX71-NEXT: // =>This Inner Loop Header: Depth=1
-; CHECKPTX71-NEXT: cvt.f32.bf16 %f7, %rs32;
+; CHECKPTX71-NEXT: cvt.f32.bf16 %f7, %rs20;
; CHECKPTX71-NEXT: add.rn.f32 %f8, %f7, %f1;
-; CHECKPTX71-NEXT: cvt.rn.bf16.f32 %rs22, %f8;
-; CHECKPTX71-NEXT: atom.global.cas.b16 %rs25, [%r2], %rs32, %rs22;
-; CHECKPTX71-NEXT: setp.ne.s16 %p3, %rs25, %rs32;
-; CHECKPTX71-NEXT: mov.u16 %rs32, %rs25;
+; CHECKPTX71-NEXT: cvt.rn.bf16.f32 %rs16, %f8;
+; CHECKPTX71-NEXT: atom.global.cas.b16 %rs9, [%r2], %rs20, %rs16;
+; CHECKPTX71-NEXT: setp.ne.s16 %p3, %rs9, %rs20;
+; CHECKPTX71-NEXT: mov.u16 %rs20, %rs9;
; CHECKPTX71-NEXT: @%p3 bra $L__BB0_5;
; CHECKPTX71-NEXT: // %bb.6: // %atomicrmw.end1
-; CHECKPTX71-NEXT: ld.shared.b16 %rs33, [%r3];
+; CHECKPTX71-NEXT: ld.shared.b16 %rs21, [%r3];
; CHECKPTX71-NEXT: $L__BB0_7: // %atomicrmw.start
; CHECKPTX71-NEXT: // =>This Inner Loop Header: Depth=1
-; CHECKPTX71-NEXT: cvt.f32.bf16 %f10, %rs33;
+; CHECKPTX71-NEXT: cvt.f32.bf16 %f10, %rs21;
; CHECKPTX71-NEXT: add.rn.f32 %f11, %f10, %f1;
-; CHECKPTX71-NEXT: cvt.rn.bf16.f32 %rs26, %f11;
-; CHECKPTX71-NEXT: atom.shared.cas.b16 %rs29, [%r3], %rs33, %rs26;
-; CHECKPTX71-NEXT: setp.ne.s16 %p4, %rs29, %rs33;
-; CHECKPTX71-NEXT: mov.u16 %rs33, %rs29;
+; CHECKPTX71-NEXT: cvt.rn.bf16.f32 %rs17, %f11;
+; CHECKPTX71-NEXT: atom.shared.cas.b16 %rs12, [%r3], %rs21, %rs17;
+; CHECKPTX71-NEXT: setp.ne.s16 %p4, %rs12, %rs21;
+; CHECKPTX71-NEXT: mov.u16 %rs21, %rs12;
; CHECKPTX71-NEXT: @%p4 bra $L__BB0_7;
; CHECKPTX71-NEXT: // %bb.8: // %atomicrmw.end
; CHECKPTX71-NEXT: ret;
diff --git a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
index 08ed317ef93007..6828bac18cad7f 100644
--- a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
@@ -17,7 +17,7 @@ define bfloat @test_fadd(bfloat %0, bfloat %1) {
; SM70-LABEL: test_fadd(
; SM70: {
; SM70-NEXT: .reg .pred %p<2>;
-; SM70-NEXT: .reg .b16 %rs<3>;
+; SM70-NEXT: .reg .b16 %rs<2>;
; SM70-NEXT: .reg .b32 %r<11>;
; SM70-NEXT: .reg .f32 %f<4>;
; SM70-EMPTY:
@@ -88,7 +88,7 @@ define bfloat @test_fsub(bfloat %0, bfloat %1) {
; SM70-LABEL: test_fsub(
; SM70: {
; SM70-NEXT: .reg .pred %p<2>;
-; SM70-NEXT: .reg .b16 %rs<3>;
+; SM70-NEXT: .reg .b16 %rs<2>;
; SM70-NEXT: .reg .b32 %r<11>;
; SM70-NEXT: .reg .f32 %f<4>;
; SM70-EMPTY:
@@ -159,8 +159,8 @@ define <2 x bfloat> @test_faddx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-LABEL: test_faddx2(
; SM70: {
; SM70-NEXT: .reg .pred %p<3>;
-; SM70-NEXT: .reg .b16 %rs<9>;
-; SM70-NEXT: .reg .b32 %r<25>;
+; SM70-NEXT: .reg .b16 %rs<5>;
+; SM70-NEXT: .reg .b32 %r<24>;
; SM70-NEXT: .reg .f32 %f<7>;
; SM70-EMPTY:
; SM70-NEXT: // %bb.0:
@@ -170,8 +170,8 @@ define <2 x bfloat> @test_faddx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-NEXT: cvt.u32.u16 %r3, %rs2;
; SM70-NEXT: shl.b32 %r4, %r3, 16;
; SM70-NEXT: mov.b32 %f1, %r4;
-; SM70-NEXT: mov.b32 {%rs4, %rs5}, %r1;
-; SM70-NEXT: cvt.u32.u16 %r5, %rs5;
+; SM70-NEXT: mov.b32 {%rs3, %rs4}, %r1;
+; SM70-NEXT: cvt.u32.u16 %r5, %rs4;
; SM70-NEXT: shl.b32 %r6, %r5, 16;
; SM70-NEXT: mov.b32 %f2, %r6;
; SM70-NEXT: add.rn.f32 %f3, %f2, %f1;
@@ -185,7 +185,7 @@ define <2 x bfloat> @test_faddx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-NEXT: cvt.u32.u16 %r13, %rs1;
; SM70-NEXT: shl.b32 %r14, %r13, 16;
; SM70-NEXT: mov.b32 %f4, %r14;
-; SM70-NEXT: cvt.u32.u16 %r15, %rs4;
+; SM70-NEXT: cvt.u32.u16 %r15, %rs3;
; SM70-NEXT: shl.b32 %r16, %r15, 16;
; SM70-NEXT: mov.b32 %f5, %r16;
; SM70-NEXT: add.rn.f32 %f6, %f5, %f4;
@@ -260,8 +260,8 @@ define <2 x bfloat> @test_fsubx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-LABEL: test_fsubx2(
; SM70: {
; SM70-NEXT: .reg .pred %p<3>;
-; SM70-NEXT: .reg .b16 %rs<9>;
-; SM70-NEXT: .reg .b32 %r<25>;
+; SM70-NEXT: .reg .b16 %rs<5>;
+; SM70-NEXT: .reg .b32 %r<24>;
; SM70-NEXT: .reg .f32 %f<7>;
; SM70-EMPTY:
; SM70-NEXT: // %bb.0:
@@ -271,8 +271,8 @@ define <2 x bfloat> @test_fsubx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-NEXT: cvt.u32.u16 %r3, %rs2;
; SM70-NEXT: shl.b32 %r4, %r3, 16;
; SM70-NEXT: mov.b32 %f1, %r4;
-; SM70-NEXT: mov.b32 {%rs4, %rs5}, %r1;
-; SM70-NEXT: cvt.u32.u16 %r5, %rs5;
+; SM70-NEXT: mov.b32 {%rs3, %rs4}, %r1;
+; SM70-NEXT: cvt.u32.u16 %r5, %rs4;
; SM70-NEXT: shl.b32 %r6, %r5, 16;
; SM70-NEXT: mov.b32 %f2, %r6;
; SM70-NEXT: sub.rn.f32 %f3, %f2, %f1;
@@ -286,7 +286,7 @@ define <2 x bfloat> @test_fsubx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-NEXT: cvt.u32.u16 %r13, %rs1;
; SM70-NEXT: shl.b32 %r14, %r13, 16;
; SM70-NEXT: mov.b32 %f4, %r14;
-; SM70-NEXT: cvt.u32.u16 %r15, %rs4;
+; SM70-NEXT: cvt.u32.u16 %r15, %rs3;
; SM70-NEXT: shl.b32 %r16, %r15, 16;
; SM70-NEXT: mov.b32 %f5, %r16;
; SM70-NEXT: sub.rn.f32 %f6, %f5, %f4;
@@ -361,8 +361,8 @@ define <2 x bfloat> @test_fmulx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-LABEL: test_fmulx2(
; SM70: {
; SM70-NEXT: .reg .pred %p<3>;
-; SM70-NEXT: .reg .b16 %rs<9>;
-; SM70-NEXT: .reg .b32 %r<25>;
+; SM70-NEXT: .reg .b16 %rs<5>;
+; SM70-NEXT: .reg .b32 %r<24>;
; SM70-NEXT: .reg .f32 %f<7>;
; SM70-EMPTY:
; SM70-NEXT: // %bb.0:
@@ -372,8 +372,8 @@ define <2 x bfloat> @test_fmulx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-NEXT: cvt.u32.u16 %r3, %rs2;
; SM70-NEXT: shl.b32 %r4, %r3, 16;
; SM70-NEXT: mov.b32 %f1, %r4;
-; SM70-NEXT: mov.b32 {%rs4, %rs5}, %r1;
-; SM70-NEXT: cvt.u32.u16 %r5, %rs5;
+; SM70-NEXT: mov.b32 {%rs3, %rs4}, %r1;
+; SM70-NEXT: cvt.u32.u16 %r5, %rs4;
; SM70-NEXT: shl.b32 %r6, %r5, 16;
; SM70-NEXT: mov.b32 %f2, %r6;
; SM70-NEXT: mul.rn.f32 %f3, %f2, %f1;
@@ -387,7 +387,7 @@ define <2 x bfloat> @test_fmulx2(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-NEXT: cvt.u32.u16 %r13, %rs1;
; SM70-NEXT: shl.b32 %r14, %r13, 16;
; SM70-NEXT: mov.b32 %f4, %r14;
-; SM70-NEXT: cvt.u32.u16 %r15, %rs4;
+; SM70-NEXT: cvt.u32.u16 %r15, %rs3;
; SM70-NEXT: shl.b32 %r16, %r15, 16;
; SM70-NEXT: mov.b32 %f5, %r16;
; SM70-NEXT: mul.rn.f32 %f6, %f5, %f4;
@@ -462,8 +462,8 @@ define <2 x bfloat> @test_fdiv(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-LABEL: test_fdiv(
; SM70: {
; SM70-NEXT: .reg .pred %p<3>;
-; SM70-NEXT: .reg .b16 %rs<9>;
-; SM70-NEXT: .reg .b32 %r<25>;
+; SM70-NEXT: .reg .b16 %rs<5>;
+; SM70-NEXT: .reg .b32 %r<24>;
; SM70-NEXT: .reg .f32 %f<7>;
; SM70-EMPTY:
; SM70-NEXT: // %bb.0:
@@ -473,8 +473,8 @@ define <2 x bfloat> @test_fdiv(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-NEXT: cvt.u32.u16 %r3, %rs2;
; SM70-NEXT: shl.b32 %r4, %r3, 16;
; SM70-NEXT: mov.b32 %f1, %r4;
-; SM70-NEXT: mov.b32 {%rs4, %rs5}, %r1;
-; SM70-NEXT: cvt.u32.u16 %r5, %rs5;
+; SM70-NEXT: mov.b32 {%rs3, %rs4}, %r1;
+; SM70-NEXT: cvt.u32.u16 %r5, %rs4;
; SM70-NEXT: shl.b32 %r6, %r5, 16;
; SM70-NEXT: mov.b32 %f2, %r6;
; SM70-NEXT: div.rn.f32 %f3, %f2, %f1;
@@ -488,7 +488,7 @@ define <2 x bfloat> @test_fdiv(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
; SM70-NEXT: cvt.u32.u16 %r13, %rs1;
; SM70-NEXT: shl.b32 %r14, %r13, 16;
; SM70-NEXT: mov.b32 %f4, %r14;
-; SM70-NEXT: cvt.u32.u16 %r15, %rs4;
+; SM70-NEXT: cvt.u32.u16 %r15, %rs3;
; SM70-NEXT: shl.b32 %r16, %r15, 16;
; SM70-NEXT: mov.b32 %f5, %r16;
; SM70-NEXT: div.rn.f32 %f6, %f5, %f4;
@@ -648,7 +648,7 @@ define bfloat @test_fptrunc_float(float %a) #0 {
; SM70-LABEL: test_fptrunc_float(
; SM70: {
; SM70-NEXT: .reg .pred %p<2>;
-; SM70-NEXT: .reg .b16 %rs<3>;
+; SM70-NEXT: .reg .b16 %rs<2>;
; SM70-NEXT: .reg .b32 %r<7>;
; SM70-NEXT: .reg .f32 %f<2>;
; SM70-EMPTY:
@@ -705,7 +705,7 @@ define bfloat @test_fadd_imm_1(bfloat %a) #0 {
; SM70-LABEL: test_fadd_imm_1(
; SM70: {
; SM70-NEXT: .reg .pred %p<2>;
-; SM70-NEXT: .reg .b16 %rs<3>;
+; SM70-NEXT: .reg .b16 %rs<2>;
; SM70-NEXT: .reg .b32 %r<9>;
; SM70-NEXT: .reg .f32 %f<3>;
; SM70-EMPTY:
@@ -789,7 +789,7 @@ define bfloat @test_select_cc_bf16_f64(double %a, double %b, bfloat %c, bfloat %
define <8 x float> @test_extload_bf16x8(ptr addrspace(3) noundef %arg) #0 {
; SM70-LABEL: test_extload_bf16x8(
; SM70: {
-; SM70-NEXT: .reg .b16 %rs<17>;
+; SM70-NEXT: .reg .b16 %rs<9>;
; SM70-NEXT: .reg .b32 %r<21>;
; SM70-NEXT: .reg .f32 %f<9>;
; SM70-NEXT: .reg .b64 %rd<2>;
@@ -1033,7 +1033,7 @@ define bfloat @test_sitofp_i16(i16 %a) {
; SM70-LABEL: test_sitofp_i16(
; SM70: {
; SM70-NEXT: .reg .pred %p<2>;
-; SM70-NEXT: .reg .b16 %rs<4>;
+; SM70-NEXT: .reg .b16 %rs<3>;
; SM70-NEXT: .reg .b32 %r<7>;
; SM70-NEXT: .reg .f32 %f<2>;
; SM70-EMPTY:
@@ -1092,7 +1092,7 @@ define bfloat @test_uitofp_i8(i8 %a) {
; SM70-LABEL: test_uitofp_i8(
; SM70: {
; SM70-NEXT: .reg .pred %p<2>;
-; SM70-NEXT: .reg .b16 %rs<4>;
+; SM70-NEXT: .reg .b16 %rs<3>;
; SM70-NEXT: .reg .b32 %r<7>;
; SM70-NEXT: .reg .f32 %f<2>;
; SM70-EMPTY:
@@ -1151,7 +1151,7 @@ define bfloat @test_uitofp_i1(i1 %a) {
; SM70-LABEL: test_uitofp_i1(
; SM70: {
; SM70-NEXT: .reg .pred %p<3>;
-; SM70-NEXT: .reg .b16 %rs<5>;
+; SM70-NEXT: .reg .b16 %rs<4>;
; SM70-NEXT: .reg .b32 %r<8>;
; SM70-NEXT: .reg .f32 %f<2>;
; SM70-EMPTY:
@@ -1228,7 +1228,7 @@ define bfloat @test_uitofp_i16(i16 %a) {
; SM70-LABEL: test_uitofp_i16(
; SM70: {
; SM70-NEXT: .reg .pred %p<2>;
-; SM70-NEXT: .reg .b16 %rs<4>;
+; SM70-NEXT: .reg .b16 %rs<3>;
; SM70-NEXT: .reg .b32 %r<7>;
; SM70-NEXT: .reg .f32 %f<2>;
; SM70-EMPTY:
@@ -1287,7 +1287,7 @@ define bfloat @test_uitofp_i32(i32 %a) {
; SM70-LABEL: test_uitofp_i32(
; SM70: {
; SM70-NEXT: .reg .pred %p<2>;
-; SM70-NEXT: .reg .b16 %rs<3>;
+; SM70-NEXT: .reg .b16 %rs<2>;
; SM70-NEXT: .reg .b32 %r<8>;
; SM70-NEXT: .reg .f32 %f<2>;
; SM70-EMPTY:
@@ -1349,7 +1349,7 @@ define bfloat @test_uitofp_i64(i64 %a) {
; SM70-LABEL: test_uitofp_i64(
; SM70: ...
[truncated]
These nodes are all erased by a later pass
What pass does this?
LGTM :)
NVPTXProxyRegErasure
Nice.
LGTM.
; CHECK-NEXT: { .reg .b32 tmp; mov.b64 {tmp, %r1}, %rd1; }
; CHECK-NEXT: cvt.u32.u64 %r2, %rd1;
Future improvement opportunity: Coalesce partial mov and cvt into mov.b64 {%r2, %r1}, %rd1.
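For illustration, a sketch of what that coalescing could look like, using the registers from the quoted check lines; the coalesced form is a hypothetical future output, not something this patch produces:

// Current output: a packed mov extracts the high 32 bits into %r1, and a
// separate cvt re-reads %rd1 to truncate out the low 32 bits into %r2.
{ .reg .b32 tmp; mov.b64 {tmp, %r1}, %rd1; }
cvt.u32.u64 %r2, %rd1;

// Possible coalesced form: one packed mov unpacks both halves of %rd1.
mov.b64 {%r2, %r1}, %rd1;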
Absolutely! I've been thinking about this for a while.
@Artem-B, @justinfargnoli heads up, I've done some further cleanup in 2756e29 while addressing Justin's request. If you have a minute please take another look.
// There's no way to specify FP16 and BF16 immediates in .(b)f16 ops, so we
// have to load them into an .(b)f16 register first.
We're deleting this code because the comment is false, right? Or am I misunderstanding something.
If the source code comment is false, then the update LGTM.
The comment is sort of true, but it doesn't justify keeping things the way they were. It is simpler and cleaner to replace these with normally named Mov instructions and to use TableGen to generate the ISel logic. This is handled the same way as any other instruction where we do not support an immediate operand.
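As a concrete example of that pattern, the updated atomics-sm70.ll output earlier in this diff materializes the fp16 constant 1.0 into a .b16 register with a plain mov before using it:

// excerpt from the CHECKPTX62 output above
mov.b16 %rs5, 0x3C00;
add.rn.f16 %rs6, %rs4, %rs5;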
I'm not sure it's been fixed in ptxas: https://godbolt.org/z/d8EcMevc8
If anything, recent versions seem to be more restrictive than the older ones.
I'm fine with removing this code as long as we can still handle constants correctly. This should already be covered by the original tests in llvm/test/CodeGen/NVPTX/f16-instructions.ll. If LLVM tests are happy after the run with ptxas enabled, we should be fine.
Avoid introducing ProxyReg and MOV nodes during ISel when lowering bitconvert or similar operations. These nodes are all erased by a later pass, but not introducing them in the first place is simpler and likely saves compile time.

Also remove redundant MOV instruction definitions.
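A minimal IR reproducer for the kind of operation affected (the function name is made up for illustration): with this patch, the bitconvert below is selected directly onto the input register instead of going through a ProxyRegI16/mov copy that NVPTXProxyRegErasure would later delete.

; compile with, e.g., llc -mtriple=nvptx64-nvidia-cuda
define i16 @bitcast_f16_to_i16(half %x) {
  ; bitcast half -> i16: no intermediate proxy-register copy is emitted now
  %v = bitcast half %x to i16
  ret i16 %v
}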