[NVPTX] Avoid introducing unnecessary ProxyRegs and Movs in ISel #120486

Merged
merged 2 commits into from
Dec 19, 2024
19 changes: 0 additions & 19 deletions llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
@@ -176,10 +176,6 @@ void NVPTXDAGToDAGISel::Select(SDNode *N) {
case ISD::ADDRSPACECAST:
SelectAddrSpaceCast(N);
return;
case ISD::ConstantFP:
if (tryConstantFP(N))
return;
break;
case ISD::CopyToReg: {
if (N->getOperand(1).getValueType() == MVT::i128) {
SelectV2I64toI128(N);
@@ -212,21 +208,6 @@ bool NVPTXDAGToDAGISel::tryIntrinsicChain(SDNode *N) {
}
}

// There's no way to specify FP16 and BF16 immediates in .(b)f16 ops, so we
// have to load them into an .(b)f16 register first.
Comment on lines -215 to -216
Contributor

We're deleting this code because the comment is false, right? Or am I misunderstanding something?

Contributor

If the source code comment is false, then the update LGTM.

Member Author

The comment is sort of true, but it doesn't justify the way things were. It is simpler and cleaner to replace these with normally named Mov instructions and to let tablegen generate the ISel logic. This is how we handle any other instruction that does not support an immediate operand.

Member

I'm not sure it's been fixed in ptxas: https://godbolt.org/z/d8EcMevc8

If anything, recent versions seem to be more restrictive than the older ones.

Member @Artem-B, Dec 19, 2024

I'm fine with removing this code as long as we can still handle constants correctly. This should already be covered by the original tests in llvm/test/CodeGen/NVPTX/f16-instructions.ll

If LLVM tests are happy after the run with ptxas enabled, we should be fine.
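
For context on the constants discussed in this thread: the `mov.b16 ..., 0x3C00;` line in the atomics-sm70.ll checks is the raw IEEE half bit pattern of 1.0, loaded into a 16-bit register. The following is a minimal standalone sketch of how such 16-bit patterns come about. It is not LLVM code. `floatBits`, `halfBits` and `bf16Bits` are hypothetical helper names, and the half conversion only covers zero and normal values that are exactly representable (no rounding, subnormals, infinities or NaNs).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a binary32 float as a 32-bit integer.
static std::uint32_t floatBits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of
// binary32, so truncating to the upper 16 bits is exact for values that are
// representable in bf16 (assumption: no rounding is needed).
std::uint16_t bf16Bits(float f) {
  return static_cast<std::uint16_t>(floatBits(f) >> 16);
}

// IEEE binary16: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
// Sketch only: handles +/-0 and normal half values without rounding.
std::uint16_t halfBits(float f) {
  const std::uint32_t u = floatBits(f);
  const std::uint32_t sign = (u >> 16) & 0x8000u;
  const std::uint32_t exp = (u >> 23) & 0xFFu;
  const std::uint32_t mant = u & 0x7FFFFFu;
  if (exp == 0 && mant == 0)
    return static_cast<std::uint16_t>(sign); // +/-0
  // Rebias the exponent from binary32 (bias 127) to binary16 (bias 15).
  const std::int32_t e = static_cast<std::int32_t>(exp) - 127 + 15;
  assert(e >= 1 && e <= 30 && "sketch only covers normal half values");
  return static_cast<std::uint16_t>(
      sign | (static_cast<std::uint32_t>(e) << 10) | (mant >> 13));
}
```

Under these assumptions `halfBits(1.0f)` yields `0x3C00`, the immediate seen in the test output, and `bf16Bits(1.0f)` yields `0x3F80`.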

bool NVPTXDAGToDAGISel::tryConstantFP(SDNode *N) {
if (N->getValueType(0) != MVT::f16 && N->getValueType(0) != MVT::bf16)
return false;
SDValue Val = CurDAG->getTargetConstantFP(
cast<ConstantFPSDNode>(N)->getValueAPF(), SDLoc(N), N->getValueType(0));
SDNode *LoadConstF16 = CurDAG->getMachineNode(
(N->getValueType(0) == MVT::f16 ? NVPTX::LOAD_CONST_F16
: NVPTX::LOAD_CONST_BF16),
SDLoc(N), N->getValueType(0), Val);
ReplaceNode(N, LoadConstF16);
return true;
}

// Map ISD:CONDCODE value to appropriate CmpMode expected by
// NVPTXInstPrinter::printCmpMode()
static unsigned getPTXCmpMode(const CondCodeSDNode &CondCode, bool FTZ) {
140 changes: 33 additions & 107 deletions llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -1177,17 +1177,6 @@ def NegDoubleConst : SDNodeXForm<fpimm, [{
SDLoc(N), MVT::f64);
}]>;

// Loads FP16 constant into a register.
//
// ptxas does not have hex representation for fp16, so we can't use
// fp16 immediate values in .f16 instructions. Instead we have to load
// the constant into a register using mov.b16.
def LOAD_CONST_F16 :
NVPTXInst<(outs Int16Regs:$dst), (ins f16imm:$a),
"mov.b16 \t$dst, $a;", []>;
def LOAD_CONST_BF16 :
NVPTXInst<(outs Int16Regs:$dst), (ins bf16imm:$a),
"mov.b16 \t$dst, $a;", []>;
defm FADD : F3_fma_component<"add", fadd>;
defm FSUB : F3_fma_component<"sub", fsub>;
defm FMUL : F3_fma_component<"mul", fmul>;
@@ -1963,7 +1952,7 @@ let hasSideEffects = false in {


// copyPhysreg is hard-coded in NVPTXInstrInfo.cpp
let IsSimpleMove=1, hasSideEffects=0 in {
let IsSimpleMove=1, hasSideEffects=0, isAsCheapAsAMove=1 in {
def IMOV1rr : NVPTXInst<(outs Int1Regs:$dst), (ins Int1Regs:$sss),
"mov.pred \t$dst, $sss;", []>;
def IMOV16rr : NVPTXInst<(outs Int16Regs:$dst), (ins Int16Regs:$sss),
@@ -1975,48 +1964,37 @@ let IsSimpleMove=1, hasSideEffects=0 in {
def IMOV128rr : NVPTXInst<(outs Int128Regs:$dst), (ins Int128Regs:$sss),
"mov.b128 \t$dst, $sss;", []>;

def IMOVB16rr : NVPTXInst<(outs Int16Regs:$dst), (ins Int16Regs:$sss),
"mov.b16 \t$dst, $sss;", []>;
def IMOVB32rr : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$sss),
"mov.b32 \t$dst, $sss;", []>;
def IMOVB64rr : NVPTXInst<(outs Int64Regs:$dst), (ins Int64Regs:$sss),
"mov.b64 \t$dst, $sss;", []>;

def FMOV16rr : NVPTXInst<(outs Int16Regs:$dst), (ins Int16Regs:$src),
// We have to use .b16 here as there's no mov.f16.
"mov.b16 \t$dst, $src;", []>;
def FMOV32rr : NVPTXInst<(outs Float32Regs:$dst), (ins Float32Regs:$src),
"mov.f32 \t$dst, $src;", []>;
def FMOV64rr : NVPTXInst<(outs Float64Regs:$dst), (ins Float64Regs:$src),
"mov.f64 \t$dst, $src;", []>;
}

def IMOV1ri : NVPTXInst<(outs Int1Regs:$dst), (ins i1imm:$src),
"mov.pred \t$dst, $src;",
[(set i1:$dst, imm:$src)]>;
def IMOV16ri : NVPTXInst<(outs Int16Regs:$dst), (ins i16imm:$src),
"mov.u16 \t$dst, $src;",
[(set i16:$dst, imm:$src)]>;
def IMOV32ri : NVPTXInst<(outs Int32Regs:$dst), (ins i32imm:$src),
"mov.u32 \t$dst, $src;",
[(set i32:$dst, imm:$src)]>;
def IMOV64ri : NVPTXInst<(outs Int64Regs:$dst), (ins i64imm:$src),
"mov.u64 \t$dst, $src;",
[(set i64:$dst, imm:$src)]>;

def IMOVB16ri : NVPTXInst<(outs Int16Regs:$dst), (ins i16imm:$src),
"mov.b16 \t$dst, $src;", []>;
def IMOVB32ri : NVPTXInst<(outs Int32Regs:$dst), (ins i32imm:$src),
"mov.b32 \t$dst, $src;", []>;
def IMOVB64ri : NVPTXInst<(outs Int64Regs:$dst), (ins i64imm:$src),
"mov.b64 \t$dst, $src;", []>;

def FMOV32ri : NVPTXInst<(outs Float32Regs:$dst), (ins f32imm:$src),
"mov.f32 \t$dst, $src;",
[(set f32:$dst, fpimm:$src)]>;
def FMOV64ri : NVPTXInst<(outs Float64Regs:$dst), (ins f64imm:$src),
"mov.f64 \t$dst, $src;",
[(set f64:$dst, fpimm:$src)]>;
def IMOV1ri : NVPTXInst<(outs Int1Regs:$dst), (ins i1imm:$src),
"mov.pred \t$dst, $src;",
[(set i1:$dst, imm:$src)]>;
def IMOV16ri : NVPTXInst<(outs Int16Regs:$dst), (ins i16imm:$src),
"mov.b16 \t$dst, $src;",
[(set i16:$dst, imm:$src)]>;
def IMOV32ri : NVPTXInst<(outs Int32Regs:$dst), (ins i32imm:$src),
"mov.b32 \t$dst, $src;",
[(set i32:$dst, imm:$src)]>;
def IMOV64ri : NVPTXInst<(outs Int64Regs:$dst), (ins i64imm:$src),
"mov.b64 \t$dst, $src;",
[(set i64:$dst, imm:$src)]>;

def FMOV16ri : NVPTXInst<(outs Int16Regs:$dst), (ins f16imm:$src),
"mov.b16 \t$dst, $src;",
[(set f16:$dst, fpimm:$src)]>;
def BFMOV16ri : NVPTXInst<(outs Int16Regs:$dst), (ins bf16imm:$src),
"mov.b16 \t$dst, $src;",
[(set bf16:$dst, fpimm:$src)]>;
def FMOV32ri : NVPTXInst<(outs Float32Regs:$dst), (ins f32imm:$src),
"mov.f32 \t$dst, $src;",
[(set f32:$dst, fpimm:$src)]>;
def FMOV64ri : NVPTXInst<(outs Float64Regs:$dst), (ins f64imm:$src),
"mov.f64 \t$dst, $src;",
[(set f64:$dst, fpimm:$src)]>;
}

def : Pat<(i32 (Wrapper texternalsym:$dst)), (IMOV32ri texternalsym:$dst)>;
def : Pat<(i64 (Wrapper texternalsym:$dst)), (IMOV64ri texternalsym:$dst)>;
@@ -2215,18 +2193,6 @@ multiclass FSET_FORMAT<PatFrag OpNode, PatLeaf Mode, PatLeaf ModeFTZ> {
def : Pat<(i1 (OpNode f16:$a, f16:$b)),
(SETP_f16rr Int16Regs:$a, Int16Regs:$b, Mode)>,
Requires<[useFP16Math]>;
def : Pat<(i1 (OpNode f16:$a, fpimm:$b)),
(SETP_f16rr Int16Regs:$a, (LOAD_CONST_F16 fpimm:$b), ModeFTZ)>,
Requires<[useFP16Math,doF32FTZ]>;
def : Pat<(i1 (OpNode f16:$a, fpimm:$b)),
(SETP_f16rr Int16Regs:$a, (LOAD_CONST_F16 fpimm:$b), Mode)>,
Requires<[useFP16Math]>;
def : Pat<(i1 (OpNode fpimm:$a, f16:$b)),
(SETP_f16rr (LOAD_CONST_F16 fpimm:$a), Int16Regs:$b, ModeFTZ)>,
Requires<[useFP16Math,doF32FTZ]>;
def : Pat<(i1 (OpNode fpimm:$a, f16:$b)),
(SETP_f16rr (LOAD_CONST_F16 fpimm:$a), Int16Regs:$b, Mode)>,
Requires<[useFP16Math]>;

// bf16 -> pred
def : Pat<(i1 (OpNode bf16:$a, bf16:$b)),
@@ -2235,18 +2201,6 @@ multiclass FSET_FORMAT<PatFrag OpNode, PatLeaf Mode, PatLeaf ModeFTZ> {
def : Pat<(i1 (OpNode bf16:$a, bf16:$b)),
(SETP_bf16rr Int16Regs:$a, Int16Regs:$b, Mode)>,
Requires<[hasBF16Math]>;
def : Pat<(i1 (OpNode bf16:$a, fpimm:$b)),
(SETP_bf16rr Int16Regs:$a, (LOAD_CONST_BF16 fpimm:$b), ModeFTZ)>,
Requires<[hasBF16Math,doF32FTZ]>;
def : Pat<(i1 (OpNode bf16:$a, fpimm:$b)),
(SETP_bf16rr Int16Regs:$a, (LOAD_CONST_BF16 fpimm:$b), Mode)>,
Requires<[hasBF16Math]>;
def : Pat<(i1 (OpNode fpimm:$a, bf16:$b)),
(SETP_bf16rr (LOAD_CONST_BF16 fpimm:$a), Int16Regs:$b, ModeFTZ)>,
Requires<[hasBF16Math,doF32FTZ]>;
def : Pat<(i1 (OpNode fpimm:$a, bf16:$b)),
(SETP_bf16rr (LOAD_CONST_BF16 fpimm:$a), Int16Regs:$b, Mode)>,
Requires<[hasBF16Math]>;

// f32 -> pred
def : Pat<(i1 (OpNode f32:$a, f32:$b)),
@@ -2280,18 +2234,6 @@ multiclass FSET_FORMAT<PatFrag OpNode, PatLeaf Mode, PatLeaf ModeFTZ> {
def : Pat<(i32 (OpNode f16:$a, f16:$b)),
(SET_f16rr Int16Regs:$a, Int16Regs:$b, Mode)>,
Requires<[useFP16Math]>;
def : Pat<(i32 (OpNode f16:$a, fpimm:$b)),
(SET_f16rr Int16Regs:$a, (LOAD_CONST_F16 fpimm:$b), ModeFTZ)>,
Requires<[useFP16Math, doF32FTZ]>;
def : Pat<(i32 (OpNode f16:$a, fpimm:$b)),
(SET_f16rr Int16Regs:$a, (LOAD_CONST_F16 fpimm:$b), Mode)>,
Requires<[useFP16Math]>;
def : Pat<(i32 (OpNode fpimm:$a, f16:$b)),
(SET_f16ir (LOAD_CONST_F16 fpimm:$a), Int16Regs:$b, ModeFTZ)>,
Requires<[useFP16Math, doF32FTZ]>;
def : Pat<(i32 (OpNode fpimm:$a, f16:$b)),
(SET_f16ir (LOAD_CONST_F16 fpimm:$a), Int16Regs:$b, Mode)>,
Requires<[useFP16Math]>;

// bf16 -> i32
def : Pat<(i32 (OpNode bf16:$a, bf16:$b)),
@@ -2300,18 +2242,6 @@ multiclass FSET_FORMAT<PatFrag OpNode, PatLeaf Mode, PatLeaf ModeFTZ> {
def : Pat<(i32 (OpNode bf16:$a, bf16:$b)),
(SET_bf16rr Int16Regs:$a, Int16Regs:$b, Mode)>,
Requires<[hasBF16Math]>;
def : Pat<(i32 (OpNode bf16:$a, fpimm:$b)),
(SET_bf16rr Int16Regs:$a, (LOAD_CONST_BF16 fpimm:$b), ModeFTZ)>,
Requires<[hasBF16Math, doF32FTZ]>;
def : Pat<(i32 (OpNode bf16:$a, fpimm:$b)),
(SET_bf16rr Int16Regs:$a, (LOAD_CONST_BF16 fpimm:$b), Mode)>,
Requires<[hasBF16Math]>;
def : Pat<(i32 (OpNode fpimm:$a, bf16:$b)),
(SET_bf16ir (LOAD_CONST_BF16 fpimm:$a), Int16Regs:$b, ModeFTZ)>,
Requires<[hasBF16Math, doF32FTZ]>;
def : Pat<(i32 (OpNode fpimm:$a, bf16:$b)),
(SET_bf16ir (LOAD_CONST_BF16 fpimm:$a), Int16Regs:$b, Mode)>,
Requires<[hasBF16Math]>;

// f32 -> i32
def : Pat<(i32 (OpNode f32:$a, f32:$b)),
@@ -3104,21 +3034,17 @@ def: Pat<(f32 (bitconvert vt:$a)),
(BITCONVERT_32_I2F Int32Regs:$a)>;
}
foreach vt = [f16, bf16] in {
def: Pat<(vt (bitconvert (i16 UInt16Const:$a))),
(IMOVB16ri UInt16Const:$a)>;
def: Pat<(vt (bitconvert i16:$a)),
(ProxyRegI16 Int16Regs:$a)>;
def: Pat<(i16 (bitconvert vt:$a)),
(ProxyRegI16 Int16Regs:$a)>;
def: Pat<(vt (bitconvert i16:$a)),
(vt Int16Regs:$a)>;
def: Pat<(i16 (bitconvert vt:$a)),
(i16 Int16Regs:$a)>;
}

foreach ta = [v2f16, v2bf16, v2i16, v4i8, i32] in {
def: Pat<(ta (bitconvert (i32 UInt32Const:$a))),
(IMOVB32ri UInt32Const:$a)>;
foreach tb = [v2f16, v2bf16, v2i16, v4i8, i32] in {
if !ne(ta, tb) then {
def: Pat<(ta (bitconvert (tb Int32Regs:$a))),
(ProxyRegI32 Int32Regs:$a)>;
def: Pat<(ta (bitconvert tb:$a)),
(ta Int32Regs:$a)>;
}
}
}
4 changes: 2 additions & 2 deletions llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -2803,10 +2803,10 @@ def : Pat<(int_nvvm_ptr_param_to_gen i64:$src),

// nvvm.ptr.gen.to.param
def : Pat<(int_nvvm_ptr_gen_to_param i32:$src),
(IMOV32rr Int32Regs:$src)>;
(i32 Int32Regs:$src)>;

def : Pat<(int_nvvm_ptr_gen_to_param i64:$src),
(IMOV64rr Int64Regs:$src)>;
(i64 Int64Regs:$src)>;

// nvvm.move intrinsicc
def nvvm_move_i16 : NVPTXInst<(outs Int16Regs:$r), (ins Int16Regs:$s),
32 changes: 17 additions & 15 deletions llvm/test/CodeGen/NVPTX/atomics-sm70.ll
@@ -46,7 +46,7 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-LABEL: test(
; CHECKPTX62: {
; CHECKPTX62-NEXT: .reg .pred %p<5>;
; CHECKPTX62-NEXT: .reg .b16 %rs<19>;
; CHECKPTX62-NEXT: .reg .b16 %rs<11>;
; CHECKPTX62-NEXT: .reg .b32 %r<58>;
; CHECKPTX62-EMPTY:
; CHECKPTX62-NEXT: // %bb.0:
@@ -65,8 +65,8 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-NEXT: // =>This Inner Loop Header: Depth=1
; CHECKPTX62-NEXT: shr.u32 %r28, %r54, %r2;
; CHECKPTX62-NEXT: cvt.u16.u32 %rs2, %r28;
; CHECKPTX62-NEXT: add.rn.f16 %rs4, %rs2, %rs1;
; CHECKPTX62-NEXT: cvt.u32.u16 %r29, %rs4;
; CHECKPTX62-NEXT: add.rn.f16 %rs3, %rs2, %rs1;
; CHECKPTX62-NEXT: cvt.u32.u16 %r29, %rs3;
; CHECKPTX62-NEXT: shl.b32 %r30, %r29, %r2;
; CHECKPTX62-NEXT: and.b32 %r31, %r54, %r3;
; CHECKPTX62-NEXT: or.b32 %r32, %r31, %r30;
@@ -79,10 +79,10 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-NEXT: $L__BB0_3: // %atomicrmw.start27
; CHECKPTX62-NEXT: // =>This Inner Loop Header: Depth=1
; CHECKPTX62-NEXT: shr.u32 %r33, %r55, %r2;
; CHECKPTX62-NEXT: cvt.u16.u32 %rs6, %r33;
; CHECKPTX62-NEXT: mov.b16 %rs8, 0x3C00;
; CHECKPTX62-NEXT: add.rn.f16 %rs9, %rs6, %rs8;
; CHECKPTX62-NEXT: cvt.u32.u16 %r34, %rs9;
; CHECKPTX62-NEXT: cvt.u16.u32 %rs4, %r33;
; CHECKPTX62-NEXT: mov.b16 %rs5, 0x3C00;
; CHECKPTX62-NEXT: add.rn.f16 %rs6, %rs4, %rs5;
; CHECKPTX62-NEXT: cvt.u32.u16 %r34, %rs6;
; CHECKPTX62-NEXT: shl.b32 %r35, %r34, %r2;
; CHECKPTX62-NEXT: and.b32 %r36, %r55, %r3;
; CHECKPTX62-NEXT: or.b32 %r37, %r36, %r35;
@@ -94,15 +94,16 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-NEXT: and.b32 %r10, %r22, -4;
; CHECKPTX62-NEXT: shl.b32 %r38, %r22, 3;
; CHECKPTX62-NEXT: and.b32 %r11, %r38, 24;
; CHECKPTX62-NEXT: shl.b32 %r40, %r26, %r11;
; CHECKPTX62-NEXT: mov.b32 %r39, 65535;
; CHECKPTX62-NEXT: shl.b32 %r40, %r39, %r11;
; CHECKPTX62-NEXT: not.b32 %r12, %r40;
; CHECKPTX62-NEXT: ld.global.u32 %r56, [%r10];
; CHECKPTX62-NEXT: $L__BB0_5: // %atomicrmw.start9
; CHECKPTX62-NEXT: // =>This Inner Loop Header: Depth=1
; CHECKPTX62-NEXT: shr.u32 %r41, %r56, %r11;
; CHECKPTX62-NEXT: cvt.u16.u32 %rs11, %r41;
; CHECKPTX62-NEXT: add.rn.f16 %rs13, %rs11, %rs1;
; CHECKPTX62-NEXT: cvt.u32.u16 %r42, %rs13;
; CHECKPTX62-NEXT: cvt.u16.u32 %rs7, %r41;
; CHECKPTX62-NEXT: add.rn.f16 %rs8, %rs7, %rs1;
; CHECKPTX62-NEXT: cvt.u32.u16 %r42, %rs8;
; CHECKPTX62-NEXT: shl.b32 %r43, %r42, %r11;
; CHECKPTX62-NEXT: and.b32 %r44, %r56, %r12;
; CHECKPTX62-NEXT: or.b32 %r45, %r44, %r43;
@@ -114,15 +115,16 @@ define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, half %
; CHECKPTX62-NEXT: and.b32 %r16, %r23, -4;
; CHECKPTX62-NEXT: shl.b32 %r46, %r23, 3;
; CHECKPTX62-NEXT: and.b32 %r17, %r46, 24;
; CHECKPTX62-NEXT: shl.b32 %r48, %r26, %r17;
; CHECKPTX62-NEXT: mov.b32 %r47, 65535;
; CHECKPTX62-NEXT: shl.b32 %r48, %r47, %r17;
; CHECKPTX62-NEXT: not.b32 %r18, %r48;
; CHECKPTX62-NEXT: ld.shared.u32 %r57, [%r16];
; CHECKPTX62-NEXT: $L__BB0_7: // %atomicrmw.start
; CHECKPTX62-NEXT: // =>This Inner Loop Header: Depth=1
; CHECKPTX62-NEXT: shr.u32 %r49, %r57, %r17;
; CHECKPTX62-NEXT: cvt.u16.u32 %rs15, %r49;
; CHECKPTX62-NEXT: add.rn.f16 %rs17, %rs15, %rs1;
; CHECKPTX62-NEXT: cvt.u32.u16 %r50, %rs17;
; CHECKPTX62-NEXT: cvt.u16.u32 %rs9, %r49;
; CHECKPTX62-NEXT: add.rn.f16 %rs10, %rs9, %rs1;
; CHECKPTX62-NEXT: cvt.u32.u16 %r50, %rs10;
; CHECKPTX62-NEXT: shl.b32 %r51, %r50, %r17;
; CHECKPTX62-NEXT: and.b32 %r52, %r57, %r18;
; CHECKPTX62-NEXT: or.b32 %r53, %r52, %r51;
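
The CHECK lines above appear to update a 16-bit half value inside its containing 32-bit word in a read-modify-write loop. In the updated output the `65535` mask constant is materialized with its own `mov.b32` before each shift. A rough C++ sketch of the address and mask arithmetic those PTX lines perform follows. The names `Lane`, `laneFor`, `extractHalf` and `insertHalf` are hypothetical, and 32-bit addresses are assumed, as in the test.

```cpp
#include <cstdint>

// Position of a 16-bit value inside its containing aligned 32-bit word.
struct Lane {
  std::uint32_t wordAddr; // address rounded down to a 4-byte boundary
  std::uint32_t shift;    // bit offset of the 16-bit lane within the word
  std::uint32_t keepMask; // mask that preserves every bit outside the lane
};

Lane laneFor(std::uint32_t addr) {
  Lane l;
  l.wordAddr = addr & ~3u;            // and.b32 %r10, %r22, -4
  l.shift = (addr << 3) & 24u;        // shl.b32 ..., 3 ; and.b32 ..., 24
  l.keepMask = ~(0xFFFFu << l.shift); // mov.b32 65535 ; shl.b32 ; not.b32
  return l;
}

// shr.u32 + cvt.u16.u32: pull the current 16-bit value out of the word.
std::uint16_t extractHalf(std::uint32_t word, const Lane &l) {
  return static_cast<std::uint16_t>(word >> l.shift);
}

// cvt.u32.u16 + shl.b32 + and.b32 + or.b32: merge the new 16-bit value back.
std::uint32_t insertHalf(std::uint32_t word, std::uint16_t h, const Lane &l) {
  return (word & l.keepMask) | (static_cast<std::uint32_t>(h) << l.shift);
}
```

For example, for an address whose low two bits are `0b10`, `laneFor` gives a shift of 16 and a keep mask of `0x0000FFFF`, so `insertHalf` replaces only the upper half of the word.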