Skip to content

[NVPTX] Set v2i16 SETCC to Expand #108969

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 19, 2024

Conversation

frasercrmck
Copy link
Contributor

@frasercrmck frasercrmck commented Sep 17, 2024

Note that this refers to the return type of SETCC. This operation is not
legal in PTX but was assumed as such because v2i16 is declared a legal
type. We were already expanding v4i8 SETCC.

The DAGCombiner would in certain circumstances try to fold an extension
of an illegal v2i1 SETCC (because v2i1 is illegal) into a "legal" v2i16
SETCC, which we wouldn't have patterns for.

@llvmbot
Copy link
Member

llvmbot commented Sep 17, 2024

@llvm/pr-subscribers-backend-nvptx

Author: Fraser Cormack (frasercrmck)

Changes

Note that this field is a property of SelectionDAG rather than one of the ISA. The previous value wasn't incorrect, per se, but since NVPTX uses 1-bit predicate registers, either choices are possible.

The problem with using zero-or-negative-one manifested in issues during instruction selection where LLVM would fold a sign-extension of a setcc into the setcc itself, which isn't a legal operation.

We could alternatively let this happen and then custom expand it back into a separate setcc + sext later. Or we could provide explicit patterns for the setp/selp sequence. However, preventing it from forming in the first place feels more appropriate to the ISA.

It does not appear to have a significant knock-on effect on codegen tests, aside from one minor change which looks harmless:

selp.s64 	%rd6, -1, 0, %p1;
add.s64 	%rd7, %rd5, %rd6;

to

selp.u64 	%rd6, 1, 0, %p1;
sub.s64 	%rd7, %rd5, %rd6;

Full diff: https://github.com/llvm/llvm-project/pull/108969.diff

3 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/add-sub-128bit.ll (+2-2)
  • (added) llvm/test/CodeGen/NVPTX/sext-setcc.ll (+29)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index c5a40e4308860c..4495fdb45e74c9 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -416,8 +416,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = (unsigned) 0xFFFFFFFF;
   MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = (unsigned) 0xFFFFFFFF;
 
-  setBooleanContents(ZeroOrNegativeOneBooleanContent);
-  setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);
+  setBooleanContents(ZeroOrOneBooleanContent);
+  setBooleanVectorContents(ZeroOrOneBooleanContent);
 
   // Jump is Expensive. Don't create extra control flow for 'and', 'or'
   // condition branches.
diff --git a/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll b/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
index 9d451e90650df1..26eb5fb223a6ca 100644
--- a/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
+++ b/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
@@ -23,8 +23,8 @@ define i128 @test_add(i128 %a, i128 %b) {
 define i128 @test_sub(i128 %a, i128 %b) {
 ; NOCARRY:        sub.s64
 ; NOCARRY-NEXT:   setp.lt.u64
-; NOCARRY-NEXT:   selp.s64
-; NOCARRY-NEXT:   add.s64
+; NOCARRY-NEXT:   selp.u64
+; NOCARRY-NEXT:   sub.s64
 ; NOCARRY-NEXT:   sub.s64
 
 ; CARRY:          sub.cc.s64
diff --git a/llvm/test/CodeGen/NVPTX/sext-setcc.ll b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
new file mode 100644
index 00000000000000..4b56100f347f49
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 | FileCheck %s
+; RUN: %if ptxas-11.0 %{ llc < %s -march=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+
+define <2 x i16> @sext_setcc(ptr %p) {
+; CHECK-LABEL: sext_setcc(
+; CHECK:       {
+; CHECK-NEXT:    .reg .pred %p<3>;
+; CHECK-NEXT:    .reg .b16 %rs<5>;
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0: // %entry
+; CHECK-NEXT:    ld.param.u64 %rd1, [sext_setcc_param_0];
+; CHECK-NEXT:    ld.u32 %r1, [%rd1];
+; CHECK-NEXT:    mov.b32 {%rs1, %rs2}, %r1;
+; CHECK-NEXT:    setp.eq.s16 %p1, %rs1, 0;
+; CHECK-NEXT:    setp.eq.s16 %p2, %rs2, 0;
+; CHECK-NEXT:    selp.s16 %rs3, -1, 0, %p2;
+; CHECK-NEXT:    selp.s16 %rs4, -1, 0, %p1;
+; CHECK-NEXT:    mov.b32 %r2, {%rs4, %rs3};
+; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r2;
+; CHECK-NEXT:    ret;
+entry:
+  %v = load <2 x i16>, ptr %p, align 4
+  %cmp = icmp eq <2 x i16> %v, zeroinitializer
+  %sext = sext <2 x i1> %cmp to <2 x i16>
+  ret <2 x i16> %sext
+}

@justinfargnoli
Copy link
Contributor

justinfargnoli commented Sep 17, 2024

Notes:

Note that this field is a property of SelectionDAG rather than one of the ISA. The previous value wasn't incorrect, per se, but since NVPTX uses 1-bit predicate registers, either choice is possible.

This is correct (source: Section 4.5 "Constants" of PTX ISA):

For predicate-type data and instructions, integer constants are allowed and are interpreted as in C, i.e., zero values are False and non-zero values are True.


Compiler Explorer reproducer.

Potentially incorrect transformation occurs during DAG optimization:

Initial selection DAG: %bb.0 'sext_setcc:entry'
SelectionDAG has 14 nodes:
  t0: ch,glue = EntryToken
              t4: v1i32,ch = load<(dereferenceable invariant load (s32) from `ptr addrspace(101) null`, addrspace 101)> t0, TargetExternalSymbol:i32'sext_setcc_param_0', undef:i32
            t5: i32 = extract_vector_elt t4, Constant:i32<0>
          t6: v2i16,ch = load<(load (s32) from %ir.p)> t0, t5, undef:i32
          t8: v2i16 = BUILD_VECTOR Constant:i16<0>, Constant:i16<0>
        t10: v2i1 = setcc t6, t8, seteq:ch
      t11: v2i16 = sign_extend t10
    t12: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t11
  t13: ch = NVPTXISD::RET_GLUE t12



Optimized lowered selection DAG: %bb.0 'sext_setcc:entry'
SelectionDAG has 13 nodes:
  t0: ch,glue = EntryToken
            t4: v1i32,ch = load<(dereferenceable invariant load (s32) from `ptr addrspace(101) null`, addrspace 101)> t0, TargetExternalSymbol:i32'sext_setcc_param_0', undef:i32
          t5: i32 = extract_vector_elt t4, Constant:i32<0>
        t6: v2i16,ch = load<(load (s32) from %ir.p)> t0, t5, undef:i32
        t8: v2i16 = BUILD_VECTOR Constant:i16<0>, Constant:i16<0>
      t14: v2i16 = setcc t6, t8, seteq:ch
    t12: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t14
  t13: ch = NVPTXISD::RET_GLUE t12

@justinfargnoli
Copy link
Contributor

justinfargnoli commented Sep 17, 2024

Where in the codebase is the sext(setcc) -> setcc DAG combine performed?

I'm curious to see how this change will disable the transformation.

Copy link
Contributor

@justinfargnoli justinfargnoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I'm not opposed to this change for its own sake, I'm not confident it's the correct fix for this bug.

Based on my understanding, the root cause of this issue is that v2i16 setcc isn't being recognized as an illegal operation and then legalized to v2i16 sext(v2i1 setcc).

If that's correct, then I think we'd want to ensure that v2i16 setcc is correctly legalized instead of trying to ensure that the DAG combiner never generates a v2i16 setcc.

@topperc
Copy link
Collaborator

topperc commented Sep 17, 2024

I'll take a look at this. AVX512 on X86 is similar so let me see how we handled this.

@topperc
Copy link
Collaborator

topperc commented Sep 17, 2024

So it looks like DAGCombiner sees that getSetCCResultType returns v2i1, but then sees that v2i1 type isn't legal.

Had it not been extended it looks like v2i1 setccs get scalarized by type legalization. Maybe you just need to mark v2i16 SETCC has Expand so LegalizeVectorOps will scalarize it?

@topperc
Copy link
Collaborator

topperc commented Sep 17, 2024

So it looks like DAGCombiner sees that getSetCCResultType returns v2i1, but then sees that v2i1 type isn't legal.

Had it not been extended it looks like v2i1 setccs get scalarized by type legalization. Maybe you just need to mark v2i16 SETCC has Expand so LegalizeVectorOps will scalarize it?

Actually just marking it Expand stops the DAGCombiner which checks

    if (N0.hasOneUse() && TLI.isOperationLegalOrCustom(ISD::SETCC, VT) &&        
        !TLI.isOperationLegalOrCustom(ISD::SETCC, SVT)) {

where VT is v2i16 and SVT is v2i1.

Note that this refers to the return type of SETCC. This operation is not
legal in PTX but was assumed as such because v2i16 is declared a legal
type. We were already expanding v4i8 SETCC.

The DAGCombiner would in certain circumstances try to fold an extension
of an illegal v2i1 SETCC (because v2i1 is illegal) into a "legal" v2i16
SETCC, which we wouldn't have patterns for.
@frasercrmck frasercrmck changed the title [NVPTX] Set boolean contents to zero-or-one [NVPTX] Set v2i16 SETCC to Expand Sep 18, 2024
@frasercrmck
Copy link
Contributor Author

Thanks for the help, @justinfargnoli and @topperc. I agree that that's a better approach. I've updated the PR.

I suspect ZeroOrOne might still make sense for PTX but I don't think we have enough testing to justify switching that up.

Copy link
Contributor

@justinfargnoli justinfargnoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@frasercrmck frasercrmck merged commit 90330e9 into llvm:main Sep 19, 2024
8 checks passed
@frasercrmck frasercrmck deleted the fix-nvptx-sext-setcc branch September 19, 2024 06:12
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
Note that this refers to the return type of SETCC. This operation is not
legal in PTX but was assumed as such because v2i16 is declared a legal
type. We were already expanding v4i8 SETCC.

The DAGCombiner would in certain circumstances try to fold an extension
of an illegal v2i1 SETCC (because v2i1 is illegal) into a "legal" v2i16
SETCC, which we wouldn't have patterns for.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants