-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[NVPTX] Set v2i16 SETCC to Expand #108969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-nvptx Author: Fraser Cormack (frasercrmck) ChangesNote that this field is a property of SelectionDAG rather than one of the ISA. The previous value wasn't incorrect, per se, but since NVPTX uses 1-bit predicate registers, either choices are possible. The problem with using zero-or-negative-one manifested in issues during instruction selection where LLVM would fold a sign-extension of a setcc into the setcc itself, which isn't a legal operation. We could alternatively let this happen and then custom expand it back into a separate setcc + sext later. Or we could provide explicit patterns for the setp/selp sequence. However, preventing it from forming in the first place feels more appropriate to the ISA. It does not appear to have a significant knock-on effect on codegen tests, aside from one minor change which looks harmless:
to
Full diff: https://github.com/llvm/llvm-project/pull/108969.diff 3 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index c5a40e4308860c..4495fdb45e74c9 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -416,8 +416,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = (unsigned) 0xFFFFFFFF;
MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = (unsigned) 0xFFFFFFFF;
- setBooleanContents(ZeroOrNegativeOneBooleanContent);
- setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);
+ setBooleanContents(ZeroOrOneBooleanContent);
+ setBooleanVectorContents(ZeroOrOneBooleanContent);
// Jump is Expensive. Don't create extra control flow for 'and', 'or'
// condition branches.
diff --git a/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll b/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
index 9d451e90650df1..26eb5fb223a6ca 100644
--- a/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
+++ b/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
@@ -23,8 +23,8 @@ define i128 @test_add(i128 %a, i128 %b) {
define i128 @test_sub(i128 %a, i128 %b) {
; NOCARRY: sub.s64
; NOCARRY-NEXT: setp.lt.u64
-; NOCARRY-NEXT: selp.s64
-; NOCARRY-NEXT: add.s64
+; NOCARRY-NEXT: selp.u64
+; NOCARRY-NEXT: sub.s64
; NOCARRY-NEXT: sub.s64
; CARRY: sub.cc.s64
diff --git a/llvm/test/CodeGen/NVPTX/sext-setcc.ll b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
new file mode 100644
index 00000000000000..4b56100f347f49
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 | FileCheck %s
+; RUN: %if ptxas-11.0 %{ llc < %s -march=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+
+define <2 x i16> @sext_setcc(ptr %p) {
+; CHECK-LABEL: sext_setcc(
+; CHECK: {
+; CHECK-NEXT: .reg .pred %p<3>;
+; CHECK-NEXT: .reg .b16 %rs<5>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-NEXT: .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0: // %entry
+; CHECK-NEXT: ld.param.u64 %rd1, [sext_setcc_param_0];
+; CHECK-NEXT: ld.u32 %r1, [%rd1];
+; CHECK-NEXT: mov.b32 {%rs1, %rs2}, %r1;
+; CHECK-NEXT: setp.eq.s16 %p1, %rs1, 0;
+; CHECK-NEXT: setp.eq.s16 %p2, %rs2, 0;
+; CHECK-NEXT: selp.s16 %rs3, -1, 0, %p2;
+; CHECK-NEXT: selp.s16 %rs4, -1, 0, %p1;
+; CHECK-NEXT: mov.b32 %r2, {%rs4, %rs3};
+; CHECK-NEXT: st.param.b32 [func_retval0+0], %r2;
+; CHECK-NEXT: ret;
+entry:
+ %v = load <2 x i16>, ptr %p, align 4
+ %cmp = icmp eq <2 x i16> %v, zeroinitializer
+ %sext = sext <2 x i1> %cmp to <2 x i16>
+ ret <2 x i16> %sext
+}
|
f8eecfa
to
983eeb0
Compare
Notes:
This is correct (source: Section 4.5 "Constants" of PTX ISA):
Potentially incorrect transformation occurs during DAG optimization:
|
Where in the codebase is the I'm curious to see how this change will disable the transformation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I'm not opposed to this change for its own sake, I'm not confident it's the correct fix for this bug.
Based on my understanding, the root cause of this issue is that v2i16 setcc
isn't being recognized as an illegal operation and then legalized to v2i16 sext(v2i1 setcc)
.
If that's correct, then I think we'd want to ensure that v2i16 setcc
is correctly legalized instead of trying to ensure that the DAG combiner never generates a v2i16 setcc
.
I'll take a look at this. AVX512 on X86 is similar so let me see how we handled this. |
So it looks like DAGCombiner sees that getSetCCResultType returns v2i1, but then sees that v2i1 type isn't legal. Had it not been extended it looks like v2i1 setccs get scalarized by type legalization. Maybe you just need to mark v2i16 SETCC has Expand so LegalizeVectorOps will scalarize it? |
Actually just marking it Expand stops the DAGCombiner which checks
where VT is v2i16 and SVT is v2i1. |
Note that this refers to the return type of SETCC. This operation is not legal in PTX but was assumed as such because v2i16 is declared a legal type. We were already expanding v4i8 SETCC. The DAGCombiner would in certain circumstances try to fold an extension of an illegal v2i1 SETCC (because v2i1 is illegal) into a "legal" v2i16 SETCC, which we wouldn't have patterns for.
983eeb0
to
0970633
Compare
Thanks for the help, @justinfargnoli and @topperc. I agree that that's a better approach. I've updated the PR. I suspect ZeroOrOne might still make sense for PTX but I don't think we have enough testing to justify switching that up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Note that this refers to the return type of SETCC. This operation is not legal in PTX but was assumed as such because v2i16 is declared a legal type. We were already expanding v4i8 SETCC. The DAGCombiner would in certain circumstances try to fold an extension of an illegal v2i1 SETCC (because v2i1 is illegal) into a "legal" v2i16 SETCC, which we wouldn't have patterns for.
Note that this refers to the return type of SETCC. This operation is not
legal in PTX but was assumed as such because v2i16 is declared a legal
type. We were already expanding v4i8 SETCC.
The DAGCombiner would in certain circumstances try to fold an extension
of an illegal v2i1 SETCC (because v2i1 is illegal) into a "legal" v2i16
SETCC, which we wouldn't have patterns for.