-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AArch64][SDAG] Fix selection of extend of v1if16 SETCC #140274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There is a DAG combine, that folds: ``` t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = zero_extend t1 ``` -> ``` t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = any_extend t1 ``` This creates an issue on AArch64 when attempting to widen the result to `v4i16`. The operand types (v1f16) are set to be scalarized, so the "by hand" widening with `DAG.WidenVector` is used for them, however, this only widens to the next power-of-2, so returns v2f16, which does not match the result VF. The fix is to manually construct the widened inputs using INSERT_SUBVECTOR. Fixes llvm#136540
@llvm/pr-subscribers-backend-aarch64 Author: Benjamin Maxwell (MacDue) ChangesThere is a DAG combine, that folds:
->
This creates an issue on AArch64 when attempting to widen the result to Fixes #136540 Full diff: https://github.com/llvm/llvm-project/pull/140274.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 0c0e700f6abca..ac7e4f5ab9e20 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -6623,8 +6623,12 @@ SDValue DAGTypeLegalizer::WidenVecRes_SETCC(SDNode *N) {
InOp1 = GetWidenedVector(InOp1);
InOp2 = GetWidenedVector(InOp2);
} else {
- InOp1 = DAG.WidenVector(InOp1, SDLoc(N));
- InOp2 = DAG.WidenVector(InOp2, SDLoc(N));
+ SDValue Undef = DAG.getUNDEF(WidenInVT);
+ SDValue ZeroIdx = DAG.getVectorIdxConstant(0, SDLoc(N));
+ InOp1 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+ InOp1, ZeroIdx);
+ InOp2 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+ InOp2, ZeroIdx);
}
// Assume that the input and output will be widen appropriately. If not,
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
index 6c70d19a977a5..5ed362604dc5f 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
@@ -249,3 +249,19 @@ if.then:
if.end:
ret i32 1;
}
+
+define <1 x i64> @test_zext_half(<1 x half> %v1) {
+; CHECK-LABEL: test_zext_half:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $h0 killed $h0 def $d0
+; CHECK-NEXT: mov w8, #1 // =0x1
+; CHECK-NEXT: fcvtl v0.4s, v0.4h
+; CHECK-NEXT: fmov d1, x8
+; CHECK-NEXT: fcmgt v0.4s, v0.4s, #0.0
+; CHECK-NEXT: xtn v0.4h, v0.4s
+; CHECK-NEXT: and v0.8b, v0.8b, v1.8b
+; CHECK-NEXT: ret
+ %1 = fcmp ogt <1 x half> %v1, zeroinitializer
+ %2 = zext <1 x i1> %1 to <1 x i64>
+ ret <1 x i64> %2
+}
|
@llvm/pr-subscribers-llvm-selectiondag Author: Benjamin Maxwell (MacDue) ChangesThere is a DAG combine, that folds:
->
This creates an issue on AArch64 when attempting to widen the result to Fixes #136540 Full diff: https://github.com/llvm/llvm-project/pull/140274.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 0c0e700f6abca..ac7e4f5ab9e20 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -6623,8 +6623,12 @@ SDValue DAGTypeLegalizer::WidenVecRes_SETCC(SDNode *N) {
InOp1 = GetWidenedVector(InOp1);
InOp2 = GetWidenedVector(InOp2);
} else {
- InOp1 = DAG.WidenVector(InOp1, SDLoc(N));
- InOp2 = DAG.WidenVector(InOp2, SDLoc(N));
+ SDValue Undef = DAG.getUNDEF(WidenInVT);
+ SDValue ZeroIdx = DAG.getVectorIdxConstant(0, SDLoc(N));
+ InOp1 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+ InOp1, ZeroIdx);
+ InOp2 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+ InOp2, ZeroIdx);
}
// Assume that the input and output will be widen appropriately. If not,
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
index 6c70d19a977a5..5ed362604dc5f 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
@@ -249,3 +249,19 @@ if.then:
if.end:
ret i32 1;
}
+
+define <1 x i64> @test_zext_half(<1 x half> %v1) {
+; CHECK-LABEL: test_zext_half:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $h0 killed $h0 def $d0
+; CHECK-NEXT: mov w8, #1 // =0x1
+; CHECK-NEXT: fcvtl v0.4s, v0.4h
+; CHECK-NEXT: fmov d1, x8
+; CHECK-NEXT: fcmgt v0.4s, v0.4s, #0.0
+; CHECK-NEXT: xtn v0.4h, v0.4s
+; CHECK-NEXT: and v0.8b, v0.8b, v1.8b
+; CHECK-NEXT: ret
+ %1 = fcmp ogt <1 x half> %v1, zeroinitializer
+ %2 = zext <1 x i1> %1 to <1 x i64>
+ ret <1 x i64> %2
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this, I didn't get too far when looking into it before. Should it be scalarizing somewhere along the line? I would expect that to produce better codegen.
It seems like this was deliberately done back in https://reviews.llvm.org/D4322 (apparently to avoid some legalization issues). |
Oh - it was setcc specifically I was thinking of, not all operations. setcc is a little odd because the return type is comparatively less important to the type of the operands, which specifies whether the operation is float or integer. If the DAG combine that creates the v1i16 is the transform that is causing this to go wrong then maybe it would be better to disable it somehow. It should be possible to mark that specific type as v1i1 in getSetCCResultType to encourage it to scalarize instead. The vector codegen isn't as efficient as it could be so maybe it doesn't matter a lot whether we scalarize or not, and your explanation as to what goes wrong makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK. LGTM
I believe it's this DAG combine that results in the return type changing to v1i16: https://github.com/llvm/llvm-project/blob/9f8d798942e7679ca265efb23a7f655936d19a49/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L14631-L14638 However, I think using |
That combine is busted, it shouldn't be picking a random type for the return value. It should be using getSetCCResultType |
I just had another look at this, and I believe that this patch is a valid fix for this case, as |
There is a DAG combine, that folds: ``` t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = zero_extend t1 ``` -> ``` t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = any_extend t1 ``` This creates an issue on AArch64 when attempting to widen the result to `v4i16`. The operand types (`v1f16`) are set to be scalarized, so the "by hand" widening with `DAG.WidenVector` is used for them, however, this only widens to the next power-of-2, so returns `v2f16`, which does not match the result VF. The fix is to manually construct the widened inputs using `INSERT_SUBVECTOR`. Fixes llvm#136540
There is a DAG combine, that folds: ``` t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = zero_extend t1 ``` -> ``` t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = any_extend t1 ``` This creates an issue on AArch64 when attempting to widen the result to `v4i16`. The operand types (`v1f16`) are set to be scalarized, so the "by hand" widening with `DAG.WidenVector` is used for them, however, this only widens to the next power-of-2, so returns `v2f16`, which does not match the result VF. The fix is to manually construct the widened inputs using `INSERT_SUBVECTOR`. Fixes llvm#136540
There is a DAG combine, that folds:
->
This creates an issue on AArch64 when attempting to widen the result to
v4i16
. The operand types (v1f16
) are set to be scalarized, so the "by hand" widening withDAG.WidenVector
is used for them, however, this only widens to the next power-of-2, so returnsv2f16
, which does not match the result VF. The fix is to manually construct the widened inputs usingINSERT_SUBVECTOR
.Fixes #136540