Commit 5caffff

MacDuerorth
authored and committed
[AArch64][SDAG] Fix selection of extend of v1if16 SETCC (llvm#140274)
There is a DAG combine that folds:

```
t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
t2: v1i64 = zero_extend t1
```
->
```
t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
t2: v1i64 = any_extend t1
```

This creates an issue on AArch64 when attempting to widen the result to `v4i16`. The operand types (`v1f16`) are set to be scalarized, so the "by hand" widening with `DAG.WidenVector` is used for them. However, this only widens to the next power-of-2, so it returns `v2f16`, which does not match the result VF.

The fix is to manually construct the widened inputs using `INSERT_SUBVECTOR`.

Fixes llvm#136540
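The lane-count mismatch described above can be illustrated with a small standalone model. This is only a sketch and does not use any LLVM API: `Lanes`, `nextPowerOf2`, `widenToNextPow2`, and `insertSubvectorAtZero` are hypothetical names introduced here, and poison lanes are modeled as `std::nullopt`. It assumes, as the message states, that the old path pads only to the next power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a vector value: each lane holds a number or is
// "poison" (std::nullopt). This is NOT an LLVM type.
using Lanes = std::vector<std::optional<float>>;

// Smallest power of two strictly greater than n (1 -> 2, 2 -> 4).
// Assumption: this mirrors the "next power-of-2" behaviour the commit
// message attributes to DAG.WidenVector.
std::size_t nextPowerOf2(std::size_t n) {
  std::size_t p = 1;
  while (p <= n)
    p *= 2;
  return p;
}

// Models the old path: pad with poison lanes up to the next power of two.
// The caller has no say in the resulting width.
Lanes widenToNextPow2(const Lanes &narrow) {
  Lanes wide(nextPowerOf2(narrow.size()), std::nullopt);
  for (std::size_t i = 0; i < narrow.size(); ++i)
    wide[i] = narrow[i];
  return wide;
}

// Models the new path: start from an all-poison vector of the requested
// width and place the narrow value at lane index 0.
Lanes insertSubvectorAtZero(const Lanes &narrow, std::size_t wideLanes) {
  assert(narrow.size() <= wideLanes && "narrow value must fit in the wide one");
  Lanes wide(wideLanes, std::nullopt);
  for (std::size_t i = 0; i < narrow.size(); ++i)
    wide[i] = narrow[i];
  return wide;
}
```

In this model, a one-lane input passed to `widenToNextPow2` comes back with two lanes, whereas `insertSubvectorAtZero(x, 4)` yields the four lanes that the widened result type expects, with lanes 1 to 3 left as poison.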
1 parent 29535d9 commit 5caffff

2 files changed: +22 -2 lines changed

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

Lines changed: 6 additions & 2 deletions
@@ -6634,8 +6634,12 @@ SDValue DAGTypeLegalizer::WidenVecRes_SETCC(SDNode *N) {
     InOp1 = GetWidenedVector(InOp1);
     InOp2 = GetWidenedVector(InOp2);
   } else {
-    InOp1 = DAG.WidenVector(InOp1, SDLoc(N));
-    InOp2 = DAG.WidenVector(InOp2, SDLoc(N));
+    SDValue Poison = DAG.getPOISON(WidenInVT);
+    SDValue ZeroIdx = DAG.getVectorIdxConstant(0, SDLoc(N));
+    InOp1 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Poison,
+                        InOp1, ZeroIdx);
+    InOp2 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Poison,
+                        InOp2, ZeroIdx);
   }
 
   // Assume that the input and output will be widen appropriately. If not,

llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll

Lines changed: 16 additions & 0 deletions
@@ -249,3 +249,19 @@ if.then:
 if.end:
   ret i32 1;
 }
+
+define <1 x i64> @test_zext_half(<1 x half> %v1) {
+; CHECK-LABEL: test_zext_half:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    // kill: def $h0 killed $h0 def $d0
+; CHECK-NEXT:    mov w8, #1 // =0x1
+; CHECK-NEXT:    fcvtl v0.4s, v0.4h
+; CHECK-NEXT:    fmov d1, x8
+; CHECK-NEXT:    fcmgt v0.4s, v0.4s, #0.0
+; CHECK-NEXT:    xtn v0.4h, v0.4s
+; CHECK-NEXT:    and v0.8b, v0.8b, v1.8b
+; CHECK-NEXT:    ret
+  %1 = fcmp ogt <1 x half> %v1, zeroinitializer
+  %2 = zext <1 x i1> %1 to <1 x i64>
+  ret <1 x i64> %2
+}
