
Commit 64e8089

[AArch64] Prevent unnecessary truncation in bool vector reduce code generation
Avoid unnecessarily truncating the results of 128-bit vector comparisons to 64-bit vectors in boolean vector reduce operations.
1 parent 65e0031 commit 64e8089

4 files changed: +722 -55 lines changed
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Lines changed: 14 additions & 5 deletions
@@ -15928,11 +15928,20 @@ static SDValue getVectorBitwiseReduce(unsigned Opcode, SDValue Vec, EVT VT,
     return getVectorBitwiseReduce(Opcode, HalfVec, VT, DL, DAG);
   }

-  // Vectors that are less than 64 bits get widened to neatly fit a 64 bit
-  // register, so e.g. <4 x i1> gets lowered to <4 x i16>. Sign extending to
-  // this element size leads to the best codegen, since e.g. setcc results
-  // might need to be truncated otherwise.
-  EVT ExtendedVT = MVT::getIntegerVT(std::max(64u / NumElems, 8u));
+  // Results of setcc operations get widened to 128 bits if their input
+  // operands are 128 bits wide and in case of reduce_and and reduce_or have
+  // at least 4 elements, otherwise vectors that are less than 64 bits get
+  // widened to neatly fit a 64 bit register, so e.g. <4 x i1> gets lowered to
+  // either <4 x i16> or <4 x i32>. Sign extending to this element size leads
+  // to the best codegen, since e.g. setcc results might need to be truncated
+  // otherwise.
+  unsigned ExtendedWidth = 64;
+  if ((ScalarOpcode == ISD::XOR || NumElems >= 4) &&
+      Vec.getOpcode() == ISD::SETCC &&
+      Vec.getOperand(0).getValueSizeInBits() >= 128) {
+    ExtendedWidth = 128;
+  }
+  EVT ExtendedVT = MVT::getIntegerVT(std::max(ExtendedWidth / NumElems, 8u));

   // any_ext doesn't work with umin/umax, so only use it for uadd.
   unsigned ExtendOp =
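
The width selection in this hunk can be modelled outside of LLVM to see which element sizes it yields. The following is a minimal sketch, not the LLVM code itself: `ReduceOp`, `extendedElementBits`, and the `cmpOperandBits` parameter are hypothetical names introduced for illustration, and passing 0 for `cmpOperandBits` stands in for "the vector is not the result of a setcc". It assumes `numElems` is non-zero.

```cpp
#include <cassert>

// Hypothetical stand-in for the reduction's scalar opcode (not LLVM's ISD enum).
enum class ReduceOp { And, Or, Xor };

// Simplified model of the patched logic: choose a 64- or 128-bit container,
// divide it evenly among the elements, and never go below 8 bits per element.
// cmpOperandBits is the bit width of the compare's input operand, or 0 if the
// vector does not come from a compare (assumption made for this sketch).
constexpr unsigned extendedElementBits(ReduceOp op, unsigned numElems,
                                       unsigned cmpOperandBits) {
  unsigned extendedWidth = 64;
  if ((op == ReduceOp::Xor || numElems >= 4) && cmpOperandBits >= 128)
    extendedWidth = 128;
  const unsigned perElement = extendedWidth / numElems;
  return perElement > 8u ? perElement : 8u;
}

// A few sample inputs, checked at run time.
inline void checkWidthExamples() {
  // <4 x i1> from a 128-bit compare: 32-bit elements instead of 16-bit ones.
  assert(extendedElementBits(ReduceOp::Or, 4, 128) == 32);
  // <4 x i1> from a 64-bit compare keeps the old behaviour: 16-bit elements.
  assert(extendedElementBits(ReduceOp::Or, 4, 64) == 16);
  // and/or with fewer than four elements stay in a 64-bit container.
  assert(extendedElementBits(ReduceOp::And, 2, 128) == 32);
  // xor has no element-count requirement.
  assert(extendedElementBits(ReduceOp::Xor, 2, 128) == 64);
}
```

Under this model, eight elements with a 128-bit compare operand give 16-bit elements, which appears consistent with the `v0.8h` reduction in the updated test expectation.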

llvm/test/CodeGen/AArch64/illegal-floating-point-vector-compares.ll

Lines changed: 1 addition & 2 deletions
@@ -12,8 +12,7 @@ define i1 @unordered_floating_point_compare_on_v8f32(<8 x float> %a_vec) {
 ; CHECK-NEXT: mov w8, #1 // =0x1
 ; CHECK-NEXT: uzp1 v0.8h, v0.8h, v1.8h
 ; CHECK-NEXT: mvn v0.16b, v0.16b
-; CHECK-NEXT: xtn v0.8b, v0.8h
-; CHECK-NEXT: umaxv b0, v0.8b
+; CHECK-NEXT: umaxv h0, v0.8h
 ; CHECK-NEXT: fmov w9, s0
 ; CHECK-NEXT: bic w0, w8, w9
 ; CHECK-NEXT: ret
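
Why dropping the `xtn` is safe can be checked with a small scalar model of the two instruction sequences. This is an illustrative sketch, not part of the commit: the function names are hypothetical, and it assumes every lane entering the reduction is either all ones or all zeros, as is the case for vector comparison results.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Lanes = std::array<std::uint16_t, 8>;

// Models `umaxv h0, v0.8h`: unsigned maximum across eight 16-bit lanes.
inline std::uint16_t umaxv16(const Lanes &lanes) {
  std::uint16_t best = 0;
  for (std::uint16_t lane : lanes)
    best = lane > best ? lane : best;
  return best;
}

// Models `xtn v0.8b, v0.8h` followed by `umaxv b0, v0.8b`: keep the low
// byte of each lane, then take the unsigned maximum across the bytes.
inline std::uint8_t xtnThenUmaxv8(const Lanes &lanes) {
  std::uint8_t best = 0;
  for (std::uint16_t lane : lanes) {
    const auto narrow = static_cast<std::uint8_t>(lane & 0xFFu);
    best = narrow > best ? narrow : best;
  }
  return best;
}

// Models `bic w0, w8, w9` with w8 == 1: computes 1 & ~w9.
inline std::uint32_t bicWithOne(std::uint32_t reduced) {
  return 1u & ~reduced;
}

// Tries all 256 combinations of all-ones / all-zeros lanes and reports
// whether the old and new sequences produce the same final i1 value.
inline bool bothSequencesAgree() {
  for (unsigned bits = 0; bits < 256; ++bits) {
    Lanes lanes{};
    for (unsigned i = 0; i < 8; ++i)
      lanes[i] = ((bits >> i) & 1u) ? 0xFFFF : 0x0000;
    if (bicWithOne(umaxv16(lanes)) != bicWithOne(xtnThenUmaxv8(lanes)))
      return false;
  }
  return true;
}
```

Only bit 0 of the reduced value survives the final `bic`, so in this model the 16-bit and 8-bit reductions are interchangeable and the narrowing step is redundant.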
