-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[RISCV] Fix the cost of llvm.vector.reduce.and
#119160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-risc-v Author: Shao-Ce SUN (sunshaoce) ChangesI added some CodeGen test cases related to reduce. To maintain consistency, I also added cases for instructions like For cases where Patch is 48.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119160.diff 6 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index cbf8c57fde44d1..4b5f5b0c8e047d 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1538,13 +1538,27 @@ RISCVTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
if (ElementTy->isIntegerTy(1)) {
if (ISD == ISD::AND) {
// Example sequences:
+ // vfirst.m a0, v0
+ // seqz a0, a0
+ if (LT.second == MVT::v1i1)
+ return getRISCVInstructionCost(RISCV::VFIRST_M, LT.second, CostKind) +
+ getCmpSelInstrCost(Instruction::ICmp, ElementTy, ElementTy,
+ CmpInst::ICMP_EQ, CostKind);
+ // Example sequences:
// vsetvli a0, zero, e8, mf8, ta, ma
// vmand.mm v8, v9, v8 ; needed every time type is split
// vmnot.m v8, v0
// vcpop.m a0, v8
// seqz a0, a0
- return LT.first * getRISCVInstructionCost(RISCV::VMNAND_MM, LT.second,
- CostKind) +
+
+ // Fixed VT: In v512i1 and larger vector elements,
+ // Scalable VT: In v128i1 and larger vector elements,
+ // the VMAND_MM instructions have started to be added.
+ return ((LT.first >= 2)
+ ? LT.first - (LT.second.isScalableVector() ? 1 : 2)
+ : 0) *
+ getRISCVInstructionCost(RISCV::VMAND_MM, LT.second, CostKind) +
+ getRISCVInstructionCost(RISCV::VMNAND_MM, LT.second, CostKind) +
getRISCVInstructionCost(RISCV::VCPOP_M, LT.second, CostKind) +
getCmpSelInstrCost(Instruction::ICmp, ElementTy, ElementTy,
CmpInst::ICMP_EQ, CostKind);
diff --git a/llvm/test/Analysis/CostModel/RISCV/reduce-and.ll b/llvm/test/Analysis/CostModel/RISCV/reduce-and.ll
index e4f1cf8ff418a8..7b626c426fc226 100644
--- a/llvm/test/Analysis/CostModel/RISCV/reduce-and.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/reduce-and.ll
@@ -6,7 +6,7 @@
define i32 @reduce_i1(i32 %arg) {
; CHECK-LABEL: 'reduce_i1'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.and.v2i1(<2 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> undef)
@@ -14,13 +14,24 @@ define i32 @reduce_i1(i32 %arg) {
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V32 = call i1 @llvm.vector.reduce.and.v32i1(<32 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V64 = call i1 @llvm.vector.reduce.and.v64i1(<64 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V128 = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV1 = call i1 @llvm.vector.reduce.and.nxv1i1(<vscale x 1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV2 = call i1 @llvm.vector.reduce.and.nxv2i1(<vscale x 2 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV4 = call i1 @llvm.vector.reduce.and.nxv4i1(<vscale x 4 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV8 = call i1 @llvm.vector.reduce.and.nxv8i1(<vscale x 8 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV16 = call i1 @llvm.vector.reduce.and.nxv16i1(<vscale x 16 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV32 = call i1 @llvm.vector.reduce.and.nxv32i1(<vscale x 32 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV64 = call i1 @llvm.vector.reduce.and.nxv64i1(<vscale x 64 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NXV128 = call i1 @llvm.vector.reduce.and.nxv128i1(<vscale x 128 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %NXV256 = call i1 @llvm.vector.reduce.and.nxv256i1(<vscale x 256 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %NXV512 = call i1 @llvm.vector.reduce.and.nxv512i1(<vscale x 512 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %NXV1024 = call i1 @llvm.vector.reduce.and.nxv1024i1(<vscale x 1024 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; SIZE-LABEL: 'reduce_i1'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.and.v2i1(<2 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> undef)
@@ -28,9 +39,20 @@ define i32 @reduce_i1(i32 %arg) {
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V32 = call i1 @llvm.vector.reduce.and.v32i1(<32 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V64 = call i1 @llvm.vector.reduce.and.v64i1(<64 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V128 = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> undef)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV1 = call i1 @llvm.vector.reduce.and.nxv1i1(<vscale x 1 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV2 = call i1 @llvm.vector.reduce.and.nxv2i1(<vscale x 2 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV4 = call i1 @llvm.vector.reduce.and.nxv4i1(<vscale x 4 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV8 = call i1 @llvm.vector.reduce.and.nxv8i1(<vscale x 8 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV16 = call i1 @llvm.vector.reduce.and.nxv16i1(<vscale x 16 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV32 = call i1 @llvm.vector.reduce.and.nxv32i1(<vscale x 32 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV64 = call i1 @llvm.vector.reduce.and.nxv64i1(<vscale x 64 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NXV128 = call i1 @llvm.vector.reduce.and.nxv128i1(<vscale x 128 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %NXV256 = call i1 @llvm.vector.reduce.and.nxv256i1(<vscale x 256 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %NXV512 = call i1 @llvm.vector.reduce.and.nxv512i1(<vscale x 512 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %NXV1024 = call i1 @llvm.vector.reduce.and.nxv1024i1(<vscale x 1024 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
;
%V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
@@ -44,6 +66,18 @@ define i32 @reduce_i1(i32 %arg) {
%V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
%V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
%V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+
+ %NXV1 = call i1 @llvm.vector.reduce.and.nxv1i1(<vscale x 1 x i1> undef)
+ %NXV2 = call i1 @llvm.vector.reduce.and.nxv2i1(<vscale x 2 x i1> undef)
+ %NXV4 = call i1 @llvm.vector.reduce.and.nxv4i1(<vscale x 4 x i1> undef)
+ %NXV8 = call i1 @llvm.vector.reduce.and.nxv8i1(<vscale x 8 x i1> undef)
+ %NXV16 = call i1 @llvm.vector.reduce.and.nxv16i1(<vscale x 16 x i1> undef)
+ %NXV32 = call i1 @llvm.vector.reduce.and.nxv32i1(<vscale x 32 x i1> undef)
+ %NXV64 = call i1 @llvm.vector.reduce.and.nxv64i1(<vscale x 64 x i1> undef)
+ %NXV128 = call i1 @llvm.vector.reduce.and.nxv128i1(<vscale x 128 x i1> undef)
+ %NXV256 = call i1 @llvm.vector.reduce.and.nxv256i1(<vscale x 256 x i1> undef)
+ %NXV512 = call i1 @llvm.vector.reduce.and.nxv512i1(<vscale x 512 x i1> undef)
+ %NXV1024 = call i1 @llvm.vector.reduce.and.nxv1024i1(<vscale x 1024 x i1> undef)
ret i32 undef
}
diff --git a/llvm/test/Analysis/CostModel/RISCV/reduce-max.ll b/llvm/test/Analysis/CostModel/RISCV/reduce-max.ll
index f11e9f2b5ae837..5c9303af31747e 100644
--- a/llvm/test/Analysis/CostModel/RISCV/reduce-max.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/reduce-max.ll
@@ -176,7 +176,7 @@ define i32 @reduce_umax_i64(i32 %arg) {
define i32 @reduce_smin_i1(i32 %arg) {
; CHECK-LABEL: 'reduce_smin_i1'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.smax.v1i1(<1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.smax.v1i1(<1 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.smax.v2i1(<2 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.smax.v4i1(<4 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.smax.v8i1(<8 x i1> undef)
@@ -187,7 +187,7 @@ define i32 @reduce_smin_i1(i32 %arg) {
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; SIZE-LABEL: 'reduce_smin_i1'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.smax.v1i1(<1 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.smax.v1i1(<1 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.smax.v2i1(<2 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.smax.v4i1(<4 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.smax.v8i1(<8 x i1> undef)
diff --git a/llvm/test/Analysis/CostModel/RISCV/reduce-min.ll b/llvm/test/Analysis/CostModel/RISCV/reduce-min.ll
index 457fdbe46f73b5..9875d3e5858115 100644
--- a/llvm/test/Analysis/CostModel/RISCV/reduce-min.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/reduce-min.ll
@@ -6,7 +6,7 @@
define i32 @reduce_umin_i1(i32 %arg) {
; CHECK-LABEL: 'reduce_umin_i1'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.umin.v1i1(<1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.umin.v1i1(<1 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.umin.v2i1(<2 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.umin.v4i1(<4 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.umin.v8i1(<8 x i1> undef)
@@ -17,7 +17,7 @@ define i32 @reduce_umin_i1(i32 %arg) {
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; SIZE-LABEL: 'reduce_umin_i1'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.umin.v1i1(<1 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.umin.v1i1(<1 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.umin.v2i1(<2 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.umin.v4i1(<4 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.umin.v8i1(<8 x i1> undef)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll
index 44d4a8a1e04cda..0d31ec5f784352 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll
@@ -763,3 +763,475 @@ define zeroext i1 @vreduce_add_v64i1(<64 x i1> %v) {
%red = call i1 @llvm.vector.reduce.add.v64i1(<64 x i1> %v)
ret i1 %red
}
+
+declare i1 @llvm.vector.reduce.or.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_or_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_or_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vcpop.m a0, v0
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.or.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.xor.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_xor_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_xor_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vcpop.m a0, v0
+; CHECK-NEXT: andi a0, a0, 1
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.xor.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.and.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_and_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_and_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnot.m v8, v0
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.umax.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_umax_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_umax_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vcpop.m a0, v0
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.umax.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.smax.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_smax_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_smax_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnot.m v8, v0
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.smax.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.umin.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_umin_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_umin_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnot.m v8, v0
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.umin.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.smin.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_smin_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_smin_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vcpop.m a0, v0
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.smin.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.or.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_or_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_or_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmor.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.or.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.xor.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_xor_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_xor_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmxor.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: andi a0, a0, 1
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.xor.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.and.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_and_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_and_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnand.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.umax.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_umax_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_umax_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmor.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.umax.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.smax.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_smax_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_smax_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnand.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.smax.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.r...
[truncated]
|
@llvm/pr-subscribers-llvm-analysis Author: Shao-Ce SUN (sunshaoce) ChangesI added some CodeGen test cases related to reduce. To maintain consistency, I also added cases for instructions like For cases where Patch is 48.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119160.diff 6 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index cbf8c57fde44d1..4b5f5b0c8e047d 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1538,13 +1538,27 @@ RISCVTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
if (ElementTy->isIntegerTy(1)) {
if (ISD == ISD::AND) {
// Example sequences:
+ // vfirst.m a0, v0
+ // seqz a0, a0
+ if (LT.second == MVT::v1i1)
+ return getRISCVInstructionCost(RISCV::VFIRST_M, LT.second, CostKind) +
+ getCmpSelInstrCost(Instruction::ICmp, ElementTy, ElementTy,
+ CmpInst::ICMP_EQ, CostKind);
+ // Example sequences:
// vsetvli a0, zero, e8, mf8, ta, ma
// vmand.mm v8, v9, v8 ; needed every time type is split
// vmnot.m v8, v0
// vcpop.m a0, v8
// seqz a0, a0
- return LT.first * getRISCVInstructionCost(RISCV::VMNAND_MM, LT.second,
- CostKind) +
+
+ // Fixed VT: In v512i1 and larger vector elements,
+ // Scalable VT: In v128i1 and larger vector elements,
+ // the VMAND_MM instructions have started to be added.
+ return ((LT.first >= 2)
+ ? LT.first - (LT.second.isScalableVector() ? 1 : 2)
+ : 0) *
+ getRISCVInstructionCost(RISCV::VMAND_MM, LT.second, CostKind) +
+ getRISCVInstructionCost(RISCV::VMNAND_MM, LT.second, CostKind) +
getRISCVInstructionCost(RISCV::VCPOP_M, LT.second, CostKind) +
getCmpSelInstrCost(Instruction::ICmp, ElementTy, ElementTy,
CmpInst::ICMP_EQ, CostKind);
diff --git a/llvm/test/Analysis/CostModel/RISCV/reduce-and.ll b/llvm/test/Analysis/CostModel/RISCV/reduce-and.ll
index e4f1cf8ff418a8..7b626c426fc226 100644
--- a/llvm/test/Analysis/CostModel/RISCV/reduce-and.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/reduce-and.ll
@@ -6,7 +6,7 @@
define i32 @reduce_i1(i32 %arg) {
; CHECK-LABEL: 'reduce_i1'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.and.v2i1(<2 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> undef)
@@ -14,13 +14,24 @@ define i32 @reduce_i1(i32 %arg) {
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V32 = call i1 @llvm.vector.reduce.and.v32i1(<32 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V64 = call i1 @llvm.vector.reduce.and.v64i1(<64 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V128 = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV1 = call i1 @llvm.vector.reduce.and.nxv1i1(<vscale x 1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV2 = call i1 @llvm.vector.reduce.and.nxv2i1(<vscale x 2 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV4 = call i1 @llvm.vector.reduce.and.nxv4i1(<vscale x 4 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV8 = call i1 @llvm.vector.reduce.and.nxv8i1(<vscale x 8 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV16 = call i1 @llvm.vector.reduce.and.nxv16i1(<vscale x 16 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV32 = call i1 @llvm.vector.reduce.and.nxv32i1(<vscale x 32 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV64 = call i1 @llvm.vector.reduce.and.nxv64i1(<vscale x 64 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NXV128 = call i1 @llvm.vector.reduce.and.nxv128i1(<vscale x 128 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %NXV256 = call i1 @llvm.vector.reduce.and.nxv256i1(<vscale x 256 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %NXV512 = call i1 @llvm.vector.reduce.and.nxv512i1(<vscale x 512 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %NXV1024 = call i1 @llvm.vector.reduce.and.nxv1024i1(<vscale x 1024 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; SIZE-LABEL: 'reduce_i1'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.and.v2i1(<2 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> undef)
@@ -28,9 +39,20 @@ define i32 @reduce_i1(i32 %arg) {
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V32 = call i1 @llvm.vector.reduce.and.v32i1(<32 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V64 = call i1 @llvm.vector.reduce.and.v64i1(<64 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V128 = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> undef)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV1 = call i1 @llvm.vector.reduce.and.nxv1i1(<vscale x 1 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV2 = call i1 @llvm.vector.reduce.and.nxv2i1(<vscale x 2 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV4 = call i1 @llvm.vector.reduce.and.nxv4i1(<vscale x 4 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV8 = call i1 @llvm.vector.reduce.and.nxv8i1(<vscale x 8 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV16 = call i1 @llvm.vector.reduce.and.nxv16i1(<vscale x 16 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV32 = call i1 @llvm.vector.reduce.and.nxv32i1(<vscale x 32 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %NXV64 = call i1 @llvm.vector.reduce.and.nxv64i1(<vscale x 64 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NXV128 = call i1 @llvm.vector.reduce.and.nxv128i1(<vscale x 128 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %NXV256 = call i1 @llvm.vector.reduce.and.nxv256i1(<vscale x 256 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %NXV512 = call i1 @llvm.vector.reduce.and.nxv512i1(<vscale x 512 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %NXV1024 = call i1 @llvm.vector.reduce.and.nxv1024i1(<vscale x 1024 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
;
%V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
@@ -44,6 +66,18 @@ define i32 @reduce_i1(i32 %arg) {
%V256 = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> undef)
%V512 = call i1 @llvm.vector.reduce.and.v512i1(<512 x i1> undef)
%V1024 = call i1 @llvm.vector.reduce.and.v1024i1(<1024 x i1> undef)
+
+ %NXV1 = call i1 @llvm.vector.reduce.and.nxv1i1(<vscale x 1 x i1> undef)
+ %NXV2 = call i1 @llvm.vector.reduce.and.nxv2i1(<vscale x 2 x i1> undef)
+ %NXV4 = call i1 @llvm.vector.reduce.and.nxv4i1(<vscale x 4 x i1> undef)
+ %NXV8 = call i1 @llvm.vector.reduce.and.nxv8i1(<vscale x 8 x i1> undef)
+ %NXV16 = call i1 @llvm.vector.reduce.and.nxv16i1(<vscale x 16 x i1> undef)
+ %NXV32 = call i1 @llvm.vector.reduce.and.nxv32i1(<vscale x 32 x i1> undef)
+ %NXV64 = call i1 @llvm.vector.reduce.and.nxv64i1(<vscale x 64 x i1> undef)
+ %NXV128 = call i1 @llvm.vector.reduce.and.nxv128i1(<vscale x 128 x i1> undef)
+ %NXV256 = call i1 @llvm.vector.reduce.and.nxv256i1(<vscale x 256 x i1> undef)
+ %NXV512 = call i1 @llvm.vector.reduce.and.nxv512i1(<vscale x 512 x i1> undef)
+ %NXV1024 = call i1 @llvm.vector.reduce.and.nxv1024i1(<vscale x 1024 x i1> undef)
ret i32 undef
}
diff --git a/llvm/test/Analysis/CostModel/RISCV/reduce-max.ll b/llvm/test/Analysis/CostModel/RISCV/reduce-max.ll
index f11e9f2b5ae837..5c9303af31747e 100644
--- a/llvm/test/Analysis/CostModel/RISCV/reduce-max.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/reduce-max.ll
@@ -176,7 +176,7 @@ define i32 @reduce_umax_i64(i32 %arg) {
define i32 @reduce_smin_i1(i32 %arg) {
; CHECK-LABEL: 'reduce_smin_i1'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.smax.v1i1(<1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.smax.v1i1(<1 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.smax.v2i1(<2 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.smax.v4i1(<4 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.smax.v8i1(<8 x i1> undef)
@@ -187,7 +187,7 @@ define i32 @reduce_smin_i1(i32 %arg) {
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; SIZE-LABEL: 'reduce_smin_i1'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.smax.v1i1(<1 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.smax.v1i1(<1 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.smax.v2i1(<2 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.smax.v4i1(<4 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.smax.v8i1(<8 x i1> undef)
diff --git a/llvm/test/Analysis/CostModel/RISCV/reduce-min.ll b/llvm/test/Analysis/CostModel/RISCV/reduce-min.ll
index 457fdbe46f73b5..9875d3e5858115 100644
--- a/llvm/test/Analysis/CostModel/RISCV/reduce-min.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/reduce-min.ll
@@ -6,7 +6,7 @@
define i32 @reduce_umin_i1(i32 %arg) {
; CHECK-LABEL: 'reduce_umin_i1'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.umin.v1i1(<1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.umin.v1i1(<1 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.umin.v2i1(<2 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.umin.v4i1(<4 x i1> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.umin.v8i1(<8 x i1> undef)
@@ -17,7 +17,7 @@ define i32 @reduce_umin_i1(i32 %arg) {
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; SIZE-LABEL: 'reduce_umin_i1'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.umin.v1i1(<1 x i1> undef)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1 = call i1 @llvm.vector.reduce.umin.v1i1(<1 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i1 @llvm.vector.reduce.umin.v2i1(<2 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i1 @llvm.vector.reduce.umin.v4i1(<4 x i1> undef)
; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8 = call i1 @llvm.vector.reduce.umin.v8i1(<8 x i1> undef)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll
index 44d4a8a1e04cda..0d31ec5f784352 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll
@@ -763,3 +763,475 @@ define zeroext i1 @vreduce_add_v64i1(<64 x i1> %v) {
%red = call i1 @llvm.vector.reduce.add.v64i1(<64 x i1> %v)
ret i1 %red
}
+
+declare i1 @llvm.vector.reduce.or.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_or_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_or_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vcpop.m a0, v0
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.or.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.xor.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_xor_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_xor_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vcpop.m a0, v0
+; CHECK-NEXT: andi a0, a0, 1
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.xor.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.and.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_and_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_and_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnot.m v8, v0
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.umax.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_umax_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_umax_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vcpop.m a0, v0
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.umax.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.smax.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_smax_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_smax_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnot.m v8, v0
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.smax.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.umin.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_umin_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_umin_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnot.m v8, v0
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.umin.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.smin.v128i1(<128 x i1>)
+
+define zeroext i1 @vreduce_smin_v128i1(<128 x i1> %v) {
+; CHECK-LABEL: vreduce_smin_v128i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vcpop.m a0, v0
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.smin.v128i1(<128 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.or.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_or_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_or_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmor.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.or.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.xor.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_xor_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_xor_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmxor.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: andi a0, a0, 1
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.xor.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.and.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_and_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_and_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnand.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.and.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.umax.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_umax_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_umax_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmor.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: snez a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.umax.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.reduce.smax.v256i1(<256 x i1>)
+
+define zeroext i1 @vreduce_smax_v256i1(<256 x i1> %v) {
+; CHECK-LABEL: vreduce_smax_v256i1:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li a0, 128
+; CHECK-NEXT: vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT: vmnand.mm v8, v0, v8
+; CHECK-NEXT: vcpop.m a0, v8
+; CHECK-NEXT: seqz a0, a0
+; CHECK-NEXT: ret
+ %red = call i1 @llvm.vector.reduce.smax.v256i1(<256 x i1> %v)
+ ret i1 %red
+}
+
+declare i1 @llvm.vector.r...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, would you be able to precommit the tests that were added in this PR just so we can see the diff? Feel free to push force to this branch
97b0a45
to
06f4ca2
Compare
Done. Thanks! |
Gently ping. |
CostKind) + | ||
|
||
// Fixed VT: In v512i1 and larger vector elements, | ||
// Scalable VT: In v128i1 and larger vector elements, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v128i1 isn't a scalable type. Did you mean nxv128i1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks!
// Scalable VT: In v128i1 and larger vector elements, | ||
// the VMAND_MM instructions have started to be added. | ||
return ((LT.first >= 2) | ||
? LT.first - (LT.second.isScalableVector() ? 1 : 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why fixed vectors and scalable vectors would be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a file llvm/CodeGen/RISCV/rvv/reduce-and-i1.ll
to show the difference between the two. It seems that scalable vectors are not affected by VLEN.
return LT.first * getRISCVInstructionCost(RISCV::VMNAND_MM, LT.second, | ||
CostKind) + | ||
|
||
// Fixed VT: In v512i1 and larger vector elements, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v512i1 shouldn't require a VMAND if VL is >= 64. This should depended on minimum vlen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed. Thanks!
; CHECK: # %bb.0: | ||
; CHECK-NEXT: vsetvli a0, zero, e8, m8, ta, ma | ||
; CHECK-NEXT: vmand.mm v8, v0, v8 | ||
; CHECK-NEXT: vmnot.m v8, v8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there's a missed optimization for combining vmand+vnot for scalable vectors? Is this what's making fixed and scalable different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the inconsistency in the generated instructions, the LMUL
used by vsetvli
in FixedVT varies depending on the VLEN
, whereas ScalableVT does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the inconsistency fixed there should always be LT.second-2 vmands right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The optimization for scalable vectors should be present after #121812
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am combining the formulas of NumOfVMAND
now.
// Scalable VT: In nxv256i1 and larger vector elements, | ||
// Fixed VT: If getFixedSizeInBits() >= (4 * getRealMinVLen()), | ||
// the VMAND_MM instructions have started to be added. | ||
InstructionCost NumOfVMAND = 0; | ||
if (LT.second.isScalableVector() || | ||
LT.second.getFixedSizeInBits() == ST->getRealMinVLen()) | ||
NumOfVMAND = (LT.first > 2) ? (LT.first - 2) : 0; | ||
return NumOfVMAND * | ||
getRISCVInstructionCost(RISCV::VMAND_MM, LT.second, CostKind) + | ||
getRISCVInstructionCost(RISCV::VMNAND_MM, LT.second, CostKind) + | ||
getRISCVInstructionCost(RISCV::VCPOP_M, LT.second, CostKind) + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing the split cost! My understanding is:
- For LMUL <= 8, there is no splitting, and the sequences are
vmnot
andvcpop
. - When LMUL > 8 and split == 1, it uses
vmnand
andvcpop
. - When LMUL > 8 and split > 1, the sequence becomes
vmand
* (split -2),vmnand
, andvcpop
.
So the cost formula would be something like
cost = vcpop;
if (LT.first == 1) cost += vmnot.
else cost += vmnand.
if (LT.first > 2) cost += (LT.first - 2) * vmand.
and this applies to both scalable VT and fixed VT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This indeed makes the logic much clearer. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please wait for one more approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -0,0 +1,239 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much overlap is there between this test and llvm/test/CodeGen/RISCV/rvv/vreductions-mask.ll and llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions in CostModel/RISCV/reduce-and-i1.ll
and CodeGen/RISCV/rvv/reduce-and-i1.ll
are consistent, and they indeed overlap with CodeGen/RISCV/rvv/vreductions-mask.ll
and CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll
. CodeGen/RISCV/rvv/reduce-and-i1.ll
should be removed.
I think CostModel/RISCV/reduce-and-i1.ll
should be kept, as it only overlaps with CostModel/RISCV/reduce-and.ll
in a small portion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, CodeGen/RISCV/rvv/vreductions-mask.ll
and CodeGen/RISCV/rvv/fixed-vectors-vreductions-mask.ll
only test the default VLEN. Should different VLENs be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I added some CodeGen test cases related to reduce. To maintain consistency, I also added cases for instructions like `vector.reduce.or`. For cases where `v1i1` type generates `VFIRST`, please refer to: https://reviews.llvm.org/D139512.
I added some CodeGen test cases related to reduce. To maintain consistency, I also added cases for instructions like
vector.reduce.or
.For cases where
v1i1
type generatesVFIRST
, please refer to: https://reviews.llvm.org/D139512.