[RISCV][TTI] Model the cost of insert/extractelt when the vector split into multiple register group and idx exceed single group. #118401
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-analysis
Author: Elvis Wang (ElvisWang123)
Changes
This patch models the cost for the case where a vector must be split across multiple register groups and the index exceeds a single vector register group. In this situation, the entire vector has to be stored to the stack before the target element is loaded. With this patch, the cost of extractelement is closer to the generated assembly.
Patch is 36.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118401.diff
2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 57f635ca6f42a8..20ca80aedab62c 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1945,6 +1945,23 @@ InstructionCost RISCVTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
// TODO: should we count these special vsetvlis?
BaseCost = Opcode == Instruction::InsertElement ? 3 : 4;
}
+
+ // When the vector need to split into multiple register groups and the index
+ // exceed single vector resgister group, we need to extract the element via
+ // stack.
+ if (Opcode == Instruction::ExtractElement && LT.first > 1 &&
+ ((Index == -1U) || (Index > LT.second.getVectorMinNumElements() &&
+ LT.second.isScalableVector()))) {
+ Type *ScalarType = Val->getScalarType();
+ Align VecAlign = DL.getPrefTypeAlign(Val);
+ Align SclAlign = DL.getPrefTypeAlign(ScalarType);
+ // Store all split vectors into stack and load the target element.
+ return LT.first *
+ getMemoryOpCost(Instruction::Store, Val, VecAlign, 0, CostKind) +
+ getMemoryOpCost(Instruction::Load, ScalarType, SclAlign, 0,
+ CostKind);
+ }
+
return BaseCost + SlideCost;
}
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
index 618b7bc8945a50..34a323066689ba 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
@@ -139,7 +139,7 @@ define void @extractelement_int(i32 %x) {
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i8_x = extractelement <vscale x 16 x i8> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i8_x = extractelement <vscale x 32 x i8> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv64i8_x = extractelement <vscale x 64 x i8> undef, i32 %x
-; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
+; RV32V-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i16_x = extractelement <2 x i16> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i16_x = extractelement <4 x i16> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i16_x = extractelement <8 x i16> undef, i32 %x
@@ -151,7 +151,7 @@ define void @extractelement_int(i32 %x) {
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i16_x = extractelement <vscale x 8 x i16> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i16_x = extractelement <vscale x 16 x i16> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i16_x = extractelement <vscale x 32 x i16> undef, i32 %x
-; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
+; RV32V-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i32_x = extractelement <2 x i32> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i32_x = extractelement <4 x i32> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32_x = extractelement <8 x i32> undef, i32 %x
@@ -161,7 +161,7 @@ define void @extractelement_int(i32 %x) {
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv4i32_x = extractelement <vscale x 4 x i32> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i32_x = extractelement <vscale x 8 x i32> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i32_x = extractelement <vscale x 16 x i32> undef, i32 %x
-; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
+; RV32V-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2i64_x = extractelement <2 x i64> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i64_x = extractelement <4 x i64> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v8i64_x = extractelement <8 x i64> undef, i32 %x
@@ -169,7 +169,7 @@ define void @extractelement_int(i32 %x) {
; RV32V-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %nxv2i64_x = extractelement <vscale x 2 x i64> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %nxv4i64_x = extractelement <vscale x 4 x i64> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %nxv8i64_x = extractelement <vscale x 8 x i64> undef, i32 %x
-; RV32V-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
+; RV32V-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
; RV32V-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; RV64V-LABEL: 'extractelement_int'
@@ -304,7 +304,7 @@ define void @extractelement_int(i32 %x) {
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i8_x = extractelement <vscale x 16 x i8> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i8_x = extractelement <vscale x 32 x i8> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv64i8_x = extractelement <vscale x 64 x i8> undef, i32 %x
-; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
+; RV64V-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i16_x = extractelement <2 x i16> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i16_x = extractelement <4 x i16> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i16_x = extractelement <8 x i16> undef, i32 %x
@@ -316,7 +316,7 @@ define void @extractelement_int(i32 %x) {
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i16_x = extractelement <vscale x 8 x i16> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i16_x = extractelement <vscale x 16 x i16> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i16_x = extractelement <vscale x 32 x i16> undef, i32 %x
-; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
+; RV64V-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i32_x = extractelement <2 x i32> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i32_x = extractelement <4 x i32> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32_x = extractelement <8 x i32> undef, i32 %x
@@ -326,7 +326,7 @@ define void @extractelement_int(i32 %x) {
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv4i32_x = extractelement <vscale x 4 x i32> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i32_x = extractelement <vscale x 8 x i32> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i32_x = extractelement <vscale x 16 x i32> undef, i32 %x
-; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
+; RV64V-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i64_x = extractelement <2 x i64> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i64_x = extractelement <4 x i64> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i64_x = extractelement <8 x i64> undef, i32 %x
@@ -334,7 +334,7 @@ define void @extractelement_int(i32 %x) {
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv2i64_x = extractelement <vscale x 2 x i64> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv4i64_x = extractelement <vscale x 4 x i64> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i64_x = extractelement <vscale x 8 x i64> undef, i32 %x
-; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
+; RV64V-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
; RV64V-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; RV32ZVE64X-LABEL: 'extractelement_int'
@@ -462,44 +462,44 @@ define void @extractelement_int(i32 %x) {
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i8_x = extractelement <16 x i8> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i8_x = extractelement <32 x i8> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64i8_x = extractelement <64 x i8> undef, i32 %x
-; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v128i8_x = extractelement <128 x i8> undef, i32 %x
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %v128i8_x = extractelement <128 x i8> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv2i8_x = extractelement <vscale x 2 x i8> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv4i8_x = extractelement <vscale x 4 x i8> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i8_x = extractelement <vscale x 8 x i8> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i8_x = extractelement <vscale x 16 x i8> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i8_x = extractelement <vscale x 32 x i8> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv64i8_x = extractelement <vscale x 64 x i8> undef, i32 %x
-; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i16_x = extractelement <2 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i16_x = extractelement <4 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i16_x = extractelement <8 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i16_x = extractelement <16 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i16_x = extractelement <32 x i16> undef, i32 %x
-; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64i16_x = extractelement <64 x i16> undef, i32 %x
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %v64i16_x = extractelement <64 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv2i16_x = extractelement <vscale x 2 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv4i16_x = extractelement <vscale x 4 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i16_x = extractelement <vscale x 8 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i16_x = extractelement <vscale x 16 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i16_x = extractelement <vscale x 32 x i16> undef, i32 %x
-; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i32_x = extractelement <2 x i32> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i32_x = extractelement <4 x i32> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32_x = extractelement <8 x i32> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i32_x = extractelement <16 x i32> undef, i32 %x
-; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_x = extractelement <32 x i32> undef, i32 %x
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %v32i32_x = extractelement <32 x i32> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv2i32_x = extractelement <vscale x 2 x i32> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv4i32_x = extractelement <vscale x 4 x i32> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i32_x = extractelement <vscale x 8 x i32> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i32_x = extractelement <vscale x 16 x i32> undef, i32 %x
-; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2i64_x = extractelement <2 x i64> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i64_x = extractelement <4 x i64> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v8i64_x = extractelement <8 x i64> undef, i32 %x
-; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v16i64_x = extractelement <16 x i64> undef, i32 %x
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %v16i64_x = extractelement <16 x i64> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %nxv2i64_x = extractelement <vscale x 2 x i64> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %nxv4i64_x = extractelement <vscale x 4 x i64> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %nxv8i64_x = extractelement <vscale x 8 x i64> undef, i32 %x
-; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; RV64ZVE64X-LABEL: 'extractelement_int'
@@ -627,44 +627,44 @@ define void @extractelement_int(i32 %x) {
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i8_x = extractelement <16 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i8_x = extractelement <32 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64i8_x = extractelement <64 x i8> undef, i32 %x
-; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v128i8_x = extractelement <128 x i8> undef, i32 %x
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %v128i8_x = extractelement <128 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv2i8_x = extractelement <vscale x 2 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv4i8_x = extractelement <vscale x 4 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i8_x = extractelement <vscale x 8 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i8_x = extractelement <vscale x 16 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i8_x = extractelement <vscale x 32 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv64i8_x = extractelement <vscale x 64 x i8> undef, i32 %x
-; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i16_x = extractelement <2 x i16> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i16_x = extractelement <4 x i16> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i16_x = extractelement <8 x i16> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i16_x = extractelement <16 x i16> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i16_x = extractelement <32 x i16> undef, i32 %x
-; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64i16_x = extractelement <64 x i16> undef, i32 %x
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %v64i16_x = extractelement <64 x i16> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv2i16_x = extractelement <vscale x 2 x i16> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv4i16_x = extractelement <vscale x 4 x i16> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv8i16_x = extractelement <vscale x 8 x i16> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv16i16_x = extractelement <vscale x 16 x i16> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv32i16_x = extractelement <vscale x 32 x i16> undef, i32 %x
-; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv64i16_x = extracte...
[truncated]
CodeGen for extract/insert element when the index exceeds a single register group: https://godbolt.org/z/r1zKEoK1P
// Store all split vectors into stack and load the target element.
if (Opcode == Instruction::ExtractElement)
return LT.first * getMemoryOpCost(Instruction::Store, Val, VecAlign, 0,
You shouldn't need to multiply by LT.first here. getMemoryOpCost does that internally. Or, at least, it should. If it's not doing so in the split case, we should fix that in getMemoryOpCost instead.
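To make the double-counting concern concrete, here is a minimal toy sketch, not the LLVM implementation. `ToyLegalization`, `toyMemoryOpCost`, `doubleCountedStoreCost`, and the per-part cost are hypothetical names and numbers chosen for illustration. The sketch assumes, as the comment above says it should, that the memory-op cost already scales with the number of split parts.

```cpp
#include <cassert>

// Hypothetical stand-in for a type-legalization result: the number of legal
// register groups the original vector is split into (LT.first in the patch).
struct ToyLegalization {
  unsigned NumParts;
};

// Toy memory-op cost that already accounts for splitting: one store per part.
// PerPartCost is an assumed value, not a number taken from LLVM.
inline unsigned toyMemoryOpCost(const ToyLegalization &LT,
                                unsigned PerPartCost) {
  assert(LT.NumParts >= 1 && "a legalized type has at least one part");
  return LT.NumParts * PerPartCost;
}

// Multiplying by NumParts again counts the split twice, so the store cost
// grows quadratically with the number of parts.
inline unsigned doubleCountedStoreCost(const ToyLegalization &LT,
                                       unsigned PerPartCost) {
  return LT.NumParts * toyMemoryOpCost(LT, PerPartCost);
}
```

With four parts and an assumed per-part cost of 8, `toyMemoryOpCost` yields 32 while `doubleCountedStoreCost` yields 128, which is the kind of inflation the comment warns about.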
Removed, thanks!
// Store all split vectors into stack and store the target element and load
// vectors back.
return LT.first * (getMemoryOpCost(Instruction::Store, Val, VecAlign, 0,
Same point here.
Removed, thanks!
if (Opcode == Instruction::ExtractElement)
return LT.first * getMemoryOpCost(Instruction::Store, Val, VecAlign, 0,
CostKind) +
getMemoryOpCost(Instruction::Load, ScalarType, SclAlign, 0,
You're missing the addressing cost in both cases here. For the vector, that should be handled inside getMemoryOpCost, but you need to include the ADDI for the non-constant index case on the scalar load or store.
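The addressing point can be sketched as follows. This is a toy model under stated assumptions, not the code that landed: `toyExtractViaStackCost` and its parameters are hypothetical, the default address cost of 1 stands for a single ADDI-like instruction, and a constant index is assumed to fold into the scalar load's offset.

```cpp
#include <cassert>

// Toy cost of extracting one element through the stack. All parameters are
// assumed inputs supplied by the caller; none of these names exist in LLVM.
inline unsigned toyExtractViaStackCost(unsigned VecStoreCost,
                                       unsigned ScalarLoadCost,
                                       bool IndexIsConstant,
                                       unsigned AddrCost = 1) {
  assert(ScalarLoadCost >= 1 && "the scalar load is assumed to cost something");
  // Assumption: a constant index folds into the load's offset, while a
  // variable index needs an extra address computation before the load.
  unsigned Addressing = IndexIsConstant ? 0 : AddrCost;
  return VecStoreCost + Addressing + ScalarLoadCost;
}
```

For an assumed vector store cost of 16 and scalar load cost of 1, a variable index gives 18 and a constant index gives 17.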
Added, thanks!
@@ -1945,6 +1972,7 @@ InstructionCost RISCVTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
// TODO: should we count these special vsetvlis?
BaseCost = Opcode == Instruction::InsertElement ? 3 : 4;
}
Remove stray whitespace.
Removed, thanks.
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32 = extractelement <32 x i32> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64i32 = extractelement <64 x i32> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %v128i8 = extractelement <128 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 129 for instruction: %v256i8 = extractelement <256 x i8> undef, i32 %x
This cost seems way too big.
256 x i8 on V is 16 registers worth of data. Storing that should be ~16 in cost. A single scalar load and addressing is ~2-3. So, I'd expect something in the order of ~20, not ~129 here.
I think 256 x i8 needs 32 registers under zve64x: 256 / 64 * 8 = 32.
The cost of 128 x i8 is 18,
which is close to your expectation.
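The register counts quoted in this exchange follow from simple arithmetic, sketched below. `vectorRegistersNeeded` is a hypothetical helper, and it assumes each vector register is exactly as wide as the minimum VLEN of the extension (64 bits for Zve64x, 128 bits for V).

```cpp
#include <cassert>
#include <cstdint>

// Number of vector registers needed to hold a fixed-length vector, assuming
// every register is exactly VlenBits wide. Hypothetical helper, not an LLVM API.
inline std::uint64_t vectorRegistersNeeded(std::uint64_t NumElts,
                                           std::uint64_t EltBits,
                                           std::uint64_t VlenBits) {
  assert(VlenBits != 0 && "VLEN must be non-zero");
  // Round up: a partially filled register still occupies a whole register.
  return (NumElts * EltBits + VlenBits - 1) / VlenBits;
}
```

Under these assumptions, `vectorRegistersNeeded(256, 8, 64)` is 32 (the Zve64x figure) and `vectorRegistersNeeded(256, 8, 128)` is 16 (the V figure).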
Thank you for including this. It made spot checking the codegen much simpler.
…vslide. This patch implements the cost for the case where the vector must be split across multiple register groups and the index exceeds a single vector register group. In this situation, the vector is stored to the stack and the target element is loaded. With this patch, the cost of extract element is closer to the generated assembly.
c41b47e
to
fe662e0
Compare
LGTM
This patch models the cost for the case where a vector must be split across multiple register groups and the index exceeds a single vector register group.
For extract element, the split vectors are stored to the stack and the target element is loaded.
For insert element, the split vectors are stored to the stack, the target element is stored, and the vectors are loaded back.
With this patch, the cost of insert/extract element is closer to the generated assembly.
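The difference between the two sequences described above can be sketched as a toy cost function. This is an illustration under stated assumptions rather than the merged code: `ToyOp` and `toyViaStackCost` are hypothetical names, the per-operation costs are caller-supplied placeholders, and addressing cost is left out for brevity.

```cpp
#include <cassert>

// Hypothetical tag for the two operations discussed in the description.
enum class ToyOp { Extract, Insert };

// Toy cost of going through the stack. Extract: store the split vectors, then
// load one element. Insert: store the split vectors, store one element, then
// load the vectors back. All costs are assumed inputs, not LLVM values.
inline unsigned toyViaStackCost(ToyOp Op, unsigned VecStoreCost,
                                unsigned VecLoadCost, unsigned ScalarMemCost) {
  assert(VecStoreCost >= 1 && "spilling the vector is assumed to cost something");
  unsigned Cost = VecStoreCost + ScalarMemCost;
  if (Op == ToyOp::Insert)
    Cost += VecLoadCost; // The updated vector has to be reloaded.
  return Cost;
}
```

With assumed costs of 16 for each whole-vector memory operation and 1 for the scalar access, the extract path costs 17 and the insert path costs 33, so insert is roughly twice as expensive in this model.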