Skip to content

[RISCV][TTI] Model the cost of insert/extractelt when the vector split into multiple register group and idx exceed single group. #118401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 12, 2024

Conversation

ElvisWang123
Copy link
Contributor

@ElvisWang123 ElvisWang123 commented Dec 2, 2024

This patch implements the cost when the size of the vector need to split into multiple groups and the index exceed single vector group.

For extract element, we need to store split vectors to stack and load the target element.
For insert element, we need to store split vectors to stack and store the target element and load vectors back.

After this patch, the cost of insert/extract element will close to the generated assembly.

@llvmbot llvmbot added backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding labels Dec 2, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 2, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-analysis

Author: Elvis Wang (ElvisWang123)

Changes

This patch implements the cost when the size of the vector need to split into multiple groups and the index exceed single vector group. Under this situation, we need the store the entire vector to stack then load the target element.

After this patch, the cost of extractelement will close to the generated assembly.


Patch is 36.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118401.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+17)
  • (modified) llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll (+44-44)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 57f635ca6f42a8..20ca80aedab62c 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1945,6 +1945,23 @@ InstructionCost RISCVTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
     // TODO: should we count these special vsetvlis?
     BaseCost = Opcode == Instruction::InsertElement ? 3 : 4;
   }
+
+  // When the vector need to split into multiple register groups and the index
+  // exceed single vector resgister group, we need to extract the element via
+  // stack.
+  if (Opcode == Instruction::ExtractElement && LT.first > 1 &&
+      ((Index == -1U) || (Index > LT.second.getVectorMinNumElements() &&
+                          LT.second.isScalableVector()))) {
+    Type *ScalarType = Val->getScalarType();
+    Align VecAlign = DL.getPrefTypeAlign(Val);
+    Align SclAlign = DL.getPrefTypeAlign(ScalarType);
+    // Store all split vectors into stack and load the target element.
+    return LT.first *
+               getMemoryOpCost(Instruction::Store, Val, VecAlign, 0, CostKind) +
+           getMemoryOpCost(Instruction::Load, ScalarType, SclAlign, 0,
+                           CostKind);
+  }
+
   return BaseCost + SlideCost;
 }
 
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
index 618b7bc8945a50..34a323066689ba 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
@@ -139,7 +139,7 @@ define void @extractelement_int(i32 %x) {
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i8_x = extractelement <vscale x 16 x i8> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i8_x = extractelement <vscale x 32 x i8> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv64i8_x = extractelement <vscale x 64 x i8> undef, i32 %x
-; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2i16_x = extractelement <2 x i16> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4i16_x = extractelement <4 x i16> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8i16_x = extractelement <8 x i16> undef, i32 %x
@@ -151,7 +151,7 @@ define void @extractelement_int(i32 %x) {
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i16_x = extractelement <vscale x 8 x i16> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i16_x = extractelement <vscale x 16 x i16> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i16_x = extractelement <vscale x 32 x i16> undef, i32 %x
-; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2i32_x = extractelement <2 x i32> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4i32_x = extractelement <4 x i32> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8i32_x = extractelement <8 x i32> undef, i32 %x
@@ -161,7 +161,7 @@ define void @extractelement_int(i32 %x) {
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv4i32_x = extractelement <vscale x 4 x i32> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i32_x = extractelement <vscale x 8 x i32> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i32_x = extractelement <vscale x 16 x i32> undef, i32 %x
-; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v2i64_x = extractelement <2 x i64> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v4i64_x = extractelement <4 x i64> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v8i64_x = extractelement <8 x i64> undef, i32 %x
@@ -169,7 +169,7 @@ define void @extractelement_int(i32 %x) {
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv2i64_x = extractelement <vscale x 2 x i64> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv4i64_x = extractelement <vscale x 4 x i64> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv8i64_x = extractelement <vscale x 8 x i64> undef, i32 %x
-; RV32V-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 34 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
 ; RV32V-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; RV64V-LABEL: 'extractelement_int'
@@ -304,7 +304,7 @@ define void @extractelement_int(i32 %x) {
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i8_x = extractelement <vscale x 16 x i8> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i8_x = extractelement <vscale x 32 x i8> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv64i8_x = extractelement <vscale x 64 x i8> undef, i32 %x
-; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2i16_x = extractelement <2 x i16> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4i16_x = extractelement <4 x i16> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8i16_x = extractelement <8 x i16> undef, i32 %x
@@ -316,7 +316,7 @@ define void @extractelement_int(i32 %x) {
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i16_x = extractelement <vscale x 8 x i16> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i16_x = extractelement <vscale x 16 x i16> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i16_x = extractelement <vscale x 32 x i16> undef, i32 %x
-; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2i32_x = extractelement <2 x i32> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4i32_x = extractelement <4 x i32> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8i32_x = extractelement <8 x i32> undef, i32 %x
@@ -326,7 +326,7 @@ define void @extractelement_int(i32 %x) {
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv4i32_x = extractelement <vscale x 4 x i32> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i32_x = extractelement <vscale x 8 x i32> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i32_x = extractelement <vscale x 16 x i32> undef, i32 %x
-; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2i64_x = extractelement <2 x i64> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4i64_x = extractelement <4 x i64> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8i64_x = extractelement <8 x i64> undef, i32 %x
@@ -334,7 +334,7 @@ define void @extractelement_int(i32 %x) {
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv2i64_x = extractelement <vscale x 2 x i64> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv4i64_x = extractelement <vscale x 4 x i64> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i64_x = extractelement <vscale x 8 x i64> undef, i32 %x
-; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
 ; RV64V-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; RV32ZVE64X-LABEL: 'extractelement_int'
@@ -462,44 +462,44 @@ define void @extractelement_int(i32 %x) {
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16i8_x = extractelement <16 x i8> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i8_x = extractelement <32 x i8> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v64i8_x = extractelement <64 x i8> undef, i32 %x
-; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v128i8_x = extractelement <128 x i8> undef, i32 %x
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %v128i8_x = extractelement <128 x i8> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv2i8_x = extractelement <vscale x 2 x i8> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv4i8_x = extractelement <vscale x 4 x i8> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i8_x = extractelement <vscale x 8 x i8> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i8_x = extractelement <vscale x 16 x i8> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i8_x = extractelement <vscale x 32 x i8> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv64i8_x = extractelement <vscale x 64 x i8> undef, i32 %x
-; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2i16_x = extractelement <2 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4i16_x = extractelement <4 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8i16_x = extractelement <8 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16i16_x = extractelement <16 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i16_x = extractelement <32 x i16> undef, i32 %x
-; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v64i16_x = extractelement <64 x i16> undef, i32 %x
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %v64i16_x = extractelement <64 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv2i16_x = extractelement <vscale x 2 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv4i16_x = extractelement <vscale x 4 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i16_x = extractelement <vscale x 8 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i16_x = extractelement <vscale x 16 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i16_x = extractelement <vscale x 32 x i16> undef, i32 %x
-; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv64i16_x = extractelement <vscale x 64 x i16> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2i32_x = extractelement <2 x i32> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4i32_x = extractelement <4 x i32> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8i32_x = extractelement <8 x i32> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16i32_x = extractelement <16 x i32> undef, i32 %x
-; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_x = extractelement <32 x i32> undef, i32 %x
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %v32i32_x = extractelement <32 x i32> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv2i32_x = extractelement <vscale x 2 x i32> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv4i32_x = extractelement <vscale x 4 x i32> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i32_x = extractelement <vscale x 8 x i32> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i32_x = extractelement <vscale x 16 x i32> undef, i32 %x
-; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv32i32_x = extractelement <vscale x 32 x i32> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v2i64_x = extractelement <2 x i64> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v4i64_x = extractelement <4 x i64> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v8i64_x = extractelement <8 x i64> undef, i32 %x
-; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v16i64_x = extractelement <16 x i64> undef, i32 %x
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 34 for instruction: %v16i64_x = extractelement <16 x i64> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv2i64_x = extractelement <vscale x 2 x i64> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv4i64_x = extractelement <vscale x 4 x i64> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv8i64_x = extractelement <vscale x 8 x i64> undef, i32 %x
-; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 34 for instruction: %nxv16i64_x = extractelement <vscale x 16 x i64> undef, i32 %x
 ; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; RV64ZVE64X-LABEL: 'extractelement_int'
@@ -627,44 +627,44 @@ define void @extractelement_int(i32 %x) {
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16i8_x = extractelement <16 x i8> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i8_x = extractelement <32 x i8> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v64i8_x = extractelement <64 x i8> undef, i32 %x
-; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v128i8_x = extractelement <128 x i8> undef, i32 %x
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %v128i8_x = extractelement <128 x i8> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv2i8_x = extractelement <vscale x 2 x i8> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv4i8_x = extractelement <vscale x 4 x i8> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i8_x = extractelement <vscale x 8 x i8> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i8_x = extractelement <vscale x 16 x i8> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i8_x = extractelement <vscale x 32 x i8> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv64i8_x = extractelement <vscale x 64 x i8> undef, i32 %x
-; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %nxv128i8_x = extractelement <vscale x 128 x i8> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v2i16_x = extractelement <2 x i16> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4i16_x = extractelement <4 x i16> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8i16_x = extractelement <8 x i16> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16i16_x = extractelement <16 x i16> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i16_x = extractelement <32 x i16> undef, i32 %x
-; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v64i16_x = extractelement <64 x i16> undef, i32 %x
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %v64i16_x = extractelement <64 x i16> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv2i16_x = extractelement <vscale x 2 x i16> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv4i16_x = extractelement <vscale x 4 x i16> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv8i16_x = extractelement <vscale x 8 x i16> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv16i16_x = extractelement <vscale x 16 x i16> undef, i32 %x
 ; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv32i16_x = extractelement <vscale x 32 x i16> undef, i32 %x
-; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nxv64i16_x = extracte...
[truncated]

@ElvisWang123 ElvisWang123 changed the title [RISCV][TTI] Model the cost of extractelt when the vector split into multiple register group and idx exceed single group. [RISCV][TTI] Model the cost of insert/extractelt when the vector split into multiple register group and idx exceed single group. Dec 4, 2024
@ElvisWang123
Copy link
Contributor Author

CodeGen for extract/insert element exceed single register group. https://godbolt.org/z/r1zKEoK1P


// Store all split vectors into stack and load the target element.
if (Opcode == Instruction::ExtractElement)
return LT.first * getMemoryOpCost(Instruction::Store, Val, VecAlign, 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to multiply by LT.first here. getMemoryOpCost does that internally. Or, at least, it should. If it's not doing so in the split case, we should fix that in getMemoryOpCost instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!


// Store all split vectors into stack and store the target element and load
// vectors back.
return LT.first * (getMemoryOpCost(Instruction::Store, Val, VecAlign, 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

if (Opcode == Instruction::ExtractElement)
return LT.first * getMemoryOpCost(Instruction::Store, Val, VecAlign, 0,
CostKind) +
getMemoryOpCost(Instruction::Load, ScalarType, SclAlign, 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the addressing cost in both cases here. For the vector, that should be handled inside getMemoryOpCost, but you need to include the ADDI for the non-constant index case on the scalar load or store.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

@@ -1945,6 +1972,7 @@ InstructionCost RISCVTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
// TODO: should we count these special vsetvlis?
BaseCost = Opcode == Instruction::InsertElement ? 3 : 4;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove stray whitespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks.

; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32 = extractelement <32 x i32> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64i32 = extractelement <64 x i32> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %v128i8 = extractelement <128 x i8> undef, i32 %x
; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 129 for instruction: %v256i8 = extractelement <256 x i8> undef, i32 %x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cost seems way too big.

256 x i8 on V is 16 registers worth of data. Storing that should be ~16 in cost. A single scalar load and addressing is ~2-3. So, I'd expect something in the order of ~20, not ~129 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 256 x i8 need 32 register under zve64x. 256 / 64 * 8 = 32
The cost of 128 x i8 is 18 and that is close to your expectation.

@preames
Copy link
Collaborator

preames commented Dec 10, 2024

CodeGen for extract/insert element exceed single register group. https://godbolt.org/z/r1zKEoK1P

Thank you for including this. It made spot checking the codegen much simpler.

…vslide.

This patch implement the cost when the size of the vector need to split
into multiple groups and the index exceed single vector group.
Under this situation, we need the store the vector to stack and load the
target element.

After this patch, the cost of extract element will close to the
generated assembly.
@ElvisWang123 ElvisWang123 force-pushed the upstream-getVectorInstrCost-new branch from c41b47e to fe662e0 Compare December 11, 2024 06:43
Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ElvisWang123 ElvisWang123 merged commit a674209 into llvm:main Dec 12, 2024
8 checks passed
@ElvisWang123 ElvisWang123 deleted the upstream-getVectorInstrCost-new branch December 12, 2024 01:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants