[RISCV] Prefer VLS over VLA if costs are equal #100564

Open

wants to merge 1 commit into main

Conversation

wangpc-pp
Contributor

This is inspired by #95819.

Some kernels like s000 show improvements: we can remove the code for calculating the vector length and fully unroll the tail epilogue.

Currently, we add a SubtargetFeature for this, and processors can enable it if needed.
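
For readers unfamiliar with the hook: when the cost model reports a tie between a fixed-width and a scalable VF, the loop vectorizer by default lets the scalable candidate win, and this feature flips that preference. Below is a minimal sketch of the tie-break, in simplified form; it is an illustration only, not the exact LoopVectorize.cpp code (the real comparison also normalizes costs per lane).

// Simplified sketch of the tie-break controlled by this feature.
struct Candidate {
  unsigned Cost;   // cost-model estimate for the candidate VF
  bool IsScalable; // true for <vscale x N x ...> VFs
};

static bool isMoreProfitable(const Candidate &A, const Candidate &B,
                             bool PreferFixedOverScalableIfEqualCost) {
  // By default a scalable candidate wins a tie against a fixed-width one,
  // on the assumption that the runtime vscale may exceed the tuned value.
  bool PreferScalable = !PreferFixedOverScalableIfEqualCost &&
                        A.IsScalable && !B.IsScalable;
  return PreferScalable ? A.Cost <= B.Cost : A.Cost < B.Cost;
}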

@llvmbot
Member

llvmbot commented Jul 25, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Pengcheng Wang (wangpc-pp)

Changes

This is inspired by #95819.

Some kernels like s000 show improvements: we can remove the code for calculating the vector length and fully unroll the tail epilogue.

Currently, we add a SubtargetFeature for this, and processors can enable it if needed.


Full diff: https://github.com/llvm/llvm-project/pull/100564.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVFeatures.td (+6)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+4)
  • (added) llvm/test/Transforms/LoopVectorize/RISCV/prefer-fixed-if-equal-to-scalable.ll (+165)
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 3c868dbbf8b3a..96ec2dcbb715b 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -1324,6 +1324,12 @@ def FeaturePredictableSelectIsExpensive
     : SubtargetFeature<"predictable-select-expensive", "PredictableSelectIsExpensive", "true",
                        "Prefer likely predicted branches over selects">;
 
+def FeatureUseFixedOverScalableIfEqualCost
+    : SubtargetFeature<"use-fixed-over-scalable-if-equal-cost",
+                       "UseFixedOverScalableIfEqualCost", "true",
+                       "Prefer fixed width loop vectorization over scalable"
+                       "if the cost-model assigns equal costs">;
+
 def TuneOptimizedZeroStrideLoad
    : SubtargetFeature<"optimized-zero-stride-load", "HasOptimizedZeroStrideLoad",
                       "true", "Optimized (perform fewer memory operations)"
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 9c37a4f6ec2d0..fffae92e78b2f 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -342,6 +342,10 @@ class RISCVTTIImpl : public BasicTTIImplBase<RISCVTTIImpl> {
 
   bool enableInterleavedAccessVectorization() { return true; }
 
+  bool preferFixedOverScalableIfEqualCost() const {
+    return ST->useFixedOverScalableIfEqualCost();
+  }
+
   enum RISCVRegisterClass { GPRRC, FPRRC, VRRC };
   unsigned getNumberOfRegisters(unsigned ClassID) const {
     switch (ClassID) {
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/prefer-fixed-if-equal-to-scalable.ll b/llvm/test/Transforms/LoopVectorize/RISCV/prefer-fixed-if-equal-to-scalable.ll
new file mode 100644
index 0000000000000..eebd34958905c
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/prefer-fixed-if-equal-to-scalable.ll
@@ -0,0 +1,165 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple riscv64 -S -passes=loop-vectorize -force-target-instruction-cost=1 < %s \
+; RUN:   -mattr=+v | FileCheck %s -check-prefix=SCALABLE
+; RUN: opt -mtriple riscv64 -S -passes=loop-vectorize -force-target-instruction-cost=1 < %s  \
+; RUN:   -mattr=+v,+use-fixed-over-scalable-if-equal-cost \
+; RUN:   | FileCheck %s -check-prefix=FIXED
+
+define void @s000(ptr %a, ptr %b, i32 %n) {
+; SCALABLE-LABEL: define void @s000(
+; SCALABLE-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i32 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; SCALABLE-NEXT:  [[ENTRY:.*:]]
+; SCALABLE-NEXT:    [[B2:%.*]] = ptrtoint ptr [[B]] to i64
+; SCALABLE-NEXT:    [[A1:%.*]] = ptrtoint ptr [[A]] to i64
+; SCALABLE-NEXT:    [[CMP6:%.*]] = icmp sgt i32 [[N]], 0
+; SCALABLE-NEXT:    br i1 [[CMP6]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; SCALABLE:       [[FOR_BODY_PREHEADER]]:
+; SCALABLE-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[N]] to i64
+; SCALABLE-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; SCALABLE-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 4
+; SCALABLE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.umax.i64(i64 8, i64 [[TMP1]])
+; SCALABLE-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], [[TMP2]]
+; SCALABLE-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; SCALABLE:       [[VECTOR_MEMCHECK]]:
+; SCALABLE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; SCALABLE-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 4
+; SCALABLE-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 4
+; SCALABLE-NEXT:    [[TMP6:%.*]] = sub i64 [[A1]], [[B2]]
+; SCALABLE-NEXT:    [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP6]], [[TMP5]]
+; SCALABLE-NEXT:    br i1 [[DIFF_CHECK]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
+; SCALABLE:       [[VECTOR_PH]]:
+; SCALABLE-NEXT:    [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
+; SCALABLE-NEXT:    [[TMP8:%.*]] = mul i64 [[TMP7]], 4
+; SCALABLE-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], [[TMP8]]
+; SCALABLE-NEXT:    [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
+; SCALABLE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
+; SCALABLE-NEXT:    [[TMP10:%.*]] = mul i64 [[TMP9]], 4
+; SCALABLE-NEXT:    br label %[[VECTOR_BODY:.*]]
+; SCALABLE:       [[VECTOR_BODY]]:
+; SCALABLE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; SCALABLE-NEXT:    [[TMP11:%.*]] = add i64 [[INDEX]], 0
+; SCALABLE-NEXT:    [[TMP12:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP11]]
+; SCALABLE-NEXT:    [[TMP13:%.*]] = getelementptr inbounds float, ptr [[TMP12]], i32 0
+; SCALABLE-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x float>, ptr [[TMP13]], align 4
+; SCALABLE-NEXT:    [[TMP14:%.*]] = fadd <vscale x 4 x float> [[WIDE_LOAD]], shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 1.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)
+; SCALABLE-NEXT:    [[TMP15:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP11]]
+; SCALABLE-NEXT:    [[TMP16:%.*]] = getelementptr inbounds float, ptr [[TMP15]], i32 0
+; SCALABLE-NEXT:    store <vscale x 4 x float> [[TMP14]], ptr [[TMP16]], align 4
+; SCALABLE-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP10]]
+; SCALABLE-NEXT:    [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; SCALABLE-NEXT:    br i1 [[TMP17]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; SCALABLE:       [[MIDDLE_BLOCK]]:
+; SCALABLE-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
+; SCALABLE-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; SCALABLE:       [[SCALAR_PH]]:
+; SCALABLE-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY_PREHEADER]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; SCALABLE-NEXT:    br label %[[FOR_BODY:.*]]
+; SCALABLE:       [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; SCALABLE-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; SCALABLE:       [[FOR_COND_CLEANUP]]:
+; SCALABLE-NEXT:    ret void
+; SCALABLE:       [[FOR_BODY]]:
+; SCALABLE-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; SCALABLE-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[INDVARS_IV]]
+; SCALABLE-NEXT:    [[TMP18:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; SCALABLE-NEXT:    [[ADD:%.*]] = fadd float [[TMP18]], 1.000000e+00
+; SCALABLE-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV]]
+; SCALABLE-NEXT:    store float [[ADD]], ptr [[ARRAYIDX2]], align 4
+; SCALABLE-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; SCALABLE-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; SCALABLE-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+;
+; FIXED-LABEL: define void @s000(
+; FIXED-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i32 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; FIXED-NEXT:  [[ENTRY:.*:]]
+; FIXED-NEXT:    [[B2:%.*]] = ptrtoint ptr [[B]] to i64
+; FIXED-NEXT:    [[A1:%.*]] = ptrtoint ptr [[A]] to i64
+; FIXED-NEXT:    [[CMP6:%.*]] = icmp sgt i32 [[N]], 0
+; FIXED-NEXT:    br i1 [[CMP6]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; FIXED:       [[FOR_BODY_PREHEADER]]:
+; FIXED-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[N]] to i64
+; FIXED-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], 16
+; FIXED-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; FIXED:       [[VECTOR_MEMCHECK]]:
+; FIXED-NEXT:    [[TMP0:%.*]] = sub i64 [[A1]], [[B2]]
+; FIXED-NEXT:    [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 64
+; FIXED-NEXT:    br i1 [[DIFF_CHECK]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
+; FIXED:       [[VECTOR_PH]]:
+; FIXED-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], 16
+; FIXED-NEXT:    [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
+; FIXED-NEXT:    br label %[[VECTOR_BODY:.*]]
+; FIXED:       [[VECTOR_BODY]]:
+; FIXED-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; FIXED-NEXT:    [[TMP1:%.*]] = add i64 [[INDEX]], 0
+; FIXED-NEXT:    [[TMP2:%.*]] = add i64 [[INDEX]], 8
+; FIXED-NEXT:    [[TMP3:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP1]]
+; FIXED-NEXT:    [[TMP4:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP2]]
+; FIXED-NEXT:    [[TMP5:%.*]] = getelementptr inbounds float, ptr [[TMP3]], i32 0
+; FIXED-NEXT:    [[TMP6:%.*]] = getelementptr inbounds float, ptr [[TMP3]], i32 8
+; FIXED-NEXT:    [[WIDE_LOAD:%.*]] = load <8 x float>, ptr [[TMP5]], align 4
+; FIXED-NEXT:    [[WIDE_LOAD3:%.*]] = load <8 x float>, ptr [[TMP6]], align 4
+; FIXED-NEXT:    [[TMP7:%.*]] = fadd <8 x float> [[WIDE_LOAD]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
+; FIXED-NEXT:    [[TMP8:%.*]] = fadd <8 x float> [[WIDE_LOAD3]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
+; FIXED-NEXT:    [[TMP9:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP1]]
+; FIXED-NEXT:    [[TMP10:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP2]]
+; FIXED-NEXT:    [[TMP11:%.*]] = getelementptr inbounds float, ptr [[TMP9]], i32 0
+; FIXED-NEXT:    [[TMP12:%.*]] = getelementptr inbounds float, ptr [[TMP9]], i32 8
+; FIXED-NEXT:    store <8 x float> [[TMP7]], ptr [[TMP11]], align 4
+; FIXED-NEXT:    store <8 x float> [[TMP8]], ptr [[TMP12]], align 4
+; FIXED-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
+; FIXED-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; FIXED-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; FIXED:       [[MIDDLE_BLOCK]]:
+; FIXED-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
+; FIXED-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; FIXED:       [[SCALAR_PH]]:
+; FIXED-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY_PREHEADER]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; FIXED-NEXT:    br label %[[FOR_BODY:.*]]
+; FIXED:       [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; FIXED-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; FIXED:       [[FOR_COND_CLEANUP]]:
+; FIXED-NEXT:    ret void
+; FIXED:       [[FOR_BODY]]:
+; FIXED-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; FIXED-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[INDVARS_IV]]
+; FIXED-NEXT:    [[TMP14:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; FIXED-NEXT:    [[ADD:%.*]] = fadd float [[TMP14]], 1.000000e+00
+; FIXED-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV]]
+; FIXED-NEXT:    store float [[ADD]], ptr [[ARRAYIDX2]], align 4
+; FIXED-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; FIXED-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; FIXED-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+;
+entry:
+  %cmp6 = icmp sgt i32 %n, 0
+  br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:
+  %wide.trip.count = zext nneg i32 %n to i64
+  br label %for.body
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %b, i64 %indvars.iv
+  %0 = load float, ptr %arrayidx, align 4
+  %add = fadd float %0, 1.000000e+00
+  %arrayidx2 = getelementptr inbounds float, ptr %a, i64 %indvars.iv
+  store float %add, ptr %arrayidx2, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+;.
+; SCALABLE: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; SCALABLE: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; SCALABLE: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; SCALABLE: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]]}
+;.
+; FIXED: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; FIXED: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; FIXED: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; FIXED: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]]}
+;.

@llvmbot
Member

llvmbot commented Jul 25, 2024

@llvm/pr-subscribers-backend-risc-v


@wangpc-pp wangpc-pp requested review from lukel97 and topperc July 25, 2024 12:35
Contributor

@lukel97 lukel97 left a comment

Makes sense to me

@preames
Collaborator

preames commented Jul 25, 2024

I would prefer this not land without some evidence of profitability on at least some hardware.

I experimented with something like this (though I keyed mine off having an exactly known VLEN from -mrvv-vector-bits=zvl) and glanced at some test deltas. I didn't see much, and what I did see appeared to be easily addressable in the scalable lowering.

I'm not really opposed to the idea, I just want to make sure we're not carrying complexity for no value.

@topperc
Collaborator

topperc commented Jul 25, 2024

Is the cost affected by Zvl*b? I see in the test case that SCALABLE uses LMUL=2 vectorization. For FIXED, we use LMUL=2 registers, but VL=8. If the runtime VLEN is greater than 128, the scalable code will be able to process more elements per iteration.

If the microarchitecture doesn't dynamically reduce LMUL based on VL, then the FIXED code will waste resources with VLEN > 128 since all 8 elements will fit in half of an LMUL=2 register.
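To make that concrete, here is a small compile-time sketch of the arithmetic. It assumes SEW=32 and the vector shapes from the test above; it is illustrative only, not measured data.

// Elements covered by one vector operand of the shapes used in the test,
// assuming 32-bit floats. On RISC-V, vscale = VLEN / 64.
constexpr unsigned scalableElems(unsigned VLEN) {
  return 4 * (VLEN / 64); // <vscale x 4 x float>, an LMUL=2 type
}
constexpr unsigned FixedElems = 8; // <8 x float>, i.e. VL=8

static_assert(scalableElems(128) == FixedElems,
              "at VLEN=128 both forms cover 8 floats per vector op");
static_assert(scalableElems(256) == 2 * FixedElems,
              "at VLEN=256 the fixed form leaves half of the LMUL=2 group idle");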

@lukel97
Contributor

lukel97 commented Jul 26, 2024

Is the cost affected by Zvl*b?

Doesn't the vectorizer estimate the scalable VF costs based off of Zvl*b via getVScaleForTuning to begin with? This would only make a difference in a tie-break. In other cases today, the vectorizer might still choose a fixed VF even when the runtime VLEN would make the scalable VF more profitable.
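
A sketch of the estimate described here (hypothetical helper names; the exact vectorizer formula is paraphrased, not quoted): the lane count of a scalable VF is multiplied by the target's vscale-for-tuning value, which on RISC-V derives from the guaranteed minimum VLEN (Zvl*b), and candidates are then compared on cost per lane.

#include <cstdint>

// Estimated lanes of a VF: fixed VFs at face value, scalable VFs scaled by
// the vscale-for-tuning estimate.
uint64_t estimatedLanes(uint64_t MinLanes, bool IsScalable,
                        uint64_t VScaleForTuning) {
  return IsScalable ? MinLanes * VScaleForTuning : MinLanes;
}

// Cross-multiplied cost-per-lane comparison: is candidate A cheaper per
// element processed than candidate B?
bool cheaperPerLane(uint64_t CostA, uint64_t LanesA,
                    uint64_t CostB, uint64_t LanesB) {
  return CostA * LanesB < CostB * LanesA;
}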

@wangpc-pp
Contributor Author

I would prefer this not land without some evidence of profitability on at least some hardware.

Compiling TSVC with clang -O2 -mcpu=spacemit-x60 -Xclang -target-feature -Xclang +use-fixed-over-scalable-if-equal-cost and testing it on a Spacemit K1 board, we can see some improvements (we can remove one scalar instruction in some loops like s121).

If the runtime VLEN is greater than 128 the scalable code will be able to process more elements per iteration.

I was thinking that we are just compiling for one CPU with -mcpu.
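
For context, the TSVC kernels involved are simple unit-stride loops. s000, which the test in this patch is based on, is roughly the following (a sketch matching the IR in the test, not the full TSVC harness):

// s000-style kernel: the fixed and scalable plans receive equal cost-model
// estimates for this loop, so the new feature decides the tie.
void s000(float *a, const float *b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i] + 1.0f;
}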

@preames
Collaborator

preames commented Jul 29, 2024

Compiling TSVC with clang -O2 -mcpu=spacemit-x60 -Xclang -target-feature -Xclang +use-fixed-over-scalable-if-equal-cost and testing it on a Spacemit K1 board, we can see some improvements (we can remove one scalar instruction in some loops like s121).

Can you share actual perf results? And maybe an example where we actually save the instruction? I'm wondering if we have an opportunity to improve the scalable lowering.

@wangpc-pp
Contributor Author

wangpc-pp commented Jul 30, 2024

Compiling TSVC with clang -O2 -mcpu=spacemit-x60 -Xclang -target-feature -Xclang +use-fixed-over-scalable-if-equal-cost and testing it on a Spacemit K1 board, we can see some improvements (we can remove one scalar instruction in some loops like s121).

Can you share actual perf results?

I'm glad to, but I just found that the results have some variability (some kernels have large runtime diffs even when there is no binary diff). Do you have a method to make the results stable?

And maybe an example where we actually save the instruction? I'm wondering if we have an opportunity to improve the scalable lowering.

Yeah, please have a look at https://godbolt.org/z/5fshc7Enq. I disabled loop unrolling there to make the assemblies comparable.
As you can see, we can remove one scalar instruction (for adding the trip count, I think) when doing fixed-vector vectorization.

@topperc
Collaborator

topperc commented Jul 30, 2024

Compiling TSVC with clang -O2 -mcpu=spacemit-x60 -Xclang -target-feature -Xclang +use-fixed-over-scalable-if-equal-cost and testing it on a Spacemit K1 board, we can see some improvements (we can remove one scalar instruction in some loops like s121).

Can you share actual perf results?

I'm glad to, but I just found that the results have some variability (some kernels have large runtime diffs even when there is no binary diff). Do you have a method to make the results stable?

And maybe an example where we actually save the instruction? I'm wondering if we have an opportunity to improve the scalable lowering.

Yeah, please have a look at https://godbolt.org/z/5fshc7Enq. I disabled loop unrolling there to make the assemblies comparable. As you can see, we can remove one scalar instruction (for adding the trip count, I think) when doing fixed-vector vectorization.

Looks like it may be LSR related? With LSR disabled, both loops have the same number of instructions.

@preames
Collaborator

preames commented Jul 30, 2024

Looks like it may be LSR related? With LSR disabled, both loops have the same number of instructions.

My first impression is that this is a deficiency in LSR term-folding. If you want to file a separate bug for this, I can take a look and see if it's easy to fix. My initial guess is that we're struggling to prove the step non-zero given the slightly complicated expression there.

Worth noting is that adding -mrvv-vector-bits=zvl to both sides results in exactly the same code, and that code appears reasonable.

@wangpc-pp
Contributor Author

Worth noting is that adding -mrvv-vector-bits=zvl to both sides results in exactly the same code, and that code appears reasonable.

I think it is because we are using fixed-vectors with -mrvv-vector-bits=zvl.

@preames
Collaborator

preames commented Jul 31, 2024

Worth noting is that adding -mrvv-vector-bits=zvl to both sides results in exactly the same code, and that code appears reasonable.

I think it is because we are using fixed-vectors with -mrvv-vector-bits=zvl.

On the compiler explorer example you posted, we still use scalable types. We just know the exact size of the scalable type.

wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this pull request Aug 9, 2024
So that SCEV can analyse the bound of loop count.

This can fix issue found in llvm#100564.
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this pull request Aug 13, 2024
So that SCEV can analyse the bound of loop count.

This can fix issue found in llvm#100564.
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this pull request Aug 19, 2024
So that SCEV can analyse the bound of loop count.

This can fix issue found in llvm#100564.