[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default #89927

Merged

Conversation

asb
Contributor

@asb asb commented Apr 24, 2024

This avoids some cases where LSR produces results that lead to very poor codegen. There's a chance we'll see minor degradations for inputs where our metrics say the found solution is worse but in reality it's better than the starting point.


Stacks on #89924

Not tested enough yet to realistically merge at this point, but posting early in case others have looked at this already. Here is an example input that has terrible codegen without this:

%struct.ham = type { i64, i32, i32, i32, i32, i32, i32, i32, i32, i64, i32, i64, i64, i32, i64 }

define i32 @main() {
bb:
  %call = tail call ptr null(i64 0)
  br label %bb2

bb1:                                              ; No predecessors!
  %load = load i32, ptr %call, align 4
  ret i32 0

bb2:                                              ; preds = %bb2, %bb
  %phi = phi i64 [ 0, %bb ], [ %add, %bb2 ]
  %getelementptr = getelementptr %struct.ham, ptr %call, i64 %phi
  %getelementptr3 = getelementptr i8, ptr %getelementptr, i64 8
  store i32 0, ptr %getelementptr3, align 8
  %getelementptr4 = getelementptr i8, ptr %getelementptr, i64 12
  store i32 0, ptr %getelementptr4, align 4
  %getelementptr5 = getelementptr i8, ptr %getelementptr, i64 16
  store i32 0, ptr %getelementptr5, align 8
  %getelementptr6 = getelementptr i8, ptr %getelementptr, i64 20
  store i32 0, ptr %getelementptr6, align 4
  %getelementptr7 = getelementptr i8, ptr %getelementptr, i64 24
  store i32 0, ptr %getelementptr7, align 8
  %getelementptr8 = getelementptr i8, ptr %getelementptr, i64 28
  store i32 0, ptr %getelementptr8, align 4
  %getelementptr9 = getelementptr i8, ptr %getelementptr, i64 32
  store i32 0, ptr %getelementptr9, align 8
  %getelementptr10 = getelementptr i8, ptr %getelementptr, i64 36
  store i32 0, ptr %getelementptr10, align 4
  %getelementptr11 = getelementptr i8, ptr %getelementptr, i64 40
  store i64 0, ptr %getelementptr11, align 8
  %getelementptr12 = getelementptr i8, ptr %getelementptr, i64 48
  store i32 0, ptr %getelementptr12, align 8
  %getelementptr13 = getelementptr i8, ptr %getelementptr, i64 72
  store i32 0, ptr %getelementptr13, align 8
  %getelementptr14 = getelementptr i8, ptr %getelementptr, i64 80
  store i64 0, ptr %getelementptr14, align 8
  %add = add i64 %phi, 1
  br label %bb2
}

There's probably an LSR change related to its search heuristic that would address this (drop the last gep+store and LSR finds a good solution), but if LSR finds something that by its own metrics is much worse than the starting point, going ahead with it seems hard to justify.
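If it helps reviewers reproduce this, the example could be pinned down as a codegen test along these lines. This is only a sketch: the check prefixes are hypothetical, and it assumes the lsr-drop-solution option as converted to cl::boolOrDefault by this patch (so it accepts =true/=false on the command line):

; RUN: llc -mtriple=riscv64 < %s | FileCheck %s --check-prefix=DROP
; RUN: llc -mtriple=riscv64 -lsr-drop-solution=false < %s | FileCheck %s --check-prefix=NODROP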

@asb asb requested review from preames, lukel97 and topperc April 24, 2024 13:35
@llvmbot llvmbot added backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Apr 24, 2024
@llvmbot
Member

llvmbot commented Apr 24, 2024

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Author: Alex Bradbury (asb)

Patch is 52.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89927.diff

9 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+8)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+2)
  • (modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+4)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+2)
  • (modified) llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp (+15-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll (+63-66)
  • (modified) llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll (+229-245)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll (+18-18)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 58c69ac939763a..7849da9606b259 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -740,6 +740,10 @@ class TargetTransformInfo {
   /// When successful, makes the primary IV dead.
   bool shouldFoldTerminatingConditionAfterLSR() const;
 
+  /// Return true if LSR should drop a found solution if it's calculated to be
+  /// less profitable than the baseline.
+  bool shouldDropLSRSolutionIfLessProfitable() const;
+
   /// \returns true if LSR should not optimize a chain that includes \p I.
   bool isProfitableLSRChainElement(Instruction *I) const;
 
@@ -1861,6 +1865,7 @@ class TargetTransformInfo::Concept {
                              const TargetTransformInfo::LSRCost &C2) = 0;
   virtual bool isNumRegsMajorCostOfLSR() = 0;
   virtual bool shouldFoldTerminatingConditionAfterLSR() const = 0;
+  virtual bool shouldDropLSRSolutionIfLessProfitable() const = 0;
   virtual bool isProfitableLSRChainElement(Instruction *I) = 0;
   virtual bool canMacroFuseCmp() = 0;
   virtual bool canSaveCmp(Loop *L, BranchInst **BI, ScalarEvolution *SE,
@@ -2333,6 +2338,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   bool shouldFoldTerminatingConditionAfterLSR() const override {
     return Impl.shouldFoldTerminatingConditionAfterLSR();
   }
+  bool shouldDropLSRSolutionIfLessProfitable() const override {
+    return Impl.shouldDropLSRSolutionIfLessProfitable();
+  }
   bool isProfitableLSRChainElement(Instruction *I) override {
     return Impl.isProfitableLSRChainElement(I);
   }
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 4d5cd963e09267..221e851ba23c01 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -240,6 +240,8 @@ class TargetTransformInfoImplBase {
 
   bool shouldFoldTerminatingConditionAfterLSR() const { return false; }
 
+  bool shouldDropLSRSolutionIfLessProfitable() const { return false; }
+
   bool isProfitableLSRChainElement(Instruction *I) const { return false; }
 
   bool canMacroFuseCmp() const { return false; }
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 06a19c75cf873a..cfa43709e58e10 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -398,6 +398,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
         shouldFoldTerminatingConditionAfterLSR();
   }
 
+  bool shouldDropLSRSolutionIfLessProfitable() const {
+    return TargetTransformInfoImplBase::shouldDropLSRSolutionIfLessProfitable();
+  }
+
   bool isProfitableLSRChainElement(Instruction *I) {
     return TargetTransformInfoImplBase::isProfitableLSRChainElement(I);
   }
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 33c899fe889990..a8f258ab14561e 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -427,6 +427,10 @@ bool TargetTransformInfo::shouldFoldTerminatingConditionAfterLSR() const {
   return TTIImpl->shouldFoldTerminatingConditionAfterLSR();
 }
 
+bool TargetTransformInfo::shouldDropLSRSolutionIfLessProfitable() const {
+  return TTIImpl->shouldDropLSRSolutionIfLessProfitable();
+}
+
 bool TargetTransformInfo::isProfitableLSRChainElement(Instruction *I) const {
   return TTIImpl->isProfitableLSRChainElement(I);
 }
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 2f9281ab892447..fa3a88a700226f 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -397,6 +397,8 @@ class RISCVTTIImpl : public BasicTTIImplBase<RISCVTTIImpl> {
   bool shouldFoldTerminatingConditionAfterLSR() const {
     return true;
   }
+
+  bool shouldDropLSRSolutionIfLessProfitable() const { return true; }
 };
 
 } // end namespace llvm
diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index ec42e2d6e193a6..1384ecfce373c7 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -193,8 +193,8 @@ static cl::opt<cl::boolOrDefault> AllowTerminatingConditionFoldingAfterLSR(
     "lsr-term-fold", cl::Hidden,
     cl::desc("Attempt to replace primary IV with other IV."));
 
-static cl::opt<bool> AllowDropSolutionIfLessProfitable(
-    "lsr-drop-solution", cl::Hidden, cl::init(false),
+static cl::opt<cl::boolOrDefault> AllowDropSolutionIfLessProfitable(
+    "lsr-drop-solution", cl::Hidden,
     cl::desc("Attempt to drop solution if it is less profitable"));
 
 STATISTIC(NumTermFold,
@@ -5248,10 +5248,22 @@ void LSRInstance::Solve(SmallVectorImpl<const Formula *> &Solution) const {
 
   assert(Solution.size() == Uses.size() && "Malformed solution!");
 
+  const bool EnableDropUnprofitableSolution = [&] {
+    switch (AllowDropSolutionIfLessProfitable) {
+    case cl::BOU_TRUE:
+      return true;
+    case cl::BOU_FALSE:
+      return false;
+    case cl::BOU_UNSET:
+      return TTI.shouldDropLSRSolutionIfLessProfitable();
+    }
+    llvm_unreachable("Unhandled cl::boolOrDefault enum");
+  }();
+
   if (BaselineCost.isLess(SolutionCost)) {
     LLVM_DEBUG(dbgs() << "The baseline solution requires ";
                BaselineCost.print(dbgs()); dbgs() << "\n");
-    if (!AllowDropSolutionIfLessProfitable)
+    if (!EnableDropUnprofitableSolution)
       LLVM_DEBUG(
           dbgs() << "Baseline is more profitable than chosen solution, "
                     "add option 'lsr-drop-solution' to drop LSR solution.\n");
diff --git a/llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll b/llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll
index dc4d28819bbbd8..92639be0017e8f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll
@@ -86,30 +86,29 @@ declare i64 @llvm.vscale.i64()
 define void @sink_splat_add_scalable(ptr nocapture %a, i32 signext %x) {
 ; NO-SINK-LABEL: sink_splat_add_scalable:
 ; NO-SINK:       # %bb.0: # %entry
-; NO-SINK-NEXT:    csrr a5, vlenb
-; NO-SINK-NEXT:    srli a2, a5, 1
+; NO-SINK-NEXT:    csrr a2, vlenb
+; NO-SINK-NEXT:    srli a2, a2, 1
 ; NO-SINK-NEXT:    li a3, 1024
 ; NO-SINK-NEXT:    bgeu a3, a2, .LBB1_2
 ; NO-SINK-NEXT:  # %bb.1:
 ; NO-SINK-NEXT:    li a3, 0
 ; NO-SINK-NEXT:    j .LBB1_5
 ; NO-SINK-NEXT:  .LBB1_2: # %vector.ph
+; NO-SINK-NEXT:    li a5, 0
 ; NO-SINK-NEXT:    addi a3, a2, -1
 ; NO-SINK-NEXT:    andi a4, a3, 1024
 ; NO-SINK-NEXT:    xori a3, a4, 1024
 ; NO-SINK-NEXT:    vsetvli a6, zero, e32, m2, ta, ma
 ; NO-SINK-NEXT:    vmv.v.x v8, a1
-; NO-SINK-NEXT:    slli a5, a5, 1
-; NO-SINK-NEXT:    mv a6, a0
-; NO-SINK-NEXT:    mv a7, a3
 ; NO-SINK-NEXT:  .LBB1_3: # %vector.body
 ; NO-SINK-NEXT:    # =>This Inner Loop Header: Depth=1
+; NO-SINK-NEXT:    slli a6, a5, 2
+; NO-SINK-NEXT:    add a6, a0, a6
 ; NO-SINK-NEXT:    vl2re32.v v10, (a6)
 ; NO-SINK-NEXT:    vadd.vv v10, v10, v8
+; NO-SINK-NEXT:    add a5, a5, a2
 ; NO-SINK-NEXT:    vs2r.v v10, (a6)
-; NO-SINK-NEXT:    sub a7, a7, a2
-; NO-SINK-NEXT:    add a6, a6, a5
-; NO-SINK-NEXT:    bnez a7, .LBB1_3
+; NO-SINK-NEXT:    bne a5, a3, .LBB1_3
 ; NO-SINK-NEXT:  # %bb.4: # %middle.block
 ; NO-SINK-NEXT:    beqz a4, .LBB1_7
 ; NO-SINK-NEXT:  .LBB1_5: # %for.body.preheader
@@ -129,29 +128,28 @@ define void @sink_splat_add_scalable(ptr nocapture %a, i32 signext %x) {
 ;
 ; SINK-LABEL: sink_splat_add_scalable:
 ; SINK:       # %bb.0: # %entry
-; SINK-NEXT:    csrr a5, vlenb
-; SINK-NEXT:    srli a2, a5, 1
+; SINK-NEXT:    csrr a2, vlenb
+; SINK-NEXT:    srli a2, a2, 1
 ; SINK-NEXT:    li a3, 1024
 ; SINK-NEXT:    bgeu a3, a2, .LBB1_2
 ; SINK-NEXT:  # %bb.1:
 ; SINK-NEXT:    li a3, 0
 ; SINK-NEXT:    j .LBB1_5
 ; SINK-NEXT:  .LBB1_2: # %vector.ph
+; SINK-NEXT:    li a5, 0
 ; SINK-NEXT:    addi a3, a2, -1
 ; SINK-NEXT:    andi a4, a3, 1024
 ; SINK-NEXT:    xori a3, a4, 1024
-; SINK-NEXT:    slli a5, a5, 1
 ; SINK-NEXT:    vsetvli a6, zero, e32, m2, ta, ma
-; SINK-NEXT:    mv a6, a0
-; SINK-NEXT:    mv a7, a3
 ; SINK-NEXT:  .LBB1_3: # %vector.body
 ; SINK-NEXT:    # =>This Inner Loop Header: Depth=1
+; SINK-NEXT:    slli a6, a5, 2
+; SINK-NEXT:    add a6, a0, a6
 ; SINK-NEXT:    vl2re32.v v8, (a6)
 ; SINK-NEXT:    vadd.vx v8, v8, a1
+; SINK-NEXT:    add a5, a5, a2
 ; SINK-NEXT:    vs2r.v v8, (a6)
-; SINK-NEXT:    sub a7, a7, a2
-; SINK-NEXT:    add a6, a6, a5
-; SINK-NEXT:    bnez a7, .LBB1_3
+; SINK-NEXT:    bne a5, a3, .LBB1_3
 ; SINK-NEXT:  # %bb.4: # %middle.block
 ; SINK-NEXT:    beqz a4, .LBB1_7
 ; SINK-NEXT:  .LBB1_5: # %for.body.preheader
@@ -171,29 +169,28 @@ define void @sink_splat_add_scalable(ptr nocapture %a, i32 signext %x) {
 ;
 ; DEFAULT-LABEL: sink_splat_add_scalable:
 ; DEFAULT:       # %bb.0: # %entry
-; DEFAULT-NEXT:    csrr a5, vlenb
-; DEFAULT-NEXT:    srli a2, a5, 1
+; DEFAULT-NEXT:    csrr a2, vlenb
+; DEFAULT-NEXT:    srli a2, a2, 1
 ; DEFAULT-NEXT:    li a3, 1024
 ; DEFAULT-NEXT:    bgeu a3, a2, .LBB1_2
 ; DEFAULT-NEXT:  # %bb.1:
 ; DEFAULT-NEXT:    li a3, 0
 ; DEFAULT-NEXT:    j .LBB1_5
 ; DEFAULT-NEXT:  .LBB1_2: # %vector.ph
+; DEFAULT-NEXT:    li a5, 0
 ; DEFAULT-NEXT:    addi a3, a2, -1
 ; DEFAULT-NEXT:    andi a4, a3, 1024
 ; DEFAULT-NEXT:    xori a3, a4, 1024
-; DEFAULT-NEXT:    slli a5, a5, 1
 ; DEFAULT-NEXT:    vsetvli a6, zero, e32, m2, ta, ma
-; DEFAULT-NEXT:    mv a6, a0
-; DEFAULT-NEXT:    mv a7, a3
 ; DEFAULT-NEXT:  .LBB1_3: # %vector.body
 ; DEFAULT-NEXT:    # =>This Inner Loop Header: Depth=1
+; DEFAULT-NEXT:    slli a6, a5, 2
+; DEFAULT-NEXT:    add a6, a0, a6
 ; DEFAULT-NEXT:    vl2re32.v v8, (a6)
 ; DEFAULT-NEXT:    vadd.vx v8, v8, a1
+; DEFAULT-NEXT:    add a5, a5, a2
 ; DEFAULT-NEXT:    vs2r.v v8, (a6)
-; DEFAULT-NEXT:    sub a7, a7, a2
-; DEFAULT-NEXT:    add a6, a6, a5
-; DEFAULT-NEXT:    bnez a7, .LBB1_3
+; DEFAULT-NEXT:    bne a5, a3, .LBB1_3
 ; DEFAULT-NEXT:  # %bb.4: # %middle.block
 ; DEFAULT-NEXT:    beqz a4, .LBB1_7
 ; DEFAULT-NEXT:  .LBB1_5: # %for.body.preheader
@@ -407,32 +404,32 @@ define void @sink_splat_fadd_scalable(ptr nocapture %a, float %x) {
 ; NO-SINK-LABEL: sink_splat_fadd_scalable:
 ; NO-SINK:       # %bb.0: # %entry
 ; NO-SINK-NEXT:    csrr a1, vlenb
-; NO-SINK-NEXT:    srli a2, a1, 2
-; NO-SINK-NEXT:    li a3, 1024
-; NO-SINK-NEXT:    bgeu a3, a2, .LBB4_2
+; NO-SINK-NEXT:    srli a1, a1, 2
+; NO-SINK-NEXT:    li a2, 1024
+; NO-SINK-NEXT:    bgeu a2, a1, .LBB4_2
 ; NO-SINK-NEXT:  # %bb.1:
-; NO-SINK-NEXT:    li a3, 0
+; NO-SINK-NEXT:    li a2, 0
 ; NO-SINK-NEXT:    j .LBB4_5
 ; NO-SINK-NEXT:  .LBB4_2: # %vector.ph
-; NO-SINK-NEXT:    addi a3, a2, -1
-; NO-SINK-NEXT:    andi a4, a3, 1024
-; NO-SINK-NEXT:    xori a3, a4, 1024
+; NO-SINK-NEXT:    li a4, 0
+; NO-SINK-NEXT:    addi a2, a1, -1
+; NO-SINK-NEXT:    andi a3, a2, 1024
+; NO-SINK-NEXT:    xori a2, a3, 1024
 ; NO-SINK-NEXT:    vsetvli a5, zero, e32, m1, ta, ma
 ; NO-SINK-NEXT:    vfmv.v.f v8, fa0
-; NO-SINK-NEXT:    mv a5, a0
-; NO-SINK-NEXT:    mv a6, a3
 ; NO-SINK-NEXT:  .LBB4_3: # %vector.body
 ; NO-SINK-NEXT:    # =>This Inner Loop Header: Depth=1
+; NO-SINK-NEXT:    slli a5, a4, 2
+; NO-SINK-NEXT:    add a5, a0, a5
 ; NO-SINK-NEXT:    vl1re32.v v9, (a5)
 ; NO-SINK-NEXT:    vfadd.vv v9, v9, v8
+; NO-SINK-NEXT:    add a4, a4, a1
 ; NO-SINK-NEXT:    vs1r.v v9, (a5)
-; NO-SINK-NEXT:    sub a6, a6, a2
-; NO-SINK-NEXT:    add a5, a5, a1
-; NO-SINK-NEXT:    bnez a6, .LBB4_3
+; NO-SINK-NEXT:    bne a4, a2, .LBB4_3
 ; NO-SINK-NEXT:  # %bb.4: # %middle.block
-; NO-SINK-NEXT:    beqz a4, .LBB4_7
+; NO-SINK-NEXT:    beqz a3, .LBB4_7
 ; NO-SINK-NEXT:  .LBB4_5: # %for.body.preheader
-; NO-SINK-NEXT:    slli a1, a3, 2
+; NO-SINK-NEXT:    slli a1, a2, 2
 ; NO-SINK-NEXT:    add a1, a0, a1
 ; NO-SINK-NEXT:    lui a2, 1
 ; NO-SINK-NEXT:    add a0, a0, a2
@@ -449,31 +446,31 @@ define void @sink_splat_fadd_scalable(ptr nocapture %a, float %x) {
 ; SINK-LABEL: sink_splat_fadd_scalable:
 ; SINK:       # %bb.0: # %entry
 ; SINK-NEXT:    csrr a1, vlenb
-; SINK-NEXT:    srli a2, a1, 2
-; SINK-NEXT:    li a3, 1024
-; SINK-NEXT:    bgeu a3, a2, .LBB4_2
+; SINK-NEXT:    srli a1, a1, 2
+; SINK-NEXT:    li a2, 1024
+; SINK-NEXT:    bgeu a2, a1, .LBB4_2
 ; SINK-NEXT:  # %bb.1:
-; SINK-NEXT:    li a3, 0
+; SINK-NEXT:    li a2, 0
 ; SINK-NEXT:    j .LBB4_5
 ; SINK-NEXT:  .LBB4_2: # %vector.ph
-; SINK-NEXT:    addi a3, a2, -1
-; SINK-NEXT:    andi a4, a3, 1024
-; SINK-NEXT:    xori a3, a4, 1024
+; SINK-NEXT:    li a4, 0
+; SINK-NEXT:    addi a2, a1, -1
+; SINK-NEXT:    andi a3, a2, 1024
+; SINK-NEXT:    xori a2, a3, 1024
 ; SINK-NEXT:    vsetvli a5, zero, e32, m1, ta, ma
-; SINK-NEXT:    mv a5, a0
-; SINK-NEXT:    mv a6, a3
 ; SINK-NEXT:  .LBB4_3: # %vector.body
 ; SINK-NEXT:    # =>This Inner Loop Header: Depth=1
+; SINK-NEXT:    slli a5, a4, 2
+; SINK-NEXT:    add a5, a0, a5
 ; SINK-NEXT:    vl1re32.v v8, (a5)
 ; SINK-NEXT:    vfadd.vf v8, v8, fa0
+; SINK-NEXT:    add a4, a4, a1
 ; SINK-NEXT:    vs1r.v v8, (a5)
-; SINK-NEXT:    sub a6, a6, a2
-; SINK-NEXT:    add a5, a5, a1
-; SINK-NEXT:    bnez a6, .LBB4_3
+; SINK-NEXT:    bne a4, a2, .LBB4_3
 ; SINK-NEXT:  # %bb.4: # %middle.block
-; SINK-NEXT:    beqz a4, .LBB4_7
+; SINK-NEXT:    beqz a3, .LBB4_7
 ; SINK-NEXT:  .LBB4_5: # %for.body.preheader
-; SINK-NEXT:    slli a1, a3, 2
+; SINK-NEXT:    slli a1, a2, 2
 ; SINK-NEXT:    add a1, a0, a1
 ; SINK-NEXT:    lui a2, 1
 ; SINK-NEXT:    add a0, a0, a2
@@ -490,31 +487,31 @@ define void @sink_splat_fadd_scalable(ptr nocapture %a, float %x) {
 ; DEFAULT-LABEL: sink_splat_fadd_scalable:
 ; DEFAULT:       # %bb.0: # %entry
 ; DEFAULT-NEXT:    csrr a1, vlenb
-; DEFAULT-NEXT:    srli a2, a1, 2
-; DEFAULT-NEXT:    li a3, 1024
-; DEFAULT-NEXT:    bgeu a3, a2, .LBB4_2
+; DEFAULT-NEXT:    srli a1, a1, 2
+; DEFAULT-NEXT:    li a2, 1024
+; DEFAULT-NEXT:    bgeu a2, a1, .LBB4_2
 ; DEFAULT-NEXT:  # %bb.1:
-; DEFAULT-NEXT:    li a3, 0
+; DEFAULT-NEXT:    li a2, 0
 ; DEFAULT-NEXT:    j .LBB4_5
 ; DEFAULT-NEXT:  .LBB4_2: # %vector.ph
-; DEFAULT-NEXT:    addi a3, a2, -1
-; DEFAULT-NEXT:    andi a4, a3, 1024
-; DEFAULT-NEXT:    xori a3, a4, 1024
+; DEFAULT-NEXT:    li a4, 0
+; DEFAULT-NEXT:    addi a2, a1, -1
+; DEFAULT-NEXT:    andi a3, a2, 1024
+; DEFAULT-NEXT:    xori a2, a3, 1024
 ; DEFAULT-NEXT:    vsetvli a5, zero, e32, m1, ta, ma
-; DEFAULT-NEXT:    mv a5, a0
-; DEFAULT-NEXT:    mv a6, a3
 ; DEFAULT-NEXT:  .LBB4_3: # %vector.body
 ; DEFAULT-NEXT:    # =>This Inner Loop Header: Depth=1
+; DEFAULT-NEXT:    slli a5, a4, 2
+; DEFAULT-NEXT:    add a5, a0, a5
 ; DEFAULT-NEXT:    vl1re32.v v8, (a5)
 ; DEFAULT-NEXT:    vfadd.vf v8, v8, fa0
+; DEFAULT-NEXT:    add a4, a4, a1
 ; DEFAULT-NEXT:    vs1r.v v8, (a5)
-; DEFAULT-NEXT:    sub a6, a6, a2
-; DEFAULT-NEXT:    add a5, a5, a1
-; DEFAULT-NEXT:    bnez a6, .LBB4_3
+; DEFAULT-NEXT:    bne a4, a2, .LBB4_3
 ; DEFAULT-NEXT:  # %bb.4: # %middle.block
-; DEFAULT-NEXT:    beqz a4, .LBB4_7
+; DEFAULT-NEXT:    beqz a3, .LBB4_7
 ; DEFAULT-NEXT:  .LBB4_5: # %for.body.preheader
-; DEFAULT-NEXT:    slli a1, a3, 2
+; DEFAULT-NEXT:    slli a1, a2, 2
 ; DEFAULT-NEXT:    add a1, a0, a1
 ; DEFAULT-NEXT:    lui a2, 1
 ; DEFAULT-NEXT:    add a0, a0, a2
diff --git a/llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll b/llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll
index 9046c861c3367a..94b5996b768f80 100644
--- a/llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll
@@ -243,29 +243,28 @@ for.cond.cleanup:                                 ; preds = %vector.body
 define void @sink_splat_mul_scalable(ptr nocapture %a, i32 signext %x) {
 ; CHECK-LABEL: sink_splat_mul_scalable:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    csrr a5, vlenb
-; CHECK-NEXT:    srli a2, a5, 1
+; CHECK-NEXT:    csrr a2, vlenb
+; CHECK-NEXT:    srli a2, a2, 1
 ; CHECK-NEXT:    li a3, 1024
 ; CHECK-NEXT:    bgeu a3, a2, .LBB7_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a3, 0
 ; CHECK-NEXT:    j .LBB7_5
 ; CHECK-NEXT:  .LBB7_2: # %vector.ph
+; CHECK-NEXT:    li a5, 0
 ; CHECK-NEXT:    addi a3, a2, -1
 ; CHECK-NEXT:    andi a4, a3, 1024
 ; CHECK-NEXT:    xori a3, a4, 1024
-; CHECK-NEXT:    slli a5, a5, 1
 ; CHECK-NEXT:    vsetvli a6, zero, e32, m2, ta, ma
-; CHECK-NEXT:    mv a6, a0
-; CHECK-NEXT:    mv a7, a3
 ; CHECK-NEXT:  .LBB7_3: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    slli a6, a5, 2
+; CHECK-NEXT:    add a6, a0, a6
 ; CHECK-NEXT:    vl2re32.v v8, (a6)
 ; CHECK-NEXT:    vmul.vx v8, v8, a1
+; CHECK-NEXT:    add a5, a5, a2
 ; CHECK-NEXT:    vs2r.v v8, (a6)
-; CHECK-NEXT:    sub a7, a7, a2
-; CHECK-NEXT:    add a6, a6, a5
-; CHECK-NEXT:    bnez a7, .LBB7_3
+; CHECK-NEXT:    bne a5, a3, .LBB7_3
 ; CHECK-NEXT:  # %bb.4: # %middle.block
 ; CHECK-NEXT:    beqz a4, .LBB7_7
 ; CHECK-NEXT:  .LBB7_5: # %for.body.preheader
@@ -334,29 +333,28 @@ for.body:                                         ; preds = %for.body.preheader,
 define void @sink_splat_add_scalable(ptr nocapture %a, i32 signext %x) {
 ; CHECK-LABEL: sink_splat_add_scalable:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    csrr a5, vlenb
-; CHECK-NEXT:    srli a2, a5, 1
+; CHECK-NEXT:    csrr a2, vlenb
+; CHECK-NEXT:    srli a2, a2, 1
 ; CHECK-NEXT:    li a3, 1024
 ; CHECK-NEXT:    bgeu a3, a2, .LBB8_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a3, 0
 ; CHECK-NEXT:    j .LBB8_5
 ; CHECK-NEXT:  .LBB8_2: # %vector.ph
+; CHECK-NEXT:    li a5, 0
 ; CHECK-NEXT:    addi a3, a2, -1
 ; CHECK-NEXT:    andi a4, a3, 1024
 ; CHECK-NEXT:    xori a3, a4, 1024
-; CHECK-NEXT:    slli a5, a5, 1
 ; CHECK-NEXT:    vsetvli a6, zero, e32, m2, ta, ma
-; CHECK-NEXT:    mv a6, a0
-; CHECK-NEXT:    mv a7, a3
 ; CHECK-NEXT:  .LBB8_3: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    slli a6, a5, 2
+; CHECK-NEXT:    add a6, a0, a6
 ; CHECK-NEXT:    vl2re32.v v8, (a6)
 ; CHECK-NEXT:    vadd.vx v8, v8, a1
+; CHECK-NEXT:    add a5, a5, a2
 ; CHECK-NEXT:    vs2r.v v8, (a6)
-; CHECK-NEXT:    sub a7, a7, a2
-; CHECK-NEXT:    add a6, a6, a5
-; CHECK-NEXT:    bnez a7, .LBB8_3
+; CHECK-NEXT:    bne a5, a3, .LBB8_3
 ; CHECK-NEXT:  # %bb.4: # %middle.block
 ; CHECK-NEXT:    beqz a4, .LBB8_7
 ; CHECK-NEXT:  .LBB8_5: # %for.body.preheader
@@ -425,29 +423,28 @@ for.body:                                         ; preds = %for.body.preheader,
 define void @sink_splat_sub_scalable(ptr nocapture %a, i32 signext %x) {
 ; CHECK-LABEL: sink_splat_sub_scalable:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    csrr a5, vlenb
-; CHECK-NEXT:    srli a2, a5, 1
+; CHECK-NEXT:    csrr a2, vlenb
+; CHECK-NEXT:    srli a2, a2, 1
 ; CHECK-NEXT:    li a3, 1024
 ; CHECK-NEXT:    bgeu a3, a2, .LBB9_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a3, 0
 ; CHECK-NEXT:    j .LBB9_5
 ; CHECK-NEXT:  .LBB9_2: # %vector.ph
+; CHECK-NEXT:    li a5, 0
 ; CHECK-NEXT:    addi a3, a2, -1
 ; CHECK-NEXT:    andi a4, a3, 1024
 ; CHECK-NEXT:    xori a3, a4, 1024
-; CHECK-NEXT:    slli a5, a5, 1
 ; CHECK-NEXT:    vsetvli a6, zero, e32, m2, ta, ma
-; CHECK-NEXT:    mv a6, a0
-; CHECK-NEXT:    mv a7, a3
 ; CHECK-NEXT:  .LBB9_3: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    slli a6, a5, 2
+; CHECK-NEXT:    add a6, a0, a6
 ; CHECK-NEXT:    vl2re32.v v8, (a6)
 ; CHECK-NEXT:    vsub.vx v8, v8, a1
+; CHECK-NEXT:    add a5, a5, a2
 ; CHECK-NEXT:    vs2r.v v8, (a6)
-; CHECK-NEXT:    sub a7, a7, a2
-; CHECK-NEXT:    add a6, a6, a5
-; CHECK-NEXT:    bnez a7, .LBB9_3
+; CHECK-NEXT:    bne a5, a3, .LBB9_3
 ; CHECK-NEXT:  # %bb.4: # %middle.block
 ; CHECK-NEXT:    beqz a4, .LBB9_7
 ; CHECK-NEXT:  .LBB9_5: # %for.body.preheader
@@ -516,29 +513,28 @@ for.body:                                         ; preds = %for.body.preheader,
 define void @sink_splat_rsub_scalable(ptr nocapture %a, i32 signext %x) {
 ; CHECK-LABEL: sink_splat_rsub_scalable:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    csrr a5, vlenb
-; CHECK-NEXT:    srli a2, a5, 1
+; CHECK-NEXT:    csrr a2, vlenb
+; CHECK-NEXT:    srli a2, a2, 1
 ; CHECK-NEXT:    li a3, 1024
 ; CHECK-NEXT:    bgeu a3, a2, .LBB10_2
 ; CHECK-NEXT:  ...
[truncated]
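For targets other than RISC-V, the opt-in has the same shape as the RISCVTargetTransformInfo.h hunk above: override the new hook in the target's TTI implementation. A minimal sketch for a hypothetical out-of-tree target (class name invented; constructors and the rest of the usual TTI boilerplate omitted):

// Hypothetical target TTI opting in to dropping unprofitable LSR solutions.
class MyTargetTTIImpl : public llvm::BasicTTIImplBase<MyTargetTTIImpl> {
public:
  // LSRInstance::Solve() consults this when -lsr-drop-solution is unset;
  // returning true makes LSR keep the original IR whenever the baseline
  // cost is lower than the cost of the solution it found.
  bool shouldDropLSRSolutionIfLessProfitable() const { return true; }
};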

@asb
Contributor Author

asb commented May 7, 2024

Although this is blocked on #89924 (which moves the option to a hook), I'd be interested in any feedback or thoughts on this direction. Dropping LSR solutions that our metrics deem unprofitable, and working to improve the heuristics if that causes any unexpected degradation, seems sensible to me. But there may be history with work in this area I'm not aware of.

; NO-SINK-NEXT: vl2re32.v v10, (a6)
; NO-SINK-NEXT: vadd.vv v10, v10, v8
; NO-SINK-NEXT: add a5, a5, a2
Collaborator

These diffs seem likely unprofitable, but more than that, they seem weird. The solution being chosen should "clearly" be higher cost than the one being rejected here. But more than that, the previous solution was also sub-optimal.

At first, I thought this might relate to the shifted add and went looking for a case where we forgot to guard a shNadd instruction on Zba. However, that turned out to be a red herring.

However, when doing that investigation, I noticed something. We have two different implementations of isLegalAddressingMode: one on TLI and one on TTI. We have overridden the former, but don't appear to have done the latter.

The default implementation on TTIImpl assumes reg+reg addressing is legal. For RISCV vector, this is not a legal addressing mode. I think what we're seeing here is that LSR is computing a bad cost for basically all addresses used by vector loads and stores, and that the actual effect of this change is basically noise.

I suspect that if we implement TTI::isLegalAddressingMode, we'll see significant improvements in LSR solutions. Once we do that, we can revisit this patch.

Collaborator

It looks like I got myself confused here. BasicTTI does perform the proxy to TLI, so these implementations should be the same.

@lukel97
Contributor

lukel97 commented May 15, 2024

Would it be useful to pre-commit the example function in the PR description as a test?

@asb asb force-pushed the 2024q2-lsr-riscv-drop-unprofitable-solutions-by-default branch from f1a1777 to 0dcdd13 Compare May 15, 2024 09:41
@asb
Contributor Author

asb commented May 15, 2024

Thank you for taking a look. I've gone ahead and rebased to check that the changes remain similar.

My main goal in posting this quickly was to see if there was any memory of previous efforts in this direction, given the original LSR flag came from RISC-V contributors. It sounds like there's not!

I agree that the changes to the in-tree vector code are a minor degradation - multiple cases where we have an additional scalar instruction in the loop. Not a big regression, but it's something and probably worth understanding. This patch also misses the test/Transforms/LoopStrengthReduce/RISCV changes, which merit a closer look.

"Noise" is a good description and roughly what I would expect outside of the case I found or those like it. The heuristics for LSR seem very rough and potentially misleading in some cases, so although I think dropping solutions we view as less profitable than the starting point is a sensible way to go and a better basis on which to build any further LSR improvements, it's likely to cause minor variations that aren't much better or worse in many cases.

I think based on the discussion here the next sensible steps are:

  • Precommit the test case in the PR
    • It seems a useful example to have coverage for regardless of whether this ends up landing or not.
  • Look more closely at why we see a minor regression in some of the changed vector test cases
  • Run across a wider range of external inputs to look for surprises
    • I'd wondered if people had already done this, but it seems not

@topperc
Collaborator

topperc commented May 22, 2024

We've been using -lsr-drop-solution for some benchmarks in our downstream for a while. Overall, I think it has been a net improvement.

@preames
Collaborator

preames commented May 22, 2024

> We've been using -lsr-drop-solution for some benchmarks in our downstream for a while. Overall, I think it has been a net improvement.

This is good to hear. I have no downstream experience with this change, but also have no objection to landing this.

Given that, LGTM. We can iterate to improve the couple of regressions in tree.

asb added a commit that referenced this pull request Jun 5, 2024
…ofitable (#89924)

<https://reviews.llvm.org/D126043> introduced a flag to drop solutions
if deemed unprofitable. As noted there, introducing a TTI hook enables
backends to individually opt into this behaviour.

This will be used by #89927.
@asb asb force-pushed the 2024q2-lsr-riscv-drop-unprofitable-solutions-by-default branch from 0dcdd13 to a0e87e3 Compare June 5, 2024 17:55
@asb asb force-pushed the 2024q2-lsr-riscv-drop-unprofitable-solutions-by-default branch from a0e87e3 to 904473d Compare June 5, 2024 19:02
@asb
Contributor Author

asb commented Jun 5, 2024

Sorry for the delay, I was away on holiday. I've rebased, precommitted the example I used in the original description, and updated the affected tests in llvm/test/Transforms/LoopStrengthReduce/RISCV that were missed in the first patch.

@preames I think you wrote the affected LoopStrengthReduce/RISCV tests - I just wanted to reconfirm you're happy with the test changes shown in these files, and if you think any additional comments should be added.

@preames
Collaborator

preames commented Jun 5, 2024

> @preames I think you wrote the affected LoopStrengthReduce/RISCV tests - I just wanted to reconfirm you're happy with the test changes shown in these files, and if you think any additional comments should be added.

The reasoning from my earlier LGTM still applies. I glanced at the LSR changes and see a couple of regressions, but I still think the right path forward is to enable this and then iteratively improve.

@topperc
Collaborator

topperc commented Jul 3, 2024

What's the status of this? @asb can you rebase?

@asb asb closed this Jul 10, 2024
@asb asb force-pushed the 2024q2-lsr-riscv-drop-unprofitable-solutions-by-default branch from 904473d to a37c35e Compare July 10, 2024 08:20
This avoids some cases where LSR produces results that lead to very poor
codegen. There's a chance we'll see minor degradations for some inputs
in the case that our metrics say the found solution is worse, but in
reality it's better than the starting point.

Per the review thread, at least one vendor has been enabling this by
default for some time and found overall it's an improvement. As such,
we'll enable it by default and aim to fix any as-yet-unknown regressions
in-tree.
@asb asb reopened this Jul 10, 2024
@asb
Contributor Author

asb commented Jul 10, 2024

> What's the status of this? @asb can you rebase?

I think, given the positive review from Philip and your note that you've been using this downstream for some time, it's good to land. I'd missed that I hadn't actually hit the merge button yet. I've rebased and will merge now.

@asb asb merged commit af47a4e into llvm:main Jul 10, 2024
7 checks passed
dtcxzyw added a commit to dtcxzyw/llvm-project that referenced this pull request Jul 10, 2024
@dtcxzyw
Member

dtcxzyw commented Jul 10, 2024

Hi @asb, I found this patch caused some significant performance regressions (up to 37.8%) on llvm-test-suite with rv64gc_zba_zbb_zbkb_zbs -O3 -flto=thin. Can you take a look at this?

| Name | Before (Billion Insts) | After (Billion Insts) | Ratio |
|---|---|---|---|
| SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen | 4.222953611 | 5.820735229 | 1.378 |
| SingleSource/Benchmarks/Polybench/datamining/covariance/covariance | 4.090009899 | 5.593993395 | 1.368 |
| SingleSource/Benchmarks/Polybench/datamining/correlation/correlation | 4.095016076 | 5.599001576 | 1.367 |
| SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect | 0.206776896 | 0.267266747 | 1.293 |
| SingleSource/Benchmarks/Shootout/Shootout-ary3 | 10.507102477 | 13.507101469 | 1.286 |
| SingleSource/Benchmarks/Polybench/linear-algebra/kernels/mvt/mvt | 0.576970074 | 0.720934057 | 1.250 |
| SingleSource/Benchmarks/Polybench/linear-algebra/solvers/durbin/durbin | 0.535427699 | 0.663343689 | 1.239 |
| SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv | 0.160518143 | 0.192501995 | 1.199 |
| SingleSource/Benchmarks/Polybench/stencils/adi/adi | 3.382377197 | 4.017469438 | 1.188 |
| MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan | 0.284736344 | 0.334505345 | 1.175 |
| SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax | 0.368520477 | 0.43248861 | 1.174 |
| SingleSource/Benchmarks/BenchmarkGame/puzzle | 3.092676262 | 3.592676238 | 1.162 |
| SingleSource/Benchmarks/Misc/himenobmtxpa | 2.368697352 | 2.746954721 | 1.160 |
| MultiSource/Benchmarks/Rodinia/srad/srad | 2.247984884 | 2.600044498 | 1.157 |

For detailed performance data, please refer to plctlab/llvm-ci#1165 and plctlab/llvm-ci#1166.

asb added a commit that referenced this pull request Jul 10, 2024
…default" (#98328)

Reverts #89927 while we investigate performance
regressions reported by @dtcxzyw
@asb
Contributor Author

asb commented Jul 10, 2024

@dtcxzyw thank you, that's definitely more impact than I was expecting so I've reverted while it's still easy to do so in order to allow proper investigation.

aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
…lvm#89927)

This avoids some cases where LSR produces results that lead to very poor
codegen. There's a chance we'll see minor degradations for some inputs
in the case that our metrics say the found solution is worse, but in
reality it's better than the starting point.

Per the review thread, at least one vendor has been enabling this by
default for some time and found overall it's an improvement. As such,
we'll enable it by default and aim to fix any as-yet-unknown regressions
in-tree.
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
…default" (llvm#98328)

Reverts llvm#89927 while we investigate performance
regressions reported by @dtcxzyw