
[LV][EVL] Introduce the EVLIndVarSimplify Pass for EVL-vectorized loops #131005


Conversation

mshockwave
Member

This is the renovated version of #91796. That PR was getting a bit too old and I used a slightly different approach than my original plan, so I thought it would be easier to open another PR.

There are two major differences compared with #91796:

  • We now rely on the new llvm.loop.isvectorized.withevl metadata from [LV][EVL] Attach a new metadata on EVL vectorized loops #131000 to identify EVL-vectorized loops (a sketch of the resulting loop metadata follows this list). I believe this approach is much safer and addresses many of the correctness concerns from the original PR. I still keep the IR pattern-matching logic for extracting the induction variable, just in case.
  • This Pass now runs right after the vectorizer Passes, instead of right before codegen. This minimizes disturbance to the shape of the vectorized loops that this Pass expects.
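
For reference, this is roughly what the loop metadata the Pass keys on looks like. This is a hand-written sketch based on the test updates in the diff below; the metadata node numbers are arbitrary:

  br i1 %done, label %exit, label %vector.body, !llvm.loop !0

  !0 = distinct !{!0, !1, !2, !3}
  !1 = !{!"llvm.loop.isvectorized", i32 1}
  !2 = !{!"llvm.loop.isvectorized.withevl", i32 1}
  !3 = !{!"llvm.loop.unroll.runtime.disable"}

The Pass bails out unless both llvm.loop.isvectorized and llvm.loop.isvectorized.withevl are present; the latter is what the new VPlan::execute hunk attaches.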

Here is the revised patch description:
When we enable EVL-based loop vectorization with predicated tail-folding, each vectorized loop effectively has two induction variables: one steps by (VF x vscale) and the other is incremented by the values returned from llvm.experimental.get.vector.length. The former, also known as canonical IV, is more favorable for analyses as it's "countable" in the sense of SCEV; the latter (EVL-based IV), however, is more favorable for codegen, at least on targets that support scalable vectors like AArch64 SVE and RISC-V.

The idea is to use the canonical IV all the way until the end of all the vectorizers, at which point we replace it with the EVL-based IV using the EVLIVSimplify Pass introduced here, so that we get the best of both worlds.
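
To make the transformation concrete, here is a hand-written before/after sketch of an EVL-vectorized loop, assuming VF=4, an i64 trip count %N, and simplified value names (this is an illustration, not an excerpt from the new tests):

; Before: two IVs. %index is the canonical IV stepping by 4 * vscale, while
; %evl.based.iv advances by whatever llvm.experimental.get.vector.length returns.
vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %evl.based.iv = phi i64 [ 0, %vector.ph ], [ %index.evl.next, %vector.body ]
  %avl = sub i64 %N, %evl.based.iv
  %evl = call i32 @llvm.experimental.get.vector.length.i64(i64 %avl, i32 4, i1 true)
  ; ... predicated vector body using %evl ...
  %evl.zext = zext i32 %evl to i64
  %index.evl.next = add i64 %evl.zext, %evl.based.iv
  %index.next = add nuw i64 %index, %step            ; %step = 4 * vscale
  %done = icmp eq i64 %index.next, %n.vec
  br i1 %done, label %middle.block, label %vector.body

; After EVLIVSimplify: the latch compare is rewritten against the EVL-based IV
; and the now-dead canonical IV is removed.
vector.body:
  %evl.based.iv = phi i64 [ 0, %vector.ph ], [ %index.evl.next, %vector.body ]
  %avl = sub i64 %N, %evl.based.iv
  %evl = call i32 @llvm.experimental.get.vector.length.i64(i64 %avl, i32 4, i1 true)
  ; ... predicated vector body using %evl ...
  %evl.zext = zext i32 %evl to i64
  %index.evl.next = add i64 %evl.zext, %evl.based.iv
  %done = icmp eq i64 %index.evl.next, %N
  br i1 %done, label %middle.block, label %vector.body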

This Pass is enabled by default for RISC-V. However, since we don't yet vectorize loops with predicated tail-folding by default, this Pass is currently a no-op.
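
For anyone who wants to exercise it in isolation: the Pass is registered under the name evl-iv-simplify (see the PassRegistry.def change below) and guarded by the -enable-evl-indvar-simplify option, so a RUN line along these lines should drive it directly. The exact invocation here is my illustration rather than a copy of the new test:

; Illustrative standalone invocation (the pass name and cl::opt come from this patch):
; RUN: opt -passes='loop(evl-iv-simplify)' -S %s | FileCheck %s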

I have validated the correctness of this Pass by enabling EVL-based LV + predicated tail-folding and running SPEC2006 INT & FP with the ref workload. Performance-wise, I observed an average 0.89% improvement in dynamic instruction count on sifive-p470. Notably, 401.bzip2's dynamic instruction count improved by 8.4%.

This stacks on top of #131000

@llvmbot
Member

llvmbot commented Mar 12, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)



Patch is 114.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131005.diff

19 Files Affected:

  • (added) llvm/include/llvm/Transforms/Vectorize/EVLIndVarSimplify.h (+31)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+1)
  • (modified) llvm/lib/Passes/PassRegistry.def (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+7)
  • (modified) llvm/lib/Transforms/Vectorize/CMakeLists.txt (+1)
  • (added) llvm/lib/Transforms/Vectorize/EVLIndVarSimplify.cpp (+242)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+26)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll (+6-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll (+6-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll (+74-73)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll (+38-37)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll (+70-69)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-div.ll (+18-17)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll (+14-13)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-uniform-store.ll (+6-5)
  • (added) llvm/test/Transforms/LoopVectorize/evl-iv-simplify.ll (+333)
diff --git a/llvm/include/llvm/Transforms/Vectorize/EVLIndVarSimplify.h b/llvm/include/llvm/Transforms/Vectorize/EVLIndVarSimplify.h
new file mode 100644
index 0000000000000..3178dc762a195
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Vectorize/EVLIndVarSimplify.h
@@ -0,0 +1,31 @@
+//===------ EVLIndVarSimplify.h - Optimize vectorized loops w/ EVL IV------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass optimizes a vectorized loop with a canonical IV to use an EVL-based
+// IV if the loop was tail-folded with EVL predication.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_VECTORIZE_EVLINDVARSIMPLIFY_H
+#define LLVM_TRANSFORMS_VECTORIZE_EVLINDVARSIMPLIFY_H
+
+#include "llvm/Analysis/LoopAnalysisManager.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+class Loop;
+class LPMUpdater;
+
+/// Turn vectorized loops with canonical induction variables into loops that
+/// only use a single EVL-based induction variable.
+struct EVLIndVarSimplifyPass : public PassInfoMixin<EVLIndVarSimplifyPass> {
+  PreservedAnalyses run(Loop &L, LoopAnalysisManager &LAM,
+                        LoopStandardAnalysisResults &AR, LPMUpdater &U);
+};
+} // namespace llvm
+#endif
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 8d1273e3631c3..77872cba04404 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -356,6 +356,7 @@
 #include "llvm/Transforms/Utils/SymbolRewriter.h"
 #include "llvm/Transforms/Utils/UnifyFunctionExitNodes.h"
 #include "llvm/Transforms/Utils/UnifyLoopExits.h"
+#include "llvm/Transforms/Vectorize/EVLIndVarSimplify.h"
 #include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"
 #include "llvm/Transforms/Vectorize/LoopIdiomVectorize.h"
 #include "llvm/Transforms/Vectorize/LoopVectorize.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 07db107325f02..f7a4dd1538e55 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -143,6 +143,7 @@
 #include "llvm/Transforms/Utils/NameAnonGlobals.h"
 #include "llvm/Transforms/Utils/RelLookupTableConverter.h"
 #include "llvm/Transforms/Utils/SimplifyCFGOptions.h"
+#include "llvm/Transforms/Vectorize/EVLIndVarSimplify.h"
 #include "llvm/Transforms/Vectorize/LoopVectorize.h"
 #include "llvm/Transforms/Vectorize/SLPVectorizer.h"
 #include "llvm/Transforms/Vectorize/VectorCombine.h"
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index bfd952df25e98..1ed47ca15577b 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -661,6 +661,7 @@ LOOP_ANALYSIS("should-run-extra-simple-loop-unswitch",
 #endif
 LOOP_PASS("canon-freeze", CanonicalizeFreezeInLoopsPass())
 LOOP_PASS("dot-ddg", DDGDotPrinterPass())
+LOOP_PASS("evl-iv-simplify", EVLIndVarSimplifyPass())
 LOOP_PASS("guard-widening", GuardWideningPass())
 LOOP_PASS("extra-simple-loop-unswitch-passes",
               ExtraLoopPassManager<ShouldRunExtraSimpleLoopUnswitch>())
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index f78e5f8147d98..11ce53deafb2f 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -37,6 +37,7 @@
 #include "llvm/Target/TargetOptions.h"
 #include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/Scalar.h"
+#include "llvm/Transforms/Vectorize/EVLIndVarSimplify.h"
 #include "llvm/Transforms/Vectorize/LoopIdiomVectorize.h"
 #include <optional>
 using namespace llvm;
@@ -639,6 +640,12 @@ void RISCVTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
                                                  OptimizationLevel Level) {
     LPM.addPass(LoopIdiomVectorizePass(LoopIdiomVectorizeStyle::Predicated));
   });
+
+  PB.registerVectorizerEndEPCallback(
+      [](FunctionPassManager &FPM, OptimizationLevel Level) {
+        if (Level.isOptimizingForSpeed())
+          FPM.addPass(createFunctionToLoopPassAdaptor(EVLIndVarSimplifyPass()));
+      });
 }
 
 yaml::MachineFunctionInfo *
diff --git a/llvm/lib/Transforms/Vectorize/CMakeLists.txt b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
index 7dac6d0059b26..85555158645f3 100644
--- a/llvm/lib/Transforms/Vectorize/CMakeLists.txt
+++ b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
@@ -1,4 +1,5 @@
 add_llvm_component_library(LLVMVectorize
+  EVLIndVarSimplify.cpp
   LoadStoreVectorizer.cpp
   LoopIdiomVectorize.cpp
   LoopVectorizationLegality.cpp
diff --git a/llvm/lib/Transforms/Vectorize/EVLIndVarSimplify.cpp b/llvm/lib/Transforms/Vectorize/EVLIndVarSimplify.cpp
new file mode 100644
index 0000000000000..8ffe287c183f1
--- /dev/null
+++ b/llvm/lib/Transforms/Vectorize/EVLIndVarSimplify.cpp
@@ -0,0 +1,242 @@
+//===---- EVLIndVarSimplify.cpp - Optimize vectorized loops w/ EVL IV------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass optimizes a vectorized loop with a canonical IV to use an EVL-based
+// IV if the loop was tail-folded with EVL predication.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Vectorize/EVLIndVarSimplify.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/IVDescriptors.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/Analysis/ScalarEvolutionExpressions.h"
+#include "llvm/Analysis/ValueTracking.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/MathExtras.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/Scalar/LoopPassManager.h"
+#include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
+
+#define DEBUG_TYPE "evl-iv-simplify"
+
+using namespace llvm;
+
+STATISTIC(NumEliminatedCanonicalIV, "Number of canonical IVs we eliminated");
+
+static cl::opt<bool> EnableEVLIndVarSimplify(
+    "enable-evl-indvar-simplify",
+    cl::desc("Enable EVL-based induction variable simplify Pass"), cl::Hidden,
+    cl::init(true));
+
+namespace {
+struct EVLIndVarSimplifyImpl {
+  ScalarEvolution &SE;
+
+  explicit EVLIndVarSimplifyImpl(LoopStandardAnalysisResults &LAR)
+      : SE(LAR.SE) {}
+
+  explicit EVLIndVarSimplifyImpl(ScalarEvolution &SE) : SE(SE) {}
+
+  // Returns true if the loop was modified.
+  bool run(Loop &L);
+};
+} // anonymous namespace
+
+// Returns the constant part of vectorization factor from the induction
+// variable's step value SCEV expression.
+static uint32_t getVFFromIndVar(const SCEV *Step, const Function &F) {
+  if (!Step)
+    return 0U;
+
+  // Looking for loops with IV step value in the form of `(<constant VF> x
+  // vscale)`.
+  if (auto *Mul = dyn_cast<SCEVMulExpr>(Step)) {
+    if (Mul->getNumOperands() == 2) {
+      const SCEV *LHS = Mul->getOperand(0);
+      const SCEV *RHS = Mul->getOperand(1);
+      if (auto *Const = dyn_cast<SCEVConstant>(LHS)) {
+        uint64_t V = Const->getAPInt().getLimitedValue();
+        if (isa<SCEVVScale>(RHS) && llvm::isUInt<32>(V))
+          return V;
+      }
+    }
+  }
+
+  // If not, see if the vscale_range of the parent function is a fixed value,
+  // which causes the step value to be replaced by a constant.
+  if (F.hasFnAttribute(Attribute::VScaleRange))
+    if (auto *ConstStep = dyn_cast<SCEVConstant>(Step)) {
+      APInt V = ConstStep->getAPInt().abs();
+      ConstantRange CR = llvm::getVScaleRange(&F, 64);
+      if (const APInt *Fixed = CR.getSingleElement()) {
+        V = V.zextOrTrunc(Fixed->getBitWidth());
+        uint64_t VF = V.udiv(*Fixed).getLimitedValue();
+        if (VF && llvm::isUInt<32>(VF) &&
+            // Make sure step is divisible by vscale.
+            V.urem(*Fixed).isZero())
+          return VF;
+      }
+    }
+
+  return 0U;
+}
+
+bool EVLIndVarSimplifyImpl::run(Loop &L) {
+  if (!EnableEVLIndVarSimplify)
+    return false;
+
+  if (!getBooleanLoopAttribute(&L, "llvm.loop.isvectorized") ||
+      !getBooleanLoopAttribute(&L, "llvm.loop.isvectorized.withevl"))
+    return false;
+
+  BasicBlock *LatchBlock = L.getLoopLatch();
+  ICmpInst *OrigLatchCmp = L.getLatchCmpInst();
+  if (!LatchBlock || !OrigLatchCmp)
+    return false;
+
+  InductionDescriptor IVD;
+  PHINode *IndVar = L.getInductionVariable(SE);
+  if (!IndVar || !L.getInductionDescriptor(SE, IVD)) {
+    LLVM_DEBUG(dbgs() << "Cannot retrieve IV from loop " << L.getName()
+                      << "\n");
+    return false;
+  }
+
+  BasicBlock *InitBlock, *BackEdgeBlock;
+  if (!L.getIncomingAndBackEdge(InitBlock, BackEdgeBlock)) {
+    LLVM_DEBUG(dbgs() << "Expect unique incoming and backedge in "
+                      << L.getName() << "\n");
+    return false;
+  }
+
+  // Retrieve the loop bounds.
+  std::optional<Loop::LoopBounds> Bounds = L.getBounds(SE);
+  if (!Bounds) {
+    LLVM_DEBUG(dbgs() << "Could not obtain the bounds for loop " << L.getName()
+                      << "\n");
+    return false;
+  }
+  Value *CanonicalIVInit = &Bounds->getInitialIVValue();
+  Value *CanonicalIVFinal = &Bounds->getFinalIVValue();
+
+  const SCEV *StepV = IVD.getStep();
+  uint32_t VF = getVFFromIndVar(StepV, *L.getHeader()->getParent());
+  if (!VF) {
+    LLVM_DEBUG(dbgs() << "Could not infer VF from IndVar step '" << *StepV
+                      << "'\n");
+    return false;
+  }
+  LLVM_DEBUG(dbgs() << "Using VF=" << VF << " for loop " << L.getName()
+                    << "\n");
+
+  // Try to find the EVL-based induction variable.
+  using namespace PatternMatch;
+  BasicBlock *BB = IndVar->getParent();
+
+  Value *EVLIndVar = nullptr;
+  Value *RemTC = nullptr;
+  Value *TC = nullptr;
+  auto IntrinsicMatch = m_Intrinsic<Intrinsic::experimental_get_vector_length>(
+      m_Value(RemTC), m_SpecificInt(VF),
+      /*Scalable=*/m_SpecificInt(1));
+  for (auto &PN : BB->phis()) {
+    if (&PN == IndVar)
+      continue;
+
+    // Check 1: it has to contain both incoming (init) & backedge blocks
+    // from IndVar.
+    if (PN.getBasicBlockIndex(InitBlock) < 0 ||
+        PN.getBasicBlockIndex(BackEdgeBlock) < 0)
+      continue;
+    // Check 2: EVL index is always increasing, thus its initial value has to be
+    // equal to either the initial IV value (when the canonical IV is also
+    // increasing) or the last IV value (when canonical IV is decreasing).
+    Value *Init = PN.getIncomingValueForBlock(InitBlock);
+    using Direction = Loop::LoopBounds::Direction;
+    switch (Bounds->getDirection()) {
+    case Direction::Increasing:
+      if (Init != CanonicalIVInit)
+        continue;
+      break;
+    case Direction::Decreasing:
+      if (Init != CanonicalIVFinal)
+        continue;
+      break;
+    case Direction::Unknown:
+      // Be more permissive and check whether either the initial or final IV
+      // value matches PN's init value.
+      if (Init != CanonicalIVInit && Init != CanonicalIVFinal)
+        continue;
+      break;
+    }
+    Value *RecValue = PN.getIncomingValueForBlock(BackEdgeBlock);
+    assert(RecValue);
+
+    LLVM_DEBUG(dbgs() << "Found candidate PN of EVL-based IndVar: " << PN
+                      << "\n");
+
+    // Check 3: Pattern match to find the EVL-based index and total trip count
+    // (TC).
+    if (match(RecValue,
+              m_c_Add(m_ZExtOrSelf(IntrinsicMatch), m_Specific(&PN))) &&
+        match(RemTC, m_Sub(m_Value(TC), m_Specific(&PN)))) {
+      EVLIndVar = RecValue;
+      break;
+    }
+  }
+
+  if (!EVLIndVar || !TC)
+    return false;
+
+  LLVM_DEBUG(dbgs() << "Using " << *EVLIndVar << " for EVL-based IndVar\n");
+
+  // Create an EVL-based comparison and replace the branch to use it as
+  // predicate.
+
+  // Loop::getLatchCmpInst check at the beginning of this function has ensured
+  // that latch block ends in a conditional branch.
+  auto *LatchBranch = cast<BranchInst>(LatchBlock->getTerminator());
+  assert(LatchBranch->isConditional());
+  ICmpInst::Predicate Pred;
+  if (LatchBranch->getSuccessor(0) == L.getHeader())
+    Pred = ICmpInst::ICMP_NE;
+  else
+    Pred = ICmpInst::ICMP_EQ;
+
+  IRBuilder<> Builder(OrigLatchCmp);
+  auto *NewLatchCmp = Builder.CreateICmp(Pred, EVLIndVar, TC);
+  OrigLatchCmp->replaceAllUsesWith(NewLatchCmp);
+
+  // llvm::RecursivelyDeleteDeadPHINode only deletes cycles whose values are
+  // not used outside the cycles. However, in this case the now-RAUW-ed
+  // OrigLatchCmp will be considered a use outside the cycle while in reality
+  // it's practically dead. Thus we need to remove it before calling
+  // RecursivelyDeleteDeadPHINode.
+  (void)RecursivelyDeleteTriviallyDeadInstructions(OrigLatchCmp);
+  if (llvm::RecursivelyDeleteDeadPHINode(IndVar))
+    LLVM_DEBUG(dbgs() << "Removed original IndVar\n");
+
+  ++NumEliminatedCanonicalIV;
+
+  return true;
+}
+
+PreservedAnalyses EVLIndVarSimplifyPass::run(Loop &L, LoopAnalysisManager &LAM,
+                                             LoopStandardAnalysisResults &AR,
+                                             LPMUpdater &U) {
+  if (EVLIndVarSimplifyImpl(AR).run(L))
+    return PreservedAnalyses::allInSet<CFGAnalyses>();
+  return PreservedAnalyses::all();
+}
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index e595347d62bf5..7ef06957d5322 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1013,6 +1013,32 @@ void VPlan::execute(VPTransformState *State) {
     Value *Val = State->get(PhiR->getBackedgeValue(), NeedsScalar);
     cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);
   }
+
+  // Check if it's EVL-vectorized and mark the corresponding metadata.
+  // Note that we could have done this during the codegen of
+  // ExplicitVectorLength, but the enclosing vector loop was not in a good shape
+  // for us to attach the metadata.
+  bool IsEVLVectorized = llvm::any_of(*Header, [](const VPRecipeBase &Recipe) {
+    // Looking for the ExplicitVectorLength VPInstruction.
+    if (const auto *VI = dyn_cast<VPInstruction>(&Recipe))
+      return VI->getOpcode() == VPInstruction::ExplicitVectorLength;
+    return false;
+  });
+  if (IsEVLVectorized) {
+    // VPTransformState::CurrentParentLoop has already been reset
+    // at this moment.
+    Loop *L = State->LI->getLoopFor(VectorLatchBB);
+    assert(L);
+    LLVMContext &Context = State->Builder.getContext();
+    MDNode *LoopID = L->getLoopID();
+    auto *IsEVLVectorizedMD = MDNode::get(
+        Context,
+        {MDString::get(Context, "llvm.loop.isvectorized.withevl"),
+         ConstantAsMetadata::get(ConstantInt::get(Context, APInt(32, 1)))});
+    MDNode *NewLoopID = makePostTransformationMetadata(Context, LoopID, {},
+                                                       {IsEVLVectorizedMD});
+    L->setLoopID(NewLoopID);
+  }
 }
 
 InstructionCost VPlan::cost(ElementCount VF, VPCostContext &Ctx) {
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll b/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll
index ba7158eb02d90..7a28574740348 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll
@@ -53,7 +53,7 @@ define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr %src) {
 ; CHECK-NEXT:    store i8 [[CONV36]], ptr null, align 1
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV]], 1
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
@@ -77,8 +77,9 @@ exit:                                             ; preds = %loop
   ret void
 }
 ;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized.withevl", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META3]], [[META2]]}
 ;.
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
index c95414db18bef..8543964968c5a 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
@@ -80,7 +80,7 @@ define void @type_info_cache_clobber(ptr %dstv, ptr %src, i64 %wide.trip.count)
 ; CHECK-NEXT:    store i16 [[CONV36]], ptr null, align 2
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV]], [[WIDE_TRIP_COUNT]]
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP9:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
@@ -114,8 +114,9 @@ exit:
 ; CHECK: [[META2]] = distinct !{[[META2]], !"LVerDomain"}
 ; CHECK: [[META3]] = !{[[META4:![0-9]+]]}
 ; CHECK: [[META4]] = distinct !{[[META4]], [[META2]]}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META6:![0-9]+]], [[META7:![0-9]+]]}
-; CHECK: [[META6]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META7]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META6]]}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META6:![0-9]+]], [[META7:![0-9]+]], [[META8:![0-9]+]]}
+; CHECK: [[META6]] = !{!"llvm.loop.isvectorized.withevl", i32 1}
+; CHECK: [[META7]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META8]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP9]] = distinct !{[[LOOP9]], [[META7]]}
 ;.
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll
index e7181f7f30c77..d962258662728 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll
@@ -65,7 +65,7 @@ define void @test_and(ptr nocapture %a, ptr nocapture readonly %b) {
 ; IF-EVL-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[LEN]]
 ; IF-EVL-NEXT:    store i8 [[TMP]], ptr [[ARRAYIDX1]], align 1
 ; IF-EVL-NEXT:    [[DOTNOT:%.*]] = icmp eq i64 [[DEC]], 100
-; IF-EVL-NEXT:    br i1 [[DOTNOT]], label %[[FINISH_LOOPEXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; IF-EVL-NEXT:    br i1 [[DOTNOT]], label %[[FINISH_LOOPEXIT]], label %[[LOOP]], !llvm.loop [[LOOP4:![0-9]+]]
 ; IF-EVL:       [[FINISH_LOOPEXIT]]:
 ; IF-EVL-NEXT:    ret void
 ;
@@ -144,7 +144,7 @@ define void @test_or(ptr nocapture %a, ptr nocapture readonly %b) {
 ; IF-EVL-NEXT:    [[INDEX_EVL_NEXT]] = add nuw i64 [[TMP18]], [[EVL_BASED_IV]]
 ; IF-EVL-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP9]]
 ; IF-EVL-NEXT:    [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; IF-EVL-NEXT:    br i1 [[TMP19]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; IF-EVL-NEXT:    br i1 [[TMP19]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
 ; IF-EVL:       [[MIDDL...
[truncated]

@llvmbot
Member

llvmbot commented Mar 12, 2025

@llvm/pr-subscribers-llvm-transforms


@lukel97
Contributor

lukel97 commented Mar 13, 2025

The former, also known as canonical IV, is more favorable for analyses as it's "countable" in the sense of SCEV

I presume in comparison to #91796 the canonical IV now has a much shorter lifetime. Do we know what analyses still use the canonical IV in between the loop vectorizer & the end of vectorization?

EDIT: I've answered my own question, from opt-pipeline-vector-passes.ll:

Running pass: LoopVectorizePass on f (11 instructions)
Running analysis: DemandedBitsAnalysis on f
Running pass: InferAlignmentPass on f (11 instructions)
Running pass: LoopLoadEliminationPass on f (11 instructions)
Running pass: InstCombinePass on f (11 instructions)
Running analysis: LastRunTrackingAnalysis on f
Running pass: SimplifyCFGPass on f (11 instructions)
Running pass: SLPVectorizerPass on f (11 instructions)
Running pass: VectorCombinePass on f (11 instructions)
Running pass: InstCombinePass on f (11 instructions)
Running pass: LoopUnrollPass on f (11 instructions)
Running pass: WarnMissedTransformationsPass on f (11 instructions)
Running pass: SROAPass on f (11 instructions)
Running pass: InferAlignmentPass on f (11 instructions)
Running pass: InstCombinePass on f (11 instructions)
Running pass: LoopSimplifyPass on f (11 instructions)
Running pass: LCSSAPass on f (11 instructions)
Running analysis: MemorySSAAnalysis on f
Running pass: LICMPass on loop %loop in function f
Running pass: AlignmentFromAssumptionsPass on f (11 instructions)

@mshockwave force-pushed the patch/riscv/evl-iv-simplified-extension-point-metadata branch from 4c553fe to be6105b on March 14, 2025 19:33
@mshockwave
Member Author

mshockwave commented Mar 14, 2025

The former, also known as canonical IV, is more favorable for analyses as it's "countable" in the sense of SCEV

I presume in comparison to #91796 the canonical IV now has a much shorter lifetime. Do we know what analyses still use the canonical IV in between the loop vectorizer & the end of vectorization?


To give another data point: I actually tried to put this Pass right after the LoopVectorizer, and in that case it only brought ~3% improvement on the dynamic IC of 401.bzip2, instead of the 8% improvement we see now.

@mshockwave
Member Author

ping

@alexey-bataev
Member

Rebase?

@mshockwave
Member Author

Rebase?

Done.

@mshockwave
Member Author

gentle ping (I'll rebase to fix the conflict on the test)

@mshockwave force-pushed the patch/riscv/evl-iv-simplified-extension-point-metadata branch from aef319e to 0d8906b on April 30, 2025 17:28
@mshockwave
Member Author

The merge conflicts are resolved now.

Comment on lines 57 to 58
// Returns the constant part of vectorization factor from the induction
// variable's step value SCEV expression.
Member

Suggested change
// Returns the constant part of vectorization factor from the induction
// variable's step value SCEV expression.
/// Returns the constant part of vectorization factor from the induction
/// variable's step value SCEV expression.

Member Author

Fixed.

Comment on lines 116 to 117
LLVM_DEBUG(dbgs() << "Cannot retrieve IV from loop " << L.getName()
<< "\n");
Member

Can you add an optimization remark here too?

Member Author

Done. I also added an extra remark that is emitted when this transformation succeeds.

@mshockwave force-pushed the patch/riscv/evl-iv-simplified-extension-point-metadata branch from 0d8906b to f21428d on May 7, 2025 17:29

// Looking for loops with IV step value in the form of `(<constant VF> x
// vscale)`.
if (auto *Mul = dyn_cast<SCEVMulExpr>(Step)) {
Member

Suggested change
if (auto *Mul = dyn_cast<SCEVMulExpr>(Step)) {
if (const auto *Mul = dyn_cast<SCEVMulExpr>(Step)) {

Member Author

Fixed.

// Looking for loops with IV step value in the form of `(<constant VF> x
// vscale)`.
if (auto *Mul = dyn_cast<SCEVMulExpr>(Step)) {
if (Mul->getNumOperands() == 2) {
Member

Can it have number of operands != 2 at all?

Member Author

Since this is one of the first few checks against the candidate loops, I think it is possible to have a loop whose step value consists of consecutive multiplications (e.g. a step of the form 4 * vscale * %n), in which case SCEV would merge them into a single SCEVMulExpr with more than two operands.

if (Mul->getNumOperands() == 2) {
const SCEV *LHS = Mul->getOperand(0);
const SCEV *RHS = Mul->getOperand(1);
if (auto *Const = dyn_cast<SCEVConstant>(LHS)) {
Member

Suggested change
if (auto *Const = dyn_cast<SCEVConstant>(LHS)) {
if (const auto *Const = dyn_cast<SCEVConstant>(LHS); Const && isa<SCEVVScale>(RHS)) {

Member Author

Fixed.

const SCEV *RHS = Mul->getOperand(1);
if (auto *Const = dyn_cast<SCEVConstant>(LHS)) {
uint64_t V = Const->getAPInt().getLimitedValue();
if (isa<SCEVVScale>(RHS) && llvm::isUInt<32>(V))
Member

Suggested change
if (isa<SCEVVScale>(RHS) && llvm::isUInt<32>(V))
if (llvm::isUInt<32>(V))

Member Author

Fixed.
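
Putting the suggestions above together, the step-matching logic ends up roughly like the sketch below. It is assembled from the snippets quoted in this review rather than copied from the final committed code; the Step variable and the returned VF value are assumed from the surrounding context.

// Looking for loops with IV step value in the form of `(<constant VF> x vscale)`.
if (const auto *Mul = dyn_cast<SCEVMulExpr>(Step)) {
  if (Mul->getNumOperands() == 2) {
    const SCEV *LHS = Mul->getOperand(0);
    const SCEV *RHS = Mul->getOperand(1);
    if (const auto *Const = dyn_cast<SCEVConstant>(LHS);
        Const && isa<SCEVVScale>(RHS)) {
      uint64_t V = Const->getAPInt().getLimitedValue();
      // Reject vectorization factors that do not fit in 32 bits.
      if (llvm::isUInt<32>(V))
        return V;
    }
  }
}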

// If not, see if the vscale_range of the parent function is a fixed value,
// which makes the step value to be replaced by a constant.
if (F.hasFnAttribute(Attribute::VScaleRange))
if (auto *ConstStep = dyn_cast<SCEVConstant>(Step)) {
Member

Suggested change
if (auto *ConstStep = dyn_cast<SCEVConstant>(Step)) {
if (const auto *ConstStep = dyn_cast<SCEVConstant>(Step)) {

Member Author

Fixed.
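
The vscale_range fallback above only helps when the attribute pins vscale to a single value, in which case the constant step can be divided by that fixed vscale to recover VF. A minimal sketch of that recovery, using the usual Attribute accessors and not necessarily matching the committed code:

if (F.hasFnAttribute(Attribute::VScaleRange))
  if (const auto *ConstStep = dyn_cast<SCEVConstant>(Step)) {
    uint64_t StepVal = ConstStep->getAPInt().abs().getLimitedValue();
    Attribute Attr = F.getFnAttribute(Attribute::VScaleRange);
    unsigned VScaleMin = Attr.getVScaleRangeMin();
    std::optional<unsigned> VScaleMax = Attr.getVScaleRangeMax();
    // Only a fixed vscale (min == max) lets us divide it out of the step.
    if (VScaleMax && VScaleMin == *VScaleMax && VScaleMin != 0 &&
        StepVal % VScaleMin == 0 && llvm::isUInt<32>(StepVal / VScaleMin))
      return StepVal / VScaleMin;
  }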

: "cannot recognize induction variable");
LLVM_DEBUG(dbgs() << "Cannot retrieve IV from loop " << L.getName()
<< " because" << Reason << "\n");
if (ORE) {
Member

Can you add remarks to other places, where the analysis fails too?

Member Author

Yes, it's done now.

break;
}
Value *RecValue = PN.getIncomingValueForBlock(BackEdgeBlock);
assert(RecValue);
Member

Add assertion message

Member Author

Done

// Loop::getLatchCmpInst check at the beginning of this function has ensured
// that latch block ends in a conditional branch.
auto *LatchBranch = cast<BranchInst>(LatchBlock->getTerminator());
assert(LatchBranch->isConditional());
Member

Add assertion message

Member Author

Done
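
For reference, the two requested assertions with messages might look something like the following; the exact wording in the committed code may differ:

assert(RecValue && "expected an incoming value from the latch block");
...
assert(LatchBranch->isConditional() &&
       "expected the latch block to end in a conditional branch");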

@@ -0,0 +1,333 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
Member

Move the test to RISCV subdir

Member Author

Good catch. Done

mshockwave added 2 commits May 9, 2025 09:41
And add more optimization remarks
@mshockwave mshockwave requested a review from alexey-bataev May 9, 2025 17:33
OptimizationRemarkEmitter *ORE)
: SE(LAR.SE), ORE(ORE) {}

// Returns true if modify the loop.
Member

Suggested change
// Returns true if modify the loop.
/// Returns true if modify the loop.

Member Author

Fixed

auto IntrinsicMatch = m_Intrinsic<Intrinsic::experimental_get_vector_length>(
m_Value(RemTC), m_SpecificInt(VF),
/*Scalable=*/m_SpecificInt(1));
for (auto &PN : BB->phis()) {
Member

Expand auto here

Member Author

Fixed
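
With the auto expanded as requested, the loop header from the snippet above reads as follows (only the lines quoted in this review are shown):

auto IntrinsicMatch = m_Intrinsic<Intrinsic::experimental_get_vector_length>(
    m_Value(RemTC), m_SpecificInt(VF),
    /*Scalable=*/m_SpecificInt(1));
for (PHINode &PN : BB->phis()) {
  ...
}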

@mshockwave mshockwave merged commit 0ab67ec into llvm:main May 14, 2025
9 of 11 checks passed
@mshockwave mshockwave deleted the patch/riscv/evl-iv-simplified-extension-point-metadata branch May 14, 2025 20:49
@llvm-ci
Collaborator

llvm-ci commented May 16, 2025

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2925

Here is the relevant piece of the build log for reference:
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/90/95' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-12764-90-95.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=95 GTEST_SHARD_INDEX=90 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


@lukel97
Contributor

lukel97 commented May 20, 2025

I think these should be the SPEC 2017/TSVC changes from this patch: https://lnt.lukelau.me/db_default/v4/nts/568

There are a lot of code size improvements and a 2% improvement in 525.x264_r.

However, there's also a regression in 523.xalancbmk_r and in one of the TSVC tests.

No need to revert or anything, I'm just posting this here so we don't forget to take a look at these. Thanks for pushing this through!
