Commit 1e7efd3

[LV] Legalize scalable VF hints
In the following loop:

  void foo(int *a, int *b, int N) {
    for (int i=0; i<N; ++i)
      a[i + 4] = a[i] + b[i];
  }

The loop dependence constrains the VF to a maximum of (4, fixed), which
would mean using <4 x i32> as the vector type in vectorization.
Extending this to scalable vectorization, a VF of (4, scalable) implies
a vector type of <vscale x 4 x i32>. To determine if this is legal
vscale must be taken into account. For this example, unless
max(vscale)=1, it's unsafe to vectorize.

For SVE, the number of bits in an SVE register is architecturally
defined to be a multiple of 128 bits with a maximum of 2048 bits, thus
the maximum vscale is 16. In the loop above it is therefore unfeasible
to vectorize with SVE. However, in this loop:

  void foo(int *a, int *b, int N) {
    #pragma clang loop vectorize_width(X, scalable)
    for (int i=0; i<N; ++i)
      a[i + 32] = a[i] + b[i];
  }

As long as max(vscale) multiplied by the number of lanes 'X' doesn't
exceed the dependence distance, it is safe to vectorize. For SVE a VF of
(2, scalable) is within this constraint, since a vector of <16 x 2 x 32>
will have no dependencies between lanes. For any number of lanes larger
than this it would be unsafe to vectorize.

This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs
specified as loop hints, implementing the following behaviour:

  * If the backend does not support scalable vectors, ignore the hint.
  * If scalable vectorization is unfeasible given the loop
    dependence, like in the first example above for SVE, then use a
    fixed VF.
  * Accept scalable VFs if it's safe to do so.
  * Otherwise, clamp scalable VFs that exceed the maximum safe VF.

Reviewed By: sdesmalen, fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D91718
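The four behaviours listed in the commit message can be sketched as a small standalone C++ model. This is only an illustration under stated assumptions, not the code in 'computeFeasibleMaxVF': the `VF` struct, the helper names (`fixedVF`, `scalableVF`, `powerOf2Floor`, `legalizeUserVF`), the convention that a `MaxVScale` of 0 means "no known upper bound", and the use of a zero VF to signal an ignored hint are all hypothetical. The dependence distance is passed directly in elements instead of being derived from a bit width and the widest type.

```cpp
#include <cassert>

// Hypothetical stand-in for an element count: MinLanes lanes, multiplied by
// vscale at run time when Scalable is true. MinLanes == 0 means "no VF".
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

VF fixedVF(unsigned Lanes) { return VF{Lanes, false}; }
VF scalableVF(unsigned Lanes) { return VF{Lanes, true}; }

// Largest power of two that is <= X (0 when X == 0).
unsigned powerOf2Floor(unsigned X) {
  unsigned P = 0;
  for (unsigned Bit = 1; Bit != 0 && Bit <= X; Bit <<= 1)
    P = Bit;
  return P;
}

// Legalize a user VF hint against a dependence distance given in elements.
// MaxVScale == 0 is this sketch's convention for "upper bound unknown".
// Returns a zero VF when the hint is ignored.
VF legalizeUserVF(VF User, unsigned DepDistanceElts, bool TargetHasScalable,
                  unsigned MaxVScale) {
  assert(User.MinLanes != 0 && "sketch expects an explicit hint");
  assert(DepDistanceElts != 0 && "sketch expects a bounded, non-zero distance");

  // Behaviour 1: no scalable vector support, so the hint is ignored.
  if (User.Scalable && !TargetHasScalable)
    return fixedVF(0);

  unsigned MaxSafeElements = powerOf2Floor(DepDistanceElts);
  VF MaxSafe = fixedVF(MaxSafeElements);

  if (User.Scalable) {
    // Divide the safe element count by the largest possible vscale.
    unsigned Lanes = MaxVScale != 0 ? MaxSafeElements / MaxVScale : 0;
    // Behaviour 2: distance too small for any scalable VF, retry as fixed.
    if (Lanes == 0)
      return legalizeUserVF(fixedVF(User.MinLanes), DepDistanceElts,
                            TargetHasScalable, MaxVScale);
    MaxSafe = scalableVF(Lanes);
  }

  // Behaviour 3: accept a safe hint. Behaviour 4: clamp an unsafe one.
  return User.MinLanes <= MaxSafe.MinLanes ? User : MaxSafe;
}
```

With a maximum vscale of 16, as stated for SVE, `legalizeUserVF(scalableVF(4), 4, true, 16)` falls back to (4, fixed) for the first loop, `legalizeUserVF(scalableVF(2), 32, true, 16)` accepts the hint for the second loop, and a hint of `scalableVF(4)` on that same loop is clamped to (2, scalable).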
1 parent eeba70a commit 1e7efd3

9 files changed, +522 -45 lines changed

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

Lines changed: 6 additions & 0 deletions
@@ -205,6 +205,12 @@ class MemoryDepChecker {
     return Status == VectorizationSafetyStatus::Safe;
   }
 
+  /// Return true if the number of elements that are safe to operate on
+  /// simultaneously is not bounded.
+  bool isSafeForAnyVectorWidth() const {
+    return MaxSafeVectorWidthInBits == UINT_MAX;
+  }
+
   /// The maximum number of bytes of a vector register we can vectorize
   /// the accesses safely with.
   uint64_t getMaxSafeDepDistBytes() { return MaxSafeDepDistBytes; }

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

Lines changed: 4 additions & 0 deletions
@@ -325,6 +325,10 @@ class LoopVectorizationLegality {
 
   const LoopAccessInfo *getLAI() const { return LAI; }
 
+  bool isSafeForAnyVectorWidth() const {
+    return LAI->getDepChecker().isSafeForAnyVectorWidth();
+  }
+
   unsigned getMaxSafeDepDistBytes() { return LAI->getMaxSafeDepDistBytes(); }
 
   uint64_t getMaxSafeVectorWidthInBits() const {

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 79 additions & 15 deletions
@@ -272,6 +272,12 @@ static cl::opt<unsigned> ForceTargetInstructionCost(
              "an instruction to a single constant value. Mostly "
              "useful for getting consistent testing."));
 
+static cl::opt<bool> ForceTargetSupportsScalableVectors(
+    "force-target-supports-scalable-vectors", cl::init(false), cl::Hidden,
+    cl::desc(
+        "Pretend that scalable vectors are supported, even if the target does "
+        "not support them. This flag should only be used for testing."));
+
 static cl::opt<unsigned> SmallLoopCost(
     "small-loop-cost", cl::init(20), cl::Hidden,
     cl::desc(
@@ -5592,6 +5598,30 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
 ElementCount
 LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount,
                                                  ElementCount UserVF) {
+  bool IgnoreScalableUserVF = UserVF.isScalable() &&
+                              !TTI.supportsScalableVectors() &&
+                              !ForceTargetSupportsScalableVectors;
+  if (IgnoreScalableUserVF) {
+    LLVM_DEBUG(
+        dbgs() << "LV: Ignoring VF=" << UserVF
+               << " because target does not support scalable vectors.\n");
+    ORE->emit([&]() {
+      return OptimizationRemarkAnalysis(DEBUG_TYPE, "IgnoreScalableUserVF",
+                                        TheLoop->getStartLoc(),
+                                        TheLoop->getHeader())
+             << "Ignoring VF=" << ore::NV("UserVF", UserVF)
+             << " because target does not support scalable vectors.";
+    });
+  }
+
+  // Beyond this point two scenarios are handled. If UserVF isn't specified
+  // then a suitable VF is chosen. If UserVF is specified and there are
+  // dependencies, check if it's legal. However, if a UserVF is specified and
+  // there are no dependencies, then there's nothing to do.
+  if (UserVF.isNonZero() && !IgnoreScalableUserVF &&
+      Legal->isSafeForAnyVectorWidth())
+    return UserVF;
+
   MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
   unsigned SmallestType, WidestType;
   std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();
@@ -5603,15 +5633,42 @@ LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount,
   // dependence distance).
   unsigned MaxSafeVectorWidthInBits = Legal->getMaxSafeVectorWidthInBits();
 
-  if (UserVF.isNonZero()) {
-    // For now, don't verify legality of scalable vectors.
-    // This will be addressed properly in https://reviews.llvm.org/D91718.
-    if (UserVF.isScalable())
-      return UserVF;
+  // If the user vectorization factor is legally unsafe, clamp it to a safe
+  // value. Otherwise, return as is.
+  if (UserVF.isNonZero() && !IgnoreScalableUserVF) {
+    unsigned MaxSafeElements =
+        PowerOf2Floor(MaxSafeVectorWidthInBits / WidestType);
+    ElementCount MaxSafeVF = ElementCount::getFixed(MaxSafeElements);
+
+    if (UserVF.isScalable()) {
+      Optional<unsigned> MaxVScale = TTI.getMaxVScale();
+
+      // Scale VF by vscale before checking if it's safe.
+      MaxSafeVF = ElementCount::getScalable(
+          MaxVScale ? (MaxSafeElements / MaxVScale.getValue()) : 0);
+
+      if (MaxSafeVF.isZero()) {
+        // The dependence distance is too small to use scalable vectors,
+        // fallback on fixed.
+        LLVM_DEBUG(
+            dbgs()
+            << "LV: Max legal vector width too small, scalable vectorization "
+               "unfeasible. Using fixed-width vectorization instead.\n");
+        ORE->emit([&]() {
+          return OptimizationRemarkAnalysis(DEBUG_TYPE, "ScalableVFUnfeasible",
+                                            TheLoop->getStartLoc(),
+                                            TheLoop->getHeader())
+                 << "Max legal vector width too small, scalable vectorization "
+                 << "unfeasible. Using fixed-width vectorization instead.";
+        });
+        return computeFeasibleMaxVF(
+            ConstTripCount, ElementCount::getFixed(UserVF.getKnownMinValue()));
+      }
+    }
 
-    // If legally unsafe, clamp the user vectorization factor to a safe value.
-    unsigned MaxSafeVF = PowerOf2Floor(MaxSafeVectorWidthInBits / WidestType);
-    if (UserVF.getFixedValue() <= MaxSafeVF)
+    LLVM_DEBUG(dbgs() << "LV: The max safe VF is: " << MaxSafeVF << ".\n");
+
+    if (ElementCount::isKnownLE(UserVF, MaxSafeVF))
       return UserVF;
 
     LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF
@@ -5626,7 +5683,7 @@ LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount,
              << " is unsafe, clamping to maximum safe vectorization factor "
              << ore::NV("VectorizationFactor", MaxSafeVF);
     });
-    return ElementCount::getFixed(MaxSafeVF);
+    return MaxSafeVF;
   }
 
   WidestRegister = std::min(WidestRegister, MaxSafeVectorWidthInBits);
@@ -7426,17 +7483,24 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
   ElementCount MaxVF = MaybeMaxVF.getValue();
   assert(MaxVF.isNonZero() && "MaxVF is zero.");
 
-  if (!UserVF.isZero() && ElementCount::isKnownLE(UserVF, MaxVF)) {
-    LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
-    assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&
+  bool UserVFIsLegal = ElementCount::isKnownLE(UserVF, MaxVF);
+  if (!UserVF.isZero() &&
+      (UserVFIsLegal || (UserVF.isScalable() && MaxVF.isScalable()))) {
+    // FIXME: MaxVF is temporarily used inplace of UserVF for illegal scalable
+    // VFs here, this should be reverted to only use legal UserVFs once the
+    // loop below supports scalable VFs.
+    ElementCount VF = UserVFIsLegal ? UserVF : MaxVF;
+    LLVM_DEBUG(dbgs() << "LV: Using " << (UserVFIsLegal ? "user" : "max")
+                      << " VF " << VF << ".\n");
+    assert(isPowerOf2_32(VF.getKnownMinValue()) &&
            "VF needs to be a power of two");
     // Collect the instructions (and their associated costs) that will be more
     // profitable to scalarize.
-    CM.selectUserVectorizationFactor(UserVF);
+    CM.selectUserVectorizationFactor(VF);
     CM.collectInLoopReductions();
-    buildVPlansWithVPRecipes(UserVF, UserVF);
+    buildVPlansWithVPRecipes(VF, VF);
     LLVM_DEBUG(printPlans(dbgs()));
-    return {{UserVF, 0}};
+    return {{VF, 0}};
   }
 
   assert(!MaxVF.isScalable() &&
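
The user-VF branch in the `plan()` hunk above can be modelled in isolation to see which VF ends up being planned. The sketch below is a hedged illustration, not LLVM code: the `VF` struct and `choosePlanVF` are hypothetical, and the `isKnownLE` helper is an assumption modelled on my understanding of `ElementCount::isKnownLE` (a fixed count compares by its minimum value, while a scalable count is only known to be less than or equal to another scalable count). A returned zero VF stands for "fall through to the cost-model loop below".

```cpp
// Hypothetical stand-in for an element count: MinLanes lanes, multiplied by
// vscale at run time when Scalable is true. MinLanes == 0 means "no VF".
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

// Assumed model of a "known less-or-equal" comparison: a scalable left-hand
// side can only be proven <= a scalable right-hand side.
bool isKnownLE(VF LHS, VF RHS) {
  if (!LHS.Scalable || RHS.Scalable)
    return LHS.MinLanes <= RHS.MinLanes;
  return false;
}

// Mirrors the shape of the condition in the hunk: use the user VF when it is
// legal, substitute MaxVF for an illegal scalable hint when MaxVF is also
// scalable, and otherwise return a zero VF (no user-VF plan is built).
VF choosePlanVF(VF UserVF, VF MaxVF) {
  bool UserVFIsLegal = isKnownLE(UserVF, MaxVF);
  if (UserVF.MinLanes != 0 &&
      (UserVFIsLegal || (UserVF.Scalable && MaxVF.Scalable)))
    return UserVFIsLegal ? UserVF : MaxVF;
  return VF{0, false};
}
```

Under these assumptions, a hint of (4, scalable) against a maximum of (2, scalable) plans with (2, scalable), whereas the same hint against a fixed maximum of 4 returns the zero VF and leaves the choice to the code that follows.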
