Commit d47bfd7

[LoopVectorize] Perform loop versioning for some early exit loops
When attempting to vectorise a loop with an uncountable early exit, we attempt to discover if all the loads in the loop are known to be dereferenceable. If at least one load could potentially fault then we abandon vectorisation. This patch adds support for vectorising loops with one potentially faulting load by versioning the loop based on the load pointer alignment.

Any vector load that faults must fault on the first lane, i.e. the load must not straddle a page boundary. This ensures that the behaviour of the vector and scalar loops is identical, i.e. if a load does fault it will fault at the same scalar iteration.

Such vectorisation depends on the following conditions being met:

1. The max vector width must not exceed the minimum page size. This is enforced by a getMaxSafeVectorWidthInBits wrapper that checks whether we have an uncountable early exit. For scalable vectors we must be able to determine the maximum possible value of vscale.
2. The size of the loaded type must be a power of 2. This is checked during legalisation.
3. The VF must be a power of two (so that the vector width divides wholly into the page size, which is also a power of 2). For fixed-width vectors this is always true, and for scalable vectors we query the TTI hook isVScaleKnownToBeAPowerOfTwo. If the effective runtime VF could change during the loop then the loop cannot be vectorised via loop versioning.
4. The load pointer must be aligned to a multiple of the vector width. We add a runtime check to ensure this is true. (NOTE: interleaving is currently disabled for these early exit loops.)
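The "fault on the first lane" requirement can be illustrated with a small model. This is only a sketch under stated assumptions: memory is readable below a page-aligned address `mappedEnd` and faults at or above it, and the helper names are hypothetical, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: addresses below mappedEnd are readable, addresses at or
// above it fault. Elements of eltSize bytes are read contiguously from start.

// Scalar iteration at which a one-element load first faults.
std::uint64_t firstFaultingScalarIteration(std::uint64_t start,
                                           std::uint64_t mappedEnd,
                                           std::uint64_t eltSize) {
  assert(eltSize != 0 && "element size must be non-zero");
  std::uint64_t i = 0;
  while (start + (i + 1) * eltSize <= mappedEnd)
    ++i;
  return i;
}

// Scalar iteration corresponding to lane 0 of the first faulting vector load,
// where each vector load reads vf consecutive elements.
std::uint64_t firstFaultingVectorIteration(std::uint64_t start,
                                           std::uint64_t mappedEnd,
                                           std::uint64_t eltSize,
                                           std::uint64_t vf) {
  assert(eltSize != 0 && vf != 0 && "element size and VF must be non-zero");
  std::uint64_t i = 0;
  while (start + (i + vf) * eltSize <= mappedEnd)
    i += vf;
  return i;
}
```

With 4-byte elements, VF = 4 and `mappedEnd = 8192`, a start address of 4096 (aligned to the 16-byte vector width) gives the same faulting iteration for both helpers. A start address of 4104 makes the vector load fault two iterations before the scalar load would, which is the behavioural difference the alignment check is meant to rule out.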
1 parent 7ba35bb commit d47bfd7

8 files changed, +871 -159 lines changed
llvm/include/llvm/Analysis/Loads.h

Lines changed: 0 additions & 6 deletions
@@ -88,12 +88,6 @@ bool isDereferenceableAndAlignedInLoop(
     AssumptionCache *AC = nullptr,
     SmallVectorImpl<const SCEVPredicate *> *Predicates = nullptr);

-/// Return true if the loop \p L cannot fault on any iteration and only
-/// contains read-only memory accesses.
-bool isDereferenceableReadOnlyLoop(
-    Loop *L, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
-    SmallVectorImpl<const SCEVPredicate *> *Predicates = nullptr);
-
 /// Return true if we know that executing a load from this value cannot trap.
 ///
 /// If DT and ScanFrom are specified this method performs context-sensitive

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

Lines changed: 31 additions & 2 deletions
@@ -382,11 +382,18 @@ class LoopVectorizationLegality {
   const LoopAccessInfo *getLAI() const { return LAI; }

   bool isSafeForAnyVectorWidth() const {
-    return LAI->getDepChecker().isSafeForAnyVectorWidth();
+    return LAI->getDepChecker().isSafeForAnyVectorWidth() &&
+           (!hasUncountableEarlyExit() || !getNumPotentiallyFaultingPointers());
   }

   uint64_t getMaxSafeVectorWidthInBits() const {
-    return LAI->getDepChecker().getMaxSafeVectorWidthInBits();
+    uint64_t MaxSafeVectorWidth =
+        LAI->getDepChecker().getMaxSafeVectorWidthInBits();
+    // The legalizer bails out if getMinPageSize does not return a value.
+    if (hasUncountableEarlyExit() && getNumPotentiallyFaultingPointers())
+      MaxSafeVectorWidth =
+          std::min(MaxSafeVectorWidth, uint64_t(*TTI->getMinPageSize()) * 8);
+    return MaxSafeVectorWidth;
   }

   /// Returns true if the loop has exactly one uncountable early exit, i.e. an
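The clamping in the `getMaxSafeVectorWidthInBits` wrapper above amounts to a `min` against the page size converted from bytes to bits. A standalone sketch follows, assuming the page size is passed in as a plain non-zero integer rather than the optional value the real hook returns; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the wrapper: limits the width reported by the
// dependence checker to one page when the loop needs alignment versioning.
std::uint64_t clampMaxSafeWidthInBits(std::uint64_t depCheckerWidthBits,
                                      bool hasUncountableEarlyExit,
                                      unsigned numPotentiallyFaultingPtrs,
                                      unsigned minPageSizeBytes) {
  std::uint64_t maxWidthBits = depCheckerWidthBits;
  if (hasUncountableEarlyExit && numPotentiallyFaultingPtrs != 0) {
    // Assumption of this sketch: legality has already rejected targets that
    // cannot report a minimum page size.
    assert(minPageSizeBytes != 0 && "page size must be known");
    maxWidthBits =
        std::min(maxWidthBits, std::uint64_t(minPageSizeBytes) * 8);
  }
  return maxWidthBits;
}
```

For a 4096-byte page, an otherwise unlimited width is capped at 32768 bits, while a loop without potentially faulting pointers keeps the dependence checker's value unchanged.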
@@ -419,6 +426,19 @@ class LoopVectorizationLegality {
   unsigned getNumStores() const { return LAI->getNumStores(); }
   unsigned getNumLoads() const { return LAI->getNumLoads(); }

+  /// Return the number of pointers in the loop that could potentially fault in
+  /// a loop with uncountable early exits.
+  unsigned getNumPotentiallyFaultingPointers() const {
+    return PotentiallyFaultingPtrs.size();
+  }
+
+  /// Return a vector of all potentially faulting pointers in a loop with
+  /// uncountable early exits.
+  const SmallVectorImpl<std::pair<const SCEV *, Type *>> *
+  getPotentiallyFaultingPointers() const {
+    return &PotentiallyFaultingPtrs;
+  }
+
   /// Returns a HistogramInfo* for the given instruction if it was determined
   /// to be part of a load -> update -> store sequence where multiple lanes
   /// may be working on the same memory address.
@@ -524,6 +544,11 @@ class LoopVectorizationLegality {
   /// additional cases safely.
   bool isVectorizableEarlyExitLoop();

+  /// Returns true if all loads in the loop contained in \p Loads can be
+  /// analyzed as potentially faulting. Any loads that may fault are added to
+  /// the member variable PotentiallyFaultingPtrs.
+  bool analyzePotentiallyFaultingLoads(SmallVectorImpl<LoadInst *> *Loads);
+
   /// Return true if all of the instructions in the block can be speculatively
   /// executed, and record the loads/stores that require masking.
   /// \p SafePtrs is a list of addresses that are known to be legal and we know
@@ -642,6 +667,10 @@ class LoopVectorizationLegality {
   /// Keep track of the loop edge to an uncountable exit, comprising a pair
   /// of (Exiting, Exit) blocks, if there is exactly one early exit.
   std::optional<std::pair<BasicBlock *, BasicBlock *>> UncountableEdge;
+
+  /// Keep a record of all potentially faulting pointers in loops with
+  /// uncountable early exits.
+  SmallVector<std::pair<const SCEV *, Type *>, 4> PotentiallyFaultingPtrs;
 };

 } // namespace llvm

llvm/lib/Analysis/Loads.cpp

Lines changed: 0 additions & 15 deletions
@@ -816,18 +816,3 @@ bool llvm::canReplacePointersIfEqual(const Value *From, const Value *To,

   return isPointerAlwaysReplaceable(From, To, DL);
 }
-
-bool llvm::isDereferenceableReadOnlyLoop(
-    Loop *L, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
-    SmallVectorImpl<const SCEVPredicate *> *Predicates) {
-  for (BasicBlock *BB : L->blocks()) {
-    for (Instruction &I : *BB) {
-      if (auto *LI = dyn_cast<LoadInst>(&I)) {
-        if (!isDereferenceableAndAlignedInLoop(LI, L, *SE, *DT, AC, Predicates))
-          return false;
-      } else if (I.mayReadFromMemory() || I.mayWriteToMemory() || I.mayThrow())
-        return false;
-    }
-  }
-  return true;
-}

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

Lines changed: 73 additions & 12 deletions
@@ -1602,6 +1602,43 @@ bool LoopVectorizationLegality::canVectorizeLoopNestCFG(
   return Result;
 }

+bool LoopVectorizationLegality::analyzePotentiallyFaultingLoads(
+    SmallVectorImpl<LoadInst *> *Loads) {
+  LLVM_DEBUG(dbgs() << "LV: Looking for potentially faulting loads in loop "
+                       "with uncountable early exit:\n");
+  for (LoadInst *LI : *Loads) {
+    LLVM_DEBUG(dbgs() << "LV: Load: " << *LI << '\n');
+    Value *Ptr = LI->getPointerOperand();
+    if (!Ptr)
+      return false;
+    const SCEV *PtrExpr = PSE.getSCEV(Ptr);
+    const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrExpr);
+    // TODO: Deal with loop invariant pointers.
+    if (!AR || AR->getLoop() != TheLoop || !AR->isAffine())
+      return false;
+    auto Step = dyn_cast<SCEVConstant>(AR->getStepRecurrence(*PSE.getSE()));
+    if (!Step)
+      return false;
+    const SCEV *Start = AR->getStart();
+
+    // Make sure the step is positive and matches the object size in memory.
+    // TODO: Extend this to cover more cases.
+    auto &DL = LI->getDataLayout();
+    APInt EltSize(DL.getIndexTypeSizeInBits(Ptr->getType()),
+                  DL.getTypeStoreSize(LI->getType()).getFixedValue());
+
+    // Also discard element sizes that are not a power of 2, since the loop
+    // vectorizer can only perform loop versioning with pointer alignment
+    // checks for vector loads that are power-of-2 in size.
+    if (EltSize != Step->getAPInt() || !EltSize.isPowerOf2())
+      return false;
+
+    LLVM_DEBUG(dbgs() << "LV: SCEV for Load Ptr: " << *Start << '\n');
+    PotentiallyFaultingPtrs.push_back({Start, LI->getType()});
+  }
+  return true;
+}
+
 bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
   BasicBlock *LatchBB = TheLoop->getLoopLatch();
   if (!LatchBB) {
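The filter applied by `analyzePotentiallyFaultingLoads` can be summarised without any SCEV machinery. The following is a simplified sketch: the `LoadAccess` struct and its fields are hypothetical stand-ins for what the real code derives from the pointer's add-recurrence.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one load's address pattern.
struct LoadAccess {
  bool isAffineAddRecInLoop;  // pointer is {Start,+,Step} in the loop itself
  bool hasConstantStep;       // Step is a compile-time constant
  std::int64_t stepBytes;     // value of Step when constant
  std::uint64_t eltSizeBytes; // store size of the loaded type
};

// Returns true if the load is a candidate for alignment-based versioning:
// a unit-stride, forward access whose element size is a power of 2.
bool canVersionOnAlignment(const LoadAccess &a) {
  if (!a.isAffineAddRecInLoop || !a.hasConstantStep)
    return false;
  // The step must be positive and equal to the element size.
  if (a.stepBytes <= 0 || std::uint64_t(a.stepBytes) != a.eltSizeBytes)
    return false;
  // eltSizeBytes is non-zero here, so this is a plain power-of-2 test.
  return (a.eltSizeBytes & (a.eltSizeBytes - 1)) == 0;
}
```

A contiguous `i32` load (step 4, size 4) passes, whereas a strided access (step 8, size 4), a reverse access (step -4) or a 3-byte element is rejected, matching the early `return false` paths in the hunk above.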
@@ -1706,6 +1743,8 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
     }
   };

+  Predicates.clear();
+  SmallVector<LoadInst *, 4> NonDerefLoads;
   for (auto *BB : TheLoop->blocks())
     for (auto &I : *BB) {
       if (I.mayWriteToMemory()) {
@@ -1715,30 +1754,52 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
             "Cannot vectorize early exit loop with writes to memory",
             "WritesInEarlyExitLoop", ORE, TheLoop);
         return false;
-      } else if (!IsSafeOperation(&I)) {
+      } else if (I.mayThrow() || !IsSafeOperation(&I)) {
         reportVectorizationFailure("Early exit loop contains operations that "
                                    "cannot be speculatively executed",
                                    "UnsafeOperationsEarlyExitLoop", ORE,
                                    TheLoop);
         return false;
+      } else if (I.mayReadFromMemory()) {
+        auto *LI = dyn_cast<LoadInst>(&I);
+        bool UnsafeRead = false;
+        if (!LI)
+          UnsafeRead = true;
+        else if (!isDereferenceableAndAlignedInLoop(LI, TheLoop, *PSE.getSE(),
+                                                    *DT, AC, &Predicates)) {
+          if (LI->getParent() != TheLoop->getHeader())
+            UnsafeRead = true;
+          else
+            NonDerefLoads.push_back(LI);
+        }
+
+        if (UnsafeRead) {
+          reportVectorizationFailure(
+              "Loop may fault",
+              "Cannot vectorize potentially faulting early exit loop",
+              "PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
+          return false;
+        }
       }
     }

+  if (!NonDerefLoads.empty()) {
+    if (!TTI->getMinPageSize() ||
+        !analyzePotentiallyFaultingLoads(&NonDerefLoads)) {
+      PotentiallyFaultingPtrs.clear();
+      reportVectorizationFailure(
+          "Loop may fault",
+          "Cannot vectorize potentially faulting early exit loop",
+          "PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
+      return false;
+    }
+    LLVM_DEBUG(dbgs() << "We can vectorize the loop with runtime checks.\n");
+  }
+
   // The vectoriser cannot handle loads that occur after the early exit block.
   assert(LatchBB->getUniquePredecessor() == SingleUncountableEdge->first &&
          "Expected latch predecessor to be the early exiting block");

-  // TODO: Handle loops that may fault.
-  Predicates.clear();
-  if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC,
-                                     &Predicates)) {
-    reportVectorizationFailure(
-        "Loop may fault",
-        "Cannot vectorize potentially faulting early exit loop",
-        "PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
-    return false;
-  }
-
   [[maybe_unused]] const SCEV *SymbolicMaxBTC =
       PSE.getSymbolicMaxBackedgeTakenCount();
   // Since we have an exact exit count for the latch and the early exit
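The new per-instruction handling of memory reads in this hunk is effectively a three-way classification. A reduced sketch, with a hypothetical `ReadInfo` struct standing in for the queries made on the real instruction:

```cpp
#include <cassert>

// Outcome of inspecting one instruction that may read from memory.
enum class ReadKind { Safe, VersionCandidate, Unsafe };

// Hypothetical summary of the properties the legality check queries.
struct ReadInfo {
  bool isPlainLoad;          // the instruction is a LoadInst
  bool knownDereferenceable; // proven dereferenceable for the whole loop
  bool inLoopHeader;         // the load lives in the loop header block
};

ReadKind classifyRead(const ReadInfo &r) {
  // Reads that are not plain loads (e.g. calls) are never handled.
  if (!r.isPlainLoad)
    return ReadKind::Unsafe;
  // Loads already proven dereferenceable need no runtime check.
  if (r.knownDereferenceable)
    return ReadKind::Safe;
  // Only header loads are collected for alignment-based versioning.
  return r.inLoopHeader ? ReadKind::VersionCandidate : ReadKind::Unsafe;
}
```

Any `Unsafe` result corresponds to the "Loop may fault" failure path, while `VersionCandidate` loads are the ones pushed onto `NonDerefLoads` for further analysis.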

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 64 additions & 6 deletions
@@ -401,6 +401,12 @@ static cl::opt<bool> EnableEarlyExitVectorization(
     cl::desc(
         "Enable vectorization of early exit loops with uncountable exits."));

+static cl::opt<unsigned> MaxNumPotentiallyFaultingPointers(
+    "max-num-faulting-pointers", cl::init(0), cl::Hidden,
+    cl::desc(
+        "The maximum number of potentially faulting pointers we permit when "
+        "vectorizing loops with uncountable exits."));
+
 // Likelyhood of bypassing the vectorized loop because assumptions about SCEV
 // variables not overflowing do not hold. See `emitSCEVChecks`.
 static constexpr uint32_t SCEVCheckBypassWeights[] = {1, 127};
@@ -2163,6 +2169,27 @@ class GeneratedRTChecks {
 };
 } // namespace

+static void addPointerAlignmentChecks(
+    const SmallVectorImpl<std::pair<const SCEV *, Type *>> *Ptrs, Function *F,
+    PredicatedScalarEvolution &PSE, TargetTransformInfo *TTI, ElementCount VF,
+    unsigned IC) {
+  ScalarEvolution *SE = PSE.getSE();
+  const DataLayout &DL = SE->getDataLayout();
+
+  for (auto Ptr : *Ptrs) {
+    Type *PtrIntType = DL.getIntPtrType(Ptr.first->getType());
+    APInt EltSize(PtrIntType->getScalarSizeInBits(),
+                  DL.getTypeStoreSize(Ptr.second).getFixedValue());
+    const SCEV *Start = SE->getPtrToIntExpr(Ptr.first, PtrIntType);
+    const SCEV *ScevEC = SE->getElementCount(PtrIntType, VF * IC);
+    const SCEV *Align =
+        SE->getMulExpr(ScevEC, SE->getConstant(EltSize),
+                       (SCEV::NoWrapFlags)(SCEV::FlagNSW | SCEV::FlagNUW));
+    const SCEV *Rem = SE->getURemExpr(Start, Align);
+    PSE.addPredicate(*(SE->getEqualPredicate(Rem, SE->getZero(PtrIntType))));
+  }
+}
+
 static bool useActiveLaneMask(TailFoldingStyle Style) {
   return Style == TailFoldingStyle::Data ||
          Style == TailFoldingStyle::DataAndControlFlow ||
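The predicate built by `addPointerAlignmentChecks` is `Start urem (VF * IC * EltSize) == 0`. Evaluated at runtime on concrete values it reduces to the following sketch. The function name is hypothetical, and for scalable vectors `vf` is assumed to already include the runtime value of vscale.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the first loaded address is a multiple of the number of
// bytes consumed per vector iteration (VF * IC elements of eltSizeBytes each).
bool passesAlignmentCheck(std::uintptr_t start, std::uint64_t vf,
                          std::uint64_t ic, std::uint64_t eltSizeBytes) {
  const std::uint64_t alignBytes = vf * ic * eltSizeBytes;
  assert(alignBytes != 0 && "VF, IC and element size must be non-zero");
  return start % alignBytes == 0;
}
```

With VF = 4 and 4-byte elements, a start address of 0x1010 passes for IC = 1 (16-byte requirement) but would fail for IC = 2 (32-byte requirement). When the check fails, the versioned code is expected to fall back to the scalar loop.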
@@ -3842,6 +3869,15 @@ bool LoopVectorizationCostModel::isScalableVectorizationAllowed() {
     return false;
   }

+  if (Legal->hasUncountableEarlyExit() &&
+      Legal->getNumPotentiallyFaultingPointers() &&
+      !TTI.isVScaleKnownToBeAPowerOfTwo()) {
+    reportVectorizationInfo("Cannot vectorize potentially faulting early exit "
+                            "loop with scalable vectors.",
+                            "ScalableVFUnfeasible", ORE, TheLoop);
+    return false;
+  }
+
   IsScalableVectorizationAllowed = true;
   return true;
 }
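The power-of-two requirement on vscale exists because an access aligned to a non-power-of-two width can still cross a page boundary. The sketch below searches for such a counterexample; the helper name and the `kNoStraddle` sentinel are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel meaning "no aligned access crosses a page boundary".
constexpr std::uint64_t kNoStraddle = UINT64_MAX;

// Returns the lowest address that is a multiple of widthBytes and whose
// widthBytes-sized access touches two pages, or kNoStraddle if none exists.
std::uint64_t firstStraddlingAlignedAddress(std::uint64_t widthBytes,
                                            std::uint64_t pageSize) {
  assert(widthBytes != 0 && pageSize != 0 && widthBytes <= pageSize);
  // Offsets within a page repeat after at most widthBytes * pageSize bytes,
  // so searching that range covers every possible case.
  for (std::uint64_t addr = 0; addr < widthBytes * pageSize;
       addr += widthBytes)
    if (addr / pageSize != (addr + widthBytes - 1) / pageSize)
      return addr;
  return kNoStraddle;
}
```

For a 4096-byte page, a 64-byte width never straddles, whereas a 48-byte width (as might arise if vscale were 3 with a 16-byte minimum vector) first straddles at address 4080, covering bytes 4080 to 4127.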
@@ -10508,11 +10544,25 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     return false;
   }

-  if (LVL.hasUncountableEarlyExit() && !EnableEarlyExitVectorization) {
-    reportVectorizationFailure("Auto-vectorization of loops with uncountable "
-                               "early exit is not enabled",
-                               "UncountableEarlyExitLoopsDisabled", ORE, L);
-    return false;
+  if (LVL.hasUncountableEarlyExit()) {
+    if (!EnableEarlyExitVectorization) {
+      reportVectorizationFailure("Auto-vectorization of loops with uncountable "
+                                 "early exit is not enabled",
+                                 "UncountableEarlyExitLoopsDisabled", ORE, L);
+      return false;
+    }
+
+    unsigned NumPotentiallyFaultingPointers =
+        LVL.getNumPotentiallyFaultingPointers();
+    if (NumPotentiallyFaultingPointers > MaxNumPotentiallyFaultingPointers) {
+      reportVectorizationFailure("Not worth vectorizing loop with uncountable "
+                                 "early exit, due to number of potentially "
+                                 "faulting loads",
+                                 "UncountableEarlyExitMayFault", ORE, L);
+      return false;
+    } else if (NumPotentiallyFaultingPointers)
+      LLVM_DEBUG(dbgs() << "LV: Need to version early-exit vector loop with "
+                        << "pointer alignment checks.\n");
   }

   // Entrance to the VPlan-native vectorization path. Outer loops are processed
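Because `max-num-faulting-pointers` defaults to 0, the gating above rejects every loop with a potentially faulting pointer unless the option is raised. A small sketch of that decision, using a hypothetical enum and function name:

```cpp
#include <cassert>

enum class EarlyExitDecision {
  Reject,
  Vectorize,
  VectorizeWithAlignmentChecks
};

// Mirrors the order of the checks in processLoop for a loop that is already
// known to have an uncountable early exit. maxFaultingPtrs models the
// max-num-faulting-pointers option and keeps its default of 0.
EarlyExitDecision decideEarlyExitLoop(bool earlyExitVectorizationEnabled,
                                      unsigned numFaultingPtrs,
                                      unsigned maxFaultingPtrs = 0) {
  if (!earlyExitVectorizationEnabled)
    return EarlyExitDecision::Reject;
  if (numFaultingPtrs > maxFaultingPtrs)
    return EarlyExitDecision::Reject;
  return numFaultingPtrs != 0
             ? EarlyExitDecision::VectorizeWithAlignmentChecks
             : EarlyExitDecision::Vectorize;
}
```

With the default limit, one potentially faulting pointer leads to rejection; raising the limit to 1 selects the versioned path with pointer alignment checks.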
@@ -10663,8 +10713,16 @@ bool LoopVectorizePass::processLoop(Loop *L) {
   unsigned SelectedIC = std::max(IC, UserIC);
   // Optimistically generate runtime checks if they are needed. Drop them if
   // they turn out to not be profitable.
-  if (VF.Width.isVector() || SelectedIC > 1)
+  if (VF.Width.isVector() || SelectedIC > 1) {
+    if (LVL.getNumPotentiallyFaultingPointers()) {
+      assert(!CM.foldTailWithEVL() &&
+             "Explicit vector length unsupported for early exit loops and "
+             "potentially faulting loads");
+      addPointerAlignmentChecks(LVL.getPotentiallyFaultingPointers(), F, PSE,
+                                TTI, VF.Width, SelectedIC);
+    }
     Checks.create(L, *LVL.getLAI(), PSE.getPredicate(), VF.Width, SelectedIC);
+  }

   // Check if it is profitable to vectorize with runtime checks.
   bool ForceVectorization =
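To make the overall effect of the versioning concrete, here is a hand-written analogue of what the transformed loop might look like for a simple search. It is a sketch only: `kVF`, `findFirstScalar` and `findFirstVersioned` are hypothetical, the inner lane loop merely emulates a vector compare, and the code the vectoriser actually generates is structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kVF = 4; // assumed fixed vectorisation factor

// Original scalar loop: a counted latch exit plus an uncountable early exit.
std::size_t findFirstScalar(const std::int32_t *p, std::size_t n,
                            std::int32_t value, std::size_t start = 0) {
  for (std::size_t i = start; i < n; ++i)
    if (p[i] == value)
      return i;
  return n;
}

// Versioned form: take the chunked path only if the pointer is aligned to the
// chunk size, otherwise fall back to the scalar loop.
std::size_t findFirstVersioned(const std::int32_t *p, std::size_t n,
                               std::int32_t value) {
  const std::size_t chunkBytes = kVF * sizeof(std::int32_t);
  if (reinterpret_cast<std::uintptr_t>(p) % chunkBytes != 0)
    return findFirstScalar(p, n, value); // runtime check failed

  std::size_t i = 0;
  for (; i + kVF <= n; i += kVF) {
    bool anyMatch = false;
    for (std::size_t lane = 0; lane < kVF; ++lane) // emulated vector compare
      anyMatch |= (p[i + lane] == value);
    if (anyMatch)
      break; // early exit taken somewhere in this chunk
  }
  // Locate the exact element in the matching chunk, or finish the remainder.
  return findFirstScalar(p, n, value, i);
}
```

Both functions return the same index for any input; the alignment test only decides which path computes it.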
