-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LV] Split checking if tail-folding is possible, collecting masked ops. #77612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LV] Split checking if tail-folding is possible, collecting masked ops. #77612
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesIntroduce new canFoldTail helper which only checks if tail-folding is possible, but without modifying MaskedOps. Just because tail-folding is possible doesn't mean the tail will be folded; that's up to the cost-model to decide. Separating the check if tail-folding is possible and preparing for tail-folding makes sure that MaskedOps is only populated when tail-folding is actually selected. This allows only creating the header mask if needed after #76635. Full diff: https://github.com/llvm/llvm-project/pull/77612.diff 3 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index a509ebf6a7e1b3..af23813da569b5 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -276,9 +276,12 @@ class LoopVectorizationLegality {
bool canVectorizeFPMath(bool EnableStrictReductions);
/// Return true if we can vectorize this loop while folding its tail by
- /// masking, and mark all respective loads/stores for masking.
- /// This object's state is only modified iff this function returns true.
- bool prepareToFoldTailByMasking();
+ /// masking.
+ bool canFoldTailByMasking() const;
+
+ /// Mark all respective loads/stores for masking. Must only be called when
+ /// ail-folding is possible.
+ void prepareToFoldTailByMasking();
/// Returns the primary induction variable.
PHINode *getPrimaryInduction() { return PrimaryInduction; }
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 37a356c43e29a4..88fabf620a83ab 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1525,7 +1525,7 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
return Result;
}
-bool LoopVectorizationLegality::prepareToFoldTailByMasking() {
+bool LoopVectorizationLegality::canFoldTailByMasking() const {
LLVM_DEBUG(dbgs() << "LV: checking if tail can be folded by masking.\n");
@@ -1570,8 +1570,24 @@ bool LoopVectorizationLegality::prepareToFoldTailByMasking() {
LLVM_DEBUG(dbgs() << "LV: can fold tail by masking.\n");
- MaskedOp.insert(TmpMaskedOp.begin(), TmpMaskedOp.end());
return true;
}
+void LoopVectorizationLegality::prepareToFoldTailByMasking() {
+ // The list of pointers that we can safely read and write to remains empty.
+ SmallPtrSet<Value *, 8> SafePointers;
+
+ // Collect masked ops in temporary set first to avoid partially populating
+ // MaskedOp if a block cannot be predicated.
+ SmallPtrSet<const Instruction *, 8> TmpMaskedOp;
+
+ // Check and mark all blocks for predication, including those that ordinarily
+ // do not need predication such as the header block.
+ for (BasicBlock *BB : TheLoop->blocks()) {
+ bool R = blockCanBePredicated(BB, SafePointers, MaskedOp);
+ (void)R;
+ assert(R && "Must be able to predicate block when tail-folding.");
+ }
+}
+
} // namespace llvm
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 51ce88480c0880..d2506bd2aee6ae 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4689,7 +4689,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
// found modulo the vectorization factor is not zero, try to fold the tail
// by masking.
// FIXME: look for a smaller MaxVF that does divide TC rather than masking.
- if (Legal->prepareToFoldTailByMasking()) {
+ if (Legal->canFoldTailByMasking()) {
CanFoldTailByMasking = true;
return MaxFactors;
}
@@ -7307,6 +7307,9 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
CM.invalidateCostModelingDecisions();
}
+ if (CM.foldTailByMasking())
+ Legal->prepareToFoldTailByMasking();
+
ElementCount MaxUserVF =
UserVF.isScalable() ? MaxFactors.ScalableVF : MaxFactors.FixedVF;
bool UserVFIsLegal = ElementCount::isKnownLE(UserVF, MaxUserVF);
@@ -8680,10 +8683,12 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
VPBB->setName(BB->getName());
Builder.setInsertPoint(VPBB);
- if (VPBB == HeaderVPBB)
- RecipeBuilder.createHeaderMask(*Plan);
- else if (NeedsMasks)
- RecipeBuilder.createBlockInMask(BB, *Plan);
+ if (NeedsMasks) {
+ if (VPBB == HeaderVPBB)
+ RecipeBuilder.createHeaderMask(*Plan);
+ else
+ RecipeBuilder.createBlockInMask(BB, *Plan);
+ }
// Introduce each ingredient into VPlan.
// TODO: Model and preserve debug intrinsics in VPlan.
|
507e366
to
35741d9
Compare
35741d9
to
2aaf792
Compare
Introduce new canFoldTail helper which only checks if tail-folding is possible, but without modifying MaskedOps. Just because tail-folding is possible doesn't mean the tail will be folded; that's up to the cost-model to decide. Separating the check if tail-folding is possible and preparing for tail-folding makes sure that MaskedOps is only populated when tail-folding is actually selected. This allows only creating the header mask if needed after llvm#76635.
2aaf792
to
9fa9457
Compare
Updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG with a nit
bool R = blockCanBePredicated(BB, SafePointers, MaskedOp); | ||
(void)R; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool R = blockCanBePredicated(BB, SafePointers, MaskedOp); | |
(void)R; | |
[[maybe_unused]] bool R = blockCanBePredicated(BB, SafePointers, MaskedOp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2, adding various nits.
bool canFoldTailByMasking() const; | ||
|
||
/// Mark all respective loads/stores for masking. Must only be called when | ||
/// ail-folding is possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// ail-folding is possible. | |
/// tail-folding is possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed thanks!
return true; | ||
} | ||
|
||
void LoopVectorizationLegality::prepareToFoldTailByMasking() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above comments worth updating:
// Collect masked ops in temporary set first to avoid partially populating
// MaskedOp if a block cannot be predicated.
comment needs rephrasing.
// Check and mark all blocks for predication, including those that ordinarily
// do not need predication such as the header block.
only Check, w/o mark
LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking as requested.\n");
drop "as requested"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
// Collect masked ops in temporary set first to avoid partially populating | ||
// MaskedOp if a block cannot be predicated. | ||
SmallPtrSet<const Instruction *, 8> TmpMaskedOp; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Collect masked ops in temporary set first to avoid partially populating | |
// MaskedOp if a block cannot be predicated. | |
SmallPtrSet<const Instruction *, 8> TmpMaskedOp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed thanks!
// MaskedOp if a block cannot be predicated. | ||
SmallPtrSet<const Instruction *, 8> TmpMaskedOp; | ||
|
||
// Check and mark all blocks for predication, including those that ordinarily |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Check and mark all blocks for predication, including those that ordinarily | |
// Mark all blocks for predication, including those that ordinarily |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
// Check and mark all blocks for predication, including those that ordinarily | ||
// do not need predication such as the header block. | ||
for (BasicBlock *BB : TheLoop->blocks()) { | ||
bool R = blockCanBePredicated(BB, SafePointers, MaskedOp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth considering splitting this into two, as checking if block can be predicated and collecting its masked ops (if it can) seem two distinct operations. Both avoid considering SafePointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left as is for now for this PR to keep the number of changes small, but will check as follow-up. More generally, we should probably collect and consider safe pointers with tail folding as well
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/41/builds/456 Here is the relevant piece of the build log for the reference:
|
Introduce new canFoldTail helper which only checks if tail-folding is possible, but without modifying MaskedOps.
Just because tail-folding is possible doesn't mean the tail will be folded; that's up to the cost-model to decide. Separating the check if tail-folding is possible and preparing for tail-folding makes sure that MaskedOps is only populated when tail-folding is actually selected.
This allows only creating the header mask if needed after #76635.