-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[SLP] Avoid crash in computeExtractCost #93188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
For a downstream target we ended up in a situation with assertion failures in ShuffleCostEstimator::computeExtractCost. Input IR looked like this: define void @foo(ptr %p0, <64 x i32> %vec) { %p1 = getelementptr i32, ptr %p0, i16 1 %p2 = getelementptr i32, ptr %p0, i16 2 %p3 = getelementptr i32, ptr %p0, i16 3 %p4 = getelementptr i32, ptr %p0, i16 4 %p5 = getelementptr i32, ptr %p0, i16 5 %p6 = getelementptr i32, ptr %p0, i16 6 %p7 = getelementptr i32, ptr %p0, i16 7 %elt = extractelement <64 x i32> %vec, i32 0 store i32 %elt, ptr %p0 store i32 %elt, ptr %p1 store i32 %elt, ptr %p2 store i32 %elt, ptr %p3 store i32 %elt, ptr %p4 store i32 %elt, ptr %p5 store i32 %elt, ptr %p6 store i32 %elt, ptr %p7 ret void } And the scenario was like this: - VL and Mask has 8 elements at entry to computeExtractCost. - NumParts is 2 (v8i32 is not legal, but v4i32 is legal). - NumElts is calculated as 64 (given by extractelement <64 x i32>). - NumSrcRegs is calculated and is set to 1 (v64i32 is legal). - EltsPerVector is calculated as 64 (given by NumElts/NumSrcRegs). - Assertion failure happens when doing ArrayRef<int> MaskSlice = Mask.slice(Part * EltsPerVector, (Part == NumParts - 1 && Mask.size() % EltsPerVector != 0) ? Mask.size() % EltsPerVector : EltsPerVector); since EltsPerVector is larger than Mask.size() already for Part==0. This patch resolved the issue by making sure that we slice up Mask in at most EltsPerVector pieces until we have covered the full Mask. When we have covered all elements in Mask we break the loop. Haven't been able to reproduce this scenario for any in-tree target. So unfortunately there is no regression test included in the patch.
@llvm/pr-subscribers-llvm-transforms Author: Björn Pettersson (bjope) ChangesFor a downstream target we ended up in a situation with assertion failures in ShuffleCostEstimator::computeExtractCost. Input IR looked like this: define void @foo(ptr %p0, <64 x i32> %vec) { And the scenario was like this:
This patch resolved the issue by making sure that we slice up Mask in at most EltsPerVector pieces until we have covered the full Mask. When we have covered all elements in Mask we break the loop. Haven't been able to reproduce this scenario for any in-tree target. So unfortunately there is no regression test included in the patch. Full diff: https://github.com/llvm/llvm-project/pull/93188.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 08ecbe304429e..f25ddf6bd8082 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -8318,11 +8318,13 @@ class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
for (unsigned Part = 0; Part < NumParts; ++Part) {
if (!ShuffleKinds[Part])
continue;
- ArrayRef<int> MaskSlice =
- Mask.slice(Part * EltsPerVector,
- (Part == NumParts - 1 && Mask.size() % EltsPerVector != 0)
- ? Mask.size() % EltsPerVector
- : EltsPerVector);
+ unsigned SliceStart = Part * EltsPerVector;
+ if (SliceStart >= Mask.size())
+ break;
+ unsigned SliceSize = (SliceStart + EltsPerVector) > Mask.size()
+ ? Mask.size() - SliceStart
+ : EltsPerVector;
+ ArrayRef<int> MaskSlice = Mask.slice(SliceStart, SliceSize);
SmallVector<int> SubMask(EltsPerVector, PoisonMaskElem);
copy(MaskSlice, SubMask.begin());
std::optional<TTI::ShuffleKind> RegShuffleKind =
|
Hi @alexey-bataev , do you think this patch looks OK despite not having any in-tree reproducer? I could not really find any matching piece of code doing the slicing based on EltsPerVector like that, but I wonder a bit if the cost estimates here need to match some other piece of code used to emit the actual shuffles? When I looked at this I also noticed some other things that looked suspect to me. ShuffleCostEstimator::adjustExtracts is calculating
won't cover for the full VL/mask. There are a couple of more places with similar code (e.g. BoUpSLP::tryToGatherExtractElements). It is not clear to me if this is a real problem. But the code seems to have some assumptions regarding how NumParts relates to VL.size() and Mask.size() etc. But it is not quite clear what those assumptions are. |
It is just an unexpected situation, where <8 x i32> has 2 parts, while <64 x i32> has just one. Is this correct at all? Actually, all this code must be moved to TTI and calculated there for best cost estimation, this is just a temporary solution. |
adjustExtracts code is correct for power-of-2 number of elements, but not correct for non-power-of-2 number of elements. I will fix it. |
I actually find the existing TTI.getNumberOfParts helper a bit weird (at least the default BasicTTIImple version). It is implemented as checking for some type legalization cost. Type legalization might involve scalarization/widening/splitting etc. So the number returned is some sort of cost (how many steps involved in legalizing the type?). But SLPVectorizer is for example treating it as NumSrcRegs. |
TTI.getNumberOfParts is correct in general. I'll try to fix the code to support such non-standard situations. |
For a downstream target we ended up in a situation with assertion failures in ShuffleCostEstimator::computeExtractCost.
Input IR looked like this:
define void @foo(ptr %p0, <64 x i32> %vec) {
%p1 = getelementptr i32, ptr %p0, i16 1
%p2 = getelementptr i32, ptr %p0, i16 2
%p3 = getelementptr i32, ptr %p0, i16 3
%p4 = getelementptr i32, ptr %p0, i16 4
%p5 = getelementptr i32, ptr %p0, i16 5
%p6 = getelementptr i32, ptr %p0, i16 6
%p7 = getelementptr i32, ptr %p0, i16 7
%elt = extractelement <64 x i32> %vec, i32 0
store i32 %elt, ptr %p0
store i32 %elt, ptr %p1
store i32 %elt, ptr %p2
store i32 %elt, ptr %p3
store i32 %elt, ptr %p4
store i32 %elt, ptr %p5
store i32 %elt, ptr %p6
store i32 %elt, ptr %p7
ret void
}
And the scenario was like this:
This patch resolved the issue by making sure that we slice up Mask in at most EltsPerVector pieces until we have covered the full Mask. When we have covered all elements in Mask we break the loop.
Haven't been able to reproduce this scenario for any in-tree target. So unfortunately there is no regression test included in the patch.