[VPlan] Fix VPTypeAnalysis cache clobbering in EVL transform #120252
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

When building SPEC CPU 2017 with RISC-V and EVL tail folding, this assertion in VPTypeAnalysis would trigger during the transformation to EVL recipes: llvm-project/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp, lines 135 to 142 in d8a0709.
It was caused by this recipe having its type inferred as i16, when ir<%add33> and ir<0> somehow had inferred types of i32. This turned out to be because the VPTypeAnalysis cache was getting clobbered: the transform erases recipes but keeps around the same mapping from VPValue* to Type*. In the meantime, new recipes are created that can end up at the same address as an erased value, and they then incorrectly pick up the erased VPValue*'s cached type.
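A minimal sketch of the hazard (hypothetical code, not the actual VPlan sources: the bare DenseMap stands in for VPTypeAnalysis's internal cache, and the BuildEVLRecipe callback for whatever creates the replacement recipe):

static Type *staleLookupHazard(VPRecipeBase *Old, LLVMContext &Ctx,
                               function_ref<VPRecipeBase *()> BuildEVLRecipe) {
  // VPTypeAnalysis conceptually keys its cache on VPValue pointers.
  DenseMap<VPValue *, Type *> Cache;

  // Suppose Old's result was (correctly, at the time) inferred as i16.
  Cache[Old->getVPSingleValue()] = Type::getInt16Ty(Ctx);

  Old->eraseFromParent();               // frees Old; the cache entry now dangles
  VPRecipeBase *New = BuildEVLRecipe(); // the allocator may reuse Old's address

  // If New's VPValue lands at the erased value's address, this lookup "hits"
  // and returns the stale i16 instead of inferring New's real type.
  return Cache.lookup(New->getVPSingleValue());
}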
This fixes it by not sharing the VPTypeAnalysis across iterations of the loop, i.e. the cache is cleared after anything is erased. The test case might be a bit flaky since it just happens to have the right conditions to recreate this. I tried to add an assert in inferScalarType that every VPValue in the cache was valid, but couldn't find a way of telling whether a VPValue had been erased. Another alternative would be to add a method to remove specific VPValues from the cache, but it looks like recursivelyDeleteDeadRecipes might also erase other recipes too.
Full diff: https://github.com/llvm/llvm-project/pull/120252.diff
2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 9a3b82fe57c12a..bfff2504c393b4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1446,7 +1446,6 @@ void VPlanTransforms::addActiveLaneMask(
static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
using namespace llvm::VPlanPatternMatch;
Type *CanonicalIVType = Plan.getCanonicalIV()->getScalarType();
- VPTypeAnalysis TypeInfo(CanonicalIVType);
LLVMContext &Ctx = CanonicalIVType->getContext();
SmallVector<VPValue *> HeaderMasks = collectAllHeaderMasks(Plan);
@@ -1463,6 +1462,10 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
return HeaderMask == OrigMask ? nullptr : OrigMask;
};
+  // Don't preserve the type info cache across loop iterations since
+ // CurRecipe will be erased and invalidate it.
+ VPTypeAnalysis TypeInfo(CanonicalIVType);
+
VPRecipeBase *NewRecipe =
TypeSwitch<VPRecipeBase *, VPRecipeBase *>(CurRecipe)
.Case<VPWidenLoadRecipe>([&](VPWidenLoadRecipe *L) {
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
new file mode 100644
index 00000000000000..4ab129d8b3ac39
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
@@ -0,0 +1,121 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=loop-vectorize -force-tail-folding-style=data-with-evl -prefer-predicate-over-epilogue=predicate-dont-vectorize -mtriple=riscv64 -mattr=+v -S %s | FileCheck %s
+
+; This test tries to recreate the conditions for a crash that occurred when the
+; VPTypeAnalysis cache wasn't cleared after a recipe was erased and clobbered
+; with a new one.
+
+define void @type_info_cache_clobber(ptr %dstv, ptr %src, i64 %wide.trip.count) {
+; CHECK-LABEL: define void @type_info_cache_clobber(
+; CHECK-SAME: ptr [[DSTV:%.*]], ptr [[SRC:%.*]], i64 [[WIDE_TRIP_COUNT:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[WIDE_TRIP_COUNT]], 1
+; CHECK-NEXT: [[TMP1:%.*]] = sub i64 -1, [[TMP0]]
+; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 8
+; CHECK-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP1]], [[TMP3]]
+; CHECK-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; CHECK: [[VECTOR_MEMCHECK]]:
+; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DSTV]], i64 1
+; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[WIDE_TRIP_COUNT]], 1
+; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP5]]
+; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[DSTV]], [[SCEVGEP1]]
+; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SRC]], [[SCEVGEP]]
+; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
+; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[TMP6]], 8
+; CHECK-NEXT: [[TMP8:%.*]] = sub i64 [[TMP7]], 1
+; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[TMP0]], [[TMP8]]
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP7]]
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; CHECK-NEXT: [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[TMP10:%.*]] = mul i64 [[TMP9]], 8
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x ptr> poison, ptr [[DSTV]], i64 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x ptr> [[BROADCAST_SPLATINSERT]], <vscale x 8 x ptr> poison, <vscale x 8 x i32> zeroinitializer
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT: [[EVL_BASED_IV:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT: [[AVL:%.*]] = sub i64 [[TMP0]], [[EVL_BASED_IV]]
+; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 8, i1 true)
+; CHECK-NEXT: [[TMP12:%.*]] = add i64 [[EVL_BASED_IV]], 0
+; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP12]]
+; CHECK-NEXT: [[TMP14:%.*]] = getelementptr i8, ptr [[TMP13]], i32 0
+; CHECK-NEXT: [[VP_OP_LOAD:%.*]] = call <vscale x 8 x i8> @llvm.vp.load.nxv8i8.p0(ptr align 1 [[TMP14]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]]), !alias.scope [[META0:![0-9]+]]
+; CHECK-NEXT: [[TMP15:%.*]] = call <vscale x 8 x i32> @llvm.vp.zext.nxv8i32.nxv8i8(<vscale x 8 x i8> [[VP_OP_LOAD]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT: [[VP_OP:%.*]] = call <vscale x 8 x i32> @llvm.vp.mul.nxv8i32(<vscale x 8 x i32> [[TMP15]], <vscale x 8 x i32> zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT: [[VP_OP2:%.*]] = call <vscale x 8 x i32> @llvm.vp.ashr.nxv8i32(<vscale x 8 x i32> [[TMP15]], <vscale x 8 x i32> zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT: [[VP_OP3:%.*]] = call <vscale x 8 x i32> @llvm.vp.or.nxv8i32(<vscale x 8 x i32> [[VP_OP2]], <vscale x 8 x i32> zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT: [[TMP16:%.*]] = icmp ult <vscale x 8 x i32> [[TMP15]], zeroinitializer
+; CHECK-NEXT: [[TMP17:%.*]] = call <vscale x 8 x i32> @llvm.vp.select.nxv8i32(<vscale x 8 x i1> [[TMP16]], <vscale x 8 x i32> [[VP_OP3]], <vscale x 8 x i32> zeroinitializer, i32 [[TMP11]])
+; CHECK-NEXT: [[TMP18:%.*]] = call <vscale x 8 x i8> @llvm.vp.trunc.nxv8i8.nxv8i32(<vscale x 8 x i32> [[TMP17]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT: call void @llvm.vp.scatter.nxv8i8.nxv8p0(<vscale x 8 x i8> [[TMP18]], <vscale x 8 x ptr> align 1 [[BROADCAST_SPLAT]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]]), !alias.scope [[META3:![0-9]+]], !noalias [[META0]]
+; CHECK-NEXT: [[TMP19:%.*]] = call <vscale x 8 x i16> @llvm.vp.trunc.nxv8i16.nxv8i32(<vscale x 8 x i32> [[VP_OP]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT: call void @llvm.vp.scatter.nxv8i16.nxv8p0(<vscale x 8 x i16> [[TMP19]], <vscale x 8 x ptr> align 2 zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT: [[TMP20:%.*]] = zext i32 [[TMP11]] to i64
+; CHECK-NEXT: [[INDEX_EVL_NEXT]] = add i64 [[TMP20]], [[EVL_BASED_IV]]
+; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP10]]
+; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP21]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT: br label %[[LOOP:.*]]
+; CHECK: [[LOOP]]:
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[IV]]
+; CHECK-NEXT: [[TMP22:%.*]] = load i8, ptr [[ARRAYIDX13]], align 1
+; CHECK-NEXT: [[CONV14:%.*]] = zext i8 [[TMP22]] to i32
+; CHECK-NEXT: [[MUL21_NEG:%.*]] = mul i32 [[CONV14]], 0
+; CHECK-NEXT: [[ADD33:%.*]] = ashr i32 [[CONV14]], 0
+; CHECK-NEXT: [[SHR:%.*]] = or i32 [[ADD33]], 0
+; CHECK-NEXT: [[TOBOOL_NOT_I:%.*]] = icmp ult i32 [[CONV14]], 0
+; CHECK-NEXT: [[COND_I:%.*]] = select i1 [[TOBOOL_NOT_I]], i32 [[SHR]], i32 0
+; CHECK-NEXT: [[CONV_I:%.*]] = trunc i32 [[COND_I]] to i8
+; CHECK-NEXT: store i8 [[CONV_I]], ptr [[DSTV]], align 1
+; CHECK-NEXT: [[CONV36:%.*]] = trunc i32 [[MUL21_NEG]] to i16
+; CHECK-NEXT: store i16 [[CONV36]], ptr null, align 2
+; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
+; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV]], [[WIDE_TRIP_COUNT]]
+; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
+ %arrayidx13 = getelementptr i8, ptr %src, i64 %iv
+ %0 = load i8, ptr %arrayidx13, align 1
+ %conv14 = zext i8 %0 to i32
+ %mul21.neg = mul i32 %conv14, 0
+ %add33 = ashr i32 %conv14, 0
+ %shr = or i32 %add33, 0
+ %tobool.not.i = icmp ult i32 %conv14, 0
+ %cond.i = select i1 %tobool.not.i, i32 %shr, i32 0
+ %conv.i = trunc i32 %cond.i to i8
+ store i8 %conv.i, ptr %dstv, align 1
+ %conv36 = trunc i32 %mul21.neg to i16
+ store i16 %conv36, ptr null, align 2
+ %iv.next = add i64 %iv, 1
+ %ec = icmp eq i64 %iv, %wide.trip.count
+ br i1 %ec, label %exit, label %loop
+
+exit:
+ ret void
+}
+;.
+; CHECK: [[META0]] = !{[[META1:![0-9]+]]}
+; CHECK: [[META1]] = distinct !{[[META1]], [[META2:![0-9]+]]}
+; CHECK: [[META2]] = distinct !{[[META2]], !"LVerDomain"}
+; CHECK: [[META3]] = !{[[META4:![0-9]+]]}
+; CHECK: [[META4]] = distinct !{[[META4]], [[META2]]}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META6:![0-9]+]], [[META7:![0-9]+]]}
+; CHECK: [[META6]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META7]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META6]]}
+;.
How about an alternative approach: actually deleting the dead recipes after the transformation?
Yes, that would probably be good.
Yeah, I had considered that, but I thought recursivelyDeleteDeadRecipes might make it a bit tricky since it erases recipes in flight. But I don't see why we couldn't just run it once at the end instead of after each header mask. I'll give that a try.
You probably need to collect the recipes to remove, as some of the replaced recipes won't be considered dead (e.g. store recipes).
}
recursivelyDeleteDeadRecipes(HeaderMask);
Can you add HeaderMask to be erased here?
Is it possible that HeaderMask might still be used by a recipe if it can't be converted to an EVL recipe?
In that case, maybe just recursively delete the recipes to erase, which will then automatically clean up the header mask if it becomes dead? All the recipes you are queuing should be VPSingleDefRecipe*:

for (VPRecipeBase *R : ToErase)
  recursivelyDeleteDeadRecipes(R);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the recipes are stores with zero values, but I was able to copy the trick from optimizeForVFAndUF, where you erase the recipe and then call recursivelyDeleteDeadRecipes on its operands.
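A rough sketch of that deferred-erase shape (assuming a ToErase vector collected while walking the plan; this mirrors the pattern in optimizeForVFAndUF rather than quoting the final patch):

SmallVector<VPRecipeBase *> ToErase;
// ... the EVL transform loop appends each replaced recipe to ToErase ...

for (VPRecipeBase *R : reverse(ToErase)) {
  // Snapshot the operands before erasing R, since erasing may leave some
  // of them (e.g. the header mask) without any remaining users.
  SmallVector<VPValue *> PossiblyDead(R->operands());
  R->eraseFromParent();
  for (VPValue *Op : PossiblyDead)
    recursivelyDeleteDeadRecipes(Op);
}

The idea from the discussion above is that the header mask is never erased directly; once its last user is removed it becomes dead and gets cleaned up recursively.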
Co-authored-by: Florian Hahn <[email protected]>
LGTM, thanks!
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/10510
Here is the relevant piece of the build log for reference:

LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/10886
Here is the relevant piece of the build log for reference:

LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/14764
Here is the relevant piece of the build log for reference:

LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/15472
Here is the relevant piece of the build log for reference:

LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/8474
Here is the relevant piece of the build log for reference:
Buildbot failures should hopefully be fixed in b1f4a02.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/17891
Here is the relevant piece of the build log for reference:
When building SPEC CPU 2017 with RISC-V and EVL tail folding, this assertion in VPTypeAnalysis would trigger during the transformation to EVL recipes:
llvm-project/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp, lines 135 to 142 in d8a0709
It was caused by this recipe having its type inferred as i16, when ir<%add33> and ir<0> somehow had inferred types of i32.
This turned out to be because the VPTypeAnalysis cache was getting clobbered: the transform erases recipes but keeps around the same mapping from VPValue* to Type*. In the meantime, new recipes are created that can end up at the same address as an erased value, and they then incorrectly pick up the erased VPValue*'s cached type.
This is fixed by deferring the erasing of recipes until after the transformation.
The test case might be a bit flaky since it just happens to have the right conditions to recreate this. I tried to add an assert in inferScalarType that every VPValue in the cache was valid, but couldn't find a way of telling whether a VPValue had been erased.