[LV][AArch64] Don't query registers for illegal scalable vector elts #109411
Conversation
When trying to maximize vector bandwidth we ask TTI for the number of registers required for a given operation. If the type of that operation happens to be something illegal for scalable vectors (e.g. <vscale x 4 x fp128>) then we would see a crash. Instead, just return a default value and let the cost model reject the invalid operation later.
@llvm/pr-subscribers-llvm-transforms

Author: Graham Hunter (huntergr-arm)

Changes

When trying to maximize vector bandwidth we ask TTI for the number of registers required for a given operation. If the type of that operation happens to be something illegal for scalable vectors (e.g. <vscale x 4 x fp128>) then we would see a crash. Instead, just return a default value and let the cost model reject the invalid operation later.

Full diff: https://github.com/llvm/llvm-project/pull/109411.diff

2 Files Affected:
- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+3-1)
- llvm/test/Transforms/LoopVectorize/AArch64/scalable-fp-ext-trunc-illegal-type.ll (+81-0)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 9f554827a8287d..66c2eeed17d38b 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5207,7 +5207,9 @@ LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<ElementCount> VFs) {
const auto &TTICapture = TTI;
auto GetRegUsage = [&TTICapture](Type *Ty, ElementCount VF) -> unsigned {
- if (Ty->isTokenTy() || !VectorType::isValidElementType(Ty))
+ if (Ty->isTokenTy() || !VectorType::isValidElementType(Ty) ||
+ (VF.isScalable() &&
+ !TTICapture.isElementTypeLegalForScalableVector(Ty)))
return 0;
return TTICapture.getRegUsageForType(VectorType::get(Ty, VF));
};
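For readers skimming the patch, here is a minimal standalone sketch of what the new guard does. It is illustrative only: MockTTI, ElemKind and the fp128-only legality rule below are stand-ins, not LLVM's actual TargetTransformInfo API. The point is that the register-usage query now returns 0 before asking the target anything when a scalable VF is paired with an element type the target cannot hold in a scalable vector, leaving the later cost-model check to reject the operation as Invalid.

// Simplified stand-in for the guarded register-usage query above.
// ElemKind and the legality rule are hypothetical; the real code calls
// TargetTransformInfo::isElementTypeLegalForScalableVector on the IR type.
#include <cstdio>

enum class ElemKind { I32, F32, F64, FP128 };

struct MockTTI {
  // Hypothetical SVE-like rule: fp128 elements are not legal in scalable
  // vectors; the other element kinds here are.
  bool isElementTypeLegalForScalableVector(ElemKind K) const {
    return K != ElemKind::FP128;
  }
  // Placeholder register estimate; the real query consults the target.
  unsigned getRegUsageForType(ElemKind, unsigned ElemsPerVector) const {
    return (ElemsPerVector + 3) / 4;
  }
};

unsigned getRegUsage(const MockTTI &TTI, ElemKind Ty, unsigned VF,
                     bool Scalable) {
  // Mirrors the patch: bail out with 0 before querying the target when a
  // scalable VF is paired with an element type it cannot legally hold.
  if (Scalable && !TTI.isElementTypeLegalForScalableVector(Ty))
    return 0;
  return TTI.getRegUsageForType(Ty, VF);
}

int main() {
  MockTTI TTI;
  std::printf("fp128  @ vscale x 4 -> %u registers (guarded)\n",
              getRegUsage(TTI, ElemKind::FP128, 4, /*Scalable=*/true));
  std::printf("double @ vscale x 4 -> %u registers\n",
              getRegUsage(TTI, ElemKind::F64, 4, /*Scalable=*/true));
  return 0;
}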
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-fp-ext-trunc-illegal-type.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-fp-ext-trunc-illegal-type.ll
new file mode 100644
index 00000000000000..bb89ead3ca303c
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-fp-ext-trunc-illegal-type.ll
@@ -0,0 +1,81 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt < %s -passes=loop-vectorize -debug-only=loop-vectorize -vectorizer-maximize-bandwidth -S 2>&1 | FileCheck %s
+; REQUIRES: asserts
+
+target triple = "aarch64-unknown-linux-gnu"
+
+;; Make sure we reject scalable vectors for fp128 types. We were previously
+;; crashing before reaching the cost model when checking for the number of
+;; registers required for a <vscale x 4 x fp128> when trying to maximize
+;; vector bandwidth with SVE.
+
+; CHECK: LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %load.ext = fpext double %load.in to fp128
+
+define void @load_ext_trunc_store(ptr readonly %in, ptr noalias %out, i64 %N) #0 {
+; CHECK-LABEL: define void @load_ext_trunc_store(
+; CHECK-SAME: ptr readonly [[IN:%.*]], ptr noalias [[OUT:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N]], i64 1)
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[UMAX]], 4
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[UMAX]], 4
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[UMAX]], [[N_MOD_VF]]
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, ptr [[IN]], i64 [[TMP0]]
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds double, ptr [[TMP2]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, ptr [[TMP4]], align 8
+; CHECK-NEXT: [[TMP3:%.*]] = fpext <4 x double> [[WIDE_LOAD]] to <4 x fp128>
+; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, ptr [[OUT]], i64 [[TMP0]]
+; CHECK-NEXT: [[TMP5:%.*]] = fptrunc <4 x fp128> [[TMP3]] to <4 x float>
+; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, ptr [[TMP8]], i32 0
+; CHECK-NEXT: store <4 x float> [[TMP5]], ptr [[TMP12]], align 4
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[UMAX]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label %[[FOR_EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT: br label %[[FOR_BODY:.*]]
+; CHECK: [[FOR_BODY]]:
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT: [[GEP_IN:%.*]] = getelementptr inbounds nuw double, ptr [[IN]], i64 [[IV]]
+; CHECK-NEXT: [[LOAD_IN:%.*]] = load double, ptr [[GEP_IN]], align 8
+; CHECK-NEXT: [[LOAD_EXT:%.*]] = fpext double [[LOAD_IN]] to fp128
+; CHECK-NEXT: [[GEP_OUT:%.*]] = getelementptr inbounds nuw float, ptr [[OUT]], i64 [[IV]]
+; CHECK-NEXT: [[TRUNC_OUT:%.*]] = fptrunc fp128 [[LOAD_EXT]] to float
+; CHECK-NEXT: store float [[TRUNC_OUT]], ptr [[GEP_OUT]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ult i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT: br i1 [[EXITCOND]], label %[[FOR_BODY]], label %[[FOR_EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK: [[FOR_EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %for.body
+
+for.body:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+ %gep.in = getelementptr inbounds nuw double, ptr %in, i64 %iv
+ %load.in = load double, ptr %gep.in, align 8
+ %load.ext = fpext double %load.in to fp128
+ %gep.out = getelementptr inbounds nuw float, ptr %out, i64 %iv
+ %trunc.out = fptrunc fp128 %load.ext to float
+ store float %trunc.out, ptr %gep.out, align 4
+ %iv.next = add nuw nsw i64 %iv, 1
+ %exitcond = icmp ult i64 %iv.next, %N
+ br i1 %exitcond, label %for.body, label %for.exit, !llvm.loop !0
+
+for.exit:
+ ret void
+}
+
+attributes #0 = { "target-features"="+sve" vscale_range(1,16) }
+
+!0 = distinct !{!0, !1}
+!1 = !{!"llvm.loop.interleave.count", i32 1}
LGTM, thanks!
attributes #0 = { "target-features"="+sve" vscale_range(1,16) }

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.interleave.count", i32 1}
is this needed?
I wanted to reduce the number of check lines to review; I've switched to using a flag instead to match the other change.
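(The updated RUN line isn't quoted in this thread; presumably the interleave-count metadata was replaced with an existing loop-vectorizer option such as -force-vector-interleave=1, giving something along the lines of:

; RUN: opt < %s -passes=loop-vectorize -force-vector-interleave=1 -debug-only=loop-vectorize -vectorizer-maximize-bandwidth -S 2>&1 | FileCheck %s

The exact flag used is an assumption here, not confirmed by the excerpt.)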
ret void
}

attributes #0 = { "target-features"="+sve" vscale_range(1,16) }
Might be simpler/easier to see if using -mattr=+sve in the run line?
done
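(With that change applied, the SVE feature moves from the function attribute to the RUN line; assuming the rest of the invocation is unchanged, the final RUN line would look roughly like:

; RUN: opt < %s -mattr=+sve -passes=loop-vectorize -debug-only=loop-vectorize -vectorizer-maximize-bandwidth -S 2>&1 | FileCheck %s

This is a reconstruction for illustration; the committed test may differ in detail.)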