[LV][AArch64] Don't query registers for illegal scalable vector elts #109411
Conversation
When trying to maximize vector bandwidth we ask TTI for the number of registers required for a given operation. If the type of that operation happens to be something illegal for scalable vectors (e.g. <vscale x 4 x fp128>) then we would see a crash. Instead, just return a default value and let the cost model reject the invalid operation later.
@llvm/pr-subscribers-llvm-transforms

Author: Graham Hunter (huntergr-arm)

Changes

When trying to maximize vector bandwidth we ask TTI for the number of registers required for a given operation. If the type of that operation happens to be something illegal for scalable vectors (e.g. <vscale x 4 x fp128>) then we would see a crash. Instead, just return a default value and let the cost model reject the invalid operation later.

Full diff: https://github.com/llvm/llvm-project/pull/109411.diff

2 Files Affected:
- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+3-1)
- llvm/test/Transforms/LoopVectorize/AArch64/scalable-fp-ext-trunc-illegal-type.ll (+81-0)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 9f554827a8287d..66c2eeed17d38b 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5207,7 +5207,9 @@ LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<ElementCount> VFs) {
const auto &TTICapture = TTI;
auto GetRegUsage = [&TTICapture](Type *Ty, ElementCount VF) -> unsigned {
- if (Ty->isTokenTy() || !VectorType::isValidElementType(Ty))
+ if (Ty->isTokenTy() || !VectorType::isValidElementType(Ty) ||
+ (VF.isScalable() &&
+ !TTICapture.isElementTypeLegalForScalableVector(Ty)))
return 0;
return TTICapture.getRegUsageForType(VectorType::get(Ty, VF));
};
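For readers skimming the patch, here is a minimal standalone sketch of what the new guard does. It is illustrative only: MockTTI, ElemKind and the fp128-only legality rule below are stand-ins, not LLVM's actual TargetTransformInfo API. The point is that the register-usage query now returns 0 before asking the target anything when a scalable VF is paired with an element type the target cannot hold in a scalable vector, leaving the later cost-model check to reject the operation as Invalid.

// Simplified stand-in for the guarded register-usage query above.
// ElemKind and the legality rule are hypothetical; the real code calls
// TargetTransformInfo::isElementTypeLegalForScalableVector on the IR type.
#include <cstdio>

enum class ElemKind { I32, F32, F64, FP128 };

struct MockTTI {
  // Hypothetical SVE-like rule: fp128 elements are not legal in scalable
  // vectors; the other element kinds here are.
  bool isElementTypeLegalForScalableVector(ElemKind K) const {
    return K != ElemKind::FP128;
  }
  // Placeholder register estimate; the real query consults the target.
  unsigned getRegUsageForType(ElemKind, unsigned ElemsPerVector) const {
    return (ElemsPerVector + 3) / 4;
  }
};

unsigned getRegUsage(const MockTTI &TTI, ElemKind Ty, unsigned VF,
                     bool Scalable) {
  // Mirrors the patch: bail out with 0 before querying the target when a
  // scalable VF is paired with an element type it cannot legally hold.
  if (Scalable && !TTI.isElementTypeLegalForScalableVector(Ty))
    return 0;
  return TTI.getRegUsageForType(Ty, VF);
}

int main() {
  MockTTI TTI;
  std::printf("fp128  @ vscale x 4 -> %u registers (guarded)\n",
              getRegUsage(TTI, ElemKind::FP128, 4, /*Scalable=*/true));
  std::printf("double @ vscale x 4 -> %u registers\n",
              getRegUsage(TTI, ElemKind::F64, 4, /*Scalable=*/true));
  return 0;
}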
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-fp-ext-trunc-illegal-type.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-fp-ext-trunc-illegal-type.ll
new file mode 100644
index 00000000000000..bb89ead3ca303c
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-fp-ext-trunc-illegal-type.ll
@@ -0,0 +1,81 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt < %s -passes=loop-vectorize -debug-only=loop-vectorize -vectorizer-maximize-bandwidth -S 2>&1 | FileCheck %s
+; REQUIRES: asserts
+
+target triple = "aarch64-unknown-linux-gnu"
+
+;; Make sure we reject scalable vectors for fp128 types. We were previously
+;; crashing before reaching the cost model when checking for the number of
+;; registers required for a <vscale x 4 x fp128> when trying to maximize
+;; vector bandwidth with SVE.
+
+; CHECK: LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %load.ext = fpext double %load.in to fp128
+
+define void @load_ext_trunc_store(ptr readonly %in, ptr noalias %out, i64 %N) #0 {
+; CHECK-LABEL: define void @load_ext_trunc_store(
+; CHECK-SAME: ptr readonly [[IN:%.*]], ptr noalias [[OUT:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N]], i64 1)
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[UMAX]], 4
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[UMAX]], 4
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[UMAX]], [[N_MOD_VF]]
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, ptr [[IN]], i64 [[TMP0]]
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds double, ptr [[TMP2]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, ptr [[TMP4]], align 8
+; CHECK-NEXT: [[TMP3:%.*]] = fpext <4 x double> [[WIDE_LOAD]] to <4 x fp128>
+; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, ptr [[OUT]], i64 [[TMP0]]
+; CHECK-NEXT: [[TMP5:%.*]] = fptrunc <4 x fp128> [[TMP3]] to <4 x float>
+; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, ptr [[TMP8]], i32 0
+; CHECK-NEXT: store <4 x float> [[TMP5]], ptr [[TMP12]], align 4
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[UMAX]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label %[[FOR_EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT: br label %[[FOR_BODY:.*]]
+; CHECK: [[FOR_BODY]]:
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT: [[GEP_IN:%.*]] = getelementptr inbounds nuw double, ptr [[IN]], i64 [[IV]]
+; CHECK-NEXT: [[LOAD_IN:%.*]] = load double, ptr [[GEP_IN]], align 8
+; CHECK-NEXT: [[LOAD_EXT:%.*]] = fpext double [[LOAD_IN]] to fp128
+; CHECK-NEXT: [[GEP_OUT:%.*]] = getelementptr inbounds nuw float, ptr [[OUT]], i64 [[IV]]
+; CHECK-NEXT: [[TRUNC_OUT:%.*]] = fptrunc fp128 [[LOAD_EXT]] to float
+; CHECK-NEXT: store float [[TRUNC_OUT]], ptr [[GEP_OUT]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ult i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT: br i1 [[EXITCOND]], label %[[FOR_BODY]], label %[[FOR_EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK: [[FOR_EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %for.body
+
+for.body:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+ %gep.in = getelementptr inbounds nuw double, ptr %in, i64 %iv
+ %load.in = load double, ptr %gep.in, align 8
+ %load.ext = fpext double %load.in to fp128
+ %gep.out = getelementptr inbounds nuw float, ptr %out, i64 %iv
+ %trunc.out = fptrunc fp128 %load.ext to float
+ store float %trunc.out, ptr %gep.out, align 4
+ %iv.next = add nuw nsw i64 %iv, 1
+ %exitcond = icmp ult i64 %iv.next, %N
+ br i1 %exitcond, label %for.body, label %for.exit, !llvm.loop !0
+
+for.exit:
+ ret void
+}
+
+attributes #0 = { "target-features"="+sve" vscale_range(1,16) }
+
+!0 = distinct !{!0, !1}
+!1 = !{!"llvm.loop.interleave.count", i32 1}
LGTM, thanks!
attributes #0 = { "target-features"="+sve" vscale_range(1,16) }

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.interleave.count", i32 1}
is this needed?
I wanted to reduce the number of check lines to review; I've switched to using a flag instead to match the other change.
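(The updated RUN line isn't quoted in this thread; presumably the interleave-count metadata was replaced with an existing loop-vectorizer option such as -force-vector-interleave=1, giving something along the lines of:

; RUN: opt < %s -passes=loop-vectorize -force-vector-interleave=1 -debug-only=loop-vectorize -vectorizer-maximize-bandwidth -S 2>&1 | FileCheck %s

The exact flag used is an assumption here, not confirmed by the excerpt.)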
ret void
}

attributes #0 = { "target-features"="+sve" vscale_range(1,16) }
Might be simpler/easier to see if using -mattr=+sve in the run line?
done
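(With that change applied, the SVE feature moves from the function attribute to the RUN line; assuming the rest of the invocation is unchanged, the final RUN line would look roughly like:

; RUN: opt < %s -mattr=+sve -passes=loop-vectorize -debug-only=loop-vectorize -vectorizer-maximize-bandwidth -S 2>&1 | FileCheck %s

This is a reconstruction for illustration; the committed test may differ in detail.)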