-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AArch64] Optimized generated assembly for bool to svbool_t conversions #83001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The original assembly was generating `AND(WHILELO, SPLAT 1)` pattern when only `WHILELO` was necessary
@llvm/pr-subscribers-backend-aarch64 Author: None (Lukacma) ChangesThe original assembly was generating Full diff: https://github.com/llvm/llvm-project/pull/83001.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index a3b7e3128ac1a4..dba3a787734721 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -276,6 +276,7 @@ static bool isZeroingInactiveLanes(SDValue Op) {
if (ISD::isConstantSplatVectorAllOnes(Op.getNode()))
return true;
return false;
+ case ISD::SPLAT_VECTOR:
case AArch64ISD::PTRUE:
case AArch64ISD::SETCC_MERGE_ZERO:
return true;
diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll
index 82bf756f822898..c7c102f5d567d9 100644
--- a/llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll
+++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme < %s | FileCheck %s
@@ -150,6 +150,46 @@ define <vscale x 16 x i1> @chained_reinterpret() {
ret <vscale x 16 x i1> %out
}
+define <vscale x 16 x i1> @reinterpret_scalar_bool_h(i1 %x){
+; CHECK-LABEL: reinterpret_scalar_bool_h:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT: sbfx x8, x0, #0, #1
+; CHECK-NEXT: whilelo p0.h, xzr, x8
+; CHECK-NEXT: ret
+ %.splatinsert = insertelement <vscale x 8 x i1> poison, i1 %x, i64 0
+ %.splat = shufflevector <vscale x 8 x i1> %.splatinsert, <vscale x 8 x i1> poison, <vscale x 8 x i32> zeroinitializer
+ %out = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv8i1(<vscale x 8 x i1> %.splat)
+ ret <vscale x 16 x i1> %out
+}
+
+define <vscale x 16 x i1> @reinterpret_scalar_bool_s(i1 %x){
+; CHECK-LABEL: reinterpret_scalar_bool_s:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT: sbfx x8, x0, #0, #1
+; CHECK-NEXT: whilelo p0.s, xzr, x8
+; CHECK-NEXT: ret
+ %.splatinsert = insertelement <vscale x 4 x i1> poison, i1 %x, i64 0
+ %.splat = shufflevector <vscale x 4 x i1> %.splatinsert, <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer
+ %out = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %.splat)
+ ret <vscale x 16 x i1> %out
+}
+
+define <vscale x 16 x i1> @reinterpret_scalar_bool_q(i1 %x){
+; CHECK-LABEL: reinterpret_scalar_bool_q:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT: sbfx x8, x0, #0, #1
+; CHECK-NEXT: whilelo p0.d, xzr, x8
+; CHECK-NEXT: ret
+ %.splatinsert = insertelement <vscale x 2 x i1> poison, i1 %x, i64 0
+ %.splat = shufflevector <vscale x 2 x i1> %.splatinsert, <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer
+ %out = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv2i1(<vscale x 2 x i1> %.splat)
+ ret <vscale x 16 x i1> %out
+}
+
+
declare <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 immarg)
declare <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 immarg)
declare <vscale x 8 x i1> @llvm.aarch64.sve.cmpgt.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
|
The main concern I have about this commit is that it relies on splat_vector being always implemented using zeroing instruction. Do you think it is okay to embed this reliance into the code ? |
The real question is what is the canonical representation in the AArch64 backend of types like |
By design the "invisible bits" within a predicate type are undefined. When changing the visibility of bits via the to/from_svbool intrinsics we didn't want to leave newly visible bits hanging and so to_svbool defines them as zero. I suspect the reason that non-constant |
In the test I have an example, where previously splat_vectors were used to go from scalar to vector type and because they weren't contracted to zero lanes it resulted in extra instructions :
|
Sure, I can see the tests but I meant from a C/C++ point of view. |
It was "simple" casting from bool to svbool_t :
|
I see, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I'm happy for the default case to be cleaned up as a separate patch but if you'd rather do it here then I'll just re-review the updates.
…ng through bitcasts
I cleaned it up in this patch and added capability to see through bitcasts. Not sure if that is necessary for all cases, but it might happen in certain situations. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
…ns (llvm#83001) In certain cases Legalizer was generating `AND(WHILELO, SPLAT 1)` instruction pattern, when `WHILELO` would be sufficient.
The original assembly was generating
AND(WHILELO, SPLAT 1)
pattern when onlyWHILELO
was necessary.This change relies on the fact that current lowering of SPLAT_VECTOR node relies on whilelo, which zeroes unused lanes.