
[AArch64] Optimized generated assembly for bool to svbool_t conversions #83001


Merged
5 commits merged into llvm:main from whilelo_opt on Feb 28, 2024

Conversation

Lukacma
Contributor

@Lukacma Lukacma commented Feb 26, 2024

The original lowering generated an `AND(WHILELO, SPLAT 1)` pattern when `WHILELO` alone was sufficient.

This change relies on the fact that the current lowering of the SPLAT_VECTOR node uses `whilelo`, which zeroes the unused lanes.
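Why the AND is redundant can be sketched with a small model of the predicate bit layout (a hypothetical C simulation, not LLVM code; `whilelo_h` and `splat1_h` model a single 128-bit granule with .h elements):

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit SVE granule carries a 16-bit predicate.  For .h elements,
 * lane i owns bit 2*i; the odd "unused" bits are always written as 0 by
 * WHILELO, so they never need an explicit clearing AND afterwards. */
static uint16_t whilelo_h(uint64_t base, uint64_t limit) {
    uint16_t p = 0;
    for (int lane = 0; lane < 8; lane++)
        if (base + (uint64_t)lane < limit)
            p |= (uint16_t)(1u << (2 * lane)); /* active bit; unused bits stay 0 */
    return p;
}

/* SPLAT 1 lowered for .h lanes: every lane's bit set, unused bits zero. */
static uint16_t splat1_h(void) {
    uint16_t p = 0;
    for (int lane = 0; lane < 8; lane++)
        p |= (uint16_t)(1u << (2 * lane));
    return p;
}
```

In this model, ANDing a `whilelo` result with the splat-of-1 mask returns the `whilelo` result unchanged for every possible lane count, which is the redundancy the patch eliminates.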

The original assembly was generating `AND(WHILELO, SPLAT 1)` pattern when only `WHILELO` was necessary
@llvmbot
Member

llvmbot commented Feb 26, 2024

@llvm/pr-subscribers-backend-aarch64

Author: None (Lukacma)

Changes

The original assembly was generating an `AND(WHILELO, SPLAT 1)` pattern when only `WHILELO` was necessary.


Full diff: https://github.com/llvm/llvm-project/pull/83001.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll (+41-1)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index a3b7e3128ac1a4..dba3a787734721 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -276,6 +276,7 @@ static bool isZeroingInactiveLanes(SDValue Op) {
     if (ISD::isConstantSplatVectorAllOnes(Op.getNode()))
       return true;
     return false;
+  case ISD::SPLAT_VECTOR:
   case AArch64ISD::PTRUE:
   case AArch64ISD::SETCC_MERGE_ZERO:
     return true;
diff --git a/llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll b/llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll
index 82bf756f822898..c7c102f5d567d9 100644
--- a/llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll
+++ b/llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme < %s | FileCheck %s
 
@@ -150,6 +150,46 @@ define <vscale x 16 x i1> @chained_reinterpret() {
   ret <vscale x 16 x i1> %out
 }
 
+define <vscale x 16 x i1> @reinterpret_scalar_bool_h(i1 %x){
+; CHECK-LABEL: reinterpret_scalar_bool_h:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    sbfx x8, x0, #0, #1
+; CHECK-NEXT:    whilelo p0.h, xzr, x8
+; CHECK-NEXT:    ret
+  %.splatinsert = insertelement <vscale x 8 x i1> poison, i1 %x, i64 0
+  %.splat = shufflevector <vscale x 8 x i1> %.splatinsert, <vscale x 8 x i1> poison, <vscale x 8 x i32> zeroinitializer
+  %out = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv8i1(<vscale x 8 x i1> %.splat)
+  ret <vscale x 16 x i1> %out
+}
+
+define <vscale x 16 x i1> @reinterpret_scalar_bool_s(i1 %x){
+; CHECK-LABEL: reinterpret_scalar_bool_s:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    sbfx x8, x0, #0, #1
+; CHECK-NEXT:    whilelo p0.s, xzr, x8
+; CHECK-NEXT:    ret
+  %.splatinsert = insertelement <vscale x 4 x i1> poison, i1 %x, i64 0
+  %.splat = shufflevector <vscale x 4 x i1> %.splatinsert, <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer
+  %out = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %.splat)
+  ret <vscale x 16 x i1> %out
+}
+
+define <vscale x 16 x i1> @reinterpret_scalar_bool_q(i1 %x){
+; CHECK-LABEL: reinterpret_scalar_bool_q:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    sbfx x8, x0, #0, #1
+; CHECK-NEXT:    whilelo p0.d, xzr, x8
+; CHECK-NEXT:    ret
+  %.splatinsert = insertelement <vscale x 2 x i1> poison, i1 %x, i64 0
+  %.splat = shufflevector <vscale x 2 x i1> %.splatinsert, <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer
+  %out = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv2i1(<vscale x 2 x i1> %.splat)
+  ret <vscale x 16 x i1> %out
+}
+
+
 declare <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 immarg)
 declare <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 immarg)
 declare <vscale x 8 x i1> @llvm.aarch64.sve.cmpgt.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)

@Lukacma
Contributor Author

Lukacma commented Feb 26, 2024

The main concern I have about this commit is that it relies on splat_vector always being implemented using a zeroing instruction. Do you think it is okay to embed this reliance into the code?

@momchil-velikov
Collaborator

The main concern I have about this commit is that it relies on splat_vector always being implemented using a zeroing instruction.
Do you think it is okay to embed this reliance into the code?

The real question is what the canonical representation is, in the AArch64 backend, of types like <vscale x N x i1> where N < 16.
Taking <vscale x 8 x i1> as an example, we know it has to be represented with a bit pattern that is vscale * 16 bits wide. Naturally, the "active" bits are the even-numbered bits and the "inactive" bits are the odd-numbered bits. The question is about the inactive bits: we can say they must be zero, or we can say they may be undefined. We go through a lot of effort to zero the "inactive" bits (that is the whole point of converting to/from svbool), which makes me think the canonical representation has zeros in the "inactive" bits. Hence splat_vector must be lowered to instructions which yield zero in the inactive bits; other possible lowerings would be semantically incorrect.
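That reading of the contract can be sketched in a few lines (a hypothetical C model of one 16-bit predicate granule; `to_svbool_strict` and `to_svbool_zeroing` are illustrative names, not LLVM APIs):

```c
#include <assert.h>
#include <stdint.h>

/* Lane i of a <vscale x 8 x i1> lives in even bit 2*i of the same storage an
 * svbool_t uses; the odd bits are the "inactive" (invisible) ones. */
#define NXV8I1_ACTIVE_MASK 0x5555u

/* Strict conversion to svbool: explicitly zero the inactive bits. */
static uint16_t to_svbool_strict(uint16_t p) { return p & NXV8I1_ACTIVE_MASK; }

/* If the producer already guarantees zeroed inactive bits, the conversion is
 * a plain bit copy and the masking AND can be dropped. */
static uint16_t to_svbool_zeroing(uint16_t p) { return p; }
```

The two conversions agree exactly when the producer zeroes the inactive bits; a value with dirty odd bits is only handled correctly by the strict form, which is why dropping the AND requires the producer's guarantee.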

@paulwalker-arm
Collaborator

The main concern I have about this commit is that it relies on splat_vector always being implemented using a zeroing instruction. Do you think it is okay to embed this reliance into the code?

… the canonical representation has zeros in the "inactive" bits. Hence splat_vector must be lowered to instructions which yield zero in the inactive bits; other possible lowerings would be semantically incorrect.

By design the "invisible bits" within a predicate type are undefined. When changing the visibility of bits via the to/from_svbool intrinsics we didn't want to leave newly visible bits hanging, and so to_svbool defines them as zero. isZeroingInactiveLanes exists to reduce the need to explicitly zero these lanes by entering certain operations into a contract guaranteeing that all isel related to them will zero their invisible lanes.

I suspect the reason that non-constant SPLAT_VECTORs were omitted is that they don't typically come from ACLE code, where to_svbool is prevalent. Out of interest, in what context are you seeing them today?
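The contract check itself amounts to a dispatch over node kinds, as the diff above shows. A simplified, hypothetical rendering (the enum names are illustrative stand-ins, not the real `ISD`/`AArch64ISD` opcodes):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the relevant SelectionDAG node kinds. */
enum NodeKind { OP_PTRUE, OP_SETCC_MERGE_ZERO, OP_SPLAT_VECTOR, OP_OTHER };

/* Nodes in this list promise that their isel output has zeros in the
 * invisible predicate lanes, so convert.to.svbool can skip the explicit AND. */
static bool zeroes_inactive_lanes(enum NodeKind k) {
    switch (k) {
    case OP_PTRUE:
    case OP_SETCC_MERGE_ZERO:
    case OP_SPLAT_VECTOR: /* the case this patch adds */
        return true;
    default:
        return false;
    }
}
```

Adding a node kind to the list is only sound if every lowering of that node really does zero the invisible lanes, which is exactly the reliance the author asks about above.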

@Lukacma
Contributor Author

Lukacma commented Feb 27, 2024

I suspect the reason that non-constant SPLAT_VECTORs were omitted is that they don't typically come from ACLE code, where to_svbool is prevalent. Out of interest, in what context are you seeing them today?

The test includes an example where splat_vectors were previously used to go from a scalar to a vector type; because they were not part of the zeroing-lanes contract, this resulted in extra instructions:

define <vscale x 16 x i1> @reinterpret_scalar_bool_h(i1 %x){
; CHECK-LABEL: reinterpret_scalar_bool_h:
; CHECK:       // %bb.0:
; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
; CHECK-NEXT:    sbfx x8, x0, #0, #1
; CHECK-NEXT:    whilelo p0.h, xzr, x8
; CHECK-NEXT:    ret
  %.splatinsert = insertelement <vscale x 8 x i1> poison, i1 %x, i64 0
  %.splat = shufflevector <vscale x 8 x i1> %.splatinsert, <vscale x 8 x i1> poison, <vscale x 8 x i32> zeroinitializer
  %out = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv8i1(<vscale x 8 x i1> %.splat)
  ret <vscale x 16 x i1> %out
}

@paulwalker-arm
Collaborator

Sure, I can see the tests but I meant from a C/C++ point of view.

@Lukacma
Contributor Author

Lukacma commented Feb 27, 2024

Sure, I can see the tests but I meant from a C/C++ point of view.

It was "simple" casting from bool to svbool_t:

svbool_t dup_u16(_Bool x) {
    return svdup_n_b16(x);
}

@paulwalker-arm
Collaborator

I see, thanks.

Collaborator

@paulwalker-arm paulwalker-arm left a comment


Looks good to me. I'm happy for the default case to be cleaned up as a separate patch but if you'd rather do it here then I'll just re-review the updates.

@Lukacma
Contributor Author

Lukacma commented Feb 28, 2024

I cleaned it up in this patch and added the capability to see through bitcasts. Not sure if that is necessary in all cases, but it might happen in certain situations.


github-actions bot commented Feb 28, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Lukacma Lukacma merged commit 2640277 into llvm:main Feb 28, 2024
@Lukacma Lukacma deleted the whilelo_opt branch February 28, 2024 16:45
mylai-mtk pushed a commit to mylai-mtk/llvm-project that referenced this pull request Jul 12, 2024
…ns (llvm#83001)

In certain cases the legalizer was generating an `AND(WHILELO, SPLAT 1)` instruction pattern when `WHILELO` alone would be sufficient.