[DAGCombiner] Remove a hasOneUse check in visitAND #115142

david-arm · 2024-11-06T09:58:25Z

For some reason there was a hasOneUse check on the splat for the
second operand and it's not obvious to me why. The check blocks
optimisations for lowering of nodes like AVGFLOORU and AVGCEILU.

In a follow-on patch I also plan to improve the generated code
for AVGCEILU further by teaching computeKnownBits about
zero-extending masked loads.
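
For readers unfamiliar with the AVGFLOORU and AVGCEILU nodes mentioned above: the expected SVE output in the new tests uses the overflow-free forms `(a & b) + ((a ^ b) >> 1)` and `(a | b) - ((a ^ b) >> 1)`. The following is a minimal scalar sketch, not code from this patch, that checks those identities against a widened reference on 8-bit values. All function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Reference unsigned floor average, computed in a wider type so the
// intermediate sum cannot overflow.
static uint8_t avgFloorRef(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>((static_cast<unsigned>(A) + B) >> 1);
}

// Reference unsigned ceiling average: (A + B + 1) >> 1 in a wider type.
static uint8_t avgCeilRef(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>((static_cast<unsigned>(A) + B + 1) >> 1);
}

// Overflow-free floor average: (A & B) + ((A ^ B) >> 1).
static uint8_t avgFloorBitwise(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>((A & B) + ((A ^ B) >> 1));
}

// Overflow-free ceiling average: (A | B) - ((A ^ B) >> 1).
static uint8_t avgCeilBitwise(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>((A | B) - ((A ^ B) >> 1));
}

// Exhaustively compares both forms for every pair of bytes and returns
// the number of pairs checked.
static unsigned checkAllBytePairs() {
  unsigned Checked = 0;
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      uint8_t X = static_cast<uint8_t>(A);
      uint8_t Y = static_cast<uint8_t>(B);
      assert(avgFloorBitwise(X, Y) == avgFloorRef(X, Y));
      assert(avgCeilBitwise(X, Y) == avgCeilRef(X, Y));
      ++Checked;
    }
  }
  return Checked;
}
```

The identities follow from `a + b == 2 * (a & b) + (a ^ b)` and `a + b == 2 * (a | b) - (a ^ b)`; the sketch only confirms them for the 8-bit case.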

llvmbot · 2024-11-06T09:58:58Z

@llvm/pr-subscribers-backend-aarch64

Author: David Sherwood (david-arm)

Full diff: https://github.com/llvm/llvm-project/pull/115142.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+1-2)
(modified) llvm/test/CodeGen/AArch64/avg.ll (+47-1)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 7eef09e55101d0..f718cbf65480ab 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -7095,8 +7095,7 @@ SDValue DAGCombiner::visitAND(SDNode *N) {
     // fold (and (masked_load) (splat_vec (x, ...))) to zext_masked_load
     auto *MLoad = dyn_cast<MaskedLoadSDNode>(N0);
     ConstantSDNode *Splat = isConstOrConstSplat(N1, true, true);
-    if (MLoad && MLoad->getExtensionType() == ISD::EXTLOAD && Splat &&
-        N1.hasOneUse()) {
+    if (MLoad && MLoad->getExtensionType() == ISD::EXTLOAD && Splat) {
       EVT LoadVT = MLoad->getMemoryVT();
       EVT ExtVT = VT;
       if (TLI.isLoadExtLegal(ISD::ZEXTLOAD, ExtVT, LoadVT)) {
diff --git a/llvm/test/CodeGen/AArch64/avg.ll b/llvm/test/CodeGen/AArch64/avg.ll
index ea07b10c22c2e7..aac797aafcf2eb 100644
--- a/llvm/test/CodeGen/AArch64/avg.ll
+++ b/llvm/test/CodeGen/AArch64/avg.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
-; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64 -mattr=+sve < %s | FileCheck %s
 
 define <16 x i16> @zext_avgflooru(<16 x i8> %a0, <16 x i8> %a1) {
 ; CHECK-LABEL: zext_avgflooru:
@@ -17,6 +17,28 @@ define <16 x i16> @zext_avgflooru(<16 x i8> %a0, <16 x i8> %a1) {
   ret <16 x i16> %avg
 }
 
+define void @zext_mload_avgflooru(ptr %p1, ptr %p2, <vscale x 8 x i1> %mask) {
+; CHECK-LABEL: zext_mload_avgflooru:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ld1b { z0.h }, p0/z, [x0]
+; CHECK-NEXT:    ld1b { z1.h }, p0/z, [x1]
+; CHECK-NEXT:    eor z2.d, z0.d, z1.d
+; CHECK-NEXT:    and z0.d, z0.d, z1.d
+; CHECK-NEXT:    lsr z1.h, z2.h, #1
+; CHECK-NEXT:    add z0.h, z0.h, z1.h
+; CHECK-NEXT:    st1h { z0.h }, p0, [x0]
+; CHECK-NEXT:    ret
+  %ld1 = call <vscale x 8 x i8> @llvm.masked.load(ptr %p1, i32 16, <vscale x 8 x i1> %mask, <vscale x 8 x i8> zeroinitializer)
+  %ld2 = call <vscale x 8 x i8> @llvm.masked.load(ptr %p2, i32 16, <vscale x 8 x i1> %mask, <vscale x 8 x i8> zeroinitializer)
+  %and = and <vscale x 8 x i8> %ld1, %ld2
+  %xor = xor <vscale x 8 x i8> %ld1, %ld2
+  %shift = lshr <vscale x 8 x i8> %xor, splat(i8 1)
+  %avg = add <vscale x 8 x i8> %and, %shift
+  %avgext = zext <vscale x 8 x i8> %avg to <vscale x 8 x i16>
+  call void @llvm.masked.store.nxv8i16(<vscale x 8 x i16> %avgext, ptr %p1, i32 16, <vscale x 8 x i1> %mask)
+  ret void
+}
+
 define <16 x i16> @zext_avgflooru_mismatch(<16 x i8> %a0, <16 x i4> %a1) {
 ; CHECK-LABEL: zext_avgflooru_mismatch:
 ; CHECK:       // %bb.0:
@@ -51,6 +73,30 @@ define <16 x i16> @zext_avgceilu(<16 x i8> %a0, <16 x i8> %a1) {
   ret <16 x i16> %avg
 }
 
+define void @zext_mload_avgceilu(ptr %p1, ptr %p2, <vscale x 8 x i1> %mask) {
+; CHECK-LABEL: zext_mload_avgceilu:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ld1b { z0.h }, p0/z, [x0]
+; CHECK-NEXT:    ld1b { z1.h }, p0/z, [x1]
+; CHECK-NEXT:    eor z2.d, z0.d, z1.d
+; CHECK-NEXT:    orr z0.d, z0.d, z1.d
+; CHECK-NEXT:    lsr z1.h, z2.h, #1
+; CHECK-NEXT:    sub z0.h, z0.h, z1.h
+; CHECK-NEXT:    st1b { z0.h }, p0, [x0]
+; CHECK-NEXT:    ret
+  %ld1 = call <vscale x 8 x i8> @llvm.masked.load(ptr %p1, i32 16, <vscale x 8 x i1> %mask, <vscale x 8 x i8> zeroinitializer)
+  %ld2 = call <vscale x 8 x i8> @llvm.masked.load(ptr %p2, i32 16, <vscale x 8 x i1> %mask, <vscale x 8 x i8> zeroinitializer)
+  %zext1 = zext <vscale x 8 x i8> %ld1 to <vscale x 8 x i16>
+  %zext2 = zext <vscale x 8 x i8> %ld2 to <vscale x 8 x i16>
+  %add1 = add nuw nsw <vscale x 8 x i16> %zext1, splat(i16 1)
+  %add2 = add nuw nsw <vscale x 8 x i16> %add1, %zext2
+  %shift = lshr <vscale x 8 x i16> %add2, splat(i16 1)
+  %trunc = trunc <vscale x 8 x i16> %shift to <vscale x 8 x i8>
+  call void @llvm.masked.store.nxv8i8(<vscale x 8 x i8> %trunc, ptr %p1, i32 16, <vscale x 8 x i1> %mask)
+  ret void
+}
+
+
 define <16 x i16> @zext_avgceilu_mismatch(<16 x i4> %a0, <16 x i8> %a1) {
 ; CHECK-LABEL: zext_avgceilu_mismatch:
 ; CHECK:       // %bb.0:

davemgreen · 2024-11-06T10:01:46Z

llvm/test/CodeGen/AArch64/avg.ll

@@ -17,6 +17,28 @@ define <16 x i16> @zext_avgflooru(<16 x i8> %a0, <16 x i8> %a1) {
  ret <16 x i16> %avg
 }

+define void @zext_mload_avgflooru(ptr %p1, ptr %p2, <vscale x 8 x i1> %mask) {


This feels like it should be in an SVE test file, like sve-hadd.ll.

davemgreen · 2024-11-06T10:02:33Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ -7095,8 +7095,7 @@ SDValue DAGCombiner::visitAND(SDNode *N) {
    // fold (and (masked_load) (splat_vec (x, ...))) to zext_masked_load
    auto *MLoad = dyn_cast<MaskedLoadSDNode>(N0);
    ConstantSDNode *Splat = isConstOrConstSplat(N1, true, true);
-    if (MLoad && MLoad->getExtensionType() == ISD::EXTLOAD && Splat &&
-        N1.hasOneUse()) {


Should it be checking N0.hasOneUse()? (i.e. the load)

If I add that check, it causes regressions in CodeGen/AArch64/sve-load-compare-store.ll and CodeGen/Thumb2/mve-masked-load.ll.

There is an extra instruction in multi_user_zext (mve-masked-load.ll) and two extra in sve_load_compare_store (sve-load-compare-store.ll).

As the code shows, the transformation already handles this case explicitly, so no new use-count check is necessary. The original load is an any-extend, so the transformation simply promotes it to a zero-extend, which every user of the load can consume.
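
A minimal scalar sketch of that argument, assuming a hypothetical one-lane model in which an 8-bit memory value is extended to 16 bits. The any-extend leaves the upper bits unspecified (modelled here by an arbitrary `HighBits` argument), so a user of the any-extended value can only rely on the low bits, and the zero-extended value is therefore also valid for such users. The function names are illustrative and do not correspond to LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one lane of an any-extending load: the low 8 bits
// come from memory, the high 8 bits are unspecified ("HighBits").
static uint16_t anyExtLane(uint8_t Mem, uint8_t HighBits) {
  return static_cast<uint16_t>((HighBits << 8) | Mem);
}

// Model of one lane of a zero-extending load: the high 8 bits are zero.
static uint16_t zeroExtLane(uint8_t Mem) { return Mem; }

// Returns true if, for every memory byte and every choice of high bits,
// (1) masking the any-extended lane with 0x00FF gives the zero-extended
//     lane, so the AND can be folded into the load, and
// (2) a user that reads only the low 8 bits sees the same value from
//     either load, so other users can switch to the zero-extended load.
static bool zextCanReplaceAnyExt() {
  for (unsigned Mem = 0; Mem < 256; ++Mem) {
    for (unsigned High = 0; High < 256; ++High) {
      uint16_t Any = anyExtLane(static_cast<uint8_t>(Mem),
                                static_cast<uint8_t>(High));
      uint16_t Zero = zeroExtLane(static_cast<uint8_t>(Mem));
      if ((Any & 0x00FF) != Zero)
        return false;
      if (static_cast<uint8_t>(Any) != static_cast<uint8_t>(Zero))
        return false;
    }
  }
  return true;
}
```

Nothing in this model depends on how many users the 0x00FF mask has, which is consistent with dropping the N1.hasOneUse() condition.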

paulwalker-arm

FYI: Not affected by this PR, but I think the original transformation might have a bug: when promoting the any-extend to a zero-extend, I think passthru should also be explicitly zero-extended.

I'm not 100% sure because an equally valid interpretation would be for users of passthru to apply the extension type.

My guess is that, with almost all uses being either undef or zero, this doesn't actually affect us yet.
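
The concern can be illustrated with a small scalar sketch under the first interpretation only, that is, assuming inactive lanes receive the passthru value unchanged in the extended type. This is a hypothetical model, not a statement of how MaskedLoadSDNode is defined, and all names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 16-bit result lane of a masked load of an
// 8-bit element: active lanes take the extended memory value, inactive
// lanes take the passthru value as-is.
static uint16_t maskedLane(bool Active, uint16_t Extended, uint16_t Passthru) {
  return Active ? Extended : Passthru;
}

// Before the fold: any-extending load followed by an AND with 0x00FF.
// HighBits stands in for the unspecified upper bits of the any-extend.
static uint16_t beforeFold(bool Active, uint8_t Mem, uint8_t HighBits,
                           uint16_t Passthru) {
  uint16_t AnyExt = static_cast<uint16_t>((HighBits << 8) | Mem);
  return static_cast<uint16_t>(maskedLane(Active, AnyExt, Passthru) & 0x00FF);
}

// After the fold: zero-extending load with the AND removed. ZextPassthru
// selects whether the passthru is zero-extended (masked) as well.
static uint16_t afterFold(bool Active, uint8_t Mem, uint16_t Passthru,
                          bool ZextPassthru) {
  uint16_t P =
      ZextPassthru ? static_cast<uint16_t>(Passthru & 0x00FF) : Passthru;
  return maskedLane(Active, Mem, P);
}
```

In this model an inactive lane with passthru 0xCD34 yields 0x0034 before the fold but 0xCD34 after it unless the passthru is masked too, while a zero passthru gives the same result either way.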

david-arm · 2024-11-08T08:20:20Z

I ran make check-all downstream on Linux and it passed. Not sure what is happening with the pre-commit testing, but the Windows build passed fine.

david-arm requested review from SamTebbs33, davemgreen and paulwalker-arm November 6, 2024 09:58

llvmbot added the backend:AArch64 and llvm:SelectionDAG (SelectionDAGISel as well) labels Nov 6, 2024

davemgreen reviewed Nov 6, 2024

david-arm added 2 commits November 6, 2024 11:24

Add tests

3f6753b

david-arm force-pushed the avgflooru branch from 3bb5171 to 204701b November 6, 2024 11:26

paulwalker-arm approved these changes Nov 7, 2024

david-arm merged commit b9dd602 into llvm:main Nov 8, 2024
6 of 8 checks passed

paulwalker-arm mentioned this pull request Nov 8, 2024

[SelectionDAG] Add support for extending masked loads in computeKnownBits #115450

Merged

david-arm deleted the avgflooru branch January 28, 2025 11:48
