Skip to content

[AArch64] Don't try to sink and(load) #122274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 10, 2025

Conversation

davemgreen
Copy link
Collaborator

If we sink the and in and(load), CGP can hoist is back again to the load, getting into an infinite loop. This prevents sinking the and in this case.

Fixes #122074

@llvmbot
Copy link
Member

llvmbot commented Jan 9, 2025

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

If we sink the and in and(load), CGP can hoist is back again to the load, getting into an infinite loop. This prevents sinking the and in this case.

Fixes #122074


Full diff: https://github.com/llvm/llvm-project/pull/122274.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-dup-ext-crash.ll (+46)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 29ea098386cec1..932a6f9ce23fd2 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -5514,7 +5514,10 @@ bool AArch64TTIImpl::isProfitableToSinkOperands(
         NumZExts++;
       }
 
-      Ops.push_back(&Insert->getOperandUse(1));
+      // And(Load) is excluded to prevent CGP getting stuck in a loop of sinking
+      // the And, just to hoist it again back to the load.
+      if (!match(OperandInstr, m_And(m_Load(m_Value()), m_Value())))
+        Ops.push_back(&Insert->getOperandUse(1));
       Ops.push_back(&Shuffle->getOperandUse(0));
       Ops.push_back(&Op);
     }
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext-crash.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext-crash.ll
index 95c54cd8b01511..60781466f0eb79 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext-crash.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext-crash.ll
@@ -40,3 +40,49 @@ vector.body:                                      ; preds = %vector.body, %vecto
   store <2 x i32> %3, ptr %4, align 4
   br label %vector.body
 }
+
+; This test got stuck in a loop hoisting the and to the load, and sinking it back to the mull
+define i32 @dup_and_load(ptr %p, i1 %c) {
+; CHECK-LABEL: dup_and_load:
+; CHECK:       // %bb.0: // %for.body.lr.ph
+; CHECK-NEXT:    mov x8, x0
+; CHECK-NEXT:    ldrb w0, [x0]
+; CHECK-NEXT:    tbz w1, #0, .LBB1_3
+; CHECK-NEXT:  // %bb.1: // %ph
+; CHECK-NEXT:    dup v0.8h, w0
+; CHECK-NEXT:    mov w9, wzr
+; CHECK-NEXT:  .LBB1_2: // %vector.body
+; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    ldr d1, [x8]
+; CHECK-NEXT:    add w9, w9, #1
+; CHECK-NEXT:    cmp w9, #100
+; CHECK-NEXT:    ushll v1.8h, v1.8b, #0
+; CHECK-NEXT:    umull2 v2.4s, v0.8h, v1.8h
+; CHECK-NEXT:    umull v1.4s, v0.4h, v1.4h
+; CHECK-NEXT:    stp q1, q2, [x8]
+; CHECK-NEXT:    b.lt .LBB1_2
+; CHECK-NEXT:  .LBB1_3: // %end
+; CHECK-NEXT:    ret
+for.body.lr.ph:
+  %l = load i32, ptr %p
+  %conv314 = and i32 %l, 255
+  br i1 %c, label %ph, label %end
+
+ph:
+  %broadcast.splatinsert = insertelement <8 x i32> poison, i32 %conv314, i32 0
+  %broadcast.splat = shufflevector <8 x i32> %broadcast.splatinsert, <8 x i32> poison, <8 x i32> zeroinitializer
+  br label %vector.body
+
+vector.body:                                      ; preds = %vector.body, %vector.ph
+  %iv = phi i32 [ 0, %ph ], [ %iv.next, %vector.body ]
+  %wide.load = load <8 x i8>, ptr %p, align 4
+  %0 = zext <8 x i8> %wide.load to <8 x i32>
+  %1 = mul <8 x i32> %broadcast.splat, %0
+  store <8 x i32> %1, ptr %p, align 4
+  %iv.next = add i32 %iv, 1
+  %e = icmp slt i32 %iv.next, 100
+  br i1 %e, label %vector.body, label %end
+
+end:
+  ret i32 %conv314
+}

Copy link
Contributor

@hazzlim hazzlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, LGTM - thanks for the fix! :)

Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a sensible fix! I just had a few minor comments if you agree with them.

%broadcast.splat = shufflevector <8 x i32> %broadcast.splatinsert, <8 x i32> poison, <8 x i32> zeroinitializer
br label %vector.body

vector.body: ; preds = %vector.body, %vector.ph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a loop in order to trigger the crash? Just wondering if it could be a normal block to reduce the lines of IR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is OK - it was copied from the test above and modified enough to trigger the issue.

; CHECK-NEXT: b.lt .LBB1_2
; CHECK-NEXT: .LBB1_3: // %end
; CHECK-NEXT: ret
for.body.lr.ph:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be nicer to change the block name to entry or something like that?

; CHECK-NEXT: ret
for.body.lr.ph:
%l = load i32, ptr %p
%conv314 = and i32 %l, 255
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Perhaps rename the variables to something a bit more readable, i.e. %and or %and_255. Similarly for the zext and mul in the loop below.

@kongy
Copy link
Collaborator

kongy commented Jan 10, 2025

Thanks, verified that this change fixes the Android's build hang.

If we sink the and in and(load), CGP can hoist is back again to the laod,
getting into an infinite loop. This prevents sinking the and in this case.

Fixes llvm#122074
@davemgreen davemgreen force-pushed the gh-a64-fixsinkandload branch from dae8a51 to e295a9f Compare January 10, 2025 11:52
Copy link
Collaborator Author

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, verified that this change fixes the Android's build hang.

Thanks for checking!

%broadcast.splat = shufflevector <8 x i32> %broadcast.splatinsert, <8 x i32> poison, <8 x i32> zeroinitializer
br label %vector.body

vector.body: ; preds = %vector.body, %vector.ph
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is OK - it was copied from the test above and modified enough to trigger the issue.

@davemgreen davemgreen merged commit 5a069ea into llvm:main Jan 10, 2025
5 of 7 checks passed
@davemgreen davemgreen deleted the gh-a64-fixsinkandload branch January 10, 2025 11:54
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 10, 2025

LLVM Buildbot has detected a new failure on builder clang-armv8-quick running on linaro-clang-armv8-quick while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/10060

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'lit :: googletest-timeout.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 9
not env -u FILECHECK_OPTS "/usr/bin/python3.10" /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout    --param gtest_filter=InfiniteLoopSubTest --timeout=1 > /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/Output/googletest-timeout.py.tmp.cmd.out
# executed command: not env -u FILECHECK_OPTS /usr/bin/python3.10 /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout --param gtest_filter=InfiniteLoopSubTest --timeout=1
# .---command stderr------------
# | lit.py: /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1 seconds was requested on the command line. Forcing timeout to be 1 seconds.
# `-----------------------------
# RUN: at line 11
FileCheck --check-prefix=CHECK-INF < /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/Output/googletest-timeout.py.tmp.cmd.out /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py
# executed command: FileCheck --check-prefix=CHECK-INF /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py
# RUN: at line 16
not env -u FILECHECK_OPTS "/usr/bin/python3.10" /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout   --param gtest_filter=InfiniteLoopSubTest  --param set_timeout=1   > /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/Output/googletest-timeout.py.tmp.cfgset.out
# executed command: not env -u FILECHECK_OPTS /usr/bin/python3.10 /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout --param gtest_filter=InfiniteLoopSubTest --param set_timeout=1
# RUN: at line 19
FileCheck --check-prefix=CHECK-INF < /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/Output/googletest-timeout.py.tmp.cfgset.out /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py
# executed command: FileCheck --check-prefix=CHECK-INF /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py
# .---command stderr------------
# | /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py:34:14: error: CHECK-INF: expected string not found in input
# | # CHECK-INF: Timed Out: 1
# |              ^
# | <stdin>:13:29: note: scanning from here
# | Reached timeout of 1 seconds
# |                             ^
# | <stdin>:37:2: note: possible intended match here
# |  Timed Out: 2 (100.00%)
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |             8:  
# |             9:  
# |            10: -- 
# |            11: exit: -9 
# |            12: -- 
# |            13: Reached timeout of 1 seconds 
# | check:34'0                                 X error: no match found
# |            14: ******************** 
# | check:34'0     ~~~~~~~~~~~~~~~~~~~~~
...

BaiXilin pushed a commit to BaiXilin/llvm-fix-vnni-instr-types that referenced this pull request Jan 12, 2025
If we sink the and in and(load), CGP can hoist is back again to the
load, getting into an infinite loop. This prevents sinking the and in
this case.

Fixes llvm#122074
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LLD hangs forever on AArch64
6 participants