Skip to content

MemCpyOpt: replace an AA query with MSSA query (NFC) #108535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 24, 2024

Conversation

artagnon
Copy link
Contributor

@artagnon artagnon commented Sep 13, 2024

Fix a long-standing TODO.

An AA query in processStoreOfLoad misses certain cases, and has been
marked as a TODO. Replace it with an MSSA query to increase MemCpyOpt's
power, fixing the long-standing TODO.
@artagnon artagnon requested a review from nikic as a code owner September 13, 2024 11:20
@artagnon artagnon requested review from dtcxzyw and nikic and removed request for nikic September 13, 2024 11:20
@llvmbot
Copy link
Member

llvmbot commented Sep 13, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Ramkumar Ramachandra (artagnon)

Changes

An AA query in processStoreOfLoad misses certain cases, and has been marked as a TODO. Replace it with an MSSA query to increase MemCpyOpt's power, fixing the long-standing TODO.


Full diff: https://github.com/llvm/llvm-project/pull/108535.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp (+12-13)
  • (modified) llvm/test/Transforms/MemCpyOpt/fca2memcpy.ll (+14-17)
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 1d67773585d593..0b7e58d5b961ae 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -647,19 +647,18 @@ bool MemCpyOptPass::processStoreOfLoad(StoreInst *SI, LoadInst *LI,
       (EnableMemCpyOptWithoutLibcalls ||
        (TLI->has(LibFunc_memcpy) && TLI->has(LibFunc_memmove)))) {
     MemoryLocation LoadLoc = MemoryLocation::get(LI);
-
-    // We use alias analysis to check if an instruction may store to
-    // the memory we load from in between the load and the store. If
-    // such an instruction is found, we try to promote there instead
-    // of at the store position.
-    // TODO: Can use MSSA for this.
-    Instruction *P = SI;
-    for (auto &I : make_range(++LI->getIterator(), SI->getIterator())) {
-      if (isModSet(AA->getModRefInfo(&I, LoadLoc))) {
-        P = &I;
-        break;
-      }
-    }
+    MemoryUseOrDef *LoadAccess = MSSA->getMemoryAccess(LI),
+                   *StoreAccess = MSSA->getMemoryAccess(SI);
+
+    // We use MSSA to check if an instruction may store to the memory we load
+    // from in between the load and the store. If such an instruction is found,
+    // we try to promote there instead of at the store position.
+    BatchAAResults BAA(*AA);
+    auto *Clobber =
+        cast<MemoryUseOrDef>(MSSA->getWalker()->getClobberingMemoryAccess(
+            StoreAccess, LoadLoc, BAA));
+    Instruction *P =
+        MSSA->dominates(LoadAccess, Clobber) ? Clobber->getMemoryInst() : SI;
 
     // If we found an instruction that may write to the loaded memory,
     // we can try to promote at this position instead of the store
diff --git a/llvm/test/Transforms/MemCpyOpt/fca2memcpy.ll b/llvm/test/Transforms/MemCpyOpt/fca2memcpy.ll
index 51fad820509393..00cffb9b1c7f19 100644
--- a/llvm/test/Transforms/MemCpyOpt/fca2memcpy.ll
+++ b/llvm/test/Transforms/MemCpyOpt/fca2memcpy.ll
@@ -38,9 +38,8 @@ define void @noaliasdst(ptr %src, ptr noalias %dst) {
 
 define void @destroysrc(ptr %src, ptr %dst) {
 ; CHECK-LABEL: @destroysrc(
-; CHECK-NEXT:    [[TMP1:%.*]] = load [[S:%.*]], ptr [[SRC:%.*]], align 8
-; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[SRC]], i8 0, i64 16, i1 false)
-; CHECK-NEXT:    store [[S]] [[TMP1]], ptr [[DST:%.*]], align 8
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[SRC:%.*]], i8 0, i64 16, i1 false)
+; CHECK-NEXT:    call void @llvm.memmove.p0.p0.i64(ptr align 8 [[DST:%.*]], ptr align 8 [[SRC]], i64 16, i1 false)
 ; CHECK-NEXT:    ret void
 ;
   %1 = load %S, ptr %src
@@ -51,8 +50,8 @@ define void @destroysrc(ptr %src, ptr %dst) {
 
 define void @destroynoaliassrc(ptr noalias %src, ptr %dst) {
 ; CHECK-LABEL: @destroynoaliassrc(
-; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 8 [[DST:%.*]], ptr align 8 [[SRC]], i64 16, i1 false)
-; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[SRC:%.*]], i8 0, i64 16, i1 false)
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 8 [[DST:%.*]], ptr align 8 [[SRC:%.*]], i64 16, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[SRC]], i8 0, i64 16, i1 false)
 ; CHECK-NEXT:    ret void
 ;
   %1 = load %S, ptr %src
@@ -63,9 +62,8 @@ define void @destroynoaliassrc(ptr noalias %src, ptr %dst) {
 
 define void @copyalias(ptr %src, ptr %dst) {
 ; CHECK-LABEL: @copyalias(
-; CHECK-NEXT:    [[TMP1:%.*]] = load [[S:%.*]], ptr [[SRC:%.*]], align 8
-; CHECK-NEXT:    call void @llvm.memmove.p0.p0.i64(ptr align 8 [[DST:%.*]], ptr align 8 [[SRC]], i64 16, i1 false)
-; CHECK-NEXT:    store [[S]] [[TMP1]], ptr [[DST]], align 8
+; CHECK-NEXT:    call void @llvm.memmove.p0.p0.i64(ptr align 8 [[DST:%.*]], ptr align 8 [[SRC:%.*]], i64 16, i1 false)
+; CHECK-NEXT:    call void @llvm.memmove.p0.p0.i64(ptr align 8 [[DST]], ptr align 8 [[SRC]], i64 16, i1 false)
 ; CHECK-NEXT:    ret void
 ;
   %1 = load %S, ptr %src
@@ -79,9 +77,9 @@ define void @copyalias(ptr %src, ptr %dst) {
 ; sure we lift the computation as well if needed and possible.
 define void @addrproducer(ptr %src, ptr %dst) {
 ; CHECK-LABEL: @addrproducer(
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[DST:%.*]], i8 undef, i64 16, i1 false)
 ; CHECK-NEXT:    [[DST2:%.*]] = getelementptr [[S:%.*]], ptr [[DST]], i64 1
 ; CHECK-NEXT:    call void @llvm.memmove.p0.p0.i64(ptr align 8 [[DST2]], ptr align 8 [[SRC:%.*]], i64 16, i1 false)
-; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[DST:%.*]], i8 undef, i64 16, i1 false)
 ; CHECK-NEXT:    ret void
 ;
   %1 = load %S, ptr %src
@@ -93,11 +91,10 @@ define void @addrproducer(ptr %src, ptr %dst) {
 
 define void @aliasaddrproducer(ptr %src, ptr %dst, ptr %dstidptr) {
 ; CHECK-LABEL: @aliasaddrproducer(
-; CHECK-NEXT:    [[TMP1:%.*]] = load [[S:%.*]], ptr [[SRC:%.*]], align 8
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[DST:%.*]], i8 undef, i64 16, i1 false)
 ; CHECK-NEXT:    [[DSTINDEX:%.*]] = load i32, ptr [[DSTIDPTR:%.*]], align 4
-; CHECK-NEXT:    [[DST2:%.*]] = getelementptr [[S]], ptr [[DST]], i32 [[DSTINDEX]]
-; CHECK-NEXT:    store [[S]] [[TMP1]], ptr [[DST2]], align 8
+; CHECK-NEXT:    [[DST2:%.*]] = getelementptr [[S:%.*]], ptr [[DST]], i32 [[DSTINDEX]]
+; CHECK-NEXT:    call void @llvm.memmove.p0.p0.i64(ptr align 8 [[DST2]], ptr align 8 [[SRC:%.*]], i64 16, i1 false)
 ; CHECK-NEXT:    ret void
 ;
   %1 = load %S, ptr %src
@@ -110,11 +107,11 @@ define void @aliasaddrproducer(ptr %src, ptr %dst, ptr %dstidptr) {
 
 define void @noaliasaddrproducer(ptr %src, ptr noalias %dst, ptr noalias %dstidptr) {
 ; CHECK-LABEL: @noaliasaddrproducer(
-; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[DSTIDPTR:%.*]], align 4
-; CHECK-NEXT:    [[DSTINDEX:%.*]] = or i32 [[TMP2]], 1
+; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[DSTIDPTR:%.*]], align 4
+; CHECK-NEXT:    [[DSTINDEX:%.*]] = or i32 [[TMP1]], 1
 ; CHECK-NEXT:    [[DST2:%.*]] = getelementptr [[S:%.*]], ptr [[DST:%.*]], i32 [[DSTINDEX]]
-; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 8 [[DST2]], ptr align 8 [[SRC]], i64 16, i1 false)
-; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[SRC:%.*]], i8 undef, i64 16, i1 false)
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 8 [[DST2]], ptr align 8 [[SRC:%.*]], i64 16, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[SRC]], i8 undef, i64 16, i1 false)
 ; CHECK-NEXT:    ret void
 ;
   %1 = load %S, ptr %src
@@ -130,7 +127,7 @@ define void @throwing_call(ptr noalias %src, ptr %dst) {
 ; CHECK-LABEL: @throwing_call(
 ; CHECK-NEXT:    [[TMP1:%.*]] = load [[S:%.*]], ptr [[SRC:%.*]], align 8
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[SRC]], i8 0, i64 16, i1 false)
-; CHECK-NEXT:    call void @call() [[ATTR2:#.*]]
+; CHECK-NEXT:    call void @call() #[[ATTR2:[0-9]+]]
 ; CHECK-NEXT:    store [[S]] [[TMP1]], ptr [[DST:%.*]], align 8
 ; CHECK-NEXT:    ret void
 ;

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Sep 13, 2024
@dtcxzyw dtcxzyw requested a review from fhahn September 13, 2024 16:08
@artagnon artagnon changed the title MemCpyOpt: replace an AA query with MSSA query MemCpyOpt: replace an AA query with MSSA query (NFC) Sep 14, 2024
@artagnon
Copy link
Contributor Author

Gentle ping. This patch should be okay, no?

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A disadvantage of using MSSA is that we'll scan for clobbers higher than the load instruction. This is a general problem with MSSA clobber walks though, it would be nice if they have an option to specify a "stop access" so we don't waste time scanning higher.

Copy link

github-actions bot commented Sep 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@artagnon
Copy link
Contributor Author

A disadvantage of using MSSA is that we'll scan for clobbers higher than the load instruction. This is a general problem with MSSA clobber walks though, it would be nice if they have an option to specify a "stop access" so we don't waste time scanning higher.

Yes, I saw this being noted in a TODO in the file. Does it walk through non-memory instructions as well, or does it keep a representation of just the memory instructions?

@nikic
Copy link
Contributor

nikic commented Sep 23, 2024

A disadvantage of using MSSA is that we'll scan for clobbers higher than the load instruction. This is a general problem with MSSA clobber walks though, it would be nice if they have an option to specify a "stop access" so we don't waste time scanning higher.

Yes, I saw this being noted in a TODO in the file. Does it walk through non-memory instructions as well, or does it keep a representation of just the memory instructions?

It only walks memory instructions. But the expensive part are the AA queries.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@artagnon artagnon merged commit f664d31 into llvm:main Sep 24, 2024
8 checks passed
@artagnon artagnon deleted the memcpyopt-mssa branch September 24, 2024 10:18
nikic added a commit that referenced this pull request Mar 12, 2025
This effectively reverts #108535. The old AA code was looking for
the *first* clobber between the load and store and then trying to
move all the way up there. The new MSSA based code instead found
the *last* clobber. There might still be an earlier clobber that
has not been accounted for.

Fixes #130632.
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Mar 13, 2025
This effectively reverts llvm#108535. The old AA code was looking for
the *first* clobber between the load and store and then trying to
move all the way up there. The new MSSA based code instead found
the *last* clobber. There might still be an earlier clobber that
has not been accounted for.

Fixes llvm#130632.

(cherry picked from commit 5da9044)
frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025
This effectively reverts llvm#108535. The old AA code was looking for
the *first* clobber between the load and store and then trying to
move all the way up there. The new MSSA based code instead found
the *last* clobber. There might still be an earlier clobber that
has not been accounted for.

Fixes llvm#130632.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants