Skip to content

[MemCpyOpt] handle memcpy from memset in more cases #140954

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 11, 2025

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented May 21, 2025

This aims to reduce the divergence between the initial checks in this function and processMemCpyMemCpyDependence (in particular, adding handling of offsets), with the goal to eventually reduce duplication there and improve this pass in other ways.

This aims to reduce the divergence between this function and
processMemCpyMemCpyDependence, with the goal to eventually reduce
duplication here and combine them and improve this pass.
@llvmbot
Copy link
Member

llvmbot commented May 21, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Jameson Nash (vtjnash)

Changes

This aims to reduce the divergence between the initial checks in this function and processMemCpyMemCpyDependence (in particular, adding handling of offsets and not requiring known sizes), with the goal to eventually reduce duplication there and improve this pass in other ways.


Full diff: https://github.com/llvm/llvm-project/pull/140954.diff

6 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp (+67-36)
  • (modified) llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll (+5-1)
  • (modified) llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll (+48)
  • (modified) llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll (+1-1)
  • (modified) llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll (+1-1)
  • (modified) llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll (+2-2)
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 9c8cd53d56b56..95d956f18aa69 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1367,8 +1367,9 @@ bool MemCpyOptPass::processMemSetMemCpyDependence(MemCpyInst *MemCpy,
   return true;
 }
 
-/// Determine whether the instruction has undefined content for the given Size,
-/// either because it was freshly alloca'd or started its lifetime.
+/// Determine whether the pointer V had only undefined content from Def up to
+/// the given Size, either because it was freshly alloca'd or started its
+/// lifetime.
 static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
                              MemoryDef *Def, Value *Size) {
   if (MSSA->isLiveOnEntryDef(Def))
@@ -1403,6 +1404,24 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
   return false;
 }
 
+static bool coversInputFully(MemorySSA *MSSA, MemCpyInst *MemCpy,
+                             MemIntrinsic *MemSrc, BatchAAResults &BAA) {
+  // If the memcpy is larger than the previous, but the memory was undef prior
+  // to that, we can just ignore the tail. Technically we're only
+  // interested in the bytes from 0..MemSrcOffset and
+  // MemSrcLength+MemSrcOffset..CopySize here, but as we can't easily represent
+  // this location, we use the full 0..CopySize range.
+  Value *CopySize = MemCpy->getLength();
+  MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
+  MemoryUseOrDef *MemSrcAccess = MSSA->getMemoryAccess(MemSrc);
+  MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
+      MemSrcAccess->getDefiningAccess(), MemCpyLoc, BAA);
+  if (auto *MD = dyn_cast<MemoryDef>(Clobber))
+    if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
+      return true;
+  return false;
+}
+
 /// Transform memcpy to memset when its source was just memset.
 /// In other words, turn:
 /// \code
@@ -1418,51 +1437,63 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
 bool MemCpyOptPass::performMemCpyToMemSetOptzn(MemCpyInst *MemCpy,
                                                MemSetInst *MemSet,
                                                BatchAAResults &BAA) {
-  // Make sure that memcpy(..., memset(...), ...), that is we are memsetting and
-  // memcpying from the same address. Otherwise it is hard to reason about.
-  if (!BAA.isMustAlias(MemSet->getRawDest(), MemCpy->getRawSource()))
-    return false;
-
   Value *MemSetSize = MemSet->getLength();
   Value *CopySize = MemCpy->getLength();
 
-  if (MemSetSize != CopySize) {
-    // Make sure the memcpy doesn't read any more than what the memset wrote.
-    // Don't worry about sizes larger than i64.
-
-    // A known memset size is required.
-    auto *CMemSetSize = dyn_cast<ConstantInt>(MemSetSize);
-    if (!CMemSetSize)
+  int64_t MOffset = 0;
+  const DataLayout &DL = MemCpy->getModule()->getDataLayout();
+  // We can only transforms memcpy's where the dest of one is the source of the
+  // other, or they have a known offset.
+  if (MemCpy->getSource() != MemSet->getDest()) {
+    std::optional<int64_t> Offset =
+        MemCpy->getSource()->getPointerOffsetFrom(MemSet->getDest(), DL);
+    if (!Offset)
       return false;
+    MOffset = *Offset;
+  }
 
-    // A known memcpy size is also required.
+  MaybeAlign MDestAlign = MemCpy->getDestAlign();
+  int64_t MOffsetAligned = MDestAlign.valueOrOne().value() > 1 && MOffset < 0 ? -(-MOffset & ~(MDestAlign.valueOrOne().value() - 1)) : MOffset; // Compute the MOffset that keeps MDest aligned (truncate towards zero)
+  if (MOffset != 0 || MemSetSize != CopySize) {
+    // Make sure the memcpy doesn't read any more than what the memset wrote, other than undef.
+    auto *CMemSetSize = dyn_cast<ConstantInt>(MemSetSize);
     auto *CCopySize = dyn_cast<ConstantInt>(CopySize);
-    if (!CCopySize)
-      return false;
-    if (CCopySize->getZExtValue() > CMemSetSize->getZExtValue()) {
-      // If the memcpy is larger than the memset, but the memory was undef prior
-      // to the memset, we can just ignore the tail. Technically we're only
-      // interested in the bytes from MemSetSize..CopySize here, but as we can't
-      // easily represent this location, we use the full 0..CopySize range.
-      MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
-      bool CanReduceSize = false;
-      MemoryUseOrDef *MemSetAccess = MSSA->getMemoryAccess(MemSet);
-      MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
-          MemSetAccess->getDefiningAccess(), MemCpyLoc, BAA);
-      if (auto *MD = dyn_cast<MemoryDef>(Clobber))
-        if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
-          CanReduceSize = true;
-
-      if (!CanReduceSize)
+    // Don't worry about sizes larger than i64.
+    if (!CMemSetSize || !CCopySize || MOffset < 0 ||
+        CCopySize->getZExtValue() + MOffset > CMemSetSize->getZExtValue()) {
+      if (!coversInputFully(MSSA, MemCpy, MemSet, BAA))
         return false;
-      CopySize = MemSetSize;
+
+      if (CMemSetSize && CCopySize) {
+        // If both have constant sizes and offsets, clip the memcpy to the bounds of the memset if applicable.
+        if (CCopySize->getZExtValue() + std::abs(MOffset) > CMemSetSize->getZExtValue()) {
+          if (MOffsetAligned == 0 || (MOffset < 0 && CCopySize->getZExtValue() + MOffset > CMemSetSize->getZExtValue()))
+            CopySize = MemSetSize;
+          else
+            CopySize = ConstantInt::get(CopySize->getType(), std::max((int64_t)0, (int64_t)(CMemSetSize->getZExtValue() - std::abs(MOffsetAligned))));
+        }
+        else if (MOffsetAligned < 0) {
+          // Even if CMemSetSize isn't known, if the MOffsetAligned is negative, make sure to clip the new memset
+          CopySize = ConstantInt::get(CopySize->getType(), CCopySize->getZExtValue() + MOffsetAligned);
+        }
+      }
+      else if (CCopySize && MOffsetAligned < 0) {
+        // Even if CMemSetSize isn't known, if the MOffsetAligned is negative, can still clip the new memset
+        CopySize = ConstantInt::get(CopySize->getType(), CCopySize->getZExtValue() + MOffsetAligned);
+      }
+      else {
+        MOffsetAligned = 0;
+      }
     }
   }
 
   IRBuilder<> Builder(MemCpy);
+  Value *MDest = MemCpy->getRawDest();
+  if (MOffsetAligned < 0)
+    MDest = Builder.CreateInBoundsPtrAdd(MDest, Builder.getInt64(-MOffsetAligned));
   Instruction *NewM =
-      Builder.CreateMemSet(MemCpy->getRawDest(), MemSet->getOperand(1),
-                           CopySize, MemCpy->getDestAlign());
+      Builder.CreateMemSet(MDest, MemSet->getOperand(1),
+                           CopySize, MDestAlign);
   auto *LastDef = cast<MemoryDef>(MSSA->getMemoryAccess(MemCpy));
   auto *NewAccess = MSSAU->createMemoryAccessAfter(NewM, nullptr, LastDef);
   MSSAU->insertDef(cast<MemoryDef>(NewAccess), /*RenameUses=*/true);
@@ -1683,7 +1714,7 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
     I->setMetadata(LLVMContext::MD_tbaa_struct, nullptr);
   }
 
-  LLVM_DEBUG(dbgs() << "Stack Move: Performed staack-move optimization\n");
+  LLVM_DEBUG(dbgs() << "Stack Move: Performed stack-move optimization\n");
   NumStackMove++;
   return true;
 }
diff --git a/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll b/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
index 0626f09702f7e..3ad49a63b357a 100644
--- a/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
+++ b/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
@@ -14,8 +14,12 @@ define void @test() {
 ; CHECK-LABEL: define void @test() {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[AGG_TMP_SROA_14:%.*]] = alloca [20 x i8], align 4
-; CHECK-NEXT:    [[AGG_TMP_SROA_14_128_SROA_IDX:%.*]] = getelementptr i8, ptr [[AGG_TMP_SROA_14]], i64 4
+; CHECK-NEXT:    [[AGG_TMP_SROA_15:%.*]] = alloca [20 x i8], align 4
+; CHECK-NEXT:    [[AGG_TMP_SROA_14_128_SROA_IDX:%.*]] = getelementptr i8, ptr [[AGG_TMP_SROA_15]], i64 4
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[AGG_TMP_SROA_14_128_SROA_IDX]], i8 0, i64 1, i1 false)
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 20, ptr [[AGG_TMP_SROA_14]])
+; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i8, ptr [[AGG_TMP_SROA_14]], i64 4
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[TMP0]], i8 0, i64 1, i1 false)
 ; CHECK-NEXT:    [[AGG_TMP3_SROA_35_128_SROA_IDX:%.*]] = getelementptr i8, ptr [[AGG_TMP_SROA_14]], i64 4
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr inttoptr (i64 4 to ptr), i8 0, i64 1, i1 false)
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr null, i8 0, i64 1, i1 false)
diff --git a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
index 1c3896407e950..424cdde5fa780 100644
--- a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
+++ b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
@@ -187,6 +187,54 @@ define void @test_write_before_memset_in_both_regions(ptr %result) {
   ret void
 }
 
+define void @test_offset_memset(ptr %result) {
+; CHECK-LABEL: @test_offset_memset(
+; CHECK-NEXT:    [[A1:%.*]] = alloca [4 x i32], align 8
+; CHECK-NEXT:    [[A:%.*]] = getelementptr i32, ptr [[A1]], i32 1
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[A]], i8 0, i64 12, i1 false)
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[RESULT:%.*]], i64 4
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[TMP1]], i8 0, i64 8, i1 false)
+; CHECK-NEXT:    ret void
+;
+  %a = alloca [ 4 x i32 ], align 8
+  %b = getelementptr i32, ptr %a, i32 1
+  call void @llvm.memset.p0.i64(ptr align 8 %b, i8 0, i64 12, i1 false)
+  call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %a, i64 12, i1 false)
+  ret void
+}
+
+define void @test_offset_memsetcpy(ptr %result) {
+; CHECK-LABEL: @test_offset_memsetcpy(
+; CHECK-NEXT:    [[A1:%.*]] = alloca [4 x i32], align 8
+; CHECK-NEXT:    [[A:%.*]] = getelementptr i32, ptr [[A1]], i32 1
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[A1]], i8 0, i64 12, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[RESULT:%.*]], i8 0, i64 8, i1 false)
+; CHECK-NEXT:    ret void
+;
+  %a = alloca [ 4 x i32 ], align 8
+  %b = getelementptr i32, ptr %a, i32 1
+  call void @llvm.memset.p0.i64(ptr align 8 %a, i8 0, i64 12, i1 false)
+  call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %b, i64 12, i1 false)
+  ret void
+}
+
+define void @test_two_memset(ptr %result) {
+; CHECK-LABEL: @test_two_memset(
+; CHECK-NEXT:    [[A:%.*]] = alloca [4 x i32], align 8
+; CHECK-NEXT:    [[B:%.*]] = getelementptr i32, ptr [[A]], i32 3
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[A]], i8 0, i64 12, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[B]], i8 1, i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[RESULT:%.*]], ptr align 8 [[A]], i64 16, i1 false)
+; CHECK-NEXT:    ret void
+;
+  %a = alloca [ 4 x i32 ], align 8
+  %b = getelementptr i32, ptr %a, i32 3
+  call void @llvm.memset.p0.i64(ptr align 8 %a, i8 0, i64 12, i1 false)
+  call void @llvm.memset.p0.i64(ptr align 8 %b, i8 1, i64 4, i1 false)
+  call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %a, i64 16, i1 false)
+  ret void
+}
+
 declare ptr @malloc(i64)
 declare void @free(ptr)
 
diff --git a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
index 47474e8dac051..dc55d0524ddfb 100644
--- a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
+++ b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
@@ -73,7 +73,7 @@ define void @test_different_source_gep(ptr %dst1, ptr %dst2, i8 %c) {
 ; CHECK-LABEL: @test_different_source_gep(
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[DST1:%.*]], i8 [[C:%.*]], i64 128, i1 false)
 ; CHECK-NEXT:    [[P:%.*]] = getelementptr i8, ptr [[DST1]], i64 64
-; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[DST2:%.*]], ptr [[P]], i64 64, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[DST2:%.*]], i8 [[C]], i64 64, i1 false)
 ; CHECK-NEXT:    ret void
 ;
   call void @llvm.memset.p0.i64(ptr %dst1, i8 %c, i64 128, i1 false)
diff --git a/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll b/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
index 5e13432746bf7..0e312bc42d463 100644
--- a/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
+++ b/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
@@ -19,7 +19,7 @@ define i32 @foo(i1 %z) {
 ; CHECK:       for.body3.lr.ph:
 ; CHECK-NEXT:    br label [[FOR_INC7_1]]
 ; CHECK:       for.inc7.1:
-; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[A]], ptr align 4 [[SCEVGEP]], i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 4 [[A]], i8 0, i64 4, i1 false)
 ; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[A]], align 4
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
diff --git a/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll b/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll
index a834d2465dfa5..d739d53d8a62c 100644
--- a/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll
+++ b/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll
@@ -18,13 +18,13 @@ define void @test(ptr %src, i8 %c, i64 %size) {
   ret void
 }
 
-; Differing sizes, so left as it is.
+; Differing sizes, but would be UB if size1 > size2
 define void @negative_test(ptr %src, i8 %c, i64 %size1, i64 %size2) {
 ; CHECK-LABEL: @negative_test(
 ; CHECK-NEXT:    [[DST1:%.*]] = alloca i8, i64 [[SIZE1:%.*]], align 1
 ; CHECK-NEXT:    [[DST2:%.*]] = alloca i8, i64 [[SIZE2:%.*]], align 1
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[DST1]], i8 [[C:%.*]], i64 [[SIZE1]], i1 false)
-; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 8 [[DST2]], ptr align 8 [[DST1]], i64 [[SIZE2]], i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[DST2]], i8 [[C]], i64 [[SIZE2]], i1 false)
 ; CHECK-NEXT:    ret void
 ;
   %dst1 = alloca i8, i64 %size1

Copy link

github-actions bot commented May 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@dtcxzyw dtcxzyw requested a review from dianqk May 22, 2025 03:01
@vtjnash
Copy link
Member Author

vtjnash commented May 28, 2025

Review request bump? I have some other improvements I'd like to propose (main...vtjnash:llvm-project:jn/memcpyopt-destonly), so this PR was a bit of a practice exercise for me to learn the optimization pass

@dianqk
Copy link
Member

dianqk commented May 28, 2025

I'll look into it this weekend. :3

Copy link
Member

@dianqk dianqk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible miscompile: https://llvm-compile-time-tracker.com/compare.php?from=2f66e5fcbad211e0d71b1a533071d31b36f088ec&to=0d828e6dcb86d886f1e5ee57ec7e71b945bfd36c&stat=instructions:u

Handle the there cases of Offset is difficult to read. How about splitting into two parts (with two PRs)?

  • If it's an undefined value, trim the start and end
  • Handle the case where Offset >= 0 here

@vtjnash
Copy link
Member Author

vtjnash commented Jun 2, 2025

Okay, I've trimmed down the PR to just add the code for handling Offset > 0 now

@vtjnash vtjnash force-pushed the jn/memcpyopt-memset-offset branch from 16bfef3 to bd2e887 Compare June 2, 2025 21:23
@vtjnash
Copy link
Member Author

vtjnash commented Jun 2, 2025

Second part (sizes aren't constant) is stacked on top of this PR, for once this is merged:
https://github.com/vtjnash/llvm-project/compare/jn/memcpyopt-memset-offset...vtjnash:llvm-project:jn/memcpyopt-memset-nonconst-offset?expand=1

Third part (MOffset < 0) seems potentially too annoying to be worth doing, unless you want it. But maybe it is worth just doing the memcpy->memset replacement, without also trying to computed the optimized the start and end points?

@vtjnash vtjnash force-pushed the jn/memcpyopt-memset-offset branch from bd2e887 to bb5fdef Compare June 2, 2025 21:33
Copy link
Member

@dianqk dianqk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please check the new comments before merging.

auto *CCopySize = dyn_cast<ConstantInt>(CopySize);
if (!CCopySize)
return false;
if (CCopySize->getZExtValue() > CMemSetSize->getZExtValue()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also reduce the diff here? How about just modifying the comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the helper function makes this function more readable, and the helper function should be more generally useful, since it implements a check that other functions in this file should also implement. In particular, I'm wanting to use this in processMemCpyMemCpyDependence next, since the purpose of this PR was to make those two functions start to converge in functionality and implementation. But I can do that refactoring as a separate PR if you prefer.

Copy link
Member

@dianqk dianqk Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, making a new function is the more general approach when sharing code or when a piece of code grows significantly in size.
Also, in my mind, the better approach is:

--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1790,6 +1790,9 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
           }
         }
       }
+      if (truncateUndefs(M, MI)) {
+        return true;
+      }
       if (auto *MDep = dyn_cast<MemCpyInst>(MI))
         if (processMemCpyMemCpyDependence(M, MDep, BAA))
           return true;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this helper function should also be used in performStackMoveOptzn, I just didn't want to add additional improvements that re-use the logic to the current PR. I have a list of other TODOs to improve in this file (such as 827019c#diff-afb2b04eff951cb67d214bc6dfa62a087a72591451817dba7929b48c5fe635f9R1530) but they're blocked on getting this PR to progress

@vtjnash
Copy link
Member Author

vtjnash commented Jun 9, 2025

before merging

I don't have merge rights anymore (I wasn't active enough when they were reviewing the list of people with access, though I can ask for it to be reinstated if you'd prefer that I merge)

Copy link
Member

@dianqk dianqk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have merge rights anymore (I wasn't active enough when they were reviewing the list of people with access, though I can ask for it to be reinstated if you'd prefer that I merge)

Feel free to request :)
And I can merge it after your response.

auto *CCopySize = dyn_cast<ConstantInt>(CopySize);
if (!CCopySize)
return false;
if (CCopySize->getZExtValue() > CMemSetSize->getZExtValue()) {
Copy link
Member

@dianqk dianqk Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, making a new function is the more general approach when sharing code or when a piece of code grows significantly in size.
Also, in my mind, the better approach is:

--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1790,6 +1790,9 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
           }
         }
       }
+      if (truncateUndefs(M, MI)) {
+        return true;
+      }
       if (auto *MDep = dyn_cast<MemCpyInst>(MI))
         if (processMemCpyMemCpyDependence(M, MDep, BAA))
           return true;

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks like a nice extension.

@@ -1403,6 +1404,24 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
return false;
}

static bool coversInputFully(MemorySSA *MSSA, MemCpyInst *MemCpy,
MemIntrinsic *MemSrc, BatchAAResults &BAA) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this helper seems quite misleading. The corresponding boolean was previously called CanReduceSize, which is what this is actually checking (can we reduce CopySize to the size written by MemSrc).

Copy link
Member Author

@vtjnash vtjnash Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CanReduceSize was badly named, since it is not what it was for checking (which is whether the MemCpy operation is fully covered (def or undef) by the MemSrc operation). It was what it was incidentally implemented to do with the result, but I have a new commit ready to remove that unnecessary restriction on the implementation and only leave the check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get what followup change you have in mind, but if you think canReduceSize is inaccurate, then maybe something along the lines of overreadIsUndef?

I don't think the new name inputFullyCoveredBySrc is really better than the old, because this function is specifically detecting the case where the input is not fully covered by src (it is partially covered by src and partially by undef initialization)...

Copy link
Member Author

@vtjnash vtjnash Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CanReduceSize bool is actually a property of having both sizes be constants (the other conditional here). If they aren't constants, then you cannot reduce the size, though this query and this optimization are still okay

61d16aa

@nikic nikic changed the title [memcpyopt] handle memcpy from memset in more cases [MemCpyOpt] handle memcpy from memset in more cases Jun 11, 2025
@nikic nikic merged commit 7460c70 into llvm:main Jun 11, 2025
7 checks passed
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
This aims to reduce the divergence between the initial checks in this
function and processMemCpyMemCpyDependence (in particular, adding
handling of offsets), with the goal to eventually reduce duplication
there and improve this pass in other ways.
vtjnash added a commit that referenced this pull request Jun 12, 2025
Allows forwarding memset to memcpy for mismatching unknown sizes if
overread has undef contents. In that case we can refine the undef bytes
to the memset value.

Refs #140954 which laid some of the groundwork for this.
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
This aims to reduce the divergence between the initial checks in this
function and processMemCpyMemCpyDependence (in particular, adding
handling of offsets), with the goal to eventually reduce duplication
there and improve this pass in other ways.
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…3727)

Allows forwarding memset to memcpy for mismatching unknown sizes if
overread has undef contents. In that case we can refine the undef bytes
to the memset value.

Refs llvm#140954 which laid some of the groundwork for this.
vtjnash added a commit that referenced this pull request Jun 18, 2025
…Dependence (#143745)

Allows memcpy to memcpy forwarding in cases where the second memcpy is
larger, but the overread is known to be undef, by shrinking the memcpy
size.

Refs #140954 which laid some of
the groundwork for this.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 18, 2025
…emCpyMemCpyDependence (#143745)

Allows memcpy to memcpy forwarding in cases where the second memcpy is
larger, but the overread is known to be undef, by shrinking the memcpy
size.

Refs llvm/llvm-project#140954 which laid some of
the groundwork for this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants