[MemCpyOpt] handle memcpy from memset in more cases #140954

vtjnash · 2025-05-21T19:15:08Z

This aims to reduce the divergence between the initial checks in this function and processMemCpyMemCpyDependence (in particular, adding handling of offsets), with the goal to eventually reduce duplication there and improve this pass in other ways.

This aims to reduce the divergence between this function and processMemCpyMemCpyDependence, with the goal to eventually reduce duplication here and combine them and improve this pass.

llvmbot · 2025-05-21T19:15:44Z

@llvm/pr-subscribers-llvm-transforms

Author: Jameson Nash (vtjnash)

Changes

This aims to reduce the divergence between the initial checks in this function and processMemCpyMemCpyDependence (in particular, adding handling of offsets and not requiring known sizes), with the goal to eventually reduce duplication there and improve this pass in other ways.

Full diff: https://github.com/llvm/llvm-project/pull/140954.diff

6 Files Affected:

(modified) llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp (+67-36)
(modified) llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll (+5-1)
(modified) llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll (+48)
(modified) llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll (+1-1)
(modified) llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll (+1-1)
(modified) llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll (+2-2)

diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 9c8cd53d56b56..95d956f18aa69 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1367,8 +1367,9 @@ bool MemCpyOptPass::processMemSetMemCpyDependence(MemCpyInst *MemCpy,
   return true;
 }
 
-/// Determine whether the instruction has undefined content for the given Size,
-/// either because it was freshly alloca'd or started its lifetime.
+/// Determine whether the pointer V had only undefined content from Def up to
+/// the given Size, either because it was freshly alloca'd or started its
+/// lifetime.
 static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
                              MemoryDef *Def, Value *Size) {
   if (MSSA->isLiveOnEntryDef(Def))
@@ -1403,6 +1404,24 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
   return false;
 }
 
+static bool coversInputFully(MemorySSA *MSSA, MemCpyInst *MemCpy,
+                             MemIntrinsic *MemSrc, BatchAAResults &BAA) {
+  // If the memcpy is larger than the previous, but the memory was undef prior
+  // to that, we can just ignore the tail. Technically we're only
+  // interested in the bytes from 0..MemSrcOffset and
+  // MemSrcLength+MemSrcOffset..CopySize here, but as we can't easily represent
+  // this location, we use the full 0..CopySize range.
+  Value *CopySize = MemCpy->getLength();
+  MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
+  MemoryUseOrDef *MemSrcAccess = MSSA->getMemoryAccess(MemSrc);
+  MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
+      MemSrcAccess->getDefiningAccess(), MemCpyLoc, BAA);
+  if (auto *MD = dyn_cast<MemoryDef>(Clobber))
+    if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
+      return true;
+  return false;
+}
+
 /// Transform memcpy to memset when its source was just memset.
 /// In other words, turn:
 /// \code
@@ -1418,51 +1437,63 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
 bool MemCpyOptPass::performMemCpyToMemSetOptzn(MemCpyInst *MemCpy,
                                                MemSetInst *MemSet,
                                                BatchAAResults &BAA) {
-  // Make sure that memcpy(..., memset(...), ...), that is we are memsetting and
-  // memcpying from the same address. Otherwise it is hard to reason about.
-  if (!BAA.isMustAlias(MemSet->getRawDest(), MemCpy->getRawSource()))
-    return false;
-
   Value *MemSetSize = MemSet->getLength();
   Value *CopySize = MemCpy->getLength();
 
-  if (MemSetSize != CopySize) {
-    // Make sure the memcpy doesn't read any more than what the memset wrote.
-    // Don't worry about sizes larger than i64.
-
-    // A known memset size is required.
-    auto *CMemSetSize = dyn_cast<ConstantInt>(MemSetSize);
-    if (!CMemSetSize)
+  int64_t MOffset = 0;
+  const DataLayout &DL = MemCpy->getModule()->getDataLayout();
+  // We can only transforms memcpy's where the dest of one is the source of the
+  // other, or they have a known offset.
+  if (MemCpy->getSource() != MemSet->getDest()) {
+    std::optional<int64_t> Offset =
+        MemCpy->getSource()->getPointerOffsetFrom(MemSet->getDest(), DL);
+    if (!Offset)
       return false;
+    MOffset = *Offset;
+  }
 
-    // A known memcpy size is also required.
+  MaybeAlign MDestAlign = MemCpy->getDestAlign();
+  int64_t MOffsetAligned = MDestAlign.valueOrOne().value() > 1 && MOffset < 0 ? -(-MOffset & ~(MDestAlign.valueOrOne().value() - 1)) : MOffset; // Compute the MOffset that keeps MDest aligned (truncate towards zero)
+  if (MOffset != 0 || MemSetSize != CopySize) {
+    // Make sure the memcpy doesn't read any more than what the memset wrote, other than undef.
+    auto *CMemSetSize = dyn_cast<ConstantInt>(MemSetSize);
     auto *CCopySize = dyn_cast<ConstantInt>(CopySize);
-    if (!CCopySize)
-      return false;
-    if (CCopySize->getZExtValue() > CMemSetSize->getZExtValue()) {
-      // If the memcpy is larger than the memset, but the memory was undef prior
-      // to the memset, we can just ignore the tail. Technically we're only
-      // interested in the bytes from MemSetSize..CopySize here, but as we can't
-      // easily represent this location, we use the full 0..CopySize range.
-      MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
-      bool CanReduceSize = false;
-      MemoryUseOrDef *MemSetAccess = MSSA->getMemoryAccess(MemSet);
-      MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
-          MemSetAccess->getDefiningAccess(), MemCpyLoc, BAA);
-      if (auto *MD = dyn_cast<MemoryDef>(Clobber))
-        if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
-          CanReduceSize = true;
-
-      if (!CanReduceSize)
+    // Don't worry about sizes larger than i64.
+    if (!CMemSetSize || !CCopySize || MOffset < 0 ||
+        CCopySize->getZExtValue() + MOffset > CMemSetSize->getZExtValue()) {
+      if (!coversInputFully(MSSA, MemCpy, MemSet, BAA))
         return false;
-      CopySize = MemSetSize;
+
+      if (CMemSetSize && CCopySize) {
+        // If both have constant sizes and offsets, clip the memcpy to the bounds of the memset if applicable.
+        if (CCopySize->getZExtValue() + std::abs(MOffset) > CMemSetSize->getZExtValue()) {
+          if (MOffsetAligned == 0 || (MOffset < 0 && CCopySize->getZExtValue() + MOffset > CMemSetSize->getZExtValue()))
+            CopySize = MemSetSize;
+          else
+            CopySize = ConstantInt::get(CopySize->getType(), std::max((int64_t)0, (int64_t)(CMemSetSize->getZExtValue() - std::abs(MOffsetAligned))));
+        }
+        else if (MOffsetAligned < 0) {
+          // Even if CMemSetSize isn't known, if the MOffsetAligned is negative, make sure to clip the new memset
+          CopySize = ConstantInt::get(CopySize->getType(), CCopySize->getZExtValue() + MOffsetAligned);
+        }
+      }
+      else if (CCopySize && MOffsetAligned < 0) {
+        // Even if CMemSetSize isn't known, if the MOffsetAligned is negative, can still clip the new memset
+        CopySize = ConstantInt::get(CopySize->getType(), CCopySize->getZExtValue() + MOffsetAligned);
+      }
+      else {
+        MOffsetAligned = 0;
+      }
     }
   }
 
   IRBuilder<> Builder(MemCpy);
+  Value *MDest = MemCpy->getRawDest();
+  if (MOffsetAligned < 0)
+    MDest = Builder.CreateInBoundsPtrAdd(MDest, Builder.getInt64(-MOffsetAligned));
   Instruction *NewM =
-      Builder.CreateMemSet(MemCpy->getRawDest(), MemSet->getOperand(1),
-                           CopySize, MemCpy->getDestAlign());
+      Builder.CreateMemSet(MDest, MemSet->getOperand(1),
+                           CopySize, MDestAlign);
   auto *LastDef = cast<MemoryDef>(MSSA->getMemoryAccess(MemCpy));
   auto *NewAccess = MSSAU->createMemoryAccessAfter(NewM, nullptr, LastDef);
   MSSAU->insertDef(cast<MemoryDef>(NewAccess), /*RenameUses=*/true);
@@ -1683,7 +1714,7 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
     I->setMetadata(LLVMContext::MD_tbaa_struct, nullptr);
   }
 
-  LLVM_DEBUG(dbgs() << "Stack Move: Performed staack-move optimization\n");
+  LLVM_DEBUG(dbgs() << "Stack Move: Performed stack-move optimization\n");
   NumStackMove++;
   return true;
 }
diff --git a/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll b/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
index 0626f09702f7e..3ad49a63b357a 100644
--- a/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
+++ b/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
@@ -14,8 +14,12 @@ define void @test() {
 ; CHECK-LABEL: define void @test() {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[AGG_TMP_SROA_14:%.*]] = alloca [20 x i8], align 4
-; CHECK-NEXT:    [[AGG_TMP_SROA_14_128_SROA_IDX:%.*]] = getelementptr i8, ptr [[AGG_TMP_SROA_14]], i64 4
+; CHECK-NEXT:    [[AGG_TMP_SROA_15:%.*]] = alloca [20 x i8], align 4
+; CHECK-NEXT:    [[AGG_TMP_SROA_14_128_SROA_IDX:%.*]] = getelementptr i8, ptr [[AGG_TMP_SROA_15]], i64 4
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[AGG_TMP_SROA_14_128_SROA_IDX]], i8 0, i64 1, i1 false)
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 20, ptr [[AGG_TMP_SROA_14]])
+; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i8, ptr [[AGG_TMP_SROA_14]], i64 4
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[TMP0]], i8 0, i64 1, i1 false)
 ; CHECK-NEXT:    [[AGG_TMP3_SROA_35_128_SROA_IDX:%.*]] = getelementptr i8, ptr [[AGG_TMP_SROA_14]], i64 4
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr inttoptr (i64 4 to ptr), i8 0, i64 1, i1 false)
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr null, i8 0, i64 1, i1 false)
diff --git a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
index 1c3896407e950..424cdde5fa780 100644
--- a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
+++ b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
@@ -187,6 +187,54 @@ define void @test_write_before_memset_in_both_regions(ptr %result) {
   ret void
 }
 
+define void @test_offset_memset(ptr %result) {
+; CHECK-LABEL: @test_offset_memset(
+; CHECK-NEXT:    [[A1:%.*]] = alloca [4 x i32], align 8
+; CHECK-NEXT:    [[A:%.*]] = getelementptr i32, ptr [[A1]], i32 1
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[A]], i8 0, i64 12, i1 false)
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[RESULT:%.*]], i64 4
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[TMP1]], i8 0, i64 8, i1 false)
+; CHECK-NEXT:    ret void
+;
+  %a = alloca [ 4 x i32 ], align 8
+  %b = getelementptr i32, ptr %a, i32 1
+  call void @llvm.memset.p0.i64(ptr align 8 %b, i8 0, i64 12, i1 false)
+  call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %a, i64 12, i1 false)
+  ret void
+}
+
+define void @test_offset_memsetcpy(ptr %result) {
+; CHECK-LABEL: @test_offset_memsetcpy(
+; CHECK-NEXT:    [[A1:%.*]] = alloca [4 x i32], align 8
+; CHECK-NEXT:    [[A:%.*]] = getelementptr i32, ptr [[A1]], i32 1
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[A1]], i8 0, i64 12, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[RESULT:%.*]], i8 0, i64 8, i1 false)
+; CHECK-NEXT:    ret void
+;
+  %a = alloca [ 4 x i32 ], align 8
+  %b = getelementptr i32, ptr %a, i32 1
+  call void @llvm.memset.p0.i64(ptr align 8 %a, i8 0, i64 12, i1 false)
+  call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %b, i64 12, i1 false)
+  ret void
+}
+
+define void @test_two_memset(ptr %result) {
+; CHECK-LABEL: @test_two_memset(
+; CHECK-NEXT:    [[A:%.*]] = alloca [4 x i32], align 8
+; CHECK-NEXT:    [[B:%.*]] = getelementptr i32, ptr [[A]], i32 3
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[A]], i8 0, i64 12, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[B]], i8 1, i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[RESULT:%.*]], ptr align 8 [[A]], i64 16, i1 false)
+; CHECK-NEXT:    ret void
+;
+  %a = alloca [ 4 x i32 ], align 8
+  %b = getelementptr i32, ptr %a, i32 3
+  call void @llvm.memset.p0.i64(ptr align 8 %a, i8 0, i64 12, i1 false)
+  call void @llvm.memset.p0.i64(ptr align 8 %b, i8 1, i64 4, i1 false)
+  call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %a, i64 16, i1 false)
+  ret void
+}
+
 declare ptr @malloc(i64)
 declare void @free(ptr)
 
diff --git a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
index 47474e8dac051..dc55d0524ddfb 100644
--- a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
+++ b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
@@ -73,7 +73,7 @@ define void @test_different_source_gep(ptr %dst1, ptr %dst2, i8 %c) {
 ; CHECK-LABEL: @test_different_source_gep(
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[DST1:%.*]], i8 [[C:%.*]], i64 128, i1 false)
 ; CHECK-NEXT:    [[P:%.*]] = getelementptr i8, ptr [[DST1]], i64 64
-; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[DST2:%.*]], ptr [[P]], i64 64, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[DST2:%.*]], i8 [[C]], i64 64, i1 false)
 ; CHECK-NEXT:    ret void
 ;
   call void @llvm.memset.p0.i64(ptr %dst1, i8 %c, i64 128, i1 false)
diff --git a/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll b/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
index 5e13432746bf7..0e312bc42d463 100644
--- a/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
+++ b/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
@@ -19,7 +19,7 @@ define i32 @foo(i1 %z) {
 ; CHECK:       for.body3.lr.ph:
 ; CHECK-NEXT:    br label [[FOR_INC7_1]]
 ; CHECK:       for.inc7.1:
-; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[A]], ptr align 4 [[SCEVGEP]], i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 4 [[A]], i8 0, i64 4, i1 false)
 ; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[A]], align 4
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
diff --git a/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll b/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll
index a834d2465dfa5..d739d53d8a62c 100644
--- a/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll
+++ b/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll
@@ -18,13 +18,13 @@ define void @test(ptr %src, i8 %c, i64 %size) {
   ret void
 }
 
-; Differing sizes, so left as it is.
+; Differing sizes, but would be UB if size1 > size2
 define void @negative_test(ptr %src, i8 %c, i64 %size1, i64 %size2) {
 ; CHECK-LABEL: @negative_test(
 ; CHECK-NEXT:    [[DST1:%.*]] = alloca i8, i64 [[SIZE1:%.*]], align 1
 ; CHECK-NEXT:    [[DST2:%.*]] = alloca i8, i64 [[SIZE2:%.*]], align 1
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[DST1]], i8 [[C:%.*]], i64 [[SIZE1]], i1 false)
-; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 8 [[DST2]], ptr align 8 [[DST1]], i64 [[SIZE2]], i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[DST2]], i8 [[C]], i64 [[SIZE2]], i1 false)
 ; CHECK-NEXT:    ret void
 ;
   %dst1 = alloca i8, i64 %size1

github-actions · 2025-05-21T19:18:10Z

✅ With the latest revision this PR passed the C/C++ code formatter.

vtjnash · 2025-05-28T18:41:37Z

Review request bump? I have some other improvements I'd like to propose (main...vtjnash:llvm-project:jn/memcpyopt-destonly), so this PR was a bit of a practice exercise for me to learn the optimization pass

dianqk · 2025-05-28T21:19:25Z

I'll look into it this weekend. :3

dianqk

Possible miscompile: https://llvm-compile-time-tracker.com/compare.php?from=2f66e5fcbad211e0d71b1a533071d31b36f088ec&to=0d828e6dcb86d886f1e5ee57ec7e71b945bfd36c&stat=instructions:u

Handle the there cases of Offset is difficult to read. How about splitting into two parts (with two PRs)?

If it's an undefined value, trim the start and end
Handle the case where Offset >= 0 here

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll

llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll

vtjnash · 2025-06-02T21:12:32Z

Okay, I've trimmed down the PR to just add the code for handling Offset > 0 now

vtjnash · 2025-06-02T21:32:20Z

Second part (sizes aren't constant) is stacked on top of this PR, for once this is merged:
https://github.com/vtjnash/llvm-project/compare/jn/memcpyopt-memset-offset...vtjnash:llvm-project:jn/memcpyopt-memset-nonconst-offset?expand=1

Third part (MOffset < 0) seems potentially too annoying to be worth doing, unless you want it. But maybe it is worth just doing the memcpy->memset replacement, without also trying to computed the optimized the start and end points?

dianqk

LGTM. Please check the new comments before merging.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

dianqk · 2025-06-09T14:13:51Z

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

    auto *CCopySize = dyn_cast<ConstantInt>(CopySize);
    if (!CCopySize)
      return false;
-    if (CCopySize->getZExtValue() > CMemSetSize->getZExtValue()) {


Should we also reduce the diff here? How about just modifying the comments?

I think the helper function makes this function more readable, and the helper function should be more generally useful, since it implements a check that other functions in this file should also implement. In particular, I'm wanting to use this in processMemCpyMemCpyDependence next, since the purpose of this PR was to make those two functions start to converge in functionality and implementation. But I can do that refactoring as a separate PR if you prefer.

In my opinion, making a new function is the more general approach when sharing code or when a piece of code grows significantly in size.
Also, in my mind, the better approach is:

--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp +++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp @@ -1790,6 +1790,9 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) { } } } + if (truncateUndefs(M, MI)) { + return true; + } if (auto *MDep = dyn_cast<MemCpyInst>(MI)) if (processMemCpyMemCpyDependence(M, MDep, BAA)) return true;

I believe this helper function should also be used in performStackMoveOptzn, I just didn't want to add additional improvements that re-use the logic to the current PR. I have a list of other TODOs to improve in this file (such as 827019c#diff-afb2b04eff951cb67d214bc6dfa62a087a72591451817dba7929b48c5fe635f9R1530) but they're blocked on getting this PR to progress

vtjnash · 2025-06-09T15:25:40Z

before merging

I don't have merge rights anymore (I wasn't active enough when they were reviewing the list of people with access, though I can ask for it to be reinstated if you'd prefer that I merge)

dianqk

I don't have merge rights anymore (I wasn't active enough when they were reviewing the list of people with access, though I can ask for it to be reinstated if you'd prefer that I merge)

Feel free to request :)
And I can merge it after your response.

dianqk · 2025-06-10T04:28:32Z

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

    auto *CCopySize = dyn_cast<ConstantInt>(CopySize);
    if (!CCopySize)
      return false;
-    if (CCopySize->getZExtValue() > CMemSetSize->getZExtValue()) {


In my opinion, making a new function is the more general approach when sharing code or when a piece of code grows significantly in size.
Also, in my mind, the better approach is:

--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp +++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp @@ -1790,6 +1790,9 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) { } } } + if (truncateUndefs(M, MI)) { + return true; + } if (auto *MDep = dyn_cast<MemCpyInst>(MI)) if (processMemCpyMemCpyDependence(M, MDep, BAA)) return true;

nikic

Thanks, this looks like a nice extension.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

nikic · 2025-06-10T15:02:35Z

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

@@ -1403,6 +1404,24 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
  return false;
 }

+static bool coversInputFully(MemorySSA *MSSA, MemCpyInst *MemCpy,
+                             MemIntrinsic *MemSrc, BatchAAResults &BAA) {


The name of this helper seems quite misleading. The corresponding boolean was previously called CanReduceSize, which is what this is actually checking (can we reduce CopySize to the size written by MemSrc).

CanReduceSize was badly named, since it is not what it was for checking (which is whether the MemCpy operation is fully covered (def or undef) by the MemSrc operation). It was what it was incidentally implemented to do with the result, but I have a new commit ready to remove that unnecessary restriction on the implementation and only leave the check

I don't really get what followup change you have in mind, but if you think canReduceSize is inaccurate, then maybe something along the lines of overreadIsUndef?

I don't think the new name inputFullyCoveredBySrc is really better than the old, because this function is specifically detecting the case where the input is not fully covered by src (it is partially covered by src and partially by undef initialization)...

The CanReduceSize bool is actually a property of having both sizes be constants (the other conditional here). If they aren't constants, then you cannot reduce the size, though this query and this optimization are still okay

61d16aa

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll

This aims to reduce the divergence between the initial checks in this function and processMemCpyMemCpyDependence (in particular, adding handling of offsets), with the goal to eventually reduce duplication there and improve this pass in other ways.

Allows forwarding memset to memcpy for mismatching unknown sizes if overread has undef contents. In that case we can refine the undef bytes to the memset value. Refs #140954 which laid some of the groundwork for this.

This aims to reduce the divergence between the initial checks in this function and processMemCpyMemCpyDependence (in particular, adding handling of offsets), with the goal to eventually reduce duplication there and improve this pass in other ways.

…3727) Allows forwarding memset to memcpy for mismatching unknown sizes if overread has undef contents. In that case we can refine the undef bytes to the memset value. Refs llvm#140954 which laid some of the groundwork for this.

…Dependence (#143745) Allows memcpy to memcpy forwarding in cases where the second memcpy is larger, but the overread is known to be undef, by shrinking the memcpy size. Refs #140954 which laid some of the groundwork for this.

…emCpyMemCpyDependence (#143745) Allows memcpy to memcpy forwarding in cases where the second memcpy is larger, but the overread is known to be undef, by shrinking the memcpy size. Refs llvm/llvm-project#140954 which laid some of the groundwork for this.

[memcpyopt] handle memcpy from memset in more cases

0d828e6

This aims to reduce the divergence between this function and processMemCpyMemCpyDependence, with the goal to eventually reduce duplication here and combine them and improve this pass.

vtjnash requested a review from nikic as a code owner May 21, 2025 19:15

llvmbot added the llvm:transforms label May 21, 2025

vtjnash added the julialang label May 21, 2025

dtcxzyw requested a review from dianqk May 22, 2025 03:01

dianqk reviewed Jun 2, 2025

View reviewed changes

vtjnash force-pushed the jn/memcpyopt-memset-offset branch from 16bfef3 to bd2e887 Compare June 2, 2025 21:23

vtjnash force-pushed the jn/memcpyopt-memset-offset branch from bd2e887 to bb5fdef Compare June 2, 2025 21:33

remove handling of non-constant sizes and offsets < 0

9b678ea

vtjnash force-pushed the jn/memcpyopt-memset-offset branch from bb5fdef to 9b678ea Compare June 2, 2025 21:33

dianqk mentioned this pull request Jun 7, 2025

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

dtcxzyw mentioned this pull request Jun 7, 2025

pre-commit: PR140954 dtcxzyw/llvm-opt-benchmark#2408

Open

dianqk approved these changes Jun 9, 2025

View reviewed changes

remove NFC

ec254a3

dianqk reviewed Jun 10, 2025

View reviewed changes

vtjnash mentioned this pull request Jun 10, 2025

Request Commit Access For vtjnash again #143534

Closed

nikic reviewed Jun 10, 2025

View reviewed changes

vtjnash added 2 commits June 10, 2025 15:53

cleanup

a32941a

rename again

0759e08

nikic changed the title ~~[memcpyopt] handle memcpy from memset in more cases~~ [MemCpyOpt] handle memcpy from memset in more cases Jun 11, 2025

nikic merged commit 7460c70 into llvm:main Jun 11, 2025
7 checks passed

vtjnash mentioned this pull request Jun 11, 2025

[MemCpyOpt] handle memcpy from memset for non-constant sizes #143727

Merged

vtjnash mentioned this pull request Jun 11, 2025

[MemCpyOpt] allow some undef contents overread in processMemCpyMemCpyDependence #143745

Merged

arjunUpatel mentioned this pull request Jun 24, 2025

[lldb] Revive TestSimulatorPlatform.py (#142244) arjunUpatel/llvm-project#1

Closed

[MemCpyOpt] handle memcpy from memset in more cases #140954

[MemCpyOpt] handle memcpy from memset in more cases #140954

Uh oh!

Conversation

vtjnash commented May 21, 2025 • edited by nikic Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented May 21, 2025

Uh oh!

github-actions bot commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vtjnash commented May 28, 2025

Uh oh!

dianqk commented May 28, 2025

Uh oh!

dianqk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vtjnash commented Jun 2, 2025

Uh oh!

vtjnash commented Jun 2, 2025

Uh oh!

dianqk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dianqk Jun 9, 2025

Choose a reason for hiding this comment

Uh oh!

vtjnash Jun 9, 2025

Choose a reason for hiding this comment

Uh oh!

dianqk Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vtjnash Jun 10, 2025

Choose a reason for hiding this comment

Uh oh!

vtjnash commented Jun 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dianqk left a comment

Choose a reason for hiding this comment

Uh oh!

dianqk Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

nikic Jun 10, 2025

Choose a reason for hiding this comment

Uh oh!

vtjnash Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikic Jun 10, 2025

Choose a reason for hiding this comment

Uh oh!

vtjnash Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vtjnash commented May 21, 2025 •

edited by nikic

Loading

github-actions bot commented May 21, 2025 •

edited

Loading

dianqk Jun 10, 2025 •

edited

Loading

vtjnash commented Jun 9, 2025 •

edited

Loading

dianqk Jun 10, 2025 •

edited

Loading

vtjnash Jun 10, 2025 •

edited

Loading

vtjnash Jun 10, 2025 •

edited

Loading