Skip to content

[MemCpyOpt] handle memcpy from memset in more cases #140954

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 11, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 47 additions & 27 deletions llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1367,8 +1367,9 @@ bool MemCpyOptPass::processMemSetMemCpyDependence(MemCpyInst *MemCpy,
return true;
}

/// Determine whether the instruction has undefined content for the given Size,
/// either because it was freshly alloca'd or started its lifetime.
/// Determine whether the pointer V had only undefined content (due to Def) up
/// to the given Size, either because it was freshly alloca'd or started its
/// lifetime.
static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
MemoryDef *Def, Value *Size) {
if (MSSA->isLiveOnEntryDef(Def))
Expand Down Expand Up @@ -1403,6 +1404,24 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
return false;
}

// If the memcpy is larger than the previous, but the memory was undef prior to
// that, we can just ignore the tail. Technically we're only interested in the
// bytes from 0..MemSrcOffset and MemSrcLength+MemSrcOffset..CopySize here, but
// as we can't easily represent this location (hasUndefContents uses mustAlias
// which cannot deal with offsets), we use the full 0..CopySize range.
static bool overreadUndefContents(MemorySSA *MSSA, MemCpyInst *MemCpy,
MemIntrinsic *MemSrc, BatchAAResults &BAA) {
Value *CopySize = MemCpy->getLength();
MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
MemoryUseOrDef *MemSrcAccess = MSSA->getMemoryAccess(MemSrc);
MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
MemSrcAccess->getDefiningAccess(), MemCpyLoc, BAA);
if (auto *MD = dyn_cast<MemoryDef>(Clobber))
if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
return true;
return false;
}

/// Transform memcpy to memset when its source was just memset.
/// In other words, turn:
/// \code
Expand All @@ -1418,19 +1437,25 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
bool MemCpyOptPass::performMemCpyToMemSetOptzn(MemCpyInst *MemCpy,
MemSetInst *MemSet,
BatchAAResults &BAA) {
// Make sure that memcpy(..., memset(...), ...), that is we are memsetting and
// memcpying from the same address. Otherwise it is hard to reason about.
if (!BAA.isMustAlias(MemSet->getRawDest(), MemCpy->getRawSource()))
return false;

Value *MemSetSize = MemSet->getLength();
Value *CopySize = MemCpy->getLength();

if (MemSetSize != CopySize) {
// Make sure the memcpy doesn't read any more than what the memset wrote.
// Don't worry about sizes larger than i64.
int64_t MOffset = 0;
const DataLayout &DL = MemCpy->getModule()->getDataLayout();
// We can only transforms memcpy's where the dest of one is the source of the
// other, or the memory transfer has a known offset from the memset.
if (MemCpy->getSource() != MemSet->getDest()) {
std::optional<int64_t> Offset =
MemCpy->getSource()->getPointerOffsetFrom(MemSet->getDest(), DL);
if (!Offset || *Offset < 0)
return false;
MOffset = *Offset;
}

// A known memset size is required.
if (MOffset != 0 || MemSetSize != CopySize) {
// Make sure the memcpy doesn't read any more than what the memset wrote,
// other than undef. Don't worry about sizes larger than i64. A known memset
// size is required.
auto *CMemSetSize = dyn_cast<ConstantInt>(MemSetSize);
if (!CMemSetSize)
return false;
Expand All @@ -1439,23 +1464,18 @@ bool MemCpyOptPass::performMemCpyToMemSetOptzn(MemCpyInst *MemCpy,
auto *CCopySize = dyn_cast<ConstantInt>(CopySize);
if (!CCopySize)
return false;
if (CCopySize->getZExtValue() > CMemSetSize->getZExtValue()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also reduce the diff here? How about just modifying the comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the helper function makes this function more readable, and the helper function should be more generally useful, since it implements a check that other functions in this file should also implement. In particular, I'm wanting to use this in processMemCpyMemCpyDependence next, since the purpose of this PR was to make those two functions start to converge in functionality and implementation. But I can do that refactoring as a separate PR if you prefer.

Copy link
Member

@dianqk dianqk Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, making a new function is the more general approach when sharing code or when a piece of code grows significantly in size.
Also, in my mind, the better approach is:

--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1790,6 +1790,9 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
           }
         }
       }
+      if (truncateUndefs(M, MI)) {
+        return true;
+      }
       if (auto *MDep = dyn_cast<MemCpyInst>(MI))
         if (processMemCpyMemCpyDependence(M, MDep, BAA))
           return true;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this helper function should also be used in performStackMoveOptzn, I just didn't want to add additional improvements that re-use the logic to the current PR. I have a list of other TODOs to improve in this file (such as 827019c#diff-afb2b04eff951cb67d214bc6dfa62a087a72591451817dba7929b48c5fe635f9R1530) but they're blocked on getting this PR to progress

// If the memcpy is larger than the memset, but the memory was undef prior
// to the memset, we can just ignore the tail. Technically we're only
// interested in the bytes from MemSetSize..CopySize here, but as we can't
// easily represent this location, we use the full 0..CopySize range.
MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
bool CanReduceSize = false;
MemoryUseOrDef *MemSetAccess = MSSA->getMemoryAccess(MemSet);
MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
MemSetAccess->getDefiningAccess(), MemCpyLoc, BAA);
if (auto *MD = dyn_cast<MemoryDef>(Clobber))
if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
CanReduceSize = true;

if (!CanReduceSize)
if (CCopySize->getZExtValue() + MOffset > CMemSetSize->getZExtValue()) {
if (!overreadUndefContents(MSSA, MemCpy, MemSet, BAA))
return false;
CopySize = MemSetSize;
// Clip the memcpy to the bounds of the memset
if (MOffset == 0)
CopySize = MemSetSize;
else
CopySize =
ConstantInt::get(CopySize->getType(),
CMemSetSize->getZExtValue() <= (uint64_t)MOffset
? 0
: CMemSetSize->getZExtValue() - MOffset);
}
}

Expand Down
47 changes: 47 additions & 0 deletions llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,53 @@ define void @test_write_before_memset_in_both_regions(ptr %result) {
ret void
}

define void @test_negative_offset_memset(ptr %result) {
; CHECK-LABEL: @test_negative_offset_memset(
; CHECK-NEXT: [[A1:%.*]] = alloca [16 x i8], align 8
; CHECK-NEXT: [[A:%.*]] = getelementptr i8, ptr [[A1]], i32 4
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[A]], i8 0, i64 12, i1 false)
; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[RESULT:%.*]], ptr align 8 [[A1]], i64 12, i1 false)
; CHECK-NEXT: ret void
;
%a = alloca [ 16 x i8 ], align 8
%b = getelementptr i8, ptr %a, i32 4
call void @llvm.memset.p0.i64(ptr align 8 %b, i8 0, i64 12, i1 false)
call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %a, i64 12, i1 false)
ret void
}

define void @test_offset_memsetcpy(ptr %result) {
; CHECK-LABEL: @test_offset_memsetcpy(
; CHECK-NEXT: [[A1:%.*]] = alloca [16 x i8], align 8
; CHECK-NEXT: [[A:%.*]] = getelementptr i8, ptr [[A1]], i32 4
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[A1]], i8 0, i64 12, i1 false)
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr [[RESULT:%.*]], i8 0, i64 8, i1 false)
; CHECK-NEXT: ret void
;
%a = alloca [ 16 x i8 ], align 8
%b = getelementptr i8, ptr %a, i32 4
call void @llvm.memset.p0.i64(ptr align 8 %a, i8 0, i64 12, i1 false)
call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %b, i64 12, i1 false)
ret void
}

define void @test_two_memset(ptr %result) {
; CHECK-LABEL: @test_two_memset(
; CHECK-NEXT: [[A:%.*]] = alloca [16 x i8], align 8
; CHECK-NEXT: [[B:%.*]] = getelementptr i8, ptr [[A]], i32 12
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[A]], i8 0, i64 12, i1 false)
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[B]], i8 1, i64 4, i1 false)
; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[RESULT:%.*]], ptr align 8 [[A]], i64 16, i1 false)
; CHECK-NEXT: ret void
;
%a = alloca [ 16 x i8 ], align 8
%b = getelementptr i8, ptr %a, i32 12
call void @llvm.memset.p0.i64(ptr align 8 %a, i8 0, i64 12, i1 false)
call void @llvm.memset.p0.i64(ptr align 8 %b, i8 1, i64 4, i1 false)
call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %a, i64 16, i1 false)
ret void
}

declare ptr @malloc(i64)
declare void @free(ptr)

Expand Down
3 changes: 1 addition & 2 deletions llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
Original file line number Diff line number Diff line change
Expand Up @@ -73,11 +73,10 @@ define void @test_different_source_gep(ptr %dst1, ptr %dst2, i8 %c) {
; CHECK-LABEL: @test_different_source_gep(
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr [[DST1:%.*]], i8 [[C:%.*]], i64 128, i1 false)
; CHECK-NEXT: [[P:%.*]] = getelementptr i8, ptr [[DST1]], i64 64
; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[DST2:%.*]], ptr [[P]], i64 64, i1 false)
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr [[DST2:%.*]], i8 [[C]], i64 64, i1 false)
; CHECK-NEXT: ret void
;
call void @llvm.memset.p0.i64(ptr %dst1, i8 %c, i64 128, i1 false)
; FIXME: We could optimize this as well.
%p = getelementptr i8, ptr %dst1, i64 64
call void @llvm.memcpy.p0.p0.i64(ptr %dst2, ptr %p, i64 64, i1 false)
ret void
Expand Down
2 changes: 1 addition & 1 deletion llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ define i32 @foo(i1 %z) {
; CHECK: for.body3.lr.ph:
; CHECK-NEXT: br label [[FOR_INC7_1]]
; CHECK: for.inc7.1:
; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[A]], ptr align 4 [[SCEVGEP]], i64 4, i1 false)
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 4 [[A]], i8 0, i64 4, i1 false)
; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[A]], align 4
; CHECK-NEXT: ret i32 [[TMP2]]
;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ define void @test(ptr %src, i8 %c, i64 %size) {
ret void
}

; Differing sizes, so left as it is.
; Differing sizes, but would be UB if size1 < size2 since the memcpy would reference outside of the first alloca
define void @negative_test(ptr %src, i8 %c, i64 %size1, i64 %size2) {
; CHECK-LABEL: @negative_test(
; CHECK-NEXT: [[DST1:%.*]] = alloca i8, i64 [[SIZE1:%.*]], align 1
Expand Down
Loading