[MemCpyOpt] handle memcpy from memset in more cases #140954
Conversation
This aims to reduce the divergence between this function and processMemCpyMemCpyDependence, with the goal of eventually reducing duplication, combining the two, and improving this pass.
@llvm/pr-subscribers-llvm-transforms

Author: Jameson Nash (vtjnash)

Changes: This aims to reduce the divergence between the initial checks in this function and processMemCpyMemCpyDependence (in particular, adding handling of offsets and not requiring known sizes), with the goal to eventually reduce duplication there and improve this pass in other ways.

Full diff: https://github.com/llvm/llvm-project/pull/140954.diff

6 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 9c8cd53d56b56..95d956f18aa69 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1367,8 +1367,9 @@ bool MemCpyOptPass::processMemSetMemCpyDependence(MemCpyInst *MemCpy,
return true;
}
-/// Determine whether the instruction has undefined content for the given Size,
-/// either because it was freshly alloca'd or started its lifetime.
+/// Determine whether the pointer V had only undefined content from Def up to
+/// the given Size, either because it was freshly alloca'd or started its
+/// lifetime.
static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
MemoryDef *Def, Value *Size) {
if (MSSA->isLiveOnEntryDef(Def))
@@ -1403,6 +1404,24 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
return false;
}
+static bool coversInputFully(MemorySSA *MSSA, MemCpyInst *MemCpy,
+ MemIntrinsic *MemSrc, BatchAAResults &BAA) {
+ // If the memcpy is larger than the previous, but the memory was undef prior
+ // to that, we can just ignore the tail. Technically we're only
+ // interested in the bytes from 0..MemSrcOffset and
+ // MemSrcLength+MemSrcOffset..CopySize here, but as we can't easily represent
+ // this location, we use the full 0..CopySize range.
+ Value *CopySize = MemCpy->getLength();
+ MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
+ MemoryUseOrDef *MemSrcAccess = MSSA->getMemoryAccess(MemSrc);
+ MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
+ MemSrcAccess->getDefiningAccess(), MemCpyLoc, BAA);
+ if (auto *MD = dyn_cast<MemoryDef>(Clobber))
+ if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
+ return true;
+ return false;
+}
+
/// Transform memcpy to memset when its source was just memset.
/// In other words, turn:
/// \code
@@ -1418,51 +1437,63 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
bool MemCpyOptPass::performMemCpyToMemSetOptzn(MemCpyInst *MemCpy,
MemSetInst *MemSet,
BatchAAResults &BAA) {
- // Make sure that memcpy(..., memset(...), ...), that is we are memsetting and
- // memcpying from the same address. Otherwise it is hard to reason about.
- if (!BAA.isMustAlias(MemSet->getRawDest(), MemCpy->getRawSource()))
- return false;
-
Value *MemSetSize = MemSet->getLength();
Value *CopySize = MemCpy->getLength();
- if (MemSetSize != CopySize) {
- // Make sure the memcpy doesn't read any more than what the memset wrote.
- // Don't worry about sizes larger than i64.
-
- // A known memset size is required.
- auto *CMemSetSize = dyn_cast<ConstantInt>(MemSetSize);
- if (!CMemSetSize)
+ int64_t MOffset = 0;
+ const DataLayout &DL = MemCpy->getModule()->getDataLayout();
+ // We can only transforms memcpy's where the dest of one is the source of the
+ // other, or they have a known offset.
+ if (MemCpy->getSource() != MemSet->getDest()) {
+ std::optional<int64_t> Offset =
+ MemCpy->getSource()->getPointerOffsetFrom(MemSet->getDest(), DL);
+ if (!Offset)
return false;
+ MOffset = *Offset;
+ }
- // A known memcpy size is also required.
+ MaybeAlign MDestAlign = MemCpy->getDestAlign();
+  // Compute the MOffset that keeps MDest aligned (truncate towards zero).
+  int64_t MOffsetAligned =
+      MDestAlign.valueOrOne().value() > 1 && MOffset < 0
+          ? -(-MOffset & ~(MDestAlign.valueOrOne().value() - 1))
+          : MOffset;
+ if (MOffset != 0 || MemSetSize != CopySize) {
+ // Make sure the memcpy doesn't read any more than what the memset wrote, other than undef.
+ auto *CMemSetSize = dyn_cast<ConstantInt>(MemSetSize);
auto *CCopySize = dyn_cast<ConstantInt>(CopySize);
- if (!CCopySize)
- return false;
- if (CCopySize->getZExtValue() > CMemSetSize->getZExtValue()) {
- // If the memcpy is larger than the memset, but the memory was undef prior
- // to the memset, we can just ignore the tail. Technically we're only
- // interested in the bytes from MemSetSize..CopySize here, but as we can't
- // easily represent this location, we use the full 0..CopySize range.
- MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
- bool CanReduceSize = false;
- MemoryUseOrDef *MemSetAccess = MSSA->getMemoryAccess(MemSet);
- MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
- MemSetAccess->getDefiningAccess(), MemCpyLoc, BAA);
- if (auto *MD = dyn_cast<MemoryDef>(Clobber))
- if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
- CanReduceSize = true;
-
- if (!CanReduceSize)
+ // Don't worry about sizes larger than i64.
+ if (!CMemSetSize || !CCopySize || MOffset < 0 ||
+ CCopySize->getZExtValue() + MOffset > CMemSetSize->getZExtValue()) {
+ if (!coversInputFully(MSSA, MemCpy, MemSet, BAA))
return false;
- CopySize = MemSetSize;
+
+ if (CMemSetSize && CCopySize) {
+ // If both have constant sizes and offsets, clip the memcpy to the bounds of the memset if applicable.
+ if (CCopySize->getZExtValue() + std::abs(MOffset) > CMemSetSize->getZExtValue()) {
+ if (MOffsetAligned == 0 || (MOffset < 0 && CCopySize->getZExtValue() + MOffset > CMemSetSize->getZExtValue()))
+ CopySize = MemSetSize;
+ else
+ CopySize = ConstantInt::get(CopySize->getType(), std::max((int64_t)0, (int64_t)(CMemSetSize->getZExtValue() - std::abs(MOffsetAligned))));
+ }
+ else if (MOffsetAligned < 0) {
+ // Even if CMemSetSize isn't known, if the MOffsetAligned is negative, make sure to clip the new memset
+ CopySize = ConstantInt::get(CopySize->getType(), CCopySize->getZExtValue() + MOffsetAligned);
+ }
+ }
+ else if (CCopySize && MOffsetAligned < 0) {
+ // Even if CMemSetSize isn't known, if the MOffsetAligned is negative, can still clip the new memset
+ CopySize = ConstantInt::get(CopySize->getType(), CCopySize->getZExtValue() + MOffsetAligned);
+ }
+ else {
+ MOffsetAligned = 0;
+ }
}
}
IRBuilder<> Builder(MemCpy);
+ Value *MDest = MemCpy->getRawDest();
+ if (MOffsetAligned < 0)
+ MDest = Builder.CreateInBoundsPtrAdd(MDest, Builder.getInt64(-MOffsetAligned));
Instruction *NewM =
- Builder.CreateMemSet(MemCpy->getRawDest(), MemSet->getOperand(1),
- CopySize, MemCpy->getDestAlign());
+ Builder.CreateMemSet(MDest, MemSet->getOperand(1),
+ CopySize, MDestAlign);
auto *LastDef = cast<MemoryDef>(MSSA->getMemoryAccess(MemCpy));
auto *NewAccess = MSSAU->createMemoryAccessAfter(NewM, nullptr, LastDef);
MSSAU->insertDef(cast<MemoryDef>(NewAccess), /*RenameUses=*/true);
@@ -1683,7 +1714,7 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
I->setMetadata(LLVMContext::MD_tbaa_struct, nullptr);
}
- LLVM_DEBUG(dbgs() << "Stack Move: Performed staack-move optimization\n");
+ LLVM_DEBUG(dbgs() << "Stack Move: Performed stack-move optimization\n");
NumStackMove++;
return true;
}
diff --git a/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll b/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
index 0626f09702f7e..3ad49a63b357a 100644
--- a/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
+++ b/llvm/test/Transforms/MemCpyOpt/lifetime-missing.ll
@@ -14,8 +14,12 @@ define void @test() {
; CHECK-LABEL: define void @test() {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[AGG_TMP_SROA_14:%.*]] = alloca [20 x i8], align 4
-; CHECK-NEXT: [[AGG_TMP_SROA_14_128_SROA_IDX:%.*]] = getelementptr i8, ptr [[AGG_TMP_SROA_14]], i64 4
+; CHECK-NEXT: [[AGG_TMP_SROA_15:%.*]] = alloca [20 x i8], align 4
+; CHECK-NEXT: [[AGG_TMP_SROA_14_128_SROA_IDX:%.*]] = getelementptr i8, ptr [[AGG_TMP_SROA_15]], i64 4
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr [[AGG_TMP_SROA_14_128_SROA_IDX]], i8 0, i64 1, i1 false)
+; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 20, ptr [[AGG_TMP_SROA_14]])
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i8, ptr [[AGG_TMP_SROA_14]], i64 4
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr [[TMP0]], i8 0, i64 1, i1 false)
; CHECK-NEXT: [[AGG_TMP3_SROA_35_128_SROA_IDX:%.*]] = getelementptr i8, ptr [[AGG_TMP_SROA_14]], i64 4
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr inttoptr (i64 4 to ptr), i8 0, i64 1, i1 false)
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr null, i8 0, i64 1, i1 false)
diff --git a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
index 1c3896407e950..424cdde5fa780 100644
--- a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
+++ b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
@@ -187,6 +187,54 @@ define void @test_write_before_memset_in_both_regions(ptr %result) {
ret void
}
+define void @test_offset_memset(ptr %result) {
+; CHECK-LABEL: @test_offset_memset(
+; CHECK-NEXT: [[A1:%.*]] = alloca [4 x i32], align 8
+; CHECK-NEXT: [[A:%.*]] = getelementptr i32, ptr [[A1]], i32 1
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[A]], i8 0, i64 12, i1 false)
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[RESULT:%.*]], i64 4
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr [[TMP1]], i8 0, i64 8, i1 false)
+; CHECK-NEXT: ret void
+;
+ %a = alloca [ 4 x i32 ], align 8
+ %b = getelementptr i32, ptr %a, i32 1
+ call void @llvm.memset.p0.i64(ptr align 8 %b, i8 0, i64 12, i1 false)
+ call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %a, i64 12, i1 false)
+ ret void
+}
+
+define void @test_offset_memsetcpy(ptr %result) {
+; CHECK-LABEL: @test_offset_memsetcpy(
+; CHECK-NEXT: [[A1:%.*]] = alloca [4 x i32], align 8
+; CHECK-NEXT: [[A:%.*]] = getelementptr i32, ptr [[A1]], i32 1
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[A1]], i8 0, i64 12, i1 false)
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr [[RESULT:%.*]], i8 0, i64 8, i1 false)
+; CHECK-NEXT: ret void
+;
+ %a = alloca [ 4 x i32 ], align 8
+ %b = getelementptr i32, ptr %a, i32 1
+ call void @llvm.memset.p0.i64(ptr align 8 %a, i8 0, i64 12, i1 false)
+ call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %b, i64 12, i1 false)
+ ret void
+}
+
+define void @test_two_memset(ptr %result) {
+; CHECK-LABEL: @test_two_memset(
+; CHECK-NEXT: [[A:%.*]] = alloca [4 x i32], align 8
+; CHECK-NEXT: [[B:%.*]] = getelementptr i32, ptr [[A]], i32 3
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[A]], i8 0, i64 12, i1 false)
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[B]], i8 1, i64 4, i1 false)
+; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[RESULT:%.*]], ptr align 8 [[A]], i64 16, i1 false)
+; CHECK-NEXT: ret void
+;
+ %a = alloca [ 4 x i32 ], align 8
+ %b = getelementptr i32, ptr %a, i32 3
+ call void @llvm.memset.p0.i64(ptr align 8 %a, i8 0, i64 12, i1 false)
+ call void @llvm.memset.p0.i64(ptr align 8 %b, i8 1, i64 4, i1 false)
+ call void @llvm.memcpy.p0.p0.i64(ptr %result, ptr align 8 %a, i64 16, i1 false)
+ ret void
+}
+
declare ptr @malloc(i64)
declare void @free(ptr)
diff --git a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
index 47474e8dac051..dc55d0524ddfb 100644
--- a/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
+++ b/llvm/test/Transforms/MemCpyOpt/memset-memcpy-to-2x-memset.ll
@@ -73,7 +73,7 @@ define void @test_different_source_gep(ptr %dst1, ptr %dst2, i8 %c) {
; CHECK-LABEL: @test_different_source_gep(
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr [[DST1:%.*]], i8 [[C:%.*]], i64 128, i1 false)
; CHECK-NEXT: [[P:%.*]] = getelementptr i8, ptr [[DST1]], i64 64
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[DST2:%.*]], ptr [[P]], i64 64, i1 false)
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr [[DST2:%.*]], i8 [[C]], i64 64, i1 false)
; CHECK-NEXT: ret void
;
call void @llvm.memset.p0.i64(ptr %dst1, i8 %c, i64 128, i1 false)
diff --git a/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll b/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
index 5e13432746bf7..0e312bc42d463 100644
--- a/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
+++ b/llvm/test/Transforms/MemCpyOpt/mixed-sizes.ll
@@ -19,7 +19,7 @@ define i32 @foo(i1 %z) {
; CHECK: for.body3.lr.ph:
; CHECK-NEXT: br label [[FOR_INC7_1]]
; CHECK: for.inc7.1:
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[A]], ptr align 4 [[SCEVGEP]], i64 4, i1 false)
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 4 [[A]], i8 0, i64 4, i1 false)
; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[A]], align 4
; CHECK-NEXT: ret i32 [[TMP2]]
;
diff --git a/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll b/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll
index a834d2465dfa5..d739d53d8a62c 100644
--- a/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll
+++ b/llvm/test/Transforms/MemCpyOpt/variable-sized-memset-memcpy.ll
@@ -18,13 +18,13 @@ define void @test(ptr %src, i8 %c, i64 %size) {
ret void
}
-; Differing sizes, so left as it is.
+; Differing sizes, but would be UB if size1 > size2
define void @negative_test(ptr %src, i8 %c, i64 %size1, i64 %size2) {
; CHECK-LABEL: @negative_test(
; CHECK-NEXT: [[DST1:%.*]] = alloca i8, i64 [[SIZE1:%.*]], align 1
; CHECK-NEXT: [[DST2:%.*]] = alloca i8, i64 [[SIZE2:%.*]], align 1
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[DST1]], i8 [[C:%.*]], i64 [[SIZE1]], i1 false)
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 8 [[DST2]], ptr align 8 [[DST1]], i64 [[SIZE2]], i1 false)
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[DST2]], i8 [[C]], i64 [[SIZE2]], i1 false)
; CHECK-NEXT: ret void
;
%dst1 = alloca i8, i64 %size1
✅ With the latest revision this PR passed the C/C++ code formatter.
Review request bump? I have some other improvements I'd like to propose (main...vtjnash:llvm-project:jn/memcpyopt-destonly), so this PR was a bit of a practice exercise for me to learn the optimization pass.
I'll look into it this weekend. :3 |
Possible miscompile: https://llvm-compile-time-tracker.com/compare.php?from=2f66e5fcbad211e0d71b1a533071d31b36f088ec&to=0d828e6dcb86d886f1e5ee57ec7e71b945bfd36c&stat=instructions:u
The "Handle the three cases of Offset" logic is difficult to read. How about splitting it into two parts (with two PRs)?
- If it's an undefined value, trim the start and end
- Handle the case where Offset >= 0 here
Okay, I've trimmed down the PR to just add the code for handling
Second part (sizes aren't constant) is stacked on top of this PR, for once this is merged. Third part (MOffset < 0) seems potentially too annoying to be worth doing, unless you want it. But maybe it is worth just doing the memcpy->memset replacement, without also trying to compute the optimized start and end points?
LGTM. Please check the new comments before merging.
    auto *CCopySize = dyn_cast<ConstantInt>(CopySize);
    if (!CCopySize)
      return false;
    if (CCopySize->getZExtValue() > CMemSetSize->getZExtValue()) {
Should we also reduce the diff here? How about just modifying the comments?
I think the helper function makes this function more readable, and the helper should be more generally useful, since it implements a check that other functions in this file should also implement. In particular, I'm planning to use it in processMemCpyMemCpyDependence next, since the purpose of this PR was to make those two functions start to converge in functionality and implementation. But I can do that refactoring as a separate PR if you prefer.
In my opinion, making a new function is the more general approach when sharing code or when a piece of code grows significantly in size.
Also, in my mind, the better approach is:
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1790,6 +1790,9 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
}
}
}
+ if (truncateUndefs(M, MI)) {
+ return true;
+ }
if (auto *MDep = dyn_cast<MemCpyInst>(MI))
if (processMemCpyMemCpyDependence(M, MDep, BAA))
return true;
I believe this helper function should also be used in performStackMoveOptzn; I just didn't want to add additional improvements that re-use the logic to the current PR. I have a list of other TODOs to improve in this file (such as 827019c#diff-afb2b04eff951cb67d214bc6dfa62a087a72591451817dba7929b48c5fe635f9R1530), but they're blocked on getting this PR to progress.
I don't have merge rights anymore (I wasn't active enough when they were reviewing the list of people with access, though I can ask for it to be reinstated if you'd prefer that I merge).
I don't have merge rights anymore (I wasn't active enough when they were reviewing the list of people with access, though I can ask for it to be reinstated if you'd prefer that I merge)
Feel free to request :)
And I can merge it after your response.
Thanks, this looks like a nice extension.
 @@ -1403,6 +1404,24 @@ static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
   return false;
 }

+static bool coversInputFully(MemorySSA *MSSA, MemCpyInst *MemCpy,
+                             MemIntrinsic *MemSrc, BatchAAResults &BAA) {
The name of this helper seems quite misleading. The corresponding boolean was previously called CanReduceSize, which is what this is actually checking (can we reduce CopySize to the size written by MemSrc).
CanReduceSize was badly named, since it is not what the function was checking (which is whether the MemCpy operation is fully covered, def or undef, by the MemSrc operation). That is merely what the implementation incidentally did with the result; I have a new commit ready to remove that unnecessary restriction from the implementation and leave only the check.
I don't really get what followup change you have in mind, but if you think canReduceSize is inaccurate, then maybe something along the lines of overreadIsUndef?
I don't think the new name inputFullyCoveredBySrc is really better than the old, because this function is specifically detecting the case where the input is not fully covered by src (it is partially covered by src and partially by undef initialization)...
The CanReduceSize bool is actually a property of having both sizes be constants (the other conditional here). If they aren't constants, then you cannot reduce the size, though this query and this optimization are still okay.
This aims to reduce the divergence between the initial checks in this function and processMemCpyMemCpyDependence (in particular, adding handling of offsets), with the goal to eventually reduce duplication there and improve this pass in other ways.
Allows forwarding memset to memcpy for mismatching unknown sizes if overread has undef contents. In that case we can refine the undef bytes to the memset value. Refs #140954 which laid some of the groundwork for this.
…emCpyMemCpyDependence (#143745) Allows memcpy to memcpy forwarding in cases where the second memcpy is larger, but the overread is known to be undef, by shrinking the memcpy size. Refs llvm/llvm-project#140954 which laid some of the groundwork for this.