[GVN] Drop Clobber dependency if store may overwrite only the same value #68322
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms

Changes

Motivation: https://godbolt.org/z/sfx6fPc3n

In this example the second load is redundant and can be optimized. The problem is a clobbering store between it and the first load, but here the store can be safely skipped: a one-byte store with alignment of 1 can only be MustAlias or NoAlias with the loaded location (no partial overlap is possible), and in the MustAlias case it rewrites exactly the same value.

Note: GCC can optimize this case, and there were previous attempts to fix this issue, e.g. https://reviews.llvm.org/D155843

Full diff: https://github.com/llvm/llvm-project/pull/68322.diff

2 Files Affected (shown after a short illustration of the motivating pattern):
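A minimal sketch of the kind of source in question (a hypothetical reduction; the exact godbolt example is not reproduced here):

// The second read of *a is redundant, but before this patch MemDep reported
// the store to *b as a clobber, because b may alias a. A one-byte store
// aligned to 1 either overwrites the location of *a exactly, with the value
// just loaded from it, or does not touch it at all, so it can be skipped
// when searching for the second load's dependency.
void copy_to_both(char *a, char *b, char *c) {
  *b = *a; // load of *a, then a may-aliasing one-byte store
  *c = *a; // this reload of *a can reuse the first loaded value
}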
diff --git a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
index 071ecdba8a54ace..eb503e4a781c94e 100644
--- a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
@@ -360,11 +360,42 @@ MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,
return MemDepResult::getNonLocal();
}
+// canSkipClobberingStore - check if SI that may alias with MemLoc can be safely
+// skipped. This is possible if SI can only MustAlias or NoAlias with MemLoc
+// (no partial overlap possible) and it writes the same value that MemLoc
+// contains now (the value was loaded before this store and was not modified
+// in between).
+static bool canSkipClobberingStore(const StoreInst *SI,
+ const MemoryLocation &MemLoc,
+ Align MemLocAlign, BatchAAResults &BatchAA) {
+ if (!MemLoc.Size.hasValue())
+ return false;
+ if (MemoryLocation::get(SI).Size != MemLoc.Size)
+ return false;
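+ // Require the location to be aligned to at least its own size, and the
+ // store to be no less aligned; then the two same-sized accesses either
+ // coincide exactly (MustAlias) or do not overlap at all (NoAlias), so
+ // partial overlap is impossible.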
+ if (MemLocAlign.value() < MemLoc.Size.getValue())
+ return false;
+ if (SI->getAlign() < MemLocAlign)
+ return false;
+
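+ // The stored value must itself be a load of MemLoc in the same basic
+ // block, with no instruction in between that may modify the location; then
+ // the store can only rewrite the value MemLoc already contains.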
+ auto *LI = dyn_cast<LoadInst>(SI->getValueOperand());
+ if (!LI || LI->getParent() != SI->getParent())
+ return false;
+ if (BatchAA.alias(MemoryLocation::get(LI), MemLoc) != AliasResult::MustAlias)
+ return false;
+ for (const auto *I = LI->getNextNode(); I != SI; I = I->getNextNode())
+ if (isModSet(BatchAA.getModRefInfo(I, MemLoc)))
+ return false;
+
+ return true;
+}
+
MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
BasicBlock *BB, Instruction *QueryInst, unsigned *Limit,
BatchAAResults &BatchAA) {
bool isInvariantLoad = false;
+ Align MemLocAlign =
+ MemLoc.Ptr->getPointerAlignment(BB->getModule()->getDataLayout());
unsigned DefaultLimit = getDefaultBlockScanLimit();
if (!Limit)
@@ -402,11 +433,12 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
// do want to respect mustalias results since defs are useful for value
// forwarding, but any mayalias write can be assumed to be noalias.
// Arguably, this logic should be pushed inside AliasAnalysis itself.
- if (isLoad && QueryInst) {
- LoadInst *LI = dyn_cast<LoadInst>(QueryInst);
- if (LI && LI->hasMetadata(LLVMContext::MD_invariant_load))
- isInvariantLoad = true;
- }
+ if (isLoad && QueryInst)
+ if (LoadInst *LI = dyn_cast<LoadInst>(QueryInst)) {
+ if (LI->hasMetadata(LLVMContext::MD_invariant_load))
+ isInvariantLoad = true;
+ MemLocAlign = LI->getAlign();
+ }
// True for volatile instruction.
// For Load/Store return true if atomic ordering is stronger than AO,
@@ -577,6 +609,8 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
return MemDepResult::getDef(Inst);
if (isInvariantLoad)
continue;
+ if (canSkipClobberingStore(SI, MemLoc, MemLocAlign, BatchAA))
+ continue;
return MemDepResult::getClobber(Inst);
}
diff --git a/llvm/test/Transforms/GVN/rle-clobbering-store.ll b/llvm/test/Transforms/GVN/rle-clobbering-store.ll
new file mode 100644
index 000000000000000..d6b8949752af4ac
--- /dev/null
+++ b/llvm/test/Transforms/GVN/rle-clobbering-store.ll
@@ -0,0 +1,91 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt < %s -passes=gvn -S | FileCheck %s
+
+define void @test_i8(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_i8(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEXT: store i8 [[TMP0]], ptr [[B]], align 1
+; CHECK-NEXT: store i8 [[TMP0]], ptr [[C]], align 1
+; CHECK-NEXT: ret void
+;
+entry:
+ %0 = load i8, ptr %a, align 1
+ store i8 %0, ptr %b, align 1
+ %1 = load i8, ptr %a, align 1
+ store i8 %1, ptr %c, align 1
+ ret void
+}
+
+define void @test_i32(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_i32(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT: store i32 [[TMP0]], ptr [[B]], align 4
+; CHECK-NEXT: store i32 [[TMP0]], ptr [[C]], align 4
+; CHECK-NEXT: ret void
+;
+entry:
+ %0 = load i32, ptr %a, align 4
+ store i32 %0, ptr %b, align 4
+ %1 = load i32, ptr %a, align 4
+ store i32 %1, ptr %c, align 4
+ ret void
+}
+
+define void @test_float(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_float(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load float, ptr [[A]], align 4
+; CHECK-NEXT: store float [[TMP0]], ptr [[B]], align 4
+; CHECK-NEXT: store float [[TMP0]], ptr [[C]], align 4
+; CHECK-NEXT: ret void
+;
+entry:
+ %0 = load float, ptr %a, align 4
+ store float %0, ptr %b, align 4
+ %1 = load float, ptr %a, align 4
+ store float %1, ptr %c, align 4
+ ret void
+}
+
+define void @test_unaligned(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_unaligned(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT: store i32 [[TMP0]], ptr [[B]], align 2
+; CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT: store i32 [[TMP1]], ptr [[C]], align 2
+; CHECK-NEXT: ret void
+;
+entry:
+ %0 = load i32, ptr %a, align 4
+ store i32 %0, ptr %b, align 2
+ %1 = load i32, ptr %a, align 4
+ store i32 %1, ptr %c, align 2
+ ret void
+}
+
+define void @test_modify_between(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_modify_between(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEXT: store i8 42, ptr [[C]], align 1
+; CHECK-NEXT: store i8 [[TMP0]], ptr [[B]], align 1
+; CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEXT: store i8 [[TMP1]], ptr [[C]], align 1
+; CHECK-NEXT: ret void
+;
+entry:
+ %0 = load i8, ptr %a, align 1
+ store i8 42, ptr %c, align 1
+ store i8 %0, ptr %b, align 1
+ %1 = load i8, ptr %a, align 1
+ store i8 %1, ptr %c, align 1
+ ret void
+}
Results on test-suite with SPEC2006 (number of removed redundant loads):
Looks reasonable to me.
Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=ff5e2babcb46e7eb3887ee265decb2948da2792c&to=8c6f93c8606faeb2a750cf7c31e0effd21bb88e6&stat=instructions:u
Code-size: http://llvm-compile-time-tracker.com/compare.php?from=ff5e2babcb46e7eb3887ee265decb2948da2792c&to=8c6f93c8606faeb2a750cf7c31e0effd21bb88e6&stat=size-text
@@ -360,11 +360,42 @@ MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,
return MemDepResult::getNonLocal();
}

+// canSkipClobberingStore - check if SI that may alias with MemLoc can be safely
Don't repeat function name in comment.
Done
%1 = load float, ptr %a, align 4
store float %1, ptr %c, align 4
ret void
}
Please also test a non-byte-sized value (e.g. i1). I'm not entirely sure the optimization is correct in that case. (Because the store will write bits outside the value itself.)
Added test_i1. I don't think we'll have any problems with it, because the only information relevant to this optimization is the location size (not the loaded/stored type), and for i1 the location size is DataLayout::getTypeStoreSize(i1), which is 1 byte in most cases, so this test is optimized very similarly to test_i8.
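For reference, a standalone sketch of the store-size point (my own illustration, assuming a default DataLayout; not part of the patch):

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"
#include <cassert>

int main() {
  llvm::LLVMContext Ctx;
  llvm::DataLayout DL("");  // empty string gives the default data layout
  // getTypeStoreSize rounds the bit width up to whole bytes, so i1 and i8
  // describe the same one-byte memory location for this analysis.
  assert(DL.getTypeStoreSize(llvm::Type::getInt1Ty(Ctx)).getFixedValue() == 1);
  assert(DL.getTypeStoreSize(llvm::Type::getInt8Ty(Ctx)).getFixedValue() == 1);
  return 0;
}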
The problematic case I had in mind here is this one:
define i8 @test_i1(ptr %a, ptr %b, ptr %c) {
entry:
%0 = load i1, ptr %a, align 1
store i1 %0, ptr %b, align 1
%1 = load i8, ptr %a, align 1
ret i8 %1
}
store i1 %0 is going to write the same value to the low bit, but it will also write the 7 high bits (typically to zero, though this is unspecified), so it's not actually a no-op.
Though given that LangRef says
When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.
this would fall under undefined behavior, so I guess it's fine.
Force-pushed from 07a2e89 to 6df3c7b.
@@ -577,6 +610,9 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
return MemDepResult::getDef(Inst);
if (isInvariantLoad)
continue;
+ if (canSkipClobberingStore(SI, MemLoc, MemLocAlign, BatchAA,
+ getDefaultBlockScanLimit()))
Hm, do you think it would make sense to pass Limit here instead? That way, if we already walked ~100 instructions in the main clobber walk, we're not going to walk another ~100 trying to skip.
Agreed. I've checked the statistics on llvm-test-suite (gvn.NumGVNLoad, gvn.NumPRELoad), and it seems the more conservative limit does not cause any changes.
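A sketch of what the agreed call site might look like (assuming Limit has already been defaulted to DefaultLimit earlier in the function, as the surrounding code does; the final patch may differ in detail):

// Pass the remaining budget of the main clobber walk instead of a fresh
// default limit, so skipping stores cannot more than double the scan work.
if (canSkipClobberingStore(SI, MemLoc, MemLocAlign, BatchAA, *Limit))
  continue;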
LGTM
In some cases a clobbering store can be safely skipped if it can only MustAlias or NoAlias the memory location and it writes the same value. This patch supports the simple case where the value from the memory location was loaded in the same basic block before the store and there are no modifications between them.
Force-pushed from 40161cd to eca1041.