-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[MemDep] Use EarliestEscapeInfo #69727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Use BatchAA with EarliestEscapeInfo instead of callCapturesBefore(). The advantage of this is that it will also take not-captured-before information into account for non-calls, and that this is a cached analysis. The disadvantage is that EII is slightly less precise than full CapturedBefore analysis. In practice the impact is positive, with gvn.NumGVNLoad going from 22022 to 22808 on test-suite.
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: Nikita Popov (nikic) ChangesUse BatchAA with EarliestEscapeInfo instead of callCapturesBefore() in MemDepAnalysis. The advantage of this is that it will also take not-captured-before information into account for non-calls (see test_store_before_capture for a representative example), and that this is a cached analysis. The disadvantage is that EII is slightly less precise than full CapturedBefore analysis. In practice the impact is positive, with gvn.NumGVNLoad going from 22022 to 22808 on test-suite. Full list of diffs: https://gist.github.com/nikic/9b6b4187d96d25249e9d874b47be99df The impact to compile-time is also positive, mainly in the ThinLTO configuration: http://llvm-compile-time-tracker.com/compare.php?from=7c7896b1beeb1d275a06557111363b99b754398a&to=72a9442c743de0ad3fe84e2f8042f392aacda2c7&stat=instructions%3Au Full diff: https://github.com/llvm/llvm-project/pull/69727.diff 4 Files Affected:
diff --git a/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h b/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
index 27185aa9942e4e3..d5b2eb6253db959 100644
--- a/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
@@ -18,6 +18,7 @@
#include "llvm/ADT/PointerIntPair.h"
#include "llvm/ADT/PointerSumType.h"
#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/PassManager.h"
#include "llvm/IR/PredIteratorCache.h"
@@ -27,7 +28,6 @@
namespace llvm {
-class AAResults;
class AssumptionCache;
class BatchAAResults;
class DominatorTree;
@@ -356,6 +356,7 @@ class MemoryDependenceResults {
const TargetLibraryInfo &TLI;
DominatorTree &DT;
PredIteratorCache PredCache;
+ EarliestEscapeInfo EII;
unsigned DefaultBlockScanLimit;
@@ -367,7 +368,7 @@ class MemoryDependenceResults {
MemoryDependenceResults(AAResults &AA, AssumptionCache &AC,
const TargetLibraryInfo &TLI, DominatorTree &DT,
unsigned DefaultBlockScanLimit)
- : AA(AA), AC(AC), TLI(TLI), DT(DT),
+ : AA(AA), AC(AC), TLI(TLI), DT(DT), EII(DT),
DefaultBlockScanLimit(DefaultBlockScanLimit) {}
/// Handle invalidation in the new PM.
diff --git a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
index 49a31a52cf0e850..1c1a0873ac520f9 100644
--- a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
@@ -268,7 +268,7 @@ MemDepResult MemoryDependenceResults::getPointerDependencyFrom(
MemDepResult MemoryDependenceResults::getPointerDependencyFrom(
const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
BasicBlock *BB, Instruction *QueryInst, unsigned *Limit) {
- BatchAAResults BatchAA(AA);
+ BatchAAResults BatchAA(AA, &EII);
return getPointerDependencyFrom(MemLoc, isLoad, ScanIt, BB, QueryInst, Limit,
BatchAA);
}
@@ -645,11 +645,7 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
continue;
// See if this instruction (e.g. a call or vaarg) mod/ref's the pointer.
- ModRefInfo MR = BatchAA.getModRefInfo(Inst, MemLoc);
- // If necessary, perform additional analysis.
- if (isModAndRefSet(MR))
- MR = BatchAA.callCapturesBefore(Inst, MemLoc, &DT);
- switch (MR) {
+ switch (BatchAA.getModRefInfo(Inst, MemLoc)) {
case ModRefInfo::NoModRef:
// If the call has no effect on the queried pointer, just ignore it.
continue;
@@ -1227,7 +1223,7 @@ bool MemoryDependenceResults::getNonLocalPointerDepFromBB(
bool GotWorklistLimit = false;
LLVM_DEBUG(AssertSorted(*Cache));
- BatchAAResults BatchAA(AA);
+ BatchAAResults BatchAA(AA, &EII);
while (!Worklist.empty()) {
BasicBlock *BB = Worklist.pop_back_val();
@@ -1539,6 +1535,8 @@ void MemoryDependenceResults::invalidateCachedPredecessors() {
}
void MemoryDependenceResults::removeInstruction(Instruction *RemInst) {
+ EII.removeInstruction(RemInst);
+
// Walk through the Non-local dependencies, removing this one as the value
// for any cached queries.
NonLocalDepMapType::iterator NLDI = NonLocalDepsMap.find(RemInst);
diff --git a/llvm/test/CodeGen/BPF/CORE/no-narrow-load.ll b/llvm/test/CodeGen/BPF/CORE/no-narrow-load.ll
index b154b1d29fa24bf..dbfc4d2f99fc56d 100644
--- a/llvm/test/CodeGen/BPF/CORE/no-narrow-load.ll
+++ b/llvm/test/CodeGen/BPF/CORE/no-narrow-load.ll
@@ -65,10 +65,10 @@ lor.end: ; preds = %lor.end.critedge, %
ret void, !dbg !53
}
-; CHECK: r[[LOAD1:[0-9]+]] = *(u32 *)(r{{[0-9]+}} + 4)
-; CHECK: r[[LOAD1]] &= 65536
-; CHECK: r[[LOAD2:[0-9]+]] = *(u32 *)(r{{[0-9]+}} + 4)
-; CHECK: r[[LOAD2]] &= 32768
+; CHECK: r[[LOAD:[0-9]+]] = *(u32 *)(r{{[0-9]+}} + 4)
+; CHECK: r[[COPY:[0-9]+]] = r[[LOAD]]
+; CHECK: r[[COPY]] &= 65536
+; CHECK: r[[LOAD]] &= 32768
; Function Attrs: nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1
diff --git a/llvm/test/Transforms/GVN/captured-before.ll b/llvm/test/Transforms/GVN/captured-before.ll
index 4e025b3f772e103..6d012edea34610f 100644
--- a/llvm/test/Transforms/GVN/captured-before.ll
+++ b/llvm/test/Transforms/GVN/captured-before.ll
@@ -46,9 +46,8 @@ define i32 @test_store_before_capture(ptr %p) {
; CHECK-NEXT: store i32 123, ptr [[A]], align 4
; CHECK-NEXT: [[P2:%.*]] = load ptr, ptr [[P]], align 8
; CHECK-NEXT: store i32 42, ptr [[P2]], align 4
-; CHECK-NEXT: [[V:%.*]] = load i32, ptr [[A]], align 4
; CHECK-NEXT: call void @capture(ptr [[A]])
-; CHECK-NEXT: ret i32 [[V]]
+; CHECK-NEXT: ret i32 123
;
%a = alloca i32
store i32 123, ptr %a
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It seems to cause mis-compile to our tests: https://bugs.chromium.org/p/chromium/issues/detail?id=1495809#c3 Attached the good & bad IR for the test function body with: |
@ZequanWu I looked at the test case, and I don't think this is a miscompile. The faulting load just gets optimized away. The code uses the pattern
which previously managed to prevent the optimization, but no longer does. You should be able to force the load to not be optimized away by using a volatile load instead. Something like |
Thanks, that seems to be the cause. |
Use BatchAA with EarliestEscapeInfo instead of callCapturesBefore() in MemDepAnalysis. The advantage of this is that it will also take not-captured-before information into account for non-calls (see test_store_before_capture for a representative example), and that this is a cached analysis. The disadvantage is that EII is slightly less precise than full CapturedBefore analysis. In practice the impact is positive, with gvn.NumGVNLoad going from 22022 to 22808 on test-suite. The impact to compile-time is also positive, mainly in the ThinLTO configuration. (cherry-picked from 2ad9fde)
Use BatchAA with EarliestEscapeInfo instead of callCapturesBefore() in MemDepAnalysis. The advantage of this is that it will also take not-captured-before information into account for non-calls (see test_store_before_capture for a representative example), and that this is a cached analysis. The disadvantage is that EII is slightly less precise than full CapturedBefore analysis.
In practice the impact is positive, with gvn.NumGVNLoad going from 22022 to 22808 on test-suite. Full list of diffs: https://gist.github.com/nikic/9b6b4187d96d25249e9d874b47be99df
The impact to compile-time is also positive, mainly in the ThinLTO configuration: http://llvm-compile-time-tracker.com/compare.php?from=97e06a0d83a55ff67e5fd0c7328233c44415de33&to=4921af8845260f2cea9aed0a350aec4e8c7b10e6&stat=instructions:u