Commit 5ce8927

[DSE] Track earliest escape, use for loads in isReadClobber.
At the moment, DSE only considers whether a pointer may be captured at all in a
function. This leads to cases where we fail to remove stores to local objects
because we do not check whether they escape before or after potential
read-clobbers. Context-sensitive escape queries in isReadClobber were removed a
while ago in d1a1cce to save compile-time; see PR50220 for more context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction B
if A dominates B. If 2 escapes do not dominate each other, the terminator of
their nearest common dominator is chosen. If not all uses can be analyzed, the
earliest escape is set to the first instruction in the function entry block. If
the query instruction dominates the earliest escape and is not in a cycle, then
the pointer does not escape before the query instruction.

This patch uses this information when checking whether a load whose underlying
object is itself a load may alias a write to a stack object: if the stack
object does not escape before the load, they do not alias. I will share a
follow-up patch to also use the information for call instructions to fix
PR50220.

In terms of compile-time, the impact is low in general:
  NewPM-O3: +0.05%
  NewPM-ReleaseThinLTO: +0.05%
  NewPM-ReleaseLTO-g: +0.03%
with the largest change being tramp3d-v4 (+0.30%).
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get the
following benefits from the caching:
  NewPM-O3: -0.03%
  NewPM-ReleaseThinLTO: -0.08%
  NewPM-ReleaseLTO-g: -0.04%
The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small but noticeable benefit from caching. I am not entirely
sure the speedups warrant the extra complexity of caching: the way the caching
works means we might miss a few cases, as it is less precise, and there may be
a better way to cache things.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109844
1 parent fbacf5a commit 5ce8927
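
As an illustration of how a client is expected to consume the new API (this sketch is not part of the commit), the dominance-based reasoning from the commit message can be written as a small helper; the name notCapturedBeforeSketch and its exact checks are illustrative assumptions:

#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Returns true if Obj provably does not escape before QueryInst executes.
// Callers must ensure QueryInst is not inside a cycle (see the API comment).
static bool notCapturedBeforeSketch(const Value *Obj, Instruction *QueryInst,
                                    Function &F, const DominatorTree &DT) {
  Instruction *EarliestCapture = FindEarliestCapture(
      Obj, F, /*ReturnCaptures=*/false, /*StoreCaptures=*/true, DT);
  if (!EarliestCapture)
    return true; // Obj never escapes in F.
  // If the query instruction strictly dominates the earliest capture, the
  // object has not escaped yet when the query instruction runs.
  return QueryInst != EarliestCapture &&
         DT.dominates(QueryInst, EarliestCapture);
}

The DSE change below adds caching and a reachability check on top of this basic pattern.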

File tree

4 files changed (+191, -21 lines)


llvm/include/llvm/Analysis/CaptureTracking.h

Lines changed: 14 additions & 0 deletions

@@ -23,6 +23,7 @@ namespace llvm {
   class Instruction;
   class DominatorTree;
   class LoopInfo;
+  class Function;
 
   /// getDefaultMaxUsesToExploreForCaptureTracking - Return default value of
   /// the maximal number of uses to explore before giving up. It is used by
@@ -63,6 +64,19 @@ namespace llvm {
                                     unsigned MaxUsesToExplore = 0,
                                     const LoopInfo *LI = nullptr);
 
+  // Returns the 'earliest' instruction that captures \p V in \F. An instruction
+  // A is considered earlier than instruction B, if A dominates B. If 2 escapes
+  // do not dominate each other, the terminator of the common dominator is
+  // chosen. If not all uses can be analyzed, the earliest escape is set to
+  // the first instruction in the function entry block. If \p V does not escape,
+  // nullptr is returned. Note that the caller of the function has to ensure
+  // that the instruction the result value is compared against is not in a
+  // cycle.
+  Instruction *FindEarliestCapture(const Value *V, Function &F,
+                                   bool ReturnCaptures, bool StoreCaptures,
+                                   const DominatorTree &DT,
+                                   unsigned MaxUsesToExplore = 0);
+
   /// This callback is used in conjunction with PointerMayBeCaptured. In
   /// addition to the interface here, you'll need to provide your own getters
   /// to see whether anything was captured.

llvm/lib/Analysis/CaptureTracking.cpp

Lines changed: 76 additions & 0 deletions

@@ -143,6 +143,66 @@ namespace {
 
     const LoopInfo *LI;
   };
+
+  /// Find the 'earliest' instruction before which the pointer is known not to
+  /// be captured. Here an instruction A is considered earlier than instruction
+  /// B, if A dominates B. If 2 escapes do not dominate each other, the
+  /// terminator of the common dominator is chosen. If not all uses can be
+  /// analyzed, the earliest escape is set to the first instruction in the
+  /// function entry block.
+  // NOTE: Users have to make sure instructions compared against the earliest
+  // escape are not in a cycle.
+  struct EarliestCaptures : public CaptureTracker {
+
+    EarliestCaptures(bool ReturnCaptures, Function &F, const DominatorTree &DT)
+        : DT(DT), ReturnCaptures(ReturnCaptures), Captured(false), F(F) {}
+
+    void tooManyUses() override {
+      Captured = true;
+      EarliestCapture = &*F.getEntryBlock().begin();
+    }
+
+    bool captured(const Use *U) override {
+      Instruction *I = cast<Instruction>(U->getUser());
+      if (isa<ReturnInst>(I) && !ReturnCaptures)
+        return false;
+
+      if (!EarliestCapture) {
+        EarliestCapture = I;
+      } else if (EarliestCapture->getParent() == I->getParent()) {
+        if (I->comesBefore(EarliestCapture))
+          EarliestCapture = I;
+      } else {
+        BasicBlock *CurrentBB = I->getParent();
+        BasicBlock *EarliestBB = EarliestCapture->getParent();
+        if (DT.dominates(EarliestBB, CurrentBB)) {
+          // EarliestCapture already comes before the current use.
+        } else if (DT.dominates(CurrentBB, EarliestBB)) {
+          EarliestCapture = I;
+        } else {
+          // Otherwise find the nearest common dominator and use its terminator.
+          auto *NearestCommonDom =
+              DT.findNearestCommonDominator(CurrentBB, EarliestBB);
+          EarliestCapture = NearestCommonDom->getTerminator();
+        }
+      }
+      Captured = true;
+
+      // Return false to continue analysis; we need to see all potential
+      // captures.
+      return false;
+    }
+
+    Instruction *EarliestCapture = nullptr;
+
+    const DominatorTree &DT;
+
+    bool ReturnCaptures;
+
+    bool Captured;
+
+    Function &F;
+  };
 }
 
 /// PointerMayBeCaptured - Return true if this pointer value may be captured
@@ -206,6 +266,22 @@ bool llvm::PointerMayBeCapturedBefore(const Value *V, bool ReturnCaptures,
   return CB.Captured;
 }
 
+Instruction *llvm::FindEarliestCapture(const Value *V, Function &F,
+                                       bool ReturnCaptures, bool StoreCaptures,
+                                       const DominatorTree &DT,
+                                       unsigned MaxUsesToExplore) {
+  assert(!isa<GlobalValue>(V) &&
+         "It doesn't make sense to ask whether a global is captured.");
+
+  EarliestCaptures CB(ReturnCaptures, F, DT);
+  PointerMayBeCaptured(V, &CB, MaxUsesToExplore);
+  if (CB.Captured)
+    ++NumCapturedBefore;
+  else
+    ++NumNotCapturedBefore;
+  return CB.EarliestCapture;
+}
+
 void llvm::PointerMayBeCaptured(const Value *V, CaptureTracker *Tracker,
                                 unsigned MaxUsesToExplore) {
   assert(V->getType()->isPointerTy() && "Capture is for pointers only!");

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

Lines changed: 57 additions & 0 deletions

@@ -38,6 +38,7 @@
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/AliasAnalysis.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/CaptureTracking.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/LoopInfo.h"
@@ -897,6 +898,9 @@ struct DSEState {
   /// basic block.
   DenseMap<BasicBlock *, InstOverlapIntervalsTy> IOLs;
 
+  DenseMap<const Value *, Instruction *> EarliestEscapes;
+  DenseMap<Instruction *, TinyPtrVector<const Value *>> Inst2Obj;
+
   DSEState(Function &F, AliasAnalysis &AA, MemorySSA &MSSA, DominatorTree &DT,
            PostDominatorTree &PDT, const TargetLibraryInfo &TLI,
            const LoopInfo &LI)
@@ -1264,6 +1268,30 @@ struct DSEState {
                       DepWriteOffset) == OW_Complete;
   }
 
+  /// Returns true if \p Object is not captured before or by \p I.
+  bool notCapturedBeforeOrAt(const Value *Object, Instruction *I) {
+    if (!isIdentifiedFunctionLocal(Object))
+      return false;
+
+    auto Iter = EarliestEscapes.insert({Object, nullptr});
+    if (Iter.second) {
+      Instruction *EarliestCapture = FindEarliestCapture(
+          Object, F, /*ReturnCaptures=*/false, /*StoreCaptures=*/true, DT);
+      if (EarliestCapture) {
+        auto Ins = Inst2Obj.insert({EarliestCapture, {}});
+        Ins.first->second.push_back(Object);
+      }
+      Iter.first->second = EarliestCapture;
+    }
+
+    // No capturing instruction.
+    if (!Iter.first->second)
+      return true;
+
+    return I != Iter.first->second &&
+           !isPotentiallyReachable(Iter.first->second, I, nullptr, &DT, &LI);
+  }
+
   // Returns true if \p Use may read from \p DefLoc.
   bool isReadClobber(const MemoryLocation &DefLoc, Instruction *UseInst) {
     if (isNoopIntrinsic(UseInst))
@@ -1281,6 +1309,25 @@ struct DSEState {
       if (CB->onlyAccessesInaccessibleMemory())
        return false;
 
+    // BasicAA does not spend linear time to check whether local objects escape
+    // before potentially aliasing accesses. To improve DSE results, compute and
+    // cache escape info for local objects in certain circumstances.
+    if (auto *LI = dyn_cast<LoadInst>(UseInst)) {
+      // If the load reads from a loaded underlying object, the load cannot
+      // alias DefLoc, if DefUO is a local object that has not escaped before
+      // the load.
+      auto *ReadUO = getUnderlyingObject(LI->getPointerOperand());
+      auto *DefUO = getUnderlyingObject(DefLoc.Ptr);
+      if (DefUO && ReadUO && isa<LoadInst>(ReadUO) &&
+          notCapturedBeforeOrAt(DefUO, UseInst)) {
+        assert(
+            !PointerMayBeCapturedBefore(DefLoc.Ptr, false, true, UseInst, &DT,
+                                        false, 0, &this->LI) &&
+            "cached analysis disagrees with fresh PointerMayBeCapturedBefore");
+        return false;
+      }
+    }
+
     // NOTE: For calls, the number of stores removed could be slightly improved
     // by using AA.callCapturesBefore(UseInst, DefLoc, &DT), but that showed to
     // be expensive compared to the benefits in practice. For now, avoid more
@@ -1706,7 +1753,17 @@ struct DSEState {
     if (MemoryAccess *MA = MSSA.getMemoryAccess(DeadInst)) {
       if (MemoryDef *MD = dyn_cast<MemoryDef>(MA)) {
         SkipStores.insert(MD);
+
+        // Clear any cached escape info for objects associated with the
+        // removed instructions.
+        auto Iter = Inst2Obj.find(DeadInst);
+        if (Iter != Inst2Obj.end()) {
+          for (const Value *Obj : Iter->second)
+            EarliestEscapes.erase(Obj);
+          Inst2Obj.erase(DeadInst);
+        }
       }
+
       Updater.removeMemoryAccess(MA);
     }