-
Notifications
You must be signed in to change notification settings - Fork 14.2k
[SSAUpdater] Don't use large SmallSets for IDFcalc #97823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As per the LLVM programmers manual, SmallSets do linear scans on insertion and then turn into a hash-table if the set gets big. Here in the IDFCalculator, the SmallSets have been configured to have 32 elements in each static allocation... which means that we linearly scan for all problems with up to 32 elements. Replace these SmallSets with a plain DenseSet (and use reserve to avoid any repeated allocations). Doing this yields a 0.13% compile-time improvement for debug-info builds, as we hit IDFCalculator pretty hard in InstrRefBasedLDV.
@llvm/pr-subscribers-llvm-adt @llvm/pr-subscribers-llvm-support Author: Jeremy Morse (jmorse) ChangesAs per the LLVM programmers manual, SmallSets do linear scans on insertion and then turn into a hash-table if the set gets big. Here in the IDFCalculator, the SmallSets have been configured to have 32 elements in each static allocation... which means that we linearly scan for all problems with up to 32 elements, which I feel is quite a large N. Replace these SmallSets with a plain DenseSet (and use reserve to avoid any repeated allocations). Doing this yields a 0.13% compile-time improvement for debug-info builds, as we hit IDFCalculator pretty hard in InstrRefBasedLDV. Adding @nikic as this could affect more than debug-info things, although apparently not in a visible way on CTMark. (tracker link is across a few commits, with this patch being the summary). Full diff: https://github.com/llvm/llvm-project/pull/97823.diff 1 Files Affected:
diff --git a/llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h b/llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h
index 0dc58e37c821df..c218d33c3eb303 100644
--- a/llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h
+++ b/llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h
@@ -144,8 +144,12 @@ void IDFCalculatorBase<NodeTy, IsPostDom>::calculate(
DT.updateDFSNumbers();
SmallVector<DomTreeNodeBase<NodeTy> *, 32> Worklist;
- SmallPtrSet<DomTreeNodeBase<NodeTy> *, 32> VisitedPQ;
- SmallPtrSet<DomTreeNodeBase<NodeTy> *, 32> VisitedWorklist;
+ SmallDenseSet<DomTreeNodeBase<NodeTy> *, 16> VisitedPQ;
+ SmallDenseSet<DomTreeNodeBase<NodeTy> *, 16> VisitedWorklist;
+ if (useLiveIn) {
+ VisitedPQ.reserve(LiveInBlocks->size());
+ VisitedWorklist.reserve(LiveInBlocks->size());
+ }
for (NodeTy *BB : *DefBlocks)
if (DomTreeNodeBase<NodeTy> *Node = DT.getNode(BB)) {
|
Does the benefit here really come from switching from SmallPtrSet to SmallDenseSet, or from the added reserve calls? I'd generally expect SmallPtrSet to still be beneficial, even if used in large mode. |
Looking back at my experiments, I originally hit a small regression (0.09% in -g builds) by using DenseSet, with the overall 0.13% improvement coming when the reserve call was added. So I guess it's the reserves. SmallPtrSet doesn't have a reserve method right now, but if you reckon it's a better fit here then we can add one. |
That's surprising! We should definitely add one... |
Added a SmallPtrSet reserve method and switched the relevant containers back to SmallPtrSet. I stuck it on the compile-time-tracker and got the same speedup: http://llvm-compile-time-tracker.com/compare.php?from=00c622e596f918d9d83674b58097c8982ae1af95&to=ab267cc31f01dcc088c4839a4fadd3eca92b902f&stat=instructions%3Au |
le ping |
Also adjust two off-by-ones in the reserve implementation: * We should be willing to fill the small array up to 100% * The test for reallocating a big array happens before the "last" element that we reserve for is inserted, so apply the filter off-by-one.
I've added some unit tests -- they're pretty basic as I've also twiddled some off-by-ones: we fill the Small array up to 100% full, so shouldn't reallocate if we reserve the same number of elements as the small array size. I've also tried to ensure the big-array reallocation will only occur iff |
You can test this locally with the following command:git-clang-format --diff 00c622e596f918d9d83674b58097c8982ae1af95 eb71b58690afe6f8b150214e5ccf570169732df4 --extensions h,cpp -- llvm/include/llvm/ADT/SmallPtrSet.h llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h llvm/unittests/ADT/SmallPtrSetTest.cpp View the diff from clang-format here.diff --git a/llvm/unittests/ADT/SmallPtrSetTest.cpp b/llvm/unittests/ADT/SmallPtrSetTest.cpp
index 1d9a0d1725..da872601a6 100644
--- a/llvm/unittests/ADT/SmallPtrSetTest.cpp
+++ b/llvm/unittests/ADT/SmallPtrSetTest.cpp
@@ -455,5 +455,6 @@ TEST(SmallPtrSetTest, Reserve) {
Set.reserve(0);
EXPECT_EQ(Set.capacity(), 128u);
EXPECT_EQ(Set.size(), 6u);
- EXPECT_THAT(Set, UnorderedElementsAre(&Vals[0], &Vals[1], &Vals[2], &Vals[3], &Vals[4], &Vals[5]));
+ EXPECT_THAT(Set, UnorderedElementsAre(&Vals[0], &Vals[1], &Vals[2], &Vals[3],
+ &Vals[4], &Vals[5]));
}
|
Oh, I'd expect it to expose something similar to |
This involves adding a filter for zero-sized reserves, which break the off-by-one size test I'd added earlier. To maintain consistency with insert_imp_big, the first allocation always jumps to 128 elements.
Added a |
Co-authored-by: Jakub Kuderski <[email protected]>
Thanks for the reviews! |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/89/builds/3664 Here is the relevant piece of the build log for the reference:
|
As per the LLVM programmers manual, SmallSets do linear scans on insertion and then turn into a hash-table if the set gets big. Here in the IDFCalculator, the SmallSets have been configured to have 32 elements in each static allocation... which means that we linearly scan for all problems with up to 32 elements, which I feel is quite a large N.
Replace these SmallSets with a plain DenseSet (and use reserve to avoid any repeated allocations). Doing this yields a 0.13% compile-time improvement for debug-info builds, as we hit IDFCalculator pretty hard in InstrRefBasedLDV. Adding @nikic as this could affect more than debug-info things, although apparently not in a visible way on CTMark.
http://llvm-compile-time-tracker.com/compare.php?from=4ce6cc488083a931e795dafe30e69a0e4a96a3e4&to=a9f1d377013b1d208ac58f6f75548f1f24d174d2&stat=instructions%3Au
(tracker link is across a few commits, with this patch being the summary).