Skip to content

[SSAUpdater] Don't use large SmallSets for IDFcalc #97823

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 8, 2024

Conversation

jmorse
Copy link
Member

@jmorse jmorse commented Jul 5, 2024

As per the LLVM programmers manual, SmallSets do linear scans on insertion and then turn into a hash-table if the set gets big. Here in the IDFCalculator, the SmallSets have been configured to have 32 elements in each static allocation... which means that we linearly scan for all problems with up to 32 elements, which I feel is quite a large N.

Replace these SmallSets with a plain DenseSet (and use reserve to avoid any repeated allocations). Doing this yields a 0.13% compile-time improvement for debug-info builds, as we hit IDFCalculator pretty hard in InstrRefBasedLDV. Adding @nikic as this could affect more than debug-info things, although apparently not in a visible way on CTMark.

http://llvm-compile-time-tracker.com/compare.php?from=4ce6cc488083a931e795dafe30e69a0e4a96a3e4&to=a9f1d377013b1d208ac58f6f75548f1f24d174d2&stat=instructions%3Au

(tracker link is across a few commits, with this patch being the summary).

As per the LLVM programmers manual, SmallSets do linear scans on insertion
and then turn into a hash-table if the set gets big. Here in the
IDFCalculator, the SmallSets have been configured to have 32 elements in
each static allocation... which means that we linearly scan for all
problems with up to 32 elements.

Replace these SmallSets with a plain DenseSet (and use reserve to avoid any
repeated allocations). Doing this yields a 0.13% compile-time improvement
for debug-info builds, as we hit IDFCalculator pretty hard in
InstrRefBasedLDV.
@jmorse jmorse requested review from nikic, SLTozer and OCHyams July 5, 2024 13:00
@llvmbot
Copy link
Member

llvmbot commented Jul 5, 2024

@llvm/pr-subscribers-llvm-adt

@llvm/pr-subscribers-llvm-support

Author: Jeremy Morse (jmorse)

Changes

As per the LLVM programmers manual, SmallSets do linear scans on insertion and then turn into a hash-table if the set gets big. Here in the IDFCalculator, the SmallSets have been configured to have 32 elements in each static allocation... which means that we linearly scan for all problems with up to 32 elements, which I feel is quite a large N.

Replace these SmallSets with a plain DenseSet (and use reserve to avoid any repeated allocations). Doing this yields a 0.13% compile-time improvement for debug-info builds, as we hit IDFCalculator pretty hard in InstrRefBasedLDV. Adding @nikic as this could affect more than debug-info things, although apparently not in a visible way on CTMark.

http://llvm-compile-time-tracker.com/compare.php?from=4ce6cc488083a931e795dafe30e69a0e4a96a3e4&to=a9f1d377013b1d208ac58f6f75548f1f24d174d2&stat=instructions%3Au

(tracker link is across a few commits, with this patch being the summary).


Full diff: https://github.com/llvm/llvm-project/pull/97823.diff

1 Files Affected:

  • (modified) llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h (+6-2)
diff --git a/llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h b/llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h
index 0dc58e37c821df..c218d33c3eb303 100644
--- a/llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h
+++ b/llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h
@@ -144,8 +144,12 @@ void IDFCalculatorBase<NodeTy, IsPostDom>::calculate(
   DT.updateDFSNumbers();
 
   SmallVector<DomTreeNodeBase<NodeTy> *, 32> Worklist;
-  SmallPtrSet<DomTreeNodeBase<NodeTy> *, 32> VisitedPQ;
-  SmallPtrSet<DomTreeNodeBase<NodeTy> *, 32> VisitedWorklist;
+  SmallDenseSet<DomTreeNodeBase<NodeTy> *, 16> VisitedPQ;
+  SmallDenseSet<DomTreeNodeBase<NodeTy> *, 16> VisitedWorklist;
+  if (useLiveIn) {
+    VisitedPQ.reserve(LiveInBlocks->size());
+    VisitedWorklist.reserve(LiveInBlocks->size());
+  }
 
   for (NodeTy *BB : *DefBlocks)
     if (DomTreeNodeBase<NodeTy> *Node = DT.getNode(BB)) {

@nikic
Copy link
Contributor

nikic commented Jul 5, 2024

Does the benefit here really come from switching from SmallPtrSet to SmallDenseSet, or from the added reserve calls? I'd generally expect SmallPtrSet to still be beneficial, even if used in large mode.

@jmorse
Copy link
Member Author

jmorse commented Jul 5, 2024

Looking back at my experiments, I originally hit a small regression (0.09% in -g builds) by using DenseSet, with the overall 0.13% improvement coming when the reserve call was added. So I guess it's the reserves.

SmallPtrSet doesn't have a reserve method right now, but if you reckon it's a better fit here then we can add one.

@nikic
Copy link
Contributor

nikic commented Jul 5, 2024

SmallPtrSet doesn't have a reserve method right now, but if you reckon it's a better fit here then we can add one.

That's surprising! We should definitely add one...

@jmorse
Copy link
Member Author

jmorse commented Jul 16, 2024

Added a SmallPtrSet reserve method and switched the relevant containers back to SmallPtrSet. I stuck it on the compile-time-tracker and got the same speedup: http://llvm-compile-time-tracker.com/compare.php?from=00c622e596f918d9d83674b58097c8982ae1af95&to=ab267cc31f01dcc088c4839a4fadd3eca92b902f&stat=instructions%3Au

@jmorse
Copy link
Member Author

jmorse commented Jul 24, 2024

le ping

Also adjust two off-by-ones in the reserve implementation:
 * We should be willing to fill the small array up to 100%
 * The test for reallocating a big array happens before the "last" element
   that we reserve for is inserted, so apply the filter off-by-one.
@jmorse
Copy link
Member Author

jmorse commented Jul 30, 2024

I've added some unit tests -- they're pretty basic as SmallPtrSet doesn't expose a means of working out if it's reallocated, and it doesn't look like existing LLVM unit tests go to great effort examining internal state. At the least they'll ensure reserve exists, and doesn't do anything silly.

I've also twiddled some off-by-ones: we fill the Small array up to 100% full, so shouldn't reallocate if we reserve the same number of elements as the small array size. I've also tried to ensure the big-array reallocation will only occur iff insert_imp_big would reallocate: it performs its size test before it completes an insertion, thus I've made the reserve logic off-by-one from the true number of elements to be inserted. Hopefully this isn't making it too complicated?

Copy link

github-actions bot commented Jul 30, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 00c622e596f918d9d83674b58097c8982ae1af95 eb71b58690afe6f8b150214e5ccf570169732df4 --extensions h,cpp -- llvm/include/llvm/ADT/SmallPtrSet.h llvm/include/llvm/Support/GenericIteratedDominanceFrontier.h llvm/unittests/ADT/SmallPtrSetTest.cpp
View the diff from clang-format here.
diff --git a/llvm/unittests/ADT/SmallPtrSetTest.cpp b/llvm/unittests/ADT/SmallPtrSetTest.cpp
index 1d9a0d1725..da872601a6 100644
--- a/llvm/unittests/ADT/SmallPtrSetTest.cpp
+++ b/llvm/unittests/ADT/SmallPtrSetTest.cpp
@@ -455,5 +455,6 @@ TEST(SmallPtrSetTest, Reserve) {
   Set.reserve(0);
   EXPECT_EQ(Set.capacity(), 128u);
   EXPECT_EQ(Set.size(), 6u);
-  EXPECT_THAT(Set, UnorderedElementsAre(&Vals[0], &Vals[1], &Vals[2], &Vals[3], &Vals[4], &Vals[5]));
+  EXPECT_THAT(Set, UnorderedElementsAre(&Vals[0], &Vals[1], &Vals[2], &Vals[3],
+                                        &Vals[4], &Vals[5]));
 }

@kuhar
Copy link
Member

kuhar commented Jul 30, 2024

I've added some unit tests -- they're pretty basic as SmallPtrSet doesn't expose a means of working out if it's reallocated, and it doesn't look like existing LLVM unit tests go to great effort examining internal state. At the least they'll ensure reserve exists, and doesn't do anything silly.

Oh, I'd expect it to expose something similar to .capacity()

This involves adding a filter for zero-sized reserves, which break the
off-by-one size test I'd added earlier. To maintain consistency with
insert_imp_big, the first allocation always jumps to 128 elements.
@jmorse
Copy link
Member Author

jmorse commented Jul 30, 2024

Added a capacity method as it's extremely simple and added some more test coverage. As a result, I've tweaked reserve to always allocate 128 elements whenever it grows from a small size, to match the behaviour of insert_imp_big. Also a filter for zero-sized reserves that was going to mess with some of the logic.

Co-authored-by: Jakub Kuderski <[email protected]>
@jmorse jmorse merged commit 7195572 into llvm:main Aug 8, 2024
6 of 7 checks passed
@jmorse
Copy link
Member Author

jmorse commented Aug 8, 2024

Thanks for the reviews!

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 8, 2024

LLVM Buildbot has detected a new failure on builder flang-aarch64-libcxx running on linaro-flang-aarch64-libcxx while building llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/89/builds/3664

Here is the relevant piece of the build log for the reference:

Step 5 (build-unified-tree) failure: build (failure)
...
5.762 [6751/67/295] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/benchmark_api_internal.cc.o
5.808 [6751/66/296] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/benchmark_runner.cc.o
5.826 [6748/68/297] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/benchmark.cc.o
5.828 [6748/67/298] Linking CXX shared library lib/libFortranDecimal.so.20.0git
5.832 [6748/66/299] Linking CXX executable bin/llvm-PerfectShuffle
5.841 [6748/65/300] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/benchmark_register.cc.o
5.891 [6748/64/301] Linking CXX static library lib/libDynamicLibraryLib.a
5.986 [6748/63/302] Linking CXX shared module unittests/Support/DynamicLibrary/SecondLib.so
5.993 [6748/62/303] Linking CXX shared module unittests/Support/DynamicLibrary/PipSqueak.so
6.183 [6748/61/304] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/SmallPtrSet.cpp.o
FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/SmallPtrSet.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/lib/Support -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/llvm/lib/Support -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/llvm/include -stdlib=libc++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections  -O3 -DNDEBUG -fPIC -UNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/SmallPtrSet.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/SmallPtrSet.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/SmallPtrSet.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/llvm/lib/Support/SmallPtrSet.cpp
In file included from ../llvm-project/llvm/lib/Support/SmallPtrSet.cpp:14:
../llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:130:20: error: no member named 'max' in namespace 'std'; did you mean 'fmax'?
  130 |     NewSize = std::max(128u, NewSize);
      |               ~~~~~^~~
      |                    fmax
/usr/local/clang+llvm-17.0.6-aarch64-linux-gnu/bin/../include/c++/v1/cmath:435:9: note: 'fmax' declared here
  435 | using ::fmax _LIBCPP_USING_IF_EXISTS;
      |         ^
1 error generated.
6.365 [6748/60/305] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/counter.cc.o
6.376 [6748/59/306] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/perf_counters.cc.o
6.379 [6748/58/307] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/json_reporter.cc.o
6.381 [6748/57/308] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark_main.dir/benchmark_main.cc.o
6.396 [6748/56/309] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/statistics.cc.o
6.397 [6748/55/310] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/reporter.cc.o
6.399 [6748/54/311] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/string_util.cc.o
6.429 [6748/53/312] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/csv_reporter.cc.o
6.431 [6748/52/313] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/timers.cc.o
6.433 [6748/51/314] Building CXX object third-party/benchmark/src/CMakeFiles/benchmark.dir/sysinfo.cc.o
7.085 [6748/50/315] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/Debug.cpp.o
8.396 [6748/49/316] Building CXX object tools/mlir/lib/TableGen/CMakeFiles/MLIRTableGen.dir/GenInfo.cpp.o
8.478 [6748/48/317] Building CXX object lib/TableGen/CMakeFiles/LLVMTableGen.dir/TableGenBackendSkeleton.cpp.o
8.587 [6748/47/318] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/Statistic.cpp.o
8.641 [6748/46/319] Building CXX object lib/TableGen/CMakeFiles/LLVMTableGen.dir/TableGenBackend.cpp.o
8.716 [6748/45/320] Building CXX object tools/clang/utils/TableGen/CMakeFiles/clang-tblgen.dir/ClangDataCollectorsEmitter.cpp.o
8.815 [6748/44/321] Building CXX object utils/TableGen/CMakeFiles/llvm-min-tblgen.dir/Attributes.cpp.o
8.879 [6748/43/322] Building CXX object tools/clang/utils/TableGen/CMakeFiles/clang-tblgen.dir/ClangCommentHTMLTagsEmitter.cpp.o
8.958 [6748/42/323] Building CXX object utils/TableGen/CMakeFiles/llvm-min-tblgen.dir/VTEmitter.cpp.o
8.997 [6748/41/324] Building CXX object utils/TableGen/CMakeFiles/llvm-min-tblgen.dir/TableGen.cpp.o
9.087 [6748/40/325] Building CXX object tools/clang/utils/TableGen/CMakeFiles/clang-tblgen.dir/ClangTypeNodesEmitter.cpp.o
9.205 [6748/39/326] Building CXX object tools/clang/utils/TableGen/CMakeFiles/clang-tblgen.dir/ClangCommentHTMLNamedCharacterReferenceEmitter.cpp.o
9.291 [6748/38/327] Building CXX object tools/reduce-chunk-list/CMakeFiles/reduce-chunk-list.dir/reduce-chunk-list.cpp.o
9.293 [6748/37/328] Building CXX object tools/mlir/lib/TableGen/CMakeFiles/MLIRTableGen.dir/AttrOrTypeDef.cpp.o
9.438 [6748/36/329] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/TypeSize.cpp.o
9.502 [6748/35/330] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/Timer.cpp.o
9.513 [6748/34/331] Building CXX object tools/mlir/lib/TableGen/CMakeFiles/MLIRTableGen.dir/Predicate.cpp.o
9.633 [6748/33/332] Building CXX object utils/TableGen/CMakeFiles/llvm-min-tblgen.dir/ARMTargetDefEmitter.cpp.o

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants