[Clang][perf-training] Do build of libLLVMSupport for perf training #111625
This adds a build of libLLVMSupport to the lit suite that is used for generating profile data. This helps to improve both PGO and BOLT optimization of clang over the existing hello world training program.

I considered building all of LLVM instead of just libLLVMSupport, but there is only a marginal increase in performance for PGO-only builds when training with a build of all of LLVM, and I didn't think it was enough to justify the increased build times given that it is the default configuration.

The benchmark[1] I did showed that using libLLVMSupport for training gives a 1.35 +- 0.02 speed up for clang optimized with PGO + BOLT vs just 1.05 +- 0.01 speed up when training with hello world. For comparison, training with a full LLVM build gave a speed up of 1.34 +- 0.01.

Raw data:

| PGO Training | BOLT Training | Speed Up | Error Range |
| ------------ | ------------- | -------- | ----------- |
| LLVM Support | LLVM Support  | 1.35     | 0.02        |
| LLVM All     | LLVM All      | 1.34     | 0.01        |
| LLVM Support | Hello World   | 1.29     | 0.02        |
| LLVM All     | PGO-ONLY      | 1.27     | 0.02        |
| LLVM Support | PGO-ONLY      | 1.22     | 0.02        |
| Hello World  | Hello World   | 1.05     | 0.01        |
| Hello World  | PGO-ONLY      | 1.03     | 0.01        |

[1] Benchmark was compiling SemaDecl.cpp
@llvm/pr-subscribers-bolt @llvm/pr-subscribers-clang
Author: Tom Stellard (tstellar)
Full diff: https://github.com/llvm/llvm-project/pull/111625.diff
5 Files Affected:
diff --git a/clang/utils/perf-training/bolt.lit.cfg b/clang/utils/perf-training/bolt.lit.cfg
index 0e81a5501e9fcf..1d0cf9a8a17a8e 100644
--- a/clang/utils/perf-training/bolt.lit.cfg
+++ b/clang/utils/perf-training/bolt.lit.cfg
@@ -49,3 +49,6 @@ config.substitutions.append(("%clang_cpp", f" {config.clang} --driver-mode=g++ "
config.substitutions.append(("%clang_skip_driver", config.clang))
config.substitutions.append(("%clang", config.clang))
config.substitutions.append(("%test_root", config.test_exec_root))
+config.substitutions.append(('%cmake_generator', config.cmake_generator))
+config.substitutions.append(('%cmake', config.cmake_exe))
+config.substitutions.append(('%llvm_src_dir', config.llvm_src_dir))
diff --git a/clang/utils/perf-training/bolt.lit.site.cfg.in b/clang/utils/perf-training/bolt.lit.site.cfg.in
index 54de12701c1ae9..3de5026e4792ae 100644
--- a/clang/utils/perf-training/bolt.lit.site.cfg.in
+++ b/clang/utils/perf-training/bolt.lit.site.cfg.in
@@ -11,6 +11,9 @@ config.python_exe = "@Python3_EXECUTABLE@"
config.clang_obj_root = path(r"@CLANG_BINARY_DIR@")
config.clang_bolt_mode = "@CLANG_BOLT@"
config.clang_bolt_name = "@CLANG_BOLT_INSTRUMENTED@"
+config.cmake_exe = "@CMAKE_COMMAND@"
+config.llvm_src_dir ="@CMAKE_SOURCE_DIR@"
+config.cmake_generator ="@CMAKE_GENERATOR@"
# Let the main config do the real work.
lit_config.load_config(config, "@CLANG_SOURCE_DIR@/utils/perf-training/bolt.lit.cfg")
diff --git a/clang/utils/perf-training/lit.cfg b/clang/utils/perf-training/lit.cfg
index 0bd06c0d44f650..b4527c602fc484 100644
--- a/clang/utils/perf-training/lit.cfg
+++ b/clang/utils/perf-training/lit.cfg
@@ -34,8 +34,12 @@ config.test_format = lit.formats.ShTest(use_lit_shell == "0")
config.substitutions.append( ('%clang_cpp_skip_driver', ' %s %s %s ' % (cc1_wrapper, config.clang, sysroot_flags)))
config.substitutions.append( ('%clang_cpp', ' %s --driver-mode=g++ %s ' % (config.clang, sysroot_flags)))
config.substitutions.append( ('%clang_skip_driver', ' %s %s %s ' % (cc1_wrapper, config.clang, sysroot_flags)))
-config.substitutions.append( ('%clang', ' %s %s ' % (config.clang, sysroot_flags) ) )
+config.substitutions.append( ('%clang', '%s %s ' % (config.clang, sysroot_flags) ) )
config.substitutions.append( ('%test_root', config.test_exec_root ) )
+config.substitutions.append( ('%cmake_generator', config.cmake_generator ) )
+config.substitutions.append( ('%cmake', config.cmake_exe ) )
+config.substitutions.append( ('%llvm_src_dir', config.llvm_src_dir ) )
+print(config.substitutions)
config.environment['LLVM_PROFILE_FILE'] = 'perf-training-%4m.profraw'
diff --git a/clang/utils/perf-training/lit.site.cfg.in b/clang/utils/perf-training/lit.site.cfg.in
index fae93065a4edf2..9d279d552919ac 100644
--- a/clang/utils/perf-training/lit.site.cfg.in
+++ b/clang/utils/perf-training/lit.site.cfg.in
@@ -8,6 +8,9 @@ config.test_exec_root = "@CMAKE_CURRENT_BINARY_DIR@"
config.test_source_root = "@CLANG_PGO_TRAINING_DATA@"
config.target_triple = "@LLVM_TARGET_TRIPLE@"
config.python_exe = "@Python3_EXECUTABLE@"
+config.cmake_exe = "@CMAKE_COMMAND@"
+config.llvm_src_dir ="@CMAKE_SOURCE_DIR@"
+config.cmake_generator ="@CMAKE_GENERATOR@"
# Let the main config do the real work.
lit_config.load_config(config, "@CLANG_SOURCE_DIR@/utils/perf-training/lit.cfg")
diff --git a/clang/utils/perf-training/llvm-support/build.test b/clang/utils/perf-training/llvm-support/build.test
new file mode 100644
index 00000000000000..f29a594c846869
--- /dev/null
+++ b/clang/utils/perf-training/llvm-support/build.test
@@ -0,0 +1,2 @@
+RUN: %cmake -G %cmake_generator -B %t -S %llvm_src_dir -DCMAKE_C_COMPILER=%clang -DCMAKE_CXX_COMPILER=%clang -DCMAKE_CXX_FLAGS="--driver-mode=g++" -DCMAKE_BUILD_TYPE=Release
+RUN: %cmake --build %t -v --target LLVMSupport
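A note on the order of the new substitutions: as far as I understand, lit applies `config.substitutions` in list order, which would explain why `%cmake_generator` is appended before `%cmake` in both config files. If the shorter key came first, it would also rewrite the prefix of the longer one. The sketch below models this with plain string replacement. It is a simplified illustration, not lit's actual implementation, and the `expand` helper and all paths are hypothetical.

```python
def expand(command, substitutions):
    """Apply (key, value) pairs in list order using plain string replacement."""
    for key, value in substitutions:
        command = command.replace(key, value)
    return command


# Hypothetical values; the real ones are filled in from the lit.site.cfg.in files.
CMAKE = "/usr/bin/cmake"
GENERATOR = "Ninja"
SRC_DIR = "/src/llvm-project/llvm"

# Shortened form of the RUN line in build.test.
# %t is left unexpanded here because lit supplies it per test.
run_line = "%cmake -G %cmake_generator -B %t -S %llvm_src_dir"

# Order used in the patch: the longer key is registered first.
patch_order = [
    ("%cmake_generator", GENERATOR),
    ("%cmake", CMAKE),
    ("%llvm_src_dir", SRC_DIR),
]

# Reversed order: "%cmake" also matches the start of "%cmake_generator".
wrong_order = [
    ("%cmake", CMAKE),
    ("%cmake_generator", GENERATOR),
    ("%llvm_src_dir", SRC_DIR),
]

good = expand(run_line, patch_order)
bad = expand(run_line, wrong_order)
print(good)
print(bad)
```

With the reversed list, the generator argument ends up as the cmake path with `_generator` appended, so the configure step would fail.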
This looks very interesting!
Thanks for taking the time to collect all the numbers. It definitely seems like collecting proper profiles for BOLT is something that we want to do, at least for the CI compiler, given the numbers here.
I'm assuming you used instrumented BOLT here? Also, do you have numbers on how long the perf training took?
As an aside, we have some stuff in the pipeline to bring bigger self-hosted runners to GitHub Actions. Once we get that going, I'm hoping to simplify the CI compiler build (unify the builds into one stage), and we should have some extra build time for things like perf-training for BOLT on all of LLVM/libLLVMSupport.
Yes, it was instrumented BOLT. I've added the perf training times to the commit summary.
LGTM.
Adding this to the default perf training I think makes quite a bit of sense given the numbers reported. Not sure what thoughts others have on that though.
Any other comments on this one? If not, I think I'll merge it after the dev meeting.
Love it!
…lvm#111625)

This adds a build of libLLVMSupport to the lit suite that is used for generating profile data. This helps to improve both PGO and BOLT optimization of clang over the existing hello world training program.

I considered building all of LLVM instead of just libLLVMSupport, but there is only a marginal increase in performance for PGO-only builds when training with a build of all of LLVM, and I didn't think it was enough to justify the increased build times given that it is the default configuration.

The benchmark[1] I did showed that using libLLVMSupport for training gives a 1.35 +- 0.02 speed up for clang optimized with PGO + BOLT vs just 1.05 +- 0.01 speed up when training with hello world. For comparison, training with a full LLVM build gave a speed up of 1.34 +- 0.01.

Raw data:

| PGO Training | BOLT Training | Speed Up | Error Range |
| ------------ | ------------- | -------- | ----------- |
| LLVM Support | LLVM Support  | 1.35     | 0.02        |
| LLVM All     | LLVM All      | 1.34     | 0.01        |
| LLVM Support | Hello World   | 1.29     | 0.02        |
| LLVM All     | PGO-ONLY      | 1.27     | 0.02        |
| LLVM Support | PGO-ONLY      | 1.22     | 0.02        |
| Hello World  | Hello World   | 1.05     | 0.01        |
| Hello World  | PGO-ONLY      | 1.03     | 0.01        |

Time it takes to generate profile data (on a 64-core system):

| Training Data | PGO   | BOLT  |
| ------------- | ----- | ----- |
| LLVM All      | 1090s | 3239s |
| LLVM Support  | 91s   | 655s  |
| Hello World   | 2s    | 9s    |

[1] Benchmark was compiling SemaDecl.cpp
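To make the cost/benefit trade-off in the two tables concrete, here is a small sketch that sets total training time against the measured speedup. All figures are copied from the tables above. Adding the PGO and BOLT times together is my own assumption (that the two training passes run one after the other), and the speedups are taken from the rows where PGO and BOLT use the same training data.

```python
# Training times in seconds on a 64-core system, from the commit message.
training_seconds = {
    "LLVM All": {"pgo": 1090, "bolt": 3239},
    "LLVM Support": {"pgo": 91, "bolt": 655},
    "Hello World": {"pgo": 2, "bolt": 9},
}

# Speedups for PGO + BOLT with matching training data, from the raw data table.
speedup_pgo_bolt = {
    "LLVM All": 1.34,
    "LLVM Support": 1.35,
    "Hello World": 1.05,
}


def total_seconds(name):
    """Sum of PGO and BOLT training time (assumes the passes run sequentially)."""
    times = training_seconds[name]
    return times["pgo"] + times["bolt"]


totals = {name: total_seconds(name) for name in training_seconds}
for name, secs in totals.items():
    print(f"{name:12s} {secs:5d}s  speedup {speedup_pgo_bolt[name]:.2f}")

ratio = totals["LLVM All"] / totals["LLVM Support"]
print(f"LLVM All needs {ratio:.1f}x the training time of LLVM Support")
```

Under that assumption, training on all of LLVM takes roughly six times as long as training on libLLVMSupport while the PGO + BOLT speedup stays within the reported error range, which is in line with the reasoning given for the default.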