[Clang][perf-training] Do build of libLLVMSupport for perf training #111625

Merged

tstellar merged 2 commits into llvm:main on Nov 8, 2024

Conversation

tstellar (Collaborator) commented Oct 9, 2024

This adds a build of libLLVMSupport to the lit suite that is used for generating profile data. This helps improve both PGO and BOLT optimization of clang over the existing hello-world training program.

I considered building all of LLVM instead of just libLLVMSupport, but there is only a marginal performance increase for PGO-only builds when training with a build of all of LLVM, and I didn't think that was enough to justify the increased build time, given that this is the default configuration.

The benchmark[1] I ran showed that using libLLVMSupport for training gives a 1.35 +- 0.02 speed-up for clang optimized with PGO + BOLT, vs. just a 1.05 +- 0.01 speed-up when training with hello world.

For comparison, training with a full LLVM build gave a speed-up of 1.35 +- 0.1.

Raw data:

| PGO Training | BOLT Training | Speed Up | Error Range |
| ------------ | ------------- | -------- | ----------- |
| LLVM Support | LLVM Support  | 1.35     | 0.02        |
| LLVM All     | LLVM All      | 1.34     | 0.01        |
| LLVM Support | Hello World   | 1.29     | 0.02        |
| LLVM All     | PGO-ONLY      | 1.27     | 0.02        |
| LLVM Support | PGO-ONLY      | 1.22     | 0.02        |
| Hello World  | Hello World   | 1.05     | 0.01        |
| Hello World  | PGO-ONLY      | 1.03     | 0.01        |

Time it takes to generate profile data (on a 64-core system):

| Training Data | PGO   | BOLT  |
| ------------- | ----- | ----- |
| LLVM All      | 1090s | 3239s |
| LLVM Support  | 91s   | 655s  |
| Hello World   | 2s    | 9s    |

[1] Benchmark was compiling SemaDecl.cpp
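
For anyone who wants to exercise this training data end to end, the usual entry point is a two-stage PGO + BOLT build of clang driven by a CMake cache. The sketch below is not part of this PR; the cache file name (`clang/cmake/caches/BOLT-PGO.cmake`) and the `stage2-clang-bolt` target are assumptions based on the in-tree advanced-build caches and may differ between LLVM releases.

```
# Hedged sketch, not from this PR: configure a two-stage PGO + BOLT build of
# clang that runs the perf-training lit suite this patch extends.
# The cache file and final target name are assumptions and may vary by release.
cmake -G Ninja -S llvm -B build \
    -C clang/cmake/caches/BOLT-PGO.cmake
ninja -C build stage2-clang-bolt
```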

@tstellar tstellar requested review from aaupov and llvm-beanz October 9, 2024 03:59
@llvmbot llvmbot added the clang label Oct 9, 2024
llvmbot (Member) commented Oct 9, 2024

@llvm/pr-subscribers-bolt

@llvm/pr-subscribers-clang

Author: Tom Stellard (tstellar)



Full diff: https://github.com/llvm/llvm-project/pull/111625.diff

5 Files Affected:

  • (modified) clang/utils/perf-training/bolt.lit.cfg (+3)
  • (modified) clang/utils/perf-training/bolt.lit.site.cfg.in (+3)
  • (modified) clang/utils/perf-training/lit.cfg (+5-1)
  • (modified) clang/utils/perf-training/lit.site.cfg.in (+3)
  • (added) clang/utils/perf-training/llvm-support/build.test (+2)
diff --git a/clang/utils/perf-training/bolt.lit.cfg b/clang/utils/perf-training/bolt.lit.cfg
index 0e81a5501e9fcf..1d0cf9a8a17a8e 100644
--- a/clang/utils/perf-training/bolt.lit.cfg
+++ b/clang/utils/perf-training/bolt.lit.cfg
@@ -49,3 +49,6 @@ config.substitutions.append(("%clang_cpp", f" {config.clang} --driver-mode=g++ "
 config.substitutions.append(("%clang_skip_driver", config.clang))
 config.substitutions.append(("%clang", config.clang))
 config.substitutions.append(("%test_root", config.test_exec_root))
+config.substitutions.append(('%cmake_generator', config.cmake_generator))
+config.substitutions.append(('%cmake', config.cmake_exe))
+config.substitutions.append(('%llvm_src_dir', config.llvm_src_dir))
diff --git a/clang/utils/perf-training/bolt.lit.site.cfg.in b/clang/utils/perf-training/bolt.lit.site.cfg.in
index 54de12701c1ae9..3de5026e4792ae 100644
--- a/clang/utils/perf-training/bolt.lit.site.cfg.in
+++ b/clang/utils/perf-training/bolt.lit.site.cfg.in
@@ -11,6 +11,9 @@ config.python_exe = "@Python3_EXECUTABLE@"
 config.clang_obj_root = path(r"@CLANG_BINARY_DIR@")
 config.clang_bolt_mode = "@CLANG_BOLT@"
 config.clang_bolt_name = "@CLANG_BOLT_INSTRUMENTED@"
+config.cmake_exe = "@CMAKE_COMMAND@"
+config.llvm_src_dir ="@CMAKE_SOURCE_DIR@"
+config.cmake_generator ="@CMAKE_GENERATOR@"
 
 # Let the main config do the real work.
 lit_config.load_config(config, "@CLANG_SOURCE_DIR@/utils/perf-training/bolt.lit.cfg")
diff --git a/clang/utils/perf-training/lit.cfg b/clang/utils/perf-training/lit.cfg
index 0bd06c0d44f650..b4527c602fc484 100644
--- a/clang/utils/perf-training/lit.cfg
+++ b/clang/utils/perf-training/lit.cfg
@@ -34,8 +34,12 @@ config.test_format = lit.formats.ShTest(use_lit_shell == "0")
 config.substitutions.append( ('%clang_cpp_skip_driver', ' %s %s %s ' % (cc1_wrapper, config.clang, sysroot_flags)))
 config.substitutions.append( ('%clang_cpp', ' %s --driver-mode=g++ %s ' % (config.clang, sysroot_flags)))
 config.substitutions.append( ('%clang_skip_driver', ' %s %s %s ' % (cc1_wrapper, config.clang, sysroot_flags)))
-config.substitutions.append( ('%clang', ' %s %s ' % (config.clang, sysroot_flags) ) )
+config.substitutions.append( ('%clang', '%s %s ' % (config.clang, sysroot_flags) ) )
 config.substitutions.append( ('%test_root', config.test_exec_root ) )
+config.substitutions.append( ('%cmake_generator', config.cmake_generator ) )
+config.substitutions.append( ('%cmake', config.cmake_exe ) )
+config.substitutions.append( ('%llvm_src_dir', config.llvm_src_dir ) )
 
+print(config.substitutions)
 config.environment['LLVM_PROFILE_FILE'] = 'perf-training-%4m.profraw'
 
diff --git a/clang/utils/perf-training/lit.site.cfg.in b/clang/utils/perf-training/lit.site.cfg.in
index fae93065a4edf2..9d279d552919ac 100644
--- a/clang/utils/perf-training/lit.site.cfg.in
+++ b/clang/utils/perf-training/lit.site.cfg.in
@@ -8,6 +8,9 @@ config.test_exec_root = "@CMAKE_CURRENT_BINARY_DIR@"
 config.test_source_root = "@CLANG_PGO_TRAINING_DATA@"
 config.target_triple = "@LLVM_TARGET_TRIPLE@"
 config.python_exe = "@Python3_EXECUTABLE@"
+config.cmake_exe = "@CMAKE_COMMAND@"
+config.llvm_src_dir ="@CMAKE_SOURCE_DIR@"
+config.cmake_generator ="@CMAKE_GENERATOR@"
 
 # Let the main config do the real work.
 lit_config.load_config(config, "@CLANG_SOURCE_DIR@/utils/perf-training/lit.cfg")
diff --git a/clang/utils/perf-training/llvm-support/build.test b/clang/utils/perf-training/llvm-support/build.test
new file mode 100644
index 00000000000000..f29a594c846869
--- /dev/null
+++ b/clang/utils/perf-training/llvm-support/build.test
@@ -0,0 +1,2 @@
+RUN: %cmake -G %cmake_generator -B %t -S %llvm_src_dir -DCMAKE_C_COMPILER=%clang -DCMAKE_CXX_COMPILER=%clang -DCMAKE_CXX_FLAGS="--driver-mode=g++" -DCMAKE_BUILD_TYPE=Release
+RUN: %cmake --build %t -v --target LLVMSupport
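
To make the new substitutions concrete: once lit fills in %cmake, %cmake_generator, %llvm_src_dir, and %clang from the site configs above, the two RUN lines in build.test expand to roughly the following (the <...> placeholders are illustrative, not values taken from this PR):

```
# Hypothetical expansion of build.test; <...> are placeholders.
<cmake> -G <generator> -B <test-tmp-dir> -S <llvm_src_dir> \
    -DCMAKE_C_COMPILER=<profiling-clang> -DCMAKE_CXX_COMPILER=<profiling-clang> \
    -DCMAKE_CXX_FLAGS="--driver-mode=g++" -DCMAKE_BUILD_TYPE=Release
<cmake> --build <test-tmp-dir> -v --target LLVMSupport
```

In other words, the clang being profiled configures the LLVM source tree in Release mode and builds only the LLVMSupport target, so the profile data is collected from those compiler invocations.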

@tstellar tstellar added the BOLT label Oct 9, 2024
boomanaiden154 (Contributor) left a comment
This looks very interesting!

Thanks for taking the time to collect all the numbers. It definitely seems like collecting proper profiles for BOLT is something that we want to do, at least for the CI compiler, given the numbers here.

I'm assuming you used instrumented BOLT here? Also, do you have numbers on how long the perf training took?

As an aside, we have some work in the pipeline to bring bigger self-hosted runners to GitHub Actions. Once we get that going, I'm hoping to simplify the CI compiler build (unify the builds into one stage), and we should have some extra build time to do things like perf-training for BOLT on all of LLVM/libLLVMSupport.

tstellar (Collaborator, Author) commented Oct 9, 2024

> I'm assuming you used instrumented BOLT here? Also, do you have numbers on how long the perf training took?

Yes, it was instrumented BOLT. I've added the perf training times to the commit summary.

boomanaiden154 (Contributor) left a comment

LGTM.

Adding this to the default perf training I think makes quite a bit of sense given the numbers reported. Not sure what thoughts others have on that though.

tstellar (Collaborator, Author) commented

Any other comments on this one? If not, I think I'll merge it after the dev meeting.

llvm-beanz (Collaborator) left a comment

Love it!

@tstellar tstellar merged commit 7382509 into llvm:main Nov 8, 2024
8 checks passed
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
…lvm#111625)
