[Clang][perf-training] Do build of libLLVMSupport for perf training #111625

Merged

tstellar merged 2 commits into llvm:main on Nov 8, 2024

Conversation

tstellar (Collaborator) commented Oct 9, 2024

This adds a build of libLLVMSupport to the lit suite that is used for generating profile data. This helps improve both PGO and BOLT optimization of clang over the existing hello-world training program.

I considered building all of LLVM instead of just libLLVMSupport, but there is only a marginal performance increase for PGO-only builds when training with a build of all of LLVM, and I didn't think that was enough to justify the increased build time, given that this is the default configuration.

The benchmark[1] I ran showed that using libLLVMSupport for training gives a 1.35 +- 0.02 speed-up for clang optimized with PGO + BOLT, vs. just a 1.05 +- 0.01 speed-up when training with hello world.

For comparison, training with a full LLVM build gave a speed-up of 1.35 +- 0.1.

Raw data:

| PGO Training | BOLT Training | Speed Up | Error Range |
| ------------ | ------------- | -------- | ----------- |
| LLVM Support | LLVM Support  | 1.35     | 0.02        |
| LLVM All     | LLVM All      | 1.34     | 0.01        |
| LLVM Support | Hello World   | 1.29     | 0.02        |
| LLVM All     | PGO-ONLY      | 1.27     | 0.02        |
| LLVM Support | PGO-ONLY      | 1.22     | 0.02        |
| Hello World  | Hello World   | 1.05     | 0.01        |
| Hello World  | PGO-ONLY      | 1.03     | 0.01        |

Time it takes to generate profile data (on a 64-core system):

| Training Data | PGO   | BOLT  |
| ------------- | ----- | ----- |
| LLVM All      | 1090s | 3239s |
| LLVM Support  | 91s   | 655s  |
| Hello World   | 2s    | 9s    |

[1] Benchmark was compiling SemaDecl.cpp
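
For anyone who wants to exercise this training data end to end, the usual entry point is a two-stage PGO + BOLT build of clang driven by a CMake cache. The sketch below is not part of this PR; the cache file name (`clang/cmake/caches/BOLT-PGO.cmake`) and the `stage2-clang-bolt` target are assumptions based on the in-tree advanced-build caches and may differ between LLVM releases.

```
# Hedged sketch, not from this PR: configure a two-stage PGO + BOLT build of
# clang that runs the perf-training lit suite this patch extends.
# The cache file and final target name are assumptions and may vary by release.
cmake -G Ninja -S llvm -B build \
    -C clang/cmake/caches/BOLT-PGO.cmake
ninja -C build stage2-clang-bolt
```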

@tstellar tstellar requested review from aaupov and llvm-beanz October 9, 2024 03:59
@llvmbot llvmbot added the clang label Oct 9, 2024
llvmbot (Member) commented Oct 9, 2024

@llvm/pr-subscribers-bolt

@llvm/pr-subscribers-clang

Author: Tom Stellard (tstellar)



Full diff: https://github.com/llvm/llvm-project/pull/111625.diff

5 Files Affected:

  • (modified) clang/utils/perf-training/bolt.lit.cfg (+3)
  • (modified) clang/utils/perf-training/bolt.lit.site.cfg.in (+3)
  • (modified) clang/utils/perf-training/lit.cfg (+5-1)
  • (modified) clang/utils/perf-training/lit.site.cfg.in (+3)
  • (added) clang/utils/perf-training/llvm-support/build.test (+2)
diff --git a/clang/utils/perf-training/bolt.lit.cfg b/clang/utils/perf-training/bolt.lit.cfg
index 0e81a5501e9fcf..1d0cf9a8a17a8e 100644
--- a/clang/utils/perf-training/bolt.lit.cfg
+++ b/clang/utils/perf-training/bolt.lit.cfg
@@ -49,3 +49,6 @@ config.substitutions.append(("%clang_cpp", f" {config.clang} --driver-mode=g++ "
 config.substitutions.append(("%clang_skip_driver", config.clang))
 config.substitutions.append(("%clang", config.clang))
 config.substitutions.append(("%test_root", config.test_exec_root))
+config.substitutions.append(('%cmake_generator', config.cmake_generator))
+config.substitutions.append(('%cmake', config.cmake_exe))
+config.substitutions.append(('%llvm_src_dir', config.llvm_src_dir))
diff --git a/clang/utils/perf-training/bolt.lit.site.cfg.in b/clang/utils/perf-training/bolt.lit.site.cfg.in
index 54de12701c1ae9..3de5026e4792ae 100644
--- a/clang/utils/perf-training/bolt.lit.site.cfg.in
+++ b/clang/utils/perf-training/bolt.lit.site.cfg.in
@@ -11,6 +11,9 @@ config.python_exe = "@Python3_EXECUTABLE@"
 config.clang_obj_root = path(r"@CLANG_BINARY_DIR@")
 config.clang_bolt_mode = "@CLANG_BOLT@"
 config.clang_bolt_name = "@CLANG_BOLT_INSTRUMENTED@"
+config.cmake_exe = "@CMAKE_COMMAND@"
+config.llvm_src_dir ="@CMAKE_SOURCE_DIR@"
+config.cmake_generator ="@CMAKE_GENERATOR@"
 
 # Let the main config do the real work.
 lit_config.load_config(config, "@CLANG_SOURCE_DIR@/utils/perf-training/bolt.lit.cfg")
diff --git a/clang/utils/perf-training/lit.cfg b/clang/utils/perf-training/lit.cfg
index 0bd06c0d44f650..b4527c602fc484 100644
--- a/clang/utils/perf-training/lit.cfg
+++ b/clang/utils/perf-training/lit.cfg
@@ -34,8 +34,12 @@ config.test_format = lit.formats.ShTest(use_lit_shell == "0")
 config.substitutions.append( ('%clang_cpp_skip_driver', ' %s %s %s ' % (cc1_wrapper, config.clang, sysroot_flags)))
 config.substitutions.append( ('%clang_cpp', ' %s --driver-mode=g++ %s ' % (config.clang, sysroot_flags)))
 config.substitutions.append( ('%clang_skip_driver', ' %s %s %s ' % (cc1_wrapper, config.clang, sysroot_flags)))
-config.substitutions.append( ('%clang', ' %s %s ' % (config.clang, sysroot_flags) ) )
+config.substitutions.append( ('%clang', '%s %s ' % (config.clang, sysroot_flags) ) )
 config.substitutions.append( ('%test_root', config.test_exec_root ) )
+config.substitutions.append( ('%cmake_generator', config.cmake_generator ) )
+config.substitutions.append( ('%cmake', config.cmake_exe ) )
+config.substitutions.append( ('%llvm_src_dir', config.llvm_src_dir ) )
 
+print(config.substitutions)
 config.environment['LLVM_PROFILE_FILE'] = 'perf-training-%4m.profraw'
 
diff --git a/clang/utils/perf-training/lit.site.cfg.in b/clang/utils/perf-training/lit.site.cfg.in
index fae93065a4edf2..9d279d552919ac 100644
--- a/clang/utils/perf-training/lit.site.cfg.in
+++ b/clang/utils/perf-training/lit.site.cfg.in
@@ -8,6 +8,9 @@ config.test_exec_root = "@CMAKE_CURRENT_BINARY_DIR@"
 config.test_source_root = "@CLANG_PGO_TRAINING_DATA@"
 config.target_triple = "@LLVM_TARGET_TRIPLE@"
 config.python_exe = "@Python3_EXECUTABLE@"
+config.cmake_exe = "@CMAKE_COMMAND@"
+config.llvm_src_dir ="@CMAKE_SOURCE_DIR@"
+config.cmake_generator ="@CMAKE_GENERATOR@"
 
 # Let the main config do the real work.
 lit_config.load_config(config, "@CLANG_SOURCE_DIR@/utils/perf-training/lit.cfg")
diff --git a/clang/utils/perf-training/llvm-support/build.test b/clang/utils/perf-training/llvm-support/build.test
new file mode 100644
index 00000000000000..f29a594c846869
--- /dev/null
+++ b/clang/utils/perf-training/llvm-support/build.test
@@ -0,0 +1,2 @@
+RUN: %cmake -G %cmake_generator -B %t -S %llvm_src_dir -DCMAKE_C_COMPILER=%clang -DCMAKE_CXX_COMPILER=%clang -DCMAKE_CXX_FLAGS="--driver-mode=g++" -DCMAKE_BUILD_TYPE=Release
+RUN: %cmake --build %t -v --target LLVMSupport
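
To make the new substitutions concrete: once lit fills in %cmake, %cmake_generator, %llvm_src_dir, and %clang from the site configs above, the two RUN lines in build.test expand to roughly the following (the <...> placeholders are illustrative, not values taken from this PR):

```
# Hypothetical expansion of build.test; <...> are placeholders.
<cmake> -G <generator> -B <test-tmp-dir> -S <llvm_src_dir> \
    -DCMAKE_C_COMPILER=<profiling-clang> -DCMAKE_CXX_COMPILER=<profiling-clang> \
    -DCMAKE_CXX_FLAGS="--driver-mode=g++" -DCMAKE_BUILD_TYPE=Release
<cmake> --build <test-tmp-dir> -v --target LLVMSupport
```

In other words, the clang being profiled configures the LLVM source tree in Release mode and builds only the LLVMSupport target, so the profile data is collected from those compiler invocations.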

@tstellar tstellar added the BOLT label Oct 9, 2024
boomanaiden154 (Contributor) left a comment
This looks very interesting!

Thanks for taking the time to collect all the numbers. It definitely seems like collecting proper profiles for BOLT is something that we want to do, at least for the CI compiler, given the numbers here.

I'm assuming you used instrumented BOLT here? Also, do you have numbers on how long the perf training took?

As an aside, we have some work in the pipeline to bring bigger self-hosted runners to GitHub Actions. Once we get that going, I'm hoping to simplify the CI compiler build (unify the builds into one stage), and we should have some extra build time to do things like perf-training for BOLT on all of LLVM/libLLVMSupport.

tstellar (Collaborator, Author) commented Oct 9, 2024

> I'm assuming you used instrumented BOLT here? Also, do you have numbers on how long the perf training took?

Yes, it was instrumented BOLT. I've added the perf training times to the commit summary.

boomanaiden154 (Contributor) left a comment

LGTM.

Adding this to the default perf training I think makes quite a bit of sense given the numbers reported. Not sure what thoughts others have on that though.

tstellar (Collaborator, Author) commented

Any other comments on this one? If not, I think I'll merge it after the dev meeting.

llvm-beanz (Collaborator) left a comment

Love it!

@tstellar tstellar merged commit 7382509 into llvm:main Nov 8, 2024
8 checks passed
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
…lvm#111625)
