Commit f20a633

xur-llvm authored and Avenger-285714 committed
kbuild: Add Propeller configuration for kernel build
[ Upstream commit d5dc958 ]

Add the build support for using Clang's Propeller optimizer. Like
AutoFDO, Propeller uses hardware sampling to gather information about
the frequency of execution of different code paths within a binary.
This information is then used to guide the compiler's optimization
decisions, resulting in a more efficient binary.

The support requires Clang from LLVM 19 or later, and the
create_llvm_prof tool
(https://github.com/google/autofdo/releases/tag/v0.30.1).
This commit is limited to x86 platforms that support PMU features
like LBR on Intel machines and AMD Zen3 BRS.

Here is an example workflow for building an AutoFDO+Propeller
optimized kernel:

1) Build the kernel on the host machine, with AutoFDO and Propeller
   build config
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   then
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>

   "<autofdo_profile>" is the profile collected when doing a
   non-Propeller AutoFDO build. This step builds a kernel that has the
   same optimization level as AutoFDO, plus a metadata section that
   records basic block information. This kernel image runs as fast as
   an AutoFDO optimized kernel.

2) Install the kernel on test/production machines.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number, like
   500009, for this purpose.

   For Intel platforms:
      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
          -o <perf_file> -- <loadtest>

   For AMD platforms:
   The supported systems are Zen3 with BRS, or Zen4 with amd_lbr_v2.
      # To see if Zen3 supports LBR:
      $ cat /proc/cpuinfo | grep " brs"
      # To see if Zen4 supports LBR:
      $ cat /proc/cpuinfo | grep amd_lbr_v2
      # If the result is yes, then collect the profile using:
      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
          -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) Generate Propeller profile:
      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
          --format=propeller --propeller_output_module_name \
          --out=<propeller_profile_prefix>_cc_profile.txt \
          --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "create_llvm_prof" is the profile conversion tool, and a prebuilt
   binary for Linux can be found at
   https://github.com/google/autofdo/releases/tag/v0.30.1 (it can also
   be built from source).

   "<propeller_profile_prefix>" can be something like
   "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

6) Rebuild the kernel using the AutoFDO and Propeller profile files.
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   and
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
          CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>

Co-developed-by: Han Shen <[email protected]>
Signed-off-by: Han Shen <[email protected]>
Signed-off-by: Rong Xu <[email protected]>
Suggested-by: Sriraman Tallam <[email protected]>
Suggested-by: Krzysztof Pszeniczny <[email protected]>
Suggested-by: Nick Desaulniers <[email protected]>
Suggested-by: Stephane Eranian <[email protected]>
Tested-by: Yonghong Song <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
[ Backport from v6.13 ]
Signed-off-by: WangYuli <[email protected]>
1 parent ada3544 commit f20a633
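
The platform check in step 3 of the commit message could be scripted roughly as below. This is only a sketch: the function name `propeller_perf_event` is hypothetical, and it assumes that any CPU reporting `GenuineIntel` has usable LBR support, which the commit message does not guarantee. It prints the event option for `perf record` and does not run perf itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): print the perf event option
# from step 3, based on a cpuinfo-style file (default: /proc/cpuinfo).
propeller_perf_event() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep -q 'GenuineIntel' "$cpuinfo"; then
        # Assumption: an Intel CPU is taken to support LBR.
        echo "-e BR_INST_RETIRED.NEAR_TAKEN:k"
    elif grep -Eq '[[:space:]](brs|amd_lbr_v2)([[:space:]]|$)' "$cpuinfo"; then
        # Zen3 with BRS, or Zen4 with amd_lbr_v2.
        echo "--pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k"
    else
        echo "no supported branch-sampling feature found" >&2
        return 1
    fi
}
```

A caller might then use `perf record $(propeller_perf_event) -a -N -b -c 500009 -o <perf_file> -- <loadtest>`, with the placeholders filled in as in the commit message.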

11 files changed: +237 -3 lines changed

Documentation/dev-tools/index.rst

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ Documentation/dev-tools/testing-overview.rst
3535
kunit/index
3636
ktap
3737
autofdo
38+
propeller
3839

3940

4041
.. only:: subproject and html

Documentation/dev-tools/propeller.rst

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
=====================================
4+
Using Propeller with the Linux kernel
5+
=====================================
6+
7+
This enables Propeller build support for the kernel when using Clang
8+
compiler. Propeller is a profile-guided optimization (PGO) method used
9+
to optimize binary executables. Like AutoFDO, it utilizes hardware
10+
sampling to gather information about the frequency of execution of
11+
different code paths within a binary. Unlike AutoFDO, this information
12+
is then used right before linking phase to optimize (among others)
13+
block layout within and across functions.
14+
15+
A few important notes about adopting Propeller optimization:
16+
17+
#. Although it can be used as a standalone optimization step, it is
18+
strongly recommended to apply Propeller on top of AutoFDO,
19+
AutoFDO+ThinLTO or Instrument FDO. The rest of this document
20+
assumes this paradigm.
21+
22+
#. Propeller uses another round of profiling on top of
23+
AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
24+
"build-afdo - train-afdo - build-propeller - train-propeller -
25+
build-optimized".
26+
27+
#. Propeller requires LLVM 19 release or later for Clang/Clang++
28+
and the linker(ld.lld).
29+
30+
#. In addition to LLVM toolchain, Propeller requires a profiling
31+
conversion tool: https://github.com/google/autofdo with a release
32+
after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.
33+
34+
The Propeller optimization process involves the following steps:
35+
36+
#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
37+
you would normally do, but with a set of compile-time / link-time
38+
flags, so that a special metadata section is created within the
39+
kernel binary. The special section is only intended to be used by the
40+
profiling tool, it is not part of the runtime image, nor does it
41+
change kernel run time text sections.
42+
43+
#. Profiling: The above kernel is then run with a representative
44+
workload to gather execution frequency data. This data is collected
45+
using hardware sampling, via perf. Propeller is most effective on
46+
platforms supporting advanced PMU features like LBR on Intel
47+
machines. This step is the same as profiling the kernel for AutoFDO
48+
(the exact perf parameters can be different).
49+
50+
#. Propeller profile generation: Perf output file is converted to a
51+
pair of Propeller profiles via an offline tool.
52+
53+
#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
54+
binary as you would normally do, but with a compile-time /
55+
link-time flag to pick up the Propeller compile time and link time
56+
profiles. This build step uses 3 profiles - the AutoFDO profile,
57+
the Propeller compile-time profile and the Propeller link-time
58+
profile.
59+
60+
#. Deployment: The optimized kernel binary is deployed and used
61+
in production environments, providing improved performance
62+
and reduced latency.
63+
64+
Preparation
65+
===========
66+
67+
Configure the kernel with::
68+
69+
CONFIG_AUTOFDO_CLANG=y
70+
CONFIG_PROPELLER_CLANG=y
71+
72+
Customization
73+
=============
74+
75+
The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
76+
for Propeller builds. One can, however, enable or disable Propeller build
77+
for individual files and directories by adding a line similar to the
78+
following to the respective kernel Makefile:
79+
80+
- For enabling a single file (e.g. foo.o)::
81+
82+
PROPELLER_PROFILE_foo.o := y
83+
84+
- For enabling all files in one directory::
85+
86+
PROPELLER_PROFILE := y
87+
88+
- For disabling one file::
89+
90+
PROPELLER_PROFILE_foo.o := n
91+
92+
- For disabling all files in one directory::
93+
94+
PROPELLER_PROFILE := n
95+
96+
97+
Workflow
98+
========
99+
100+
Here is an example workflow for building an AutoFDO+Propeller kernel:
101+
102+
1) Assuming an AutoFDO profile is already collected following
103+
instructions in the AutoFDO document, build the kernel on the host
104+
machine, with AutoFDO and Propeller build configs ::
105+
106+
CONFIG_AUTOFDO_CLANG=y
107+
CONFIG_PROPELLER_CLANG=y
108+
109+
and ::
110+
111+
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>
112+
113+
2) Install the kernel on the test machine.
114+
115+
3) Run the load tests. The '-c' option in perf specifies the sample
116+
event period. We suggest using a suitable prime number, like 500009,
117+
for this purpose.
118+
119+
- For Intel platforms::
120+
121+
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
122+
123+
- For AMD platforms::
124+
125+
$ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
126+
127+
Note you can repeat the above steps to collect multiple <perf_file>s.
128+
129+
4) (Optional) Download the raw perf file(s) to the host machine.
130+
131+
5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
132+
generate Propeller profile. ::
133+
134+
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file>
135+
--format=propeller --propeller_output_module_name
136+
--out=<propeller_profile_prefix>_cc_profile.txt
137+
--propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
138+
139+
"<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".
140+
141+
This command generates a pair of Propeller profiles:
142+
"<propeller_profile_prefix>_cc_profile.txt" and
143+
"<propeller_profile_prefix>_ld_profile.txt".
144+
145+
If there are more than 1 perf_file collected in the previous step,
146+
you can create a temp list file "<perf_file_list>" with each line
147+
containing one perf file name and run::
148+
149+
$ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list>
150+
--format=propeller --propeller_output_module_name
151+
--out=<propeller_profile_prefix>_cc_profile.txt
152+
--propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
153+
154+
6) Rebuild the kernel using the AutoFDO and Propeller
155+
profiles. ::
156+
157+
CONFIG_AUTOFDO_CLANG=y
158+
CONFIG_PROPELLER_CLANG=y
159+
160+
and ::
161+
162+
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
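
Step 5 of the workflow above, for the case with several perf files, might be scripted along these lines. This is a sketch under stated assumptions: the function name `propeller_profile_cmd` and its argument order are hypothetical, the command is only printed (not executed), and paths containing spaces are not quoted in the printed line. The create_llvm_prof flags are the ones shown in the document.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): write one perf file name per
# line into a list file, then print the create_llvm_prof command from step 5.
# Usage: propeller_profile_cmd <vmlinux> <prefix> <list_file> <perf_file>...
propeller_profile_cmd() {
    # Require at least one perf file after the three fixed arguments.
    [ "$#" -ge 4 ] || { echo "need at least one perf file" >&2; return 1; }
    vmlinux="$1"; prefix="$2"; list="$3"
    shift 3
    printf '%s\n' "$@" > "$list"
    printf '%s\n' "create_llvm_prof --binary=$vmlinux --profile=@$list --format=propeller --propeller_output_module_name --out=${prefix}_cc_profile.txt --propeller_symorder=${prefix}_ld_profile.txt"
}
```

Printing the command rather than running it makes it easy to review the generated file names before the conversion starts.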

MAINTAINERS

Lines changed: 7 additions & 0 deletions
@@ -17257,6 +17257,13 @@ S: Maintained
1725717257
F: include/linux/psi*
1725817258
F: kernel/sched/psi.c
1725917259

17260+
PROPELLER BUILD
17261+
M: Rong Xu <[email protected]>
17262+
M: Han Shen <[email protected]>
17263+
S: Supported
17264+
F: Documentation/dev-tools/propeller.rst
17265+
F: scripts/Makefile.propeller
17266+
1726017267
PRINTK
1726117268
M: Petr Mladek <[email protected]>
1726217269
R: Steven Rostedt <[email protected]>

Makefile

Lines changed: 1 addition & 0 deletions
@@ -1023,6 +1023,7 @@ include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
10231023
include-$(CONFIG_KCOV) += scripts/Makefile.kcov
10241024
include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
10251025
include-$(CONFIG_AUTOFDO_CLANG) += scripts/Makefile.autofdo
1026+
include-$(CONFIG_PROPELLER_CLANG) += scripts/Makefile.propeller
10261027
include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
10271028

10281029
include $(addprefix $(srctree)/, $(include-y))

arch/Kconfig

Lines changed: 19 additions & 0 deletions
@@ -821,6 +821,25 @@ config AUTOFDO_CLANG
821821

822822
If unsure, say N.
823823

824+
config ARCH_SUPPORTS_PROPELLER_CLANG
825+
bool
826+
827+
config PROPELLER_CLANG
828+
bool "Enable Clang's Propeller build"
829+
depends on ARCH_SUPPORTS_PROPELLER_CLANG
830+
depends on CC_IS_CLANG && CLANG_VERSION >= 190000
831+
help
832+
This option enables Clang’s Propeller build. When the Propeller
833+
profiles are specified in variable CLANG_PROPELLER_PROFILE_PREFIX
834+
during the build process, Clang uses the profiles to optimize
835+
the kernel.
836+
837+
If no profile is specified, Propeller options are still passed
838+
to Clang to facilitate the collection of perf data for creating
839+
the Propeller profiles in subsequent builds.
840+
841+
If unsure, say N.
842+
824843
config ARCH_SUPPORTS_CFI_CLANG
825844
bool
826845
help

arch/x86/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -119,6 +119,7 @@ config X86
119119
select ARCH_SUPPORTS_LTO_CLANG
120120
select ARCH_SUPPORTS_LTO_CLANG_THIN
121121
select ARCH_SUPPORTS_AUTOFDO_CLANG
122+
select ARCH_SUPPORTS_PROPELLER_CLANG if X86_64
122123
select ARCH_USE_BUILTIN_BSWAP
123124
select ARCH_USE_MEMTEST
124125
select ARCH_USE_QUEUED_RWLOCKS

arch/x86/kernel/vmlinux.lds.S

Lines changed: 4 additions & 0 deletions
@@ -487,6 +487,10 @@ SECTIONS
487487

488488
STABS_DEBUG
489489
DWARF_DEBUG
490+
#ifdef CONFIG_PROPELLER_CLANG
491+
.llvm_bb_addr_map : { *(.llvm_bb_addr_map) }
492+
#endif
493+
490494
ELF_DETAILS
491495

492496
DISCARDS

include/asm-generic/vmlinux.lds.h

Lines changed: 3 additions & 3 deletions
@@ -93,14 +93,14 @@
9393
* With LTO_CLANG, the linker also splits sections by default, so we need
9494
* these macros to combine the sections during the final link.
9595
*
96-
* With AUTOFDO_CLANG, by default, the linker splits text sections and
97-
* regroups functions into subsections.
96+
* With AUTOFDO_CLANG and PROPELLER_CLANG, by default, the linker splits
97+
* text sections and regroups functions into subsections.
9898
*
9999
* RODATA_MAIN is not used because existing code already defines .rodata.x
100100
* sections to be brought in with rodata.
101101
*/
102102
#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
103-
defined(CONFIG_AUTOFDO_CLANG)
103+
defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
104104
#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
105105
#else
106106
#define TEXT_MAIN .text
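
To see which section names the extended TEXT_MAIN pattern picks up once PROPELLER_CLANG is enabled, the pattern can be approximated with a shell glob. This is only an illustration: linker-script wildcards are shell-like but handled by the linker, the function name `matches_text_main` is hypothetical, and the section names in the usage note are made-up examples.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): succeed if a section name
# matches ".text" or ".text.[0-9a-zA-Z_]*", approximated as a shell glob.
# The body runs in a subshell so the LC_ALL change stays local.
matches_text_main() (
    LC_ALL=C   # keep bracket ranges byte-wise
    case "$1" in
        .text|.text.[0-9a-zA-Z_]*) true ;;
        *) false ;;
    esac
)
```

For example, `matches_text_main .text.foo` succeeds, while `matches_text_main .text..dotted` fails because the character after `.text.` must be alphanumeric or an underscore.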

scripts/Makefile.lib

Lines changed: 10 additions & 0 deletions
@@ -210,6 +210,16 @@ _c_flags += $(if $(patsubst n%,, \
210210
$(CFLAGS_AUTOFDO_CLANG))
211211
endif
212212

213+
#
214+
# Enable Propeller build flags except some files or directories we don't want to
215+
# enable (depends on variables AUTOFDO_PROPELLER_obj.o and PROPELLER_PROFILE).
216+
#
217+
ifdef CONFIG_PROPELLER_CLANG
218+
_c_flags += $(if $(patsubst n%,, \
219+
$(AUTOFDO_PROFILE_$(target-stem).o)$(AUTOFDO_PROFILE)$(PROPELLER_PROFILE))$(is-kernel-object), \
220+
$(CFLAGS_PROPELLER_CLANG))
221+
endif
222+
213223
# $(srctree)/$(src) for including checkin headers from generated source files
214224
# $(objtree)/$(obj) for including generated headers from checkin source files
215225
ifeq ($(KBUILD_EXTMOD),)

scripts/Makefile.propeller

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
# SPDX-License-Identifier: GPL-2.0
2+
3+
# Enable available and selected Clang Propeller features.
4+
ifdef CLANG_PROPELLER_PROFILE_PREFIX
5+
CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=list=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt -ffunction-sections
6+
KBUILD_LDFLAGS += --symbol-ordering-file=$(CLANG_PROPELLER_PROFILE_PREFIX)_ld_profile.txt --no-warn-symbol-ordering
7+
else
8+
CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=labels
9+
endif
10+
11+
# Propeller requires debug information to embed module names in the profiles.
12+
# If CONFIG_DEBUG_INFO is not enabled, set -gmlt option. Skip this for AutoFDO,
13+
# as the option should already be set.
14+
ifndef CONFIG_DEBUG_INFO
15+
ifndef CONFIG_AUTOFDO_CLANG
16+
CFLAGS_PROPELLER_CLANG += -gmlt
17+
endif
18+
endif
19+
20+
ifdef CONFIG_LTO_CLANG_THIN
21+
ifdef CLANG_PROPELLER_PROFILE_PREFIX
22+
KBUILD_LDFLAGS += --lto-basic-block-sections=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt
23+
else
24+
KBUILD_LDFLAGS += --lto-basic-block-sections=labels
25+
endif
26+
endif
27+
28+
export CFLAGS_PROPELLER_CLANG
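
The two-phase flag selection in scripts/Makefile.propeller can be mirrored in shell to make the behavior easier to see. This is a sketch only: the function names are hypothetical, an empty argument stands in for an unset CLANG_PROPELLER_PROFILE_PREFIX, and the `-gmlt` and ThinLTO branches are left out.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the commit) mirroring the first ifdef
# block of scripts/Makefile.propeller. $1 is the profile prefix, or empty.
propeller_cflags() {
    if [ -n "$1" ]; then
        # Optimized build: consume the compile-time profile.
        printf '%s\n' "-fbasic-block-sections=list=${1}_cc_profile.txt -ffunction-sections"
    else
        # Profiling build: only emit basic block metadata.
        printf '%s\n' "-fbasic-block-sections=labels"
    fi
}

propeller_ldflags() {
    # Linker flags are only added when a profile prefix is given.
    if [ -n "$1" ]; then
        printf '%s\n' "--symbol-ordering-file=${1}_ld_profile.txt --no-warn-symbol-ordering"
    fi
}
```

With no prefix, only the `labels` compiler flag is produced, which corresponds to the first build in the workflow. With a prefix, both profile files are picked up.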

tools/objtool/check.c

Lines changed: 1 addition & 0 deletions
@@ -4595,6 +4595,7 @@ static int validate_ibt(struct objtool_file *file)
45954595
!strcmp(sec->name, "__mcount_loc") ||
45964596
!strcmp(sec->name, ".kcfi_traps") ||
45974597
!strcmp(sec->name, ".llvm.call-graph-profile") ||
4598+
!strcmp(sec->name, ".llvm_bb_addr_map") ||
45984599
strstr(sec->name, "__patchable_function_entries"))
45994600
continue;
46004601
