Commit fa48dbd

Merge from 'main' to 'sycl-web' (116 commits)
CONFLICT (content): Merge conflict in clang/lib/CodeGen/CodeGenAction.cpp
CONFLICT (content): Merge conflict in llvm/lib/IR/LLVMContext.cpp
2 parents 8237b4b + 89d8df1 commit fa48dbd

384 files changed (+57433, -32060 lines)

bolt/docs/OptimizingLinux.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# Optimizing Linux Kernel with BOLT

## Introduction

Many Linux applications spend a significant amount of their execution time in the kernel. Thus, when we consider code optimization for system performance, it is essential to improve CPU utilization not only in user-space applications and libraries but also in the kernel. BOLT has demonstrated double-digit gains when applied to user-space programs. This guide shows how to apply BOLT to the x86-64 Linux kernel and enhance your system's performance. In our experiments, BOLT boosted database TPS by 2 percent when applied to a kernel compiled with the highest-level optimizations, including PGO and LTO. The database spent ~40% of its time in the kernel and was quite sensitive to kernel performance.

BOLT optimizes code layout based on a low-level execution profile collected with the Linux `perf` tool. The best quality profile should include branch history, such as Intel's last branch records (LBR). BOLT runs on a linked binary and reorders the code while combining frequently executed blocks of instructions in a manner best suited for the hardware. Other than branch instructions, most of the code is left unchanged. Additionally, BOLT updates all metadata associated with the modified code, including DWARF debug information and Linux ORC unwind information.

While BOLT optimizations are not specific to the Linux kernel, certain quirks distinguish the kernel from user-level applications.

BOLT has been successfully applied to and tested with several flavors of the x86-64 Linux kernel.

## QuickStart Guide

BOLT operates on a statically linked kernel executable, a.k.a. the `vmlinux` binary. However, most Linux distributions use a compressed `vmlinuz` image for system booting. To use BOLT on the kernel, you must either repackage `vmlinuz` after BOLT optimizations or add steps for running BOLT to the kernel build and rebuild `vmlinuz`. Uncompressing `vmlinuz` and repackaging it with a new `vmlinux` binary is beyond the scope of this guide; at some point, we may add the capability to run BOLT directly on `vmlinuz`. Meanwhile, this guide focuses on the steps for integrating BOLT into the kernel build process.

### Building the Kernel

After downloading the kernel sources and configuration for your distribution, you should be able to build `vmlinuz` using the `make bzImage` command. Ideally, the kernel should be a binary match for the kernel on the system you are about to optimize (the target system). Binary matching is critical because BOLT performs profile matching and optimizations at the binary level. We recommend installing a freshly built kernel on the target system to avoid any discrepancies.

Note that the kernel build will produce several artifacts besides `bzImage`. The most important of them is the uncompressed `vmlinux` binary, which will be used in the next steps. Make sure to save this file.

Build and target systems should have the `perf` tool installed for collecting and processing profiles. If your build system differs from the target, make sure the `perf` versions are compatible. The build system should also have the latest BOLT binary and tools (`llvm-bolt`, `perf2bolt`).

Once the target system boots with the freshly built kernel, start your workload, such as a database benchmark. While the system is under load, collect the kernel profile using `perf`:

```bash
$ sudo perf record -a -e cycles -j any,k -F 5000 -- sleep 600
```

Convert the `perf` profile into a format suitable for BOLT by passing the `vmlinux` binary to `perf2bolt`:

```bash
$ sudo chown $USER perf.data
$ perf2bolt -p perf.data -o perf.fdata vmlinux
```

Under a high load, `perf.data` should be several gigabytes in size, and you should expect the converted `perf.fdata` not to exceed 100 MB.

The kernel build needs two changes. The first is optional but highly recommended: it reserves space for BOLT in the `vmlinux` code section:

```diff
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -139,6 +139,11 @@ SECTIONS
                 STATIC_CALL_TEXT
                 *(.gnu.warning)

+                /* Allocate space for BOLT */
+                __bolt_reserved_start = .;
+                . += 2048 * 1024;
+                __bolt_reserved_end = .;
+
 #ifdef CONFIG_RETPOLINE
                 __indirect_thunk_start = .;
                 *(.text.__x86.*)
```

The second patch adds a step that runs BOLT on the `vmlinux` binary:

```diff
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -340,5 +340,13 @@ if is_enabled CONFIG_KALLSYMS; then
         fi
 fi

+# Apply BOLT
+BOLT=llvm-bolt
+BOLT_PROFILE=perf.fdata
+BOLT_OPTS="--dyno-stats --eliminate-unreachable=0 --reorder-blocks=ext-tsp --simplify-conditional-tail-calls=0 --skip-funcs=__entry_text_start,irq_entries_start --split-functions"
+mv vmlinux vmlinux.pre-bolt
+echo BOLTing vmlinux
+${BOLT} vmlinux.pre-bolt -o vmlinux --data ${BOLT_PROFILE} ${BOLT_OPTS}
+
 # For fixdep
 echo "vmlinux: $0" > .vmlinux.d
```

If you skipped the first patch (the BOLT-reserved space) or are running BOLT on a pre-built `vmlinux` binary, drop the `--split-functions` option.
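
The choice between the two configurations can be captured in a small helper. The sketch below is only an illustration: the function name `bolt_command` and its parameters are hypothetical, while the flags and file names are copied verbatim from the `link-vmlinux.sh` patch in this guide.

```python
import shlex

# Flags taken from the link-vmlinux.sh patch; --split-functions is added
# only when the kernel was linked with the BOLT-reserved space.
BASE_OPTS = [
    "--dyno-stats",
    "--eliminate-unreachable=0",
    "--reorder-blocks=ext-tsp",
    "--simplify-conditional-tail-calls=0",
    "--skip-funcs=__entry_text_start,irq_entries_start",
]


def bolt_command(has_reserved_space, profile="perf.fdata",
                 src="vmlinux.pre-bolt", dst="vmlinux"):
    """Build the llvm-bolt argument list (hypothetical helper)."""
    opts = list(BASE_OPTS)
    if has_reserved_space:
        opts.append("--split-functions")
    return ["llvm-bolt", src, "-o", dst, "--data", profile, *opts]


if __name__ == "__main__":
    # Pre-built vmlinux without the linker-script patch: no function splitting.
    print(shlex.join(bolt_command(has_reserved_space=False)))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.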

## Performance Expectations

By improving the code layout, BOLT reduces instruction cache misses and branch mispredictions, which can boost the kernel's performance by up to 5%. When measuring total system performance, scale this number by the fraction of time your application spends in the kernel (excluding I/O time).
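
As a rough worked example of that scaling, the sketch below assumes a simple Amdahl-style model in which only the kernel share of run time gets faster. The function name `system_gain` and the model itself are assumptions for illustration, not part of BOLT.

```python
def system_gain(kernel_fraction, kernel_gain):
    """Estimate the overall throughput gain from a kernel-only speedup.

    kernel_fraction: share of run time spent in the kernel (0.0 to 1.0),
                     excluding I/O wait.
    kernel_gain:     relative kernel speedup, e.g. 0.05 for 5%.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if kernel_gain <= -1.0:
        raise ValueError("kernel_gain must be greater than -1")
    # User-space time is unchanged; kernel time shrinks by 1 / (1 + gain).
    new_time = (1.0 - kernel_fraction) + kernel_fraction / (1.0 + kernel_gain)
    return 1.0 / new_time - 1.0


if __name__ == "__main__":
    print(f"{system_gain(0.40, 0.05):.2%}")
```

With 40% of the time spent in the kernel and a 5% kernel speedup, this model gives an overall gain of about 1.9%, in line with the 2 percent TPS improvement quoted in the introduction.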

## Profile Quality

The timing and duration of the profiling may have a significant effect on the performance of the BOLTed kernel. If you don't know your workload well, we recommend profiling for the whole duration of the benchmark run. Because longer runs produce larger `perf.data` files, you can lower the profiling frequency by passing a smaller value to the `-F` flag. For example, to record the kernel profile for half an hour, use the following command:

```bash
$ sudo perf record -a -e cycles -j any,k -F 1000 -- sleep 1800
```
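
To reason about the trade-off between duration and frequency, it can help to estimate the sample budget. The sketch below assumes that `-F` sets a target sampling rate per CPU, so rate times duration is an upper bound on the samples each CPU contributes; the helper names are hypothetical.

```python
def max_samples_per_cpu(freq_hz: int, seconds: int) -> int:
    """Upper bound on samples one CPU can contribute at the target rate."""
    return freq_hz * seconds


def freq_for_budget(sample_budget: int, seconds: int) -> int:
    """Largest whole -F value that keeps a run within the sample budget."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return sample_budget // seconds


if __name__ == "__main__":
    quick = max_samples_per_cpu(5000, 600)      # 10-minute run from the QuickStart
    long_run = max_samples_per_cpu(1000, 1800)  # half-hour run shown above
    print(quick, long_run)
    # Rate that would keep a half-hour run at the 10-minute run's budget.
    print(freq_for_budget(quick, 1800))
```

Under this assumption, the half-hour run at `-F 1000` collects at most 1,800,000 samples per CPU, versus 3,000,000 for the ten-minute run at `-F 5000`. The actual `perf.data` size also depends on system load and on the branch records attached to each sample.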

## BOLT Disassembly

BOLT annotates the disassembly with control-flow information and attaches Linux-specific metadata to the code. To view the annotated disassembly, run:

```bash
$ llvm-bolt vmlinux -o /dev/null --print-cfg
```

If you want to limit the disassembly to a set of functions, add `--print-only=<func1regex>,<func2regex>,...`, where each function name is specified as a regular expression.
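
Assembling the `--print-only` argument by hand is error-prone once several patterns are involved. The following sketch builds the command from a list of patterns; the helper name `print_cfg_command` and the example patterns are hypothetical, and the syntax check uses Python's `re` module, which may not match BOLT's regex dialect exactly.

```python
import re
import shlex


def print_cfg_command(binary, patterns=()):
    """Build an llvm-bolt command that prints the annotated CFG."""
    for pattern in patterns:
        if "," in pattern:
            # Commas separate patterns in --print-only, so reject them here.
            raise ValueError(f"pattern must not contain a comma: {pattern!r}")
        re.compile(pattern)  # Early syntax check only (Python dialect).
    cmd = ["llvm-bolt", binary, "-o", "/dev/null", "--print-cfg"]
    if patterns:
        cmd.append("--print-only=" + ",".join(patterns))
    return cmd


if __name__ == "__main__":
    # Example patterns are placeholders; substitute functions of interest.
    print(shlex.join(print_cfg_command("vmlinux", ["tcp_.*", "schedule"])))
```

`shlex.join` quotes the patterns so the printed command can be pasted into a shell without the regular expressions being expanded as globs.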

clang-tools-extra/clang-doc/tool/CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -25,7 +25,7 @@ set(assets
 )

 set(asset_dir "${CMAKE_CURRENT_SOURCE_DIR}/../assets")
-set(resource_dir "${CMAKE_BINARY_DIR}/share/clang")
+set(resource_dir "${CMAKE_BINARY_DIR}/share/clang-doc")
 set(out_files)

 function(copy_files_to_dst src_dir dst_dir file)
@@ -42,7 +42,7 @@ endfunction(copy_files_to_dst)

 foreach(f ${assets})
   install(FILES ${asset_dir}/${f}
-    DESTINATION "${CMAKE_INSTALL_DATADIR}/clang"
+    DESTINATION "${CMAKE_INSTALL_DATADIR}/clang-doc"
     COMPONENT clang-doc)
   copy_files_to_dst(${asset_dir} ${resource_dir} ${f})
 endforeach(f)

clang-tools-extra/clang-doc/tool/ClangDocMain.cpp

Lines changed: 1 addition & 1 deletion
@@ -188,7 +188,7 @@ Example usage for a project using a compile commands database:
     llvm::sys::path::native(ClangDocPath, NativeClangDocPath);
     llvm::SmallString<128> AssetsPath;
     AssetsPath = llvm::sys::path::parent_path(NativeClangDocPath);
-    llvm::sys::path::append(AssetsPath, "..", "share", "clang");
+    llvm::sys::path::append(AssetsPath, "..", "share", "clang-doc");
     llvm::SmallString<128> DefaultStylesheet;
     llvm::sys::path::native(AssetsPath, DefaultStylesheet);
     llvm::sys::path::append(DefaultStylesheet,

clang-tools-extra/clang-tidy/tool/run-clang-tidy.py

Lines changed: 15 additions & 12 deletions
@@ -261,20 +261,20 @@ def main():
     parser.add_argument(
         "-allow-enabling-alpha-checkers",
         action="store_true",
-        help="allow alpha checkers from clang-analyzer.",
+        help="Allow alpha checkers from clang-analyzer.",
     )
     parser.add_argument(
-        "-clang-tidy-binary", metavar="PATH", help="path to clang-tidy binary"
+        "-clang-tidy-binary", metavar="PATH", help="Path to clang-tidy binary."
     )
     parser.add_argument(
         "-clang-apply-replacements-binary",
         metavar="PATH",
-        help="path to clang-apply-replacements binary",
+        help="Path to clang-apply-replacements binary.",
     )
     parser.add_argument(
         "-checks",
         default=None,
-        help="checks filter, when not specified, use clang-tidy default",
+        help="Checks filter, when not specified, use clang-tidy default.",
     )
     config_group = parser.add_mutually_exclusive_group()
     config_group.add_argument(
@@ -307,7 +307,7 @@ def main():
     parser.add_argument(
         "-header-filter",
         default=None,
-        help="regular expression matching the names of the "
+        help="Regular expression matching the names of the "
         "headers to output diagnostics from. Diagnostics from "
         "the main file of each translation unit are always "
         "displayed.",
@@ -347,19 +347,22 @@ def main():
         "-j",
         type=int,
         default=0,
-        help="number of tidy instances to be run in parallel.",
+        help="Number of tidy instances to be run in parallel.",
     )
     parser.add_argument(
-        "files", nargs="*", default=[".*"], help="files to be processed (regex on path)"
+        "files",
+        nargs="*",
+        default=[".*"],
+        help="Files to be processed (regex on path).",
     )
-    parser.add_argument("-fix", action="store_true", help="apply fix-its")
+    parser.add_argument("-fix", action="store_true", help="apply fix-its.")
     parser.add_argument(
-        "-format", action="store_true", help="Reformat code after applying fixes"
+        "-format", action="store_true", help="Reformat code after applying fixes."
     )
     parser.add_argument(
         "-style",
         default="file",
-        help="The style of reformat code after applying fixes",
+        help="The style of reformat code after applying fixes.",
     )
     parser.add_argument(
         "-use-color",
@@ -388,7 +391,7 @@ def main():
         help="Additional argument to prepend to the compiler command line.",
     )
     parser.add_argument(
-        "-quiet", action="store_true", help="Run clang-tidy in quiet mode"
+        "-quiet", action="store_true", help="Run clang-tidy in quiet mode."
     )
     parser.add_argument(
         "-load",
@@ -400,7 +403,7 @@ def main():
     parser.add_argument(
         "-warnings-as-errors",
         default=None,
-        help="Upgrades warnings to errors. Same format as '-checks'",
+        help="Upgrades warnings to errors. Same format as '-checks'.",
     )
     args = parser.parse_args()

clang-tools-extra/clangd/Format.cpp

Lines changed: 1 addition & 1 deletion
@@ -281,7 +281,7 @@ formatIncremental(llvm::StringRef OriginalCode, unsigned OriginalCursor,
   // Never *remove* lines in response to pressing enter! This annoys users.
   if (InsertedText == "\n") {
     Style.MaxEmptyLinesToKeep = 1000;
-    Style.KeepEmptyLines.AtStartOfBlock = true;
+    Style.KeepEmptyLinesAtTheStartOfBlocks = true;
   }

   // Compute the code we want to format:

clang-tools-extra/test/clang-doc/Inputs/basic-project/src/Calculator.cpp

Lines changed: 0 additions & 4 deletions
@@ -1,5 +1,4 @@
 #include "Calculator.h"
-#include <stdexcept>

 int Calculator::add(int a, int b) {
   return a + b;
@@ -14,8 +13,5 @@ int Calculator::multiply(int a, int b) {
 }

 double Calculator::divide(int a, int b) {
-  if (b == 0) {
-    throw std::invalid_argument("Division by zero");
-  }
   return static_cast<double>(a) / b;
 }

clang-tools-extra/test/clang-doc/basic-project.test

Lines changed: 4 additions & 4 deletions
@@ -139,25 +139,25 @@
 // HTML-CALC-NEXT: <div>
 // HTML-CALC-NEXT: <h3 id="{{([0-9A-F]{40})}}">add</h3>
 // HTML-CALC-NEXT: <p>public int add(int a, int b)</p>
-// HTML-CALC-NEXT: <p>Defined at line 4 of file {{.*}}Calculator.cpp</p>
+// HTML-CALC-NEXT: <p>Defined at line 3 of file {{.*}}Calculator.cpp</p>
 // HTML-CALC-NEXT: <div>
 // HTML-CALC-NEXT: <div></div>
 // HTML-CALC-NEXT: </div>
 // HTML-CALC-NEXT: <h3 id="{{([0-9A-F]{40})}}">subtract</h3>
 // HTML-CALC-NEXT: <p>public int subtract(int a, int b)</p>
-// HTML-CALC-NEXT: <p>Defined at line 8 of file {{.*}}Calculator.cpp</p>
+// HTML-CALC-NEXT: <p>Defined at line 7 of file {{.*}}Calculator.cpp</p>
 // HTML-CALC-NEXT: <div>
 // HTML-CALC-NEXT: <div></div>
 // HTML-CALC-NEXT: </div>
 // HTML-CALC-NEXT: <h3 id="{{([0-9A-F]{40})}}">multiply</h3>
 // HTML-CALC-NEXT: <p>public int multiply(int a, int b)</p>
-// HTML-CALC-NEXT: <p>Defined at line 12 of file {{.*}}Calculator.cpp</p>
+// HTML-CALC-NEXT: <p>Defined at line 11 of file {{.*}}Calculator.cpp</p>
 // HTML-CALC-NEXT: <div>
 // HTML-CALC-NEXT: <div></div>
 // HTML-CALC-NEXT: </div>
 // HTML-CALC-NEXT: <h3 id="{{([0-9A-F]{40})}}">divide</h3>
 // HTML-CALC-NEXT: <p>public double divide(int a, int b)</p>
-// HTML-CALC-NEXT: <p>Defined at line 16 of file {{.*}}Calculator.cpp</p>
+// HTML-CALC-NEXT: <p>Defined at line 15 of file {{.*}}Calculator.cpp</p>
 // HTML-CALC-NEXT: <div>
 // HTML-CALC-NEXT: <div></div>
 // HTML-CALC-NEXT: </div>

clang/docs/ClangFormatStyleOptions.rst

Lines changed: 10 additions & 38 deletions
@@ -4443,51 +4443,23 @@ the configuration (without a prefix: ``Auto``).
      false:
         import {VeryLongImportsAreAnnoying, VeryLongImportsAreAnnoying, VeryLongImportsAreAnnoying,} from "some/module.js"

-.. _KeepEmptyLines:
-
-**KeepEmptyLines** (``KeepEmptyLinesStyle``) :versionbadge:`clang-format 19` :ref:`<KeepEmptyLines>`
-  Which empty lines are kept. See ``MaxEmptyLinesToKeep`` for how many
-  consecutive empty lines are kept.
-
-  Nested configuration flags:
-
-  Options regarding which empty lines are kept.
-
-  For example, the config below will remove empty lines at start of the
-  file, end of the file, and start of blocks.
-
-
-  .. code-block:: c++
-
-    KeepEmptyLines:
-      AtEndOfFile: false
-      AtStartOfBlock: false
-      AtStartOfFile: false
-
-  * ``bool AtEndOfFile`` Keep empty lines at end of file.
-
-  * ``bool AtStartOfBlock`` Keep empty lines at start of a block.
-
-    .. code-block:: c++
-
-       true:                                  false:
-       if (foo) {                     vs.     if (foo) {
-                                                bar();
-         bar();                               }
-       }
-
-  * ``bool AtStartOfFile`` Keep empty lines at start of file.
-
-
 .. _KeepEmptyLinesAtEOF:

 **KeepEmptyLinesAtEOF** (``Boolean``) :versionbadge:`clang-format 17` :ref:`<KeepEmptyLinesAtEOF>`
-  This option is deprecated. See ``AtEndOfFile`` of ``KeepEmptyLines``.
+  Keep empty lines (up to ``MaxEmptyLinesToKeep``) at end of file.

 .. _KeepEmptyLinesAtTheStartOfBlocks:

 **KeepEmptyLinesAtTheStartOfBlocks** (``Boolean``) :versionbadge:`clang-format 3.7` :ref:`<KeepEmptyLinesAtTheStartOfBlocks>`
-  This option is deprecated. See ``AtStartOfBlock`` of ``KeepEmptyLines``.
+  If true, the empty line at the start of blocks is kept.
+
+  .. code-block:: c++
+
+     true:                                  false:
+     if (foo) {                     vs.     if (foo) {
+                                              bar();
+       bar();                               }
+     }

 .. _LambdaBodyIndentation:
