@@ -2410,20 +2410,35 @@ usual build cycle when using sample profilers for optimization:

1. Build the code with source line table information. You can use all the
usual build flags that you always build your application with. The only
- requirement is that you add ``-gline-tables-only`` or ``-g`` to the
- command line. This is important for the profiler to be able to map
- instructions back to source line locations.
+ requirement is that DWARF debug info, including source line information, is
+ generated. This DWARF information is important for the profiler to be able
+ to map instructions back to source line locations.
+
+ On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:

.. code-block:: console

$ clang++ -O2 -gline-tables-only code.cc -o code

+ It is also possible to include DWARF in Windows binaries by passing
+ ``-gdwarf`` to ``clang-cl`` and ``-debug:dwarf`` to the linker:
+
+ .. code-block:: console
+
+ $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+
2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
- into the format that the LLVM optimizer understands. Currently, there
- exists a conversion tool for the Linux Perf profiler
- (https://perf.wiki.kernel.org/), so these examples assume that you
- are using Linux Perf to profile your code.
+ into the format that the LLVM optimizer understands.
+
+ Two such profilers are the Linux Perf profiler
+ (https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
+ available as part of `Intel VTune
+ <https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
+
+ The LLVM tool ``llvm-profgen`` can convert the output of either Perf or SEP. An
+ external tool, AutoFDO, also supports Linux Perf output.
+
+ When using Perf:

.. code-block:: console

@@ -2434,6 +2449,15 @@ usual build cycle when using sample profilers for optimization:
it provides better call information, which improves the accuracy of
the profile data.

+ When using SEP:
+
+ .. code-block:: console
+
+ $ sep -start -ec BR_INST_RETIRED.NEAR_TAKEN:precise=yes:pdir -lbr no_filter:usr -perf-script ip,brstack -app ./code
+
+ This produces a ``perf.data.script`` file, which can be passed to
+ ``llvm-profgen`` through its ``--perfscript`` input option.
+
3. Convert the collected profile data to LLVM's sample profile format.
This is currently supported via the AutoFDO converter ``create_llvm_prof``.
It is available at https://github.com/google/autofdo. Once built and
@@ -2454,7 +2478,14 @@ usual build cycle when using sample profilers for optimization:

.. code-block:: console

- $ llvm-profgen --binary=./code --output=code.prof--perfdata=perf.data
+ $ llvm-profgen --binary=./code --output=code.prof --perfdata=perf.data
+
+ When using SEP, the output is in the textual format corresponding to
+ ``llvm-profgen --perfscript``. For example:
+
+ .. code-block:: console
+
+ $ llvm-profgen --binary=./code --output=code.prof --perfscript=perf.data.script


4. Build the code again using the collected profile. This step feeds