@@ -2441,20 +2441,39 @@ usual build cycle when using sample profilers for optimization:
1. Build the code with source line table information. You can use all the
usual build flags that you always build your application with. The only
- requirement is that you add ``-gline-tables-only`` or ``-g`` to the
- command line. This is important for the profiler to be able to map
- instructions back to source line locations.
+ requirement is that DWARF debug info including source line information is
+ generated. This DWARF information is important for the profiler to be able
+ to map instructions back to source line locations.
+
+ On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:

.. code-block:: console

$ clang++ -O2 -gline-tables-only code.cc -o code

+ While MSVC-style targets default to CodeView debug information, DWARF debug
+ information is required to generate source-level LLVM profiles. Use
+ ``-gdwarf`` to include DWARF debug information:
+
+ .. code-block:: console
+
+ $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+
2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
- into the format that the LLVM optimizer understands. Currently, there
- exists a conversion tool for the Linux Perf profiler
- (https://perf.wiki.kernel.org/), so these examples assume that you
- are using Linux Perf to profile your code.
+ into the format that the LLVM optimizer understands.
+
+ Two such profilers are the Linux Perf profiler
+ (https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
+ available as part of `Intel VTune
+ <https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
+ While Perf is Linux-specific, SEP can be used on Linux, Windows, and FreeBSD.
+
+ The LLVM tool ``llvm-profgen`` can convert output of either Perf or SEP. An
+ external project, `AutoFDO <https://github.com/google/autofdo>`_, also
+ provides a ``create_llvm_prof`` tool which supports Linux Perf output.
+
+ When using Perf:

.. code-block:: console

@@ -2465,11 +2484,19 @@ usual build cycle when using sample profilers for optimization:
it provides better call information, which improves the accuracy of
the profile data.

- 3. Convert the collected profile data to LLVM's sample profile format.
- This is currently supported via the AutoFDO converter ``create_llvm_prof``.
- It is available at https://github.com/google/autofdo. Once built and
- installed, you can convert the ``perf.data`` file to LLVM using
- the command:
+ When using SEP:
+
+ .. code-block:: console
+
+ $ sep -start -out code.tb7 -ec BR_INST_RETIRED.NEAR_TAKEN:precise=yes:pdir -lbr no_filter:usr -perf-script brstack -app ./code
+
+ This produces a ``code.perf.data.script`` output which can be used with
+ ``llvm-profgen``'s ``--perfscript`` input option.
+
+ 3. Convert the collected profile data to LLVM's sample profile format. This is
+ currently supported via the `AutoFDO <https://github.com/google/autofdo>`_
+ converter ``create_llvm_prof``. Once built and installed, you can convert
+ the ``perf.data`` file to LLVM using the command:

.. code-block:: console

@@ -2485,7 +2512,14 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: console

- $ llvm-profgen --binary=./code --output=code.prof--perfdata=perf.data
+ $ llvm-profgen --binary=./code --output=code.prof --perfdata=perf.data
+
+ When using SEP the output is in the textual format corresponding to
+ ``llvm-profgen --perfscript``. For example:
+
+ .. code-block:: console
+
+ $ llvm-profgen --binary=./code --output=code.prof --perfscript=code.perf.data.script


4. Build the code again using the collected profile. This step feeds