@@ -2319,6 +2319,8 @@ are listed below.
on ELF targets when using the integrated assembler. This flag currently
only has an effect on ELF targets.

+ .. _funique_internal_linkage_names:
+
.. option:: -f[no]-unique-internal-linkage-names

Controls whether Clang emits a unique (best-effort) symbol name for internal
@@ -2448,27 +2450,41 @@ usual build cycle when using sample profilers for optimization:
usual build flags that you always build your application with. The only
requirement is that DWARF debug info including source line information is
generated. This DWARF information is important for the profiler to be able
- to map instructions back to source line locations.
+ to map instructions back to source line locations. The usefulness of this
+ DWARF information can be improved with the ``-fdebug-info-for-profiling``
+ and ``-funique-internal-linkage-names`` options.

- On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
+ On Linux:

.. code-block:: console

- $ clang++ -O2 -gline-tables-only code.cc -o code
+ $ clang++ -O2 -gline-tables-only \
+ -fdebug-info-for-profiling -funique-internal-linkage-names \
+ code.cc -o code

While MSVC-style targets default to CodeView debug information, DWARF debug
information is required to generate source-level LLVM profiles. Use
``-gdwarf`` to include DWARF debug information:

- .. code-block:: console
+ .. code-block:: winbatch
+
+ > clang-cl /O2 -gdwarf -gline-tables-only ^
+ /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
+ code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
+
+ .. note::

- $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+ :ref:`-funique-internal-linkage-names <funique_internal_linkage_names>`
+ generates unique names based on given command-line source file paths. If
+ your build system uses absolute source paths and these paths may change
+ between steps 1 and 4, then the uniqued function names may change and result
+ in unused profile data. Consider omitting this option in such cases.

2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
into the format that the LLVM optimizer understands.

- Two such profilers are the the Linux Perf profiler
+ Two such profilers are the Linux Perf profiler
(https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
available as part of `Intel VTune
<https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
@@ -2482,7 +2498,9 @@ usual build cycle when using sample profilers for optimization:

.. code-block:: console

- $ perf record -b ./code
+ $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
+
+ If the event above is unavailable, ``branches:u`` is probably next-best.

Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
Record (LBR) to record call chains. While this is not strictly required,
@@ -2532,21 +2550,42 @@ usual build cycle when using sample profilers for optimization:
that executes faster than the original one. Note that you are not
required to build the code with the exact same arguments that you
used in the first step. The only requirement is that you build the code
- with ``-gline-tables-only`` and ``-fprofile-sample-use``.
+ with the same debug info options and ``-fprofile-sample-use``.
+
+ On Linux:

.. code-block:: console

- $ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof code.cc -o code
+ $ clang++ -O2 -gline-tables-only \
+ -fdebug-info-for-profiling -funique-internal-linkage-names \
+ -fprofile-sample-use=code.prof code.cc -o code

- [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
- edge counters. The profile inference algorithm (profi) can be used to infer
- missing blocks and edge counts, and improve the quality of profile data.
- Enable it with ``-fsample-profile-use-profi``.
+ On Windows:

- .. code-block:: console
+ .. code-block:: winbatch
+
+ > clang-cl /O2 -gdwarf -gline-tables-only ^
+ /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
+ /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
+
+ [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
+ edge counters. The profile inference algorithm (profi) can be used to infer
+ missing blocks and edge counts, and improve the quality of profile data.
+ Enable it with ``-fsample-profile-use-profi``. For example, on Linux:
+
+ .. code-block:: console
+
+ $ clang++ -fsample-profile-use-profi -O2 -gline-tables-only \
+ -fdebug-info-for-profiling -funique-internal-linkage-names \
+ -fprofile-sample-use=code.prof code.cc -o code
+
+ On Windows:
+
+ .. code-block:: winbatch

- $ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof \
- -fsample-profile-use-profi code.cc -o code
+ > clang-cl /clang:-fsample-profile-use-profi /O2 -gdwarf -gline-tables-only ^
+ /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
+ /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf

Sample Profile Formats
""""""""""""""""""""""