Commit 713a202

[CGData] Clang Options (#90304)
This adds new Clang flags to support codegen (CG) data:

- `-fcodegen-data-generate{=path}`: This flag passes `-codegen-data-generate` as a boolean to the LLVM backend, causing the raw CG data to be emitted into a custom section. Currently, for LLD MachO only, it also passes `--codegen-data-generate-path=<path>` so that the indexed CG data file can be automatically produced at link time. For linkers that do not yet support this feature, `llvm-cgdata` can be used manually to merge this CG data in object files.
- `-fcodegen-data-use{=path}`: This flag passes `-codegen-data-use-path=<path>` to the LLVM backend, enabling the use of specified CG data to optimistically outline functions.
- The default `<path>` is set to `default.cgdata` when not specified.

This depends on #108733.

This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
1 parent 40e8e4d commit 713a202

5 files changed: 135 additions, 0 deletions

clang/docs/UsersManual.rst

Lines changed: 33 additions & 0 deletions
@@ -2410,6 +2410,39 @@ are listed below.
   link-time optimizations like whole program inter-procedural basic block
   reordering.

+.. option:: -fcodegen-data-generate[=<path>]
+
+  Emit the raw codegen (CG) data into custom sections in the object file.
+  Currently, this option also combines the raw CG data from the object files
+  into an indexed CG data file specified by the <path>, for LLD MachO only.
+  When the <path> is not specified, `default.cgdata` is created.
+  The CG data file combines all the outlining instances that occurred locally
+  in each object file.
+
+  .. code-block:: console
+
+    $ clang -fuse-ld=lld -Oz -fcodegen-data-generate code.cc
+
+  For linkers that do not yet support this feature, `llvm-cgdata` can be used
+  manually to merge this CG data in object files.
+
+  .. code-block:: console
+
+    $ clang -c -fuse-ld=lld -Oz -fcodegen-data-generate code.cc
+    $ llvm-cgdata --merge -o default.cgdata code.o
+
+.. option:: -fcodegen-data-use[=<path>]
+
+  Read the codegen data from the specified path to more effectively outline
+  functions across compilation units. When the <path> is not specified,
+  `default.cgdata` is used. This option can create many identically outlined
+  functions that can be optimized by the conventional linker’s identical code
+  folding (ICF).
+
+  .. code-block:: console
+
+    $ clang -fuse-ld=lld -Oz -Wl,--icf=safe -fcodegen-data-use code.cc
+
 Profile Guided Optimization
 ---------------------------

clang/include/clang/Driver/Options.td

Lines changed: 12 additions & 0 deletions
@@ -1894,6 +1894,18 @@ def fprofile_selected_function_group :
   Visibility<[ClangOption, CC1Option]>, MetaVarName<"<i>">,
   HelpText<"Partition functions into N groups using -fprofile-function-groups and select only functions in group i to be instrumented. The valid range is 0 to N-1 inclusive">,
   MarshallingInfoInt<CodeGenOpts<"ProfileSelectedFunctionGroup">>;
+def fcodegen_data_generate_EQ : Joined<["-"], "fcodegen-data-generate=">,
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<path>">,
+    HelpText<"Emit codegen data into the object file. LLD for MachO (currently) merges them into the specified <path>.">;
+def fcodegen_data_generate : Flag<["-"], "fcodegen-data-generate">,
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>, Alias<fcodegen_data_generate_EQ>, AliasArgs<["default.cgdata"]>,
+    HelpText<"Emit codegen data into the object file. LLD for MachO (currently) merges them into default.cgdata.">;
+def fcodegen_data_use_EQ : Joined<["-"], "fcodegen-data-use=">,
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<path>">,
+    HelpText<"Use codegen data read from the specified <path>.">;
+def fcodegen_data_use : Flag<["-"], "fcodegen-data-use">,
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>, Alias<fcodegen_data_use_EQ>, AliasArgs<["default.cgdata"]>,
+    HelpText<"Use codegen data read from default.cgdata to optimize the binary">;
 def fswift_async_fp_EQ : Joined<["-"], "fswift-async-fp=">,
     Group<f_Group>,
     Visibility<[ClangOption, CC1Option, CC1AsOption, CLOption]>,
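
Editorial note on the option definitions above: the bare `Flag` forms are declared with `Alias<..._EQ>` and `AliasArgs<["default.cgdata"]>`, so the rest of the driver only ever looks up the `=<path>` form. The following is a minimal standalone sketch of that normalization, not Clang's actual option parser; the helper name `codegenDataPath` is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Clang: models how a bare flag behaves as
// an alias for the "=<path>" form with the path "default.cgdata".
// Returns the path carried by `arg`, or std::nullopt if `arg` is not `flag`.
std::optional<std::string> codegenDataPath(std::string_view arg,
                                           std::string_view flag) {
  // Bare form, e.g. "-fcodegen-data-use": use the aliased default path.
  if (arg == flag)
    return std::string("default.cgdata");
  // Joined form, e.g. "-fcodegen-data-use=file": the path follows '='.
  if (arg.size() > flag.size() && arg.substr(0, flag.size()) == flag &&
      arg[flag.size()] == '=')
    return std::string(arg.substr(flag.size() + 1));
  // Any other argument is unrelated to this flag.
  return std::nullopt;
}
```

Under this model, `codegenDataPath("-fcodegen-data-use", "-fcodegen-data-use")` yields `default.cgdata`, which is why the later driver code can call `getLastArg` on the `_EQ` option alone and still see the bare flag.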

clang/lib/Driver/ToolChains/CommonArgs.cpp

Lines changed: 19 additions & 0 deletions
@@ -2753,6 +2753,25 @@ void tools::addMachineOutlinerArgs(const Driver &D,
       addArg(Twine("-enable-machine-outliner=never"));
     }
   }
+
+  auto *CodeGenDataGenArg =
+      Args.getLastArg(options::OPT_fcodegen_data_generate_EQ);
+  auto *CodeGenDataUseArg = Args.getLastArg(options::OPT_fcodegen_data_use_EQ);
+
+  // We only allow one of them to be specified.
+  if (CodeGenDataGenArg && CodeGenDataUseArg)
+    D.Diag(diag::err_drv_argument_not_allowed_with)
+        << CodeGenDataGenArg->getAsString(Args)
+        << CodeGenDataUseArg->getAsString(Args);
+
+  // For codegen data gen, the output file is passed to the linker
+  // while a boolean flag is passed to the LLVM backend.
+  if (CodeGenDataGenArg)
+    addArg(Twine("-codegen-data-generate"));
+
+  // For codegen data use, the input file is passed to the LLVM backend.
+  if (CodeGenDataUseArg)
+    addArg(Twine("-codegen-data-use-path=") + CodeGenDataUseArg->getValue());
 }
 
 void tools::addOpenMPDeviceRTL(const Driver &D,
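
Editorial note on the `addMachineOutlinerArgs` change above: the hunk rejects the combination of both flags and then forwards one backend argument through `addArg`. The sketch below models that logic without any LLVM dependency. It assumes, based on the expectations in `clang/test/Driver/codegen-data.c`, that `addArg` emits a `"-mllvm" "<arg>"` pair outside LTO and a single `-plugin-opt=<arg>` string for non-Darwin LTO links. The struct `CodeGenDataFlags` and the function `backendArgs` are hypothetical names, and the exception stands in for the driver diagnostic (the real driver reports an error rather than throwing).

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver's parsed state; not a Clang type.
struct CodeGenDataFlags {
  std::optional<std::string> generatePath; // -fcodegen-data-generate[=<path>]
  std::optional<std::string> usePath;      // -fcodegen-data-use[=<path>]
};

// Returns the arguments forwarded to the LLVM backend.
// An empty `pluginOptPrefix` models a non-LTO compile ("-mllvm" "<arg>");
// a non-empty prefix such as "-plugin-opt=" models an LTO link (assumption).
std::vector<std::string> backendArgs(const CodeGenDataFlags &flags,
                                     const std::string &pluginOptPrefix) {
  // Only one of the two flags may be specified.
  if (flags.generatePath && flags.usePath)
    throw std::invalid_argument(
        "'-fcodegen-data-generate' not allowed with '-fcodegen-data-use'");

  std::vector<std::string> out;
  auto addArg = [&](const std::string &arg) {
    if (!pluginOptPrefix.empty()) {
      out.push_back(pluginOptPrefix + arg);
    } else {
      out.push_back("-mllvm");
      out.push_back(arg);
    }
  };

  // Generation: the backend only receives a boolean flag; the output path
  // is a linker concern and is not forwarded here.
  if (flags.generatePath)
    addArg("-codegen-data-generate");

  // Use: the backend receives the input path.
  if (flags.usePath)
    addArg("-codegen-data-use-path=" + *flags.usePath);

  return out;
}
```

Note the asymmetry the sketch preserves: the generate path never reaches the backend, while the use path does.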

clang/lib/Driver/ToolChains/Darwin.cpp

Lines changed: 33 additions & 0 deletions
@@ -476,6 +476,13 @@ void darwin::Linker::AddLinkArgs(Compilation &C, const ArgList &Args,
       llvm::sys::path::append(Path, "default.profdata");
       CmdArgs.push_back(Args.MakeArgString(Twine("--cs-profile-path=") + Path));
     }
+
+    auto *CodeGenDataGenArg =
+        Args.getLastArg(options::OPT_fcodegen_data_generate_EQ);
+    if (CodeGenDataGenArg)
+      CmdArgs.push_back(
+          Args.MakeArgString(Twine("--codegen-data-generate-path=") +
+                             CodeGenDataGenArg->getValue()));
   }
 }

@@ -633,6 +640,32 @@ void darwin::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("-mllvm");
   CmdArgs.push_back("-enable-linkonceodr-outlining");

+  // Propagate codegen data flags to the linker for the LLVM backend.
+  auto *CodeGenDataGenArg =
+      Args.getLastArg(options::OPT_fcodegen_data_generate_EQ);
+  auto *CodeGenDataUseArg = Args.getLastArg(options::OPT_fcodegen_data_use_EQ);
+
+  // We only allow one of them to be specified.
+  const Driver &D = getToolChain().getDriver();
+  if (CodeGenDataGenArg && CodeGenDataUseArg)
+    D.Diag(diag::err_drv_argument_not_allowed_with)
+        << CodeGenDataGenArg->getAsString(Args)
+        << CodeGenDataUseArg->getAsString(Args);
+
+  // For codegen data gen, the output file is passed to the linker
+  // while a boolean flag is passed to the LLVM backend.
+  if (CodeGenDataGenArg) {
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("-codegen-data-generate");
+  }
+
+  // For codegen data use, the input file is passed to the LLVM backend.
+  if (CodeGenDataUseArg) {
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back(Args.MakeArgString(Twine("-codegen-data-use-path=") +
+                                         CodeGenDataUseArg->getValue()));
+  }
+
   // Setup statistics file output.
   SmallString<128> StatsFile =
       getStatsFileName(Args, Output, Inputs[0], getToolChain().getDriver());
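
Editorial note on the two Darwin hunks above: together they determine which codegen data arguments appear on the Darwin link line. The sketch below is a rough standalone model, not the driver code. It assumes that the `--codegen-data-generate-path=` argument is only emitted when the linker is LLD, which is inferred from the commit message ("for LLD MachO only") and from the `GENERATE-LLD-DARWIN` test rather than from the visible hunk. The function name `darwinLinkerArgs` and its parameters are hypothetical, the conflict check is omitted, and the relative order of the arguments is not meant to match the real command line.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the codegen data arguments on a Darwin link line.
std::vector<std::string>
darwinLinkerArgs(const std::optional<std::string> &generatePath,
                 const std::optional<std::string> &usePath, bool linkerIsLLD) {
  std::vector<std::string> args;

  // Assumption: only LLD MachO accepts the indexed CG data output path.
  if (linkerIsLLD && generatePath)
    args.push_back("--codegen-data-generate-path=" + *generatePath);

  // The backend flags are forwarded as "-mllvm" pairs for any Darwin linker.
  if (generatePath) {
    args.push_back("-mllvm");
    args.push_back("-codegen-data-generate");
  }
  if (usePath) {
    args.push_back("-mllvm");
    args.push_back("-codegen-data-use-path=" + *usePath);
  }
  return args;
}
```

With a linker other than LLD, the model still forwards `-mllvm -codegen-data-generate`, so the raw CG data lands in the object files and can be merged afterwards with `llvm-cgdata`, as the commit message describes.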

clang/test/Driver/codegen-data.c

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+// Verify only one of codegen-data flag is passed.
+// RUN: not %clang -### -S --target=aarch64-linux-gnu -fcodegen-data-generate -fcodegen-data-use %s 2>&1 | FileCheck %s --check-prefix=CONFLICT
+// RUN: not %clang -### -S --target=arm64-apple-darwin -fcodegen-data-generate -fcodegen-data-use %s 2>&1 | FileCheck %s --check-prefix=CONFLICT
+// CONFLICT: error: invalid argument '-fcodegen-data-generate' not allowed with '-fcodegen-data-use'
+
+// Verify the codegen-data-generate (boolean) flag is passed to LLVM
+// RUN: %clang -### -S --target=aarch64-linux-gnu -fcodegen-data-generate %s 2>&1| FileCheck %s --check-prefix=GENERATE
+// RUN: %clang -### -S --target=arm64-apple-darwin -fcodegen-data-generate %s 2>&1| FileCheck %s --check-prefix=GENERATE
+// GENERATE: "-mllvm" "-codegen-data-generate"
+
+// Verify the codegen-data-use-path flag (with a default value) is passed to LLVM.
+// RUN: %clang -### -S --target=aarch64-linux-gnu -fcodegen-data-use %s 2>&1| FileCheck %s --check-prefix=USE
+// RUN: %clang -### -S --target=arm64-apple-darwin -fcodegen-data-use %s 2>&1| FileCheck %s --check-prefix=USE
+// RUN: %clang -### -S --target=aarch64-linux-gnu -fcodegen-data-use=file %s 2>&1 | FileCheck %s --check-prefix=USE-FILE
+// RUN: %clang -### -S --target=arm64-apple-darwin -fcodegen-data-use=file %s 2>&1 | FileCheck %s --check-prefix=USE-FILE
+// USE: "-mllvm" "-codegen-data-use-path=default.cgdata"
+// USE-FILE: "-mllvm" "-codegen-data-use-path=file"
+
+// Verify the codegen-data-generate (boolean) flag with a LTO.
+// RUN: %clang -### -flto --target=aarch64-linux-gnu -fcodegen-data-generate %s 2>&1 | FileCheck %s --check-prefix=GENERATE-LTO
+// GENERATE-LTO: {{ld(.exe)?"}}
+// GENERATE-LTO-SAME: "-plugin-opt=-codegen-data-generate"
+// RUN: %clang -### -flto --target=arm64-apple-darwin -fcodegen-data-generate %s 2>&1 | FileCheck %s --check-prefix=GENERATE-LTO-DARWIN
+// GENERATE-LTO-DARWIN: {{ld(.exe)?"}}
+// GENERATE-LTO-DARWIN-SAME: "-mllvm" "-codegen-data-generate"
+
+// Verify the codegen-data-use-path flag with a LTO is passed to LLVM.
+// RUN: %clang -### -flto=thin --target=aarch64-linux-gnu -fcodegen-data-use %s 2>&1 | FileCheck %s --check-prefix=USE-LTO
+// USE-LTO: {{ld(.exe)?"}}
+// USE-LTO-SAME: "-plugin-opt=-codegen-data-use-path=default.cgdata"
+// RUN: %clang -### -flto=thin --target=arm64-apple-darwin -fcodegen-data-use %s 2>&1 | FileCheck %s --check-prefix=USE-LTO-DARWIN
+// USE-LTO-DARWIN: {{ld(.exe)?"}}
+// USE-LTO-DARWIN-SAME: "-mllvm" "-codegen-data-use-path=default.cgdata"
+
+// For now, LLD MachO supports for generating the codegen data at link time.
+// RUN: %clang -### -fuse-ld=lld -B%S/Inputs/lld --target=arm64-apple-darwin -fcodegen-data-generate %s 2>&1 | FileCheck %s --check-prefix=GENERATE-LLD-DARWIN
+// GENERATE-LLD-DARWIN: {{ld(.exe)?"}}
+// GENERATE-LLD-DARWIN-SAME: "--codegen-data-generate-path=default.cgdata"
