Commit 6f24808

[Driver][SYCL] Enable early AOT abilities when creating objects (#11130)
Adds support for performing a full device compilation and link when creating objects. Use the -fno-sycl-rdc option together with -c when compiling to an object. This triggers the additional device linking steps once the device compilation is completed. Currently this is only supported for AOT-enabled spir64 targets.

When these new fat objects are consumed, the driver scans for the final device binaries and, instead of sending them through the device link, passes them directly to the host link step to be added to the final executable. This is done by introducing a few triple architecture values that designate target images in the fat objects (spir64_gen_image, spir64_fpga_image, spir64_x86_64_image).

Also refactors the device link code so that the compile and link steps share a common code path.
1 parent 63e3ec7 commit 6f24808

12 files changed

+647
-295
lines changed

clang/include/clang/Driver/Action.h

Lines changed: 9 additions & 0 deletions
@@ -629,11 +629,20 @@ class OffloadUnbundlingJobAction final : public JobAction {
         DependentOffloadKind(DependentOffloadKind) {}
   };

+  /// Allow for a complete override of the target to unbundle.
+  /// This is used for specific unbundles used for SYCL AOT when generating full
+  /// device files that are bundled with the host object.
+  void setTargetString(std::string Target) { TargetString = Target; }
+
+  std::string getTargetString() const { return TargetString; }
+
 private:
   /// Container that keeps information about each dependence of this unbundling
   /// action.
   SmallVector<DependentActionInfo, 6> DependentActionInfoArray;

+  std::string TargetString;
+
 public:
   // Offloading unbundling doesn't change the type of output.
   OffloadUnbundlingJobAction(Action *Input);
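
The accessors above amount to an "override with fallback" on the unbundling action. A minimal standalone sketch of that pattern follows; `UnbundlingActionSketch` and `buildTargetsArg` are hypothetical names used only for illustration and are not part of the driver, and the default string is assumed to already carry the `-targets=` prefix.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for OffloadUnbundlingJobAction. Only the two
// target-string accessors mirror the diff; everything else is omitted.
class UnbundlingActionSketch {
public:
  void setTargetString(std::string Target) { TargetString = std::move(Target); }
  std::string getTargetString() const { return TargetString; }

private:
  std::string TargetString; // Empty means "no override requested".
};

// Returns the bundler "-targets=" argument. An explicit target string on the
// action wins; otherwise the default list is passed through unchanged.
// Assumption: DefaultTriples already starts with "-targets=".
std::string buildTargetsArg(const UnbundlingActionSketch &UA,
                            const std::string &DefaultTriples) {
  assert(DefaultTriples.rfind("-targets=", 0) == 0 &&
         "default list is expected to carry the -targets= prefix");
  const std::string Override = UA.getTargetString();
  if (!Override.empty())
    return "-targets=" + Override;
  return DefaultTriples;
}
```

Keeping the override as a plain string that is empty by default means existing unbundling actions behave exactly as before unless a caller opts in.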

clang/include/clang/Driver/Options.td

Lines changed: 8 additions & 1 deletion
@@ -3858,7 +3858,14 @@ def ftarget_register_alloc_mode_EQ : Joined<["-"], "ftarget-register-alloc-mode=
   HelpText<"Specify a register allocation mode for specific hardware for use by supported "
            "target backends.">;
 def : Flag<["-"], "fsycl-rdc">, Visibility<[ClangOption, CLOption, DXCOption]>, Alias<fgpu_rdc>;
-def : Flag<["-"], "fno-sycl-rdc">, Visibility<[ClangOption, CLOption, DXCOption]>, Alias<fno_gpu_rdc>;
+def : Flag<["-"], "fno-sycl-rdc">,
+  Visibility<[ClangOption, CLOption, DXCOption]>, Alias<fno_gpu_rdc>,
+  HelpText<"Generate relocatable device code during SYCL offload target "
+  "compilation. Use of ‘-fno-sycl-rdc’ in combination with ‘-c’ will "
+  "produce final device binaries within the generated fat object. "
+  "When using this option, each kernel must be self-contained within "
+  "its translation unit (source file). Therefore, the use of "
+  "SYCL_EXTERNAL is disallowed when this option is enabled.">;
 def fsycl_optimize_non_user_code : Flag<["-"], "fsycl-optimize-non-user-code">,
   Visibility<[ClangOption, CLOption, DXCOption, CC1Option]>,
   MarshallingInfoFlag<CodeGenOpts<"OptimizeSYCLFramework">>,

clang/lib/Driver/Driver.cpp

Lines changed: 421 additions & 291 deletions
Large diffs are not rendered by default.

clang/lib/Driver/ToolChains/Clang.cpp

Lines changed: 28 additions & 2 deletions
@@ -9216,7 +9216,24 @@ void OffloadBundler::ConstructJob(Compilation &C, const JobAction &JA,
                   ? Action::GetOffloadKindName(Action::OFK_SYCL)
                   : Action::GetOffloadKindName(CurKind);
     Triples += '-';
-    Triples += CurTC->getTriple().normalize();
+    // Incoming DeviceArch is set, break down the Current triple and add the
+    // device arch value to it.
+    // This is done for AOT targets only.
+    std::string DeviceArch;
+    llvm::Triple TargetTriple(CurTC->getTriple());
+    if (CurKind == Action::OFK_SYCL && TargetTriple.isSPIRAOT() &&
+        tools::SYCL::shouldDoPerObjectFileLinking(C))
+      DeviceArch = std::string("image");
+    if (CurKind != Action::OFK_Host && !DeviceArch.empty()) {
+      llvm::Triple T(CurTC->getTriple());
+      SmallString<128> ArchName(CurTC->getArchName());
+      ArchName += "_";
+      ArchName += DeviceArch.data();
+      T.setArchName(ArchName);
+      Triples += T.normalize();
+    } else {
+      Triples += CurTC->getTriple().normalize();
+    }
     if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_OpenMP ||
          CurKind == Action::OFK_Cuda || CurKind == Action::OFK_SYCL) &&
         !StringRef(CurDep->getOffloadingArch()).empty() &&
@@ -9467,7 +9484,16 @@ void OffloadBundler::ConstructJobMultipleOutputs(
       Triples += '-';
       Triples += types::getTypeName(types::TY_FPGA_Dependencies);
     }
-    CmdArgs.push_back(TCArgs.MakeArgString(Triples));
+    std::string TargetString(UA.getTargetString());
+    if (!TargetString.empty()) {
+      // The target string was provided, we will override the defaults and use
+      // the string provided.
+      SmallString<128> TSTriple("-targets=");
+      TSTriple += TargetString;
+      CmdArgs.push_back(TCArgs.MakeArgString(TSTriple));
+    } else {
+      CmdArgs.push_back(TCArgs.MakeArgString(Triples));
+    }

   // Get bundled file command.
   CmdArgs.push_back(
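
The first hunk above rewrites the arch component of the triple so that a SYCL AOT target such as `spir64_gen-unknown-unknown` is bundled as `sycl-spir64_gen_image-unknown-unknown`. The sketch below reproduces that string manipulation with plain `std::string` instead of `llvm::Triple`. The helper names `withImageArch` and `bundlerTarget` are hypothetical, and the two boolean parameters stand in for `isSPIRAOT()` and `shouldDoPerObjectFileLinking(C)`.

```cpp
#include <string>

// Appends "_image" to the arch component (the text before the first '-') of
// a triple string. Hypothetical helper; the driver uses llvm::Triple for this.
std::string withImageArch(const std::string &Triple) {
  const std::string::size_type Dash = Triple.find('-');
  const std::string Arch = Triple.substr(0, Dash);
  const std::string Rest =
      Dash == std::string::npos ? std::string() : Triple.substr(Dash);
  return Arch + "_image" + Rest;
}

// Builds one bundler target entry of the form "<kind>-<triple>". The arch is
// only rewritten for SYCL AOT targets when per-object device linking is on,
// mirroring the condition in the hunk above.
std::string bundlerTarget(const std::string &Kind, const std::string &Triple,
                          bool IsSpirAot, bool PerObjectLink) {
  const bool UseImage = Kind == "sycl" && IsSpirAot && PerObjectLink;
  return Kind + "-" + (UseImage ? withImageArch(Triple) : Triple);
}
```

With these assumptions, `bundlerTarget("sycl", "spir64_gen-unknown-unknown", true, true)` yields the same `sycl-spir64_gen_image-unknown-unknown` entry that the `CREATE_IMAGE` check in the accompanying test expects, while host and JIT entries pass through unchanged.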

clang/lib/Driver/ToolChains/Gnu.cpp

Lines changed: 2 additions & 1 deletion
@@ -597,7 +597,8 @@ void tools::gnutools::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // linked archives. The unbundled information is a list of files and not
   // an actual object/archive. Take that list and pass those to the linker
   // instead of the original object.
-  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP) ||
+      JA.isOffloading(Action::OFK_SYCL)) {
     InputInfoList UpdatedInputs;
     // Go through the Inputs to the link. When a listfile is encountered, we
     // know it is an unbundled generated list.
9.56 KB
Binary file not shown.
6.02 KB
Binary file not shown.
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+// Testing for early AOT device linking. These tests use -fno-sycl-rdc
+// -c to create final device binaries during the link step when using -fsycl.
+// Behavior is restricted to spir64_gen targets for now.
+
+// Create object that contains final device image
+// RUN: %clangxx -c -fno-sycl-rdc -fsycl -fsycl-targets=spir64_gen \
+// RUN: --target=x86_64-unknown-linux-gnu -Xsycl-target-backend \
+// RUN: "-device skl" --sysroot=%S/Inputs/SYCL -### %s 2>&1 \
+// RUN: | FileCheck %s -check-prefix=CREATE_IMAGE
+// CREATE_IMAGE: clang{{.*}} "-triple" "spir64_gen-unknown-unknown"{{.*}} "-fsycl-is-device"{{.*}} "-o" "[[DEVICE_BC:.+\.bc]]"
+// CREATE_IMAGE: llvm-link{{.*}} "-o" "[[LIB_DEVICE_BC:.+\.bc]]"
+// CREATE_IMAGE: llvm-link{{.*}} "[[DEVICE_BC]]" "[[LIB_DEVICE_BC]]"{{.*}} "-o" "[[FINAL_DEVICE_BC:.+\.bc]]"
+// CREATE_IMAGE: sycl-post-link{{.*}} "-o" "[[POSTLINK_TABLE:.+\.table]]" "[[FINAL_DEVICE_BC]]"
+// CREATE_IMAGE: file-table-tform{{.*}} "-o" "[[TFORM_TXT:.+\.txt]]" "[[POSTLINK_TABLE]]"
+// CREATE_IMAGE: llvm-spirv{{.*}} "-o" "[[LLVMSPIRV_TXT:.+\.txt]]"{{.*}} "[[TFORM_TXT]]"
+// CREATE_IMAGE: ocloc{{.*}} "-output" "[[OCLOC_OUT:.+\.out]]" "-file" "[[LLVMSPIRV_TXT]]"{{.*}} "-device" "skl"
+// CREATE_IMAGE: file-table-tform{{.*}} "-o" "[[TFORM_TABLE:.+\.table]]" "[[POSTLINK_TABLE]]" "[[OCLOC_OUT]]"
+// CREATE_IMAGE: clang-offload-wrapper{{.*}} "-o=[[WRAPPER_BC:.+\.bc]]"
+// CREATE_IMAGE: llc{{.*}} "-o" "[[DEVICE_OBJECT:.+\.o]]" "[[WRAPPER_BC]]"
+// CREATE_IMAGE: append-file{{.*}} "--output=[[APPEND_SOURCE:.+\.cpp]]
+// CREATE_IMAGE: clang{{.*}} "-fsycl-is-host"{{.*}} "-o" "[[HOST_OBJECT:.+\.o]]"{{.*}} "[[APPEND_SOURCE]]"
+// CREATE_IMAGE: clang-offload-bundler{{.*}} "-targets=sycl-spir64_gen_image-unknown-unknown,host-x86_64-unknown-linux-gnu" "-output={{.*}}" "-input=[[DEVICE_OBJECT]]" "-input=[[HOST_OBJECT]]"
+
+// RUN: %clangxx -c -fno-sycl-rdc -fsycl -fsycl-targets=spir64_gen \
+// RUN: --target=x86_64-unknown-linux-gnu -Xsycl-target-backend \
+// RUN: "-device skl" --sysroot=%S/Inputs/SYCL -ccc-print-phases %s \
+// RUN: -fno-sycl-device-lib=all 2>&1 \
+// RUN: | FileCheck %s -check-prefix=CREATE_IMAGE_PHASES
+// CREATE_IMAGE_PHASES: 0: input, "[[INPUT:.+\.cpp]]", c++, (device-sycl)
+// CREATE_IMAGE_PHASES: 1: preprocessor, {0}, c++-cpp-output, (device-sycl)
+// CREATE_IMAGE_PHASES: 2: compiler, {1}, ir, (device-sycl)
+// CREATE_IMAGE_PHASES: 3: input, "{{.*libsycl-itt-user-wrappers.o.*}}", object
+// CREATE_IMAGE_PHASES: 4: clang-offload-unbundler, {3}, object
+// CREATE_IMAGE_PHASES: 5: offload, " (spir64_gen-unknown-unknown)" {4}, object
+// CREATE_IMAGE_PHASES: 6: input, "{{.*libsycl-itt-compiler-wrappers.o.*}}", object
+// CREATE_IMAGE_PHASES: 7: clang-offload-unbundler, {6}, object
+// CREATE_IMAGE_PHASES: 8: offload, " (spir64_gen-unknown-unknown)" {7}, object
+// CREATE_IMAGE_PHASES: 9: input, "{{.*libsycl-itt-stubs.o.*}}", object
+// CREATE_IMAGE_PHASES: 10: clang-offload-unbundler, {9}, object
+// CREATE_IMAGE_PHASES: 11: offload, " (spir64_gen-unknown-unknown)" {10}, object
+// CREATE_IMAGE_PHASES: 12: linker, {5, 8, 11}, ir, (device-sycl)
+// CREATE_IMAGE_PHASES: 13: linker, {2, 12}, ir, (device-sycl)
+// CREATE_IMAGE_PHASES: 14: sycl-post-link, {13}, tempfiletable, (device-sycl)
+// CREATE_IMAGE_PHASES: 15: file-table-tform, {14}, tempfilelist, (device-sycl)
+// CREATE_IMAGE_PHASES: 16: llvm-spirv, {15}, tempfilelist, (device-sycl)
+// CREATE_IMAGE_PHASES: 17: backend-compiler, {16}, image, (device-sycl)
+// CREATE_IMAGE_PHASES: 18: file-table-tform, {14, 17}, tempfiletable, (device-sycl)
+// CREATE_IMAGE_PHASES: 19: clang-offload-wrapper, {18}, object, (device-sycl)
+// CREATE_IMAGE_PHASES: 20: offload, "device-sycl (spir64_gen-unknown-unknown)" {19}, object
+// CREATE_IMAGE_PHASES: 21: offload, "device-sycl (spir64_gen-unknown-unknown)" {20}, object
+// CREATE_IMAGE_PHASES: 22: input, "[[INPUT]]", c++, (host-sycl)
+// CREATE_IMAGE_PHASES: 23: append-footer, {22}, c++, (host-sycl)
+// CREATE_IMAGE_PHASES: 24: preprocessor, {23}, c++-cpp-output, (host-sycl)
+// CREATE_IMAGE_PHASES: 25: offload, "host-sycl (x86_64-unknown-linux-gnu)" {24}, "device-sycl (spir64_gen-unknown-unknown)" {20}, c++-cpp-output
+// CREATE_IMAGE_PHASES: 26: compiler, {25}, ir, (host-sycl)
+// CREATE_IMAGE_PHASES: 27: backend, {26}, assembler, (host-sycl)
+// CREATE_IMAGE_PHASES: 28: assembler, {27}, object, (host-sycl)
+// CREATE_IMAGE_PHASES: 29: clang-offload-bundler, {21, 28}, object, (host-sycl)
+
+// Use of -fno-sycl-rdc -c with non-AOT should not perform the device link.
+// RUN: %clangxx -c -fno-sycl-rdc -fsycl -fsycl-targets=spir64 \
+// RUN: --target=x86_64-unknown-linux-gnu -ccc-print-phases %s \
+// RUN: -fno-sycl-device-lib=all 2>&1 \
+// RUN: | FileCheck %s -check-prefix=JIT_ONLY_PHASES
+// JIT_ONLY_PHASES: 0: input, "[[INPUT:.+\.cpp]]", c++, (device-sycl)
+// JIT_ONLY_PHASES: 1: preprocessor, {0}, c++-cpp-output, (device-sycl)
+// JIT_ONLY_PHASES: 2: compiler, {1}, ir, (device-sycl)
+// JIT_ONLY_PHASES: 3: offload, "device-sycl (spir64-unknown-unknown)" {2}, ir
+// JIT_ONLY_PHASES: 4: input, "[[INPUT]]", c++, (host-sycl)
+// JIT_ONLY_PHASES: 5: append-footer, {4}, c++, (host-sycl)
+// JIT_ONLY_PHASES: 6: preprocessor, {5}, c++-cpp-output, (host-sycl)
+// JIT_ONLY_PHASES: 7: offload, "host-sycl (x86_64-unknown-linux-gnu)" {6}, "device-sycl (spir64-unknown-unknown)" {2}, c++-cpp-output
+// JIT_ONLY_PHASES: 8: compiler, {7}, ir, (host-sycl)
+// JIT_ONLY_PHASES: 9: backend, {8}, assembler, (host-sycl)
+// JIT_ONLY_PHASES: 10: assembler, {9}, object, (host-sycl)
+// JIT_ONLY_PHASES: 11: clang-offload-bundler, {3, 10}, object, (host-sycl)
+
+// Mix and match JIT and AOT phases check. Expectation is for AOT to perform
+// early device link, and JIT to just produce the LLVM-IR.
+// RUN: %clangxx -c -fno-sycl-rdc -fsycl -fsycl-targets=spir64,spir64_gen \
+// RUN: --target=x86_64-unknown-linux-gnu --sysroot=%S/Inputs/SYCL \
+// RUN: -Xsycl-target-backend=spir64_gen "-device skl" \
+// RUN: -ccc-print-phases %s -fno-sycl-device-lib=all 2>&1 \
+// RUN: | FileCheck %s -check-prefix=JIT_AOT_PHASES
+// JIT_AOT_PHASES: 0: input, "[[INPUT:.+\.cpp]]", c++, (device-sycl)
+// JIT_AOT_PHASES: 1: preprocessor, {0}, c++-cpp-output, (device-sycl)
+// JIT_AOT_PHASES: 2: compiler, {1}, ir, (device-sycl)
+// JIT_AOT_PHASES: 3: offload, "device-sycl (spir64-unknown-unknown)" {2}, ir
+// JIT_AOT_PHASES: 4: input, "[[INPUT]]", c++, (device-sycl)
+// JIT_AOT_PHASES: 5: preprocessor, {4}, c++-cpp-output, (device-sycl)
+// JIT_AOT_PHASES: 6: compiler, {5}, ir, (device-sycl)
+// JIT_AOT_PHASES: 7: input, "{{.*libsycl-itt-user-wrappers.o.*}}", object
+// JIT_AOT_PHASES: 8: clang-offload-unbundler, {7}, object
+// JIT_AOT_PHASES: 9: offload, " (spir64_gen-unknown-unknown)" {8}, object
+// JIT_AOT_PHASES: 10: input, "{{.*libsycl-itt-compiler-wrappers.o.*}}", object
+// JIT_AOT_PHASES: 11: clang-offload-unbundler, {10}, object
+// JIT_AOT_PHASES: 12: offload, " (spir64_gen-unknown-unknown)" {11}, object
+// JIT_AOT_PHASES: 13: input, "{{.*libsycl-itt-stubs.o.*}}", object
+// JIT_AOT_PHASES: 14: clang-offload-unbundler, {13}, object
+// JIT_AOT_PHASES: 15: offload, " (spir64_gen-unknown-unknown)" {14}, object
+// JIT_AOT_PHASES: 16: linker, {9, 12, 15}, ir, (device-sycl)
+// JIT_AOT_PHASES: 17: linker, {6, 16}, ir, (device-sycl)
+// JIT_AOT_PHASES: 18: sycl-post-link, {17}, tempfiletable, (device-sycl)
+// JIT_AOT_PHASES: 19: file-table-tform, {18}, tempfilelist, (device-sycl)
+// JIT_AOT_PHASES: 20: llvm-spirv, {19}, tempfilelist, (device-sycl)
+// JIT_AOT_PHASES: 21: backend-compiler, {20}, image, (device-sycl)
+// JIT_AOT_PHASES: 22: file-table-tform, {18, 21}, tempfiletable, (device-sycl)
+// JIT_AOT_PHASES: 23: clang-offload-wrapper, {22}, object, (device-sycl)
+// JIT_AOT_PHASES: 24: offload, "device-sycl (spir64_gen-unknown-unknown)" {23}, object
+// JIT_AOT_PHASES: 25: offload, "device-sycl (spir64_gen-unknown-unknown)" {24}, object
+// JIT_AOT_PHASES: 26: input, "[[INPUT]]", c++, (host-sycl)
+// JIT_AOT_PHASES: 27: append-footer, {26}, c++, (host-sycl)
+// JIT_AOT_PHASES: 28: preprocessor, {27}, c++-cpp-output, (host-sycl)
+// JIT_AOT_PHASES: 29: offload, "host-sycl (x86_64-unknown-linux-gnu)" {28}, "device-sycl (spir64_gen-unknown-unknown)" {24}, c++-cpp-output
+// JIT_AOT_PHASES: 30: compiler, {29}, ir, (host-sycl)
+// JIT_AOT_PHASES: 31: backend, {30}, assembler, (host-sycl)
+// JIT_AOT_PHASES: 32: assembler, {31}, object, (host-sycl)
+// JIT_AOT_PHASES: 33: clang-offload-bundler, {3, 25, 32}, object, (host-sycl)
+
+// Consume object and library that contain final device images.
+// RUN: %clangxx -fsycl --target=x86_64-unknown-linux-gnu -### \
+// RUN: %S/Inputs/SYCL/objgenimage.o %s 2>&1 \
+// RUN: | FileCheck %s -check-prefix=CONSUME_OBJ
+// CONSUME_OBJ: clang-offload-bundler{{.*}} "-type=o" "-targets=sycl-spir64_gen_image-unknown-unknown" "-input={{.*}}objgenimage.o" "-output=[[DEVICE_IMAGE_OBJ:.+\.o]]
+// CONSUME_OBJ: ld{{.*}} "[[DEVICE_IMAGE_OBJ]]"
+
+// RUN: %clangxx -fsycl --target=x86_64-unknown-linux-gnu -### \
+// RUN: %S/Inputs/SYCL/libgenimage.a %s 2>&1 \
+// RUN: | FileCheck %s -check-prefix=CONSUME_LIB
+// CONSUME_LIB: clang-offload-bundler{{.*}} "-type=aoo" "-targets=sycl-spir64_gen_image-unknown-unknown" "-input={{.*}}libgenimage.a" "-output=[[DEVICE_IMAGE_LIB:.+\.txt]]
+// CONSUME_LIB: ld{{.*}} "@[[DEVICE_IMAGE_LIB]]"

llvm/include/llvm/TargetParser/Triple.h

Lines changed: 10 additions & 0 deletions
@@ -156,8 +156,11 @@ class Triple {
     MipsSubArch_r6,

     SPIRSubArch_fpga,
+    SPIRSubArch_fpga_image,
     SPIRSubArch_gen,
+    SPIRSubArch_gen_image,
     SPIRSubArch_x86_64,
+    SPIRSubArch_x86_64_image,

     PPCSubArch_spe,

@@ -785,6 +788,13 @@ class Triple {
     return getArch() == Triple::spir || getArch() == Triple::spir64;
   }

+  /// Tests whether the target is SPIR and AOT related.
+  bool isSPIRAOT() const {
+    return isSPIR() && (getSubArch() == Triple::SPIRSubArch_fpga ||
+                        getSubArch() == Triple::SPIRSubArch_gen ||
+                        getSubArch() == Triple::SPIRSubArch_x86_64);
+  }
+
   /// Tests whether the target is SPIR-V (32/64-bit/Logical).
   bool isSPIRV() const {
     return getArch() == Triple::spirv32 || getArch() == Triple::spirv64 ||

llvm/lib/TargetParser/Triple.cpp

Lines changed: 6 additions & 0 deletions
@@ -694,10 +694,16 @@ static Triple::SubArchType parseSubArch(StringRef SubArchName) {
     if (SA.consume_front("spir64_") || SA.consume_front("spir_")) {
       if (SA == "fpga")
         return Triple::SPIRSubArch_fpga;
+      else if (SA == "fpga_image")
+        return Triple::SPIRSubArch_fpga_image;
       else if (SA == "gen")
         return Triple::SPIRSubArch_gen;
+      else if (SA == "gen_image")
+        return Triple::SPIRSubArch_gen_image;
       else if (SA == "x86_64")
         return Triple::SPIRSubArch_x86_64;
+      else if (SA == "x86_64_image")
+        return Triple::SPIRSubArch_x86_64_image;
     }
   }
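
To see how the new sub-arch names interact with `isSPIRAOT()`, here is a standalone sketch that mirrors the parsing and the predicate without depending on LLVM. The enum and function names are hypothetical stand-ins, and `consumeFront` only imitates what `StringRef::consume_front` does in the hunk above.

```cpp
#include <string>

// Hypothetical mirror of the SPIR sub-arch values touched by this commit.
enum class SpirSubArch { None, Fpga, FpgaImage, Gen, GenImage, X86_64, X86_64Image };

// Strips Prefix from the front of S if present and reports whether it did.
static bool consumeFront(std::string &S, const std::string &Prefix) {
  if (S.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  S.erase(0, Prefix.size());
  return true;
}

// Maps an arch name such as "spir64_gen_image" to its sub-arch value.
SpirSubArch parseSpirSubArch(std::string Arch) {
  if (!consumeFront(Arch, "spir64_") && !consumeFront(Arch, "spir_"))
    return SpirSubArch::None;
  if (Arch == "fpga") return SpirSubArch::Fpga;
  if (Arch == "fpga_image") return SpirSubArch::FpgaImage;
  if (Arch == "gen") return SpirSubArch::Gen;
  if (Arch == "gen_image") return SpirSubArch::GenImage;
  if (Arch == "x86_64") return SpirSubArch::X86_64;
  if (Arch == "x86_64_image") return SpirSubArch::X86_64Image;
  return SpirSubArch::None;
}

// Same set as isSPIRAOT() in the Triple.h hunk: only the three original AOT
// sub-arches qualify. The real method additionally checks isSPIR().
bool isSpirAot(SpirSubArch SA) {
  return SA == SpirSubArch::Fpga || SA == SpirSubArch::Gen ||
         SA == SpirSubArch::X86_64;
}
```

One consequence worth noting, going only by the diff as shown: the `_image` variants parse to their own values but are not treated as AOT by the predicate, which is consistent with using them purely as labels for already-final device images.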

sycl/doc/UsersManual.md

Lines changed: 8 additions & 0 deletions
@@ -22,6 +22,14 @@ and not recommended to use in production environment.
 You can specify more than one target, comma separated. Default just in time
 (JIT) compilation target can be added to the list to produce a combination
 of AOT and JIT code in the resulting fat binary.
+
+Normally, '-fsycl-targets' is specified when linking an application, in
+which case the AOT compiled device binaries are embedded within the
+application’s fat executable. However, this option may also be used in
+combination with '-c' and '-fno-sycl-rdc' when compiling a source file.
+In this case, the AOT compiled device binaries are embedded within the fat
+object file.
+
 The following triples are supported by default:
 * spir64 - this is the default generic SPIR-V target;
 * spir64_x86_64 - generate code ahead of time for x86_64 CPUs;

sycl/doc/design/CompilerAndRuntimeDesign.md

Lines changed: 24 additions & 0 deletions
@@ -423,6 +423,30 @@ Case 1 can be identified in the device binary generation stage (step 1) by
 scanning the known kernels. Case 2 must be verified by the driver by checking
 for newly introduced kernels in the final link stage (step 3).

+#### Device Link during compilation
+
+The `-fno-sycl-rdc` flag can be used in combination with the `-c` option
+when generating fat objects. This option combination informs the compiler to
+perform a full device link stage against the device object, creating a fat
+object that contains the corresponding host object and a fully compiled device
+binary. It is expected that usage of `-fno-sycl-rdc` coincide with
+ahead of time compiling.
+
+When using the generated fat object in this case, the compiler will recognize
+the fat object that contains the fully linked device binary. The device binary
+will be unbundled and linked during the final host link and will not be sent
+through any additional device linking steps.
+
+1. Generation of fat object: a.cpp -> a_fat.o (contains host object and full
+device image)
+2. Linking: a_fat.o -> executable
+
+The generation of the full device image during the compilation (-c) step of
+creating the object allows for library creation that does not require full
+device linking steps which can be a burden to the user. Providing these early
+device linking steps give the provider of the archives/objects a better user
+experience.
+
 #### Device code post-link step

 At link time all the device code is linked into
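
The consumption side described in the design text (final device images bypass the device link and go straight to the host link) can be sketched as a simple partition over the bundle target names found in a fat object. This is an illustrative model only, not the driver's implementation: `LinkPlan`, `isFinalImageTarget`, and `planLink` are hypothetical names, and targets are assumed to use the `<kind>-<arch>-<vendor>-<os>` spelling seen in the tests.

```cpp
#include <string>
#include <vector>

// Hypothetical result of classifying the device sections of a fat object.
struct LinkPlan {
  std::vector<std::string> HostLinkInputs;   // Final images: skip device link.
  std::vector<std::string> DeviceLinkInputs; // Everything else: device link.
};

// True when the arch component of "<kind>-<arch>-..." ends in "_image".
bool isFinalImageTarget(const std::string &Target) {
  const std::string::size_type KindEnd = Target.find('-');
  if (KindEnd == std::string::npos)
    return false;
  const std::string::size_type ArchEnd = Target.find('-', KindEnd + 1);
  const std::string Arch = Target.substr(
      KindEnd + 1,
      ArchEnd == std::string::npos ? std::string::npos : ArchEnd - KindEnd - 1);
  const std::string Suffix = "_image";
  return Arch.size() > Suffix.size() &&
         Arch.compare(Arch.size() - Suffix.size(), Suffix.size(), Suffix) == 0;
}

// Splits bundle targets into those sent directly to the host link and those
// that still need device linking. Host entries are ignored here because the
// host object is handled separately.
LinkPlan planLink(const std::vector<std::string> &BundledTargets) {
  LinkPlan Plan;
  for (const std::string &Target : BundledTargets) {
    if (Target.compare(0, 5, "host-") == 0)
      continue;
    if (isFinalImageTarget(Target))
      Plan.HostLinkInputs.push_back(Target);
    else
      Plan.DeviceLinkInputs.push_back(Target);
  }
  return Plan;
}
```

Under these assumptions, a fat object built with mixed `spir64,spir64_gen` targets would contribute its `sycl-spir64_gen_image-unknown-unknown` section to the host link and its `sycl-spir64-unknown-unknown` section to the regular device link.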
