[Driver][SYCL] Enable early AOT abilities when creating objects #11130

Merged
merged 26 commits into from
Oct 24, 2023
Commits
0e782ed
[Driver][SYCL] Enable early AOT abilities when creating objects
mdtoguchi Sep 7, 2023
955c848
Clang format
mdtoguchi Sep 8, 2023
b3e37ca
Fix behaviors associated with -fsycl-add-targets
mdtoguchi Sep 9, 2023
5d655d8
Address a few review comments
mdtoguchi Sep 11, 2023
1a048c2
Update triple usage to designate the bundle
mdtoguchi Sep 14, 2023
934b9a1
Address a few more review comments
mdtoguchi Sep 14, 2023
54a41bf
Add LIT testing for -ftarget-device-link behaviors
mdtoguchi Sep 15, 2023
8f98ea2
Add missed files for testing
mdtoguchi Sep 15, 2023
e5cc2ce
Fixup lambda function camelcase
mdtoguchi Sep 15, 2023
cfb779d
Introduce design elements for the option to the design doc and update…
mdtoguchi Sep 17, 2023
87ae106
clang-format
mdtoguchi Sep 17, 2023
5f751a4
Update some variable names and comments to address reviews
mdtoguchi Sep 20, 2023
876a077
Adjust implementation to follow updated option specification
mdtoguchi Sep 28, 2023
760cfcd
Add some -fno-sycl-rdc usage information with -fsycl-targets
mdtoguchi Sep 28, 2023
733dbed
Fix LIT test for Windows
mdtoguchi Sep 29, 2023
d275188
Remove old option spelling
mdtoguchi Oct 3, 2023
2132ce8
Address a few review comments
mdtoguchi Oct 3, 2023
3c9f857
Add isSPIRAOT check for the target triple
mdtoguchi Oct 3, 2023
7a0b26a
Merge remote-tracking branch 'intel_llvm/sycl' into early-device-link
mdtoguchi Oct 3, 2023
26dd679
Improve variable names and doc for appendSYCLDeviceLink
mdtoguchi Oct 3, 2023
748a9b2
Clang Format
mdtoguchi Oct 11, 2023
a11b121
Fix word usage
mdtoguchi Oct 11, 2023
cf03de1
Add a few tests to check JIT target behavior and another clang format
mdtoguchi Oct 11, 2023
87b7986
Merge remote-tracking branch 'intel_llvm/sycl' into early-device-link
mdtoguchi Oct 12, 2023
3367f72
Fix test to use test sycl device libs
mdtoguchi Oct 12, 2023
8cd0dc9
Merge remote-tracking branch 'intel_llvm/sycl' into early-device-link
mdtoguchi Oct 24, 2023
9 changes: 9 additions & 0 deletions clang/include/clang/Driver/Action.h
@@ -629,11 +629,20 @@ class OffloadUnbundlingJobAction final : public JobAction {
DependentOffloadKind(DependentOffloadKind) {}
};

/// Allow for a complete override of the target to unbundle.
/// This is used for specific unbundles used for SYCL AOT when generating full
/// device files that are bundled with the host object.
void setTargetString(std::string Target) { TargetString = Target; }

std::string getTargetString() const { return TargetString; }

private:
/// Container that keeps information about each dependence of this unbundling
/// action.
SmallVector<DependentActionInfo, 6> DependentActionInfoArray;

std::string TargetString;

public:
// Offloading unbundling doesn't change the type of output.
OffloadUnbundlingJobAction(Action *Input);
9 changes: 8 additions & 1 deletion clang/include/clang/Driver/Options.td
@@ -3858,7 +3858,14 @@ def ftarget_register_alloc_mode_EQ : Joined<["-"], "ftarget-register-alloc-mode=
HelpText<"Specify a register allocation mode for specific hardware for use by supported "
"target backends.">;
def : Flag<["-"], "fsycl-rdc">, Visibility<[ClangOption, CLOption, DXCOption]>, Alias<fgpu_rdc>;
-def : Flag<["-"], "fno-sycl-rdc">, Visibility<[ClangOption, CLOption, DXCOption]>, Alias<fno_gpu_rdc>;
+def : Flag<["-"], "fno-sycl-rdc">,
+Visibility<[ClangOption, CLOption, DXCOption]>, Alias<fno_gpu_rdc>,
+HelpText<"Generate relocatable device code during SYCL offload target "
+"compilation. Use of ‘-fno-sycl-rdc’ in combination with ‘-c’ will "
+"produce final device binaries within the generated fat object. "
+"When using this option, each kernel must be self-contained within "
+"its translation unit (source file). Therefore, the use of "
+"SYCL_EXTERNAL is disallowed when this option is enabled.">;
def fsycl_optimize_non_user_code : Flag<["-"], "fsycl-optimize-non-user-code">,
Visibility<[ClangOption, CLOption, DXCOption, CC1Option]>,
MarshallingInfoFlag<CodeGenOpts<"OptimizeSYCLFramework">>,
712 changes: 421 additions & 291 deletions clang/lib/Driver/Driver.cpp

Large diffs are not rendered by default.

30 changes: 28 additions & 2 deletions clang/lib/Driver/ToolChains/Clang.cpp
@@ -9216,7 +9216,24 @@ void OffloadBundler::ConstructJob(Compilation &C, const JobAction &JA,
? Action::GetOffloadKindName(Action::OFK_SYCL)
: Action::GetOffloadKindName(CurKind);
Triples += '-';
-Triples += CurTC->getTriple().normalize();
+// Incoming DeviceArch is set, break down the Current triple and add the
+// device arch value to it.
+// This is done for AOT targets only.
+std::string DeviceArch;
+llvm::Triple TargetTriple(CurTC->getTriple());
+if (CurKind == Action::OFK_SYCL && TargetTriple.isSPIRAOT() &&
+tools::SYCL::shouldDoPerObjectFileLinking(C))
+DeviceArch = std::string("image");
+if (CurKind != Action::OFK_Host && !DeviceArch.empty()) {
+llvm::Triple T(CurTC->getTriple());
+SmallString<128> ArchName(CurTC->getArchName());
+ArchName += "_";
+ArchName += DeviceArch.data();
+T.setArchName(ArchName);
+Triples += T.normalize();
+} else {
+Triples += CurTC->getTriple().normalize();
+}
if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_OpenMP ||
CurKind == Action::OFK_Cuda || CurKind == Action::OFK_SYCL) &&
!StringRef(CurDep->getOffloadingArch()).empty() &&
@@ -9467,7 +9484,16 @@ void OffloadBundler::ConstructJobMultipleOutputs(
Triples += '-';
Triples += types::getTypeName(types::TY_FPGA_Dependencies);
}
-CmdArgs.push_back(TCArgs.MakeArgString(Triples));
+std::string TargetString(UA.getTargetString());
+if (!TargetString.empty()) {
+// The target string was provided, we will override the defaults and use
+// the string provided.
+SmallString<128> TSTriple("-targets=");
+TSTriple += TargetString;
+CmdArgs.push_back(TCArgs.MakeArgString(TSTriple));
+} else {
+CmdArgs.push_back(TCArgs.MakeArgString(Triples));
+}

// Get bundled file command.
CmdArgs.push_back(
3 changes: 2 additions & 1 deletion clang/lib/Driver/ToolChains/Gnu.cpp
@@ -597,7 +597,8 @@ void tools::gnutools::Linker::ConstructJob(Compilation &C, const JobAction &JA,
// linked archives. The unbundled information is a list of files and not
// an actual object/archive. Take that list and pass those to the linker
// instead of the original object.
-if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+if (JA.isDeviceOffloading(Action::OFK_OpenMP) ||
+JA.isOffloading(Action::OFK_SYCL)) {
InputInfoList UpdatedInputs;
// Go through the Inputs to the link. When a listfile is encountered, we
// know it is an unbundled generated list.
Binary file added clang/test/Driver/Inputs/SYCL/libgenimage.a
Binary file not shown.
Binary file added clang/test/Driver/Inputs/SYCL/objgenimage.o
Binary file not shown.
131 changes: 131 additions & 0 deletions clang/test/Driver/sycl-early-device-link.cpp
@@ -0,0 +1,131 @@
// Testing for early AOT device linking. These tests use -fno-sycl-rdc
// -c to create final device binaries during the link step when using -fsycl.
// Behavior is restricted to spir64_gen targets for now.

// Create object that contains final device image
// RUN: %clangxx -c -fno-sycl-rdc -fsycl -fsycl-targets=spir64_gen \
// RUN: --target=x86_64-unknown-linux-gnu -Xsycl-target-backend \
// RUN: "-device skl" --sysroot=%S/Inputs/SYCL -### %s 2>&1 \
// RUN: | FileCheck %s -check-prefix=CREATE_IMAGE
// CREATE_IMAGE: clang{{.*}} "-triple" "spir64_gen-unknown-unknown"{{.*}} "-fsycl-is-device"{{.*}} "-o" "[[DEVICE_BC:.+\.bc]]"
// CREATE_IMAGE: llvm-link{{.*}} "-o" "[[LIB_DEVICE_BC:.+\.bc]]"
// CREATE_IMAGE: llvm-link{{.*}} "[[DEVICE_BC]]" "[[LIB_DEVICE_BC]]"{{.*}} "-o" "[[FINAL_DEVICE_BC:.+\.bc]]"
// CREATE_IMAGE: sycl-post-link{{.*}} "-o" "[[POSTLINK_TABLE:.+\.table]]" "[[FINAL_DEVICE_BC]]"
// CREATE_IMAGE: file-table-tform{{.*}} "-o" "[[TFORM_TXT:.+\.txt]]" "[[POSTLINK_TABLE]]"
// CREATE_IMAGE: llvm-spirv{{.*}} "-o" "[[LLVMSPIRV_TXT:.+\.txt]]"{{.*}} "[[TFORM_TXT]]"
// CREATE_IMAGE: ocloc{{.*}} "-output" "[[OCLOC_OUT:.+\.out]]" "-file" "[[LLVMSPIRV_TXT]]"{{.*}} "-device" "skl"
// CREATE_IMAGE: file-table-tform{{.*}} "-o" "[[TFORM_TABLE:.+\.table]]" "[[POSTLINK_TABLE]]" "[[OCLOC_OUT]]"
// CREATE_IMAGE: clang-offload-wrapper{{.*}} "-o=[[WRAPPER_BC:.+\.bc]]"
// CREATE_IMAGE: llc{{.*}} "-o" "[[DEVICE_OBJECT:.+\.o]]" "[[WRAPPER_BC]]"
// CREATE_IMAGE: append-file{{.*}} "--output=[[APPEND_SOURCE:.+\.cpp]]
// CREATE_IMAGE: clang{{.*}} "-fsycl-is-host"{{.*}} "-o" "[[HOST_OBJECT:.+\.o]]"{{.*}} "[[APPEND_SOURCE]]"
// CREATE_IMAGE: clang-offload-bundler{{.*}} "-targets=sycl-spir64_gen_image-unknown-unknown,host-x86_64-unknown-linux-gnu" "-output={{.*}}" "-input=[[DEVICE_OBJECT]]" "-input=[[HOST_OBJECT]]"

// RUN: %clangxx -c -fno-sycl-rdc -fsycl -fsycl-targets=spir64_gen \
// RUN: --target=x86_64-unknown-linux-gnu -Xsycl-target-backend \
// RUN: "-device skl" --sysroot=%S/Inputs/SYCL -ccc-print-phases %s \
// RUN: -fno-sycl-device-lib=all 2>&1 \
// RUN: | FileCheck %s -check-prefix=CREATE_IMAGE_PHASES
// CREATE_IMAGE_PHASES: 0: input, "[[INPUT:.+\.cpp]]", c++, (device-sycl)
// CREATE_IMAGE_PHASES: 1: preprocessor, {0}, c++-cpp-output, (device-sycl)
// CREATE_IMAGE_PHASES: 2: compiler, {1}, ir, (device-sycl)
// CREATE_IMAGE_PHASES: 3: input, "{{.*libsycl-itt-user-wrappers.o.*}}", object
// CREATE_IMAGE_PHASES: 4: clang-offload-unbundler, {3}, object
// CREATE_IMAGE_PHASES: 5: offload, " (spir64_gen-unknown-unknown)" {4}, object
// CREATE_IMAGE_PHASES: 6: input, "{{.*libsycl-itt-compiler-wrappers.o.*}}", object
// CREATE_IMAGE_PHASES: 7: clang-offload-unbundler, {6}, object
// CREATE_IMAGE_PHASES: 8: offload, " (spir64_gen-unknown-unknown)" {7}, object
// CREATE_IMAGE_PHASES: 9: input, "{{.*libsycl-itt-stubs.o.*}}", object
// CREATE_IMAGE_PHASES: 10: clang-offload-unbundler, {9}, object
// CREATE_IMAGE_PHASES: 11: offload, " (spir64_gen-unknown-unknown)" {10}, object
// CREATE_IMAGE_PHASES: 12: linker, {5, 8, 11}, ir, (device-sycl)
// CREATE_IMAGE_PHASES: 13: linker, {2, 12}, ir, (device-sycl)
// CREATE_IMAGE_PHASES: 14: sycl-post-link, {13}, tempfiletable, (device-sycl)
// CREATE_IMAGE_PHASES: 15: file-table-tform, {14}, tempfilelist, (device-sycl)
// CREATE_IMAGE_PHASES: 16: llvm-spirv, {15}, tempfilelist, (device-sycl)
// CREATE_IMAGE_PHASES: 17: backend-compiler, {16}, image, (device-sycl)
// CREATE_IMAGE_PHASES: 18: file-table-tform, {14, 17}, tempfiletable, (device-sycl)
// CREATE_IMAGE_PHASES: 19: clang-offload-wrapper, {18}, object, (device-sycl)
// CREATE_IMAGE_PHASES: 20: offload, "device-sycl (spir64_gen-unknown-unknown)" {19}, object
// CREATE_IMAGE_PHASES: 21: offload, "device-sycl (spir64_gen-unknown-unknown)" {20}, object
// CREATE_IMAGE_PHASES: 22: input, "[[INPUT]]", c++, (host-sycl)
// CREATE_IMAGE_PHASES: 23: append-footer, {22}, c++, (host-sycl)
// CREATE_IMAGE_PHASES: 24: preprocessor, {23}, c++-cpp-output, (host-sycl)
// CREATE_IMAGE_PHASES: 25: offload, "host-sycl (x86_64-unknown-linux-gnu)" {24}, "device-sycl (spir64_gen-unknown-unknown)" {20}, c++-cpp-output
// CREATE_IMAGE_PHASES: 26: compiler, {25}, ir, (host-sycl)
// CREATE_IMAGE_PHASES: 27: backend, {26}, assembler, (host-sycl)
// CREATE_IMAGE_PHASES: 28: assembler, {27}, object, (host-sycl)
// CREATE_IMAGE_PHASES: 29: clang-offload-bundler, {21, 28}, object, (host-sycl)

// Use of -fno-sycl-rdc -c with non-AOT should not perform the device link.
// RUN: %clangxx -c -fno-sycl-rdc -fsycl -fsycl-targets=spir64 \
// RUN: --target=x86_64-unknown-linux-gnu -ccc-print-phases %s \
// RUN: -fno-sycl-device-lib=all 2>&1 \
// RUN: | FileCheck %s -check-prefix=JIT_ONLY_PHASES
// JIT_ONLY_PHASES: 0: input, "[[INPUT:.+\.cpp]]", c++, (device-sycl)
// JIT_ONLY_PHASES: 1: preprocessor, {0}, c++-cpp-output, (device-sycl)
// JIT_ONLY_PHASES: 2: compiler, {1}, ir, (device-sycl)
// JIT_ONLY_PHASES: 3: offload, "device-sycl (spir64-unknown-unknown)" {2}, ir
// JIT_ONLY_PHASES: 4: input, "[[INPUT]]", c++, (host-sycl)
// JIT_ONLY_PHASES: 5: append-footer, {4}, c++, (host-sycl)
// JIT_ONLY_PHASES: 6: preprocessor, {5}, c++-cpp-output, (host-sycl)
// JIT_ONLY_PHASES: 7: offload, "host-sycl (x86_64-unknown-linux-gnu)" {6}, "device-sycl (spir64-unknown-unknown)" {2}, c++-cpp-output
// JIT_ONLY_PHASES: 8: compiler, {7}, ir, (host-sycl)
// JIT_ONLY_PHASES: 9: backend, {8}, assembler, (host-sycl)
// JIT_ONLY_PHASES: 10: assembler, {9}, object, (host-sycl)
// JIT_ONLY_PHASES: 11: clang-offload-bundler, {3, 10}, object, (host-sycl)

// Mix and match JIT and AOT phases check. Expectation is for AOT to perform
// early device link, and JIT to just produce the LLVM-IR.
// RUN: %clangxx -c -fno-sycl-rdc -fsycl -fsycl-targets=spir64,spir64_gen \
// RUN: --target=x86_64-unknown-linux-gnu --sysroot=%S/Inputs/SYCL \
// RUN: -Xsycl-target-backend=spir64_gen "-device skl" \
// RUN: -ccc-print-phases %s -fno-sycl-device-lib=all 2>&1 \
// RUN: | FileCheck %s -check-prefix=JIT_AOT_PHASES
// JIT_AOT_PHASES: 0: input, "[[INPUT:.+\.cpp]]", c++, (device-sycl)
// JIT_AOT_PHASES: 1: preprocessor, {0}, c++-cpp-output, (device-sycl)
// JIT_AOT_PHASES: 2: compiler, {1}, ir, (device-sycl)
// JIT_AOT_PHASES: 3: offload, "device-sycl (spir64-unknown-unknown)" {2}, ir
// JIT_AOT_PHASES: 4: input, "[[INPUT]]", c++, (device-sycl)
// JIT_AOT_PHASES: 5: preprocessor, {4}, c++-cpp-output, (device-sycl)
// JIT_AOT_PHASES: 6: compiler, {5}, ir, (device-sycl)
// JIT_AOT_PHASES: 7: input, "{{.*libsycl-itt-user-wrappers.o.*}}", object
// JIT_AOT_PHASES: 8: clang-offload-unbundler, {7}, object
// JIT_AOT_PHASES: 9: offload, " (spir64_gen-unknown-unknown)" {8}, object
// JIT_AOT_PHASES: 10: input, "{{.*libsycl-itt-compiler-wrappers.o.*}}", object
// JIT_AOT_PHASES: 11: clang-offload-unbundler, {10}, object
// JIT_AOT_PHASES: 12: offload, " (spir64_gen-unknown-unknown)" {11}, object
// JIT_AOT_PHASES: 13: input, "{{.*libsycl-itt-stubs.o.*}}", object
// JIT_AOT_PHASES: 14: clang-offload-unbundler, {13}, object
// JIT_AOT_PHASES: 15: offload, " (spir64_gen-unknown-unknown)" {14}, object
// JIT_AOT_PHASES: 16: linker, {9, 12, 15}, ir, (device-sycl)
// JIT_AOT_PHASES: 17: linker, {6, 16}, ir, (device-sycl)
// JIT_AOT_PHASES: 18: sycl-post-link, {17}, tempfiletable, (device-sycl)
// JIT_AOT_PHASES: 19: file-table-tform, {18}, tempfilelist, (device-sycl)
// JIT_AOT_PHASES: 20: llvm-spirv, {19}, tempfilelist, (device-sycl)
// JIT_AOT_PHASES: 21: backend-compiler, {20}, image, (device-sycl)
// JIT_AOT_PHASES: 22: file-table-tform, {18, 21}, tempfiletable, (device-sycl)
// JIT_AOT_PHASES: 23: clang-offload-wrapper, {22}, object, (device-sycl)
// JIT_AOT_PHASES: 24: offload, "device-sycl (spir64_gen-unknown-unknown)" {23}, object
// JIT_AOT_PHASES: 25: offload, "device-sycl (spir64_gen-unknown-unknown)" {24}, object
// JIT_AOT_PHASES: 26: input, "[[INPUT]]", c++, (host-sycl)
// JIT_AOT_PHASES: 27: append-footer, {26}, c++, (host-sycl)
// JIT_AOT_PHASES: 28: preprocessor, {27}, c++-cpp-output, (host-sycl)
// JIT_AOT_PHASES: 29: offload, "host-sycl (x86_64-unknown-linux-gnu)" {28}, "device-sycl (spir64_gen-unknown-unknown)" {24}, c++-cpp-output
// JIT_AOT_PHASES: 30: compiler, {29}, ir, (host-sycl)
// JIT_AOT_PHASES: 31: backend, {30}, assembler, (host-sycl)
// JIT_AOT_PHASES: 32: assembler, {31}, object, (host-sycl)
// JIT_AOT_PHASES: 33: clang-offload-bundler, {3, 25, 32}, object, (host-sycl)

// Consume object and library that contain final device images.
// RUN: %clangxx -fsycl --target=x86_64-unknown-linux-gnu -### \
// RUN: %S/Inputs/SYCL/objgenimage.o %s 2>&1 \
// RUN: | FileCheck %s -check-prefix=CONSUME_OBJ
// CONSUME_OBJ: clang-offload-bundler{{.*}} "-type=o" "-targets=sycl-spir64_gen_image-unknown-unknown" "-input={{.*}}objgenimage.o" "-output=[[DEVICE_IMAGE_OBJ:.+\.o]]
// CONSUME_OBJ: ld{{.*}} "[[DEVICE_IMAGE_OBJ]]"

// RUN: %clangxx -fsycl --target=x86_64-unknown-linux-gnu -### \
// RUN: %S/Inputs/SYCL/libgenimage.a %s 2>&1 \
// RUN: | FileCheck %s -check-prefix=CONSUME_LIB
// CONSUME_LIB: clang-offload-bundler{{.*}} "-type=aoo" "-targets=sycl-spir64_gen_image-unknown-unknown" "-input={{.*}}libgenimage.a" "-output=[[DEVICE_IMAGE_LIB:.+\.txt]]
// CONSUME_LIB: ld{{.*}} "@[[DEVICE_IMAGE_LIB]]"
10 changes: 10 additions & 0 deletions llvm/include/llvm/TargetParser/Triple.h
@@ -156,8 +156,11 @@ class Triple {
MipsSubArch_r6,

SPIRSubArch_fpga,
SPIRSubArch_fpga_image,
SPIRSubArch_gen,
SPIRSubArch_gen_image,
SPIRSubArch_x86_64,
SPIRSubArch_x86_64_image,

PPCSubArch_spe,

@@ -785,6 +788,13 @@ class Triple {
return getArch() == Triple::spir || getArch() == Triple::spir64;
}

/// Tests whether the target is SPIR and AOT related.
bool isSPIRAOT() const {
return isSPIR() && (getSubArch() == Triple::SPIRSubArch_fpga ||
getSubArch() == Triple::SPIRSubArch_gen ||
getSubArch() == Triple::SPIRSubArch_x86_64);
}

/// Tests whether the target is SPIR-V (32/64-bit/Logical).
bool isSPIRV() const {
return getArch() == Triple::spirv32 || getArch() == Triple::spirv64 ||
6 changes: 6 additions & 0 deletions llvm/lib/TargetParser/Triple.cpp
@@ -694,10 +694,16 @@ static Triple::SubArchType parseSubArch(StringRef SubArchName) {
if (SA.consume_front("spir64_") || SA.consume_front("spir_")) {
if (SA == "fpga")
return Triple::SPIRSubArch_fpga;
else if (SA == "fpga_image")
return Triple::SPIRSubArch_fpga_image;
else if (SA == "gen")
return Triple::SPIRSubArch_gen;
else if (SA == "gen_image")
return Triple::SPIRSubArch_gen_image;
else if (SA == "x86_64")
return Triple::SPIRSubArch_x86_64;
else if (SA == "x86_64_image")
return Triple::SPIRSubArch_x86_64_image;
}
}

8 changes: 8 additions & 0 deletions sycl/doc/UsersManual.md
@@ -22,6 +22,14 @@ and not recommended to use in production environment.
You can specify more than one target, comma separated. Default just in time
(JIT) compilation target can be added to the list to produce a combination
of AOT and JIT code in the resulting fat binary.

Normally, '-fsycl-targets' is specified when linking an application, in
which case the AOT compiled device binaries are embedded within the
application’s fat executable. However, this option may also be used in
combination with '-c' and '-fno-sycl-rdc' when compiling a source file.
In this case, the AOT compiled device binaries are embedded within the fat
object file.

The following triples are supported by default:
* spir64 - this is the default generic SPIR-V target;
* spir64_x86_64 - generate code ahead of time for x86_64 CPUs;
24 changes: 24 additions & 0 deletions sycl/doc/design/CompilerAndRuntimeDesign.md
@@ -423,6 +423,30 @@ Case 1 can be identified in the device binary generation stage (step 1) by
scanning the known kernels. Case 2 must be verified by the driver by checking
for newly introduced kernels in the final link stage (step 3).

#### Device Link during compilation

The `-fno-sycl-rdc` flag can be used in combination with the `-c` option
when generating fat objects. This combination tells the compiler to
perform a full device link stage against the device object, creating a fat
object that contains the corresponding host object and a fully compiled device
binary. Usage of `-fno-sycl-rdc` is expected to coincide with
ahead-of-time (AOT) compilation.

When using the generated fat object in this case, the compiler will recognize
the fat object that contains the fully linked device binary. The device binary
will be unbundled and linked during the final host link and will not be sent
through any additional device linking steps.

1. Generation of fat object: a.cpp -> a_fat.o (contains host object and full
device image)
2. Linking: a_fat.o -> executable

Generating the full device image during the compilation (`-c`) step of
creating the object allows for library creation that does not require full
device linking steps, which can be a burden to the user. Performing these
device linking steps early gives the provider of the archives/objects a better
user experience.

#### Device code post-link step

At link time all the device code is linked into