-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Driver][SYCL] Add initial SYCL offload compilation support #107493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Introduces the SYCL based toolchain and initial toolchain construction when using the '-fsycl' option. This option will enable SYCL based offloading, creating a SPIR-V based IR file packaged into the compiled host object. This includes early support for creating the host/device object using the new offloading model. The device object is created using the spir64-unknown-unknown target triple. New/Updated Options: -fsycl Enables SYCL offloading for host and device -fsycl-device-only Enables device only compilation for SYCL -fsycl-host-only Enables host only compilation for SYCL RFC Reference: https://discourse.llvm.org/t/rfc-sycl-driver-enhancements/74092
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-driver Author: Michael Toguchi (mdtoguchi) ChangesIntroduces the SYCL based toolchain and initial toolchain construction when using the '-fsycl' option. This option will enable SYCL based offloading, creating a SPIR-V based IR file packaged into the compiled host object. This includes early support for creating the host/device object using the new offloading model. The device object is created using the spir64-unknown-unknown target triple. New/Updated Options: RFC Reference: https://discourse.llvm.org/t/rfc-sycl-driver-enhancements/74092 Patch is 40.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107493.diff 20 Files Affected:
diff --git a/clang/include/clang/Driver/Action.h b/clang/include/clang/Driver/Action.h
index 04fa8b01b418f8..feeabae89d6b1c 100644
--- a/clang/include/clang/Driver/Action.h
+++ b/clang/include/clang/Driver/Action.h
@@ -94,6 +94,7 @@ class Action {
OFK_Cuda = 0x02,
OFK_OpenMP = 0x04,
OFK_HIP = 0x08,
+ OFK_SYCL = 0x10,
};
static const char *getClassName(ActionClass AC);
diff --git a/clang/include/clang/Driver/Driver.h b/clang/include/clang/Driver/Driver.h
index 9177d56718ee77..ddadefaa079dd5 100644
--- a/clang/include/clang/Driver/Driver.h
+++ b/clang/include/clang/Driver/Driver.h
@@ -582,6 +582,10 @@ class Driver {
/// @name Helper Methods
/// @{
+ /// MakeSYCLDeviceTriple - Returns the SYCL device triple for the
+ /// specified ArchType.
+ llvm::Triple MakeSYCLDeviceTriple(StringRef TargetArch = "spir64") const;
+
/// PrintActions - Print the list of actions.
void PrintActions(const Compilation &C) const;
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 83cf753e824845..bd632f9270f382 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -182,7 +182,8 @@ def opencl_Group : OptionGroup<"<opencl group>">, Group<f_Group>,
DocName<"OpenCL options">;
def sycl_Group : OptionGroup<"<SYCL group>">, Group<f_Group>,
- DocName<"SYCL options">;
+ DocName<"SYCL options">,
+ Visibility<[ClangOption, CLOption]>;
def cuda_Group : OptionGroup<"<CUDA group>">, Group<f_Group>,
DocName<"CUDA options">,
@@ -6691,12 +6692,16 @@ defm : FlangIgnoredDiagOpt<"frontend-loop-interchange">;
defm : FlangIgnoredDiagOpt<"target-lifetime">;
// C++ SYCL options
+let Group = sycl_Group in {
def fsycl : Flag<["-"], "fsycl">,
- Visibility<[ClangOption, CLOption]>,
- Group<sycl_Group>, HelpText<"Enables SYCL kernels compilation for device">;
+ HelpText<"Enables SYCL kernels compilation for device">;
def fno_sycl : Flag<["-"], "fno-sycl">,
- Visibility<[ClangOption, CLOption]>,
- Group<sycl_Group>, HelpText<"Disables SYCL kernels compilation for device">;
+ HelpText<"Disables SYCL kernels compilation for device">;
+def fsycl_device_only : Flag<["-"], "fsycl-device-only">,
+ Alias<offload_device_only>, HelpText<"Compile SYCL kernels for device only">;
+def fsycl_host_only : Flag<["-"], "fsycl-host-only">,
+ Alias<offload_host_only>, HelpText<"Compile SYCL kernels for host only">;
+} // let Group = sycl_Group
// OS-specific options
let Flags = [TargetSpecific] in {
diff --git a/clang/include/clang/Driver/ToolChain.h b/clang/include/clang/Driver/ToolChain.h
index 5347e29be91439..0bc2a28bceb923 100644
--- a/clang/include/clang/Driver/ToolChain.h
+++ b/clang/include/clang/Driver/ToolChain.h
@@ -762,6 +762,10 @@ class ToolChain {
virtual void AddHIPIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const;
+ /// Add arguments to use SYCL specific includes.
+ virtual void AddSYCLIncludeArgs(const llvm::opt::ArgList &DriverArgs,
+ llvm::opt::ArgStringList &CC1Args) const;
+
/// Add arguments to use MCU GCC toolchain includes.
virtual void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const;
diff --git a/clang/lib/Driver/Action.cpp b/clang/lib/Driver/Action.cpp
index 849bf6035ebd2e..23dbcebc9a1ca1 100644
--- a/clang/lib/Driver/Action.cpp
+++ b/clang/lib/Driver/Action.cpp
@@ -111,6 +111,8 @@ std::string Action::getOffloadingKindPrefix() const {
return "device-openmp";
case OFK_HIP:
return "device-hip";
+ case OFK_SYCL:
+ return "device-sycl";
// TODO: Add other programming models here.
}
@@ -128,6 +130,8 @@ std::string Action::getOffloadingKindPrefix() const {
Res += "-hip";
if (ActiveOffloadKindMask & OFK_OpenMP)
Res += "-openmp";
+ if (ActiveOffloadKindMask & OFK_SYCL)
+ Res += "-sycl";
// TODO: Add other programming models here.
@@ -164,6 +168,8 @@ StringRef Action::GetOffloadKindName(OffloadKind Kind) {
return "openmp";
case OFK_HIP:
return "hip";
+ case OFK_SYCL:
+ return "sycl";
// TODO: Add other programming models here.
}
@@ -320,7 +326,7 @@ void OffloadAction::DeviceDependences::add(Action &A, const ToolChain &TC,
DeviceBoundArchs.push_back(BoundArch);
// Add each active offloading kind from a mask.
- for (OffloadKind OKind : {OFK_OpenMP, OFK_Cuda, OFK_HIP})
+ for (OffloadKind OKind : {OFK_OpenMP, OFK_Cuda, OFK_HIP, OFK_SYCL})
if (OKind & OffloadKindMask)
DeviceOffloadKinds.push_back(OKind);
}
diff --git a/clang/lib/Driver/CMakeLists.txt b/clang/lib/Driver/CMakeLists.txt
index 32a4378ab499fa..4cc5f730424cd5 100644
--- a/clang/lib/Driver/CMakeLists.txt
+++ b/clang/lib/Driver/CMakeLists.txt
@@ -85,6 +85,7 @@ add_clang_library(clangDriver
ToolChains/PPCFreeBSD.cpp
ToolChains/InterfaceStubs.cpp
ToolChains/ZOS.cpp
+ ToolChains/SYCL.cpp
Types.cpp
XRayArgs.cpp
diff --git a/clang/lib/Driver/Compilation.cpp b/clang/lib/Driver/Compilation.cpp
index ad077d5bbfa69a..283487548f4cf2 100644
--- a/clang/lib/Driver/Compilation.cpp
+++ b/clang/lib/Driver/Compilation.cpp
@@ -217,10 +217,11 @@ static bool ActionFailed(const Action *A,
if (FailingCommands.empty())
return false;
- // CUDA/HIP can have the same input source code compiled multiple times so do
- // not compiled again if there are already failures. It is OK to abort the
- // CUDA pipeline on errors.
- if (A->isOffloading(Action::OFK_Cuda) || A->isOffloading(Action::OFK_HIP))
+ // CUDA/HIP/SYCL can have the same input source code compiled multiple times
+ // so do not compile again if there are already failures. It is OK to abort
+ // the CUDA pipeline on errors.
+ if (A->isOffloading(Action::OFK_Cuda) || A->isOffloading(Action::OFK_HIP) ||
+ A->isOffloading(Action::OFK_SYCL))
return true;
for (const auto &CI : FailingCommands)
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 43002add33774b..b8d7401f6e01cc 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -43,6 +43,7 @@
#include "ToolChains/PS4CPU.h"
#include "ToolChains/RISCVToolchain.h"
#include "ToolChains/SPIRV.h"
+#include "ToolChains/SYCL.h"
#include "ToolChains/Solaris.h"
#include "ToolChains/TCE.h"
#include "ToolChains/VEToolchain.h"
@@ -767,6 +768,26 @@ Driver::OpenMPRuntimeKind Driver::getOpenMPRuntime(const ArgList &Args) const {
return RT;
}
+static const char *getDefaultSYCLArch(Compilation &C) {
+ if (C.getDefaultToolChain().getTriple().getArch() == llvm::Triple::x86)
+ return "spir";
+ return "spir64";
+}
+
+static bool addSYCLDefaultTriple(Compilation &C,
+ SmallVectorImpl<llvm::Triple> &SYCLTriples) {
+ for (const auto &SYCLTriple : SYCLTriples) {
+ if (SYCLTriple.getSubArch() == llvm::Triple::NoSubArch &&
+ SYCLTriple.isSPIROrSPIRV())
+ return false;
+ }
+ // Add the default triple as it was not found.
+ llvm::Triple DefaultTriple =
+ C.getDriver().MakeSYCLDeviceTriple(getDefaultSYCLArch(C));
+ SYCLTriples.insert(SYCLTriples.begin(), DefaultTriple);
+ return true;
+}
+
void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
InputList &Inputs) {
@@ -979,6 +1000,44 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
return;
}
+ //
+ // SYCL
+ //
+ // We need to generate a SYCL toolchain if the user specified -fsycl.
+ // If -fsycl is supplied we will assume SPIR-V.
+ bool IsSYCL =
+ C.getInputArgs().hasFlag(options::OPT_fsycl, options::OPT_fno_sycl,
+ false);
+
+ auto argSYCLIncompatible = [&](OptSpecifier OptId) {
+ if (!IsSYCL)
+ return;
+ if (Arg *IncompatArg = C.getInputArgs().getLastArg(OptId))
+ Diag(clang::diag::err_drv_argument_not_allowed_with)
+ << IncompatArg->getSpelling() << "-fsycl";
+ };
+ // -static-libstdc++ is not compatible with -fsycl.
+ argSYCLIncompatible(options::OPT_static_libstdcxx);
+ // -ffreestanding cannot be used with -fsycl
+ argSYCLIncompatible(options::OPT_ffreestanding);
+
+ llvm::SmallVector<llvm::Triple, 4> UniqueSYCLTriplesVec;
+
+ // If -fsycl is supplied we will assume SPIR-V.
+ if (IsSYCL) {
+ addSYCLDefaultTriple(C, UniqueSYCLTriplesVec);
+
+ // We'll need to use the SYCL and host triples as the key into
+ // getOffloadingDeviceToolChain, because the device toolchains we're
+ // going to create will depend on both.
+ const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
+ for (const auto &TT : UniqueSYCLTriplesVec) {
+ auto SYCLTC = &getOffloadingDeviceToolChain(C.getInputArgs(), TT, *HostTC,
+ Action::OFK_SYCL);
+ C.addOffloadDeviceToolChain(SYCLTC, Action::OFK_SYCL);
+ }
+ }
+
//
// TODO: Add support for other offloading programming models here.
//
@@ -2001,6 +2060,19 @@ void Driver::PrintHelp(bool ShowHidden) const {
VisibilityMask);
}
+llvm::Triple Driver::MakeSYCLDeviceTriple(StringRef TargetArch) const {
+ SmallVector<StringRef, 5> SYCLAlias = { "spir", "spir64", "spirv32", "spirv64"};
+ if (std::find(SYCLAlias.begin(), SYCLAlias.end(), TargetArch) !=
+ SYCLAlias.end()) {
+ llvm::Triple TT;
+ TT.setArchName(TargetArch);
+ TT.setVendor(llvm::Triple::UnknownVendor);
+ TT.setOS(llvm::Triple::UnknownOS);
+ return TT;
+ }
+ return llvm::Triple(TargetArch);
+}
+
void Driver::PrintVersion(const Compilation &C, raw_ostream &OS) const {
if (IsFlangMode()) {
OS << getClangToolFullVersion("flang-new") << '\n';
@@ -4155,6 +4227,7 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
bool UseNewOffloadingDriver =
C.isOffloadingHostKind(Action::OFK_OpenMP) ||
+ C.isOffloadingHostKind(Action::OFK_SYCL) ||
Args.hasFlag(options::OPT_foffload_via_llvm,
options::OPT_fno_offload_via_llvm, false) ||
Args.hasFlag(options::OPT_offload_new_driver,
@@ -4565,6 +4638,8 @@ Driver::getOffloadArchs(Compilation &C, const llvm::opt::DerivedArgList &Args,
Archs.insert(OffloadArchToString(OffloadArch::HIPDefault));
else if (Kind == Action::OFK_OpenMP)
Archs.insert(StringRef());
+ else if (Kind == Action::OFK_SYCL)
+ Archs.insert(StringRef());
} else {
Args.ClaimAllArgs(options::OPT_offload_arch_EQ);
Args.ClaimAllArgs(options::OPT_no_offload_arch_EQ);
@@ -4589,7 +4664,7 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
OffloadAction::DeviceDependences DDeps;
const Action::OffloadKind OffloadKinds[] = {
- Action::OFK_OpenMP, Action::OFK_Cuda, Action::OFK_HIP};
+ Action::OFK_OpenMP, Action::OFK_Cuda, Action::OFK_HIP, Action::OFK_SYCL};
for (Action::OffloadKind Kind : OffloadKinds) {
SmallVector<const ToolChain *, 2> ToolChains;
@@ -4634,6 +4709,11 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
break;
}
+ // Assemble actions are not used for the SYCL device side. Both compile
+ // and backend actions are used to generate IR and textual IR if needed.
+ if (Kind == Action::OFK_SYCL && Phase == phases::Assemble)
+ continue;
+
auto TCAndArch = TCAndArchs.begin();
for (Action *&A : DeviceActions) {
if (A->getType() == types::TY_Nothing)
@@ -4867,12 +4947,13 @@ Action *Driver::ConstructPhaseAction(
return C.MakeAction<BackendJobAction>(Input, Output);
}
if (Args.hasArg(options::OPT_emit_llvm) ||
- (((Input->getOffloadingToolChain() &&
- Input->getOffloadingToolChain()->getTriple().isAMDGPU()) ||
- TargetDeviceOffloadKind == Action::OFK_HIP) &&
- (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
- false) ||
- TargetDeviceOffloadKind == Action::OFK_OpenMP))) {
+ (TargetDeviceOffloadKind == Action::OFK_SYCL ||
+ (((Input->getOffloadingToolChain() &&
+ Input->getOffloadingToolChain()->getTriple().isAMDGPU()) ||
+ TargetDeviceOffloadKind == Action::OFK_HIP) &&
+ (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
+ false) ||
+ TargetDeviceOffloadKind == Action::OFK_OpenMP)))) {
types::ID Output =
Args.hasArg(options::OPT_S) &&
(TargetDeviceOffloadKind == Action::OFK_None ||
@@ -6553,6 +6634,20 @@ const ToolChain &Driver::getOffloadingDeviceToolChain(
HostTC, Args);
break;
}
+ case Action::OFK_SYCL:
+ // TODO: Add additional Arch values for Ahead of Time support for SYCL.
+ switch (Target.getArch()) {
+ case llvm::Triple::spir:
+ case llvm::Triple::spir64:
+ case llvm::Triple::spirv32:
+ case llvm::Triple::spirv64:
+ TC = std::make_unique<toolchains::SYCLToolChain>(*this, Target, HostTC,
+ Args);
+ break;
+ default:
+ break;
+ }
+ break;
default:
break;
}
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 76901875c66959..f5b7837e3e3fc4 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1441,6 +1441,9 @@ void ToolChain::AddCudaIncludeArgs(const ArgList &DriverArgs,
void ToolChain::AddHIPIncludeArgs(const ArgList &DriverArgs,
ArgStringList &CC1Args) const {}
+void ToolChain::AddSYCLIncludeArgs(const ArgList &DriverArgs,
+ ArgStringList &CC1Args) const {}
+
llvm::SmallVector<ToolChain::BitCodeLibraryInfo, 12>
ToolChain::getDeviceLibs(const ArgList &DriverArgs) const {
return {};
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index df86941950e46e..c8398113313bc4 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -24,6 +24,7 @@
#include "Hexagon.h"
#include "MSP430.h"
#include "PS4CPU.h"
+#include "SYCL.h"
#include "clang/Basic/CLWarnings.h"
#include "clang/Basic/CharInfo.h"
#include "clang/Basic/CodeGenOptions.h"
@@ -1073,6 +1074,8 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA,
getToolChain().AddCudaIncludeArgs(Args, CmdArgs);
if (JA.isOffloading(Action::OFK_HIP))
getToolChain().AddHIPIncludeArgs(Args, CmdArgs);
+ if (JA.isOffloading(Action::OFK_SYCL))
+ getToolChain().AddSYCLIncludeArgs(Args, CmdArgs);
// If we are offloading to a target via OpenMP we need to include the
// openmp_wrappers folder which contains alternative system headers.
@@ -4969,17 +4972,21 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
// second input. Module precompilation accepts a list of header files to
// include as part of the module. API extraction accepts a list of header
// files whose API information is emitted in the output. All other jobs are
- // expected to have exactly one input.
+ // expected to have exactly one input. SYCL compilation only expects a
+ // single input.
bool IsCuda = JA.isOffloading(Action::OFK_Cuda);
bool IsCudaDevice = JA.isDeviceOffloading(Action::OFK_Cuda);
bool IsHIP = JA.isOffloading(Action::OFK_HIP);
bool IsHIPDevice = JA.isDeviceOffloading(Action::OFK_HIP);
+ bool IsSYCL = JA.isOffloading(Action::OFK_SYCL);
+ bool IsSYCLDevice = JA.isDeviceOffloading(Action::OFK_SYCL);
bool IsOpenMPDevice = JA.isDeviceOffloading(Action::OFK_OpenMP);
bool IsExtractAPI = isa<ExtractAPIJobAction>(JA);
bool IsDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||
JA.isDeviceOffloading(Action::OFK_Host));
bool IsHostOffloadingAction =
JA.isHostOffloading(Action::OFK_OpenMP) ||
+ JA.isHostOffloading(Action::OFK_SYCL) ||
(JA.isHostOffloading(C.getActiveOffloadKinds()) &&
Args.hasFlag(options::OPT_offload_new_driver,
options::OPT_no_offload_new_driver, false));
@@ -5029,10 +5036,10 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
bool IsWindowsMSVC = RawTriple.isWindowsMSVCEnvironment();
bool IsIAMCU = RawTriple.isOSIAMCU();
- // Adjust IsWindowsXYZ for CUDA/HIP compilations. Even when compiling in
+ // Adjust IsWindowsXYZ for CUDA/HIP/SYCL compilations. Even when compiling in
// device mode (i.e., getToolchain().getTriple() is NVPTX/AMDGCN, not
// Windows), we need to pass Windows-specific flags to cc1.
- if (IsCuda || IsHIP)
+ if (IsCuda || IsHIP || IsSYCL)
IsWindowsMSVC |= AuxTriple && AuxTriple->isWindowsMSVCEnvironment();
// C++ is not supported for IAMCU.
@@ -5116,15 +5123,39 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
if (const Arg *PF = Args.getLastArg(options::OPT_mprintf_kind_EQ))
PF->claim();
- if (Args.hasFlag(options::OPT_fsycl, options::OPT_fno_sycl, false)) {
+ Arg *SYCLStdArg = Args.getLastArg(options::OPT_sycl_std_EQ);
+
+ if (IsSYCLDevice) {
+ // Host triple is needed when doing SYCL device compilations.
+ llvm::Triple AuxT = C.getDefaultToolChain().getTriple();
+ std::string NormalizedTriple = AuxT.normalize();
+ CmdArgs.push_back("-aux-triple");
+ CmdArgs.push_back(Args.MakeArgString(NormalizedTriple));
+
+ // We want to compile sycl kernels.
CmdArgs.push_back("-fsycl-is-device");
- if (Arg *A = Args.getLastArg(options::OPT_sycl_std_EQ)) {
- A->render(Args, CmdArgs);
+ // Set O2 optimization level by default
+ if (!Args.getLastArg(options::OPT_O_Group))
+ CmdArgs.push_back("-O2");
+ }
+
+ if (IsSYCL) {
+ // Set options for both host and device
+ if (SYCLStdArg) {
+ SYCLStdArg->render(Args, CmdArgs);
} else {
// Ensure the default version in SYCL mode is 2020.
CmdArgs.push_back("-sycl-std=2020");
}
+
+ // Add any options that are needed specific to SYCL offload while
+ // performing the host side compilation.
+ if (!IsSYCLDevice) {
+ // Let the front-end host compilation flow know about SYCL offload
+ // compilation.
+ CmdArgs.push_back("-fsycl-is-host");
+ }
}
if (IsOpenMPDevice) {
@@ -6061,7 +6092,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
// Prepare `-aux-target-cpu` and `-aux-target-feature` unless
// `--gpu-use-aux-triple-only` is specified.
if (!Args.getLastArg(options::OPT_gpu_use_aux_triple_only) &&
- (IsCudaDevice || IsHIPDevice)) {
+ (IsCudaDevice || IsHIPDevice || IsSYCLDevice)) {
const ArgList &HostArgs =
C.getArgsForToolChain(nullptr, StringRef(), Action::OFK_None);
std::string HostCPU =
diff --git a/clang/lib/Driver/ToolChains/Gnu.cpp b/clang/lib/Driver/ToolChains/Gnu.cpp
index 52c2ee90b1b286..129413af750614 100644
--- a/clang/lib/Driver/ToolChains/Gnu.cpp
+++ b/clang/lib/Driver/ToolChains/Gnu.cpp
@@ -3051,7 +3051,8 @@ bool Generic_GCC::GCCInstallationDetector::ScanGentooGccConfig(
Generic_GCC::Generic_GCC(const Driver &D, const llvm::Triple &Triple,
const ArgList &Args)
: ToolChain(D, Triple, Args), GCCInstallation(D),
- CudaInstallation(D, Triple, Args), RocmInstallation(D, Triple, Args) {
+ CudaInstallation(D, Triple, Args), RocmInstallation(D, Triple, Args),
+ SYCLInstallation(D) {
getProgramPaths().push_back(getDriver().Dir);
}
diff --git a/clang/lib/Driver/ToolChains/Gnu.h b/clang/lib/Driver/ToolChains/Gnu.h
index 0b664a182d75e1..4c5ecf5401746a 100644
--- a/clang/lib/Driver/ToolChains/Gnu.h
+++ b/clang/lib/Driver/ToolChains/Gnu.h
@@ -12,6 +12,7 @@
#include "Cuda.h"
#include "LazyDetector.h"
#include "ROCm.h"
+#include "SYCL.h"
#include "clang/Driver/Tool.h"
#include "clang/Driver/ToolChain.h"
#include <set>
@@ -288,6 +289,7 @@ class LLVM_LIBRARY_VISIBILITY Generic_GCC : public ToolChain {
GCCInstallationDetector GCCInstallation;
LazyDetector<CudaInstallationDetector> CudaInstallation;
LazyDetector<RocmInstallationDetector> RocmInstallation;
+ SYCLInstallationDetector SYCLInsta...
[truncated]
|
clang/lib/Driver/ToolChains/SYCL.cpp
Outdated
// -fcf-protection, -fsanitize, -fprofile-generate, -fprofile-instr-generate | ||
// -ftest-coverage, -fcoverage-mapping, -fcreate-profile, -fprofile-arcs | ||
// -fcs-profile-generate -forder-file-instrumentation, --coverage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect this comment to get out of date rather quickly; I'd probably drop the list and let users read the function body instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to add each of the option names in a comment on the same line as the OPT_
prefixed name below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I have some questions and pointed out some minor concerns.
clang/lib/Driver/Driver.cpp
Outdated
// SYCL | ||
// | ||
// We need to generate a SYCL toolchain if the user specified -fsycl. | ||
// If -fsycl is supplied we will assume SPIR-V. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is repeated below where it appears to be more relevant to the surrounding code. I recommend removing this one.
// If -fsycl is supplied we will assume SPIR-V. |
clang/lib/Driver/Driver.cpp
Outdated
if (C.getDefaultToolChain().getTriple().getArch() == llvm::Triple::x86) | ||
return "spir"; | ||
return "spir64"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these perhaps be "spirv32" and "spirv64"? Driver::MakeSYCLDeviceTriple()
below seems to indicate that distinct architecture targets exist and comments elsewhere indicate that SPIR-V (not SPIR) is the intended default.
clang/lib/Driver/Driver.cpp
Outdated
|
||
llvm::SmallVector<llvm::Triple, 4> UniqueSYCLTriplesVec; | ||
|
||
// If -fsycl is supplied we will assume SPIR-V. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this comment doesn't seem all that helpful here either. Perhaps it would be better colocated with addSYCLDefaultTriple()
. Hmm, no, perhaps getDefaultSYCLArch()
then?
clang/lib/Driver/Driver.cpp
Outdated
@@ -6553,6 +6634,20 @@ const ToolChain &Driver::getOffloadingDeviceToolChain( | |||
HostTC, Args); | |||
break; | |||
} | |||
case Action::OFK_SYCL: | |||
// TODO: Add additional Arch values for Ahead of Time support for SYCL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since AoT support doesn't exist in Clang yet, this TODO comment seems premature.
// Set O2 optimization level by default | ||
if (!Args.getLastArg(options::OPT_O_Group)) | ||
CmdArgs.push_back("-O2"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation for defaulting the optimization level to something different than the host? I find this surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original implementation was based off of defaults that were applied for OpenCL: https://github.com/llvm/llvm-project/blob/main/clang/lib/Frontend/CompilerInvocation.cpp#L697-L700
Had a short interaction with @bader and he stated optimizations by default are critical for SYCL, as it adds a bit of C++ code like template instantiations to user's code.
clang/lib/Driver/ToolChains/SYCL.h
Outdated
if (this->HostTC.getTriple().isWindowsMSVCEnvironment()) | ||
return this->HostTC.getDefaultDebugFormat(); | ||
return ToolChain::getDefaultDebugFormat(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When/why would the host debug format not be what is desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this can probably be cleaned up.
clang/lib/Driver/ToolChains/SYCL.h
Outdated
const ToolChain &HostTC; | ||
|
||
SYCLInstallationDetector SYCLInstallation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const ToolChain &HostTC; | |
SYCLInstallationDetector SYCLInstallation; | |
private: | |
const ToolChain &HostTC; | |
SYCLInstallationDetector SYCLInstallation; |
// Tests whether the target is SPIR-V or SPIR. Currently, we use spir-based | ||
// target triples to represent JIT compilation targets for SYCL. We will | ||
// transition to using spir-v based target triples to represent JIT | ||
// compilation targets for SYCL very soon. This helper function is used | ||
// (instead of isSPIR) to ease that transition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments that attempt to predict the future have a tendency to either be wrong or to predict the past ;)
- Improve commenting for unsupported sanitizer options - Update visibility of the HostTC, SYCLInstallation and debug info - cleanup un-needed header file inclusions
- Use spirv32/spirv64 as default triple arch - clean up a number of comments - Fix file ordering inclusion to be alphabetical
Hello - any additional feedback on this? |
The house rule is to ping: |
Ping |
@bader, thanks for adding the SYCL label, can you also take a look at this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdtoguchi, thank you for working on this!
Overall, looks good to me, but I made a few suggestions in the comments.
It would be great if @tahonermann and @AaronBallman can confirm that all their comments are addressed.
- update a number of comments to include SYCL - update function names to better fit standards - use generic check for 32-bit
Thanks for the review @bader, I have made updates and hope to have addressed your concerns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. No other comments from my side.
✅ With the latest revision this PR passed the C/C++ code formatter. |
Ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, any concerns from you @MaskRay?
Ping |
Given that it's been a week, I'm merging; if @MaskRay has any concerns, they can be addressed post commit. |
@mdtoguchi Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/11689 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/46/builds/7938 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/9546 Here is the relevant piece of the build log for the reference
|
I reverted the changes due to the failing bots; you can make a new PR with the issues fixed and resubmit for a quick review. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/23/builds/4847 Here is the relevant piece of the build log for the reference
|
Thanks @AaronBallman. I'll look into it. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/13/builds/3540 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/1609 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/2686 Here is the relevant piece of the build log for the reference
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was late to re-review these changes before they landed. However, since the changes have been reverted, I took the opportunity to review again and I have a number of concerns that I would like to see addressed before these changes land again. Though I've noted the issues in this PR, I expect a new PR will be needed to address them and re-land the revised patch.
/// getSYCLDeviceTriple - Returns the SYCL device triple for the | ||
/// specified ArchType. | ||
llvm::Triple getSYCLDeviceTriple(StringRef TargetArch = "spir64") const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like getSYCLDeviceTriple()
needs to be exposed in the header. Can this just be a static
function in Driver.cpp
? There are similar static
functions there for CUDA and HIP; see getNVIDIAOffloadTargetTriple()
and getHIPOffloadTargetTriple()
. I think this should follow that precedent if there isn't a good reason to deviate.
def fsycl : Flag<["-"], "fsycl">, | ||
Visibility<[ClangOption, CLOption]>, | ||
Group<sycl_Group>, HelpText<"Enables SYCL kernels compilation for device">; | ||
HelpText<"Enables SYCL kernels compilation for device">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing help text could use some improvement,. How about simply "Enable SYCL C++ extensions" (and "Disable SYCL C++ extensions" for -fno-sycl
below)?
def fsycl_device_only : Flag<["-"], "fsycl-device-only">, | ||
Alias<offload_device_only>, HelpText<"Compile SYCL kernels for device only">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"kernels" seems out of place in the help text. How about more closely following the existing help text for CUDA? "Compile SYCL code for device only".
def fsycl_host_only : Flag<["-"], "fsycl-host-only">, | ||
Alias<offload_host_only>, HelpText<"Compile SYCL kernels for host only">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above comment; "kernels" seems out of place here. There is also an interesting deviation from the equivalent option for CUDA:
def cuda_host_only : Flag<["--"], "cuda-host-only">, Alias<offload_host_only>,
HelpText<"Compile CUDA code for host only. Has no effect on non-CUDA compilations.">;
SYCL should presumably follow suit here and add the "Has no effect on non-SYCL compilations." qualification (and the behavior of the option should presumably match).
default: | ||
break; | ||
} | ||
break; | ||
default: | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that something needs to be done in these default cases. In this context, TC
is known to be null, but falling through leads to an unconditional dereference of it below. Some llvm_unreachable()
or similar calls seem appropriate.
@@ -288,6 +289,7 @@ class LLVM_LIBRARY_VISIBILITY Generic_GCC : public ToolChain { | |||
GCCInstallationDetector GCCInstallation; | |||
LazyDetector<CudaInstallationDetector> CudaInstallation; | |||
LazyDetector<RocmInstallationDetector> RocmInstallation; | |||
SYCLInstallationDetector SYCLInstallation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A LazyDetector
is used for both CUDA and ROCm; any reason not to do similarly for SYCL?
@@ -138,6 +142,7 @@ class LLVM_LIBRARY_VISIBILITY MSVCToolChain : public ToolChain { | |||
llvm::ToolsetLayout VSLayout = llvm::ToolsetLayout::OlderVS; | |||
LazyDetector<CudaInstallationDetector> CudaInstallation; | |||
LazyDetector<RocmInstallationDetector> RocmInstallation; | |||
SYCLInstallationDetector SYCLInstallation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use LazyDetector
here?
// Add the SYCL header search locations in the specified order. | ||
// ../include/sycl | ||
// ../include/sycl/stl_wrappers | ||
// ../include |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but this looks like it could result in ../include
being added to the command line multiple times. That can be problematic for relative include path orders and, specifically, for #include_next
. I agree that support for #include <sycl/sycl.hpp>
should be included, but I'm skeptical that potentially adding the same path multiple times is the right way to do that.
I think there are other issues here as well. SYCLInstallationDetector
doesn't actually appear to do any detecting. Is the intention that future changes will add probing of installation candidates? Regardless, I think any additions to CC1Args
should be conditional on an installation having been detected. As is, the InstallationCandidates
member doesn't appear to serve any purpose.
options::OPT_fsanitize_EQ, // -fsanitize | ||
options::OPT_fcf_protection_EQ, // -fcf-protection | ||
options::OPT_fprofile_generate, | ||
options::OPT_fprofile_generate_EQ, | ||
options::OPT_fno_profile_generate, // -f[no-]profile-generate | ||
options::OPT_ftest_coverage, | ||
options::OPT_fno_test_coverage, // -f[no-]test-coverage | ||
options::OPT_fcoverage_mapping, | ||
options::OPT_fno_coverage_mapping, // -f[no-]coverage-mapping | ||
options::OPT_coverage, // --coverage | ||
options::OPT_fprofile_instr_generate, | ||
options::OPT_fprofile_instr_generate_EQ, | ||
options::OPT_fno_profile_instr_generate, // -f[no-]profile-instr-generate | ||
options::OPT_fprofile_arcs, | ||
options::OPT_fno_profile_arcs, // -f[no-]profile-arcs | ||
options::OPT_fcreate_profile, // -fcreate-profile | ||
options::OPT_fprofile_instr_use, | ||
options::OPT_fprofile_instr_use_EQ, // -fprofile-instr-use | ||
options::OPT_forder_file_instrumentation, // -forder-file-instrumentation | ||
options::OPT_fcs_profile_generate, // -fcs-profile-generate | ||
options::OPT_fcs_profile_generate_EQ}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please align the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tahonermann, the InstallationDetector
model was used mimicking the Cuda detectors. There isn't really an installation to detect here, as everything should be part of the compiler build. Perhaps the naming is not ideal maybe something more in the lines of SYCLLocation
or just SYCLInstallation
? I'll clean out the InstallationCandidates
as it is not pertinent.
CC1Args.push_back("-internal-isystem"); | ||
CC1Args.push_back(DriverArgs.MakeArgString(STLWrappersPath)); | ||
CC1Args.push_back("-internal-isystem"); | ||
CC1Args.push_back(DriverArgs.MakeArgString(IncludePath)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these additions be contingent on --nobuiltininc
having not been specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to do so, I'll make those adjustments.
…lvm#107493) Introduces the SYCL based toolchain and initial toolchain construction when using the '-fsycl' option. This option will enable SYCL based offloading, creating a SPIR-V based IR file packaged into the compiled host object. This includes early support for creating the host/device object using the new offloading model. The device object is created using the spir64-unknown-unknown target triple. New/Updated Options: -fsycl Enables SYCL offloading for host and device -fsycl-device-only Enables device only compilation for SYCL -fsycl-host-only Enables host only compilation for SYCL RFC Reference: https://discourse.llvm.org/t/rfc-sycl-driver-enhancements/74092 Was reverted due to buildbot issues. Contains additional fixes to pull in the SYCL header dependencies to other toolchains.
) Introduces the SYCL based toolchain and initial toolchain construction when using the '-fsycl' option. This option will enable SYCL based offloading, creating a SPIR-V based IR file packaged into the compiled host object. This includes early support for creating the host/device object using the new offloading model. The device object is created using the spir64-unknown-unknown target triple. New/Updated Options: -fsycl Enables SYCL offloading for host and device -fsycl-device-only Enables device only compilation for SYCL -fsycl-host-only Enables host only compilation for SYCL RFC Reference: https://discourse.llvm.org/t/rfc-sycl-driver-enhancements/74092
Introduces the SYCL based toolchain and initial toolchain construction when using the '-fsycl' option. This option will enable SYCL based offloading, creating a SPIR-V based IR file packaged into the compiled host object. This includes early support for creating the host/device object using the new offloading model. The device object is created using the spir64-unknown-unknown target triple. New/Updated Options: -fsycl Enables SYCL offloading for host and device -fsycl-device-only Enables device only compilation for SYCL -fsycl-host-only Enables host only compilation for SYCL RFC Reference: https://discourse.llvm.org/t/rfc-sycl-driver-enhancements/74092 This is a reland of: #107493
…t (#117268) Introduces the SYCL based toolchain and initial toolchain construction when using the '-fsycl' option. This option will enable SYCL based offloading, creating a SPIR-V based IR file packaged into the compiled host object. This includes early support for creating the host/device object using the new offloading model. The device object is created using the spir64-unknown-unknown target triple. New/Updated Options: -fsycl Enables SYCL offloading for host and device -fsycl-device-only Enables device only compilation for SYCL -fsycl-host-only Enables host only compilation for SYCL RFC Reference: https://discourse.llvm.org/t/rfc-sycl-driver-enhancements/74092 This is a reland of: llvm/llvm-project#107493
Introduces the SYCL based toolchain and initial toolchain construction when using the '-fsycl' option. This option will enable SYCL based offloading, creating a SPIR-V based IR file packaged into the compiled host object.
This includes early support for creating the host/device object using the new offloading model. The device object is created using the spir64-unknown-unknown target triple.
New/Updated Options:
-fsycl Enables SYCL offloading for host and device
-fsycl-device-only
Enables device only compilation for SYCL
-fsycl-host-only
Enables host only compilation for SYCL
RFC Reference: https://discourse.llvm.org/t/rfc-sycl-driver-enhancements/74092