-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[CUDA] Include PTX in non-RDC mode using the new driver #84367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C, | |
DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); | ||
OffloadAction::DeviceDependences DDep; | ||
DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); | ||
|
||
// Compiling CUDA in non-RDC mode uses the PTX output if available. | ||
for (Action *Input : A->getInputs()) | ||
if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object && | ||
!Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not quite sure why we would need to include PTX for RDC compilation. In retrospect, including PTX by default with all compilations turned out to be a wrong default choice. Switching to the new driver is a good point to make a better choice. I would argue that we should not be including PTX by default or, if we do deem that it may be useful, only add it for the most recent chosen GPU variant, to provide some forward compatibility, not for all of them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I don't have my finger on the pulse of the CUDA users here. I think we want this patch to match the current behavior with |
||
false)) | ||
DDep.add(*Input, *TCAndArch->first, TCAndArch->second.data(), Kind); | ||
OffloadActions.push_back(C.MakeAction<OffloadAction>(DDep, A->getType())); | ||
|
||
++TCAndArch; | ||
} | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -503,18 +503,20 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA, | |
Exec, CmdArgs, Inputs, Output)); | ||
} | ||
|
||
static bool shouldIncludePTX(const ArgList &Args, const char *gpu_arch) { | ||
bool includePTX = true; | ||
for (Arg *A : Args) { | ||
if (!(A->getOption().matches(options::OPT_cuda_include_ptx_EQ) || | ||
A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ))) | ||
continue; | ||
static bool shouldIncludePTX(const ArgList &Args, StringRef InputArch) { | ||
// The new driver does not include PTX by default to avoid overhead. | ||
bool includePTX = !Args.hasFlag(options::OPT_offload_new_driver, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd add a comment on why we're making this decision based on the new vs old driver. |
||
options::OPT_no_offload_new_driver, false); | ||
for (Arg *A : Args.filtered(options::OPT_cuda_include_ptx_EQ, | ||
options::OPT_no_cuda_include_ptx_EQ)) { | ||
A->claim(); | ||
const StringRef ArchStr = A->getValue(); | ||
if (ArchStr == "all" || ArchStr == gpu_arch) { | ||
includePTX = A->getOption().matches(options::OPT_cuda_include_ptx_EQ); | ||
continue; | ||
} | ||
if (A->getOption().matches(options::OPT_cuda_include_ptx_EQ) && | ||
(ArchStr == "all" || ArchStr == InputArch)) | ||
includePTX = true; | ||
else if (A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ) && | ||
(ArchStr == "all" || ArchStr == InputArch)) | ||
includePTX = false; | ||
} | ||
return includePTX; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still respect
--cuda-include-ptx=...
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the current behavior is that it will "always" set the PTX in the job and optionally include it in the
fatbinary
job depending on those settings.