[AMDGPU] Enable OpenCL hostcall printf (WIP) #72556
base: main
Changes from all commits
9b0b337
62b5f5a
1b9e2ba
f937af4
e05ffbd
47b927a
de80946
f428579
c69c80a
0f8f4f9
99b1559
04b29e0
cb9dedf
cdacec7
5b73f02
37e0fe4
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
@@ -49,6 +49,7 @@
 #include "clang/Basic/SourceLocation.h"
 #include "clang/Basic/SourceManager.h"
 #include "clang/Basic/Specifiers.h"
+#include "clang/Basic/TargetBuiltins.h"
 #include "clang/Basic/TargetCXXABI.h"
 #include "clang/Basic/TargetInfo.h"
 #include "clang/Basic/Visibility.h"
@@ -3616,6 +3617,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
   if (!ConsiderWrapperFunctions && getStorageClass() == SC_Static)
     return 0;
 
+  // AMDGCN implementation supports printf as a builtin
+  // for OpenCL
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+      Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
+    return BuiltinID;
Comment on lines +3620 to +3624

This does not belong here and has nothing to do with AMDGPU.

The signatures of C printf and OpenCL printf differ, and I don't think generic builtin handling provides a way to register overloaded builtins with "shared" builtin IDs. This needs a new ID. Do you have any alternative suggestions here?

I thought this had been clarified earlier too. It's quite imprecise to just say that "signatures differ". Perhaps the following detailed explanation might move the conversation forward. The problem is that the OpenCL printf expects a format string in the constant address space, which has no representation in Clang builtins. What we do have is the ability to specify an address-space number in the builtin declaration. But this number is target-specific, which makes the whole builtin target-specific. Is there a way around that magic number 4?

The only other alternative I currently see is to modify Sema (probably ActOnFunctionDeclarator) so that we map the OpenCL printf declaration to the C printf builtin ID. This would be a really hacky solution, and I would prefer this implementation.

ping @arsenm

The builtin specifications are also in terms of the lang address space, not the target address space (this was an ugly compromise to make builtins work at all in OpenCL).

@arsenm, thanks for the info. CustomTypeChecking is a valid option. I'm not sure why the OpenCL community did not consider this change despite the OpenCL specs specifying the details. I could create a separate patch for this (probably folks from the OpenCL community would provide further background). Meanwhile, can this go ahead as an AMDGPU-specific workaround for now so that we have the intended feature in place? (The frontend changes here can be reverted with that follow-up patch.)

PS: Ah, I see another issue. OpenCL v1.2 s6.9.f states that none of the functions defined in C99 headers are available. This would mean std printf is supposed to be treated differently from OpenCL builtins, and consequently the builtin IDs assigned to them "need" to be different. If this understanding is correct, moving ahead with using the same builtin ID as std printf is not the right way (probably this is why such an implementation was never considered).

No. That cleanup will never happen.
That's not what that means.

I was referring to llvm-project/clang/lib/AST/Decl.cpp Line 3633 in 3e6db60

I think that check is buggy. The printf declaration doesn't come from a header, but the printf function does exist in 1.2+. It probably needs to special case printf.
+
   // OpenCL v1.2 s6.9.f - The library functions defined in
   // the C99 standard headers are not available.
   if (Context.getLangOpts().OpenCL &&
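The review thread above turns on how the builtin-ID lookup filters library functions in OpenCL. As a rough standalone model (this is not Clang's code; `Lang`, `BuiltinId`, `lookupLibBuiltin`, and `resolveBuiltin` are hypothetical names introduced only for illustration), the sketch below contrasts the plain "no C99 library functions in OpenCL" rule with the printf special case that the last comment suggests.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for Clang concepts; none of these names exist in Clang.
enum class Lang { C, OpenCL };
enum BuiltinId : unsigned { NotBuiltin = 0, BI_printf = 1, BI_memcpy = 2 };

// Toy lookup table from a library function name to a builtin ID.
BuiltinId lookupLibBuiltin(const std::string &name) {
  if (name == "printf") return BI_printf;
  if (name == "memcpy") return BI_memcpy;
  return NotBuiltin;
}

// Models the OpenCL v1.2 s6.9.f filter: C99 library functions are rejected
// in OpenCL. When specialCasePrintf is set, printf is let through, because
// OpenCL 1.2+ defines printf itself rather than taking it from a C header.
BuiltinId resolveBuiltin(Lang lang, const std::string &name,
                         bool specialCasePrintf) {
  BuiltinId id = lookupLibBuiltin(name);
  if (id == NotBuiltin) return NotBuiltin;
  if (lang == Lang::OpenCL) {
    if (specialCasePrintf && id == BI_printf) return id;
    return NotBuiltin; // every other C library builtin is filtered out
  }
  return id;
}
```

With the special case disabled, `resolveBuiltin(Lang::OpenCL, "printf", false)` yields `NotBuiltin`, which is the behavior the reviewer calls buggy; with it enabled, printf keeps the shared ID and no target-specific ID is needed in this model.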
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -2550,6 +2550,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       &getTarget().getLongDoubleFormat() == &llvm::APFloat::IEEEquad())
     BuiltinID = mutateLongDoubleBuiltin(BuiltinID);
 
+  // Mutate the printf builtin ID so that we use the same CodeGen path for
+  // HIP and OpenCL with AMDGPU targets.
+  if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf)
+    BuiltinID = Builtin::BIprintf;
Comment on lines +2553 to +2556

You should not need to remap builtins.

This can be removed if you feel so; we would probably need a new case in Expr CodeGen.

I think all you need is to add the CustomTypeChecking attribute to the printf definition, and then add language-specific type checking on the string argument.
+
   // If the builtin has been declared explicitly with an assembler label,
   // disable the specialized emitting below. Ideally we should communicate the
   // rename in IR, or at least avoid generating the intrinsic calls that are
@@ -5722,7 +5727,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       return EmitOpenMPDevicePrintfCallExpr(E);
     if (getTarget().getTriple().isNVPTX())
       return EmitNVPTXDevicePrintfCallExpr(E);
-    if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP)
+    if (getTarget().getTriple().isAMDGCN() &&
+        (getLangOpts().HIP || getLangOpts().OpenCL))
       return EmitAMDGPUDevicePrintfCallExpr(E);
   }
 
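The reviewer's alternative to remapping builtin IDs is custom, language-specific type checking on the format-string argument. The following is a minimal standalone sketch of that idea, not Clang's Sema code: `Lang`, `AddrSpace`, `FormatArg`, and `isValidPrintfFormat` are hypothetical names, and the rule shown (constant address space in OpenCL, default address space otherwise) is an assumption drawn from the discussion above.

```cpp
#include <cassert>

// Hypothetical model types; Clang expresses these ideas with its own
// language address-space and type classes, which are not used here.
enum class Lang { C, HIP, OpenCL };
enum class AddrSpace { Default, OpenCLConstant, OpenCLGlobal };

struct FormatArg {
  bool isCharPointer; // whether the argument is a pointer to char
  AddrSpace space;    // language-level address space of the pointee
};

// Language-specific check on printf's first argument. The rule is written in
// terms of the language address space, so no target number such as 4 appears.
bool isValidPrintfFormat(Lang lang, const FormatArg &arg) {
  if (!arg.isCharPointer) return false;
  if (lang == Lang::OpenCL) return arg.space == AddrSpace::OpenCLConstant;
  return arg.space == AddrSpace::Default;
}
```

In this model a single printf builtin serves every language, and only the checker differs: an OpenCL call passes with a constant-address-space format string, while a HIP or C call passes with a default-address-space one.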
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -13,6 +13,8 @@
 
 #include "CodeGenFunction.h"
 #include "clang/Basic/Builtins.h"
+#include "clang/Basic/TargetBuiltins.h"
+#include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/Support/MathExtras.h"
@@ -181,14 +183,17 @@ RValue CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E) {
 RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
   assert(getTarget().getTriple().getArch() == llvm::Triple::amdgcn);
   assert(E->getBuiltinCallee() == Builtin::BIprintf ||
-         E->getBuiltinCallee() == Builtin::BI__builtin_printf);
+         E->getBuiltinCallee() == Builtin::BI__builtin_printf ||
+         E->getBuiltinCallee() == AMDGPU::BIprintf);
   assert(E->getNumArgs() >= 1); // printf always has at least one arg.
 
   CallArgList CallArgs;
   EmitCallArgs(CallArgs,
                E->getDirectCallee()->getType()->getAs<FunctionProtoType>(),
                E->arguments(), E->getDirectCallee(),
                /* ParamsToSkip = */ 0);
+  llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
+  IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());
 
   SmallVector<llvm::Value *, 8> Args;
   for (const auto &A : CallArgs) {
@@ -202,12 +207,17 @@ RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
     Args.push_back(Arg);
   }
 
-  llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
-  IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());
+  auto PFK = CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal;
+  bool isBuffered = (PFK == clang::TargetOptions::AMDGPUPrintfKind::Buffered);
 
+  StringRef FmtStr;
+  if (llvm::getConstantStringInfo(Args[0], FmtStr)) {
+    if (FmtStr.empty())
+      FmtStr = StringRef("", 1);
This is producing an invalid StringRef?

Not really. This just ensures the format string is not treated as empty (i.e. size = 0) when the user passes an empty format string (a weird corner case).
||
} | ||
|
||
bool isBuffered = (CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal == | ||
clang::TargetOptions::AMDGPUPrintfKind::Buffered); | ||
auto Printf = llvm::emitAMDGPUPrintfCall(IRB, Args, isBuffered); | ||
auto Printf = llvm::emitAMDGPUPrintfCall(IRB, Args, FmtStr, isBuffered, | ||
CGM.getLangOpts().OpenCL); | ||
Builder.SetInsertPoint(IRB.GetInsertBlock(), IRB.GetInsertPoint()); | ||
return RValue::get(Printf); | ||
} | ||
|
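The question about `StringRef("", 1)` can be illustrated without LLVM. The sketch below uses `std::string_view` as a stand-in for `llvm::StringRef` (an assumption made only so the example is self-contained), and `nonEmptyFormat` is a hypothetical helper name. It shows that a view of length 1 over the literal `""` is well-formed, because the literal is a one-byte array holding the NUL terminator.

```cpp
#include <cassert>
#include <string_view>

// Returns a view that is never zero-length: an empty format string is widened
// to cover its NUL terminator, mirroring the StringRef("", 1) line in the diff.
std::string_view nonEmptyFormat(std::string_view fmt) {
  if (fmt.empty())
    return std::string_view("", 1); // the literal "" is the 1-byte array {'\0'}
  return fmt;
}
```

A caller can then distinguish "the format string was a known constant, possibly empty" (size at least 1) from "no constant string was found" (a default-constructed, size-0 view), which appears to be the purpose of the corner case in the patch.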
I don't understand why this is necessary for the current task. What does it fix in parsing OpenCL builtins?

Although we talked about this offline, the explanation needs to be added here. In fact, the motivation for having this builtin should be added as a comment to the source itself for future reference.

This does not belong here. This has nothing to do with AMDGPU.