Skip to content

[AMDGPU] Enable OpenCL hostcall printf (WIP) #72556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions clang/include/clang/Basic/BuiltinsAMDGPU.def
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,10 @@
#if defined(BUILTIN) && !defined(TARGET_BUILTIN)
# define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) BUILTIN(ID, TYPE, ATTRS)
#endif

#if defined(BUILTIN) && !defined(LANGBUILTIN)
#define LANGBUILTIN(ID, TYPE, ATTRS, BUILTIN_LANG) BUILTIN(ID, TYPE, ATTRS)
#endif
//===----------------------------------------------------------------------===//
// SI+ only builtins.
//===----------------------------------------------------------------------===//
Expand Down Expand Up @@ -498,6 +502,14 @@ TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_fp8_fp8_w64, "V4fiV2iV4fs",
TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_fp8_bf8_w64, "V4fiV2iV4fs", "nc", "gfx12-insts,wavefrontsize64")
TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_fp8_w64, "V4fiV2iV4fs", "nc", "gfx12-insts,wavefrontsize64")
TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_bf8_w64, "V4fiV2iV4fs", "nc", "gfx12-insts,wavefrontsize64")
// OpenCL
// OpenCL printf has the following signature
// int printf(__constant const char* st, ...) __attribute__((format(printf,
// 1, 2)));
// The "__constant" address space corresponds to number 4 in LLVM IR for AMDGPU.
// Following entry makes sure printf is recognized as builtin for OCL inputs.
LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is necessary for the current task. What does it fix in the parsing OpenCL builtins?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we talked about this offline, the explanation needs to be added here. In fact, the motivation for having this builtin should be added as a comment to the source itself for future reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not belong here. This has nothing to do with AMDGPU


#undef BUILTIN
#undef TARGET_BUILTIN
#undef LANGBUILTIN
7 changes: 7 additions & 0 deletions clang/lib/AST/Decl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@
#include "clang/Basic/SourceLocation.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Basic/Specifiers.h"
#include "clang/Basic/TargetBuiltins.h"
#include "clang/Basic/TargetCXXABI.h"
#include "clang/Basic/TargetInfo.h"
#include "clang/Basic/Visibility.h"
Expand Down Expand Up @@ -3616,6 +3617,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
if (!ConsiderWrapperFunctions && getStorageClass() == SC_Static)
return 0;

// AMDGCN implementation supports printf as a builtin
// for OpenCL
if (Context.getTargetInfo().getTriple().isAMDGCN() &&
Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
return BuiltinID;
Comment on lines +3620 to +3624
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not belong here and has nothing to do with AMDGPU

Copy link
Contributor Author

@vikramRH vikramRH Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signatures of C-printf and OCL printf differ and I dont think generic builtin handling provides a way to register overloaded builtins with "shared" builtin ID's. This needs a new ID. do you have any alternate suggestions here ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this had been clarified earlier too. It's quite imprecise to just say that "signatures differ". Perhaps the following detailed explanation might move the conversatino forward. The problem is that the OpenCL printf expects a format string in the constant address space, which has no representation in Clang builtin. What we do have is the ability to specify an address-space number in the builtin declaration. But this number is target-specific, which makes the whole builtin target-specific. Is there a way around that magic number 4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only other alternative I see currently is to modify Sema (probably ActOnFunctionDeclarator) so that we map the ocl printf declaration to C printf builtin ID. This would be a really hacky solution and I would prefer this implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @arsenm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The builtin specifications are also in terms of the lang address space, not the target address space (this was an ugly compromise to make builtins work at all in OpenCL)

Copy link
Contributor Author

@vikramRH vikramRH Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm, thanks for the info. CustomTypeChecking is a valid option. I'm not sure why OpenCL community did not consider this change despite OpenCL specs specifying the details. I could create a separate patch for this (probably folks from OCL community would provide further background). Meanwhile, can this go ahead as an AMDGPU specific workaround for now so that we have the intended feature in place ? (The frontend changes here can be reverted with that follow up patch )

PS :Ah, I see another issue . OpenCL v1.2 s6.9.f states none of the functions defined in C99 headers are available. This would mean std printf is supposed to be treated differently than OpenCL builtins and consequently the builtin IDs assigned to them "need" to be different. If this understanding is correct, moving ahead with using same builtin ID as std printf is not the right way. (probably this is why such an implementation was never considered)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. Meanwhile, can this go ahead as an AMDGPU specific workaround for now so that we have the intended feature in place

No. That cleanup will never happen.

PS :Ah, I see another issue . OpenCL v1.2 s6.9.f states none of the functions defined in C99 headers are available. This would mean std printf is supposed to be treated differently than OpenCL builtins and consequently the builtin IDs assigned to them "need" to be different

That's not what that means

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to

. This essentially means that even if frontend attaches the printf builtin ID to the decl (even after custom type checks), this would revert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that check is buggy. The printf declaration doesn't come from a header, but the printf function does exist in 1.2+. It probably needs to special case printf


// OpenCL v1.2 s6.9.f - The library functions defined in
// the C99 standard headers are not available.
if (Context.getLangOpts().OpenCL &&
Expand Down
2 changes: 2 additions & 0 deletions clang/lib/Basic/Targets/AMDGPU.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,8 @@ static constexpr Builtin::Info BuiltinInfo[] = {
{#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
#define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) \
{#ID, TYPE, ATTRS, FEATURE, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
#define LANGBUILTIN(ID, TYPE, ATTRS, LANG) \
{#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, LANG},
#include "clang/Basic/BuiltinsAMDGPU.def"
};

Expand Down
8 changes: 7 additions & 1 deletion clang/lib/CodeGen/CGBuiltin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2550,6 +2550,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
&getTarget().getLongDoubleFormat() == &llvm::APFloat::IEEEquad())
BuiltinID = mutateLongDoubleBuiltin(BuiltinID);

// Mutate the printf builtin ID so that we use the same CodeGen path for
// HIP and OpenCL with AMDGPU targets.
if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf)
BuiltinID = Builtin::BIprintf;
Comment on lines +2553 to +2556
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not need to remap builtins

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed if you feel so, probably we would need a new case in Expr CodeGen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all you need is to add the CustomTypeChecking attribute to the printf definition, and then add language specific type checking on the string argument


// If the builtin has been declared explicitly with an assembler label,
// disable the specialized emitting below. Ideally we should communicate the
// rename in IR, or at least avoid generating the intrinsic calls that are
Expand Down Expand Up @@ -5722,7 +5727,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
return EmitOpenMPDevicePrintfCallExpr(E);
if (getTarget().getTriple().isNVPTX())
return EmitNVPTXDevicePrintfCallExpr(E);
if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP)
if (getTarget().getTriple().isAMDGCN() &&
(getLangOpts().HIP || getLangOpts().OpenCL))
return EmitAMDGPUDevicePrintfCallExpr(E);
}

Expand Down
22 changes: 16 additions & 6 deletions clang/lib/CodeGen/CGGPUBuiltin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@

#include "CodeGenFunction.h"
#include "clang/Basic/Builtins.h"
#include "clang/Basic/TargetBuiltins.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/MathExtras.h"
Expand Down Expand Up @@ -181,14 +183,17 @@ RValue CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E) {
RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
assert(getTarget().getTriple().getArch() == llvm::Triple::amdgcn);
assert(E->getBuiltinCallee() == Builtin::BIprintf ||
E->getBuiltinCallee() == Builtin::BI__builtin_printf);
E->getBuiltinCallee() == Builtin::BI__builtin_printf ||
E->getBuiltinCallee() == AMDGPU::BIprintf);
assert(E->getNumArgs() >= 1); // printf always has at least one arg.

CallArgList CallArgs;
EmitCallArgs(CallArgs,
E->getDirectCallee()->getType()->getAs<FunctionProtoType>(),
E->arguments(), E->getDirectCallee(),
/* ParamsToSkip = */ 0);
llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());

SmallVector<llvm::Value *, 8> Args;
for (const auto &A : CallArgs) {
Expand All @@ -202,12 +207,17 @@ RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
Args.push_back(Arg);
}

llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());
auto PFK = CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal;
bool isBuffered = (PFK == clang::TargetOptions::AMDGPUPrintfKind::Buffered);

StringRef FmtStr;
if (llvm::getConstantStringInfo(Args[0], FmtStr)) {
if (FmtStr.empty())
FmtStr = StringRef("", 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is producing an invalid StringRef?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really. This is just to say the format string is not really empty (i.e size = 0) when the user input is an empty format string (a weird corner case)

}

bool isBuffered = (CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal ==
clang::TargetOptions::AMDGPUPrintfKind::Buffered);
auto Printf = llvm::emitAMDGPUPrintfCall(IRB, Args, isBuffered);
auto Printf = llvm::emitAMDGPUPrintfCall(IRB, Args, FmtStr, isBuffered,
CGM.getLangOpts().OpenCL);
Builder.SetInsertPoint(IRB.GetInsertBlock(), IRB.GetInsertPoint());
return RValue::get(Printf);
}
Expand Down
8 changes: 8 additions & 0 deletions clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4757,6 +4757,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
Args.ClaimAllArgs(options::OPT_gen_cdb_fragment_path);
}

if (TC.getTriple().isAMDGPU() && types::isOpenCL(Input.getType())) {
if (Args.getLastArg(options::OPT_mprintf_kind_EQ)) {
CmdArgs.push_back(Args.MakeArgString(
"-mprintf-kind=" +
Args.getLastArgValue(options::OPT_mprintf_kind_EQ)));
}
}

if (IsCuda || IsHIP) {
// We have to pass the triple of the host if compiling for a CUDA/HIP device
// and vice-versa.
Expand Down
Loading