-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[NVPTX] Add support for maxclusterrank in launch_bounds #66496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -296,8 +296,8 @@ void CodeGenModule::handleCUDALaunchBoundsAttr( | |
NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxntidx", | ||
MaxThreads.getExtValue()); | ||
|
||
// min blocks is an optional argument for CUDALaunchBoundsAttr. If it was | ||
// not specified in __launch_bounds__ or if the user specified a 0 value, | ||
// min and max blocks is an optional argument for CUDALaunchBoundsAttr. If it | ||
// was not specified in __launch_bounds__ or if the user specified a 0 value, | ||
// we don't have to add a PTX directive. | ||
if (Attr->getMinBlocks()) { | ||
llvm::APSInt MinBlocks(32); | ||
|
@@ -307,6 +307,14 @@ void CodeGenModule::handleCUDALaunchBoundsAttr( | |
NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "minctasm", | ||
MinBlocks.getExtValue()); | ||
} | ||
if (Attr->getMaxBlocks()) { | ||
llvm::APSInt MaxBlocks(32); | ||
MaxBlocks = Attr->getMaxBlocks()->EvaluateKnownConstInt(getContext()); | ||
if (MaxBlocks > 0) | ||
// Create !{<func-ref>, metadata !"maxclusterrank", i32 <val>} node | ||
NVPTXTargetCodeGenInfo::addNVVMMetadata(F, "maxclusterrank", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have enough information to assert this is non-negative? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a good question, so |
||
MaxBlocks.getExtValue()); | ||
} | ||
} | ||
|
||
std::unique_ptr<TargetCodeGenInfo> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,7 @@ | |
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#include "../Basic/Targets/NVPTX.h" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This header is not part of The bazel build shows the problem: https://buildkite.com/llvm-project/upstream-bazel/builds/75928#018ad568-2f7c-4dda-ae90-3b4d787caad7. Breaking bazel is not itself reason to revert, but here it's flagging a real problem that CMake doesn't catch. I've reverted as 0afbcb2 - sorry to do this so abruptly, but I can't fix this myself & such problems block downstream use of LLVM. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @sam-mccall, apologies for introducing the bug and thank you for drawing my attention to it. I've got the fix for the problem: diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 10d1c910d9cd..3b87300e24bc 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -10,7 +10,6 @@
//
//===----------------------------------------------------------------------===//
-#include "../Basic/Targets/NVPTX.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ASTContext.h"
#include "clang/AST/ASTMutationListener.h"
@@ -5612,7 +5611,8 @@ bool Sema::CheckRegparmAttr(const ParsedAttr &AL, unsigned &numParams) {
static CudaArch getCudaArch(const TargetInfo &TI) {
if (!TI.getTriple().isNVPTX())
llvm_unreachable("getCudaArch is only valid for NVPTX triple");
- return static_cast<const targets::NVPTXTargetInfo *>(&TI)->getGPU();
+ auto &TO = TI.getTargetOpts();
+ return StringToCudaArch(TO.CPU);
}
// Checks whether an argument of launch_bounds attribute is Would you be so king and point me to the process for "reverting the revert" and folding the fix into the original patch? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jchlanda ah, that's simpler than I expected! Wish I'd found that before reverting... That looks good to me, I think rather than a new review you can just Or if you prefer, send it as a new PR, happy to approve it. |
||
#include "clang/AST/ASTConsumer.h" | ||
#include "clang/AST/ASTContext.h" | ||
#include "clang/AST/ASTMutationListener.h" | ||
|
@@ -5608,6 +5609,13 @@ bool Sema::CheckRegparmAttr(const ParsedAttr &AL, unsigned &numParams) { | |
return false; | ||
} | ||
|
||
// Helper to get CudaArch. | ||
static CudaArch getCudaArch(const TargetInfo &TI) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Considering that we do have TargetInfo pointer here, instead of trying to figure out the target GPU via features, can we just extract CudaArch directly from NVPTXTargetInfo::GPU ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is that the kind of thing you had in mind: diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index 6fa0b8df97d7..20d76b702a94 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -181,6 +181,8 @@ public:
bool hasBitIntType() const override { return true; }
bool hasBFloat16Type() const override { return true; }
+
+ CudaArch getGPU() const { return GPU; }
};
} // namespace targets
} // namespace clang
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index c4ecaec7728b..636bb0694d36 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -10,6 +10,7 @@
//
//===----------------------------------------------------------------------===//
+#include "../Basic/Targets/NVPTX.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ASTContext.h"
#include "clang/AST/ASTMutationListener.h"
@@ -5609,17 +5610,7 @@ bool Sema::CheckRegparmAttr(const ParsedAttr &AL, unsigned &numParams) {
// Helper to get CudaArch.
static CudaArch getCudaArch(const TargetInfo &TI) {
- if (!TI.hasFeature("ptx")) {
- return CudaArch::UNKNOWN;
- }
- for (const auto &Feature : TI.getTargetOpts().FeatureMap) {
- if (Feature.getValue()) {
- CudaArch Arch = StringToCudaArch(Feature.getKey());
- if (Arch != CudaArch::UNKNOWN)
- return Arch;
- }
- }
- return CudaArch::UNKNOWN;
+ return static_cast<const targets::NVPTXTargetInfo *>(&TI)->getGPU();
}
// Checks whether an argument of launch_bounds attribute is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You may need to verify that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done in: 3c17966 |
||
if (!TI.getTriple().isNVPTX()) | ||
llvm_unreachable("getCudaArch is only valid for NVPTX triple"); | ||
return static_cast<const targets::NVPTXTargetInfo *>(&TI)->getGPU(); | ||
} | ||
|
||
// Checks whether an argument of launch_bounds attribute is | ||
// acceptable, performs implicit conversion to Rvalue, and returns | ||
// non-nullptr Expr result on success. Otherwise, it returns nullptr | ||
|
@@ -5651,34 +5659,51 @@ static Expr *makeLaunchBoundsArgExpr(Sema &S, Expr *E, | |
|
||
CUDALaunchBoundsAttr * | ||
Sema::CreateLaunchBoundsAttr(const AttributeCommonInfo &CI, Expr *MaxThreads, | ||
Expr *MinBlocks) { | ||
CUDALaunchBoundsAttr TmpAttr(Context, CI, MaxThreads, MinBlocks); | ||
Expr *MinBlocks, Expr *MaxBlocks) { | ||
CUDALaunchBoundsAttr TmpAttr(Context, CI, MaxThreads, MinBlocks, MaxBlocks); | ||
MaxThreads = makeLaunchBoundsArgExpr(*this, MaxThreads, TmpAttr, 0); | ||
if (MaxThreads == nullptr) | ||
if (!MaxThreads) | ||
return nullptr; | ||
|
||
if (MinBlocks) { | ||
MinBlocks = makeLaunchBoundsArgExpr(*this, MinBlocks, TmpAttr, 1); | ||
if (MinBlocks == nullptr) | ||
if (!MinBlocks) | ||
return nullptr; | ||
} | ||
|
||
if (MaxBlocks) { | ||
// '.maxclusterrank' ptx directive requires .target sm_90 or higher. | ||
auto SM = getCudaArch(Context.getTargetInfo()); | ||
if (SM == CudaArch::UNKNOWN || SM < CudaArch::SM_90) { | ||
Diag(MaxBlocks->getBeginLoc(), diag::warn_cuda_maxclusterrank_sm_90) | ||
<< CudaArchToString(SM) << CI << MaxBlocks->getSourceRange(); | ||
// Ignore it by setting MaxBlocks to null; | ||
MaxBlocks = nullptr; | ||
} else { | ||
MaxBlocks = makeLaunchBoundsArgExpr(*this, MaxBlocks, TmpAttr, 2); | ||
if (!MaxBlocks) | ||
return nullptr; | ||
} | ||
} | ||
|
||
return ::new (Context) | ||
CUDALaunchBoundsAttr(Context, CI, MaxThreads, MinBlocks); | ||
CUDALaunchBoundsAttr(Context, CI, MaxThreads, MinBlocks, MaxBlocks); | ||
} | ||
|
||
void Sema::AddLaunchBoundsAttr(Decl *D, const AttributeCommonInfo &CI, | ||
Expr *MaxThreads, Expr *MinBlocks) { | ||
if (auto *Attr = CreateLaunchBoundsAttr(CI, MaxThreads, MinBlocks)) | ||
Expr *MaxThreads, Expr *MinBlocks, | ||
Expr *MaxBlocks) { | ||
if (auto *Attr = CreateLaunchBoundsAttr(CI, MaxThreads, MinBlocks, MaxBlocks)) | ||
D->addAttr(Attr); | ||
} | ||
|
||
static void handleLaunchBoundsAttr(Sema &S, Decl *D, const ParsedAttr &AL) { | ||
if (!AL.checkAtLeastNumArgs(S, 1) || !AL.checkAtMostNumArgs(S, 2)) | ||
if (!AL.checkAtLeastNumArgs(S, 1) || !AL.checkAtMostNumArgs(S, 3)) | ||
return; | ||
|
||
S.AddLaunchBoundsAttr(D, AL, AL.getArgAsExpr(0), | ||
AL.getNumArgs() > 1 ? AL.getArgAsExpr(1) : nullptr); | ||
AL.getNumArgs() > 1 ? AL.getArgAsExpr(1) : nullptr, | ||
AL.getNumArgs() > 2 ? AL.getArgAsExpr(2) : nullptr); | ||
} | ||
|
||
static void handleArgumentWithTypeTagAttr(Sema &S, Decl *D, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
// RUN: %clang_cc1 -std=c++11 -fsyntax-only -triple nvptx-unknown-unknown -target-cpu sm_90 -verify %s | ||
|
||
#include "Inputs/cuda.h" | ||
|
||
__launch_bounds__(128, 7) void Test2Args(void); | ||
__launch_bounds__(128) void Test1Arg(void); | ||
|
||
__launch_bounds__(0xffffffff) void TestMaxArg(void); | ||
__launch_bounds__(0x100000000) void TestTooBigArg(void); // expected-error {{integer constant expression evaluates to value 4294967296 that cannot be represented in a 32-bit unsigned integer type}} | ||
__launch_bounds__(0x10000000000000000) void TestWayTooBigArg(void); // expected-error {{integer literal is too large to be represented in any integer type}} | ||
__launch_bounds__(1, 1, 0x10000000000000000) void TestWayTooBigArg(void); // expected-error {{integer literal is too large to be represented in any integer type}} | ||
|
||
__launch_bounds__(-128, 7) void TestNegArg1(void); // expected-warning {{'launch_bounds' attribute parameter 0 is negative and will be ignored}} | ||
__launch_bounds__(128, -7) void TestNegArg2(void); // expected-warning {{'launch_bounds' attribute parameter 1 is negative and will be ignored}} | ||
__launch_bounds__(-128, 1, 7) void TestNegArg2(void); // expected-warning {{'launch_bounds' attribute parameter 0 is negative and will be ignored}} | ||
__launch_bounds__(128, -1, 7) void TestNegArg2(void); // expected-warning {{'launch_bounds' attribute parameter 1 is negative and will be ignored}} | ||
__launch_bounds__(128, 1, -7) void TestNegArg2(void); // expected-warning {{'launch_bounds' attribute parameter 2 is negative and will be ignored}} | ||
Artem-B marked this conversation as resolved.
Show resolved
Hide resolved
|
||
// expected-warning@20 {{'launch_bounds' attribute parameter 0 is negative and will be ignored}} | ||
// expected-warning@20 {{'launch_bounds' attribute parameter 1 is negative and will be ignored}} | ||
__launch_bounds__(-128, -1, 7) void TestNegArg2(void); | ||
// expected-warning@23 {{'launch_bounds' attribute parameter 0 is negative and will be ignored}} | ||
// expected-warning@23 {{'launch_bounds' attribute parameter 2 is negative and will be ignored}} | ||
__launch_bounds__(-128, 1, -7) void TestNegArg2(void); | ||
// expected-warning@27 {{'launch_bounds' attribute parameter 0 is negative and will be ignored}} | ||
// expected-warning@27 {{'launch_bounds' attribute parameter 1 is negative and will be ignored}} | ||
// expected-warning@27 {{'launch_bounds' attribute parameter 2 is negative and will be ignored}} | ||
__launch_bounds__(-128, -1, -7) void TestNegArg2(void); | ||
|
||
|
||
__launch_bounds__(1, 2, 3, 4) void Test4Args(void); // expected-error {{'launch_bounds' attribute takes no more than 3 arguments}} | ||
__launch_bounds__() void TestNoArgs(void); // expected-error {{'launch_bounds' attribute takes at least 1 argument}} | ||
|
||
int TestNoFunction __launch_bounds__(128, 7, 13); // expected-warning {{'launch_bounds' attribute only applies to Objective-C methods, functions, and function pointers}} | ||
|
||
__launch_bounds__(true) void TestBool(void); | ||
__launch_bounds__(128, 1, 128.0) void TestFP(void); // expected-error {{'launch_bounds' attribute requires parameter 2 to be an integer constant}} | ||
__launch_bounds__(128, 1, (void*)0) void TestNullptr(void); // expected-error {{'launch_bounds' attribute requires parameter 2 to be an integer constant}} | ||
|
||
int nonconstint = 256; | ||
__launch_bounds__(125, 1, nonconstint) void TestNonConstInt(void); // expected-error {{'launch_bounds' attribute requires parameter 2 to be an integer constant}} | ||
|
||
const int constint = 512; | ||
__launch_bounds__(128, 1, constint) void TestConstInt(void); | ||
__launch_bounds__(128, 1, constint * 2 + 3) void TestConstIntExpr(void); | ||
|
||
template <int a, int b, int c> __launch_bounds__(a, b, c) void TestTemplate2Args(void) {} | ||
template void TestTemplate2Args<128,7, 13>(void); | ||
|
||
template <int a, int b, int c> | ||
__launch_bounds__(a + b, c + constint, a + b + c + constint) void TestTemplateExpr(void) {} | ||
template void TestTemplateExpr<128+constint, 3, 7>(void); | ||
|
||
template <int... Args> | ||
__launch_bounds__(Args) void TestTemplateVariadicArgs(void) {} // expected-error {{expression contains unexpanded parameter pack 'Args'}} | ||
|
||
template <int... Args> | ||
__launch_bounds__(1, 22, Args) void TestTemplateVariadicArgs2(void) {} // expected-error {{expression contains unexpanded parameter pack 'Args'}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we ignoring the whole launch_bounds attribute, or only the MaxBlocks parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole thing, this is analogous to how we currently handle:
we issue a warning:
vs max cluster rank:
and the resulting asm contains neither of the directives.