Skip to content

[NVPTX] Revamp NVVMIntrRange pass #94422

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 6, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 12 additions & 12 deletions clang/test/CodeGenCUDA/cuda-builtin-vars.cu
Original file line number Diff line number Diff line change
Expand Up @@ -6,21 +6,21 @@
__attribute__((global))
void kernel(int *out) {
int i = 0;
out[i++] = threadIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x()
out[i++] = threadIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.y()
out[i++] = threadIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.z()
out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change appears to be causing the test to fail on some bots where there are no extra arguments in this location, causing a single space to not match the expected double space . You can probably fix this by just removing either the leading or trailing space that you added for each of these.

Bot failure:
https://lab.llvm.org/buildbot/#/builders/139/builds/67000


 out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x()
                                   ^
<stdin>:17:53: note: scanning from here
define dso_local void @_Z6kernelPi(ptr noundef %out) #0 {
                                                    ^
<stdin>:23:7: note: possible intended match here
 %0 = call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x()
      ^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexMaclean I also see this problem on some internal test machines. It seems suspicious - is there some nondeterminism? Or is there a good reason why some machines would not add the range metadata here???

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now that it fails (deterministically) if the NVPTX target is not being built.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexMaclean can you either fix the test or revert the change so that we can get the failing bots green again please?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out[i++] = threadIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.y()
out[i++] = threadIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.z()

out[i++] = blockIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
out[i++] = blockIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.y()
out[i++] = blockIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ctaid.z()
out[i++] = blockIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
out[i++] = blockIdx.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.y()
out[i++] = blockIdx.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ctaid.z()

out[i++] = blockDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
out[i++] = blockDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.y()
out[i++] = blockDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.ntid.z()
out[i++] = blockDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
out[i++] = blockDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.y()
out[i++] = blockDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.ntid.z()

out[i++] = gridDim.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.x()
out[i++] = gridDim.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.y()
out[i++] = gridDim.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.nctaid.z()
out[i++] = gridDim.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.x()
out[i++] = gridDim.y; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.y()
out[i++] = gridDim.z; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.nctaid.z()

out[i++] = warpSize; // CHECK: store i32 32,

Expand Down
7 changes: 1 addition & 6 deletions llvm/lib/Target/NVPTX/NVPTX.h
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ FunctionPass *createNVPTXISelDag(NVPTXTargetMachine &TM,
ModulePass *createNVPTXAssignValidGlobalNamesPass();
ModulePass *createGenericToNVVMLegacyPass();
ModulePass *createNVPTXCtorDtorLoweringLegacyPass();
FunctionPass *createNVVMIntrRangePass(unsigned int SmVersion);
FunctionPass *createNVVMIntrRangePass();
FunctionPass *createNVVMReflectPass(unsigned int SmVersion);
MachineFunctionPass *createNVPTXPrologEpilogPass();
MachineFunctionPass *createNVPTXReplaceImageHandlesPass();
Expand All @@ -53,12 +53,7 @@ MachineFunctionPass *createNVPTXPeephole();
MachineFunctionPass *createNVPTXProxyRegErasurePass();

struct NVVMIntrRangePass : PassInfoMixin<NVVMIntrRangePass> {
NVVMIntrRangePass();
NVVMIntrRangePass(unsigned SmVersion) : SmVersion(SmVersion) {}
PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

private:
unsigned SmVersion;
};

struct NVVMReflectPass : PassInfoMixin<NVVMReflectPass> {
Expand Down
32 changes: 13 additions & 19 deletions llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -542,30 +542,24 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function &F,
// If the NVVM IR has some of reqntid* specified, then output
// the reqntid directive, and set the unspecified ones to 1.
// If none of Reqntid* is specified, don't output reqntid directive.
unsigned Reqntidx, Reqntidy, Reqntidz;
Reqntidx = Reqntidy = Reqntidz = 1;
bool ReqSpecified = false;
ReqSpecified |= getReqNTIDx(F, Reqntidx);
ReqSpecified |= getReqNTIDy(F, Reqntidy);
ReqSpecified |= getReqNTIDz(F, Reqntidz);
std::optional<unsigned> Reqntidx = getReqNTIDx(F);
std::optional<unsigned> Reqntidy = getReqNTIDy(F);
std::optional<unsigned> Reqntidz = getReqNTIDz(F);

if (ReqSpecified)
O << ".reqntid " << Reqntidx << ", " << Reqntidy << ", " << Reqntidz
<< "\n";
if (Reqntidx || Reqntidy || Reqntidz)
O << ".reqntid " << Reqntidx.value_or(1) << ", " << Reqntidy.value_or(1)
<< ", " << Reqntidz.value_or(1) << "\n";

// If the NVVM IR has some of maxntid* specified, then output
// the maxntid directive, and set the unspecified ones to 1.
// If none of maxntid* is specified, don't output maxntid directive.
unsigned Maxntidx, Maxntidy, Maxntidz;
Maxntidx = Maxntidy = Maxntidz = 1;
bool MaxSpecified = false;
MaxSpecified |= getMaxNTIDx(F, Maxntidx);
MaxSpecified |= getMaxNTIDy(F, Maxntidy);
MaxSpecified |= getMaxNTIDz(F, Maxntidz);

if (MaxSpecified)
O << ".maxntid " << Maxntidx << ", " << Maxntidy << ", " << Maxntidz
<< "\n";
std::optional<unsigned> Maxntidx = getMaxNTIDx(F);
std::optional<unsigned> Maxntidy = getMaxNTIDy(F);
std::optional<unsigned> Maxntidz = getMaxNTIDz(F);

if (Maxntidx || Maxntidy || Maxntidz)
O << ".maxntid " << Maxntidx.value_or(1) << ", " << Maxntidy.value_or(1)
<< ", " << Maxntidz.value_or(1) << "\n";

unsigned Mincta = 0;
if (getMinCTASm(F, Mincta))
Expand Down
6 changes: 3 additions & 3 deletions llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -233,9 +233,9 @@ void NVPTXTargetMachine::registerPassBuilderCallbacks(
[this](ModulePassManager &PM, OptimizationLevel Level) {
FunctionPassManager FPM;
FPM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
// FIXME: NVVMIntrRangePass is causing numerical discrepancies,
// investigate and re-enable.
// FPM.addPass(NVVMIntrRangePass(Subtarget.getSmVersion()));
// Note: NVVMIntrRangePass was causing numerical discrepancies at one
// point, if issues crop up, consider disabling.
FPM.addPass(NVVMIntrRangePass());
PM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
});
}
Expand Down
57 changes: 45 additions & 12 deletions llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,14 @@ bool findOneNVVMAnnotation(const GlobalValue *gv, const std::string &prop,
return true;
}

static std::optional<unsigned>
findOneNVVMAnnotation(const GlobalValue &GV, const std::string &PropName) {
unsigned RetVal;
if (findOneNVVMAnnotation(&GV, PropName, RetVal))
return RetVal;
return std::nullopt;
}

bool findAllNVVMAnnotation(const GlobalValue *gv, const std::string &prop,
std::vector<unsigned> &retval) {
auto &AC = getAnnotationCache();
Expand Down Expand Up @@ -252,32 +260,57 @@ std::string getSamplerName(const Value &val) {
return std::string(val.getName());
}

bool getMaxNTIDx(const Function &F, unsigned &x) {
return findOneNVVMAnnotation(&F, "maxntidx", x);
std::optional<unsigned> getMaxNTIDx(const Function &F) {
return findOneNVVMAnnotation(F, "maxntidx");
}

bool getMaxNTIDy(const Function &F, unsigned &y) {
return findOneNVVMAnnotation(&F, "maxntidy", y);
std::optional<unsigned> getMaxNTIDy(const Function &F) {
return findOneNVVMAnnotation(F, "maxntidy");
}

bool getMaxNTIDz(const Function &F, unsigned &z) {
return findOneNVVMAnnotation(&F, "maxntidz", z);
std::optional<unsigned> getMaxNTIDz(const Function &F) {
return findOneNVVMAnnotation(F, "maxntidz");
}

std::optional<unsigned> getMaxNTID(const Function &F) {
// Note: The semantics here are a bit strange. The PTX ISA states the
// following (11.4.2. Performance-Tuning Directives: .maxntid):
//
// Note that this directive guarantees that the total number of threads does
// not exceed the maximum, but does not guarantee that the limit in any
// particular dimension is not exceeded.
std::optional<unsigned> MaxNTIDx = getMaxNTIDx(F);
std::optional<unsigned> MaxNTIDy = getMaxNTIDy(F);
std::optional<unsigned> MaxNTIDz = getMaxNTIDz(F);
if (MaxNTIDx || MaxNTIDy || MaxNTIDz)
return MaxNTIDx.value_or(1) * MaxNTIDy.value_or(1) * MaxNTIDz.value_or(1);
return std::nullopt;
}

bool getMaxClusterRank(const Function &F, unsigned &x) {
return findOneNVVMAnnotation(&F, "maxclusterrank", x);
}

bool getReqNTIDx(const Function &F, unsigned &x) {
return findOneNVVMAnnotation(&F, "reqntidx", x);
std::optional<unsigned> getReqNTIDx(const Function &F) {
return findOneNVVMAnnotation(F, "reqntidx");
}

std::optional<unsigned> getReqNTIDy(const Function &F) {
return findOneNVVMAnnotation(F, "reqntidy");
}

bool getReqNTIDy(const Function &F, unsigned &y) {
return findOneNVVMAnnotation(&F, "reqntidy", y);
std::optional<unsigned> getReqNTIDz(const Function &F) {
return findOneNVVMAnnotation(F, "reqntidz");
}

bool getReqNTIDz(const Function &F, unsigned &z) {
return findOneNVVMAnnotation(&F, "reqntidz", z);
std::optional<unsigned> getReqNTID(const Function &F) {
// Note: The semantics here are a bit strange. See getMaxNTID.
std::optional<unsigned> ReqNTIDx = getReqNTIDx(F);
std::optional<unsigned> ReqNTIDy = getReqNTIDy(F);
std::optional<unsigned> ReqNTIDz = getReqNTIDz(F);
if (ReqNTIDx || ReqNTIDy || ReqNTIDz)
return ReqNTIDx.value_or(1) * ReqNTIDy.value_or(1) * ReqNTIDz.value_or(1);
return std::nullopt;
}

bool getMinCTASm(const Function &F, unsigned &x) {
Expand Down
16 changes: 9 additions & 7 deletions llvm/lib/Target/NVPTX/NVPTXUtilities.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,13 +48,15 @@ std::string getTextureName(const Value &);
std::string getSurfaceName(const Value &);
std::string getSamplerName(const Value &);

bool getMaxNTIDx(const Function &, unsigned &);
bool getMaxNTIDy(const Function &, unsigned &);
bool getMaxNTIDz(const Function &, unsigned &);

bool getReqNTIDx(const Function &, unsigned &);
bool getReqNTIDy(const Function &, unsigned &);
bool getReqNTIDz(const Function &, unsigned &);
std::optional<unsigned> getMaxNTIDx(const Function &);
std::optional<unsigned> getMaxNTIDy(const Function &);
std::optional<unsigned> getMaxNTIDz(const Function &);
std::optional<unsigned> getMaxNTID(const Function &F);

std::optional<unsigned> getReqNTIDx(const Function &);
std::optional<unsigned> getReqNTIDy(const Function &);
std::optional<unsigned> getReqNTIDz(const Function &);
std::optional<unsigned> getReqNTID(const Function &);

bool getMaxClusterRank(const Function &, unsigned &);
bool getMinCTASm(const Function &, unsigned &);
Expand Down
Loading
Loading