-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[NVPTX] Improve NVVMReflect Efficiency #134416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-nvptx Author: Yonah Goldberg (YonahGoldberg) ChangesThe NVVMReflect pass simply replaces calls to nvvm-reflect functions with the appropriate constant, either the architecture number, or nvvm-reflect-ftz, found in the module's metadata. The implementation is inefficient and does this by traversing through all instructions to find calls. The common case is that you never call nvvm-reflect, so this traversal is costly. This PR:
Full diff: https://github.com/llvm/llvm-project/pull/134416.diff 4 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index 20a5bf46dc06b..8efa0bb546546 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -43,7 +43,7 @@ ModulePass *createNVPTXAssignValidGlobalNamesPass();
ModulePass *createGenericToNVVMLegacyPass();
ModulePass *createNVPTXCtorDtorLoweringLegacyPass();
FunctionPass *createNVVMIntrRangePass();
-FunctionPass *createNVVMReflectPass(unsigned int SmVersion);
+ModulePass *createNVVMReflectPass(unsigned int SmVersion);
MachineFunctionPass *createNVPTXPrologEpilogPass();
MachineFunctionPass *createNVPTXReplaceImageHandlesPass();
FunctionPass *createNVPTXImageOptimizerPass();
@@ -60,12 +60,12 @@ struct NVVMIntrRangePass : PassInfoMixin<NVVMIntrRangePass> {
};
struct NVVMReflectPass : PassInfoMixin<NVVMReflectPass> {
- NVVMReflectPass();
- NVVMReflectPass(unsigned SmVersion) : SmVersion(SmVersion) {}
- PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
+ NVVMReflectPass() : NVVMReflectPass(0) {}
+ NVVMReflectPass(unsigned SmVersion);
+ PreservedAnalyses run(Module &F, ModuleAnalysisManager &AM);
private:
- unsigned SmVersion;
+ StringMap<int> VarMap;
};
struct GenericToNVVMPass : PassInfoMixin<GenericToNVVMPass> {
diff --git a/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def b/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
index 34c79b8f77bae..1c813c2c51f70 100644
--- a/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
+++ b/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
@@ -18,6 +18,7 @@
#endif
MODULE_PASS("generic-to-nvvm", GenericToNVVMPass())
MODULE_PASS("nvptx-lower-ctor-dtor", NVPTXCtorDtorLoweringPass())
+MODULE_PASS("nvvm-reflect", NVVMReflectPass())
#undef MODULE_PASS
#ifndef FUNCTION_ANALYSIS
@@ -36,7 +37,6 @@ FUNCTION_ALIAS_ANALYSIS("nvptx-aa", NVPTXAA())
#define FUNCTION_PASS(NAME, CREATE_PASS)
#endif
FUNCTION_PASS("nvvm-intr-range", NVVMIntrRangePass())
-FUNCTION_PASS("nvvm-reflect", NVVMReflectPass())
FUNCTION_PASS("nvptx-copy-byval-args", NVPTXCopyByValArgsPass())
FUNCTION_PASS("nvptx-lower-args", NVPTXLowerArgsPass(*this));
#undef FUNCTION_PASS
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index 5bb168704bad0..e84b707725566 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -260,11 +260,12 @@ void NVPTXTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
PB.registerPipelineStartEPCallback(
[this](ModulePassManager &PM, OptimizationLevel Level) {
- FunctionPassManager FPM;
// We do not want to fold out calls to nvvm.reflect early if the user
// has not provided a target architecture just yet.
if (Subtarget.hasTargetName())
- FPM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
+ PM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
+
+ FunctionPassManager FPM;
// Note: NVVMIntrRangePass was causing numerical discrepancies at one
// point, if issues crop up, consider disabling.
FPM.addPass(NVVMIntrRangePass());
diff --git a/llvm/lib/Target/NVPTX/NVVMReflect.cpp b/llvm/lib/Target/NVPTX/NVVMReflect.cpp
index 20b8bef1899b4..ababb7f7c9d1f 100644
--- a/llvm/lib/Target/NVPTX/NVVMReflect.cpp
+++ b/llvm/lib/Target/NVPTX/NVVMReflect.cpp
@@ -4,7 +4,7 @@
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
-//===----------------------------------------------------------------------===//
+
//
// This pass replaces occurrences of __nvvm_reflect("foo") and llvm.nvvm.reflect
// with an integer.
@@ -25,7 +25,6 @@
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
-#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsNVPTX.h"
@@ -39,33 +38,47 @@
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/StripGCRelocates.h"
#include <algorithm>
#define NVVM_REFLECT_FUNCTION "__nvvm_reflect"
#define NVVM_REFLECT_OCL_FUNCTION "__nvvm_reflect_ocl"
using namespace llvm;
-#define DEBUG_TYPE "nvptx-reflect"
+#define DEBUG_TYPE "nvvm-reflect"
namespace llvm {
void initializeNVVMReflectPass(PassRegistry &);
}
namespace {
-class NVVMReflect : public FunctionPass {
+class NVVMReflect : public ModulePass {
+private:
+ StringMap<int> VarMap;
+ /// Process a reflect function by finding all its uses and replacing them with
+ /// appropriate constant values. For __CUDA_FTZ, uses the module flag value.
+ /// For __CUDA_ARCH, uses SmVersion * 10. For all other strings, uses 0.
+ bool handleReflectFunction(Function *F);
+ void setVarMap(Module &M);
+
public:
static char ID;
- unsigned int SmVersion;
NVVMReflect() : NVVMReflect(0) {}
- explicit NVVMReflect(unsigned int Sm) : FunctionPass(ID), SmVersion(Sm) {
+ // __CUDA_FTZ is assigned in `runOnModule` by checking nvvm-reflect-ftz module
+ // metadata.
+ explicit NVVMReflect(unsigned int Sm) : ModulePass(ID) {
+ VarMap["__CUDA_ARCH"] = Sm * 10;
initializeNVVMReflectPass(*PassRegistry::getPassRegistry());
}
-
- bool runOnFunction(Function &) override;
+ // This mapping will contain should include __CUDA_FTZ and __CUDA_ARCH values.
+ explicit NVVMReflect(const StringMap<int> &Mapping) : ModulePass(ID), VarMap(Mapping) {
+ initializeNVVMReflectPass(*PassRegistry::getPassRegistry());
+ }
+ bool runOnModule(Module &M) override;
};
} // namespace
-FunctionPass *llvm::createNVVMReflectPass(unsigned int SmVersion) {
+ModulePass *llvm::createNVVMReflectPass(unsigned int SmVersion) {
return new NVVMReflect(SmVersion);
}
@@ -78,27 +91,51 @@ INITIALIZE_PASS(NVVMReflect, "nvvm-reflect",
"Replace occurrences of __nvvm_reflect() calls with 0/1", false,
false)
-static bool runNVVMReflect(Function &F, unsigned SmVersion) {
- if (!NVVMReflectEnabled)
- return false;
+static cl::list<std::string>
+ ReflectList("nvvm-reflect-list", cl::value_desc("name=<int>"), cl::Hidden,
+ cl::desc("A list of string=num assignments"),
+ cl::ValueRequired);
- if (F.getName() == NVVM_REFLECT_FUNCTION ||
- F.getName() == NVVM_REFLECT_OCL_FUNCTION) {
- assert(F.isDeclaration() && "_reflect function should not have a body");
- assert(F.getReturnType()->isIntegerTy() &&
- "_reflect's return type should be integer");
- return false;
+/// The command line can look as follows :
+/// -nvvm-reflect-list a=1,b=2 -nvvm-reflect-list c=3,d=0 -R e=2
+/// The strings "a=1,b=2", "c=3,d=0", "e=2" are available in the
+/// ReflectList vector. First, each of ReflectList[i] is 'split'
+/// using "," as the delimiter. Then each of this part is split
+/// using "=" as the delimiter.
+void NVVMReflect::setVarMap(Module &M) {
+ if (auto *Flag = mdconst::extract_or_null<ConstantInt>(
+ M.getModuleFlag("nvvm-reflect-ftz")))
+ VarMap["__CUDA_FTZ"] = Flag->getSExtValue();
+
+ for (unsigned I = 0, E = ReflectList.size(); I != E; ++I) {
+ LLVM_DEBUG(dbgs() << "Option : " << ReflectList[I] << "\n");
+ SmallVector<StringRef, 4> NameValList;
+ StringRef(ReflectList[I]).split(NameValList, ",");
+ for (unsigned J = 0, EJ = NameValList.size(); J != EJ; ++J) {
+ SmallVector<StringRef, 2> NameValPair;
+ NameValList[J].split(NameValPair, "=");
+ assert(NameValPair.size() == 2 && "name=val expected");
+ StringRef ValStr = NameValPair[1].trim();
+ int Val;
+ if (ValStr.getAsInteger(10, Val))
+ report_fatal_error("integer value expected");
+ VarMap[NameValPair[0]] = Val;
+ }
}
+}
+
+bool NVVMReflect::handleReflectFunction(Function *F) {
+ // Validate _reflect function
+ assert(F->isDeclaration() && "_reflect function should not have a body");
+ assert(F->getReturnType()->isIntegerTy() &&
+ "_reflect's return type should be integer");
SmallVector<Instruction *, 4> ToRemove;
SmallVector<Instruction *, 4> ToSimplify;
- // Go through the calls in this function. Each call to __nvvm_reflect or
- // llvm.nvvm.reflect should be a CallInst with a ConstantArray argument.
- // First validate that. If the c-string corresponding to the ConstantArray can
- // be found successfully, see if it can be found in VarMap. If so, replace the
- // uses of CallInst with the value found in VarMap. If not, replace the use
- // with value 0.
+ // Go through the uses of the reflect function. Each use should be a CallInst
+ // with a ConstantArray argument. Replace the uses with the appropriate
+ // constant values.
// The IR for __nvvm_reflect calls differs between CUDA versions.
//
@@ -119,15 +156,10 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
//
// In this case, we get a Constant with a GlobalVariable operand and we need
// to dig deeper to find its initializer with the string we'll use for lookup.
- for (Instruction &I : instructions(F)) {
- CallInst *Call = dyn_cast<CallInst>(&I);
- if (!Call)
- continue;
- Function *Callee = Call->getCalledFunction();
- if (!Callee || (Callee->getName() != NVVM_REFLECT_FUNCTION &&
- Callee->getName() != NVVM_REFLECT_OCL_FUNCTION &&
- Callee->getIntrinsicID() != Intrinsic::nvvm_reflect))
- continue;
+
+ for (User *U : F->users()) {
+ assert(isa<CallInst>(U) && "Only a call instruction can use _reflect");
+ CallInst *Call = cast<CallInst>(U);
// FIXME: Improve error handling here and elsewhere in this pass.
assert(Call->getNumOperands() == 2 &&
@@ -162,20 +194,15 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
"Format of _reflect function not recognized");
StringRef ReflectArg = cast<ConstantDataSequential>(Operand)->getAsString();
+ // Remove the null terminator from the string
ReflectArg = ReflectArg.substr(0, ReflectArg.size() - 1);
LLVM_DEBUG(dbgs() << "Arg of _reflect : " << ReflectArg << "\n");
int ReflectVal = 0; // The default value is 0
- if (ReflectArg == "__CUDA_FTZ") {
- // Try to pull __CUDA_FTZ from the nvvm-reflect-ftz module flag. Our
- // choice here must be kept in sync with AutoUpgrade, which uses the same
- // technique to detect whether ftz is enabled.
- if (auto *Flag = mdconst::extract_or_null<ConstantInt>(
- F.getParent()->getModuleFlag("nvvm-reflect-ftz")))
- ReflectVal = Flag->getSExtValue();
- } else if (ReflectArg == "__CUDA_ARCH") {
- ReflectVal = SmVersion * 10;
+ if (VarMap.contains(ReflectArg)) {
+ ReflectVal = VarMap[ReflectArg];
}
+ LLVM_DEBUG(dbgs() << "ReflectVal: " << ReflectVal << "\n");
// If the immediate user is a simple comparison we want to simplify it.
for (User *U : Call->users())
@@ -191,7 +218,7 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
// until we find a terminator that we can then remove.
while (!ToSimplify.empty()) {
Instruction *I = ToSimplify.pop_back_val();
- if (Constant *C = ConstantFoldInstruction(I, F.getDataLayout())) {
+ if (Constant *C = ConstantFoldInstruction(I, F->getDataLayout())) {
for (User *U : I->users())
if (Instruction *I = dyn_cast<Instruction>(U))
ToSimplify.push_back(I);
@@ -208,23 +235,45 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
// Removing via isInstructionTriviallyDead may add duplicates to the ToRemove
// array. Filter out the duplicates before starting to erase from parent.
std::sort(ToRemove.begin(), ToRemove.end());
- auto NewLastIter = llvm::unique(ToRemove);
+ auto *NewLastIter = llvm::unique(ToRemove);
ToRemove.erase(NewLastIter, ToRemove.end());
for (Instruction *I : ToRemove)
I->eraseFromParent();
+ // Remove the __nvvm_reflect function from the module
+ F->eraseFromParent();
return ToRemove.size() > 0;
}
-bool NVVMReflect::runOnFunction(Function &F) {
- return runNVVMReflect(F, SmVersion);
-}
+bool NVVMReflect::runOnModule(Module &M) {
+ if (!NVVMReflectEnabled)
+ return false;
+
+ setVarMap(M);
-NVVMReflectPass::NVVMReflectPass() : NVVMReflectPass(0) {}
+ bool Changed = false;
+ // Names of reflect function to find and replace
+ SmallVector<std::string, 3> ReflectNames = {
+ NVVM_REFLECT_FUNCTION, NVVM_REFLECT_OCL_FUNCTION,
+ Intrinsic::getName(Intrinsic::nvvm_reflect).str()};
+
+ // Process all reflect functions
+ for (const std::string &Name : ReflectNames) {
+ Function *ReflectFunction = M.getFunction(Name);
+ if (ReflectFunction) {
+ Changed |= handleReflectFunction(ReflectFunction);
+ }
+ }
+
+ return Changed;
+}
-PreservedAnalyses NVVMReflectPass::run(Function &F,
- FunctionAnalysisManager &AM) {
- return runNVVMReflect(F, SmVersion) ? PreservedAnalyses::none()
- : PreservedAnalyses::all();
+// Implementations for the pass that works with the new pass manager.
+NVVMReflectPass::NVVMReflectPass(unsigned SmVersion) {
+ VarMap["__CUDA_ARCH"] = SmVersion * 10;
}
+PreservedAnalyses NVVMReflectPass::run(Module &M, ModuleAnalysisManager &AM) {
+ return NVVMReflect(VarMap).runOnModule(M) ? PreservedAnalyses::none()
+ : PreservedAnalyses::all();
+}
\ No newline at end of file
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like some nice cleanup. Please add tests for the new debugging option.
SmallVector<StringRef, 4> NameValList; | ||
StringRef(ReflectList[I]).split(NameValList, ","); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use the split
iterator in StringExtras here as it should be simpler and more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to implement the comma-splitting? Wouldn't cl::list handle multiple values for us already if the user specifies the option more than once? If cl::list already handles multiple values, then we should probably rename the option into --nvvm-reflect-add
or something...
Also, using a more modern C++ features for iteration may make work well here.
E.g.
for (StringRef O: ReflectList) {
auto [Key, Value] = O.split("=");
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the splitting more modern.
For the comma-splitting, I think you need it. If you do ... --nvvm-reflect-add a=1,b=2 --nvm-reflect-add c=3,d=4,
then the reflect list looks like {"a=1,b=2", "c=3,d=4"}. So for each option specified you still need to split on the comma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option -> one key=value pair. There's no need to support specifying multiple pairs via a single option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it fine instead to just add the cl::CommaSeparated
argument when creating the option (see current version of code)? Now, like you said before, cl::list
handles if the user specifies the option more than once and it splits on the commas for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's indeed more than one way to do things. The question is -- which one makes more sense?
The way I see it the CLI should be as simple to implement, and as straightforward to use as we can make it. From that standpoint along the comma separated list of key=value pairs will stand out as a sore thumb. In some cases we have to do that, but this is not one of them. The "standard" way for the options is to specify one thing per option. E.g we're not passing comma-separated list of macros to define with -D
. Speaking of which, -D
is actually semantically very close to what you're trying to do, as NVVMReflect is, effectively, a preprocessor for the IR, and your new option defines one of the macro values for it.
So, let's keep thing simple, and define one reflect value at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I fixed it so it has the desired behavior. Thanks for all the great feedback!!
✅ With the latest revision this PR passed the C/C++ code formatter. |
@AlexMaclean do you think we could reuse intrinsic autoupgrade machinery for this, instead of making reflect processing more complicated? It could be somewhat useful for other purposes. @jhuber6 would something like that help with some of your offloading cases. I recall you did run into trouble with NVVMReflect a while back. |
All that really matters for correctness is that this pass is always run and it always does constant prop + DCE when it's used as an edge directly. |
It's an interesting idea, and I agree that intrinsics seem marginally cleaner than relying on string arguments, but I don't think there will be much benefit in the end. For correctness, we'll still need something like NVVMReflect to run to fold these calls away and do the constant prop + DCE. I think with some cleanup this new approach will not be more complicated. In some ways I think this allows for simplification because we shouldn't need to be as careful about deleting and simplifying now that we're not iterating of instructions in a function. |
return false; | ||
static cl::list<std::string> | ||
ReflectList("nvvm-reflect-list", cl::value_desc("name=<int>"), cl::Hidden, | ||
cl::desc("A list of string=num assignments"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: ".. list of comma-separated key= values"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
SmallVector<StringRef, 4> NameValList; | ||
StringRef(ReflectList[I]).split(NameValList, ","); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to implement the comma-splitting? Wouldn't cl::list handle multiple values for us already if the user specifies the option more than once? If cl::list already handles multiple values, then we should probably rename the option into --nvvm-reflect-add
or something...
Also, using a more modern C++ features for iteration may make work well here.
E.g.
for (StringRef O: ReflectList) {
auto [Key, Value] = O.split("=");
...
}
It's also worth noting that #134016 is AMD's attempt at doing the same functionality. |
903731a
to
cbf8664
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking lots better, couple minor stylistic issues to fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks essentially good to go. As a stylistic matter I think the const
qualifier has been somewhat over-applied. In general I think it makes sense to add const
only when it improves readability, and not where ever possible.
@Artem-B Currently I'm failing a Clang test because of the fact that I switched NVVMReflect to be a module pass now instead of a function pass. The test is `clang/test/CodeGen/builtins-nvptx.c:
I'm failing now because the NVVMReflect pass is getting run and removing the call. The difference that's causing it to get run now is in
I changed this in my PR so that the reflect pass is added to the module pass manager, not the function pass manager. For some reason, these function passes added in the callback are NOT run in the lit-test, but the module pass is. For reference, the clang cmd run is: Do you know if this is an issue or why this is happening? |
I don't know why there's a difference in whether reflect gets to run or not while emitting unoptimized IR. In the end it does not really matter much. We can adjust the test to verify that we see the intended reflect value in the generated IR. |
This option worked. If I add it to all the clang compile commands for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please wait for @Artem-B's approval before landing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the code. Tests could use a bit more polishing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there. Few more test nits.
968a917
to
47f280e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
e61755d
to
ac0ffbf
Compare
Merging on behalf of @YonahGoldberg at his request offline. |
@YonahGoldberg Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
The NVVMReflect pass simply replaces calls to nvvm-reflect functions with the appropriate constant, either the architecture number, or nvvm-reflect-ftz, found in the module's metadata. The implementation is inefficient and does this by traversing through all instructions to find calls. The common case is that you never call nvvm-reflect, so this traversal is costly. This PR: - Updates the pass so that it finds the reflect functions by name, and then traverses through their uses to find the calls directly. - Adds a line (245) to make sure the dead nvvm-reflect definitions are erased. - Adds the ability to set reflect values via command line
…tion (llvm#135825) This was already declared in NVPTX.h and I accidentally added it back in llvm#134416.
The NVVMReflect pass simply replaces calls to nvvm-reflect functions with the appropriate constant, either the architecture number, or nvvm-reflect-ftz, found in the module's metadata.
The implementation is inefficient and does this by traversing through all instructions to find calls. The common case is that you never call nvvm-reflect, so this traversal is costly.
This PR: