Skip to content

[NVPTX] Improve NVVMReflect Efficiency #134416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Apr 11, 2025
Merged

Conversation

YonahGoldberg
Copy link
Contributor

The NVVMReflect pass simply replaces calls to nvvm-reflect functions with the appropriate constant, either the architecture number, or nvvm-reflect-ftz, found in the module's metadata.

The implementation is inefficient and does this by traversing through all instructions to find calls. The common case is that you never call nvvm-reflect, so this traversal is costly.

This PR:

  • Updates the pass so that it finds the reflect functions by name, and then traverses through their uses to find the calls directly.
  • Adds a line (245) to make sure the dead nvvm-reflect definitions are erased.
  • Adds the ability to set reflect values via command line

Copy link

github-actions bot commented Apr 4, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Apr 4, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Yonah Goldberg (YonahGoldberg)

Changes

The NVVMReflect pass simply replaces calls to nvvm-reflect functions with the appropriate constant, either the architecture number, or nvvm-reflect-ftz, found in the module's metadata.

The implementation is inefficient and does this by traversing through all instructions to find calls. The common case is that you never call nvvm-reflect, so this traversal is costly.

This PR:

  • Updates the pass so that it finds the reflect functions by name, and then traverses through their uses to find the calls directly.
  • Adds a line (245) to make sure the dead nvvm-reflect definitions are erased.
  • Adds the ability to set reflect values via command line

Full diff: https://github.com/llvm/llvm-project/pull/134416.diff

4 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTX.h (+5-5)
  • (modified) llvm/lib/Target/NVPTX/NVPTXPassRegistry.def (+1-1)
  • (modified) llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp (+3-2)
  • (modified) llvm/lib/Target/NVPTX/NVVMReflect.cpp (+101-52)
diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index 20a5bf46dc06b..8efa0bb546546 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -43,7 +43,7 @@ ModulePass *createNVPTXAssignValidGlobalNamesPass();
 ModulePass *createGenericToNVVMLegacyPass();
 ModulePass *createNVPTXCtorDtorLoweringLegacyPass();
 FunctionPass *createNVVMIntrRangePass();
-FunctionPass *createNVVMReflectPass(unsigned int SmVersion);
+ModulePass *createNVVMReflectPass(unsigned int SmVersion);
 MachineFunctionPass *createNVPTXPrologEpilogPass();
 MachineFunctionPass *createNVPTXReplaceImageHandlesPass();
 FunctionPass *createNVPTXImageOptimizerPass();
@@ -60,12 +60,12 @@ struct NVVMIntrRangePass : PassInfoMixin<NVVMIntrRangePass> {
 };
 
 struct NVVMReflectPass : PassInfoMixin<NVVMReflectPass> {
-  NVVMReflectPass();
-  NVVMReflectPass(unsigned SmVersion) : SmVersion(SmVersion) {}
-  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
+  NVVMReflectPass() : NVVMReflectPass(0) {}
+  NVVMReflectPass(unsigned SmVersion);
+  PreservedAnalyses run(Module &F, ModuleAnalysisManager &AM);
 
 private:
-  unsigned SmVersion;
+  StringMap<int> VarMap;
 };
 
 struct GenericToNVVMPass : PassInfoMixin<GenericToNVVMPass> {
diff --git a/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def b/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
index 34c79b8f77bae..1c813c2c51f70 100644
--- a/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
+++ b/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
@@ -18,6 +18,7 @@
 #endif
 MODULE_PASS("generic-to-nvvm", GenericToNVVMPass())
 MODULE_PASS("nvptx-lower-ctor-dtor", NVPTXCtorDtorLoweringPass())
+MODULE_PASS("nvvm-reflect", NVVMReflectPass())
 #undef MODULE_PASS
 
 #ifndef FUNCTION_ANALYSIS
@@ -36,7 +37,6 @@ FUNCTION_ALIAS_ANALYSIS("nvptx-aa", NVPTXAA())
 #define FUNCTION_PASS(NAME, CREATE_PASS)
 #endif
 FUNCTION_PASS("nvvm-intr-range", NVVMIntrRangePass())
-FUNCTION_PASS("nvvm-reflect", NVVMReflectPass())
 FUNCTION_PASS("nvptx-copy-byval-args", NVPTXCopyByValArgsPass())
 FUNCTION_PASS("nvptx-lower-args", NVPTXLowerArgsPass(*this));
 #undef FUNCTION_PASS
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index 5bb168704bad0..e84b707725566 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -260,11 +260,12 @@ void NVPTXTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
 
   PB.registerPipelineStartEPCallback(
       [this](ModulePassManager &PM, OptimizationLevel Level) {
-        FunctionPassManager FPM;
         // We do not want to fold out calls to nvvm.reflect early if the user
         // has not provided a target architecture just yet.
         if (Subtarget.hasTargetName())
-          FPM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
+          PM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
+        
+        FunctionPassManager FPM;
         // Note: NVVMIntrRangePass was causing numerical discrepancies at one
         // point, if issues crop up, consider disabling.
         FPM.addPass(NVVMIntrRangePass());
diff --git a/llvm/lib/Target/NVPTX/NVVMReflect.cpp b/llvm/lib/Target/NVPTX/NVVMReflect.cpp
index 20b8bef1899b4..ababb7f7c9d1f 100644
--- a/llvm/lib/Target/NVPTX/NVVMReflect.cpp
+++ b/llvm/lib/Target/NVPTX/NVVMReflect.cpp
@@ -4,7 +4,7 @@
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
-//===----------------------------------------------------------------------===//
+
 //
 // This pass replaces occurrences of __nvvm_reflect("foo") and llvm.nvvm.reflect
 // with an integer.
@@ -25,7 +25,6 @@
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Function.h"
-#include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsNVPTX.h"
@@ -39,33 +38,47 @@
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/StripGCRelocates.h"
 #include <algorithm>
 #define NVVM_REFLECT_FUNCTION "__nvvm_reflect"
 #define NVVM_REFLECT_OCL_FUNCTION "__nvvm_reflect_ocl"
 
 using namespace llvm;
 
-#define DEBUG_TYPE "nvptx-reflect"
+#define DEBUG_TYPE "nvvm-reflect"
 
 namespace llvm {
 void initializeNVVMReflectPass(PassRegistry &);
 }
 
 namespace {
-class NVVMReflect : public FunctionPass {
+class NVVMReflect : public ModulePass {
+private:
+  StringMap<int> VarMap;
+  /// Process a reflect function by finding all its uses and replacing them with
+  /// appropriate constant values. For __CUDA_FTZ, uses the module flag value.
+  /// For __CUDA_ARCH, uses SmVersion * 10. For all other strings, uses 0.
+  bool handleReflectFunction(Function *F);
+  void setVarMap(Module &M);
+
 public:
   static char ID;
-  unsigned int SmVersion;
   NVVMReflect() : NVVMReflect(0) {}
-  explicit NVVMReflect(unsigned int Sm) : FunctionPass(ID), SmVersion(Sm) {
+  // __CUDA_FTZ is assigned in `runOnModule` by checking nvvm-reflect-ftz module
+  // metadata.
+  explicit NVVMReflect(unsigned int Sm) : ModulePass(ID) {
+    VarMap["__CUDA_ARCH"] = Sm * 10;
     initializeNVVMReflectPass(*PassRegistry::getPassRegistry());
   }
-
-  bool runOnFunction(Function &) override;
+  // This mapping will contain should include __CUDA_FTZ and __CUDA_ARCH values.
+  explicit NVVMReflect(const StringMap<int> &Mapping) : ModulePass(ID), VarMap(Mapping) {
+    initializeNVVMReflectPass(*PassRegistry::getPassRegistry());
+  }
+  bool runOnModule(Module &M) override;
 };
 } // namespace
 
-FunctionPass *llvm::createNVVMReflectPass(unsigned int SmVersion) {
+ModulePass *llvm::createNVVMReflectPass(unsigned int SmVersion) {
   return new NVVMReflect(SmVersion);
 }
 
@@ -78,27 +91,51 @@ INITIALIZE_PASS(NVVMReflect, "nvvm-reflect",
                 "Replace occurrences of __nvvm_reflect() calls with 0/1", false,
                 false)
 
-static bool runNVVMReflect(Function &F, unsigned SmVersion) {
-  if (!NVVMReflectEnabled)
-    return false;
+static cl::list<std::string>
+    ReflectList("nvvm-reflect-list", cl::value_desc("name=<int>"), cl::Hidden,
+                cl::desc("A list of string=num assignments"),
+                cl::ValueRequired);
 
-  if (F.getName() == NVVM_REFLECT_FUNCTION ||
-      F.getName() == NVVM_REFLECT_OCL_FUNCTION) {
-    assert(F.isDeclaration() && "_reflect function should not have a body");
-    assert(F.getReturnType()->isIntegerTy() &&
-           "_reflect's return type should be integer");
-    return false;
+/// The command line can look as follows :
+/// -nvvm-reflect-list a=1,b=2 -nvvm-reflect-list c=3,d=0 -R e=2
+/// The strings "a=1,b=2", "c=3,d=0", "e=2" are available in the
+/// ReflectList vector. First, each of ReflectList[i] is 'split'
+/// using "," as the delimiter. Then each of this part is split
+/// using "=" as the delimiter.
+void NVVMReflect::setVarMap(Module &M) {
+  if (auto *Flag = mdconst::extract_or_null<ConstantInt>(
+          M.getModuleFlag("nvvm-reflect-ftz")))
+    VarMap["__CUDA_FTZ"] = Flag->getSExtValue();
+
+  for (unsigned I = 0, E = ReflectList.size(); I != E; ++I) {
+    LLVM_DEBUG(dbgs() << "Option : " << ReflectList[I] << "\n");
+    SmallVector<StringRef, 4> NameValList;
+    StringRef(ReflectList[I]).split(NameValList, ",");
+    for (unsigned J = 0, EJ = NameValList.size(); J != EJ; ++J) {
+      SmallVector<StringRef, 2> NameValPair;
+      NameValList[J].split(NameValPair, "=");
+      assert(NameValPair.size() == 2 && "name=val expected");
+      StringRef ValStr = NameValPair[1].trim();
+      int Val;
+      if (ValStr.getAsInteger(10, Val))
+        report_fatal_error("integer value expected");
+      VarMap[NameValPair[0]] = Val;
+    }
   }
+}
+
+bool NVVMReflect::handleReflectFunction(Function *F) {
+  // Validate _reflect function
+  assert(F->isDeclaration() && "_reflect function should not have a body");
+  assert(F->getReturnType()->isIntegerTy() &&
+         "_reflect's return type should be integer");
 
   SmallVector<Instruction *, 4> ToRemove;
   SmallVector<Instruction *, 4> ToSimplify;
 
-  // Go through the calls in this function.  Each call to __nvvm_reflect or
-  // llvm.nvvm.reflect should be a CallInst with a ConstantArray argument.
-  // First validate that. If the c-string corresponding to the ConstantArray can
-  // be found successfully, see if it can be found in VarMap. If so, replace the
-  // uses of CallInst with the value found in VarMap. If not, replace the use
-  // with value 0.
+  // Go through the uses of the reflect function. Each use should be a CallInst
+  // with a ConstantArray argument. Replace the uses with the appropriate
+  // constant values.
 
   // The IR for __nvvm_reflect calls differs between CUDA versions.
   //
@@ -119,15 +156,10 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
   //
   // In this case, we get a Constant with a GlobalVariable operand and we need
   // to dig deeper to find its initializer with the string we'll use for lookup.
-  for (Instruction &I : instructions(F)) {
-    CallInst *Call = dyn_cast<CallInst>(&I);
-    if (!Call)
-      continue;
-    Function *Callee = Call->getCalledFunction();
-    if (!Callee || (Callee->getName() != NVVM_REFLECT_FUNCTION &&
-                    Callee->getName() != NVVM_REFLECT_OCL_FUNCTION &&
-                    Callee->getIntrinsicID() != Intrinsic::nvvm_reflect))
-      continue;
+
+  for (User *U : F->users()) {
+    assert(isa<CallInst>(U) && "Only a call instruction can use _reflect");
+    CallInst *Call = cast<CallInst>(U);
 
     // FIXME: Improve error handling here and elsewhere in this pass.
     assert(Call->getNumOperands() == 2 &&
@@ -162,20 +194,15 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
            "Format of _reflect function not recognized");
 
     StringRef ReflectArg = cast<ConstantDataSequential>(Operand)->getAsString();
+    // Remove the null terminator from the string
     ReflectArg = ReflectArg.substr(0, ReflectArg.size() - 1);
     LLVM_DEBUG(dbgs() << "Arg of _reflect : " << ReflectArg << "\n");
 
     int ReflectVal = 0; // The default value is 0
-    if (ReflectArg == "__CUDA_FTZ") {
-      // Try to pull __CUDA_FTZ from the nvvm-reflect-ftz module flag.  Our
-      // choice here must be kept in sync with AutoUpgrade, which uses the same
-      // technique to detect whether ftz is enabled.
-      if (auto *Flag = mdconst::extract_or_null<ConstantInt>(
-              F.getParent()->getModuleFlag("nvvm-reflect-ftz")))
-        ReflectVal = Flag->getSExtValue();
-    } else if (ReflectArg == "__CUDA_ARCH") {
-      ReflectVal = SmVersion * 10;
+    if (VarMap.contains(ReflectArg)) {
+      ReflectVal = VarMap[ReflectArg];
     }
+    LLVM_DEBUG(dbgs() << "ReflectVal: " << ReflectVal << "\n");
 
     // If the immediate user is a simple comparison we want to simplify it.
     for (User *U : Call->users())
@@ -191,7 +218,7 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
   // until we find a terminator that we can then remove.
   while (!ToSimplify.empty()) {
     Instruction *I = ToSimplify.pop_back_val();
-    if (Constant *C = ConstantFoldInstruction(I, F.getDataLayout())) {
+    if (Constant *C = ConstantFoldInstruction(I, F->getDataLayout())) {
       for (User *U : I->users())
         if (Instruction *I = dyn_cast<Instruction>(U))
           ToSimplify.push_back(I);
@@ -208,23 +235,45 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
   // Removing via isInstructionTriviallyDead may add duplicates to the ToRemove
   // array. Filter out the duplicates before starting to erase from parent.
   std::sort(ToRemove.begin(), ToRemove.end());
-  auto NewLastIter = llvm::unique(ToRemove);
+  auto *NewLastIter = llvm::unique(ToRemove);
   ToRemove.erase(NewLastIter, ToRemove.end());
 
   for (Instruction *I : ToRemove)
     I->eraseFromParent();
 
+  // Remove the __nvvm_reflect function from the module
+  F->eraseFromParent();
   return ToRemove.size() > 0;
 }
 
-bool NVVMReflect::runOnFunction(Function &F) {
-  return runNVVMReflect(F, SmVersion);
-}
+bool NVVMReflect::runOnModule(Module &M) {
+  if (!NVVMReflectEnabled)
+    return false;
+
+  setVarMap(M);
 
-NVVMReflectPass::NVVMReflectPass() : NVVMReflectPass(0) {}
+  bool Changed = false;
+  // Names of reflect function to find and replace
+  SmallVector<std::string, 3> ReflectNames = {
+      NVVM_REFLECT_FUNCTION, NVVM_REFLECT_OCL_FUNCTION,
+      Intrinsic::getName(Intrinsic::nvvm_reflect).str()};
+
+  // Process all reflect functions
+  for (const std::string &Name : ReflectNames) {
+    Function *ReflectFunction = M.getFunction(Name);
+    if (ReflectFunction) {
+      Changed |= handleReflectFunction(ReflectFunction);
+    }
+  }
+
+  return Changed;
+}
 
-PreservedAnalyses NVVMReflectPass::run(Function &F,
-                                       FunctionAnalysisManager &AM) {
-  return runNVVMReflect(F, SmVersion) ? PreservedAnalyses::none()
-                                      : PreservedAnalyses::all();
+// Implementations for the pass that works with the new pass manager.
+NVVMReflectPass::NVVMReflectPass(unsigned SmVersion) {
+  VarMap["__CUDA_ARCH"] = SmVersion * 10;
 }
+PreservedAnalyses NVVMReflectPass::run(Module &M, ModuleAnalysisManager &AM) {
+  return NVVMReflect(VarMap).runOnModule(M) ? PreservedAnalyses::none()
+                                            : PreservedAnalyses::all();
+}
\ No newline at end of file

@YonahGoldberg
Copy link
Contributor Author

@AlexMaclean @Artem-B

Copy link
Member

@AlexMaclean AlexMaclean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some nice cleanup. Please add tests for the new debugging option.

Comment on lines 112 to 113
SmallVector<StringRef, 4> NameValList;
StringRef(ReflectList[I]).split(NameValList, ",");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use the split iterator in StringExtras here as it should be simpler and more efficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to implement the comma-splitting? Wouldn't cl::list handle multiple values for us already if the user specifies the option more than once? If cl::list already handles multiple values, then we should probably rename the option into --nvvm-reflect-add or something...

Also, using a more modern C++ features for iteration may make work well here.

E.g.

for (StringRef O: ReflectList) {
   auto [Key, Value] = O.split("=");
   ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the splitting more modern.

For the comma-splitting, I think you need it. If you do ... --nvvm-reflect-add a=1,b=2 --nvm-reflect-add c=3,d=4,
then the reflect list looks like {"a=1,b=2", "c=3,d=4"}. So for each option specified you still need to split on the comma.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option -> one key=value pair. There's no need to support specifying multiple pairs via a single option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine instead to just add the cl::CommaSeparated argument when creating the option (see current version of code)? Now, like you said before, cl::list handles if the user specifies the option more than once and it splits on the commas for us.

Copy link
Member

@Artem-B Artem-B Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's indeed more than one way to do things. The question is -- which one makes more sense?

The way I see it the CLI should be as simple to implement, and as straightforward to use as we can make it. From that standpoint along the comma separated list of key=value pairs will stand out as a sore thumb. In some cases we have to do that, but this is not one of them. The "standard" way for the options is to specify one thing per option. E.g we're not passing comma-separated list of macros to define with -D. Speaking of which, -D is actually semantically very close to what you're trying to do, as NVVMReflect is, effectively, a preprocessor for the IR, and your new option defines one of the macro values for it.

So, let's keep thing simple, and define one reflect value at a time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I fixed it so it has the desired behavior. Thanks for all the great feedback!!

Copy link

github-actions bot commented Apr 4, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Artem-B
Copy link
Member

Artem-B commented Apr 4, 2025

@AlexMaclean do you think we could reuse intrinsic autoupgrade machinery for this, instead of making reflect processing more complicated?

It could be somewhat useful for other purposes.
E.g. we could introduce a const-foldable (when we know the GPU we're targeting) nvvm.arch() which would return CUDA_ARCH value and upgrade nvvm_reflect to it. Bonus point is that it would also be useful for IR users to parametrize their code without relying on NVVMReflect.

@jhuber6 would something like that help with some of your offloading cases. I recall you did run into trouble with NVVMReflect a while back.

@jhuber6
Copy link
Contributor

jhuber6 commented Apr 4, 2025

@AlexMaclean do you think we could reuse intrinsic autoupgrade machinery for this, instead of making reflect processing more complicated?

It could be somewhat useful for other purposes. E.g. we could introduce a const-foldable (when we know the GPU we're targeting) nvvm.arch() which would return CUDA_ARCH value and upgrade nvvm_reflect to it. Bonus point is that it would also be useful for IR users to parametrize their code without relying on NVVMReflect.

@jhuber6 would something like that help with some of your offloading cases. I recall you did run into trouble with NVVMReflect a while back.

All that really matters for correctness is that this pass is always run and it always does constant prop + DCE when it's used as an edge directly.

@AlexMaclean
Copy link
Member

@AlexMaclean do you think we could reuse intrinsic autoupgrade machinery for this, instead of making reflect processing more complicated?

It could be somewhat useful for other purposes. E.g. we could introduce a const-foldable (when we know the GPU we're targeting) nvvm.arch() which would return CUDA_ARCH value and upgrade nvvm_reflect to it. Bonus point is that it would also be useful for IR users to parametrize their code without relying on NVVMReflect.

@jhuber6 would something like that help with some of your offloading cases. I recall you did run into trouble with NVVMReflect a while back.

It's an interesting idea, and I agree that intrinsics seem marginally cleaner than relying on string arguments, but I don't think there will be much benefit in the end. For correctness, we'll still need something like NVVMReflect to run to fold these calls away and do the constant prop + DCE.

I think with some cleanup this new approach will not be more complicated. In some ways I think this allows for simplification because we shouldn't need to be as careful about deleting and simplifying now that we're not iterating of instructions in a function.

return false;
static cl::list<std::string>
ReflectList("nvvm-reflect-list", cl::value_desc("name=<int>"), cl::Hidden,
cl::desc("A list of string=num assignments"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ".. list of comma-separated key= values"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 112 to 113
SmallVector<StringRef, 4> NameValList;
StringRef(ReflectList[I]).split(NameValList, ",");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to implement the comma-splitting? Wouldn't cl::list handle multiple values for us already if the user specifies the option more than once? If cl::list already handles multiple values, then we should probably rename the option into --nvvm-reflect-add or something...

Also, using a more modern C++ features for iteration may make work well here.

E.g.

for (StringRef O: ReflectList) {
   auto [Key, Value] = O.split("=");
   ...
}

@jhuber6
Copy link
Contributor

jhuber6 commented Apr 4, 2025

It's also worth noting that #134016 is AMD's attempt at doing the same functionality.

@YonahGoldberg YonahGoldberg force-pushed the nvvm-reflect branch 4 times, most recently from 903731a to cbf8664 Compare April 8, 2025 22:26
Copy link
Member

@AlexMaclean AlexMaclean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking lots better, couple minor stylistic issues to fix

Copy link
Member

@AlexMaclean AlexMaclean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks essentially good to go. As a stylistic matter I think the const qualifier has been somewhat over-applied. In general I think it makes sense to add const only when it improves readability, and not where ever possible.

@YonahGoldberg
Copy link
Contributor Author

@Artem-B Currently I'm failing a Clang test because of the fact that I switched NVVMReflect to be a module pass now instead of a function pass. The test is `clang/test/CodeGen/builtins-nvptx.c:

__device__ bool reflect() {

// CHECK: call i32 @llvm.nvvm.reflect(ptr {{.*}})

  unsigned x = __nvvm_reflect("__CUDA_ARCH");
  return x >= 700;
}

I'm failing now because the NVVMReflect pass is getting run and removing the call. The difference that's causing it to get run now is in NVPTXTargetMachine.cpp:

PB.registerPipelineStartEPCallback(
      [this](ModulePassManager &PM, OptimizationLevel Level) {
        // We do not want to fold out calls to nvvm.reflect early if the user
        // has not provided a target architecture just yet.
        if (Subtarget.hasTargetName())
          PM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));

        FunctionPassManager FPM;
        // Note: NVVMIntrRangePass was causing numerical discrepancies at one
        // point, if issues crop up, consider disabling.
        FPM.addPass(NVVMIntrRangePass());
        if (EarlyByValArgsCopy)
          FPM.addPass(NVPTXCopyByValArgsPass());
        PM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
      });

I changed this in my PR so that the reflect pass is added to the module pass manager, not the function pass manager. For some reason, these function passes added in the callback are NOT run in the lit-test, but the module pass is.

For reference, the clang cmd run is: clang -cc1 -internal-isystem /mnt/data/ygoldberg/mainline/llvm-project/build/lib/clang/21/include -nostdsysteminc -ffp-contract=off -triple nvptx64-unknown-unknown -target-cpu sm_70 -target-feature +ptx63 -fcuda-is-device -emit-llvm -o - -x cuda /mnt/data/ygoldberg/mainline/llvm-project/clang/test/CodeGen/builtins-nvptx.c

Do you know if this is an issue or why this is happening?

@Artem-B
Copy link
Member

Artem-B commented Apr 9, 2025

I don't know why there's a difference in whether reflect gets to run or not while emitting unoptimized IR.
Can you try adding -disable-llvm-optzns and see if that helps?

In the end it does not really matter much. We can adjust the test to verify that we see the intended reflect value in the generated IR.

@YonahGoldberg
Copy link
Contributor Author

This option worked. If I add it to all the clang compile commands for the builtins-nvptx.c, all tests pass. Should we just go ahead and commit that instead of changing the reflect test?

Copy link
Member

@AlexMaclean AlexMaclean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please wait for @Artem-B's approval before landing.

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the code. Tests could use a bit more polishing.

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. Few more test nits.

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@AlexMaclean
Copy link
Member

Merging on behalf of @YonahGoldberg at his request offline.

@AlexMaclean AlexMaclean merged commit 701d726 into llvm:main Apr 11, 2025
11 checks passed
Copy link

@YonahGoldberg Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

AlexMaclean pushed a commit that referenced this pull request Apr 16, 2025
…tion (#135825)

This was already declared in NVPTX.h and I accidentally added it back in
#134416.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
The NVVMReflect pass simply replaces calls to nvvm-reflect functions
with the appropriate constant, either the architecture number, or
nvvm-reflect-ftz, found in the module's metadata.

The implementation is inefficient and does this by traversing through
all instructions to find calls. The common case is that you never call
nvvm-reflect, so this traversal is costly.

This PR:
- Updates the pass so that it finds the reflect functions by name, and
then traverses through their uses to find the calls directly.
- Adds a line (245) to make sure the dead nvvm-reflect definitions are
erased.
- Adds the ability to set reflect values via command line
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…tion (llvm#135825)

This was already declared in NVPTX.h and I accidentally added it back in
llvm#134416.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants