[NVPTX] Improve NVVMReflect Efficiency #134416

YonahGoldberg · 2025-04-04T16:57:13Z

The NVVMReflect pass simply replaces calls to nvvm-reflect functions with the appropriate constant, either the architecture number, or nvvm-reflect-ftz, found in the module's metadata.

The implementation is inefficient and does this by traversing through all instructions to find calls. The common case is that you never call nvvm-reflect, so this traversal is costly.

This PR:

Updates the pass so that it finds the reflect functions by name, and then traverses through their uses to find the calls directly.
Adds a line (245) to make sure the dead nvvm-reflect definitions are erased.
Adds the ability to set reflect values via command line

github-actions · 2025-04-04T16:57:31Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-04-04T16:58:02Z

@llvm/pr-subscribers-backend-nvptx

Author: Yonah Goldberg (YonahGoldberg)

Changes

The NVVMReflect pass simply replaces calls to nvvm-reflect functions with the appropriate constant, either the architecture number, or nvvm-reflect-ftz, found in the module's metadata.

The implementation is inefficient and does this by traversing through all instructions to find calls. The common case is that you never call nvvm-reflect, so this traversal is costly.

This PR:

Updates the pass so that it finds the reflect functions by name, and then traverses through their uses to find the calls directly.
Adds a line (245) to make sure the dead nvvm-reflect definitions are erased.
Adds the ability to set reflect values via command line

Full diff: https://github.com/llvm/llvm-project/pull/134416.diff

4 Files Affected:

(modified) llvm/lib/Target/NVPTX/NVPTX.h (+5-5)
(modified) llvm/lib/Target/NVPTX/NVPTXPassRegistry.def (+1-1)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp (+3-2)
(modified) llvm/lib/Target/NVPTX/NVVMReflect.cpp (+101-52)

diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index 20a5bf46dc06b..8efa0bb546546 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -43,7 +43,7 @@ ModulePass *createNVPTXAssignValidGlobalNamesPass();
 ModulePass *createGenericToNVVMLegacyPass();
 ModulePass *createNVPTXCtorDtorLoweringLegacyPass();
 FunctionPass *createNVVMIntrRangePass();
-FunctionPass *createNVVMReflectPass(unsigned int SmVersion);
+ModulePass *createNVVMReflectPass(unsigned int SmVersion);
 MachineFunctionPass *createNVPTXPrologEpilogPass();
 MachineFunctionPass *createNVPTXReplaceImageHandlesPass();
 FunctionPass *createNVPTXImageOptimizerPass();
@@ -60,12 +60,12 @@ struct NVVMIntrRangePass : PassInfoMixin<NVVMIntrRangePass> {
 };
 
 struct NVVMReflectPass : PassInfoMixin<NVVMReflectPass> {
-  NVVMReflectPass();
-  NVVMReflectPass(unsigned SmVersion) : SmVersion(SmVersion) {}
-  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
+  NVVMReflectPass() : NVVMReflectPass(0) {}
+  NVVMReflectPass(unsigned SmVersion);
+  PreservedAnalyses run(Module &F, ModuleAnalysisManager &AM);
 
 private:
-  unsigned SmVersion;
+  StringMap<int> VarMap;
 };
 
 struct GenericToNVVMPass : PassInfoMixin<GenericToNVVMPass> {
diff --git a/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def b/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
index 34c79b8f77bae..1c813c2c51f70 100644
--- a/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
+++ b/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
@@ -18,6 +18,7 @@
 #endif
 MODULE_PASS("generic-to-nvvm", GenericToNVVMPass())
 MODULE_PASS("nvptx-lower-ctor-dtor", NVPTXCtorDtorLoweringPass())
+MODULE_PASS("nvvm-reflect", NVVMReflectPass())
 #undef MODULE_PASS
 
 #ifndef FUNCTION_ANALYSIS
@@ -36,7 +37,6 @@ FUNCTION_ALIAS_ANALYSIS("nvptx-aa", NVPTXAA())
 #define FUNCTION_PASS(NAME, CREATE_PASS)
 #endif
 FUNCTION_PASS("nvvm-intr-range", NVVMIntrRangePass())
-FUNCTION_PASS("nvvm-reflect", NVVMReflectPass())
 FUNCTION_PASS("nvptx-copy-byval-args", NVPTXCopyByValArgsPass())
 FUNCTION_PASS("nvptx-lower-args", NVPTXLowerArgsPass(*this));
 #undef FUNCTION_PASS
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index 5bb168704bad0..e84b707725566 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -260,11 +260,12 @@ void NVPTXTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
 
   PB.registerPipelineStartEPCallback(
       [this](ModulePassManager &PM, OptimizationLevel Level) {
-        FunctionPassManager FPM;
         // We do not want to fold out calls to nvvm.reflect early if the user
         // has not provided a target architecture just yet.
         if (Subtarget.hasTargetName())
-          FPM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
+          PM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
+        
+        FunctionPassManager FPM;
         // Note: NVVMIntrRangePass was causing numerical discrepancies at one
         // point, if issues crop up, consider disabling.
         FPM.addPass(NVVMIntrRangePass());
diff --git a/llvm/lib/Target/NVPTX/NVVMReflect.cpp b/llvm/lib/Target/NVPTX/NVVMReflect.cpp
index 20b8bef1899b4..ababb7f7c9d1f 100644
--- a/llvm/lib/Target/NVPTX/NVVMReflect.cpp
+++ b/llvm/lib/Target/NVPTX/NVVMReflect.cpp
@@ -4,7 +4,7 @@
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
-//===----------------------------------------------------------------------===//
+
 //
 // This pass replaces occurrences of __nvvm_reflect("foo") and llvm.nvvm.reflect
 // with an integer.
@@ -25,7 +25,6 @@
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Function.h"
-#include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsNVPTX.h"
@@ -39,33 +38,47 @@
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/StripGCRelocates.h"
 #include <algorithm>
 #define NVVM_REFLECT_FUNCTION "__nvvm_reflect"
 #define NVVM_REFLECT_OCL_FUNCTION "__nvvm_reflect_ocl"
 
 using namespace llvm;
 
-#define DEBUG_TYPE "nvptx-reflect"
+#define DEBUG_TYPE "nvvm-reflect"
 
 namespace llvm {
 void initializeNVVMReflectPass(PassRegistry &);
 }
 
 namespace {
-class NVVMReflect : public FunctionPass {
+class NVVMReflect : public ModulePass {
+private:
+  StringMap<int> VarMap;
+  /// Process a reflect function by finding all its uses and replacing them with
+  /// appropriate constant values. For __CUDA_FTZ, uses the module flag value.
+  /// For __CUDA_ARCH, uses SmVersion * 10. For all other strings, uses 0.
+  bool handleReflectFunction(Function *F);
+  void setVarMap(Module &M);
+
 public:
   static char ID;
-  unsigned int SmVersion;
   NVVMReflect() : NVVMReflect(0) {}
-  explicit NVVMReflect(unsigned int Sm) : FunctionPass(ID), SmVersion(Sm) {
+  // __CUDA_FTZ is assigned in `runOnModule` by checking nvvm-reflect-ftz module
+  // metadata.
+  explicit NVVMReflect(unsigned int Sm) : ModulePass(ID) {
+    VarMap["__CUDA_ARCH"] = Sm * 10;
     initializeNVVMReflectPass(*PassRegistry::getPassRegistry());
   }
-
-  bool runOnFunction(Function &) override;
+  // This mapping will contain should include __CUDA_FTZ and __CUDA_ARCH values.
+  explicit NVVMReflect(const StringMap<int> &Mapping) : ModulePass(ID), VarMap(Mapping) {
+    initializeNVVMReflectPass(*PassRegistry::getPassRegistry());
+  }
+  bool runOnModule(Module &M) override;
 };
 } // namespace
 
-FunctionPass *llvm::createNVVMReflectPass(unsigned int SmVersion) {
+ModulePass *llvm::createNVVMReflectPass(unsigned int SmVersion) {
   return new NVVMReflect(SmVersion);
 }
 
@@ -78,27 +91,51 @@ INITIALIZE_PASS(NVVMReflect, "nvvm-reflect",
                 "Replace occurrences of __nvvm_reflect() calls with 0/1", false,
                 false)
 
-static bool runNVVMReflect(Function &F, unsigned SmVersion) {
-  if (!NVVMReflectEnabled)
-    return false;
+static cl::list<std::string>
+    ReflectList("nvvm-reflect-list", cl::value_desc("name=<int>"), cl::Hidden,
+                cl::desc("A list of string=num assignments"),
+                cl::ValueRequired);
 
-  if (F.getName() == NVVM_REFLECT_FUNCTION ||
-      F.getName() == NVVM_REFLECT_OCL_FUNCTION) {
-    assert(F.isDeclaration() && "_reflect function should not have a body");
-    assert(F.getReturnType()->isIntegerTy() &&
-           "_reflect's return type should be integer");
-    return false;
+/// The command line can look as follows :
+/// -nvvm-reflect-list a=1,b=2 -nvvm-reflect-list c=3,d=0 -R e=2
+/// The strings "a=1,b=2", "c=3,d=0", "e=2" are available in the
+/// ReflectList vector. First, each of ReflectList[i] is 'split'
+/// using "," as the delimiter. Then each of this part is split
+/// using "=" as the delimiter.
+void NVVMReflect::setVarMap(Module &M) {
+  if (auto *Flag = mdconst::extract_or_null<ConstantInt>(
+          M.getModuleFlag("nvvm-reflect-ftz")))
+    VarMap["__CUDA_FTZ"] = Flag->getSExtValue();
+
+  for (unsigned I = 0, E = ReflectList.size(); I != E; ++I) {
+    LLVM_DEBUG(dbgs() << "Option : " << ReflectList[I] << "\n");
+    SmallVector<StringRef, 4> NameValList;
+    StringRef(ReflectList[I]).split(NameValList, ",");
+    for (unsigned J = 0, EJ = NameValList.size(); J != EJ; ++J) {
+      SmallVector<StringRef, 2> NameValPair;
+      NameValList[J].split(NameValPair, "=");
+      assert(NameValPair.size() == 2 && "name=val expected");
+      StringRef ValStr = NameValPair[1].trim();
+      int Val;
+      if (ValStr.getAsInteger(10, Val))
+        report_fatal_error("integer value expected");
+      VarMap[NameValPair[0]] = Val;
+    }
   }
+}
+
+bool NVVMReflect::handleReflectFunction(Function *F) {
+  // Validate _reflect function
+  assert(F->isDeclaration() && "_reflect function should not have a body");
+  assert(F->getReturnType()->isIntegerTy() &&
+         "_reflect's return type should be integer");
 
   SmallVector<Instruction *, 4> ToRemove;
   SmallVector<Instruction *, 4> ToSimplify;
 
-  // Go through the calls in this function.  Each call to __nvvm_reflect or
-  // llvm.nvvm.reflect should be a CallInst with a ConstantArray argument.
-  // First validate that. If the c-string corresponding to the ConstantArray can
-  // be found successfully, see if it can be found in VarMap. If so, replace the
-  // uses of CallInst with the value found in VarMap. If not, replace the use
-  // with value 0.
+  // Go through the uses of the reflect function. Each use should be a CallInst
+  // with a ConstantArray argument. Replace the uses with the appropriate
+  // constant values.
 
   // The IR for __nvvm_reflect calls differs between CUDA versions.
   //
@@ -119,15 +156,10 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
   //
   // In this case, we get a Constant with a GlobalVariable operand and we need
   // to dig deeper to find its initializer with the string we'll use for lookup.
-  for (Instruction &I : instructions(F)) {
-    CallInst *Call = dyn_cast<CallInst>(&I);
-    if (!Call)
-      continue;
-    Function *Callee = Call->getCalledFunction();
-    if (!Callee || (Callee->getName() != NVVM_REFLECT_FUNCTION &&
-                    Callee->getName() != NVVM_REFLECT_OCL_FUNCTION &&
-                    Callee->getIntrinsicID() != Intrinsic::nvvm_reflect))
-      continue;
+
+  for (User *U : F->users()) {
+    assert(isa<CallInst>(U) && "Only a call instruction can use _reflect");
+    CallInst *Call = cast<CallInst>(U);
 
     // FIXME: Improve error handling here and elsewhere in this pass.
     assert(Call->getNumOperands() == 2 &&
@@ -162,20 +194,15 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
            "Format of _reflect function not recognized");
 
     StringRef ReflectArg = cast<ConstantDataSequential>(Operand)->getAsString();
+    // Remove the null terminator from the string
     ReflectArg = ReflectArg.substr(0, ReflectArg.size() - 1);
     LLVM_DEBUG(dbgs() << "Arg of _reflect : " << ReflectArg << "\n");
 
     int ReflectVal = 0; // The default value is 0
-    if (ReflectArg == "__CUDA_FTZ") {
-      // Try to pull __CUDA_FTZ from the nvvm-reflect-ftz module flag.  Our
-      // choice here must be kept in sync with AutoUpgrade, which uses the same
-      // technique to detect whether ftz is enabled.
-      if (auto *Flag = mdconst::extract_or_null<ConstantInt>(
-              F.getParent()->getModuleFlag("nvvm-reflect-ftz")))
-        ReflectVal = Flag->getSExtValue();
-    } else if (ReflectArg == "__CUDA_ARCH") {
-      ReflectVal = SmVersion * 10;
+    if (VarMap.contains(ReflectArg)) {
+      ReflectVal = VarMap[ReflectArg];
     }
+    LLVM_DEBUG(dbgs() << "ReflectVal: " << ReflectVal << "\n");
 
     // If the immediate user is a simple comparison we want to simplify it.
     for (User *U : Call->users())
@@ -191,7 +218,7 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
   // until we find a terminator that we can then remove.
   while (!ToSimplify.empty()) {
     Instruction *I = ToSimplify.pop_back_val();
-    if (Constant *C = ConstantFoldInstruction(I, F.getDataLayout())) {
+    if (Constant *C = ConstantFoldInstruction(I, F->getDataLayout())) {
       for (User *U : I->users())
         if (Instruction *I = dyn_cast<Instruction>(U))
           ToSimplify.push_back(I);
@@ -208,23 +235,45 @@ static bool runNVVMReflect(Function &F, unsigned SmVersion) {
   // Removing via isInstructionTriviallyDead may add duplicates to the ToRemove
   // array. Filter out the duplicates before starting to erase from parent.
   std::sort(ToRemove.begin(), ToRemove.end());
-  auto NewLastIter = llvm::unique(ToRemove);
+  auto *NewLastIter = llvm::unique(ToRemove);
   ToRemove.erase(NewLastIter, ToRemove.end());
 
   for (Instruction *I : ToRemove)
     I->eraseFromParent();
 
+  // Remove the __nvvm_reflect function from the module
+  F->eraseFromParent();
   return ToRemove.size() > 0;
 }
 
-bool NVVMReflect::runOnFunction(Function &F) {
-  return runNVVMReflect(F, SmVersion);
-}
+bool NVVMReflect::runOnModule(Module &M) {
+  if (!NVVMReflectEnabled)
+    return false;
+
+  setVarMap(M);
 
-NVVMReflectPass::NVVMReflectPass() : NVVMReflectPass(0) {}
+  bool Changed = false;
+  // Names of reflect function to find and replace
+  SmallVector<std::string, 3> ReflectNames = {
+      NVVM_REFLECT_FUNCTION, NVVM_REFLECT_OCL_FUNCTION,
+      Intrinsic::getName(Intrinsic::nvvm_reflect).str()};
+
+  // Process all reflect functions
+  for (const std::string &Name : ReflectNames) {
+    Function *ReflectFunction = M.getFunction(Name);
+    if (ReflectFunction) {
+      Changed |= handleReflectFunction(ReflectFunction);
+    }
+  }
+
+  return Changed;
+}
 
-PreservedAnalyses NVVMReflectPass::run(Function &F,
-                                       FunctionAnalysisManager &AM) {
-  return runNVVMReflect(F, SmVersion) ? PreservedAnalyses::none()
-                                      : PreservedAnalyses::all();
+// Implementations for the pass that works with the new pass manager.
+NVVMReflectPass::NVVMReflectPass(unsigned SmVersion) {
+  VarMap["__CUDA_ARCH"] = SmVersion * 10;
 }
+PreservedAnalyses NVVMReflectPass::run(Module &M, ModuleAnalysisManager &AM) {
+  return NVVMReflect(VarMap).runOnModule(M) ? PreservedAnalyses::none()
+                                            : PreservedAnalyses::all();
+}
\ No newline at end of file

YonahGoldberg · 2025-04-04T16:58:25Z

@AlexMaclean @Artem-B

llvm/lib/Target/NVPTX/NVVMReflect.cpp

AlexMaclean

Looks like some nice cleanup. Please add tests for the new debugging option.

llvm/lib/Target/NVPTX/NVVMReflect.cpp

AlexMaclean · 2025-04-04T17:25:14Z

llvm/lib/Target/NVPTX/NVVMReflect.cpp

+    SmallVector<StringRef, 4> NameValList;
+    StringRef(ReflectList[I]).split(NameValList, ",");


Lets use the split iterator in StringExtras here as it should be simpler and more efficient.

Do we need to implement the comma-splitting? Wouldn't cl::list handle multiple values for us already if the user specifies the option more than once? If cl::list already handles multiple values, then we should probably rename the option into --nvvm-reflect-add or something...

Also, using a more modern C++ features for iteration may make work well here.

E.g.

for (StringRef O: ReflectList) { auto [Key, Value] = O.split("="); ... }

I made the splitting more modern.

For the comma-splitting, I think you need it. If you do ... --nvvm-reflect-add a=1,b=2 --nvm-reflect-add c=3,d=4,
then the reflect list looks like {"a=1,b=2", "c=3,d=4"}. So for each option specified you still need to split on the comma.

One option -> one key=value pair. There's no need to support specifying multiple pairs via a single option.

Is it fine instead to just add the cl::CommaSeparated argument when creating the option (see current version of code)? Now, like you said before, cl::list handles if the user specifies the option more than once and it splits on the commas for us.

There's indeed more than one way to do things. The question is -- which one makes more sense?

The way I see it the CLI should be as simple to implement, and as straightforward to use as we can make it. From that standpoint along the comma separated list of key=value pairs will stand out as a sore thumb. In some cases we have to do that, but this is not one of them. The "standard" way for the options is to specify one thing per option. E.g we're not passing comma-separated list of macros to define with -D. Speaking of which, -D is actually semantically very close to what you're trying to do, as NVVMReflect is, effectively, a preprocessor for the IR, and your new option defines one of the macro values for it.

So, let's keep thing simple, and define one reflect value at a time.

Ok, I fixed it so it has the desired behavior. Thanks for all the great feedback!!

llvm/lib/Target/NVPTX/NVVMReflect.cpp

github-actions · 2025-04-04T17:48:33Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Artem-B · 2025-04-04T20:16:38Z

@AlexMaclean do you think we could reuse intrinsic autoupgrade machinery for this, instead of making reflect processing more complicated?

It could be somewhat useful for other purposes.
E.g. we could introduce a const-foldable (when we know the GPU we're targeting) nvvm.arch() which would return CUDA_ARCH value and upgrade nvvm_reflect to it. Bonus point is that it would also be useful for IR users to parametrize their code without relying on NVVMReflect.

@jhuber6 would something like that help with some of your offloading cases. I recall you did run into trouble with NVVMReflect a while back.

jhuber6 · 2025-04-04T20:37:27Z

@AlexMaclean do you think we could reuse intrinsic autoupgrade machinery for this, instead of making reflect processing more complicated?

It could be somewhat useful for other purposes. E.g. we could introduce a const-foldable (when we know the GPU we're targeting) nvvm.arch() which would return CUDA_ARCH value and upgrade nvvm_reflect to it. Bonus point is that it would also be useful for IR users to parametrize their code without relying on NVVMReflect.

@jhuber6 would something like that help with some of your offloading cases. I recall you did run into trouble with NVVMReflect a while back.

All that really matters for correctness is that this pass is always run and it always does constant prop + DCE when it's used as an edge directly.

AlexMaclean · 2025-04-04T21:18:56Z

@AlexMaclean do you think we could reuse intrinsic autoupgrade machinery for this, instead of making reflect processing more complicated?

It could be somewhat useful for other purposes. E.g. we could introduce a const-foldable (when we know the GPU we're targeting) nvvm.arch() which would return CUDA_ARCH value and upgrade nvvm_reflect to it. Bonus point is that it would also be useful for IR users to parametrize their code without relying on NVVMReflect.

@jhuber6 would something like that help with some of your offloading cases. I recall you did run into trouble with NVVMReflect a while back.

It's an interesting idea, and I agree that intrinsics seem marginally cleaner than relying on string arguments, but I don't think there will be much benefit in the end. For correctness, we'll still need something like NVVMReflect to run to fold these calls away and do the constant prop + DCE.

I think with some cleanup this new approach will not be more complicated. In some ways I think this allows for simplification because we shouldn't need to be as careful about deleting and simplifying now that we're not iterating of instructions in a function.

Artem-B · 2025-04-04T22:38:19Z

llvm/lib/Target/NVPTX/NVVMReflect.cpp

-    return false;
+static cl::list<std::string>
+    ReflectList("nvvm-reflect-list", cl::value_desc("name=<int>"), cl::Hidden,
+                cl::desc("A list of string=num assignments"),


Nit: ".. list of comma-separated key= values"

llvm/lib/Target/NVPTX/NVVMReflect.cpp

Artem-B · 2025-04-04T22:59:03Z

llvm/lib/Target/NVPTX/NVVMReflect.cpp

+    SmallVector<StringRef, 4> NameValList;
+    StringRef(ReflectList[I]).split(NameValList, ",");


Do we need to implement the comma-splitting? Wouldn't cl::list handle multiple values for us already if the user specifies the option more than once? If cl::list already handles multiple values, then we should probably rename the option into --nvvm-reflect-add or something...

Also, using a more modern C++ features for iteration may make work well here.

E.g.

for (StringRef O: ReflectList) { auto [Key, Value] = O.split("="); ... }

jhuber6 · 2025-04-04T23:01:11Z

It's also worth noting that #134016 is AMD's attempt at doing the same functionality.

llvm/lib/Target/NVPTX/NVVMReflect.cpp

AlexMaclean

Looking lots better, couple minor stylistic issues to fix

llvm/lib/Target/NVPTX/NVPTX.h

llvm/lib/Target/NVPTX/NVVMReflect.cpp

AlexMaclean

Looks essentially good to go. As a stylistic matter I think the const qualifier has been somewhat over-applied. In general I think it makes sense to add const only when it improves readability, and not where ever possible.

llvm/lib/Target/NVPTX/NVVMReflect.cpp

YonahGoldberg · 2025-04-09T22:10:31Z

@Artem-B Currently I'm failing a Clang test because of the fact that I switched NVVMReflect to be a module pass now instead of a function pass. The test is `clang/test/CodeGen/builtins-nvptx.c:

__device__ bool reflect() {

// CHECK: call i32 @llvm.nvvm.reflect(ptr {{.*}})

  unsigned x = __nvvm_reflect("__CUDA_ARCH");
  return x >= 700;
}

I'm failing now because the NVVMReflect pass is getting run and removing the call. The difference that's causing it to get run now is in NVPTXTargetMachine.cpp:

PB.registerPipelineStartEPCallback(
      [this](ModulePassManager &PM, OptimizationLevel Level) {
        // We do not want to fold out calls to nvvm.reflect early if the user
        // has not provided a target architecture just yet.
        if (Subtarget.hasTargetName())
          PM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));

        FunctionPassManager FPM;
        // Note: NVVMIntrRangePass was causing numerical discrepancies at one
        // point, if issues crop up, consider disabling.
        FPM.addPass(NVVMIntrRangePass());
        if (EarlyByValArgsCopy)
          FPM.addPass(NVPTXCopyByValArgsPass());
        PM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
      });

I changed this in my PR so that the reflect pass is added to the module pass manager, not the function pass manager. For some reason, these function passes added in the callback are NOT run in the lit-test, but the module pass is.

For reference, the clang cmd run is: clang -cc1 -internal-isystem /mnt/data/ygoldberg/mainline/llvm-project/build/lib/clang/21/include -nostdsysteminc -ffp-contract=off -triple nvptx64-unknown-unknown -target-cpu sm_70 -target-feature +ptx63 -fcuda-is-device -emit-llvm -o - -x cuda /mnt/data/ygoldberg/mainline/llvm-project/clang/test/CodeGen/builtins-nvptx.c

Do you know if this is an issue or why this is happening?

Artem-B · 2025-04-09T22:50:51Z

I don't know why there's a difference in whether reflect gets to run or not while emitting unoptimized IR.
Can you try adding -disable-llvm-optzns and see if that helps?

In the end it does not really matter much. We can adjust the test to verify that we see the intended reflect value in the generated IR.

YonahGoldberg · 2025-04-09T23:26:06Z

This option worked. If I add it to all the clang compile commands for the builtins-nvptx.c, all tests pass. Should we just go ahead and commit that instead of changing the reflect test?

AlexMaclean

LGTM, please wait for @Artem-B's approval before landing.

Artem-B

LGTM for the code. Tests could use a bit more polishing.

llvm/lib/Target/NVPTX/NVVMReflect.cpp

llvm/test/CodeGen/NVPTX/nvvm-reflect-options.ll

Artem-B

Almost there. Few more test nits.

llvm/test/CodeGen/NVPTX/nvvm-reflect-options.ll

Artem-B

LGTM.

AlexMaclean · 2025-04-11T01:33:31Z

Merging on behalf of @YonahGoldberg at his request offline.

github-actions · 2025-04-11T01:33:55Z

@YonahGoldberg Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

…tion (#135825) This was already declared in NVPTX.h and I accidentally added it back in #134416.

The NVVMReflect pass simply replaces calls to nvvm-reflect functions with the appropriate constant, either the architecture number, or nvvm-reflect-ftz, found in the module's metadata. The implementation is inefficient and does this by traversing through all instructions to find calls. The common case is that you never call nvvm-reflect, so this traversal is costly. This PR: - Updates the pass so that it finds the reflect functions by name, and then traverses through their uses to find the calls directly. - Adds a line (245) to make sure the dead nvvm-reflect definitions are erased. - Adds the ability to set reflect values via command line

…tion (llvm#135825) This was already declared in NVPTX.h and I accidentally added it back in llvm#134416.

llvmbot added the backend:NVPTX label Apr 4, 2025

justinfargnoli requested review from AlexMaclean and Artem-B April 4, 2025 17:03

justinfargnoli reviewed Apr 4, 2025

View reviewed changes

llvm/lib/Target/NVPTX/NVVMReflect.cpp Outdated Show resolved Hide resolved

justinfargnoli assigned YonahGoldberg Apr 4, 2025

AlexMaclean reviewed Apr 4, 2025

View reviewed changes

Artem-B reviewed Apr 4, 2025

View reviewed changes

jhuber6 reviewed Apr 7, 2025

View reviewed changes

llvm/lib/Target/NVPTX/NVVMReflect.cpp Outdated Show resolved Hide resolved

YonahGoldberg force-pushed the nvvm-reflect branch 4 times, most recently from 903731a to cbf8664 Compare April 8, 2025 22:26

AlexMaclean reviewed Apr 8, 2025

View reviewed changes

AlexMaclean reviewed Apr 9, 2025

View reviewed changes

llvm/lib/Target/NVPTX/NVVMReflect.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/NVPTX/NVVMReflect.cpp Outdated Show resolved Hide resolved

AlexMaclean approved these changes Apr 10, 2025

View reviewed changes

Artem-B approved these changes Apr 10, 2025

View reviewed changes

llvm/lib/Target/NVPTX/NVVMReflect.cpp Outdated Show resolved Hide resolved

llvm/test/CodeGen/NVPTX/nvvm-reflect-options.ll Outdated Show resolved Hide resolved

llvm/test/CodeGen/NVPTX/nvvm-reflect-options.ll Outdated Show resolved Hide resolved

Artem-B reviewed Apr 10, 2025

View reviewed changes

llvm/test/CodeGen/NVPTX/nvvm-reflect-options.ll Outdated Show resolved Hide resolved

llvm/test/CodeGen/NVPTX/nvvm-reflect-options.ll Outdated Show resolved Hide resolved

llvm/test/CodeGen/NVPTX/nvvm-reflect-options.ll Outdated Show resolved Hide resolved

YonahGoldberg requested review from lanza and bcardosolopes as code owners April 10, 2025 22:17

YonahGoldberg added 17 commits April 10, 2025 22:18

forgot to set Changed variable

7c3bd38

typo in comment

cfab804

final cleanup

0eac5e9

formatting

9b0e427

formatting

4177763

int -> unsigned

04d509d

const correct probably

9c54010

format

8429c81

forgot a getOperand call

d2b828d

format

cbd4793

format

7ddc7e4

newline

1ac6c2b

removing const

716c8f0

format

ece70ba

updated reflect options test

cc159a9

newline

2056414

final style changes

47f280e

YonahGoldberg force-pushed the nvvm-reflect branch from 968a917 to 47f280e Compare April 10, 2025 22:19

Artem-B approved these changes Apr 10, 2025

View reviewed changes

clang format

ac0ffbf

YonahGoldberg force-pushed the nvvm-reflect branch from e61755d to ac0ffbf Compare April 10, 2025 22:46

Merge branch 'main' into nvvm-reflect

b9c5704

AlexMaclean merged commit 701d726 into llvm:main Apr 11, 2025
11 checks passed

Wolfram70 mentioned this pull request Apr 11, 2025

[clang][NVPTX] Add builtins and intrinsics for conversions of new FP types #134345

Merged

YonahGoldberg mentioned this pull request Apr 15, 2025

[NVPTX] Remove extraneous initializeNVVMReflectLegacyPassPass declaration #135825

Merged

AlexMaclean pushed a commit that referenced this pull request Apr 16, 2025

[NVPTX] Remove extraneous initializeNVVMReflectLegacyPassPass declara…

ef1abbe

…tion (#135825) This was already declared in NVPTX.h and I accidentally added it back in #134416.

var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025

[NVPTX] Remove extraneous initializeNVVMReflectLegacyPassPass declara…

e6845cf

…tion (llvm#135825) This was already declared in NVPTX.h and I accidentally added it back in llvm#134416.

		SmallVector<StringRef, 4> NameValList;
		StringRef(ReflectList[I]).split(NameValList, ",");

[NVPTX] Improve NVVMReflect Efficiency #134416

[NVPTX] Improve NVVMReflect Efficiency #134416

Uh oh!

Conversation

YonahGoldberg commented Apr 4, 2025

Uh oh!

github-actions bot commented Apr 4, 2025

Uh oh!

llvmbot commented Apr 4, 2025

Uh oh!

YonahGoldberg commented Apr 4, 2025

Uh oh!

Uh oh!

AlexMaclean left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Artem-B Apr 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Apr 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Artem-B commented Apr 4, 2025

Uh oh!

jhuber6 commented Apr 4, 2025

Uh oh!

AlexMaclean commented Apr 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jhuber6 commented Apr 4, 2025

Uh oh!

Uh oh!

AlexMaclean left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexMaclean left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

YonahGoldberg commented Apr 9, 2025

Uh oh!

Artem-B commented Apr 9, 2025

Uh oh!

YonahGoldberg commented Apr 9, 2025

Artem-B Apr 7, 2025 •

edited

Loading

github-actions bot commented Apr 4, 2025 •

edited

Loading