
[MLIR][NVVM] Add dumpISA and dumpMachineISA Flags #116199


Closed

Conversation

grypp
Member

@grypp grypp commented Nov 14, 2024

Currently, dumping the virtual and machine-level ISA is only possible in debug builds, which makes it unavailable in release builds. The ability to inspect these ISAs is valuable even with release builds and production compilers; `nvcc`, for instance, provides similar functionality and is a useful reference.

This PR introduces `dumpISA` and `dumpMachineISA` flags to the `GpuModuleToBinaryPass`.

It also adds `dump-ptx` and `dump-sass` flags to the `GPUToNVVMPipelineOptions`.
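
As a usage sketch, assuming a hypothetical input file `kernel.mlir` (the flag spellings follow the pipeline options added here and mirror the updated integration tests):

```sh
# Lower through the NVVM pipeline and dump the generated PTX and the
# disassembled SASS to the error output; both flags default to false.
mlir-opt kernel.mlir \
  -gpu-lower-to-nvvm-pipeline="dump-ptx=true dump-sass=true" 2>&1
```
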
@llvmbot
Member

llvmbot commented Nov 14, 2024

@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Guray Ozen (grypp)

Changes

Currently, dumping the virtual and machine-level ISA is only possible in debug builds, which makes it unavailable in release builds. The ability to inspect these ISAs is valuable even with release builds and production compilers; `nvcc`, for instance, provides similar functionality and is a useful reference.

This PR introduces `dumpISA` and `dumpMachineISA` flags to the `GpuModuleToBinaryPass`.

It also adds `dump-ptx` and `dump-sass` flags to the `GPUToNVVMPipelineOptions`.


Full diff: https://github.com/llvm/llvm-project/pull/116199.diff

11 Files Affected:

  • (modified) mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h (+16-2)
  • (modified) mlir/include/mlir/Dialect/GPU/Pipelines/Passes.h (+6)
  • (modified) mlir/include/mlir/Dialect/GPU/Transforms/Passes.td (+9-1)
  • (modified) mlir/include/mlir/Target/LLVM/NVVM/Utils.h (+1-1)
  • (modified) mlir/lib/Dialect/GPU/IR/GPUDialect.cpp (+12-4)
  • (modified) mlir/lib/Dialect/GPU/Pipelines/GPUToNVVMPipeline.cpp (+2)
  • (modified) mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp (+1-1)
  • (modified) mlir/lib/Target/LLVM/NVVM/Target.cpp (+14-14)
  • (modified) mlir/lib/Target/LLVM/ROCDL/Target.cpp (+3-4)
  • (modified) mlir/test/Integration/GPU/CUDA/dump-ptx.mlir (+1-1)
  • (modified) mlir/test/Integration/GPU/CUDA/dump-sass.mlir (+1-1)
diff --git a/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h b/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h
index 6d7cb5ca7a7f81..88111aed2a0a6b 100644
--- a/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h
+++ b/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h
@@ -52,7 +52,8 @@ class TargetOptions {
       StringRef toolkitPath = {}, ArrayRef<std::string> linkFiles = {},
       StringRef cmdOptions = {},
       CompilationTarget compilationTarget = getDefaultCompilationTarget(),
-      function_ref<SymbolTable *()> getSymbolTableCallback = {});
+      function_ref<SymbolTable *()> getSymbolTableCallback = {},
+      bool dumpISA = false, bool dumpMachineISA = false);
 
   /// Returns the typeID.
   TypeID getTypeID() const;
@@ -66,6 +67,12 @@ class TargetOptions {
   /// Returns the command line options.
   StringRef getCmdOptions() const;
 
+  /// Returns whether the generated ISA should be dumped.
+  bool getDumpISA() const;
+
+  /// Returns whether the machine-level ISA should be dumped.
+  bool getDumpMachineISA() const;
+
   /// Returns a tokenization of the command line options.
   std::pair<llvm::BumpPtrAllocator, SmallVector<const char *>>
   tokenizeCmdOptions() const;
@@ -90,7 +97,8 @@ class TargetOptions {
       TypeID typeID, StringRef toolkitPath = {},
       ArrayRef<std::string> linkFiles = {}, StringRef cmdOptions = {},
       CompilationTarget compilationTarget = getDefaultCompilationTarget(),
-      function_ref<SymbolTable *()> getSymbolTableCallback = {});
+      function_ref<SymbolTable *()> getSymbolTableCallback = {},
+      bool dumpISA = false, bool dumpMachineISA = false);
 
   /// Path to the target toolkit.
   std::string toolkitPath;
@@ -102,6 +110,12 @@ class TargetOptions {
   /// process.
   std::string cmdOptions;
 
+  /// An optional flag to dump generated ISA.
+  bool dumpISA = false;
+
+  /// An optional flag to dump generated and disassembled machine ISA.
+  bool dumpMachineISA = false;
+
   /// Compilation process target format.
   CompilationTarget compilationTarget;
 
diff --git a/mlir/include/mlir/Dialect/GPU/Pipelines/Passes.h b/mlir/include/mlir/Dialect/GPU/Pipelines/Passes.h
index caa0901bb49434..485eac6bad5b89 100644
--- a/mlir/include/mlir/Dialect/GPU/Pipelines/Passes.h
+++ b/mlir/include/mlir/Dialect/GPU/Pipelines/Passes.h
@@ -53,6 +53,12 @@ struct GPUToNVVMPipelineOptions
           "Whether to use the bareptr calling convention on the host (warning "
           "this should be false until the GPU layering is fixed)"),
       llvm::cl::init(false)};
+  PassOptions::Option<bool> dumpPtx{
+      *this, "dump-ptx", llvm::cl::desc("Dumps PTX code to the error output"),
+      llvm::cl::init(false)};
+  PassOptions::Option<bool> dumpSass{
+      *this, "dump-sass", llvm::cl::desc("Dumps SASS code to the error output"),
+      llvm::cl::init(false)};
 };
 
 //===----------------------------------------------------------------------===//
diff --git a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
index 4a9ddafdd177d2..f7cd9dd8c15bfb 100644
--- a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
@@ -95,7 +95,15 @@ def GpuModuleToBinaryPass
     Option<"cmdOptions", "opts", "std::string", [{""}],
            "Command line options to pass to the tools.">,
     Option<"compilationTarget", "format", "std::string", [{"fatbin"}],
-           "The target representation of the compilation process.">
+           "The target representation of the compilation process.">,           
+    Option<"dumpISA", "dump-isa", "bool",
+           /*default=*/"false",
+           "Dumps generated ISA to the error output.">,
+    Option<"dumpMachineISA", "dump-machine-isa", "bool",
+           /*default=*/"false",
+           "Dumps the generated machine-level ISA to the error output. "
+           "If the generated ISA is virtual, it instead dumps the"
+           "machine-level equivalent.">
   ];
 }
 
diff --git a/mlir/include/mlir/Target/LLVM/NVVM/Utils.h b/mlir/include/mlir/Target/LLVM/NVVM/Utils.h
index 65ae8a6bdb4ada..6926acaa3e337a 100644
--- a/mlir/include/mlir/Target/LLVM/NVVM/Utils.h
+++ b/mlir/include/mlir/Target/LLVM/NVVM/Utils.h
@@ -54,7 +54,7 @@ class SerializeGPUModuleBase : public LLVM::ModuleToObject {
   LogicalResult appendStandardLibs();
 
   /// Loads the bitcode files in `fileList`.
-  virtual std::optional<SmallVector<std::unique_ptr<llvm::Module>>>
+  std::optional<SmallVector<std::unique_ptr<llvm::Module>>>
   loadBitcodeFiles(llvm::Module &module) override;
 
 protected:
diff --git a/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp b/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
index 956877497d9338..3407cb9ec3cbad 100644
--- a/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+++ b/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
@@ -2302,18 +2302,26 @@ KernelMetadataAttr KernelTableAttr::lookup(StringAttr key) const {
 TargetOptions::TargetOptions(
     StringRef toolkitPath, ArrayRef<std::string> linkFiles,
     StringRef cmdOptions, CompilationTarget compilationTarget,
-    function_ref<SymbolTable *()> getSymbolTableCallback)
+    function_ref<SymbolTable *()> getSymbolTableCallback, bool dumpISA,
+    bool dumpMachineISA)
     : TargetOptions(TypeID::get<TargetOptions>(), toolkitPath, linkFiles,
-                    cmdOptions, compilationTarget, getSymbolTableCallback) {}
+                    cmdOptions, compilationTarget, getSymbolTableCallback,
+                    dumpISA, dumpMachineISA) {}
 
 TargetOptions::TargetOptions(
     TypeID typeID, StringRef toolkitPath, ArrayRef<std::string> linkFiles,
     StringRef cmdOptions, CompilationTarget compilationTarget,
-    function_ref<SymbolTable *()> getSymbolTableCallback)
+    function_ref<SymbolTable *()> getSymbolTableCallback, bool dumpISA,
+    bool dumpMachineISA)
     : toolkitPath(toolkitPath.str()), linkFiles(linkFiles),
-      cmdOptions(cmdOptions.str()), compilationTarget(compilationTarget),
+      cmdOptions(cmdOptions.str()), dumpISA(dumpISA),
+      dumpMachineISA(dumpMachineISA), compilationTarget(compilationTarget),
       getSymbolTableCallback(getSymbolTableCallback), typeID(typeID) {}
 
+bool TargetOptions::getDumpISA() const { return dumpISA; }
+
+bool TargetOptions::getDumpMachineISA() const { return dumpMachineISA; }
+
 TypeID TargetOptions::getTypeID() const { return typeID; }
 
 StringRef TargetOptions::getToolkitPath() const { return toolkitPath; }
diff --git a/mlir/lib/Dialect/GPU/Pipelines/GPUToNVVMPipeline.cpp b/mlir/lib/Dialect/GPU/Pipelines/GPUToNVVMPipeline.cpp
index fb440756e0c1d5..8d32be95f57696 100644
--- a/mlir/lib/Dialect/GPU/Pipelines/GPUToNVVMPipeline.cpp
+++ b/mlir/lib/Dialect/GPU/Pipelines/GPUToNVVMPipeline.cpp
@@ -95,6 +95,8 @@ void buildHostPostPipeline(OpPassManager &pm,
 
   GpuModuleToBinaryPassOptions gpuModuleToBinaryPassOptions;
   gpuModuleToBinaryPassOptions.compilationTarget = options.cubinFormat;
+  gpuModuleToBinaryPassOptions.dumpISA = options.dumpPtx;
+  gpuModuleToBinaryPassOptions.dumpMachineISA = options.dumpSass;
   pm.addPass(createGpuModuleToBinaryPass(gpuModuleToBinaryPassOptions));
   pm.addPass(createConvertMathToLLVMPass());
   pm.addPass(createCanonicalizerPass());
diff --git a/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp b/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
index 86a3b4780e88ce..a9538407888cb3 100644
--- a/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
+++ b/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
@@ -70,7 +70,7 @@ void GpuModuleToBinaryPass::runOnOperation() {
   };
 
   TargetOptions targetOptions(toolkitPath, linkFiles, cmdOptions, *targetFormat,
-                              lazyTableBuilder);
+                              lazyTableBuilder, dumpISA, dumpMachineISA);
   if (failed(transformGpuModulesToBinaries(
           getOperation(), OffloadingLLVMTranslationAttrInterface(nullptr),
           targetOptions)))
diff --git a/mlir/lib/Target/LLVM/NVVM/Target.cpp b/mlir/lib/Target/LLVM/NVVM/Target.cpp
index 69602af8563aa0..b5aff6be272ffd 100644
--- a/mlir/lib/Target/LLVM/NVVM/Target.cpp
+++ b/mlir/lib/Target/LLVM/NVVM/Target.cpp
@@ -292,8 +292,8 @@ NVPTXSerializer::compileToBinary(const std::string &ptxCode) {
     return std::nullopt;
   TmpFile cubinFile;
   if (createFatbin) {
-    Twine cubinFilename = ptxFile->first + ".cubin";
-    cubinFile = TmpFile(cubinFilename.str(), llvm::FileRemover(cubinFilename));
+    std::string cubinFilename = (ptxFile->first + ".cubin").str();
+    cubinFile = TmpFile(cubinFilename, llvm::FileRemover(cubinFilename));
   } else {
     cubinFile.first = binaryFile->first;
   }
@@ -402,8 +402,8 @@ NVPTXSerializer::compileToBinary(const std::string &ptxCode) {
                                 /*MemoryLimit=*/0,
                                 /*ErrMsg=*/&message))
     return emitLogError("`ptxas`");
-#define DEBUG_TYPE "dump-sass"
-  LLVM_DEBUG({
+
+  if (targetOptions.getDumpMachineISA()) {
     std::optional<std::string> nvdisasm = findTool("nvdisasm");
     SmallVector<StringRef> nvdisasmArgs(
         {StringRef("nvdisasm"), StringRef(cubinFile.first)});
@@ -417,11 +417,10 @@ NVPTXSerializer::compileToBinary(const std::string &ptxCode) {
     llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> logBuffer =
         llvm::MemoryBuffer::getFile(logFile->first);
     if (logBuffer && !(*logBuffer)->getBuffer().empty()) {
-      llvm::dbgs() << "Output:\n" << (*logBuffer)->getBuffer() << "\n";
-      llvm::dbgs().flush();
+      llvm::errs() << "Output:\n" << (*logBuffer)->getBuffer() << "\n";
+      llvm::errs().flush();
     }
-  });
-#undef DEBUG_TYPE
+  }
 
   // Invoke `fatbin`.
   message.clear();
@@ -572,12 +571,13 @@ NVPTXSerializer::moduleToObject(llvm::Module &llvmModule) {
     getOperation().emitError() << "Failed translating the module to ISA.";
     return std::nullopt;
   }
-#define DEBUG_TYPE "serialize-to-isa"
-  LLVM_DEBUG({
-    llvm::dbgs() << "PTX for module: " << getOperation().getNameAttr() << "\n";
-    llvm::dbgs() << *serializedISA << "\n";
-    llvm::dbgs().flush();
-  });
+  if (targetOptions.getDumpISA()) {
+    llvm::errs() << "// Generated PTX for module: "
+                 << getOperation().getNameAttr() << "\n";
+    llvm::errs() << *serializedISA << "\n";
+    llvm::errs().flush();
+  }
+
 #undef DEBUG_TYPE
 
   // Return PTX if the compilation target is `assembly`.
diff --git a/mlir/lib/Target/LLVM/ROCDL/Target.cpp b/mlir/lib/Target/LLVM/ROCDL/Target.cpp
index 227b45133b57e3..6761479e88da61 100644
--- a/mlir/lib/Target/LLVM/ROCDL/Target.cpp
+++ b/mlir/lib/Target/LLVM/ROCDL/Target.cpp
@@ -430,13 +430,12 @@ std::optional<SmallVector<char, 0>> SerializeGPUModuleBase::moduleToObjectImpl(
     getOperation().emitError() << "failed translating the module to ISA";
     return std::nullopt;
   }
-#define DEBUG_TYPE "serialize-to-isa"
-  LLVM_DEBUG({
+  if (targetOptions.getDumpISA()) {
     llvm::dbgs() << "ISA for module: "
                  << cast<gpu::GPUModuleOp>(getOperation()).getNameAttr() << "\n"
                  << *serializedISA << "\n";
-  });
-#undef DEBUG_TYPE
+  }
+
   // Return ISA assembly code if the compilation target is assembly.
   if (targetOptions.getCompilationTarget() == gpu::CompilationTarget::Assembly)
     return SmallVector<char, 0>(serializedISA->begin(), serializedISA->end());
diff --git a/mlir/test/Integration/GPU/CUDA/dump-ptx.mlir b/mlir/test/Integration/GPU/CUDA/dump-ptx.mlir
index 0cc5d8645bb364..c511596d49c27a 100644
--- a/mlir/test/Integration/GPU/CUDA/dump-ptx.mlir
+++ b/mlir/test/Integration/GPU/CUDA/dump-ptx.mlir
@@ -1,5 +1,5 @@
 // RUN: mlir-opt %s \
-// RUN:  | mlir-opt -gpu-lower-to-nvvm-pipeline -debug-only=serialize-to-isa \
+// RUN:  | mlir-opt -gpu-lower-to-nvvm-pipeline="dump-ptx=true" \
 // RUN:  2>&1 | FileCheck %s
 
 // CHECK: Generated by LLVM NVPTX Back-End
diff --git a/mlir/test/Integration/GPU/CUDA/dump-sass.mlir b/mlir/test/Integration/GPU/CUDA/dump-sass.mlir
index d32f5efc29d58e..2b204644c90c14 100644
--- a/mlir/test/Integration/GPU/CUDA/dump-sass.mlir
+++ b/mlir/test/Integration/GPU/CUDA/dump-sass.mlir
@@ -1,5 +1,5 @@
 // RUN: mlir-opt %s \
-// RUN:  | mlir-opt -gpu-lower-to-nvvm-pipeline -debug-only=dump-sass \
+// RUN:  | mlir-opt -gpu-lower-to-nvvm-pipeline="dump-sass=true" \
 // RUN:  2>&1 | FileCheck %s
 
 // CHECK: MOV
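
For completeness, a minimal sketch of requesting the same dumps directly on the serialization pass, using the `dump-isa` / `dump-machine-isa` options defined in `Passes.td` above; the input file name is hypothetical and the `gpu-module-to-binary` pass name is assumed from the existing registration:

```sh
# Serialize already-lowered GPU modules to a fatbin and dump the virtual
# ISA (PTX) as well as the disassembled machine-level ISA (SASS).
mlir-opt lowered.mlir \
  --gpu-module-to-binary="format=fatbin dump-isa=true dump-machine-isa=true" 2>&1
```
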

Collaborator

@joker-eph joker-eph left a comment


Currently, dumping virtual and machine-level ISA is restricted to debug builds, making it unavailable in release builds. However, the ability to view these ISAs can be valuable even in release builds or with production compilers.

I don't buy this argument: debug output remains debug output. You're just arbitrarily interested in some debug output instead of others, and that does not seem well plumbed to me here.

See #116007 for a more principled approach that offers the actual features in a more structured way.

@grypp
Member Author

grypp commented Nov 14, 2024

I don't think the generated code is debug output.

See #116007 for a more principled approach that offers the actual features in a more structured way.

I am totally fine with that approach. I can close my PR :) I wasn't aware of that PR.

@grypp grypp closed this Nov 14, 2024