Skip to content

[flang][cuda] Add kernel registration in CUF constructor #112416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 15, 2024

Conversation

clementval
Copy link
Contributor

Update the CUF constructor with the cuf.register_kernel operations.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Oct 15, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 15, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

Update the CUF constructor with the cuf.register_kernel operations.


Full diff: https://github.com/llvm/llvm-project/pull/112416.diff

3 Files Affected:

  • (modified) flang/include/flang/Optimizer/Transforms/Passes.td (+1-1)
  • (modified) flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp (+18-2)
  • (modified) flang/test/Fir/CUDA/cuda-register-func.fir (+2-6)
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index bf75123e853779..af6bd41cbb71da 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -439,7 +439,7 @@ def CufImplicitDeviceGlobal :
 def CUFAddConstructor : Pass<"cuf-add-constructor", "mlir::ModuleOp"> {
   let summary = "Add constructor to register CUDA Fortran allocators";
   let dependentDialects = [
-    "mlir::func::FuncDialect"
+    "cuf::CUFDialect", "mlir::func::FuncDialect"
   ];
 }
 
diff --git a/flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp b/flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp
index 48620fbc585861..3db24226e75042 100644
--- a/flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp
+++ b/flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp
@@ -12,6 +12,7 @@
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/Dialect/FIROpsSupport.h"
 #include "flang/Runtime/entry-names.h"
+#include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMDialect.h"
 #include "mlir/Pass/Pass.h"
 #include "llvm/ADT/SmallVector.h"
@@ -23,6 +24,8 @@ namespace fir {
 
 namespace {
 
+static constexpr llvm::StringRef cudaModName{"cuda_device_mod"};
+
 static constexpr llvm::StringRef cudaFortranCtorName{
     "__cudaFortranConstructor"};
 
@@ -31,6 +34,7 @@ struct CUFAddConstructor
 
   void runOnOperation() override {
     mlir::ModuleOp mod = getOperation();
+    mlir::SymbolTable symTab(mod);
     mlir::OpBuilder builder{mod.getBodyRegion()};
     builder.setInsertionPointToEnd(mod.getBody());
     mlir::Location loc = mod.getLoc();
@@ -48,13 +52,25 @@ struct CUFAddConstructor
         mod.getContext(), RTNAME_STRING(CUFRegisterAllocator));
     builder.setInsertionPointToEnd(mod.getBody());
 
-    // Create the constructor function that cal CUFRegisterAllocator.
-    builder.setInsertionPointToEnd(mod.getBody());
+    // Create the constructor function that call CUFRegisterAllocator.
     auto func = builder.create<mlir::LLVM::LLVMFuncOp>(loc, cudaFortranCtorName,
                                                        funcTy);
     func.setLinkage(mlir::LLVM::Linkage::Internal);
     builder.setInsertionPointToStart(func.addEntryBlock(builder));
     builder.create<mlir::LLVM::CallOp>(loc, funcTy, cufRegisterAllocatorRef);
+
+    // Register kernels
+    auto gpuMod = symTab.lookup<mlir::gpu::GPUModuleOp>(cudaModName);
+    if (gpuMod) {
+      for (auto func : gpuMod.getOps<mlir::gpu::GPUFuncOp>()) {
+        if (func.isKernel()) {
+          auto kernelName = mlir::SymbolRefAttr::get(
+              builder.getStringAttr(cudaModName),
+              {mlir::SymbolRefAttr::get(builder.getContext(), func.getName())});
+          builder.create<cuf::RegisterKernelOp>(loc, kernelName);
+        }
+      }
+    }
     builder.create<mlir::LLVM::ReturnOp>(loc, mlir::ValueRange{});
 
     // Create the llvm.global_ctor with the function.
diff --git a/flang/test/Fir/CUDA/cuda-register-func.fir b/flang/test/Fir/CUDA/cuda-register-func.fir
index a428f68eb3bf42..277475f0883dcc 100644
--- a/flang/test/Fir/CUDA/cuda-register-func.fir
+++ b/flang/test/Fir/CUDA/cuda-register-func.fir
@@ -1,4 +1,4 @@
-// RUN: fir-opt %s | FileCheck %s
+// RUN: fir-opt --cuf-add-constructor %s | FileCheck %s
 
 module attributes {gpu.container_module} {
   gpu.module @cuda_device_mod {
@@ -9,12 +9,8 @@ module attributes {gpu.container_module} {
       gpu.return
     }
   }
-  llvm.func internal @__cudaFortranConstructor() {
-    cuf.register_kernel @cuda_device_mod::@_QPsub_device1
-    cuf.register_kernel @cuda_device_mod::@_QPsub_device2
-    llvm.return
-  }
 }
 
+// CHECK-LABEL: llvm.func internal @__cudaFortranConstructor()
 // CHECK: cuf.register_kernel @cuda_device_mod::@_QPsub_device1
 // CHECK: cuf.register_kernel @cuda_device_mod::@_QPsub_device2

Copy link
Contributor

@Renaud-K Renaud-K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@clementval clementval merged commit 8588014 into llvm:main Oct 15, 2024
5 of 7 checks passed
@clementval clementval deleted the cuf_register_constructor branch October 15, 2024 21:18
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
Update the CUF constructor with the cuf.register_kernel operations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants