
[flang][cuda] Do not register global constants #118582


Merged: 1 commit merged into llvm:main on Dec 4, 2024

Conversation

Renaud-K (Contributor) commented on Dec 4, 2024

Global constants have no symbols in library files. They are replaced with literal constants during lowering, before kernels are moved into a GPU module. Do not register them, since registering them would produce unresolved symbols.
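A sketch of the pattern in question (the global below mirrors the test added in this PR):

// A compile-time constant global: by the time kernels are outlined into the
// GPU module, its device-side uses have been folded to literals, so there is
// no device symbol for _FortranACUFRegisterVariable to resolve.
fir.global @_QMiso_c_bindingECc_int {data_attr = #cuf.cuda<constant>} constant : i32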

Renaud-K requested a review from clementval on December 4, 2024.
llvmbot added the flang (Flang issues not falling into any other category) and flang:fir-hlfir labels on December 4, 2024.
llvmbot (Member) commented on Dec 4, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: Renaud Kauffmann (Renaud-K)


Full diff: https://github.com/llvm/llvm-project/pull/118582.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp (+1-4)
  • (modified) flang/test/Fir/CUDA/cuda-constructor-2.f90 (+30)
diff --git a/flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp b/flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp
index dd204126be5dbc..73a46843f0320b 100644
--- a/flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp
+++ b/flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp
@@ -106,8 +106,7 @@ struct CUFAddConstructor
 
         mlir::func::FuncOp func;
         switch (attr.getValue()) {
-        case cuf::DataAttribute::Device:
-        case cuf::DataAttribute::Constant: {
+        case cuf::DataAttribute::Device: {
           func = fir::runtime::getRuntimeFunc<mkRTKey(CUFRegisterVariable)>(
               loc, builder);
           auto fTy = func.getFunctionType();
@@ -145,8 +144,6 @@ struct CUFAddConstructor
         default:
           break;
         }
-        if (!func)
-          continue;
       }
     }
     builder.create<mlir::LLVM::ReturnOp>(loc, mlir::ValueRange{});
diff --git a/flang/test/Fir/CUDA/cuda-constructor-2.f90 b/flang/test/Fir/CUDA/cuda-constructor-2.f90
index 901497e2cde550..29efdb083878af 100644
--- a/flang/test/Fir/CUDA/cuda-constructor-2.f90
+++ b/flang/test/Fir/CUDA/cuda-constructor-2.f90
@@ -30,3 +30,33 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<!llvm.ptr, dense<
 // CHECK-DAG: %[[BOXREF:.*]] = fir.convert %[[BOX]] : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> !fir.ref<i8>
 // CHECK-DAG: fir.call @_FortranACUFRegisterVariable(%[[MODULE:.*]], %[[BOXREF]], %{{.*}}, %{{.*}}) 
 //
+
+// ----- 
+
+// Checking that constant global variables are not registered
+
+// CHECK: @_FortranACUFRegisterAllocator
+// CHECK-NOT: fir.call @_FortranACUFRegisterVariable
+
+module attributes {dlti.dl_spec = #dlti.dl_spec<i8 = dense<8> : vector<2xi64>, i16 = dense<16> : vector<2xi64>, i1 = dense<8> : vector<2xi64>, !llvm.ptr = dense<64> : vector<4xi64>, f80 = dense<128> : vector<2xi64>, i128 = dense<128> : vector<2xi64>, i64 = dense<64> : vector<2xi64>, !llvm.ptr<271> = dense<32> : vector<4xi64>, !llvm.ptr<272> = dense<64> : vector<4xi64>, f128 = dense<128> : vector<2xi64>, !llvm.ptr<270> = dense<32> : vector<4xi64>, f16 = dense<16> : vector<2xi64>, f64 = dense<64> : vector<2xi64>, i32 = dense<32> : vector<2xi64>, "dlti.stack_alignment" = 128 : i64, "dlti.endianness" = "little">, fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", gpu.container_module, llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", llvm.ident = "flang version 20.0.0 (https://github.com/llvm/llvm-project.git 3372303188df0f7f8ac26e7ab610cf8b0f716d42)", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
+  fir.global @_QMiso_c_bindingECc_int {data_attr = #cuf.cuda<constant>} constant : i32
+  
+
+  fir.type_info @_QM__fortran_builtinsT__builtin_c_ptr noinit nodestroy nofinal : !fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>
+  gpu.module @cuda_device_mod {
+    fir.global @_QMiso_c_bindingECc_int {data_attr = #cuf.cuda<constant>} constant : i32
+    gpu.func @_QMdevmodPdevsub(%arg0: !fir.ref<!fir.box<!fir.heap<!fir.array<?xf32>>>>) -> i32 {
+      %c0 = arith.constant 0 : index
+      %c4_i32 = arith.constant 4 : i32
+      %0 = fir.alloca i32 {bindc_name = "devsub", uniq_name = "_QMdevmodFdevsubEdevsub"}
+      %1 = fir.alloca i32 {bindc_name = "__builtin_warpsize", uniq_name = "_QM__fortran_builtinsEC__builtin_warpsize"}
+      %2 = fir.load %arg0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xf32>>>>
+      %3:3 = fir.box_dims %2, %c0 : (!fir.box<!fir.heap<!fir.array<?xf32>>>, index) -> (index, index, index)
+      %4 = fir.convert %3#1 : (index) -> i32
+      %5 = arith.muli %4, %c4_i32 : i32
+      fir.store %5 to %0 : !fir.ref<i32>
+      %6 = fir.load %0 : !fir.ref<i32>
+      gpu.return %6 : i32
+    }
+  }
+}
\ No newline at end of file

clementval (Contributor) left a comment

LGTM

@Renaud-K Renaud-K merged commit ed2db3b into llvm:main Dec 4, 2024
11 checks passed
@@ -106,8 +106,7 @@ struct CUFAddConstructor

        mlir::func::FuncOp func;
        switch (attr.getValue()) {
-        case cuf::DataAttribute::Device:
-        case cuf::DataAttribute::Constant: {
+        case cuf::DataAttribute::Device: {
clementval (Contributor) commented:

What about variables like this:

module mod1
  integer, constant :: a(10)
end

Do they need to be registered?

Renaud-K (Contributor, author) commented:
Something like this:

module devmod
    use iso_c_binding, only: c_int
    integer, parameter :: amazon(10) = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    integer, constant :: google(10)

lowers in HLFIR to:

  fir.global @_QMdevmodECamazon(dense<0> : tensor<10xi32>) constant : !fir.array<10xi32>
  fir.global @_QMdevmodEgoogle {data_attr = #cuf.cuda<constant>} : !fir.array<10xi32> {

I am not finding much about the constant attribute. Is it new? From what I can find online, it seems to be just a protection against modification, like a pointer to constant data. If that is the case, then it should be implemented like a regular global.

The array parameter seems to be handled fine with this change. It gets the #cuf.cuda<constant> attribute assigned later in the flow and is copied to the GPU module with its initializer.
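As a sketch, the end state for the parameter array should then look like the copy the new test already exercises (initializer as shown in the HLFIR above):

gpu.module @cuda_device_mod {
  // The compile-time constant is cloned into the GPU module together with
  // its initializer; device code reads it directly, so no registration
  // call is needed for it.
  fir.global @_QMdevmodECamazon(dense<0> : tensor<10xi32>) constant : !fir.array<10xi32>
}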

clementval (Contributor) commented:
It is defined in the programming guide: https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html#cfpg-var-qual-attr-constant

Maybe we wrongly add the #cuf.cuda<constant> attribute in the implicit global pass.

Renaud-K (Contributor, author) commented:
"Device constant data may not be assigned or modified in any device subprogram, but may be modified in host subprograms."

It seems to be more like a global with some constraints on the device side.

clementval (Contributor) commented:
Yeah, but shouldn't we register them as well?

Renaud-K (Contributor, author) commented:
Yes. So we should either revisit how #cuf.cuda<constant> is applied or how constant fir.global operations are handled. Whichever change we apply will ripple through the flow. What would be the best way to handle it? Maybe we only apply the attribute to reflect the user setting?

clementval (Contributor) commented:
"Maybe we only apply the attribute to reflect the user setting?"

Yeah, it's probably nicer, and if we need to we can add an implicit attribute or just default to device. I guess #cuf.cuda<constant> and constant in fir.global are not really synonyms. We can handle this in a follow-up patch when we have a real test with the CUDA attribute constant.
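A sketch of the distinction, reusing the two lowerings shown earlier in this thread:

// constant on the fir.global = Fortran PARAMETER: folded or cloned with its
// initializer into the GPU module, never registered.
fir.global @_QMdevmodECamazon(dense<0> : tensor<10xi32>) constant : !fir.array<10xi32>

// #cuf.cuda<constant> = CUDA constant memory: read-only in device code but
// writable from the host, so it keeps real storage and should be registered.
fir.global @_QMdevmodEgoogle {data_attr = #cuf.cuda<constant>} : !fir.array<10xi32>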

Renaud-K added a commit that referenced this pull request Dec 6, 2024
…f.cuda<constant> attribute (#118912)

1. In `CufOpConversion`, `isDeviceGlobal` was renamed
`isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal`
excludes constant `fir.global` operations from registration. This is to
avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not
have any symbols in the runtime. This was done for
`_FortranACUFRegisterVariable` in #118582, but also needs to be done
here after #118591.
2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute
to the constant global. As discussed in #118582, a module variable with
the #cuf.cuda<constant> attribute is not a compile-time constant. Yet,
the compile-time constant also needs to be copied into the GPU module.
The candidates for copy to the GPU module are:
   - the globals needing registration regardless of their uses in device
code (they can be referred to in host code as well)
   - the compile-time constants when used in device code
3. The registration of "constant" module device variables
(#cuf.cuda<constant>) can be restored in `CufAddConstructor`.