Skip to content

[flang][cuda] Do not register global constants #118582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 4, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 1 addition & 4 deletions flang/lib/Optimizer/Transforms/CUFAddConstructor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -106,8 +106,7 @@ struct CUFAddConstructor

mlir::func::FuncOp func;
switch (attr.getValue()) {
case cuf::DataAttribute::Device:
case cuf::DataAttribute::Constant: {
case cuf::DataAttribute::Device: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about variable like this:

module mod1
  integer, constant :: a(10)
end

Do they need to be registered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this:

module devmod
    use iso_c_binding, only: c_int
    integer, parameter :: amazon(10) = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    integer, constant :: google(10)

gets lowered to, in HLFIR:

  fir.global @_QMdevmodECamazon(dense<0> : tensor<10xi32>) constant : !fir.array<10xi32>
  fir.global @_QMdevmodEgoogle {data_attr = #cuf.cuda<constant>} : !fir.array<10xi32> {

I am not finding a whole lot about the constant attribute. Is it new? From online research, it seems to just be a protection against modification such as a pointer to constant data. If it is the case, then it should be implemented like a regular global.

The array parameter seems to be handled fine with this change. It will get the #cuf.cuda<constant> assigned later in the flow and will be copied to the GPU module with its initializer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is defined in the programming guide: https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html#cfpg-var-qual-attr-constant

Maybe we wrongly add the #cuf.cuda<constant> in the implicit global pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Device constant data may not be assigned or modified in any device subprogram, but may be modified in host subprogram

It seems to be more like a global with some constraints on the device side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but shouldn't we register them as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. So, we should either revisit how #cuf.cuda<constant> is applied or how constant fir.global are handled. Whichever change we are going to apply is going to ripple through the flow. What would be the best way to handle it? Maybe we only apply the attribute to reflect the user setting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we only apply the attribute to reflect the user setting?

Yeah it's probably nicer and if we need we can add an implicit attribute or just default to device. I guess #cuf.cuda<constant> and constant in fir.global are not really synonym. We can handle this is a follow up patch when we have a real test with the CUDA attribute constant.

func = fir::runtime::getRuntimeFunc<mkRTKey(CUFRegisterVariable)>(
loc, builder);
auto fTy = func.getFunctionType();
Expand Down Expand Up @@ -145,8 +144,6 @@ struct CUFAddConstructor
default:
break;
}
if (!func)
continue;
}
}
builder.create<mlir::LLVM::ReturnOp>(loc, mlir::ValueRange{});
Expand Down
30 changes: 30 additions & 0 deletions flang/test/Fir/CUDA/cuda-constructor-2.f90
Original file line number Diff line number Diff line change
Expand Up @@ -30,3 +30,33 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<!llvm.ptr, dense<
// CHECK-DAG: %[[BOXREF:.*]] = fir.convert %[[BOX]] : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> !fir.ref<i8>
// CHECK-DAG: fir.call @_FortranACUFRegisterVariable(%[[MODULE:.*]], %[[BOXREF]], %{{.*}}, %{{.*}})
//

// -----

// Checking that constant global variables are not registered

// CHECK: @_FortranACUFRegisterAllocator
// CHECK-NOT: fir.call @_FortranACUFRegisterVariable

module attributes {dlti.dl_spec = #dlti.dl_spec<i8 = dense<8> : vector<2xi64>, i16 = dense<16> : vector<2xi64>, i1 = dense<8> : vector<2xi64>, !llvm.ptr = dense<64> : vector<4xi64>, f80 = dense<128> : vector<2xi64>, i128 = dense<128> : vector<2xi64>, i64 = dense<64> : vector<2xi64>, !llvm.ptr<271> = dense<32> : vector<4xi64>, !llvm.ptr<272> = dense<64> : vector<4xi64>, f128 = dense<128> : vector<2xi64>, !llvm.ptr<270> = dense<32> : vector<4xi64>, f16 = dense<16> : vector<2xi64>, f64 = dense<64> : vector<2xi64>, i32 = dense<32> : vector<2xi64>, "dlti.stack_alignment" = 128 : i64, "dlti.endianness" = "little">, fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", gpu.container_module, llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", llvm.ident = "flang version 20.0.0 (https://github.com/llvm/llvm-project.git 3372303188df0f7f8ac26e7ab610cf8b0f716d42)", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
fir.global @_QMiso_c_bindingECc_int {data_attr = #cuf.cuda<constant>} constant : i32


fir.type_info @_QM__fortran_builtinsT__builtin_c_ptr noinit nodestroy nofinal : !fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>
gpu.module @cuda_device_mod {
fir.global @_QMiso_c_bindingECc_int {data_attr = #cuf.cuda<constant>} constant : i32
gpu.func @_QMdevmodPdevsub(%arg0: !fir.ref<!fir.box<!fir.heap<!fir.array<?xf32>>>>) -> i32 {
%c0 = arith.constant 0 : index
%c4_i32 = arith.constant 4 : i32
%0 = fir.alloca i32 {bindc_name = "devsub", uniq_name = "_QMdevmodFdevsubEdevsub"}
%1 = fir.alloca i32 {bindc_name = "__builtin_warpsize", uniq_name = "_QM__fortran_builtinsEC__builtin_warpsize"}
%2 = fir.load %arg0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xf32>>>>
%3:3 = fir.box_dims %2, %c0 : (!fir.box<!fir.heap<!fir.array<?xf32>>>, index) -> (index, index, index)
%4 = fir.convert %3#1 : (index) -> i32
%5 = arith.muli %4, %c4_i32 : i32
fir.store %5 to %0 : !fir.ref<i32>
%6 = fir.load %0 : !fir.ref<i32>
gpu.return %6 : i32
}
}
}
Loading