Skip to content

[flang] Add ability to have special allocator for descriptor data #100690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Aug 1, 2024

Conversation

clementval
Copy link
Contributor

This patch enhances the descriptor with the ability to have specialized allocator. The allocators are registered in a dedicated registry and the index of the desired allocator is stored in the descriptor. The default allocator, std::malloc, is registered at index 0.

In order to have this allocator index in the descriptor, the f18Addendum field is repurposed to be able to hold the presence flag for the addendum (lsb) and the allocator index.

Since this is a change in the semantic and name of the 7th field of the descriptor, the CFI_VERSION is bumped to the date of the initial change.

This patch only adds the ability to have this features as part of the descriptor but does not add specific allocator yet. CUDA fortran will be the first user of this feature to allocate descriptor data in the different type of device memory base on the CUDA attribute.

@llvmbot
Copy link
Member

llvmbot commented Jul 26, 2024

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-semantics

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

This patch enhances the descriptor with the ability to have specialized allocator. The allocators are registered in a dedicated registry and the index of the desired allocator is stored in the descriptor. The default allocator, std::malloc, is registered at index 0.

In order to have this allocator index in the descriptor, the f18Addendum field is repurposed to be able to hold the presence flag for the addendum (lsb) and the allocator index.

Since this is a change in the semantic and name of the 7th field of the descriptor, the CFI_VERSION is bumped to the date of the initial change.

This patch only adds the ability to have this features as part of the descriptor but does not add specific allocator yet. CUDA fortran will be the first user of this feature to allocate descriptor data in the different type of device memory base on the CUDA attribute.


Patch is 46.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100690.diff

23 Files Affected:

  • (modified) flang/include/flang/ISO_Fortran_binding.h (+10-2)
  • (modified) flang/include/flang/Optimizer/CodeGen/TBAABuilder.h (+3-3)
  • (modified) flang/include/flang/Optimizer/CodeGen/TypeConverter.h (+1-1)
  • (modified) flang/include/flang/Runtime/descriptor.h (+4-4)
  • (modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+2-1)
  • (modified) flang/lib/Optimizer/CodeGen/TypeConverter.cpp (+3-3)
  • (modified) flang/runtime/CMakeLists.txt (+2)
  • (modified) flang/runtime/ISO_Fortran_util.h (+1-1)
  • (added) flang/runtime/allocator-registry.cpp (+42)
  • (added) flang/runtime/allocator-registry.h (+55)
  • (modified) flang/runtime/descriptor.cpp (+10-4)
  • (modified) flang/test/Fir/box.fir (+9-9)
  • (modified) flang/test/Fir/convert-to-llvm.fir (+7-8)
  • (modified) flang/test/Fir/embox-char.fir (+2-2)
  • (modified) flang/test/Fir/embox.fir (+4-4)
  • (modified) flang/test/Fir/ignore-missing-type-descriptor.fir (+1-1)
  • (modified) flang/test/Fir/polymorphic.fir (+4-4)
  • (modified) flang/test/Fir/rebox-global.fir (+1-1)
  • (modified) flang/test/Fir/rebox.fir (+1-1)
  • (modified) flang/test/Fir/tbaa.fir (+3-3)
  • (modified) flang/test/Fir/type-descriptor.fir (+2-2)
  • (modified) flang/test/Lower/allocatable-polymorphic.f90 (+3-3)
  • (modified) flang/unittests/Evaluate/ISO-Fortran-binding.cpp (+1)
diff --git a/flang/include/flang/ISO_Fortran_binding.h b/flang/include/flang/ISO_Fortran_binding.h
index 757d7f2b10cba..d8601a71f88e1 100644
--- a/flang/include/flang/ISO_Fortran_binding.h
+++ b/flang/include/flang/ISO_Fortran_binding.h
@@ -30,7 +30,7 @@
 #endif
 
 /* 18.5.4 */
-#define CFI_VERSION 20180515
+#define CFI_VERSION 20240719
 
 #define CFI_MAX_RANK 15
 typedef unsigned char CFI_rank_t;
@@ -146,7 +146,10 @@ extern "C++" template <typename T> struct FlexibleArray : T {
   CFI_rank_t rank; /* [0 .. CFI_MAX_RANK] */ \
   CFI_type_t type; \
   CFI_attribute_t attribute; \
-  unsigned char f18Addendum;
+  /* The lsb of this field indicates the presence of the f18Addendum. Other \
+   * bits are used to specify the index of the allocator used to managed \
+   * memory of the data hold by the descriptor. */ \
+  unsigned char extra;
 
 typedef struct CFI_cdesc_t {
   _CFI_CDESC_T_HEADER_MEMBERS
@@ -155,6 +158,11 @@ typedef struct CFI_cdesc_t {
 #else
   CFI_dim_t dim[]; /* must appear last */
 #endif
+
+#ifdef __cplusplus
+  RT_API_ATTRS inline bool HasAddendum() const { return extra & 1; }
+  RT_API_ATTRS inline int GetAllocIdx() const { return ((int)extra >> 1); }
+#endif
 } CFI_cdesc_t;
 
 /* 18.5.4 */
diff --git a/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h b/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h
index 5420e48146bbf..8032fb8fe3d56 100644
--- a/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h
+++ b/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h
@@ -55,7 +55,7 @@ namespace fir {
 //                  <@type_desc_3, 20>, // rank
 //                  <@type_desc_3, 21>, // type
 //                  <@type_desc_3, 22>, // attribute
-//                  <@type_desc_3, 23>} // f18Addendum
+//                  <@type_desc_3, 23>} // extra
 //   }
 //   llvm.tbaa_type_desc @type_desc_5 {
 //       id = "CFI_cdesc_t_dim1",
@@ -65,7 +65,7 @@ namespace fir {
 //                  <@type_desc_3, 20>, // rank
 //                  <@type_desc_3, 21>, // type
 //                  <@type_desc_3, 22>, // attribute
-//                  <@type_desc_3, 23>, // f18Addendum
+//                  <@type_desc_3, 23>, // extra
 //                  <@type_desc_3, 24>, // dim[0].lower_bound
 //                  <@type_desc_3, 32>, // dim[0].extent
 //                  <@type_desc_3, 40>} // dim[0].sm
@@ -78,7 +78,7 @@ namespace fir {
 //                  <@type_desc_3, 20>, // rank
 //                  <@type_desc_3, 21>, // type
 //                  <@type_desc_3, 22>, // attribute
-//                  <@type_desc_3, 23>, // f18Addendum
+//                  <@type_desc_3, 23>, // extra
 //                  <@type_desc_3, 24>, // dim[0].lower_bound
 //                  <@type_desc_3, 32>, // dim[0].extent
 //                  <@type_desc_3, 40>, // dim[0].sm
diff --git a/flang/include/flang/Optimizer/CodeGen/TypeConverter.h b/flang/include/flang/Optimizer/CodeGen/TypeConverter.h
index ece3e4e073f68..7215abf3a8f81 100644
--- a/flang/include/flang/Optimizer/CodeGen/TypeConverter.h
+++ b/flang/include/flang/Optimizer/CodeGen/TypeConverter.h
@@ -29,7 +29,7 @@ static constexpr unsigned kVersionPosInBox = 2;
 static constexpr unsigned kRankPosInBox = 3;
 static constexpr unsigned kTypePosInBox = 4;
 static constexpr unsigned kAttributePosInBox = 5;
-static constexpr unsigned kF18AddendumPosInBox = 6;
+static constexpr unsigned kExtraPosInBox = 6;
 static constexpr unsigned kDimsPosInBox = 7;
 static constexpr unsigned kOptTypePtrPosInBox = 8;
 static constexpr unsigned kOptRowTypePosInBox = 9;
diff --git a/flang/include/flang/Runtime/descriptor.h b/flang/include/flang/Runtime/descriptor.h
index 1b0b7e23ce6cc..301a1059b4096 100644
--- a/flang/include/flang/Runtime/descriptor.h
+++ b/flang/include/flang/Runtime/descriptor.h
@@ -92,8 +92,8 @@ class Dimension {
 // The storage for this object follows the last used dim[] entry in a
 // Descriptor (CFI_cdesc_t) generic descriptor.  Space matters here, since
 // descriptors serve as POINTER and ALLOCATABLE components of derived type
-// instances.  The presence of this structure is implied by the flag
-// CFI_cdesc_t.f18Addendum, and the number of elements in the len_[]
+// instances.  The presence of this structure is implied by the lsb of
+// CFI_cdesc_t.extra set to 1, and the number of elements in the len_[]
 // array is determined by derivedType_->LenParameters().
 class DescriptorAddendum {
 public:
@@ -339,14 +339,14 @@ class Descriptor {
       const SubscriptValue *, const int *permutation = nullptr) const;
 
   RT_API_ATTRS DescriptorAddendum *Addendum() {
-    if (raw_.f18Addendum != 0) {
+    if (raw_.HasAddendum()) {
       return reinterpret_cast<DescriptorAddendum *>(&GetDimension(rank()));
     } else {
       return nullptr;
     }
   }
   RT_API_ATTRS const DescriptorAddendum *Addendum() const {
-    if (raw_.f18Addendum != 0) {
+    if (raw_.HasAddendum()) {
       return reinterpret_cast<const DescriptorAddendum *>(
           &GetDimension(rank()));
     } else {
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index f9ea92a843b23..77c17b005bdef 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -1241,9 +1241,10 @@ struct EmboxCommonConversion : public fir::FIROpConversion<OP> {
     descriptor =
         insertField(rewriter, loc, descriptor, {kAttributePosInBox},
                     this->genI32Constant(loc, rewriter, getCFIAttr(boxTy)));
+
     const bool hasAddendum = fir::boxHasAddendum(boxTy);
     descriptor =
-        insertField(rewriter, loc, descriptor, {kF18AddendumPosInBox},
+        insertField(rewriter, loc, descriptor, {kExtraPosInBox},
                     this->genI32Constant(loc, rewriter, hasAddendum ? 1 : 0));
 
     if (hasAddendum) {
diff --git a/flang/lib/Optimizer/CodeGen/TypeConverter.cpp b/flang/lib/Optimizer/CodeGen/TypeConverter.cpp
index 59f67f4fcad44..1043494b6fb48 100644
--- a/flang/lib/Optimizer/CodeGen/TypeConverter.cpp
+++ b/flang/lib/Optimizer/CodeGen/TypeConverter.cpp
@@ -177,7 +177,7 @@ bool LLVMTypeConverter::requiresExtendedDesc(mlir::Type boxElementType) const {
 // the addendum defined in descriptor.h.
 mlir::Type LLVMTypeConverter::convertBoxTypeAsStruct(BaseBoxType box,
                                                      int rank) const {
-  // (base_addr*, elem_len, version, rank, type, attribute, f18Addendum, [dim]
+  // (base_addr*, elem_len, version, rank, type, attribute, extra, [dim]
   llvm::SmallVector<mlir::Type> dataDescFields;
   mlir::Type ele = box.getEleTy();
   // remove fir.heap/fir.ref/fir.ptr
@@ -206,9 +206,9 @@ mlir::Type LLVMTypeConverter::convertBoxTypeAsStruct(BaseBoxType box,
   // attribute
   dataDescFields.push_back(
       getDescFieldTypeModel<kAttributePosInBox>()(&getContext()));
-  // f18Addendum
+  // extra
   dataDescFields.push_back(
-      getDescFieldTypeModel<kF18AddendumPosInBox>()(&getContext()));
+      getDescFieldTypeModel<kExtraPosInBox>()(&getContext()));
   // [dims]
   if (rank == unknownRank()) {
     if (auto seqTy = mlir::dyn_cast<SequenceType>(ele))
diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt
index d587fd44b1678..1f3ae23dcbf12 100644
--- a/flang/runtime/CMakeLists.txt
+++ b/flang/runtime/CMakeLists.txt
@@ -107,6 +107,7 @@ add_subdirectory(Float128Math)
 
 set(sources
   ISO_Fortran_binding.cpp
+  allocator-registry.cpp
   allocatable.cpp
   array-constructor.cpp
   assign.cpp
@@ -178,6 +179,7 @@ include(AddFlangOffloadRuntime)
 set(supported_files
   ISO_Fortran_binding.cpp
   allocatable.cpp
+  allocator-registry.cpp
   array-constructor.cpp
   assign.cpp
   buffer.cpp
diff --git a/flang/runtime/ISO_Fortran_util.h b/flang/runtime/ISO_Fortran_util.h
index 469067600bd90..dd0eeef80bb89 100644
--- a/flang/runtime/ISO_Fortran_util.h
+++ b/flang/runtime/ISO_Fortran_util.h
@@ -86,7 +86,7 @@ static inline RT_API_ATTRS void EstablishDescriptor(CFI_cdesc_t *descriptor,
   descriptor->rank = rank;
   descriptor->type = type;
   descriptor->attribute = attribute;
-  descriptor->f18Addendum = 0;
+  descriptor->extra = 0;
   std::size_t byteSize{elem_len};
   constexpr std::size_t lower_bound{0};
   if (base_addr) {
diff --git a/flang/runtime/allocator-registry.cpp b/flang/runtime/allocator-registry.cpp
new file mode 100644
index 0000000000000..461756d2ba95d
--- /dev/null
+++ b/flang/runtime/allocator-registry.cpp
@@ -0,0 +1,42 @@
+//===-- runtime/allocator-registry.cpp ------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "allocator-registry.h"
+#include "terminator.h"
+
+namespace Fortran::runtime {
+
+#ifndef FLANG_RUNTIME_NO_GLOBAL_VAR_DEFS
+RT_OFFLOAD_VAR_GROUP_BEGIN
+RT_VAR_ATTRS AllocatorRegistry allocatorRegistry;
+RT_OFFLOAD_VAR_GROUP_END
+#endif // FLANG_RUNTIME_NO_GLOBAL_VAR_DEFS
+
+RT_OFFLOAD_API_GROUP_BEGIN
+RT_API_ATTRS void AllocatorRegistry::Register(int pos, Allocator_t allocator) {
+  // pos 0 is reserved for the default allocator and is registered in the
+  // struct ctor.
+  INTERNAL_CHECK(pos > 0 && pos < MAX_ALLOCATOR);
+  allocators[pos] = allocator;
+}
+
+RT_API_ATTRS AllocFct AllocatorRegistry::GetAllocator(int pos) {
+  INTERNAL_CHECK(pos >= 0 && pos < MAX_ALLOCATOR);
+  AllocFct f{allocators[pos].alloc};
+  INTERNAL_CHECK(f != nullptr);
+  return f;
+}
+
+RT_API_ATTRS FreeFct AllocatorRegistry::GetDeallocator(int pos) {
+  INTERNAL_CHECK(pos >= 0 && pos < MAX_ALLOCATOR);
+  FreeFct f{allocators[pos].free};
+  INTERNAL_CHECK(f != nullptr);
+  return f;
+}
+RT_OFFLOAD_API_GROUP_END
+} // namespace Fortran::runtime
diff --git a/flang/runtime/allocator-registry.h b/flang/runtime/allocator-registry.h
new file mode 100644
index 0000000000000..3e941ea986f24
--- /dev/null
+++ b/flang/runtime/allocator-registry.h
@@ -0,0 +1,55 @@
+//===-- runtime/allocator.h -------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_RUNTIME_ALLOCATOR_H_
+#define FORTRAN_RUNTIME_ALLOCATOR_H_
+
+#include "flang/Common/api-attrs.h"
+#include <cstdlib>
+#include <vector>
+
+#define MAX_ALLOCATOR 5
+
+namespace Fortran::runtime {
+
+using AllocFct = void *(*)(std::size_t);
+using FreeFct = void (*)(void *);
+
+typedef struct Allocator_t {
+  AllocFct alloc{nullptr};
+  FreeFct free{nullptr};
+} Allocator_t;
+
+#ifdef RT_DEVICE_COMPILATION
+static RT_API_ATTRS void *MallocWrapper(std::size_t size) {
+  return std::malloc(size);
+}
+static RT_API_ATTRS void FreeWrapper(void *p) { return std::free(p); }
+#endif
+
+struct AllocatorRegistry {
+#ifdef RT_DEVICE_COMPILATION
+  RT_API_ATTRS constexpr AllocatorRegistry()
+      : allocators{{&MallocWrapper, &FreeWrapper}} {}
+#else
+  constexpr AllocatorRegistry() { allocators[0] = {&std::malloc, &std::free}; };
+#endif
+  RT_API_ATTRS void Register(int, Allocator_t);
+  RT_API_ATTRS AllocFct GetAllocator(int pos);
+  RT_API_ATTRS FreeFct GetDeallocator(int pos);
+
+  Allocator_t allocators[MAX_ALLOCATOR];
+};
+
+RT_OFFLOAD_VAR_GROUP_BEGIN
+extern RT_VAR_ATTRS AllocatorRegistry allocatorRegistry;
+RT_OFFLOAD_VAR_GROUP_END
+
+} // namespace Fortran::runtime
+
+#endif // FORTRAN_RUNTIME_ALLOCATOR_H_
diff --git a/flang/runtime/descriptor.cpp b/flang/runtime/descriptor.cpp
index 9b04cb4f8d0d0..1bf3ebbc00335 100644
--- a/flang/runtime/descriptor.cpp
+++ b/flang/runtime/descriptor.cpp
@@ -8,6 +8,7 @@
 
 #include "flang/Runtime/descriptor.h"
 #include "ISO_Fortran_util.h"
+#include "allocator-registry.h"
 #include "derived.h"
 #include "memory.h"
 #include "stat.h"
@@ -50,7 +51,9 @@ RT_API_ATTRS void Descriptor::Establish(TypeCode t, std::size_t elementBytes,
       GetDimension(j).SetByteStride(0);
     }
   }
-  raw_.f18Addendum = addendum;
+  if (addendum) {
+    raw_.extra = raw_.extra | 1;
+  }
   DescriptorAddendum *a{Addendum()};
   RUNTIME_CHECK(terminator, addendum == (a != nullptr));
   if (a) {
@@ -162,7 +165,9 @@ RT_API_ATTRS int Descriptor::Allocate() {
   // Zero size allocation is possible in Fortran and the resulting
   // descriptor must be allocated/associated. Since std::malloc(0)
   // result is implementation defined, always allocate at least one byte.
-  void *p{byteSize ? std::malloc(byteSize) : std::malloc(1)};
+
+  AllocFct alloc{allocatorRegistry.GetAllocator(raw_.GetAllocIdx())};
+  void *p{alloc(byteSize ? byteSize : 1)};
   if (!p) {
     return CFI_ERROR_MEM_ALLOCATION;
   }
@@ -204,7 +209,8 @@ RT_API_ATTRS int Descriptor::Deallocate() {
   if (!descriptor.base_addr) {
     return CFI_ERROR_BASE_ADDR_NULL;
   } else {
-    std::free(descriptor.base_addr);
+    FreeFct free{allocatorRegistry.GetDeallocator(descriptor.GetAllocIdx())};
+    free(descriptor.base_addr);
     descriptor.base_addr = nullptr;
     return CFI_SUCCESS;
   }
@@ -290,7 +296,7 @@ void Descriptor::Dump(FILE *f) const {
   std::fprintf(f, "  rank      %d\n", static_cast<int>(raw_.rank));
   std::fprintf(f, "  type      %d\n", static_cast<int>(raw_.type));
   std::fprintf(f, "  attribute %d\n", static_cast<int>(raw_.attribute));
-  std::fprintf(f, "  addendum  %d\n", static_cast<int>(raw_.f18Addendum));
+  std::fprintf(f, "  extra     %d\n", static_cast<int>(raw_.extra));
   for (int j{0}; j < raw_.rank; ++j) {
     std::fprintf(f, "  dim[%d] lower_bound %jd\n", j,
         static_cast<std::intmax_t>(raw_.dim[j].lower_bound));
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index 1a9c9e0496814..81a4d8bc13bf0 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -1,7 +1,7 @@
 // RUN: tco -o - %s | FileCheck %s
 
 // Global box initialization (test must come first because llvm globals are emitted first).
-// CHECK-LABEL: @globalx = internal global { ptr, i64, i32, i8, i8, i8, i8 } { ptr null, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20180515, i8 0, i8 9, i8 2, i8 0 }
+// CHECK-LABEL: @globalx = internal global { ptr, i64, i32, i8, i8, i8, i8 } { ptr null, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 9, i8 2, i8 0 }
 fir.global internal @globalx : !fir.box<!fir.heap<i32>> {
   %c0 = arith.constant 0 : index
   %0 = fir.convert %c0 : (index) -> !fir.heap<i32>
@@ -9,7 +9,7 @@ fir.global internal @globalx : !fir.box<!fir.heap<i32>> {
   fir.has_value %1 : !fir.box<!fir.heap<i32>>
 }
 
-// CHECK-LABEL: @globaly = internal global { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } { ptr null, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 1, i8 27, i8 2, i8 0,{{.*}}[3 x i64] [i64 1, i64 0, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64)]
+// CHECK-LABEL: @globaly = internal global { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } { ptr null, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20240719, i8 1, i8 27, i8 2, i8 0,{{.*}}[3 x i64] [i64 1, i64 0, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64)]
 fir.global internal @globaly : !fir.box<!fir.heap<!fir.array<?xf32>>> {
   %c0 = arith.constant 0 : index
   %0 = fir.convert %c0 : (index) -> !fir.heap<!fir.array<?xf32>>
@@ -27,7 +27,7 @@ func.func private @ga(%b : !fir.box<!fir.array<?xf32>>)
 // CHECK: (ptr %[[ARG:.*]])
 func.func @f(%a : !fir.ref<f32>) {
   // CHECK: %[[DESC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
-  // CHECK: %[[INS0:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 0, i8 27, i8 0, i8 0 }, ptr %[[ARG]], 0
+  // CHECK: %[[INS0:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 27, i8 0, i8 0 }, ptr %[[ARG]], 0
   // CHECK: store {{.*}} %[[INS0]], {{.*}} %[[DESC]]
   %b = fir.embox %a : (!fir.ref<f32>) -> !fir.box<f32>
 
@@ -44,7 +44,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
   %c1 = arith.constant 1 : index
   %c100 = arith.constant 100 : index
   %d = fir.shape %c100 : (index) -> !fir.shape<1>
-  // CHECK: %[[INS70:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 1, i8 27, i8 0, i8 0, {{.*}} }, ptr %{{.*}}, 0
+  // CHECK: %[[INS70:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20240719, i8 1, i8 27, i8 0, i8 0, {{.*}} }, ptr %{{.*}}, 0
   %b = fir.embox %c(%d) : (!fir.ref<!fir.array<?xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<?xf32>>
   // CHECK: call void @ga(
   fir.call @ga(%b) : (!fir.box<!fir.array<?xf32>>) -> ()
@@ -58,7 +58,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
 func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
   // CHECK: %[[size:.*]] = mul i64 ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64), %[[arg1]]
   // CHECK: insertvalue {{.*}} undef, i64 %[[size]], 1
-  // CHECK: insertvalue {{.*}} i32 20180515, 2
+  // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
   %x = fir.embox %arg0 typeparams %arg1 : (!fir.ref<!fir.char<1,?>>, index) -> !fir.box<!fir.char<1,?>>
   // CHECK: store {{.*}}, ptr %[[res]]
@@ -71,7 +71,7 @@ func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.
 // CHECK-SAME: ptr %[[arg0:.*]], i64 %[[arg1:.*]])
 func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) -> !fir.box<!fir.array<?x!fir.char<1,5>>> {
   %1 = fir.shape %arg1 : (index) -> !fir.shape<1>
-  // CHECK: insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr ([5 x i8], ptr null, i32 1) to i64), i32 20180515, i8 1, i8 40, i8 0, i8 0, {{.*}} }, i64 %[[arg1]], 7, 0, 1
+  // CHECK: insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr ([5 x i8], ptr null, i32 1) to i64), i32 20240719, i8 1, i8 40, i8 0, i8 0, {{.*}} }, i64 %[[arg1]], 7, 0, 1
   // CHECK: insertvalue {{.*}} %{{.*}}, i64 ptrtoint (ptr getelementptr ([5 x i8], ptr null, i32 1) to i64), 7, 0, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
   %2 = fir.embox %arg0(%1) : (!fir.ref<!fir.array<?x!fir.char<1,5>>>, !fir.shape<1>) -> !fir.box<!fir.array<?x!fir.char<1,5>>>
@@ -86,7 +86,7 @@ func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %ar
   %1 = fir.shape %arg2 : (index) -> !fir.shape<1>
   // CHECK: %[[size:.*]] = mul i64 ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64), %[[arg1]]
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
-  // CHECK: insertvalue {{.*}} i32 20180515, 2
+  // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 %[[arg2]], 7, 0, 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 7, 0, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -103,7 +103,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
   %1 = fir.shape %c_7 : (index) -> !fir.shape<1>
   // CHECK:   %[[size:.*]] = mul i64 ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64), %[[arg1]]
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
-  // CHECK: insertvalue {{.*}} i32 20180515, 2
+  // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 7, 7, 0, 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 7, 0, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -147,7 +147,7 @@ func.func @box6(%0 : !fir.ref<!fir.array<?x?x?x?xf32>>, %1 : index, %2 : index)
   // CHECK: %[[sdp2:.*]] = sdiv i64 %[[dp2]], 2
   // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[sdp2]], 0
   // CHECK: %[[extent:.*]] = select i1 %[[cmp]], i64 %[[sdp2]], i64 0
-  // CHECK: insertvalue { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] } { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 2, i8 27, i8 0, i8 0, [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 undef, i64 undef], [3 x i64] undef] }, i64 %[[extent]], 7, 0, 1
+  // CHECK: insertvalue { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] } { ptr undef, i64 ptrtoint...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 26, 2024

@llvm/pr-subscribers-flang-runtime

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

This patch enhances the descriptor with the ability to have specialized allocator. The allocators are registered in a dedicated registry and the index of the desired allocator is stored in the descriptor. The default allocator, std::malloc, is registered at index 0.

In order to have this allocator index in the descriptor, the f18Addendum field is repurposed to be able to hold the presence flag for the addendum (lsb) and the allocator index.

Since this is a change in the semantic and name of the 7th field of the descriptor, the CFI_VERSION is bumped to the date of the initial change.

This patch only adds the ability to have this features as part of the descriptor but does not add specific allocator yet. CUDA fortran will be the first user of this feature to allocate descriptor data in the different type of device memory base on the CUDA attribute.


Patch is 46.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100690.diff

23 Files Affected:

  • (modified) flang/include/flang/ISO_Fortran_binding.h (+10-2)
  • (modified) flang/include/flang/Optimizer/CodeGen/TBAABuilder.h (+3-3)
  • (modified) flang/include/flang/Optimizer/CodeGen/TypeConverter.h (+1-1)
  • (modified) flang/include/flang/Runtime/descriptor.h (+4-4)
  • (modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+2-1)
  • (modified) flang/lib/Optimizer/CodeGen/TypeConverter.cpp (+3-3)
  • (modified) flang/runtime/CMakeLists.txt (+2)
  • (modified) flang/runtime/ISO_Fortran_util.h (+1-1)
  • (added) flang/runtime/allocator-registry.cpp (+42)
  • (added) flang/runtime/allocator-registry.h (+55)
  • (modified) flang/runtime/descriptor.cpp (+10-4)
  • (modified) flang/test/Fir/box.fir (+9-9)
  • (modified) flang/test/Fir/convert-to-llvm.fir (+7-8)
  • (modified) flang/test/Fir/embox-char.fir (+2-2)
  • (modified) flang/test/Fir/embox.fir (+4-4)
  • (modified) flang/test/Fir/ignore-missing-type-descriptor.fir (+1-1)
  • (modified) flang/test/Fir/polymorphic.fir (+4-4)
  • (modified) flang/test/Fir/rebox-global.fir (+1-1)
  • (modified) flang/test/Fir/rebox.fir (+1-1)
  • (modified) flang/test/Fir/tbaa.fir (+3-3)
  • (modified) flang/test/Fir/type-descriptor.fir (+2-2)
  • (modified) flang/test/Lower/allocatable-polymorphic.f90 (+3-3)
  • (modified) flang/unittests/Evaluate/ISO-Fortran-binding.cpp (+1)
diff --git a/flang/include/flang/ISO_Fortran_binding.h b/flang/include/flang/ISO_Fortran_binding.h
index 757d7f2b10cba..d8601a71f88e1 100644
--- a/flang/include/flang/ISO_Fortran_binding.h
+++ b/flang/include/flang/ISO_Fortran_binding.h
@@ -30,7 +30,7 @@
 #endif
 
 /* 18.5.4 */
-#define CFI_VERSION 20180515
+#define CFI_VERSION 20240719
 
 #define CFI_MAX_RANK 15
 typedef unsigned char CFI_rank_t;
@@ -146,7 +146,10 @@ extern "C++" template <typename T> struct FlexibleArray : T {
   CFI_rank_t rank; /* [0 .. CFI_MAX_RANK] */ \
   CFI_type_t type; \
   CFI_attribute_t attribute; \
-  unsigned char f18Addendum;
+  /* The lsb of this field indicates the presence of the f18Addendum. Other \
+   * bits are used to specify the index of the allocator used to managed \
+   * memory of the data hold by the descriptor. */ \
+  unsigned char extra;
 
 typedef struct CFI_cdesc_t {
   _CFI_CDESC_T_HEADER_MEMBERS
@@ -155,6 +158,11 @@ typedef struct CFI_cdesc_t {
 #else
   CFI_dim_t dim[]; /* must appear last */
 #endif
+
+#ifdef __cplusplus
+  RT_API_ATTRS inline bool HasAddendum() const { return extra & 1; }
+  RT_API_ATTRS inline int GetAllocIdx() const { return ((int)extra >> 1); }
+#endif
 } CFI_cdesc_t;
 
 /* 18.5.4 */
diff --git a/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h b/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h
index 5420e48146bbf..8032fb8fe3d56 100644
--- a/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h
+++ b/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h
@@ -55,7 +55,7 @@ namespace fir {
 //                  <@type_desc_3, 20>, // rank
 //                  <@type_desc_3, 21>, // type
 //                  <@type_desc_3, 22>, // attribute
-//                  <@type_desc_3, 23>} // f18Addendum
+//                  <@type_desc_3, 23>} // extra
 //   }
 //   llvm.tbaa_type_desc @type_desc_5 {
 //       id = "CFI_cdesc_t_dim1",
@@ -65,7 +65,7 @@ namespace fir {
 //                  <@type_desc_3, 20>, // rank
 //                  <@type_desc_3, 21>, // type
 //                  <@type_desc_3, 22>, // attribute
-//                  <@type_desc_3, 23>, // f18Addendum
+//                  <@type_desc_3, 23>, // extra
 //                  <@type_desc_3, 24>, // dim[0].lower_bound
 //                  <@type_desc_3, 32>, // dim[0].extent
 //                  <@type_desc_3, 40>} // dim[0].sm
@@ -78,7 +78,7 @@ namespace fir {
 //                  <@type_desc_3, 20>, // rank
 //                  <@type_desc_3, 21>, // type
 //                  <@type_desc_3, 22>, // attribute
-//                  <@type_desc_3, 23>, // f18Addendum
+//                  <@type_desc_3, 23>, // extra
 //                  <@type_desc_3, 24>, // dim[0].lower_bound
 //                  <@type_desc_3, 32>, // dim[0].extent
 //                  <@type_desc_3, 40>, // dim[0].sm
diff --git a/flang/include/flang/Optimizer/CodeGen/TypeConverter.h b/flang/include/flang/Optimizer/CodeGen/TypeConverter.h
index ece3e4e073f68..7215abf3a8f81 100644
--- a/flang/include/flang/Optimizer/CodeGen/TypeConverter.h
+++ b/flang/include/flang/Optimizer/CodeGen/TypeConverter.h
@@ -29,7 +29,7 @@ static constexpr unsigned kVersionPosInBox = 2;
 static constexpr unsigned kRankPosInBox = 3;
 static constexpr unsigned kTypePosInBox = 4;
 static constexpr unsigned kAttributePosInBox = 5;
-static constexpr unsigned kF18AddendumPosInBox = 6;
+static constexpr unsigned kExtraPosInBox = 6;
 static constexpr unsigned kDimsPosInBox = 7;
 static constexpr unsigned kOptTypePtrPosInBox = 8;
 static constexpr unsigned kOptRowTypePosInBox = 9;
diff --git a/flang/include/flang/Runtime/descriptor.h b/flang/include/flang/Runtime/descriptor.h
index 1b0b7e23ce6cc..301a1059b4096 100644
--- a/flang/include/flang/Runtime/descriptor.h
+++ b/flang/include/flang/Runtime/descriptor.h
@@ -92,8 +92,8 @@ class Dimension {
 // The storage for this object follows the last used dim[] entry in a
 // Descriptor (CFI_cdesc_t) generic descriptor.  Space matters here, since
 // descriptors serve as POINTER and ALLOCATABLE components of derived type
-// instances.  The presence of this structure is implied by the flag
-// CFI_cdesc_t.f18Addendum, and the number of elements in the len_[]
+// instances.  The presence of this structure is implied by the lsb of
+// CFI_cdesc_t.extra set to 1, and the number of elements in the len_[]
 // array is determined by derivedType_->LenParameters().
 class DescriptorAddendum {
 public:
@@ -339,14 +339,14 @@ class Descriptor {
       const SubscriptValue *, const int *permutation = nullptr) const;
 
   RT_API_ATTRS DescriptorAddendum *Addendum() {
-    if (raw_.f18Addendum != 0) {
+    if (raw_.HasAddendum()) {
       return reinterpret_cast<DescriptorAddendum *>(&GetDimension(rank()));
     } else {
       return nullptr;
     }
   }
   RT_API_ATTRS const DescriptorAddendum *Addendum() const {
-    if (raw_.f18Addendum != 0) {
+    if (raw_.HasAddendum()) {
       return reinterpret_cast<const DescriptorAddendum *>(
           &GetDimension(rank()));
     } else {
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index f9ea92a843b23..77c17b005bdef 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -1241,9 +1241,10 @@ struct EmboxCommonConversion : public fir::FIROpConversion<OP> {
     descriptor =
         insertField(rewriter, loc, descriptor, {kAttributePosInBox},
                     this->genI32Constant(loc, rewriter, getCFIAttr(boxTy)));
+
     const bool hasAddendum = fir::boxHasAddendum(boxTy);
     descriptor =
-        insertField(rewriter, loc, descriptor, {kF18AddendumPosInBox},
+        insertField(rewriter, loc, descriptor, {kExtraPosInBox},
                     this->genI32Constant(loc, rewriter, hasAddendum ? 1 : 0));
 
     if (hasAddendum) {
diff --git a/flang/lib/Optimizer/CodeGen/TypeConverter.cpp b/flang/lib/Optimizer/CodeGen/TypeConverter.cpp
index 59f67f4fcad44..1043494b6fb48 100644
--- a/flang/lib/Optimizer/CodeGen/TypeConverter.cpp
+++ b/flang/lib/Optimizer/CodeGen/TypeConverter.cpp
@@ -177,7 +177,7 @@ bool LLVMTypeConverter::requiresExtendedDesc(mlir::Type boxElementType) const {
 // the addendum defined in descriptor.h.
 mlir::Type LLVMTypeConverter::convertBoxTypeAsStruct(BaseBoxType box,
                                                      int rank) const {
-  // (base_addr*, elem_len, version, rank, type, attribute, f18Addendum, [dim]
+  // (base_addr*, elem_len, version, rank, type, attribute, extra, [dim]
   llvm::SmallVector<mlir::Type> dataDescFields;
   mlir::Type ele = box.getEleTy();
   // remove fir.heap/fir.ref/fir.ptr
@@ -206,9 +206,9 @@ mlir::Type LLVMTypeConverter::convertBoxTypeAsStruct(BaseBoxType box,
   // attribute
   dataDescFields.push_back(
       getDescFieldTypeModel<kAttributePosInBox>()(&getContext()));
-  // f18Addendum
+  // extra
   dataDescFields.push_back(
-      getDescFieldTypeModel<kF18AddendumPosInBox>()(&getContext()));
+      getDescFieldTypeModel<kExtraPosInBox>()(&getContext()));
   // [dims]
   if (rank == unknownRank()) {
     if (auto seqTy = mlir::dyn_cast<SequenceType>(ele))
diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt
index d587fd44b1678..1f3ae23dcbf12 100644
--- a/flang/runtime/CMakeLists.txt
+++ b/flang/runtime/CMakeLists.txt
@@ -107,6 +107,7 @@ add_subdirectory(Float128Math)
 
 set(sources
   ISO_Fortran_binding.cpp
+  allocator-registry.cpp
   allocatable.cpp
   array-constructor.cpp
   assign.cpp
@@ -178,6 +179,7 @@ include(AddFlangOffloadRuntime)
 set(supported_files
   ISO_Fortran_binding.cpp
   allocatable.cpp
+  allocator-registry.cpp
   array-constructor.cpp
   assign.cpp
   buffer.cpp
diff --git a/flang/runtime/ISO_Fortran_util.h b/flang/runtime/ISO_Fortran_util.h
index 469067600bd90..dd0eeef80bb89 100644
--- a/flang/runtime/ISO_Fortran_util.h
+++ b/flang/runtime/ISO_Fortran_util.h
@@ -86,7 +86,7 @@ static inline RT_API_ATTRS void EstablishDescriptor(CFI_cdesc_t *descriptor,
   descriptor->rank = rank;
   descriptor->type = type;
   descriptor->attribute = attribute;
-  descriptor->f18Addendum = 0;
+  descriptor->extra = 0;
   std::size_t byteSize{elem_len};
   constexpr std::size_t lower_bound{0};
   if (base_addr) {
diff --git a/flang/runtime/allocator-registry.cpp b/flang/runtime/allocator-registry.cpp
new file mode 100644
index 0000000000000..461756d2ba95d
--- /dev/null
+++ b/flang/runtime/allocator-registry.cpp
@@ -0,0 +1,42 @@
+//===-- runtime/allocator-registry.cpp ------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "allocator-registry.h"
+#include "terminator.h"
+
+namespace Fortran::runtime {
+
+#ifndef FLANG_RUNTIME_NO_GLOBAL_VAR_DEFS
+RT_OFFLOAD_VAR_GROUP_BEGIN
+RT_VAR_ATTRS AllocatorRegistry allocatorRegistry;
+RT_OFFLOAD_VAR_GROUP_END
+#endif // FLANG_RUNTIME_NO_GLOBAL_VAR_DEFS
+
+RT_OFFLOAD_API_GROUP_BEGIN
+RT_API_ATTRS void AllocatorRegistry::Register(int pos, Allocator_t allocator) {
+  // pos 0 is reserved for the default allocator and is registered in the
+  // struct ctor.
+  INTERNAL_CHECK(pos > 0 && pos < MAX_ALLOCATOR);
+  allocators[pos] = allocator;
+}
+
+RT_API_ATTRS AllocFct AllocatorRegistry::GetAllocator(int pos) {
+  INTERNAL_CHECK(pos >= 0 && pos < MAX_ALLOCATOR);
+  AllocFct f{allocators[pos].alloc};
+  INTERNAL_CHECK(f != nullptr);
+  return f;
+}
+
+RT_API_ATTRS FreeFct AllocatorRegistry::GetDeallocator(int pos) {
+  INTERNAL_CHECK(pos >= 0 && pos < MAX_ALLOCATOR);
+  FreeFct f{allocators[pos].free};
+  INTERNAL_CHECK(f != nullptr);
+  return f;
+}
+RT_OFFLOAD_API_GROUP_END
+} // namespace Fortran::runtime
diff --git a/flang/runtime/allocator-registry.h b/flang/runtime/allocator-registry.h
new file mode 100644
index 0000000000000..3e941ea986f24
--- /dev/null
+++ b/flang/runtime/allocator-registry.h
@@ -0,0 +1,55 @@
+//===-- runtime/allocator.h -------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_RUNTIME_ALLOCATOR_H_
+#define FORTRAN_RUNTIME_ALLOCATOR_H_
+
+#include "flang/Common/api-attrs.h"
+#include <cstdlib>
+#include <vector>
+
+#define MAX_ALLOCATOR 5
+
+namespace Fortran::runtime {
+
+using AllocFct = void *(*)(std::size_t);
+using FreeFct = void (*)(void *);
+
+typedef struct Allocator_t {
+  AllocFct alloc{nullptr};
+  FreeFct free{nullptr};
+} Allocator_t;
+
+#ifdef RT_DEVICE_COMPILATION
+static RT_API_ATTRS void *MallocWrapper(std::size_t size) {
+  return std::malloc(size);
+}
+static RT_API_ATTRS void FreeWrapper(void *p) { return std::free(p); }
+#endif
+
+struct AllocatorRegistry {
+#ifdef RT_DEVICE_COMPILATION
+  RT_API_ATTRS constexpr AllocatorRegistry()
+      : allocators{{&MallocWrapper, &FreeWrapper}} {}
+#else
+  constexpr AllocatorRegistry() { allocators[0] = {&std::malloc, &std::free}; };
+#endif
+  RT_API_ATTRS void Register(int, Allocator_t);
+  RT_API_ATTRS AllocFct GetAllocator(int pos);
+  RT_API_ATTRS FreeFct GetDeallocator(int pos);
+
+  Allocator_t allocators[MAX_ALLOCATOR];
+};
+
+RT_OFFLOAD_VAR_GROUP_BEGIN
+extern RT_VAR_ATTRS AllocatorRegistry allocatorRegistry;
+RT_OFFLOAD_VAR_GROUP_END
+
+} // namespace Fortran::runtime
+
+#endif // FORTRAN_RUNTIME_ALLOCATOR_H_
diff --git a/flang/runtime/descriptor.cpp b/flang/runtime/descriptor.cpp
index 9b04cb4f8d0d0..1bf3ebbc00335 100644
--- a/flang/runtime/descriptor.cpp
+++ b/flang/runtime/descriptor.cpp
@@ -8,6 +8,7 @@
 
 #include "flang/Runtime/descriptor.h"
 #include "ISO_Fortran_util.h"
+#include "allocator-registry.h"
 #include "derived.h"
 #include "memory.h"
 #include "stat.h"
@@ -50,7 +51,9 @@ RT_API_ATTRS void Descriptor::Establish(TypeCode t, std::size_t elementBytes,
       GetDimension(j).SetByteStride(0);
     }
   }
-  raw_.f18Addendum = addendum;
+  if (addendum) {
+    raw_.extra = raw_.extra | 1;
+  }
   DescriptorAddendum *a{Addendum()};
   RUNTIME_CHECK(terminator, addendum == (a != nullptr));
   if (a) {
@@ -162,7 +165,9 @@ RT_API_ATTRS int Descriptor::Allocate() {
   // Zero size allocation is possible in Fortran and the resulting
   // descriptor must be allocated/associated. Since std::malloc(0)
   // result is implementation defined, always allocate at least one byte.
-  void *p{byteSize ? std::malloc(byteSize) : std::malloc(1)};
+
+  AllocFct alloc{allocatorRegistry.GetAllocator(raw_.GetAllocIdx())};
+  void *p{alloc(byteSize ? byteSize : 1)};
   if (!p) {
     return CFI_ERROR_MEM_ALLOCATION;
   }
@@ -204,7 +209,8 @@ RT_API_ATTRS int Descriptor::Deallocate() {
   if (!descriptor.base_addr) {
     return CFI_ERROR_BASE_ADDR_NULL;
   } else {
-    std::free(descriptor.base_addr);
+    FreeFct free{allocatorRegistry.GetDeallocator(descriptor.GetAllocIdx())};
+    free(descriptor.base_addr);
     descriptor.base_addr = nullptr;
     return CFI_SUCCESS;
   }
@@ -290,7 +296,7 @@ void Descriptor::Dump(FILE *f) const {
   std::fprintf(f, "  rank      %d\n", static_cast<int>(raw_.rank));
   std::fprintf(f, "  type      %d\n", static_cast<int>(raw_.type));
   std::fprintf(f, "  attribute %d\n", static_cast<int>(raw_.attribute));
-  std::fprintf(f, "  addendum  %d\n", static_cast<int>(raw_.f18Addendum));
+  std::fprintf(f, "  extra     %d\n", static_cast<int>(raw_.extra));
   for (int j{0}; j < raw_.rank; ++j) {
     std::fprintf(f, "  dim[%d] lower_bound %jd\n", j,
         static_cast<std::intmax_t>(raw_.dim[j].lower_bound));
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index 1a9c9e0496814..81a4d8bc13bf0 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -1,7 +1,7 @@
 // RUN: tco -o - %s | FileCheck %s
 
 // Global box initialization (test must come first because llvm globals are emitted first).
-// CHECK-LABEL: @globalx = internal global { ptr, i64, i32, i8, i8, i8, i8 } { ptr null, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20180515, i8 0, i8 9, i8 2, i8 0 }
+// CHECK-LABEL: @globalx = internal global { ptr, i64, i32, i8, i8, i8, i8 } { ptr null, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 9, i8 2, i8 0 }
 fir.global internal @globalx : !fir.box<!fir.heap<i32>> {
   %c0 = arith.constant 0 : index
   %0 = fir.convert %c0 : (index) -> !fir.heap<i32>
@@ -9,7 +9,7 @@ fir.global internal @globalx : !fir.box<!fir.heap<i32>> {
   fir.has_value %1 : !fir.box<!fir.heap<i32>>
 }
 
-// CHECK-LABEL: @globaly = internal global { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } { ptr null, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 1, i8 27, i8 2, i8 0,{{.*}}[3 x i64] [i64 1, i64 0, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64)]
+// CHECK-LABEL: @globaly = internal global { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } { ptr null, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20240719, i8 1, i8 27, i8 2, i8 0,{{.*}}[3 x i64] [i64 1, i64 0, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64)]
 fir.global internal @globaly : !fir.box<!fir.heap<!fir.array<?xf32>>> {
   %c0 = arith.constant 0 : index
   %0 = fir.convert %c0 : (index) -> !fir.heap<!fir.array<?xf32>>
@@ -27,7 +27,7 @@ func.func private @ga(%b : !fir.box<!fir.array<?xf32>>)
 // CHECK: (ptr %[[ARG:.*]])
 func.func @f(%a : !fir.ref<f32>) {
   // CHECK: %[[DESC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
-  // CHECK: %[[INS0:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 0, i8 27, i8 0, i8 0 }, ptr %[[ARG]], 0
+  // CHECK: %[[INS0:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 27, i8 0, i8 0 }, ptr %[[ARG]], 0
   // CHECK: store {{.*}} %[[INS0]], {{.*}} %[[DESC]]
   %b = fir.embox %a : (!fir.ref<f32>) -> !fir.box<f32>
 
@@ -44,7 +44,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
   %c1 = arith.constant 1 : index
   %c100 = arith.constant 100 : index
   %d = fir.shape %c100 : (index) -> !fir.shape<1>
-  // CHECK: %[[INS70:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 1, i8 27, i8 0, i8 0, {{.*}} }, ptr %{{.*}}, 0
+  // CHECK: %[[INS70:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20240719, i8 1, i8 27, i8 0, i8 0, {{.*}} }, ptr %{{.*}}, 0
   %b = fir.embox %c(%d) : (!fir.ref<!fir.array<?xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<?xf32>>
   // CHECK: call void @ga(
   fir.call @ga(%b) : (!fir.box<!fir.array<?xf32>>) -> ()
@@ -58,7 +58,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
 func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
   // CHECK: %[[size:.*]] = mul i64 ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64), %[[arg1]]
   // CHECK: insertvalue {{.*}} undef, i64 %[[size]], 1
-  // CHECK: insertvalue {{.*}} i32 20180515, 2
+  // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
   %x = fir.embox %arg0 typeparams %arg1 : (!fir.ref<!fir.char<1,?>>, index) -> !fir.box<!fir.char<1,?>>
   // CHECK: store {{.*}}, ptr %[[res]]
@@ -71,7 +71,7 @@ func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.
 // CHECK-SAME: ptr %[[arg0:.*]], i64 %[[arg1:.*]])
 func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) -> !fir.box<!fir.array<?x!fir.char<1,5>>> {
   %1 = fir.shape %arg1 : (index) -> !fir.shape<1>
-  // CHECK: insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr ([5 x i8], ptr null, i32 1) to i64), i32 20180515, i8 1, i8 40, i8 0, i8 0, {{.*}} }, i64 %[[arg1]], 7, 0, 1
+  // CHECK: insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr ([5 x i8], ptr null, i32 1) to i64), i32 20240719, i8 1, i8 40, i8 0, i8 0, {{.*}} }, i64 %[[arg1]], 7, 0, 1
   // CHECK: insertvalue {{.*}} %{{.*}}, i64 ptrtoint (ptr getelementptr ([5 x i8], ptr null, i32 1) to i64), 7, 0, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
   %2 = fir.embox %arg0(%1) : (!fir.ref<!fir.array<?x!fir.char<1,5>>>, !fir.shape<1>) -> !fir.box<!fir.array<?x!fir.char<1,5>>>
@@ -86,7 +86,7 @@ func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %ar
   %1 = fir.shape %arg2 : (index) -> !fir.shape<1>
   // CHECK: %[[size:.*]] = mul i64 ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64), %[[arg1]]
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
-  // CHECK: insertvalue {{.*}} i32 20180515, 2
+  // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 %[[arg2]], 7, 0, 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 7, 0, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -103,7 +103,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
   %1 = fir.shape %c_7 : (index) -> !fir.shape<1>
   // CHECK:   %[[size:.*]] = mul i64 ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64), %[[arg1]]
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
-  // CHECK: insertvalue {{.*}} i32 20180515, 2
+  // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 7, 7, 0, 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 7, 0, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -147,7 +147,7 @@ func.func @box6(%0 : !fir.ref<!fir.array<?x?x?x?xf32>>, %1 : index, %2 : index)
   // CHECK: %[[sdp2:.*]] = sdiv i64 %[[dp2]], 2
   // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[sdp2]], 0
   // CHECK: %[[extent:.*]] = select i1 %[[cmp]], i64 %[[sdp2]], i64 0
-  // CHECK: insertvalue { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] } { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 2, i8 27, i8 0, i8 0, [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 undef, i64 undef], [3 x i64] undef] }, i64 %[[extent]], 7, 0, 1
+  // CHECK: insertvalue { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] } { ptr undef, i64 ptrtoint...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 26, 2024

@llvm/pr-subscribers-flang-codegen

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

This patch enhances the descriptor with the ability to have specialized allocator. The allocators are registered in a dedicated registry and the index of the desired allocator is stored in the descriptor. The default allocator, std::malloc, is registered at index 0.

In order to have this allocator index in the descriptor, the f18Addendum field is repurposed to be able to hold the presence flag for the addendum (lsb) and the allocator index.

Since this is a change in the semantic and name of the 7th field of the descriptor, the CFI_VERSION is bumped to the date of the initial change.

This patch only adds the ability to have this features as part of the descriptor but does not add specific allocator yet. CUDA fortran will be the first user of this feature to allocate descriptor data in the different type of device memory base on the CUDA attribute.


Patch is 46.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100690.diff

23 Files Affected:

  • (modified) flang/include/flang/ISO_Fortran_binding.h (+10-2)
  • (modified) flang/include/flang/Optimizer/CodeGen/TBAABuilder.h (+3-3)
  • (modified) flang/include/flang/Optimizer/CodeGen/TypeConverter.h (+1-1)
  • (modified) flang/include/flang/Runtime/descriptor.h (+4-4)
  • (modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+2-1)
  • (modified) flang/lib/Optimizer/CodeGen/TypeConverter.cpp (+3-3)
  • (modified) flang/runtime/CMakeLists.txt (+2)
  • (modified) flang/runtime/ISO_Fortran_util.h (+1-1)
  • (added) flang/runtime/allocator-registry.cpp (+42)
  • (added) flang/runtime/allocator-registry.h (+55)
  • (modified) flang/runtime/descriptor.cpp (+10-4)
  • (modified) flang/test/Fir/box.fir (+9-9)
  • (modified) flang/test/Fir/convert-to-llvm.fir (+7-8)
  • (modified) flang/test/Fir/embox-char.fir (+2-2)
  • (modified) flang/test/Fir/embox.fir (+4-4)
  • (modified) flang/test/Fir/ignore-missing-type-descriptor.fir (+1-1)
  • (modified) flang/test/Fir/polymorphic.fir (+4-4)
  • (modified) flang/test/Fir/rebox-global.fir (+1-1)
  • (modified) flang/test/Fir/rebox.fir (+1-1)
  • (modified) flang/test/Fir/tbaa.fir (+3-3)
  • (modified) flang/test/Fir/type-descriptor.fir (+2-2)
  • (modified) flang/test/Lower/allocatable-polymorphic.f90 (+3-3)
  • (modified) flang/unittests/Evaluate/ISO-Fortran-binding.cpp (+1)
diff --git a/flang/include/flang/ISO_Fortran_binding.h b/flang/include/flang/ISO_Fortran_binding.h
index 757d7f2b10cba..d8601a71f88e1 100644
--- a/flang/include/flang/ISO_Fortran_binding.h
+++ b/flang/include/flang/ISO_Fortran_binding.h
@@ -30,7 +30,7 @@
 #endif
 
 /* 18.5.4 */
-#define CFI_VERSION 20180515
+#define CFI_VERSION 20240719
 
 #define CFI_MAX_RANK 15
 typedef unsigned char CFI_rank_t;
@@ -146,7 +146,10 @@ extern "C++" template <typename T> struct FlexibleArray : T {
   CFI_rank_t rank; /* [0 .. CFI_MAX_RANK] */ \
   CFI_type_t type; \
   CFI_attribute_t attribute; \
-  unsigned char f18Addendum;
+  /* The lsb of this field indicates the presence of the f18Addendum. Other \
+   * bits are used to specify the index of the allocator used to managed \
+   * memory of the data hold by the descriptor. */ \
+  unsigned char extra;
 
 typedef struct CFI_cdesc_t {
   _CFI_CDESC_T_HEADER_MEMBERS
@@ -155,6 +158,11 @@ typedef struct CFI_cdesc_t {
 #else
   CFI_dim_t dim[]; /* must appear last */
 #endif
+
+#ifdef __cplusplus
+  RT_API_ATTRS inline bool HasAddendum() const { return extra & 1; }
+  RT_API_ATTRS inline int GetAllocIdx() const { return ((int)extra >> 1); }
+#endif
 } CFI_cdesc_t;
 
 /* 18.5.4 */
diff --git a/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h b/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h
index 5420e48146bbf..8032fb8fe3d56 100644
--- a/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h
+++ b/flang/include/flang/Optimizer/CodeGen/TBAABuilder.h
@@ -55,7 +55,7 @@ namespace fir {
 //                  <@type_desc_3, 20>, // rank
 //                  <@type_desc_3, 21>, // type
 //                  <@type_desc_3, 22>, // attribute
-//                  <@type_desc_3, 23>} // f18Addendum
+//                  <@type_desc_3, 23>} // extra
 //   }
 //   llvm.tbaa_type_desc @type_desc_5 {
 //       id = "CFI_cdesc_t_dim1",
@@ -65,7 +65,7 @@ namespace fir {
 //                  <@type_desc_3, 20>, // rank
 //                  <@type_desc_3, 21>, // type
 //                  <@type_desc_3, 22>, // attribute
-//                  <@type_desc_3, 23>, // f18Addendum
+//                  <@type_desc_3, 23>, // extra
 //                  <@type_desc_3, 24>, // dim[0].lower_bound
 //                  <@type_desc_3, 32>, // dim[0].extent
 //                  <@type_desc_3, 40>} // dim[0].sm
@@ -78,7 +78,7 @@ namespace fir {
 //                  <@type_desc_3, 20>, // rank
 //                  <@type_desc_3, 21>, // type
 //                  <@type_desc_3, 22>, // attribute
-//                  <@type_desc_3, 23>, // f18Addendum
+//                  <@type_desc_3, 23>, // extra
 //                  <@type_desc_3, 24>, // dim[0].lower_bound
 //                  <@type_desc_3, 32>, // dim[0].extent
 //                  <@type_desc_3, 40>, // dim[0].sm
diff --git a/flang/include/flang/Optimizer/CodeGen/TypeConverter.h b/flang/include/flang/Optimizer/CodeGen/TypeConverter.h
index ece3e4e073f68..7215abf3a8f81 100644
--- a/flang/include/flang/Optimizer/CodeGen/TypeConverter.h
+++ b/flang/include/flang/Optimizer/CodeGen/TypeConverter.h
@@ -29,7 +29,7 @@ static constexpr unsigned kVersionPosInBox = 2;
 static constexpr unsigned kRankPosInBox = 3;
 static constexpr unsigned kTypePosInBox = 4;
 static constexpr unsigned kAttributePosInBox = 5;
-static constexpr unsigned kF18AddendumPosInBox = 6;
+static constexpr unsigned kExtraPosInBox = 6;
 static constexpr unsigned kDimsPosInBox = 7;
 static constexpr unsigned kOptTypePtrPosInBox = 8;
 static constexpr unsigned kOptRowTypePosInBox = 9;
diff --git a/flang/include/flang/Runtime/descriptor.h b/flang/include/flang/Runtime/descriptor.h
index 1b0b7e23ce6cc..301a1059b4096 100644
--- a/flang/include/flang/Runtime/descriptor.h
+++ b/flang/include/flang/Runtime/descriptor.h
@@ -92,8 +92,8 @@ class Dimension {
 // The storage for this object follows the last used dim[] entry in a
 // Descriptor (CFI_cdesc_t) generic descriptor.  Space matters here, since
 // descriptors serve as POINTER and ALLOCATABLE components of derived type
-// instances.  The presence of this structure is implied by the flag
-// CFI_cdesc_t.f18Addendum, and the number of elements in the len_[]
+// instances.  The presence of this structure is implied by the lsb of
+// CFI_cdesc_t.extra set to 1, and the number of elements in the len_[]
 // array is determined by derivedType_->LenParameters().
 class DescriptorAddendum {
 public:
@@ -339,14 +339,14 @@ class Descriptor {
       const SubscriptValue *, const int *permutation = nullptr) const;
 
   RT_API_ATTRS DescriptorAddendum *Addendum() {
-    if (raw_.f18Addendum != 0) {
+    if (raw_.HasAddendum()) {
       return reinterpret_cast<DescriptorAddendum *>(&GetDimension(rank()));
     } else {
       return nullptr;
     }
   }
   RT_API_ATTRS const DescriptorAddendum *Addendum() const {
-    if (raw_.f18Addendum != 0) {
+    if (raw_.HasAddendum()) {
       return reinterpret_cast<const DescriptorAddendum *>(
           &GetDimension(rank()));
     } else {
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index f9ea92a843b23..77c17b005bdef 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -1241,9 +1241,10 @@ struct EmboxCommonConversion : public fir::FIROpConversion<OP> {
     descriptor =
         insertField(rewriter, loc, descriptor, {kAttributePosInBox},
                     this->genI32Constant(loc, rewriter, getCFIAttr(boxTy)));
+
     const bool hasAddendum = fir::boxHasAddendum(boxTy);
     descriptor =
-        insertField(rewriter, loc, descriptor, {kF18AddendumPosInBox},
+        insertField(rewriter, loc, descriptor, {kExtraPosInBox},
                     this->genI32Constant(loc, rewriter, hasAddendum ? 1 : 0));
 
     if (hasAddendum) {
diff --git a/flang/lib/Optimizer/CodeGen/TypeConverter.cpp b/flang/lib/Optimizer/CodeGen/TypeConverter.cpp
index 59f67f4fcad44..1043494b6fb48 100644
--- a/flang/lib/Optimizer/CodeGen/TypeConverter.cpp
+++ b/flang/lib/Optimizer/CodeGen/TypeConverter.cpp
@@ -177,7 +177,7 @@ bool LLVMTypeConverter::requiresExtendedDesc(mlir::Type boxElementType) const {
 // the addendum defined in descriptor.h.
 mlir::Type LLVMTypeConverter::convertBoxTypeAsStruct(BaseBoxType box,
                                                      int rank) const {
-  // (base_addr*, elem_len, version, rank, type, attribute, f18Addendum, [dim]
+  // (base_addr*, elem_len, version, rank, type, attribute, extra, [dim]
   llvm::SmallVector<mlir::Type> dataDescFields;
   mlir::Type ele = box.getEleTy();
   // remove fir.heap/fir.ref/fir.ptr
@@ -206,9 +206,9 @@ mlir::Type LLVMTypeConverter::convertBoxTypeAsStruct(BaseBoxType box,
   // attribute
   dataDescFields.push_back(
       getDescFieldTypeModel<kAttributePosInBox>()(&getContext()));
-  // f18Addendum
+  // extra
   dataDescFields.push_back(
-      getDescFieldTypeModel<kF18AddendumPosInBox>()(&getContext()));
+      getDescFieldTypeModel<kExtraPosInBox>()(&getContext()));
   // [dims]
   if (rank == unknownRank()) {
     if (auto seqTy = mlir::dyn_cast<SequenceType>(ele))
diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt
index d587fd44b1678..1f3ae23dcbf12 100644
--- a/flang/runtime/CMakeLists.txt
+++ b/flang/runtime/CMakeLists.txt
@@ -107,6 +107,7 @@ add_subdirectory(Float128Math)
 
 set(sources
   ISO_Fortran_binding.cpp
+  allocator-registry.cpp
   allocatable.cpp
   array-constructor.cpp
   assign.cpp
@@ -178,6 +179,7 @@ include(AddFlangOffloadRuntime)
 set(supported_files
   ISO_Fortran_binding.cpp
   allocatable.cpp
+  allocator-registry.cpp
   array-constructor.cpp
   assign.cpp
   buffer.cpp
diff --git a/flang/runtime/ISO_Fortran_util.h b/flang/runtime/ISO_Fortran_util.h
index 469067600bd90..dd0eeef80bb89 100644
--- a/flang/runtime/ISO_Fortran_util.h
+++ b/flang/runtime/ISO_Fortran_util.h
@@ -86,7 +86,7 @@ static inline RT_API_ATTRS void EstablishDescriptor(CFI_cdesc_t *descriptor,
   descriptor->rank = rank;
   descriptor->type = type;
   descriptor->attribute = attribute;
-  descriptor->f18Addendum = 0;
+  descriptor->extra = 0;
   std::size_t byteSize{elem_len};
   constexpr std::size_t lower_bound{0};
   if (base_addr) {
diff --git a/flang/runtime/allocator-registry.cpp b/flang/runtime/allocator-registry.cpp
new file mode 100644
index 0000000000000..461756d2ba95d
--- /dev/null
+++ b/flang/runtime/allocator-registry.cpp
@@ -0,0 +1,42 @@
+//===-- runtime/allocator-registry.cpp ------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "allocator-registry.h"
+#include "terminator.h"
+
+namespace Fortran::runtime {
+
+#ifndef FLANG_RUNTIME_NO_GLOBAL_VAR_DEFS
+RT_OFFLOAD_VAR_GROUP_BEGIN
+RT_VAR_ATTRS AllocatorRegistry allocatorRegistry;
+RT_OFFLOAD_VAR_GROUP_END
+#endif // FLANG_RUNTIME_NO_GLOBAL_VAR_DEFS
+
+RT_OFFLOAD_API_GROUP_BEGIN
+RT_API_ATTRS void AllocatorRegistry::Register(int pos, Allocator_t allocator) {
+  // pos 0 is reserved for the default allocator and is registered in the
+  // struct ctor.
+  INTERNAL_CHECK(pos > 0 && pos < MAX_ALLOCATOR);
+  allocators[pos] = allocator;
+}
+
+RT_API_ATTRS AllocFct AllocatorRegistry::GetAllocator(int pos) {
+  INTERNAL_CHECK(pos >= 0 && pos < MAX_ALLOCATOR);
+  AllocFct f{allocators[pos].alloc};
+  INTERNAL_CHECK(f != nullptr);
+  return f;
+}
+
+RT_API_ATTRS FreeFct AllocatorRegistry::GetDeallocator(int pos) {
+  INTERNAL_CHECK(pos >= 0 && pos < MAX_ALLOCATOR);
+  FreeFct f{allocators[pos].free};
+  INTERNAL_CHECK(f != nullptr);
+  return f;
+}
+RT_OFFLOAD_API_GROUP_END
+} // namespace Fortran::runtime
diff --git a/flang/runtime/allocator-registry.h b/flang/runtime/allocator-registry.h
new file mode 100644
index 0000000000000..3e941ea986f24
--- /dev/null
+++ b/flang/runtime/allocator-registry.h
@@ -0,0 +1,55 @@
+//===-- runtime/allocator.h -------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_RUNTIME_ALLOCATOR_H_
+#define FORTRAN_RUNTIME_ALLOCATOR_H_
+
+#include "flang/Common/api-attrs.h"
+#include <cstdlib>
+#include <vector>
+
+#define MAX_ALLOCATOR 5
+
+namespace Fortran::runtime {
+
+using AllocFct = void *(*)(std::size_t);
+using FreeFct = void (*)(void *);
+
+typedef struct Allocator_t {
+  AllocFct alloc{nullptr};
+  FreeFct free{nullptr};
+} Allocator_t;
+
+#ifdef RT_DEVICE_COMPILATION
+static RT_API_ATTRS void *MallocWrapper(std::size_t size) {
+  return std::malloc(size);
+}
+static RT_API_ATTRS void FreeWrapper(void *p) { return std::free(p); }
+#endif
+
+struct AllocatorRegistry {
+#ifdef RT_DEVICE_COMPILATION
+  RT_API_ATTRS constexpr AllocatorRegistry()
+      : allocators{{&MallocWrapper, &FreeWrapper}} {}
+#else
+  constexpr AllocatorRegistry() { allocators[0] = {&std::malloc, &std::free}; };
+#endif
+  RT_API_ATTRS void Register(int, Allocator_t);
+  RT_API_ATTRS AllocFct GetAllocator(int pos);
+  RT_API_ATTRS FreeFct GetDeallocator(int pos);
+
+  Allocator_t allocators[MAX_ALLOCATOR];
+};
+
+RT_OFFLOAD_VAR_GROUP_BEGIN
+extern RT_VAR_ATTRS AllocatorRegistry allocatorRegistry;
+RT_OFFLOAD_VAR_GROUP_END
+
+} // namespace Fortran::runtime
+
+#endif // FORTRAN_RUNTIME_ALLOCATOR_H_
diff --git a/flang/runtime/descriptor.cpp b/flang/runtime/descriptor.cpp
index 9b04cb4f8d0d0..1bf3ebbc00335 100644
--- a/flang/runtime/descriptor.cpp
+++ b/flang/runtime/descriptor.cpp
@@ -8,6 +8,7 @@
 
 #include "flang/Runtime/descriptor.h"
 #include "ISO_Fortran_util.h"
+#include "allocator-registry.h"
 #include "derived.h"
 #include "memory.h"
 #include "stat.h"
@@ -50,7 +51,9 @@ RT_API_ATTRS void Descriptor::Establish(TypeCode t, std::size_t elementBytes,
       GetDimension(j).SetByteStride(0);
     }
   }
-  raw_.f18Addendum = addendum;
+  if (addendum) {
+    raw_.extra = raw_.extra | 1;
+  }
   DescriptorAddendum *a{Addendum()};
   RUNTIME_CHECK(terminator, addendum == (a != nullptr));
   if (a) {
@@ -162,7 +165,9 @@ RT_API_ATTRS int Descriptor::Allocate() {
   // Zero size allocation is possible in Fortran and the resulting
   // descriptor must be allocated/associated. Since std::malloc(0)
   // result is implementation defined, always allocate at least one byte.
-  void *p{byteSize ? std::malloc(byteSize) : std::malloc(1)};
+
+  AllocFct alloc{allocatorRegistry.GetAllocator(raw_.GetAllocIdx())};
+  void *p{alloc(byteSize ? byteSize : 1)};
   if (!p) {
     return CFI_ERROR_MEM_ALLOCATION;
   }
@@ -204,7 +209,8 @@ RT_API_ATTRS int Descriptor::Deallocate() {
   if (!descriptor.base_addr) {
     return CFI_ERROR_BASE_ADDR_NULL;
   } else {
-    std::free(descriptor.base_addr);
+    FreeFct free{allocatorRegistry.GetDeallocator(descriptor.GetAllocIdx())};
+    free(descriptor.base_addr);
     descriptor.base_addr = nullptr;
     return CFI_SUCCESS;
   }
@@ -290,7 +296,7 @@ void Descriptor::Dump(FILE *f) const {
   std::fprintf(f, "  rank      %d\n", static_cast<int>(raw_.rank));
   std::fprintf(f, "  type      %d\n", static_cast<int>(raw_.type));
   std::fprintf(f, "  attribute %d\n", static_cast<int>(raw_.attribute));
-  std::fprintf(f, "  addendum  %d\n", static_cast<int>(raw_.f18Addendum));
+  std::fprintf(f, "  extra     %d\n", static_cast<int>(raw_.extra));
   for (int j{0}; j < raw_.rank; ++j) {
     std::fprintf(f, "  dim[%d] lower_bound %jd\n", j,
         static_cast<std::intmax_t>(raw_.dim[j].lower_bound));
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index 1a9c9e0496814..81a4d8bc13bf0 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -1,7 +1,7 @@
 // RUN: tco -o - %s | FileCheck %s
 
 // Global box initialization (test must come first because llvm globals are emitted first).
-// CHECK-LABEL: @globalx = internal global { ptr, i64, i32, i8, i8, i8, i8 } { ptr null, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20180515, i8 0, i8 9, i8 2, i8 0 }
+// CHECK-LABEL: @globalx = internal global { ptr, i64, i32, i8, i8, i8, i8 } { ptr null, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 9, i8 2, i8 0 }
 fir.global internal @globalx : !fir.box<!fir.heap<i32>> {
   %c0 = arith.constant 0 : index
   %0 = fir.convert %c0 : (index) -> !fir.heap<i32>
@@ -9,7 +9,7 @@ fir.global internal @globalx : !fir.box<!fir.heap<i32>> {
   fir.has_value %1 : !fir.box<!fir.heap<i32>>
 }
 
-// CHECK-LABEL: @globaly = internal global { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } { ptr null, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 1, i8 27, i8 2, i8 0,{{.*}}[3 x i64] [i64 1, i64 0, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64)]
+// CHECK-LABEL: @globaly = internal global { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } { ptr null, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20240719, i8 1, i8 27, i8 2, i8 0,{{.*}}[3 x i64] [i64 1, i64 0, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64)]
 fir.global internal @globaly : !fir.box<!fir.heap<!fir.array<?xf32>>> {
   %c0 = arith.constant 0 : index
   %0 = fir.convert %c0 : (index) -> !fir.heap<!fir.array<?xf32>>
@@ -27,7 +27,7 @@ func.func private @ga(%b : !fir.box<!fir.array<?xf32>>)
 // CHECK: (ptr %[[ARG:.*]])
 func.func @f(%a : !fir.ref<f32>) {
   // CHECK: %[[DESC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
-  // CHECK: %[[INS0:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 0, i8 27, i8 0, i8 0 }, ptr %[[ARG]], 0
+  // CHECK: %[[INS0:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 27, i8 0, i8 0 }, ptr %[[ARG]], 0
   // CHECK: store {{.*}} %[[INS0]], {{.*}} %[[DESC]]
   %b = fir.embox %a : (!fir.ref<f32>) -> !fir.box<f32>
 
@@ -44,7 +44,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
   %c1 = arith.constant 1 : index
   %c100 = arith.constant 100 : index
   %d = fir.shape %c100 : (index) -> !fir.shape<1>
-  // CHECK: %[[INS70:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 1, i8 27, i8 0, i8 0, {{.*}} }, ptr %{{.*}}, 0
+  // CHECK: %[[INS70:.*]] = insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20240719, i8 1, i8 27, i8 0, i8 0, {{.*}} }, ptr %{{.*}}, 0
   %b = fir.embox %c(%d) : (!fir.ref<!fir.array<?xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<?xf32>>
   // CHECK: call void @ga(
   fir.call @ga(%b) : (!fir.box<!fir.array<?xf32>>) -> ()
@@ -58,7 +58,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
 func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
   // CHECK: %[[size:.*]] = mul i64 ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64), %[[arg1]]
   // CHECK: insertvalue {{.*}} undef, i64 %[[size]], 1
-  // CHECK: insertvalue {{.*}} i32 20180515, 2
+  // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
   %x = fir.embox %arg0 typeparams %arg1 : (!fir.ref<!fir.char<1,?>>, index) -> !fir.box<!fir.char<1,?>>
   // CHECK: store {{.*}}, ptr %[[res]]
@@ -71,7 +71,7 @@ func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.
 // CHECK-SAME: ptr %[[arg0:.*]], i64 %[[arg1:.*]])
 func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) -> !fir.box<!fir.array<?x!fir.char<1,5>>> {
   %1 = fir.shape %arg1 : (index) -> !fir.shape<1>
-  // CHECK: insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr ([5 x i8], ptr null, i32 1) to i64), i32 20180515, i8 1, i8 40, i8 0, i8 0, {{.*}} }, i64 %[[arg1]], 7, 0, 1
+  // CHECK: insertvalue {{.*}} { ptr undef, i64 ptrtoint (ptr getelementptr ([5 x i8], ptr null, i32 1) to i64), i32 20240719, i8 1, i8 40, i8 0, i8 0, {{.*}} }, i64 %[[arg1]], 7, 0, 1
   // CHECK: insertvalue {{.*}} %{{.*}}, i64 ptrtoint (ptr getelementptr ([5 x i8], ptr null, i32 1) to i64), 7, 0, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
   %2 = fir.embox %arg0(%1) : (!fir.ref<!fir.array<?x!fir.char<1,5>>>, !fir.shape<1>) -> !fir.box<!fir.array<?x!fir.char<1,5>>>
@@ -86,7 +86,7 @@ func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %ar
   %1 = fir.shape %arg2 : (index) -> !fir.shape<1>
   // CHECK: %[[size:.*]] = mul i64 ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64), %[[arg1]]
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
-  // CHECK: insertvalue {{.*}} i32 20180515, 2
+  // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 %[[arg2]], 7, 0, 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 7, 0, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -103,7 +103,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
   %1 = fir.shape %c_7 : (index) -> !fir.shape<1>
   // CHECK:   %[[size:.*]] = mul i64 ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64), %[[arg1]]
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
-  // CHECK: insertvalue {{.*}} i32 20180515, 2
+  // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 7, 7, 0, 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 7, 0, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -147,7 +147,7 @@ func.func @box6(%0 : !fir.ref<!fir.array<?x?x?x?xf32>>, %1 : index, %2 : index)
   // CHECK: %[[sdp2:.*]] = sdiv i64 %[[dp2]], 2
   // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[sdp2]], 0
   // CHECK: %[[extent:.*]] = select i1 %[[cmp]], i64 %[[sdp2]], i64 0
-  // CHECK: insertvalue { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] } { ptr undef, i64 ptrtoint (ptr getelementptr (float, ptr null, i32 1) to i64), i32 20180515, i8 2, i8 27, i8 0, i8 0, [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 undef, i64 undef], [3 x i64] undef] }, i64 %[[extent]], 7, 0, 1
+  // CHECK: insertvalue { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] } { ptr undef, i64 ptrtoint...
[truncated]

@clementval
Copy link
Contributor Author

@klausler Are you ok with the latest changes?

Copy link

github-actions bot commented Jul 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@clementval
Copy link
Contributor Author

@vzakhari @klausler I think I addressed all the comments in this PR. Anything else you would like to comment on?

Copy link
Contributor

@vzakhari vzakhari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Valentin! It looks good to me.

@clementval clementval force-pushed the descriptor_allocator branch from 9bd2532 to ecc4fd7 Compare July 31, 2024 16:48
@clementval clementval merged commit 6df4e7c into llvm:main Aug 1, 2024
7 checks passed
clementval added a commit that referenced this pull request Aug 1, 2024
…101212)

#100690 introduces allocator registry with the ability to store
allocator index in the descriptor. This patch adds an attribute to
fir.embox and fircg.ext_embox to be able to set the allocator index
while populating the descriptor fields.
@clementval clementval deleted the descriptor_allocator branch August 1, 2024 22:59
clementval added a commit that referenced this pull request Aug 2, 2024
Add allocators for CUDA fortran allocation on the device. 3 allocators
are added for pinned, device and managed/unified memory allocation.
`CUFRegisterAllocator()` is called to register the allocators in the
allocator registry added in #100690.


Since this require CUDA, a cmake option `FLANG_CUF_RUNTIME` is added to
conditionally build these.
}
RT_API_ATTRS inline void SetAllocIdx(int pos) {
raw_.extra &= ~_CFI_ALLOCATOR_IDX_MASK; // Clear the allocator index bits.
raw_.extra |= (pos << _CFI_ALLOCATOR_IDX_SHIFT);
Copy link
Member

@Thirumalai-Shaktivel Thirumalai-Shaktivel Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line causes the build failure in my setup, as the following:

/mnt/hdd1tb/JENKINS/workspace/update_upstream-main/llvm-project/flang/include/flang/Runtime/descriptor.h:441:16: error: array subscript 'Fortran::runtime::Descriptor[0]' is partly outside array bounds of 'Fortran::runtime::StaticDescriptor<0> [1]' [-Werror=array-bounds]
   441 |     raw_.extra |= (pos << _CFI_ALLOCATOR_IDX_SHIFT);
       |     ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Edit: I use -DFLANG_ENABLE_WERROR=ON in my setup

Can you please check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which compiler are you using? All build bots are working fine with this change.

Copy link
Member

@Thirumalai-Shaktivel Thirumalai-Shaktivel Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using GNU 11. Then, I tested this using Clang and it works.

Should we only use clang to build flang?

Copy link
Member

@Thirumalai-Shaktivel Thirumalai-Shaktivel Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With:

    [...]
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DFLANG_ENABLE_WERROR=ON

I get the error.
With -DFLANG_ENABLE_WERROR=OFF gives a WARNING as above and it works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to have -DFLANG_ENABLE_WERROR=ON for any compilers. Usually building with clang is warning free because it is what is used in upstream buildbots.

I'm often building with gcc 9.3.0 and there is tons of warnings.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, got it. Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm often building with gcc 9.3.0 and there is tons of warnings.

I use gcc 9.3.0 with -Werror enabled for flang. What warnings are you seeing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't throw the error for gcc 9. I will use gcc 9 only. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants