Skip to content

Commit 3212051

Browse files
committed
[RFC][flang] Experimental device build of Flang runtime.
These are initial changes to experiment with building the Fortran runtime as a CUDA or OpenMP target offload library. The initial patch defines a set of macros that have to be used consistently in Flang runtime source code so that it can be built for different offload devices using different programming models (CUDA, HIP, OpenMP target offload). Currently supported modes are: * CUDA: Flang runtime may be built as a fatlib for the host and a set of CUDA architectures specified during the build. The packaging of the device code is done by the CUDA toolchain and may differ from toolchan to toolchain. * OpenMP offload: - host_device mode: Flang runtime may be built as a fatlib for the host and a set of OpenMP offload architectures. The packaging of the device code is done by the OpenMP offload compiler and may differ from compiler to compiler. OpenMP offload 'nohost' mode is a TODO to match the build setup of libomptarget/DeviceRTL. Flang runtime will be built as LLVM Bitcode library using Clang/LLVM toolchain. The host part of the library will be "empty", so there will be two distributable object: the host Flang runtime and dummy host library with device Flang runtime pieces packaged using clang-offload-packager and clang. In all supported modes, enabling parts of Flang runtime for the device compilation can be done iteratively to make the patches observable. Note that at any point in time the resulting library may have unresolved references to not yet enabled parts of Flang runtime. Example cmake/make commands for building with Clang for NVPTX target: cmake \ -DFLANG_EXPERIMENTAL_CUDA_RUNTIME=ON \ -DCMAKE_CUDA_ARCHITECTURES=80 \ -DCMAKE_C_COMPILER=/clang_nvptx/bin/clang \ -DCMAKE_CXX_COMPILER=/clang_nvptx/bin/clang++ \ -DCMAKE_CUDA_COMPILER=/clang_nvptx/bin/clang \ /llvm-project/flang/runtime/ make -j FortranRuntime Example cmake/make commands for building with Clang OpenMP offload: cmake \ -DFLANG_EXPERIMENTAL_OMP_OFFLOAD_BUILD="host_device" \ -DCMAKE_C_COMPILER=clang \ -DCMAKE_CXX_COMPILER=clang++ \ -DFLANG_OMP_DEVICE_ARCHITECTURES="sm_80" \ ../flang/runtime/ make -j FortranRuntime Differential Revision: https://reviews.llvm.org/D151173
1 parent c88f27f commit 3212051

File tree

14 files changed

+535
-169
lines changed

14 files changed

+535
-169
lines changed

flang/docs/GettingStarted.md

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -180,6 +180,76 @@ directory:
180180
ninja check-flang
181181
```
182182
183+
### Building flang runtime for accelerators
184+
Flang runtime can be built for accelerators in experimental mode, i.e.
185+
complete enabling is WIP. CUDA and OpenMP target offload builds
186+
are currently supported.
187+
188+
#### Building out-of-tree
189+
190+
##### CUDA build
191+
Clang with NVPTX backend and NVCC compilers are supported.
192+
193+
```bash
194+
cd llvm-project/flang
195+
mkdir -rf build_flang_runtime
196+
mkdir build_flang_runtime
197+
cd build_flang_runtime
198+
199+
cmake \
200+
-DFLANG_EXPERIMENTAL_CUDA_RUNTIME=ON \
201+
-DCMAKE_CUDA_ARCHITECTURES=80 \
202+
-DCMAKE_C_COMPILER=clang \
203+
-DCMAKE_CXX_COMPILER=clang++ \
204+
-DCMAKE_CUDA_COMPILER=clang \
205+
../runtime/
206+
make -j FortranRuntime
207+
```
208+
209+
```bash
210+
cd llvm-project/flang
211+
mkdir -rf build_flang_runtime
212+
mkdir build_flang_runtime
213+
cd build_flang_runtime
214+
215+
cmake \
216+
-DFLANG_EXPERIMENTAL_CUDA_RUNTIME=ON \
217+
-DCMAKE_CUDA_ARCHITECTURES=80 \
218+
-DCMAKE_C_COMPILER=clang \
219+
-DCMAKE_CXX_COMPILER=clang++ \
220+
-DCMAKE_CUDA_COMPILER=nvcc \
221+
../runtime/
222+
make -j FortranRuntime
223+
```
224+
225+
The result of the build is a "fat" library with the host and device
226+
code. Note that the packaging of the libraries is different
227+
between [Clang](https://clang.llvm.org/docs/OffloadingDesign.html#linking-target-device-code) and NVCC, so the library must be linked using
228+
compatible compiler drivers.
229+
230+
##### OpenMP target offload build
231+
Only Clang compiler is currently supported.
232+
233+
```
234+
cd llvm-project/flang
235+
mkdir -rf build_flang_runtime
236+
mkdir build_flang_runtime
237+
cd build_flang_runtime
238+
239+
cmake \
240+
-DFLANG_EXPERIMENTAL_OMP_OFFLOAD_BUILD="host_device" \
241+
-DCMAKE_C_COMPILER=clang \
242+
-DCMAKE_CXX_COMPILER=clang++ \
243+
-DFLANG_OMP_DEVICE_ARCHITECTURES="all" \
244+
../runtime/
245+
make -j FortranRuntime
246+
```
247+
248+
The result of the build is a "device-only" library, i.e. the host
249+
part of the library is just a container for the device code.
250+
The resulting library may be linked to user programs using
251+
Clang-like device linking pipeline.
252+
183253
## Supported C++ compilers
184254
185255
Flang is written in C++17.

flang/include/flang/ISO_Fortran_binding.h

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,8 @@
1818
* implementation.
1919
*/
2020

21+
#include "Runtime/api-attrs.h"
22+
2123
#ifdef __cplusplus
2224
namespace Fortran {
2325
namespace ISO {
@@ -121,8 +123,8 @@ namespace cfi_internal {
121123
// care of getting the memory storage. Note that it already contains one element
122124
// because a struct cannot be empty.
123125
template <typename T> struct FlexibleArray : T {
124-
T &operator[](int index) { return *(this + index); }
125-
const T &operator[](int index) const { return *(this + index); }
126+
RT_API_ATTRS T &operator[](int index) { return *(this + index); }
127+
const RT_API_ATTRS T &operator[](int index) const { return *(this + index); }
126128
operator T *() { return this; }
127129
operator const T *() const { return this; }
128130
};
@@ -174,11 +176,11 @@ extern "C" {
174176
void *CFI_address(const CFI_cdesc_t *, const CFI_index_t subscripts[]);
175177
int CFI_allocate(CFI_cdesc_t *, const CFI_index_t lower_bounds[],
176178
const CFI_index_t upper_bounds[], size_t elem_len);
177-
int CFI_deallocate(CFI_cdesc_t *);
179+
RT_API_ATTRS int CFI_deallocate(CFI_cdesc_t *);
178180
int CFI_establish(CFI_cdesc_t *, void *base_addr, CFI_attribute_t, CFI_type_t,
179181
size_t elem_len, CFI_rank_t, const CFI_index_t extents[]);
180182
int CFI_is_contiguous(const CFI_cdesc_t *);
181-
int CFI_section(CFI_cdesc_t *, const CFI_cdesc_t *source,
183+
RT_API_ATTRS int CFI_section(CFI_cdesc_t *, const CFI_cdesc_t *source,
182184
const CFI_index_t lower_bounds[], const CFI_index_t upper_bounds[],
183185
const CFI_index_t strides[]);
184186
int CFI_select_part(CFI_cdesc_t *, const CFI_cdesc_t *source,
Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
1+
/*===-- include/flang/Runtime/api-attrs.h ---------------------------*- C -*-=//
2+
*
3+
* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4+
* See https://llvm.org/LICENSE.txt for license information.
5+
* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6+
*
7+
*===------------------------------------------------------------------------===
8+
*/
9+
10+
/*
11+
* The file defines a set macros that can be used to apply
12+
* different attributes/pragmas to functions/variables
13+
* declared/defined/used in Flang runtime library.
14+
*/
15+
16+
#ifndef FORTRAN_RUNTIME_API_ATTRS_H_
17+
#define FORTRAN_RUNTIME_API_ATTRS_H_
18+
19+
/*
20+
* RT_EXT_API_GROUP_BEGIN/END pair is placed around definitions
21+
* of functions exported by Flang runtime library. They are the entry
22+
* points that are referenced in the Flang generated code.
23+
* The macros may be expanded into any construct that is valid to appear
24+
* at C++ module scope.
25+
*/
26+
#ifndef RT_EXT_API_GROUP_BEGIN
27+
#if defined(OMP_NOHOST_BUILD)
28+
#define RT_EXT_API_GROUP_BEGIN \
29+
_Pragma("omp begin declare target device_type(nohost)")
30+
#elif defined(OMP_OFFLOAD_BUILD)
31+
#define RT_EXT_API_GROUP_BEGIN _Pragma("omp declare target")
32+
#else
33+
#define RT_EXT_API_GROUP_BEGIN
34+
#endif
35+
#endif /* !defined(RT_EXT_API_GROUP_BEGIN) */
36+
37+
#ifndef RT_EXT_API_GROUP_END
38+
#if defined(OMP_NOHOST_BUILD) || defined(OMP_OFFLOAD_BUILD)
39+
#define RT_EXT_API_GROUP_END _Pragma("omp end declare target")
40+
#else
41+
#define RT_EXT_API_GROUP_END
42+
#endif
43+
#endif /* !defined(RT_EXT_API_GROUP_END) */
44+
45+
/*
46+
* RT_VAR_GROUP_BEGIN/END pair is placed around definitions
47+
* of module scope variables referenced by Flang runtime (directly
48+
* or indirectly).
49+
* The macros may be expanded into any construct that is valid to appear
50+
* at C++ module scope.
51+
*/
52+
#ifndef RT_VAR_GROUP_BEGIN
53+
#define RT_VAR_GROUP_BEGIN RT_EXT_API_GROUP_BEGIN
54+
#endif /* !defined(RT_VAR_GROUP_BEGIN) */
55+
56+
#ifndef RT_VAR_GROUP_END
57+
#define RT_VAR_GROUP_END RT_EXT_API_GROUP_END
58+
#endif /* !defined(RT_VAR_GROUP_END) */
59+
60+
/*
61+
* Each non-exported function used by Flang runtime (e.g. via
62+
* calling it or taking its address, etc.) is marked with
63+
* RT_API_ATTRS. The macros is placed at both declaration and
64+
* definition of such a function.
65+
* The macros may be expanded into a construct that is valid
66+
* to appear as part of a C++ decl-specifier.
67+
*/
68+
#ifndef RT_API_ATTRS
69+
#if defined(__CUDACC__) || defined(__CUDA__)
70+
#define RT_API_ATTRS __host__ __device__
71+
#else
72+
#define RT_API_ATTRS
73+
#endif
74+
#endif /* !defined(RT_API_ATTRS) */
75+
76+
/*
77+
* Each const/constexpr module scope variable referenced by Flang runtime
78+
* (directly or indirectly) is marked with RT_CONST_VAR_ATTRS.
79+
* The macros is placed at both declaration and definition of such a variable.
80+
* The macros may be expanded into a construct that is valid
81+
* to appear as part of a C++ decl-specifier.
82+
*/
83+
#ifndef RT_CONST_VAR_ATTRS
84+
#if defined(__CUDACC__) || defined(__CUDA__)
85+
#define RT_CONST_VAR_ATTRS __constant__
86+
#else
87+
#define RT_CONST_VAR_ATTRS
88+
#endif
89+
#endif /* !defined(RT_CONST_VAR_ATTRS) */
90+
91+
#endif /* !FORTRAN_RUNTIME_API_ATTRS_H_ */

0 commit comments

Comments
 (0)