[Flang][Driver] Enable gpulibc/nogpulibc options for Flang, which allows linking of GPU LIBC for the fortran and OpenMP runtime #77135

Merged
merged 2 commits into from
Jan 9, 2024

Conversation

agozillon
Contributor

This patch adds the -gpulibc and -nogpulibc options to Flang. They control linking of the GPU libc library, which makes memcpy and other useful library functions available on the GPU.

In particular, this allows the Fortran runtime (written in C++) to be compiled for offload and then linked against the GPU libc library via this option, resolving memcpy and other C library functions that the Fortran runtime depends on for AMD GPU devices (and likely other GPU devices).

This is the method I've tested and found to work for using the Fortran runtime when it is compiled for AMD GPU. It requires compiling libc for GPU and then the Fortran runtime for GPU, so it is not particularly straightforward or user friendly yet.

Activating this option also makes that subset of C functions available on the GPU to any other C/C++-based Fortran libraries, if any are made, that link against GPU libc.

@llvmbot added the clang, flang:driver, and flang labels on Jan 5, 2024
@llvmbot
Member

llvmbot commented Jan 5, 2024

@llvm/pr-subscribers-flang-driver

@llvm/pr-subscribers-clang

Author: None (agozillon)

Full diff: https://github.com/llvm/llvm-project/pull/77135.diff

3 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+2-2)
  • (modified) flang/test/Driver/driver-help-hidden.f90 (+1)
  • (modified) flang/test/Driver/driver-help.f90 (+2)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 6aff37f1336871..12f41a1ea03a8c 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -5198,9 +5198,9 @@ def nogpulib : Flag<["-"], "nogpulib">, MarshallingInfoFlag<LangOpts<"NoGPULib">
   Visibility<[ClangOption, CC1Option]>,
   HelpText<"Do not link device library for CUDA/HIP device compilation">;
 def : Flag<["-"], "nocudalib">, Alias<nogpulib>;
-def gpulibc : Flag<["-"], "gpulibc">, Visibility<[ClangOption, CC1Option]>,
+def gpulibc : Flag<["-"], "gpulibc">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>,
   HelpText<"Link the LLVM C Library for GPUs">;
-def nogpulibc : Flag<["-"], "nogpulibc">, Visibility<[ClangOption, CC1Option]>;
+def nogpulibc : Flag<["-"], "nogpulibc">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
 def nodefaultlibs : Flag<["-"], "nodefaultlibs">;
 def nodriverkitlib : Flag<["-"], "nodriverkitlib">;
 def nofixprebinding : Flag<["-"], "nofixprebinding">;
diff --git a/flang/test/Driver/driver-help-hidden.f90 b/flang/test/Driver/driver-help-hidden.f90
index 9a11a7a571ffcc..70bb9f8eb512ce 100644
--- a/flang/test/Driver/driver-help-hidden.f90
+++ b/flang/test/Driver/driver-help-hidden.f90
@@ -108,6 +108,7 @@
 ! CHECK-NEXT: -fxor-operator          Enable .XOR. as a synonym of .NEQV.
 ! CHECK-NEXT: -gline-directives-only  Emit debug line info directives only
 ! CHECK-NEXT: -gline-tables-only      Emit debug line number tables only
+! CHECK-NEXT: -gpulibc                Link the LLVM C Library for GPUs
 ! CHECK-NEXT: -g                      Generate source-level debug information
 ! CHECK-NEXT: --help-hidden           Display help for hidden options
 ! CHECK-NEXT: -help                   Display available options
diff --git a/flang/test/Driver/driver-help.f90 b/flang/test/Driver/driver-help.f90
index e0e74dc56f331e..0d760616aace04 100644
--- a/flang/test/Driver/driver-help.f90
+++ b/flang/test/Driver/driver-help.f90
@@ -94,6 +94,7 @@
 ! HELP-NEXT: -fxor-operator          Enable .XOR. as a synonym of .NEQV.
 ! HELP-NEXT: -gline-directives-only  Emit debug line info directives only
 ! HELP-NEXT: -gline-tables-only      Emit debug line number tables only
+! HELP-NEXT: -gpulibc                Link the LLVM C Library for GPUs
 ! HELP-NEXT: -g                      Generate source-level debug information
 ! HELP-NEXT: --help-hidden           Display help for hidden options
 ! HELP-NEXT: -help                   Display available options
@@ -228,6 +229,7 @@
 ! HELP-FC1-NEXT: -fversion-loops-for-stride
 ! HELP-FC1-NEXT:                         Create unit-strided versions of loops
 ! HELP-FC1-NEXT: -fxor-operator          Enable .XOR. as a synonym of .NEQV.
+! HELP-FC1-NEXT: -gpulibc                Link the LLVM C Library for GPUs
 ! HELP-FC1-NEXT: -help                   Display available options
 ! HELP-FC1-NEXT: -init-only              Only execute frontend initialization
 ! HELP-FC1-NEXT: -I <dir>                Add directory to the end of the list of include search paths
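
The diff above can be read as a small model of option visibility. The sketch below is only an illustration in Python, not the TableGen or driver implementation: the `Option` class and both helper functions are hypothetical, and it assumes that help output skips options without help text, which is consistent with the test changes listing only -gpulibc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """Hypothetical stand-in for one option definition in Options.td."""
    name: str
    visibility: frozenset
    help_text: str = ""


# The two definitions as they look after this patch.
ALL_MODES = frozenset({"ClangOption", "CC1Option", "FlangOption", "FC1Option"})
OPTIONS = [
    Option("-gpulibc", ALL_MODES, "Link the LLVM C Library for GPUs"),
    Option("-nogpulibc", ALL_MODES),  # no HelpText in the definition
]


def is_accepted(options, mode, flag):
    """A flag is accepted in a mode when that mode is in its visibility set."""
    return any(o.name == flag and mode in o.visibility for o in options)


def help_lines(options, mode):
    """Assumption: only visible options that carry help text are printed."""
    return [f"{o.name}  {o.help_text}"
            for o in options
            if mode in o.visibility and o.help_text]
```

Under these assumptions both flags are accepted in the Flang modes, but only -gpulibc shows up in the help text checked by driver-help.f90 and driver-help-hidden.f90.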

@banach-space
Contributor

Makes sense to me, though this is not my area of expertise. Could you add a slightly more elaborate test? Perhaps something that would check the linker invocation?

Contributor

@jhuber6 jhuber6 left a comment

Accepting this with Fortran makes sense. This option basically controls whether or not the GPU toolchain will implicitly include the libcgpu.a static library via -lcgpu. It defaults to on if it finds the libc wrapper headers in the clang resource directory, lib/clang/18/include/llvm_libc_wrappers/llvm-libc-decls. I'm assuming that Fortran doesn't have this?

It's supposed to wrap around the C standard headers so the compiler knows that we have certain libc functions on the GPU. However, OpenMP will pretty much just assume anything referenced on the GPU is implicitly on the device so it will likely work for most functions without the wrapper headers. The important exception is stdout and friends. Because this is a global, OpenMP by default will try to map the host value rather than use the one present in libcgpu so we need to declare it on the GPU so it avoids the implicit map.

I'd be very interested in troubleshooting anything to get this working on Fortran.
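
As a rough illustration of the default described in this comment, here is a minimal Python sketch. It is not the driver code: both function names are hypothetical, and the rule that the last of -gpulibc/-nogpulibc on the command line wins is an assumption about how the paired flags are resolved.

```python
from pathlib import Path


def has_libc_wrappers(resource_dir):
    """True if the libc wrapper-header directory exists under the resource dir
    (for example lib/clang/18, per the path quoted in the comment above)."""
    decls = Path(resource_dir) / "include" / "llvm_libc_wrappers" / "llvm-libc-decls"
    return decls.is_dir()


def link_gpu_libc(args, wrappers_found):
    """Decide whether the toolchain should implicitly add -lcgpu.

    The default comes from header detection; an explicit flag overrides it.
    Assumption: when both flags appear, the last one wins.
    """
    enabled = wrappers_found
    for arg in args:
        if arg == "-gpulibc":
            enabled = True
        elif arg == "-nogpulibc":
            enabled = False
    return enabled
```

For example, `link_gpu_libc(["-gpulibc"], False)` models forcing the library on when the wrapper headers are absent, and `link_gpu_libc(["-nogpulibc"], True)` models turning it off in a build that has them.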

@jhuber6
Contributor

jhuber6 commented Jan 5, 2024

Makes sense to me, though this is not my area of expertise. Could you add a slightly more elaborate test? Perhaps something that would check the linker invocation?

I'm not familiar with how Fortran handles stuff here. It's tested in the clang portion at least. The handling of this is in CommonArgs somewhere I believe. If Fortran shares that it should be inherited, so it's at least tested in the clang version so it might be fine.

@agozillon
Contributor Author

I am gonna sign off for the weekend as it's quite late here, so I'll reply in a little more detail on Monday and update the PR further. I'd be happy to add a further Flang test, although I'm not too sure what it'd be, so suggestions are welcome.

I tested this with an out-of-tree build of GPU libc (basically two separate build directories) and found that -lcgpu wouldn't get the ordering right to link the library correctly against the Fortran runtime, so for this specific case of an out-of-tree build of GPU libc the option seemed to be the correct way to get it linked in the correct order. For the case where it is found in the correct directory, I didn't quite manage the perfect build recipe (suggestions welcome here as well) and I tend not to use the install option myself, but perhaps it would auto-detect for Flang as well! However, where it's a separately compiled and installed GPU libc, it might be nice to have this option activated for Flang as well, to make both methods of linking possible. However, I am a little bit of a driver and build environment/system noob, so I'll defer to everyone else's better judgement in this case!

@jhuber6
Contributor

jhuber6 commented Jan 5, 2024

I am gonna sign off for the weekend as it's quite late here, so I'll reply in a little more detail on Monday and update the PR further. I'd be happy to add a further Flang test, although I'm not too sure what it'd be, so suggestions are welcome.

I tested this with an out-of-tree build of GPU libc (basically two separate build directories) and found that -lcgpu wouldn't get the ordering right to link the library correctly against the Fortran runtime, so for this specific case of an out-of-tree build of GPU libc the option seemed to be the correct way to get it linked in the correct order. For the case where it is found in the correct directory, I didn't quite manage the perfect build recipe (suggestions welcome here as well) and I tend not to use the install option myself, but perhaps it would auto-detect for Flang as well! However, where it's a separately compiled and installed GPU libc, it might be nice to have this option activated for Flang as well, to make both methods of linking possible. However, I am a little bit of a driver and build environment/system noob, so I'll defer to everyone else's better judgement in this case!

If you have the static library, and it contains an entry for the desired architecture, it should just work so long as you're using the "new" driver pipeline. However, ordering is important here. It behaves similarly to the GNU BFD linker, where a static library is only checked against the current state of the symbol table as it reads the files in input order. So uses.o -lfoo will extract but -lfoo uses.o will not.

It's possible that this just was being linked too late with however Fortran handles it. I decided to be conservative with the default here because I'm assuming very few people will actually have the GPU libc.

It would be very interesting to see something like puts working from Fortran, so let me know if there's anything I can do to help.
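
The ordering rule described above can be sketched as a toy model. This is a simplified Python illustration, not how any real linker is implemented: `resolve` and the tuple encoding of objects and archives are hypothetical, and the symbol names are made up for the example.

```python
def resolve(inputs):
    """Process link inputs in command-line order.

    Each input is ("obj", defined_symbols, undefined_symbols) or
    ("lib", {member_name: (defined_symbols, undefined_symbols)}).
    An archive member is extracted only if it defines a symbol that is
    undefined at the moment the archive is read.
    Returns (extracted_member_names, symbols_still_undefined).
    """
    defined, undefined, extracted = set(), set(), []
    for item in inputs:
        if item[0] == "obj":
            _, defs, undefs = item
            defined |= defs
            undefined = (undefined | undefs) - defined
            continue
        _, members = item
        progress = True
        while progress:  # rescan: an extracted member may need another member
            progress = False
            for name, (defs, undefs) in members.items():
                if name not in extracted and defs & undefined:
                    extracted.append(name)
                    defined |= defs
                    undefined = (undefined | undefs) - defined
                    progress = True
    return extracted, undefined


# uses.o references memcpy; libfoo.a has one member that defines it.
uses_o = ("obj", {"main"}, {"memcpy"})
libfoo = ("lib", {"memcpy.o": ({"memcpy"}, set())})
```

In this model `resolve([uses_o, libfoo])` extracts `memcpy.o`, while `resolve([libfoo, uses_o])` extracts nothing and leaves `memcpy` undefined, matching the `uses.o -lfoo` versus `-lfoo uses.o` example.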

@banach-space
Contributor

Thanks for the discussion!

It defaults to on if it finds the libc wrapper headers in the clang resource directory, lib/clang/18/include/llvm_libc_wrappers/llvm-libc-decls. I'm assuming that Fortran doesn't have this?

It shouldn't, which means that the semantics of -gpulibc will be a bit different in Flang, right? That's something that could be tested.

I'm not familiar with how Fortran handles stuff here. It's tested in the clang portion at least. The handling of this is in CommonArgs somewhere I believe. If Fortran shares that it should be inherited, so it's at least tested in the clang version so it might be fine.

Some bits in "CommonArgs" will be shared, but we do specialise for Flang in various places. Also, tests in Clang check the driver in the "Clang" mode - it would be good to verify this option in the "Flang" mode as well. There's driver-help.f90, but it is not that helpful (it only makes sure that we don't pollute flang-new -help with options from Clang that are not supported).

However, I am a little bit of a driver and build environment/system noob

Not true, you've already landed a few patches :)

I'll defer to everyone else's better judgement in this case!

Replicating the following would be sufficient: https://github.com/llvm/llvm-project/blob/main/clang/test/Driver/openmp-offload-gpu.c#L392.

@agozillon
Contributor Author

Thanks for the discussion!

It defaults to on if it finds the libc wrapper headers in the clang resource directory, lib/clang/18/include/llvm_libc_wrappers/llvm-libc-decls. I'm assuming that Fortran doesn't have this?

It shouldn't, which means that the semantics of -gpulibc will be a bit different in Flang, right? That's something that could be tested.

I believe Flang inherits this functionality via the addOpenMPDeviceLibC function in CommonArgs.cpp, which, from what I can tell, gets called after the Fortran runtime libraries are added for each of the relevant ToolChains (gnu etc.)! It's also where the gpulibc/nogpulibc flags are handled. However, the desired library doesn't reside in those directories with a regular build command; you seem to need to add GPU libc specifically to your build options and then install the build into a directory! The build I've tested with is an amalgamation of Clang/OpenMP/Flang/GPU libc.

That is to say, the automatic find-and-include seems to work quite happily for Flang (at least when the whole host of projects are enabled and installed), but having the options available would still be desirable. The most important cases are being able to turn off GPU libc inclusion in an installed build and to turn it on in a regular non-installed build (provided it can be found in your environment's path). It just gives more flexibility to replicate what Clang has now.
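
To make the ordering point concrete, here is a minimal Python sketch of how the link inputs end up arranged when GPU libc is appended after the Fortran runtime libraries. It is an illustration only: `device_link_args` is a hypothetical helper, and the object and runtime library names used with it are placeholders rather than the real ones.

```python
def device_link_args(objects, fortran_runtime_libs, gpu_libc_enabled):
    """Assemble link inputs in the order described in the comment above."""
    args = list(objects)               # user objects come first
    args.extend(fortran_runtime_libs)  # then the Fortran runtime libraries
    if gpu_libc_enabled:
        # GPU libc goes last, so it is read after the runtime that needs it.
        args.append("-lcgpu")
    return args
```

With placeholder inputs, `device_link_args(["prog.o"], ["-lruntime"], True)` yields `["prog.o", "-lruntime", "-lcgpu"]`, so symbols such as memcpy that the runtime leaves undefined can still be resolved from the library that follows it.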

I'm not familiar with how Fortran handles stuff here. It's tested in the clang portion at least. The handling of this is in CommonArgs somewhere I believe. If Fortran shares that it should be inherited, so it's at least tested in the clang version so it might be fine.

Some bits in "CommonArgs" will be shared, but we do specialise for Flang in various places. Also, tests in Clang check the driver in the "Clang" mode - it would be good to verify this option in the "Flang" mode as well. There's driver-help.f90, but it is not that helpful (it only makes sure that we don't pollute flang-new -help with options from Clang that are not supported).

However, I am a little bit of a driver and build environment/system noob

Not true, you've already landed a few patches :)

True, thank you :-)!

I'll defer to everyone else's better judgement in this case!

Replicating the following would be sufficient: https://github.com/llvm/llvm-project/blob/main/clang/test/Driver/openmp-offload-gpu.c#L392.

Thank you, I'll add a similar test to the PR in the flang/test/Driver/omp-driver-offload.f90 test file. I believe this is still the closest equivalent we have to openmp-offload-gpu.c, but please do correct me if I am wrong!

@agozillon
Contributor Author

I am gonna sign off for the weekend as it's quite late here, so I'll reply in a little more detail on Monday and update the PR further. I'd be happy to add a further Flang test, although I'm not too sure what it'd be, so suggestions are welcome.

I tested this with an out-of-tree build of GPU libc (basically two separate build directories) and found that -lcgpu wouldn't get the ordering right to link the library correctly against the Fortran runtime, so for this specific case of an out-of-tree build of GPU libc the option seemed to be the correct way to get it linked in the correct order. For the case where it is found in the correct directory, I didn't quite manage the perfect build recipe (suggestions welcome here as well) and I tend not to use the install option myself, but perhaps it would auto-detect for Flang as well! However, where it's a separately compiled and installed GPU libc, it might be nice to have this option activated for Flang as well, to make both methods of linking possible. However, I am a little bit of a driver and build environment/system noob, so I'll defer to everyone else's better judgement in this case!

If you have the static library, and it contains an entry for the desired architecture, it should just work so long as you're using the "new" driver pipeline. However, ordering is important here. It behaves similarly to the GNU BFD linker, where a static library is only checked against the current state of the symbol table as it reads the files in input order. So uses.o -lfoo will extract but -lfoo uses.o will not.

It's possible that this just was being linked too late with however Fortran handles it. I decided to be conservative with the default here because I'm assuming very few people will actually have the GPU libc.

I think it might just have been a little too late, or perhaps I was doing it incorrectly, which is always a possibility. In either case, Flang can auto-include it similarly to Clang if the library resides in the correct directory, and if it isn't available in that directory, enabling these options still allows it to be linked into the runtime correctly when the correct one is specified on the command line! So I believe it will work fine in the above cases. Passing the library directly on the command line via -lcgpu just won't resolve the necessary calls in the Fortran runtime for now, unfortunately, but as the other two methods work fine it's not particularly necessary!

It would be very interesting to see something like puts working from Fortran, so let me know if there's anything I can do to help.

I'd love to try more complex programs in the near future that depend on more runtime features (in particular the stdout case you mentioned previously would be interesting to try, but I'm not too sure how the print functionality in flang-new actually works at the moment). For the moment this has primarily just been me trying to fix a test I came across that uses a Fortran runtime function on device without the user explicitly calling it (an assign operation that lowers to a runtime function call), and it works quite nicely with a Fortran runtime compiled for GPU and then linked against the GPU libc library (with some other changes not related to the driver or library)! I imagine we'll be running into more uses for the runtime on device in the near future to test.

@agozillon
Contributor Author

Added a similar test for the options to Flang's omp-driver-offload.f90 in the last commit. Happy to add more tests if desired; I just might need some suggestions if that's the case!

Contributor

@banach-space banach-space left a comment

LGTM, ta!

@agozillon
Contributor Author

Thank you very much for your time and review, @banach-space and @jhuber6. I'll land this tomorrow afternoon (for EU timezones) so that I can more easily babysit the buildbots on the very small chance something goes wrong.

justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
…ows linking of GPU LIBC for the fortran and OpenMP runtime (llvm#77135)
