[Clang] Put offloading globals in the `.llvm.rodata.offloading` section #111890

jhuber6 · 2024-10-10T18:45:43Z

Summary:
For our offloading entries, we currently store all the string names of
kernels that the runtime will need to load from the target executable.
These are available via pointer in the __tgt_offload_entry struct,
however this makes it difficult to obtain from the object itself. This
patch simply puts the strings in a named section so they can be easily
queried.

The motivation behind this is that when the linker wrapper is doing
linking, it wants to know which kernels the host executable is calling.
We could get this already via the .relaomp_offloading_entires
section and trawling through the string table, but that's quite annoying
and not portable. The follow-up to this should be to make the linker
wrapper get a list of all used symbols the device link job should count
as "needed" so we can handle static linking more directly.

Summary: For our offloading entries, we currently store all the string names of kernels that the runtime will need to load from the target executable. These are available via pointer in the `__tgt_offload_entry` struct, however this makes it difficult to obtain from the object itself. This patch simply puts the strings in a named section so they can be easily queried. The motivation behind this is that when the linker wrapper is doing linking, it wants to know which kernels the host executable is calling. We *could* get this already via the `.relaomp_offloading_entires` section and trawling through the string table, but that's quite annoying and not portable. The follow-up to this should be to make the linker wrapper get a list of all used symbols the device link job should count as "needed" so we can handle static linking more directly.

llvmbot · 2024-10-10T23:19:58Z

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
For our offloading entries, we currently store all the string names of
kernels that the runtime will need to load from the target executable.
These are available via pointer in the __tgt_offload_entry struct,
however this makes it difficult to obtain from the object itself. This
patch simply puts the strings in a named section so they can be easily
queried.

The motivation behind this is that when the linker wrapper is doing
linking, it wants to know which kernels the host executable is calling.
We could get this already via the .relaomp_offloading_entires
section and trawling through the string table, but that's quite annoying
and not portable. The follow-up to this should be to make the linker
wrapper get a list of all used symbols the device link job should count
as "needed" so we can handle static linking more directly.

Full diff: https://github.com/llvm/llvm-project/pull/111890.diff

2 Files Affected:

(modified) clang/test/CodeGenCUDA/offloading-entries.cu (+45-20)
(modified) llvm/lib/Frontend/Offloading/Utility.cpp (+8)

diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu
index ec21f018607ff0..259e3324e8ac94 100644
--- a/clang/test/CodeGenCUDA/offloading-entries.cu
+++ b/clang/test/CodeGenCUDA/offloading-entries.cu
@@ -15,48 +15,48 @@
 #include "Inputs/cuda.h"
 
 //.
-// CUDA: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"
+// CUDA: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00", section ".llvm.rodata.offloading", align 1
 // CUDA: @.offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
-// CUDA: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00"
+// CUDA: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00", section ".llvm.rodata.offloading", align 1
 // CUDA: @.offloading.entry._Z6kernelv = weak constant %struct.__tgt_offload_entry { ptr @_Z21__device_stub__kernelv, ptr @.offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
-// CUDA: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00"
+// CUDA: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00", section ".llvm.rodata.offloading", align 1
 // CUDA: @.offloading.entry.var = weak constant %struct.__tgt_offload_entry { ptr @var, ptr @.offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
-// CUDA: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00"
+// CUDA: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
 // CUDA: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.3, i64 4, i32 2, i32 1 }, section "cuda_offloading_entries", align 1
-// CUDA: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00"
+// CUDA: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
 // CUDA: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.4, i64 4, i32 3, i32 1 }, section "cuda_offloading_entries", align 1
 //.
-// HIP: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"
+// HIP: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00", section ".llvm.rodata.offloading", align 1
 // HIP: @.offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries", align 1
-// HIP: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00"
+// HIP: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00", section ".llvm.rodata.offloading", align 1
 // HIP: @.offloading.entry._Z6kernelv = weak constant %struct.__tgt_offload_entry { ptr @_Z6kernelv, ptr @.offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries", align 1
-// HIP: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00"
+// HIP: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00", section ".llvm.rodata.offloading", align 1
 // HIP: @.offloading.entry.var = weak constant %struct.__tgt_offload_entry { ptr @var, ptr @.offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1
-// HIP: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00"
+// HIP: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
 // HIP: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.3, i64 4, i32 2, i32 1 }, section "hip_offloading_entries", align 1
-// HIP: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00"
+// HIP: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
 // HIP: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.4, i64 4, i32 3, i32 1 }, section "hip_offloading_entries", align 1
 //.
-// CUDA-COFF: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"
+// CUDA-COFF: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00", section ".llvm.rodata.offloading", align 1
 // CUDA-COFF: @.offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1
-// CUDA-COFF: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00"
+// CUDA-COFF: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00", section ".llvm.rodata.offloading", align 1
 // CUDA-COFF: @.offloading.entry._Z6kernelv = weak constant %struct.__tgt_offload_entry { ptr @_Z21__device_stub__kernelv, ptr @.offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1
-// CUDA-COFF: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00"
+// CUDA-COFF: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00", section ".llvm.rodata.offloading", align 1
 // CUDA-COFF: @.offloading.entry.var = weak constant %struct.__tgt_offload_entry { ptr @var, ptr @.offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1
-// CUDA-COFF: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00"
+// CUDA-COFF: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
 // CUDA-COFF: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.3, i64 4, i32 2, i32 1 }, section "cuda_offloading_entries$OE", align 1
-// CUDA-COFF: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00"
+// CUDA-COFF: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
 // CUDA-COFF: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.4, i64 4, i32 3, i32 1 }, section "cuda_offloading_entries$OE", align 1
 //.
-// HIP-COFF: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00"
+// HIP-COFF: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00", section ".llvm.rodata.offloading", align 1
 // HIP-COFF: @.offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1
-// HIP-COFF: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00"
+// HIP-COFF: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00", section ".llvm.rodata.offloading", align 1
 // HIP-COFF: @.offloading.entry._Z6kernelv = weak constant %struct.__tgt_offload_entry { ptr @_Z6kernelv, ptr @.offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1
-// HIP-COFF: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00"
+// HIP-COFF: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00", section ".llvm.rodata.offloading", align 1
 // HIP-COFF: @.offloading.entry.var = weak constant %struct.__tgt_offload_entry { ptr @var, ptr @.offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1
-// HIP-COFF: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00"
+// HIP-COFF: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
 // HIP-COFF: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.3, i64 4, i32 2, i32 1 }, section "hip_offloading_entries$OE", align 1
-// HIP-COFF: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00"
+// HIP-COFF: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
 // HIP-COFF: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.4, i64 4, i32 3, i32 1 }, section "hip_offloading_entries$OE", align 1
 //.
 // CUDA-LABEL: @_Z18__device_stub__foov(
@@ -137,3 +137,28 @@ template <typename T, int dim = 1, int mode = 0>
 struct __attribute__((device_builtin_texture_type)) texture : public textureReference {};
 
 texture<void> tex;
+//.
+// CUDA: [[META0:![0-9]+]] = !{ptr @.offloading.entry_name}
+// CUDA: [[META1:![0-9]+]] = !{ptr @.offloading.entry_name.1}
+// CUDA: [[META2:![0-9]+]] = !{ptr @.offloading.entry_name.2}
+// CUDA: [[META3:![0-9]+]] = !{ptr @.offloading.entry_name.3}
+// CUDA: [[META4:![0-9]+]] = !{ptr @.offloading.entry_name.4}
+//.
+// HIP: [[META0:![0-9]+]] = !{ptr @.offloading.entry_name}
+// HIP: [[META1:![0-9]+]] = !{ptr @.offloading.entry_name.1}
+// HIP: [[META2:![0-9]+]] = !{ptr @.offloading.entry_name.2}
+// HIP: [[META3:![0-9]+]] = !{ptr @.offloading.entry_name.3}
+// HIP: [[META4:![0-9]+]] = !{ptr @.offloading.entry_name.4}
+//.
+// CUDA-COFF: [[META0:![0-9]+]] = !{ptr @.offloading.entry_name}
+// CUDA-COFF: [[META1:![0-9]+]] = !{ptr @.offloading.entry_name.1}
+// CUDA-COFF: [[META2:![0-9]+]] = !{ptr @.offloading.entry_name.2}
+// CUDA-COFF: [[META3:![0-9]+]] = !{ptr @.offloading.entry_name.3}
+// CUDA-COFF: [[META4:![0-9]+]] = !{ptr @.offloading.entry_name.4}
+//.
+// HIP-COFF: [[META0:![0-9]+]] = !{ptr @.offloading.entry_name}
+// HIP-COFF: [[META1:![0-9]+]] = !{ptr @.offloading.entry_name.1}
+// HIP-COFF: [[META2:![0-9]+]] = !{ptr @.offloading.entry_name.2}
+// HIP-COFF: [[META3:![0-9]+]] = !{ptr @.offloading.entry_name.3}
+// HIP-COFF: [[META4:![0-9]+]] = !{ptr @.offloading.entry_name.4}
+//.
diff --git a/llvm/lib/Frontend/Offloading/Utility.cpp b/llvm/lib/Frontend/Offloading/Utility.cpp
index 010c0bfd3be76b..7a0a7afcfcb5c9 100644
--- a/llvm/lib/Frontend/Offloading/Utility.cpp
+++ b/llvm/lib/Frontend/Offloading/Utility.cpp
@@ -53,7 +53,15 @@ offloading::getOffloadingEntryInitializer(Module &M, Constant *Addr,
   auto *Str =
       new GlobalVariable(M, AddrName->getType(), /*isConstant=*/true,
                          GlobalValue::InternalLinkage, AddrName, Prefix);
+  StringRef SectionName = ".llvm.rodata.offloading";
   Str->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
+  Str->setSection(SectionName);
+  Str->setAlignment(Align(1));
+
+  // Make a metadata node for these constants so it can be queried from IR.
+  NamedMDNode *MD = M.getOrInsertNamedMetadata("llvm.offloading.symbols");
+  Metadata *MDVals[] = {ConstantAsMetadata::get(Str)};
+  MD->addOperand(llvm::MDNode::get(M.getContext(), MDVals));
 
   // Construct the offloading entry.
   Constant *EntryData[] = {

Summary: Standard archive linking semantics dictate that archive members are only extracted if they are used, i.e. define a currently undefined symbol. This poses some issues for offloading langues because there can be a kernel called `foo` in a device archive. The CPU archive is extracted but `foo` is never referenced on the device so it is not extracted. This patch augments this behavior to consider all symbols used by the offloading toolchain as 'undefined' so they will extract archives. We do this by getting a list of all the kernel names the runtime will call from the offloading entires list. Not that this does not bother checking if the *host* side will actually use the archive, so the case where a kernel is not used by the static library will still cause it to extract. I could fix this but it felt like a lot of effort for little gain. Previously this was solved by forcibly extracting anything with public visibility. This means that we can now link device code with externally visible symbols and don't need to worry about it always extracting as well. Depends on llvm#111890

shiltian

It looks reasonable to me.

jhuber6 · 2024-10-11T01:21:42Z

It looks reasonable to me.

Did you mean to approve? Also I'm undecided on the name. I was going to use llvm.offloading.symbols but then I remembered that I reserved llvm.offloading as a prefix for the LLVM_OFFLOADING section type which denotes embedded code.

jplehr · 2024-10-11T11:46:44Z

Would there be any reason to put entries for different offloading languages into distinctly named “sub entries”?

jhuber6 · 2024-10-18T14:28:34Z

Would there be any reason to put entries for different offloading languages into distinctly named “sub entries”?

Not really, this is just a list of all the device-side symbols the host might want, that would make it more difficult to later extract.

Also ping.

jhuber6 · 2024-10-25T16:02:17Z

ping

yxsamliu

LGTM

…on (llvm#111890) Summary: For our offloading entries, we currently store all the string names of kernels that the runtime will need to load from the target executable. These are available via pointer in the `__tgt_offload_entry` struct, however this makes it difficult to obtain from the object itself. This patch simply puts the strings in a named section so they can be easily queried. The motivation behind this is that when the linker wrapper is doing linking, it wants to know which kernels the host executable is calling. We *could* get this already via the `.relaomp_offloading_entires` section and trawling through the string table, but that's quite annoying and not portable. The follow-up to this should be to make the linker wrapper get a list of all used symbols the device link job should count as "needed" so we can handle static linking more directly.

jhuber6 requested review from Artem-B, jdoerfert, JonChesterfield, shiltian and yxsamliu October 10, 2024 18:45

jhuber6 force-pushed the offloadstrings branch from 49550ad to 13967a7 Compare October 10, 2024 21:26

jhuber6 force-pushed the offloadstrings branch from 13967a7 to db58755 Compare October 10, 2024 23:19

llvmbot added the clang Clang issues not falling into any other category label Oct 10, 2024

jhuber6 mentioned this pull request Oct 10, 2024

[LinkerWrapper] Extract device archives if symbol is used on the host #111921

Open

shiltian reviewed Oct 11, 2024

View reviewed changes

jhuber6 requested a review from jplehr October 21, 2024 16:43

yxsamliu approved these changes Oct 25, 2024

View reviewed changes

jhuber6 merged commit 42eb54b into llvm:main Oct 28, 2024
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Clang] Put offloading globals in the `.llvm.rodata.offloading` section #111890

[Clang] Put offloading globals in the `.llvm.rodata.offloading` section #111890

Uh oh!

jhuber6 commented Oct 10, 2024

Uh oh!

llvmbot commented Oct 10, 2024

Uh oh!

shiltian left a comment

Uh oh!

jhuber6 commented Oct 11, 2024

Uh oh!

jplehr commented Oct 11, 2024

Uh oh!

jhuber6 commented Oct 18, 2024

Uh oh!

jhuber6 commented Oct 25, 2024

Uh oh!

yxsamliu left a comment

Uh oh!

Uh oh!

Uh oh!

[Clang] Put offloading globals in the .llvm.rodata.offloading section #111890

[Clang] Put offloading globals in the .llvm.rodata.offloading section #111890

Uh oh!

Conversation

jhuber6 commented Oct 10, 2024

Uh oh!

llvmbot commented Oct 10, 2024

Uh oh!

shiltian left a comment

Choose a reason for hiding this comment

Uh oh!

jhuber6 commented Oct 11, 2024

Uh oh!

jplehr commented Oct 11, 2024

Uh oh!

jhuber6 commented Oct 18, 2024

Uh oh!

jhuber6 commented Oct 25, 2024

Uh oh!

yxsamliu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

[Clang] Put offloading globals in the `.llvm.rodata.offloading` section #111890

[Clang] Put offloading globals in the `.llvm.rodata.offloading` section #111890