Skip to content

[clang][CodeGen] used globals are fake #93601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 7 additions & 4 deletions clang/lib/CodeGen/CodeGenModule.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2922,18 +2922,21 @@ static void emitUsed(CodeGenModule &CGM, StringRef Name,
if (List.empty())
return;

// We unconditionally use unqualified pointers - this is intentional as these
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We unconditionally use unqualified pointers - this is intentional as these
// We unconditionally use AS0 pointers - this is intentional as these

Since this refers to LLVM and not Clang pointer types I'd use the address space terminology.

// are fake / ephemeral globals, serving only as lifetime extension hooks
llvm::Type *UsedPtrTy = llvm::PointerType::getUnqual(CGM.getLLVMContext());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to just use get(Ctx, 0)


// Convert List to what ConstantArray needs.
SmallVector<llvm::Constant*, 8> UsedArray;
UsedArray.resize(List.size());
for (unsigned i = 0, e = List.size(); i != e; ++i) {
UsedArray[i] =
llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
cast<llvm::Constant>(&*List[i]), CGM.Int8PtrTy);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused on the other thread. Is Int8PtrTy not always AS 0 in this case? I think always emitting used/compiler.used as a hardcoded addrspace 0 (IR 0, having nothing to do with the language or LangAS mapping) would be fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Int8PtrTy is (now, after the change in the other thread) whatever LangAS::Default ends up mapping to. The idea being that the target would pick / reflect a contextually sane mapping. The one worry I have re: doing Unqual (just 0) here is that I suspect (but would have to check) that we'll then have to deal with invalid casts, where you have the target's 0 being e.g. private, and this line would try to ASCast from a Constant that'd be in global (for e.g. the SPIRV target this'd actually happen, global is 1 and private is 0). Or are you thinking we'd populate these differently, somehow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purpose of used/compiler_used, I think it's best to ignore the language address space map (reiterating that the IR address space has little to do with the language). Just treat it as a special global that always uses addrspace(0). The required addrspacecasts will be inserted to 0, so it just leaves this as the status quo. It doesn't really matter if the target considers the casts valid or not, the used variables never codegened.

If we really had to go out of our way to avoid illegal constant expression casts, we'd probably have to have multiple used intrinsic variables per address space. In a better future we wouldn't allow constantexpr addrspacecasts, and then we'd also need to come up with another solution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm still struggling to see why it is best create (possibly) broken IR, super heroic "but let us build an even more complicated devices like multiple intrinsics per AS, and also redo how ASCasts work etc." and rely on implicit knowledge about what happens with used and compiler.used (ah, because these are never CodeGen-ed, it's fine to do whatever). What's the actual benefit here? Why not make it a special global that uses AS 42, since that'd actually signal it's special, as opposed to Unqual/0, which do the exact contrary. I'm also struggling to see the backwards compat angle since we do not guarantee IR-level compat between LLVM versions (yes it frequently works but it's not guaranteed to).

Copy link
Contributor

@arsenm arsenm May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm still struggling to see why it is best create (possibly) broken IR

It's not broken. It just needs to provide a use, and whatever casts are there do not matter. We can write whatever rules we want and have the verifier enforce it. Currently we don't have any generic concept of "illegal addrspacecasts" (relatedly we should probably stop throwing codegen errors on the cases we don't handle, and just lower them to poison)

Why not make it a special global that uses AS 42

Behavior of address spaces are target defined. We don't want to just grab random numbers for generic purposes. We've gradually been migrating away from 0 being special in more contexts, but nothing has been done for used (and I don't see a particularly compelling reason to do, used just needs to be a box that is a use)

IR-level compat between LLVM versions (yes it frequently works but it's not guaranteed to).

We do guarantee forward compatible bitcode. We've just done a bad job at it for some target stuff, particularly in cases where subtarget feature names have leaked into the IR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I still don't see the actual benefit being spelled out, I might be missing the obvious. I'll try again: I'm the FE (this is in Clang). I am emitting a global. Why would I do anything special here, versus emitting any other global, what am I gaining by not doing just that/losing by actually doing that. Surely it's straightforward to explain?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have HIP apps which ship bitcode library which may contain llvm.compiler.used or llvm.global_ctors which use addr space 0. If we switch to use addr space 1, the new HIP app may not be able to link with the existing bitcode libraries.

My understanding is that the addr space used in llvm.compiler.used or llvm.global_ctors do not really have meanings since no one loads or stores to them. As long as they are the same for different TU's it is fine. It should be OK to use addr space 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You aren't emitting a real global, it's this fake special case IR construct. It doesn't fit into the boxes provided by the target global/program/alloca address spaces

UsedArray[i] = llvm::ConstantExpr::getPointerCast(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we should get rid of it, getPointerBitCastOrAddrSpaceCast is the correct helper for now. We should have a getAddrSpaceCast that doesn't throw a fit on same typed pointers but getPointerCast will try to insert ptrtoint etc.

cast<llvm::Constant>(&*List[i]), UsedPtrTy);
}

if (UsedArray.empty())
return;
llvm::ArrayType *ATy = llvm::ArrayType::get(CGM.Int8PtrTy, UsedArray.size());
llvm::ArrayType *ATy = llvm::ArrayType::get(UsedPtrTy, UsedArray.size());

auto *GV = new llvm::GlobalVariable(
CGM.getModule(), ATy, false, llvm::GlobalValue::AppendingLinkage,
Expand Down
2 changes: 2 additions & 0 deletions clang/test/CodeGen/2005-12-04-AttributeUsed.c
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
// RUN: %clang_cc1 %s -triple x86_64-apple-darwin -emit-llvm -o - | FileCheck %s
// RUN: %clang_cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm -o - | FileCheck %s --check-prefix=GLOBALAS

// CHECK: @llvm.used = appending global [2 x ptr] [ptr @foo, ptr @X], section "llvm.metadata"
// GLOBALAS: @llvm.compiler.used = appending addrspace(1) global [2 x ptr] [ptr @foo, ptr addrspacecast (ptr addrspace(1) @X to ptr)], section "llvm.metadata"
int X __attribute__((used));
int Y;

Expand Down
4 changes: 3 additions & 1 deletion clang/test/CodeGen/attr-retain.c
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
// RUN: %clang_cc1 -emit-llvm -triple x86_64 %s -o - | FileCheck %s
// RUN: %clang_cc1 -emit-llvm -triple amdgcn-amd-amdhsa %s -o - | FileCheck %s --check-prefix=GLOBALAS

/// Set !retain regardless of the target. The backend will lower !retain to
/// SHF_GNU_RETAIN on ELF and ignore the metadata for other binary formats.
Expand All @@ -11,7 +12,8 @@

// CHECK: @llvm.used = appending global [8 x ptr] [ptr @c0, ptr @foo.l0, ptr @f0, ptr @f2, ptr @g0, ptr @g1, ptr @g3, ptr @g4], section "llvm.metadata"
// CHECK: @llvm.compiler.used = appending global [3 x ptr] [ptr @f2, ptr @g3, ptr @g4], section "llvm.metadata"

// GLOBALAS: @llvm.used = appending addrspace(1) global [8 x ptr] [ptr addrspacecast (ptr addrspace(4) @c0 to ptr), ptr addrspacecast (ptr addrspace(1) @foo.l0 to ptr), ptr @f0, ptr @f2, ptr addrspacecast (ptr addrspace(1) @g0 to ptr), ptr addrspacecast (ptr addrspace(1) @g1 to ptr), ptr addrspacecast (ptr addrspace(1) @g3 to ptr), ptr addrspacecast (ptr addrspace(1) @g4 to ptr)], section "llvm.metadata"
// GLOBALAS: appending addrspace(1) global [3 x ptr] [ptr @f2, ptr addrspacecast (ptr addrspace(1) @g3 to ptr), ptr addrspacecast (ptr addrspace(1) @g4 to ptr)], section "llvm.metadata"
const int c0 __attribute__((retain)) = 42;

void foo(void) {
Expand Down
2 changes: 2 additions & 0 deletions clang/test/CodeGenCXX/attr-retain.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
// RUN: %clang_cc1 -emit-llvm -triple x86_64 -Werror %s -o - | FileCheck %s
// RUN: %clang_cc1 -emit-llvm -triple amdgcn-amd-amdhsa -Werror %s -o - | FileCheck %s --check-prefix=GLOBALAS

// CHECK: @llvm.used = appending global [7 x ptr]
// GLOBALAS: @llvm.used = appending addrspace(1) global [7 x ptr]
// CHECK-SAME: @_ZN2X0C2Ev
// CHECK-SAME: @_ZN2X0C1Ev
// CHECK-SAME: @_ZN2X0D2Ev
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ template<> int A<bool>::a = 10;
// MACHO: { i32, ptr, ptr } { i32 65535, ptr @[[unordered22:[^,]*]], ptr null },

// ELF: { i32, ptr, ptr } { i32 65535, ptr @[[unordered21:[^,]*]], ptr @_ZN4Fib2ILi5EE1aE },
// MACHO: { i32, ptr, ptr } { i32 65535, ptr @[[unordered21:[^,]*]], ptr null },
// MACHO: { i32, ptr, ptr } { i32 65535, ptr @[[unordered21:[^,]*]], ptr null },

// ALL: { i32, ptr, ptr } { i32 65535, ptr @_GLOBAL__sub_I_static_member_variable_explicit_specialization.cpp, ptr null }]

Expand Down
45 changes: 24 additions & 21 deletions llvm/docs/LangRef.rst
Original file line number Diff line number Diff line change
Expand Up @@ -354,7 +354,7 @@ added in the future:
not be used lightly but only for specific situations such as an
alternative to the *register pinning* performance technique often
used when implementing functional programming languages. At the
moment only X86, AArch64, and RISCV support this convention. The
moment only X86, AArch64, and RISCV support this convention. The
following limitations exist:

- On *X86-32* only up to 4 bit type parameters are supported. No
Expand Down Expand Up @@ -685,10 +685,10 @@ implementation defined, the optimizer can't do the latter. The former is
challenging as many commonly expected properties, such as
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
Similar restrictions apply to intrinsics that might examine the pointer bits,
such as :ref:`llvm.ptrmask<int_ptrmask>`.
such as :ref:`llvm.ptrmask<int_ptrmask>`.

The alignment information provided by the frontend for a non-integral pointer
(typically using attributes or metadata) must be valid for every possible
(typically using attributes or metadata) must be valid for every possible
representation of the pointer.

.. _globalvars:
Expand Down Expand Up @@ -1649,10 +1649,10 @@ Currently, only the following parameter attributes are defined:
- Both ``a`` and ``b`` are constants.
- The range is allowed to wrap.
- The range should not represent the full or empty set. That is, ``a!=b``.
This attribute may only be applied to parameters or return values with integer

This attribute may only be applied to parameters or return values with integer
or vector of integer types.

For vector-typed parameters, the range is applied element-wise.

.. _gc:
Expand Down Expand Up @@ -8642,8 +8642,11 @@ The '``llvm.used``' Global Variable
The ``@llvm.used`` global is an array which has
:ref:`appending linkage <linkage_appending>`. This array contains a list of
pointers to named global variables, functions and aliases which may optionally
have a pointer cast formed of bitcast or getelementptr. For example, a legal
use of it is:
have a pointer cast formed of bitcast or getelementptr. The address space of the
pointers is always unspecified, rather than the globals address space, since
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not say unspecified. Say it is the default address space. 0 is not an unspecified address space

these are not real global references but rather a special marker for the code
emission logic.
For example, a legal use of it is:

.. code-block:: llvm

Expand Down Expand Up @@ -14254,7 +14257,7 @@ Arguments:
""""""""""
The first 4 arguments are similar to ``llvm.instrprof.increment``. The indexing
is specific to callsites, meaning callsites are indexed from 0, independent from
the indexes used by the other intrinsics (such as
the indexes used by the other intrinsics (such as
``llvm.instrprof.increment[.step]``).

The last argument is the called value of the callsite this intrinsic precedes.
Expand All @@ -14268,7 +14271,7 @@ a buffer LLVM can use to perform counter increments (i.e. the lowering of
``llvm.instrprof.increment[.step]``. The address range following the counter
buffer, ``<num-counters>`` x ``sizeof(ptr)`` - sized, is expected to contain
pointers to contexts of functions called from this function ("subcontexts").
LLVM does not dereference into that memory region, just calculates GEPs.
LLVM does not dereference into that memory region, just calculates GEPs.

The lowering of ``llvm.instrprof.callsite`` consists of:

Expand Down Expand Up @@ -14888,8 +14891,8 @@ integer bit width or any vector of integer elements.
Overview:
"""""""""

Return ``-1`` if ``%a`` is signed less than ``%b``, ``0`` if they are equal, and
``1`` if ``%a`` is signed greater than ``%b``. Vector intrinsics operate on a per-element basis.
Return ``-1`` if ``%a`` is signed less than ``%b``, ``0`` if they are equal, and
``1`` if ``%a`` is signed greater than ``%b``. Vector intrinsics operate on a per-element basis.

Arguments:
""""""""""
Expand Down Expand Up @@ -14917,8 +14920,8 @@ integer bit width or any vector of integer elements.
Overview:
"""""""""

Return ``-1`` if ``%a`` is unsigned less than ``%b``, ``0`` if they are equal, and
``1`` if ``%a`` is unsigned greater than ``%b``. Vector intrinsics operate on a per-element basis.
Return ``-1`` if ``%a`` is unsigned less than ``%b``, ``0`` if they are equal, and
``1`` if ``%a`` is unsigned greater than ``%b``. Vector intrinsics operate on a per-element basis.

Arguments:
""""""""""
Expand Down Expand Up @@ -20980,9 +20983,9 @@ Semantics:
""""""""""

The '``llvm.vp.minimum``' intrinsic performs floating-point minimum (:ref:`minimum <i_minimum>`)
of the first and second vector operand on each enabled lane, the result being
of the first and second vector operand on each enabled lane, the result being
NaN if either operand is a NaN. -0.0 is considered to be less than +0.0 for this
intrinsic. The result on disabled lanes is a :ref:`poison value <poisonvalues>`.
intrinsic. The result on disabled lanes is a :ref:`poison value <poisonvalues>`.
The operation is performed in the default floating-point environment.

Examples:
Expand Down Expand Up @@ -21030,9 +21033,9 @@ Semantics:
""""""""""

The '``llvm.vp.maximum``' intrinsic performs floating-point maximum (:ref:`maximum <i_maximum>`)
of the first and second vector operand on each enabled lane, the result being
of the first and second vector operand on each enabled lane, the result being
NaN if either operand is a NaN. -0.0 is considered to be less than +0.0 for this
intrinsic. The result on disabled lanes is a :ref:`poison value <poisonvalues>`.
intrinsic. The result on disabled lanes is a :ref:`poison value <poisonvalues>`.
The operation is performed in the default floating-point environment.

Examples:
Expand Down Expand Up @@ -28309,7 +28312,7 @@ Semantics:
""""""""""

The intrinsic ``@llvm.allow.ubsan.check()`` returns either ``true`` or
``false``, depending on compiler options.
``false``, depending on compiler options.

For each evaluation of a call to this intrinsic, the program must be valid and
correct both if it returns ``true`` and if it returns ``false``.
Expand Down Expand Up @@ -28368,13 +28371,13 @@ Semantics:
""""""""""

The intrinsic ``@llvm.allow.runtime.check()`` returns either ``true`` or
``false``, depending on compiler options.
``false``, depending on compiler options.

For each evaluation of a call to this intrinsic, the program must be valid and
correct both if it returns ``true`` and if it returns ``false``.

When used in a branch condition, it allows us to choose between
two alternative correct solutions for the same problem.
two alternative correct solutions for the same problem.

If the intrinsic is evaluated as ``true``, program should execute a guarded
check. If the intrinsic is evaluated as ``false``, the program should avoid any
Expand Down
Loading