-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[DataLayout][LangRef] Split non-integral and unstable pointer properties #105735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: users/arichardson/spr/main.datalayoutlangref-split-non-integral-and-unstable-pointer-properties
Are you sure you want to change the base?
Changes from all commits
90068f1
e4bd118
35afb97
db97145
94ecfa3
278ce21
df9bdfe
7615db9
142a3ff
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -650,48 +650,127 @@ literal types are uniqued in recent versions of LLVM. | |
|
||
.. _nointptrtype: | ||
|
||
Non-Integral Pointer Type | ||
------------------------- | ||
Non-Integral and Unstable Pointer Types | ||
--------------------------------------- | ||
|
||
Note: non-integral pointer types are a work in progress, and they should be | ||
considered experimental at this time. | ||
Note: non-integral/unstable pointer types are a work in progress, and they | ||
should be considered experimental at this time. | ||
|
||
LLVM IR optionally allows the frontend to denote pointers in certain address | ||
spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`. | ||
Non-integral pointer types represent pointers that have an *unspecified* bitwise | ||
representation; that is, the integral representation may be target dependent or | ||
unstable (not backed by a fixed integer). | ||
spaces as "non-integral" or "unstable" (or both "non-integral" and "unstable") | ||
via the :ref:`datalayout string<langref_datalayout>`. | ||
|
||
The exact implications of these properties are target-specific, but the | ||
following IR semantics and restrictions to optimization passes apply: | ||
|
||
Unstable pointer representation | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Pointers in this address space have an *unspecified* bitwise representation | ||
(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is | ||
allowed to change in a target-specific way. For example, this could be a pointer | ||
arichardson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
type used with copying garbage collection where the garbage collector could | ||
update the pointer at any time in the collection sweep. | ||
|
||
``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for | ||
integral (i.e. normal) pointers in that they convert integers to and from | ||
corresponding pointer types, but there are additional implications to be | ||
aware of. Because the bit-representation of a non-integral pointer may | ||
not be stable, two identical casts of the same operand may or may not | ||
corresponding pointer types, but there are additional implications to be aware | ||
of. | ||
|
||
For "unstable" pointer representations, the bit-representation of the pointer | ||
may not be stable, so two identical casts of the same operand may or may not | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this applies to only an SSA value of an unstable pointer type? What about an in-memory value with the unstable pointer type? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not familiar with how GC pointers are used in LLVM, I just tried to split out the existing "copying GC" non-integral pointers properties into a separate property to allow for "fat pointers", CHERI capabilities, etc to use non-integral pointers without incurring all the restrictions imposed by GC pointers. Not sure who is best to comment on this, probably someone from azul who has worked on it recently. |
||
return the same value. Said differently, the conversion to or from the | ||
non-integral type depends on environmental state in an implementation | ||
"unstable" pointer type depends on environmental state in an implementation | ||
defined manner. | ||
|
||
If the frontend wishes to observe a *particular* value following a cast, the | ||
generated IR must fence with the underlying environment in an implementation | ||
defined manner. (In practice, this tends to require ``noinline`` routines for | ||
such operations.) | ||
|
||
From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for | ||
non-integral types are analogous to ones on integral types with one | ||
"unstable" pointer types are analogous to ones on integral types with one | ||
key exception: the optimizer may not, in general, insert new dynamic | ||
occurrences of such casts. If a new cast is inserted, the optimizer would | ||
need to either ensure that a) all possible values are valid, or b) | ||
appropriate fencing is inserted. Since the appropriate fencing is | ||
implementation defined, the optimizer can't do the latter. The former is | ||
challenging as many commonly expected properties, such as | ||
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types. | ||
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types. | ||
Similar restrictions apply to intrinsics that might examine the pointer bits, | ||
such as :ref:`llvm.ptrmask<int_ptrmask>`. | ||
|
||
The alignment information provided by the frontend for a non-integral pointer | ||
The alignment information provided by the frontend for an "unstable" pointer | ||
(typically using attributes or metadata) must be valid for every possible | ||
representation of the pointer. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is non-integral the right term for something that is more than just an integer? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Naming is hard - I kept this pre-existing name since it can also be interpreted as not just an integer, i.e. it can be anything else (such as integer+metadata). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, just to toss out a drive-by name suggestion (though I'm fine with keeping non-integral): how about "annotated" pointers? That is, the pointer does (without unstable) have a fixed representation and point to some address, but there are bits in that representation that "annotate" the address, and so |
||
Non-integral pointer representation | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
arichardson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Pointers are not represented as just an address, but may instead include | ||
additional metadata such as bounds information or a temporal identifier. | ||
Examples include AMDGPU buffer descriptors with a 128-bit fat pointer and a | ||
arichardson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
32-bit offset, or CHERI capabilities that contain bounds, permissions and a | ||
type field (as well as an out-of-band validity bit, see next section). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least with CHERI one can turn an integer into a pointer, it's just not a valid pointer (i.e. things like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've made this a bit more explicit, let me know what you think. |
||
In general, valid non-integral pointers cannot becreated from just an integer | ||
value: while ``inttoptr`` yields a deterministic bitwise pattern, the resulting | ||
value is not guaranteed to be a valid dereferenceable pointer. | ||
|
||
In most cases pointers with a non-integral representation behave exactly the | ||
same as an integral pointer, the only difference is that it is not possible to | ||
create a pointer just from an address. | ||
|
||
"Non-integral" pointers also impose restrictions on transformation passes, but | ||
in general these are less restrictive than for "unstable" pointers. The main | ||
difference compared to integral pointers is that ``inttoptr`` instructions | ||
should not be inserted by passes as they may not be able to create a valid | ||
pointer. This property also means that ``inttoptr(ptrtoint(x))`` cannot be | ||
arichardson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
folded to ``x`` as the ``ptrtoint`` operation may destroy the necessary metadata | ||
to reconstruct the pointer. | ||
arichardson marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having let this bit of discussion from the RFC percolate, I think that imposing this property on non-integral/annotated pointers is fine, and that, while (current) AMDGPU does allow There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ... But, having read below, should it be clarified that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the real problem here is that the semantics of ptrtoint are very vague and need to be clarified. It depends on whether ptrtoint returns "some integer derived from ptr, e.g. raw bitwise representation" or the "underlying address of the pointer". I always interpreted it as the latter, which would mean e.g. for a 128-bit CHERI capability ptrtoint only gives you the low 64 address bits ands strips metadata. For AMDGPU fat pointers I imagine it would need to return base+offset. The problem with the "raw bitwise interpretation" is that pointer comparisons of one-past-the end vs start of next object should return true, i.e. @nikic I imagine clarifying the ptrtoint semantics warrants another wider discussion on discourse? I can send one out later this week unless there is already (undocumented) consensus on the current semantics. If we instead assume There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My theory of And in the context of annotated pointers, I'd argue that, if the notion of "which object you're in" is part of the pointer, than one-past-the-end is not equal to the next object. That is, given that I've done %x = ptr addrspace(1) ...
%y = ptr addrspace(1) [%x + 64 bytes, distinct object]
%x8 = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %x, 0, /*numRecords=*/64, ...)
%y8 = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %y, 0, N, ...)
%x7 = addrspacecast ptr addrspace(8) %x8 to ptr addrspace(7)
%y7 = addrspacecast ptr addrspace(8) %y8 to ptr addrspace(7)
%y.via.x = getelementptr i8, ptr addrspace(7) %x7, i32 64
%r = icmp ptr addrspace(7) eq %y.via.x, y7 should have Additionally, even if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've sent this question as https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987 for wider visibility. |
||
|
||
arichardson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Unlike "unstable" pointers, the bit-wise representation is stable and | ||
``ptrtoint(x)`` always yields a deterministic value. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This last point seems odd; it's not the concern of the user of llvm.memcpy/memmove how they're implemented, and it would be a bug were something to lower such a legitimate use to the wrong thing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well I believe sometimes memcpy/memmove is lowered to loads/stores in IR and this would now be illegal. I think we would need to tag these llvm.memcpy calls to prevent this transformation. |
||
This means transformation passes are still permitted to insert new ``ptrtoint`` | ||
instructions. | ||
However, it is important to note that ``ptrtoint`` may not yield the same value | ||
arichardson marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ... Wait, hold on, I thought one of the firmer outcomes of the big ptrtoint semantics thread is that That is
is exactly
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, apologies for the delay here - I need to get around to rebasing my changes on top of the outcome of the discussion. I hope to have something next week. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No problem, just wanted to flag that (Also, re that discussion - it might be good to get your thoughts on the |
||
as storing the pointer via memory and reading it back as an integer, even if the | ||
arichardson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
bitwidth of the two types matches (since ptrtoint could involve some form of | ||
arithmetic or strip parts of the non-integral pointer representation). | ||
|
||
Non-integral pointers with external state | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The former is presumably inherited from ptrtoint on "normal" pointers, with only the latter being new? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure the via-memory provenance semantics are actually well defined for normal pointers, but yes the latter is new and I am not sure if that property needs to be split out: software-only fat pointers could be copied byte-by-byte but capabilities cannot. |
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
A special case of non-integral pointers is ones that include external state | ||
(such as implicit bounds information or a type tag) with a target-defined size. | ||
An example of such a type is a CHERI capability, where there is an additional | ||
validity bit that is part of all pointer-typed registers, but is located in | ||
memory at an implementation-defined address separate from the pointer itself. | ||
Another example would be a fat-pointer scheme where pointers remain plain | ||
integers, but the associated bounds are stored in an out-of-band table. | ||
|
||
When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointers | ||
is performed, the external metadata is also stored to another location. | ||
Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the | ||
external metadata and make it available for all uses of ``%val``. | ||
Notionally, these additional bits are part of the pointer, but since loads and | ||
stores only operate on the "inline" bits of the pointer and the additional | ||
bits are not explicitly exposed, they are not included in the size specified in | ||
the :ref:`datalayout string<langref_datalayout>`. | ||
|
||
When a pointer type has external state, all roundtrips via memory must | ||
be performed as loads and stores of the correct type since stores of other | ||
types may not propagate the external data. | ||
Therefore it is not legal to convert a load/store of a non-integral pointer | ||
type with external state to a load/store of an integer type with same | ||
bitwidth, as that drops the additional state. | ||
However, it can be assumed that appropriately-aligned ``llvm.memcpy`` and | ||
``llvm.memmove`` intrinsics will preserve the external metadata. | ||
This is essential to allow frontends to efficiently emit of copies of | ||
structures containing such pointers, since expanding all these copies as | ||
individual loads and stores would significantly inhibit optimizations. | ||
To ensure that this results in valid code, affected backends must lower | ||
these intrinsics in a way that propagates external state. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we clarify "appropriately-aligned" here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I should probably drop "appropriately-aligned" - this relates to the fact that CHERI capabilities can only be loaded from/stored to naturally aligned memory locations since there is only one tag bit per capability-size location. |
||
|
||
.. _globalvars: | ||
|
||
Global Variables | ||
|
@@ -3082,16 +3161,22 @@ as follows: | |
``A<address space>`` | ||
Specifies the address space of objects created by '``alloca``'. | ||
Defaults to the default address space of 0. | ||
``p[n]:<size>:<abi>[:<pref>][:<idx>]`` | ||
``p[<flags>][<address space>]:<size>:<abi>[:<pref>][:<idx>]`` | ||
This specifies the *size* of a pointer and its ``<abi>`` and | ||
``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional | ||
and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the | ||
index that used for address calculation, which must be less than or equal | ||
to the pointer size. If not | ||
specified, the default index size is equal to the pointer size. All sizes | ||
are in bits. The address space, ``n``, is optional, and if not specified, | ||
denotes the default address space 0. The value of ``n`` must be | ||
in the range [1,2^24). | ||
are in bits. The ``<address space>``, is optional, and if not specified, | ||
denotes the default address space 0. The value of ``<address space>`` must | ||
be in the range [1,2^24). | ||
The optional``<flags>`` are used to specify properties of pointers in this | ||
address space: the character ``u`` marks pointers as having an unstable | ||
representation, ``n`` marks pointers as non-integral (i.e. having | ||
additional metadata), ``e`` marks pointers having external state | ||
(``n`` must also be set). See :ref:`Non-Integral Pointer Types <nointptrtype>`. | ||
|
||
``i<size>:<abi>[:<pref>]`` | ||
This specifies the alignment for an integer type of a given bit | ||
``<size>``. The value of ``<size>`` must be in the range [1,2^24). | ||
|
@@ -3146,9 +3231,11 @@ as follows: | |
this set are considered to support most general arithmetic operations | ||
efficiently. | ||
``ni:<address space0>:<address space1>:<address space2>...`` | ||
This specifies pointer types with the specified address spaces | ||
as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0`` | ||
address space cannot be specified as non-integral. | ||
This marks pointer types with the specified address spaces | ||
as :ref:`non-integral and unstable <nointptrtype>`. | ||
The ``0`` address space cannot be specified as non-integral. | ||
It is only supported for backwards compatibility, the flags of the ``p`` | ||
specifier should be used instead for new code. | ||
|
||
On every specification that takes a ``<abi>:<pref>``, specifying the | ||
``<pref>`` alignment is optional. If omitted, the preceding ``:`` | ||
|
@@ -12151,6 +12238,8 @@ If ``value`` is smaller than ``ty2`` then a zero extension is done. If | |
``value`` is larger than ``ty2`` then a truncation is done. If they are | ||
the same size, then nothing is done (*no-op cast*) other than a type | ||
change. | ||
For :ref:`non-integral pointers <nointptrtype>` the ``ptrtoint`` instruction | ||
may involve additional transformations beyond truncations or extension. | ||
|
||
Example: | ||
"""""""" | ||
|
Uh oh!
There was an error while loading. Please reload this page.