Commit 47822c8

[LangRef] Clarify that the pointer after an object must be valid. (#127892)
In some places, we rely on the assumption that the pointer after the object must also be valid and not overflow, but it does not seem to be spelled out clearly in LangRef, unless I missed a reference.

The GetElementPtr section mentions that the maximum object size is half the pointer index type space, but then the pointer past the object may wrap.

Clarify that the pointer after the object must also be valid.

This should match Alive2's semantics: https://alive2.llvm.org/ce/z/Dk8QFL (https://github.com/AliveToolkit/alive2/blob/master/tools/transform.cpp#L1288)

PR: #127892
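
As an illustration of the wrap described in the commit message (this sketch is not part of the commit), the two object rules can be modelled with plain integers. The helper names `end_pointer_wraps` and `satisfies_object_rules` are hypothetical, and the sketch assumes that the pointer width equals the index type width:

```python
def end_pointer_wraps(base: int, size: int, index_bits: int) -> bool:
    """True if the one-past-the-end address does not fit in index_bits bits."""
    return base + size >= (1 << index_bits)


def satisfies_object_rules(base: int, size: int, index_bits: int) -> bool:
    """Check the two properties from the new 'Allocated Objects' section.

    Assumption: addresses are unsigned integers of index_bits bits.
    """
    # Rule 2: size is non-negative and at most the largest signed integer
    # that fits into the index type.
    max_signed = (1 << (index_bits - 1)) - 1
    if size < 0 or size > max_signed:
        return False
    # Rule 1: the object, including the pointer after its end, must not
    # cross the unsigned address space boundary.
    return not end_pointer_wraps(base, size, index_bits)


# A 16-byte object at 0xFFFFFFF0 in a 32-bit space is well under the size
# limit, yet its end pointer would wrap to 0, so it is rejected.
print(satisfies_object_rules(0xFFFFFFF0, 16, 32))
# Shrinking it by one byte keeps the end pointer at 0xFFFFFFFF.
print(satisfies_object_rules(0xFFFFFFF0, 15, 32))
```

The first call shows why the size limit alone is not enough: a small object placed at the very top of the address space still has an end pointer that overflows.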
1 parent bef4e52 commit 47822c8

llvm/docs/LangRef.rst

Lines changed: 58 additions & 48 deletions
@@ -729,8 +729,8 @@ units that do not include the definition.
 As SSA values, global variables define pointer values that are in scope
 (i.e. they dominate) all basic blocks in the program. Global variables
 always define a pointer to their "content" type because they describe a
-region of memory, and all memory objects in LLVM are accessed through
-pointers.
+region of memory, and all :ref:`allocated object<allocatedobjects>` in LLVM are
+accessed through pointers.
 
 Global variables can be marked with ``unnamed_addr`` which indicates
 that the address is not significant, only the content. Constants marked
@@ -2169,7 +2169,8 @@ For example:
     A ``nofree`` function is explicitly allowed to free memory which it
     allocated or (if not ``nosync``) arrange for another thread to free
     memory on it's behalf. As a result, perhaps surprisingly, a ``nofree``
-    function can return a pointer to a previously deallocated memory object.
+    function can return a pointer to a previously deallocated
+    :ref:`allocated object<allocatedobjects>`.
 ``noimplicitfloat``
     Disallows implicit floating-point code. This inhibits optimizations that
     use floating-point code and floating-point registers for operations that are
@@ -3280,31 +3281,42 @@ This information is passed along to the backend so that it generates
 code for the proper architecture. It's possible to override this on the
 command line with the ``-mtriple`` command line option.
 
+
+.. _allocatedobjects:
+
+Allocated Objects
+-----------------
+
+An allocated object, memory object, or simply object, is a region of a memory
+space that is reserved by a memory allocation such as :ref:`alloca <i_alloca>`,
+heap allocation calls, and global variable definitions. Once it is allocated,
+the bytes stored in the region can only be read or written through a pointer
+that is :ref:`based on <pointeraliasing>` the allocation value. If a pointer
+that is not based on the object tries to read or write to the object, it is
+undefined behavior.
+
+The following properties hold for all allocated objects, otherwise the
+behavior is undefined:
+
+- no allocated object may cross the unsigned address space boundary (including
+  the pointer after the end of the object),
+- the size of all allocated objects must be non-negative and not exceed the
+  largest signed integer that fits into the index type.
+
 .. _objectlifetime:
 
 Object Lifetime
 ----------------------
 
-A memory object, or simply object, is a region of a memory space that is
-reserved by a memory allocation such as :ref:`alloca <i_alloca>`, heap
-allocation calls, and global variable definitions.
-Once it is allocated, the bytes stored in the region can only be read or written
-through a pointer that is :ref:`based on <pointeraliasing>` the allocation
-value.
-If a pointer that is not based on the object tries to read or write to the
-object, it is undefined behavior.
-
-A lifetime of a memory object is a property that decides its accessibility.
-Unless stated otherwise, a memory object is alive since its allocation, and
-dead after its deallocation.
-It is undefined behavior to access a memory object that isn't alive, but
-operations that don't dereference it such as
-:ref:`getelementptr <i_getelementptr>`, :ref:`ptrtoint <i_ptrtoint>` and
-:ref:`icmp <i_icmp>` return a valid result.
-This explains code motion of these instructions across operations that
-impact the object's lifetime.
-A stack object's lifetime can be explicitly specified using
-:ref:`llvm.lifetime.start <int_lifestart>` and
+A lifetime of an :ref:`allocated object<allocatedobjects>` is a property that
+decides its accessibility. Unless stated otherwise, an allocated object is alive
+since its allocation, and dead after its deallocation. It is undefined behavior
+to access an allocated object that isn't alive, but operations that don't
+dereference it such as :ref:`getelementptr <i_getelementptr>`,
+:ref:`ptrtoint <i_ptrtoint>` and :ref:`icmp <i_icmp>` return a valid result.
+This explains code motion of these instructions across operations that impact
+the object's lifetime. A stack object's lifetime can be explicitly specified
+using :ref:`llvm.lifetime.start <int_lifestart>` and
 :ref:`llvm.lifetime.end <int_lifeend>` intrinsic function calls.
 
 .. _pointeraliasing:
@@ -4484,11 +4496,10 @@ Here are some examples of multidimensional arrays:
 
 There is no restriction on indexing beyond the end of the array implied
 by a static type (though there are restrictions on indexing beyond the
-bounds of an allocated object in some cases). This means that
-single-dimension 'variable sized array' addressing can be implemented in
-LLVM with a zero length array type. An implementation of 'pascal style
-arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for
-example.
+bounds of an :ref:`allocated object<allocatedobjects>` in some cases). This
+means that single-dimension 'variable sized array' addressing can be implemented
+in LLVM with a zero length array type. An implementation of 'pascal style
+arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for example.
 
 .. _t_struct:
 
@@ -11708,8 +11719,9 @@ For ``nuw`` (no unsigned wrap):
 For ``inbounds`` all rules of the ``nusw`` attribute apply. Additionally,
 if the ``getelementptr`` has any non-zero indices, the following rules apply:
 
-* The base pointer has an *in bounds* address of the allocated object that it
-  is :ref:`based <pointeraliasing>` on. This means that it points into that
+* The base pointer has an *in bounds* address of the
+  :ref:`allocated object<allocatedobjects>` that it is
+  :ref:`based <pointeraliasing>` on. This means that it points into that
   allocated object, or to its end. Note that the object does not have to be
   live anymore; being in-bounds of a deallocated object is sufficient.
 * During the successive addition of offsets to the address, the resulting
@@ -11720,10 +11732,6 @@ Note that ``getelementptr`` with all-zero indices is always considered to be
 As a corollary, the only pointer in bounds of the null pointer in the default
 address space is the null pointer itself.
 
-These rules are based on the assumption that no allocated object may cross
-the unsigned address space boundary, and no allocated object may be larger
-than half the pointer index type space.
-
 If ``inbounds`` is present on a ``getelementptr`` instruction, the ``nusw``
 attribute will be automatically set as well. For this reason, the ``nusw``
 will also not be printed in textual IR if ``inbounds`` is already present.
@@ -26318,7 +26326,7 @@ Memory Use Markers
 ------------------
 
 This class of intrinsics provides information about the
-:ref:`lifetime of memory objects <objectlifetime>` and ranges where variables
+:ref:`lifetime of allocated objects <objectlifetime>` and ranges where variables
 are immutable.
 
 .. _int_lifestart:
@@ -26386,8 +26394,8 @@ Syntax:
 Overview:
 """""""""
 
-The '``llvm.lifetime.end``' intrinsic specifies the end of a memory object's
-lifetime.
+The '``llvm.lifetime.end``' intrinsic specifies the end of a
+:ref:`allocated object's lifetime<objectlifetime>`.
 
 Arguments:
 """"""""""
@@ -26417,7 +26425,8 @@ with ``poison``.
 
 Syntax:
 """""""
-This is an overloaded intrinsic. The memory object can belong to any address space.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space.
 
 ::
 
@@ -26427,7 +26436,7 @@ Overview:
 """""""""
 
 The '``llvm.invariant.start``' intrinsic specifies that the contents of
-a memory object will not change.
+an :ref:`allocated object<allocatedobjects>` will not change.
 
 Arguments:
 """"""""""
@@ -26448,7 +26457,8 @@ unchanging.
 
 Syntax:
 """""""
-This is an overloaded intrinsic. The memory object can belong to any address space.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space.
 
 ::
 
@@ -26457,8 +26467,8 @@ This is an overloaded intrinsic. The memory object can belong to any address spa
 Overview:
 """""""""
 
-The '``llvm.invariant.end``' intrinsic specifies that the contents of a
-memory object are mutable.
+The '``llvm.invariant.end``' intrinsic specifies that the contents of an
+:ref:`allocated object<allocatedobjects>` are mutable.
 
 Arguments:
 """"""""""
@@ -26478,9 +26488,9 @@ This intrinsic indicates that the memory is mutable again.
 
 Syntax:
 """""""
-This is an overloaded intrinsic. The memory object can belong to any address
-space. The returned pointer must belong to the same address space as the
-argument.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space. The returned pointer must belong to the same
+address space as the argument.
 
 ::
 
@@ -26514,9 +26524,9 @@ It does not read any accessible memory and the execution can be speculated.
 
 Syntax:
 """""""
-This is an overloaded intrinsic. The memory object can belong to any address
-space. The returned pointer must belong to the same address space as the
-argument.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space. The returned pointer must belong to the same
+address space as the argument.
 
 ::
 