[LangRef] Clarify that the pointer after an object must be valid. #127892

Merged on Feb 24, 2025 (4 commits)

llvm/docs/LangRef.rst: 106 changes (58 additions, 48 deletions)

@@ -729,8 +729,8 @@ units that do not include the definition.
As SSA values, global variables define pointer values that are in scope
(i.e. they dominate) all basic blocks in the program. Global variables
always define a pointer to their "content" type because they describe a
-region of memory, and all memory objects in LLVM are accessed through
-pointers.
+region of memory, and all :ref:`allocated object<allocatedobjects>` in LLVM are
+accessed through pointers.

Global variables can be marked with ``unnamed_addr`` which indicates
that the address is not significant, only the content. Constants marked
@@ -2169,7 +2169,8 @@ For example:
A ``nofree`` function is explicitly allowed to free memory which it
allocated or (if not ``nosync``) arrange for another thread to free
memory on its behalf. As a result, perhaps surprisingly, a ``nofree``
-function can return a pointer to a previously deallocated memory object.
+function can return a pointer to a previously deallocated
+:ref:`allocated object<allocatedobjects>`.
``noimplicitfloat``
Disallows implicit floating-point code. This inhibits optimizations that
use floating-point code and floating-point registers for operations that are
@@ -3280,31 +3281,42 @@ This information is passed along to the backend so that it generates
code for the proper architecture. It's possible to override this on the
command line with the ``-mtriple`` command line option.


+.. _allocatedobjects:
+
+Allocated Objects
+-----------------
+
+An allocated object, memory object, or simply object, is a region of a memory
+space that is reserved by a memory allocation such as :ref:`alloca <i_alloca>`,
+heap allocation calls, and global variable definitions. Once it is allocated,
+the bytes stored in the region can only be read or written through a pointer
+that is :ref:`based on <pointeraliasing>` the allocation value. If a pointer
+that is not based on the object tries to read or write to the object, it is
+undefined behavior.
+
+The following properties hold for all allocated objects, otherwise the

Contributor: This should also have "otherwise, the behavior is undefined" wording.

Contributor Author: Added, thanks

+behavior is undefined:
+
+- no allocated object may cross the unsigned address space boundary (including
+the pointer after the end of the object),
+- the size of all allocated objects must be non-negative and not exceed the
+largest signed integer that fits into the index type.
+
.. _objectlifetime:

Object Lifetime
----------------------

-A memory object, or simply object, is a region of a memory space that is
-reserved by a memory allocation such as :ref:`alloca <i_alloca>`, heap
-allocation calls, and global variable definitions.
-Once it is allocated, the bytes stored in the region can only be read or written
-through a pointer that is :ref:`based on <pointeraliasing>` the allocation
-value.
-If a pointer that is not based on the object tries to read or write to the
-object, it is undefined behavior.
-
-A lifetime of a memory object is a property that decides its accessibility.
-Unless stated otherwise, a memory object is alive since its allocation, and
-dead after its deallocation.
-It is undefined behavior to access a memory object that isn't alive, but
-operations that don't dereference it such as
-:ref:`getelementptr <i_getelementptr>`, :ref:`ptrtoint <i_ptrtoint>` and
-:ref:`icmp <i_icmp>` return a valid result.
-This explains code motion of these instructions across operations that
-impact the object's lifetime.
-A stack object's lifetime can be explicitly specified using
-:ref:`llvm.lifetime.start <int_lifestart>` and
+A lifetime of an :ref:`allocated object<allocatedobjects>` is a property that
+decides its accessibility. Unless stated otherwise, an allocated object is alive
+since its allocation, and dead after its deallocation. It is undefined behavior
+to access an allocated object that isn't alive, but operations that don't
+dereference it such as :ref:`getelementptr <i_getelementptr>`,
+:ref:`ptrtoint <i_ptrtoint>` and :ref:`icmp <i_icmp>` return a valid result.
+This explains code motion of these instructions across operations that impact
+the object's lifetime. A stack object's lifetime can be explicitly specified
+using :ref:`llvm.lifetime.start <int_lifestart>` and
:ref:`llvm.lifetime.end <int_lifeend>` intrinsic function calls.

.. _pointeraliasing:
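The two properties listed in the new "Allocated Objects" section can be checked mechanically. The sketch below is a toy Python model, not LLVM code: it assumes addresses are plain unsigned integers in an address space `addr_bits` wide and an index type `index_bits` wide, and the function names are hypothetical.

```python
def signed_max(index_bits: int) -> int:
    """Largest signed integer that fits into an index type of this width."""
    return (1 << (index_bits - 1)) - 1


def is_valid_allocation(base: int, size: int, addr_bits: int, index_bits: int) -> bool:
    """Check an object at address `base` spanning `size` bytes against both rules.

    Hypothetical helper for illustration only; it is not an LLVM API.
    """
    # Size rule: non-negative and not larger than the signed index maximum.
    if size < 0 or size > signed_max(index_bits):
        return False
    # Boundary rule: the object, including the pointer one past its end
    # (base + size), must not cross the unsigned address space boundary,
    # so base + size must still be representable in addr_bits bits.
    return 0 <= base and base + size < (1 << addr_bits)
```

For example, with 8-bit addresses and an 8-bit index type, `is_valid_allocation(0xF0, 15, 8, 8)` holds, but a 16-byte object at `0xF0` is rejected because its one-past-the-end pointer would be `0x100`, which wraps to `0`.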
@@ -4484,11 +4496,10 @@ Here are some examples of multidimensional arrays:

There is no restriction on indexing beyond the end of the array implied
by a static type (though there are restrictions on indexing beyond the
-bounds of an allocated object in some cases). This means that
-single-dimension 'variable sized array' addressing can be implemented in
-LLVM with a zero length array type. An implementation of 'pascal style
-arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for
-example.
+bounds of an :ref:`allocated object<allocatedobjects>` in some cases). This
+means that single-dimension 'variable sized array' addressing can be implemented
+in LLVM with a zero length array type. An implementation of 'pascal style
+arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for example.

.. _t_struct:

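To illustrate why a zero length array type is enough for 'pascal style arrays', here is a hedged Python sketch of the byte layout of `{ i32, [0 x float] }`. It assumes a little-endian target where `i32` and `float` are both 4 bytes with 4-byte alignment, so the array starts at offset 4; the helper names are hypothetical. The `[0 x float]` type places no static limit on the element index; the real bound is the size of the allocated object.

```python
import struct

# Assumed layout of { i32, [0 x float] }: a 4-byte length, then the floats.
HEADER = struct.Struct("<i")  # the i32 length field
ELEM = struct.Struct("<f")    # one float element


def make_pascal_array(values):
    """Allocate a single buffer holding the length followed by the elements."""
    buf = bytearray(HEADER.size + ELEM.size * len(values))
    HEADER.pack_into(buf, 0, len(values))
    for i, v in enumerate(values):
        ELEM.pack_into(buf, element_offset(i), v)
    return buf


def element_offset(i):
    """Byte offset of element i (as for getelementptr with indices 0, 1, i)."""
    return HEADER.size + ELEM.size * i


def get(buf, i):
    """Read element i, bounded by the stored length rather than the static type."""
    (length,) = HEADER.unpack_from(buf, 0)
    if not 0 <= i < length:
        raise IndexError("index outside the allocated object")
    return ELEM.unpack_from(buf, element_offset(i))[0]
```

The same `element_offset` formula works for any `i`, which is the point of the zero length array: only the allocation size, tracked here by the stored length, decides which indices are meaningful.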
@@ -11708,8 +11719,9 @@ For ``nuw`` (no unsigned wrap):
For ``inbounds`` all rules of the ``nusw`` attribute apply. Additionally,
if the ``getelementptr`` has any non-zero indices, the following rules apply:

-* The base pointer has an *in bounds* address of the allocated object that it
-is :ref:`based <pointeraliasing>` on. This means that it points into that
+* The base pointer has an *in bounds* address of the
+:ref:`allocated object<allocatedobjects>` that it is
+:ref:`based <pointeraliasing>` on. This means that it points into that
allocated object, or to its end. Note that the object does not have to be

Contributor: "must be valid" is ambiguous. I'd explicitly say "must not be null".

Member: I would express it in terms of overflow because null can have a different connotation in other address spaces. I suggest a remark like: "based on the assumption that no allocated object may cross the unsigned address space boundary (including a pointer after the end of the object)."

Contributor Author: Updated the wording, thanks

live anymore; being in-bounds of a deallocated object is sufficient.
* During the successive addition of offsets to the address, the resulting

Contributor: I'd phrase this as "than the largest signed integer that fits into the index type". It's not super clear what exactly "half" here means, but the intention is that the size is <= SignedMax.

Contributor Author: Updated to be non-negative and < SignedMax (otherwise there may be no room for the pointer after the object).

Contributor: I don't follow. The size already points to the one-past-the-end location, so why do you need the extra increment?

Contributor Author: I was still thinking about the previous wording, which I read as taking up all positive signed addresses. It should be adjusted now, thanks.

@@ -11720,10 +11732,6 @@ Note that ``getelementptr`` with all-zero indices is always considered to be
As a corollary, the only pointer in bounds of the null pointer in the default
address space is the null pointer itself.

Contributor: Suggested change: replace "These rules are based on the assumption for" with "These rules are based on the assumptions for".

Though I'm not sure we really need this sentence anymore, it may make more sense to link "allocated object" in the bullet point above.

Contributor Author: Updated the bullet point to link to the allocated objects section and remove the sentence here.

-These rules are based on the assumption that no allocated object may cross
-the unsigned address space boundary, and no allocated object may be larger
-than half the pointer index type space.
-
If ``inbounds`` is present on a ``getelementptr`` instruction, the ``nusw``
attribute will be automatically set as well. For this reason, the ``nusw``
will also not be printed in textual IR if ``inbounds`` is already present.
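The first ``inbounds`` rule in this diff (the base pointer must be *in bounds* of the allocated object it is based on) can be stated as a small predicate. The Python below is a hypothetical model, not LLVM's implementation: addresses are plain integers, an object is described by its base address, size and liveness, and only the base-pointer rule is modeled; the remaining rules are not.

```python
from dataclasses import dataclass


@dataclass
class AllocatedObject:
    """Hypothetical record describing one allocated object."""
    base: int
    size: int
    alive: bool = True


def base_in_bounds(ptr: int, obj: AllocatedObject) -> bool:
    """True if ptr points into obj or exactly to its end.

    Liveness is ignored on purpose: being in bounds of a deallocated
    object is sufficient.
    """
    return obj.base <= ptr <= obj.base + obj.size


def inbounds_base_rule_holds(ptr: int, obj: AllocatedObject, indices) -> bool:
    """The base-pointer rule only applies when some index is non-zero."""
    if all(i == 0 for i in indices):
        return True
    return base_in_bounds(ptr, obj)
```

Note that the one-past-the-end address counts as in bounds, and that a deallocated object (`alive=False`) still satisfies the rule.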
@@ -26318,7 +26326,7 @@ Memory Use Markers
------------------

This class of intrinsics provides information about the
-:ref:`lifetime of memory objects <objectlifetime>` and ranges where variables
+:ref:`lifetime of allocated objects <objectlifetime>` and ranges where variables
are immutable.

.. _int_lifestart:
@@ -26386,8 +26394,8 @@ Syntax:
Overview:
"""""""""

-The '``llvm.lifetime.end``' intrinsic specifies the end of a memory object's
-lifetime.
+The '``llvm.lifetime.end``' intrinsic specifies the end of a
+:ref:`allocated object's lifetime<objectlifetime>`.
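The "Object Lifetime" rules distinguish dereferencing operations from pure address computations. The toy Python model below makes that distinction concrete; the class and function names are hypothetical, and it only models the default case (alive from allocation until deallocation or an ``llvm.lifetime.end``-style event), not ``llvm.lifetime.start``.

```python
class UndefinedBehavior(Exception):
    """Raised where the modeled rules make an operation undefined."""


class ToyObject:
    """Hypothetical allocated object with an explicit liveness flag."""

    def __init__(self, base: int, size: int):
        self.base = base
        self.size = size
        self.data = bytearray(size)
        self.alive = True  # alive from its allocation ...

    def end_lifetime(self):
        self.alive = False  # ... and dead after deallocation or lifetime end


def address_of(obj: ToyObject, offset: int) -> int:
    """Address arithmetic (getelementptr/ptrtoint-like): valid even if obj is dead."""
    return obj.base + offset


def load(obj: ToyObject, offset: int) -> int:
    """A dereference: only defined while obj is alive and offset is in range."""
    if not obj.alive:
        raise UndefinedBehavior("access to an object that isn't alive")
    if not 0 <= offset < obj.size:
        raise UndefinedBehavior("access outside the object")
    return obj.data[offset]
```

Because `address_of` never inspects `alive`, moving such a computation across the point where the object dies cannot change its result, which is the code-motion argument in the lifetime paragraph; `load` has no such freedom.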

Arguments:
""""""""""
@@ -26417,7 +26425,8 @@ with ``poison``.

Syntax:
"""""""
-This is an overloaded intrinsic. The memory object can belong to any address space.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space.

::

@@ -26427,7 +26436,7 @@ Overview:
"""""""""

The '``llvm.invariant.start``' intrinsic specifies that the contents of
-a memory object will not change.
+an :ref:`allocated object<allocatedobjects>` will not change.
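The ``llvm.invariant.start`` and ``llvm.invariant.end`` pair brackets a region in which an allocated object's contents do not change. A hedged Python model of that contract follows; `InvariantTracker` and its methods are hypothetical names, the model assumes any store inside the region is undefined behavior, and the real intrinsics' size argument and returned descriptor are not modeled.

```python
class UndefinedBehavior(Exception):
    """Raised where the modeled contract would be violated."""


class InvariantTracker:
    """Hypothetical model of one allocated object with an invariant region."""

    def __init__(self, size: int):
        self.data = bytearray(size)
        self.invariant = False

    def invariant_start(self):
        # Like llvm.invariant.start: the contents will not change from here on.
        self.invariant = True

    def invariant_end(self):
        # Like llvm.invariant.end: the contents are mutable again.
        self.invariant = False

    def store(self, offset: int, value: int):
        # Assumption of this model: writing inside the region is undefined.
        if self.invariant:
            raise UndefinedBehavior("store to an object inside its invariant region")
        self.data[offset] = value
```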

Arguments:
""""""""""
@@ -26448,7 +26457,8 @@ unchanging.

Syntax:
"""""""
-This is an overloaded intrinsic. The memory object can belong to any address space.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space.

::

@@ -26457,8 +26467,8 @@ This is an overloaded intrinsic. The memory object can belong to any address spa
Overview:
"""""""""

-The '``llvm.invariant.end``' intrinsic specifies that the contents of a
-memory object are mutable.
+The '``llvm.invariant.end``' intrinsic specifies that the contents of an
+:ref:`allocated object<allocatedobjects>` are mutable.

Contributor: Suggested change: replace "The '``llvm.invariant.end``' intrinsic specifies that the contents of a" with "The '``llvm.invariant.end``' intrinsic specifies that the contents of an".

Contributor Author: Fixed, thanks

Arguments:
""""""""""
@@ -26478,9 +26488,9 @@ This intrinsic indicates that the memory is mutable again.

Syntax:
"""""""
-This is an overloaded intrinsic. The memory object can belong to any address
-space. The returned pointer must belong to the same address space as the
-argument.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space. The returned pointer must belong to the same
+address space as the argument.

::

@@ -26514,9 +26524,9 @@ It does not read any accessible memory and the execution can be speculated.

Syntax:
"""""""
-This is an overloaded intrinsic. The memory object can belong to any address
-space. The returned pointer must belong to the same address space as the
-argument.
+This is an overloaded intrinsic. The :ref:`allocated object<allocatedobjects>`
+can belong to any address space. The returned pointer must belong to the same
+address space as the argument.

::
