Skip to content

[LangRef] Clarify that the pointer after an object must be valid. #127892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 24, 2025

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Feb 19, 2025

In some places, we rely on the assumption that the pointer after the object must also be valid and not overflow, but it does not seem to be spelled out clearly in LangRef, unless I missed a reference.

The GetElementPtr section mentions that the maximum object size is half the pointer index type space, but then the pointer past the object may wrap. Clarify that the pointer after the object must also be valid.

This should match Alive2's semantics: https://alive2.llvm.org/ce/z/Dk8QFL (https://github.com/AliveToolkit/alive2/blob/master/tools/transform.cpp#L1288)

@fhahn
Copy link
Contributor Author

fhahn commented Feb 19, 2025

Not sure if there is a better place to document the assumption than in the GEP semantics

@llvmbot
Copy link
Member

llvmbot commented Feb 19, 2025

@llvm/pr-subscribers-llvm-ir

Author: Florian Hahn (fhahn)

Changes

In some places, we rely on the assumption that the pointer after the object must also be valid and not overflow, but it does not seem to be spelled out clearly in LangRef, unless I missed a reference.

The GetElementPtr section mentions that the maximum object size is half the pointer index type space, but then the pointer past the object may wrap. Clarify that the pointer after the object must also be valid.

This should match Alive2's semantics: https://alive2.llvm.org/ce/z/Dk8QFL (https://github.com/AliveToolkit/alive2/blob/master/tools/transform.cpp#L1288)


Full diff: https://github.com/llvm/llvm-project/pull/127892.diff

1 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+3-2)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index deb87365ae8d7..8a12e6b964a9f 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11722,8 +11722,9 @@ As a corollary, the only pointer in bounds of the null pointer in the default
 address space is the null pointer itself.
 
 These rules are based on the assumption that no allocated object may cross
-the unsigned address space boundary, and no allocated object may be larger
-than half the pointer index type space.
+the unsigned address space boundary, the pointer after the object must be valid,
+and no allocated object may be larger than half the pointer index type space
+- 1.
 
 If ``inbounds`` is present on a ``getelementptr`` instruction, the ``nusw``
 attribute will be automatically set as well. For this reason, the ``nusw``

@nikic
Copy link
Contributor

nikic commented Feb 19, 2025

Not sure if there is a better place to document the assumption than in the GEP semantics

It would make sense to add a separate section on allocated objects.

Edit: See also https://doc.rust-lang.org/std/ptr/index.html#allocated-object for Rust's documentation of LLVM's allocated object rules.

@@ -11722,8 +11722,9 @@ As a corollary, the only pointer in bounds of the null pointer in the default
address space is the null pointer itself.

These rules are based on the assumption that no allocated object may cross
the unsigned address space boundary, and no allocated object may be larger
than half the pointer index type space.
the unsigned address space boundary, the pointer after the object must be valid,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"must be valid" is ambiguous. I'd explicitly say "must not be null".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would express it in terms of overflow because null can have a different connotation in other address spaces.
I suggest a remark like:

based on the assumption that no allocated object may cross the unsigned address space boundary (including a pointer after the end of the object).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the wording, thanks

than half the pointer index type space.
the unsigned address space boundary, the pointer after the object must be valid,
and no allocated object may be larger than half the pointer index type space
- 1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd phrase this as "than the largest signed integer that fits into the index type". It's not super clear what exactly "half" here means, but the intention is that the size is <= SignedMax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to be non negative and < SignedMax (otherwise there may be no room for the pointer after the object).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. The size already points to the one-past-the-end location, so why do you need the extra increment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was still thinking about the previous wording, which I read as it takes up all signed positive addresses. Should be adjusted nowthanks

@fhahn fhahn force-pushed the langref-gep-pointer-after-object-valid branch 2 times, most recently from 4221331 to 2ec31c0 Compare February 20, 2025 16:39
Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce a section for Allocated Objects, re-using parts of the Object Lifetime section.

I also updated various references to consistently refer to allocated objects. I could also split this off the a separate PR if preferred?

@nunoplopes
Copy link
Member

looks good to me, thanks!

@fhahn fhahn force-pushed the langref-gep-pointer-after-object-valid branch 2 times, most recently from eadd721 to 66b93da Compare February 23, 2025 18:09
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

that is not based on the object tries to read or write to the object, it is
undefined behavior.

The following properties hold for all allocated objects:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also have "otherwise, the behavior is undefined" wording.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks

These rules are based on the assumption that no allocated object may cross
the unsigned address space boundary, and no allocated object may be larger
than half the pointer index type space.
These rules are based on the assumption for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
These rules are based on the assumption for
These rules are based on the assumptions for

Though I'm not sure we really need this sentence anymore, it may make more sense to link "allocated object" in the bullet point above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the bullet point to link to the allocated objects section and remove the sentence here.

@@ -26458,7 +26469,7 @@ Overview:
"""""""""

The '``llvm.invariant.end``' intrinsic specifies that the contents of a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The '``llvm.invariant.end``' intrinsic specifies that the contents of a
The '``llvm.invariant.end``' intrinsic specifies that the contents of an

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed thanks

@fhahn fhahn changed the title [LangRef] Clarify that the pointer after an objet must be valid. [LangRef] Clarify that the pointer after an object must be valid. Feb 24, 2025
In some places, we rely on the assumption that the pointer after the
object must also be valid and not overflow, but it does not seem to be
spelled out clearly in LangRef, unless I missed a reference.

The GetElementPtr section mentions that the maximum object size is half
the pointer index type space, but then the pointer past the object may
wrap. Clarify that the pointer after the object must also be valid.

This should match Alive2's semantics: https://alive2.llvm.org/ce/z/Dk8QFL
(https://github.com/AliveToolkit/alive2/blob/master/tools/transform.cpp#L1288)
@fhahn fhahn force-pushed the langref-gep-pointer-after-object-valid branch from 66b93da to c0cc0af Compare February 24, 2025 17:02
@fhahn fhahn merged commit 47822c8 into llvm:main Feb 24, 2025
10 of 12 checks passed
@fhahn fhahn deleted the langref-gep-pointer-after-object-valid branch February 24, 2025 21:23
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 24, 2025
… valid. (#127892)

In some places, we rely on the assumption that the pointer after the
object must also be valid and not overflow, but it does not seem to be
spelled out clearly in LangRef, unless I missed a reference.

The GetElementPtr section mentions that the maximum object size is half
the pointer index type space, but then the pointer past the object may
wrap. Clarify that the pointer after the object must also be valid.

This should match Alive2's semantics:
https://alive2.llvm.org/ce/z/Dk8QFL
(https://github.com/AliveToolkit/alive2/blob/master/tools/transform.cpp#L1288)

PR: llvm/llvm-project#127892
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants