-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LangRef] Add a description of the semantics of call signatures. #136189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LangRef] Add a description of the semantics of call signatures. #136189
Conversation
This doesn't introduce anything new; it's just a reflection of the semantics we've already had for many years. Fixes llvm#63484.
@llvm/pr-subscribers-llvm-ir Author: Eli Friedman (efriedma-quic) ChangesThis doesn't introduce anything new; it's just a reflection of the semantics we've already had for many years. Per discussion on #63484. Full diff: https://github.com/llvm/llvm-project/pull/136189.diff 1 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 110c30e19220f..8f5e035c1d736 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -9475,10 +9475,11 @@ This instruction requires several arguments:
from the :ref:`datalayout string<langref_datalayout>` will be used.
#. '``ty``': the type of the call instruction itself which is also the
type of the return value. Functions that return no value are marked
- ``void``.
+ ``void``. The signature is computed based on the return type and argument
+ types.
#. '``fnty``': shall be the signature of the function being invoked. The
- argument types must match the types implied by this signature. This
- type can be omitted if the function is not varargs.
+ argument types must match the types implied by this signature. Thisa
+ is only required if the signature specifies a varargs type.
#. '``fnptrval``': An LLVM value containing a pointer to a function to
be invoked. In most cases, this is a direct function invocation, but
indirect ``invoke``'s are just as possible, calling an arbitrary pointer
@@ -9571,10 +9572,11 @@ This instruction requires several arguments:
from the :ref:`datalayout string<langref_datalayout>` will be used.
#. '``ty``': the type of the call instruction itself which is also the
type of the return value. Functions that return no value are marked
- ``void``.
+ ``void``. The signature is computed based on the return type and argument
+ types.
#. '``fnty``': shall be the signature of the function being called. The
argument types must match the types implied by this signature. This
- type can be omitted if the function is not varargs.
+ is only required if the signature specifies a varargs type.
#. '``fnptrval``': An LLVM value containing a pointer to a function to
be called. In most cases, this is a direct function call, but
other ``callbr``'s are just as possible, calling an arbitrary pointer
@@ -13127,10 +13129,11 @@ This instruction requires several arguments:
from the :ref:`datalayout string<langref_datalayout>` will be used.
#. '``ty``': the type of the call instruction itself which is also the
type of the return value. Functions that return no value are marked
- ``void``.
+ ``void``. The signature is computed based on the return type and argument
+ types.
#. '``fnty``': shall be the signature of the function being called. The
argument types must match the types implied by this signature. This
- type can be omitted if the function is not varargs.
+ is only required if the signature specifies a varargs type.
#. '``fnptrval``': An LLVM value containing a pointer to a function to
be called. In most cases, this is a direct function call, but
indirect ``call``'s are just as possible, calling an arbitrary pointer
@@ -13152,6 +13155,16 @@ values. Upon a '``ret``' instruction in the called function, control
flow continues with the instruction after the function call, and the
return value of the function is bound to the result argument.
+If the callee refers to an intrinsic function, the signature of the call must
+match the signature of the callee. Otherwise, if the signature of the call
+does not match the signature of the called function, the behavior is
+target-specific. For a signficant mismatch, this likely results in undefined
+behavior. LLVM interprocedural optimizations generally only optimize calls
+where the signature of the caller matches the signature of the callee.
+
+Note that it is possible for the signatures to mismatch even if a call appears
+to be a "direct" call, like ``call void @f()``.
+
Example:
""""""""
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
#. '``fnty``': shall be the signature of the function being invoked. The | ||
argument types must match the types implied by this signature. This | ||
type can be omitted if the function is not varargs. | ||
argument types must match the types implied by this signature. Thisa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argument types must match the types implied by this signature. Thisa | |
argument types must match the types implied by this signature. This |
If the callee refers to an intrinsic function, the signature of the call must | ||
match the signature of the callee. Otherwise, if the signature of the call | ||
does not match the signature of the called function, the behavior is | ||
target-specific. For a signficant mismatch, this likely results in undefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
target-specific. For a signficant mismatch, this likely results in undefined | |
target-specific. For a significant mismatch, this likely results in undefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite get it: #63484 calls this a bug and a unintended regression of the opaque pointer migration.
I don't understand why it is proper to endorse this in LangRef instead of fixing the verifier.
@joker-eph This updates LangRef to reflect current LLVM semantics. I understand that you think these semantics are not desirable, but LangRef should still reflect the current semantics. I can see how a reasonable person might disagree on the best way to handle the problem of mismatched function signatures, but as the issue discussion explains in detail, the current design is intentional, so calling this a "bug" is disingenuous. It is not the case that someone just forgot to implement a verifier check -- the lack of a verifier check is deliberate. If we wanted to make any change away from the current semantics, it would have to go through an RFC (unless it's just a cosmetic change in textual IR, without impact on the IR model itself -- but you didn't like that either). |
This is a weird reasoning: you introduce a bug / regression and then claim this is the current semantics... How is that OK?
How so? Was this discussed and considered like a desirable change in LLVM verification? You didn't provide such pointers in the bug discussion I believe.
CF above: where was this particular behavior change discussed in an RFC in the first place? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor nit.
I agree with @joker-eph this feels like a bug. However, this PR codifies existing behavior, which seems fair to me. Hope someone puts an RFC at some point though.
If the callee refers to an intrinsic function, the signature of the call must | ||
match the signature of the callee. Otherwise, if the signature of the call | ||
does not match the signature of the called function, the behavior is | ||
target-specific. For a significant mismatch, this likely results in undefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we say "undefined behavior"? might to more consistent with the -passes=lint
diagnostics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some weirdness here because of the way ABI lowering works: we don't define one correct way to lower a high-level type to an LLVM type, so different frontends represent the "same" type as a different LLVM type. Because of this, optimizations are conservative with signature mismatches. Maybe we can revisit this at some point.
The first paragraph defines the semantics of a signature mismatch. It has nothing to do with opaque pointers; the type of the function pointer isn't mentioned at all. And it's how it always worked. The second paragraph isn't defining anything at all; it's just noting an edge case. It's a natural consequence of the callee of a call being a function pointer, all function pointers having the same type, and constants being interchangeable with other values. Opaque pointers changed the type of function pointers; the other rules have always existed. These are rules we could potentially change, but any change to in-memory semantics would have deep consequences across the whole codebase. |
It feels like a bug at first glance, but it most definitively not a bug. Statically determinable undefined behavior is not invalid IR, nor should it be. This case is not unique and not deserving of special treatment for the same reason we don't make store to null pointer invalid IR, and it is useful to have statically representable UB situations. If we were to mandate that callsite types match, it now introduces special case that imposes restrictions on what IR transforms are possible, which brings API complexity and compile time costs to avoid. Any IR transform that touches a pointer cannot simply do a RAUW or other propagation of the value, just in case it happens to be a function which ends up appearing in a mismatched call site. Every piece of code touching a pointer now needs to defend against source programming errors, and inspect all users of the value inflating the impact of this edge case. Making this a structural IR rule is worse. We could change something about the syntax, or at most make it a warning. We already have the rarely used lint pass, we could consider starting to emit warnings directly from the text asm parser. |
I agree with this in general, but I don't think you're applying the right reasoning here. We had something that until LLVM 15 was a clear invalid IR by construction: it's not a "runtime behavior" that we'd have to guess here but a static construct by definition. The change in LLVM 15 dropped the structural information on the floor without attempting to preserve it, causing this regression.
Well, how was this working until LLVM 16? |
@efriedma-quic merging this within 24h while a discussion is on-going seems like not the usual norm in the community, and clearly not what's encoding in our developer guide right now. |
As far I can tell, there weren't any objections to the patch itself. The discussion is centered around concerns that LLVM should not work this way... but LLVM works the way it does no matter what LangRef says. If there's some specific concern, I'm happy to revise the text. |
The use was previously a bitcast instruction with different element types, not the direct call site |
The red card next in the "review" section of the patch disagree with this assessment I believe.
This line of thinking makes it look like anytime there is a bug in LLVM which makes it behaves differently than LangRef we should just update LangRef to match the implementation instead of considering to fix the bug. |
Okay, so... what do you want to actually happen at this point? Delete "Note that it is possible for the signatures to mismatch even if a call appears" sentence from LangRef? Do you have some objection to the other changes? |
…m#136189) This doesn't introduce anything new; it's just a reflection of the semantics we've already had for many years. Per discussion on llvm#63484.
…m#136189) This doesn't introduce anything new; it's just a reflection of the semantics we've already had for many years. Per discussion on llvm#63484.
…m#136189) This doesn't introduce anything new; it's just a reflection of the semantics we've already had for many years. Per discussion on llvm#63484.
This doesn't introduce anything new; it's just a reflection of the semantics we've already had for many years.
Per discussion on #63484.