
[AArch64][PAC] Support ptrauth builtins and -fptrauth-intrinsics. #65996


Merged

Conversation

ahmedbougacha

@ahmedbougacha ahmedbougacha commented Sep 11, 2023

This defines the basic set of pointer authentication clang builtins
(provided in a new header, ptrauth.h), with diagnostics and IRGen
support. The availability of the builtins is gated on a new flag,
-fptrauth-intrinsics.

Note that this only includes the basic intrinsics, and notably excludes
ptrauth_sign_constant, ptrauth_type_discriminator, and
ptrauth_string_discriminator, which need extra logic to be fully
supported.

This also introduces clang/docs/PointerAuthentication.rst, which
describes the ptrauth model in general, in addition to these builtins.

Co-Authored-By: Akira Hatanaka
Co-Authored-By: John McCall

@ahmedbougacha added the clang:driver, clang:frontend, clang:headers, and clang:codegen labels Sep 11, 2023
@ahmedbougacha requested review from a team as code owners September 11, 2023 18:52
@llvmbot added the clang, backend:AArch64, backend:X86, and clang:modules labels Sep 11, 2023
@llvmbot

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-clang-codegen

Changes

This defines the basic set of pointer authentication clang builtins (provided in a new header, ptrauth.h), with diagnostics and IRGen support. The availability of the builtins is gated on a new flag, -fptrauth-intrinsics.

Note that this only includes the basic intrinsics, and notably excludes ptrauth_sign_constant, ptrauth_type_discriminator, and ptrauth_string_discriminator, which need extra logic to be fully supported.

This also introduces clang/docs/PointerAuthentication.rst, which describes the ptrauth model in general, as well as these builtins.

(Replaces https://reviews.llvm.org/D112941)

Patch is 93.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65996.diff

30 Files Affected:

  • (modified) clang/docs/LanguageExtensions.rst (+5)
  • (added) clang/docs/PointerAuthentication.rst (+548)
  • (modified) clang/include/clang/Basic/Builtins.def (+8)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+16)
  • (modified) clang/include/clang/Basic/Features.def (+1)
  • (modified) clang/include/clang/Basic/LangOptions.def (+2)
  • (modified) clang/include/clang/Basic/TargetInfo.h (+6)
  • (modified) clang/include/clang/Driver/Options.td (+8)
  • (modified) clang/include/clang/Sema/Sema.h (+2)
  • (modified) clang/lib/Basic/Module.cpp (+4)
  • (modified) clang/lib/Basic/TargetInfo.cpp (+4)
  • (modified) clang/lib/Basic/Targets/AArch64.cpp (+6)
  • (modified) clang/lib/Basic/Targets/AArch64.h (+2)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+67)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+13)
  • (modified) clang/lib/Headers/CMakeLists.txt (+1)
  • (modified) clang/lib/Headers/module.modulemap (+5)
  • (added) clang/lib/Headers/ptrauth.h (+167)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+182)
  • (added) clang/test/CodeGen/ptrauth-intrinsics.c (+73)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/module.modulemap (+8)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/ptrauth.h (+1)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/stddef.h (+1)
  • (added) clang/test/Modules/ptrauth-include-from-darwin.m (+6)
  • (added) clang/test/Preprocessor/ptrauth_feature.c (+10)
  • (added) clang/test/Sema/ptrauth-intrinsics-macro.c (+34)
  • (added) clang/test/Sema/ptrauth.c (+126)
  • (modified) llvm/docs/PointerAuth.md (+3)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 11cbdca7a268fc3..49a3934d9d082fc 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -13,6 +13,7 @@ Clang Language Extensions
    BlockLanguageSpec
    Block-ABI-Apple
    AutomaticReferenceCounting
+   PointerAuthentication
    MatrixTypes
 
 Introduction
@@ -4157,6 +4158,10 @@ reordering of memory accesses and side effect instructions. Other instructions
 like simple arithmetic may be reordered around the intrinsic. If you expect to
 have no reordering at all, use inline assembly instead.
 
+Pointer Authentication
+^^^^^^^^^^^^^^^^^^^^^^
+See :doc:`PointerAuthentication`.
+
 X86/X86-64 Language Extensions
 ------------------------------
 
diff --git a/clang/docs/PointerAuthentication.rst b/clang/docs/PointerAuthentication.rst
new file mode 100644
index 000000000000000..87b8f244a2e4653
--- /dev/null
+++ b/clang/docs/PointerAuthentication.rst
@@ -0,0 +1,548 @@
+Pointer Authentication
+======================
+
+.. contents::
+   :local:
+
+Introduction
+------------
+
+Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution.  When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
+
+While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs.  Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically).  This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI.  Considerations that are implementation-specific are clearly identified throughout.
+
+Note that there are several different terms in use:
+
+- **Pointer authentication** is a target-independent language technology.
+
+- **ARMv8.3** is an AArch64 architecture revision that provides hardware support for pointer authentication.  It is implemented on several shipping processors, including the Apple A12 and later.
+
+- **arm64e** is a specific ABI (not yet fully stable) for implementing pointer authentication on ARMv8.3 on certain Apple operating systems.
+
+This document serves four purposes:
+
+- It describes the basic ideas of pointer authentication.
+
+- It documents several language extensions that are useful on targets using pointer authentication.
+
+- It presents a theory of operation for the security mitigation, describing the basic requirements for correctness, various weaknesses in the mechanism, and ways in which programmers can strengthen its protections (including recommendations for language implementors).
+
+- It will eventually document the language ABIs currently used for C, C++, Objective-C, and Swift on arm64e, although these are not yet stable on any target.
+
+Basic Concepts
+--------------
+
+The simple address of an object or function is a **raw pointer**.  A raw pointer can be **signed** to produce a **signed pointer**.  A signed pointer can then be **authenticated** in order to verify that it was **validly signed** and extract the original raw pointer.  These terms reflect the most likely implementation technique: computing and storing a cryptographic signature along with the pointer.  The security of pointer authentication does not rely on attackers not being able to separately overwrite the signature.
+
+An **abstract signing key** is a name which refers to a secret key which can be used to sign and authenticate pointers.  The key value for a particular name is consistent throughout a process.
+
+A **discriminator** is an arbitrary value used to **diversify** signed pointers so that one validly-signed pointer cannot simply be copied over another.  A discriminator is simply opaque data of some implementation-defined size that is included in the signature as a salt.
+
+Nearly all aspects of pointer authentication use just these two primary operations:
+
+- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given a raw pointer, an abstract signing key, and a discriminator.
+
+- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given a signed pointer, an abstract signing key, and a discriminator.
+
+``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must succeed and produce ``raw_pointer``.  ``auth`` applied to a value that was ultimately produced in any other way is expected to immediately halt the program.  However, it is permitted for ``auth`` to fail to detect that a signed pointer was not produced in this way, in which case it may return anything; this is what makes pointer authentication a probabilistic mitigation rather than a perfect one.
+
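As an illustration of the `sign`/`auth`/`strip` contract described above, here is a minimal software sketch. It is a toy model, not the ARMv8.3 algorithm: it assumes raw pointers fit in the low 48 bits, keeps a 16-bit signature in the high bits, and uses a non-cryptographic mixing function. All `toy_*` names are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: raw pointers occupy the low 48 bits; the rest holds the signature. */
#define TOY_ADDR_MASK UINT64_C(0x0000FFFFFFFFFFFF)

/* Bit-diffusion step (splitmix64-style finalizer). NOT cryptographic. */
static uint64_t toy_mix(uint64_t x) {
  x ^= x >> 30;
  x *= UINT64_C(0xBF58476D1CE4E5B9);
  x ^= x >> 27;
  x *= UINT64_C(0x94D049BB133111EB);
  x ^= x >> 31;
  return x;
}

/* 16-bit signature over the raw pointer, the key data, and the discriminator. */
static uint64_t toy_signature(uint64_t raw, uint64_t key, uint64_t disc) {
  return toy_mix(toy_mix(raw ^ key) ^ disc) >> 48;
}

/* sign(raw_pointer, key, discriminator) */
uint64_t toy_sign(uint64_t raw, uint64_t key, uint64_t disc) {
  raw &= TOY_ADDR_MASK;
  return raw | (toy_signature(raw, key, disc) << 48);
}

/* strip(signed_pointer, key): drop the signature without checking it. */
uint64_t toy_strip(uint64_t signed_ptr) {
  return signed_ptr & TOY_ADDR_MASK;
}

/* True if the stored signature matches the recomputed one. */
bool toy_check(uint64_t signed_ptr, uint64_t key, uint64_t disc) {
  return (signed_ptr >> 48) == toy_signature(toy_strip(signed_ptr), key, disc);
}

/* auth(signed_pointer, key, discriminator): halt on a detected mismatch. */
uint64_t toy_auth(uint64_t signed_ptr, uint64_t key, uint64_t disc) {
  if (!toy_check(signed_ptr, key, disc))
    abort();
  return toy_strip(signed_ptr);
}
```

With only 16 signature bits, a forged value can still pass `toy_check` by chance, which is the probabilistic behavior the text describes.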
+There are two secondary operations which are required only to implement certain intrinsics in ``<ptrauth.h>``:
+
+- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer and a key it was presumptively signed with.  This is useful for certain kinds of tooling, such as crash backtraces; it should generally not be used in the basic language ABI except in very careful ways.
+
+- ``sign_generic(value)`` produces a cryptographic signature for arbitrary data, not necessarily a pointer.  This is useful for efficiently verifying that non-pointer data has not been tampered with.
+
+Whenever any of these operations is called for, the key value must be known statically.  This is because the layout of a signed pointer may vary according to the signing key.  (For example, in ARMv8.3, the layout of a signed pointer depends on whether TBI is enabled, which can be set independently for code and data pointers.)
+
+.. admonition:: Note for API designers and language implementors
+
+  These are the *primitive* operations of pointer authentication, provided for clarity of description.  They are not suitable either as high-level interfaces or as primitives in a compiler IR because they expose raw pointers.  Raw pointers require special attention in the language implementation to avoid the accidental creation of exploitable code sequences; see the section on `Attackable code sequences`_.
+
+The following details are all implementation-defined:
+
+- the nature of a signed pointer
+- the size of a discriminator
+- the number and nature of the signing keys
+- the implementation of the ``sign``, ``auth``, ``strip``, and ``sign_generic`` operations
+
+While the use of the terms "sign" and "signed pointer" suggests the use of a cryptographic signature, other implementations may be possible.  See `Alternative implementations`_ for an exploration of implementation options.
+
+.. admonition:: Implementation example: ARMv8.3
+
+  Readers may find it helpful to know how these terms map to ARMv8.3:
+
+  - A signed pointer is a pointer with a signature stored in the otherwise-unused high bits.  The kernel configures the signature width based on the system's addressing needs, accounting for whether the AArch64 TBI feature is enabled for the kind of pointer (code or data).
+
+  - A discriminator is a 64-bit integer.  Constant discriminators are 16-bit integers.  Blending a constant discriminator into an address consists of replacing the top 16 bits of the address with the constant.
+
+  - There are five 128-bit signing-key registers, each of which can only be directly read or set by privileged code.  Of these, four are used for signing pointers, and the fifth is used only for ``sign_generic``.  The key data is simply a pepper added to the hash, not an encryption key, and so can be initialized using random data.
+
+  - ``sign`` computes a cryptographic hash of the pointer, discriminator, and signing key, and stores it in the high bits as the signature. ``auth`` removes the signature, computes the same hash, and compares the result with the stored signature.  ``strip`` removes the signature without authenticating it.  While ARMv8.3's ``aut*`` instructions do not themselves trap on failure, the compiler only ever emits them in sequences that will trap.
+
+  - ``sign_generic`` corresponds to the ``pacga`` instruction, which takes two 64-bit values and produces a 64-bit cryptographic hash. Implementations of this instruction may not produce meaningful data in all bits of the result.
+
+Discriminators
+~~~~~~~~~~~~~~
+
+A discriminator is arbitrary extra data which alters the signature on a pointer.  When two pointers are signed differently --- either with different keys or with different discriminators --- an attacker cannot simply replace one pointer with the other.  For more information on why discriminators are important and how to use them effectively, see the section on `Substitution attacks`_.
+
+To use standard cryptographic terminology, a discriminator acts as a salt in the signing of a pointer, and the key data acts as a pepper.  That is, both the discriminator and key data are ultimately just added as inputs to the signing algorithm along with the pointer, but they serve significantly different roles.  The key data is a common secret added to every signature, whereas the discriminator is a signing-specific value that can be derived from the circumstances of how a pointer is signed.  However, unlike a password salt, it's important that discriminators be *independently* derived from the circumstances of the signing; they should never simply be stored alongside a pointer.
+
+The intrinsic interface in ``<ptrauth.h>`` allows an arbitrary discriminator value to be provided, but can only be used when running normal code.  The discriminators used by language ABIs must be restricted to make it feasible for the loader to sign pointers stored in global memory without needing excessive amounts of metadata.  Under these restrictions, a discriminator may consist of either or both of the following:
+
+- The address at which the pointer is stored in memory.  A pointer signed with a discriminator which incorporates its storage address is said to have **address diversity**.  In general, using address diversity means that a pointer cannot be reliably replaced by an attacker or used to reliably replace a different pointer.  However, an attacker may still be able to attack a larger call sequence if they can alter the address through which the pointer is accessed.  Furthermore, some situations cannot use address diversity because of language or other restrictions.
+
+- A constant integer, called a **constant discriminator**. A pointer signed with a non-zero constant discriminator is said to have **constant diversity**.  If the discriminator is specific to a single declaration, it is said to have **declaration diversity**; if the discriminator is specific to a type of value, it is said to have **type diversity**.  For example, C++ v-tables on arm64e sign their component functions using a hash of their method names and signatures, which provides declaration diversity; similarly, C++ member function pointers sign their invocation functions using a hash of the member pointer type, which provides type diversity.
+
+The implementation may need to restrict constant discriminators to be significantly smaller than the full size of a discriminator.  For example, on arm64e, constant discriminators are only 16-bit values.  This is believed to not significantly weaken the mitigation, since collisions remain uncommon.
+
+The algorithm for blending a constant discriminator with a storage address is implementation-defined.
+
+.. _Signing schemas:
+
+Signing schemas
+~~~~~~~~~~~~~~~
+
+Correct use of pointer authentication requires the signing code and the authenticating code to agree about the **signing schema** for the pointer:
+
+- the abstract signing key with which the pointer should be signed and
+- an algorithm for computing the discriminator.
+
+As described in the section above on `Discriminators`_, in most situations, the discriminator is produced by taking a constant discriminator and optionally blending it with the storage address of the pointer.  In these situations, the signing schema breaks down even more simply:
+
+- the abstract signing key,
+- a constant discriminator, and
+- whether to use address diversity.
+
+It is important that the signing schema be independently derived at all signing and authentication sites.  Preferably, the schema should be hard-coded everywhere it is needed, but at the very least, it must not be derived by inspecting information stored along with the pointer.  See the section on `Attacks on pointer authentication`_ for more information.
+
+Language Features
+-----------------
+
+There is currently one main pointer authentication language feature:
+
+- The language provides the ``<ptrauth.h>`` intrinsic interface for manually signing and authenticating pointers in code.  These can be used in circumstances where very specific behavior is required.
+
+
+Language extensions
+~~~~~~~~~~~~~~~~~~~
+
+Feature testing
+^^^^^^^^^^^^^^^
+
+Whether the current target uses pointer authentication can be tested for with a number of different tests.
+
+- ``__has_feature(ptrauth_intrinsics)`` is true if ``<ptrauth.h>`` provides its normal interface.  This may be true even on targets where pointer authentication is not enabled by default.
+
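A possible way to use the feature test in portable code is sketched below. The macro `HAVE_PTRAUTH_INTRINSICS` and the function `have_ptrauth_intrinsics` are hypothetical names for this example; the fallback definition of `__has_feature` is the usual idiom for compilers that do not provide it.

```c
/* Compilers without __has_feature are treated as lacking every feature. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(ptrauth_intrinsics)
#define HAVE_PTRAUTH_INTRINSICS 1
#else
#define HAVE_PTRAUTH_INTRINSICS 0
#endif

/* Reports at run time what was detected at compile time. */
int have_ptrauth_intrinsics(void) {
  return HAVE_PTRAUTH_INTRINSICS;
}
```

Per the description above, a result of 1 says only that the intrinsics are available, not that the target ABI signs pointers by default.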
+``<ptrauth.h>``
+~~~~~~~~~~~~~~~
+
+This header defines the following types and operations:
+
+``ptrauth_key``
+^^^^^^^^^^^^^^^
+
+This ``enum`` is the type of abstract signing keys.  In addition to defining the set of implementation-specific signing keys (for example, ARMv8.3 defines ``ptrauth_key_asia``), it also defines some portable aliases for those keys.  For example, ``ptrauth_key_function_pointer`` is the key generally used for C function pointers, which will generally be suitable for other function-signing schemas.
+
+In all the operation descriptions below, key values must be constant values corresponding to one of the implementation-specific abstract signing keys from this ``enum``.
+
+``ptrauth_extra_data_t``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a ``typedef`` of a standard integer type of the correct size to hold a discriminator value.
+
+In the signing and authentication operation descriptions below, discriminator values must have either pointer type or integer type. If the discriminator is an integer, it will be coerced to ``ptrauth_extra_data_t``.
+
+``ptrauth_blend_discriminator``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_blend_discriminator(pointer, integer)
+
+Produce a discriminator value which blends information from the given pointer and the given integer.
+
+Implementations may ignore some bits from each value, which is to say, the blending algorithm may be chosen for speed and convenience over theoretical strength as a hash-combining algorithm.  For example, arm64e simply overwrites the high 16 bits of the pointer with the low 16 bits of the integer, which can be done in a single instruction with an immediate integer.
+
+``pointer`` must have pointer type, and ``integer`` must have integer type. The result has type ``ptrauth_extra_data_t``.
+
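The arm64e blend described above (the low 16 bits of the integer overwrite the high 16 bits of the pointer) can be modeled in plain C as follows. This is a software sketch of that description only; `model_blend_discriminator` is a hypothetical name, and other targets may blend differently.

```c
#include <stdint.h>

/* Keep the low 48 bits of the address and place the low 16 bits of the
   integer in the high 16 bits, as the text describes for arm64e. */
uint64_t model_blend_discriminator(uint64_t address, uint64_t integer) {
  uint64_t low48 = address & UINT64_C(0x0000FFFFFFFFFFFF);
  uint64_t high16 = (integer & UINT64_C(0xFFFF)) << 48;
  return low48 | high16;
}
```

For example, blending address `0x12345678` with the constant `0xABCD` yields `0xABCD000012345678` in this model.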
+``ptrauth_strip``
+^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_strip(signedPointer, key)
+
+Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it.  This operation does not trap and cannot fail, even if the pointer is not validly signed.
+
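A sketch of how tooling such as a crash reporter might call `ptrauth_strip` behind a feature guard is shown below. The helper `address_for_backtrace` and the macro `STRIP_CODE_ADDRESS` are hypothetical, the choice of `ptrauth_key_asia` is an assumption about which key signed the address, and the identity fallback is an assumption for targets where the intrinsics are unavailable.

```c
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(ptrauth_intrinsics)
#include <ptrauth.h>
/* Assumption: the address was signed with the ARMv8.3 key named in the text. */
#define STRIP_CODE_ADDRESS(p) ptrauth_strip((p), ptrauth_key_asia)
#else
/* Assumed fallback: no signatures exist, so stripping is the identity. */
#define STRIP_CODE_ADDRESS(p) (p)
#endif

/* Returns an address suitable for symbolication; performs no authentication. */
void *address_for_backtrace(void *maybe_signed) {
  return STRIP_CODE_ADDRESS(maybe_signed);
}
```

Because stripping never authenticates, the result should be used for display or lookup only, never as a call target.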
+``ptrauth_sign_unauthenticated``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_unauthenticated(pointer, key, discriminator)
+
+Produce a signed pointer for the given raw pointer without applying any authentication or extra treatment.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+This is a treacherous operation that can easily result in `signing oracles`_.  Programs should use it seldom and carefully.
+
+``ptrauth_auth_and_resign``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_and_resign(pointer, oldKey, oldDiscriminator, newKey, newDiscriminator)
+
+Authenticate that ``pointer`` is signed with ``oldKey`` and ``oldDiscriminator`` and then resign the raw-pointer result of that authentication with ``newKey`` and ``newDiscriminator``.
+
+``pointer`` must have pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+The code sequence produced for this operation must not be directly attackable.  However, if the discriminator values are not constant integers, their computations may still be attackable.  In the future, Clang should be enhanced to guarantee non-attackability if these expressions are :ref:`safely-derived`.
+
+``ptrauth_auth_data``
+^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_data(pointer, key, discriminator)
+
+Authenticate that ``pointer`` is signed with ``key`` and ``discriminator`` and remove the signature.
+
+``pointer`` must have object pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+In the future when Clang makes `safe derivation`_ guarantees, the result of this operation should be considered safely-derived.
+
+``ptrauth_sign_generic_data``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_generic_data(value1, value2)
+
+Computes a signature for the given pair of values, incorporating a secret signing key.
+
+This operation can be used to verify that arbitrary data has not been tampered with by computing a signature for the data, storing that signature, and then repeating this process and verifying that it yields the same result.  This can be reasonably done in any number of ways; for example, a library could compute an ordinary checksum of the data and just sign the result in order to get the tamper-resistance advantages of the secret signing key (since otherwise an attacker could reliably overwrite both the data and the checksum).
+
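The checksum-then-sign approach mentioned above could look roughly like this. Everything here is a stand-in: `toy_sign_generic` is a hypothetical, non-cryptographic substitute for `ptrauth_sign_generic_data`, with a hard-coded constant in place of the secret hardware key, and FNV-1a is used simply as an example of an ordinary checksum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the secret key; real key data is not visible to the program. */
static const uint64_t TOY_SECRET_KEY = UINT64_C(0x9E3779B97F4A7C15);

/* Stand-in for ptrauth_sign_generic_data(value1, value2). NOT cryptographic. */
static uint64_t toy_sign_generic(uint64_t value1, uint64_t value2) {
  uint64_t x = value1 ^ TOY_SECRET_KEY;
  x ^= x >> 33;
  x *= UINT64_C(0xFF51AFD7ED558CCD);
  x ^= value2;
  x ^= x >> 33;
  x *= UINT64_C(0xC4CEB9FE1A85EC53);
  x ^= x >> 33;
  return x;
}

/* Ordinary 64-bit FNV-1a checksum over a byte buffer. */
static uint64_t fnv1a(const unsigned char *data, size_t len) {
  uint64_t h = UINT64_C(0xCBF29CE484222325);
  for (size_t i = 0; i < len; i++) {
    h ^= data[i];
    h *= UINT64_C(0x100000001B3);
  }
  return h;
}

/* Compute the value to store alongside the data. */
uint64_t toy_seal(const void *data, size_t len) {
  return toy_sign_generic(fnv1a(data, len), (uint64_t)len);
}

/* Recompute the signature and compare it with the stored one. */
bool toy_verify(const void *data, size_t len, uint64_t stored) {
  return toy_seal(data, len) == stored;
}
```

An attacker who can overwrite both the data and the stored value still cannot produce a matching value without the key, which is the advantage the paragraph above attributes to signing the checksum.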
+``value1`` and ``value2`` must be either pointers or integers.  If the integers are larger than ``uintptr_t`` then data not representa...

@llvmbot

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-clang-driver

Changes

This defines the basic set of pointer authentication clang builtins (provided in a new header, ptrauth.h), with diagnostics and IRGen support. The availability of the builtins is gated on a new flag, -fptrauth-intrinsics.

Note that this only includes the basic intrinsics, and notably excludes ptrauth_sign_constant, ptrauth_type_discriminator, and ptrauth_string_discriminator, which need extra logic to be fully supported.

This also introduces clang/docs/PointerAuthentication.rst, which describes the ptrauth model in general, as well as these builtins.

(Replaces https://reviews.llvm.org/D112941)

Patch is 93.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65996.diff

30 Files Affected:

  • (modified) clang/docs/LanguageExtensions.rst (+5)
  • (added) clang/docs/PointerAuthentication.rst (+548)
  • (modified) clang/include/clang/Basic/Builtins.def (+8)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+16)
  • (modified) clang/include/clang/Basic/Features.def (+1)
  • (modified) clang/include/clang/Basic/LangOptions.def (+2)
  • (modified) clang/include/clang/Basic/TargetInfo.h (+6)
  • (modified) clang/include/clang/Driver/Options.td (+8)
  • (modified) clang/include/clang/Sema/Sema.h (+2)
  • (modified) clang/lib/Basic/Module.cpp (+4)
  • (modified) clang/lib/Basic/TargetInfo.cpp (+4)
  • (modified) clang/lib/Basic/Targets/AArch64.cpp (+6)
  • (modified) clang/lib/Basic/Targets/AArch64.h (+2)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+67)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+13)
  • (modified) clang/lib/Headers/CMakeLists.txt (+1)
  • (modified) clang/lib/Headers/module.modulemap (+5)
  • (added) clang/lib/Headers/ptrauth.h (+167)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+182)
  • (added) clang/test/CodeGen/ptrauth-intrinsics.c (+73)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/module.modulemap (+8)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/ptrauth.h (+1)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/stddef.h (+1)
  • (added) clang/test/Modules/ptrauth-include-from-darwin.m (+6)
  • (added) clang/test/Preprocessor/ptrauth_feature.c (+10)
  • (added) clang/test/Sema/ptrauth-intrinsics-macro.c (+34)
  • (added) clang/test/Sema/ptrauth.c (+126)
  • (modified) llvm/docs/PointerAuth.md (+3)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 11cbdca7a268fc3..49a3934d9d082fc 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -13,6 +13,7 @@ Clang Language Extensions
    BlockLanguageSpec
    Block-ABI-Apple
    AutomaticReferenceCounting
+   PointerAuthentication
    MatrixTypes
 
 Introduction
@@ -4157,6 +4158,10 @@ reordering of memory accesses and side effect instructions. Other instructions
 like simple arithmetic may be reordered around the intrinsic. If you expect to
 have no reordering at all, use inline assembly instead.
 
+Pointer Authentication
+^^^^^^^^^^^^^^^^^^^^^^
+See :doc:`PointerAuthentication`.
+
 X86/X86-64 Language Extensions
 ------------------------------
 
diff --git a/clang/docs/PointerAuthentication.rst b/clang/docs/PointerAuthentication.rst
new file mode 100644
index 000000000000000..87b8f244a2e4653
--- /dev/null
+++ b/clang/docs/PointerAuthentication.rst
@@ -0,0 +1,548 @@
+Pointer Authentication
+======================
+
+.. contents::
+   :local:
+
+Introduction
+------------
+
+Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution.  When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
+
+While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs.  Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically).  This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI.  Considerations that are implementation-specific are clearly identified throughout.
+
+Note that there are several different terms in use:
+
+- **Pointer authentication** is a target-independent language technology.
+
+- **ARMv8.3** is an AArch64 architecture revision of that provides hardware support for pointer authentication.  It is implemented on several shipping processors, including the Apple A12 and later.
+
+* **arm64e** is a specific ABI for (not yet fully stable) for implementing pointer authentication on ARMv8.3 on certain Apple operating systems.
+
+This document serves four purposes:
+
+- It describes the basic ideas of pointer authentication.
+
+- It documents several language extensions that are useful on targets using pointer authentication.
+
+- It presents a theory of operation for the security mitigation, describing the basic requirements for correctness, various weaknesses in the mechanism, and ways in which programmers can strengthen its protections (including recommendations for language implementors).
+
+- It will eventually document the language ABIs currently used for C, C++, Objective-C, and Swift on arm64e, although these are not yet stable on any target.
+
+Basic Concepts
+--------------
+
+The simple address of an object or function is a **raw pointer**.  A raw pointer can be **signed** to produce a **signed pointer**.  A signed pointer can be then **authenticated** in order to verify that it was **validly signed** and extract the original raw pointer.  These terms reflect the most likely implementation technique: computing and storing a cryptographic signature along with the pointer.  The security of pointer authentication does not rely on attackers not being able to separately overwrite the signature.
+
+An **abstract signing key** is a name which refers to a secret key which can used to sign and authenticate pointers.  The key value for a particular name is consistent throughout a process.
+
+A **discriminator** is an arbitrary value used to **diversify** signed pointers so that one validly-signed pointer cannot simply be copied over another.  A discriminator is simply opaque data of some implementation-defined size that is included in the signature as a salt.
+
+Nearly all aspects of pointer authentication use just these two primary operations:
+
+- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given a raw pointer, an abstract signing key, and a discriminator.
+
+- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given a signed pointer, an abstract signing key, and a discriminator.
+
+``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must succeed and produce ``raw_pointer``.  ``auth`` applied to a value that was ultimately produced in any other way is expected to immediately halt the program.  However, it is permitted for ``auth`` to fail to detect that a signed pointer was not produced in this way, in which case it may return anything; this is what makes pointer authentication a probabilistic mitigation rather than a perfect one.
+
+There are two secondary operations which are required only to implement certain intrinsics in ````:
+
+- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer and a key it was presumptively signed with.  This is useful for certain kinds of tooling, such as crash backtraces; it should generally not be used in the basic language ABI except in very careful ways.
+
+- ``sign_generic(value)`` produces a cryptographic signature for arbitrary data, not necessarily a pointer.  This is useful for efficiently verifying that non-pointer data has not been tampered with.
+
+Whenever any of these operations is called for, the key value must be known statically.  This is because the layout of a signed pointer may vary according to the signing key.  (For example, in ARMv8.3, the layout of a signed pointer depends on whether TBI is enabled, which can be set independently for code and data pointers.)
+
+.. admonition:: Note for API designers and language implementors
+
+  These are the *primitive* operations of pointer authentication, provided for clarity of description.  They are not suitable either as high-level interfaces or as primitives in a compiler IR because they expose raw pointers.  Raw pointers require special attention in the language implementation to avoid the accidental creation of exploitable code sequences; see the section on `Attackable code sequences`_.
+
+The following details are all implementation-defined:
+
+- the nature of a signed pointer
+- the size of a discriminator
+- the number and nature of the signing keys
+- the implementation of the ``sign``, ``auth``, ``strip``, and ``sign_generic`` operations
+
+While the use of the terms "sign" and "signed pointer" suggests the use of a cryptographic signature, other implementations may be possible.  See `Alternative implementations`_ for an exploration of implementation options.
+
+.. admonition:: Implementation example: ARMv8.3
+
+  Readers may find it helpful to know how these terms map to ARMv8.3:
+
+  - A signed pointer is a pointer with a signature stored in the otherwise-unused high bits.  The kernel configures the signature width based on the system's addressing needs, accounting for whether the AArch64 TBI feature is enabled for the kind of pointer (code or data).
+
+  - A discriminator is a 64-bit integer.  Constant discriminators are 16-bit integers.  Blending a constant discriminator into an address consists of replacing the top 16 bits of the address with the constant.
+
+  - There are five 128-bit signing-key registers, each of which can only be directly read or set by privileged code.  Of these, four are used for signing pointers, and the fifth is used only for ``sign_generic``.  The key data is simply a pepper added to the hash, not an encryption key, and so can be initialized using random data.
+
+  - ``sign`` computes a cryptographic hash of the pointer, discriminator, and signing key, and stores it in the high bits as the signature. ``auth`` removes the signature, computes the same hash, and compares the result with the stored signature.  ``strip`` removes the signature without authenticating it.  While ARMv8.3's ``aut*`` instructions do not themselves trap on failure, the compiler only ever emits them in sequences that will trap.
+
+  - ``sign_generic`` corresponds to the ``pacga`` instruction, which takes two 64-bit values and produces a 64-bit cryptographic hash. Implementations of this instruction may not produce meaningful data in all bits of the result.
+
+Discriminators
+~~~~~~~~~~~~~~
+
+A discriminator is arbitrary extra data which alters the signature on a pointer.  When two pointers are signed differently --- either with different keys or with different discriminators --- an attacker cannot simply replace one pointer with the other.  For more information on why discriminators are important and how to use them effectively, see the section on `Substitution attacks`_.
+
+To use standard cryptographic terminology, a discriminator acts as a salt in the signing of a pointer, and the key data acts as a pepper.  That is, both the discriminator and key data are ultimately just added as inputs to the signing algorithm along with the pointer, but they serve significantly different roles.  The key data is a common secret added to every signature, whereas the discriminator is a signing-specific value that can be derived from the circumstances of how a pointer is signed.  However, unlike a password salt, it's important that discriminators be *independently* derived from the circumstances of the signing; they should never simply be stored alongside a pointer.
+
+The intrinsic interface in ``<ptrauth.h>`` allows an arbitrary discriminator value to be provided, but can only be used when running normal code.  The discriminators used by language ABIs must be restricted to make it feasible for the loader to sign pointers stored in global memory without needing excessive amounts of metadata.  Under these restrictions, a discriminator may consist of either or both of the following:
+
+- The address at which the pointer is stored in memory.  A pointer signed with a discriminator which incorporates its storage address is said to have **address diversity**.  In general, using address diversity means that a pointer cannot be reliably replaced by an attacker or used to reliably replace a different pointer.  However, an attacker may still be able to attack a larger call sequence if they can alter the address through which the pointer is accessed.  Furthermore, some situations cannot use address diversity because of language or other restrictions.
+
+- A constant integer, called a **constant discriminator**. A pointer signed with a non-zero constant discriminator is said to have **constant diversity**.  If the discriminator is specific to a single declaration, it is said to have **declaration diversity**; if the discriminator is specific to a type of value, it is said to have **type diversity**.  For example, C++ v-tables on arm64e sign their component functions using a hash of their method names and signatures, which provides declaration diversity; similarly, C++ member function pointers sign their invocation functions using a hash of the member pointer type, which provides type diversity.
+
+The implementation may need to restrict constant discriminators to be significantly smaller than the full size of a discriminator.  For example, on arm64e, constant discriminators are only 16-bit values.  This is believed to not significantly weaken the mitigation, since collisions remain uncommon.
+
+The algorithm for blending a constant discriminator with a storage address is implementation-defined.
+
+.. _Signing schemas:
+
+Signing schemas
+~~~~~~~~~~~~~~~
+
+Correct use of pointer authentication requires the signing code and the authenticating code to agree about the **signing schema** for the pointer:
+
+- the abstract signing key with which the pointer should be signed and
+- an algorithm for computing the discriminator.
+
+As described in the section above on `Discriminators`_, in most situations, the discriminator is produced by taking a constant discriminator and optionally blending it with the storage address of the pointer.  In these situations, the signing schema breaks down even more simply:
+
+- the abstract signing key,
+- a constant discriminator, and
+- whether to use address diversity.
+
+It is important that the signing schema be independently derived at all signing and authentication sites.  Preferably, the schema should be hard-coded everywhere it is needed, but at the very least, it must not be derived by inspecting information stored along with the pointer.  See the section on `Attacks on pointer authentication`_ for more information.
+
+Language Features
+-----------------
+
+There is currently one main pointer authentication language feature:
+
+- The language provides the ``<ptrauth.h>`` intrinsic interface for manually signing and authenticating pointers in code.  These intrinsics can be used in circumstances where very specific behavior is required.
+
+
+Language extensions
+~~~~~~~~~~~~~~~~~~~
+
+Feature testing
+^^^^^^^^^^^^^^^
+
+Whether the current target uses pointer authentication can be tested for with a number of different tests.
+
+- ``__has_feature(ptrauth_intrinsics)`` is true if ``<ptrauth.h>`` provides its normal interface.  This may be true even on targets where pointer authentication is not enabled by default.
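+
+A minimal sketch of how code might guard its use of the header with this test.  The macro ``HAVE_PTRAUTH_INTRINSICS`` and the helper ``have_ptrauth_intrinsics`` are hypothetical names chosen for this example, and the ``__has_feature`` fallback is an assumption that lets the code build with compilers that do not provide that macro.
+
+.. code-block:: c
+
+  /* Compilers without __has_feature: treat every feature as absent. */
+  #ifndef __has_feature
+  #define __has_feature(x) 0
+  #endif
+
+  #if __has_feature(ptrauth_intrinsics)
+  #include <ptrauth.h>
+  #define HAVE_PTRAUTH_INTRINSICS 1
+  #else
+  #define HAVE_PTRAUTH_INTRINSICS 0
+  #endif
+
+  /* Hypothetical helper: reports whether the intrinsics were available
+     when this file was compiled. */
+  int have_ptrauth_intrinsics(void) { return HAVE_PTRAUTH_INTRINSICS; }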
+
+``<ptrauth.h>``
+~~~~~~~~~~~~~~~
+
+This header defines the following types and operations:
+
+``ptrauth_key``
+^^^^^^^^^^^^^^^
+
+This ``enum`` is the type of abstract signing keys.  In addition to defining the set of implementation-specific signing keys (for example, ARMv8.3 defines ``ptrauth_key_asia``), it also defines some portable aliases for those keys.  For example, ``ptrauth_key_function_pointer`` is the key generally used for C function pointers, which will generally be suitable for other function-signing schemas.
+
+In all the operation descriptions below, key values must be constant values corresponding to one of the implementation-specific abstract signing keys from this ``enum``.
+
+``ptrauth_extra_data_t``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a ``typedef`` of a standard integer type of the correct size to hold a discriminator value.
+
+In the signing and authentication operation descriptions below, discriminator values must have either pointer type or integer type. If the discriminator is an integer, it will be coerced to ``ptrauth_extra_data_t``.
+
+``ptrauth_blend_discriminator``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_blend_discriminator(pointer, integer)
+
+Produce a discriminator value which blends information from the given pointer and the given integer.
+
+Implementations may ignore some bits from each value, which is to say, the blending algorithm may be chosen for speed and convenience over theoretical strength as a hash-combining algorithm.  For example, arm64e simply overwrites the high 16 bits of the pointer with the low 16 bits of the integer, which can be done in a single instruction with an immediate integer.
+
+``pointer`` must have pointer type, and ``integer`` must have integer type. The result has type ``ptrauth_extra_data_t``.
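+
+The arm64e behavior described above can be modeled in portable C.  This is a sketch for illustration only, not the compiler's implementation; ``model_blend`` is a hypothetical name, and real code would call ``ptrauth_blend_discriminator`` instead.
+
+.. code-block:: c
+
+  #include <stdint.h>
+
+  /* Replace the high 16 bits of the address with the low 16 bits of the
+     constant; all other bits of the constant are ignored. */
+  uint64_t model_blend(uint64_t address, uint64_t constant) {
+    return (address & 0x0000FFFFFFFFFFFFull) | ((constant & 0xFFFFull) << 48);
+  }
+
+For example, blending the address ``0x0000123456789ABC`` with the constant ``0xBEEF`` yields ``0xBEEF123456789ABC`` in this model.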
+
+``ptrauth_strip``
+^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_strip(signedPointer, key)
+
+Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it.  This operation does not trap and cannot fail, even if the pointer is not validly signed.
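+
+A sketch of the kind of tooling use this operation is intended for: recovering a raw function address for diagnostic output.  The helper ``raw_function_address`` and the function ``sample_callback`` are hypothetical names, and the fallback branch assumes a build where the intrinsics are unavailable.
+
+.. code-block:: c
+
+  #include <stdint.h>
+
+  #ifndef __has_feature
+  #define __has_feature(x) 0
+  #endif
+
+  #if __has_feature(ptrauth_intrinsics)
+  #include <ptrauth.h>
+  #endif
+
+  /* Hypothetical helper: return the raw address of a function pointer,
+     for logging or symbolication only.  The result must not be called. */
+  uintptr_t raw_function_address(void (*fn)(void)) {
+  #if __has_feature(ptrauth_intrinsics)
+    return (uintptr_t)ptrauth_strip(fn, ptrauth_key_function_pointer);
+  #else
+    return (uintptr_t)fn; /* intrinsics unavailable: use the value as-is */
+  #endif
+  }
+
+  /* Sample function used only to demonstrate the helper. */
+  void sample_callback(void) {}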
+
+``ptrauth_sign_unauthenticated``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_unauthenticated(pointer, key, discriminator)
+
+Produce a signed pointer for the given raw pointer without applying any authentication or extra treatment.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+This is a treacherous operation that can easily result in `signing oracles`_.  Programs should use it seldom and carefully.
+
+``ptrauth_auth_and_resign``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_and_resign(pointer, oldKey, oldDiscriminator, newKey, newDiscriminator)
+
+Authenticate that ``pointer`` is signed with ``oldKey`` and ``oldDiscriminator`` and then resign the raw-pointer result of that authentication with ``newKey`` and ``newDiscriminator``.
+
+``pointer`` must have pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+The code sequence produced for this operation must not be directly attackable.  However, if the discriminator values are not constant integers, their computations may still be attackable.  In the future, Clang should be enhanced to guarantee non-attackability if these expressions are :ref:`safely-derived`.
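+
+As a hedged illustration, the following sketch moves a function pointer between two storage slots under a schema that uses address diversity, so the pointer has to be re-signed for its new address.  The names ``callback_t``, ``move_callback``, ``noop_callback``, and the constant ``CALLBACK_DISCRIMINATOR`` are hypothetical, and the sketch assumes the value in ``*src`` was previously signed under this same schema; otherwise the authentication is expected to fail.
+
+.. code-block:: c
+
+  #ifndef __has_feature
+  #define __has_feature(x) 0
+  #endif
+
+  #if __has_feature(ptrauth_intrinsics)
+  #include <ptrauth.h>
+  #endif
+
+  typedef void (*callback_t)(void);
+
+  /* Hypothetical constant discriminator for this schema. */
+  #define CALLBACK_DISCRIMINATOR 0x1234
+
+  /* Copy *src into *dst, re-signing it for the destination address. */
+  void move_callback(callback_t *dst, callback_t *src) {
+  #if __has_feature(ptrauth_intrinsics)
+    *dst = ptrauth_auth_and_resign(
+        *src, ptrauth_key_function_pointer,
+        ptrauth_blend_discriminator(src, CALLBACK_DISCRIMINATOR),
+        ptrauth_key_function_pointer,
+        ptrauth_blend_discriminator(dst, CALLBACK_DISCRIMINATOR));
+  #else
+    *dst = *src; /* intrinsics unavailable: plain copy */
+  #endif
+  }
+
+  /* Sample function used only to demonstrate the helper. */
+  void noop_callback(void) {}
+
+Both discriminators here are computed from slot addresses rather than constant integers, so the caveat above about attackable discriminator computations applies to this sketch.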
+
+``ptrauth_auth_data``
+^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_data(pointer, key, discriminator)
+
+Authenticate that ``pointer`` is signed with ``key`` and ``discriminator`` and remove the signature.
+
+``pointer`` must have object pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+In the future when Clang makes `safe derivation`_ guarantees, the result of this operation should be considered safely-derived.
+
+``ptrauth_sign_generic_data``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_generic_data(value1, value2)
+
+Computes a signature for the given pair of values, incorporating a secret signing key.
+
+This operation can be used to verify that arbitrary data has not been tampered with by computing a signature for the data, storing that signature, and then repeating this process and verifying that it yields the same result.  This can be reasonably done in any number of ways; for example, a library could compute an ordinary checksum of the data and just sign the result in order to get the tamper-resistance advantages of the secret signing key (since otherwise an attacker could reliably overwrite both the data and the checksum).
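+
+The checksum approach mentioned above might look like the following sketch.  The function ``signed_checksum`` and its simple multiplicative checksum are hypothetical choices for this example; the second argument of ``0`` is likewise an arbitrary choice.  Without the intrinsics, the fallback returns the bare checksum, which offers no tamper resistance.
+
+.. code-block:: c
+
+  #include <stddef.h>
+  #include <stdint.h>
+
+  #ifndef __has_feature
+  #define __has_feature(x) 0
+  #endif
+
+  #if __has_feature(ptrauth_intrinsics)
+  #include <ptrauth.h>
+  #endif
+
+  /* Compute an ordinary checksum of the data, then sign the result. */
+  uintptr_t signed_checksum(const unsigned char *data, size_t len) {
+    uintptr_t sum = 0;
+    for (size_t i = 0; i < len; ++i)
+      sum = sum * 31u + data[i];
+  #if __has_feature(ptrauth_intrinsics)
+    return (uintptr_t)ptrauth_sign_generic_data(sum, 0);
+  #else
+    return sum; /* no secret key involved: not tamper-resistant */
+  #endif
+  }
+
+A caller would store the returned value alongside the data and later recompute it; a mismatch indicates that the data or the stored value changed.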
+
+``value1`` and ``value2`` must be either pointers or integers.  If the integers are larger than ``uintptr_t`` then data not representa...

@llvmbot

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-backend-aarch64

Changes

This defines the basic set of pointer authentication clang builtins (provided in a new header, ptrauth.h), with diagnostics and IRGen support. The availability of the builtins is gated on a new flag, -fptrauth-intrinsics.

Note that this only includes the basic intrinsics, and notably excludes ptrauth_sign_constant, ptrauth_type_discriminator, and ptrauth_string_discriminator, which need extra logic to be fully supported.

This also introduces clang/docs/PointerAuthentication.rst, which describes the ptrauth model in general, as well as these builtins.

(Replaces https://reviews.llvm.org/D112941)

Patch is 93.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65996.diff

30 Files Affected:

  • (modified) clang/docs/LanguageExtensions.rst (+5)
  • (added) clang/docs/PointerAuthentication.rst (+548)
  • (modified) clang/include/clang/Basic/Builtins.def (+8)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+16)
  • (modified) clang/include/clang/Basic/Features.def (+1)
  • (modified) clang/include/clang/Basic/LangOptions.def (+2)
  • (modified) clang/include/clang/Basic/TargetInfo.h (+6)
  • (modified) clang/include/clang/Driver/Options.td (+8)
  • (modified) clang/include/clang/Sema/Sema.h (+2)
  • (modified) clang/lib/Basic/Module.cpp (+4)
  • (modified) clang/lib/Basic/TargetInfo.cpp (+4)
  • (modified) clang/lib/Basic/Targets/AArch64.cpp (+6)
  • (modified) clang/lib/Basic/Targets/AArch64.h (+2)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+67)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+13)
  • (modified) clang/lib/Headers/CMakeLists.txt (+1)
  • (modified) clang/lib/Headers/module.modulemap (+5)
  • (added) clang/lib/Headers/ptrauth.h (+167)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+182)
  • (added) clang/test/CodeGen/ptrauth-intrinsics.c (+73)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/module.modulemap (+8)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/ptrauth.h (+1)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/stddef.h (+1)
  • (added) clang/test/Modules/ptrauth-include-from-darwin.m (+6)
  • (added) clang/test/Preprocessor/ptrauth_feature.c (+10)
  • (added) clang/test/Sema/ptrauth-intrinsics-macro.c (+34)
  • (added) clang/test/Sema/ptrauth.c (+126)
  • (modified) llvm/docs/PointerAuth.md (+3)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 11cbdca7a268fc3..49a3934d9d082fc 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -13,6 +13,7 @@ Clang Language Extensions
    BlockLanguageSpec
    Block-ABI-Apple
    AutomaticReferenceCounting
+   PointerAuthentication
    MatrixTypes
 
 Introduction
@@ -4157,6 +4158,10 @@ reordering of memory accesses and side effect instructions. Other instructions
 like simple arithmetic may be reordered around the intrinsic. If you expect to
 have no reordering at all, use inline assembly instead.
 
+Pointer Authentication
+^^^^^^^^^^^^^^^^^^^^^^
+See :doc:`PointerAuthentication`.
+
 X86/X86-64 Language Extensions
 ------------------------------
 
diff --git a/clang/docs/PointerAuthentication.rst b/clang/docs/PointerAuthentication.rst
new file mode 100644
index 000000000000000..87b8f244a2e4653
--- /dev/null
+++ b/clang/docs/PointerAuthentication.rst
@@ -0,0 +1,548 @@
+Pointer Authentication
+======================
+
+.. contents::
+   :local:
+
+Introduction
+------------
+
+Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution.  When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
+
+While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs.  Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically).  This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI.  Considerations that are implementation-specific are clearly identified throughout.
+
+Note that there are several different terms in use:
+
+- **Pointer authentication** is a target-independent language technology.
+
+- **ARMv8.3** is an AArch64 architecture revision that provides hardware support for pointer authentication.  It is implemented on several shipping processors, including the Apple A12 and later.
+
+- **arm64e** is a specific ABI (not yet fully stable) for implementing pointer authentication on ARMv8.3 on certain Apple operating systems.
+
+This document serves four purposes:
+
+- It describes the basic ideas of pointer authentication.
+
+- It documents several language extensions that are useful on targets using pointer authentication.
+
+- It presents a theory of operation for the security mitigation, describing the basic requirements for correctness, various weaknesses in the mechanism, and ways in which programmers can strengthen its protections (including recommendations for language implementors).
+
+- It will eventually document the language ABIs currently used for C, C++, Objective-C, and Swift on arm64e, although these are not yet stable on any target.
+
+Basic Concepts
+--------------
+
+The simple address of an object or function is a **raw pointer**.  A raw pointer can be **signed** to produce a **signed pointer**.  A signed pointer can then be **authenticated** in order to verify that it was **validly signed** and extract the original raw pointer.  These terms reflect the most likely implementation technique: computing and storing a cryptographic signature along with the pointer.  The security of pointer authentication does not rely on attackers not being able to separately overwrite the signature.
+
+An **abstract signing key** is a name which refers to a secret key which can be used to sign and authenticate pointers.  The key value for a particular name is consistent throughout a process.
+
+A **discriminator** is an arbitrary value used to **diversify** signed pointers so that one validly-signed pointer cannot simply be copied over another.  A discriminator is simply opaque data of some implementation-defined size that is included in the signature as a salt.
+
+Nearly all aspects of pointer authentication use just these two primary operations:
+
+- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given a raw pointer, an abstract signing key, and a discriminator.
+
+- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given a signed pointer, an abstract signing key, and a discriminator.
+
+``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must succeed and produce ``raw_pointer``.  ``auth`` applied to a value that was ultimately produced in any other way is expected to immediately halt the program.  However, it is permitted for ``auth`` to fail to detect that a signed pointer was not produced in this way, in which case it may return anything; this is what makes pointer authentication a probabilistic mitigation rather than a perfect one.
+
+There are two secondary operations which are required only to implement certain intrinsics in ``<ptrauth.h>``:
+
+- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer and a key it was presumptively signed with.  This is useful for certain kinds of tooling, such as crash backtraces; it should generally not be used in the basic language ABI except in very careful ways.
+
+- ``sign_generic(value)`` produces a cryptographic signature for arbitrary data, not necessarily a pointer.  This is useful for efficiently verifying that non-pointer data has not been tampered with.
+
+Whenever any of these operations is called for, the key value must be known statically.  This is because the layout of a signed pointer may vary according to the signing key.  (For example, in ARMv8.3, the layout of a signed pointer depends on whether TBI is enabled, which can be set independently for code and data pointers.)
+
+.. admonition:: Note for API designers and language implementors
+
+  These are the *primitive* operations of pointer authentication, provided for clarity of description.  They are not suitable either as high-level interfaces or as primitives in a compiler IR because they expose raw pointers.  Raw pointers require special attention in the language implementation to avoid the accidental creation of exploitable code sequences; see the section on `Attackable code sequences`_.
+
+The following details are all implementation-defined:
+
+- the nature of a signed pointer
+- the size of a discriminator
+- the number and nature of the signing keys
+- the implementation of the ``sign``, ``auth``, ``strip``, and ``sign_generic`` operations
+
+While the use of the terms "sign" and "signed pointer" suggests the use of a cryptographic signature, other implementations may be possible.  See `Alternative implementations`_ for an exploration of implementation options.
+
+.. admonition:: Implementation example: ARMv8.3
+
+  Readers may find it helpful to know how these terms map to ARMv8.3:
+
+  - A signed pointer is a pointer with a signature stored in the otherwise-unused high bits.  The kernel configures the signature width based on the system's addressing needs, accounting for whether the AArch64 TBI feature is enabled for the kind of pointer (code or data).
+
+  - A discriminator is a 64-bit integer.  Constant discriminators are 16-bit integers.  Blending a constant discriminator into an address consists of replacing the top 16 bits of the address with the constant.
+
+  - There are five 128-bit signing-key registers, each of which can only be directly read or set by privileged code.  Of these, four are used for signing pointers, and the fifth is used only for ``sign_generic``.  The key data is simply a pepper added to the hash, not an encryption key, and so can be initialized using random data.
+
+  - ``sign`` computes a cryptographic hash of the pointer, discriminator, and signing key, and stores it in the high bits as the signature. ``auth`` removes the signature, computes the same hash, and compares the result with the stored signature.  ``strip`` removes the signature without authenticating it.  While ARMv8.3's ``aut*`` instructions do not themselves trap on failure, the compiler only ever emits them in sequences that will trap.
+
+  - ``sign_generic`` corresponds to the ``pacga`` instruction, which takes two 64-bit values and produces a 64-bit cryptographic hash. Implementations of this instruction may not produce meaningful data in all bits of the result.
+
+Discriminators
+~~~~~~~~~~~~~~
+
+A discriminator is arbitrary extra data which alters the signature on a pointer.  When two pointers are signed differently --- either with different keys or with different discriminators --- an attacker cannot simply replace one pointer with the other.  For more information on why discriminators are important and how to use them effectively, see the section on `Substitution attacks`_.
+
+To use standard cryptographic terminology, a discriminator acts as a salt in the signing of a pointer, and the key data acts as a pepper.  That is, both the discriminator and key data are ultimately just added as inputs to the signing algorithm along with the pointer, but they serve significantly different roles.  The key data is a common secret added to every signature, whereas the discriminator is a signing-specific value that can be derived from the circumstances of how a pointer is signed.  However, unlike a password salt, it's important that discriminators be *independently* derived from the circumstances of the signing; they should never simply be stored alongside a pointer.
+
+The intrinsic interface in ``<ptrauth.h>`` allows an arbitrary discriminator value to be provided, but can only be used when running normal code.  The discriminators used by language ABIs must be restricted to make it feasible for the loader to sign pointers stored in global memory without needing excessive amounts of metadata.  Under these restrictions, a discriminator may consist of either or both of the following:
+
+- The address at which the pointer is stored in memory.  A pointer signed with a discriminator which incorporates its storage address is said to have **address diversity**.  In general, using address diversity means that a pointer cannot be reliably replaced by an attacker or used to reliably replace a different pointer.  However, an attacker may still be able to attack a larger call sequence if they can alter the address through which the pointer is accessed.  Furthermore, some situations cannot use address diversity because of language or other restrictions.
+
+- A constant integer, called a **constant discriminator**. A pointer signed with a non-zero constant discriminator is said to have **constant diversity**.  If the discriminator is specific to a single declaration, it is said to have **declaration diversity**; if the discriminator is specific to a type of value, it is said to have **type diversity**.  For example, C++ v-tables on arm64e sign their component functions using a hash of their method names and signatures, which provides declaration diversity; similarly, C++ member function pointers sign their invocation functions using a hash of the member pointer type, which provides type diversity.
+
+The implementation may need to restrict constant discriminators to be significantly smaller than the full size of a discriminator.  For example, on arm64e, constant discriminators are only 16-bit values.  This is believed to not significantly weaken the mitigation, since collisions remain uncommon.
+
+The algorithm for blending a constant discriminator with a storage address is implementation-defined.
+
+.. _Signing schemas:
+
+Signing schemas
+~~~~~~~~~~~~~~~
+
+Correct use of pointer authentication requires the signing code and the authenticating code to agree about the **signing schema** for the pointer:
+
+- the abstract signing key with which the pointer should be signed and
+- an algorithm for computing the discriminator.
+
+As described in the section above on `Discriminators`_, in most situations, the discriminator is produced by taking a constant discriminator and optionally blending it with the storage address of the pointer.  In these situations, the signing schema breaks down even more simply:
+
+- the abstract signing key,
+- a constant discriminator, and
+- whether to use address diversity.
+
+It is important that the signing schema be independently derived at all signing and authentication sites.  Preferably, the schema should be hard-coded everywhere it is needed, but at the very least, it must not be derived by inspecting information stored along with the pointer.  See the section on `Attacks on pointer authentication`_ for more information.
+
+Language Features
+-----------------
+
+There is currently one main pointer authentication language feature:
+
+- The language provides the ``<ptrauth.h>`` intrinsic interface for manually signing and authenticating pointers in code.  These intrinsics can be used in circumstances where very specific behavior is required.
+
+
+Language extensions
+~~~~~~~~~~~~~~~~~~~
+
+Feature testing
+^^^^^^^^^^^^^^^
+
+Whether the current target uses pointer authentication can be tested for with a number of different tests.
+
+- ``__has_feature(ptrauth_intrinsics)`` is true if ``<ptrauth.h>`` provides its normal interface.  This may be true even on targets where pointer authentication is not enabled by default.
+
+``<ptrauth.h>``
+~~~~~~~~~~~~~~~
+
+This header defines the following types and operations:
+
+``ptrauth_key``
+^^^^^^^^^^^^^^^
+
+This ``enum`` is the type of abstract signing keys.  In addition to defining the set of implementation-specific signing keys (for example, ARMv8.3 defines ``ptrauth_key_asia``), it also defines some portable aliases for those keys.  For example, ``ptrauth_key_function_pointer`` is the key generally used for C function pointers, which will generally be suitable for other function-signing schemas.
+
+In all the operation descriptions below, key values must be constant values corresponding to one of the implementation-specific abstract signing keys from this ``enum``.
+
+``ptrauth_extra_data_t``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a ``typedef`` of a standard integer type of the correct size to hold a discriminator value.
+
+In the signing and authentication operation descriptions below, discriminator values must have either pointer type or integer type. If the discriminator is an integer, it will be coerced to ``ptrauth_extra_data_t``.
+
+``ptrauth_blend_discriminator``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_blend_discriminator(pointer, integer)
+
+Produce a discriminator value which blends information from the given pointer and the given integer.
+
+Implementations may ignore some bits from each value, which is to say, the blending algorithm may be chosen for speed and convenience over theoretical strength as a hash-combining algorithm.  For example, arm64e simply overwrites the high 16 bits of the pointer with the low 16 bits of the integer, which can be done in a single instruction with an immediate integer.
+
+``pointer`` must have pointer type, and ``integer`` must have integer type. The result has type ``ptrauth_extra_data_t``.
+
+``ptrauth_strip``
+^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_strip(signedPointer, key)
+
+Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it.  This operation does not trap and cannot fail, even if the pointer is not validly signed.
+
+``ptrauth_sign_unauthenticated``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_unauthenticated(pointer, key, discriminator)
+
+Produce a signed pointer for the given raw pointer without applying any authentication or extra treatment.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+This is a treacherous operation that can easily result in `signing oracles`_.  Programs should use it seldom and carefully.
+
+``ptrauth_auth_and_resign``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_and_resign(pointer, oldKey, oldDiscriminator, newKey, newDiscriminator)
+
+Authenticate that ``pointer`` is signed with ``oldKey`` and ``oldDiscriminator`` and then resign the raw-pointer result of that authentication with ``newKey`` and ``newDiscriminator``.
+
+``pointer`` must have pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+The code sequence produced for this operation must not be directly attackable.  However, if the discriminator values are not constant integers, their computations may still be attackable.  In the future, Clang should be enhanced to guarantee non-attackability if these expressions are :ref:`safely-derived`.
+
+``ptrauth_auth_data``
+^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_data(pointer, key, discriminator)
+
+Authenticate that ``pointer`` is signed with ``key`` and ``discriminator`` and remove the signature.
+
+``pointer`` must have object pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+In the future when Clang makes `safe derivation`_ guarantees, the result of this operation should be considered safely-derived.
+
+``ptrauth_sign_generic_data``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_generic_data(value1, value2)
+
+Computes a signature for the given pair of values, incorporating a secret signing key.
+
+This operation can be used to verify that arbitrary data has not been tampered with by computing a signature for the data, storing that signature, and then repeating this process and verifying that it yields the same result.  This can be reasonably done in any number of ways; for example, a library could compute an ordinary checksum of the data and just sign the result in order to get the tamper-resistance advantages of the secret signing key (since otherwise an attacker could reliably overwrite both the data and the checksum).
+
+``value1`` and ``value2`` must be either pointers or integers.  If the integers are larger than ``uintptr_t`` then data not representa...
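
As a rough illustration of the operations documented above (blend, sign, auth, strip), here is a minimal software-only sketch in C. It is not code from the patch and does not use ptrauth.h. The 48-bit address layout, the key table, the `toy_*` names, and the mixing function are all assumptions made for the example; the mixer is not cryptographic, and a real `auth` is expected to trap rather than return a status. Only the blending rule (the low 16 bits of the integer replace the high 16 bits of the address) is taken from the arm64e description in the document.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed toy layout: low 48 bits hold the address, high 16 bits the signature. */
#define TOY_ADDR_MASK UINT64_C(0x0000FFFFFFFFFFFF)

/* Hypothetical per-key secrets; real keys live in privileged registers. */
static const uint64_t toy_keys[2] = {
    UINT64_C(0x9E3779B97F4A7C15), UINT64_C(0xC2B2AE3D27D4EB4F)
};

/* Non-cryptographic bit mixer standing in for the real hash. */
static uint64_t toy_mix(uint64_t x) {
    x ^= x >> 30; x *= UINT64_C(0xBF58476D1CE4E5B9);
    x ^= x >> 27; x *= UINT64_C(0x94D049BB133111EB);
    x ^= x >> 31;
    return x;
}

/* arm64e-style blend: low 16 bits of `integer` overwrite the high 16 bits
 * of `address`. */
static uint64_t toy_blend(uint64_t address, uint64_t integer) {
    return (address & TOY_ADDR_MASK) | ((integer & 0xFFFF) << 48);
}

/* 16-bit signature over (raw pointer, key, discriminator). */
static uint64_t toy_signature(uint64_t raw, int key, uint64_t disc) {
    assert(key >= 0 && key < 2);
    return toy_mix(toy_mix(raw ^ toy_keys[key]) ^ disc) >> 48;
}

/* sign: store the signature in the otherwise-unused high bits. */
static uint64_t toy_sign(uint64_t raw, int key, uint64_t disc) {
    raw &= TOY_ADDR_MASK;
    return raw | (toy_signature(raw, key, disc) << 48);
}

/* strip: drop the signature without checking it; cannot fail. */
static uint64_t toy_strip(uint64_t signed_ptr) {
    return signed_ptr & TOY_ADDR_MASK;
}

/* auth: recompute the signature and compare. Returns 1 and writes the raw
 * pointer on success, 0 on mismatch (a real implementation would trap). */
static int toy_auth(uint64_t signed_ptr, int key, uint64_t disc,
                    uint64_t *raw_out) {
    uint64_t raw = toy_strip(signed_ptr);
    if ((signed_ptr >> 48) != toy_signature(raw, key, disc))
        return 0;
    *raw_out = raw;
    return 1;
}
```

With this model, `toy_auth(toy_sign(p, k, d), k, d, &out)` always succeeds and yields `p`, while a mismatched key or discriminator is rejected only with high probability, since two 16-bit signatures can collide. That mirrors the document's point that pointer authentication is a probabilistic mitigation.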

@llvmbot
Member

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-clang

Changes

This defines the basic set of pointer authentication clang builtins (provided in a new header, ptrauth.h), with diagnostics and IRGen support. The availability of the builtins is gated on a new flag, -fptrauth-intrinsics.

Note that this only includes the basic intrinsics, and notably excludes ptrauth_sign_constant, ptrauth_type_discriminator, and ptrauth_string_discriminator, which need extra logic to be fully supported.

This also introduces clang/docs/PointerAuthentication.rst, which describes the ptrauth model in general, as well as these builtins.

(Replaces https://reviews.llvm.org/D112941)

Patch is 93.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65996.diff

30 Files Affected:

  • (modified) clang/docs/LanguageExtensions.rst (+5)
  • (added) clang/docs/PointerAuthentication.rst (+548)
  • (modified) clang/include/clang/Basic/Builtins.def (+8)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+16)
  • (modified) clang/include/clang/Basic/Features.def (+1)
  • (modified) clang/include/clang/Basic/LangOptions.def (+2)
  • (modified) clang/include/clang/Basic/TargetInfo.h (+6)
  • (modified) clang/include/clang/Driver/Options.td (+8)
  • (modified) clang/include/clang/Sema/Sema.h (+2)
  • (modified) clang/lib/Basic/Module.cpp (+4)
  • (modified) clang/lib/Basic/TargetInfo.cpp (+4)
  • (modified) clang/lib/Basic/Targets/AArch64.cpp (+6)
  • (modified) clang/lib/Basic/Targets/AArch64.h (+2)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+67)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+13)
  • (modified) clang/lib/Headers/CMakeLists.txt (+1)
  • (modified) clang/lib/Headers/module.modulemap (+5)
  • (added) clang/lib/Headers/ptrauth.h (+167)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+182)
  • (added) clang/test/CodeGen/ptrauth-intrinsics.c (+73)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/module.modulemap (+8)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/ptrauth.h (+1)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/stddef.h (+1)
  • (added) clang/test/Modules/ptrauth-include-from-darwin.m (+6)
  • (added) clang/test/Preprocessor/ptrauth_feature.c (+10)
  • (added) clang/test/Sema/ptrauth-intrinsics-macro.c (+34)
  • (added) clang/test/Sema/ptrauth.c (+126)
  • (modified) llvm/docs/PointerAuth.md (+3)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 11cbdca7a268fc3..49a3934d9d082fc 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -13,6 +13,7 @@ Clang Language Extensions
    BlockLanguageSpec
    Block-ABI-Apple
    AutomaticReferenceCounting
+   PointerAuthentication
    MatrixTypes
 
 Introduction
@@ -4157,6 +4158,10 @@ reordering of memory accesses and side effect instructions. Other instructions
 like simple arithmetic may be reordered around the intrinsic. If you expect to
 have no reordering at all, use inline assembly instead.
 
+Pointer Authentication
+^^^^^^^^^^^^^^^^^^^^^^
+See :doc:`PointerAuthentication`.
+
 X86/X86-64 Language Extensions
 ------------------------------
 
diff --git a/clang/docs/PointerAuthentication.rst b/clang/docs/PointerAuthentication.rst
new file mode 100644
index 000000000000000..87b8f244a2e4653
--- /dev/null
+++ b/clang/docs/PointerAuthentication.rst
@@ -0,0 +1,548 @@
+Pointer Authentication
+======================
+
+.. contents::
+   :local:
+
+Introduction
+------------
+
+Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution.  When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
+
+While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs.  Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically).  This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI.  Considerations that are implementation-specific are clearly identified throughout.
+
+Note that there are several different terms in use:
+
+- **Pointer authentication** is a target-independent language technology.
+
+- **ARMv8.3** is an AArch64 architecture revision that provides hardware support for pointer authentication.  It is implemented on several shipping processors, including the Apple A12 and later.
+
+- **arm64e** is a specific ABI (not yet fully stable) for implementing pointer authentication on ARMv8.3 on certain Apple operating systems.
+
+This document serves four purposes:
+
+- It describes the basic ideas of pointer authentication.
+
+- It documents several language extensions that are useful on targets using pointer authentication.
+
+- It presents a theory of operation for the security mitigation, describing the basic requirements for correctness, various weaknesses in the mechanism, and ways in which programmers can strengthen its protections (including recommendations for language implementors).
+
+- It will eventually document the language ABIs currently used for C, C++, Objective-C, and Swift on arm64e, although these are not yet stable on any target.
+
+Basic Concepts
+--------------
+
+The simple address of an object or function is a **raw pointer**.  A raw pointer can be **signed** to produce a **signed pointer**.  A signed pointer can then be **authenticated** in order to verify that it was **validly signed** and extract the original raw pointer.  These terms reflect the most likely implementation technique: computing and storing a cryptographic signature along with the pointer.  The security of pointer authentication does not rely on attackers not being able to separately overwrite the signature.
+
+An **abstract signing key** is a name which refers to a secret key which can be used to sign and authenticate pointers.  The key value for a particular name is consistent throughout a process.
+
+A **discriminator** is an arbitrary value used to **diversify** signed pointers so that one validly-signed pointer cannot simply be copied over another.  A discriminator is simply opaque data of some implementation-defined size that is included in the signature as a salt.
+
+Nearly all aspects of pointer authentication use just these two primary operations:
+
+- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given a raw pointer, an abstract signing key, and a discriminator.
+
+- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given a signed pointer, an abstract signing key, and a discriminator.
+
+``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must succeed and produce ``raw_pointer``.  ``auth`` applied to a value that was ultimately produced in any other way is expected to immediately halt the program.  However, it is permitted for ``auth`` to fail to detect that a signed pointer was not produced in this way, in which case it may return anything; this is what makes pointer authentication a probabilistic mitigation rather than a perfect one.
+
+There are two secondary operations which are required only to implement certain intrinsics in ``<ptrauth.h>``:
+
+- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer and a key it was presumptively signed with.  This is useful for certain kinds of tooling, such as crash backtraces; it should generally not be used in the basic language ABI except in very careful ways.
+
+- ``sign_generic(value)`` produces a cryptographic signature for arbitrary data, not necessarily a pointer.  This is useful for efficiently verifying that non-pointer data has not been tampered with.
+
+Whenever any of these operations is called for, the key value must be known statically.  This is because the layout of a signed pointer may vary according to the signing key.  (For example, in ARMv8.3, the layout of a signed pointer depends on whether TBI is enabled, which can be set independently for code and data pointers.)
+
+.. admonition:: Note for API designers and language implementors
+
+  These are the *primitive* operations of pointer authentication, provided for clarity of description.  They are not suitable either as high-level interfaces or as primitives in a compiler IR because they expose raw pointers.  Raw pointers require special attention in the language implementation to avoid the accidental creation of exploitable code sequences; see the section on `Attackable code sequences`_.
+
+The following details are all implementation-defined:
+
+- the nature of a signed pointer
+- the size of a discriminator
+- the number and nature of the signing keys
+- the implementation of the ``sign``, ``auth``, ``strip``, and ``sign_generic`` operations
+
+While the use of the terms "sign" and "signed pointer" suggests the use of a cryptographic signature, other implementations may be possible.  See `Alternative implementations`_ for an exploration of implementation options.
+
+.. admonition:: Implementation example: ARMv8.3
+
+  Readers may find it helpful to know how these terms map to ARMv8.3:
+
+  - A signed pointer is a pointer with a signature stored in the otherwise-unused high bits.  The kernel configures the signature width based on the system's addressing needs, accounting for whether the AArch64 TBI feature is enabled for the kind of pointer (code or data).
+
+  - A discriminator is a 64-bit integer.  Constant discriminators are 16-bit integers.  Blending a constant discriminator into an address consists of replacing the top 16 bits of the address with the constant.
+
+  - There are five 128-bit signing-key registers, each of which can only be directly read or set by privileged code.  Of these, four are used for signing pointers, and the fifth is used only for ``sign_generic``.  The key data is simply a pepper added to the hash, not an encryption key, and so can be initialized using random data.
+
+  - ``sign`` computes a cryptographic hash of the pointer, discriminator, and signing key, and stores it in the high bits as the signature. ``auth`` removes the signature, computes the same hash, and compares the result with the stored signature.  ``strip`` removes the signature without authenticating it.  While ARMv8.3's ``aut*`` instructions do not themselves trap on failure, the compiler only ever emits them in sequences that will trap.
+
+  - ``sign_generic`` corresponds to the ``pacga`` instruction, which takes two 64-bit values and produces a 64-bit cryptographic hash. Implementations of this instruction may not produce meaningful data in all bits of the result.
+
+Discriminators
+~~~~~~~~~~~~~~
+
+A discriminator is arbitrary extra data which alters the signature on a pointer.  When two pointers are signed differently --- either with different keys or with different discriminators --- an attacker cannot simply replace one pointer with the other.  For more information on why discriminators are important and how to use them effectively, see the section on `Substitution attacks`_.
+
+To use standard cryptographic terminology, a discriminator acts as a salt in the signing of a pointer, and the key data acts as a pepper.  That is, both the discriminator and key data are ultimately just added as inputs to the signing algorithm along with the pointer, but they serve significantly different roles.  The key data is a common secret added to every signature, whereas the discriminator is a signing-specific value that can be derived from the circumstances of how a pointer is signed.  However, unlike a password salt, it's important that discriminators be *independently* derived from the circumstances of the signing; they should never simply be stored alongside a pointer.
+
+The intrinsic interface in ``<ptrauth.h>`` allows an arbitrary discriminator value to be provided, but can only be used when running normal code.  The discriminators used by language ABIs must be restricted to make it feasible for the loader to sign pointers stored in global memory without needing excessive amounts of metadata.  Under these restrictions, a discriminator may consist of either or both of the following:
+
+- The address at which the pointer is stored in memory.  A pointer signed with a discriminator which incorporates its storage address is said to have **address diversity**.  In general, using address diversity means that a pointer cannot be reliably replaced by an attacker or used to reliably replace a different pointer.  However, an attacker may still be able to attack a larger call sequence if they can alter the address through which the pointer is accessed.  Furthermore, some situations cannot use address diversity because of language or other restrictions.
+
+- A constant integer, called a **constant discriminator**. A pointer signed with a non-zero constant discriminator is said to have **constant diversity**.  If the discriminator is specific to a single declaration, it is said to have **declaration diversity**; if the discriminator is specific to a type of value, it is said to have **type diversity**.  For example, C++ v-tables on arm64e sign their component functions using a hash of their method names and signatures, which provides declaration diversity; similarly, C++ member function pointers sign their invocation functions using a hash of the member pointer type, which provides type diversity.
+
+The implementation may need to restrict constant discriminators to be significantly smaller than the full size of a discriminator.  For example, on arm64e, constant discriminators are only 16-bit values.  This is believed to not significantly weaken the mitigation, since collisions remain uncommon.
+
+The algorithm for blending a constant discriminator with a storage address is implementation-defined.
+
+.. _Signing schemas:
+
+Signing schemas
+~~~~~~~~~~~~~~~
+
+Correct use of pointer authentication requires the signing code and the authenticating code to agree about the **signing schema** for the pointer:
+
+- the abstract signing key with which the pointer should be signed and
+- an algorithm for computing the discriminator.
+
+As described in the section above on `Discriminators`_, in most situations, the discriminator is produced by taking a constant discriminator and optionally blending it with the storage address of the pointer.  In these situations, the signing schema breaks down even more simply:
+
+- the abstract signing key,
+- a constant discriminator, and
+- whether to use address diversity.
+
+It is important that the signing schema be independently derived at all signing and authentication sites.  Preferably, the schema should be hard-coded everywhere it is needed, but at the very least, it must not be derived by inspecting information stored along with the pointer.  See the section on `Attacks on pointer authentication`_ for more information.
+
+Language Features
+-----------------
+
+There is currently one main pointer authentication language feature:
+
+- The language provides the ``<ptrauth.h>`` intrinsic interface for manually signing and authenticating pointers in code.  These can be used in circumstances where very specific behavior is required.
+
+
+Language extensions
+~~~~~~~~~~~~~~~~~~~
+
+Feature testing
+^^^^^^^^^^^^^^^
+
+Whether the current target uses pointer authentication can be tested for with a number of different tests.
+
+- ``__has_feature(ptrauth_intrinsics)`` is true if ``<ptrauth.h>`` provides its normal interface.  This may be true even on targets where pointer authentication is not enabled by default.
+
+``<ptrauth.h>``
+~~~~~~~~~~~~~~~
+
+This header defines the following types and operations:
+
+``ptrauth_key``
+^^^^^^^^^^^^^^^
+
+This ``enum`` is the type of abstract signing keys.  In addition to defining the set of implementation-specific signing keys (for example, ARMv8.3 defines ``ptrauth_key_asia``), it also defines some portable aliases for those keys.  For example, ``ptrauth_key_function_pointer`` is the key generally used for C function pointers, which will generally be suitable for other function-signing schemas.
+
+In all the operation descriptions below, key values must be constant values corresponding to one of the implementation-specific abstract signing keys from this ``enum``.
+
+``ptrauth_extra_data_t``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a ``typedef`` of a standard integer type of the correct size to hold a discriminator value.
+
+In the signing and authentication operation descriptions below, discriminator values must have either pointer type or integer type. If the discriminator is an integer, it will be coerced to ``ptrauth_extra_data_t``.
+
+``ptrauth_blend_discriminator``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_blend_discriminator(pointer, integer)
+
+Produce a discriminator value which blends information from the given pointer and the given integer.
+
+Implementations may ignore some bits from each value, which is to say, the blending algorithm may be chosen for speed and convenience over theoretical strength as a hash-combining algorithm.  For example, arm64e simply overwrites the high 16 bits of the pointer with the low 16 bits of the integer, which can be done in a single instruction with an immediate integer.
+
+``pointer`` must have pointer type, and ``integer`` must have integer type. The result has type ``ptrauth_extra_data_t``.
+
+``ptrauth_strip``
+^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_strip(signedPointer, key)
+
+Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it.  This operation does not trap and cannot fail, even if the pointer is not validly signed.
+
+``ptrauth_sign_unauthenticated``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_unauthenticated(pointer, key, discriminator)
+
+Produce a signed pointer for the given raw pointer without applying any authentication or extra treatment.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+This is a treacherous operation that can easily result in `signing oracles`_.  Programs should use it seldom and carefully.
+
+``ptrauth_auth_and_resign``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_and_resign(pointer, oldKey, oldDiscriminator, newKey, newDiscriminator)
+
+Authenticate that ``pointer`` is signed with ``oldKey`` and ``oldDiscriminator`` and then resign the raw-pointer result of that authentication with ``newKey`` and ``newDiscriminator``.
+
+``pointer`` must have pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+The code sequence produced for this operation must not be directly attackable.  However, if the discriminator values are not constant integers, their computations may still be attackable.  In the future, Clang should be enhanced to guarantee non-attackability if these expressions are :ref:`safely-derived`.
+
+``ptrauth_auth_data``
+^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_data(pointer, key, discriminator)
+
+Authenticate that ``pointer`` is signed with ``key`` and ``discriminator`` and remove the signature.
+
+``pointer`` must have object pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+In the future when Clang makes `safe derivation`_ guarantees, the result of this operation should be considered safely-derived.
+
+``ptrauth_sign_generic_data``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_generic_data(value1, value2)
+
+Computes a signature for the given pair of values, incorporating a secret signing key.
+
+This operation can be used to verify that arbitrary data has not been tampered with by computing a signature for the data, storing that signature, and then repeating this process and verifying that it yields the same result.  This can be reasonably done in any number of ways; for example, a library could compute an ordinary checksum of the data and just sign the result in order to get the tamper-resistance advantages of the secret signing key (since otherwise an attacker could reliably overwrite both the data and the checksum).
+
+``value1`` and ``value2`` must be either pointers or integers.  If the integers are larger than ``uintptr_t`` then data not representa...
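
The checksum-then-sign pattern described for ptrauth_sign_generic_data can be sketched in portable C as below. This is an illustration under stated assumptions, not code from the patch: `toy_sign_generic` is a hypothetical stand-in for the builtin (its secret is a constant in the binary, so it offers no real protection), and `sealed_record`, `seal`, and `verify` are names invented for the example. The checksum is ordinary 64-bit FNV-1a.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ptrauth_sign_generic_data(value1, value2).
 * ASSUMPTION: a real implementation mixes in a secret hardware key; here
 * the "key" is a plain constant, so this is NOT tamper-resistant. */
static uint64_t toy_sign_generic(uint64_t value1, uint64_t value2) {
    static const uint64_t secret = UINT64_C(0xD6E8FEB86659FD93);
    uint64_t x = (value1 ^ secret) * UINT64_C(0x9E3779B97F4A7C15);
    x ^= x >> 32;
    x = (x ^ value2) * UINT64_C(0xBF58476D1CE4E5B9);
    x ^= x >> 29;
    return x;
}

/* Ordinary, unkeyed checksum of the data (64-bit FNV-1a). */
static uint64_t checksum(const unsigned char *data, size_t len) {
    uint64_t h = UINT64_C(0xCBF29CE484222325);
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= UINT64_C(0x100000001B3);
    }
    return h;
}

/* Data stored together with the signature of its checksum. */
struct sealed_record {
    unsigned char payload[16];
    uint64_t signature;
};

/* Compute the checksum, sign it, and store the signature. */
static void seal(struct sealed_record *r) {
    r->signature = toy_sign_generic(checksum(r->payload, sizeof r->payload),
                                    sizeof r->payload);
}

/* Repeat the computation and compare; returns 1 if the record is intact. */
static int verify(const struct sealed_record *r) {
    return r->signature ==
           toy_sign_generic(checksum(r->payload, sizeof r->payload),
                            sizeof r->payload);
}
```

A caller would run `seal` after writing the payload and `verify` before trusting it. Because the stored value is a keyed signature rather than the bare checksum, an attacker who overwrites the payload cannot simply recompute a matching value without the key, which is the advantage the paragraph above describes.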

@llvmbot
Member

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-backend-x86

Changes

This defines the basic set of pointer authentication clang builtins (provided in a new header, ptrauth.h), with diagnostics and IRGen support. The availability of the builtins is gated on a new flag, -fptrauth-intrinsics.

Note that this only includes the basic intrinsics, and notably excludes ptrauth_sign_constant, ptrauth_type_discriminator, and ptrauth_string_discriminator, which need extra logic to be fully supported.

This also introduces clang/docs/PointerAuthentication.rst, which describes the ptrauth model in general, as well as these builtins.

(Replaces https://reviews.llvm.org/D112941)

Patch is 93.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65996.diff

30 Files Affected:

  • (modified) clang/docs/LanguageExtensions.rst (+5)
  • (added) clang/docs/PointerAuthentication.rst (+548)
  • (modified) clang/include/clang/Basic/Builtins.def (+8)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+16)
  • (modified) clang/include/clang/Basic/Features.def (+1)
  • (modified) clang/include/clang/Basic/LangOptions.def (+2)
  • (modified) clang/include/clang/Basic/TargetInfo.h (+6)
  • (modified) clang/include/clang/Driver/Options.td (+8)
  • (modified) clang/include/clang/Sema/Sema.h (+2)
  • (modified) clang/lib/Basic/Module.cpp (+4)
  • (modified) clang/lib/Basic/TargetInfo.cpp (+4)
  • (modified) clang/lib/Basic/Targets/AArch64.cpp (+6)
  • (modified) clang/lib/Basic/Targets/AArch64.h (+2)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+67)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+13)
  • (modified) clang/lib/Headers/CMakeLists.txt (+1)
  • (modified) clang/lib/Headers/module.modulemap (+5)
  • (added) clang/lib/Headers/ptrauth.h (+167)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+182)
  • (added) clang/test/CodeGen/ptrauth-intrinsics.c (+73)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/module.modulemap (+8)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/ptrauth.h (+1)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/stddef.h (+1)
  • (added) clang/test/Modules/ptrauth-include-from-darwin.m (+6)
  • (added) clang/test/Preprocessor/ptrauth_feature.c (+10)
  • (added) clang/test/Sema/ptrauth-intrinsics-macro.c (+34)
  • (added) clang/test/Sema/ptrauth.c (+126)
  • (modified) llvm/docs/PointerAuth.md (+3)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 11cbdca7a268fc3..49a3934d9d082fc 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -13,6 +13,7 @@ Clang Language Extensions
    BlockLanguageSpec
    Block-ABI-Apple
    AutomaticReferenceCounting
+   PointerAuthentication
    MatrixTypes
 
 Introduction
@@ -4157,6 +4158,10 @@ reordering of memory accesses and side effect instructions. Other instructions
 like simple arithmetic may be reordered around the intrinsic. If you expect to
 have no reordering at all, use inline assembly instead.
 
+Pointer Authentication
+^^^^^^^^^^^^^^^^^^^^^^
+See :doc:`PointerAuthentication`.
+
 X86/X86-64 Language Extensions
 ------------------------------
 
diff --git a/clang/docs/PointerAuthentication.rst b/clang/docs/PointerAuthentication.rst
new file mode 100644
index 000000000000000..87b8f244a2e4653
--- /dev/null
+++ b/clang/docs/PointerAuthentication.rst
@@ -0,0 +1,548 @@
+Pointer Authentication
+======================
+
+.. contents::
+   :local:
+
+Introduction
+------------
+
+Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution.  When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
+
+While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs.  Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically).  This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI.  Considerations that are implementation-specific are clearly identified throughout.
+
+Note that there are several different terms in use:
+
+- **Pointer authentication** is a target-independent language technology.
+
+- **ARMv8.3** is an AArch64 architecture revision that provides hardware support for pointer authentication.  It is implemented on several shipping processors, including the Apple A12 and later.
+
+- **arm64e** is a specific ABI (not yet fully stable) for implementing pointer authentication on ARMv8.3 on certain Apple operating systems.
+
+This document serves four purposes:
+
+- It describes the basic ideas of pointer authentication.
+
+- It documents several language extensions that are useful on targets using pointer authentication.
+
+- It presents a theory of operation for the security mitigation, describing the basic requirements for correctness, various weaknesses in the mechanism, and ways in which programmers can strengthen its protections (including recommendations for language implementors).
+
+- It will eventually document the language ABIs currently used for C, C++, Objective-C, and Swift on arm64e, although these are not yet stable on any target.
+
+Basic Concepts
+--------------
+
+The simple address of an object or function is a **raw pointer**.  A raw pointer can be **signed** to produce a **signed pointer**.  A signed pointer can then be **authenticated** in order to verify that it was **validly signed** and extract the original raw pointer.  These terms reflect the most likely implementation technique: computing and storing a cryptographic signature along with the pointer.  The security of pointer authentication does not rely on attackers not being able to separately overwrite the signature.
+
+An **abstract signing key** is a name which refers to a secret key which can be used to sign and authenticate pointers.  The key value for a particular name is consistent throughout a process.
+
+A **discriminator** is an arbitrary value used to **diversify** signed pointers so that one validly-signed pointer cannot simply be copied over another.  A discriminator is simply opaque data of some implementation-defined size that is included in the signature as a salt.
+
+Nearly all aspects of pointer authentication use just these two primary operations:
+
+- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given a raw pointer, an abstract signing key, and a discriminator.
+
+- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given a signed pointer, an abstract signing key, and a discriminator.
+
+``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must succeed and produce ``raw_pointer``.  ``auth`` applied to a value that was ultimately produced in any other way is expected to immediately halt the program.  However, it is permitted for ``auth`` to fail to detect that a signed pointer was not produced in this way, in which case it may return anything; this is what makes pointer authentication a probabilistic mitigation rather than a perfect one.
+
+There are two secondary operations which are required only to implement certain intrinsics in ``<ptrauth.h>``:
+
+- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer and a key it was presumptively signed with.  This is useful for certain kinds of tooling, such as crash backtraces; it should generally not be used in the basic language ABI except in very careful ways.
+
+- ``sign_generic(value)`` produces a cryptographic signature for arbitrary data, not necessarily a pointer.  This is useful for efficiently verifying that non-pointer data has not been tampered with.
+
+Whenever any of these operations is called for, the key value must be known statically.  This is because the layout of a signed pointer may vary according to the signing key.  (For example, in ARMv8.3, the layout of a signed pointer depends on whether TBI is enabled, which can be set independently for code and data pointers.)
+
+.. admonition:: Note for API designers and language implementors
+
+  These are the *primitive* operations of pointer authentication, provided for clarity of description.  They are not suitable either as high-level interfaces or as primitives in a compiler IR because they expose raw pointers.  Raw pointers require special attention in the language implementation to avoid the accidental creation of exploitable code sequences; see the section on `Attackable code sequences`_.
+
+The following details are all implementation-defined:
+
+- the nature of a signed pointer
+- the size of a discriminator
+- the number and nature of the signing keys
+- the implementation of the ``sign``, ``auth``, ``strip``, and ``sign_generic`` operations
+
+While the use of the terms "sign" and "signed pointer" suggests the use of a cryptographic signature, other implementations may be possible.  See `Alternative implementations`_ for an exploration of implementation options.
+
+.. admonition:: Implementation example: ARMv8.3
+
+  Readers may find it helpful to know how these terms map to ARMv8.3:
+
+  - A signed pointer is a pointer with a signature stored in the otherwise-unused high bits.  The kernel configures the signature width based on the system's addressing needs, accounting for whether the AArch64 TBI feature is enabled for the kind of pointer (code or data).
+
+  - A discriminator is a 64-bit integer.  Constant discriminators are 16-bit integers.  Blending a constant discriminator into an address consists of replacing the top 16 bits of the address with the constant.
+
+  - There are five 128-bit signing-key registers, each of which can only be directly read or set by privileged code.  Of these, four are used for signing pointers, and the fifth is used only for ``sign_generic``.  The key data is simply a pepper added to the hash, not an encryption key, and so can be initialized using random data.
+
+  - ``sign`` computes a cryptographic hash of the pointer, discriminator, and signing key, and stores it in the high bits as the signature. ``auth`` removes the signature, computes the same hash, and compares the result with the stored signature.  ``strip`` removes the signature without authenticating it.  While ARMv8.3's ``aut*`` instructions do not themselves trap on failure, the compiler only ever emits them in sequences that will trap.
+
+  - ``sign_generic`` corresponds to the ``pacga`` instruction, which takes two 64-bit values and produces a 64-bit cryptographic hash. Implementations of this instruction may not produce meaningful data in all bits of the result.
+
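The `sign`, `auth`, and `strip` primitives described above can be modeled in portable C. This is a toy sketch only: the 48-bit address width, the 16-bit signature field, the `toy_` names, and the mixing function are all assumptions made for illustration. The mixer is not a cryptographic hash, and failure is reported through a return value here rather than a trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 48 address bits, signature in the top 16 bits. */
#define TOY_ADDR_BITS 48
#define TOY_ADDR_MASK ((UINT64_C(1) << TOY_ADDR_BITS) - 1)

/* Non-cryptographic mixer standing in for the keyed hash (assumption). */
static uint64_t toy_mix(uint64_t x) {
  x ^= x >> 33;
  x *= UINT64_C(0xff51afd7ed558ccd);
  x ^= x >> 33;
  x *= UINT64_C(0xc4ceb9fe1a85ec53);
  x ^= x >> 33;
  return x;
}

/* 16-bit signature over the address, key data, and discriminator. */
static uint64_t toy_signature(uint64_t addr, uint64_t key, uint64_t disc) {
  uint64_t h = toy_mix(addr ^ toy_mix(disc ^ toy_mix(key)));
  return h >> TOY_ADDR_BITS;
}

/* sign: store the signature in the otherwise-unused high bits. */
uint64_t toy_sign(uint64_t raw, uint64_t key, uint64_t disc) {
  uint64_t addr = raw & TOY_ADDR_MASK;
  return addr | (toy_signature(addr, key, disc) << TOY_ADDR_BITS);
}

/* strip: drop the signature without checking it. */
uint64_t toy_strip(uint64_t signed_ptr) {
  return signed_ptr & TOY_ADDR_MASK;
}

/* auth: recompute the signature and compare it with the stored one. */
bool toy_auth(uint64_t signed_ptr, uint64_t key, uint64_t disc,
              uint64_t *raw_out) {
  uint64_t addr = toy_strip(signed_ptr);
  if (toy_sign(addr, key, disc) != signed_ptr)
    return false;
  *raw_out = addr;
  return true;
}

/* Round trip: auth(sign(p, k, d), k, d) succeeds and yields p. */
void toy_round_trip_example(void) {
  uint64_t p = UINT64_C(0x0000123456789abc);
  uint64_t out = 0;
  assert(toy_auth(toy_sign(p, 7, 42), 7, 42, &out) && out == p);
}
```

With only 16 signature bits in this model, a forged value can still pass `toy_auth` by chance, which mirrors the probabilistic nature of the mitigation described above.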
+Discriminators
+~~~~~~~~~~~~~~
+
+A discriminator is arbitrary extra data which alters the signature on a pointer.  When two pointers are signed differently --- either with different keys or with different discriminators --- an attacker cannot simply replace one pointer with the other.  For more information on why discriminators are important and how to use them effectively, see the section on `Substitution attacks`_.
+
+To use standard cryptographic terminology, a discriminator acts as a salt in the signing of a pointer, and the key data acts as a pepper.  That is, both the discriminator and key data are ultimately just added as inputs to the signing algorithm along with the pointer, but they serve significantly different roles.  The key data is a common secret added to every signature, whereas the discriminator is a signing-specific value that can be derived from the circumstances of how a pointer is signed.  However, unlike a password salt, it's important that discriminators be *independently* derived from the circumstances of the signing; they should never simply be stored alongside a pointer.
+
+The intrinsic interface in ``<ptrauth.h>`` allows an arbitrary discriminator value to be provided, but can only be used when running normal code.  The discriminators used by language ABIs must be restricted to make it feasible for the loader to sign pointers stored in global memory without needing excessive amounts of metadata.  Under these restrictions, a discriminator may consist of either or both of the following:
+
+- The address at which the pointer is stored in memory.  A pointer signed with a discriminator which incorporates its storage address is said to have **address diversity**.  In general, using address diversity means that a pointer cannot be reliably replaced by an attacker or used to reliably replace a different pointer.  However, an attacker may still be able to attack a larger call sequence if they can alter the address through which the pointer is accessed.  Furthermore, some situations cannot use address diversity because of language or other restrictions.
+
+- A constant integer, called a **constant discriminator**. A pointer signed with a non-zero constant discriminator is said to have **constant diversity**.  If the discriminator is specific to a single declaration, it is said to have **declaration diversity**; if the discriminator is specific to a type of value, it is said to have **type diversity**.  For example, C++ v-tables on arm64e sign their component functions using a hash of their method names and signatures, which provides declaration diversity; similarly, C++ member function pointers sign their invocation functions using a hash of the member pointer type, which provides type diversity.
+
+The implementation may need to restrict constant discriminators to be significantly smaller than the full size of a discriminator.  For example, on arm64e, constant discriminators are only 16-bit values.  This is believed to not significantly weaken the mitigation, since collisions remain uncommon.
+
+The algorithm for blending a constant discriminator with a storage address is implementation-defined.
+
+.. _Signing schemas:
+
+Signing schemas
+~~~~~~~~~~~~~~~
+
+Correct use of pointer authentication requires the signing code and the authenticating code to agree about the **signing schema** for the pointer:
+
+- the abstract signing key with which the pointer should be signed and
+- an algorithm for computing the discriminator.
+
+As described in the section above on `Discriminators`_, in most situations, the discriminator is produced by taking a constant discriminator and optionally blending it with the storage address of the pointer.  In these situations, the signing schema breaks down even more simply:
+
+- the abstract signing key,
+- a constant discriminator, and
+- whether to use address diversity.
+
+It is important that the signing schema be independently derived at all signing and authentication sites.  Preferably, the schema should be hard-coded everywhere it is needed, but at the very least, it must not be derived by inspecting information stored along with the pointer.  See the section on `Attacks on pointer authentication`_ for more information.
+
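The simplified signing schema (key, constant discriminator, address diversity) can be written down as a small C structure. This is a hypothetical sketch: the `toy_schema` type and its field names are invented for illustration, and the blend shown is the arm64e-style rule mentioned earlier, which other implementations need not share.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a signing schema (illustrative names). */
struct toy_schema {
  int key;                /* abstract signing key, as a small integer id */
  uint16_t constant_disc; /* constant discriminator; 0 means none */
  bool address_diversity; /* blend in the storage address? */
};

/* Compute the discriminator for a pointer stored at storage_address. */
uint64_t toy_schema_discriminator(const struct toy_schema *s,
                                  uint64_t storage_address) {
  if (!s->address_diversity)
    return s->constant_disc;
  /* arm64e-style blend: constant replaces the top 16 address bits. */
  return (storage_address & UINT64_C(0x0000FFFFFFFFFFFF)) |
         ((uint64_t)s->constant_disc << 48);
}

/* With address diversity, two storage slots get different discriminators. */
void toy_schema_example(void) {
  /* The schema is hard-coded here, not read from memory next to the pointer. */
  const struct toy_schema s = {0, 0x1234, true};
  assert(toy_schema_discriminator(&s, 0x7000) !=
         toy_schema_discriminator(&s, 0x7008));
}
```

Both the signing site and the authenticating site would evaluate the same hard-coded schema, so neither depends on data an attacker could overwrite alongside the pointer.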
+Language Features
+-----------------
+
+There is currently one main pointer authentication language feature:
+
+- The language provides the ``<ptrauth.h>`` intrinsic interface for manually signing and authenticating pointers in code.  These can be used in circumstances where very specific behavior is required.
+
+
+Language extensions
+~~~~~~~~~~~~~~~~~~~
+
+Feature testing
+^^^^^^^^^^^^^^^
+
+Whether the current target uses pointer authentication can be tested for with a number of different tests.
+
+- ``__has_feature(ptrauth_intrinsics)`` is true if ``<ptrauth.h>`` provides its normal interface.  This may be true even on targets where pointer authentication is not enabled by default.
+
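A minimal sketch of how code might use this feature test to stay portable. The `HAVE_PTRAUTH_INTRINSICS` macro, the `callback_t` typedef, and the `strip_callback` helper are hypothetical names introduced for illustration; only `__has_feature(ptrauth_intrinsics)`, `ptrauth_strip`, and `ptrauth_key_function_pointer` come from the document.

```c
#include <assert.h>

/* Compilers without __has_feature treat every feature as absent. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(ptrauth_intrinsics)
#include <ptrauth.h>
#define HAVE_PTRAUTH_INTRINSICS 1
#else
#define HAVE_PTRAUTH_INTRINSICS 0
#endif

typedef void (*callback_t)(void);

/* Remove any signature from a function pointer, e.g. for logging.
   Without the intrinsics the pointer is already raw and is returned as-is. */
callback_t strip_callback(callback_t fn) {
#if HAVE_PTRAUTH_INTRINSICS
  return ptrauth_strip(fn, ptrauth_key_function_pointer);
#else
  return fn;
#endif
}

/* A null pointer carries no signature bits, so it stays null either way. */
void strip_callback_example(void) {
  assert(strip_callback((callback_t)0) == (callback_t)0);
}
```

The stripped value is only suitable for inspection such as symbolication; calling through it on a target that signs function pointers would not be expected to work.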
+``<ptrauth.h>``
+~~~~~~~~~~~~~~~
+
+This header defines the following types and operations:
+
+``ptrauth_key``
+^^^^^^^^^^^^^^^
+
+This ``enum`` is the type of abstract signing keys.  In addition to defining the set of implementation-specific signing keys (for example, ARMv8.3 defines ``ptrauth_key_asia``), it also defines some portable aliases for those keys.  For example, ``ptrauth_key_function_pointer`` is the key generally used for C function pointers, which will generally be suitable for other function-signing schemas.
+
+In all the operation descriptions below, key values must be constant values corresponding to one of the implementation-specific abstract signing keys from this ``enum``.
+
+``ptrauth_extra_data_t``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a ``typedef`` of a standard integer type of the correct size to hold a discriminator value.
+
+In the signing and authentication operation descriptions below, discriminator values must have either pointer type or integer type. If the discriminator is an integer, it will be coerced to ``ptrauth_extra_data_t``.
+
+``ptrauth_blend_discriminator``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_blend_discriminator(pointer, integer)
+
+Produce a discriminator value which blends information from the given pointer and the given integer.
+
+Implementations may ignore some bits from each value, which is to say, the blending algorithm may be chosen for speed and convenience over theoretical strength as a hash-combining algorithm.  For example, arm64e simply overwrites the high 16 bits of the pointer with the low 16 bits of the integer, which can be done in a single instruction with an immediate integer.
+
+``pointer`` must have pointer type, and ``integer`` must have integer type. The result has type ``ptrauth_extra_data_t``.
+
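The arm64e blending rule described above (high 16 bits of the pointer overwritten with the low 16 bits of the integer) can be modeled on plain 64-bit integers. This is a sketch of that one rule, not the builtin itself; `toy_blend_discriminator` is a hypothetical name, and other targets may blend differently.

```c
#include <assert.h>
#include <stdint.h>

/* Model of arm64e-style blending on plain integers:
   keep the low 48 bits of the address, and place the low 16 bits
   of the constant in the top 16 bits of the result. */
uint64_t toy_blend_discriminator(uint64_t address, uint64_t constant) {
  return (address & UINT64_C(0x0000FFFFFFFFFFFF)) |
         ((constant & UINT64_C(0xFFFF)) << 48);
}

/* Bits of the constant above the low 16 are ignored by the blend. */
void toy_blend_example(void) {
  assert(toy_blend_discriminator(0x1000, 0x1ABCD) ==
         toy_blend_discriminator(0x1000, 0xABCD));
}
```

The example shows the "implementations may ignore some bits" caveat in practice: two constants that agree in their low 16 bits blend to the same discriminator.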
+``ptrauth_strip``
+^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_strip(signedPointer, key)
+
+Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it.  This operation does not trap and cannot fail, even if the pointer is not validly signed.
+
+``ptrauth_sign_unauthenticated``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_unauthenticated(pointer, key, discriminator)
+
+Produce a signed pointer for the given raw pointer without applying any authentication or extra treatment.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+This is a treacherous operation that can easily result in `signing oracles`_.  Programs should use it seldom and carefully.
+
+``ptrauth_auth_and_resign``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_and_resign(pointer, oldKey, oldDiscriminator, newKey, newDiscriminator)
+
+Authenticate that ``pointer`` is signed with ``oldKey`` and ``oldDiscriminator`` and then resign the raw-pointer result of that authentication with ``newKey`` and ``newDiscriminator``.
+
+``pointer`` must have pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+The code sequence produced for this operation must not be directly attackable.  However, if the discriminator values are not constant integers, their computations may still be attackable.  In the future, Clang should be enhanced to guarantee non-attackability if these expressions are :ref:`safely-derived`.
+
+``ptrauth_auth_data``
+^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_data(pointer, key, discriminator)
+
+Authenticate that ``pointer`` is signed with ``key`` and ``discriminator`` and remove the signature.
+
+``pointer`` must have object pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+In the future when Clang makes `safe derivation`_ guarantees, the result of this operation should be considered safely-derived.
+
+``ptrauth_sign_generic_data``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_generic_data(value1, value2)
+
+Computes a signature for the given pair of values, incorporating a secret signing key.
+
+This operation can be used to verify that arbitrary data has not been tampered with by computing a signature for the data, storing that signature, and then repeating this process and verifying that it yields the same result.  This can be reasonably done in any number of ways; for example, a library could compute an ordinary checksum of the data and just sign the result in order to get the tamper-resistance advantages of the secret signing key (since otherwise an attacker could reliably overwrite both the data and the checksum).
+
+``value1`` and ``value2`` must be either pointers or integers.  If the integers are larger than ``uintptr_t`` then data not representa...
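
The checksum-then-sign approach described for `ptrauth_sign_generic_data` can be sketched in portable C. Everything prefixed `toy_` is hypothetical: `toy_sign_generic` is a non-cryptographic stand-in for the intrinsic, with the secret passed explicitly instead of living in a privileged key register, and FNV-1a is just one possible ordinary checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-cryptographic mixer used only to model a keyed signature. */
static uint64_t toy_mix(uint64_t x) {
  x ^= x >> 33;
  x *= UINT64_C(0xff51afd7ed558ccd);
  x ^= x >> 33;
  x *= UINT64_C(0xc4ceb9fe1a85ec53);
  x ^= x >> 33;
  return x;
}

/* Ordinary, unkeyed checksum (64-bit FNV-1a). */
uint64_t toy_checksum(const unsigned char *data, size_t len) {
  uint64_t h = UINT64_C(0xcbf29ce484222325);
  for (size_t i = 0; i < len; i++) {
    h ^= data[i];
    h *= UINT64_C(0x100000001b3);
  }
  return h;
}

/* Stand-in for the generic signing operation: two values plus a secret. */
uint64_t toy_sign_generic(uint64_t value1, uint64_t value2, uint64_t secret) {
  return toy_mix(value1 ^ toy_mix(value2 ^ toy_mix(secret)));
}

/* Sign the checksum of the data together with its length. */
uint64_t toy_seal(const unsigned char *data, size_t len, uint64_t secret) {
  return toy_sign_generic(toy_checksum(data, len), (uint64_t)len, secret);
}

/* Recompute the signature and compare it with the stored one. */
int toy_verify(const unsigned char *data, size_t len, uint64_t secret,
               uint64_t stored) {
  return toy_seal(data, len, secret) == stored;
}

/* Seal a record, then detect a one-byte modification. */
void toy_seal_example(void) {
  unsigned char record[] = {1, 2, 3, 4};
  const uint64_t secret = UINT64_C(0x0123456789abcdef);
  uint64_t sig = toy_seal(record, sizeof record, secret);
  assert(toy_verify(record, sizeof record, secret, sig));
  record[2] ^= 0x40;
  assert(!toy_verify(record, sizeof record, secret, sig));
}
```

An attacker who can overwrite both the record and a plain checksum could make them agree again; the secret input is what prevents recomputing a matching signature.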

@llvmbot
Copy link
Member

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-clang-modules

Changes

This defines the basic set of pointer authentication clang builtins (provided in a new header, ptrauth.h), with diagnostics and IRGen support. The availability of the builtins is gated on a new flag, -fptrauth-intrinsics.

Note that this only includes the basic intrinsics, and notably excludes ptrauth_sign_constant, ptrauth_type_discriminator, and ptrauth_string_discriminator, which need extra logic to be fully supported.

This also introduces clang/docs/PointerAuthentication.rst, which describes the ptrauth model in general, as well as these builtins.

(Replaces https://reviews.llvm.org/D112941)

Patch is 93.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65996.diff

30 Files Affected:

  • (modified) clang/docs/LanguageExtensions.rst (+5)
  • (added) clang/docs/PointerAuthentication.rst (+548)
  • (modified) clang/include/clang/Basic/Builtins.def (+8)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+16)
  • (modified) clang/include/clang/Basic/Features.def (+1)
  • (modified) clang/include/clang/Basic/LangOptions.def (+2)
  • (modified) clang/include/clang/Basic/TargetInfo.h (+6)
  • (modified) clang/include/clang/Driver/Options.td (+8)
  • (modified) clang/include/clang/Sema/Sema.h (+2)
  • (modified) clang/lib/Basic/Module.cpp (+4)
  • (modified) clang/lib/Basic/TargetInfo.cpp (+4)
  • (modified) clang/lib/Basic/Targets/AArch64.cpp (+6)
  • (modified) clang/lib/Basic/Targets/AArch64.h (+2)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+67)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+13)
  • (modified) clang/lib/Headers/CMakeLists.txt (+1)
  • (modified) clang/lib/Headers/module.modulemap (+5)
  • (added) clang/lib/Headers/ptrauth.h (+167)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+182)
  • (added) clang/test/CodeGen/ptrauth-intrinsics.c (+73)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/module.modulemap (+8)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/ptrauth.h (+1)
  • (added) clang/test/Modules/Inputs/ptrauth-include-from-darwin/stddef.h (+1)
  • (added) clang/test/Modules/ptrauth-include-from-darwin.m (+6)
  • (added) clang/test/Preprocessor/ptrauth_feature.c (+10)
  • (added) clang/test/Sema/ptrauth-intrinsics-macro.c (+34)
  • (added) clang/test/Sema/ptrauth.c (+126)
  • (modified) llvm/docs/PointerAuth.md (+3)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 11cbdca7a268fc3..49a3934d9d082fc 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -13,6 +13,7 @@ Clang Language Extensions
    BlockLanguageSpec
    Block-ABI-Apple
    AutomaticReferenceCounting
+   PointerAuthentication
    MatrixTypes
 
 Introduction
@@ -4157,6 +4158,10 @@ reordering of memory accesses and side effect instructions. Other instructions
 like simple arithmetic may be reordered around the intrinsic. If you expect to
 have no reordering at all, use inline assembly instead.
 
+Pointer Authentication
+^^^^^^^^^^^^^^^^^^^^^^
+See :doc:`PointerAuthentication`.
+
 X86/X86-64 Language Extensions
 ------------------------------
 
diff --git a/clang/docs/PointerAuthentication.rst b/clang/docs/PointerAuthentication.rst
new file mode 100644
index 000000000000000..87b8f244a2e4653
--- /dev/null
+++ b/clang/docs/PointerAuthentication.rst
@@ -0,0 +1,548 @@
+Pointer Authentication
+======================
+
+.. contents::
+   :local:
+
+Introduction
+------------
+
+Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution.  When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
+
+While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs.  Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically).  This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI.  Considerations that are implementation-specific are clearly identified throughout.
+
+Note that there are several different terms in use:
+
+- **Pointer authentication** is a target-independent language technology.
+
+- **ARMv8.3** is an AArch64 architecture revision of that provides hardware support for pointer authentication.  It is implemented on several shipping processors, including the Apple A12 and later.
+
+* **arm64e** is a specific ABI for (not yet fully stable) for implementing pointer authentication on ARMv8.3 on certain Apple operating systems.
+
+This document serves four purposes:
+
+- It describes the basic ideas of pointer authentication.
+
+- It documents several language extensions that are useful on targets using pointer authentication.
+
+- It presents a theory of operation for the security mitigation, describing the basic requirements for correctness, various weaknesses in the mechanism, and ways in which programmers can strengthen its protections (including recommendations for language implementors).
+
+- It will eventually document the language ABIs currently used for C, C++, Objective-C, and Swift on arm64e, although these are not yet stable on any target.
+
+Basic Concepts
+--------------
+
+The simple address of an object or function is a **raw pointer**.  A raw pointer can be **signed** to produce a **signed pointer**.  A signed pointer can be then **authenticated** in order to verify that it was **validly signed** and extract the original raw pointer.  These terms reflect the most likely implementation technique: computing and storing a cryptographic signature along with the pointer.  The security of pointer authentication does not rely on attackers not being able to separately overwrite the signature.
+
+An **abstract signing key** is a name which refers to a secret key which can used to sign and authenticate pointers.  The key value for a particular name is consistent throughout a process.
+
+A **discriminator** is an arbitrary value used to **diversify** signed pointers so that one validly-signed pointer cannot simply be copied over another.  A discriminator is simply opaque data of some implementation-defined size that is included in the signature as a salt.
+
+Nearly all aspects of pointer authentication use just these two primary operations:
+
+- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given a raw pointer, an abstract signing key, and a discriminator.
+
+- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given a signed pointer, an abstract signing key, and a discriminator.
+
+``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must succeed and produce ``raw_pointer``.  ``auth`` applied to a value that was ultimately produced in any other way is expected to immediately halt the program.  However, it is permitted for ``auth`` to fail to detect that a signed pointer was not produced in this way, in which case it may return anything; this is what makes pointer authentication a probabilistic mitigation rather than a perfect one.
+
+There are two secondary operations which are required only to implement certain intrinsics in ````:
+
+- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer and a key it was presumptively signed with.  This is useful for certain kinds of tooling, such as crash backtraces; it should generally not be used in the basic language ABI except in very careful ways.
+
+- ``sign_generic(value)`` produces a cryptographic signature for arbitrary data, not necessarily a pointer.  This is useful for efficiently verifying that non-pointer data has not been tampered with.
+
+Whenever any of these operations is called for, the key value must be known statically.  This is because the layout of a signed pointer may vary according to the signing key.  (For example, in ARMv8.3, the layout of a signed pointer depends on whether TBI is enabled, which can be set independently for code and data pointers.)
+
+.. admonition:: Note for API designers and language implementors
+
+  These are the *primitive* operations of pointer authentication, provided for clarity of description.  They are not suitable either as high-level interfaces or as primitives in a compiler IR because they expose raw pointers.  Raw pointers require special attention in the language implementation to avoid the accidental creation of exploitable code sequences; see the section on `Attackable code sequences`_.
+
+The following details are all implementation-defined:
+
+- the nature of a signed pointer
+- the size of a discriminator
+- the number and nature of the signing keys
+- the implementation of the ``sign``, ``auth``, ``strip``, and ``sign_generic`` operations
+
+While the use of the terms "sign" and "signed pointer" suggest the use of a cryptographic signature, other implementations may be possible.  See `Alternative implementations`_ for an exploration of implementation options.
+
+.. admonition:: Implementation example: ARMv8.3
+
+  Readers may find it helpful to know how these terms map to ARMv8.3:
+
+  - A signed pointer is a pointer with a signature stored in the otherwise-unused high bits.  The kernel configures the signature width based on the system's addressing needs, accounting for whether the AArch64 TBI feature is enabled for the kind of pointer (code or data).
+
+  - A discriminator is a 64-bit integer.  Constant discriminators are 16-bit integers.  Blending a constant discriminator into an address consists of replacing the top 16 bits of the address with the constant.
+
+  - There are five 128-bit signing-key registers, each of which can only be directly read or set by privileged code.  Of these, four are used for signing pointers, and the fifth is used only for ``sign_generic``.  The key data is simply a pepper added to the hash, not an encryption key, and so can be initialized using random data.
+
+  - ``sign`` computes a cryptographic hash of the pointer, discriminator, and signing key, and stores it in the high bits as the signature. ``auth`` removes the signature, computes the same hash, and compares the result with the stored signature.  ``strip`` removes the signature without authenticating it.  While ARMv8.3's ``aut*`` instructions do not themselves trap on failure, the compiler only ever emits them in sequences that will trap.
+
+  - ``sign_generic`` corresponds to the ``pacga`` instruction, which takes two 64-bit values and produces a 64-bit cryptographic hash. Implementations of this instruction may not produce meaningful data in all bits of the result.
+
+Discriminators
+~~~~~~~~~~~~~~
+
+A discriminator is arbitrary extra data which alters the signature on a pointer.  When two pointers are signed differently --- either with different keys or with different discriminators --- an attacker cannot simply replace one pointer with the other.  For more information on why discriminators are important and how to use them effectively, see the section on `Substitution attacks`_.
+
+To use standard cryptographic terminology, a discriminator acts as a salt in the signing of a pointer, and the key data acts as a pepper.  That is, both the discriminator and key data are ultimately just added as inputs to the signing algorithm along with the pointer, but they serve significantly different roles.  The key data is a common secret added to every signature, whereas the discriminator is a signing-specific value that can be derived from the circumstances of how a pointer is signed.  However, unlike a password salt, it's important that discriminators be *independently* derived from the circumstances of the signing; they should never simply be stored alongside a pointer.
+
+The intrinsic interface in ```` allows an arbitrary discriminator value to be provided, but can only be used when running normal code.  The discriminators used by language ABIs must be restricted to make it feasible for the loader to sign pointers stored in global memory without needing excessive amounts of metadata.  Under these restrictions, a discriminator may consist of either or both of the following:
+
+- The address at which the pointer is stored in memory.  A pointer signed with a discriminator which incorporates its storage address is said to have **address diversity**.  In general, using address diversity means that a pointer cannot be reliably replaced by an attacker or used to reliably replace a different pointer.  However, an attacker may still be able to attack a larger call sequence if they can alter the address through which the pointer is accessed.  Furthermore, some situations cannot use address diversity because of language or other restrictions.
+
+- A constant integer, called a **constant discriminator**. A pointer signed with a non-zero constant discriminator is said to have **constant diversity**.  If the discriminator is specific to a single declaration, it is said to have **declaration diversity**; if the discriminator is specific to a type of value, it is said to have **type diversity**.  For example, C++ v-tables on arm64e sign their component functions using a hash of their method names and signatures, which provides declaration diversity; similarly, C++ member function pointers sign their invocation functions using a hash of the member pointer type, which provides type diversity.
+
+The implementation may need to restrict constant discriminators to be significantly smaller than the full size of a discriminator.  For example, on arm64e, constant discriminators are only 16-bit values.  This is believed to not significantly weaken the mitigation, since collisions remain uncommon.
+
+The algorithm for blending a constant discriminator with a storage address is implementation-defined.
+
+.. _Signing schemas:
+
+Signing schemas
+~~~~~~~~~~~~~~~
+
+Correct use of pointer authentication requires the signing code and the authenticating code to agree about the **signing schema** for the pointer:
+
+- the abstract signing key with which the pointer should be signed and
+- an algorithm for computing the discriminator.
+
+As described in the section above on `Discriminators`_, in most situations, the discriminator is produced by taking a constant discriminator and optionally blending it with the storage address of the pointer.  In these situations, the signing schema breaks down even more simply:
+
+- the abstract signing key,
+- a constant discriminator, and
+- whether to use address diversity.
+
+It is important that the signing schema be independently derived at all signing and authentication sites.  Preferably, the schema should be hard-coded everywhere it is needed, but at the very least, it must not be derived by inspecting information stored along with the pointer.  See the section on `Attacks on pointer authentication`_ for more information.
+
+Language Features
+-----------------
+
+There is currently one main pointer authentication language feature:
+
+- The language provides the ```` intrinsic interface for manually signing and authenticating pointers in code.  These can be used in circumstances where very specific behavior is required.
+
+
+Language extensions
+~~~~~~~~~~~~~~~~~~~
+
+Feature testing
+^^^^^^^^^^^^^^^
+
+Whether the current target uses pointer authentication can be tested for with a number of different tests.
+
+- ``__has_feature(ptrauth_intrinsics)`` is true if ```` provides its normal interface.  This may be true even on targets where pointer authentication is not enabled by default.
+
+````
+~~~~~~~~~~~~~~~
+
+This header defines the following types and operations:
+
+``ptrauth_key``
+^^^^^^^^^^^^^^^
+
+This ``enum`` is the type of abstract signing keys.  In addition to defining the set of implementation-specific signing keys (for example, ARMv8.3 defines ``ptrauth_key_asia``), it also defines some portable aliases for those keys.  For example, ``ptrauth_key_function_pointer`` is the key generally used for C function pointers, which will generally be suitable for other function-signing schemas.
+
+In all the operation descriptions below, key values must be constant values corresponding to one of the implementation-specific abstract signing keys from this ``enum``.
+
+``ptrauth_extra_data_t``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a ``typedef`` of a standard integer type of the correct size to hold a discriminator value.
+
+In the signing and authentication operation descriptions below, discriminator values must have either pointer type or integer type. If the discriminator is an integer, it will be coerced to ``ptrauth_extra_data_t``.
+
+``ptrauth_blend_discriminator``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_blend_discriminator(pointer, integer)
+
+Produce a discriminator value which blends information from the given pointer and the given integer.
+
+Implementations may ignore some bits from each value, which is to say, the blending algorithm may be chosen for speed and convenience over theoretical strength as a hash-combining algorithm.  For example, arm64e simply overwrites the high 16 bits of the pointer with the low 16 bits of the integer, which can be done in a single instruction with an immediate integer.
+
+``pointer`` must have pointer type, and ``integer`` must have integer type. The result has type ``ptrauth_extra_data_t``.
+
+``ptrauth_strip``
+^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_strip(signedPointer, key)
+
+Given that ``signedPointer`` matches the layout for signed pointers signed with the given key, extract the raw pointer from it.  This operation does not trap and cannot fail, even if the pointer is not validly signed.
+
+``ptrauth_sign_unauthenticated``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_unauthenticated(pointer, key, discriminator)
+
+Produce a signed pointer for the given raw pointer without applying any authentication or extra treatment.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+This is a treacherous operation that can easily result in `signing oracles`_.  Programs should use it seldom and carefully.
+
+``ptrauth_auth_and_resign``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_and_resign(pointer, oldKey, oldDiscriminator, newKey, newDiscriminator)
+
+Authenticate that ``pointer`` is signed with ``oldKey`` and ``oldDiscriminator`` and then resign the raw-pointer result of that authentication with ``newKey`` and ``newDiscriminator``.
+
+``pointer`` must have pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+The code sequence produced for this operation must not be directly attackable.  However, if the discriminator values are not constant integers, their computations may still be attackable.  In the future, Clang should be enhanced to guaranteed non-attackability if these expressions are :ref:`safely-derived`.
+
+``ptrauth_auth_data``
+^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_auth_data(pointer, key, discriminator)
+
+Authenticate that ``pointer`` is signed with ``key`` and ``discriminator`` and remove the signature.
+
+``pointer`` must have object pointer type.  The result will have the same type as ``pointer``.  This operation is not required to have the same behavior on a null pointer that the language implementation would.
+
+In the future when Clang makes `safe derivation`_ guarantees, the result of this operation should be considered safely-derived.
+
+``ptrauth_sign_generic_data``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_generic_data(value1, value2)
+
+Computes a signature for the given pair of values, incorporating a secret signing key.
+
+This operation can be used to verify that arbitrary data has not been tampered with by computing a signature for the data, storing that signature, and then repeating this process and verifying that it yields the same result.  This can be reasonably done in any number of ways; for example, a library could compute an ordinary checksum of the data and just sign the result in order to get the tamper-resistance advantages of the secret signing key (since otherwise an attacker could reliably overwrite both the data and the checksum).
+
+``value1`` and ``value2`` must be either pointers or integers.  If the integers are larger than ``uintptr_t`` then data not representa...
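
The checksum-then-sign pattern described for `ptrauth_sign_generic_data` can be sketched in portable C. This is a toy model only: `toy_sign_generic_data`, `toy_mix`, `checksum`, `seal`, `verify`, and `TOY_SECRET_KEY` are hypothetical names, the mixer is not cryptographically secure, and the real builtin relies on a hardware key that the program cannot read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the secret signing key. On real hardware the
   key is not visible to user code. */
static const uint64_t TOY_SECRET_KEY = UINT64_C(0x9e3779b97f4a7c15);

/* Toy 64-bit mixer (splitmix64 finalizer). NOT cryptographically secure. */
static uint64_t toy_mix(uint64_t x) {
    x ^= x >> 30; x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27; x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

/* Toy analogue of ptrauth_sign_generic_data(value1, value2): a keyed
   signature over a pair of values. */
static uint64_t toy_sign_generic_data(uint64_t value1, uint64_t value2) {
    return toy_mix(toy_mix(value1 ^ TOY_SECRET_KEY) ^ value2);
}

/* Ordinary, unkeyed checksum (64-bit FNV-1a). An attacker who can write
   memory could recompute this, which is why it is signed afterwards. */
static uint64_t checksum(const unsigned char *data, size_t len) {
    uint64_t h = UINT64_C(0xcbf29ce484222325);
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

/* Sign the checksum, using the length as the second signed value. */
static uint64_t seal(const unsigned char *data, size_t len) {
    return toy_sign_generic_data(checksum(data, len), (uint64_t)len);
}

/* Repeat the process and compare. Returns 1 if the data is unchanged. */
static int verify(const unsigned char *data, size_t len, uint64_t stored) {
    return seal(data, len) == stored;
}
```

A caller would store the result of `seal` next to the data and later call `verify` with the stored value; any change to the data changes the checksum and therefore the signature.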

Collaborator

@DavidSpickett DavidSpickett left a comment

Just went through the document. Great to see the possible attack vectors listed at the end, this will be a good reference for folks.


- ``sign_generic(value)`` produces a cryptographic signature for arbitrary data, not necessarily a pointer. This is useful for efficiently verifying that non-pointer data has not been tampered with.

Whenever any of these operations is called for, the key value must be known statically. This is because the layout of a signed pointer may vary according to the signing key. (For example, in ARMv8.3, the layout of a signed pointer depends on whether TBI is enabled, which can be set independently for code and data pointers.)
Collaborator

"Top Byte Ignore (TBI) is enabled"

Collaborator

Also I think you might be mixing up concerns here. TBI on or off changes the number of bits the signature can use in any given pointer. The choice of data or instruction pointer determines whether you use the D or I instruction variant (https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/PACDA--PACDZA).

You could equally use PACDB to sign your data pointers, it's up to your ABI. So yes, a combination of ABI and platform ABI will change pointer layout, but I'm not sure the Arm example really makes much sense as it stands if you know the details.

My point is basically that the signing key doesn't tell you if TBI is enabled. Your platform ABI would tell you that. I see why it's passed to these operations, but I think the TBI note muddles the explanation some.

You could make it more obviously two different things:

There are other factors that can influence the layout, for example on ARMv8.3 whether Top Byte Ignore (TBI) is enabled or not. These are platform ABI choices....

And while TBI being possible to enable independently for data and code is nice, it's not needed to make the point that the pointer layout can be changed by it.
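
To make the TBI point concrete, here is a minimal sketch of how many pointer bits remain for the signature with and without Top Byte Ignore. It assumes the commonly described Armv8.3-A layout: the signature occupies bits 54 down to the virtual address size, plus bits 63:56 only when TBI is off, with bit 55 reserved for address-range selection. `pac_field_mask`, `pac_field_bits`, and `va_bits` are illustrative names, not part of any API.

```c
#include <stdint.h>

/* Mask of the pointer bits available to hold the signature for a virtual
   address of `va_bits` bits (assumed to be less than 55). Bit 55 is excluded
   because it selects the upper or lower address range. With TBI enabled,
   bits 63:56 are left to software and are not available to the signature. */
static uint64_t pac_field_mask(unsigned va_bits, int tbi_enabled) {
    uint64_t below_55 = (UINT64_C(1) << 55) - 1;        /* bits 54:0 */
    uint64_t address  = (UINT64_C(1) << va_bits) - 1;   /* bits va_bits-1:0 */
    uint64_t low  = below_55 & ~address;                /* bits 54:va_bits */
    uint64_t high = tbi_enabled ? 0 : (UINT64_C(0xff) << 56); /* bits 63:56 */
    return low | high;
}

/* Number of signature bits implied by the mask. */
static unsigned pac_field_bits(unsigned va_bits, int tbi_enabled) {
    uint64_t m = pac_field_mask(va_bits, tbi_enabled);
    unsigned n = 0;
    while (m) {
        n += (unsigned)(m & 1);
        m >>= 1;
    }
    return n;
}
```

With a 48-bit virtual address this gives 7 signature bits when TBI is on and 15 when it is off, independent of which key is used.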

Member Author

Hmm, you're right that this is about the key rather than the pointer, but I'd rather avoid going even further into the details of TBI being platform ABI! How about simply saying "whether TBI is enabled, which can be set independently for I and D keys." We don't really talk about the PAuth keys elsewhere, but for illustrative purposes this seems fine?

Collaborator

The key thing for me is that the layout of a pointer doesn't depend on the signing key. It sort of does but that's a secondary effect from TBI.

You could say that the layout is affected by the signing key and other factors such as .

Which is correct enough for, as you said, just illustrating that it can change.


An attacker can simply overwrite a pointer intended for one purpose with a pointer intended for another purpose if both purposes use the same signing schema and that schema does not use address diversity.

The most common source of this weakness is when code relies on using the default language rules for C function pointers. The current implementation uses the exact same signing schema for all C function pointers, even for functions of substantially different type. While efforts are ongoing to improve constant diversity for C function pointers of different type, there are necessary limits to this. The C standard requires function pointers to be copyable with ``memcpy``, which means that function pointers can never use address diversity. Furthermore, even if a function pointer can only be replaced with another function of the exact same type, that can still be useful to an attacker, as in the following example of a hand-rolled "v-table":
Collaborator

It would be good to briefly say what compatible with memcpy means.

My guess is that because the signature may occupy bits significant to addressing, attempting to dereference a signed pointer will fault, which is exactly what memcpy would try to do.

But then you mention address diversity in the next sentence which sounds more like something to do with discriminators, so I'm not sure.

Collaborator

Perhaps memcpy, even if it could auth the pointer, wouldn't know where the pointer had been stored. And it would need to know that if address diversity had been used?

Member Author

Oh, yes, this says "copyable", not "compatible": if a pointer signed with address diversity is copied, it needs to be re-signed with the destination address. So it's not that memcpy consumes the pointer itself, merely that it doesn't know that it's copying function pointers, so can't ever preserve address diversity. (one could imagine a memcpy implementation that's somehow informed of all signed address-diversified pointers, and that actually did resign on the fly, but that's just silly)
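
The interaction between `memcpy` and address diversity can be illustrated with a small software model. This is a sketch under stated assumptions, not how the hardware or Clang implements it: all `toy_*` names and `TOY_KEY` are hypothetical, the mixer is not cryptographic, failure is reported with a return value instead of a trap, and the signature is kept as a full 64-bit field beside the address (rather than packed into spare pointer bits) so that the demonstration is deterministic. The blend shown, a 16-bit constant placed in the top 16 bits of the storage address, is only one possible choice.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t raw; /* the address being protected */
    uint64_t sig; /* toy signature; real hardware uses spare pointer bits */
} toy_signed_ptr;

static const uint64_t TOY_KEY = UINT64_C(0x2545f4914f6cdd1d); /* hypothetical */

/* Toy 64-bit mixer (splitmix64 finalizer). NOT cryptographically secure. */
static uint64_t toy_mix(uint64_t x) {
    x ^= x >> 30; x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27; x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

/* Address-diversified discriminator: storage address blended with a
   16-bit constant in the top 16 bits. */
static uint64_t toy_blend(const void *storage, uint16_t constant) {
    uint64_t addr = (uint64_t)(uintptr_t)storage & UINT64_C(0x0000ffffffffffff);
    return addr | ((uint64_t)constant << 48);
}

/* Sign a raw pointer with the given discriminator. */
static toy_signed_ptr toy_sign(uint64_t raw, uint64_t disc) {
    toy_signed_ptr p = { raw, toy_mix(raw ^ toy_mix(disc ^ TOY_KEY)) };
    return p;
}

/* Authenticate: returns 1 and writes the raw pointer on success, 0 on failure. */
static int toy_auth(toy_signed_ptr p, uint64_t disc, uint64_t *raw_out) {
    if (toy_sign(p.raw, disc).sig != p.sig)
        return 0;
    *raw_out = p.raw;
    return 1;
}

/* What a copy would need to do to preserve address diversity:
   authenticate against the source slot, then re-sign for the destination. */
static int toy_copy_resign(toy_signed_ptr *dst, const toy_signed_ptr *src,
                           uint16_t constant) {
    uint64_t raw;
    if (!toy_auth(*src, toy_blend(src, constant), &raw))
        return 0;
    *dst = toy_sign(raw, toy_blend(dst, constant));
    return 1;
}
```

In this model, a value signed for one slot and moved to another with plain `memcpy` fails `toy_auth` at the new location, while `toy_copy_resign` produces a value that authenticates there. A generic `memcpy` has no way to know it should do the latter, which is why pointers that must remain copyable with `memcpy` cannot use address diversity.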

This attack only affects ordinary programmers if they are using certain treacherous patterns of code. Currently this includes:

- all uses of the ``__ptrauth_sign_unauthenticated`` intrinsic and
- assigning data pointers to ``__ptrauth``-qualified l-values.
Collaborator

Could you show an example? I know this is an advanced document but many (including me) will be hazy on what l and r values are. Especially if we are reading these sections out of interest in security not compiler/language development.

Member Author

This doesn't answer your question, but on second thought, it may be best to leave this out for now, until the qualifier support patches.

I did something quite awkward with this patch, in trying to split out the general-enough bits (that would make a useful starting point to document builtins and the general programming and security models). But maybe this should stick to the absolute basics, and we can add these considerations later (in a more dignified dedicated PR and commit!)

@ahmedbougacha ahmedbougacha force-pushed the eng/abougacha/ptrauth-clang-builtins branch from 6d13591 to fccf064 Compare September 13, 2023 00:16
@DavidSpickett
Collaborator

DavidSpickett commented Sep 13, 2023

Maybe I just need to learn how to use Github, but did half the document disappear?

I see a reference to the attacks section ("See the section on ``Attacks on pointer authentication``_ for more information.") but the section is gone from the diff, even in a private window.

@atrosinenko
Contributor

As already mentioned by @DavidSpickett, part of the PointerAuthentication.rst file disappeared, so some links are dangling now.

@atrosinenko
Contributor

Meanwhile, doesn't the C++ standard use the term "safely-derived pointer", too?


The implementation may need to restrict constant discriminators to be significantly smaller than the full size of a discriminator. For example, on arm64e, constant discriminators are only 16-bit values. This is believed to not significantly weaken the mitigation, since collisions remain uncommon.

The algorithm for blending a constant discriminator with a storage address is implementation-defined.
Collaborator

Maybe this should say "... with a pointer ..." rather than "... with a storage address ..."? See the similar earlier remark about blending a pointer, rather than an address.


Correct use of pointer authentication requires the signing code and the authenticating code to agree about the **signing schema** for the pointer:

- the abstract signing key with which the pointer should be signed and
Collaborator

It might be worthwhile to add an explicit bullet point saying "- whether the pointer should be signed at all"? For example, the pac-ret signing scheme only signs return address pointers.

Correct use of pointer authentication requires the signing code and the authenticating code to agree about the **signing schema** for the pointer:

- the abstract signing key with which the pointer should be signed and
- an algorithm for computing the discriminator.
Collaborator

An earlier remark of mine in this review also stated that it is signing-schema dependent whether there's a guarantee that language-level authentication instructions will trigger a fault on the authentication of a pointer, or on the first use of a pointer after authentication.
Maybe this should be added as an explicit bullet point as to what defines a signing schema here?


It is important that the signing schema be independently derived at all signing and authentication sites. Preferably, the schema should be hard-coded everywhere it is needed, but at the very least, it must not be derived by inspecting information stored along with the pointer. See the section on `Attacks on pointer authentication`_ for more information.

Language Features
Collaborator

Nitpick: at this point, I was wondering, shouldn't this section say "C and C++ language features" rather than "language features"?
And then I noticed that this documentation lives in "clang/docs/...", so in a C and C++ (and Objective-C(++)) specific area.
Everything above this line in this document is, I think, mostly independent of source language. Which would make it feasible to split this documentation in a generic "LLVM" part (the part above this line); and a language-specific/frontend-specific part (the part below this line).
I'm not sure if that would make sense though - splitting the documentation may make it harder to follow and keep consistent?

Member Author

Yes, the LLVM docs mainly focus on IR definitions and backend details. They point to this clang page otherwise, so it should be easily discoverable for other implementations.

Contributor

Right, this document is not meant to be LLVM documentation at all. The part above describes the basic concepts of pointer authentication and is intentionally mostly divorced from how those concepts are presented in C/C++ or LLVM. The LLVM docs link to it because you need to understand those basic concepts before you can understand the LLVM intrinsics. It's roughly analogous to how you're expected to understand the basic concepts of throwing, unwinding, and catching exceptions before you go off to read about the LLVM landingpad instruction.

@asl
Collaborator

asl commented Dec 4, 2023

Looks like the review stalled. Where are we here? @ahmedbougacha @ChuanqiXu9 ?

@ahmedbougacha ahmedbougacha force-pushed the eng/abougacha/ptrauth-clang-builtins branch from fccf064 to c4e7c86 Compare January 8, 2024 16:38
github-actions bot commented Jan 8, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ahmedbougacha ahmedbougacha force-pushed the eng/abougacha/ptrauth-clang-builtins branch from cd71428 to c5cde09 Compare February 26, 2024 17:43
@ahmedbougacha
Member Author

Updated, tried to address all the comments; let me know if I missed something!

Collaborator

@kbeyls kbeyls left a comment

I just read through the documentation parts of this PR again.
I think that the quality is more than good enough to land, even though there are a few very minor remarks by myself and @DavidSpickett seemingly still open.

I haven't looked again at the details of the implementation, but remember having gone through it quite a long time ago. I'm happy for this PR to land.

@ahmedbougacha ahmedbougacha force-pushed the eng/abougacha/ptrauth-clang-builtins branch from c5cde09 to 1492d63 Compare March 15, 2024 21:15
@ahmedbougacha ahmedbougacha requested a review from Endilll as a code owner March 15, 2024 21:15
@ahmedbougacha ahmedbougacha merged commit 0481f04 into llvm:main Mar 15, 2024
@ahmedbougacha ahmedbougacha deleted the eng/abougacha/ptrauth-clang-builtins branch March 15, 2024 21:17
@Endilll Endilll removed request for a team and Endilll March 16, 2024 09:22
MaskRay added a commit that referenced this pull request Mar 26, 2024
And add a driver test missing from the original patch #65996.