Skip to content

[Support] Add SipHash-based 16-bit ptrauth stable hash. #93902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 15, 2024

Conversation

ahmedbougacha
Copy link
Member

@ahmedbougacha ahmedbougacha commented May 31, 2024

Depends on:

This finally wraps the now-lightly-modified SipHash C reference
implementation, for the main interface we need (16-bit ptrauth
discriminators).

The exact algorithm is the little-endian interpretation of the
non-doubled (i.e. 64-bit) result of applying a SipHash-2-4 using the
constant seed b5d4c9eb79104a796fec8b1b428781d4 (big-endian), with the
result reduced by modulo to the range of non-zero discriminators (i.e.
(rawHash % 65535) + 1).

By "stable" we mean that the result of this hash algorithm will the same
across different compiler versions and target platforms.

The 16-bit hashes are used extensively for the AArch64 ptrauth ABI,
because AArch64 can efficiently load a 16-bit immediate into the high
bits of a register without disturbing the remainder of the value, which
serves as a nice blend operation.

16 bits is also sufficiently compact to not inflate a loader relocation.
We disallow zero to guarantee a different discriminator from the places
in the ABI that use a constant zero.

@llvmbot
Copy link
Member

llvmbot commented May 31, 2024

@llvm/pr-subscribers-llvm-support

Author: Ahmed Bougacha (ahmedbougacha)

Changes

Based on the SipHash reference implementation:
https://github.com/veorq/SipHash
which has very graciously been licensed under our llvm license (Apache-2.0 WITH LLVM-exception) by Jean-Philippe Aumasson.

This lightly modifies it to fit into libSupport, and wraps it for the two main interfaces we're interested in (16/64-bit).

This intentionally doesn't expose a raw interface beyond that to encourage others to carefully consider their use.
Beyond that, I'm ambivalent about naming this ptrauth, siphash, stablehash, or which combination of the three (which is what this patch does for the symbols, but keeping SipHash for the file, for other potential usage.)

The exact algorithm is the little-endian interpretation of the non-doubled (i.e. 64-bit) result of applying a SipHash-2-4 using a specific key value which can be found in the source.

By "stable" we mean that the result of this hash algorithm will the same across different compiler versions and target platforms.

The 16-bit variant is used extensively for the AArch64 ptrauth ABI, because AArch64 can efficiently load a 16-bit immediate into the high bits of a register without disturbing the remainder of the value, which serves as a nice blend operation.

16 bits is also sufficiently compact to not inflate a loader relocation. We disallow zero to guarantee a different discriminator from the places in the ABI that use a constant zero.


Full diff: https://github.com/llvm/llvm-project/pull/93902.diff

5 Files Affected:

  • (added) llvm/include/llvm/Support/SipHash.h (+47)
  • (modified) llvm/lib/Support/CMakeLists.txt (+1)
  • (added) llvm/lib/Support/SipHash.cpp (+174)
  • (modified) llvm/unittests/Support/CMakeLists.txt (+1)
  • (added) llvm/unittests/Support/SipHashTest.cpp (+43)
diff --git a/llvm/include/llvm/Support/SipHash.h b/llvm/include/llvm/Support/SipHash.h
new file mode 100644
index 0000000000000..fcc29c00da185
--- /dev/null
+++ b/llvm/include/llvm/Support/SipHash.h
@@ -0,0 +1,47 @@
+//===--- SipHash.h - An ABI-stable string SipHash ---------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// A family of ABI-stable string hash algorithms based on SipHash, currently
+// used to compute ptrauth discriminators.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_SIPHASH_H
+#define LLVM_SUPPORT_SIPHASH_H
+
+#include <cstdint>
+
+namespace llvm {
+class StringRef;
+
+/// Compute a stable 64-bit hash of the given string.
+///
+/// The exact algorithm is the little-endian interpretation of the
+/// non-doubled (i.e. 64-bit) result of applying a SipHash-2-4 using
+/// a specific key value which can be found in the source.
+///
+/// By "stable" we mean that the result of this hash algorithm will
+/// the same across different compiler versions and target platforms.
+uint64_t getPointerAuthStableSipHash64(StringRef S);
+
+/// Compute a stable non-zero 16-bit hash of the given string.
+///
+/// This computes the full getPointerAuthStableSipHash64, but additionally
+/// truncates it down to a non-zero 16-bit value.
+///
+/// We use a 16-bit discriminator because ARM64 can efficiently load
+/// a 16-bit immediate into the high bits of a register without disturbing
+/// the remainder of the value, which serves as a nice blend operation.
+/// 16 bits is also sufficiently compact to not inflate a loader relocation.
+/// We disallow zero to guarantee a different discriminator from the places
+/// in the ABI that use a constant zero.
+uint64_t getPointerAuthStableSipHash16(StringRef S);
+
+} // end namespace llvm
+
+#endif
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index be4badc09efa5..aa37b812791ff 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -222,6 +222,7 @@ add_llvm_component_library(LLVMSupport
   SHA1.cpp
   SHA256.cpp
   Signposts.cpp
+  SipHash.cpp
   SmallPtrSet.cpp
   SmallVector.cpp
   SourceMgr.cpp
diff --git a/llvm/lib/Support/SipHash.cpp b/llvm/lib/Support/SipHash.cpp
new file mode 100644
index 0000000000000..dbd60fb73ebb5
--- /dev/null
+++ b/llvm/lib/Support/SipHash.cpp
@@ -0,0 +1,174 @@
+//===--- StableHash.cpp - An ABI-stable string hash -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+//  This file implements an ABI-stable string hash based on SipHash, used to
+//  compute ptrauth discriminators.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/SipHash.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Debug.h"
+#include <cstdint>
+#include <cstring>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "llvm-siphash"
+
+//  Lightly adapted from the SipHash reference C implementation by
+//  Jean-Philippe Aumasson and Daniel J. Bernstein.
+
+#define SIPHASH_ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))
+
+#define SIPHASH_U8TO64_LE(p)                                                   \
+  (((uint64_t)((p)[0])) | ((uint64_t)((p)[1]) << 8) |                          \
+   ((uint64_t)((p)[2]) << 16) | ((uint64_t)((p)[3]) << 24) |                   \
+   ((uint64_t)((p)[4]) << 32) | ((uint64_t)((p)[5]) << 40) |                   \
+   ((uint64_t)((p)[6]) << 48) | ((uint64_t)((p)[7]) << 56))
+
+#define SIPHASH_SIPROUND                                                       \
+  do {                                                                         \
+    v0 += v1;                                                                  \
+    v1 = SIPHASH_ROTL(v1, 13);                                                 \
+    v1 ^= v0;                                                                  \
+    v0 = SIPHASH_ROTL(v0, 32);                                                 \
+    v2 += v3;                                                                  \
+    v3 = SIPHASH_ROTL(v3, 16);                                                 \
+    v3 ^= v2;                                                                  \
+    v0 += v3;                                                                  \
+    v3 = SIPHASH_ROTL(v3, 21);                                                 \
+    v3 ^= v0;                                                                  \
+    v2 += v1;                                                                  \
+    v1 = SIPHASH_ROTL(v1, 17);                                                 \
+    v1 ^= v2;                                                                  \
+    v2 = SIPHASH_ROTL(v2, 32);                                                 \
+  } while (0)
+
+template <int cROUNDS, int dROUNDS, class ResultTy>
+static inline ResultTy siphash(const uint8_t *in, uint64_t inlen,
+                               const uint8_t (&k)[16]) {
+  static_assert(sizeof(ResultTy) == 8 || sizeof(ResultTy) == 16,
+                "result type should be uint64_t or uint128_t");
+  uint64_t v0 = 0x736f6d6570736575ULL;
+  uint64_t v1 = 0x646f72616e646f6dULL;
+  uint64_t v2 = 0x6c7967656e657261ULL;
+  uint64_t v3 = 0x7465646279746573ULL;
+  uint64_t b;
+  uint64_t k0 = SIPHASH_U8TO64_LE(k);
+  uint64_t k1 = SIPHASH_U8TO64_LE(k + 8);
+  uint64_t m;
+  int i;
+  const uint8_t *end = in + inlen - (inlen % sizeof(uint64_t));
+  const int left = inlen & 7;
+  b = ((uint64_t)inlen) << 56;
+  v3 ^= k1;
+  v2 ^= k0;
+  v1 ^= k1;
+  v0 ^= k0;
+
+  if (sizeof(ResultTy) == 16) {
+    v1 ^= 0xee;
+  }
+
+  for (; in != end; in += 8) {
+    m = SIPHASH_U8TO64_LE(in);
+    v3 ^= m;
+
+    for (i = 0; i < cROUNDS; ++i)
+      SIPHASH_SIPROUND;
+
+    v0 ^= m;
+  }
+
+  switch (left) {
+  case 7:
+    b |= ((uint64_t)in[6]) << 48;
+    LLVM_FALLTHROUGH;
+  case 6:
+    b |= ((uint64_t)in[5]) << 40;
+    LLVM_FALLTHROUGH;
+  case 5:
+    b |= ((uint64_t)in[4]) << 32;
+    LLVM_FALLTHROUGH;
+  case 4:
+    b |= ((uint64_t)in[3]) << 24;
+    LLVM_FALLTHROUGH;
+  case 3:
+    b |= ((uint64_t)in[2]) << 16;
+    LLVM_FALLTHROUGH;
+  case 2:
+    b |= ((uint64_t)in[1]) << 8;
+    LLVM_FALLTHROUGH;
+  case 1:
+    b |= ((uint64_t)in[0]);
+    break;
+  case 0:
+    break;
+  }
+
+  v3 ^= b;
+
+  for (i = 0; i < cROUNDS; ++i)
+    SIPHASH_SIPROUND;
+
+  v0 ^= b;
+
+  if (sizeof(ResultTy) == 8) {
+    v2 ^= 0xff;
+  } else {
+    v2 ^= 0xee;
+  }
+
+  for (i = 0; i < dROUNDS; ++i)
+    SIPHASH_SIPROUND;
+
+  b = v0 ^ v1 ^ v2 ^ v3;
+
+  // This mess with the result type would be easier with 'if constexpr'.
+
+  uint64_t firstHalf = b;
+  if (sizeof(ResultTy) == 8)
+    return firstHalf;
+
+  v1 ^= 0xdd;
+
+  for (i = 0; i < dROUNDS; ++i)
+    SIPHASH_SIPROUND;
+
+  b = v0 ^ v1 ^ v2 ^ v3;
+  uint64_t secondHalf = b;
+
+  return firstHalf | (ResultTy(secondHalf) << (sizeof(ResultTy) == 8 ? 0 : 64));
+}
+
+//===--- LLVM-specific wrappers around siphash.
+
+/// Compute an ABI-stable 64-bit hash of the given string.
+uint64_t llvm::getPointerAuthStableSipHash64(StringRef Str) {
+  static const uint8_t K[16] = {0xb5, 0xd4, 0xc9, 0xeb, 0x79, 0x10, 0x4a, 0x79,
+                                0x6f, 0xec, 0x8b, 0x1b, 0x42, 0x87, 0x81, 0xd4};
+
+  // The aliasing is fine here because of omnipotent char.
+  auto *Data = reinterpret_cast<const uint8_t *>(Str.data());
+  return siphash<2, 4, uint64_t>(Data, Str.size(), K);
+}
+
+/// Compute an ABI-stable 16-bit hash of the given string.
+uint64_t llvm::getPointerAuthStableSipHash16(StringRef Str) {
+  uint64_t RawHash = getPointerAuthStableSipHash64(Str);
+
+  // Produce a non-zero 16-bit discriminator.
+  uint64_t Discriminator = (RawHash % 0xFFFF) + 1;
+  LLVM_DEBUG(dbgs() << "ptrauth stable hash 16-bit discriminator: "
+                    << utostr(Discriminator) << " (0x"
+                    << utohexstr(Discriminator) << ")"
+                    << " of: " << Str << "\n");
+  return Discriminator;
+}
diff --git a/llvm/unittests/Support/CMakeLists.txt b/llvm/unittests/Support/CMakeLists.txt
index 2718be8450f80..631f2e6bf00df 100644
--- a/llvm/unittests/Support/CMakeLists.txt
+++ b/llvm/unittests/Support/CMakeLists.txt
@@ -75,6 +75,7 @@ add_llvm_unittest(SupportTests
   ScopedPrinterTest.cpp
   SHA256.cpp
   SignalsTest.cpp
+  SipHashTest.cpp
   SourceMgrTest.cpp
   SpecialCaseListTest.cpp
   SuffixTreeTest.cpp
diff --git a/llvm/unittests/Support/SipHashTest.cpp b/llvm/unittests/Support/SipHashTest.cpp
new file mode 100644
index 0000000000000..1a8143d9c9375
--- /dev/null
+++ b/llvm/unittests/Support/SipHashTest.cpp
@@ -0,0 +1,43 @@
+//===- llvm/unittest/Support/SipHashTest.cpp ------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/SipHash.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Support/raw_ostream.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+
+namespace {
+
+TEST(SipHashTest, PointerAuthSipHash) {
+  // Test some basic cases, for 16 bit and 64 bit results.
+  EXPECT_EQ(0xE793U, getPointerAuthStableSipHash16(""));
+  EXPECT_EQ(0xF468U, getPointerAuthStableSipHash16("strlen"));
+  EXPECT_EQ(0x2D15U, getPointerAuthStableSipHash16("_ZN1 ind; f"));
+
+  EXPECT_EQ(0xB2BB69BB0A2AC0F1U, getPointerAuthStableSipHash64(""));
+  EXPECT_EQ(0x9304ABFF427B72E8U, getPointerAuthStableSipHash64("strlen"));
+  EXPECT_EQ(0x55F45179A08AE51BU, getPointerAuthStableSipHash64("_ZN1 ind; f"));
+
+  // Test some known strings that are already enshrined in the ABI.
+  EXPECT_EQ(0x6AE1U, getPointerAuthStableSipHash16("isa"));
+  EXPECT_EQ(0xB5ABU, getPointerAuthStableSipHash16("objc_class:superclass"));
+  EXPECT_EQ(0xC0BBU, getPointerAuthStableSipHash16("block_descriptor"));
+  EXPECT_EQ(0xC310U, getPointerAuthStableSipHash16("method_list_t"));
+
+  // Test the limits that apply to 16 bit results but don't to 64 bit results.
+  EXPECT_EQ(1U,                  getPointerAuthStableSipHash16("_Zptrkvttf"));
+  EXPECT_EQ(0x314FD87E0611F020U, getPointerAuthStableSipHash64("_Zptrkvttf"));
+
+  EXPECT_EQ(0xFFFFU,             getPointerAuthStableSipHash16("_Zaflhllod"));
+  EXPECT_EQ(0x1292F635FB3DFBF8U, getPointerAuthStableSipHash64("_Zaflhllod"));
+}
+
+} // end anonymous namespace

@ahmedbougacha ahmedbougacha self-assigned this May 31, 2024
Base automatically changed from users/ahmedbougacha/ptrauth-header-arm64e-keys to main May 31, 2024 20:03
Copy link

github-actions bot commented May 31, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@asl
Copy link
Collaborator

asl commented Jun 3, 2024

@kbeyls Will you please summarize briefly what we discussed on the sync call from the license standpoint?

Copy link
Contributor

@kovdan01 kovdan01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, we want to take reference SipHash implementation with minimal changes. I personally have no issues with taking cryptographic C code "as is" with minimal changes. If it is the intention, LGTM with several minor comments.

Anyway, I want someone else with deeper understanding of the context to take a look before this gets merged.

uint64_t RawHash = getPointerAuthStableSipHash64(Str);

// Produce a non-zero 16-bit discriminator.
uint64_t Discriminator = (RawHash % 0xFFFF) + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure that such scheme is already used in downstream for a long time and there is a strong point in having non-zero discriminator always when we compute that from a string, but let me mention a potential downside of such approach instead of just doing uint64_t Discriminator = RawHash % 0x10000;.

If we assume that 64-bit hash values are distributed uniformly when applying the hash function to an infinite set of all possible strings (this should probably be true for a cryptographically secure hash), non-zero 16-bit values computed as here become non-uniformly distributed:

  • 16-bit value 0: 0 64-bit values corresponding
  • 16-bit value 1: 281479271743490 64-bit values corresponding
  • 16-bit values 2..65535: 281479271743489 64-bit values corresponding

I suppose that it might be OK, it's just not very consistent with 64-bit hash computation since we do not try to avoid zero value there. I get the point that the chance of having zero 64-bit hash value is very low compared to 16-bit though.

The final point: if that was discussed with security researchers, I have no issues with such an implementation ignoring 16-bit zeros. If not - IMHO it's better to talk to security specialists and ask them for a piece of advice.

Copy link
Member Author

@ahmedbougacha ahmedbougacha Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure that such scheme is already used in downstream for a long time

I'll just mention that it's not a matter of downstream: it's the platform ABI for arm64e.

I suppose that it might be OK, it's just not very consistent with 64-bit hash computation since we do not try to avoid zero value there. I get the point that the chance of having zero 64-bit hash value is very low compared to 16-bit though.

Right, the 16-bit and 64-bit hashes aren't used in equivalent ways. The 16-bit result is used in discriminator contexts, where the overwhelming majority of discriminators are 0 (and this is even enshrined in the ISA with the LDRAA/LDRAB instructions.) This current patch series doesn't use the 64-bit variant, so I removed it altogether, but you could imagine using it in more interesting ways, e.g., as a way to seed a (not-really-but-almost) 64-bit PACGA MAC chain. In such cases a collision with the default 0 discriminator value isn't a concern.

The final point: if that was discussed with security researchers, I have no issues with such an implementation ignoring 16-bit zeros. If not - IMHO it's better to talk to security specialists and ask them for a piece of advice.

Absolutely, I suggest you all consider that for the new ELF implementations ;) But arm64e is ABI

@ahmedbougacha ahmedbougacha force-pushed the users/ahmedbougacha/ptrauth-siphash branch from 8e9ab9c to bf413d6 Compare June 4, 2024 22:13
@ahmedbougacha ahmedbougacha changed the base branch from main to users/ahmedbougacha/integrate-siphash-into-libsupport June 4, 2024 22:13
@ahmedbougacha
Copy link
Member Author

ahmedbougacha commented Jun 4, 2024

I split out:

and rebased this on top: we're now left with only the thin wrapper (the 16-bit one only here) and the simple unittests.

@ahmedbougacha ahmedbougacha changed the title [Support] Add SipHash-based 16/64-bit ptrauth stable hash. [Support] Add SipHash-based 16-bit ptrauth stable hash. Jun 4, 2024
@kovdan01
Copy link
Contributor

kovdan01 commented Jun 6, 2024

  • we're now left with only the thin wrapper (the 16-bit one only here) and the simple unittests

The wrapper and the tests LGTM

@ahmedbougacha ahmedbougacha force-pushed the users/ahmedbougacha/integrate-siphash-into-libsupport branch from 52ed57f to e0343b5 Compare June 15, 2024 00:06
Base automatically changed from users/ahmedbougacha/integrate-siphash-into-libsupport to main June 15, 2024 00:07
This finally wraps the now-lightly-modified SipHash C reference
implementation, for the main interface we need (16-bit ptrauth
discriminators).

The exact algorithm is the little-endian interpretation of the
non-doubled (i.e. 64-bit) result of applying a SipHash-2-4 using the
constant seed `b5d4c9eb79104a796fec8b1b428781d4` (big-endian), with the
result reduced by modulo to the range of non-zero discriminators (i.e.
`(rawHash % 65535) + 1`).

By "stable" we mean that the result of this hash algorithm will the same
across different compiler versions and target platforms.

The 16-bit hashes are used extensively for the AArch64 ptrauth ABI,
because AArch64 can efficiently load a 16-bit immediate into the high
bits of a register without disturbing the remainder of the value, which
serves as a nice blend operation.

16 bits is also sufficiently compact to not inflate a loader relocation.
We disallow zero to guarantee a different discriminator from the places
in the ABI that use a constant zero.

Co-Authored-By: John McCall <[email protected]>
@ahmedbougacha ahmedbougacha force-pushed the users/ahmedbougacha/ptrauth-siphash branch from bb44624 to b0a19c3 Compare June 15, 2024 00:14
@ahmedbougacha
Copy link
Member Author

I'll wait for #94394 test results before merging this.

@ahmedbougacha ahmedbougacha merged commit 61069bd into main Jun 15, 2024
5 of 6 checks passed
@ahmedbougacha ahmedbougacha deleted the users/ahmedbougacha/ptrauth-siphash branch June 15, 2024 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants