[Support] Add SipHash-based 16-bit ptrauth stable hash. #93902
@llvm/pr-subscribers-llvm-support
Author: Ahmed Bougacha (ahmedbougacha)
Changes
Based on the SipHash reference implementation:
This lightly modifies it to fit into libSupport, and wraps it for the two main interfaces we're interested in (16/64-bit). This intentionally doesn't expose a raw interface beyond that, to encourage others to carefully consider their use.
The exact algorithm is the little-endian interpretation of the non-doubled (i.e. 64-bit) result of applying SipHash-2-4 using a specific key value which can be found in the source.
By "stable" we mean that the result of this hash algorithm will be the same across different compiler versions and target platforms.
The 16-bit variant is used extensively for the AArch64 ptrauth ABI, because AArch64 can efficiently load a 16-bit immediate into the high bits of a register without disturbing the remainder of the value, which serves as a nice blend operation. 16 bits is also sufficiently compact to not inflate a loader relocation. We disallow zero to guarantee a different discriminator from the places in the ABI that use a constant zero.
Full diff: https://github.com/llvm/llvm-project/pull/93902.diff
5 Files Affected:
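The reduction from the 64-bit hash to the non-zero 16-bit discriminator can be checked in isolation. The following is a minimal sketch that assumes only the `(RawHash % 0xFFFF) + 1` formula from the patch; `reduceToDiscriminator` is a hypothetical name, and the hash/discriminator pairs are the ones from `SipHashTest.cpp` in this PR.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch) mirroring the reduction in
// getPointerAuthStableSipHash16: map any 64-bit hash onto [1, 0xFFFF].
constexpr uint16_t reduceToDiscriminator(uint64_t RawHash) {
  return static_cast<uint16_t>(RawHash % 0xFFFF + 1);
}

// Raw 64-bit hash -> 16-bit discriminator pairs from SipHashTest.cpp.
static_assert(reduceToDiscriminator(0xB2BB69BB0A2AC0F1ULL) == 0xE793, "\"\"");
static_assert(reduceToDiscriminator(0x9304ABFF427B72E8ULL) == 0xF468, "strlen");
// A raw hash that is a multiple of 0xFFFF maps to 1 rather than 0...
static_assert(reduceToDiscriminator(0x314FD87E0611F020ULL) == 0x0001,
              "_Zptrkvttf");
// ...and the largest residue, 0xFFFE, maps to 0xFFFF.
static_assert(reduceToDiscriminator(0x1292F635FB3DFBF8ULL) == 0xFFFF,
              "_Zaflhllod");
```

The modulus is `0xFFFF` rather than `0x10000` precisely so that the `+ 1` cannot overflow past 16 bits while still skipping 0.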
diff --git a/llvm/include/llvm/Support/SipHash.h b/llvm/include/llvm/Support/SipHash.h
new file mode 100644
index 0000000000000..fcc29c00da185
--- /dev/null
+++ b/llvm/include/llvm/Support/SipHash.h
@@ -0,0 +1,47 @@
+//===--- SipHash.h - An ABI-stable string SipHash ---------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// A family of ABI-stable string hash algorithms based on SipHash, currently
+// used to compute ptrauth discriminators.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_SIPHASH_H
+#define LLVM_SUPPORT_SIPHASH_H
+
+#include <cstdint>
+
+namespace llvm {
+class StringRef;
+
+/// Compute a stable 64-bit hash of the given string.
+///
+/// The exact algorithm is the little-endian interpretation of the
+/// non-doubled (i.e. 64-bit) result of applying a SipHash-2-4 using
+/// a specific key value which can be found in the source.
+///
+/// By "stable" we mean that the result of this hash algorithm will
+/// be the same across different compiler versions and target platforms.
+uint64_t getPointerAuthStableSipHash64(StringRef S);
+
+/// Compute a stable non-zero 16-bit hash of the given string.
+///
+/// This computes the full getPointerAuthStableSipHash64, but additionally
+/// truncates it down to a non-zero 16-bit value.
+///
+/// We use a 16-bit discriminator because ARM64 can efficiently load
+/// a 16-bit immediate into the high bits of a register without disturbing
+/// the remainder of the value, which serves as a nice blend operation.
+/// 16 bits is also sufficiently compact to not inflate a loader relocation.
+/// We disallow zero to guarantee a different discriminator from the places
+/// in the ABI that use a constant zero.
+uint64_t getPointerAuthStableSipHash16(StringRef S);
+
+} // end namespace llvm
+
+#endif
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index be4badc09efa5..aa37b812791ff 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -222,6 +222,7 @@ add_llvm_component_library(LLVMSupport
SHA1.cpp
SHA256.cpp
Signposts.cpp
+ SipHash.cpp
SmallPtrSet.cpp
SmallVector.cpp
SourceMgr.cpp
diff --git a/llvm/lib/Support/SipHash.cpp b/llvm/lib/Support/SipHash.cpp
new file mode 100644
index 0000000000000..dbd60fb73ebb5
--- /dev/null
+++ b/llvm/lib/Support/SipHash.cpp
@@ -0,0 +1,174 @@
+//===--- SipHash.cpp - An ABI-stable string SipHash -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements an ABI-stable string hash based on SipHash, used to
+// compute ptrauth discriminators.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/SipHash.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Debug.h"
+#include <cstdint>
+#include <cstring>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "llvm-siphash"
+
+// Lightly adapted from the SipHash reference C implementation by
+// Jean-Philippe Aumasson and Daniel J. Bernstein.
+
+#define SIPHASH_ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))
+
+#define SIPHASH_U8TO64_LE(p) \
+ (((uint64_t)((p)[0])) | ((uint64_t)((p)[1]) << 8) | \
+ ((uint64_t)((p)[2]) << 16) | ((uint64_t)((p)[3]) << 24) | \
+ ((uint64_t)((p)[4]) << 32) | ((uint64_t)((p)[5]) << 40) | \
+ ((uint64_t)((p)[6]) << 48) | ((uint64_t)((p)[7]) << 56))
+
+#define SIPHASH_SIPROUND \
+ do { \
+ v0 += v1; \
+ v1 = SIPHASH_ROTL(v1, 13); \
+ v1 ^= v0; \
+ v0 = SIPHASH_ROTL(v0, 32); \
+ v2 += v3; \
+ v3 = SIPHASH_ROTL(v3, 16); \
+ v3 ^= v2; \
+ v0 += v3; \
+ v3 = SIPHASH_ROTL(v3, 21); \
+ v3 ^= v0; \
+ v2 += v1; \
+ v1 = SIPHASH_ROTL(v1, 17); \
+ v1 ^= v2; \
+ v2 = SIPHASH_ROTL(v2, 32); \
+ } while (0)
+
+template <int cROUNDS, int dROUNDS, class ResultTy>
+static inline ResultTy siphash(const uint8_t *in, uint64_t inlen,
+ const uint8_t (&k)[16]) {
+ static_assert(sizeof(ResultTy) == 8 || sizeof(ResultTy) == 16,
+ "result type should be uint64_t or uint128_t");
+ uint64_t v0 = 0x736f6d6570736575ULL;
+ uint64_t v1 = 0x646f72616e646f6dULL;
+ uint64_t v2 = 0x6c7967656e657261ULL;
+ uint64_t v3 = 0x7465646279746573ULL;
+ uint64_t b;
+ uint64_t k0 = SIPHASH_U8TO64_LE(k);
+ uint64_t k1 = SIPHASH_U8TO64_LE(k + 8);
+ uint64_t m;
+ int i;
+ const uint8_t *end = in + inlen - (inlen % sizeof(uint64_t));
+ const int left = inlen & 7;
+ b = ((uint64_t)inlen) << 56;
+ v3 ^= k1;
+ v2 ^= k0;
+ v1 ^= k1;
+ v0 ^= k0;
+
+ if (sizeof(ResultTy) == 16) {
+ v1 ^= 0xee;
+ }
+
+ for (; in != end; in += 8) {
+ m = SIPHASH_U8TO64_LE(in);
+ v3 ^= m;
+
+ for (i = 0; i < cROUNDS; ++i)
+ SIPHASH_SIPROUND;
+
+ v0 ^= m;
+ }
+
+ switch (left) {
+ case 7:
+ b |= ((uint64_t)in[6]) << 48;
+ LLVM_FALLTHROUGH;
+ case 6:
+ b |= ((uint64_t)in[5]) << 40;
+ LLVM_FALLTHROUGH;
+ case 5:
+ b |= ((uint64_t)in[4]) << 32;
+ LLVM_FALLTHROUGH;
+ case 4:
+ b |= ((uint64_t)in[3]) << 24;
+ LLVM_FALLTHROUGH;
+ case 3:
+ b |= ((uint64_t)in[2]) << 16;
+ LLVM_FALLTHROUGH;
+ case 2:
+ b |= ((uint64_t)in[1]) << 8;
+ LLVM_FALLTHROUGH;
+ case 1:
+ b |= ((uint64_t)in[0]);
+ break;
+ case 0:
+ break;
+ }
+
+ v3 ^= b;
+
+ for (i = 0; i < cROUNDS; ++i)
+ SIPHASH_SIPROUND;
+
+ v0 ^= b;
+
+ if (sizeof(ResultTy) == 8) {
+ v2 ^= 0xff;
+ } else {
+ v2 ^= 0xee;
+ }
+
+ for (i = 0; i < dROUNDS; ++i)
+ SIPHASH_SIPROUND;
+
+ b = v0 ^ v1 ^ v2 ^ v3;
+
+ // This mess with the result type would be easier with 'if constexpr'.
+
+ uint64_t firstHalf = b;
+ if (sizeof(ResultTy) == 8)
+ return firstHalf;
+
+ v1 ^= 0xdd;
+
+ for (i = 0; i < dROUNDS; ++i)
+ SIPHASH_SIPROUND;
+
+ b = v0 ^ v1 ^ v2 ^ v3;
+ uint64_t secondHalf = b;
+
+ return firstHalf | (ResultTy(secondHalf) << (sizeof(ResultTy) == 8 ? 0 : 64));
+}
+
+//===--- LLVM-specific wrappers around siphash.
+
+/// Compute an ABI-stable 64-bit hash of the given string.
+uint64_t llvm::getPointerAuthStableSipHash64(StringRef Str) {
+ static const uint8_t K[16] = {0xb5, 0xd4, 0xc9, 0xeb, 0x79, 0x10, 0x4a, 0x79,
+ 0x6f, 0xec, 0x8b, 0x1b, 0x42, 0x87, 0x81, 0xd4};
+
+ // The aliasing is fine here because of omnipotent char.
+ auto *Data = reinterpret_cast<const uint8_t *>(Str.data());
+ return siphash<2, 4, uint64_t>(Data, Str.size(), K);
+}
+
+/// Compute an ABI-stable 16-bit hash of the given string.
+uint64_t llvm::getPointerAuthStableSipHash16(StringRef Str) {
+ uint64_t RawHash = getPointerAuthStableSipHash64(Str);
+
+ // Produce a non-zero 16-bit discriminator.
+ uint64_t Discriminator = (RawHash % 0xFFFF) + 1;
+ LLVM_DEBUG(dbgs() << "ptrauth stable hash 16-bit discriminator: "
+ << utostr(Discriminator) << " (0x"
+ << utohexstr(Discriminator) << ")"
+ << " of: " << Str << "\n");
+ return Discriminator;
+}
diff --git a/llvm/unittests/Support/CMakeLists.txt b/llvm/unittests/Support/CMakeLists.txt
index 2718be8450f80..631f2e6bf00df 100644
--- a/llvm/unittests/Support/CMakeLists.txt
+++ b/llvm/unittests/Support/CMakeLists.txt
@@ -75,6 +75,7 @@ add_llvm_unittest(SupportTests
ScopedPrinterTest.cpp
SHA256.cpp
SignalsTest.cpp
+ SipHashTest.cpp
SourceMgrTest.cpp
SpecialCaseListTest.cpp
SuffixTreeTest.cpp
diff --git a/llvm/unittests/Support/SipHashTest.cpp b/llvm/unittests/Support/SipHashTest.cpp
new file mode 100644
index 0000000000000..1a8143d9c9375
--- /dev/null
+++ b/llvm/unittests/Support/SipHashTest.cpp
@@ -0,0 +1,43 @@
+//===- llvm/unittest/Support/SipHashTest.cpp ------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/SipHash.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Support/raw_ostream.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+
+namespace {
+
+TEST(SipHashTest, PointerAuthSipHash) {
+ // Test some basic cases, for 16 bit and 64 bit results.
+ EXPECT_EQ(0xE793U, getPointerAuthStableSipHash16(""));
+ EXPECT_EQ(0xF468U, getPointerAuthStableSipHash16("strlen"));
+ EXPECT_EQ(0x2D15U, getPointerAuthStableSipHash16("_ZN1 ind; f"));
+
+ EXPECT_EQ(0xB2BB69BB0A2AC0F1U, getPointerAuthStableSipHash64(""));
+ EXPECT_EQ(0x9304ABFF427B72E8U, getPointerAuthStableSipHash64("strlen"));
+ EXPECT_EQ(0x55F45179A08AE51BU, getPointerAuthStableSipHash64("_ZN1 ind; f"));
+
+ // Test some known strings that are already enshrined in the ABI.
+ EXPECT_EQ(0x6AE1U, getPointerAuthStableSipHash16("isa"));
+ EXPECT_EQ(0xB5ABU, getPointerAuthStableSipHash16("objc_class:superclass"));
+ EXPECT_EQ(0xC0BBU, getPointerAuthStableSipHash16("block_descriptor"));
+ EXPECT_EQ(0xC310U, getPointerAuthStableSipHash16("method_list_t"));
+
+ // Test the limits that apply to 16-bit results but not to 64-bit results.
+ EXPECT_EQ(1U, getPointerAuthStableSipHash16("_Zptrkvttf"));
+ EXPECT_EQ(0x314FD87E0611F020U, getPointerAuthStableSipHash64("_Zptrkvttf"));
+
+ EXPECT_EQ(0xFFFFU, getPointerAuthStableSipHash16("_Zaflhllod"));
+ EXPECT_EQ(0x1292F635FB3DFBF8U, getPointerAuthStableSipHash64("_Zaflhllod"));
+}
+
+} // end anonymous namespace
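For readers who need to reproduce these values outside LLVM, the whole algorithm fits in a few dozen lines. This is a sketch, not the patch's code: it assumes only what the diff shows (SipHash-2-4, 64-bit output, little-endian word loads, and the key bytes from `getPointerAuthStableSipHash64`), replaces the macros with `constexpr` functions, and all `sketch::` names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

namespace sketch {

// 64-bit rotate left; B is always a constant in 1..63 here.
constexpr uint64_t rotl(uint64_t X, int B) {
  return (X << B) | (X >> (64 - B));
}

// Little-endian 8-byte load from B[Off..Off+7], independent of host byte
// order (the job SIPHASH_U8TO64_LE does in the patch).
template <typename Bytes>
constexpr uint64_t load64LE(const Bytes &B, size_t Off) {
  uint64_t V = 0;
  for (size_t I = 0; I < 8; ++I)
    V |= uint64_t(uint8_t(B[Off + I])) << (8 * I);
  return V;
}

struct SipState {
  uint64_t V0, V1, V2, V3;
  // One SipRound, same sequence as the SIPHASH_SIPROUND macro.
  constexpr void round() {
    V0 += V1; V1 = rotl(V1, 13); V1 ^= V0; V0 = rotl(V0, 32);
    V2 += V3; V3 = rotl(V3, 16); V3 ^= V2;
    V0 += V3; V3 = rotl(V3, 21); V3 ^= V0;
    V2 += V1; V1 = rotl(V1, 17); V1 ^= V2; V2 = rotl(V2, 32);
  }
};

// SipHash-2-4 with a 64-bit result (the non-doubled variant).
constexpr uint64_t sipHash24(std::string_view Msg, const uint8_t (&Key)[16]) {
  const uint64_t K0 = load64LE(Key, 0), K1 = load64LE(Key, 8);
  SipState S{0x736f6d6570736575ULL ^ K0, 0x646f72616e646f6dULL ^ K1,
             0x6c7967656e657261ULL ^ K0, 0x7465646279746573ULL ^ K1};
  const size_t Len = Msg.size();
  const size_t Full = Len - Len % 8; // bytes covered by whole 8-byte words
  for (size_t Off = 0; Off < Full; Off += 8) {
    const uint64_t M = load64LE(Msg, Off);
    S.V3 ^= M;
    S.round(); // c = 2 compression rounds
    S.round();
    S.V0 ^= M;
  }
  // Final block: the 0..7 trailing bytes, with the length's low byte on top.
  uint64_t B = uint64_t(Len) << 56;
  for (size_t I = 0; Full + I < Len; ++I)
    B |= uint64_t(uint8_t(Msg[Full + I])) << (8 * I);
  S.V3 ^= B;
  S.round();
  S.round();
  S.V0 ^= B;
  S.V2 ^= 0xff; // finalization constant for the 64-bit output
  for (int I = 0; I < 4; ++I) // d = 4 finalization rounds
    S.round();
  return S.V0 ^ S.V1 ^ S.V2 ^ S.V3;
}

// Key bytes copied from getPointerAuthStableSipHash64 in the diff above.
constexpr uint8_t PtrAuthKey[16] = {0xb5, 0xd4, 0xc9, 0xeb, 0x79, 0x10,
                                    0x4a, 0x79, 0x6f, 0xec, 0x8b, 0x1b,
                                    0x42, 0x87, 0x81, 0xd4};

constexpr uint64_t stableSipHash64(std::string_view S) {
  return sipHash24(S, PtrAuthKey);
}

// Same reduction as getPointerAuthStableSipHash16: non-zero, in [1, 0xFFFF].
constexpr uint16_t stableSipHash16(std::string_view S) {
  return uint16_t(stableSipHash64(S) % 0xFFFF + 1);
}

} // namespace sketch
```

Making everything `constexpr` lets a discriminator for a fixed string be checked at compile time against the unit-test values above, e.g. `static_assert(sketch::stableSipHash16("isa") == 0x6AE1, "");`.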
✅ With the latest revision this PR passed the C/C++ code formatter.
@kbeyls Will you please briefly summarize what we discussed on the sync call from the license standpoint?
As far as I understand, we want to take the reference SipHash implementation with minimal changes. I personally have no issues with taking cryptographic C code "as is" with minimal changes. If that is the intention, LGTM with several minor comments.
Anyway, I want someone else with deeper understanding of the context to take a look before this gets merged.
llvm/lib/Support/SipHash.cpp
  uint64_t RawHash = getPointerAuthStableSipHash64(Str);

  // Produce a non-zero 16-bit discriminator.
  uint64_t Discriminator = (RawHash % 0xFFFF) + 1;
I'm sure that such scheme is already used in downstream for a long time and there is a strong point in always having a non-zero discriminator when we compute it from a string, but let me mention a potential downside of this approach compared with just doing `uint64_t Discriminator = RawHash % 0x10000;`.
If we assume that 64-bit hash values are distributed uniformly when the hash function is applied to the infinite set of all possible strings (this should probably be true for a cryptographically secure hash), the non-zero 16-bit values computed this way are not uniformly distributed:
- 16-bit value 0: no corresponding 64-bit values
- 16-bit value 1: 281479271743490 corresponding 64-bit values
- 16-bit values 2..65535: 281479271743489 corresponding 64-bit values each
I suppose that it might be OK, it's just not very consistent with 64-bit hash computation since we do not try to avoid zero value there. I get the point that the chance of having zero 64-bit hash value is very low compared to 16-bit though.
The final point: if that was discussed with security researchers, I have no issues with such an implementation ignoring 16-bit zeros. If not - IMHO it's better to talk to security specialists and ask them for a piece of advice.
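The counts in the list above follow from a short derivation, assuming (as the comment does) that the 64-bit hash $h$ is uniform over all $2^{64}$ values and that $d(h) = (h \bmod 65535) + 1$:

```latex
2^{64} - 1 = (2^{16} - 1)(2^{48} + 2^{32} + 2^{16} + 1)
\;\Longrightarrow\;
2^{64} = 65535\,q + 1,
\qquad q = 2^{48} + 2^{32} + 2^{16} + 1 = 281479271743489

% h = 0 .. 65535q - 1 hits every residue exactly q times;
% the one extra value h = 2^{64} - 1 = 65535q has residue 0, i.e. d = 1.
\#\{\, h < 2^{64} : d(h) = k \,\} =
\begin{cases}
  0,                       & k = 0,\\
  q + 1 = 281479271743490, & k = 1,\\
  q = 281479271743489,     & 2 \le k \le 65535,
\end{cases}
\qquad
\frac{q + 1}{q} - 1 = \frac{1}{q} \approx 3.6 \times 10^{-15}

% For comparison, h \bmod 2^{16} is exactly uniform:
% each of the 65536 values has 2^{48} = 281474976710656 preimages.
```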
I'm sure that such scheme is already used in downstream for a long time
I'll just mention that it's not a matter of downstream: it's the platform ABI for arm64e.
I suppose that it might be OK, it's just not very consistent with 64-bit hash computation since we do not try to avoid zero value there. I get the point that the chance of having zero 64-bit hash value is very low compared to 16-bit though.
Right, the 16-bit and 64-bit hashes aren't used in equivalent ways. The 16-bit result is used in discriminator contexts, where the overwhelming majority of discriminators are 0 (and this is even enshrined in the ISA with the LDRAA/LDRAB instructions.) This current patch series doesn't use the 64-bit variant, so I removed it altogether, but you could imagine using it in more interesting ways, e.g., as a way to seed a (not-really-but-almost) 64-bit PACGA MAC chain. In such cases a collision with the default 0 discriminator value isn't a concern.
The final point: if that was discussed with security researchers, I have no issues with such an implementation ignoring 16-bit zeros. If not - IMHO it's better to talk to security specialists and ask them for a piece of advice.
Absolutely, I suggest you all consider that for the new ELF implementations ;) But arm64e is ABI
I split out:
and rebased this on top: we're now left with only the thin wrapper (the 16-bit one only here) and the simple unittests.
The wrapper and the tests LGTM
This finally wraps the now lightly modified SipHash C reference implementation for the main interface we need (16-bit ptrauth discriminators).
The exact algorithm is the little-endian interpretation of the non-doubled (i.e. 64-bit) result of applying SipHash-2-4 using the constant seed `b5d4c9eb79104a796fec8b1b428781d4` (big-endian), with the result reduced by modulo to the range of non-zero discriminators (i.e. `(rawHash % 65535) + 1`).
By "stable" we mean that the result of this hash algorithm will be the same across different compiler versions and target platforms.
The 16-bit hashes are used extensively for the AArch64 ptrauth ABI, because AArch64 can efficiently load a 16-bit immediate into the high bits of a register without disturbing the remainder of the value, which serves as a nice blend operation. 16 bits is also sufficiently compact to not inflate a loader relocation. We disallow zero to guarantee a different discriminator from the places in the ABI that use a constant zero.
Co-Authored-By: John McCall <[email protected]>
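The "blend" property mentioned in the commit message can be modeled directly. This sketch models only the register effect of AArch64 `MOVK Xd, #imm16, LSL #48` (bits 63:48 replaced, bits 47:0 kept); `movkLsl48` is a hypothetical name, and the example address paired with the `"isa"` discriminator `0x6AE1` from the unit test is an illustrative assumption, not a description of the exact code the compiler emits.

```cpp
#include <cstdint>

// Hypothetical model of AArch64 `MOVK Xd, #Imm16, LSL #48`: bits [63:48] of
// Xd are replaced by Imm16 and bits [47:0] are left untouched.
constexpr uint64_t movkLsl48(uint64_t Xd, uint16_t Imm16) {
  constexpr uint64_t Low48Mask = (uint64_t(1) << 48) - 1;
  return (Xd & Low48Mask) | (uint64_t(Imm16) << 48);
}

// Blending an (illustrative) storage address with the "isa" discriminator.
static_assert(movkLsl48(0x0000000100004000ULL, 0x6AE1) ==
                  0x6AE1000100004000ULL,
              "discriminator lands in the top 16 bits");
```

With 48-bit virtual addresses, user-space pointers typically have their top 16 bits clear, so a single `MOVK` keeps the whole address and places the 16-bit discriminator in otherwise unused bits.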
I'll wait for #94394 test results before merging this.
Depends on: