[Support] Integrate SipHash.cpp into libSupport. #94394

ahmedbougacha · 2024-06-04T20:05:22Z

Start building it as part of the library, with some minor
tweaks compared to the reference implementation:

clang-format to match libSupport
remove tracing support
add file header
templatize cROUNDS/dROUNDS, as well as 8B/16B result type
replace assert with static_assert
return the result directly, as uint64_t/uint128_t
remove big-endian support
use LLVM_FALLTHROUGH
remove tracing support

The siphash function itself isn't used yet, and will be in a follow-up
commit.

llvmbot · 2024-06-04T22:11:51Z

@llvm/pr-subscribers-llvm-support

Author: Ahmed Bougacha (ahmedbougacha)

Changes

Start building it as part of the library, with some minor
tweaks compared to the reference implementation:

clang-format to match libSupport
remove tracing support
add file header
templatize cROUNDS/dROUNDS, as well as 8B/16B result type
replace assert with static_assert
return the result directly, as uint64_t/uint128_t
remove big-endian support
use LLVM_FALLTHROUGH
remove tracing support

The siphash function itself isn't used yet, and will be in a follow-up
commit.

Full diff: https://github.com/llvm/llvm-project/pull/94394.diff

3 Files Affected:

(modified) llvm/lib/Support/CMakeLists.txt (+1-3)
(removed) llvm/lib/Support/README.md.SipHash (-126)
(modified) llvm/lib/Support/SipHash.cpp (+126-176)

diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 7cc01a5399911..f5f8447d48d01 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -127,9 +127,6 @@ endif()
 
 add_subdirectory(BLAKE3)
 
-# Temporarily ignore SipHash.cpp before we fully integrate it into LLVMSupport.
-set(LLVM_OPTIONAL_SOURCES SipHash.cpp)
-
 add_llvm_component_library(LLVMSupport
   ABIBreak.cpp
   AMDGPUMetadata.cpp
@@ -226,6 +223,7 @@ add_llvm_component_library(LLVMSupport
   SHA1.cpp
   SHA256.cpp
   Signposts.cpp
+  SipHash.cpp
   SmallPtrSet.cpp
   SmallVector.cpp
   SourceMgr.cpp
diff --git a/llvm/lib/Support/README.md.SipHash b/llvm/lib/Support/README.md.SipHash
deleted file mode 100644
index 4de3cd1854681..0000000000000
--- a/llvm/lib/Support/README.md.SipHash
+++ /dev/null
@@ -1,126 +0,0 @@
-# SipHash
-
-[![License:
-CC0-1.0](https://licensebuttons.net/l/zero/1.0/80x15.png)](http://creativecommons.org/publicdomain/zero/1.0/)
-
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-
-
-SipHash is a family of pseudorandom functions (PRFs) optimized for speed on short messages.
-This is the reference C code of SipHash: portable, simple, optimized for clarity and debugging.
-
-SipHash was designed in 2012 by [Jean-Philippe Aumasson](https://aumasson.jp)
-and [Daniel J. Bernstein](https://cr.yp.to) as a defense against [hash-flooding
-DoS attacks](https://aumasson.jp/siphash/siphashdos_29c3_slides.pdf).
-
-SipHash is:
-
-* *Simpler and faster* on short messages than previous cryptographic
-algorithms, such as MACs based on universal hashing.
-
-* *Competitive in performance* with insecure non-cryptographic algorithms, such as [fhhash](https://github.com/cbreeden/fxhash).
-
-* *Cryptographically secure*, with no sign of weakness despite multiple [cryptanalysis](https://eprint.iacr.org/2019/865) [projects](https://eprint.iacr.org/2019/865) by leading cryptographers.
-
-* *Battle-tested*, with successful integration in OSs (Linux kernel, OpenBSD,
-FreeBSD, FreeRTOS), languages (Perl, Python, Ruby, etc.), libraries (OpenSSL libcrypto,
-Sodium, etc.) and applications (Wireguard, Redis, etc.).
-
-As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can also be used as a secure message authentication code (MAC).
-But SipHash is *not a hash* in the sense of general-purpose key-less hash function such as BLAKE3 or SHA-3.
-SipHash should therefore always be used with a secret key in order to be secure.
-
-
-## Variants
-
-The default SipHash is *SipHash-2-4*: it takes a 128-bit key, does 2 compression
-rounds, 4 finalization rounds, and returns a 64-bit tag.
-
-Variants can use a different number of rounds. For example, we proposed *SipHash-4-8* as a conservative version.
-
-The following versions are not described in the paper but were designed and analyzed to fulfill applications' needs:
-
-* *SipHash-128* returns a 128-bit tag instead of 64-bit. Versions with specified number of rounds are SipHash-2-4-128, SipHash4-8-128, and so on.
-
-* *HalfSipHash* works with 32-bit words instead of 64-bit, takes a 64-bit key,
-and returns 32-bit or 64-bit tags. For example, HalfSipHash-2-4-32 has 2
-compression rounds, 4 finalization rounds, and returns a 32-bit tag.
-
-
-## Security
-
-(Half)SipHash-*c*-*d* with *c* ≥ 2 and *d* ≥ 4 is expected to provide the maximum PRF
-security for any function with the same key and output size.
-
-The standard PRF security goal allow the attacker access to the output of SipHash on messages chosen adaptively by the attacker.
-
-Security is limited by the key size (128 bits for SipHash), such that
-attackers searching 2<sup>*s*</sup> keys have chance 2<sup>*s*−128</sup> of finding
-the SipHash key. 
-Security is also limited by the output size. In particular, when
-SipHash is used as a MAC, an attacker who blindly tries 2<sup>*s*</sup> tags will
-succeed with probability 2<sup>*s*-*t*</sup>, if *t* is that tag's bit size.
-
-
-## Research
-
-* [Research paper](https://www.aumasson.jp/siphash/siphash.pdf) "SipHash: a fast short-input PRF" (accepted at INDOCRYPT 2012)
-* [Slides](https://cr.yp.to/talks/2012.12.12/slides.pdf) of the presentation of SipHash at INDOCRYPT 2012 (Bernstein)
-* [Slides](https://www.aumasson.jp/siphash/siphash_slides.pdf) of the presentation of SipHash at the DIAC workshop (Aumasson)
-
-
-## Usage
-
-Running
-
-```sh
-  make
-```
-
-will build tests for 
-
-* SipHash-2-4-64
-* SipHash-2-4-128
-* HalfSipHash-2-4-32
-* HalfSipHash-2-4-64
-
-
-```C
-  ./test
-```
-
-verifies 64 test vectors, and
-
-```C
-  ./debug
-```
-
-does the same and prints intermediate values.
-
-The code can be adapted to implement SipHash-*c*-*d*, the version of SipHash
-with *c* compression rounds and *d* finalization rounds, by defining `cROUNDS`
-or `dROUNDS` when compiling.  This can be done with `-D` command line arguments
-to many compilers such as below.
-
-```sh
-gcc -Wall --std=c99 -DcROUNDS=2 -DdROUNDS=4 siphash.c halfsiphash.c test.c -o test
-```
-
-The `makefile` also takes *c* and *d* rounds values as parameters.
-
-```sh
-make cROUNDS=2 dROUNDS=4
-``` 
-
-Obviously, if the number of rounds is modified then the test vectors
-won't verify.
-
-## Intellectual property
-
-This code is copyright (c) 2014-2023 Jean-Philippe Aumasson, Daniel J.
-Bernstein. It is multi-licensed under
-
-* [CC0](./LICENCE_CC0)
-* [MIT](./LICENSE_MIT).
-* [Apache 2.0 with LLVM exceptions](./LICENSE_A2LLVM).
-
diff --git a/llvm/lib/Support/SipHash.cpp b/llvm/lib/Support/SipHash.cpp
index c6d16e205521d..ef882ae4d8745 100644
--- a/llvm/lib/Support/SipHash.cpp
+++ b/llvm/lib/Support/SipHash.cpp
@@ -1,185 +1,135 @@
-/*
-   SipHash reference C implementation
-
-   Copyright (c) 2012-2022 Jean-Philippe Aumasson
-   <[email protected]>
-   Copyright (c) 2012-2014 Daniel J. Bernstein <[email protected]>
-
-   To the extent possible under law, the author(s) have dedicated all copyright
-   and related and neighboring rights to this software to the public domain
-   worldwide. This software is distributed without any warranty.
-
-   You should have received a copy of the CC0 Public Domain Dedication along
-   with
-   this software. If not, see
-   <http://creativecommons.org/publicdomain/zero/1.0/>.
- */
-
-#include "siphash.h"
-#include <assert.h>
-#include <stddef.h>
-#include <stdint.h>
-
-/* default: SipHash-2-4 */
-#ifndef cROUNDS
-#define cROUNDS 2
-#endif
-#ifndef dROUNDS
-#define dROUNDS 4
-#endif
+//===--- SipHash.cpp - An ABI-stable string hash --------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
 
-#define ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))
+#include "llvm/Support/Compiler.h"
+#include <cstdint>
 
-#define U32TO8_LE(p, v)                                                        \
-    (p)[0] = (uint8_t)((v));                                                   \
-    (p)[1] = (uint8_t)((v) >> 8);                                              \
-    (p)[2] = (uint8_t)((v) >> 16);                                             \
-    (p)[3] = (uint8_t)((v) >> 24);
+// Lightly adapted from the SipHash reference C implementation by
+// Jean-Philippe Aumasson and Daniel J. Bernstein.
 
-#define U64TO8_LE(p, v)                                                        \
-    U32TO8_LE((p), (uint32_t)((v)));                                           \
-    U32TO8_LE((p) + 4, (uint32_t)((v) >> 32));
+#define ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))
 
 #define U8TO64_LE(p)                                                           \
-    (((uint64_t)((p)[0])) | ((uint64_t)((p)[1]) << 8) |                        \
-     ((uint64_t)((p)[2]) << 16) | ((uint64_t)((p)[3]) << 24) |                 \
-     ((uint64_t)((p)[4]) << 32) | ((uint64_t)((p)[5]) << 40) |                 \
-     ((uint64_t)((p)[6]) << 48) | ((uint64_t)((p)[7]) << 56))
+  (((uint64_t)((p)[0])) | ((uint64_t)((p)[1]) << 8) |                          \
+   ((uint64_t)((p)[2]) << 16) | ((uint64_t)((p)[3]) << 24) |                   \
+   ((uint64_t)((p)[4]) << 32) | ((uint64_t)((p)[5]) << 40) |                   \
+   ((uint64_t)((p)[6]) << 48) | ((uint64_t)((p)[7]) << 56))
 
 #define SIPROUND                                                               \
-    do {                                                                       \
-        v0 += v1;                                                              \
-        v1 = ROTL(v1, 13);                                                     \
-        v1 ^= v0;                                                              \
-        v0 = ROTL(v0, 32);                                                     \
-        v2 += v3;                                                              \
-        v3 = ROTL(v3, 16);                                                     \
-        v3 ^= v2;                                                              \
-        v0 += v3;                                                              \
-        v3 = ROTL(v3, 21);                                                     \
-        v3 ^= v0;                                                              \
-        v2 += v1;                                                              \
-        v1 = ROTL(v1, 17);                                                     \
-        v1 ^= v2;                                                              \
-        v2 = ROTL(v2, 32);                                                     \
-    } while (0)
-
-#ifdef DEBUG_SIPHASH
-#include <stdio.h>
-
-#define TRACE                                                                  \
-    do {                                                                       \
-        printf("(%3zu) v0 %016" PRIx64 "\n", inlen, v0);                       \
-        printf("(%3zu) v1 %016" PRIx64 "\n", inlen, v1);                       \
-        printf("(%3zu) v2 %016" PRIx64 "\n", inlen, v2);                       \
-        printf("(%3zu) v3 %016" PRIx64 "\n", inlen, v3);                       \
-    } while (0)
-#else
-#define TRACE
-#endif
-
-/*
-    Computes a SipHash value
-    *in: pointer to input data (read-only)
-    inlen: input data length in bytes (any size_t value)
-    *k: pointer to the key data (read-only), must be 16 bytes 
-    *out: pointer to output data (write-only), outlen bytes must be allocated
-    outlen: length of the output in bytes, must be 8 or 16
-*/
-int siphash(const void *in, const size_t inlen, const void *k, uint8_t *out,
-            const size_t outlen) {
-
-    const unsigned char *ni = (const unsigned char *)in;
-    const unsigned char *kk = (const unsigned char *)k;
-
-    assert((outlen == 8) || (outlen == 16));
-    uint64_t v0 = UINT64_C(0x736f6d6570736575);
-    uint64_t v1 = UINT64_C(0x646f72616e646f6d);
-    uint64_t v2 = UINT64_C(0x6c7967656e657261);
-    uint64_t v3 = UINT64_C(0x7465646279746573);
-    uint64_t k0 = U8TO64_LE(kk);
-    uint64_t k1 = U8TO64_LE(kk + 8);
-    uint64_t m;
-    int i;
-    const unsigned char *end = ni + inlen - (inlen % sizeof(uint64_t));
-    const int left = inlen & 7;
-    uint64_t b = ((uint64_t)inlen) << 56;
-    v3 ^= k1;
-    v2 ^= k0;
-    v1 ^= k1;
-    v0 ^= k0;
-
-    if (outlen == 16)
-        v1 ^= 0xee;
-
-    for (; ni != end; ni += 8) {
-        m = U8TO64_LE(ni);
-        v3 ^= m;
-
-        TRACE;
-        for (i = 0; i < cROUNDS; ++i)
-            SIPROUND;
-
-        v0 ^= m;
-    }
-
-    switch (left) {
-    case 7:
-        b |= ((uint64_t)ni[6]) << 48;
-        /* FALLTHRU */
-    case 6:
-        b |= ((uint64_t)ni[5]) << 40;
-        /* FALLTHRU */
-    case 5:
-        b |= ((uint64_t)ni[4]) << 32;
-        /* FALLTHRU */
-    case 4:
-        b |= ((uint64_t)ni[3]) << 24;
-        /* FALLTHRU */
-    case 3:
-        b |= ((uint64_t)ni[2]) << 16;
-        /* FALLTHRU */
-    case 2:
-        b |= ((uint64_t)ni[1]) << 8;
-        /* FALLTHRU */
-    case 1:
-        b |= ((uint64_t)ni[0]);
-        break;
-    case 0:
-        break;
-    }
-
-    v3 ^= b;
-
-    TRACE;
-    for (i = 0; i < cROUNDS; ++i)
-        SIPROUND;
-
-    v0 ^= b;
-
-    if (outlen == 16)
-        v2 ^= 0xee;
-    else
-        v2 ^= 0xff;
-
-    TRACE;
-    for (i = 0; i < dROUNDS; ++i)
-        SIPROUND;
+  do {                                                                         \
+    v0 += v1;                                                                  \
+    v1 = ROTL(v1, 13);                                                         \
+    v1 ^= v0;                                                                  \
+    v0 = ROTL(v0, 32);                                                         \
+    v2 += v3;                                                                  \
+    v3 = ROTL(v3, 16);                                                         \
+    v3 ^= v2;                                                                  \
+    v0 += v3;                                                                  \
+    v3 = ROTL(v3, 21);                                                         \
+    v3 ^= v0;                                                                  \
+    v2 += v1;                                                                  \
+    v1 = ROTL(v1, 17);                                                         \
+    v1 ^= v2;                                                                  \
+    v2 = ROTL(v2, 32);                                                         \
+  } while (0)
+
+template <int cROUNDS, int dROUNDS, class ResultTy>
+static inline ResultTy siphash(const unsigned char *in, uint64_t inlen,
+                               const unsigned char (&k)[16]) {
+
+  const unsigned char *ni = (const unsigned char *)in;
+  const unsigned char *kk = (const unsigned char *)k;
+
+  static_assert(sizeof(ResultTy) == 8 || sizeof(ResultTy) == 16,
+                "result type should be uint64_t or uint128_t");
+  uint64_t v0 = UINT64_C(0x736f6d6570736575);
+  uint64_t v1 = UINT64_C(0x646f72616e646f6d);
+  uint64_t v2 = UINT64_C(0x6c7967656e657261);
+  uint64_t v3 = UINT64_C(0x7465646279746573);
+  uint64_t k0 = U8TO64_LE(kk);
+  uint64_t k1 = U8TO64_LE(kk + 8);
+  uint64_t m;
+  int i;
+  const unsigned char *end = ni + inlen - (inlen % sizeof(uint64_t));
+  const int left = inlen & 7;
+  uint64_t b = ((uint64_t)inlen) << 56;
+  v3 ^= k1;
+  v2 ^= k0;
+  v1 ^= k1;
+  v0 ^= k0;
+
+  if (sizeof(ResultTy) == 16)
+    v1 ^= 0xee;
+
+  for (; ni != end; ni += 8) {
+    m = U8TO64_LE(ni);
+    v3 ^= m;
 
-    b = v0 ^ v1 ^ v2 ^ v3;
-    U64TO8_LE(out, b);
-
-    if (outlen == 8)
-        return 0;
-
-    v1 ^= 0xdd;
-
-    TRACE;
-    for (i = 0; i < dROUNDS; ++i)
-        SIPROUND;
-
-    b = v0 ^ v1 ^ v2 ^ v3;
-    U64TO8_LE(out + 8, b);
-
-    return 0;
+    for (i = 0; i < cROUNDS; ++i)
+      SIPROUND;
+
+    v0 ^= m;
+  }
+
+  switch (left) {
+  case 7:
+    b |= ((uint64_t)ni[6]) << 48;
+    LLVM_FALLTHROUGH;
+  case 6:
+    b |= ((uint64_t)ni[5]) << 40;
+    LLVM_FALLTHROUGH;
+  case 5:
+    b |= ((uint64_t)ni[4]) << 32;
+    LLVM_FALLTHROUGH;
+  case 4:
+    b |= ((uint64_t)ni[3]) << 24;
+    LLVM_FALLTHROUGH;
+  case 3:
+    b |= ((uint64_t)ni[2]) << 16;
+    LLVM_FALLTHROUGH;
+  case 2:
+    b |= ((uint64_t)ni[1]) << 8;
+    LLVM_FALLTHROUGH;
+  case 1:
+    b |= ((uint64_t)ni[0]);
+    break;
+  case 0:
+    break;
+  }
+
+  v3 ^= b;
+
+  for (i = 0; i < cROUNDS; ++i)
+    SIPROUND;
+
+  v0 ^= b;
+
+  if (sizeof(ResultTy) == 16)
+    v2 ^= 0xee;
+  else
+    v2 ^= 0xff;
+
+  for (i = 0; i < dROUNDS; ++i)
+    SIPROUND;
+
+  b = v0 ^ v1 ^ v2 ^ v3;
+
+  uint64_t firstHalf = b;
+  if (sizeof(ResultTy) == 8)
+    return firstHalf;
+
+  v1 ^= 0xdd;
+
+  for (i = 0; i < dROUNDS; ++i)
+    SIPROUND;
+
+  b = v0 ^ v1 ^ v2 ^ v3;
+  uint64_t secondHalf = b;
+
+  return firstHalf | (ResultTy(secondHalf) << (sizeof(ResultTy) == 8 ? 0 : 64));
 }

kbeyls

Do I understand correctly that "remove big-endian support" results in this code not running correctly on big-endian machines?
I don't recall the LLVM project claiming that it cannot run on big-endian machines.
If I understand this correctly, I would not remove the big-endian support.
(I think it also helps with keeping the source code closer to the upstream siphash version, which might have some benefits?)

llvm/lib/Support/SipHash.cpp

ahmedbougacha · 2024-06-05T18:09:06Z

Do I understand correctly that "remove big-endian support" results in this code not running correctly on big-endian machines? I don't recall the LLVM project claiming that it cannot run on big-endian machines. If I understand this correctly, I would not remove the big-endian support. (I think it also helps with keeping the source code closer to the upstream siphash version, which might have some benefits?)

That's a good call; we haven't tried to run it on BE hardware (I don't think we could have), but if it's supported I imagine this would have showed up on buildbots anyway. The way this version returns the result directly makes it interesting; I'm only assuming this would break, I haven't fully thought it through. Maybe we should stick to filling out the original buffer and let it all get optimized away. I'll take a look

ahmedbougacha · 2024-06-05T18:11:53Z

I'll also mention that I left the original variable naming but did re-format the file, whitespace being more friendly to diffs, and this being nice and tidily contained. If you or others have strong opinions, I'm happy to recapitalize the names.

kbeyls · 2024-06-05T20:48:37Z

I'll also mention that I left the original variable naming but did re-format the file, whitespace being more friendly to diffs, and this being nice and tidily contained. If you or others have strong opinions, I'm happy to recapitalize the names.

I don't have a strong opinion; I can see the argument both ways. Happy with it as is.

asl · 2024-06-06T22:21:02Z

That's a good call; we haven't tried to run it on BE hardware (I don't think we could have), but if it's supported I imagine this would have showed up on buildbots anyway.

There are (still) PowerPC buildbots. So, maybe we can check there?

asl · 2024-06-08T06:13:58Z

So, regarding big-endian things. Original siphash is always "little-endian" regardless of the host platform. On big endian hosts it essentially does byte swap in the end. We do not have it here, so we will end with different hashes on platforms with different endianness.

From the pauth perspective this is not a problem, as we do not do cross-platform hash calculation and further comparison. The hash output (discriminator value) is always compiled on compiler side and left as-is.

So, we can either keep the present code as-is. Or we can just sprinkle few calls from Endian.h to do byteswap on BE platforms.

kbeyls · 2024-06-08T09:01:27Z

So, regarding big-endian things. Original siphash is always "little-endian" regardless of the host platform. On big endian hosts it essentially does byte swap in the end. We do not have it here, so we will end with different hashes on platforms with different endianness.

From the pauth perspective this is not a problem, as we do not do cross-platform hash calculation and further comparison. The hash output (discriminator value) is always compiled on compiler side and left as-is.

So, we can either keep the present code as-is. Or we can just sprinkle few calls from Endian.h to do byteswap on BE platforms.

I was thinking that it is important that on both big and little endian platforms, the same hash is produced? Otherwise it becomes impossible to cross-compile from an other-endian host to an other-endian target? That basically would break ABI?
It would surface when combining libraries built on differently-endian platforms. Maybe this doesn't happen often in practice, but LLVM remains supported on big-endian platforms, so I would think it's important that those platforms can cross-compile correctly to other targets?

llvm/lib/Support/SipHash.cpp

asl · 2024-06-10T17:18:11Z

llvm/lib/Support/SipHash.cpp

-#define U64TO8_LE(p, v)                                                        \
-    U32TO8_LE((p), (uint32_t)((v)));                                           \
-    U32TO8_LE((p) + 4, (uint32_t)((v) >> 32));
+#define ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))


Maybe we can use function call instead of macros?

E.g. (as in xxhash):

static uint64_t rotl64(uint64_t X, size_t R) { return (X << R) | (X >> (64 - R)); }

This, I'm a little more inclined to keep the macros as close to original as possible, it being the core of the function. It could help to name the function ROTL to match the original; we can turn the whole block into a function as well. I don't feel strongly either way, let me know which you prefer.

asl · 2024-06-10T17:34:31Z

llvm/lib/Support/SipHash.cpp

+  b = v0 ^ v1 ^ v2 ^ v3;
+  uint64_t secondHalf = b;
+
+  return firstHalf | (ResultTy(secondHalf) << 64);


Need to think more, if we'd need support::endian::byte_swap(firstHalf, llvm::endianness::little) here or not.

In the 128-bit result case we'd be exposed to the big-endian host ABI's treatment of uint128_t results, and from what I can find that seems to be quite messy! 644b473 describes it a little bit more, and brings back the original's returning via an i8 array. The caller can figure out what it wants to do with the result bytes.

For ptrauth in #93902 I made it read64le the result again here (and we do define the truncation as based on the little-endian interpretation of the raw hash bytes already.)

asl · 2024-06-11T19:01:02Z

@ahmedbougacha Anything left here? I think it is good to go after the two changes mentioned above. We can deal with cross-endian things afterwards as soon as we will have tests & buildbots

ahmedbougacha · 2024-06-11T23:15:19Z

I think this version should work on BE hosts, but that's only by thinking through the code. Only the bots will tell us one way or another ;)

kbeyls · 2024-06-12T07:18:16Z

I just checked if there indeed are big-endian bots which should pick up if a different hash gets produced on a big-endian system. I guess this bot (the only bot?) would pick it up: https://lab.llvm.org/buildbot/#/builders/231

I now also realize that there are no tests with this commit. I assume that later commits that test hash computation for a pointer authentication discriminator will implicitly test this. In the ideal world, it seems there should be a simple test, e.g. checking one 64 bit and one 128 bit hash as part of this commit?
I don't think not having the test should block landing this PR, but if it would be straightforward to add a test, then I think it is still worthwhile to do it as part of this PR.

asl · 2024-06-13T00:36:13Z

@kbeyls There are (some) tests in the follow-up PR https://github.com/llvm/llvm-project/pull/93902/files#diff-8df159460fc7a128734566054df883f3192b1b261dc8eac667933b4042e9af5f

ahmedbougacha · 2024-06-13T17:36:26Z

Yes, this doesn't have tests by itself because there's no exposed interface. It's certainly trivial to add one (which would allow using the reference test vectors).

I don't have strong arguments either way, but I figured the conservative option is to force hypothetical users to consider their use more seriously. One might argue that's not how we usually treat libSupport, so I'm happy to expose the raw function here.

kbeyls · 2024-06-13T18:36:49Z

Yes, this doesn't have tests by itself because there's no exposed interface. It's certainly trivial to add one (which would allow using the reference test vectors).

I don't have strong arguments either way, but I figured the conservative option is to force hypothetical users to consider their use more seriously. One might argue that's not how we usually treat libSupport, so I'm happy to expose the raw function here.

I see some value in being able to test with the reference test vectors to be fully sure that the implementation really implements SipHash. But as I said above, I'm happy with merging this as is.

ahmedbougacha · 2024-06-13T22:07:28Z

37c84b9 shows what I had in mind, let me know what you all think. I added:

void getSipHash_2_4_64(ArrayRef<uint8_t> In, const uint8_t (&K)[16],
                       uint8_t (&Out)[8]);

void getSipHash_2_4_128(ArrayRef<uint8_t> In, const uint8_t (&K)[16],
                        uint8_t (&Out)[16]);

as the core interfaces, and mimicked the ref. test harness to reuse the same test vectors. If this seems reasonable to yall I'm happy to extract the vectors.h file from the ref. implementation into the "Import original sources" PR – that's why I kept it open ;)

kbeyls · 2024-06-14T07:23:30Z

37c84b9 shows what I had in mind, let me know what you all think. I added:
void getSipHash_2_4_64(ArrayRef<uint8_t> In, const uint8_t (&K)[16],
                       uint8_t (&Out)[8]);

void getSipHash_2_4_128(ArrayRef<uint8_t> In, const uint8_t (&K)[16],
                        uint8_t (&Out)[16]);
as the core interfaces, and mimicked the ref. test harness to reuse the same test vectors. If this seems reasonable to yall I'm happy to extract the vectors.h file from the ref. implementation into the "Import original sources" PR – that's why I kept it open ;)

Thanks, that looks good to me.

asl

+1

While there, give it our usual file header and an acknowledgement.

Start building it as part of the library, with some minor tweaks compared to the reference implementation: - templatize cROUNDS/dROUNDS, as well as 8B/16B result type - replace assert with static_assert - use LLVM_FALLTHROUGH This also exports interfaces for SipHash-2-4-64/-128, and tests them using the reference test vectors.

ahmedbougacha · 2024-06-15T00:56:02Z

I just checked if there indeed are big-endian bots which should pick up if a different hash gets produced on a big-endian system. I guess this bot (the only bot?) would pick it up: https://lab.llvm.org/buildbot/#/builders/231

For whatever reason I can't find 231, but on clang-ppc64be-linux-test-suite/multistage, looks like the unittests are passing after all!

MaskRay · 2024-07-26T06:11:29Z

I wonder whether the PAuth use case should use 128-bit xxhash instead (#95863), which is much faster.

64-bit xxhash is used by lld for string deduplication and clang for serialization.

asl · 2024-07-26T06:17:16Z

I wonder whether the PAuth use case should use 128-bit xxhash instead (#95863), which is much faster.

My understanding (@ahmedbougacha please correct me if I'm wrong) is that siphash is essentially a part of ABI on arm64e. And we can expect code to be shared between arm64e and other pauth-aware platforms expecting the same discriminator values.

kbeyls self-requested a review June 4, 2024 20:15

ahmedbougacha force-pushed the users/ahmedbougacha/integrate-siphash-into-libsupport branch from 1d85610 to 6679d54 Compare June 4, 2024 22:09

ahmedbougacha marked this pull request as ready for review June 4, 2024 22:11

llvmbot added the llvm:support label Jun 4, 2024

ahmedbougacha mentioned this pull request Jun 4, 2024

[Support] Add SipHash-based 16-bit ptrauth stable hash. #93902

Merged

ahmedbougacha self-assigned this Jun 4, 2024

kbeyls mentioned this pull request Jun 5, 2024

[Support] Import SipHash c reference implementation. #94393

Merged

kbeyls requested changes Jun 5, 2024

View reviewed changes

llvm/lib/Support/SipHash.cpp Show resolved Hide resolved

llvm/lib/Support/SipHash.cpp Outdated Show resolved Hide resolved

llvm/lib/Support/SipHash.cpp Outdated Show resolved Hide resolved

kbeyls reviewed Jun 5, 2024

View reviewed changes

llvm/lib/Support/SipHash.cpp Outdated Show resolved Hide resolved

asl reviewed Jun 10, 2024

View reviewed changes

ahmedbougacha requested a review from kbeyls June 11, 2024 23:13

kbeyls approved these changes Jun 12, 2024

View reviewed changes

kbeyls approved these changes Jun 14, 2024

View reviewed changes

asl approved these changes Jun 14, 2024

View reviewed changes

Base automatically changed from users/ahmedbougacha/import-siphash-reference-impl to main June 14, 2024 23:56

ahmedbougacha added 3 commits June 14, 2024 17:05

[Support] Reformat SipHash.cpp to match libSupport.

7767727

While there, give it our usual file header and an acknowledgement.

[Support] Remove SipHash tracing support.

aec3a94

ahmedbougacha force-pushed the users/ahmedbougacha/integrate-siphash-into-libsupport branch from 52ed57f to e0343b5 Compare June 15, 2024 00:06

ahmedbougacha merged commit 577c3f1 into main Jun 15, 2024
3 of 5 checks passed

ahmedbougacha deleted the users/ahmedbougacha/integrate-siphash-into-libsupport branch June 15, 2024 00:07

jpienaar mentioned this pull request Jun 17, 2024

[mlirc] Add missing extern C #95829

Merged

[Support] Integrate SipHash.cpp into libSupport. #94394

[Support] Integrate SipHash.cpp into libSupport. #94394

Uh oh!

Conversation

ahmedbougacha commented Jun 4, 2024

Uh oh!

llvmbot commented Jun 4, 2024

Uh oh!

kbeyls left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ahmedbougacha commented Jun 5, 2024

Uh oh!

ahmedbougacha commented Jun 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kbeyls commented Jun 5, 2024

Uh oh!

asl commented Jun 6, 2024

Uh oh!

asl commented Jun 8, 2024

Uh oh!

kbeyls commented Jun 8, 2024

Uh oh!

Uh oh!

asl Jun 10, 2024

Choose a reason for hiding this comment

Uh oh!

ahmedbougacha Jun 11, 2024

Choose a reason for hiding this comment

Uh oh!

asl Jun 10, 2024

Choose a reason for hiding this comment

Uh oh!

ahmedbougacha Jun 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asl commented Jun 11, 2024

Uh oh!

ahmedbougacha commented Jun 11, 2024

Uh oh!

kbeyls commented Jun 12, 2024

Uh oh!

asl commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ahmedbougacha commented Jun 13, 2024

Uh oh!

kbeyls commented Jun 13, 2024

Uh oh!

ahmedbougacha commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kbeyls commented Jun 14, 2024

Uh oh!

asl left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ahmedbougacha commented Jun 15, 2024

Uh oh!

MaskRay commented Jul 26, 2024

Uh oh!

asl commented Jul 26, 2024

Uh oh!

Uh oh!

ahmedbougacha commented Jun 5, 2024 •

edited

Loading

ahmedbougacha Jun 11, 2024 •

edited

Loading

asl commented Jun 13, 2024 •

edited

Loading

ahmedbougacha commented Jun 13, 2024 •

edited

Loading