[lld-macho,BalancedPartition] Simplify relocation hash and avoid xxHash #121729


Conversation

@MaskRay (Member) commented Jan 6, 2025

xxHash, inferior to xxh3, is discouraged; we try not to use xxHash in
lld.

Switch to read32le for content hash and xxh3/stable_hash_combine for
relocation hash. Remove the intermediate std::string for relocation
hash.

Change the tail hashing scheme to consider individual bytes instead.
This helps group 0102 and 0201 together. The benefit is negligible,
though.

Created using spr 1.3.5-bogner
@llvmbot (Member) commented Jan 6, 2025

@llvm/pr-subscribers-lld-macho

@llvm/pr-subscribers-lld

Author: Fangrui Song (MaskRay)



Full diff: https://github.com/llvm/llvm-project/pull/121729.diff

2 Files Affected:

  • (modified) lld/MachO/BPSectionOrderer.h (+16-17)
  • (modified) lld/include/lld/Common/BPSectionOrdererBase.h (-9)
diff --git a/lld/MachO/BPSectionOrderer.h b/lld/MachO/BPSectionOrderer.h
index 8ba911fcc546bd..3de815d79b0f4c 100644
--- a/lld/MachO/BPSectionOrderer.h
+++ b/lld/MachO/BPSectionOrderer.h
@@ -19,7 +19,10 @@
 #include "Symbols.h"
 #include "lld/Common/BPSectionOrdererBase.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StableHashing.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Endian.h"
+#include "llvm/Support/xxhash.h"
 
 namespace lld::macho {
 
@@ -91,22 +94,20 @@ class BPSectionMacho : public BPSectionBase {
     constexpr unsigned windowSize = 4;
 
     // Calculate content hashes
-    size_t dataSize = isec->data.size();
-    for (size_t i = 0; i < dataSize; i++) {
-      auto window = isec->data.drop_front(i).take_front(windowSize);
-      hashes.push_back(xxHash64(window));
-    }
+    ArrayRef<uint8_t> data = isec->data;
+    for (size_t i = 0; i <= data.size() - windowSize; i++)
+      hashes.push_back(llvm::support::endian::read32le(data.data() + i));
 
     // Calculate relocation hashes
     for (const auto &r : isec->relocs) {
-      if (r.length == 0 || r.referent.isNull() || r.offset >= isec->data.size())
+      if (r.length == 0 || r.referent.isNull() || r.offset >= data.size())
         continue;
 
       uint64_t relocHash = getRelocHash(r, sectionToIdx);
       uint32_t start = (r.offset < windowSize) ? 0 : r.offset - windowSize + 1;
       for (uint32_t i = start; i < r.offset + r.length; i++) {
-        auto window = isec->data.drop_front(i).take_front(windowSize);
-        hashes.push_back(xxHash64(window) + relocHash);
+        auto window = data.drop_front(i).take_front(windowSize);
+        hashes.push_back(xxh3_64bits(window) ^ relocHash);
       }
     }
 
@@ -124,19 +125,17 @@ class BPSectionMacho : public BPSectionBase {
     std::optional<uint64_t> sectionIdx;
     if (auto it = sectionToIdx.find(isec); it != sectionToIdx.end())
       sectionIdx = it->second;
-    std::string kind;
+    uint64_t kind = -1, value = 0;
     if (isec)
-      kind = ("Section " + Twine(isec->kind())).str();
+      kind = uint64_t(isec->kind());
 
     if (auto *sym = reloc.referent.dyn_cast<Symbol *>()) {
-      kind += (" Symbol " + Twine(sym->kind())).str();
-      if (auto *d = llvm::dyn_cast<Defined>(sym)) {
-        return BPSectionBase::getRelocHash(kind, sectionIdx.value_or(0),
-                                           d->value, reloc.addend);
-      }
+      kind = (kind << 8) | uint8_t(sym->kind());
+      if (auto *d = llvm::dyn_cast<Defined>(sym))
+        value = d->value;
     }
-    return BPSectionBase::getRelocHash(kind, sectionIdx.value_or(0), 0,
-                                       reloc.addend);
+    return llvm::stable_hash_combine(kind, sectionIdx.value_or(0), value,
+                                     reloc.addend);
   }
 };
 
diff --git a/lld/include/lld/Common/BPSectionOrdererBase.h b/lld/include/lld/Common/BPSectionOrdererBase.h
index e2cb41f69cc684..29599afa03bd40 100644
--- a/lld/include/lld/Common/BPSectionOrdererBase.h
+++ b/lld/include/lld/Common/BPSectionOrdererBase.h
@@ -18,7 +18,6 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
-#include "llvm/Support/xxhash.h"
 #include <memory>
 #include <optional>
 
@@ -56,14 +55,6 @@ class BPSectionBase {
     return P1;
   }
 
-  static uint64_t getRelocHash(llvm::StringRef kind, uint64_t sectionIdx,
-                               uint64_t offset, uint64_t addend) {
-    return llvm::xxHash64((kind + ": " + llvm::Twine::utohexstr(sectionIdx) +
-                           " + " + llvm::Twine::utohexstr(offset) + " + " +
-                           llvm::Twine::utohexstr(addend))
-                              .str());
-  }
-
   /// Reorders sections using balanced partitioning algorithm based on profile
   /// data.
   static llvm::DenseMap<const BPSectionBase *, size_t>

hashes.push_back(xxHash64(window));
}
ArrayRef<uint8_t> data = isec->data;
for (size_t i = 0; i <= data.size() - windowSize; i++)
A Contributor commented:

Sections shorter than 4 bytes are trivial functions that are likely
folded by ICF.

We should still guard against the case when data.size() < windowSize, because these sections could be data sections in the future.

}
ArrayRef<uint8_t> data = isec->data;
for (size_t i = 0; i <= data.size() - windowSize; i++)
hashes.push_back(llvm::support::endian::read32le(data.data() + i));
A Contributor commented:

This changes how we take hashes at the end of the section, but it could be a change for the better. I'm testing this PR on our apps to see if there is a size regression or not

@ellishg (Contributor) left a comment:

Only very minor size changes on my end. I'm curious if @Colibrow sees any size regressions. LGTM after fixing the data.size() < windowSize issue.

@MaskRay (Member, Author) commented Jan 7, 2025

Only very minor size changes on my end. I'm curious if @Colibrow sees any size regressions. LGTM after fixing the data.size() < windowSize issue.

Thanks. I've changed the tail hashing scheme to:

    if (data.size() >= windowSize)
      for (size_t i = 0; i <= data.size() - windowSize; ++i)
        hashes.push_back(llvm::support::endian::read32le(data.data() + i));
    for (uint8_t byte : data.take_back(windowSize - 1))
      hashes.push_back(byte);

This helps group 0102 and 0201 together. The similarity between 030201 and 0201 is now the same as 030201 and 0102, which should be fine.
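As a self-contained sketch of the scheme quoted above (using a hand-rolled read32le and plain std::vector as stand-ins for llvm::support::endian::read32le and ArrayRef, which are not reproduced here):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable stand-in for llvm::support::endian::read32le.
static uint32_t read32le(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

// Tail hashing scheme from the comment above: 4-byte sliding windows over
// the section body, then the final windowSize-1 bytes individually.
static std::vector<uint64_t> contentHashes(const std::vector<uint8_t> &data) {
  constexpr size_t windowSize = 4;
  std::vector<uint64_t> hashes;
  if (data.size() >= windowSize)
    for (size_t i = 0; i <= data.size() - windowSize; ++i)
      hashes.push_back(read32le(data.data() + i));
  // take_back(windowSize - 1): at most the last 3 bytes, fewer if the
  // section is shorter than that.
  size_t tail = std::min<size_t>(windowSize - 1, data.size());
  for (size_t i = data.size() - tail; i != data.size(); ++i)
    hashes.push_back(data[i]);
  return hashes;
}
```

With this, the sections 0102 and 0201 produce the same multiset of tail hashes {1, 2}, which is what groups them together.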

ArrayRef<uint8_t> data = isec->data;
if (data.size() >= windowSize)
for (size_t i = 0; i <= data.size() - windowSize; ++i)
hashes.push_back(llvm::support::endian::read32le(data.data() + i));
A Contributor commented:

Why are we using read32le() here and xxh3_64bits() for relocations below? As I understand, read32le() only works here because the window size is exactly 4. I chose this window size because it gave the best results on a few binaries, but other window sizes could work better for other scenarios. If we use xxh3_64bits() in both cases, we are free to change windowSize.
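A window-size-agnostic loop along the lines suggested here could look like the sketch below. FNV-1a serves as a self-contained stand-in for xxh3_64bits (which lives in llvm/Support/xxhash.h and is not reproduced here); the point is only that hashing the window makes windowSize a free parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a: a stand-in for llvm::xxh3_64bits. Any 64-bit hash over a byte
// window works; only equal windows need to produce equal hashes.
static uint64_t hashBytes(const uint8_t *p, size_t n) {
  uint64_t h = 0xcbf29ce484222325ULL;
  for (size_t i = 0; i < n; ++i) {
    h ^= p[i];
    h *= 0x100000001b3ULL;
  }
  return h;
}

// Sliding-window hashing that does not bake in windowSize == 4, unlike
// read32le. Sections shorter than one window yield no hashes.
static std::vector<uint64_t> windowHashes(const std::vector<uint8_t> &data,
                                          size_t windowSize) {
  std::vector<uint64_t> hashes;
  if (data.size() < windowSize)
    return hashes; // guard against data.size() < windowSize
  for (size_t i = 0; i + windowSize <= data.size(); ++i)
    hashes.push_back(hashBytes(data.data() + i, windowSize));
  return hashes;
}
```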

@MaskRay (Member, Author) commented Jan 7, 2025:

I agree that this is weird. xxh3_64bits(window) for relocation hashing is just because it's easy: no need to handle the shorter-than-4-bytes case.

Hmmm. Reloc::length is actually a logarithm field. For Mach-O arm64, the relocation offsets are aligned to start of the instruction. Shall we compute one single hash for a relocation? I guess the sliding window doesn't help, but happy to be proven wrong.

@Colibrow (Contributor) commented Jan 7, 2025:

file               size       gzipped size
libsample.so       3,181,560  1,487,043
libsample.so_xxh3  3,181,544  1,486,208

It seems good, although the change is minor. The uncompressed file size change confused me. Do you have any ideas?

A Contributor replied:

The uncompressed size could change due to alignment and changes in the unwind info section. You can use bloaty to verify this. IIRC [TEXT] will show alignment changes, but that isn't well documented.

Hmmm. Reloc::length is actually a logarithm field. For Mach-O arm64, the relocation offsets are aligned to start of the instruction. Shall we compute one single hash for a relocation? I guess the sliding window doesn't help, but happy to be proven wrong.

I settled on the current implementation by trying many different hashing strategies. I got the best results by hashing a sliding window for relocations and the section data. I'm open to changing this if we run experiments to confirm there is no regression. For now, I think those more aggressive changes should be a separate PR.

A Contributor commented:

I would rather use xxh3_64bits() here which allows for more flexibility and consistency

@MaskRay (Member, Author) commented Jan 8, 2025

@ellishg Does the current PR look good to you?

@ellishg (Contributor) commented Jan 14, 2025

@ellishg Does the current PR look good to you?

Can we use only xxh3_64bits() in this PR for simplicity and flexibility?

@MaskRay (Member, Author) commented Jan 15, 2025

@ellishg Does the current PR look good to you?

Can we use only xxh3_64bits() in this PR for simplicity and flexibility?

Do you want two PRs, one changing xxHash64 to xxh3_64bits, and the other simplifying the relocation hash?

If this is your intention, I can pre-land one commit to make just the xxHash64 change, and rebase this PR.

@ellishg (Contributor) commented Jan 15, 2025


Do you want two PRs, one changing xxHash64 to xxh3_64bits, and the other simplifying the relocation hash?

If this is your intention, I can pre-land one commit to make just the xxHash64 change, and rebase this PR.

One PR should be ok. I was just saying we should use xxh3_64bits() instead of read32le() on line 100.

@MaskRay (Member, Author) commented Jan 16, 2025


One PR should be ok. I was just saying we should use xxh3_64bits() instead of read32le() on line 100.

Using a hash function like xxh3_64bits for the k-mers introduces unnecessary performance overhead.
It's fine to have two kinds of values in hashes: (a) instructions/data and (b) xxh3_64bits-derived values (relocations).

Well, for this PR to proceed, I can give in if you feel strongly about this.
But for the ELF port I'll make sure we don't introduce this slight performance overhead.

@carlocab (Member) left a comment:

Seems ok to me given #121729 (comment).

Caveat: I'm not super comfortable/familiar with this part of LLD :)

@ellishg (Contributor) commented Jan 16, 2025


Using a hash function like xxh3_64bits for the k-mers introduces unnecessary performance overhead...

It's ok to have two sets of values into hashes: (a) instructions/data (b) xxh3_64bits derived values (relocations).

Well, for this PR to proceed I could give up if you are really strong about this.

But for the ELF port I'll ensure that we don't introduce this slight performance overhead...

I would guess the performance overhead of hashing is negligible compared to the runtime of the BP algorithm. I'm ok with this for now, but we should switch to a hash function like xxh3_64bits if we ever find that changing the k-mer size (the number of bytes to hash) is beneficial.

@MaskRay MaskRay merged commit 60e4d24 into main Jan 16, 2025
8 checks passed
@MaskRay MaskRay deleted the users/MaskRay/spr/lld-machobalancedpartition-simplify-relocation-hash-and-avoid-xxhash branch January 16, 2025 17:31
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 16, 2025
Colibrow added a commit to Colibrow/llvm-project that referenced this pull request Jan 23, 2025
Colibrow added a commit to Colibrow/llvm-project that referenced this pull request Jan 28, 2025
MaskRay pushed a commit to Colibrow/llvm-project that referenced this pull request Feb 2, 2025
MaskRay pushed a commit to Colibrow/llvm-project that referenced this pull request Feb 2, 2025