[lld-macho,BalancedPartition] Simplify relocation hash and avoid xxHash #121729


Conversation

@MaskRay (Member) commented Jan 6, 2025

xxHash, inferior to xxh3, is discouraged; we try not to use xxHash in
lld.

Switch to read32le for content hash and xxh3/stable_hash_combine for
relocation hash. Remove the intermediate std::string for relocation
hash.

Change the tail hashing scheme to consider individual bytes instead.
This helps group 0102 and 0201 together. The benefit is negligible,
though.

Created using spr 1.3.5-bogner
@llvmbot (Member) commented Jan 6, 2025

@llvm/pr-subscribers-lld-macho

@llvm/pr-subscribers-lld

Author: Fangrui Song (MaskRay)



Full diff: https://github.com/llvm/llvm-project/pull/121729.diff

2 Files Affected:

  • (modified) lld/MachO/BPSectionOrderer.h (+16-17)
  • (modified) lld/include/lld/Common/BPSectionOrdererBase.h (-9)
diff --git a/lld/MachO/BPSectionOrderer.h b/lld/MachO/BPSectionOrderer.h
index 8ba911fcc546bd..3de815d79b0f4c 100644
--- a/lld/MachO/BPSectionOrderer.h
+++ b/lld/MachO/BPSectionOrderer.h
@@ -19,7 +19,10 @@
 #include "Symbols.h"
 #include "lld/Common/BPSectionOrdererBase.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StableHashing.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Endian.h"
+#include "llvm/Support/xxhash.h"
 
 namespace lld::macho {
 
@@ -91,22 +94,20 @@ class BPSectionMacho : public BPSectionBase {
     constexpr unsigned windowSize = 4;
 
     // Calculate content hashes
-    size_t dataSize = isec->data.size();
-    for (size_t i = 0; i < dataSize; i++) {
-      auto window = isec->data.drop_front(i).take_front(windowSize);
-      hashes.push_back(xxHash64(window));
-    }
+    ArrayRef<uint8_t> data = isec->data;
+    for (size_t i = 0; i <= data.size() - windowSize; i++)
+      hashes.push_back(llvm::support::endian::read32le(data.data() + i));
 
     // Calculate relocation hashes
     for (const auto &r : isec->relocs) {
-      if (r.length == 0 || r.referent.isNull() || r.offset >= isec->data.size())
+      if (r.length == 0 || r.referent.isNull() || r.offset >= data.size())
         continue;
 
       uint64_t relocHash = getRelocHash(r, sectionToIdx);
       uint32_t start = (r.offset < windowSize) ? 0 : r.offset - windowSize + 1;
       for (uint32_t i = start; i < r.offset + r.length; i++) {
-        auto window = isec->data.drop_front(i).take_front(windowSize);
-        hashes.push_back(xxHash64(window) + relocHash);
+        auto window = data.drop_front(i).take_front(windowSize);
+        hashes.push_back(xxh3_64bits(window) ^ relocHash);
       }
     }
 
@@ -124,19 +125,17 @@ class BPSectionMacho : public BPSectionBase {
     std::optional<uint64_t> sectionIdx;
     if (auto it = sectionToIdx.find(isec); it != sectionToIdx.end())
       sectionIdx = it->second;
-    std::string kind;
+    uint64_t kind = -1, value = 0;
     if (isec)
-      kind = ("Section " + Twine(isec->kind())).str();
+      kind = uint64_t(isec->kind());
 
     if (auto *sym = reloc.referent.dyn_cast<Symbol *>()) {
-      kind += (" Symbol " + Twine(sym->kind())).str();
-      if (auto *d = llvm::dyn_cast<Defined>(sym)) {
-        return BPSectionBase::getRelocHash(kind, sectionIdx.value_or(0),
-                                           d->value, reloc.addend);
-      }
+      kind = (kind << 8) | uint8_t(sym->kind());
+      if (auto *d = llvm::dyn_cast<Defined>(sym))
+        value = d->value;
     }
-    return BPSectionBase::getRelocHash(kind, sectionIdx.value_or(0), 0,
-                                       reloc.addend);
+    return llvm::stable_hash_combine(kind, sectionIdx.value_or(0), value,
+                                     reloc.addend);
   }
 };
 
diff --git a/lld/include/lld/Common/BPSectionOrdererBase.h b/lld/include/lld/Common/BPSectionOrdererBase.h
index e2cb41f69cc684..29599afa03bd40 100644
--- a/lld/include/lld/Common/BPSectionOrdererBase.h
+++ b/lld/include/lld/Common/BPSectionOrdererBase.h
@@ -18,7 +18,6 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
-#include "llvm/Support/xxhash.h"
 #include <memory>
 #include <optional>
 
@@ -56,14 +55,6 @@ class BPSectionBase {
     return P1;
   }
 
-  static uint64_t getRelocHash(llvm::StringRef kind, uint64_t sectionIdx,
-                               uint64_t offset, uint64_t addend) {
-    return llvm::xxHash64((kind + ": " + llvm::Twine::utohexstr(sectionIdx) +
-                           " + " + llvm::Twine::utohexstr(offset) + " + " +
-                           llvm::Twine::utohexstr(addend))
-                              .str());
-  }
-
   /// Reorders sections using balanced partitioning algorithm based on profile
   /// data.
   static llvm::DenseMap<const BPSectionBase *, size_t>

hashes.push_back(xxHash64(window));
}
ArrayRef<uint8_t> data = isec->data;
for (size_t i = 0; i <= data.size() - windowSize; i++)
A Contributor commented:

Sections shorter than 4 bytes are trivial functions that are likely
folded by ICF.

We should still guard against the case when data.size() < windowSize, because these sections could be data sections in the future.

}
ArrayRef<uint8_t> data = isec->data;
for (size_t i = 0; i <= data.size() - windowSize; i++)
hashes.push_back(llvm::support::endian::read32le(data.data() + i));
A Contributor commented:

This changes how we take hashes at the end of the section, but it could be a change for the better. I'm testing this PR on our apps to see if there is a size regression or not

@ellishg (Contributor) left a comment:

Only very minor size changes on my end. I'm curious if @Colibrow sees any size regressions. LGTM after fixing the data.size() < windowSize issue.

@MaskRay (Member, Author) commented Jan 7, 2025

Only very minor size changes on my end. I'm curious if @Colibrow sees any size regressions. LGTM after fixing the data.size() < windowSize issue.

Thanks. I've changed the tail hashing scheme to:

    if (data.size() >= windowSize)
      for (size_t i = 0; i <= data.size() - windowSize; ++i)
        hashes.push_back(llvm::support::endian::read32le(data.data() + i));
    for (uint8_t byte : data.take_back(windowSize - 1))
      hashes.push_back(byte);

This helps group 0102 and 0201 together. The similarity between 030201 and 0201 is now the same as 030201 and 0102, which should be fine.
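As a self-contained sketch of the scheme quoted above (using a hand-rolled read32le and plain std::vector as stand-ins for llvm::support::endian::read32le and ArrayRef, which are not reproduced here):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable stand-in for llvm::support::endian::read32le.
static uint32_t read32le(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

// Tail hashing scheme from the comment above: 4-byte sliding windows over
// the section body, then the final windowSize-1 bytes individually.
static std::vector<uint64_t> contentHashes(const std::vector<uint8_t> &data) {
  constexpr size_t windowSize = 4;
  std::vector<uint64_t> hashes;
  if (data.size() >= windowSize)
    for (size_t i = 0; i <= data.size() - windowSize; ++i)
      hashes.push_back(read32le(data.data() + i));
  // take_back(windowSize - 1): at most the last 3 bytes, fewer if the
  // section is shorter than that.
  size_t tail = std::min<size_t>(windowSize - 1, data.size());
  for (size_t i = data.size() - tail; i != data.size(); ++i)
    hashes.push_back(data[i]);
  return hashes;
}
```

With this, the sections 0102 and 0201 produce the same multiset of tail hashes {1, 2}, which is what groups them together.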

ArrayRef<uint8_t> data = isec->data;
if (data.size() >= windowSize)
for (size_t i = 0; i <= data.size() - windowSize; ++i)
hashes.push_back(llvm::support::endian::read32le(data.data() + i));
A Contributor commented:

Why are we using read32le() here and xxh3_64bits() for relocations below? As I understand, read32le() only works here because the window size is exactly 4. I chose this window size because it gave the best results on a few binaries, but other window sizes could work better for other scenarios. If we use xxh3_64bits() in both cases, we are free to change windowSize.
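A window-size-agnostic loop along the lines suggested here could look like the sketch below. FNV-1a serves as a self-contained stand-in for xxh3_64bits (which lives in llvm/Support/xxhash.h and is not reproduced here); the point is only that hashing the window makes windowSize a free parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a: a stand-in for llvm::xxh3_64bits. Any 64-bit hash over a byte
// window works; only equal windows need to produce equal hashes.
static uint64_t hashBytes(const uint8_t *p, size_t n) {
  uint64_t h = 0xcbf29ce484222325ULL;
  for (size_t i = 0; i < n; ++i) {
    h ^= p[i];
    h *= 0x100000001b3ULL;
  }
  return h;
}

// Sliding-window hashing that does not bake in windowSize == 4, unlike
// read32le. Sections shorter than one window yield no hashes.
static std::vector<uint64_t> windowHashes(const std::vector<uint8_t> &data,
                                          size_t windowSize) {
  std::vector<uint64_t> hashes;
  if (data.size() < windowSize)
    return hashes; // guard against data.size() < windowSize
  for (size_t i = 0; i + windowSize <= data.size(); ++i)
    hashes.push_back(hashBytes(data.data() + i, windowSize));
  return hashes;
}
```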

@MaskRay (Member, Author) commented Jan 7, 2025:

I agree that this is weird. xxh3_64bits(window) for relocation hashing is just because it's easy: no need to handle the shorter-than-4-bytes case.

Hmmm. Reloc::length is actually a logarithm field. For Mach-O arm64, the relocation offsets are aligned to start of the instruction. Shall we compute one single hash for a relocation? I guess the sliding window doesn't help, but happy to be proven wrong.

@Colibrow (Contributor) commented Jan 7, 2025:

file               size       gzipped size
libsample.so       3,181,560  1,487,043
libsample.so_xxh3  3,181,544  1,486,208

It seems good, although the change is minor. The uncompressed file size change confused me. Do you have any ideas?

A Contributor replied:

The uncompressed size could change due to alignment and changes in the unwind info section. You can use bloaty to verify this. IIRC [TEXT] will show alignment changes, but that isn't well documented.

Hmmm. Reloc::length is actually a logarithm field. For Mach-O arm64, the relocation offsets are aligned to start of the instruction. Shall we compute one single hash for a relocation? I guess the sliding window doesn't help, but happy to be proven wrong.

I settled on the current implementation by trying many different hashing strategies. I got the best results by hashing a sliding window for relocations and the section data. I'm open to changing this if we run experiments to confirm there is no regression. For now, I think those more aggressive changes should be a separate PR.

A Contributor commented:

I would rather use xxh3_64bits() here which allows for more flexibility and consistency

@MaskRay (Member, Author) commented Jan 8, 2025

@ellishg Does the current PR look good to you?

@ellishg (Contributor) commented Jan 14, 2025

@ellishg Does the current PR look good to you?

Can we use only xxh3_64bits() in this PR for simplicity and flexibility?

@MaskRay (Member, Author) commented Jan 15, 2025

@ellishg Does the current PR look good to you?

Can we use only xxh3_64bits() in this PR for simplicity and flexibility?

Do you want two PRs, one changing xxHash64 to xxh3_64bits, and the other simplifying the relocation hash?

If this is your intention, I can pre-land one commit to make just the xxHash64 change, and rebase this PR.

@ellishg (Contributor) commented Jan 15, 2025


Do you want two PRs, one changing xxHash64 to xxh3_64bits, and the other simplifying the relocation hash?

If this is your intention, I can pre-land one commit to make just the xxHash64 change, and rebase this PR.

One PR should be ok. I was just saying we should use xxh3_64bits() instead of read32le() on line 100.

@MaskRay (Member, Author) commented Jan 16, 2025


One PR should be ok. I was just saying we should use xxh3_64bits() instead of read32le() on line 100.

Using a hash function like xxh3_64bits for the k-mers introduces unnecessary performance overhead.
It's fine to have two kinds of values in hashes: (a) instructions/data and (b) xxh3_64bits-derived values (relocations).

Well, for this PR to proceed, I can give in if you feel strongly about this.
But for the ELF port I'll make sure we don't introduce this slight performance overhead.

@carlocab (Member) left a comment:

Seems ok to me given #121729 (comment).

Caveat: I'm not super comfortable/familiar with this part of LLD :)

@ellishg (Contributor) commented Jan 16, 2025


Using a hash function like xxh3_64bits for the k-mers introduces unnecessary performance overhead...

It's ok to have two sets of values into hashes: (a) instructions/data (b) xxh3_64bits derived values (relocations).

Well, for this PR to proceed I could give up if you are really strong about this.

But for the ELF port I'll ensure that we don't introduce this slight performance overhead...

I would guess the performance overhead of hashing is negligible compared to the runtime of the BP algorithm. I'm ok with this for now, but we should switch to a hash function like xxh3_64bits if we ever find that changing the k-mer size (the number of bytes to hash) is beneficial.

@MaskRay MaskRay merged commit 60e4d24 into main Jan 16, 2025
8 checks passed
@MaskRay MaskRay deleted the users/MaskRay/spr/lld-machobalancedpartition-simplify-relocation-hash-and-avoid-xxhash branch January 16, 2025 17:31
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 16, 2025
Colibrow added a commit to Colibrow/llvm-project that referenced this pull request Jan 23, 2025
Colibrow added a commit to Colibrow/llvm-project that referenced this pull request Jan 28, 2025
MaskRay pushed a commit to Colibrow/llvm-project that referenced this pull request Feb 2, 2025
MaskRay pushed a commit to Colibrow/llvm-project that referenced this pull request Feb 2, 2025