[DebugNames] Use hashes to quickly filter false positives #79755

felipepiovezan · 2024-01-28T15:36:06Z

The current implementation of DebugNames uses hashes only to compute the bucket number. Once inside the bucket, it falls back to string comparisons, even though not all hashes inside a bucket are identical.

This commit changes the behavior so that we check the hash before comparing strings. This check matters enough to speed up a simple benchmark by about 20%: the evaluation time of the following expression goes from 1100ms to 850ms.

bin/lldb \
		--batch \
		-o "b CodeGenFunction::GenerateCode" \
		-o run \
		-o "expr Fn" \
		-- \
		clang++ -c -g test.cpp -o /dev/null &> output

(Note: these numbers were measured with IDX_parent in use.)
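
To make the idea concrete, here is a minimal, self-contained sketch of a bucketed name lookup that can optionally compare hashes before comparing strings. It is not LLVM's implementation: `ToyNameIndex`, `toyHash`, `buildToyIndex`, and `findName` are hypothetical names invented for this illustration, the hash is a plain case-sensitive DJB-style hash, and the layout (1-based indices, parallel hash and name arrays grouped by bucket) is only assumed to resemble the real table. The `StringCompares` counter exists purely to show how much work the filter skips.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy index, loosely modeled on a bucketed name table.
struct ToyNameIndex {
  uint32_t BucketCount = 0;
  // Buckets[B] is the 1-based index of the first name in bucket B, 0 if empty.
  std::vector<uint32_t> Buckets;
  // Parallel arrays; entries that share a bucket are stored contiguously.
  std::vector<uint32_t> Hashes;
  std::vector<std::string> Names;
};

// DJB-style hash (case-sensitive here for simplicity; an assumption).
inline uint32_t toyHash(std::string_view S) {
  uint32_t H = 5381;
  for (unsigned char C : S)
    H = H * 33 + C;
  return H;
}

// Groups the input names by bucket so each bucket is a contiguous run.
inline ToyNameIndex buildToyIndex(const std::vector<std::string> &Input,
                                  uint32_t BucketCount) {
  ToyNameIndex Idx;
  Idx.BucketCount = BucketCount;
  Idx.Buckets.assign(BucketCount, 0);
  for (uint32_t B = 0; B < BucketCount; ++B) {
    for (const std::string &Name : Input) {
      uint32_t H = toyHash(Name);
      if (H % BucketCount != B)
        continue;
      if (Idx.Buckets[B] == 0)
        Idx.Buckets[B] = static_cast<uint32_t>(Idx.Names.size() + 1);
      Idx.Hashes.push_back(H);
      Idx.Names.push_back(Name);
    }
  }
  return Idx;
}

struct LookupResult {
  std::optional<uint32_t> Index; // 1-based index of the match, if any.
  unsigned StringCompares = 0;   // Number of string comparisons performed.
};

// Walks the bucket for Key. When FilterByHash is true, entries whose full
// hash differs from the key's hash are skipped without touching the string.
inline LookupResult findName(const ToyNameIndex &Idx, std::string_view Key,
                             bool FilterByHash) {
  LookupResult R;
  if (Idx.BucketCount == 0)
    return R;
  const uint32_t Hash = toyHash(Key);
  const uint32_t Bucket = Hash % Idx.BucketCount;
  uint32_t Index = Idx.Buckets[Bucket];
  if (Index == 0)
    return R; // Empty bucket
  for (; Index <= Idx.Names.size(); ++Index) {
    const uint32_t HashAtIndex = Idx.Hashes[Index - 1];
    if (HashAtIndex % Idx.BucketCount != Bucket)
      return R; // End of bucket
    if (FilterByHash && HashAtIndex != Hash)
      continue; // Cheap integer check rules out this entry.
    ++R.StringCompares;
    if (Idx.Names[Index - 1] == Key) {
      R.Index = Index;
      return R;
    }
  }
  return R;
}
```

Calling `findName` on the same index with `FilterByHash` set to `false` and then `true` returns the same match, but the filtered walk only compares strings for entries whose full 32-bit hash equals the key's hash. In a crowded bucket that typically means one string comparison instead of one per entry visited.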

llvmbot · 2024-01-28T15:36:36Z

@llvm/pr-subscribers-debuginfo

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Full diff: https://github.com/llvm/llvm-project/pull/79755.diff

1 Files Affected:

(modified) llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp (+4-2)

diff --git a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
index 03ad5d133caddf..a1a1ac093aa5c1 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
@@ -937,9 +937,11 @@ DWARFDebugNames::ValueIterator::findEntryOffsetInCurrentIndex() {
     return std::nullopt; // Empty bucket
 
   for (; Index <= Hdr.NameCount; ++Index) {
-    uint32_t Hash = CurrentIndex->getHashArrayEntry(Index);
-    if (Hash % Hdr.BucketCount != Bucket)
+    uint32_t HashAtIndex = CurrentIndex->getHashArrayEntry(Index);
+    if (HashAtIndex % Hdr.BucketCount != Bucket)
       return std::nullopt; // End of bucket
+    if (HashAtIndex != Hash)
+      continue;
 
     NameTableEntry NTE = CurrentIndex->getNameTableEntry(Index);
     if (NTE.getString() == Key)

adrian-prantl

That's a solid improvement!

JDevlieghere

🥳

dwblaikie · 2024-01-30T00:17:48Z

Nice!

(cherry picked from commit 69cb99f)

llvmbot added the debuginfo label Jan 28, 2024

felipepiovezan requested review from JDevlieghere, adrian-prantl, dwblaikie and ayermolo January 28, 2024 15:36

ayermolo approved these changes Jan 28, 2024

adrian-prantl approved these changes Jan 29, 2024

JDevlieghere approved these changes Jan 29, 2024

fixup! Add another comment

ac12bf8

felipepiovezan merged commit 69cb99f into llvm:main Jan 30, 2024

felipepiovezan deleted the felipe/debug_names_use_hashes_correctly branch January 30, 2024 17:44
