Skip to content

[DebugNames] Use hashes to quickly filter false positives #79755

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

felipepiovezan
Copy link
Contributor

The current implementation of DebugNames is only using hashes to compute the bucket number. Once inside the bucket, it reverts back to string comparisons, even though not all hashes inside a bucket are identical.

This commit changes the behavior so that we check the hash before comparing strings. Such check is so important that it speeds up a simple benchmark by 20%. In other words, the following expression evaluation time goes from 1100ms to 850ms.

bin/lldb \
		--batch \
		-o "b CodeGenFunction::GenerateCode" \
		-o run \
		-o "expr Fn" \
		-- \
		clang++ -c -g test.cpp -o /dev/null &> output

(Note, these numbers are considering the usage of IDX_parent)

The current implementation of DebugNames is _only_ using hashes to compute the
bucket number. Once inside the bucket, it reverts back to string comparisons,
even though not all hashes inside a bucket are identical.

This commit changes the behavior so that we check the hash before comparing
strings. Such check is so important that it speeds up a simple benchmark by 20%.
In other words, the following expression evaluation time goes from 1100ms to
850ms.

```
bin/lldb \
		--batch \
		-o "b CodeGenFunction::GenerateCode" \
		-o run \
		-o "expr Fn" \
		-- \
		clang++ -c -g test.cpp -o /dev/null &> output
```

(Note, these numbers are considering the usage of IDX_parent)
@llvmbot
Copy link
Member

llvmbot commented Jan 28, 2024

@llvm/pr-subscribers-debuginfo

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

The current implementation of DebugNames is only using hashes to compute the bucket number. Once inside the bucket, it reverts back to string comparisons, even though not all hashes inside a bucket are identical.

This commit changes the behavior so that we check the hash before comparing strings. Such check is so important that it speeds up a simple benchmark by 20%. In other words, the following expression evaluation time goes from 1100ms to 850ms.

bin/lldb \
		--batch \
		-o "b CodeGenFunction::GenerateCode" \
		-o run \
		-o "expr Fn" \
		-- \
		clang++ -c -g test.cpp -o /dev/null &> output

(Note, these numbers are considering the usage of IDX_parent)


Full diff: https://github.com/llvm/llvm-project/pull/79755.diff

1 Files Affected:

  • (modified) llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp (+4-2)
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
index 03ad5d133caddf..a1a1ac093aa5c1 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
@@ -937,9 +937,11 @@ DWARFDebugNames::ValueIterator::findEntryOffsetInCurrentIndex() {
     return std::nullopt; // Empty bucket
 
   for (; Index <= Hdr.NameCount; ++Index) {
-    uint32_t Hash = CurrentIndex->getHashArrayEntry(Index);
-    if (Hash % Hdr.BucketCount != Bucket)
+    uint32_t HashAtIndex = CurrentIndex->getHashArrayEntry(Index);
+    if (HashAtIndex % Hdr.BucketCount != Bucket)
       return std::nullopt; // End of bucket
+    if (HashAtIndex != Hash)
+      continue;
 
     NameTableEntry NTE = CurrentIndex->getNameTableEntry(Index);
     if (NTE.getString() == Key)

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a solid improvement!

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

@dwblaikie
Copy link
Collaborator

Nice!

@felipepiovezan felipepiovezan merged commit 69cb99f into llvm:main Jan 30, 2024
@felipepiovezan felipepiovezan deleted the felipe/debug_names_use_hashes_correctly branch January 30, 2024 17:44
felipepiovezan added a commit to felipepiovezan/llvm-project that referenced this pull request Feb 2, 2024
The current implementation of DebugNames is _only_ using hashes to
compute the bucket number. Once inside the bucket, it reverts back to
string comparisons, even though not all hashes inside a bucket are
identical.

This commit changes the behavior so that we check the hash before
comparing strings. Such check is so important that it speeds up a simple
benchmark by 20%. In other words, the following expression evaluation
time goes from 1100ms to 850ms.

```
bin/lldb \
		--batch \
		-o "b CodeGenFunction::GenerateCode" \
		-o run \
		-o "expr Fn" \
		-- \
		clang++ -c -g test.cpp -o /dev/null &> output
```

(Note, these numbers are considering the usage of IDX_parent)

(cherry picked from commit 69cb99f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants