[DebugInfo] Make DISubprogram's hashing always produce the same result #90770

augusto2112 · 2024-05-01T20:23:17Z

A DISubprogram's hashing algorithm takes its Scope into account. The Scope can be a temporary node, though, which may be replaced later during compilation. As a result, the hash of a DISubprogram could differ before and after the Scope is replaced. Fix this by hashing the Scope's linkage name instead, which should always be the same.

rdar://127004707
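
The problem described above can be illustrated outside of LLVM. The following is a minimal sketch, assuming simplified stand-in types (`ScopeNode`, `SubprogramKey`, and both hash functions are hypothetical names, not LLVM APIs). It contrasts hashing a scope by its address with hashing it by a stable string identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for a scope node such as a composite type.
struct ScopeNode {
  std::string Identifier; // plays the role of the scope's unique identifier
  bool IsTemporary;
};

// Hypothetical stand-in for the key used to hash a subprogram.
struct SubprogramKey {
  std::string Name;
  const ScopeNode *Scope; // may point at a temporary that is replaced later
};

// Simple hash mixing step (an assumption; LLVM uses its own hash_combine).
inline std::size_t combine(std::size_t Seed, std::size_t V) {
  return Seed ^ (V + static_cast<std::size_t>(0x9e3779b9) + (Seed << 6) +
                 (Seed >> 2));
}

// Hashes the scope by address, so the result depends on which node object
// the key currently points at.
inline std::size_t hashByScopePointer(const SubprogramKey &K) {
  return combine(std::hash<std::string>{}(K.Name),
                 std::hash<const ScopeNode *>{}(K.Scope));
}

// Hashes the scope by identifier, so swapping a temporary for a real node
// with the same identifier leaves the result unchanged.
inline std::size_t hashByScopeIdentifier(const SubprogramKey &K) {
  std::string ID = K.Scope ? K.Scope->Identifier : std::string();
  return combine(std::hash<std::string>{}(K.Name),
                 std::hash<std::string>{}(ID));
}
```

With a temporary and a real `ScopeNode` that share the same `Identifier`, `hashByScopeIdentifier` returns the same value before and after `Scope` is repointed, whereas `hashByScopePointer` will typically change.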

llvmbot · 2024-05-01T20:23:47Z

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-ir

Author: Augusto Noronha (augusto2112)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/90770.diff

2 Files Affected:

(modified) llvm/lib/IR/LLVMContextImpl.h (+11-5)
(modified) llvm/unittests/IR/DebugInfoTest.cpp (+51-1)

diff --git a/llvm/lib/IR/LLVMContextImpl.h b/llvm/lib/IR/LLVMContextImpl.h
index 2713015c266c7e..98c7f7b66c89a7 100644
--- a/llvm/lib/IR/LLVMContextImpl.h
+++ b/llvm/lib/IR/LLVMContextImpl.h
@@ -825,19 +825,25 @@ template <> struct MDNodeKeyImpl<DISubprogram> {
   bool isDefinition() const { return SPFlags & DISubprogram::SPFlagDefinition; }
 
   unsigned getHashValue() const {
+    // Use the Scope's linkage name instead of using the scope directly, as the
+    // scope may be a temporary one which can replaced, which would produce a
+    // different hash for the same DISubprogram.
+    llvm::StringRef ScopeLinkageName;
+    if (auto *CT = dyn_cast_or_null<DICompositeType>(Scope))
+      if (auto *ID = CT->getRawIdentifier())
+        ScopeLinkageName = ID->getString();
+
     // If this is a declaration inside an ODR type, only hash the type and the
     // name.  Otherwise the hash will be stronger than
     // MDNodeSubsetEqualImpl::isDeclarationOfODRMember().
-    if (!isDefinition() && LinkageName)
-      if (auto *CT = dyn_cast_or_null<DICompositeType>(Scope))
-        if (CT->getRawIdentifier())
-          return hash_combine(LinkageName, Scope);
+    if (!isDefinition() && LinkageName && isa<DICompositeType>(Scope))
+      return hash_combine(LinkageName, ScopeLinkageName);
 
     // Intentionally computes the hash on a subset of the operands for
     // performance reason. The subset has to be significant enough to avoid
     // collision "most of the time". There is no correctness issue in case of
     // collision because of the full check above.
-    return hash_combine(Name, Scope, File, Type, Line);
+    return hash_combine(Name, ScopeLinkageName, File, Type, Line);
   }
 };
 
diff --git a/llvm/unittests/IR/DebugInfoTest.cpp b/llvm/unittests/IR/DebugInfoTest.cpp
index d06b979bf4a1c4..e93f917e146eab 100644
--- a/llvm/unittests/IR/DebugInfoTest.cpp
+++ b/llvm/unittests/IR/DebugInfoTest.cpp
@@ -20,6 +20,8 @@
 #include "llvm/IR/Verifier.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Transforms/Utils/Local.h"
+#include "../lib/IR/LLVMContextImpl.h"
+
 #include "gtest/gtest.h"
 
 using namespace llvm;
@@ -349,7 +351,7 @@ TEST(MetadataTest, OrderingOfDbgVariableRecords) {
   UseNewDbgInfoFormat = OldDbgValueMode;
 }
 
-TEST(DIBuiler, CreateFile) {
+TEST(DIBuilder, CreateFile) {
   LLVMContext Ctx;
   std::unique_ptr<Module> M(new Module("MyModule", Ctx));
   DIBuilder DIB(*M);
@@ -1184,4 +1186,52 @@ TEST(MetadataTest, DbgVariableRecordConversionRoutines) {
   UseNewDbgInfoFormat = false;
 }
 
+// Test that the hashing function for DISubprograms produce the same result
+// after replacing the temporary scope.
+TEST(DIBuilder, HashingDISubprogram) {
+  LLVMContext Ctx;
+  std::unique_ptr<Module> M(new Module("MyModule", Ctx));
+  DIBuilder DIB(*M);
+
+  DIFile *F = DIB.createFile("main.c", "/");
+  DICompileUnit *CU =
+      DIB.createCompileUnit(dwarf::DW_LANG_C, F, "Test", false, "", 0);
+
+  llvm::TempDIType ForwardDeclaredType =
+      llvm::TempDIType(DIB.createReplaceableCompositeType(
+          llvm::dwarf::DW_TAG_structure_type, "MyType", CU, F, 0, 0, 8, 8, {},
+          "UniqueIdentifier"));
+
+  // The hashing function is different for declarations and definitions, so
+  // create one of each.
+  DISubprogram *Declaration =
+      DIB.createMethod(ForwardDeclaredType.get(), "MethodName", "LinkageName",
+                       F, 0, DIB.createSubroutineType({}));
+
+  DISubprogram *Definition = DIB.createFunction(
+      ForwardDeclaredType.get(), "MethodName", "LinkageName", F, 0,
+      DIB.createSubroutineType({}), 0, DINode::FlagZero,
+      llvm::DISubprogram::SPFlagDefinition, nullptr, Declaration);
+
+  // Produce the hash with the temporary scope.
+  unsigned HashDeclaration =
+      MDNodeKeyImpl<DISubprogram>(Declaration).getHashValue();
+  unsigned HashDefinition =
+      MDNodeKeyImpl<DISubprogram>(Definition).getHashValue();
+
+  // Instantiate the real scope and replace the temporary one with it.
+  DICompositeType *Type = DIB.createStructType(CU, "MyType", F, 0, 8, 8, {}, {},
+                                               {}, 0, {}, "UniqueIdentifier");
+  DIB.replaceTemporary(std::move(ForwardDeclaredType), Type);
+
+  // Make sure the hashing is consistent.
+  unsigned HashDeclarationAfter =
+      MDNodeKeyImpl<DISubprogram>(Declaration).getHashValue();
+  unsigned HashDefinitionAfter =
+      MDNodeKeyImpl<DISubprogram>(Definition).getHashValue();
+
+  EXPECT_EQ(HashDeclaration, HashDeclarationAfter);
+  EXPECT_EQ(HashDefinition, HashDefinitionAfter);
+}
+
 } // end namespace

github-actions · 2024-05-01T20:26:01Z

✅ With the latest revision this PR passed the C/C++ code formatter.

bulbazord

Can't say I know a lot about this code, but your explanations and comments make sense to me. I left one suggestion but I'll leave the final approval to somebody more knowledgeable.

llvm/unittests/IR/DebugInfoTest.cpp

adrian-prantl

Conceptually LGTM, with minor nits.

adrian-prantl · 2024-05-02T16:42:59Z

llvm/lib/IR/LLVMContextImpl.h

-        if (CT->getRawIdentifier())
-          return hash_combine(LinkageName, Scope);
+    if (!isDefinition() && LinkageName &&
+        isa_and_nonnull<DICompositeType>(Scope))


This check is now redundant.

(just the isa_and_nonnull)

The isa_and_nonnull is there to respect the comment above.

// If this is a declaration inside an ODR type, only hash the type and the
// name.  Otherwise the hash will be stronger than
// MDNodeSubsetEqualImpl::isDeclarationOfODRMember().

Otherwise we'd use the earlier hash for every declaration.

I guess I could check whether ScopeLinkageName is set instead. Could a composite type have no linkage name?

adrian-prantl · 2024-05-02T16:43:29Z

llvm/lib/IR/LLVMContextImpl.h


    // Intentionally computes the hash on a subset of the operands for
    // performance reason. The subset has to be significant enough to avoid
    // collision "most of the time". There is no correctness issue in case of
    // collision because of the full check above.
-    return hash_combine(Name, Scope, File, Type, Line);
+    return hash_combine(Name, ScopeLinkageName, File, Type, Line);


I don't think there's value in hashing the ScopeLinkageName here, because it will be empty all the time now.

Also the File/Line should provide enough uniqueness.

> I don't think there's value in hashing the ScopeLinkageName here, because it will be empty all the time now.

Not necessarily: if the function is a definition, ScopeLinkageName can still be set.

adrian-prantl

(marking as request changes, since I requested changes)

dwblaikie

What about cases where the mangled name is not included - I guess, for subprogram definitions, maybe that only comes up in something like C?

dwblaikie · 2024-05-06T17:22:03Z

llvm/unittests/IR/DebugInfoTest.cpp

+  llvm::TempDIType ForwardDeclaredType =
+      llvm::TempDIType(DIB.createReplaceableCompositeType(
+          llvm::dwarf::DW_TAG_structure_type, "MyType", CU, F, 0, 0, 8, 8, {},
+          "UniqueIdentifier"));
+


Why is the type required when testing the hashing of a subprogram? To create some interesting referential structure, I guess. Perhaps a comment or two would be handy?

The type is required because I'm testing that replacing the subprogram's scope (in this case, the type) from the temporary type to the actual one should not change the hashing function's result.

The bug that I'm solving with this is that some DISubprograms inside a DenseSet would get duplicated when the set resized, because the hashing function was not consistent before and after their scope was replaced, from a temporary type to the actual type.
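
To make the failure mode concrete, here is a minimal sketch of how an unstable hash can lead to a duplicate entry in a hash set. It assumes a tiny hand-written chained set (`TinySet`, `Key`, and `insertTwiceWithHashChange` are hypothetical names) rather than LLVM's DenseSet, and it leaves out resizing. It only shows the underlying invariant violation: once a stored key's hash changes, a later probe looks in a different bucket and no longer finds it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical key whose hash can change while it is stored in the set.
struct Key {
  int Id;           // identity used for equality
  std::size_t Hash; // stands in for a hash derived from mutable state
};

// Minimal chained hash set; an illustration only, not LLVM's DenseSet.
class TinySet {
  std::vector<std::vector<const Key *>> Buckets;
  std::size_t Count = 0;

public:
  explicit TinySet(std::size_t N) : Buckets(N) {}

  // Inserts K unless an equal key is found in the bucket that K's
  // *current* hash selects.
  bool insert(const Key *K) {
    std::vector<const Key *> &B = Buckets[K->Hash % Buckets.size()];
    for (const Key *E : B)
      if (E->Id == K->Id)
        return false;
    B.push_back(K);
    ++Count;
    return true;
  }

  std::size_t size() const { return Count; }
};

// Inserts the same logical key before and after its hash changes and
// returns how many entries the set ends up holding.
inline std::size_t insertTwiceWithHashChange(std::size_t HashBefore,
                                             std::size_t HashAfter) {
  TinySet S(8);
  Key K{42, HashBefore};
  S.insert(&K);
  K.Hash = HashAfter; // e.g. the scope was replaced after insertion
  S.insert(&K);       // probes another bucket if the hash moved
  return S.size();
}
```

With a stable hash the second insertion is rejected and the set holds one entry. With a hash that moves from one bucket to another, the same key is stored twice.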

There's already a comment above the test with an explanation of why we need the type. Do you think I should add some clarification to it?

// Test that the hashing function for DISubprograms produce the same result
// after replacing the temporary scope.

Yeah, a few extra words might be nice. Since a DISubprogram is itself a scope, it wasn't clear to me that this referred to the enclosing scope, or that it specifically happens only for enclosing scopes that are types.

augusto2112 · 2024-05-07T18:11:21Z

> What about cases where the mangled name is not included - I guess, for subprogram definitions, maybe that only comes up in something like C?

@dwblaikie In that case the hash would be based off of the other fields (Name, File, Type, Line), so it should still be a good enough hashing function.
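
The idea that a hash over a subset of fields is still acceptable can be sketched with the standard library. This is an illustration under assumptions, not the LLVM code: `FnKey`, `SubsetHash`, and `countDistinct` are hypothetical names. The hash ignores one field, while equality compares all of them, so a collision costs an extra comparison but does not merge distinct keys.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical key with four fields; equality compares every field.
struct FnKey {
  std::string Name;
  std::string File;
  unsigned Line;
  unsigned Flags; // deliberately left out of the hash below

  bool operator==(const FnKey &O) const {
    return Name == O.Name && File == O.File && Line == O.Line &&
           Flags == O.Flags;
  }
};

// Hashes only Name, File and Line. Keys that differ only in Flags collide,
// which is harmless because the full equality check still separates them.
struct SubsetHash {
  std::size_t operator()(const FnKey &K) const {
    std::size_t H = std::hash<std::string>{}(K.Name);
    H = H * 31 + std::hash<std::string>{}(K.File);
    H = H * 31 + std::hash<unsigned>{}(K.Line);
    return H;
  }
};

// Inserts two colliding but unequal keys plus one true duplicate and
// returns the number of entries stored.
inline std::size_t countDistinct() {
  std::unordered_set<FnKey, SubsetHash> S;
  S.insert(FnKey{"f", "main.c", 1, 0});
  S.insert(FnKey{"f", "main.c", 1, 1}); // same hash, different key: kept
  S.insert(FnKey{"f", "main.c", 1, 0}); // equal to the first: rejected
  return S.size();
}
```

The trade-off is the one the code comment in the patch describes: the subset has to be distinctive enough to avoid collisions most of the time, and correctness rests on the full comparison.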

augusto2112 requested review from bulbazord and adrian-prantl May 1, 2024 20:23

llvmbot added the llvm:ir label May 1, 2024

augusto2112 force-pushed the hashing-disubprogram branch from 8bedd86 to bdae5ed Compare May 1, 2024 20:36

bulbazord reviewed May 1, 2024

augusto2112 force-pushed the hashing-disubprogram branch from bdae5ed to aafbf00 Compare May 1, 2024 21:21

augusto2112 force-pushed the hashing-disubprogram branch from aafbf00 to 0f47349 Compare May 1, 2024 21:24

adrian-prantl approved these changes May 2, 2024

adrian-prantl added the debuginfo label May 2, 2024

adrian-prantl requested changes May 2, 2024

augusto2112 requested a review from adrian-prantl May 2, 2024 18:16

adrian-prantl approved these changes May 2, 2024

augusto2112 merged commit dcf376a into llvm:main May 2, 2024

dwblaikie reviewed May 6, 2024
