Skip to content

[DebugInfo] Make DISubprogram's hashing always produce the same result #90770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 2, 2024

Conversation

augusto2112
Copy link
Contributor

A DISubprogram's hashing algorithm takes into account its Scope. A Scope can be a temporary though which can be replaced later on during compilation. This means that the hashing algorithm for a DISubprogram could produce a different hash before/after the Scope has changed. Fix this by checking the Scope's linkage name instead, which should always be the same.

rdar://127004707

@llvmbot
Copy link
Member

llvmbot commented May 1, 2024

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-ir

Author: Augusto Noronha (augusto2112)

Changes

A DISubprogram's hashing algorithm takes into account its Scope. A Scope can be a temporary though which can be replaced later on during compilation. This means that the hashing algorithm for a DISubprogram could produce a different hash before/after the Scope has changed. Fix this by checking the Scope's linkage name instead, which should always be the same.

rdar://127004707


Full diff: https://github.com/llvm/llvm-project/pull/90770.diff

2 Files Affected:

  • (modified) llvm/lib/IR/LLVMContextImpl.h (+11-5)
  • (modified) llvm/unittests/IR/DebugInfoTest.cpp (+51-1)
diff --git a/llvm/lib/IR/LLVMContextImpl.h b/llvm/lib/IR/LLVMContextImpl.h
index 2713015c266c7e..98c7f7b66c89a7 100644
--- a/llvm/lib/IR/LLVMContextImpl.h
+++ b/llvm/lib/IR/LLVMContextImpl.h
@@ -825,19 +825,25 @@ template <> struct MDNodeKeyImpl<DISubprogram> {
   bool isDefinition() const { return SPFlags & DISubprogram::SPFlagDefinition; }
 
   unsigned getHashValue() const {
+    // Use the Scope's linkage name instead of using the scope directly, as the
+    // scope may be a temporary one which can replaced, which would produce a
+    // different hash for the same DISubprogram.
+    llvm::StringRef ScopeLinkageName;
+    if (auto *CT = dyn_cast_or_null<DICompositeType>(Scope))
+      if (auto *ID = CT->getRawIdentifier())
+        ScopeLinkageName = ID->getString();
+
     // If this is a declaration inside an ODR type, only hash the type and the
     // name.  Otherwise the hash will be stronger than
     // MDNodeSubsetEqualImpl::isDeclarationOfODRMember().
-    if (!isDefinition() && LinkageName)
-      if (auto *CT = dyn_cast_or_null<DICompositeType>(Scope))
-        if (CT->getRawIdentifier())
-          return hash_combine(LinkageName, Scope);
+    if (!isDefinition() && LinkageName && isa<DICompositeType>(Scope))
+      return hash_combine(LinkageName, ScopeLinkageName);
 
     // Intentionally computes the hash on a subset of the operands for
     // performance reason. The subset has to be significant enough to avoid
     // collision "most of the time". There is no correctness issue in case of
     // collision because of the full check above.
-    return hash_combine(Name, Scope, File, Type, Line);
+    return hash_combine(Name, ScopeLinkageName, File, Type, Line);
   }
 };
 
diff --git a/llvm/unittests/IR/DebugInfoTest.cpp b/llvm/unittests/IR/DebugInfoTest.cpp
index d06b979bf4a1c4..e93f917e146eab 100644
--- a/llvm/unittests/IR/DebugInfoTest.cpp
+++ b/llvm/unittests/IR/DebugInfoTest.cpp
@@ -20,6 +20,8 @@
 #include "llvm/IR/Verifier.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Transforms/Utils/Local.h"
+#include "../lib/IR/LLVMContextImpl.h"
+
 #include "gtest/gtest.h"
 
 using namespace llvm;
@@ -349,7 +351,7 @@ TEST(MetadataTest, OrderingOfDbgVariableRecords) {
   UseNewDbgInfoFormat = OldDbgValueMode;
 }
 
-TEST(DIBuiler, CreateFile) {
+TEST(DIBuilder, CreateFile) {
   LLVMContext Ctx;
   std::unique_ptr<Module> M(new Module("MyModule", Ctx));
   DIBuilder DIB(*M);
@@ -1184,4 +1186,52 @@ TEST(MetadataTest, DbgVariableRecordConversionRoutines) {
   UseNewDbgInfoFormat = false;
 }
 
+// Test that the hashing function for DISubprograms produce the same result
+// after replacing the temporary scope.
+TEST(DIBuilder, HashingDISubprogram) {
+  LLVMContext Ctx;
+  std::unique_ptr<Module> M(new Module("MyModule", Ctx));
+  DIBuilder DIB(*M);
+
+  DIFile *F = DIB.createFile("main.c", "/");
+  DICompileUnit *CU =
+      DIB.createCompileUnit(dwarf::DW_LANG_C, F, "Test", false, "", 0);
+
+  llvm::TempDIType ForwardDeclaredType =
+      llvm::TempDIType(DIB.createReplaceableCompositeType(
+          llvm::dwarf::DW_TAG_structure_type, "MyType", CU, F, 0, 0, 8, 8, {},
+          "UniqueIdentifier"));
+
+  // The hashing function is different for declarations and definitions, so
+  // create one of each.
+  DISubprogram *Declaration =
+      DIB.createMethod(ForwardDeclaredType.get(), "MethodName", "LinkageName",
+                       F, 0, DIB.createSubroutineType({}));
+
+  DISubprogram *Definition = DIB.createFunction(
+      ForwardDeclaredType.get(), "MethodName", "LinkageName", F, 0,
+      DIB.createSubroutineType({}), 0, DINode::FlagZero,
+      llvm::DISubprogram::SPFlagDefinition, nullptr, Declaration);
+
+  // Produce the hash with the temporary scope.
+  unsigned HashDeclaration =
+      MDNodeKeyImpl<DISubprogram>(Declaration).getHashValue();
+  unsigned HashDefinition =
+      MDNodeKeyImpl<DISubprogram>(Definition).getHashValue();
+
+  // Instantiate the real scope and replace the temporary one with it.
+  DICompositeType *Type = DIB.createStructType(CU, "MyType", F, 0, 8, 8, {}, {},
+                                               {}, 0, {}, "UniqueIdentifier");
+  DIB.replaceTemporary(std::move(ForwardDeclaredType), Type);
+
+  // Make sure the hashing is consistent.
+  unsigned HashDeclarationAfter =
+      MDNodeKeyImpl<DISubprogram>(Declaration).getHashValue();
+  unsigned HashDefinitionAfter =
+      MDNodeKeyImpl<DISubprogram>(Definition).getHashValue();
+
+  EXPECT_EQ(HashDeclaration, HashDeclarationAfter);
+  EXPECT_EQ(HashDefinition, HashDefinitionAfter);
+}
+
 } // end namespace

Copy link

github-actions bot commented May 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@augusto2112 augusto2112 force-pushed the hashing-disubprogram branch from 8bedd86 to bdae5ed Compare May 1, 2024 20:36
Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't say I know a lot about this code, but your explanations and comments make sense to me. I left one suggestion but I'll leave the final approval to somebody more knowledgable.

@augusto2112 augusto2112 force-pushed the hashing-disubprogram branch from bdae5ed to aafbf00 Compare May 1, 2024 21:21
A DISubprogram's hashing algorithm takes into account its Scope. A Scope
can be a temporary though which can be replaced later on during
compilation. This means that the hashing algorithm for a DISubprogram
could produce a different hash before/after the Scope has changed. Fix
this by checking the Scope's linkage name instead, which should always
be the same.

rdar://127004707
@augusto2112 augusto2112 force-pushed the hashing-disubprogram branch from aafbf00 to 0f47349 Compare May 1, 2024 21:24
Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually LGTM, with minor nits.

if (CT->getRawIdentifier())
return hash_combine(LinkageName, Scope);
if (!isDefinition() && LinkageName &&
isa_and_nonnull<DICompositeType>(Scope))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is now redundant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just the isa_and_nonnull)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isa_and_nonnull is there to respect the comment above.

    // If this is a declaration inside an ODR type, only hash the type and the
     // name.  Otherwise the hash will be stronger than
     // MDNodeSubsetEqualImpl::isDeclarationOfODRMember().

Otherwise we'd use the earlier hash for every declaration.

I guess I could check if ScopeLinkageName is set instead, could a composite type have no linkage name?


// Intentionally computes the hash on a subset of the operands for
// performance reason. The subset has to be significant enough to avoid
// collision "most of the time". There is no correctness issue in case of
// collision because of the full check above.
return hash_combine(Name, Scope, File, Type, Line);
return hash_combine(Name, ScopeLinkageName, File, Type, Line);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's value in hashing the ScopeLinkageName here, because it will be empty all the time now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the File/Line should provide enough uniqueness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's value in hashing the ScopeLinkageName here, because it will be empty all the time now.

Not necessarily, if the function is a definition ScopeLinkageName can still be set.

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(marking as request changes, since I requested changes)

@augusto2112 augusto2112 requested a review from adrian-prantl May 2, 2024 18:16
@augusto2112 augusto2112 merged commit dcf376a into llvm:main May 2, 2024
Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about cases where the mangled name is not included - I guess, for subprogram definitions, maybe that only comes up in something like C?

Comment on lines +1200 to +1204
llvm::TempDIType ForwardDeclaredType =
llvm::TempDIType(DIB.createReplaceableCompositeType(
llvm::dwarf::DW_TAG_structure_type, "MyType", CU, F, 0, 0, 8, 8, {},
"UniqueIdentifier"));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the type required when testing the hashing of a subprogram? To create some interesting referential structure, I guess - perhaps a comment(s) would be handy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type is required because I'm testing that replacing the subprogram's scope (in this case, the type) from the temporary type to the actual one should not change the hashing function's result.

The bug that I'm solving with this is that some DISubprograms inside a DenseSet would get duplicated when the set resized, because the hashing function was not consistent before and after replacing their scope from a temporary type with the actual type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a comment above the test with an explanation of why we need the type. Do you think I should add some clarification to it?

// Test that the hashing function for DISubprograms produce the same result
// after replacing the temporary scope.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, few extra words might be nice. Since a DISubprogram is itself a scope it wasn't clear to me this referred to the enclosing scope, or that it specifically happens for enclosing scopes that are types only.

@augusto2112
Copy link
Contributor Author

What about cases where the mangled name is not included - I guess, for subprogram definitions, maybe that only comes up in something like C?

@dwblaikie In that case the hash would be based off of the other fields (Name, File, Type, Line), so it should still be a good enough hashing function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants