-
Notifications
You must be signed in to change notification settings - Fork 14.2k
[DebugInfo] Make DISubprogram's hashing always produce the same result #90770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-llvm-ir Author: Augusto Noronha (augusto2112) ChangesA DISubprogram's hashing algorithm takes into account its Scope. A Scope can be a temporary though which can be replaced later on during compilation. This means that the hashing algorithm for a DISubprogram could produce a different hash before/after the Scope has changed. Fix this by checking the Scope's linkage name instead, which should always be the same. rdar://127004707 Full diff: https://github.com/llvm/llvm-project/pull/90770.diff 2 Files Affected:
diff --git a/llvm/lib/IR/LLVMContextImpl.h b/llvm/lib/IR/LLVMContextImpl.h
index 2713015c266c7e..98c7f7b66c89a7 100644
--- a/llvm/lib/IR/LLVMContextImpl.h
+++ b/llvm/lib/IR/LLVMContextImpl.h
@@ -825,19 +825,25 @@ template <> struct MDNodeKeyImpl<DISubprogram> {
bool isDefinition() const { return SPFlags & DISubprogram::SPFlagDefinition; }
unsigned getHashValue() const {
+ // Use the Scope's linkage name instead of using the scope directly, as the
+ // scope may be a temporary one which can replaced, which would produce a
+ // different hash for the same DISubprogram.
+ llvm::StringRef ScopeLinkageName;
+ if (auto *CT = dyn_cast_or_null<DICompositeType>(Scope))
+ if (auto *ID = CT->getRawIdentifier())
+ ScopeLinkageName = ID->getString();
+
// If this is a declaration inside an ODR type, only hash the type and the
// name. Otherwise the hash will be stronger than
// MDNodeSubsetEqualImpl::isDeclarationOfODRMember().
- if (!isDefinition() && LinkageName)
- if (auto *CT = dyn_cast_or_null<DICompositeType>(Scope))
- if (CT->getRawIdentifier())
- return hash_combine(LinkageName, Scope);
+ if (!isDefinition() && LinkageName && isa<DICompositeType>(Scope))
+ return hash_combine(LinkageName, ScopeLinkageName);
// Intentionally computes the hash on a subset of the operands for
// performance reason. The subset has to be significant enough to avoid
// collision "most of the time". There is no correctness issue in case of
// collision because of the full check above.
- return hash_combine(Name, Scope, File, Type, Line);
+ return hash_combine(Name, ScopeLinkageName, File, Type, Line);
}
};
diff --git a/llvm/unittests/IR/DebugInfoTest.cpp b/llvm/unittests/IR/DebugInfoTest.cpp
index d06b979bf4a1c4..e93f917e146eab 100644
--- a/llvm/unittests/IR/DebugInfoTest.cpp
+++ b/llvm/unittests/IR/DebugInfoTest.cpp
@@ -20,6 +20,8 @@
#include "llvm/IR/Verifier.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Transforms/Utils/Local.h"
+#include "../lib/IR/LLVMContextImpl.h"
+
#include "gtest/gtest.h"
using namespace llvm;
@@ -349,7 +351,7 @@ TEST(MetadataTest, OrderingOfDbgVariableRecords) {
UseNewDbgInfoFormat = OldDbgValueMode;
}
-TEST(DIBuiler, CreateFile) {
+TEST(DIBuilder, CreateFile) {
LLVMContext Ctx;
std::unique_ptr<Module> M(new Module("MyModule", Ctx));
DIBuilder DIB(*M);
@@ -1184,4 +1186,52 @@ TEST(MetadataTest, DbgVariableRecordConversionRoutines) {
UseNewDbgInfoFormat = false;
}
+// Test that the hashing function for DISubprograms produce the same result
+// after replacing the temporary scope.
+TEST(DIBuilder, HashingDISubprogram) {
+ LLVMContext Ctx;
+ std::unique_ptr<Module> M(new Module("MyModule", Ctx));
+ DIBuilder DIB(*M);
+
+ DIFile *F = DIB.createFile("main.c", "/");
+ DICompileUnit *CU =
+ DIB.createCompileUnit(dwarf::DW_LANG_C, F, "Test", false, "", 0);
+
+ llvm::TempDIType ForwardDeclaredType =
+ llvm::TempDIType(DIB.createReplaceableCompositeType(
+ llvm::dwarf::DW_TAG_structure_type, "MyType", CU, F, 0, 0, 8, 8, {},
+ "UniqueIdentifier"));
+
+ // The hashing function is different for declarations and definitions, so
+ // create one of each.
+ DISubprogram *Declaration =
+ DIB.createMethod(ForwardDeclaredType.get(), "MethodName", "LinkageName",
+ F, 0, DIB.createSubroutineType({}));
+
+ DISubprogram *Definition = DIB.createFunction(
+ ForwardDeclaredType.get(), "MethodName", "LinkageName", F, 0,
+ DIB.createSubroutineType({}), 0, DINode::FlagZero,
+ llvm::DISubprogram::SPFlagDefinition, nullptr, Declaration);
+
+ // Produce the hash with the temporary scope.
+ unsigned HashDeclaration =
+ MDNodeKeyImpl<DISubprogram>(Declaration).getHashValue();
+ unsigned HashDefinition =
+ MDNodeKeyImpl<DISubprogram>(Definition).getHashValue();
+
+ // Instantiate the real scope and replace the temporary one with it.
+ DICompositeType *Type = DIB.createStructType(CU, "MyType", F, 0, 8, 8, {}, {},
+ {}, 0, {}, "UniqueIdentifier");
+ DIB.replaceTemporary(std::move(ForwardDeclaredType), Type);
+
+ // Make sure the hashing is consistent.
+ unsigned HashDeclarationAfter =
+ MDNodeKeyImpl<DISubprogram>(Declaration).getHashValue();
+ unsigned HashDefinitionAfter =
+ MDNodeKeyImpl<DISubprogram>(Definition).getHashValue();
+
+ EXPECT_EQ(HashDeclaration, HashDeclarationAfter);
+ EXPECT_EQ(HashDefinition, HashDefinitionAfter);
+}
+
} // end namespace
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
8bedd86
to
bdae5ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't say I know a lot about this code, but your explanations and comments make sense to me. I left one suggestion but I'll leave the final approval to somebody more knowledgable.
bdae5ed
to
aafbf00
Compare
A DISubprogram's hashing algorithm takes into account its Scope. A Scope can be a temporary though which can be replaced later on during compilation. This means that the hashing algorithm for a DISubprogram could produce a different hash before/after the Scope has changed. Fix this by checking the Scope's linkage name instead, which should always be the same. rdar://127004707
aafbf00
to
0f47349
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually LGTM, with minor nits.
if (CT->getRawIdentifier()) | ||
return hash_combine(LinkageName, Scope); | ||
if (!isDefinition() && LinkageName && | ||
isa_and_nonnull<DICompositeType>(Scope)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is now redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just the isa_and_nonnull)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The isa_and_nonnull
is there to respect the comment above.
// If this is a declaration inside an ODR type, only hash the type and the
// name. Otherwise the hash will be stronger than
// MDNodeSubsetEqualImpl::isDeclarationOfODRMember().
Otherwise we'd use the earlier hash for every declaration.
I guess I could check if ScopeLinkageName
is set instead, could a composite type have no linkage name?
|
||
// Intentionally computes the hash on a subset of the operands for | ||
// performance reason. The subset has to be significant enough to avoid | ||
// collision "most of the time". There is no correctness issue in case of | ||
// collision because of the full check above. | ||
return hash_combine(Name, Scope, File, Type, Line); | ||
return hash_combine(Name, ScopeLinkageName, File, Type, Line); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's value in hashing the ScopeLinkageName here, because it will be empty all the time now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the File/Line should provide enough uniqueness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's value in hashing the ScopeLinkageName here, because it will be empty all the time now.
Not necessarily, if the function is a definition ScopeLinkageName
can still be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(marking as request changes, since I requested changes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about cases where the mangled name is not included - I guess, for subprogram definitions, maybe that only comes up in something like C?
llvm::TempDIType ForwardDeclaredType = | ||
llvm::TempDIType(DIB.createReplaceableCompositeType( | ||
llvm::dwarf::DW_TAG_structure_type, "MyType", CU, F, 0, 0, 8, 8, {}, | ||
"UniqueIdentifier")); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the type required when testing the hashing of a subprogram? To create some interesting referential structure, I guess - perhaps a comment(s) would be handy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type is required because I'm testing that replacing the subprogram's scope (in this case, the type) from the temporary type to the actual one should not change the hashing function's result.
The bug that I'm solving with this is that some DISubprograms inside a DenseSet would get duplicated when the set resized, because the hashing function was not consistent before and after replacing their scope from a temporary type with the actual type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a comment above the test with an explanation of why we need the type. Do you think I should add some clarification to it?
// Test that the hashing function for DISubprograms produce the same result
// after replacing the temporary scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, few extra words might be nice. Since a DISubprogram is itself a scope it wasn't clear to me this referred to the enclosing scope, or that it specifically happens for enclosing scopes that are types only.
@dwblaikie In that case the hash would be based off of the other fields (Name, File, Type, Line), so it should still be a good enough hashing function. |
A DISubprogram's hashing algorithm takes into account its Scope. A Scope can be a temporary though which can be replaced later on during compilation. This means that the hashing algorithm for a DISubprogram could produce a different hash before/after the Scope has changed. Fix this by checking the Scope's linkage name instead, which should always be the same.
rdar://127004707