Skip to content

[DebugInfo] Make DISubprogram's hashing always produce the same result #90770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 12 additions & 5 deletions llvm/lib/IR/LLVMContextImpl.h
Original file line number Diff line number Diff line change
Expand Up @@ -825,19 +825,26 @@ template <> struct MDNodeKeyImpl<DISubprogram> {
bool isDefinition() const { return SPFlags & DISubprogram::SPFlagDefinition; }

unsigned getHashValue() const {
// Use the Scope's linkage name instead of using the scope directly, as the
// scope may be a temporary one which can replaced, which would produce a
// different hash for the same DISubprogram.
llvm::StringRef ScopeLinkageName;
if (auto *CT = dyn_cast_or_null<DICompositeType>(Scope))
if (auto *ID = CT->getRawIdentifier())
ScopeLinkageName = ID->getString();

// If this is a declaration inside an ODR type, only hash the type and the
// name. Otherwise the hash will be stronger than
// MDNodeSubsetEqualImpl::isDeclarationOfODRMember().
if (!isDefinition() && LinkageName)
if (auto *CT = dyn_cast_or_null<DICompositeType>(Scope))
if (CT->getRawIdentifier())
return hash_combine(LinkageName, Scope);
if (!isDefinition() && LinkageName &&
isa_and_nonnull<DICompositeType>(Scope))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is now redundant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just the isa_and_nonnull)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isa_and_nonnull is there to respect the comment above.

    // If this is a declaration inside an ODR type, only hash the type and the
     // name.  Otherwise the hash will be stronger than
     // MDNodeSubsetEqualImpl::isDeclarationOfODRMember().

Otherwise we'd use the earlier hash for every declaration.

I guess I could check if ScopeLinkageName is set instead, could a composite type have no linkage name?

return hash_combine(LinkageName, ScopeLinkageName);

// Intentionally computes the hash on a subset of the operands for
// performance reason. The subset has to be significant enough to avoid
// collision "most of the time". There is no correctness issue in case of
// collision because of the full check above.
return hash_combine(Name, Scope, File, Type, Line);
return hash_combine(Name, ScopeLinkageName, File, Type, Line);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's value in hashing the ScopeLinkageName here, because it will be empty all the time now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the File/Line should provide enough uniqueness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's value in hashing the ScopeLinkageName here, because it will be empty all the time now.

Not necessarily, if the function is a definition ScopeLinkageName can still be set.

}
};

Expand Down
52 changes: 51 additions & 1 deletion llvm/unittests/IR/DebugInfoTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
//===----------------------------------------------------------------------===//

#include "llvm/IR/DebugInfo.h"
#include "../lib/IR/LLVMContextImpl.h"
#include "llvm/ADT/APSInt.h"
#include "llvm/AsmParser/Parser.h"
#include "llvm/IR/DIBuilder.h"
Expand All @@ -20,6 +21,7 @@
#include "llvm/IR/Verifier.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Transforms/Utils/Local.h"

#include "gtest/gtest.h"

using namespace llvm;
Expand Down Expand Up @@ -349,7 +351,7 @@ TEST(MetadataTest, OrderingOfDbgVariableRecords) {
UseNewDbgInfoFormat = OldDbgValueMode;
}

TEST(DIBuiler, CreateFile) {
TEST(DIBuilder, CreateFile) {
LLVMContext Ctx;
std::unique_ptr<Module> M(new Module("MyModule", Ctx));
DIBuilder DIB(*M);
Expand Down Expand Up @@ -1184,4 +1186,52 @@ TEST(MetadataTest, DbgVariableRecordConversionRoutines) {
UseNewDbgInfoFormat = false;
}

// Test that the hashing function for DISubprograms produce the same result
// after replacing the temporary scope.
TEST(DIBuilder, HashingDISubprogram) {
LLVMContext Ctx;
std::unique_ptr<Module> M = std::make_unique<Module>("MyModule", Ctx);
DIBuilder DIB(*M);

DIFile *F = DIB.createFile("main.c", "/");
DICompileUnit *CU =
DIB.createCompileUnit(dwarf::DW_LANG_C, F, "Test", false, "", 0);

llvm::TempDIType ForwardDeclaredType =
llvm::TempDIType(DIB.createReplaceableCompositeType(
llvm::dwarf::DW_TAG_structure_type, "MyType", CU, F, 0, 0, 8, 8, {},
"UniqueIdentifier"));

Comment on lines +1200 to +1204
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the type required when testing the hashing of a subprogram? To create some interesting referential structure, I guess - perhaps a comment(s) would be handy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type is required because I'm testing that replacing the subprogram's scope (in this case, the type) from the temporary type to the actual one should not change the hashing function's result.

The bug that I'm solving with this is that some DISubprograms inside a DenseSet would get duplicated when the set resized, because the hashing function was not consistent before and after replacing their scope from a temporary type with the actual type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a comment above the test with an explanation of why we need the type. Do you think I should add some clarification to it?

// Test that the hashing function for DISubprograms produce the same result
// after replacing the temporary scope.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, few extra words might be nice. Since a DISubprogram is itself a scope it wasn't clear to me this referred to the enclosing scope, or that it specifically happens for enclosing scopes that are types only.

// The hashing function is different for declarations and definitions, so
// create one of each.
DISubprogram *Declaration =
DIB.createMethod(ForwardDeclaredType.get(), "MethodName", "LinkageName",
F, 0, DIB.createSubroutineType({}));

DISubprogram *Definition = DIB.createFunction(
ForwardDeclaredType.get(), "MethodName", "LinkageName", F, 0,
DIB.createSubroutineType({}), 0, DINode::FlagZero,
llvm::DISubprogram::SPFlagDefinition, nullptr, Declaration);

// Produce the hash with the temporary scope.
unsigned HashDeclaration =
MDNodeKeyImpl<DISubprogram>(Declaration).getHashValue();
unsigned HashDefinition =
MDNodeKeyImpl<DISubprogram>(Definition).getHashValue();

// Instantiate the real scope and replace the temporary one with it.
DICompositeType *Type = DIB.createStructType(CU, "MyType", F, 0, 8, 8, {}, {},
{}, 0, {}, "UniqueIdentifier");
DIB.replaceTemporary(std::move(ForwardDeclaredType), Type);

// Now make sure the hashing is consistent.
unsigned HashDeclarationAfter =
MDNodeKeyImpl<DISubprogram>(Declaration).getHashValue();
unsigned HashDefinitionAfter =
MDNodeKeyImpl<DISubprogram>(Definition).getHashValue();

EXPECT_EQ(HashDeclaration, HashDeclarationAfter);
EXPECT_EQ(HashDefinition, HashDefinitionAfter);
}

} // end namespace