Skip to content

[LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names #82394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions llvm/include/llvm/BinaryFormat/Dwarf.h
Original file line number Diff line number Diff line change
Expand Up @@ -613,6 +613,25 @@ enum AcceleratorTable {
DW_hash_function_djb = 0u
};

// Uniquify the string hashes and calculate the bucket count for the
// DWARF v5 Accelerator Table. NOTE: This function effectively consumes the
// 'hashes' input parameter.
inline uint32_t getDebugNamesBucketCount(MutableArrayRef<uint32_t> hashes,
uint32_t &uniqueHashCount) {
uint32_t BucketCount = 0;

sort(hashes);
uniqueHashCount = llvm::unique(hashes) - hashes.begin();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without returning the iterator from llvm::unique or otherwise communicating info about hashes out to the caller - I guess the caller isn't expected to use their copy of the data structure again? Might be worth detailing in the doc comment that this function effectively consumes the hashes array? Or is it somehow usable afterwards?

Copy link
Contributor Author

@cmtice cmtice Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your comment? Yes, I want the changes to hashes to make it back to the caller.

I thought making it a MutableArray meant that the changes would get back to the user? Originally I had it as a SmallVector &, but felipepivezan asked me to change it to a MutableArray instead. LLVM would not allow me to declare the MutableArray as a reference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM would not allow me to declare the MutableArray as a reference

MutableArrays are meant to be passed by copy, they are just a "non const ptr" + "size" abstraction.

I thought making it a MutableArray meant that the changes would get back to the user?

The changes make it back to the user, but this is not the important distinction between MutableArrayRef and SmallVector. The SmallVector API exposes resizing capabilities, which we are not using inside this function. So you communicate this intent by using the more restricted API.

That said, David's point is that the erase functions only moves elements around, it doesn't actually erase them (contrary to what the name suggests). Because of that, the caller of this function can't possibly have a useful vector after this function returns.

Copy link
Contributor Author

@cmtice cmtice Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I went back and double checked the callers; they do NOT expect a usable vector after this call, so that is fine. However I'm getting a compile time error using the MutableArrayRef, which I could use to help figuring out how to fix:

third_party/llvm/llvm-project/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp:43:50: error: non-const lvalue reference to type 'MutableArrayRef<uint32_t>' (aka 'MutableArrayRef') cannot bind to a value of unrelated type 'SmallVector<uint32_t, 0>' (aka 'SmallVector<unsigned int, 0>')
43 | llvm::dwarf::computeDebugNamesUniqueHashes(Uniques, UniqueHashCount);
| ^~~~~~~
third_party/llvm/llvm-project/llvm/include/llvm/BinaryFormat/Dwarf.h:618:74: note: passing argument to parameter 'hashes' here
618 | inline uint32_t computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> &hashes,
| ^
1 error generated.

('Uniques', at the call site, line 43, is a SmallVector...)

Copy link
Contributor

@felipepiovezan felipepiovezan Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

618 | inline uint32_t computeDebugNamesUniqueHashes(
    MutableArrayRef<uint32_t> &hashes,

Did you make it a reference? It should not be a reference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this function calculates and returns two different values: the bucket count (for which the function is named) and the unique hash count.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hindsight, we should have just returned a pair

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be adjusted post-merge? With C++17 structured bindings, returning multiple values via pairs/tuples should really be the norm, rather than using reference parameters as outputs, in my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to create another PR (probably in a day or two) to fix this. Do you know of any examples I could look at to see how this is done well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With C++17 structured bindings, returning multiple values via pairs/tuples should really be the norm, rather than using reference parameters as outputs, in my opinion.

FWIW I 100% agree with you, but every now and then this runs into the LLVM's community general hesitancy to use auto more aggressively.

I'll try to create another PR (probably in a day or two) to fix this. Do you know of any examples I could look at to see how this is done well?

Can't think of any example this early, but you could probably do something like:

inline std::pair<uint32_t, uint32_t> getDebugNamesBucketAndHashCount((MutableArrayRef<uint32_t> hashes) {
   ...
   return {BucketCount, HashCount};
}

And then on the call site you'd normally do something like:

auto [BuckerCount, HashCount] = getDebugNamesBucketAndHashCount(...)

However, since those are actually member variables, you can't really use a declaration like that. You'll need to store the pair in a temporary variable and assign to the member variables using .first and .second



if (uniqueHashCount > 1024)
BucketCount = uniqueHashCount / 4;
else if (uniqueHashCount > 16)
BucketCount = uniqueHashCount / 2;
else
BucketCount = std::max<uint32_t>(uniqueHashCount, 1);

return BucketCount;
}

// Constants for the GNU pubnames/pubtypes extensions supporting gdb index.
enum GDBIndexEntryKind {
GIEK_NONE,
Expand Down
14 changes: 2 additions & 12 deletions llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,22 +33,12 @@ using namespace llvm;

void AccelTableBase::computeBucketCount() {
// First get the number of unique hashes.
std::vector<uint32_t> Uniques;
SmallVector<uint32_t, 0> Uniques;
Uniques.reserve(Entries.size());
for (const auto &E : Entries)
Uniques.push_back(E.second.HashValue);
array_pod_sort(Uniques.begin(), Uniques.end());
std::vector<uint32_t>::iterator P =
std::unique(Uniques.begin(), Uniques.end());

UniqueHashCount = std::distance(Uniques.begin(), P);

if (UniqueHashCount > 1024)
BucketCount = UniqueHashCount / 4;
else if (UniqueHashCount > 16)
BucketCount = UniqueHashCount / 2;
else
BucketCount = std::max<uint32_t>(UniqueHashCount, 1);
BucketCount = llvm::dwarf::getDebugNamesBucketCount(Uniques, UniqueHashCount);
}

void AccelTableBase::finalize(AsmPrinter *Asm, StringRef Prefix) {
Expand Down