-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[IR] improve hashing quality for ValueInfo #132917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[IR] improve hashing quality for ValueInfo #132917
Conversation
@llvm/pr-subscribers-llvm-ir Author: Zhaoxuan Jiang (nocchijiang) ChangesModuleSummaryIndex::propagateAttributes() was observed to take about 25 minutes to complete on a ThinLTO project. Profiling revealed that the majority of this time was spent operating on the MarkedNonReadWriteOnly set. By moving the storage to a per-GlobalValueSummaryInfo basis, the execution time is dramatically reduced to less than 10 seconds. Full diff: https://github.com/llvm/llvm-project/pull/132917.diff 2 Files Affected:
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 2650456c49841..5b59bf23f0dad 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -165,6 +165,8 @@ struct alignas(8) GlobalValueSummaryInfo {
/// in the GlobalValueMap. Requires a vector in the case of multiple
/// COMDAT values of the same name.
GlobalValueSummaryList SummaryList;
+
+ mutable bool MarkedNonReadWriteOnly = false;
};
/// Map from global value GUID to corresponding summary structures. Use a
diff --git a/llvm/lib/IR/ModuleSummaryIndex.cpp b/llvm/lib/IR/ModuleSummaryIndex.cpp
index d9024b0a8673f..7206c12576855 100644
--- a/llvm/lib/IR/ModuleSummaryIndex.cpp
+++ b/llvm/lib/IR/ModuleSummaryIndex.cpp
@@ -199,9 +199,7 @@ bool ModuleSummaryIndex::isGUIDLive(GlobalValue::GUID GUID) const {
return false;
}
-static void
-propagateAttributesToRefs(GlobalValueSummary *S,
- DenseSet<ValueInfo> &MarkedNonReadWriteOnly) {
+static void propagateAttributesToRefs(GlobalValueSummary *S) {
// If reference is not readonly or writeonly then referenced summary is not
// read/writeonly either. Note that:
// - All references from GlobalVarSummary are conservatively considered as
@@ -213,9 +211,10 @@ propagateAttributesToRefs(GlobalValueSummary *S,
for (auto &VI : S->refs()) {
assert(VI.getAccessSpecifier() == 0 || isa<FunctionSummary>(S));
if (!VI.getAccessSpecifier()) {
- if (!MarkedNonReadWriteOnly.insert(VI).second)
+ if (VI.getRef()->second.MarkedNonReadWriteOnly)
continue;
- } else if (MarkedNonReadWriteOnly.contains(VI))
+ VI.getRef()->second.MarkedNonReadWriteOnly = true;
+ } else if (VI.getRef()->second.MarkedNonReadWriteOnly)
continue;
for (auto &Ref : VI.getSummaryList())
// If references to alias is not read/writeonly then aliasee
@@ -260,7 +259,6 @@ void ModuleSummaryIndex::propagateAttributes(
const DenseSet<GlobalValue::GUID> &GUIDPreservedSymbols) {
if (!PropagateAttrs)
return;
- DenseSet<ValueInfo> MarkedNonReadWriteOnly;
for (auto &P : *this) {
bool IsDSOLocal = true;
for (auto &S : P.second.SummaryList) {
@@ -299,7 +297,7 @@ void ModuleSummaryIndex::propagateAttributes(
GVS->setReadOnly(false);
GVS->setWriteOnly(false);
}
- propagateAttributesToRefs(S.get(), MarkedNonReadWriteOnly);
+ propagateAttributesToRefs(S.get());
// If the flag from any summary is false, the GV is not DSOLocal.
IsDSOLocal &= S->isDSOLocal();
|
@teresajohnson Could you take a look at this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the approach of putting temporary analysis information in the GlobalValueSummaryInfo struct - it doesn't belong there logically, and while probably minimal, it also adds size overhead to the structure across all use cases.
I'm surprised to hear that this change took a 25 minute indexing down to 10s. We build very large apps frequently and I haven't seen anything like this, so I think we'll want to look more at why this is taking so long in your case. From the profile, is most of the time spent on the set insert, or on the set contains check?
It would be slightly better to put this in the Flags in the ValueInfo itself (where we already have the read only / write only flags being checked here), but I'd like a better understanding of why this is behaving so extreme in your app first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple ideas for improving the efficiency of this code below. I'm curious as to what the index graph looks like in your case. This isn't a recursive algorithm, so there presumably are just an extreme number of references for it to take so long. But then I'm surprised that improving this would drop the indexing time so dramatically to complete in 10s, as the rest of the indexing contains other graph traversals such as computing imports, and those aren't free.
llvm/lib/IR/ModuleSummaryIndex.cpp
Outdated
@@ -199,9 +199,7 @@ bool ModuleSummaryIndex::isGUIDLive(GlobalValue::GUID GUID) const { | |||
return false; | |||
} | |||
|
|||
static void | |||
propagateAttributesToRefs(GlobalValueSummary *S, | |||
DenseSet<ValueInfo> &MarkedNonReadWriteOnly) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether it would be a bit more efficient to make this a DenseSet of the GlobalValue::GUID instead of the ValueInfo
llvm/lib/IR/ModuleSummaryIndex.cpp
Outdated
continue; | ||
} else if (MarkedNonReadWriteOnly.contains(VI)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check might be useless and possibly can be removed. We only get here and do the else if check if getAccessSpecifier() is non-zero, which means the VI is still either read or write only. But we only add a VI to the MarkedNonReadWriteOnly set if its getAccessSpecifier() is 0, and we only ever go towards 0 (marking things not read or write only), never the reverse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an assert in the else case though to confirm this assumption (that MarkedNonReadWriteOnly.contains(VI) is false).
Thanks for your quick feedback!
I don't like it neither. I will explain the reason below.
During my investigation of an LTO miscompile affecting two iOS applications with predominantly Objective-C code, I encountered the performance issue being addressed. The sample profile generated using I have noticed that you have made several suggestions in the review comments that I would definitely like to try and tell you how they would work.
I must admit that I am not familiar with the code related to ThinLTO. The reason I eventually chose to put the flag in |
I tested the 2 ideas from the review comments. Unfortunately neither of them make noticeable improvements and I believe Update: moving |
I managed to grab some interesting profiling data with a RelWithDebInfo BaselineMost time are spent in
|
On Thu, Mar 27, 2025 at 4:38 AM Zhaoxuan Jiang ***@***.***> wrote:
I managed to grab some interesting profiling data with a RelWithDebInfo
libLTO.dylib.
Baseline
image.png (view on web)
<https://github.com/user-attachments/assets/def0f19c-b6b0-43de-879a-b3febde4ded0>
Most time are spent in insert:
image.png (view on web)
<https://github.com/user-attachments/assets/f3063d33-6ef7-4fe2-8cd8-98355fc741bb>
19.52 min 68.9% 0 s llvm::detail::DenseSetImpl<llvm::ValueInfo, llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::DenseMapInfo<llvm::ValueInfo, void>>::insert(llvm::ValueInfo const&)
1.15 min 4.0% 0 s llvm::detail::DenseSetImpl<llvm::ValueInfo, llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::DenseMapInfo<llvm::ValueInfo, void>>::contains(llvm::ValueInfo const&) const
MarkedNonReadWriteOnly in GlobalValueSummaryInfo
image.png (view on web)
<https://github.com/user-attachments/assets/06f5af9d-a73d-478c-9288-e34bb74d9400>
MarkedNonReadWriteOnly in ValueInfo
image.png (view on web)
<https://github.com/user-attachments/assets/4c20c3e2-7c65-44cd-9b49-53fd80e875d1>
image.png (view on web)
<https://github.com/user-attachments/assets/c5a4ab1d-8fbf-4cb7-90ab-aec763c34188>
I strongly suspect that the relationship between ValueInfo and
GlobalValueSummaryInfo is not strictly 1-to-1, otherwise the 10x
performance difference cannot be explained. This is also the reason why I
initially decided to declare the flag in GlobalValueSummaryInfo:
DenseMapInfo<ValueInfo> utilizes the pointer identity of
GlobalValueSummaryMapTy entry, so the MarkedNonReadWriteOnly set was
effectively marking the underlying GlobalValueSummaryInfo rather than
ValueInfo. However, I am uncertain whether this is the expected behavior.
—
Reply to this email directly, view it on GitHub
<#132917 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AE37ZQ5376UH7TNMBDDCGWT2WPPL5AVCNFSM6AAAAABZXJM6NGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONJXG4ZTCNZYGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
[image: nocchijiang]*nocchijiang* left a comment
(llvm/llvm-project#132917)
<#132917 (comment)>
I managed to grab some interesting profiling data with a RelWithDebInfo
libLTO.dylib.
Baseline
image.png (view on web)
<https://github.com/user-attachments/assets/def0f19c-b6b0-43de-879a-b3febde4ded0>
Most time are spent in insert:
image.png (view on web)
<https://github.com/user-attachments/assets/f3063d33-6ef7-4fe2-8cd8-98355fc741bb>
19.52 min 68.9% 0 s llvm::detail::DenseSetImpl<llvm::ValueInfo, llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::DenseMapInfo<llvm::ValueInfo, void>>::insert(llvm::ValueInfo const&)
1.15 min 4.0% 0 s llvm::detail::DenseSetImpl<llvm::ValueInfo, llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::DenseMapInfo<llvm::ValueInfo, void>>::contains(llvm::ValueInfo const&) const
MarkedNonReadWriteOnly in GlobalValueSummaryInfo
image.png (view on web)
<https://github.com/user-attachments/assets/06f5af9d-a73d-478c-9288-e34bb74d9400>
MarkedNonReadWriteOnly in ValueInfo
image.png (view on web)
<https://github.com/user-attachments/assets/4c20c3e2-7c65-44cd-9b49-53fd80e875d1>
image.png (view on web)
<https://github.com/user-attachments/assets/c5a4ab1d-8fbf-4cb7-90ab-aec763c34188>
I strongly suspect that the relationship between ValueInfo and
GlobalValueSummaryInfo is not strictly 1-to-1, otherwise the 10x
performance difference cannot be explained. This is also the reason why I
initially decided to declare the flag in GlobalValueSummaryInfo:
DenseMapInfo<ValueInfo> utilizes the pointer identity of
GlobalValueSummaryMapTy entry, so the MarkedNonReadWriteOnly set was
effectively marking the underlying GlobalValueSummaryInfo rather than
ValueInfo. However, I am uncertain whether this is the expected behavior.
Sorry for the noise on this suggestion - thinking it through more it
doesn't make a lot of sense to put in the ValueInfo. The
GlobalValueSummaryInfo structs are the values in the index map, and each
ValueInfo is essentially a pointer to one of them, along with a few bits,
and those bits can vary by the characteristics of the reference. They are
also meant to be immutable after building the summary. As you note the
ValueInfo comparisons only look at the pointer value, not these bits, which
is why a set of them works to unique the entries to the underlying summary
index map entries.
I need to think about this some more, as stashing temporary analysis
results in these structures isn't ideal. I'm still trying to understand how
this code could be so slow, compared to everything else we do in the
indexing step. Since there can be at most one entry in the
MarkedNonReadWriteOnly set per summary index map (`GlobalValueSummaryMapTy
GlobalValueMap`) entry, the actual inserting in theory shouldn't be
massively slower than building that map in the first place. So it must just
be an extreme number of lookups during insert attempts that is causing it
to be so slow? Can you collect some stats: how big the set is at the end,
how many times propagateAttributesToRefs is called, how many total refs()
get processed in that loop, and how many of those have
`!VI.getAccessSpecifier()`?
—
… Reply to this email directly, view it on GitHub
<#132917 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AE37ZQ5376UH7TNMBDDCGWT2WPPL5AVCNFSM6AAAAABZXJM6NGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONJXG4ZTCNZYGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Teresa Johnson | Software Engineer | ***@***.*** |
|
The numbers are:
|
Inserting 50M values into a hash table shouldn't really take an appreciable amount of time. I wonder whether you're hitting some kind of catastrophic hash collision issue and complexity degenerates quadratically? The implementation is
Which is already somewhat bad because the low three bits of I.getRef() are guaranteed to be zero, so you could try whether adding a |
Applying the shift does help, but it is still far from acceptable.
Wrapping the shifted value in
I will update the patch as per @nikic 's great suggestion. |
44fa6fd
to
3d6ee17
Compare
@@ -300,7 +301,9 @@ template <> struct DenseMapInfo<ValueInfo> { | |||
assert(isSpecialKey(L) || isSpecialKey(R) || (L.haveGVs() == R.haveGVs())); | |||
return L.getRef() == R.getRef(); | |||
} | |||
static unsigned getHashValue(ValueInfo I) { return (uintptr_t)I.getRef(); } | |||
static unsigned getHashValue(ValueInfo I) { | |||
return hash_value(((uintptr_t)I.getRef() >> ValueInfo::NumFlagsBit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be a need to use the shift if you're using hash_value. The shift helps because the low bits of the hash get used, but if you're using hash_value, it makes no difference whether the low or high bits are zero, everything will get thoroughly mixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I dropped the shift in the latest patch.
3d6ee17
to
87f2f66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic nice find!
This could presumably help other places that use a DenseSet/Map of ValueInfo. Perhaps the description should be about improving the hashing of ValueInfo and note the massive improvement to propagateAttributes in your case in the summary.
The current hashing quality for ValueInfo is poor because it uses pointers as the hash value, which can negatively impact performance in various places that use a DenseSet/Map of ValueInfo. In one observed case, ModuleSummaryIndex::propagateAttributes() was taking about 25 minutes to complete on a ThinLTO application. Profiling revealed that the majority of this time was spent operating on the MarkedNonReadWriteOnly set. With the improved hashing, the execution time for propagateAttributes is dramatically reduced to less than 10 seconds.
87f2f66
to
007929b
Compare
Updated the title and description of the PR and commit message as suggested by @teresajohnson. Could you also help me merge this PR since I don't have commit access? |
Done! |
The current hashing quality for `ValueInfo` is poor because it uses pointers as the hash value, which can negatively impact performance in various places that use a `DenseSet`/`Map` of `ValueInfo`. In one observed case, `ModuleSummaryIndex::propagateAttributes()` was taking about 25 minutes to complete on a ThinLTO application. Profiling revealed that the majority of this time was spent operating on the `MarkedNonReadWriteOnly` set. With the improved hashing, the execution time for `propagateAttributes` is dramatically reduced to less than 10 seconds.
The current hashing quality for `ValueInfo` is poor because it uses pointers as the hash value, which can negatively impact performance in various places that use a `DenseSet`/`Map` of `ValueInfo`. In one observed case, `ModuleSummaryIndex::propagateAttributes()` was taking about 25 minutes to complete on a ThinLTO application. Profiling revealed that the majority of this time was spent operating on the `MarkedNonReadWriteOnly` set. With the improved hashing, the execution time for `propagateAttributes` is dramatically reduced to less than 10 seconds.
The current hashing quality for
ValueInfo
is poor because it uses pointers as the hash value, which can negatively impact performance in various places that use aDenseSet
/Map
ofValueInfo
. In one observed case,ModuleSummaryIndex::propagateAttributes()
was taking about 25 minutes to complete on a ThinLTO application. Profiling revealed that the majority of this time was spent operating on theMarkedNonReadWriteOnly
set.With the improved hashing, the execution time for
propagateAttributes
is dramatically reduced to less than 10 seconds.