Skip to content

[IR] improve hashing quality for ValueInfo #132917

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

nocchijiang
Copy link
Contributor

@nocchijiang nocchijiang commented Mar 25, 2025

The current hashing quality for ValueInfo is poor because it uses pointers as the hash value, which can negatively impact performance in various places that use a DenseSet/Map of ValueInfo. In one observed case, ModuleSummaryIndex::propagateAttributes() was taking about 25 minutes to complete on a ThinLTO application. Profiling revealed that the majority of this time was spent operating on the MarkedNonReadWriteOnly set.

With the improved hashing, the execution time for propagateAttributes is dramatically reduced to less than 10 seconds.

@llvmbot
Copy link
Member

llvmbot commented Mar 25, 2025

@llvm/pr-subscribers-llvm-ir

Author: Zhaoxuan Jiang (nocchijiang)

Changes

ModuleSummaryIndex::propagateAttributes() was observed to take about 25 minutes to complete on a ThinLTO project. Profiling revealed that the majority of this time was spent operating on the MarkedNonReadWriteOnly set.

By moving the storage to a per-GlobalValueSummaryInfo basis, the execution time is dramatically reduced to less than 10 seconds.


Full diff: https://github.com/llvm/llvm-project/pull/132917.diff

2 Files Affected:

  • (modified) llvm/include/llvm/IR/ModuleSummaryIndex.h (+2)
  • (modified) llvm/lib/IR/ModuleSummaryIndex.cpp (+5-7)
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 2650456c49841..5b59bf23f0dad 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -165,6 +165,8 @@ struct alignas(8) GlobalValueSummaryInfo {
   /// in the GlobalValueMap. Requires a vector in the case of multiple
   /// COMDAT values of the same name.
   GlobalValueSummaryList SummaryList;
+
+  mutable bool MarkedNonReadWriteOnly = false;
 };
 
 /// Map from global value GUID to corresponding summary structures. Use a
diff --git a/llvm/lib/IR/ModuleSummaryIndex.cpp b/llvm/lib/IR/ModuleSummaryIndex.cpp
index d9024b0a8673f..7206c12576855 100644
--- a/llvm/lib/IR/ModuleSummaryIndex.cpp
+++ b/llvm/lib/IR/ModuleSummaryIndex.cpp
@@ -199,9 +199,7 @@ bool ModuleSummaryIndex::isGUIDLive(GlobalValue::GUID GUID) const {
   return false;
 }
 
-static void
-propagateAttributesToRefs(GlobalValueSummary *S,
-                          DenseSet<ValueInfo> &MarkedNonReadWriteOnly) {
+static void propagateAttributesToRefs(GlobalValueSummary *S) {
   // If reference is not readonly or writeonly then referenced summary is not
   // read/writeonly either. Note that:
   // - All references from GlobalVarSummary are conservatively considered as
@@ -213,9 +211,10 @@ propagateAttributesToRefs(GlobalValueSummary *S,
   for (auto &VI : S->refs()) {
     assert(VI.getAccessSpecifier() == 0 || isa<FunctionSummary>(S));
     if (!VI.getAccessSpecifier()) {
-      if (!MarkedNonReadWriteOnly.insert(VI).second)
+      if (VI.getRef()->second.MarkedNonReadWriteOnly)
         continue;
-    } else if (MarkedNonReadWriteOnly.contains(VI))
+      VI.getRef()->second.MarkedNonReadWriteOnly = true;
+    } else if (VI.getRef()->second.MarkedNonReadWriteOnly)
       continue;
     for (auto &Ref : VI.getSummaryList())
       // If references to alias is not read/writeonly then aliasee
@@ -260,7 +259,6 @@ void ModuleSummaryIndex::propagateAttributes(
     const DenseSet<GlobalValue::GUID> &GUIDPreservedSymbols) {
   if (!PropagateAttrs)
     return;
-  DenseSet<ValueInfo> MarkedNonReadWriteOnly;
   for (auto &P : *this) {
     bool IsDSOLocal = true;
     for (auto &S : P.second.SummaryList) {
@@ -299,7 +297,7 @@ void ModuleSummaryIndex::propagateAttributes(
           GVS->setReadOnly(false);
           GVS->setWriteOnly(false);
         }
-      propagateAttributesToRefs(S.get(), MarkedNonReadWriteOnly);
+      propagateAttributesToRefs(S.get());
 
       // If the flag from any summary is false, the GV is not DSOLocal.
       IsDSOLocal &= S->isDSOLocal();

@nocchijiang
Copy link
Contributor Author

@teresajohnson Could you take a look at this patch?

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the approach of putting temporary analysis information in the GlobalValueSummaryInfo struct - it doesn't belong there logically, and while probably minimal, it also adds size overhead to the structure across all use cases.

I'm surprised to hear that this change took a 25 minute indexing down to 10s. We build very large apps frequently and I haven't seen anything like this, so I think we'll want to look more at why this is taking so long in your case. From the profile, is most of the time spent on the set insert, or on the set contains check?

It would be slightly better to put this in the Flags in the ValueInfo itself (where we already have the read only / write only flags being checked here), but I'd like a better understanding of why this is behaving so extreme in your app first.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple ideas for improving the efficiency of this code below. I'm curious as to what the index graph looks like in your case. This isn't a recursive algorithm, so there presumably are just an extreme number of references for it to take so long. But then I'm surprised that improving this would drop the indexing time so dramatically to complete in 10s, as the rest of the indexing contains other graph traversals such as computing imports, and those aren't free.

@@ -199,9 +199,7 @@ bool ModuleSummaryIndex::isGUIDLive(GlobalValue::GUID GUID) const {
return false;
}

static void
propagateAttributesToRefs(GlobalValueSummary *S,
DenseSet<ValueInfo> &MarkedNonReadWriteOnly) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether it would be a bit more efficient to make this a DenseSet of the GlobalValue::GUID instead of the ValueInfo

continue;
} else if (MarkedNonReadWriteOnly.contains(VI))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check might be useless and possibly can be removed. We only get here and do the else if check if getAccessSpecifier() is non-zero, which means the VI is still either read or write only. But we only add a VI to the MarkedNonReadWriteOnly set if its getAccessSpecifier() is 0, and we only ever go towards 0 (marking things not read or write only), never the reverse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add an assert in the else case though to confirm this assumption (that MarkedNonReadWriteOnly.contains(VI) is false).

@nocchijiang
Copy link
Contributor Author

Thanks for your quick feedback!

I don't like the approach of putting temporary analysis information in the GlobalValueSummaryInfo struct - it doesn't belong there logically, and while probably minimal, it also adds size overhead to the structure across all use cases.

I don't like it neither. I will explain the reason below.

I'm surprised to hear that this change took a 25 minute indexing down to 10s. We build very large apps frequently and I haven't seen anything like this, so I think we'll want to look more at why this is taking so long in your case. From the profile, is most of the time spent on the set insert, or on the set contains check?

During my investigation of an LTO miscompile affecting two iOS applications with predominantly Objective-C code, I encountered the performance issue being addressed.

The sample profile generated using Instruments.app from Xcode revealed only a single operation grow under propagateAttributes. However, analyzing a few sample stack traces from Console.app, I observed that the leaf frames consistently manipulating the MarkedNonReadWriteOnly set. Also, I recorded the profile with a release version of libLTO.dylib as the debug version takes forever to complete. Therefore, it is challenging to pinpoint the specific operation that consumes more time.

I have noticed that you have made several suggestions in the review comments that I would definitely like to try and tell you how they would work.

It would be slightly better to put this in the Flags in the ValueInfo itself (where we already have the read only / write only flags being checked here), but I'd like a better understanding of why this is behaving so extreme in your app first.

I must admit that I am not familiar with the code related to ThinLTO. The reason I eventually chose to put the flag in GlobalValueSummaryInfo is that DenseMapInfo<ValueInfo> utilizes the pointer identity of GlobalValueSummaryMapTy entry. If you believe it is acceptable to store the data directly within ValueInfo, I have no objections to that.

@nocchijiang
Copy link
Contributor Author

nocchijiang commented Mar 26, 2025

I tested the 2 ideas from the review comments. Unfortunately neither of them make noticeable improvements and I believe DenseSet overhead has to be avoided with a reasonable cost. Moving MarkedNonReadWriteOnly to ValueInfo and redesigning the inline flags to fit RO/WO/MarkedNonRWO into a 4-case enum results in a noticeable performance hit - now it takes 1 minute to complete propagateAttributes(). It appears that we must sacrifice some memory in order to enhance the performance. Thoughts?

Update: moving MarkedNonReadWriteOnly to ValueInfo without touching existing data structure results in the same performance hit.

@nocchijiang
Copy link
Contributor Author

I managed to grab some interesting profiling data with a RelWithDebInfo libLTO.dylib.

Baseline

image

Most time are spent in insert:

image

19.52 min   68.9%        0 s                             llvm::detail::DenseSetImpl<llvm::ValueInfo, llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::DenseMapInfo<llvm::ValueInfo, void>>::insert(llvm::ValueInfo const&)
1.15 min    4.0%        0 s                             llvm::detail::DenseSetImpl<llvm::ValueInfo, llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::DenseMapInfo<llvm::ValueInfo, void>>::contains(llvm::ValueInfo const&) const

MarkedNonReadWriteOnly in GlobalValueSummaryInfo

image

MarkedNonReadWriteOnly in ValueInfo

image

image

I strongly suspect that the relationship between ValueInfo and GlobalValueSummaryInfo is not strictly 1-to-1, otherwise the 10x performance difference cannot be explained. This is also the reason why I initially decided to declare the flag in GlobalValueSummaryInfo: DenseMapInfo<ValueInfo> utilizes the pointer identity of GlobalValueSummaryMapTy entry, so the MarkedNonReadWriteOnly set was effectively marking the underlying GlobalValueSummaryInfo rather than ValueInfo. However, I am uncertain whether this is the expected behavior.

@teresajohnson
Copy link
Contributor

teresajohnson commented Mar 27, 2025 via email

@nocchijiang
Copy link
Contributor Author

The numbers are:

  • set size: 23,373,881 at the end
  • propagateAttributesToRefs is called 24,528,510 times
  • 54,272,534 refs() get processed in that loop
  • 48,428,967 of these have !VI.getAccessSpecifier()

@nikic
Copy link
Contributor

nikic commented Apr 4, 2025

Inserting 50M values into a hash table shouldn't really take an appreciable amount of time. I wonder whether you're hitting some kind of catastrophic hash collision issue and complexity degenerates quadratically?

The implementation is

static unsigned getHashValue(ValueInfo I) { return (uintptr_t)I.getRef(); }

Which is already somewhat bad because the low three bits of I.getRef() are guaranteed to be zero, so you could try whether adding a >> 3 improves things a bit. And then maybe try whether wrapping it in a hash_value() helps even more.

@nocchijiang
Copy link
Contributor Author

Applying the shift does help, but it is still far from acceptable.

Weight	Self Weight		Symbol Name
9.94 min   56.3%	929.00 ms	 	llvm::ModuleSummaryIndex::propagateAttributes(llvm::DenseSet<unsigned long long, llvm::DenseMapInfo<unsigned long long, void>> const&)
9.79 min   55.5%	1.96 s	 	 propagateAttributesToRefs(llvm::GlobalValueSummary*, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>>&)
8.93 min   50.6%	0 s	 	  llvm::detail::DenseSetImpl<llvm::ValueInfo, llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::DenseMapInfo<llvm::ValueInfo, void>>::insert(llvm::ValueInfo const&)
8.93 min   50.6%	0 s	 	   std::__1::pair<llvm::DenseMapIterator<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>, false>, bool> llvm::DenseMapBase<llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>::try_emplace<llvm::detail::DenseSetEmpty&>(llvm::ValueInfo const&, llvm::detail::DenseSetEmpty&)
8.93 min   50.6%	0 s	 	    bool llvm::DenseMapBase<llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>::LookupBucketFor<llvm::ValueInfo>(llvm::ValueInfo const&, llvm::detail::DenseSetPair<llvm::ValueInfo>*&)
8.93 min   50.6%	8.93 min	 	     bool llvm::DenseMapBase<llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>::LookupBucketFor<llvm::ValueInfo>(llvm::ValueInfo const&, llvm::detail::DenseSetPair<llvm::ValueInfo> const*&) const

Wrapping the shifted value in hash_value() does the trick:

Weight	Self Weight		Symbol Name
5.13 s    1.1%	577.00 ms	 	llvm::ModuleSummaryIndex::propagateAttributes(llvm::DenseSet<unsigned long long, llvm::DenseMapInfo<unsigned long long, void>> const&)
2.54 s    0.5%	1.57 s	 	 propagateAttributesToRefs(llvm::GlobalValueSummary*, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>>&)
456.00 ms    0.1%	0 s	 	  llvm::detail::DenseSetImpl<llvm::ValueInfo, llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::DenseMapInfo<llvm::ValueInfo, void>>::contains(llvm::ValueInfo const&) const
456.00 ms    0.1%	50.00 ms	 	   llvm::DenseMapBase<llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>::find(llvm::ValueInfo const&) const
406.00 ms    0.0%	393.00 ms	 	    bool llvm::DenseMapBase<llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>::LookupBucketFor<llvm::ValueInfo>(llvm::ValueInfo const&, llvm::detail::DenseSetPair<llvm::ValueInfo> const*&) const
12.00 ms    0.0%	2.00 ms	 	     llvm::DenseMapBase<llvm::DenseMap<llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>, llvm::ValueInfo, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::ValueInfo, void>, llvm::detail::DenseSetPair<llvm::ValueInfo>>::getHashValue(llvm::ValueInfo const&)
10.00 ms    0.0%	0 s	 	      llvm::DenseMapInfo<llvm::ValueInfo, void>::getHashValue(llvm::ValueInfo)
10.00 ms    0.0%	0 s	 	       std::__1::enable_if<is_integral_or_enum<unsigned long>::value, llvm::hash_code>::type llvm::hash_value<unsigned long>(unsigned long)

I will update the patch as per @nikic 's great suggestion.

@nocchijiang nocchijiang force-pushed the propagate_attributes_takes_forever_to_run branch from 44fa6fd to 3d6ee17 Compare April 8, 2025 09:24
@@ -300,7 +301,9 @@ template <> struct DenseMapInfo<ValueInfo> {
assert(isSpecialKey(L) || isSpecialKey(R) || (L.haveGVs() == R.haveGVs()));
return L.getRef() == R.getRef();
}
static unsigned getHashValue(ValueInfo I) { return (uintptr_t)I.getRef(); }
static unsigned getHashValue(ValueInfo I) {
return hash_value(((uintptr_t)I.getRef() >> ValueInfo::NumFlagsBit));
Copy link
Contributor

@nikic nikic Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be a need to use the shift if you're using hash_value. The shift helps because the low bits of the hash get used, but if you're using hash_value, it makes no difference whether the low or high bits are zero, everything will get thoroughly mixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I dropped the shift in the latest patch.

@nocchijiang nocchijiang force-pushed the propagate_attributes_takes_forever_to_run branch from 3d6ee17 to 87f2f66 Compare April 8, 2025 09:47
Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic nice find!

This could presumably help other places that use a DenseSet/Map of ValueInfo. Perhaps the description should be about improving the hashing of ValueInfo and note the massive improvement to propagateAttributes in your case in the summary.

The current hashing quality for ValueInfo is poor because it uses
pointers as the hash value, which can negatively impact performance in
various places that use a DenseSet/Map of ValueInfo. In one observed
case, ModuleSummaryIndex::propagateAttributes() was taking about 25
minutes to complete on a ThinLTO application. Profiling revealed that
the majority of this time was spent operating on the
MarkedNonReadWriteOnly set.

With the improved hashing, the execution time for propagateAttributes is
dramatically reduced to less than 10 seconds.
@nocchijiang nocchijiang force-pushed the propagate_attributes_takes_forever_to_run branch from 87f2f66 to 007929b Compare April 9, 2025 00:43
@nocchijiang nocchijiang changed the title [ThinLTO] optimize propagateAttributes performance [IR] improve hashing quality for ValueInfo Apr 9, 2025
@nocchijiang
Copy link
Contributor Author

Updated the title and description of the PR and commit message as suggested by @teresajohnson. Could you also help me merge this PR since I don't have commit access?

@teresajohnson teresajohnson merged commit e24c9e7 into llvm:main Apr 9, 2025
11 checks passed
@teresajohnson
Copy link
Contributor

Updated the title and description of the PR and commit message as suggested by @teresajohnson. Could you also help me merge this PR since I don't have commit access?

Done!

AllinLeeYL pushed a commit to AllinLeeYL/llvm-project that referenced this pull request Apr 10, 2025
The current hashing quality for `ValueInfo` is poor because it uses
pointers as the hash value, which can negatively impact performance in
various places that use a `DenseSet`/`Map` of `ValueInfo`. In one
observed case, `ModuleSummaryIndex::propagateAttributes()` was taking
about 25 minutes to complete on a ThinLTO application. Profiling
revealed that the majority of this time was spent operating on the
`MarkedNonReadWriteOnly` set.

With the improved hashing, the execution time for `propagateAttributes`
is dramatically reduced to less than 10 seconds.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
The current hashing quality for `ValueInfo` is poor because it uses
pointers as the hash value, which can negatively impact performance in
various places that use a `DenseSet`/`Map` of `ValueInfo`. In one
observed case, `ModuleSummaryIndex::propagateAttributes()` was taking
about 25 minutes to complete on a ThinLTO application. Profiling
revealed that the majority of this time was spent operating on the
`MarkedNonReadWriteOnly` set.

With the improved hashing, the execution time for `propagateAttributes`
is dramatically reduced to less than 10 seconds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants