Skip to content

[TypeProf][PGO]Support skipping vtable comparisons for a class and its derived ones #110575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 2, 2024

Conversation

mingmingl-llvm
Copy link
Contributor

@mingmingl-llvm mingmingl-llvm commented Sep 30, 2024

Performance critical core libraries could be highly-optimized for arch or micro-arch features. For instance, the absl crc library specializes different templated classes among different hardwares [1]. In a practical setting, it's likely that instrumented profiles are collected on one type of machine and used to optimize binaries that run on multiple types of hardwares.

While this kind of specialization is rare in terms of lines of code, compiler can do a better job to skip vtable-based ICP.

  • The per-class Extend implementation is arch-specific as well. If an instrumented profile is collected on one arch and applied to another arch where Extend implementation is different, Extend might be regarded as unlikely function in the latter case. ABSL_ATTRIBUTE_HOT annotation alleviates the problem by putting all Extend implementation into the hot text section [2]

This change introduces a comma-separated list to specify the mangled vtable names, and ICP pass will skip vtable-based comparison if a vtable variable definition is shown to be in its class hierarchy (per LLVM type metadata).

[1] https://github.com/abseil/abseil-cpp/blob/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L621-L650
[2] https://github.com/abseil/abseil-cpp/blame/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L370C3-L370C21

@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review September 30, 2024 21:30
@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels Sep 30, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 30, 2024

@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-pgo

Author: Mingming Liu (minglotus-6)

Changes

Performance critical core libraries could be highly-optimized for arch or micro-arch features. For instance, the absl crc library specializes different templated classes among different hardwares [1]. In a practical setting, it's likely that instrumented profiles are collected on one type of machine and used to optimize binaries that run on multiple types of hardwares.

While this kind of specialization is rare in terms of lines of code, compiler can do a better job to skip vtable-based ICP.

  • The per-class Extend implementation is arch-specific as well. If an instrumented profile is collected on one arch and applied to another arch where Extend implementation is different, Extend might be regarded as unlikely function in the latter case. ABSL_ATTRIBUTE_HOT annotation alleviates the problem by putting all Extend implementation into the hot text section [2]

[1] https://github.com/abseil/abseil-cpp/blob/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L621-L650
[2] https://github.com/abseil/abseil-cpp/blame/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L370C3-L370C21


Full diff: https://github.com/llvm/llvm-project/pull/110575.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (+42-4)
  • (modified) llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll (+1)
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index fbed593ab3aa74..6031bb8c05fa3f 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -132,6 +132,11 @@ static cl::opt<int> ICPMaxNumVTableLastCandidate(
     "icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
     cl::desc("The maximum number of vtable for the last candidate."));
 
+static cl::opt<std::string> ICPKnownUnrepresentativeVTables(
+    "icp-known-unrepresentative-vtables", cl::init(""), cl::Hidden,
+    cl::desc("A comma-separated list of mangled vtable names for which instrumented
+    profiles are not representative. For instance, the instantiated class is arch or micro-arch specific, while instrumented profiles are collected on one arch."));
+
 namespace {
 
 // The key is a vtable global variable, and the value is a map.
@@ -316,6 +321,8 @@ class IndirectCallPromoter {
 
   OptimizationRemarkEmitter &ORE;
 
+  const DenseSet<StringRef> &KnownUnrepresentativeBaseTypes;
+
   // A struct that records the direct target and it's call count.
   struct PromotionCandidate {
     Function *const TargetFunction;
@@ -391,10 +398,12 @@ class IndirectCallPromoter {
       Function &Func, Module &M, InstrProfSymtab *Symtab, bool SamplePGO,
       const VirtualCallSiteTypeInfoMap &VirtualCSInfo,
       VTableAddressPointOffsetValMap &VTableAddressPointOffsetVal,
+      DenseSet<StringRef> &KnownUnrepresentativeTypes,
       OptimizationRemarkEmitter &ORE)
       : F(Func), M(M), Symtab(Symtab), SamplePGO(SamplePGO),
         VirtualCSInfo(VirtualCSInfo),
-        VTableAddressPointOffsetVal(VTableAddressPointOffsetVal), ORE(ORE) {}
+        VTableAddressPointOffsetVal(VTableAddressPointOffsetVal), ORE(ORE),
+        KnownUnrepresentativeBaseTypes(KnownUnrepresentativeTypes) {}
   IndirectCallPromoter(const IndirectCallPromoter &) = delete;
   IndirectCallPromoter &operator=(const IndirectCallPromoter &) = delete;
 
@@ -851,8 +860,26 @@ bool IndirectCallPromoter::isProfitableToCompareVTables(
     LLVM_DEBUG(dbgs() << "\n");
 
     uint64_t CandidateVTableCount = 0;
-    for (auto &[GUID, Count] : VTableGUIDAndCounts)
+    SmallVector<MDNode *, 2> Types;
+    for (auto &[GUID, Count] : VTableGUIDAndCounts) {
       CandidateVTableCount += Count;
+      auto *VTableVar = Symtab->getGlobalVariable(GUID);
+
+      assert(VTableVar &&
+             "VTableVar must exist for GUID in VTableGUIDAndCounts");
+
+      Types.clear();
+      VTableVar->getMetadata(LLVMContext::MD_type, Types);
+
+      for (auto *Type : Types)
+        if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get()))
+          if (KnownUnrepresentativeBaseTypes.contains(TypeId->getString())) {
+            LLVM_DEBUG(dbgs()
+                       << "    vtable profiles are known to be "
+                          "unrepresentative. Bail out vtable comparison.")
+            return false;
+          }
+    }
 
     if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
       LLVM_DEBUG(
@@ -956,9 +983,19 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO,
   bool Changed = false;
   VirtualCallSiteTypeInfoMap VirtualCSInfo;
 
-  if (EnableVTableProfileUse)
+  DenseSet<StringRef> KnownUnrepresentativeTypeSet;
+
+  if (EnableVTableProfileUse) {
     computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo);
 
+    SmallVector<StringRef> KnownUnrepresentativeTypes;
+    llvm::SplitString(ICPKnownUnrepresentativeVTables,
+                      KnownUnrepresentativeTypes);
+
+    for (const StringRef Str : KnownUnrepresentativeTypes)
+      KnownUnrepresentativeTypeSet.insert(Str);
+  }
+
   // VTableAddressPointOffsetVal stores the vtable address points. The vtable
   // address point of a given <vtable, address point offset> is static (doesn't
   // change after being computed once).
@@ -977,7 +1014,8 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO,
     auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
 
     IndirectCallPromoter CallPromoter(F, M, &Symtab, SamplePGO, VirtualCSInfo,
-                                      VTableAddressPointOffsetVal, ORE);
+                                      VTableAddressPointOffsetVal,
+                                      KnownUnrepresentativeTypeSet, ORE);
     bool FuncChanged = CallPromoter.processFunction(PSI);
     if (ICPDUMPAFTER && FuncChanged) {
       LLVM_DEBUG(dbgs() << "\n== IR Dump After =="; F.print(dbgs()));
diff --git a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
index b6afce3d7c6d5d..bbae25787a05c6 100644
--- a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
+++ b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
@@ -2,6 +2,7 @@
 
 ; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=2 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,VTABLE-CMP
 ; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
+; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -icp-known-unrepresentative-vtables='Base1,Derived3' -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
 
 target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"

@llvmbot
Copy link
Member

llvmbot commented Sep 30, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Mingming Liu (minglotus-6)

Changes

Performance critical core libraries could be highly-optimized for arch or micro-arch features. For instance, the absl crc library specializes different templated classes among different hardwares [1]. In a practical setting, it's likely that instrumented profiles are collected on one type of machine and used to optimize binaries that run on multiple types of hardwares.

While this kind of specialization is rare in terms of lines of code, compiler can do a better job to skip vtable-based ICP.

  • The per-class Extend implementation is arch-specific as well. If an instrumented profile is collected on one arch and applied to another arch where Extend implementation is different, Extend might be regarded as unlikely function in the latter case. ABSL_ATTRIBUTE_HOT annotation alleviates the problem by putting all Extend implementation into the hot text section [2]

[1] https://github.com/abseil/abseil-cpp/blob/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L621-L650
[2] https://github.com/abseil/abseil-cpp/blame/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L370C3-L370C21


Full diff: https://github.com/llvm/llvm-project/pull/110575.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (+42-4)
  • (modified) llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll (+1)
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index fbed593ab3aa74..6031bb8c05fa3f 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -132,6 +132,11 @@ static cl::opt<int> ICPMaxNumVTableLastCandidate(
     "icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
     cl::desc("The maximum number of vtable for the last candidate."));
 
+static cl::opt<std::string> ICPKnownUnrepresentativeVTables(
+    "icp-known-unrepresentative-vtables", cl::init(""), cl::Hidden,
+    cl::desc("A comma-separated list of mangled vtable names for which instrumented
+    profiles are not representative. For instance, the instantiated class is arch or micro-arch specific, while instrumented profiles are collected on one arch."));
+
 namespace {
 
 // The key is a vtable global variable, and the value is a map.
@@ -316,6 +321,8 @@ class IndirectCallPromoter {
 
   OptimizationRemarkEmitter &ORE;
 
+  const DenseSet<StringRef> &KnownUnrepresentativeBaseTypes;
+
   // A struct that records the direct target and it's call count.
   struct PromotionCandidate {
     Function *const TargetFunction;
@@ -391,10 +398,12 @@ class IndirectCallPromoter {
       Function &Func, Module &M, InstrProfSymtab *Symtab, bool SamplePGO,
       const VirtualCallSiteTypeInfoMap &VirtualCSInfo,
       VTableAddressPointOffsetValMap &VTableAddressPointOffsetVal,
+      DenseSet<StringRef> &KnownUnrepresentativeTypes,
       OptimizationRemarkEmitter &ORE)
       : F(Func), M(M), Symtab(Symtab), SamplePGO(SamplePGO),
         VirtualCSInfo(VirtualCSInfo),
-        VTableAddressPointOffsetVal(VTableAddressPointOffsetVal), ORE(ORE) {}
+        VTableAddressPointOffsetVal(VTableAddressPointOffsetVal), ORE(ORE),
+        KnownUnrepresentativeBaseTypes(KnownUnrepresentativeTypes) {}
   IndirectCallPromoter(const IndirectCallPromoter &) = delete;
   IndirectCallPromoter &operator=(const IndirectCallPromoter &) = delete;
 
@@ -851,8 +860,26 @@ bool IndirectCallPromoter::isProfitableToCompareVTables(
     LLVM_DEBUG(dbgs() << "\n");
 
     uint64_t CandidateVTableCount = 0;
-    for (auto &[GUID, Count] : VTableGUIDAndCounts)
+    SmallVector<MDNode *, 2> Types;
+    for (auto &[GUID, Count] : VTableGUIDAndCounts) {
       CandidateVTableCount += Count;
+      auto *VTableVar = Symtab->getGlobalVariable(GUID);
+
+      assert(VTableVar &&
+             "VTableVar must exist for GUID in VTableGUIDAndCounts");
+
+      Types.clear();
+      VTableVar->getMetadata(LLVMContext::MD_type, Types);
+
+      for (auto *Type : Types)
+        if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get()))
+          if (KnownUnrepresentativeBaseTypes.contains(TypeId->getString())) {
+            LLVM_DEBUG(dbgs()
+                       << "    vtable profiles are known to be "
+                          "unrepresentative. Bail out vtable comparison.")
+            return false;
+          }
+    }
 
     if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
       LLVM_DEBUG(
@@ -956,9 +983,19 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO,
   bool Changed = false;
   VirtualCallSiteTypeInfoMap VirtualCSInfo;
 
-  if (EnableVTableProfileUse)
+  DenseSet<StringRef> KnownUnrepresentativeTypeSet;
+
+  if (EnableVTableProfileUse) {
     computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo);
 
+    SmallVector<StringRef> KnownUnrepresentativeTypes;
+    llvm::SplitString(ICPKnownUnrepresentativeVTables,
+                      KnownUnrepresentativeTypes);
+
+    for (const StringRef Str : KnownUnrepresentativeTypes)
+      KnownUnrepresentativeTypeSet.insert(Str);
+  }
+
   // VTableAddressPointOffsetVal stores the vtable address points. The vtable
   // address point of a given <vtable, address point offset> is static (doesn't
   // change after being computed once).
@@ -977,7 +1014,8 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO,
     auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
 
     IndirectCallPromoter CallPromoter(F, M, &Symtab, SamplePGO, VirtualCSInfo,
-                                      VTableAddressPointOffsetVal, ORE);
+                                      VTableAddressPointOffsetVal,
+                                      KnownUnrepresentativeTypeSet, ORE);
     bool FuncChanged = CallPromoter.processFunction(PSI);
     if (ICPDUMPAFTER && FuncChanged) {
       LLVM_DEBUG(dbgs() << "\n== IR Dump After =="; F.print(dbgs()));
diff --git a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
index b6afce3d7c6d5d..bbae25787a05c6 100644
--- a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
+++ b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
@@ -2,6 +2,7 @@
 
 ; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=2 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,VTABLE-CMP
 ; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
+; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -icp-known-unrepresentative-vtables='Base1,Derived3' -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
 
 target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"

@@ -132,6 +132,11 @@ static cl::opt<int> ICPMaxNumVTableLastCandidate(
"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
cl::desc("The maximum number of vtable for the last candidate."));

static cl::opt<std::string> ICPKnownUnrepresentativeVTables(
"icp-known-unrepresentative-vtables", cl::init(""), cl::Hidden,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make this option and the code (e.g. variable names, debug messages) specific to "profile unrepresentativeness"? Can we instead generalize it like naming the option "icp-ignore-vtables", the variable can be DenseSet<StringRef> IgnoredBaseTypes and so on?

I think the motivation is justified and can be mentioned in the comments and cl::desc but the implementation don't need to be. IMO this makes the code a little easier to read and follow. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense.

I updated the PR to use cl::opt<DenseSet<StringRef>> ICPIgnoredBaseTypes, and will spin off the changes in CommandLine.h/cpp (with unit tests).

@david-xl
Copy link
Contributor

I agree that the option name can be more general as there can be other use cases too.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the issue here specific to vtable based ICP or does it also arise in the function based ICP?

@@ -132,6 +132,14 @@ static cl::opt<int> ICPMaxNumVTableLastCandidate(
"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
cl::desc("The maximum number of vtable for the last candidate."));

static cl::opt<DenseSet<StringRef>> ICPIgnoredBaseTypes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cl::list is the typical way of supporting an option that takes a list. Can you just use that instead of adding another cl::opt parser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of cl::list (at least it didn't occur to me yesterday). Now used cl::list to save llvm::SplitStrings.

OptionValue<DenseSet<StringRef>> D, size_t GlobalWidth) const {}

bool parser<DenseSet<StringRef>>::parse(Option &O, StringRef ArgName,
StringRef Arg,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you certain that the contents of StringRef Arg outlives the DenseSet returned? All the others above return the contents by value.

I would prefer a cl::optstd::string which is parsed and validated in the ICP pass. Is there an advantage to your proposed approach and are there any precedents in the codebase? IMO this option feels overly restrictive when rejecting duplicated strings from the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you certain that the contents of StringRef Arg outlives the DenseSet returned? All the others above return the contents by value.

I took another look (at a couple of callsites of cl::ParseCommandLineOptions), and couldn't easily figure out if referenced strings remain alive for all passes. FrontendOptions.h and cc1as_main.cpp are two places which holds vector<std::string> for LLVM args. But IMO if cl::ParseCommandLineOptions (as a frequently-called function in LLVM and downstream repositories) doesn't require caller keep the argument contents alive, it's not a good idea for cl::opt to even use StringRef (users will have a hard time).

I updated to cl::list<std::string> to save a call to llvm::SplitStrings. And one interesting finding is that both cl::opt and cl::list supports internal or external storage [1], with internal storage as the default.

[1] https://llvm.org/docs/CommandLine.html#the-cl-list-class

if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get()))
if (VTableSet.contains(TypeId->getString().str())) {
LLVM_DEBUG(dbgs()
<< " vtable profiles are known to be "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the comment? s/known to be unrepresentative/ignored by option ..../

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -851,8 +859,27 @@ bool IndirectCallPromoter::isProfitableToCompareVTables(
LLVM_DEBUG(dbgs() << "\n");

uint64_t CandidateVTableCount = 0;
for (auto &[GUID, Count] : VTableGUIDAndCounts)
SmallVector<MDNode *, 2> Types;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move this to line L870 and drop the clear(). Since it's a small vector with only 2 pointers (on the stack) I don't expect changing this will affect performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it's better to move this vector inside the loop. Done.

2. Update `-icp-ignored-base-types` with one base type since this is
   sufficient
@mingmingl-llvm
Copy link
Contributor Author

Is the issue here specific to vtable based ICP or does it also arise in the function based ICP?

This also arises to function-based ICP, and the Extend method of class template is marked as hot with ABSL_ATTRIBUTE_HOT to at least put Extend in .text.hot section (rather than .text.unlikely)

@mingmingl-llvm mingmingl-llvm changed the title [TypeProf][PGO]Skip vtable-based ICP for which type profiles are known to be unrepresentative [TypeProf][PGO]Support skipping vtable comparisons for a class and its derived ones Oct 1, 2024
"and their derived ones will not be vtable-ICP'ed. Useful when the "
"profiled types and actual types in the optimized binary could be "
"different due to profiling "
"limitations."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could be on previous line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -2,6 +2,7 @@

; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=2 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,VTABLE-CMP
; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -icp-ignored-base-types='Base1' -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to clarify the expectations for this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get()))
if (IgnoredBaseTypes.contains(TypeId->getString())) {
LLVM_DEBUG(dbgs() << " vtable profiles should be ignored. Bail "
"out vtable comparison.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/out/out of/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

CandidateVTableCount += Count;
auto *VTableVar = Symtab->getGlobalVariable(GUID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we skip all the new handling when IgnoredBaseTypes is empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense.

I took the liberty to use a helper function for the new handling, to simplify the control flow (e.g., continue, inner-for-loop and return) inside the outer loop.

   strings are type info names (the string literals in llvm type
   metadata)
@mingmingl-llvm mingmingl-llvm merged commit 34f0edd into llvm:main Oct 2, 2024
8 checks passed
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Oct 3, 2024
…s derived ones (llvm#110575)

Performance critical core libraries could be highly-optimized for arch
or micro-arch features. For instance, the absl crc library specializes
different templated classes among different hardwares [1]. In a
practical setting, it's likely that instrumented profiles are collected
on one type of machine and used to optimize binaries that run on
multiple types of hardwares.

While this kind of specialization is rare in terms of lines of code,
compiler can do a better job to skip vtable-based ICP.
* The per-class `Extend` implementation is arch-specific as well. If an
instrumented profile is collected on one arch and applied to another
arch where `Extend` implementation is different, `Extend` might be
regarded as unlikely function in the latter case. `ABSL_ATTRIBUTE_HOT`
annotation alleviates the problem by putting all `Extend` implementation
into the hot text section [2]

This change introduces a comma-separated list to specify the mangled
vtable names, and ICP pass will skip vtable-based comparison if a vtable
variable definition is shown to be in its class hierarchy (per LLVM type
metadata).

[1]
https://github.com/abseil/abseil-cpp/blob/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L621-L650
[2]
https://github.com/abseil/abseil-cpp/blame/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L370C3-L370C21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:support llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants