-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[TypeProf][PGO]Support skipping vtable comparisons for a class and its derived ones #110575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…es are known to be unrepresentative
@llvm/pr-subscribers-llvm-support @llvm/pr-subscribers-pgo Author: Mingming Liu (minglotus-6) ChangesPerformance critical core libraries could be highly-optimized for arch or micro-arch features. For instance, the absl crc library specializes different templated classes among different hardwares [1]. In a practical setting, it's likely that instrumented profiles are collected on one type of machine and used to optimize binaries that run on multiple types of hardwares. While this kind of specialization is rare in terms of lines of code, compiler can do a better job to skip vtable-based ICP.
[1] https://github.com/abseil/abseil-cpp/blob/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L621-L650 Full diff: https://github.com/llvm/llvm-project/pull/110575.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index fbed593ab3aa74..6031bb8c05fa3f 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -132,6 +132,11 @@ static cl::opt<int> ICPMaxNumVTableLastCandidate(
"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
cl::desc("The maximum number of vtable for the last candidate."));
+static cl::opt<std::string> ICPKnownUnrepresentativeVTables(
+ "icp-known-unrepresentative-vtables", cl::init(""), cl::Hidden,
+ cl::desc("A comma-separated list of mangled vtable names for which instrumented
+ profiles are not representative. For instance, the instantiated class is arch or micro-arch specific, while instrumented profiles are collected on one arch."));
+
namespace {
// The key is a vtable global variable, and the value is a map.
@@ -316,6 +321,8 @@ class IndirectCallPromoter {
OptimizationRemarkEmitter &ORE;
+ const DenseSet<StringRef> &KnownUnrepresentativeBaseTypes;
+
// A struct that records the direct target and it's call count.
struct PromotionCandidate {
Function *const TargetFunction;
@@ -391,10 +398,12 @@ class IndirectCallPromoter {
Function &Func, Module &M, InstrProfSymtab *Symtab, bool SamplePGO,
const VirtualCallSiteTypeInfoMap &VirtualCSInfo,
VTableAddressPointOffsetValMap &VTableAddressPointOffsetVal,
+ DenseSet<StringRef> &KnownUnrepresentativeTypes,
OptimizationRemarkEmitter &ORE)
: F(Func), M(M), Symtab(Symtab), SamplePGO(SamplePGO),
VirtualCSInfo(VirtualCSInfo),
- VTableAddressPointOffsetVal(VTableAddressPointOffsetVal), ORE(ORE) {}
+ VTableAddressPointOffsetVal(VTableAddressPointOffsetVal), ORE(ORE),
+ KnownUnrepresentativeBaseTypes(KnownUnrepresentativeTypes) {}
IndirectCallPromoter(const IndirectCallPromoter &) = delete;
IndirectCallPromoter &operator=(const IndirectCallPromoter &) = delete;
@@ -851,8 +860,26 @@ bool IndirectCallPromoter::isProfitableToCompareVTables(
LLVM_DEBUG(dbgs() << "\n");
uint64_t CandidateVTableCount = 0;
- for (auto &[GUID, Count] : VTableGUIDAndCounts)
+ SmallVector<MDNode *, 2> Types;
+ for (auto &[GUID, Count] : VTableGUIDAndCounts) {
CandidateVTableCount += Count;
+ auto *VTableVar = Symtab->getGlobalVariable(GUID);
+
+ assert(VTableVar &&
+ "VTableVar must exist for GUID in VTableGUIDAndCounts");
+
+ Types.clear();
+ VTableVar->getMetadata(LLVMContext::MD_type, Types);
+
+ for (auto *Type : Types)
+ if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get()))
+ if (KnownUnrepresentativeBaseTypes.contains(TypeId->getString())) {
+ LLVM_DEBUG(dbgs()
+ << " vtable profiles are known to be "
+ "unrepresentative. Bail out vtable comparison.")
+ return false;
+ }
+ }
if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
LLVM_DEBUG(
@@ -956,9 +983,19 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO,
bool Changed = false;
VirtualCallSiteTypeInfoMap VirtualCSInfo;
- if (EnableVTableProfileUse)
+ DenseSet<StringRef> KnownUnrepresentativeTypeSet;
+
+ if (EnableVTableProfileUse) {
computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo);
+ SmallVector<StringRef> KnownUnrepresentativeTypes;
+ llvm::SplitString(ICPKnownUnrepresentativeVTables,
+ KnownUnrepresentativeTypes);
+
+ for (const StringRef Str : KnownUnrepresentativeTypes)
+ KnownUnrepresentativeTypeSet.insert(Str);
+ }
+
// VTableAddressPointOffsetVal stores the vtable address points. The vtable
// address point of a given <vtable, address point offset> is static (doesn't
// change after being computed once).
@@ -977,7 +1014,8 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO,
auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
IndirectCallPromoter CallPromoter(F, M, &Symtab, SamplePGO, VirtualCSInfo,
- VTableAddressPointOffsetVal, ORE);
+ VTableAddressPointOffsetVal,
+ KnownUnrepresentativeTypeSet, ORE);
bool FuncChanged = CallPromoter.processFunction(PSI);
if (ICPDUMPAFTER && FuncChanged) {
LLVM_DEBUG(dbgs() << "\n== IR Dump After =="; F.print(dbgs()));
diff --git a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
index b6afce3d7c6d5d..bbae25787a05c6 100644
--- a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
+++ b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
@@ -2,6 +2,7 @@
; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=2 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,VTABLE-CMP
; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
+; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -icp-known-unrepresentative-vtables='Base1,Derived3' -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
|
@llvm/pr-subscribers-llvm-transforms Author: Mingming Liu (minglotus-6) ChangesPerformance critical core libraries could be highly-optimized for arch or micro-arch features. For instance, the absl crc library specializes different templated classes among different hardwares [1]. In a practical setting, it's likely that instrumented profiles are collected on one type of machine and used to optimize binaries that run on multiple types of hardwares. While this kind of specialization is rare in terms of lines of code, compiler can do a better job to skip vtable-based ICP.
[1] https://github.com/abseil/abseil-cpp/blob/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L621-L650 Full diff: https://github.com/llvm/llvm-project/pull/110575.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index fbed593ab3aa74..6031bb8c05fa3f 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -132,6 +132,11 @@ static cl::opt<int> ICPMaxNumVTableLastCandidate(
"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
cl::desc("The maximum number of vtable for the last candidate."));
+static cl::opt<std::string> ICPKnownUnrepresentativeVTables(
+ "icp-known-unrepresentative-vtables", cl::init(""), cl::Hidden,
+ cl::desc("A comma-separated list of mangled vtable names for which instrumented
+ profiles are not representative. For instance, the instantiated class is arch or micro-arch specific, while instrumented profiles are collected on one arch."));
+
namespace {
// The key is a vtable global variable, and the value is a map.
@@ -316,6 +321,8 @@ class IndirectCallPromoter {
OptimizationRemarkEmitter &ORE;
+ const DenseSet<StringRef> &KnownUnrepresentativeBaseTypes;
+
// A struct that records the direct target and it's call count.
struct PromotionCandidate {
Function *const TargetFunction;
@@ -391,10 +398,12 @@ class IndirectCallPromoter {
Function &Func, Module &M, InstrProfSymtab *Symtab, bool SamplePGO,
const VirtualCallSiteTypeInfoMap &VirtualCSInfo,
VTableAddressPointOffsetValMap &VTableAddressPointOffsetVal,
+ DenseSet<StringRef> &KnownUnrepresentativeTypes,
OptimizationRemarkEmitter &ORE)
: F(Func), M(M), Symtab(Symtab), SamplePGO(SamplePGO),
VirtualCSInfo(VirtualCSInfo),
- VTableAddressPointOffsetVal(VTableAddressPointOffsetVal), ORE(ORE) {}
+ VTableAddressPointOffsetVal(VTableAddressPointOffsetVal), ORE(ORE),
+ KnownUnrepresentativeBaseTypes(KnownUnrepresentativeTypes) {}
IndirectCallPromoter(const IndirectCallPromoter &) = delete;
IndirectCallPromoter &operator=(const IndirectCallPromoter &) = delete;
@@ -851,8 +860,26 @@ bool IndirectCallPromoter::isProfitableToCompareVTables(
LLVM_DEBUG(dbgs() << "\n");
uint64_t CandidateVTableCount = 0;
- for (auto &[GUID, Count] : VTableGUIDAndCounts)
+ SmallVector<MDNode *, 2> Types;
+ for (auto &[GUID, Count] : VTableGUIDAndCounts) {
CandidateVTableCount += Count;
+ auto *VTableVar = Symtab->getGlobalVariable(GUID);
+
+ assert(VTableVar &&
+ "VTableVar must exist for GUID in VTableGUIDAndCounts");
+
+ Types.clear();
+ VTableVar->getMetadata(LLVMContext::MD_type, Types);
+
+ for (auto *Type : Types)
+ if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get()))
+ if (KnownUnrepresentativeBaseTypes.contains(TypeId->getString())) {
+ LLVM_DEBUG(dbgs()
+ << " vtable profiles are known to be "
+ "unrepresentative. Bail out vtable comparison.")
+ return false;
+ }
+ }
if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
LLVM_DEBUG(
@@ -956,9 +983,19 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO,
bool Changed = false;
VirtualCallSiteTypeInfoMap VirtualCSInfo;
- if (EnableVTableProfileUse)
+ DenseSet<StringRef> KnownUnrepresentativeTypeSet;
+
+ if (EnableVTableProfileUse) {
computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo);
+ SmallVector<StringRef> KnownUnrepresentativeTypes;
+ llvm::SplitString(ICPKnownUnrepresentativeVTables,
+ KnownUnrepresentativeTypes);
+
+ for (const StringRef Str : KnownUnrepresentativeTypes)
+ KnownUnrepresentativeTypeSet.insert(Str);
+ }
+
// VTableAddressPointOffsetVal stores the vtable address points. The vtable
// address point of a given <vtable, address point offset> is static (doesn't
// change after being computed once).
@@ -977,7 +1014,8 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO,
auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
IndirectCallPromoter CallPromoter(F, M, &Symtab, SamplePGO, VirtualCSInfo,
- VTableAddressPointOffsetVal, ORE);
+ VTableAddressPointOffsetVal,
+ KnownUnrepresentativeTypeSet, ORE);
bool FuncChanged = CallPromoter.processFunction(PSI);
if (ICPDUMPAFTER && FuncChanged) {
LLVM_DEBUG(dbgs() << "\n== IR Dump After =="; F.print(dbgs()));
diff --git a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
index b6afce3d7c6d5d..bbae25787a05c6 100644
--- a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
+++ b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
@@ -2,6 +2,7 @@
; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=2 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,VTABLE-CMP
; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
+; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -icp-known-unrepresentative-vtables='Base1,Derived3' -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
|
@@ -132,6 +132,11 @@ static cl::opt<int> ICPMaxNumVTableLastCandidate( | |||
"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden, | |||
cl::desc("The maximum number of vtable for the last candidate.")); | |||
|
|||
static cl::opt<std::string> ICPKnownUnrepresentativeVTables( | |||
"icp-known-unrepresentative-vtables", cl::init(""), cl::Hidden, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to make this option and the code (e.g. variable names, debug messages) specific to "profile unrepresentativeness"? Can we instead generalize it like naming the option "icp-ignore-vtables", the variable can be DenseSet<StringRef> IgnoredBaseTypes
and so on?
I think the motivation is justified and can be mentioned in the comments and cl::desc but the implementation don't need to be. IMO this makes the code a little easier to read and follow. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense.
I updated the PR to use cl::opt<DenseSet<StringRef>> ICPIgnoredBaseTypes
, and will spin off the changes in CommandLine.h/cpp (with unit tests).
I agree that the option name can be more general as there can be other use cases too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the issue here specific to vtable based ICP or does it also arise in the function based ICP?
@@ -132,6 +132,14 @@ static cl::opt<int> ICPMaxNumVTableLastCandidate( | |||
"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden, | |||
cl::desc("The maximum number of vtable for the last candidate.")); | |||
|
|||
static cl::opt<DenseSet<StringRef>> ICPIgnoredBaseTypes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cl::list is the typical way of supporting an option that takes a list. Can you just use that instead of adding another cl::opt parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware of cl::list
(at least it didn't occur to me yesterday). Now used cl::list to save llvm::SplitStrings
.
llvm/lib/Support/CommandLine.cpp
Outdated
OptionValue<DenseSet<StringRef>> D, size_t GlobalWidth) const {} | ||
|
||
bool parser<DenseSet<StringRef>>::parse(Option &O, StringRef ArgName, | ||
StringRef Arg, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you certain that the contents of StringRef Arg
outlives the DenseSet returned? All the others above return the contents by value.
I would prefer a cl::optstd::string which is parsed and validated in the ICP pass. Is there an advantage to your proposed approach and are there any precedents in the codebase? IMO this option feels overly restrictive when rejecting duplicated strings from the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you certain that the contents of StringRef Arg outlives the DenseSet returned? All the others above return the contents by value.
I took another look (at a couple of callsites of cl::ParseCommandLineOptions
), and couldn't easily figure out if referenced strings remain alive for all passes. FrontendOptions.h and cc1as_main.cpp are two places which holds vector<std::string>
for LLVM args. But IMO if cl::ParseCommandLineOptions
(as a frequently-called function in LLVM and downstream repositories) doesn't require caller keep the argument contents alive, it's not a good idea for cl::opt
to even use StringRef (users will have a hard time).
I updated to cl::list<std::string>
to save a call to llvm::SplitStrings
. And one interesting finding is that both cl::opt
and cl::list
supports internal or external storage [1], with internal storage as the default.
[1] https://llvm.org/docs/CommandLine.html#the-cl-list-class
if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get())) | ||
if (VTableSet.contains(TypeId->getString().str())) { | ||
LLVM_DEBUG(dbgs() | ||
<< " vtable profiles are known to be " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the comment? s/known to be unrepresentative/ignored by option ..../
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -851,8 +859,27 @@ bool IndirectCallPromoter::isProfitableToCompareVTables( | |||
LLVM_DEBUG(dbgs() << "\n"); | |||
|
|||
uint64_t CandidateVTableCount = 0; | |||
for (auto &[GUID, Count] : VTableGUIDAndCounts) | |||
SmallVector<MDNode *, 2> Types; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move this to line L870 and drop the clear()
. Since it's a small vector with only 2 pointers (on the stack) I don't expect changing this will affect performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that it's better to move this vector inside the loop. Done.
2. Update `-icp-ignored-base-types` with one base type since this is sufficient
This also arises to function-based ICP, and the |
"and their derived ones will not be vtable-ICP'ed. Useful when the " | ||
"profiled types and actual types in the optimized binary could be " | ||
"different due to profiling " | ||
"limitations.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could be on previous line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -2,6 +2,7 @@ | |||
|
|||
; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=2 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,VTABLE-CMP | |||
; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP | |||
; RUN: opt < %s -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -icp-ignored-base-types='Base1' -S 2>&1 | FileCheck %s --check-prefixes=VTABLE-COMMON,FUNC-CMP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment to clarify the expectations for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get())) | ||
if (IgnoredBaseTypes.contains(TypeId->getString())) { | ||
LLVM_DEBUG(dbgs() << " vtable profiles should be ignored. Bail " | ||
"out vtable comparison."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/out/out of/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
CandidateVTableCount += Count; | ||
auto *VTableVar = Symtab->getGlobalVariable(GUID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we skip all the new handling when IgnoredBaseTypes is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense.
I took the liberty to use a helper function for the new handling, to simplify the control flow (e.g., continue, inner-for-loop and return) inside the outer loop.
strings are type info names (the string literals in llvm type metadata)
…s derived ones (llvm#110575) Performance critical core libraries could be highly-optimized for arch or micro-arch features. For instance, the absl crc library specializes different templated classes among different hardwares [1]. In a practical setting, it's likely that instrumented profiles are collected on one type of machine and used to optimize binaries that run on multiple types of hardwares. While this kind of specialization is rare in terms of lines of code, compiler can do a better job to skip vtable-based ICP. * The per-class `Extend` implementation is arch-specific as well. If an instrumented profile is collected on one arch and applied to another arch where `Extend` implementation is different, `Extend` might be regarded as unlikely function in the latter case. `ABSL_ATTRIBUTE_HOT` annotation alleviates the problem by putting all `Extend` implementation into the hot text section [2] This change introduces a comma-separated list to specify the mangled vtable names, and ICP pass will skip vtable-based comparison if a vtable variable definition is shown to be in its class hierarchy (per LLVM type metadata). [1] https://github.com/abseil/abseil-cpp/blob/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L621-L650 [2] https://github.com/abseil/abseil-cpp/blame/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L370C3-L370C21
Performance critical core libraries could be highly-optimized for arch or micro-arch features. For instance, the absl crc library specializes different templated classes among different hardwares [1]. In a practical setting, it's likely that instrumented profiles are collected on one type of machine and used to optimize binaries that run on multiple types of hardwares.
While this kind of specialization is rare in terms of lines of code, compiler can do a better job to skip vtable-based ICP.
Extend
implementation is arch-specific as well. If an instrumented profile is collected on one arch and applied to another arch whereExtend
implementation is different,Extend
might be regarded as unlikely function in the latter case.ABSL_ATTRIBUTE_HOT
annotation alleviates the problem by putting allExtend
implementation into the hot text section [2]This change introduces a comma-separated list to specify the mangled vtable names, and ICP pass will skip vtable-based comparison if a vtable variable definition is shown to be in its class hierarchy (per LLVM type metadata).
[1] https://github.com/abseil/abseil-cpp/blob/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L621-L650
[2] https://github.com/abseil/abseil-cpp/blame/c6b27359c3d27438b1313dddd7598914c1274a50/absl/crc/internal/crc_x86_arm_combined.cc#L370C3-L370C21