-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[CodeGen][StaticDataPartitioning]Place local-linkage global variables in hot or unlikely prefixed sections based on profile information #125756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CodeGen][StaticDataPartitioning]Place local-linkage global variables in hot or unlikely prefixed sections based on profile information #125756
Conversation
…es based on profile information
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-ir Author: Mingming Liu (mingmingl-llvm) ChangesIn this PR, static-data-splitter pass finds out the local-linkage global variables in { A follow-up item is to analyze global variable initializers and count for access from data.
Some stats of static-data-splitter with this patch:
Patch is 28.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125756.diff 14 Files Affected:
diff --git a/llvm/include/llvm/IR/Function.h b/llvm/include/llvm/IR/Function.h
index fcd5396ccfdbc87..29041688124bc29 100644
--- a/llvm/include/llvm/IR/Function.h
+++ b/llvm/include/llvm/IR/Function.h
@@ -346,12 +346,6 @@ class LLVM_ABI Function : public GlobalObject, public ilist_node<Function> {
/// sample PGO, to enable the same inlines as the profiled optimized binary.
DenseSet<GlobalValue::GUID> getImportGUIDs() const;
- /// Set the section prefix for this function.
- void setSectionPrefix(StringRef Prefix);
-
- /// Get the section prefix for this function.
- std::optional<StringRef> getSectionPrefix() const;
-
/// hasGC/getGC/setGC/clearGC - The name of the garbage collection algorithm
/// to use during code generation.
bool hasGC() const {
diff --git a/llvm/include/llvm/IR/GlobalObject.h b/llvm/include/llvm/IR/GlobalObject.h
index 08edc13d81f880a..bb50c39813e1407 100644
--- a/llvm/include/llvm/IR/GlobalObject.h
+++ b/llvm/include/llvm/IR/GlobalObject.h
@@ -124,6 +124,17 @@ class GlobalObject : public GlobalValue {
/// appropriate default object file section.
void setSection(StringRef S);
+ /// Set the section prefix for this global object.
+ void setSectionPrefix(StringRef Prefix);
+
+ /// Update the section prefix, unless the existing prefix is the same as
+ /// `KeepPrefix`.
+ void updateSectionPrefix(StringRef Prefix,
+ std::optional<StringRef> KeepPrefix = std::nullopt);
+
+ /// Get the section prefix for this global object.
+ std::optional<StringRef> getSectionPrefix() const;
+
bool hasComdat() const { return getComdat() != nullptr; }
const Comdat *getComdat() const { return ObjComdat; }
Comdat *getComdat() { return ObjComdat; }
diff --git a/llvm/include/llvm/IR/MDBuilder.h b/llvm/include/llvm/IR/MDBuilder.h
index e02ec8f5a3d8bb1..ce4e1da656049d1 100644
--- a/llvm/include/llvm/IR/MDBuilder.h
+++ b/llvm/include/llvm/IR/MDBuilder.h
@@ -89,8 +89,8 @@ class MDBuilder {
MDNode *createFunctionEntryCount(uint64_t Count, bool Synthetic,
const DenseSet<GlobalValue::GUID> *Imports);
- /// Return metadata containing the section prefix for a function.
- MDNode *createFunctionSectionPrefix(StringRef Prefix);
+ /// Return metadata containing the section prefix for a global object.
+ MDNode *createGlobalObjectSectionPrefix(StringRef Prefix);
/// Return metadata containing the pseudo probe descriptor for a function.
MDNode *createPseudoProbeDesc(uint64_t GUID, uint64_t Hash, StringRef FName);
diff --git a/llvm/lib/CodeGen/StaticDataSplitter.cpp b/llvm/lib/CodeGen/StaticDataSplitter.cpp
index e5bf0a5a3a255f6..f09e3b41e0723e6 100644
--- a/llvm/lib/CodeGen/StaticDataSplitter.cpp
+++ b/llvm/lib/CodeGen/StaticDataSplitter.cpp
@@ -9,13 +9,13 @@
// The pass uses branch profile data to assign hotness based section qualifiers
// for the following types of static data:
// - Jump tables
+// - Module-internal global variables
// - Constant pools (TODO)
-// - Other module-internal data (TODO)
//
// For the original RFC of this pass please see
// https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
-#include "llvm/ADT/ScopeExit.h"
+#include "llvm/ADT/APInt.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/CodeGen/MBFIWrapper.h"
@@ -27,9 +27,12 @@
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineJumpTableInfo.h"
#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/Module.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"
+#include "llvm/Target/TargetLoweringObjectFile.h"
using namespace llvm;
@@ -46,12 +49,27 @@ class StaticDataSplitter : public MachineFunctionPass {
const MachineBlockFrequencyInfo *MBFI = nullptr;
const ProfileSummaryInfo *PSI = nullptr;
- // Returns true iff any jump table is hot-cold categorized.
- bool splitJumpTables(MachineFunction &MF);
+ void updateStats(bool ProfileAvailable, const MachineJumpTableInfo *MJTI);
+ void updateJumpTableStats(bool ProfileAvailable,
+ const MachineJumpTableInfo &MJTI);
- // Same as above but works on functions with profile information.
- bool splitJumpTablesWithProfiles(const MachineFunction &MF,
- MachineJumpTableInfo &MJTI);
+ // Use profiles to partition static data.
+ bool partitionStaticDataWithProfiles(MachineFunction &MF);
+
+ // If the global value is a local linkage global variable, return it.
+ // Otherwise, return nullptr.
+ const GlobalVariable *getLocalLinkageGlobalVariable(const GlobalValue *GV);
+
+ // Returns true if the global variable is in one of {.rodata, .bss, .data,
+ // .data.rel.ro} sections
+ bool inStaticDataSection(const GlobalVariable *GV, const TargetMachine &TM);
+
+ // Iterate all global variables in the module and update the section prefix
+ // of the module-internal data.
+ void updateGlobalVariableSectionPrefix(MachineFunction &MF);
+
+ // Accummulated data profile count across machine functions in the module.
+ DenseMap<const GlobalVariable *, APInt> DataProfileCounts;
public:
static char ID;
@@ -77,13 +95,24 @@ bool StaticDataSplitter::runOnMachineFunction(MachineFunction &MF) {
MBFI = &getAnalysis<MachineBlockFrequencyInfoWrapperPass>().getMBFI();
PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
- return splitJumpTables(MF);
+ const bool ProfileAvailable = PSI && PSI->hasProfileSummary() && MBFI &&
+ MF.getFunction().hasProfileData();
+ bool Changed = false;
+
+ if (ProfileAvailable)
+ Changed |= partitionStaticDataWithProfiles(MF);
+
+ updateGlobalVariableSectionPrefix(MF);
+ updateStats(ProfileAvailable, MF.getJumpTableInfo());
+ return Changed;
}
-bool StaticDataSplitter::splitJumpTablesWithProfiles(
- const MachineFunction &MF, MachineJumpTableInfo &MJTI) {
+bool StaticDataSplitter::partitionStaticDataWithProfiles(MachineFunction &MF) {
int NumChangedJumpTables = 0;
+ const TargetMachine &TM = MF.getTarget();
+ MachineJumpTableInfo *MJTI = MF.getJumpTableInfo();
+
// Jump table could be used by either terminating instructions or
// non-terminating ones, so we walk all instructions and use
// `MachineOperand::isJTI()` to identify jump table operands.
@@ -92,63 +121,131 @@ bool StaticDataSplitter::splitJumpTablesWithProfiles(
for (const auto &MBB : MF) {
for (const MachineInstr &I : MBB) {
for (const MachineOperand &Op : I.operands()) {
- if (!Op.isJTI())
- continue;
- const int JTI = Op.getIndex();
- // This is not a source block of jump table.
- if (JTI == -1)
+ std::optional<uint64_t> Count = std::nullopt;
+ if (!Op.isJTI() && !Op.isGlobal())
continue;
- auto Hotness = MachineFunctionDataHotness::Hot;
+ Count = MBFI->getBlockProfileCount(&MBB);
+
+ if (Op.isJTI()) {
+ assert(MJTI != nullptr && "Jump table info is not available.");
+ const int JTI = Op.getIndex();
+ // This is not a source block of jump table.
+ if (JTI == -1)
+ continue;
+
+ auto Hotness = MachineFunctionDataHotness::Hot;
+
+ // Hotness is based on source basic block hotness.
+ // TODO: PSI APIs are about instruction hotness. Introduce API for
+ // data access hotness.
+ if (Count && PSI->isColdCount(*Count))
+ Hotness = MachineFunctionDataHotness::Cold;
- // Hotness is based on source basic block hotness.
- // TODO: PSI APIs are about instruction hotness. Introduce API for data
- // access hotness.
- if (PSI->isColdBlock(&MBB, MBFI))
- Hotness = MachineFunctionDataHotness::Cold;
+ if (MJTI->updateJumpTableEntryHotness(JTI, Hotness))
+ ++NumChangedJumpTables;
+ } else if (Op.isGlobal()) {
+ // Find global variables with local linkage
+ const GlobalVariable *GV =
+ getLocalLinkageGlobalVariable(Op.getGlobal());
+ if (!GV || !inStaticDataSection(GV, TM))
+ continue;
- if (MJTI.updateJumpTableEntryHotness(JTI, Hotness))
- ++NumChangedJumpTables;
+ // Acccumulate data profile count across machine function
+ // instructions.
+ // TODO: Analyze global variable's initializers.
+ if (Count) {
+ auto [It, Inserted] =
+ DataProfileCounts.try_emplace(GV, APInt(128, 0));
+ It->second += *Count;
+ }
+ }
}
}
}
return NumChangedJumpTables > 0;
}
-bool StaticDataSplitter::splitJumpTables(MachineFunction &MF) {
- MachineJumpTableInfo *MJTI = MF.getJumpTableInfo();
- if (!MJTI || MJTI->getJumpTables().empty())
- return false;
-
- const bool ProfileAvailable = PSI && PSI->hasProfileSummary() && MBFI &&
- MF.getFunction().hasProfileData();
- auto statOnExit = llvm::make_scope_exit([&] {
- if (!AreStatisticsEnabled())
- return;
+void StaticDataSplitter::updateJumpTableStats(
+ bool ProfileAvailable, const MachineJumpTableInfo &MJTI) {
+ if (!ProfileAvailable) {
+ NumUnknownJumpTables += MJTI.getJumpTables().size();
+ return;
+ }
- if (!ProfileAvailable) {
- NumUnknownJumpTables += MJTI->getJumpTables().size();
- return;
+ for (size_t JTI = 0; JTI < MJTI.getJumpTables().size(); JTI++) {
+ auto Hotness = MJTI.getJumpTables()[JTI].Hotness;
+ if (Hotness == MachineFunctionDataHotness::Hot) {
+ ++NumHotJumpTables;
+ } else {
+ assert(Hotness == MachineFunctionDataHotness::Cold &&
+ "A jump table is either hot or cold when profile information is "
+ "available.");
+ ++NumColdJumpTables;
}
+ }
+}
- for (size_t JTI = 0; JTI < MJTI->getJumpTables().size(); JTI++) {
- auto Hotness = MJTI->getJumpTables()[JTI].Hotness;
- if (Hotness == MachineFunctionDataHotness::Hot) {
- ++NumHotJumpTables;
- } else {
- assert(Hotness == MachineFunctionDataHotness::Cold &&
- "A jump table is either hot or cold when profile information is "
- "available.");
- ++NumColdJumpTables;
- }
- }
- });
+void StaticDataSplitter::updateStats(bool ProfileAvailable,
+ const MachineJumpTableInfo *MJTI) {
+ if (!AreStatisticsEnabled())
+ return;
- // Place jump tables according to block hotness if function has profile data.
- if (ProfileAvailable)
- return splitJumpTablesWithProfiles(MF, *MJTI);
+ if (MJTI)
+ updateJumpTableStats(ProfileAvailable, *MJTI);
+}
- return true;
+const GlobalVariable *
+StaticDataSplitter::getLocalLinkageGlobalVariable(const GlobalValue *GV) {
+ if (!GV || GV->isDeclarationForLinker())
+ return nullptr;
+
+ return GV->hasLocalLinkage() ? dyn_cast<GlobalVariable>(GV) : nullptr;
+}
+
+bool StaticDataSplitter::inStaticDataSection(const GlobalVariable *GV,
+ const TargetMachine &TM) {
+ assert(GV && "Caller guaranteed");
+
+ // Skip LLVM reserved symbols.
+ if (GV->getName().starts_with("llvm."))
+ return false;
+
+ SectionKind Kind = TargetLoweringObjectFile::getKindForGlobal(GV, TM);
+ return Kind.isData() || Kind.isReadOnly() || Kind.isReadOnlyWithRel() ||
+ Kind.isBSS();
+}
+
+void StaticDataSplitter::updateGlobalVariableSectionPrefix(
+ MachineFunction &MF) {
+ for (GlobalVariable &GV : MF.getFunction().getParent()->globals()) {
+ if (GV.isDeclarationForLinker())
+ continue;
+ // DataProfileCounts accumulates data profile count across all machine
+ // function instructions, and it can't model the indirect accesses through
+ // other global variables' initializers.
+ // TODO: Analyze the users of module-internal global variables and see
+ // through the users' initializers. Do not place a global variable into
+ // unlikely section if any of its users are potentially hot.
+ auto Iter = DataProfileCounts.find(&GV);
+ if (Iter == DataProfileCounts.end())
+ continue;
+
+ // StaticDataSplitter is made a machine function pass rather than a module
+ // pass because (Lazy)MachineBlockFrequencyInfo is a machine-function
+ // analysis pass and cannot be used for a legacy module pass.
+ // As a result, we use `DataProfileCounts` to accumulate data
+ // profile count across machine functions and update global variable section
+ // prefix once per machine function.
+ // FIXME: Make StaticDataSplitter a module pass under new pass manager
+ // framework, and set global variable section prefix once per module after
+ // analyzing all machine functions.
+ if (PSI->isColdCount(Iter->second.getZExtValue())) {
+ GV.updateSectionPrefix("unlikely", std::make_optional(StringRef("hot")));
+ } else if (PSI->isHotCount(Iter->second.getZExtValue())) {
+ GV.updateSectionPrefix("hot");
+ }
+ }
}
char StaticDataSplitter::ID = 0;
diff --git a/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp b/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
index 3c2c7c8c9fed69a..d20ab29cc197974 100644
--- a/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
@@ -670,6 +670,7 @@ getELFSectionNameForGlobal(const GlobalObject *GO, SectionKind Kind,
}
bool HasPrefix = false;
+
if (const auto *F = dyn_cast<Function>(GO)) {
// Jump table hotness takes precedence over its enclosing function's hotness
// if it's known. The function's section prefix is used if jump table entry
@@ -687,6 +688,11 @@ getELFSectionNameForGlobal(const GlobalObject *GO, SectionKind Kind,
raw_svector_ostream(Name) << '.' << *Prefix;
HasPrefix = true;
}
+ } else if (const auto *GV = dyn_cast<GlobalVariable>(GO)) {
+ if (std::optional<StringRef> Prefix = GV->getSectionPrefix()) {
+ raw_svector_ostream(Name) << '.' << *Prefix;
+ HasPrefix = true;
+ }
}
if (UniqueSectionName) {
diff --git a/llvm/lib/IR/Function.cpp b/llvm/lib/IR/Function.cpp
index e6f0d64d071ba67..5666f0a53866fda 100644
--- a/llvm/lib/IR/Function.cpp
+++ b/llvm/lib/IR/Function.cpp
@@ -1164,22 +1164,6 @@ DenseSet<GlobalValue::GUID> Function::getImportGUIDs() const {
return R;
}
-void Function::setSectionPrefix(StringRef Prefix) {
- MDBuilder MDB(getContext());
- setMetadata(LLVMContext::MD_section_prefix,
- MDB.createFunctionSectionPrefix(Prefix));
-}
-
-std::optional<StringRef> Function::getSectionPrefix() const {
- if (MDNode *MD = getMetadata(LLVMContext::MD_section_prefix)) {
- assert(cast<MDString>(MD->getOperand(0))->getString() ==
- "function_section_prefix" &&
- "Metadata not match");
- return cast<MDString>(MD->getOperand(1))->getString();
- }
- return std::nullopt;
-}
-
bool Function::nullPointerIsDefined() const {
return hasFnAttribute(Attribute::NullPointerIsValid);
}
diff --git a/llvm/lib/IR/Globals.cpp b/llvm/lib/IR/Globals.cpp
index db5e1cb57b1bab8..884089262e4659d 100644
--- a/llvm/lib/IR/Globals.cpp
+++ b/llvm/lib/IR/Globals.cpp
@@ -18,6 +18,7 @@
#include "llvm/IR/GlobalAlias.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/ErrorHandling.h"
@@ -286,6 +287,35 @@ void GlobalObject::setSection(StringRef S) {
setGlobalObjectFlag(HasSectionHashEntryBit, !S.empty());
}
+void GlobalObject::setSectionPrefix(StringRef Prefix) {
+ MDBuilder MDB(getContext());
+ setMetadata(LLVMContext::MD_section_prefix,
+ MDB.createGlobalObjectSectionPrefix(Prefix));
+}
+
+void GlobalObject::updateSectionPrefix(StringRef Prefix,
+ std::optional<StringRef> KeepPrefix) {
+ auto SectionPrefix = getSectionPrefix();
+ if (SectionPrefix && (*SectionPrefix == Prefix ||
+ (KeepPrefix && *SectionPrefix == *KeepPrefix)))
+ return;
+
+ setSectionPrefix(Prefix);
+ return;
+}
+
+std::optional<StringRef> GlobalObject::getSectionPrefix() const {
+ if (MDNode *MD = getMetadata(LLVMContext::MD_section_prefix)) {
+ [[maybe_unused]] StringRef MDName =
+ cast<MDString>(MD->getOperand(0))->getString();
+ assert((MDName == "section_prefix" ||
+ (isa<Function>(this) && MDName == "function_section_prefix")) &&
+ "Metadata not match");
+ return cast<MDString>(MD->getOperand(1))->getString();
+ }
+ return std::nullopt;
+}
+
bool GlobalValue::isNobuiltinFnDef() const {
const Function *F = dyn_cast<Function>(this);
if (!F || F->empty())
diff --git a/llvm/lib/IR/MDBuilder.cpp b/llvm/lib/IR/MDBuilder.cpp
index 26c8ab9fc36c850..b6aa8844a7eafa7 100644
--- a/llvm/lib/IR/MDBuilder.cpp
+++ b/llvm/lib/IR/MDBuilder.cpp
@@ -87,9 +87,9 @@ MDNode *MDBuilder::createFunctionEntryCount(
return MDNode::get(Context, Ops);
}
-MDNode *MDBuilder::createFunctionSectionPrefix(StringRef Prefix) {
- return MDNode::get(
- Context, {createString("function_section_prefix"), createString(Prefix)});
+MDNode *MDBuilder::createGlobalObjectSectionPrefix(StringRef Prefix) {
+ return MDNode::get(Context,
+ {createString("section_prefix"), createString(Prefix)});
}
MDNode *MDBuilder::createRange(const APInt &Lo, const APInt &Hi) {
diff --git a/llvm/test/CodeGen/X86/data-section-prefix.ll b/llvm/test/CodeGen/X86/data-section-prefix.ll
new file mode 100644
index 000000000000000..4812fc70758fbce
--- /dev/null
+++ b/llvm/test/CodeGen/X86/data-section-prefix.ll
@@ -0,0 +1,27 @@
+; RUN: llc -mtriple x86_64-linux-gnu -data-sections %s -o - | FileCheck %s --check-prefix=ELF
+; RUN: llc -mtriple x86_64-linux-gnu -unique-section-names=0 -data-sections %s -o - | FileCheck %s --check-prefix=ELF-NOUNIQ
+
+; RUN: llc -mtriple x86_64-windows-msvc -data-sections %s -o - | FileCheck %s --check-prefix=COFF-MSVC
+
+; ELF: .section .data.hot.foo,
+; ELF: .section .data.bar,
+; ELF: .section .bss.unlikely.baz,
+; ELF: .section .bss.quz,
+
+; ELF-NOUNIQ: .section .data.hot.,"aw",@progbits,unique,1
+; ELF-NOUNIQ: .section .data,"aw",@progbits,unique,2
+; ELF-NOUNIQ: .section .bss.unlikely.,"aw",@nobits,unique,3
+; ELF-NOUNIQ: .section .bss,"aw",@nobits,unique,4
+
+; COFF-MSVC: .section .data,"dw",one_only,foo
+; COFF-MSVC: .section .data,"dw",one_only,bar
+; COFF-MSVC: .section .bss,"bw",one_only,baz
+; COFF-MSVC: .section .bss,"bw",one_only,quz
+
+@foo = global i32 1, !section_prefix !0
+@bar = global i32 2
+@baz = global i32 0, !section_prefix !1
+@quz = global i32 0
+
+!0 = !{!"section_prefix", !"hot"}
+!1 = !{!"section_prefix", !"unlikely"}
diff --git a/llvm/test/CodeGen/X86/global-variable-partition.ll b/llvm/test/CodeGen/X86/global-variable-partition.ll
new file mode 100644
index 000000000000000..bb77f3362406bdc
--- /dev/null
+++ b/llvm/test/CodeGen/X86/global-variable-partition.ll
@@ -0,0 +1,165 @@
+
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -enable-split-machine-functions \
+; RUN: -partition-static-data-sections=true -data-sections=true \
+; RUN: -unique-section-names=true -relocation-model=pic \
+; RUN: %s -o - 2>&1 | FileCheck %s --check-prefixes=SYM,DATA
+
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -enable-split-machine-functions \
+; RUN: -partition-static-data-sections=true -data-sections=true \
+; RUN: -unique-section-names=false -relocation-model=pic \
+; RUN: %s -o - 2>&1 | FileCheck %s --check-prefixes=UNIQ,DATA
+
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -enable-split-machine-functions \
+; RUN: -partition-static-data-sections=true -data-sections=false \
+; RUN: -unique-section-names=false -relocation-model=pic \
+; RUN: %s -o - 2>&1 | FileCheck %s --check-prefixes=AGG,DATA
+
+; SYM: .section .rodata.str1.1.hot.
+; UNIQ: .section .rodata.str1.1.hot.,"aMS",@progbits,1
+; AGG: .section .rodata.str1.1.hot
+; DATA: .L.str
+; DATA: "hot\t"
+; DATA: .L.str.1
+; DATA: "%d\t%d\t%d\n"
+
+
+; SYM: .sect...
[truncated]
|
cc @Colibrow |
This is a split of #125756
Is there a reason for using |
std::optional<uint64_t> Count = std::nullopt; | ||
if (!Op.isJTI() && !Op.isGlobal()) | ||
continue; | ||
|
||
auto Hotness = MachineFunctionDataHotness::Hot; | ||
Count = MBFI->getBlockProfileCount(&MBB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::optional<uint64_t> Count = std::nullopt; | |
if (!Op.isJTI() && !Op.isGlobal()) | |
continue; | |
auto Hotness = MachineFunctionDataHotness::Hot; | |
Count = MBFI->getBlockProfileCount(&MBB); | |
if (!Op.isJTI() && !Op.isGlobal()) | |
continue; | |
std::optional<uint64_t> Count = MBFI->getBlockProfileCount(&MBB); |
is that OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the change is ok.
I kept getting "Applying suggestions on deleted lines is currently not supported." error when trying to commit the suggestion. I'll make the change in the local client..
@foo = global i32 1, !section_prefix !0 | ||
@bar = global i32 2 | ||
@baz = global i32 0, !section_prefix !1 | ||
@quz = global i32 0 | ||
|
||
!0 = !{!"section_prefix", !"hot"} | ||
!1 = !{!"section_prefix", !"unlikely"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if data_section_prefix would make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function and global variables are derived classes of GlobalObject, and currently function section prefix metadata has function_section_prefix
.
To me, data_section_prefix
carries the semantic that a global variable/object is a piece of data, which may become inaccurate. Alternative to setting global object's section prefix, I can keep function's methods as they are in TOT, and implements {set,get,update}SectionPrefix
for GlobalVariables (not its superclass GlobalObject) and name it global_variable_section_prefix
. How does that sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. I get it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for using .unlikely as a prefix rather than .cold? .cold seems more obvious as an opposite of .hot and we use it elsewhere in LLVM, most notably hot-cold splitting.
@petrhosek Thanks for the question. unlikely
is used to follow the conventions of functions in the codegenprepare pass. I'm open to name changes.
If identical section names (as string literals) are not de-duplicated in the relocatable object files (e.g., when compiled with -data-sections=true -unique-section-names=false
), cold
will give smaller section names than unlikely
per data section where this matters.
std::optional<uint64_t> Count = std::nullopt; | ||
if (!Op.isJTI() && !Op.isGlobal()) | ||
continue; | ||
|
||
auto Hotness = MachineFunctionDataHotness::Hot; | ||
Count = MBFI->getBlockProfileCount(&MBB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the change is ok.
I kept getting "Applying suggestions on deleted lines is currently not supported." error when trying to commit the suggestion. I'll make the change in the local client..
@foo = global i32 1, !section_prefix !0 | ||
@bar = global i32 2 | ||
@baz = global i32 0, !section_prefix !1 | ||
@quz = global i32 0 | ||
|
||
!0 = !{!"section_prefix", !"hot"} | ||
!1 = !{!"section_prefix", !"unlikely"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function and global variables are derived classes of GlobalObject, and currently function section prefix metadata has function_section_prefix
.
To me, data_section_prefix
carries the semantic that a global variable/object is a piece of data, which may become inaccurate. Alternative to setting global object's section prefix, I can keep function's methods as they are in TOT, and implements {set,get,update}SectionPrefix
for GlobalVariables (not its superclass GlobalObject) and name it global_variable_section_prefix
. How does that sound?
I believe the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just some minor comments.
assert(GV && "Caller guaranteed"); | ||
|
||
// Skip LLVM reserved symbols. | ||
if (GV->getName().starts_with("llvm.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please explain why need to skip LLVM reserved symbols?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. The globals with llvm.
as a name prefix are usually handled specially, by many (middle-end and back-end) passes (e.g., AsmPrinter emits special LLVM values specially), and they are skipped in this pass mostly out of conservativeness.
I moved this check before calling inStaticDataSection
helper though for readability, and added a comment around the check. What do you think about this?
// framework, and set global variable section prefix once per module after | ||
// analyzing all machine functions. | ||
if (PSI->isColdCount(Iter->second.getZExtValue())) { | ||
Changed |= GV.updateSectionPrefix("unlikely", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not prefix "split" instead of "unlikely" to align with "MachineFunctionSplitter"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that functions have .hot
or .unlikely
prefixes when they are placed as a whole (in the codegenprepare pass), and have .split
as a prefix when basic blocks of a function are further categorized into hot and non-hot ones (in the MFS pass you mentioned).
Here the granularity is a piece of global variable, and .split
would be a good name if a global variable itself is split into hot and cold ones. For instance, there should be hot and cold functions in a vtable, but it'd take substantial work to implement an ABI to support the split vtable entries.
} | ||
} | ||
} | ||
} | ||
return NumChangedJumpTables > 0; | ||
} | ||
|
||
const GlobalVariable * | ||
StaticDataSplitter::getLocalLinkageGlobalVariable(const GlobalValue *GV) { | ||
if (!GV || GV->isDeclarationForLinker()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can "GV->hasLocalLinkage()" cover "GV->isDeclarationForLinker()"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I haven't thought about it before.
It turns out IR verifier requires a declaration to have one of 'valid' decl linkages, and valid decl linkages must be at least external according to isValidDeclarationLinkage.
I removed GV->isDeclarationForLinker()
here and added a comment.
llvm/include/llvm/IR/GlobalObject.h
Outdated
|
||
/// Update the section prefix, unless the existing prefix is the same as | ||
/// `KeepPrefix`. | ||
bool updateSectionPrefix(StringRef Prefix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to drop this after merging #125757.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
It occurred to me that counters will increase monotonically (before it becomes max), so I asserted a global variable should not have .hot
prefix under if (PSI->isColdCount(Iter->second)) {
around StaticDataSplitter.cpp L 224.
This is a split of llvm#125756
…he base class of {Function, GlobalVariable, IFunc} (llvm#125757) This is a split of llvm#125756
* In StaticDataProfileInfo.h/cpp, add an immutable pass to keep track of constants and their profile information across functions in a module. * Add a module pass, StaticDataAnnotator, to set global variable's section prefix based on module-wide hotness.
Hi folks, just a quick note for this change when you have a moment :) |
@@ -0,0 +1,68 @@ | |||
#ifndef LLVM_ANALYSIS_STATICDATAPROFILEINFO_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is reasonable to put this definition into ProfileSummaryInfo.h, as it is about summary/aggregation of const profiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I don't feel strong about having new classes in ProfileSummaryInfo.h
or in new files. Actually I once thought about the former and hesitated between the two.
I ended up with the latter mainly because PSI is included in multiple cpp files, and the new classes don't need to be included as much.
As the static data partitioning work is under development and symbolized data access profiles may need helper functions and libraries, I think we can keep it this way and do refactors later where they fit. What do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
return; | ||
} | ||
uint64_t &OriginalCount = ConstantProfileCounts[C]; | ||
OriginalCount += llvm::SaturatingAdd(*Count, OriginalCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should += be just =?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the catch! done.
void addConstantProfileCount(const Constant *C, | ||
std::optional<uint64_t> Count); | ||
|
||
/// If \p C has a count, return it. Otherwise, return std::nullopt. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it returns nullopt, it does not mean the constant has unknown profile count -- it can just mean the profile count has not been set/aggregate for the constant. When the analysis phase is done, returning nullopt seems the same has 'hasUnknownCount()' below. Perhaps clarify in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the point that the difference between hasUnknownCount
and getConstantProfileCount
are subtle. I thought about this more.
Basically, StaticDataAnnotator uses both PSI and this class to decide the section prefix of a constant, so an alternative is (for this class) to provide getConstantSectionPrefix
directly. The updated change implements the alternative option, and makes getConstantProfileCount
to a private class method.
|
||
// Skip global variables without profile counts. The module may not be | ||
// profiled or instrumented. | ||
auto Count = SDPI->getConstantProfileCount(&GV); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, will hasUnknownCount GVs return nullopt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the original code, the semantic of getConstantProfileCount
and hasUnknownCount
are subtle on the different combinations (i.e., whether a var is used in hot/cold/lukewarm/unprofiled functions, etc)
The updated code will query a section prefix from StaticDataProfileInfo
directly there. PTAL.
// prefix isn't hot already. | ||
GV.setSectionPrefix("hot"); | ||
Changed = true; | ||
} else if (PSI->isColdCount(*Count) && !SDPI->hasUnknownCount(&GV) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the hasUnknownCount check needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this question is similar to https://github.com/llvm/llvm-project/pull/125756/files#r2017471165 and https://github.com/llvm/llvm-project/pull/125756/files#r2017466522, and being discussed there.
@@ -0,0 +1,68 @@ | |||
#ifndef LLVM_ANALYSIS_STATICDATAPROFILEINFO_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
continue; | ||
|
||
StringRef SectionPrefix = SDPI->getConstantSectionPrefix(&GV, PSI); | ||
if (SectionPrefix.empty() || alreadyHasSectionPrefix(GV, SectionPrefix)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what situation can the constant already have a section prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang provides a section attribute which can be attached to globals. This is commonly used in embedded applications as far as I know.
https://clang.llvm.org/docs/AttributeReference.html#section-declspec-allocate
@mingmingl-llvm I don't think we have a test which covers this condition behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what situation can the constant already have a section prefix?
Good question. A search of GlobalValue::setSectionPrefix
in the LLVM repo shows that it's only used for functions but not global variables, so the alreadyHasSectionPrefix
should return always false currently. To be a little bit future proof, updated the patch to report error if a global variable already has a section prefix, and commented that the subsequent code doesn't handle 'update' properly.
clang provides a section attribute which can be attached to globals. This is commonly used in embedded applications as far as I know.
Thanks for pointing this out.
For test coverage purposes, the test case was updated with @cold_data_custom_foo_section
and @hot_data_custom_bar_section
; the former uses clang's section attribute, and the latter uses #pragma clang section .
Taking a look at [1], such global variables [1] will have explicit sections, and this work (hot-cold partitioning) won't kick in. Per offline discussion, it's desired for compiler to keep source code annotations without partitioning.
[1]
llvm-project/llvm/lib/Target/TargetLoweringObjectFile.cpp
Lines 316 to 331 in 91d2ecf
/// This method computes the appropriate section to emit the specified global | |
/// variable or function definition. This should not be passed external (or | |
/// available externally) globals. | |
MCSection *TargetLoweringObjectFile::SectionForGlobal( | |
const GlobalObject *GO, SectionKind Kind, const TargetMachine &TM) const { | |
// Select section name. | |
if (GO->hasSection()) | |
return getExplicitSectionGlobal(GO, Kind, TM); | |
if (auto *GVar = dyn_cast<GlobalVariable>(GO)) { | |
auto Attrs = GVar->getAttributes(); | |
if ((Attrs.hasAttribute("bss-section") && Kind.isBSS()) || | |
(Attrs.hasAttribute("data-section") && Kind.isData()) || | |
(Attrs.hasAttribute("relro-section") && Kind.isReadOnlyWithRel()) || | |
(Attrs.hasAttribute("rodata-section") && Kind.isReadOnly())) { | |
return getExplicitSectionGlobal(GO, Kind, TM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
continue; | ||
|
||
StringRef SectionPrefix = SDPI->getConstantSectionPrefix(&GV, PSI); | ||
if (SectionPrefix.empty() || alreadyHasSectionPrefix(GV, SectionPrefix)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang provides a section attribute which can be attached to globals. This is commonly used in embedded applications as far as I know.
https://clang.llvm.org/docs/AttributeReference.html#section-declspec-allocate
@mingmingl-llvm I don't think we have a test which covers this condition behaviour?
|
||
; This RUN command sets `-data-sections=true -unique-section-names=true` so data | ||
; sections are uniqufied by numbers. | ||
; RUN: llc -mtriple=x86_64-unknown-linux-gnu -enable-split-machine-functions \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to update this test too once we remove the dependence on -enable-split-machine-functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
continue; | ||
|
||
StringRef SectionPrefix = SDPI->getConstantSectionPrefix(&GV, PSI); | ||
if (SectionPrefix.empty() || alreadyHasSectionPrefix(GV, SectionPrefix)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM.
The presubmit failure in |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/17/builds/6850 Here is the relevant piece of the build log for the reference
|
) This is a follow-up patch of #125756 In this PR, static-data-splitter pass produces the aggregated profile counts of constants for constant pools in a global state (`StateDataProfileInfo`), and asm printer consumes the profile counts to produce `.hot` or `.unlikely` prefixes. This implementation covers both x86 and aarch64 asm printer.
…oning (#129781) This is a follow-up patch of llvm/llvm-project#125756 In this PR, static-data-splitter pass produces the aggregated profile counts of constants for constant pools in a global state (`StateDataProfileInfo`), and asm printer consumes the profile counts to produce `.hot` or `.unlikely` prefixes. This implementation covers both x86 and aarch64 asm printer.
…#129781) This is a follow-up patch of llvm#125756 In this PR, static-data-splitter pass produces the aggregated profile counts of constants for constant pools in a global state (`StateDataProfileInfo`), and asm printer consumes the profile counts to produce `.hot` or `.unlikely` prefixes. This implementation covers both x86 and aarch64 asm printer.
In this PR, static-data-splitter pass finds out the local-linkage global variables in {
.rodata
,.data.rel.ro
,bss
,.data
} sections by analyzing machine instruction operands, and aggregates their accesses from code across functions.A follow-up item is to analyze global variable initializers and count for access from data.
bss2
anddata3
inllvm/test/CodeGen/X86/global-variable-partition.ll
.Some stats of static-data-splitter with this patch:
#perf-sample-in-hot-prefixed <data> section / #perf-sample in <data.*> section
for each section.MEM_INST_RETIRED.ALL_LOADS:u:pinned:precise=2
events at a high frequency (perf -c 2251
) for 30 seconds. The profiled binary is built as non-PIE sodata.rel.ro
coverage data is not available.<data>
section size percentage is defined asunlikely <data> section size / the sum size of <data>.* sections
for each<data>
section