-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[CGData][GMF] Skip No Params #116548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CGData][GMF] Skip No Params #116548
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,7 @@ | |
//===----------------------------------------------------------------------===// | ||
|
||
#include "llvm/CGData/StableFunctionMap.h" | ||
#include "llvm/ADT/SmallSet.h" | ||
#include "llvm/Support/CommandLine.h" | ||
#include "llvm/Support/Debug.h" | ||
|
||
|
@@ -35,21 +36,30 @@ static cl::opt<unsigned> GlobalMergingMaxParams( | |
cl::desc( | ||
"The maximum number of parameters allowed when merging functions."), | ||
cl::init(std::numeric_limits<unsigned>::max()), cl::Hidden); | ||
static cl::opt<unsigned> GlobalMergingParamOverhead( | ||
static cl::opt<bool> GlobalMergingSkipNoParams( | ||
"global-merging-skip-no-params", | ||
cl::desc("Skip merging functions with no parameters."), cl::init(true), | ||
cl::Hidden); | ||
static cl::opt<double> GlobalMergingInstOverhead( | ||
"global-merging-inst-overhead", | ||
cl::desc("The overhead cost associated with each instruction when lowering " | ||
"to machine instruction."), | ||
cl::init(1.2), cl::Hidden); | ||
static cl::opt<double> GlobalMergingParamOverhead( | ||
"global-merging-param-overhead", | ||
cl::desc("The overhead cost associated with each parameter when merging " | ||
"functions."), | ||
cl::init(2), cl::Hidden); | ||
static cl::opt<unsigned> | ||
cl::init(2.0), cl::Hidden); | ||
static cl::opt<double> | ||
GlobalMergingCallOverhead("global-merging-call-overhead", | ||
cl::desc("The overhead cost associated with each " | ||
"function call when merging functions."), | ||
cl::init(1), cl::Hidden); | ||
static cl::opt<unsigned> GlobalMergingExtraThreshold( | ||
cl::init(1.0), cl::Hidden); | ||
static cl::opt<double> GlobalMergingExtraThreshold( | ||
"global-merging-extra-threshold", | ||
cl::desc("An additional cost threshold that must be exceeded for merging " | ||
"to be considered beneficial."), | ||
cl::init(0), cl::Hidden); | ||
cl::init(0.0), cl::Hidden); | ||
|
||
unsigned StableFunctionMap::getIdOrCreateForName(StringRef Name) { | ||
auto It = NameToId.find(Name); | ||
|
@@ -160,21 +170,32 @@ static bool isProfitable( | |
if (InstCount < GlobalMergingMinInstrs) | ||
return false; | ||
|
||
unsigned ParamCount = SFS[0]->IndexOperandHashMap->size(); | ||
if (ParamCount > GlobalMergingMaxParams) | ||
return false; | ||
|
||
unsigned Benefit = InstCount * (StableFunctionCount - 1); | ||
unsigned Cost = | ||
(GlobalMergingParamOverhead * ParamCount + GlobalMergingCallOverhead) * | ||
StableFunctionCount + | ||
GlobalMergingExtraThreshold; | ||
double Cost = 0.0; | ||
SmallSet<stable_hash, 8> UniqueHashVals; | ||
for (auto &SF : SFS) { | ||
UniqueHashVals.clear(); | ||
for (auto &[IndexPair, Hash] : *SF->IndexOperandHashMap) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not aware that the unique values could differ, but it is still nice to count them precisely. This fix alone brings ~1% code size reduction on one of the projects I have tested. |
||
UniqueHashVals.insert(Hash); | ||
unsigned ParamCount = UniqueHashVals.size(); | ||
if (ParamCount > GlobalMergingMaxParams) | ||
return false; | ||
Comment on lines
+180
to
+181
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I understand if, if a group of mergable functions have too many params, we can't merge any of them. Could we eliminate functions until the params are small enough and only merge a subset of the group? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correct. To compute the actual parameters, we need to iterate over the set of candidates with the same hash. By reducing these candidates, we might obtain a smaller set of parameters. However, this adds complexity to repeat these steps until we find a smaller set. In addition, since the overall function hash still matches, we may create unprofitable merging instances that the linker does not fold. I would maintain the current approach for simplicity. |
||
// Theoretically, if ParamCount is 0, it results in identical code folding | ||
// (ICF), which we can skip merging here since the linker already handles | ||
// ICF. This pass would otherwise introduce unnecessary thunks that are | ||
// merely direct jumps. However, enabling this could be beneficial depending | ||
// on downstream passes, so we provide an option for it. | ||
if (GlobalMergingSkipNoParams && ParamCount == 0) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this pass acts as an IR-level ICF when There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was also thinking this. To be explicit, does ICF get the same size win as when There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No, this might actually increase the size since it always creates redundant thunks. However, depending on downstream passes or an improved linker's ICF, the outcome could be different. I have now set the flag to |
||
return false; | ||
Cost += ParamCount * GlobalMergingParamOverhead + GlobalMergingCallOverhead; | ||
} | ||
Cost += GlobalMergingExtraThreshold; | ||
|
||
double Benefit = | ||
InstCount * (StableFunctionCount - 1) * GlobalMergingInstOverhead; | ||
bool Result = Benefit > Cost; | ||
LLVM_DEBUG(dbgs() << "isProfitable: Hash = " << SFS[0]->Hash << ", " | ||
<< "StableFunctionCount = " << StableFunctionCount | ||
<< ", InstCount = " << InstCount | ||
<< ", ParamCount = " << ParamCount | ||
<< ", Benefit = " << Benefit << ", Cost = " << Cost | ||
<< ", Result = " << (Result ? "true" : "false") << "\n"); | ||
return Result; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
; This test verifies whether two identical functions, f1 and f2, can be merged | ||
; locally using the global merge function. | ||
; The functions, f1.Tgm and f2.Tgm, will be folded by the linker through | ||
; Identical Code Folding (ICF). | ||
; While identical functions can already be folded by the linker, creating this | ||
; canonical form can be beneficial in downstream passes. This merging process | ||
; can be controlled by the -global-merging-skip-no-params option. | ||
|
||
; RUN: llc -enable-global-merge-func=true -global-merging-skip-no-params=false < %s | FileCheck %s --check-prefix=MERGE | ||
; RUN: llc -enable-global-merge-func=true -global-merging-skip-no-params=true < %s | FileCheck %s --implicit-check-not=".Tgm" | ||
|
||
; MERGE: _f1.Tgm | ||
; MERGE: _f2.Tgm | ||
|
||
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" | ||
target triple = "arm64-unknown-ios12.0.0" | ||
|
||
@g = external local_unnamed_addr global [0 x i32], align 4 | ||
@g1 = external global i32, align 4 | ||
@g2 = external global i32, align 4 | ||
|
||
define i32 @f1(i32 %a) { | ||
entry: | ||
%idxprom = sext i32 %a to i64 | ||
%arrayidx = getelementptr inbounds [0 x i32], [0 x i32]* @g, i64 0, i64 %idxprom | ||
%0 = load i32, i32* %arrayidx, align 4 | ||
%1 = load volatile i32, i32* @g1, align 4 | ||
%mul = mul nsw i32 %1, %0 | ||
%add = add nsw i32 %mul, 1 | ||
ret i32 %add | ||
} | ||
|
||
define i32 @f2(i32 %a) { | ||
entry: | ||
%idxprom = sext i32 %a to i64 | ||
%arrayidx = getelementptr inbounds [0 x i32], [0 x i32]* @g, i64 0, i64 %idxprom | ||
%0 = load i32, i32* %arrayidx, align 4 | ||
%1 = load volatile i32, i32* @g1, align 4 | ||
%mul = mul nsw i32 %1, %0 | ||
%add = add nsw i32 %mul, 1 | ||
ret i32 %add | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: includes not sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually sorted correctly. The main header file (StableFunctionMap.h) should come first, corresponding to this file, StableFunctionMap.cpp. https://llvm.org/docs/CodingStandards.html#header-files