Skip to content

[CGData][GMF] Skip No Params #116548

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 37 additions & 16 deletions llvm/lib/CGData/StableFunctionMap.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
//===----------------------------------------------------------------------===//

#include "llvm/CGData/StableFunctionMap.h"
#include "llvm/ADT/SmallSet.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: includes not sorted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually sorted correctly. The main header file (StableFunctionMap.h) should come first, corresponding to this file, StableFunctionMap.cpp. https://llvm.org/docs/CodingStandards.html#header-files

#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"

Expand All @@ -35,21 +36,30 @@ static cl::opt<unsigned> GlobalMergingMaxParams(
cl::desc(
"The maximum number of parameters allowed when merging functions."),
cl::init(std::numeric_limits<unsigned>::max()), cl::Hidden);
static cl::opt<unsigned> GlobalMergingParamOverhead(
static cl::opt<bool> GlobalMergingSkipNoParams(
"global-merging-skip-no-params",
cl::desc("Skip merging functions with no parameters."), cl::init(true),
cl::Hidden);
static cl::opt<double> GlobalMergingInstOverhead(
"global-merging-inst-overhead",
cl::desc("The overhead cost associated with each instruction when lowering "
"to machine instruction."),
cl::init(1.2), cl::Hidden);
static cl::opt<double> GlobalMergingParamOverhead(
"global-merging-param-overhead",
cl::desc("The overhead cost associated with each parameter when merging "
"functions."),
cl::init(2), cl::Hidden);
static cl::opt<unsigned>
cl::init(2.0), cl::Hidden);
static cl::opt<double>
GlobalMergingCallOverhead("global-merging-call-overhead",
cl::desc("The overhead cost associated with each "
"function call when merging functions."),
cl::init(1), cl::Hidden);
static cl::opt<unsigned> GlobalMergingExtraThreshold(
cl::init(1.0), cl::Hidden);
static cl::opt<double> GlobalMergingExtraThreshold(
"global-merging-extra-threshold",
cl::desc("An additional cost threshold that must be exceeded for merging "
"to be considered beneficial."),
cl::init(0), cl::Hidden);
cl::init(0.0), cl::Hidden);

unsigned StableFunctionMap::getIdOrCreateForName(StringRef Name) {
auto It = NameToId.find(Name);
Expand Down Expand Up @@ -160,21 +170,32 @@ static bool isProfitable(
if (InstCount < GlobalMergingMinInstrs)
return false;

unsigned ParamCount = SFS[0]->IndexOperandHashMap->size();
if (ParamCount > GlobalMergingMaxParams)
return false;

unsigned Benefit = InstCount * (StableFunctionCount - 1);
unsigned Cost =
(GlobalMergingParamOverhead * ParamCount + GlobalMergingCallOverhead) *
StableFunctionCount +
GlobalMergingExtraThreshold;
double Cost = 0.0;
SmallSet<stable_hash, 8> UniqueHashVals;
for (auto &SF : SFS) {
UniqueHashVals.clear();
for (auto &[IndexPair, Hash] : *SF->IndexOperandHashMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware that the unique values could differ, but it is still nice to count them precisely. This fix alone brings ~1% code size reduction on one of the projects I have tested.

UniqueHashVals.insert(Hash);
unsigned ParamCount = UniqueHashVals.size();
if (ParamCount > GlobalMergingMaxParams)
return false;
Comment on lines +180 to +181
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand if, if a group of mergable functions have too many params, we can't merge any of them. Could we eliminate functions until the params are small enough and only merge a subset of the group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. To compute the actual parameters, we need to iterate over the set of candidates with the same hash. By reducing these candidates, we might obtain a smaller set of parameters. However, this adds complexity to repeat these steps until we find a smaller set. In addition, since the overall function hash still matches, we may create unprofitable merging instances that the linker does not fold. I would maintain the current approach for simplicity.

// Theoretically, if ParamCount is 0, it results in identical code folding
// (ICF), which we can skip merging here since the linker already handles
// ICF. This pass would otherwise introduce unnecessary thunks that are
// merely direct jumps. However, enabling this could be beneficial depending
// on downstream passes, so we provide an option for it.
if (GlobalMergingSkipNoParams && ParamCount == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this pass acts as an IR-level ICF when ParamCount == 0? If your testing result shows it is beneficial then I have no problems with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking this. To be explicit, does ICF get the same size win as when GlobalMergingMaxParams = 0? If so I think it would be worth saying this in the flag description to help users understand the pass a bit better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be explicit, does ICF get the same size win as when GlobalMergingMaxParams = 0?

No, this might actually increase the size since it always creates redundant thunks. However, depending on downstream passes or an improved linker's ICF, the outcome could be different. I have now set the flag to true to skip this ICF case, but an option remains available to enable it if needed.

return false;
Cost += ParamCount * GlobalMergingParamOverhead + GlobalMergingCallOverhead;
}
Cost += GlobalMergingExtraThreshold;

double Benefit =
InstCount * (StableFunctionCount - 1) * GlobalMergingInstOverhead;
bool Result = Benefit > Cost;
LLVM_DEBUG(dbgs() << "isProfitable: Hash = " << SFS[0]->Hash << ", "
<< "StableFunctionCount = " << StableFunctionCount
<< ", InstCount = " << InstCount
<< ", ParamCount = " << ParamCount
<< ", Benefit = " << Benefit << ", Cost = " << Cost
<< ", Result = " << (Result ? "true" : "false") << "\n");
return Result;
Expand Down
11 changes: 6 additions & 5 deletions llvm/lib/CodeGen/GlobalMergeFunctions.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -405,12 +405,13 @@ static ParamLocsVecTy computeParamInfo(
}

ParamLocsVecTy ParamLocsVec;
for (auto &[HashSeq, Locs] : HashSeqToLocs) {
for (auto &[HashSeq, Locs] : HashSeqToLocs)
ParamLocsVec.push_back(std::move(Locs));
llvm::sort(ParamLocsVec, [&](const ParamLocs &L, const ParamLocs &R) {
return L[0] < R[0];
});
}

llvm::sort(ParamLocsVec, [&](const ParamLocs &L, const ParamLocs &R) {
return L[0] < R[0];
});

return ParamLocsVec;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
; while parameterizing a difference in their global variables, g1 and g2.
; To achieve this, we create two instances of the global merging function, f1.Tgm and f2.Tgm,
; which are tail-called from thunks f1 and f2 respectively.
; These identical functions, f1.Tgm and f2.Tgm, will be folded by the linker via Identical Code Folding (IFC).
; These identical functions, f1.Tgm and f2.Tgm, will be folded by the linker via Identical Code Folding (ICF).

; RUN: opt -S --passes=global-merge-func %s | FileCheck %s

Expand Down
42 changes: 42 additions & 0 deletions llvm/test/CodeGen/Generic/cgdata-merge-no-params.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
; This test verifies whether two identical functions, f1 and f2, can be merged
; locally using the global merge function.
; The functions, f1.Tgm and f2.Tgm, will be folded by the linker through
; Identical Code Folding (ICF).
; While identical functions can already be folded by the linker, creating this
; canonical form can be beneficial in downstream passes. This merging process
; can be controlled by the -global-merging-skip-no-params option.

; RUN: llc -enable-global-merge-func=true -global-merging-skip-no-params=false < %s | FileCheck %s --check-prefix=MERGE
; RUN: llc -enable-global-merge-func=true -global-merging-skip-no-params=true < %s | FileCheck %s --implicit-check-not=".Tgm"

; MERGE: _f1.Tgm
; MERGE: _f2.Tgm

target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-unknown-ios12.0.0"

@g = external local_unnamed_addr global [0 x i32], align 4
@g1 = external global i32, align 4
@g2 = external global i32, align 4

define i32 @f1(i32 %a) {
entry:
%idxprom = sext i32 %a to i64
%arrayidx = getelementptr inbounds [0 x i32], [0 x i32]* @g, i64 0, i64 %idxprom
%0 = load i32, i32* %arrayidx, align 4
%1 = load volatile i32, i32* @g1, align 4
%mul = mul nsw i32 %1, %0
%add = add nsw i32 %mul, 1
ret i32 %add
}

define i32 @f2(i32 %a) {
entry:
%idxprom = sext i32 %a to i64
%arrayidx = getelementptr inbounds [0 x i32], [0 x i32]* @g, i64 0, i64 %idxprom
%0 = load i32, i32* %arrayidx, align 4
%1 = load volatile i32, i32* @g1, align 4
%mul = mul nsw i32 %1, %0
%add = add nsw i32 %mul, 1
ret i32 %add
}
Loading