Commit e4c1160

[AMDGPU] Increase inline threshold when the callee only has one live use
Currently, a large function is not inlined even if it has only one live use. This can significantly hurt performance because CSR spills are very expensive. This PR tries to force inlining when the callee has only one live use by raising the inlining threshold by a configurable amount. The default value is 15000, borrowed from `InlineConstants::LastCallToStaticBonus`. I'm not sure whether this is a good number, or whether this is the right way to do it.

After this change, the callee in my local test case is finally inlined, but the cost is still very close to the threshold: `cost=14010, threshold=170775`.

Speaking of the test, how should we test this? Do we want to include a giant IR file?

Fixes SWDEV-471398.
1 parent 5e7cc37 commit e4c1160

2 files changed: +36 -0 lines changed

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

Lines changed: 10 additions & 0 deletions
@@ -75,6 +75,10 @@ static cl::opt<size_t> InlineMaxBB(
     cl::desc("Maximum number of BBs allowed in a function after inlining"
              " (compile time constraint)"));
 
+static cl::opt<unsigned> InlineThresholdOneLiveUse(
+    "amdgpu-inline-threshold-one-live-use", cl::Hidden, cl::init(15000),
+    cl::desc("Threshold added when the callee only has one live use"));
+
 static bool dependsOnLocalPhi(const Loop *L, const Value *Cond,
                               unsigned Depth = 0) {
   const Instruction *I = dyn_cast<Instruction>(Cond);
@@ -1307,6 +1311,12 @@ unsigned GCNTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
   unsigned AllocaSize = getCallArgsTotalAllocaSize(CB, DL);
   if (AllocaSize > 0)
     Threshold += ArgAllocaCost;
+
+  // Increase the threshold if it is the only call to a local function.
+  Function *Callee = CB->getCalledFunction();
+  if (Callee && Callee->hasLocalLinkage() && Callee->hasOneLiveUse())
+    Threshold += InlineThresholdOneLiveUse;
+
   return Threshold;
 }
 
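The condition added to `adjustInliningThreshold` can be read in isolation. The following is a minimal standalone sketch of that branch, not the LLVM implementation: `CalleeInfo` and `adjustForOneLiveUse` are hypothetical names introduced here for illustration, and the struct only models the two queries the patch makes on the callee (it assumes "one live use" means exactly one use that is still live).

```cpp
// Hypothetical stand-in for llvm::Function; only the two queries used by
// the patch are modeled here.
struct CalleeInfo {
  bool LocalLinkage; // models Function::hasLocalLinkage()
  unsigned LiveUses; // assumed: count of uses that are still live

  constexpr bool hasLocalLinkage() const { return LocalLinkage; }
  constexpr bool hasOneLiveUse() const { return LiveUses == 1; }
};

// Mirrors the added branch: the bonus applies only when the callee is known
// (direct call), has local linkage, and has exactly one live use.
// Bonus defaults to 15000, the default of the new option in the patch.
constexpr unsigned adjustForOneLiveUse(unsigned Threshold,
                                       const CalleeInfo *Callee,
                                       unsigned Bonus = 15000) {
  if (Callee && Callee->hasLocalLinkage() && Callee->hasOneLiveUse())
    Threshold += Bonus;
  return Threshold;
}
```

In this model an indirect call (null callee), an externally visible callee, or a callee with two live uses all leave the threshold unchanged, which is the behavior the new test exercises with a callee that is called twice.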

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -passes=inline -inline-threshold=0 -debug-only=inline-cost %s -o - 2>&1 | FileCheck --check-prefixes=CHECK,CHECK-DEFAULT %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -passes=inline -inline-threshold=0 -debug-only=inline-cost %s -amdgpu-inline-threshold-one-live-use=1024 -o - 2>&1 | FileCheck --check-prefixes=CHECK,CHECK-USER %s
+; REQUIRES: asserts
+
+; CHECK: Analyzing call of callee_not_only_one_live_use... (caller:caller)
+; CHECK: Cost: -30
+; CHECK: Threshold: 0
+; CHECK: Analyzing call of callee_only_one_live_use... (caller:caller)
+; CHECK: Cost: -15030
+; CHECK-DEFAULT: Threshold: 247500
+; CHECK-USER: Threshold: 16896
+
+define internal void @callee_not_only_one_live_use() {
+  ret void
+}
+
+define internal void @callee_only_one_live_use() {
+  ret void
+}
+
+define void @caller() {
+  call void @callee_not_only_one_live_use()
+  call void @callee_not_only_one_live_use()
+  call void @callee_only_one_live_use()
+  ret void
+}
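
The expected `Threshold` values in the test are not explained in the commit. One reading that reproduces both numbers is sketched below. It assumes, without confirmation from the source, that the adjusted threshold is scaled by a target multiplier of 11 and then increased by a 50% single-basic-block bonus. `Multiplier`, `SingleBBBonusPercent`, and `expectedThreshold` are illustrative names, not LLVM APIs.

```cpp
#include <cstdint>

// Assumed values, not stated in the commit: a target inlining threshold
// multiplier of 11 and a 50% bonus for single-basic-block callees.
constexpr std::uint64_t Multiplier = 11;
constexpr std::uint64_t SingleBBBonusPercent = 50;

// Base is the value of -inline-threshold (0 in both RUN lines);
// OneLiveUseBonus is the value of -amdgpu-inline-threshold-one-live-use.
constexpr std::uint64_t expectedThreshold(std::uint64_t Base,
                                          std::uint64_t OneLiveUseBonus) {
  std::uint64_t Scaled = (Base + OneLiveUseBonus) * Multiplier;
  return Scaled + Scaled * SingleBBBonusPercent / 100;
}

// Default option value (CHECK-DEFAULT line).
static_assert(expectedThreshold(0, 15000) == 247500, "default threshold");
// User-supplied option value of 1024 (CHECK-USER line).
static_assert(expectedThreshold(0, 1024) == 16896, "user threshold");
// Callee with two live uses: no bonus, threshold stays 0.
static_assert(expectedThreshold(0, 0) == 0, "no bonus");
```

Under the same reading, the gap of 15000 between the two `Cost` lines (-30 versus -15030) would come from a separate cost bonus for the last call to a static function rather than from this patch, but the commit does not state that.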
