-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[TTI][AMDGPU] Allow targets to adjust LastCallToStaticBonus
via getInliningLastCallToStaticBonus
#111311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TTI][AMDGPU] Allow targets to adjust LastCallToStaticBonus
via getInliningLastCallToStaticBonus
#111311
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
It's a 10x difference. Looks pretty safe to me. What am I missing? TBH I do not see much of an easy alternative to what you're doing. To test, instead of using big functions, you could pass |
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-amdgpu Author: Shilei Tian (shiltian) ChangesCurrently we will not inline a large function even if it only has one live use. Speaking of the test, how are we gonna test this? Do we want to include a giant Fixes SWDEV-471398. Full diff: https://github.com/llvm/llvm-project/pull/111311.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index d348166c2d9a04..debc3db78974ad 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -75,6 +75,10 @@ static cl::opt<size_t> InlineMaxBB(
cl::desc("Maximum number of BBs allowed in a function after inlining"
" (compile time constraint)"));
+static cl::opt<unsigned> InlineThresholdOneLiveUse(
+ "amdgpu-inline-threshold-one-live-use", cl::Hidden, cl::init(15000),
+ cl::desc("Threshold added when the callee only has one live use"));
+
static bool dependsOnLocalPhi(const Loop *L, const Value *Cond,
unsigned Depth = 0) {
const Instruction *I = dyn_cast<Instruction>(Cond);
@@ -1307,6 +1311,12 @@ unsigned GCNTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
unsigned AllocaSize = getCallArgsTotalAllocaSize(CB, DL);
if (AllocaSize > 0)
Threshold += ArgAllocaCost;
+
+ // Increase the threshold if it is the only call to a local function.
+ Function *Callee = CB->getCalledFunction();
+ if (Callee->hasLocalLinkage() && Callee->hasOneLiveUse())
+ Threshold += InlineThresholdOneLiveUse;
+
return Threshold;
}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs test. Isn't there a generic control for this already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused here. This already exists in the form of a cost bonus in the generic inlining cost model.
Is the point here that you want to effectively double the bonus by both applying it as a bonus in the generic model and as a threshold adjustment in AMDGPU TTI?
Yes. IIUC, the generic model doesn't seem to have a target dependent approach to adjust the bonus. The default value is not sufficient and we want to inline it regardless. |
a554afb
to
e4c1160
Compare
e4c1160
to
7f316e2
Compare
I'd rather avoid splitting the logic for this. Where is the default handling? |
llvm-project/llvm/lib/Analysis/InlineCost.cpp Line 2032 in 8a9e9a8
|
I missed one figure Lol. |
I think it would be cleaner to make that bonus configurable via TTI. Splitting it across two places is pretty confusing... |
Sure. I can do that. |
7f316e2
to
a3b9b6f
Compare
LastCallToStaticBonus
via getInliningLastCallToStaticBonus
a3b9b6f
to
dba75a5
Compare
2916c67
to
c2376ef
Compare
c2376ef
to
dfde419
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
dfde419
to
7cde4f2
Compare
Currently we will not inline a large function even if it only has one live use. This could significantly impact the performance because CSR spill is very expensive. The goal of this PR is trying to force the inlining if there is only one live use by adjusting the inlining threshold, which is a configurable number. The default value is 15000, which borrows from `InlineConstants::LastCallToStaticBonus`. I'm not sure if this is a good number, and if this is the right way to do that. After making this change, the callee in my local test case can finally be inlined, but the cost is still very close to the threshold: `cost=14010, threshold=170775`. Speaking of the test, how are we gonna test this? Do we want to include a giant IR file? Fixes SWDEV-471398.
7cde4f2
to
2d520d9
Compare
…tInliningLastCallToStaticBonus` (llvm#111311) Currently we will not be able to inline a large function even if it only has one live use because the inline cost is still very high after applying `LastCallToStaticBonus`, which is a constant. This could significantly impact the performance because CSR spill is very expensive. This PR adds a new function `getInliningLastCallToStaticBonus` to TTI to allow targets to customize this value. Fixes SWDEV-471398.
Currently we will not be able to inline a large function even if it only has one live use because the inline cost is still very high after applying
LastCallToStaticBonus
, which is a constant. This could significantly impact the performance because CSR spill is very expensive.This PR adds a new function
getInliningLastCallToStaticBonus
to TTI to allow targets to customize this value.Fixes SWDEV-471398.