[TTI][AMDGPU] Allow targets to adjust `LastCallToStaticBonus` via `getInliningLastCallToStaticBonus` #111311

shiltian · 2024-10-06T20:28:19Z

Currently a large function cannot be inlined even if it has only one live use, because the inline cost is still very high after applying `LastCallToStaticBonus`, which is a constant. This can significantly hurt performance because CSR spills are very expensive.

This PR adds a new function, `getInliningLastCallToStaticBonus`, to TTI so that targets can customize this value.

Fixes SWDEV-471398.
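
As a rough illustration of the idea, here is a minimal standalone sketch of a target-overridable bonus. It does not use LLVM's real `TargetTransformInfo` plumbing: `TargetInfoSketch`, `BigBonusTargetSketch`, `costAfterLastCallBonus` and the value 100000 are hypothetical names and numbers chosen for the example. Only the hook name and the 15000 default come from this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for the target hook. Real LLVM routes this through
// TargetTransformInfo, which is not modeled here.
struct TargetInfoSketch {
  virtual ~TargetInfoSketch() = default;
  // Default bonus: 15000 is the value this thread attributes to
  // InlineConstants::LastCallToStaticBonus.
  virtual unsigned getInliningLastCallToStaticBonus() const { return 15000; }
};

// Hypothetical target that asks for a larger bonus. The number is
// illustrative only and is not the value any real target uses.
struct BigBonusTargetSketch : TargetInfoSketch {
  unsigned getInliningLastCallToStaticBonus() const override { return 100000; }
};

// Returns the cost after the bonus. The result is signed because a large
// bonus can push the cost below zero.
int costAfterLastCallBonus(const TargetInfoSketch &TTI, int Cost,
                           bool IsSoleCallToLocalFunction) {
  assert(Cost >= 0 && "sketch assumes a non-negative starting cost");
  if (!IsSoleCallToLocalFunction)
    return Cost;
  return Cost - static_cast<int>(TTI.getInliningLastCallToStaticBonus());
}
```

With a starting cost of 20000, the default leaves 5000 after the bonus, while the hypothetical target drives the cost to -80000, so the same callee becomes much easier to inline without touching the generic cost model.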

jmmartinez · 2024-10-07T08:00:31Z

> but the cost is still very close to the threshold: cost=14010, threshold=170775.

It's a 10x difference. Looks pretty safe to me. What am I missing?

TBH I do not see much of an easy alternative to what you're doing.

To test, instead of using big functions, you could pass `-inline-threshold=0 -debug-only=inline-cost` and check that the inline threshold increases between a function that is called once and another that is called twice.

llvmbot · 2024-10-07T09:17:19Z

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

Currently we do not inline a large function even if it has only one live use.
This can significantly hurt performance because CSR spills are very
expensive. This PR tries to force inlining when there is only one live use
by raising the inlining threshold by a configurable amount. The default
value is 15000, borrowed from `InlineConstants::LastCallToStaticBonus`.
I'm not sure whether this is a good number or whether this is the right way
to do it. After making this change, the callee in my local test case can
finally be inlined, but the cost is still very close to the threshold:
cost=14010, threshold=170775.

Speaking of the test, how are we gonna test this? Do we want to include a giant
IR file?

Fixes SWDEV-471398.

Full diff: https://github.com/llvm/llvm-project/pull/111311.diff

1 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+10)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index d348166c2d9a04..debc3db78974ad 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -75,6 +75,10 @@ static cl::opt<size_t> InlineMaxBB(
     cl::desc("Maximum number of BBs allowed in a function after inlining"
              " (compile time constraint)"));
 
+static cl::opt<unsigned> InlineThresholdOneLiveUse(
+    "amdgpu-inline-threshold-one-live-use", cl::Hidden, cl::init(15000),
+    cl::desc("Threshold added when the callee only has one live use"));
+
 static bool dependsOnLocalPhi(const Loop *L, const Value *Cond,
                               unsigned Depth = 0) {
   const Instruction *I = dyn_cast<Instruction>(Cond);
@@ -1307,6 +1311,12 @@ unsigned GCNTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
   unsigned AllocaSize = getCallArgsTotalAllocaSize(CB, DL);
   if (AllocaSize > 0)
     Threshold += ArgAllocaCost;
+
+  // Increase the threshold if it is the only call to a local function.
+  Function *Callee = CB->getCalledFunction();
+  if (Callee->hasLocalLinkage() && Callee->hasOneLiveUse())
+    Threshold += InlineThresholdOneLiveUse;
+
   return Threshold;
 }
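
The hunk above, combined with the testing idea raised earlier in the thread (start from a threshold of 0 and compare a callee that is called once with one that is called twice), can be modeled in a few lines. This is a sketch under stated assumptions, not the code that was merged: `CalleeSketch`, `adjustThresholdSketch` and `checkOnceVersusTwice` are hypothetical names, and the null check is an addition for the case where `CB->getCalledFunction()` returns null, as it does for indirect calls.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Function, reduced to the two queries
// that the hunk uses.
struct CalleeSketch {
  bool HasLocalLinkage;
  unsigned LiveUses;
  bool hasLocalLinkage() const { return HasLocalLinkage; }
  bool hasOneLiveUse() const { return LiveUses == 1; }
};

// Mirrors the cl::init value of amdgpu-inline-threshold-one-live-use.
constexpr unsigned InlineThresholdOneLiveUse = 15000;

// Models the proposed adjustment. A null callee stands for an indirect
// call with no statically known target and gets no extra threshold.
unsigned adjustThresholdSketch(unsigned Threshold, const CalleeSketch *Callee) {
  if (Callee && Callee->hasLocalLinkage() && Callee->hasOneLiveUse())
    Threshold += InlineThresholdOneLiveUse;
  return Threshold;
}

// Self-check shaped like the suggested test: base threshold 0, then
// compare a local callee with one live use against one with two.
void checkOnceVersusTwice() {
  CalleeSketch CalledOnce{true, 1};
  CalleeSketch CalledTwice{true, 2};
  assert(adjustThresholdSketch(0, &CalledOnce) == 15000);
  assert(adjustThresholdSketch(0, &CalledTwice) == 0);
  assert(adjustThresholdSketch(0, nullptr) == 0);
}
```

Starting from 0 makes the adjustment directly visible in the resulting threshold, which is why small functions are enough for such a test.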

arsenm

Needs test. Isn't there a generic control for this already?

nikic

I'm a bit confused here. This already exists in the form of a cost bonus in the generic inlining cost model.

Is the point here that you want to effectively double the bonus by both applying it as a bonus in the generic model and as a threshold adjustment in AMDGPU TTI?

shiltian · 2024-10-07T13:50:40Z

> I'm a bit confused here. This already exists in the form of a cost bonus in the generic inlining cost model.
>
> Is the point here that you want to effectively double the bonus by both applying it as a bonus in the generic model and as a threshold adjustment in AMDGPU TTI?

Yes. IIUC, the generic model doesn't seem to have a target dependent approach to adjust the bonus. The default value is not sufficient and we want to inline it regardless.
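
To make the "double bonus" point concrete, the sketch below models the inlining decision as a plain "cost below threshold" comparison, which is a simplification of the real cost model. Under that assumption, subtracting a bonus from the cost and adding an extra amount to the threshold is equivalent to subtracting one combined bonus from the cost. All function names and the sample numbers are hypothetical.

```cpp
#include <cassert>

// Split form: the generic model subtracts a bonus from the cost and the
// target adds an extra amount to the threshold.
bool inlinesWithSplitBonus(int Cost, int Threshold, int GenericBonus,
                           int TargetExtra) {
  return Cost - GenericBonus < Threshold + TargetExtra;
}

// Single form: one combined bonus is subtracted from the cost.
bool inlinesWithSingleBonus(int Cost, int Threshold, int CombinedBonus) {
  return Cost - CombinedBonus < Threshold;
}

// Checks that both forms agree over a range of sample costs.
void checkSplitMatchesSingle() {
  const int Threshold = 12000, GenericBonus = 15000, TargetExtra = 15000;
  for (int Cost = 0; Cost <= 60000; Cost += 500)
    assert(inlinesWithSplitBonus(Cost, Threshold, GenericBonus, TargetExtra) ==
           inlinesWithSingleBonus(Cost, Threshold,
                                  GenericBonus + TargetExtra));
}
```

Since the two forms give the same decision in this model, keeping a single target-configurable bonus in one place loses nothing and avoids spreading the logic across the generic model and the target.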

arsenm · 2024-10-07T16:54:19Z

> Yes. IIUC, the generic model doesn't seem to have a target dependent approach to adjust the bonus. The default value is not sufficient and we want to inline it regardless.

I'd rather avoid splitting the logic for this. Where is the default handling?

shiltian · 2024-10-07T16:58:46Z

> Where is the default handling?

llvm-project/llvm/lib/Analysis/InlineCost.cpp, line 2032 in 8a9e9a8:

    Cost -= LastCallToStaticBonus;

shiltian · 2024-10-07T18:12:34Z

> It's a 10x difference. Looks pretty safe to me. What am I missing?

I missed a digit, lol.

nikic · 2024-10-07T18:58:10Z

> > I'm a bit confused here. This already exists in the form of a cost bonus in the generic inlining cost model.
> > Is the point here that you want to effectively double the bonus by both applying it as a bonus in the generic model and as a threshold adjustment in AMDGPU TTI?
>
> Yes. IIUC, the generic model doesn't seem to have a target dependent approach to adjust the bonus. The default value is not sufficient and we want to inline it regardless.

I think it would be cleaner to make that bonus configurable via TTI. Splitting it across two places is pretty confusing...

shiltian · 2024-10-07T19:06:19Z

Sure. I can do that.

github-actions · 2024-10-10T19:59:30Z

✅ With the latest revision this PR passed the C/C++ code formatter.

shiltian requested review from jmmartinez, arsenm and nikic October 6, 2024 20:28

arsenm added the backend:AMDGPU label Oct 7, 2024

arsenm reviewed Oct 7, 2024: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (outdated, resolved)

nikic reviewed Oct 7, 2024

shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from a554afb to e4c1160 October 7, 2024 15:55

shiltian marked this pull request as ready for review October 7, 2024 15:55

llvmbot added the llvm:transforms label Oct 7, 2024

shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from e4c1160 to 7f316e2 October 7, 2024 16:22

shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from 7f316e2 to a3b9b6f October 7, 2024 19:42

llvmbot added the llvm:analysis label (includes value tracking, cost tables and constant folding) Oct 7, 2024

shiltian commented Oct 7, 2024: llvm/lib/Analysis/InlineCost.cpp (resolved)

shiltian changed the title from "[AMDGPU] Increase inline threshold when the callee only has one live use" to "[TTI][AMDGPU] Allow targets to adjust LastCallToStaticBonus via getInliningLastCallToStaticBonus" Oct 7, 2024

shiltian commented Oct 7, 2024: llvm/include/llvm/Analysis/TargetTransformInfo.h (resolved)

shiltian commented Oct 7, 2024: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (outdated, resolved)

shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from a3b9b6f to dba75a5 October 7, 2024 19:51

shiltian commented Oct 7, 2024: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (outdated, resolved)

shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch 2 times, most recently from 2916c67 to c2376ef October 8, 2024 19:47

arsenm reviewed Oct 8, 2024: llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (outdated, resolved); llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h (outdated, resolved)

shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from c2376ef to dfde419 October 10, 2024 19:55

shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from dfde419 to 7cde4f2 October 10, 2024 20:03

arsenm reviewed Oct 10, 2024: llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (outdated, resolved)

shiltian commented Oct 10, 2024: llvm/include/llvm/Analysis/InlineCost.h (resolved)

arsenm approved these changes Oct 11, 2024

nikic reviewed Oct 11, 2024: llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (outdated, resolved)

shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from 7cde4f2 to 2d520d9 October 11, 2024 14:19

shiltian merged commit e34e27f into main Oct 11, 2024 (5 of 6 checks passed)

shiltian deleted the users/shiltian/inline-threshold-with-only-one-use branch October 11, 2024 14:19
