Skip to content

[TTI][AMDGPU] Allow targets to adjust LastCallToStaticBonus via getInliningLastCallToStaticBonus #111311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 11, 2024

Conversation

shiltian
Copy link
Contributor

@shiltian shiltian commented Oct 6, 2024

Currently we will not be able to inline a large function even if it only has one live use because the inline cost is still very high after applying LastCallToStaticBonus, which is a constant. This could significantly impact the performance because CSR spill is very expensive.

This PR adds a new function getInliningLastCallToStaticBonus to TTI to allow targets to customize this value.

Fixes SWDEV-471398.

Copy link
Contributor Author

shiltian commented Oct 6, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @shiltian and the rest of your teammates on Graphite Graphite

@jmmartinez
Copy link
Contributor

but the cost is still very close to the threshold: cost=14010, threshold=170775.

It's a 10x difference. Looks pretty safe to me. What am I missing?

TBH I do not see much of an easy alternative to what you're doing.


To test, instead of using big functions, you could pass -inline-threshold=0 -debug-only=inline-cost and check that the inline threshold increases between a function that is called once and another that is called twice.

@llvmbot
Copy link
Member

llvmbot commented Oct 7, 2024

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

Currently we will not inline a large function even if it only has one live use.
This could significantly impact the performance because CSR spill is very
expensive. The goal of this PR is trying to force the inlining if there is only
one live use by adjusting the inlining threshold, which is a configurable
number. The default value is 15000, which borrows from
InlineConstants::LastCallToStaticBonus. I'm not sure if this is a good number,
and if this is the right way to do that. After making this change, the callee in
my local test case can finally be inlined, but the cost is still very close to
the threshold: cost=14010, threshold=170775.

Speaking of the test, how are we gonna test this? Do we want to include a giant
IR file?

Fixes SWDEV-471398.


Full diff: https://github.com/llvm/llvm-project/pull/111311.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+10)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index d348166c2d9a04..debc3db78974ad 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -75,6 +75,10 @@ static cl::opt<size_t> InlineMaxBB(
     cl::desc("Maximum number of BBs allowed in a function after inlining"
              " (compile time constraint)"));
 
+static cl::opt<unsigned> InlineThresholdOneLiveUse(
+    "amdgpu-inline-threshold-one-live-use", cl::Hidden, cl::init(15000),
+    cl::desc("Threshold added when the callee only has one live use"));
+
 static bool dependsOnLocalPhi(const Loop *L, const Value *Cond,
                               unsigned Depth = 0) {
   const Instruction *I = dyn_cast<Instruction>(Cond);
@@ -1307,6 +1311,12 @@ unsigned GCNTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
   unsigned AllocaSize = getCallArgsTotalAllocaSize(CB, DL);
   if (AllocaSize > 0)
     Threshold += ArgAllocaCost;
+
+  // Increase the threshold if it is the only call to a local function.
+  Function *Callee = CB->getCalledFunction();
+  if (Callee->hasLocalLinkage() && Callee->hasOneLiveUse())
+    Threshold += InlineThresholdOneLiveUse;
+
   return Threshold;
 }
 

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs test. Isn't there a generic control for this already?

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here. This already exists in the form of a cost bonus in the generic inlining cost model.

Is the point here that you want to effectively double the bonus by both applying it as a bonus in the generic model and as a threshold adjustment in AMDGPU TTI?

@shiltian
Copy link
Contributor Author

shiltian commented Oct 7, 2024

I'm a bit confused here. This already exists in the form of a cost bonus in the generic inlining cost model.

Is the point here that you want to effectively double the bonus by both applying it as a bonus in the generic model and as a threshold adjustment in AMDGPU TTI?

Yes. IIUC, the generic model doesn't seem to have a target dependent approach to adjust the bonus. The default value is not sufficient and we want to inline it regardless.

@shiltian shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from a554afb to e4c1160 Compare October 7, 2024 15:55
@shiltian shiltian marked this pull request as ready for review October 7, 2024 15:55
@shiltian shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from e4c1160 to 7f316e2 Compare October 7, 2024 16:22
@arsenm
Copy link
Contributor

arsenm commented Oct 7, 2024

Yes. IIUC, the generic model doesn't seem to have a target dependent approach to adjust the bonus. The default value is not sufficient and we want to inline it regardless.

I'd rather avoid splitting the logic for this. Where is the default handling?

@shiltian
Copy link
Contributor Author

shiltian commented Oct 7, 2024

Where is the default handling?

Cost -= LastCallToStaticBonus;

@shiltian
Copy link
Contributor Author

shiltian commented Oct 7, 2024

It's a 10x difference. Looks pretty safe to me. What am I missing?

I missed one figure Lol.

@nikic
Copy link
Contributor

nikic commented Oct 7, 2024

I'm a bit confused here. This already exists in the form of a cost bonus in the generic inlining cost model.
Is the point here that you want to effectively double the bonus by both applying it as a bonus in the generic model and as a threshold adjustment in AMDGPU TTI?

Yes. IIUC, the generic model doesn't seem to have a target dependent approach to adjust the bonus. The default value is not sufficient and we want to inline it regardless.

I think it would be cleaner to make that bonus configurable via TTI. Splitting it across two places is pretty confusing...

@shiltian
Copy link
Contributor Author

shiltian commented Oct 7, 2024

Sure. I can do that.

@shiltian shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from 7f316e2 to a3b9b6f Compare October 7, 2024 19:42
@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Oct 7, 2024
@shiltian shiltian changed the title [AMDGPU] Increase inline threshold when the callee only has one live use [TTI][AMDGPU] Allow targets to adjust LastCallToStaticBonus via getInliningLastCallToStaticBonus Oct 7, 2024
@shiltian shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from a3b9b6f to dba75a5 Compare October 7, 2024 19:51
@shiltian shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch 2 times, most recently from 2916c67 to c2376ef Compare October 8, 2024 19:47
@shiltian shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from c2376ef to dfde419 Compare October 10, 2024 19:55
Copy link

github-actions bot commented Oct 10, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@shiltian shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from dfde419 to 7cde4f2 Compare October 10, 2024 20:03
Currently we will not inline a large function even if it only has one live use.
This could significantly impact the performance because CSR spill is very
expensive. The goal of this PR is trying to force the inlining if there is only
one live use by adjusting the inlining threshold, which is a configurable
number. The default value is 15000, which borrows from
`InlineConstants::LastCallToStaticBonus`. I'm not sure if this is a good number,
and if this is the right way to do that. After making this change, the callee in
my local test case can finally be inlined, but the cost is still very close to
the threshold: `cost=14010, threshold=170775`.

Speaking of the test, how are we gonna test this? Do we want to include a giant
IR file?

Fixes SWDEV-471398.
@shiltian shiltian force-pushed the users/shiltian/inline-threshold-with-only-one-use branch from 7cde4f2 to 2d520d9 Compare October 11, 2024 14:19
@shiltian shiltian merged commit e34e27f into main Oct 11, 2024
5 of 6 checks passed
@shiltian shiltian deleted the users/shiltian/inline-threshold-with-only-one-use branch October 11, 2024 14:19
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
…tInliningLastCallToStaticBonus` (llvm#111311)

Currently we will not be able to inline a large function even if it only
has one live use because the inline cost is still very high after
applying `LastCallToStaticBonus`, which is a constant. This could
significantly impact the performance because CSR spill is very
expensive.

This PR adds a new function `getInliningLastCallToStaticBonus` to TTI to
allow targets to customize this value.

Fixes SWDEV-471398.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants