-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Make max dwords of memory cluster configurable #119342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We find it helpful to increase the value for graphics workload. Make it configurable so we can experiment with a different value. It might be more helpful we can have a per-function value, but I am not sure how this can be done properly.
@llvm/pr-subscribers-backend-amdgpu Author: Ruiling, Song (ruiling) ChangesWe find it helpful to increase the value for graphics workload. Make it configurable so we can experiment with a different value. It might be more helpful we can have a per-function value, but I am not sure how this can be done properly. Full diff: https://github.com/llvm/llvm-project/pull/119342.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 4a94d690297949..ed651adcc8cd2e 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -60,6 +60,11 @@ static cl::opt<bool> Fix16BitCopies(
cl::init(true),
cl::ReallyHidden);
+static cl::opt<unsigned> MaxMemoryClusterDWORDS(
+ "amdgpu-max-memory-cluster-dwords", cl::Hidden, cl::init(8),
+ cl::desc(
+ "Restrict the maximum dwords for memory cluster during scheduler"));
+
SIInstrInfo::SIInstrInfo(const GCNSubtarget &ST)
: AMDGPUGenInstrInfo(AMDGPU::ADJCALLSTACKUP, AMDGPU::ADJCALLSTACKDOWN),
RI(ST), ST(ST) {
@@ -565,12 +570,14 @@ bool SIInstrInfo::shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
}
// In order to avoid register pressure, on an average, the number of DWORDS
- // loaded together by all clustered mem ops should not exceed 8. This is an
- // empirical value based on certain observations and performance related
- // experiments.
+ // loaded together by all clustered mem ops should not exceed
+ // MaxMemoryClusterDWORDS. This is an empirical value based on certain
+ // observations and performance related experiments.
// The good thing about this heuristic is - it avoids clustering of too many
// sub-word loads, and also avoids clustering of wide loads. Below is the
- // brief summary of how the heuristic behaves for various `LoadSize`.
+ // brief summary of how the heuristic behaves for various `LoadSize` when
+ // MaxMemoryClusterDWORDS is 8.
+ //
// (1) 1 <= LoadSize <= 4: cluster at max 8 mem ops
// (2) 5 <= LoadSize <= 8: cluster at max 4 mem ops
// (3) 9 <= LoadSize <= 12: cluster at max 2 mem ops
@@ -578,7 +585,7 @@ bool SIInstrInfo::shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
// (5) LoadSize >= 17: do not cluster
const unsigned LoadSize = NumBytes / ClusterSize;
const unsigned NumDWORDs = ((LoadSize + 3) / 4) * ClusterSize;
- return NumDWORDs <= 8;
+ return NumDWORDs <= MaxMemoryClusterDWORDS;
}
// FIXME: This behaves strangely. If, for example, you have 32 load + stores,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function attribute
Thanks! What's the suggested way to access that value in function attribute from the SIInstrInfo? Maybe keep a record of that value in MachineFunctionInfo and get from it? |
That's just a matter of caching it in MFI or whether read directly from the IR attribute. In either case, the MachineFunction needs to be passed into the hook |
Added a function attribute for it, do we want to keep the command line option for it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the mir test changes parse the value?
if (MFI->getMaxMemoryClusterDWords()) | ||
MaxMemoryClusterDWords = MFI->getMaxMemoryClusterDWords(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should just directly take MFI->getMaxMemoryClusterDWords()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's about maintaining the default value 8
. Without MFI, shouldClusterMemOps()
should still have a copy of the default value 8
, right? If we directly take the MFI->getMaxMemoryClusterDWords()
, that would mean we need to keep the default value in both MFI
and within shouldClusterMemOps()
. Do we want that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have fixed this by adding a default limit value in SIInstrInfo.h, which could be accessible from both places.
Do you mean parsing the value from function attribute? I will add one. |
|
Fix lit-test failure and make it easy to get a mir test.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/10539 Here is the relevant piece of the build log for the reference
|
We find it helpful to increase the value for graphics workload. Make it configurable so we can experiment with a different value. It might be more helpful we can have a per-function value, but I am not sure how this can be done properly.