-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Skip terminators when forcing emit zero flag #112116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Skip terminators when forcing emit zero flag #112116
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
@llvm/pr-subscribers-backend-amdgpu Author: Shilei Tian (shiltian) ChangesFull diff: https://github.com/llvm/llvm-project/pull/112116.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 9866ecbdddb608..28e26dc47b0ab4 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1600,6 +1600,17 @@ static bool callWaitsOnFunctionReturn(const MachineInstr &MI) {
return true;
}
+/// \returns true if \p MI is not the first terminator of its associated MBB.
+static bool checkIfMBBNonFirstTerminator(const MachineInstr &MI) {
+ const auto &MBB = MI.getParent();
+ if (MBB->getFirstTerminator() == MI)
+ return false;
+ for (const auto &I : MBB->terminators())
+ if (&I == &MI)
+ return true;
+ return false;
+}
+
/// Generate s_waitcnt instruction to be placed before cur_Inst.
/// Instructions of a given type are returned in order,
/// but instructions of different types can complete out of order.
@@ -1825,7 +1836,9 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
// Verify that the wait is actually needed.
ScoreBrackets.simplifyWaitcnt(Wait);
- if (ForceEmitZeroFlag)
+ // When forcing emit, we need to skip non-first terminators of a MBB because
+ // that would break the terminators of the MBB.
+ if (ForceEmitZeroFlag && !checkIfMBBNonFirstTerminator(MI))
Wait = WCG->getAllZeroWaitcnt(/*IncludeVSCnt=*/false);
if (ForceEmitWaitcnt[LOAD_CNT])
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir
new file mode 100644
index 00000000000000..530d1981f053e9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir
@@ -0,0 +1,22 @@
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass si-insert-waitcnts -amdgpu-waitcnt-forcezero=1 %s -o - | FileCheck %s
+
+...
+
+# CHECK-LABEL: waitcnt-debug-non-first-terminators
+# CHECK: S_WAITCNT 0
+# CHECK-NEXT: S_CBRANCH_SCC1 %bb.1, implicit $scc
+# CHECK-NEXT: S_BRANCH %bb.2, implicit $scc
+
+name: waitcnt-debug-non-first-terminators
+liveins:
+machineFunctionInfo:
+ isEntryFunction: true
+body: |
+ bb.0:
+ S_CBRANCH_SCC1 %bb.1, implicit $scc
+ S_BRANCH %bb.2, implicit $scc
+ bb.1:
+ S_NOP 0
+ bb.2:
+ S_NOP 0
+...
|
5900729
to
0028877
Compare
0028877
to
66b1675
Compare
66b1675
to
33967ad
Compare
Why do you say "non-first terminators" in the description? In the test case, why does SIInsertWaitcnts want to put a wait between the two branch instructions? |
33967ad
to
5bf3b23
Compare
Updated the description. The first version indeed only considered non-first terminators.
Because we force emit zero via |
5bf3b23
to
62d9aed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the point to avoid this was here
E = Block.instr_end(); |
Don't know if we want the wait at block end or wait on successor entry behavior
; CHECK-NEXT: bb.1: | ||
; CHECK-NEXT: successors: %bb.2(0x80000000) | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: S_WAITCNT 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this version moves the waitcnt from the end of the block to the start of the successor blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, this waitcnt was just inserted before S_NOP
, which is always there. The previous version just didn't have all the check lines.
Oh, I see. That option is weird. I thought it meant "if any waitcnt is required, wait for the counter to be 0". But actually it waits for the counter to be zero after every instruction, even instructions that have nothing to do with wait counts. |
The name of the option is indeed a little bit confusing.
We do need waitcnt in some cases (even w/o the option). That indicates that we can't simply skip the iteration (calling
|
When forcing emit zero, we need to skip terminators of a MBB; otherwise the terminator list of the MBB would be broken.
When forcing emit zero, we need to skip terminators of a MBB; otherwise the terminator list of the MBB would be broken.