Skip to content

[AMDGPU] Skip terminators when forcing emit zero flag #112116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

shiltian
Copy link
Contributor

@shiltian shiltian commented Oct 13, 2024

When forcing emit zero, we need to skip terminators of a MBB; otherwise the terminator list of the MBB would be broken.

Copy link
Contributor Author

shiltian commented Oct 13, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @shiltian and the rest of your teammates on Graphite Graphite

@llvmbot
Copy link
Member

llvmbot commented Oct 13, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/112116.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+14-1)
  • (added) llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir (+22)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 9866ecbdddb608..28e26dc47b0ab4 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1600,6 +1600,17 @@ static bool callWaitsOnFunctionReturn(const MachineInstr &MI) {
   return true;
 }
 
+/// \returns true if \p MI is not the first terminator of its associated MBB.
+static bool checkIfMBBNonFirstTerminator(const MachineInstr &MI) {
+  const auto &MBB = MI.getParent();
+  if (MBB->getFirstTerminator() == MI)
+    return false;
+  for (const auto &I : MBB->terminators())
+    if (&I == &MI)
+      return true;
+  return false;
+}
+
 ///  Generate s_waitcnt instruction to be placed before cur_Inst.
 ///  Instructions of a given type are returned in order,
 ///  but instructions of different types can complete out of order.
@@ -1825,7 +1836,9 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
   // Verify that the wait is actually needed.
   ScoreBrackets.simplifyWaitcnt(Wait);
 
-  if (ForceEmitZeroFlag)
+  // When forcing emit, we need to skip non-first terminators of a MBB because
+  // that would break the terminators of the MBB.
+  if (ForceEmitZeroFlag && !checkIfMBBNonFirstTerminator(MI))
     Wait = WCG->getAllZeroWaitcnt(/*IncludeVSCnt=*/false);
 
   if (ForceEmitWaitcnt[LOAD_CNT])
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir
new file mode 100644
index 00000000000000..530d1981f053e9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir
@@ -0,0 +1,22 @@
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass si-insert-waitcnts -amdgpu-waitcnt-forcezero=1 %s -o - | FileCheck %s
+
+...
+
+# CHECK-LABEL: waitcnt-debug-non-first-terminators
+# CHECK: S_WAITCNT 0
+# CHECK-NEXT: S_CBRANCH_SCC1 %bb.1, implicit $scc
+# CHECK-NEXT: S_BRANCH %bb.2, implicit $scc
+
+name: waitcnt-debug-non-first-terminators
+liveins:
+machineFunctionInfo:
+  isEntryFunction: true
+body:             |
+  bb.0:
+    S_CBRANCH_SCC1 %bb.1, implicit $scc
+    S_BRANCH %bb.2, implicit $scc
+  bb.1:
+    S_NOP 0
+  bb.2:
+    S_NOP 0
+...

Base automatically changed from users/shiltian/remove-unnecessary-data-member to main October 14, 2024 04:54
@shiltian shiltian force-pushed the users/shiltian/check-terminator-before-forcing-emit-zero branch from 5900729 to 0028877 Compare October 14, 2024 05:04
@shiltian shiltian force-pushed the users/shiltian/check-terminator-before-forcing-emit-zero branch from 0028877 to 66b1675 Compare October 14, 2024 05:10
@shiltian shiltian force-pushed the users/shiltian/check-terminator-before-forcing-emit-zero branch from 66b1675 to 33967ad Compare October 14, 2024 05:30
@arsenm arsenm changed the title [AMDGPU] Skip non-first termintors when forcing emit zero flag [AMDGPU] Skip non-first terminators when forcing emit zero flag Oct 14, 2024
@jayfoad
Copy link
Contributor

jayfoad commented Oct 14, 2024

Why do you say "non-first terminators" in the description?

In the test case, why does SIInsertWaitcnts want to put a wait between the two branch instructions?

@shiltian shiltian changed the title [AMDGPU] Skip non-first terminators when forcing emit zero flag [AMDGPU] Skip terminators when forcing emit zero flag Oct 14, 2024
@shiltian shiltian force-pushed the users/shiltian/check-terminator-before-forcing-emit-zero branch from 33967ad to 5bf3b23 Compare October 14, 2024 13:37
@shiltian
Copy link
Contributor Author

Why do you say "non-first terminators" in the description?

Updated the description. The first version indeed only considered non-first terminators.

In the test case, why does SIInsertWaitcnts want to put a wait between the two branch instructions?

Because we force emit zero via amdgpu-waitcnt-forcezero. That is a debug switch, but didn't work properly in some cases.

@shiltian shiltian force-pushed the users/shiltian/check-terminator-before-forcing-emit-zero branch from 5bf3b23 to 62d9aed Compare October 14, 2024 13:39
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point to avoid this was here

Don't know if we want the wait at block end or wait on successor entry behavior

; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: S_WAITCNT 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this version moves the waitcnt from the end of the block to the start of the successor blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, this waitcnt was just inserted before S_NOP, which is always there. The previous version just didn't have all the check lines.

@jayfoad
Copy link
Contributor

jayfoad commented Oct 14, 2024

In the test case, why does SIInsertWaitcnts want to put a wait between the two branch instructions?

Because we force emit zero via amdgpu-waitcnt-forcezero. That is a debug switch, but didn't work properly in some cases.

Oh, I see. That option is weird. I thought it meant "if any waitcnt is required, wait for the counter to be 0". But actually it waits for the counter to be zero after every instruction, even instructions that have nothing to do with wait counts.

@shiltian
Copy link
Contributor Author

That option is weird. I thought it meant "if any waitcnt is required, wait for the counter to be 0". But actually it waits for the counter to be zero after every instruction, even instructions that have nothing to do with wait counts.

The name of the option is indeed a little bit confusing.

I think the point to avoid this was here

We do need waitcnt in some cases (even w/o the option). That indicates that we can't simply skip the iteration (calling generateWaitcntInstBefore) if the instruction is a terminator.

    bool FlushVmCnt = Block.getFirstTerminator() == Inst &&
                      isPreheaderToFlush(Block, ScoreBrackets);

@shiltian shiltian merged commit a746594 into main Oct 14, 2024
5 of 8 checks passed
@shiltian shiltian deleted the users/shiltian/check-terminator-before-forcing-emit-zero branch October 14, 2024 15:46
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
When forcing emit zero, we need to skip terminators of a MBB; otherwise
the terminator list of the MBB would be broken.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants