[AMDGPU] Update hasUnwantedEffectsWhenEXECEmpty #97982
Currently includes #97676 as this builds on top of it.
@llvm/pr-subscribers-backend-amdgpu
Author: Carl Ritson (perlfu)
Changes
Add barriers and s_wait_event to hasUnwantedEffectsWhenEXECEmpty.
Patch is 25.73 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/97982.diff
5 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index cc1b9ac0c9ecda..a2cb3834643227 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -4118,6 +4118,13 @@ bool SIInstrInfo::modifiesModeRegister(const MachineInstr &MI) {
}
bool SIInstrInfo::hasUnwantedEffectsWhenEXECEmpty(const MachineInstr &MI) const {
+ // This function is used to determine if an instruction can be safely
+ // executed under EXECZ without hardware error, indeterminate results,
+ // and/or visible effects on future vector execution or outside the shader.
+ // Note: as of 2024 the only use of this is SIPreEmitPeephole where it is
+ // used in removing branches over short EXECZ sequences.
+ // As such it embeds certain assumptions which may not apply in every case
+ // of EXECZ execution.
unsigned Opcode = MI.getOpcode();
if (MI.mayStore() && isSMRD(MI))
@@ -4136,12 +4143,17 @@ bool SIInstrInfo::hasUnwantedEffectsWhenEXECEmpty(const MachineInstr &MI) const
if (Opcode == AMDGPU::S_SENDMSG || Opcode == AMDGPU::S_SENDMSGHALT ||
isEXP(Opcode) ||
Opcode == AMDGPU::DS_ORDERED_COUNT || Opcode == AMDGPU::S_TRAP ||
- Opcode == AMDGPU::DS_GWS_INIT || Opcode == AMDGPU::DS_GWS_BARRIER)
+ Opcode == AMDGPU::DS_GWS_INIT || Opcode == AMDGPU::DS_GWS_BARRIER ||
+ Opcode == AMDGPU::S_WAIT_EVENT)
return true;
if (MI.isCall() || MI.isInlineAsm())
return true; // conservative assumption
+ // Assume that barrier interactions are only intended with active lanes.
+ if (isBarrierRelated(Opcode))
+ return true;
+
// A mode change is a scalar operation that influences vector instructions.
if (modifiesModeRegister(MI))
return true;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 1e2b687854c77a..bee24b3a7a91b3 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -936,6 +936,14 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_IMM;
}
+ bool isBarrierRelated(unsigned Opcode) const {
+ return isBarrierStart(Opcode) || Opcode == AMDGPU::S_BARRIER_WAIT ||
+ Opcode == AMDGPU::S_BARRIER_INIT_M0 ||
+ Opcode == AMDGPU::S_BARRIER_INIT_IMM ||
+ Opcode == AMDGPU::S_BARRIER_JOIN_IMM ||
+ Opcode == AMDGPU::S_BARRIER_LEAVE;
+ }
+
static bool doesNotReadTiedSource(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead;
}
@@ -967,6 +975,29 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
}
}
+ bool isWaitcnt(unsigned Opcode) const {
+ switch (getNonSoftWaitcntOpcode(Opcode)) {
+ case AMDGPU::S_WAITCNT:
+ case AMDGPU::S_WAITCNT_VSCNT:
+ case AMDGPU::S_WAITCNT_VMCNT:
+ case AMDGPU::S_WAITCNT_EXPCNT:
+ case AMDGPU::S_WAITCNT_LGKMCNT:
+ case AMDGPU::S_WAIT_LOADCNT:
+ case AMDGPU::S_WAIT_LOADCNT_DSCNT:
+ case AMDGPU::S_WAIT_STORECNT:
+ case AMDGPU::S_WAIT_STORECNT_DSCNT:
+ case AMDGPU::S_WAIT_SAMPLECNT:
+ case AMDGPU::S_WAIT_BVHCNT:
+ case AMDGPU::S_WAIT_EXPCNT:
+ case AMDGPU::S_WAIT_DSCNT:
+ case AMDGPU::S_WAIT_KMCNT:
+ case AMDGPU::S_WAIT_IDLE:
+ return true;
+ default:
+ return false;
+ }
+ }
+
bool isVGPRCopy(const MachineInstr &MI) const {
assert(isCopyInstr(MI));
Register Dest = MI.getOperand(0).getReg();
diff --git a/llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp b/llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp
index 875bccb208c846..1334029544f999 100644
--- a/llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp
@@ -328,7 +328,7 @@ bool SIPreEmitPeephole::mustRetainExeczBranch(
// These instructions are potentially expensive even if EXEC = 0.
if (TII->isSMRD(MI) || TII->isVMEM(MI) || TII->isFLAT(MI) ||
- TII->isDS(MI) || MI.getOpcode() == AMDGPU::S_WAITCNT)
+ TII->isDS(MI) || TII->isWaitcnt(MI.getOpcode()))
return true;
++NumInstr;
diff --git a/llvm/test/CodeGen/AMDGPU/insert-skips-gfx10.mir b/llvm/test/CodeGen/AMDGPU/insert-skips-gfx10.mir
new file mode 100644
index 00000000000000..b4ed3cafbacb5f
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/insert-skips-gfx10.mir
@@ -0,0 +1,216 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1030 -run-pass si-pre-emit-peephole -amdgpu-skip-threshold=10 -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: skip_waitcnt_vscnt
+body: |
+ ; CHECK-LABEL: name: skip_waitcnt_vscnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAITCNT_VSCNT $sgpr_null, 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAITCNT_VSCNT $sgpr_null, 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_waitcnt_expcnt
+body: |
+ ; CHECK-LABEL: name: skip_waitcnt_expcnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAITCNT_EXPCNT $sgpr_null, 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAITCNT_EXPCNT $sgpr_null, 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_waitcnt_vmcnt
+body: |
+ ; CHECK-LABEL: name: skip_waitcnt_vmcnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAITCNT_VMCNT $sgpr_null, 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAITCNT_VMCNT $sgpr_null, 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_waitcnt_lgkmcnt
+body: |
+ ; CHECK-LABEL: name: skip_waitcnt_lgkmcnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAITCNT_LGKMCNT $sgpr_null, 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAITCNT_LGKMCNT $sgpr_null, 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_idle
+body: |
+ ; CHECK-LABEL: name: skip_wait_idle
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_IDLE
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_IDLE
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_bvh
+body: |
+ ; CHECK-LABEL: name: skip_bvh
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14 = IMPLICIT_DEF
+ ; CHECK-NEXT: $sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
+ ; CHECK-NEXT: $vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_BVH_INTERSECT_RAY_sa_gfx11 $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14, renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, implicit $exec :: (dereferenceable load (s128), addrspace 7)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14 = IMPLICIT_DEF
+ $sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
+ $vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_BVH_INTERSECT_RAY_sa_gfx11 $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14, renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, implicit $exec :: (dereferenceable load (s128), addrspace 7)
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_barrier
+body: |
+ ; CHECK-LABEL: name: skip_barrier
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_BARRIER
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_BARRIER
+
+ bb.2:
+ S_ENDPGM 0
+...
diff --git a/llvm/test/CodeGen/AMDGPU/insert-skips-gfx12.mir b/llvm/test/CodeGen/AMDGPU/insert-skips-gfx12.mir
new file mode 100644
index 00000000000000..2d092974ac566f
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/insert-skips-gfx12.mir
@@ -0,0 +1,610 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -run-pass si-pre-emit-peephole -amdgpu-skip-threshold=10 -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: skip_wait_loadcnt
+body: |
+ ; CHECK-LABEL: name: skip_wait_loadcnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_LOADCNT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_LOADCNT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_loadcnt_dscnt
+body: |
+ ; CHECK-LABEL: name: skip_wait_loadcnt_dscnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_LOADCNT_DSCNT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_LOADCNT_DSCNT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_storecnt
+body: |
+ ; CHECK-LABEL: name: skip_wait_storecnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_STORECNT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_STORECNT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_storecnt_dscnt
+body: |
+ ; CHECK-LABEL: name: skip_wait_storecnt_dscnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_STORECNT_DSCNT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_STORECNT_DSCNT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_samplecnt
+body: |
+ ; CHECK-LABEL: name: skip_wait_samplecnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_SAMPLECNT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_SAMPLECNT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_bvhcnt
+body: |
+ ; CHECK-LABEL: name: skip_wait_bvhcnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_BVHCNT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_BVHCNT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_expcnt
+body: |
+ ; CHECK-LABEL: name: skip_wait_expcnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_EXPCNT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_EXPCNT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_dscnt
+body: |
+ ; CHECK-LABEL: name: skip_wait_dscnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_DSCNT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_DSCNT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_kmcnt
+body: |
+ ; CHECK-LABEL: name: skip_wait_kmcnt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_KMCNT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_KMCNT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_idle
+body: |
+ ; CHECK-LABEL: name: skip_wait_idle
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_IDLE
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_IDLE
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_wait_event
+body: |
+ ; CHECK-LABEL: name: skip_wait_event
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_WAIT_EVENT 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_WAIT_EVENT 0
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_barrier_signal_imm
+body: |
+ ; CHECK-LABEL: name: skip_barrier_signal_imm
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_BARRIER_SIGNAL_IMM -1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+
+ bb.1:
+ successors: %bb.2
+ V_NOP_e32 implicit $exec
+ S_BARRIER_SIGNAL_IMM -1
+
+ bb.2:
+ S_ENDPGM 0
+...
+
+---
+name: skip_barrier_signal_isfirst_imm
+body: |
+ ; CHECK-LABEL: name: skip_barrier_signal_isfirst_imm
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: V_NOP_e32 implicit $exec
+ ; CHECK-NEXT: S_BARRIER_SIGNAL_ISFIRST_IMM -1, implicit-def $scc
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors...
[truncated]
You can test this locally with the following command:
git-clang-format --diff a77d3ea310c61cf59c1146895b2d51fe014eb0a9 4965fd6d8034fc3680c805aa0127132677146d88 --extensions h,cpp -- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp llvm/lib/Target/AMDGPU/SIInstrInfo.h
View the diff from clang-format here.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 1712dfe8d4..cc71e4ff3b 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -941,8 +941,7 @@ public:
Opcode == AMDGPU::S_BARRIER_INIT_M0 ||
Opcode == AMDGPU::S_BARRIER_INIT_IMM ||
Opcode == AMDGPU::S_BARRIER_JOIN_IMM ||
- Opcode == AMDGPU::S_BARRIER_LEAVE ||
- Opcode == AMDGPU::DS_GWS_INIT ||
+ Opcode == AMDGPU::S_BARRIER_LEAVE || Opcode == AMDGPU::DS_GWS_INIT ||
Opcode == AMDGPU::DS_GWS_BARRIER;
}
7710319 to 08fb1b2
Seems reasonable, but I don't really feel super confident about whether this could break anything or fix anything. Does it have any effect on Vulkan CTS?
This change passes Vulkan CTS. Let me clarify why I think this is "correct". We might, of course, decide that certain barriers should always be executed.
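To make the effect on branch removal concrete, here is a minimal, self-contained sketch of the decision this predicate feeds into. It is a toy model and not LLVM code: `ToyInstr`, `mustRetainBranch`, and the `Nop`/`Barrier`/`Waitcnt` constants are hypothetical names, and the `>=` comparison against the threshold is an assumption about how the instruction count is checked. The two flags stand in for hasUnwantedEffectsWhenEXECEmpty and for the "potentially expensive even if EXEC = 0" check in SIPreEmitPeephole.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a MachineInstr; these are not LLVM types.
struct ToyInstr {
  bool UnwantedWhenExecEmpty;  // models hasUnwantedEffectsWhenEXECEmpty(MI)
  bool ExpensiveWhenExecEmpty; // models memory ops and wait-counter instructions
};

// Illustrative instruction kinds, classified as this patch intends.
const ToyInstr Nop{false, false};
const ToyInstr Barrier{true, false}; // barriers now count as unwanted under EXECZ
const ToyInstr Waitcnt{false, true}; // waits are treated as potentially expensive

// Returns true if the EXECZ branch over Block must be kept.
// Assumption: the branch is also kept once Threshold instructions are counted.
bool mustRetainBranch(const std::vector<ToyInstr> &Block, unsigned Threshold) {
  unsigned NumInstr = 0;
  for (const ToyInstr &I : Block) {
    // Executing this with no active lanes could be visible or unsafe.
    if (I.UnwantedWhenExecEmpty)
      return true;
    // Skipping is cheaper than executing this with EXEC = 0.
    if (I.ExpensiveWhenExecEmpty)
      return true;
    ++NumInstr;
    if (NumInstr >= Threshold)
      return true;
  }
  return false; // short, harmless sequence: the branch can be removed
}

// Small usage example mirroring the shape of the new MIR tests.
void demo() {
  assert(mustRetainBranch({Nop, Barrier}, 10));
  assert(!mustRetainBranch({Nop, Nop}, 10));
}
```

In this model a block containing a barrier or a wait keeps its S_CBRANCH_EXECZ, which is the behaviour the CHECK lines in the new insert-skips tests expect.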
Add barriers and s_wait_event to hasUnwantedEffectsWhenEXECEmpty. Add a comment documenting the current expected use of the function.
08fb1b2 to 1b2695f
- Rename isBarrierRelated -> isBarrier
- Add DS_GWS instructions to isBarrier
- Move comment to header
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/1437 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/265 Here is the relevant piece of the build log for the reference:
|
Summary: Add barriers and s_wait_event to hasUnwantedEffectsWhenEXECEmpty. Add a comment documenting the current expected use of the function. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250953