-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Fix broken MIR generated by gfx11 simulated trap lowering #91652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This was breaking the CFG connection between uses of virtual registers after the trap and their definitions before it. Fixes SWDEV-460384.
@llvm/pr-subscribers-llvm-globalisel Author: Emma Pilkington (epilk) ChangesThis was breaking the CFG connection between uses of virtual registers after the trap and their definitions before it. Fixes SWDEV-460384. Fixes a bug in #85854. Full diff: https://github.com/llvm/llvm-project/pull/91652.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 6599d0abd135c..17e235291e37b 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2065,14 +2065,26 @@ MachineBasicBlock *SIInstrInfo::insertSimulatedTrap(MachineRegisterInfo &MRI,
.addImm(AMDGPU::SendMsg::ID_INTERRUPT);
BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B32), AMDGPU::M0)
.addUse(AMDGPU::TTMP2);
- BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
+
+ if (MBB.succ_empty()) {
+ BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
+ } else {
+ // HACK: There are some instructions following the trap. Since uses of
+ // virtual registers in SplitBB (or beyond) that were defined before the
+ // trap must be dominated by their definitions, we need SplitBB to be a
+ // successor (even though it's unreachable in practice). This needs to be
+ // represented by a dummy cmp_eq and cbranch to convince analyzeBranch that
+ // SplitBB should indeed be considered a successor.
+ BuildMI(MBB, MI, DL, get(AMDGPU::S_CMP_EQ_U32))
+ .addUse(SetWaveAbortBit)
+ .addUse(SetWaveAbortBit);
+ BuildMI(MBB, MI, DL, get(AMDGPU::S_CBRANCH_SCC1)).addMBB(HaltLoop);
+ }
BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_SETHALT)).addImm(5);
BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_BRANCH))
.addMBB(HaltLoop);
- if (SplitBB != &MBB)
- MBB.removeSuccessor(SplitBB);
MBB.addSuccessor(HaltLoop);
HaltLoop->addSuccessor(HaltLoop);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
index ac98dca00be3d..e217a8be9597d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
@@ -8,7 +8,7 @@ name: test_trap
body: |
bb.0:
; GFX1100-LABEL: name: test_trap
- ; GFX1100: successors: %bb.2(0x80000000)
+ ; GFX1100: successors: %bb.1(0x40000000), %bb.2(0x40000000)
; GFX1100-NEXT: {{ $}}
; GFX1100-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
; GFX1100-NEXT: [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
@@ -21,7 +21,8 @@ body: |
; GFX1100-NEXT: $m0 = S_MOV_B32 [[S_OR_B32_]]
; GFX1100-NEXT: S_SENDMSG 1, implicit $exec, implicit $m0
; GFX1100-NEXT: $m0 = S_MOV_B32 $ttmp2
- ; GFX1100-NEXT: S_BRANCH %bb.2
+ ; GFX1100-NEXT: S_CMP_EQ_U32 [[S_OR_B32_]], [[S_OR_B32_]], implicit-def $scc
+ ; GFX1100-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX1100-NEXT: {{ $}}
; GFX1100-NEXT: .1:
; GFX1100-NEXT: successors:
@@ -45,5 +46,59 @@ body: |
G_STORE %0, %1 :: (store 1, addrspace 1)
G_TRAP
G_STORE %0, %1 :: (store 1, addrspace 1)
+...
+
+---
+name: test_fallthrough_trap
+body: |
+ ; GFX1100-LABEL: name: test_fallthrough_trap
+ ; GFX1100: bb.0:
+ ; GFX1100-NEXT: successors: %bb.1(0x80000000), %bb.2(0x00000000)
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; GFX1100-NEXT: [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+ ; GFX1100-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+ ; GFX1100-NEXT: S_TRAP 2
+ ; GFX1100-NEXT: [[S_SENDMSG_RTN_B32_:%[0-9]+]]:sreg_32 = S_SENDMSG_RTN_B32 128
+ ; GFX1100-NEXT: $ttmp2 = S_MOV_B32 $m0
+ ; GFX1100-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[S_SENDMSG_RTN_B32_]], 1023, implicit-def $scc
+ ; GFX1100-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_AND_B32_]], 1024, implicit-def $scc
+ ; GFX1100-NEXT: $m0 = S_MOV_B32 [[S_OR_B32_]]
+ ; GFX1100-NEXT: S_SENDMSG 1, implicit $exec, implicit $m0
+ ; GFX1100-NEXT: $m0 = S_MOV_B32 $ttmp2
+ ; GFX1100-NEXT: S_CMP_EQ_U32 [[S_OR_B32_]], [[S_OR_B32_]], implicit-def $scc
+ ; GFX1100-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: bb.1:
+ ; GFX1100-NEXT: successors:
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: bb.2:
+ ; GFX1100-NEXT: successors: %bb.2(0x80000000)
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: S_SETHALT 5
+ ; GFX1100-NEXT: S_BRANCH %bb.2
+ ;
+ ; GFX1150-LABEL: name: test_fallthrough_trap
+ ; GFX1150: bb.0:
+ ; GFX1150-NEXT: successors: %bb.1(0x80000000)
+ ; GFX1150-NEXT: {{ $}}
+ ; GFX1150-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; GFX1150-NEXT: [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+ ; GFX1150-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+ ; GFX1150-NEXT: S_TRAP 2
+ ; GFX1150-NEXT: {{ $}}
+ ; GFX1150-NEXT: bb.1:
+ ; GFX1150-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+ bb.0:
+ successors: %bb.1
+
+ %0:_(s8) = G_CONSTANT i8 0
+ %1:_(p1) = G_CONSTANT i64 0
+ G_STORE %0, %1 :: (store 1, addrspace 1)
+ G_TRAP
+ bb.1:
+ G_STORE %0, %1 :: (store 1, addrspace 1)
...
diff --git a/llvm/test/CodeGen/AMDGPU/trap-abis.ll b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
index dcc5fbd142c42..cfc1b47671b07 100644
--- a/llvm/test/CodeGen/AMDGPU/trap-abis.ll
+++ b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
@@ -264,6 +264,142 @@ ret:
ret void
}
+define amdgpu_kernel void @trap_with_use_after(ptr addrspace(1) %arg0, ptr addrspace(1) %arg1) {
+; NOHSA-TRAP-GFX900-LABEL: trap_with_use_after:
+; NOHSA-TRAP-GFX900: ; %bb.0:
+; NOHSA-TRAP-GFX900-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
+; NOHSA-TRAP-GFX900-NEXT: v_mov_b32_e32 v0, 0
+; NOHSA-TRAP-GFX900-NEXT: s_waitcnt lgkmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT: global_load_dword v1, v0, s[0:1] glc
+; NOHSA-TRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT: s_cbranch_execnz .LBB2_2
+; NOHSA-TRAP-GFX900-NEXT: ; %bb.1:
+; NOHSA-TRAP-GFX900-NEXT: global_store_dword v0, v1, s[2:3]
+; NOHSA-TRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT: .LBB2_2:
+; NOHSA-TRAP-GFX900-NEXT: s_endpgm
+;
+; HSA-TRAP-GFX803-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX803: ; %bb.0:
+; HSA-TRAP-GFX803-NEXT: s_mov_b64 s[0:1], s[4:5]
+; HSA-TRAP-GFX803-NEXT: s_load_dwordx4 s[4:7], s[6:7], 0x0
+; HSA-TRAP-GFX803-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX803-NEXT: v_mov_b32_e32 v0, s4
+; HSA-TRAP-GFX803-NEXT: v_mov_b32_e32 v1, s5
+; HSA-TRAP-GFX803-NEXT: flat_load_dword v2, v[0:1] glc
+; HSA-TRAP-GFX803-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX803-NEXT: v_mov_b32_e32 v0, s6
+; HSA-TRAP-GFX803-NEXT: v_mov_b32_e32 v1, s7
+; HSA-TRAP-GFX803-NEXT: s_trap 2
+; HSA-TRAP-GFX803-NEXT: flat_store_dword v[0:1], v2
+; HSA-TRAP-GFX803-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX803-NEXT: s_endpgm
+;
+; HSA-TRAP-GFX900-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX900: ; %bb.0:
+; HSA-TRAP-GFX900-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; HSA-TRAP-GFX900-NEXT: v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX900-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX900-NEXT: global_load_dword v1, v0, s[0:1] glc
+; HSA-TRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX900-NEXT: s_trap 2
+; HSA-TRAP-GFX900-NEXT: global_store_dword v0, v1, s[2:3]
+; HSA-TRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX900-NEXT: s_endpgm
+;
+; HSA-NOTRAP-GFX900-LABEL: trap_with_use_after:
+; HSA-NOTRAP-GFX900: ; %bb.0:
+; HSA-NOTRAP-GFX900-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; HSA-NOTRAP-GFX900-NEXT: v_mov_b32_e32 v0, 0
+; HSA-NOTRAP-GFX900-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT: global_load_dword v1, v0, s[0:1] glc
+; HSA-NOTRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT: s_cbranch_execnz .LBB2_2
+; HSA-NOTRAP-GFX900-NEXT: ; %bb.1:
+; HSA-NOTRAP-GFX900-NEXT: global_store_dword v0, v1, s[2:3]
+; HSA-NOTRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT: .LBB2_2:
+; HSA-NOTRAP-GFX900-NEXT: s_endpgm
+;
+; HSA-TRAP-GFX1100-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX1100: ; %bb.0:
+; HSA-TRAP-GFX1100-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
+; HSA-TRAP-GFX1100-NEXT: v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-NEXT: s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT: global_load_b32 v1, v0, s[0:1] glc dlc
+; HSA-TRAP-GFX1100-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-NEXT: s_trap 2
+; HSA-TRAP-GFX1100-NEXT: s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT: s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
+; HSA-TRAP-GFX1100-NEXT: s_bitset1_b32 s0, 10
+; HSA-TRAP-GFX1100-NEXT: s_cmp_eq_u32 s0, s0
+; HSA-TRAP-GFX1100-NEXT: s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-NEXT: s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-NEXT: s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-NEXT: s_cbranch_scc1 .LBB2_2
+; HSA-TRAP-GFX1100-NEXT: ; %bb.1:
+; HSA-TRAP-GFX1100-NEXT: global_store_b32 v0, v1, s[2:3] dlc
+; HSA-TRAP-GFX1100-NEXT: s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-NEXT: s_nop 0
+; HSA-TRAP-GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; HSA-TRAP-GFX1100-NEXT: s_endpgm
+; HSA-TRAP-GFX1100-NEXT: .LBB2_2: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-NEXT: s_sethalt 5
+; HSA-TRAP-GFX1100-NEXT: s_branch .LBB2_2
+;
+; HSA-TRAP-GFX1100-O0-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX1100-O0: ; %bb.0:
+; HSA-TRAP-GFX1100-O0-NEXT: ; implicit-def: $vgpr1 : SGPR spill to VGPR lane
+; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_store_b32 off, v0, off offset:8 ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT: s_load_b64 s[0:1], s[4:5], 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: s_load_b64 s[2:3], s[4:5], 0x8
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: v_writelane_b32 v1, s2, 0
+; HSA-TRAP-GFX1100-O0-NEXT: v_writelane_b32 v1, s3, 1
+; HSA-TRAP-GFX1100-O0-NEXT: s_or_saveexec_b32 s6, -1
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_store_b32 off, v1, off offset:4 ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 exec_lo, s6
+; HSA-TRAP-GFX1100-O0-NEXT: global_load_b32 v0, v0, s[0:1] glc dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_store_b32 off, v0, off ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT: s_trap 2
+; HSA-TRAP-GFX1100-O0-NEXT: s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-O0-NEXT: s_or_b32 s0, s0, 0x400
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-O0-NEXT: s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-O0-NEXT: s_cmp_eq_u32 s0, s0
+; HSA-TRAP-GFX1100-O0-NEXT: s_cbranch_scc1 .LBB2_2
+; HSA-TRAP-GFX1100-O0-NEXT: ; %bb.1:
+; HSA-TRAP-GFX1100-O0-NEXT: s_or_saveexec_b32 s6, -1
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_load_b32 v0, off, off offset:4 ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 exec_lo, s6
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: v_readlane_b32 s0, v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT: v_readlane_b32 s1, v0, 1
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_load_b32 v1, off, off offset:8 ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_load_b32 v2, off, off ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v1, v2, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: ; kill: killed $vgpr0
+; HSA-TRAP-GFX1100-O0-NEXT: s_endpgm
+; HSA-TRAP-GFX1100-O0-NEXT: .LBB2_2: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-O0-NEXT: s_sethalt 5
+; HSA-TRAP-GFX1100-O0-NEXT: s_branch .LBB2_2
+ %tmp = load volatile i32, ptr addrspace(1) %arg0
+ call void @llvm.trap()
+ store volatile i32 %tmp, ptr addrspace(1) %arg1
+ ret void
+}
+
define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0) {
; NOHSA-TRAP-GFX900-LABEL: debugtrap:
; NOHSA-TRAP-GFX900: ; %bb.0:
@@ -334,6 +470,20 @@ define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0)
; HSA-TRAP-GFX1100-NEXT: s_nop 0
; HSA-TRAP-GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; HSA-TRAP-GFX1100-NEXT: s_endpgm
+;
+; HSA-TRAP-GFX1100-O0-LABEL: debugtrap:
+; HSA-TRAP-GFX1100-O0: ; %bb.0:
+; HSA-TRAP-GFX1100-O0-NEXT: s_load_b64 s[0:1], s[4:5], 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v1, 1
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: s_trap 3
+; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v1, 2
+; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: s_endpgm
store volatile i32 1, ptr addrspace(1) %arg0
call void @llvm.debugtrap()
store volatile i32 2, ptr addrspace(1) %arg0
|
@llvm/pr-subscribers-backend-amdgpu Author: Emma Pilkington (epilk) ChangesThis was breaking the CFG connection between uses of virtual registers after the trap and their definitions before it. Fixes SWDEV-460384. Fixes a bug in #85854. Full diff: https://github.com/llvm/llvm-project/pull/91652.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 6599d0abd135c..17e235291e37b 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2065,14 +2065,26 @@ MachineBasicBlock *SIInstrInfo::insertSimulatedTrap(MachineRegisterInfo &MRI,
.addImm(AMDGPU::SendMsg::ID_INTERRUPT);
BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B32), AMDGPU::M0)
.addUse(AMDGPU::TTMP2);
- BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
+
+ if (MBB.succ_empty()) {
+ BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
+ } else {
+ // HACK: There are some instructions following the trap. Since uses of
+ // virtual registers in SplitBB (or beyond) that were defined before the
+ // trap must be dominated by their definitions, we need SplitBB to be a
+ // successor (even though it's unreachable in practice). This needs to be
+ // represented by a dummy cmp_eq and cbranch to convince analyzeBranch that
+ // SplitBB should indeed be considered a successor.
+ BuildMI(MBB, MI, DL, get(AMDGPU::S_CMP_EQ_U32))
+ .addUse(SetWaveAbortBit)
+ .addUse(SetWaveAbortBit);
+ BuildMI(MBB, MI, DL, get(AMDGPU::S_CBRANCH_SCC1)).addMBB(HaltLoop);
+ }
BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_SETHALT)).addImm(5);
BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_BRANCH))
.addMBB(HaltLoop);
- if (SplitBB != &MBB)
- MBB.removeSuccessor(SplitBB);
MBB.addSuccessor(HaltLoop);
HaltLoop->addSuccessor(HaltLoop);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
index ac98dca00be3d..e217a8be9597d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
@@ -8,7 +8,7 @@ name: test_trap
body: |
bb.0:
; GFX1100-LABEL: name: test_trap
- ; GFX1100: successors: %bb.2(0x80000000)
+ ; GFX1100: successors: %bb.1(0x40000000), %bb.2(0x40000000)
; GFX1100-NEXT: {{ $}}
; GFX1100-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
; GFX1100-NEXT: [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
@@ -21,7 +21,8 @@ body: |
; GFX1100-NEXT: $m0 = S_MOV_B32 [[S_OR_B32_]]
; GFX1100-NEXT: S_SENDMSG 1, implicit $exec, implicit $m0
; GFX1100-NEXT: $m0 = S_MOV_B32 $ttmp2
- ; GFX1100-NEXT: S_BRANCH %bb.2
+ ; GFX1100-NEXT: S_CMP_EQ_U32 [[S_OR_B32_]], [[S_OR_B32_]], implicit-def $scc
+ ; GFX1100-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX1100-NEXT: {{ $}}
; GFX1100-NEXT: .1:
; GFX1100-NEXT: successors:
@@ -45,5 +46,59 @@ body: |
G_STORE %0, %1 :: (store 1, addrspace 1)
G_TRAP
G_STORE %0, %1 :: (store 1, addrspace 1)
+...
+
+---
+name: test_fallthrough_trap
+body: |
+ ; GFX1100-LABEL: name: test_fallthrough_trap
+ ; GFX1100: bb.0:
+ ; GFX1100-NEXT: successors: %bb.1(0x80000000), %bb.2(0x00000000)
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; GFX1100-NEXT: [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+ ; GFX1100-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+ ; GFX1100-NEXT: S_TRAP 2
+ ; GFX1100-NEXT: [[S_SENDMSG_RTN_B32_:%[0-9]+]]:sreg_32 = S_SENDMSG_RTN_B32 128
+ ; GFX1100-NEXT: $ttmp2 = S_MOV_B32 $m0
+ ; GFX1100-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[S_SENDMSG_RTN_B32_]], 1023, implicit-def $scc
+ ; GFX1100-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_AND_B32_]], 1024, implicit-def $scc
+ ; GFX1100-NEXT: $m0 = S_MOV_B32 [[S_OR_B32_]]
+ ; GFX1100-NEXT: S_SENDMSG 1, implicit $exec, implicit $m0
+ ; GFX1100-NEXT: $m0 = S_MOV_B32 $ttmp2
+ ; GFX1100-NEXT: S_CMP_EQ_U32 [[S_OR_B32_]], [[S_OR_B32_]], implicit-def $scc
+ ; GFX1100-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: bb.1:
+ ; GFX1100-NEXT: successors:
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: bb.2:
+ ; GFX1100-NEXT: successors: %bb.2(0x80000000)
+ ; GFX1100-NEXT: {{ $}}
+ ; GFX1100-NEXT: S_SETHALT 5
+ ; GFX1100-NEXT: S_BRANCH %bb.2
+ ;
+ ; GFX1150-LABEL: name: test_fallthrough_trap
+ ; GFX1150: bb.0:
+ ; GFX1150-NEXT: successors: %bb.1(0x80000000)
+ ; GFX1150-NEXT: {{ $}}
+ ; GFX1150-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; GFX1150-NEXT: [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+ ; GFX1150-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+ ; GFX1150-NEXT: S_TRAP 2
+ ; GFX1150-NEXT: {{ $}}
+ ; GFX1150-NEXT: bb.1:
+ ; GFX1150-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+ bb.0:
+ successors: %bb.1
+
+ %0:_(s8) = G_CONSTANT i8 0
+ %1:_(p1) = G_CONSTANT i64 0
+ G_STORE %0, %1 :: (store 1, addrspace 1)
+ G_TRAP
+ bb.1:
+ G_STORE %0, %1 :: (store 1, addrspace 1)
...
diff --git a/llvm/test/CodeGen/AMDGPU/trap-abis.ll b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
index dcc5fbd142c42..cfc1b47671b07 100644
--- a/llvm/test/CodeGen/AMDGPU/trap-abis.ll
+++ b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
@@ -264,6 +264,142 @@ ret:
ret void
}
+define amdgpu_kernel void @trap_with_use_after(ptr addrspace(1) %arg0, ptr addrspace(1) %arg1) {
+; NOHSA-TRAP-GFX900-LABEL: trap_with_use_after:
+; NOHSA-TRAP-GFX900: ; %bb.0:
+; NOHSA-TRAP-GFX900-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
+; NOHSA-TRAP-GFX900-NEXT: v_mov_b32_e32 v0, 0
+; NOHSA-TRAP-GFX900-NEXT: s_waitcnt lgkmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT: global_load_dword v1, v0, s[0:1] glc
+; NOHSA-TRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT: s_cbranch_execnz .LBB2_2
+; NOHSA-TRAP-GFX900-NEXT: ; %bb.1:
+; NOHSA-TRAP-GFX900-NEXT: global_store_dword v0, v1, s[2:3]
+; NOHSA-TRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT: .LBB2_2:
+; NOHSA-TRAP-GFX900-NEXT: s_endpgm
+;
+; HSA-TRAP-GFX803-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX803: ; %bb.0:
+; HSA-TRAP-GFX803-NEXT: s_mov_b64 s[0:1], s[4:5]
+; HSA-TRAP-GFX803-NEXT: s_load_dwordx4 s[4:7], s[6:7], 0x0
+; HSA-TRAP-GFX803-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX803-NEXT: v_mov_b32_e32 v0, s4
+; HSA-TRAP-GFX803-NEXT: v_mov_b32_e32 v1, s5
+; HSA-TRAP-GFX803-NEXT: flat_load_dword v2, v[0:1] glc
+; HSA-TRAP-GFX803-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX803-NEXT: v_mov_b32_e32 v0, s6
+; HSA-TRAP-GFX803-NEXT: v_mov_b32_e32 v1, s7
+; HSA-TRAP-GFX803-NEXT: s_trap 2
+; HSA-TRAP-GFX803-NEXT: flat_store_dword v[0:1], v2
+; HSA-TRAP-GFX803-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX803-NEXT: s_endpgm
+;
+; HSA-TRAP-GFX900-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX900: ; %bb.0:
+; HSA-TRAP-GFX900-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; HSA-TRAP-GFX900-NEXT: v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX900-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX900-NEXT: global_load_dword v1, v0, s[0:1] glc
+; HSA-TRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX900-NEXT: s_trap 2
+; HSA-TRAP-GFX900-NEXT: global_store_dword v0, v1, s[2:3]
+; HSA-TRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX900-NEXT: s_endpgm
+;
+; HSA-NOTRAP-GFX900-LABEL: trap_with_use_after:
+; HSA-NOTRAP-GFX900: ; %bb.0:
+; HSA-NOTRAP-GFX900-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; HSA-NOTRAP-GFX900-NEXT: v_mov_b32_e32 v0, 0
+; HSA-NOTRAP-GFX900-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT: global_load_dword v1, v0, s[0:1] glc
+; HSA-NOTRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT: s_cbranch_execnz .LBB2_2
+; HSA-NOTRAP-GFX900-NEXT: ; %bb.1:
+; HSA-NOTRAP-GFX900-NEXT: global_store_dword v0, v1, s[2:3]
+; HSA-NOTRAP-GFX900-NEXT: s_waitcnt vmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT: .LBB2_2:
+; HSA-NOTRAP-GFX900-NEXT: s_endpgm
+;
+; HSA-TRAP-GFX1100-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX1100: ; %bb.0:
+; HSA-TRAP-GFX1100-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
+; HSA-TRAP-GFX1100-NEXT: v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-NEXT: s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT: global_load_b32 v1, v0, s[0:1] glc dlc
+; HSA-TRAP-GFX1100-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-NEXT: s_trap 2
+; HSA-TRAP-GFX1100-NEXT: s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT: s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
+; HSA-TRAP-GFX1100-NEXT: s_bitset1_b32 s0, 10
+; HSA-TRAP-GFX1100-NEXT: s_cmp_eq_u32 s0, s0
+; HSA-TRAP-GFX1100-NEXT: s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-NEXT: s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-NEXT: s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-NEXT: s_cbranch_scc1 .LBB2_2
+; HSA-TRAP-GFX1100-NEXT: ; %bb.1:
+; HSA-TRAP-GFX1100-NEXT: global_store_b32 v0, v1, s[2:3] dlc
+; HSA-TRAP-GFX1100-NEXT: s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-NEXT: s_nop 0
+; HSA-TRAP-GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; HSA-TRAP-GFX1100-NEXT: s_endpgm
+; HSA-TRAP-GFX1100-NEXT: .LBB2_2: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-NEXT: s_sethalt 5
+; HSA-TRAP-GFX1100-NEXT: s_branch .LBB2_2
+;
+; HSA-TRAP-GFX1100-O0-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX1100-O0: ; %bb.0:
+; HSA-TRAP-GFX1100-O0-NEXT: ; implicit-def: $vgpr1 : SGPR spill to VGPR lane
+; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_store_b32 off, v0, off offset:8 ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT: s_load_b64 s[0:1], s[4:5], 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: s_load_b64 s[2:3], s[4:5], 0x8
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: v_writelane_b32 v1, s2, 0
+; HSA-TRAP-GFX1100-O0-NEXT: v_writelane_b32 v1, s3, 1
+; HSA-TRAP-GFX1100-O0-NEXT: s_or_saveexec_b32 s6, -1
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_store_b32 off, v1, off offset:4 ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 exec_lo, s6
+; HSA-TRAP-GFX1100-O0-NEXT: global_load_b32 v0, v0, s[0:1] glc dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_store_b32 off, v0, off ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT: s_trap 2
+; HSA-TRAP-GFX1100-O0-NEXT: s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-O0-NEXT: s_or_b32 s0, s0, 0x400
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-O0-NEXT: s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-O0-NEXT: s_cmp_eq_u32 s0, s0
+; HSA-TRAP-GFX1100-O0-NEXT: s_cbranch_scc1 .LBB2_2
+; HSA-TRAP-GFX1100-O0-NEXT: ; %bb.1:
+; HSA-TRAP-GFX1100-O0-NEXT: s_or_saveexec_b32 s6, -1
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_load_b32 v0, off, off offset:4 ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT: s_mov_b32 exec_lo, s6
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: v_readlane_b32 s0, v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT: v_readlane_b32 s1, v0, 1
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_load_b32 v1, off, off offset:8 ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT: scratch_load_b32 v2, off, off ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v1, v2, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: ; kill: killed $vgpr0
+; HSA-TRAP-GFX1100-O0-NEXT: s_endpgm
+; HSA-TRAP-GFX1100-O0-NEXT: .LBB2_2: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-O0-NEXT: s_sethalt 5
+; HSA-TRAP-GFX1100-O0-NEXT: s_branch .LBB2_2
+ %tmp = load volatile i32, ptr addrspace(1) %arg0
+ call void @llvm.trap()
+ store volatile i32 %tmp, ptr addrspace(1) %arg1
+ ret void
+}
+
define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0) {
; NOHSA-TRAP-GFX900-LABEL: debugtrap:
; NOHSA-TRAP-GFX900: ; %bb.0:
@@ -334,6 +470,20 @@ define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0)
; HSA-TRAP-GFX1100-NEXT: s_nop 0
; HSA-TRAP-GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; HSA-TRAP-GFX1100-NEXT: s_endpgm
+;
+; HSA-TRAP-GFX1100-O0-LABEL: debugtrap:
+; HSA-TRAP-GFX1100-O0: ; %bb.0:
+; HSA-TRAP-GFX1100-O0-NEXT: s_load_b64 s[0:1], s[4:5], 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v1, 1
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: s_trap 3
+; HSA-TRAP-GFX1100-O0-NEXT: v_mov_b32_e32 v1, 2
+; HSA-TRAP-GFX1100-O0-NEXT: global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT: s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT: s_endpgm
store volatile i32 1, ptr addrspace(1) %arg0
call void @llvm.debugtrap()
store volatile i32 2, ptr addrspace(1) %arg0
|
if (MBB.succ_empty()) { | ||
BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill and ENDPGM_TRAP have the same problem, but this handling looks different. Can you follow what they do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, fixed in the update. Did you also mean using s_cbranch_execnz
? I'm not certain why that's actually correct, can we just assume that exec is non-zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's part of it. If exec is zero no lanes are executing so no trap should happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so we should be guarding the whole trap sequence with a c_branch_execnz
? I did this in the update. I think what was confusing me was that the other lowering for the trap intrinsic is an unconditional s_trap 2
. I guess the argument is that should be okay since we don't delete the guarding s_cbranch_execz
if it jumps over a trap here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it, this should be unnecessary? By default we assume at least one line is active and have to not remove branches over special instructions that should do nothing with exec 0, like trap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need a "fake" branch so we don't disconnect definitions before the trap with their uses after it in the CFG (this was what I was trying to do with the s_cmp %0,%0; s_cbranch_scc1
thing). I guess we should prefer c_branch_execnz since it seems like a more compact way of representing a "conditional unconditional" branch.
Incidentally, I guess this comment is not correct (the s_cbranch_execnz isn't "optional")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment is correct. In that context there's no special case instructions being skipped, like trap, that must be skipped if no lanes are active
if (MBB.succ_empty()) { | ||
BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's part of it. If exec is zero no lanes are executing so no trap should happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The branching is cumbersome, but I guess it doesn't really matter since this should only appear at -O0 anyway
On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated as a nop, which means it isn't a correct lowering for the trap intrinsic. As a workaround, this commit instead lowers the trap intrinsic to instructions that simulate the behavior of s_trap 2. This commit also includes the following commit, fixing a bug in the original version: > 9e0be65 > [AMDGPU] Fix broken MIR generated by gfx11 simulated trap lowering (llvm#91652) > This was breaking the CFG connection between uses of virtual registers > after the trap and their definitions before it. Fixes SWDEV-460384. > Fixes a bug in llvm#85854. Fixes: SWDEV-438421 Change-Id: I58244d69370abe3c57f5239f196cb284a5407530
On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated as a nop, which means it isn't a correct lowering for the trap intrinsic. As a workaround, this commit instead lowers the trap intrinsic to instructions that simulate the behavior of s_trap 2. This commit also includes the following commit, fixing a bug in the original version: > 9e0be65 > [AMDGPU] Fix broken MIR generated by gfx11 simulated trap lowering (llvm#91652) > This was breaking the CFG connection between uses of virtual registers > after the trap and their definitions before it. Fixes SWDEV-460384. > Fixes a bug in llvm#85854. Fixes: SWDEV-438421 Change-Id: I58244d69370abe3c57f5239f196cb284a5407530
This was breaking the CFG connection between uses of virtual registers after the trap and their definitions before it. Fixes SWDEV-460384.
Fixes a bug in #85854.