Skip to content

[AMDGPU] Fix broken MIR generated by gfx11 simulated trap lowering #91652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 22, 2024

Conversation

epilk
Copy link
Member

@epilk epilk commented May 9, 2024

This was breaking the CFG connection between uses of virtual registers after the trap and their definitions before it. Fixes SWDEV-460384.

Fixes a bug in #85854.

This was breaking the CFG connection between uses of virtual registers
after the trap and their definitions before it. Fixes SWDEV-460384.
@llvmbot
Copy link
Member

llvmbot commented May 9, 2024

@llvm/pr-subscribers-llvm-globalisel

Author: Emma Pilkington (epilk)

Changes

This was breaking the CFG connection between uses of virtual registers after the trap and their definitions before it. Fixes SWDEV-460384.

Fixes a bug in #85854.


Full diff: https://github.com/llvm/llvm-project/pull/91652.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+15-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir (+57-2)
  • (modified) llvm/test/CodeGen/AMDGPU/trap-abis.ll (+150)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 6599d0abd135c..17e235291e37b 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2065,14 +2065,26 @@ MachineBasicBlock *SIInstrInfo::insertSimulatedTrap(MachineRegisterInfo &MRI,
       .addImm(AMDGPU::SendMsg::ID_INTERRUPT);
   BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B32), AMDGPU::M0)
       .addUse(AMDGPU::TTMP2);
-  BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
+
+  if (MBB.succ_empty()) {
+    BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
+  } else {
+    // HACK: There are some instructions following the trap. Since uses of
+    // virtual registers in SplitBB (or beyond) that were defined before the
+    // trap must be dominated by their definitions, we need SplitBB to be a
+    // successor (even though it's unreachable in practice). This needs to be
+    // represented by a dummy cmp_eq and cbranch to convince analyzeBranch that
+    // SplitBB should indeed be considered a successor.
+    BuildMI(MBB, MI, DL, get(AMDGPU::S_CMP_EQ_U32))
+        .addUse(SetWaveAbortBit)
+        .addUse(SetWaveAbortBit);
+    BuildMI(MBB, MI, DL, get(AMDGPU::S_CBRANCH_SCC1)).addMBB(HaltLoop);
+  }
 
   BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_SETHALT)).addImm(5);
   BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_BRANCH))
       .addMBB(HaltLoop);
 
-  if (SplitBB != &MBB)
-    MBB.removeSuccessor(SplitBB);
   MBB.addSuccessor(HaltLoop);
   HaltLoop->addSuccessor(HaltLoop);
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
index ac98dca00be3d..e217a8be9597d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
@@ -8,7 +8,7 @@ name: test_trap
 body: |
   bb.0:
     ; GFX1100-LABEL: name: test_trap
-    ; GFX1100: successors: %bb.2(0x80000000)
+    ; GFX1100: successors: %bb.1(0x40000000), %bb.2(0x40000000)
     ; GFX1100-NEXT: {{  $}}
     ; GFX1100-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
     ; GFX1100-NEXT: [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
@@ -21,7 +21,8 @@ body: |
     ; GFX1100-NEXT: $m0 = S_MOV_B32 [[S_OR_B32_]]
     ; GFX1100-NEXT: S_SENDMSG 1, implicit $exec, implicit $m0
     ; GFX1100-NEXT: $m0 = S_MOV_B32 $ttmp2
-    ; GFX1100-NEXT: S_BRANCH %bb.2
+    ; GFX1100-NEXT: S_CMP_EQ_U32 [[S_OR_B32_]], [[S_OR_B32_]], implicit-def $scc
+    ; GFX1100-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
     ; GFX1100-NEXT: {{  $}}
     ; GFX1100-NEXT: .1:
     ; GFX1100-NEXT: successors:
@@ -45,5 +46,59 @@ body: |
     G_STORE %0, %1 :: (store 1, addrspace 1)
     G_TRAP
     G_STORE %0, %1 :: (store 1, addrspace 1)
+...
+
+---
+name: test_fallthrough_trap
+body: |
+  ; GFX1100-LABEL: name: test_fallthrough_trap
+  ; GFX1100: bb.0:
+  ; GFX1100-NEXT:   successors: %bb.1(0x80000000), %bb.2(0x00000000)
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT:   [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+  ; GFX1100-NEXT:   [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+  ; GFX1100-NEXT:   G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+  ; GFX1100-NEXT:   S_TRAP 2
+  ; GFX1100-NEXT:   [[S_SENDMSG_RTN_B32_:%[0-9]+]]:sreg_32 = S_SENDMSG_RTN_B32 128
+  ; GFX1100-NEXT:   $ttmp2 = S_MOV_B32 $m0
+  ; GFX1100-NEXT:   [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[S_SENDMSG_RTN_B32_]], 1023, implicit-def $scc
+  ; GFX1100-NEXT:   [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_AND_B32_]], 1024, implicit-def $scc
+  ; GFX1100-NEXT:   $m0 = S_MOV_B32 [[S_OR_B32_]]
+  ; GFX1100-NEXT:   S_SENDMSG 1, implicit $exec, implicit $m0
+  ; GFX1100-NEXT:   $m0 = S_MOV_B32 $ttmp2
+  ; GFX1100-NEXT:   S_CMP_EQ_U32 [[S_OR_B32_]], [[S_OR_B32_]], implicit-def $scc
+  ; GFX1100-NEXT:   S_CBRANCH_SCC1 %bb.2, implicit $scc
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT: bb.1:
+  ; GFX1100-NEXT:   successors:
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT:   G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT: bb.2:
+  ; GFX1100-NEXT:   successors: %bb.2(0x80000000)
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT:   S_SETHALT 5
+  ; GFX1100-NEXT:   S_BRANCH %bb.2
+  ;
+  ; GFX1150-LABEL: name: test_fallthrough_trap
+  ; GFX1150: bb.0:
+  ; GFX1150-NEXT:   successors: %bb.1(0x80000000)
+  ; GFX1150-NEXT: {{  $}}
+  ; GFX1150-NEXT:   [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+  ; GFX1150-NEXT:   [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+  ; GFX1150-NEXT:   G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+  ; GFX1150-NEXT:   S_TRAP 2
+  ; GFX1150-NEXT: {{  $}}
+  ; GFX1150-NEXT: bb.1:
+  ; GFX1150-NEXT:   G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+  bb.0:
+    successors: %bb.1
+
+    %0:_(s8) = G_CONSTANT i8 0
+    %1:_(p1) = G_CONSTANT i64 0
+    G_STORE %0, %1 :: (store 1, addrspace 1)
+    G_TRAP
 
+  bb.1:
+    G_STORE %0, %1 :: (store 1, addrspace 1)
 ...
diff --git a/llvm/test/CodeGen/AMDGPU/trap-abis.ll b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
index dcc5fbd142c42..cfc1b47671b07 100644
--- a/llvm/test/CodeGen/AMDGPU/trap-abis.ll
+++ b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
@@ -264,6 +264,142 @@ ret:
   ret void
 }
 
+define amdgpu_kernel void @trap_with_use_after(ptr addrspace(1) %arg0, ptr addrspace(1) %arg1) {
+; NOHSA-TRAP-GFX900-LABEL: trap_with_use_after:
+; NOHSA-TRAP-GFX900:       ; %bb.0:
+; NOHSA-TRAP-GFX900-NEXT:    s_load_dwordx4 s[0:3], s[0:1], 0x24
+; NOHSA-TRAP-GFX900-NEXT:    v_mov_b32_e32 v0, 0
+; NOHSA-TRAP-GFX900-NEXT:    s_waitcnt lgkmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT:    global_load_dword v1, v0, s[0:1] glc
+; NOHSA-TRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT:    s_cbranch_execnz .LBB2_2
+; NOHSA-TRAP-GFX900-NEXT:  ; %bb.1:
+; NOHSA-TRAP-GFX900-NEXT:    global_store_dword v0, v1, s[2:3]
+; NOHSA-TRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT:  .LBB2_2:
+; NOHSA-TRAP-GFX900-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX803-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX803:       ; %bb.0:
+; HSA-TRAP-GFX803-NEXT:    s_mov_b64 s[0:1], s[4:5]
+; HSA-TRAP-GFX803-NEXT:    s_load_dwordx4 s[4:7], s[6:7], 0x0
+; HSA-TRAP-GFX803-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX803-NEXT:    v_mov_b32_e32 v0, s4
+; HSA-TRAP-GFX803-NEXT:    v_mov_b32_e32 v1, s5
+; HSA-TRAP-GFX803-NEXT:    flat_load_dword v2, v[0:1] glc
+; HSA-TRAP-GFX803-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX803-NEXT:    v_mov_b32_e32 v0, s6
+; HSA-TRAP-GFX803-NEXT:    v_mov_b32_e32 v1, s7
+; HSA-TRAP-GFX803-NEXT:    s_trap 2
+; HSA-TRAP-GFX803-NEXT:    flat_store_dword v[0:1], v2
+; HSA-TRAP-GFX803-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX803-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX900-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX900:       ; %bb.0:
+; HSA-TRAP-GFX900-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x0
+; HSA-TRAP-GFX900-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX900-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX900-NEXT:    global_load_dword v1, v0, s[0:1] glc
+; HSA-TRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX900-NEXT:    s_trap 2
+; HSA-TRAP-GFX900-NEXT:    global_store_dword v0, v1, s[2:3]
+; HSA-TRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX900-NEXT:    s_endpgm
+;
+; HSA-NOTRAP-GFX900-LABEL: trap_with_use_after:
+; HSA-NOTRAP-GFX900:       ; %bb.0:
+; HSA-NOTRAP-GFX900-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x0
+; HSA-NOTRAP-GFX900-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-NOTRAP-GFX900-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT:    global_load_dword v1, v0, s[0:1] glc
+; HSA-NOTRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT:    s_cbranch_execnz .LBB2_2
+; HSA-NOTRAP-GFX900-NEXT:  ; %bb.1:
+; HSA-NOTRAP-GFX900-NEXT:    global_store_dword v0, v1, s[2:3]
+; HSA-NOTRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT:  .LBB2_2:
+; HSA-NOTRAP-GFX900-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX1100-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX1100:       ; %bb.0:
+; HSA-TRAP-GFX1100-NEXT:    s_load_b128 s[0:3], s[0:1], 0x0
+; HSA-TRAP-GFX1100-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    global_load_b32 v1, v0, s[0:1] glc dlc
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    s_trap 2
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
+; HSA-TRAP-GFX1100-NEXT:    s_bitset1_b32 s0, 10
+; HSA-TRAP-GFX1100-NEXT:    s_cmp_eq_u32 s0, s0
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-NEXT:    s_cbranch_scc1 .LBB2_2
+; HSA-TRAP-GFX1100-NEXT:  ; %bb.1:
+; HSA-TRAP-GFX1100-NEXT:    global_store_b32 v0, v1, s[2:3] dlc
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-NEXT:    s_nop 0
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; HSA-TRAP-GFX1100-NEXT:    s_endpgm
+; HSA-TRAP-GFX1100-NEXT:  .LBB2_2: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-NEXT:    s_sethalt 5
+; HSA-TRAP-GFX1100-NEXT:    s_branch .LBB2_2
+;
+; HSA-TRAP-GFX1100-O0-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX1100-O0:       ; %bb.0:
+; HSA-TRAP-GFX1100-O0-NEXT:    ; implicit-def: $vgpr1 : SGPR spill to VGPR lane
+; HSA-TRAP-GFX1100-O0-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_store_b32 off, v0, off offset:8 ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_load_b64 s[2:3], s[4:5], 0x8
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    v_writelane_b32 v1, s2, 0
+; HSA-TRAP-GFX1100-O0-NEXT:    v_writelane_b32 v1, s3, 1
+; HSA-TRAP-GFX1100-O0-NEXT:    s_or_saveexec_b32 s6, -1
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_store_b32 off, v1, off offset:4 ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 exec_lo, s6
+; HSA-TRAP-GFX1100-O0-NEXT:    global_load_b32 v0, v0, s[0:1] glc dlc
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_store_b32 off, v0, off ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT:    s_trap 2
+; HSA-TRAP-GFX1100-O0-NEXT:    s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-O0-NEXT:    s_or_b32 s0, s0, 0x400
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-O0-NEXT:    s_cmp_eq_u32 s0, s0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_cbranch_scc1 .LBB2_2
+; HSA-TRAP-GFX1100-O0-NEXT:  ; %bb.1:
+; HSA-TRAP-GFX1100-O0-NEXT:    s_or_saveexec_b32 s6, -1
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_load_b32 v0, off, off offset:4 ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 exec_lo, s6
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    v_readlane_b32 s0, v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT:    v_readlane_b32 s1, v0, 1
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_load_b32 v1, off, off offset:8 ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_load_b32 v2, off, off ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    global_store_b32 v1, v2, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    ; kill: killed $vgpr0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_endpgm
+; HSA-TRAP-GFX1100-O0-NEXT:  .LBB2_2: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-O0-NEXT:    s_sethalt 5
+; HSA-TRAP-GFX1100-O0-NEXT:    s_branch .LBB2_2
+  %tmp = load volatile i32, ptr addrspace(1) %arg0
+  call void @llvm.trap()
+  store volatile i32 %tmp, ptr addrspace(1) %arg1
+  ret void
+}
+
 define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0) {
 ; NOHSA-TRAP-GFX900-LABEL: debugtrap:
 ; NOHSA-TRAP-GFX900:       ; %bb.0:
@@ -334,6 +470,20 @@ define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0)
 ; HSA-TRAP-GFX1100-NEXT:    s_nop 0
 ; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; HSA-TRAP-GFX1100-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX1100-O0-LABEL: debugtrap:
+; HSA-TRAP-GFX1100-O0:       ; %bb.0:
+; HSA-TRAP-GFX1100-O0-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT:    v_mov_b32_e32 v1, 1
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_trap 3
+; HSA-TRAP-GFX1100-O0-NEXT:    v_mov_b32_e32 v1, 2
+; HSA-TRAP-GFX1100-O0-NEXT:    global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_endpgm
   store volatile i32 1, ptr addrspace(1) %arg0
   call void @llvm.debugtrap()
   store volatile i32 2, ptr addrspace(1) %arg0

@llvmbot
Copy link
Member

llvmbot commented May 9, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Emma Pilkington (epilk)

Changes

This was breaking the CFG connection between uses of virtual registers after the trap and their definitions before it. Fixes SWDEV-460384.

Fixes a bug in #85854.


Full diff: https://github.com/llvm/llvm-project/pull/91652.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+15-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir (+57-2)
  • (modified) llvm/test/CodeGen/AMDGPU/trap-abis.ll (+150)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 6599d0abd135c..17e235291e37b 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2065,14 +2065,26 @@ MachineBasicBlock *SIInstrInfo::insertSimulatedTrap(MachineRegisterInfo &MRI,
       .addImm(AMDGPU::SendMsg::ID_INTERRUPT);
   BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B32), AMDGPU::M0)
       .addUse(AMDGPU::TTMP2);
-  BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
+
+  if (MBB.succ_empty()) {
+    BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
+  } else {
+    // HACK: There are some instructions following the trap. Since uses of
+    // virtual registers in SplitBB (or beyond) that were defined before the
+    // trap must be dominated by their definitions, we need SplitBB to be a
+    // successor (even though it's unreachable in practice). This needs to be
+    // represented by a dummy cmp_eq and cbranch to convince analyzeBranch that
+    // SplitBB should indeed be considered a successor.
+    BuildMI(MBB, MI, DL, get(AMDGPU::S_CMP_EQ_U32))
+        .addUse(SetWaveAbortBit)
+        .addUse(SetWaveAbortBit);
+    BuildMI(MBB, MI, DL, get(AMDGPU::S_CBRANCH_SCC1)).addMBB(HaltLoop);
+  }
 
   BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_SETHALT)).addImm(5);
   BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_BRANCH))
       .addMBB(HaltLoop);
 
-  if (SplitBB != &MBB)
-    MBB.removeSuccessor(SplitBB);
   MBB.addSuccessor(HaltLoop);
   HaltLoop->addSuccessor(HaltLoop);
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
index ac98dca00be3d..e217a8be9597d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
@@ -8,7 +8,7 @@ name: test_trap
 body: |
   bb.0:
     ; GFX1100-LABEL: name: test_trap
-    ; GFX1100: successors: %bb.2(0x80000000)
+    ; GFX1100: successors: %bb.1(0x40000000), %bb.2(0x40000000)
     ; GFX1100-NEXT: {{  $}}
     ; GFX1100-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
     ; GFX1100-NEXT: [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
@@ -21,7 +21,8 @@ body: |
     ; GFX1100-NEXT: $m0 = S_MOV_B32 [[S_OR_B32_]]
     ; GFX1100-NEXT: S_SENDMSG 1, implicit $exec, implicit $m0
     ; GFX1100-NEXT: $m0 = S_MOV_B32 $ttmp2
-    ; GFX1100-NEXT: S_BRANCH %bb.2
+    ; GFX1100-NEXT: S_CMP_EQ_U32 [[S_OR_B32_]], [[S_OR_B32_]], implicit-def $scc
+    ; GFX1100-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
     ; GFX1100-NEXT: {{  $}}
     ; GFX1100-NEXT: .1:
     ; GFX1100-NEXT: successors:
@@ -45,5 +46,59 @@ body: |
     G_STORE %0, %1 :: (store 1, addrspace 1)
     G_TRAP
     G_STORE %0, %1 :: (store 1, addrspace 1)
+...
+
+---
+name: test_fallthrough_trap
+body: |
+  ; GFX1100-LABEL: name: test_fallthrough_trap
+  ; GFX1100: bb.0:
+  ; GFX1100-NEXT:   successors: %bb.1(0x80000000), %bb.2(0x00000000)
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT:   [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+  ; GFX1100-NEXT:   [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+  ; GFX1100-NEXT:   G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+  ; GFX1100-NEXT:   S_TRAP 2
+  ; GFX1100-NEXT:   [[S_SENDMSG_RTN_B32_:%[0-9]+]]:sreg_32 = S_SENDMSG_RTN_B32 128
+  ; GFX1100-NEXT:   $ttmp2 = S_MOV_B32 $m0
+  ; GFX1100-NEXT:   [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[S_SENDMSG_RTN_B32_]], 1023, implicit-def $scc
+  ; GFX1100-NEXT:   [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_AND_B32_]], 1024, implicit-def $scc
+  ; GFX1100-NEXT:   $m0 = S_MOV_B32 [[S_OR_B32_]]
+  ; GFX1100-NEXT:   S_SENDMSG 1, implicit $exec, implicit $m0
+  ; GFX1100-NEXT:   $m0 = S_MOV_B32 $ttmp2
+  ; GFX1100-NEXT:   S_CMP_EQ_U32 [[S_OR_B32_]], [[S_OR_B32_]], implicit-def $scc
+  ; GFX1100-NEXT:   S_CBRANCH_SCC1 %bb.2, implicit $scc
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT: bb.1:
+  ; GFX1100-NEXT:   successors:
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT:   G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT: bb.2:
+  ; GFX1100-NEXT:   successors: %bb.2(0x80000000)
+  ; GFX1100-NEXT: {{  $}}
+  ; GFX1100-NEXT:   S_SETHALT 5
+  ; GFX1100-NEXT:   S_BRANCH %bb.2
+  ;
+  ; GFX1150-LABEL: name: test_fallthrough_trap
+  ; GFX1150: bb.0:
+  ; GFX1150-NEXT:   successors: %bb.1(0x80000000)
+  ; GFX1150-NEXT: {{  $}}
+  ; GFX1150-NEXT:   [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+  ; GFX1150-NEXT:   [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+  ; GFX1150-NEXT:   G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+  ; GFX1150-NEXT:   S_TRAP 2
+  ; GFX1150-NEXT: {{  $}}
+  ; GFX1150-NEXT: bb.1:
+  ; GFX1150-NEXT:   G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+  bb.0:
+    successors: %bb.1
+
+    %0:_(s8) = G_CONSTANT i8 0
+    %1:_(p1) = G_CONSTANT i64 0
+    G_STORE %0, %1 :: (store 1, addrspace 1)
+    G_TRAP
 
+  bb.1:
+    G_STORE %0, %1 :: (store 1, addrspace 1)
 ...
diff --git a/llvm/test/CodeGen/AMDGPU/trap-abis.ll b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
index dcc5fbd142c42..cfc1b47671b07 100644
--- a/llvm/test/CodeGen/AMDGPU/trap-abis.ll
+++ b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
@@ -264,6 +264,142 @@ ret:
   ret void
 }
 
+define amdgpu_kernel void @trap_with_use_after(ptr addrspace(1) %arg0, ptr addrspace(1) %arg1) {
+; NOHSA-TRAP-GFX900-LABEL: trap_with_use_after:
+; NOHSA-TRAP-GFX900:       ; %bb.0:
+; NOHSA-TRAP-GFX900-NEXT:    s_load_dwordx4 s[0:3], s[0:1], 0x24
+; NOHSA-TRAP-GFX900-NEXT:    v_mov_b32_e32 v0, 0
+; NOHSA-TRAP-GFX900-NEXT:    s_waitcnt lgkmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT:    global_load_dword v1, v0, s[0:1] glc
+; NOHSA-TRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT:    s_cbranch_execnz .LBB2_2
+; NOHSA-TRAP-GFX900-NEXT:  ; %bb.1:
+; NOHSA-TRAP-GFX900-NEXT:    global_store_dword v0, v1, s[2:3]
+; NOHSA-TRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; NOHSA-TRAP-GFX900-NEXT:  .LBB2_2:
+; NOHSA-TRAP-GFX900-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX803-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX803:       ; %bb.0:
+; HSA-TRAP-GFX803-NEXT:    s_mov_b64 s[0:1], s[4:5]
+; HSA-TRAP-GFX803-NEXT:    s_load_dwordx4 s[4:7], s[6:7], 0x0
+; HSA-TRAP-GFX803-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX803-NEXT:    v_mov_b32_e32 v0, s4
+; HSA-TRAP-GFX803-NEXT:    v_mov_b32_e32 v1, s5
+; HSA-TRAP-GFX803-NEXT:    flat_load_dword v2, v[0:1] glc
+; HSA-TRAP-GFX803-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX803-NEXT:    v_mov_b32_e32 v0, s6
+; HSA-TRAP-GFX803-NEXT:    v_mov_b32_e32 v1, s7
+; HSA-TRAP-GFX803-NEXT:    s_trap 2
+; HSA-TRAP-GFX803-NEXT:    flat_store_dword v[0:1], v2
+; HSA-TRAP-GFX803-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX803-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX900-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX900:       ; %bb.0:
+; HSA-TRAP-GFX900-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x0
+; HSA-TRAP-GFX900-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX900-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX900-NEXT:    global_load_dword v1, v0, s[0:1] glc
+; HSA-TRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX900-NEXT:    s_trap 2
+; HSA-TRAP-GFX900-NEXT:    global_store_dword v0, v1, s[2:3]
+; HSA-TRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX900-NEXT:    s_endpgm
+;
+; HSA-NOTRAP-GFX900-LABEL: trap_with_use_after:
+; HSA-NOTRAP-GFX900:       ; %bb.0:
+; HSA-NOTRAP-GFX900-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x0
+; HSA-NOTRAP-GFX900-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-NOTRAP-GFX900-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT:    global_load_dword v1, v0, s[0:1] glc
+; HSA-NOTRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT:    s_cbranch_execnz .LBB2_2
+; HSA-NOTRAP-GFX900-NEXT:  ; %bb.1:
+; HSA-NOTRAP-GFX900-NEXT:    global_store_dword v0, v1, s[2:3]
+; HSA-NOTRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
+; HSA-NOTRAP-GFX900-NEXT:  .LBB2_2:
+; HSA-NOTRAP-GFX900-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX1100-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX1100:       ; %bb.0:
+; HSA-TRAP-GFX1100-NEXT:    s_load_b128 s[0:3], s[0:1], 0x0
+; HSA-TRAP-GFX1100-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    global_load_b32 v1, v0, s[0:1] glc dlc
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    s_trap 2
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
+; HSA-TRAP-GFX1100-NEXT:    s_bitset1_b32 s0, 10
+; HSA-TRAP-GFX1100-NEXT:    s_cmp_eq_u32 s0, s0
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-NEXT:    s_cbranch_scc1 .LBB2_2
+; HSA-TRAP-GFX1100-NEXT:  ; %bb.1:
+; HSA-TRAP-GFX1100-NEXT:    global_store_b32 v0, v1, s[2:3] dlc
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-NEXT:    s_nop 0
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; HSA-TRAP-GFX1100-NEXT:    s_endpgm
+; HSA-TRAP-GFX1100-NEXT:  .LBB2_2: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-NEXT:    s_sethalt 5
+; HSA-TRAP-GFX1100-NEXT:    s_branch .LBB2_2
+;
+; HSA-TRAP-GFX1100-O0-LABEL: trap_with_use_after:
+; HSA-TRAP-GFX1100-O0:       ; %bb.0:
+; HSA-TRAP-GFX1100-O0-NEXT:    ; implicit-def: $vgpr1 : SGPR spill to VGPR lane
+; HSA-TRAP-GFX1100-O0-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_store_b32 off, v0, off offset:8 ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_load_b64 s[2:3], s[4:5], 0x8
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    v_writelane_b32 v1, s2, 0
+; HSA-TRAP-GFX1100-O0-NEXT:    v_writelane_b32 v1, s3, 1
+; HSA-TRAP-GFX1100-O0-NEXT:    s_or_saveexec_b32 s6, -1
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_store_b32 off, v1, off offset:4 ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 exec_lo, s6
+; HSA-TRAP-GFX1100-O0-NEXT:    global_load_b32 v0, v0, s[0:1] glc dlc
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_store_b32 off, v0, off ; 4-byte Folded Spill
+; HSA-TRAP-GFX1100-O0-NEXT:    s_trap 2
+; HSA-TRAP-GFX1100-O0-NEXT:    s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-O0-NEXT:    s_or_b32 s0, s0, 0x400
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-O0-NEXT:    s_cmp_eq_u32 s0, s0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_cbranch_scc1 .LBB2_2
+; HSA-TRAP-GFX1100-O0-NEXT:  ; %bb.1:
+; HSA-TRAP-GFX1100-O0-NEXT:    s_or_saveexec_b32 s6, -1
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_load_b32 v0, off, off offset:4 ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT:    s_mov_b32 exec_lo, s6
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    v_readlane_b32 s0, v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT:    v_readlane_b32 s1, v0, 1
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_load_b32 v1, off, off offset:8 ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT:    scratch_load_b32 v2, off, off ; 4-byte Folded Reload
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    global_store_b32 v1, v2, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    ; kill: killed $vgpr0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_endpgm
+; HSA-TRAP-GFX1100-O0-NEXT:  .LBB2_2: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-O0-NEXT:    s_sethalt 5
+; HSA-TRAP-GFX1100-O0-NEXT:    s_branch .LBB2_2
+  %tmp = load volatile i32, ptr addrspace(1) %arg0
+  call void @llvm.trap()
+  store volatile i32 %tmp, ptr addrspace(1) %arg1
+  ret void
+}
+
 define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0) {
 ; NOHSA-TRAP-GFX900-LABEL: debugtrap:
 ; NOHSA-TRAP-GFX900:       ; %bb.0:
@@ -334,6 +470,20 @@ define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0)
 ; HSA-TRAP-GFX1100-NEXT:    s_nop 0
 ; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
 ; HSA-TRAP-GFX1100-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX1100-O0-LABEL: debugtrap:
+; HSA-TRAP-GFX1100-O0:       ; %bb.0:
+; HSA-TRAP-GFX1100-O0-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-O0-NEXT:    v_mov_b32_e32 v1, 1
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-O0-NEXT:    global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_trap 3
+; HSA-TRAP-GFX1100-O0-NEXT:    v_mov_b32_e32 v1, 2
+; HSA-TRAP-GFX1100-O0-NEXT:    global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-O0-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-O0-NEXT:    s_endpgm
   store volatile i32 1, ptr addrspace(1) %arg0
   call void @llvm.debugtrap()
   store volatile i32 2, ptr addrspace(1) %arg0

Comment on lines 2069 to 2071
if (MBB.succ_empty()) {
BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kill and ENDPGM_TRAP have the same problem, but this handling looks different. Can you follow what they do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, fixed in the update. Did you also mean using s_cbranch_execnz? I'm not certain why that's actually correct, can we just assume that exec is non-zero?

Copy link
Contributor

@arsenm arsenm May 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's part of it. If exec is zero no lanes are executing so no trap should happen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so we should be guarding the whole trap sequence with a c_branch_execnz? I did this in the update. I think what was confusing me was that the other lowering for the trap intrinsic is an unconditional s_trap 2. I guess the argument is that should be okay since we don't delete the guarding s_cbranch_execz if it jumps over a trap here?

Copy link
Contributor

@arsenm arsenm May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, this should be unnecessary? By default we assume at least one line is active and have to not remove branches over special instructions that should do nothing with exec 0, like trap

Copy link
Member Author

@epilk epilk May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need a "fake" branch so we don't disconnect definitions before the trap with their uses after it in the CFG (this was what I was trying to do with the s_cmp %0,%0; s_cbranch_scc1 thing). I guess we should prefer c_branch_execnz since it seems like a more compact way of representing a "conditional unconditional" branch.

Incidentally, I guess this comment is not correct (the s_cbranch_execnz isn't "optional")?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment is correct. In that context there's no special case instructions being skipped, like trap, that must be skipped if no lanes are active

Comment on lines 2069 to 2071
if (MBB.succ_empty()) {
BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
} else {
Copy link
Contributor

@arsenm arsenm May 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's part of it. If exec is zero no lanes are executing so no trap should happen

@epilk epilk requested a review from arsenm May 16, 2024 15:12
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The branching is cumbersome, but I guess it doesn't really matter since this should only appear at -O0 anyway

@epilk epilk merged commit 9e0be65 into llvm:main May 22, 2024
2 of 4 checks passed
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jun 12, 2024
On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated
as a nop, which means it isn't a correct lowering for the trap
intrinsic. As a workaround, this commit instead lowers the trap
intrinsic to instructions that simulate the behavior of s_trap 2.

This commit also includes the following commit, fixing a bug in the
original version:

> 9e0be65
> [AMDGPU] Fix broken MIR generated by gfx11 simulated trap lowering (llvm#91652)

> This was breaking the CFG connection between uses of virtual registers
> after the trap and their definitions before it. Fixes SWDEV-460384.

> Fixes a bug in llvm#85854.

Fixes: SWDEV-438421
Change-Id: I58244d69370abe3c57f5239f196cb284a5407530
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request Jul 17, 2024
On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated
as a nop, which means it isn't a correct lowering for the trap
intrinsic. As a workaround, this commit instead lowers the trap
intrinsic to instructions that simulate the behavior of s_trap 2.

This commit also includes the following commit, fixing a bug in the
original version:

> 9e0be65
> [AMDGPU] Fix broken MIR generated by gfx11 simulated trap lowering (llvm#91652)

> This was breaking the CFG connection between uses of virtual registers
> after the trap and their definitions before it. Fixes SWDEV-460384.

> Fixes a bug in llvm#85854.

Fixes: SWDEV-438421
Change-Id: I58244d69370abe3c57f5239f196cb284a5407530
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants