Skip to content

[AMDGPU] Fix hang caused by VS_CNT handling at calls #78318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 17, 2024
Merged

[AMDGPU] Fix hang caused by VS_CNT handling at calls #78318

merged 1 commit into from
Jan 17, 2024

Conversation

jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Jan 16, 2024

Fix a potential hang introduced by #77439 and #77935. This line:

setScoreUB(VS_CNT, getScoreLB(VS_CNT) + getWaitCountMax(VS_CNT));

could potentialy set UB lower than it was before, which confused
SIInsertWaitcnts's fixed point algorithm.

This was only triggered a STORE instruction with an implicit-def, which
seems odd but apparently happens for some spills.

Fix a potential hang introduced by #77439 and #77935. This line:

  setScoreUB(VS_CNT, getScoreLB(VS_CNT) + getWaitCountMax(VS_CNT));

could potentialy set UB lower than it was before, which confused
SIInsertWaitcnts's fixed point algorithm.

This was only triggered a STORE instruction with an implicit-def, which
seems odd but apparently happens for some spills.
@llvmbot
Copy link
Member

llvmbot commented Jan 16, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Fix a potential hang introduced by #77439 and #77935. This line:

setScoreUB(VS_CNT, getScoreLB(VS_CNT) + getWaitCountMax(VS_CNT));

could potentialy set UB lower than it was before, which confused
SIInsertWaitcnts's fixed point algorithm.

This was only triggered a STORE instruction with an implicit-def, which
seems odd but apparently happens for some spills.


Full diff: https://github.com/llvm/llvm-project/pull/78318.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+1-1)
  • (added) llvm/test/CodeGen/AMDGPU/insert-waitcnts-hang.mir (+50)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index acf9489b2d3772..2ae028477ac7e3 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -293,7 +293,7 @@ class WaitcntBrackets {
   }
 
   void setStateOnFunctionEntryOrReturn() {
-    setScoreUB(VS_CNT, getScoreLB(VS_CNT) + getWaitCountMax(VS_CNT));
+    setScoreUB(VS_CNT, getScoreUB(VS_CNT) + getWaitCountMax(VS_CNT));
     PendingEvents |= WaitEventMaskForInst[VS_CNT];
   }
 
diff --git a/llvm/test/CodeGen/AMDGPU/insert-waitcnts-hang.mir b/llvm/test/CodeGen/AMDGPU/insert-waitcnts-hang.mir
new file mode 100644
index 00000000000000..993933b2b5c723
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/insert-waitcnts-hang.mir
@@ -0,0 +1,50 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass si-insert-waitcnts %s -o - | FileCheck %s
+
+---
+name: test
+tracksRegLiveness: true
+stack:
+  - { id: 0, name: '', type: spill-slot, offset: 4, size: 40, alignment: 4,
+      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
+      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+machineFunctionInfo:
+  frameOffsetReg: '$sgpr33'
+body: |
+  ; CHECK-LABEL: name: test
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $sgpr12, $sgpr13, $sgpr14, $sgpr15, $sgpr30, $sgpr31, $sgpr34, $sgpr35, $sgpr36, $sgpr37, $sgpr38, $sgpr39, $sgpr40, $sgpr41, $sgpr42, $sgpr43, $sgpr44, $sgpr45, $sgpr46, $sgpr47, $vgpr0, $vgpr1, $vgpr31, $vgpr40, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   S_WAITCNT 0
+  ; CHECK-NEXT:   SCRATCH_STORE_DWORDX4_SADDR killed undef $vgpr8_vgpr9_vgpr10_vgpr11, $sgpr33, 4, 0, implicit $exec, implicit $flat_scr, implicit-def $vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 :: (store (s128) into %stack.0, align 4, addrspace 5)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT:   liveins: $sgpr42, $sgpr43, $sgpr44, $sgpr45, $sgpr46, $vgpr31, $vgpr40, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $vgpr0_vgpr1:0x000000000000000F
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   dead $sgpr30_sgpr31 = SI_CALL killed undef renamable $sgpr0_sgpr1, 0, csr_amdgpu
+  ; CHECK-NEXT:   S_CBRANCH_EXECNZ %bb.1, implicit $exec
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   liveins: $sgpr46, $vgpr40
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   S_SETPC_B64_return undef $sgpr30_sgpr31
+  bb.0:
+    successors: %bb.1(0x80000000)
+    liveins: $sgpr12, $sgpr13, $sgpr14, $sgpr15, $sgpr30, $sgpr31, $sgpr34, $sgpr35, $sgpr36, $sgpr37, $sgpr38, $sgpr39, $sgpr40, $sgpr41, $sgpr42, $sgpr43, $sgpr44, $sgpr45, $sgpr46, $sgpr47, $vgpr0, $vgpr1, $vgpr31, $vgpr40, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11
+
+    SCRATCH_STORE_DWORDX4_SADDR killed undef $vgpr8_vgpr9_vgpr10_vgpr11, $sgpr33, 4, 0, implicit $exec, implicit $flat_scr, implicit-def $vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 :: (store (s128) into %stack.0, align 4, addrspace 5)
+
+  bb.1:
+    successors: %bb.1(0x40000000), %bb.2(0x40000000)
+    liveins: $sgpr42, $sgpr43, $sgpr44, $sgpr45, $sgpr46, $vgpr31, $vgpr40, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $vgpr0_vgpr1:0x000000000000000F
+
+    dead $sgpr30_sgpr31 = SI_CALL killed undef renamable $sgpr0_sgpr1, 0, csr_amdgpu
+    S_CBRANCH_EXECNZ %bb.1, implicit $exec
+
+  bb.2:
+    liveins: $sgpr46, $vgpr40
+
+    S_SETPC_B64_return undef $sgpr30_sgpr31
+...

Copy link
Contributor

@perlfu perlfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@dstutt dstutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too

@jayfoad jayfoad merged commit 36ef291 into llvm:main Jan 17, 2024
@jayfoad jayfoad deleted the insert-waitcnts-hang branch January 17, 2024 10:24
ampandey-1995 pushed a commit to ampandey-1995/llvm-project that referenced this pull request Jan 19, 2024
Fix a potential hang introduced by llvm#77439 and llvm#77935. This line:

  setScoreUB(VS_CNT, getScoreLB(VS_CNT) + getWaitCountMax(VS_CNT));

could potentialy set UB lower than it was before, which confused
SIInsertWaitcnts's fixed point algorithm.

This was only triggered a STORE instruction with an implicit-def, which
seems odd but apparently happens for some spills.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants