Skip to content

[AMDGPU] Fix crash due to missing check for FLAT instructions that dont use vector registers when computing VALU hazard. #123627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 21, 2025

Conversation

chinmaydd
Copy link
Contributor

@chinmaydd chinmaydd commented Jan 20, 2025

This fixes SWDEV-494413, SWDEV-5088813, SWDEV-499344, SWDEV-499349, SWDEV-504112 which were discovered using the ISEL Fuzzer.

@llvmbot
Copy link
Member

llvmbot commented Jan 20, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Chinmay Deshpande (chinmaydd)

Changes

Add check for FLAT instructions that dont use vector registers when computing VALU hazard.

This fixes SWDEV-494413, SWDEV-5088813, SWDEV-499344, SWDEV-499349, SWDEV-504112 which were discovered using the ISEL Fuzzer.


Full diff: https://github.com/llvm/llvm-project/pull/123627.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (+6-3)
  • (added) llvm/test/CodeGen/AMDGPU/fix-crash-valu-hazard.ll (+32)
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 6baef137df5e16..873d18e30a430a 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -858,9 +858,12 @@ int GCNHazardRecognizer::createsVALUHazard(const MachineInstr &MI) {
   }
 
   if (TII->isFLAT(MI)) {
-    int DataIdx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::vdata);
-    if (AMDGPU::getRegBitWidth(Desc.operands()[DataIdx].RegClass) > 64)
-      return DataIdx;
+    // There is no hazard if the instruction does not use vector regs
+    if (VDataIdx == -1)
+      return -1;
+
+    if (AMDGPU::getRegBitWidth(VDataRCID) > 64)
+      return VDataIdx;
   }
 
   return -1;
diff --git a/llvm/test/CodeGen/AMDGPU/fix-crash-valu-hazard.ll b/llvm/test/CodeGen/AMDGPU/fix-crash-valu-hazard.ll
new file mode 100644
index 00000000000000..734f87056717d1
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-crash-valu-hazard.ll
@@ -0,0 +1,32 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx942 -O2 < %s | FileCheck -check-prefixes=GFX942 %s
+
+@G = global <2 x i32> splat (i32 5)
+
+define amdgpu_ps void @global_load_lds_dword_saddr(ptr addrspace(1) inreg nocapture %gptr, ptr addrspace(3) nocapture %lptr) {
+; GFX942-LABEL: global_load_lds_dword_saddr:
+; GFX942:       ; %bb.0: ; %main_body
+; GFX942-NEXT:    s_getpc_b64 s[2:3]
+; GFX942-NEXT:    s_add_u32 s2, s2, G@gotpcrel32@lo+4
+; GFX942-NEXT:    s_addc_u32 s3, s3, G@gotpcrel32@hi+12
+; GFX942-NEXT:    s_load_dwordx2 s[2:3], s[2:3], 0x0
+; GFX942-NEXT:    v_mov_b32_e32 v1, 0
+; GFX942-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[2:3]
+; GFX942-NEXT:    flat_load_dwordx2 v[4:5], v[2:3]
+; GFX942-NEXT:    v_readfirstlane_b32 s2, v0
+; GFX942-NEXT:    s_mov_b32 m0, s2
+; GFX942-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_mul_lo_u32 v0, v4, 10
+; GFX942-NEXT:    global_load_lds_dword v1, s[0:1] offset:32 nt
+; GFX942-NEXT:    v_mul_lo_u32 v1, v5, 10
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
+; GFX942-NEXT:    flat_store_dwordx2 v[2:3], v[0:1]
+; GFX942-NEXT:    s_endpgm
+main_body:
+  %LGV = load <2 x i32>, ptr @G, align 8
+  %B = mul <2 x i32> %LGV, splat (i32 10)
+  call void @llvm.amdgcn.global.load.lds(ptr addrspace(1) %gptr, ptr addrspace(3) %lptr, i32 4, i32 32, i32 2)
+  store <2 x i32> %B, ptr @G, align 8
+  ret void
+}

@arsenm
Copy link
Contributor

arsenm commented Jan 20, 2025

A fix of an assert can not be NFC

@chinmaydd chinmaydd changed the title [AMDGPU][NFC] Fix crash due to assertion failure [AMDGPU] Fix crash due to assertion failure Jan 20, 2025
@chinmaydd chinmaydd force-pushed the chinmaydd/hazard-crash branch 2 times, most recently from e11fa0e to 15bdb37 Compare January 20, 2025 16:38
Add check for FLAT instructions that dont use vector registers when computing VALU hazard.

Change-Id: I558aed17b109047bdc64f8a7e5f419d4d37577cf
@chinmaydd chinmaydd force-pushed the chinmaydd/hazard-crash branch from 15bdb37 to 741b260 Compare January 20, 2025 18:27
@arsenm
Copy link
Contributor

arsenm commented Jan 21, 2025

Title should also state what the issue was, not state there was an assert twice

Change-Id: I08607cabfdc2bf1018c03ffcca745b99c1f82174
@chinmaydd chinmaydd changed the title [AMDGPU] Fix crash due to assertion failure [AMDGPU] Fix crash due to missing check for FLAT instructions that dont use vector registers when computing VALU hazard. Jan 21, 2025
@@ -858,9 +858,12 @@ int GCNHazardRecognizer::createsVALUHazard(const MachineInstr &MI) {
}

if (TII->isFLAT(MI)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the MIMG case above doing anything? It looks like it's just an assert? I'm also surprised we're querying the operand before the instruction type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the MIMG case above doing anything? It looks like it's just an assert?

Yes, seems that way. The comment suggests the code exists more as a reminder to include the assert than check anything -

All our MIMG definitions use a 256-bit T#, so we can skip checking for them.

@chinmaydd chinmaydd merged commit 9ca1323 into llvm:main Jan 21, 2025
8 checks passed
@chinmaydd chinmaydd deleted the chinmaydd/hazard-crash branch January 21, 2025 13:51
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request May 20, 2025
…nt use vector registers when computing VALU hazard (llvm#123627)

Change-Id: If74798758b12b90b967ee004895525c2d5a8dab6
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request May 27, 2025
…nt use vector registers when computing VALU hazard (llvm#123627)

cherry-pick: 9ca1323 to amd-mainline
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants