[AMDGPU] Avoid crashes for non-byte-sized types in PromoteAlloca #134042

Conversation

ritter-x2a
Member

This patch addresses three problems when promoting allocas to vectors:

  • Element types with size < 1 byte in allocas with a vector type caused
    divisions by zero.
  • Element types whose size doesn't match their AllocSize hit an assertion.
  • Access types whose size doesn't match their AllocSize hit an assertion.

With this patch, we do not attempt to promote affected allocas to vectors. In
principle, we could handle these cases in PromoteAlloca, e.g., by truncating
and extending elements from/to their allocation size. However, it's unclear
whether such cases ever occur in practice, so handling them doesn't seem worth
the added complexity.

For SWDEV-511252
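
The first two failure modes can be sketched as a toy model of the size bookkeeping in `tryPromoteAllocaToVector`. This is a hypothetical Python sketch for illustration only, not the pass's actual C++ code; the arguments stand in for `DL->getTypeSizeInBits`, `DL->getTypeAllocSizeInBits`, `DL->getTypeStoreSize`, and the vector element count:

```python
def try_promote_model(elem_bits, elem_alloc_bits, alloca_bytes, num_elems):
    """Toy model of the size checks added by this patch (not the real pass)."""
    elem_size = elem_bits // 8  # integer division: i1 -> 0, i9 -> 1
    if elem_size == 0:
        # First fix: bail out for sub-byte element types; without this
        # guard, the divisions below would divide by zero.
        return None
    if elem_bits != elem_alloc_bits:
        # Second fix: bail out when the type size differs from its
        # allocation size (e.g. i9 is 9 bits but occupies 16 allocated bits).
        return None
    # Expand the vector if required to match padding of the inner type.
    if num_elems * elem_size != alloca_bytes:
        num_elems = alloca_bytes // elem_size
    if num_elems > 0 and alloca_bytes % elem_size == 0:
        return elem_size, num_elems
    return None

# <16 x i1>: sub-byte elements, previously a division by zero.
print(try_promote_model(1, 8, 2, 16))    # None
# <16 x i9>: type size 9 != alloc size 16, previously an assertion.
print(try_promote_model(9, 16, 32, 16))  # None
# <8 x i8>: byte-sized elements promote as before.
print(try_promote_model(8, 8, 8, 8))     # (1, 8)
```

The model only captures the rejection logic; the real pass additionally checks every use of the alloca before rewriting it.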

Member Author

This stack of pull requests is managed by Graphite.

@llvmbot
Member

llvmbot commented Apr 2, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Fabian Ritter (ritter-x2a)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/134042.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+24-10)
  • (added) llvm/test/CodeGen/AMDGPU/promote-alloca-non-byte-sizes.ll (+204)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index 94ecb6ba9a2b8..bac2a2ac189ba 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -728,6 +728,11 @@ static bool isSupportedAccessType(FixedVectorType *VecTy, Type *AccessTy,
   // We could handle more complicated cases, but it'd make things a lot more
   // complicated.
   if (isa<FixedVectorType>(AccessTy)) {
+    // If the type size and the store size don't match, we would need to do more
+    // than just bitcast to translate between extractable/insertable subvectors
+    // and the accessed value.
+    if (DL.getTypeStoreSizeInBits(AccessTy) != DL.getTypeSizeInBits(AccessTy))
+      return false;
     TypeSize AccTS = DL.getTypeStoreSize(AccessTy);
     TypeSize VecTS = DL.getTypeStoreSize(VecTy->getElementType());
     return AccTS.isKnownMultipleOf(VecTS);
@@ -813,15 +818,17 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {
 
     if (VectorType::isValidElementType(ElemTy) && NumElems > 0) {
       unsigned ElementSize = DL->getTypeSizeInBits(ElemTy) / 8;
-      unsigned AllocaSize = DL->getTypeStoreSize(AllocaTy);
-      // Expand vector if required to match padding of inner type,
-      // i.e. odd size subvectors.
-      // Storage size of new vector must match that of alloca for correct
-      // behaviour of byte offsets and GEP computation.
-      if (NumElems * ElementSize != AllocaSize)
-        NumElems = AllocaSize / ElementSize;
-      if (NumElems > 0 && (AllocaSize % ElementSize) == 0)
-        VectorTy = FixedVectorType::get(ElemTy, NumElems);
+      if (ElementSize > 0) {
+        unsigned AllocaSize = DL->getTypeStoreSize(AllocaTy);
+        // Expand vector if required to match padding of inner type,
+        // i.e. odd size subvectors.
+        // Storage size of new vector must match that of alloca for correct
+        // behaviour of byte offsets and GEP computation.
+        if (NumElems * ElementSize != AllocaSize)
+          NumElems = AllocaSize / ElementSize;
+        if (NumElems > 0 && (AllocaSize % ElementSize) == 0)
+          VectorTy = FixedVectorType::get(ElemTy, NumElems);
+      }
     }
   }
 
@@ -861,7 +868,14 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {
   LLVM_DEBUG(dbgs() << "  Attempting promotion to: " << *VectorTy << "\n");
 
   Type *VecEltTy = VectorTy->getElementType();
-  unsigned ElementSize = DL->getTypeSizeInBits(VecEltTy) / 8;
+  unsigned ElementSizeInBits = DL->getTypeSizeInBits(VecEltTy);
+  if (ElementSizeInBits != DL->getTypeAllocSizeInBits(VecEltTy)) {
+    LLVM_DEBUG(dbgs() << "  Cannot convert to vector if the allocation size "
+                         "does not match the type's size\n");
+    return false;
+  }
+  unsigned ElementSize = ElementSizeInBits / 8;
+  assert(ElementSize > 0);
   for (auto *U : Uses) {
     Instruction *Inst = cast<Instruction>(U->getUser());
 
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-non-byte-sizes.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-non-byte-sizes.ll
new file mode 100644
index 0000000000000..7195eae154677
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-non-byte-sizes.ll
@@ -0,0 +1,204 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-promote-alloca < %s | FileCheck %s
+
+; Check that types where the store/allocation sizes don't match the type size
+; don't crash.
+
+
+define <7 x i9> @load_elem_i9_access_8xi9() {
+; CHECK-LABEL: @load_elem_i9_access_8xi9(
+; CHECK-NEXT:    [[P:%.*]] = alloca <16 x i9>, align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr i8, ptr [[P]], i64 4
+; CHECK-NEXT:    [[L:%.*]] = load <7 x i9>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <7 x i9> [[L]]
+;
+  %p = alloca <16 x i9>, align 1
+  %g = getelementptr i8, ptr %p, i64 4
+  %l = load <7 x i9>, ptr %g, align 1
+  ret <7 x i9> %l
+}
+
+define <8 x i1> @load_elem_i1_access_8xi1() {
+; CHECK-LABEL: @load_elem_i1_access_8xi1(
+; CHECK-NEXT:    [[P:%.*]] = alloca <16 x i1>, align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr i8, ptr [[P]], i64 4
+; CHECK-NEXT:    [[L:%.*]] = load <8 x i1>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <8 x i1> [[L]]
+;
+  %p = alloca <16 x i1>, align 1
+  %g = getelementptr i8, ptr %p, i64 4
+  %l = load <8 x i1>, ptr %g, align 1
+  ret <8 x i1> %l
+}
+
+define <3 x i1> @load_elem_i1_access_3xi1() {
+; CHECK-LABEL: @load_elem_i1_access_3xi1(
+; CHECK-NEXT:    [[P:%.*]] = alloca <16 x i1>, align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr i8, ptr [[P]], i64 4
+; CHECK-NEXT:    [[L:%.*]] = load <3 x i1>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <3 x i1> [[L]]
+;
+  %p = alloca <16 x i1>, align 1
+  %g = getelementptr i8, ptr %p, i64 4
+  %l = load <3 x i1>, ptr %g, align 1
+  ret <3 x i1> %l
+}
+
+define <3 x i1> @load_elem_i8_access_3xi1() {
+; CHECK-LABEL: @load_elem_i8_access_3xi1(
+; CHECK-NEXT:    [[P:%.*]] = alloca <8 x i8>, align 1
+; CHECK-NEXT:    store <8 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8>, ptr [[P]], align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr <4 x i8>, ptr [[P]], i64 1
+; CHECK-NEXT:    [[L:%.*]] = load <3 x i1>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <3 x i1> [[L]]
+;
+  %p = alloca <8 x i8>, align 1
+  store <8 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8>, ptr %p, align 1
+  %g = getelementptr <4 x i8>, ptr %p, i64 1
+  %l = load <3 x i1>, ptr %g, align 1
+  ret <3 x i1> %l
+}
+
+; This one is actually not problematic.
+define <8 x i1> @load_elem_i8_access_8xi1() {
+; CHECK-LABEL: @load_elem_i8_access_8xi1(
+; CHECK-NEXT:    [[P:%.*]] = freeze <8 x i8> poison
+; CHECK-NEXT:    ret <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>
+;
+  %p = alloca <8 x i8>, align 1
+  store <8 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8>, ptr %p, align 1
+  %g = getelementptr <4 x i8>, ptr %p, i64 1
+  %l = load <8 x i1>, ptr %g, align 1
+  ret <8 x i1> %l
+}
+
+define <8 x i1> @storeload_elem_i1_access_8xi1() {
+; CHECK-LABEL: @storeload_elem_i1_access_8xi1(
+; CHECK-NEXT:    [[P:%.*]] = alloca <16 x i1>, align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr i8, ptr [[P]], i64 4
+; CHECK-NEXT:    store <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>, ptr [[G]], align 1
+; CHECK-NEXT:    [[L:%.*]] = load <8 x i1>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <8 x i1> [[L]]
+;
+  %p = alloca <16 x i1>, align 1
+  %g = getelementptr i8, ptr %p, i64 4
+  store <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>, ptr %g, align 1
+  %l = load <8 x i1>, ptr %g, align 1
+  ret <8 x i1> %l
+}
+
+define <3 x i1> @storeload_elem_i1_access_3xi1() {
+; CHECK-LABEL: @storeload_elem_i1_access_3xi1(
+; CHECK-NEXT:    [[P:%.*]] = alloca <16 x i1>, align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr i8, ptr [[P]], i64 4
+; CHECK-NEXT:    store <3 x i1> <i1 true, i1 false, i1 true>, ptr [[G]], align 1
+; CHECK-NEXT:    [[L:%.*]] = load <3 x i1>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <3 x i1> [[L]]
+;
+  %p = alloca <16 x i1>, align 1
+  %g = getelementptr i8, ptr %p, i64 4
+  store <3 x i1> <i1 true, i1 false, i1 true>, ptr %g, align 1
+  %l = load <3 x i1>, ptr %g, align 1
+  ret <3 x i1> %l
+}
+
+define <3 x i1> @storeload_elem_i8_access_3xi1() {
+; CHECK-LABEL: @storeload_elem_i8_access_3xi1(
+; CHECK-NEXT:    [[P:%.*]] = alloca <8 x i8>, align 1
+; CHECK-NEXT:    store <8 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8>, ptr [[P]], align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr <4 x i8>, ptr [[P]], i64 1
+; CHECK-NEXT:    store <3 x i1> <i1 true, i1 false, i1 true>, ptr [[G]], align 1
+; CHECK-NEXT:    [[L:%.*]] = load <3 x i1>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <3 x i1> [[L]]
+;
+  %p = alloca <8 x i8>, align 1
+  store <8 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8>, ptr %p, align 1
+  %g = getelementptr <4 x i8>, ptr %p, i64 1
+  store <3 x i1> <i1 true, i1 false, i1 true>, ptr %g, align 1
+  %l = load <3 x i1>, ptr %g, align 1
+  ret <3 x i1> %l
+}
+
+; This one is actually not problematic.
+define <8 x i1> @storeload_elem_i8_access_8xi1() {
+; CHECK-LABEL: @storeload_elem_i8_access_8xi1(
+; CHECK-NEXT:    [[P:%.*]] = freeze <8 x i8> poison
+; CHECK-NEXT:    ret <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>
+;
+  %p = alloca <8 x i8>, align 1
+  store <8 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8>, ptr %p, align 1
+  %g = getelementptr <4 x i8>, ptr %p, i64 1
+  store <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>, ptr %g, align 1
+  %l = load <8 x i1>, ptr %g, align 1
+  ret <8 x i1> %l
+}
+
+define <8 x i1> @array_of_vec_elem_i1_access_8xi1() {
+; CHECK-LABEL: @array_of_vec_elem_i1_access_8xi1(
+; CHECK-NEXT:    [[P:%.*]] = alloca [2 x <16 x i1>], align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr i8, ptr [[P]], i64 4
+; CHECK-NEXT:    store <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>, ptr [[G]], align 1
+; CHECK-NEXT:    [[L:%.*]] = load <8 x i1>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <8 x i1> [[L]]
+;
+  %p = alloca [2 x <16 x i1>], align 1
+  %g = getelementptr i8, ptr %p, i64 4
+  store <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>, ptr %g, align 1
+  %l = load <8 x i1>, ptr %g, align 1
+  ret <8 x i1> %l
+}
+
+define <3 x i1> @array_of_vec_elem_i1_access_3xi1() {
+; CHECK-LABEL: @array_of_vec_elem_i1_access_3xi1(
+; CHECK-NEXT:    [[P:%.*]] = alloca [2 x <16 x i1>], align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr i8, ptr [[P]], i64 4
+; CHECK-NEXT:    store <3 x i1> <i1 true, i1 false, i1 true>, ptr [[G]], align 1
+; CHECK-NEXT:    [[L:%.*]] = load <3 x i1>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <3 x i1> [[L]]
+;
+  %p = alloca [2 x <16 x i1>], align 1
+  %g = getelementptr i8, ptr %p, i64 4
+  store <3 x i1> <i1 true, i1 false, i1 true>, ptr %g, align 1
+  %l = load <3 x i1>, ptr %g, align 1
+  ret <3 x i1> %l
+}
+
+define <3 x i1> @array_of_vec_elem_i8_access_3xi1() {
+; CHECK-LABEL: @array_of_vec_elem_i8_access_3xi1(
+; CHECK-NEXT:    [[P:%.*]] = alloca [2 x <8 x i8>], align 1
+; CHECK-NEXT:    store <8 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8>, ptr [[P]], align 1
+; CHECK-NEXT:    [[G:%.*]] = getelementptr <4 x i8>, ptr [[P]], i64 1
+; CHECK-NEXT:    store <3 x i1> <i1 true, i1 false, i1 true>, ptr [[G]], align 1
+; CHECK-NEXT:    [[L:%.*]] = load <3 x i1>, ptr [[G]], align 1
+; CHECK-NEXT:    ret <3 x i1> [[L]]
+;
+  %p = alloca [2 x <8 x i8>], align 1
+  store <8 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8>, ptr %p, align 1
+  %g = getelementptr <4 x i8>, ptr %p, i64 1
+  store <3 x i1> <i1 true, i1 false, i1 true>, ptr %g, align 1
+  %l = load <3 x i1>, ptr %g, align 1
+  ret <3 x i1> %l
+}
+
+; This one is actually not problematic.
+define <8 x i1> @array_of_vec_elem_i8_access_8xi1() {
+; CHECK-LABEL: @array_of_vec_elem_i8_access_8xi1(
+; CHECK-NEXT:    [[P:%.*]] = freeze <16 x i8> poison
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <16 x i8> [[P]], i8 1, i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 2, i32 1
+; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <16 x i8> [[TMP2]], i8 3, i32 2
+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <16 x i8> [[TMP3]], i8 4, i32 3
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <16 x i8> [[TMP4]], i8 5, i32 4
+; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <16 x i8> [[TMP5]], i8 6, i32 5
+; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <16 x i8> [[TMP6]], i8 7, i32 6
+; CHECK-NEXT:    [[TMP8:%.*]] = insertelement <16 x i8> [[TMP7]], i8 8, i32 7
+; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <16 x i8> [[TMP8]], i8 5, i64 4
+; CHECK-NEXT:    ret <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>
+;
+  %p = alloca [2 x <8 x i8>], align 1
+  store <8 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8>, ptr %p, align 1
+  %g = getelementptr <4 x i8>, ptr %p, i64 1
+  store <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>, ptr %g, align 1
+  %l = load <8 x i1>, ptr %g, align 1
+  ret <8 x i1> %l
+}

@ritter-x2a ritter-x2a marked this pull request as ready for review April 2, 2025 07:40
Contributor

@arsenm arsenm left a comment

Same issue as #128769

@ritter-x2a
Copy link
Member Author

Same issue as #128769

I reached out to @sgundapa, we agreed to pursue this PR (#134042) over #128769 since it covers more cases.

@ritter-x2a ritter-x2a requested a review from sgundapa April 10, 2025 07:31
Comment on lines +873 to +874
LLVM_DEBUG(dbgs() << " Cannot convert to vector if the allocation size "
"does not match the type's size\n");
Contributor

In a follow-up we should consider adding optimization remarks to the pass.

Member Author

ritter-x2a commented Apr 14, 2025

Merge activity

  • Apr 14, 3:11 AM EDT: A user started a stack merge that includes this pull request via Graphite.
  • Apr 14, 3:13 AM EDT: A user merged this pull request with Graphite.

@ritter-x2a ritter-x2a merged commit cf188d6 into main Apr 14, 2025
11 checks passed
@ritter-x2a ritter-x2a deleted the users/ritter-x2a/04-02-_amdgpu_avoid_crashes_for_non-byte-sized_types_in_promotealloca branch April 14, 2025 07:13
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…m#134042)
