[AMDGPU] Skip handling of non-byte types in promote alloca. #128769

Closed
wants to merge 1 commit into from

Conversation

sgundapa
Contributor

Non-byte types such as i1 could be packed and supported, but for the time being these types are not promoted.

Issue found by fuzzer.

@llvmbot
Member

llvmbot commented Feb 25, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Sumanth Gundapaneni (sgundapa)

Changes

Non-byte types such as i1 could be packed and supported, but for the time being these types are not promoted.

Issue found by fuzzer.


Full diff: https://github.com/llvm/llvm-project/pull/128769.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+9-2)
  • (added) llvm/test/CodeGen/AMDGPU/promote-alloca-skip-non-byte-type.ll (+21)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index 28016b5936ccf..007f930cea4f3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -759,6 +759,14 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {
     return false;
   }
 
+  Type *VecEltTy = VectorTy->getElementType();
+  constexpr unsigned SIZE_OF_BYTE = 8;
+  unsigned ElementSizeInBits = DL->getTypeSizeInBits(VecEltTy);
+  // FIXME: The non-byte type like i1 can be packed and be supported, but
+  // currently we do not handle them.
+  if (ElementSizeInBits % SIZE_OF_BYTE != 0)
+    return false;
+
   std::map<GetElementPtrInst *, WeakTrackingVH> GEPVectorIdx;
   SmallVector<Instruction *> WorkList;
   SmallVector<Instruction *> UsersToRemove;
@@ -776,8 +784,7 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {
 
   LLVM_DEBUG(dbgs() << "  Attempting promotion to: " << *VectorTy << "\n");
 
-  Type *VecEltTy = VectorTy->getElementType();
-  unsigned ElementSize = DL->getTypeSizeInBits(VecEltTy) / 8;
+  unsigned ElementSize = ElementSizeInBits / SIZE_OF_BYTE;
   for (auto *U : Uses) {
     Instruction *Inst = cast<Instruction>(U->getUser());
 
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-skip-non-byte-type.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-skip-non-byte-type.ll
new file mode 100644
index 0000000000000..3d2234f0a7ac3
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-skip-non-byte-type.ll
@@ -0,0 +1,21 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-unknown-amdhsa -passes=amdgpu-promote-alloca < %s | FileCheck %s
+
+; Verify that we do not crash and not promote non-byte alloca types.
+define <8 x i1> @non_byte_alloca_type() {
+; CHECK-LABEL: define <8 x i1> @non_byte_alloca_type() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[C:%.*]] = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
+; CHECK-NEXT:    [[RP:%.*]] = alloca <8 x i1>, align 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load <8 x i1>, ptr [[RP]], align 1
+; CHECK-NEXT:    store <16 x i1> [[C]], ptr [[RP]], align 2
+; CHECK-NEXT:    ret <8 x i1> [[TMP0]]
+;
+entry:
+  %C = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
+  %RP = alloca <8 x i1>, align 1
+  %0 = load <8 x i1>, ptr %RP, align 1
+  store <16 x i1> %C, ptr %RP, align 2
+  ret <8 x i1> %0
+}
+

@sgundapa sgundapa changed the title from "[AMDGPU] Skip handling non-byte types in promote alloca." to "[AMDGPU] Skip handling of non-byte types in promote alloca." Feb 25, 2025
@@ -776,8 +784,7 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {
 
   LLVM_DEBUG(dbgs() << "  Attempting promotion to: " << *VectorTy << "\n");
 
-  Type *VecEltTy = VectorTy->getElementType();
-  unsigned ElementSize = DL->getTypeSizeInBits(VecEltTy) / 8;
+  unsigned ElementSize = ElementSizeInBits / SIZE_OF_BYTE;
Contributor

IIUC, SIZE_OF_BYTE is defined by whatever compiler builds LLVM rather than by the AMDGPU target.

Contributor Author

Do you mean deriving the value from the data layout, with something like "DL.getTypeSizeInBits(Type::getInt8Ty(M->getContext()))"?

Contributor Author

I have defined it as "constexpr unsigned SIZE_OF_BYTE = 8" on line 763. Should I pick a different name?

Contributor

@shiltian shiltian Feb 25, 2025

Oh, I missed that part. Hardcoding 8 is probably fine for now and for the near future, but the proper approach is definitely to query DL.

Contributor

@arsenm arsenm left a comment

Looking at the actual code, I don't see why this doesn't just work for this case. Is the assert wrong?

+; CHECK-NEXT:    ret <8 x i1> [[TMP0]]
+;
+entry:
+  %C = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
Contributor

Use something that can't fold away

+;
+entry:
+  %C = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
+  %RP = alloca <8 x i1>, align 1
Contributor

Use the correct alloca address space. Also, this issue isn't about the UB under-alignment, so correct that.

Contributor Author

That's correct. Here is an example that might trigger UB:

@g = global <8 x float> <float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01>

define <8 x i1> @f(float %0, i32 %1, i16 %2) {
BB:
  %LGV = load <8 x float>, ptr @g, align 32
  %RP = alloca <8 x i1>, align 1
  %L = load <8 x float>, ptr %RP, align 32
  %C = fcmp olt <8 x float> %L, %LGV
  ret <8 x i1> %C
}

Contributor Author

Also, if you do not specify the addrspace, wouldn't it default to the generic addrspace, which is "0"?

+  unsigned ElementSizeInBits = DL->getTypeSizeInBits(VecEltTy);
+  // FIXME: The non-byte type like i1 can be packed and be supported, but
+  // currently we do not handle them.
+  if (ElementSizeInBits % SIZE_OF_BYTE != 0)
Contributor

Best to replicate typeSizeEqualsStoreSize

Contributor Author

Thanks. Will do
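
The suggestion above refers to `DataLayout::typeSizeEqualsStoreSize`, which compares a type's size in bits with its store size in bits. The following is a minimal sketch of that comparison, not LLVM code: it assumes 8-bit bytes and models a scalar integer type purely by its bit width. The helper names `storeSizeInBits`, `isByteSizedPerPatch` and `checkPredicatesAgree` are hypothetical, and the free function `typeSizeEqualsStoreSize` here is only a stand-in for the real member function. Under those assumptions the two checks accept and reject the same widths:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: store size is the bit width rounded up to whole 8-bit bytes.
constexpr uint64_t storeSizeInBits(uint64_t TypeSizeInBits) {
  return (TypeSizeInBits + 7) / 8 * 8;
}

// Stand-in for DataLayout::typeSizeEqualsStoreSize, taking a bit width
// instead of an llvm::Type.
constexpr bool typeSizeEqualsStoreSize(uint64_t TypeSizeInBits) {
  return TypeSizeInBits == storeSizeInBits(TypeSizeInBits);
}

// The check used in the patch: promotable only if the element width is a
// multiple of 8 bits.
constexpr bool isByteSizedPerPatch(uint64_t ElementSizeInBits) {
  return ElementSizeInBits % 8 == 0;
}

// Both predicates agree for every width from i1 to i128 in this model.
inline void checkPredicatesAgree() {
  for (uint64_t Bits = 1; Bits <= 128; ++Bits)
    assert(typeSizeEqualsStoreSize(Bits) == isByteSizedPerPatch(Bits));
}

static_assert(!typeSizeEqualsStoreSize(1), "i1 is padded to 8 bits when stored");
static_assert(typeSizeEqualsStoreSize(8), "i8 has no padding");
static_assert(typeSizeEqualsStoreSize(32), "i32 has no padding");
```

In the pass itself, calling the existing `DataLayout` query would avoid the hardcoded constant discussed earlier in this review.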

+  store <16 x i1> %C, ptr %RP, align 2
+  ret <8 x i1> %0
+}

Contributor

Can you add some tests for the scalar case? Was only the subvector extract a problem?

Contributor Author

The assertion triggered here is due to the subvector being <2 x i1> while the access type is <16 x i1>.
The access size for <16 x i1> is 2, and the computation that derives the subvector relies on this access size. It ended up with <2 x i1>, which triggered the assert because the store sizes do not match:

assert(DL.getTypeStoreSize(SubVecTy) == DL.getTypeStoreSize(AccessTy));

Contributor Author

Yes, the assertions I am seeing are all triggered while handling subvectors for loads and stores.
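
The mismatch described in this thread can be reproduced with plain arithmetic. The sketch below is an illustration only, not the pass's code: it assumes 8-bit bytes and that a fixed vector's size in bits is its element count times the element width, rounded up to whole bytes for the store size. The names `vectorStoreSizeInBytes`, `storeSizesMatch` and `checkExample` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: store size of <NumElts x iEltBits> is the total bit count
// rounded up to whole 8-bit bytes.
constexpr uint64_t vectorStoreSizeInBytes(uint64_t NumElts, uint64_t EltBits) {
  return (NumElts * EltBits + 7) / 8;
}

// Models the condition in the quoted assert: the derived subvector and the
// access type must have the same store size.
constexpr bool storeSizesMatch(uint64_t SubVecElts, uint64_t AccessElts,
                               uint64_t EltBits) {
  return vectorStoreSizeInBytes(SubVecElts, EltBits) ==
         vectorStoreSizeInBytes(AccessElts, EltBits);
}

inline void checkExample() {
  // <16 x i1> occupies 2 bytes, <2 x i1> occupies 1 byte: the assert fires.
  assert(vectorStoreSizeInBytes(16, 1) == 2);
  assert(vectorStoreSizeInBytes(2, 1) == 1);
  assert(!storeSizesMatch(2, 16, 1));
  // With byte-sized elements, a 2-byte access maps to <2 x i8>, also 2 bytes.
  assert(storeSizesMatch(2, 2, 8));
}
```

The byte-sized case works because the element count derived from the access size in bytes is also the store size in bytes, which no longer holds once elements are narrower than a byte.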

@arsenm arsenm requested a review from Pierre-vh February 26, 2025 09:54
@sgundapa
Contributor Author

ping

@ritter-x2a
Member

#134042, which subsumes this PR, has landed in trunk.

@sgundapa sgundapa closed this Apr 21, 2025
@sgundapa
Contributor Author

This issue is addressed in #134042.

@sgundapa sgundapa deleted the alloca branch April 21, 2025 14:28