Skip to content

[AMDGPU] Override getRegUsageForType() to fix <N x ptr(7)> crash #126642

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

krzysz00
Copy link
Contributor

@krzysz00 krzysz00 commented Feb 11, 2025

Even though ptr addrspace(7) is rewritten away into more primitive constructs before reaching SelectionDAG or GlobalISel, we still sometimes need to query properties like how many registers it will require using TargetTransformInfo::getRegUsageForType(Type*).

We already had an existing workaround that would map ptr addrspace(7) (and addrspace(9)) to MVT::{5,6}i32, their ultimate in-register representations, in overloads of
TargetLowering::getPointerType(DL, AddressSpace).

However, whenever TargetLowering::getValueType, which is called by the default getRegUsageForType, tried to construct a vector VT out of those vector MVTs, the vector constructor would assert because you can't have a vector of vectors.

Therefore, we override getRegUsageForType for AMD targets so that we can manually handle the vector of fat pointers case. Furthermore, since this is a vectorization interface, we allow the register usage analyzer to optimistically return the typical case - an will became a splat(p8) and a .

Even though ptr addrspace(7) is rewritten away into more primitive
constructs before reaching SelectionDAG or GlobalISel, we stil
sometimes need to query properties like how many registers it will
require.

We already had an existing workaroung that would map ptr
addrspace(7) (and addrspace(9)) to MVT::{5,6}i32, their ultimate
in-register representations, in overloads of
TargetLowering::getPointerType(DL, AddressSpace).

However, whenever TargetLowering::getValue() tried to construct a
vector VT out of those vector MVTs, the vector constructor would
assert because you can't have a vector of vectors.

This commit solves the crash by manually overriding getValueTy() and
getMemValueType() in the AMDGPU TargetLowering.

This is something of a big hammer, and I'm open to suggestions for a
more precise change.
@llvmbot
Copy link
Member

llvmbot commented Feb 11, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Krzysztof Drewniak (krzysz00)

Changes

Even though ptr addrspace(7) is rewritten away into more primitive constructs before reaching SelectionDAG or GlobalISel, we stil sometimes need to query properties like how many registers it will require.

We already had an existing workaroung that would map ptr addrspace(7) (and addrspace(9)) to MVT::{5,6}i32, their ultimate in-register representations, in overloads of
TargetLowering::getPointerType(DL, AddressSpace).

However, whenever TargetLowering::getValue() tried to construct a vector VT out of those vector MVTs, the vector constructor would assert because you can't have a vector of vectors.

This commit solves the crash by manually overriding getValueTy() and getMemValueType() in the AMDGPU TargetLowering.

This is something of a big hammer, and I'm open to suggestions for a more precise change.


Full diff: https://github.com/llvm/llvm-project/pull/126642.diff

4 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+4-5)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+35)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.h (+4)
  • (added) llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll (+39)
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index bbecc7a6ddaee79..8e9e2edc3e1499d 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -1675,8 +1675,8 @@ class TargetLoweringBase {
   /// operations except for the pointer size.  If AllowUnknown is true, this
   /// will return MVT::Other for types with no EVT counterpart (e.g. structs),
   /// otherwise it will assert.
-  EVT getValueType(const DataLayout &DL, Type *Ty,
-                   bool AllowUnknown = false) const {
+  virtual EVT getValueType(const DataLayout &DL, Type *Ty,
+                           bool AllowUnknown = false) const {
     // Lower scalar pointers to native pointer types.
     if (auto *PTy = dyn_cast<PointerType>(Ty))
       return getPointerTy(DL, PTy->getAddressSpace());
@@ -1695,8 +1695,8 @@ class TargetLoweringBase {
     return EVT::getEVT(Ty, AllowUnknown);
   }
 
-  EVT getMemValueType(const DataLayout &DL, Type *Ty,
-                      bool AllowUnknown = false) const {
+  virtual EVT getMemValueType(const DataLayout &DL, Type *Ty,
+                              bool AllowUnknown = false) const {
     // Lower scalar pointers to native pointer types.
     if (auto *PTy = dyn_cast<PointerType>(Ty))
       return getPointerMemTy(DL, PTy->getAddressSpace());
@@ -1714,7 +1714,6 @@ class TargetLoweringBase {
     return getValueType(DL, Ty, AllowUnknown);
   }
 
-
   /// Return the MVT corresponding to this LLVM type. See getValueType.
   MVT getSimpleValueType(const DataLayout &DL, Type *Ty,
                          bool AllowUnknown = false) const {
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index b632c50dae0e359..7bb3a9f262419e7 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1203,6 +1203,41 @@ MVT SITargetLowering::getPointerMemTy(const DataLayout &DL, unsigned AS) const {
   return AMDGPUTargetLowering::getPointerMemTy(DL, AS);
 }
 
+/// Passes like the loop vectorizer will, for example, try to query the size in
+/// registers of buffer fat pointer. They don't exist by the time we reach
+/// codegen, but these queries can still come in. Unfortunately, something like
+/// <2 x ptr addrspace(7)> will get lowered to <2 x v5i32> by the workarounds
+/// above, which causes a crash. Handle this case here.
+EVT SITargetLowering::getValueType(const DataLayout &DL, Type *Ty,
+                                   bool AllowUnknown) const {
+  if (auto *VT = dyn_cast<VectorType>(Ty)) {
+    if (auto *PT = dyn_cast<PointerType>(VT->getElementType())) {
+      MVT MET = getPointerTy(DL, PT->getAddressSpace());
+      if (MET.isVector() && MET.getVectorElementType() == MVT::i32) {
+        return EVT::getVectorVT(
+            Ty->getContext(), EVT(MET.getVectorElementType()),
+            VT->getElementCount() * MET.getVectorNumElements());
+      }
+    }
+  }
+  return AMDGPUTargetLowering::getValueType(DL, Ty, AllowUnknown);
+}
+
+EVT SITargetLowering::getMemValueType(const DataLayout &DL, Type *Ty,
+                                      bool AllowUnknown) const {
+  if (auto *VT = dyn_cast<VectorType>(Ty)) {
+    if (auto *PT = dyn_cast<PointerType>(VT->getElementType())) {
+      MVT ScalarTy = getPointerMemTy(DL, PT->getAddressSpace());
+      if (ScalarTy.isVector() && ScalarTy.getVectorElementType() == MVT::i32) {
+        return EVT::getVectorVT(
+            Ty->getContext(), EVT(ScalarTy.getVectorElementType()),
+            VT->getElementCount() * ScalarTy.getVectorNumElements());
+      }
+    }
+  }
+  return AMDGPUTargetLowering::getValueType(DL, Ty, AllowUnknown);
+}
+
 bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
                                           const CallInst &CI,
                                           MachineFunction &MF,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h
index 1cd7f1b29e0772c..f355e031c5f89b1 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h
@@ -306,6 +306,10 @@ class SITargetLowering final : public AMDGPUTargetLowering {
   // so, to work around the lack of i160, map it to v5i32.
   MVT getPointerTy(const DataLayout &DL, unsigned AS) const override;
   MVT getPointerMemTy(const DataLayout &DL, unsigned AS) const override;
+  EVT getValueType(const DataLayout &DL, Type *Ty,
+                   bool AllowUnknown = false) const override;
+  EVT getMemValueType(const DataLayout &DL, Type *Ty,
+                      bool AllowUnknown = false) const override;
 
   bool getTgtMemIntrinsic(IntrinsicInfo &, const CallInst &,
                           MachineFunction &MF,
diff --git a/llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll b/llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll
new file mode 100644
index 000000000000000..b7cf8db453dcfbc
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -passes=loop-vectorize -S < %s | FileCheck %s
+
+; Reduced from a crash, variables added to make things more realistic.
+; This is a roundabout test for TargetLowering::getValueType() returning
+; a reasonable value for <N x p7> instead of asserting.
+define amdgpu_kernel void @_dynamic_pack_simple_dispatch_0_pack_i32(ptr addrspace(1) %.ptr, i64 %0) {
+; CHECK-LABEL: define amdgpu_kernel void @_dynamic_pack_simple_dispatch_0_pack_i32(
+; CHECK-SAME: ptr addrspace(1) [[DOTPTR:%.*]], i64 [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[_LR_PH5:.*:]]
+; CHECK-NEXT:    [[DOTRSRC:%.*]] = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) [[DOTPTR]], i16 0, i32 -2147483648, i32 159744)
+; CHECK-NEXT:    [[TMP1:%.*]] = addrspacecast ptr addrspace(8) [[DOTRSRC]] to ptr addrspace(7)
+; CHECK-NEXT:    br label %[[BB2:.*]]
+; CHECK:       [[BB2]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = phi i64 [ 0, [[DOTLR_PH5:%.*]] ], [ [[TMP5:%.*]], %[[BB2]] ]
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr i32, ptr addrspace(7) [[TMP1]], i32 0
+; CHECK-NEXT:    [[TMP5]] = add i64 [[TMP3]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[TMP3]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], [[DOT_CRIT_EDGE_LOOPEXIT:label %.*]], label %[[BB2]]
+; CHECK:       [[__CRIT_EDGE_LOOPEXIT:.*:]]
+; CHECK-NEXT:    ret void
+;
+.lr.ph5:
+  %.rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %.ptr, i16 0, i32 2147483648, i32 159744)
+  %1 = addrspacecast ptr addrspace(8) %.rsrc to ptr addrspace(7)
+  br label %2
+
+2:                                                ; preds = %2, %.lr.ph5
+  %3 = phi i64 [ 0, %.lr.ph5 ], [ %5, %2 ]
+  %4 = getelementptr i32, ptr addrspace(7) %1, i32 0
+  %5 = add i64 %3, 1
+  %exitcond.not = icmp eq i64 %3, %0
+  br i1 %exitcond.not, label %._crit_edge.loopexit, label %2
+
+._crit_edge.loopexit:                             ; preds = %2
+  ret void
+}
+
+declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) readnone, i16, i32, i32)

@@ -1675,8 +1675,8 @@ class TargetLoweringBase {
/// operations except for the pointer size. If AllowUnknown is true, this
/// will return MVT::Other for types with no EVT counterpart (e.g. structs),
/// otherwise it will assert.
EVT getValueType(const DataLayout &DL, Type *Ty,
bool AllowUnknown = false) const {
virtual EVT getValueType(const DataLayout &DL, Type *Ty,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to directly handle this in getValueType, rather than adding yet another target knob for really low level implementation details. Is this used in any meaningful way? Can we just return an MVT::Other or something in the unhandled case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some handling to getValueType()

And ... I'm not sure. The getValueType() came in from the loop vectorizer wanting to know how many registers would be needed to hold some value, and that led to the crash. It didn't help that, per comments from the last workaround of this sort I needed, there're a lot of pre-codegen uses of MVT getPointerTYpe(unsigned AS); that needed to be worked around with MVT::v5i32

@krzysz00 krzysz00 changed the title Make TargetLowering::getValueType() virtual to fix <N x ptr(7)> crash Add case to TargetLowering::getValueType() to fix AMDGPU <N x ptr(7)> crash Feb 11, 2025
@arsenm
Copy link
Contributor

arsenm commented Feb 12, 2025

Looking through the backtrace, this doesn't look like the first kludge inserted to handle irregular cases. The LoopVectorizer itself it pre-filtering tokens, invalid vector elements, and scalable vectors. Computing the number of registers used shouldn't really require figuring out the exact set of EVTs, and the users shouldn't need to worry about the edge cases either.

I really hate this chain of interfaces, but I suppose you could override getNumRegisters (which is already overridable) for this case.

@krzysz00
Copy link
Contributor Author

Yeah, I was considering overriding getNumRegisters(), but I figured I might as well hit the more general case ...

I'm entirely willing to make this a getNumRegisters() override if that seems better

@krzysz00 krzysz00 changed the title Add case to TargetLowering::getValueType() to fix AMDGPU <N x ptr(7)> crash [AMDGPU] Override getRegUsageForType() to fix <N x ptr(7)> crash Feb 12, 2025
@krzysz00
Copy link
Contributor Author

@arsenm I think we might want to go back to the version of the patch that amends getValueType() since, with this version of the patch, I found a crash via the SLP vectorizer that goes

#12 0x0000000003dd3e5a llvm::EVT::getExtendedVectorVT(llvm::LLVMContext&, llvm::EVT, llvm::ElementCount) /home/kdrewnia/iree/third_party/llvm-project/llvm/lib/CodeGen/ValueTypes.cpp:57:21
#13 0x00000000038d224f llvm::TargetLoweringBase::getValueType(llvm::DataLayout const&, llvm::Type*, bool) const /home/kdrewnia/iree/third_party/llvm-project/llvm/include/llvm/CodeGen/TargetLowering.h:1696:3
#14 0x0000000004cbf4b7 llvm::BasicTTIImplBase<llvm::GCNTTIImpl>::getTypeLegalizationCost(llvm::Type*) const /home/kdrewnia/iree/third_party/llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h:924:25
#15 0x0000000004cc0ffa llvm::BasicTTIImplBase<llvm::GCNTTIImpl>::getNumberOfParts(llvm::Type*) /home/kdrewnia/iree/third_party/llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h:0:42
#16 0x000000000582e36e getFloorFullVectorNumberOfElements(llvm::TargetTransformInfo const&, llvm::Type*, unsigned int) /home/kdrewnia/iree/third_party/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:292:16

Haven't isolated a reproducer, but I figured I'd note I might want to go back to a kludge in getValueType()

@krzysz00
Copy link
Contributor Author

Closing in favor of #127692

@krzysz00 krzysz00 closed this Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants