[AMDGPU] Override getRegUsageForType() to fix <N x ptr(7)> crash #126642
Conversation
Even though ptr addrspace(7) is rewritten away into more primitive constructs before reaching SelectionDAG or GlobalISel, we still sometimes need to query properties like how many registers it will require. We already had an existing workaround that would map ptr addrspace(7) (and addrspace(9)) to MVT::v5i32 and MVT::v6i32, their ultimate in-register representations, in overloads of TargetLowering::getPointerTy(DL, AddressSpace). However, whenever TargetLowering::getValueType() tried to construct a vector VT out of those vector MVTs, the vector constructor would assert because you can't have a vector of vectors. This commit solves the crash by manually overriding getValueType() and getMemValueType() in the AMDGPU TargetLowering. This is something of a big hammer, and I'm open to suggestions for a more precise change.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-amdgpu Author: Krzysztof Drewniak (krzysz00). Full diff: https://github.com/llvm/llvm-project/pull/126642.diff (4 files affected):
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index bbecc7a6ddaee79..8e9e2edc3e1499d 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -1675,8 +1675,8 @@ class TargetLoweringBase {
/// operations except for the pointer size. If AllowUnknown is true, this
/// will return MVT::Other for types with no EVT counterpart (e.g. structs),
/// otherwise it will assert.
- EVT getValueType(const DataLayout &DL, Type *Ty,
- bool AllowUnknown = false) const {
+ virtual EVT getValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown = false) const {
// Lower scalar pointers to native pointer types.
if (auto *PTy = dyn_cast<PointerType>(Ty))
return getPointerTy(DL, PTy->getAddressSpace());
@@ -1695,8 +1695,8 @@ class TargetLoweringBase {
return EVT::getEVT(Ty, AllowUnknown);
}
- EVT getMemValueType(const DataLayout &DL, Type *Ty,
- bool AllowUnknown = false) const {
+ virtual EVT getMemValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown = false) const {
// Lower scalar pointers to native pointer types.
if (auto *PTy = dyn_cast<PointerType>(Ty))
return getPointerMemTy(DL, PTy->getAddressSpace());
@@ -1714,7 +1714,6 @@ class TargetLoweringBase {
return getValueType(DL, Ty, AllowUnknown);
}
-
/// Return the MVT corresponding to this LLVM type. See getValueType.
MVT getSimpleValueType(const DataLayout &DL, Type *Ty,
bool AllowUnknown = false) const {
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index b632c50dae0e359..7bb3a9f262419e7 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1203,6 +1203,41 @@ MVT SITargetLowering::getPointerMemTy(const DataLayout &DL, unsigned AS) const {
return AMDGPUTargetLowering::getPointerMemTy(DL, AS);
}
+/// Passes like the loop vectorizer will, for example, try to query the size in
+/// registers of buffer fat pointer. They don't exist by the time we reach
+/// codegen, but these queries can still come in. Unfortunately, something like
+/// <2 x ptr addrspace(7)> will get lowered to <2 x v5i32> by the workarounds
+/// above, which causes a crash. Handle this case here.
+EVT SITargetLowering::getValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown) const {
+ if (auto *VT = dyn_cast<VectorType>(Ty)) {
+ if (auto *PT = dyn_cast<PointerType>(VT->getElementType())) {
+ MVT MET = getPointerTy(DL, PT->getAddressSpace());
+ if (MET.isVector() && MET.getVectorElementType() == MVT::i32) {
+ return EVT::getVectorVT(
+ Ty->getContext(), EVT(MET.getVectorElementType()),
+ VT->getElementCount() * MET.getVectorNumElements());
+ }
+ }
+ }
+ return AMDGPUTargetLowering::getValueType(DL, Ty, AllowUnknown);
+}
+
+EVT SITargetLowering::getMemValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown) const {
+ if (auto *VT = dyn_cast<VectorType>(Ty)) {
+ if (auto *PT = dyn_cast<PointerType>(VT->getElementType())) {
+ MVT ScalarTy = getPointerMemTy(DL, PT->getAddressSpace());
+ if (ScalarTy.isVector() && ScalarTy.getVectorElementType() == MVT::i32) {
+ return EVT::getVectorVT(
+ Ty->getContext(), EVT(ScalarTy.getVectorElementType()),
+ VT->getElementCount() * ScalarTy.getVectorNumElements());
+ }
+ }
+ }
+ return AMDGPUTargetLowering::getValueType(DL, Ty, AllowUnknown);
+}
+
bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
const CallInst &CI,
MachineFunction &MF,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h
index 1cd7f1b29e0772c..f355e031c5f89b1 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h
@@ -306,6 +306,10 @@ class SITargetLowering final : public AMDGPUTargetLowering {
// so, to work around the lack of i160, map it to v5i32.
MVT getPointerTy(const DataLayout &DL, unsigned AS) const override;
MVT getPointerMemTy(const DataLayout &DL, unsigned AS) const override;
+ EVT getValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown = false) const override;
+ EVT getMemValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown = false) const override;
bool getTgtMemIntrinsic(IntrinsicInfo &, const CallInst &,
MachineFunction &MF,
diff --git a/llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll b/llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll
new file mode 100644
index 000000000000000..b7cf8db453dcfbc
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -passes=loop-vectorize -S < %s | FileCheck %s
+
+; Reduced from a crash, variables added to make things more realistic.
+; This is a roundabout test for TargetLowering::getValueType() returning
+; a reasonable value for <N x p7> instead of asserting.
+define amdgpu_kernel void @_dynamic_pack_simple_dispatch_0_pack_i32(ptr addrspace(1) %.ptr, i64 %0) {
+; CHECK-LABEL: define amdgpu_kernel void @_dynamic_pack_simple_dispatch_0_pack_i32(
+; CHECK-SAME: ptr addrspace(1) [[DOTPTR:%.*]], i64 [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[_LR_PH5:.*:]]
+; CHECK-NEXT: [[DOTRSRC:%.*]] = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) [[DOTPTR]], i16 0, i32 -2147483648, i32 159744)
+; CHECK-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(8) [[DOTRSRC]] to ptr addrspace(7)
+; CHECK-NEXT: br label %[[BB2:.*]]
+; CHECK: [[BB2]]:
+; CHECK-NEXT: [[TMP3:%.*]] = phi i64 [ 0, [[DOTLR_PH5:%.*]] ], [ [[TMP5:%.*]], %[[BB2]] ]
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i32, ptr addrspace(7) [[TMP1]], i32 0
+; CHECK-NEXT: [[TMP5]] = add i64 [[TMP3]], 1
+; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[TMP3]], [[TMP0]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], [[DOT_CRIT_EDGE_LOOPEXIT:label %.*]], label %[[BB2]]
+; CHECK: [[__CRIT_EDGE_LOOPEXIT:.*:]]
+; CHECK-NEXT: ret void
+;
+.lr.ph5:
+ %.rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %.ptr, i16 0, i32 2147483648, i32 159744)
+ %1 = addrspacecast ptr addrspace(8) %.rsrc to ptr addrspace(7)
+ br label %2
+
+2: ; preds = %2, %.lr.ph5
+ %3 = phi i64 [ 0, %.lr.ph5 ], [ %5, %2 ]
+ %4 = getelementptr i32, ptr addrspace(7) %1, i32 0
+ %5 = add i64 %3, 1
+ %exitcond.not = icmp eq i64 %3, %0
+ br i1 %exitcond.not, label %._crit_edge.loopexit, label %2
+
+._crit_edge.loopexit: ; preds = %2
+ ret void
+}
+
+declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) readnone, i16, i32, i32)
@@ -1675,8 +1675,8 @@ class TargetLoweringBase {
/// operations except for the pointer size. If AllowUnknown is true, this
/// will return MVT::Other for types with no EVT counterpart (e.g. structs),
/// otherwise it will assert.
EVT getValueType(const DataLayout &DL, Type *Ty,
bool AllowUnknown = false) const {
virtual EVT getValueType(const DataLayout &DL, Type *Ty,
I'd prefer to directly handle this in getValueType, rather than adding yet another target knob for really low level implementation details. Is this used in any meaningful way? Can we just return an MVT::Other or something in the unhandled case?
I've added some handling to getValueType()
And ... I'm not sure. The getValueType() call came in from the loop vectorizer wanting to know how many registers would be needed to hold some value, and that led to the crash. It didn't help that, per comments from the last workaround of this sort I needed, there are a lot of pre-codegen uses of MVT getPointerTy(unsigned AS); that needed to be worked around with MVT::v5i32.
Looking through the backtrace, this doesn't look like the first kludge inserted to handle irregular cases. The LoopVectorizer itself is pre-filtering tokens, invalid vector elements, and scalable vectors. Computing the number of registers used shouldn't really require figuring out the exact set of EVTs, and the users shouldn't need to worry about the edge cases either. I really hate this chain of interfaces, but I suppose you could override getNumRegisters (which is already overridable) for this case.
Yeah, I was considering overriding getNumRegisters(), but I figured I might as well hit the more general case ... I'm entirely willing to make this a getNumRegisters() override if that seems better
@arsenm I think we might want to go back to the version of the patch that amends
Haven't isolated a reproducer, but I figured I'd note I might want to go back to a kludge in getValueType() |
Closing in favor of #127692 |
Even though ptr addrspace(7) is rewritten away into more primitive constructs before reaching SelectionDAG or GlobalISel, we still sometimes need to query properties like how many registers it will require, via TargetTransformInfo::getRegUsageForType(Type*).
We already had an existing workaround that would map ptr addrspace(7) (and addrspace(9)) to MVT::v5i32 and MVT::v6i32, their ultimate in-register representations, in overloads of
TargetLowering::getPointerTy(DL, AddressSpace).
However, whenever TargetLowering::getValueType(), which is called by the default getRegUsageForType(), tried to construct a vector VT out of those vector MVTs, the vector constructor would assert because you can't have a vector of vectors.
Therefore, we override getRegUsageForType() for AMD targets so that we can manually handle the vector-of-fat-pointers case. Furthermore, since this is a vectorization interface, we allow the register usage analyzer to optimistically return the typical case: an <N x ptr addrspace(7)> will become a splat(p8) and an <N x i32>.