[AMDGPU] Override getRegUsageForType() to fix <N x ptr(7)> crash #126642
Conversation
Even though ptr addrspace(7) is rewritten away into more primitive constructs before reaching SelectionDAG or GlobalISel, we still sometimes need to query properties like how many registers it will require. We already had an existing workaround that would map ptr addrspace(7) (and addrspace(9)) to MVT::v5i32 and MVT::v6i32, their ultimate in-register representations, in overloads of TargetLowering::getPointerTy(DL, AddressSpace). However, whenever TargetLowering::getValueType() tried to construct a vector VT out of those vector MVTs, the vector constructor would assert because you can't have a vector of vectors. This commit solves the crash by manually overriding getValueType() and getMemValueType() in the AMDGPU TargetLowering. This is something of a big hammer, and I'm open to suggestions for a more precise change.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-amdgpu Author: Krzysztof Drewniak (krzysz00). Full diff: https://github.com/llvm/llvm-project/pull/126642.diff (4 files affected):
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index bbecc7a6ddaee79..8e9e2edc3e1499d 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -1675,8 +1675,8 @@ class TargetLoweringBase {
/// operations except for the pointer size. If AllowUnknown is true, this
/// will return MVT::Other for types with no EVT counterpart (e.g. structs),
/// otherwise it will assert.
- EVT getValueType(const DataLayout &DL, Type *Ty,
- bool AllowUnknown = false) const {
+ virtual EVT getValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown = false) const {
// Lower scalar pointers to native pointer types.
if (auto *PTy = dyn_cast<PointerType>(Ty))
return getPointerTy(DL, PTy->getAddressSpace());
@@ -1695,8 +1695,8 @@ class TargetLoweringBase {
return EVT::getEVT(Ty, AllowUnknown);
}
- EVT getMemValueType(const DataLayout &DL, Type *Ty,
- bool AllowUnknown = false) const {
+ virtual EVT getMemValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown = false) const {
// Lower scalar pointers to native pointer types.
if (auto *PTy = dyn_cast<PointerType>(Ty))
return getPointerMemTy(DL, PTy->getAddressSpace());
@@ -1714,7 +1714,6 @@ class TargetLoweringBase {
return getValueType(DL, Ty, AllowUnknown);
}
-
/// Return the MVT corresponding to this LLVM type. See getValueType.
MVT getSimpleValueType(const DataLayout &DL, Type *Ty,
bool AllowUnknown = false) const {
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index b632c50dae0e359..7bb3a9f262419e7 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1203,6 +1203,41 @@ MVT SITargetLowering::getPointerMemTy(const DataLayout &DL, unsigned AS) const {
return AMDGPUTargetLowering::getPointerMemTy(DL, AS);
}
+/// Passes like the loop vectorizer will, for example, try to query the size in
+/// registers of buffer fat pointer. They don't exist by the time we reach
+/// codegen, but these queries can still come in. Unfortunately, something like
+/// <2 x ptr addrspace(7)> will get lowered to <2 x v5i32> by the workarounds
+/// above, which causes a crash. Handle this case here.
+EVT SITargetLowering::getValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown) const {
+ if (auto *VT = dyn_cast<VectorType>(Ty)) {
+ if (auto *PT = dyn_cast<PointerType>(VT->getElementType())) {
+ MVT MET = getPointerTy(DL, PT->getAddressSpace());
+ if (MET.isVector() && MET.getVectorElementType() == MVT::i32) {
+ return EVT::getVectorVT(
+ Ty->getContext(), EVT(MET.getVectorElementType()),
+ VT->getElementCount() * MET.getVectorNumElements());
+ }
+ }
+ }
+ return AMDGPUTargetLowering::getValueType(DL, Ty, AllowUnknown);
+}
+
+EVT SITargetLowering::getMemValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown) const {
+ if (auto *VT = dyn_cast<VectorType>(Ty)) {
+ if (auto *PT = dyn_cast<PointerType>(VT->getElementType())) {
+ MVT ScalarTy = getPointerMemTy(DL, PT->getAddressSpace());
+ if (ScalarTy.isVector() && ScalarTy.getVectorElementType() == MVT::i32) {
+ return EVT::getVectorVT(
+ Ty->getContext(), EVT(ScalarTy.getVectorElementType()),
+ VT->getElementCount() * ScalarTy.getVectorNumElements());
+ }
+ }
+ }
+ return AMDGPUTargetLowering::getValueType(DL, Ty, AllowUnknown);
+}
+
bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
const CallInst &CI,
MachineFunction &MF,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h
index 1cd7f1b29e0772c..f355e031c5f89b1 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h
@@ -306,6 +306,10 @@ class SITargetLowering final : public AMDGPUTargetLowering {
// so, to work around the lack of i160, map it to v5i32.
MVT getPointerTy(const DataLayout &DL, unsigned AS) const override;
MVT getPointerMemTy(const DataLayout &DL, unsigned AS) const override;
+ EVT getValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown = false) const override;
+ EVT getMemValueType(const DataLayout &DL, Type *Ty,
+ bool AllowUnknown = false) const override;
bool getTgtMemIntrinsic(IntrinsicInfo &, const CallInst &,
MachineFunction &MF,
diff --git a/llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll b/llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll
new file mode 100644
index 000000000000000..b7cf8db453dcfbc
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AMDGPU/buffer-fat-pointer.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -passes=loop-vectorize -S < %s | FileCheck %s
+
+; Reduced from a crash, variables added to make things more realistic.
+; This is a roundabout test for TargetLowering::getValueType() returning
+; a reasonable value for <N x p7> instead of asserting.
+define amdgpu_kernel void @_dynamic_pack_simple_dispatch_0_pack_i32(ptr addrspace(1) %.ptr, i64 %0) {
+; CHECK-LABEL: define amdgpu_kernel void @_dynamic_pack_simple_dispatch_0_pack_i32(
+; CHECK-SAME: ptr addrspace(1) [[DOTPTR:%.*]], i64 [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[_LR_PH5:.*:]]
+; CHECK-NEXT: [[DOTRSRC:%.*]] = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) [[DOTPTR]], i16 0, i32 -2147483648, i32 159744)
+; CHECK-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(8) [[DOTRSRC]] to ptr addrspace(7)
+; CHECK-NEXT: br label %[[BB2:.*]]
+; CHECK: [[BB2]]:
+; CHECK-NEXT: [[TMP3:%.*]] = phi i64 [ 0, [[DOTLR_PH5:%.*]] ], [ [[TMP5:%.*]], %[[BB2]] ]
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i32, ptr addrspace(7) [[TMP1]], i32 0
+; CHECK-NEXT: [[TMP5]] = add i64 [[TMP3]], 1
+; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[TMP3]], [[TMP0]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], [[DOT_CRIT_EDGE_LOOPEXIT:label %.*]], label %[[BB2]]
+; CHECK: [[__CRIT_EDGE_LOOPEXIT:.*:]]
+; CHECK-NEXT: ret void
+;
+.lr.ph5:
+ %.rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %.ptr, i16 0, i32 2147483648, i32 159744)
+ %1 = addrspacecast ptr addrspace(8) %.rsrc to ptr addrspace(7)
+ br label %2
+
+2: ; preds = %2, %.lr.ph5
+ %3 = phi i64 [ 0, %.lr.ph5 ], [ %5, %2 ]
+ %4 = getelementptr i32, ptr addrspace(7) %1, i32 0
+ %5 = add i64 %3, 1
+ %exitcond.not = icmp eq i64 %3, %0
+ br i1 %exitcond.not, label %._crit_edge.loopexit, label %2
+
+._crit_edge.loopexit: ; preds = %2
+ ret void
+}
+
+declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) readnone, i16, i32, i32)
@@ -1675,8 +1675,8 @@ class TargetLoweringBase {
/// operations except for the pointer size. If AllowUnknown is true, this
/// will return MVT::Other for types with no EVT counterpart (e.g. structs),
/// otherwise it will assert.
EVT getValueType(const DataLayout &DL, Type *Ty,
bool AllowUnknown = false) const {
virtual EVT getValueType(const DataLayout &DL, Type *Ty,
I'd prefer to directly handle this in getValueType, rather than adding yet another target knob for really low level implementation details. Is this used in any meaningful way? Can we just return an MVT::Other or something in the unhandled case?
I've added some handling to getValueType()
And ... I'm not sure. The getValueType() call came in from the loop vectorizer wanting to know how many registers would be needed to hold some value, and that led to the crash. It didn't help that, per comments from the last workaround of this sort I needed, there are a lot of pre-codegen uses of MVT getPointerTy(unsigned AS); that needed to be worked around with MVT::v5i32.
Looking through the backtrace, this doesn't look like the first kludge inserted to handle irregular cases. The LoopVectorizer itself is pre-filtering tokens, invalid vector elements, and scalable vectors. Computing the number of registers used shouldn't really require figuring out the exact set of EVTs, and the users shouldn't need to worry about the edge cases either. I really hate this chain of interfaces, but I suppose you could override getNumRegisters (which is already overridable) for this case.
Yeah, I was considering overriding getNumRegisters(), but I figured I might as well hit the more general case ... I'm entirely willing to make this a getNumRegisters() override if that seems better
@arsenm I think we might want to go back to the version of the patch that amends
Haven't isolated a reproducer, but I figured I'd note I might want to go back to a kludge in getValueType() |
Closing in favor of #127692 |
Even though ptr addrspace(7) is rewritten away into more primitive constructs before reaching SelectionDAG or GlobalISel, we still sometimes need to query properties like how many registers it will require, via TargetTransformInfo::getRegUsageForType(Type*).
We already had an existing workaround that would map ptr addrspace(7) (and addrspace(9)) to MVT::v5i32 and MVT::v6i32, their ultimate in-register representations, in overloads of
TargetLowering::getPointerTy(DL, AddressSpace).
However, whenever TargetLowering::getValueType(), which is called by the default getRegUsageForType(), tried to construct a vector VT out of those vector MVTs, the vector constructor would assert because you can't have a vector of vectors.
Therefore, we override getRegUsageForType() for AMD targets so that we can manually handle the vector-of-fat-pointers case. Furthermore, since this is a vectorization interface, we allow the register usage analyzer to optimistically return the typical case: an <N x ptr addrspace(7)> will become a splat(p8) and an <N x i32>.