[AArch64][GISel] Support SVE with 128-bit min-size for G_LOAD and G_STORE #92130
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -145,6 +145,15 @@ static cl::opt<bool> EnableExtToTBL("aarch64-enable-ext-to-tbl", cl::Hidden,
 static cl::opt<unsigned> MaxXors("aarch64-max-xors", cl::init(16), cl::Hidden,
                                  cl::desc("Maximum of xors"));
 
+// When this option is enabled, we do not fall back to DAG ISel when
+// encountering scalable vector types for any instruction, even if SVE is
+// not yet supported for some of those instructions.
+// See AArch64TargetLowering::fallBackToDAGISel for implementation details.
+static cl::opt<bool> EnableSVEGISel(
+    "aarch64-enable-gisel-sve", cl::Hidden,
+    cl::desc("Enable / disable SVE scalable vectors in Global ISel"),
+    cl::init(false));
+
 /// Value type used for condition codes.
 static const MVT MVT_CC = MVT::i32;

Review thread:
- Better would be -aarch64-enable-gisel-sve.
- Agreed, we already have …
@@ -26376,16 +26385,22 @@ bool AArch64TargetLowering::shouldLocalize(
 }
 
 bool AArch64TargetLowering::fallBackToDAGISel(const Instruction &Inst) const {
-  if (Inst.getType()->isScalableTy())
-    return true;
-
-  for (unsigned i = 0; i < Inst.getNumOperands(); ++i)
-    if (Inst.getOperand(i)->getType()->isScalableTy())
-      return true;
-
-  if (const AllocaInst *AI = dyn_cast<AllocaInst>(&Inst)) {
-    if (AI->getAllocatedType()->isScalableTy())
-      return true;
+  // Fallback for scalable vectors.
+  // Note that if EnableSVEGISel is true, we allow scalable vector types for
+  // all instructions, regardless of whether they are actually supported.
+  if (!EnableSVEGISel) {
+    if (Inst.getType()->isScalableTy()) {
+      return true;
+    }
+
+    for (unsigned i = 0; i < Inst.getNumOperands(); ++i)
+      if (Inst.getOperand(i)->getType()->isScalableTy())
+        return true;
+
+    if (const AllocaInst *AI = dyn_cast<AllocaInst>(&Inst)) {
+      if (AI->getAllocatedType()->isScalableTy())
+        return true;
+    }
   }
 
   // Checks to allow the use of SME instructions

Review thread:
- We also need to distinguish between GlobalISel and FastISel here: FastISel also calls this function, and it does not support scalable vectors. The patch you linked uses a command-line flag to do this, but there is probably a better way than that.
- Hi - please keep the command-line option from #72976 for the moment. It will help to have a debug option we can use for testing, without being limited to implementing every single SVE optimization before GlobalISel becomes production-ready. This does not need to depend on the operations we currently "support"; the normal fallback mechanism for unsupported operations should handle that.
- Added option …
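The FastISel question above is left open in the thread. Purely as a hypothetical sketch of one possible direction, not part of this patch or of the real TargetLowering hook (the ForGlobalISel parameter is invented here for illustration):

bool AArch64TargetLowering::fallBackToDAGISel(const Instruction &Inst,
                                              bool ForGlobalISel) const {
  // Only GlobalISel opts in to scalable vectors, and only behind the flag;
  // FastISel would always fall back on them.
  const bool AllowScalableVectors = ForGlobalISel && EnableSVEGISel;
  if (!AllowScalableVectors) {
    if (Inst.getType()->isScalableTy())
      return true;
    for (const Value *Op : Inst.operands())
      if (Op->getType()->isScalableTy())
        return true;
    if (const auto *AI = dyn_cast<AllocaInst>(&Inst))
      if (AI->getAllocatedType()->isScalableTy())
        return true;
  }
  // (SME-related checks from the actual function omitted here.)
  return false;
}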
llvm/lib/Target/AArch64/AArch64RegisterBanks.td
@@ -12,8 +12,8 @@
 /// General Purpose Registers: W, X.
 def GPRRegBank : RegisterBank<"GPR", [XSeqPairsClass]>;
 
-/// Floating Point/Vector Registers: B, H, S, D, Q.
-def FPRRegBank : RegisterBank<"FPR", [QQQQ]>;
+/// Floating Point, Vector, Scalable Vector Registers: B, H, S, D, Q, Z.
+def FPRRegBank : RegisterBank<"FPR", [QQQQ, ZPR]>;
 
 /// Conditional register: NZCV.
 def CCRegBank : RegisterBank<"CC", [CCR]>;

Review thread:
- The comment needs updating to include ZPR too.
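Adding ZPR to the classes covered by FPRRegBank is what lets scalable-vector virtual registers live on the FPR bank. A quick sanity-check sketch, assuming access to the target's RegisterBankInfo and TargetRegisterInfo (the RBI and TRI names are placeholders, not from the patch):

// The generated FPR bank should now cover the SVE Z-register class.
const RegisterBank &FPRBank = RBI.getRegBank(AArch64::FPRRegBankID);
assert(FPRBank.covers(*TRI.getRegClass(AArch64::ZPRRegClassID)) &&
       "ZPR should be covered by the FPR bank");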
llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp
@@ -61,6 +61,11 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
   const LLT v2s64 = LLT::fixed_vector(2, 64);
   const LLT v2p0 = LLT::fixed_vector(2, p0);
 
+  const LLT nxv16s8 = LLT::scalable_vector(16, s8);
+  const LLT nxv8s16 = LLT::scalable_vector(8, s16);
+  const LLT nxv4s32 = LLT::scalable_vector(4, s32);
+  const LLT nxv2s64 = LLT::scalable_vector(2, s64);
+
   std::initializer_list<LLT> PackedVectorAllTypeList = {/* Begin 128bit types */
       v16s8, v8s16, v4s32,
       v2s64, v2p0,
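For context, LLT::scalable_vector(MinNumElements, ScalarTy) builds a <vscale x N x sM> type: the element count is a runtime multiple (vscale) of the stated minimum, so every type above has a known minimum size of 128 bits, which matches SVE's minimum register width. A standalone sketch (the LowLevelType.h include path has moved between LLVM versions, so treat it as an assumption):

#include "llvm/CodeGenTypes/LowLevelType.h"
#include <cassert>
using namespace llvm;

int main() {
  // <vscale x 16 x s8>: at least 16 x 8 = 128 bits, scaled by vscale.
  const LLT NxV16S8 = LLT::scalable_vector(16, LLT::scalar(8));
  assert(NxV16S8.isScalableVector());
  assert(NxV16S8.getSizeInBits().getKnownMinValue() == 128);
  return 0;
}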
@@ -329,7 +334,31 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
     return ValTy.isPointerVector() && ValTy.getAddressSpace() == 0;
   };
 
-  getActionDefinitionsBuilder(G_LOAD)
+  auto &LoadActions = getActionDefinitionsBuilder(G_LOAD);
+  auto &StoreActions = getActionDefinitionsBuilder(G_STORE);
+
+  if (ST.hasSVE()) {
+    LoadActions.legalForTypesWithMemDesc({
+        // 128-bit base sizes
+        {nxv16s8, p0, nxv16s8, 8},
+        {nxv8s16, p0, nxv8s16, 8},
+        {nxv4s32, p0, nxv4s32, 8},
+        {nxv2s64, p0, nxv2s64, 8},
+    });
+
+    // TODO: Add nxv2p0. Consider bitcastIf.
+    // See #92130
+    // https://github.com/llvm/llvm-project/pull/92130#discussion_r1616888461
+    StoreActions.legalForTypesWithMemDesc({
+        // 128-bit base sizes
+        {nxv16s8, p0, nxv16s8, 8},
+        {nxv8s16, p0, nxv8s16, 8},
+        {nxv4s32, p0, nxv4s32, 8},
+        {nxv2s64, p0, nxv2s64, 8},
+    });
+  }

Review thread:
- For most backends there is no sensible way to lower scalable vectors without SVE, so they often just crash in some way. The hasSVE check is therefore not necessarily as important as it could be, if the alternative is just to crash somewhere else. Having said that, when dealing with optional architecture features it would be useful to have a way to easily add legalization actions based on a condition, so adding something like …
- For this PR, I vote for the G_ABS pattern and removing the verbose loop. We should open a ticket to discuss making feature-based legalization more economical. The X86 folks have similar issues.
- If we remove the loop, then we have to duplicate the rules for both G_LOAD and G_STORE?
- I think that might be OK - they might end up being different eventually anyway, after all the truncstores/extloads and type legalization get added.
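The thread above asks for a way to attach legalization rules based on a feature condition without either an opcode loop or duplicated rule text. Purely as a hypothetical sketch of that idea (legalIf and the LegalityQuery fields are existing API; folding the feature test into a shared predicate is the illustrative part, not what the patch does):

// Gate scalable-vector legality on HasSVE inside the predicate itself,
// so one predicate can be attached to both G_LOAD and G_STORE.
const bool HasSVE = ST.hasSVE();
auto ScalableLoadStorePred = [=](const LegalityQuery &Query) {
  const LLT ScalableTys[] = {nxv16s8, nxv8s16, nxv4s32, nxv2s64};
  return HasSVE && llvm::is_contained(ScalableTys, Query.Types[0]) &&
         Query.Types[1] == p0 &&
         Query.MMODescrs[0].MemoryTy == Query.Types[0];
};
LoadActions.legalIf(ScalableLoadStorePred);
StoreActions.legalIf(ScalableLoadStorePred);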
+  LoadActions
       .customIf([=](const LegalityQuery &Query) {
         return HasRCPC3 && Query.Types[0] == s128 &&
                Query.MMODescrs[0].Ordering == AtomicOrdering::Acquire;

@@ -379,7 +408,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .customIf(IsPtrVecPred)
       .scalarizeIf(typeInSet(0, {v2s16, v2s8}), 0);
 
-  getActionDefinitionsBuilder(G_STORE)
+  StoreActions
       .customIf([=](const LegalityQuery &Query) {
         return HasRCPC3 && Query.Types[0] == s128 &&
                Query.MMODescrs[0].Ordering == AtomicOrdering::Release;
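For reference, each four-element entry passed to legalForTypesWithMemDesc above pairs the register types with a memory-access description. The sketch below mirrors my reading of LegalityPredicates::TypePairAndMemDesc in LegalizerInfo.h; treat the field names and the bits-vs-bytes convention as assumptions to verify against the headers:

#include "llvm/CodeGenTypes/LowLevelType.h"
#include <cstdint>
using llvm::LLT;

struct TypePairAndMemDescSketch {
  LLT Type0;      // value type in the register, e.g. nxv16s8
  LLT Type1;      // pointer type, e.g. p0
  LLT MemTy;      // type as it exists in memory, e.g. nxv16s8
  uint64_t Align; // minimum alignment, in bits (8 means 1-byte aligned)
};
// So {nxv16s8, p0, nxv16s8, 8} declares that loading or storing a full
// <vscale x 16 x s8> through a p0 pointer is legal at byte alignment.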
New test file:

@@ -0,0 +1,50 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -global-isel -aarch64-enable-gisel-sve=true < %s | FileCheck %s

define void @scalable_v16i8(ptr %l0, ptr %l1) {
; CHECK-LABEL: scalable_v16i8:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
; CHECK-NEXT: st1b { z0.b }, p0, [x1]
; CHECK-NEXT: ret
  %l3 = load <vscale x 16 x i8>, ptr %l0, align 16
  store <vscale x 16 x i8> %l3, ptr %l1, align 16
  ret void
}

define void @scalable_v8i16(ptr %l0, ptr %l1) {
; CHECK-LABEL: scalable_v8i16:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
; CHECK-NEXT: st1h { z0.h }, p0, [x1]
; CHECK-NEXT: ret
  %l3 = load <vscale x 8 x i16>, ptr %l0, align 16
  store <vscale x 8 x i16> %l3, ptr %l1, align 16
  ret void
}

define void @scalable_v4i32(ptr %l0, ptr %l1) {
; CHECK-LABEL: scalable_v4i32:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
; CHECK-NEXT: st1w { z0.s }, p0, [x1]
; CHECK-NEXT: ret
  %l3 = load <vscale x 4 x i32>, ptr %l0, align 16
  store <vscale x 4 x i32> %l3, ptr %l1, align 16
  ret void
}

define void @scalable_v2i64(ptr %l0, ptr %l1) {
; CHECK-LABEL: scalable_v2i64:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
; CHECK-NEXT: st1d { z0.d }, p0, [x1]
; CHECK-NEXT: ret
  %l3 = load <vscale x 2 x i64>, ptr %l0, align 16
  store <vscale x 2 x i64> %l3, ptr %l1, align 16
  ret void
}

Review thread:
- Add a test for ptrs too?
- It turns out ptrs do not work after adding the test. By the way, in RISC-V they don't define …
- GlobalISel uses …
- AArch64 only uses p0; it is a pointer in address space 0. If we cannot select nxv2p0, then we should not legalize it.
- There are two options for ptrs that are stored: …
- I vote for postponing support for nxv2p0; the patch is large enough. Bitcasting seems to be preferable.
- Yeah, that sounds OK to me. Perhaps add a TODO about scalable vector ptr stores, so we can look into it later.
- Removed legalizer rules for nxv2p0. Added a TODO.
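The bitcast route the reviewers lean towards could later be expressed with the legalizer's existing bitcastIf action. A hypothetical sketch, not part of this patch (which removes the nxv2p0 rules and leaves a TODO instead):

// Hypothetically treat <vscale x 2 x p0> stores as <vscale x 2 x s64>
// stores, since AArch64 pointers are 64 bits wide.
StoreActions.bitcastIf(
    [=](const LegalityQuery &Query) {
      const LLT Ty = Query.Types[0];
      return Ty.isScalableVector() && Ty.getElementType() == p0;
    },
    [=](const LegalityQuery &Query) {
      const LLT Ty = Query.Types[0];
      // Same element count, s64 elements instead of p0.
      return std::make_pair(
          0u, LLT::scalable_vector(
                  Ty.getElementCount().getKnownMinValue(), s64));
    });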
Review thread:
- This seems to be an unrelated change, isn't it?
- It's needed; otherwise we implicitly cast a ScalableVectorTy into a ScalarTy and it will fail.