
[AArch64][GISel] Support SVE with 128-bit min-size for G_LOAD and G_STORE #92130


Merged: 21 commits, May 30, 2024
@@ -652,17 +652,17 @@ bool GIMatchTableExecutor::executeMatchTable(
MachineMemOperand *MMO =
*(State.MIs[InsnID]->memoperands_begin() + MMOIdx);

unsigned Size = MRI.getType(MO.getReg()).getSizeInBits();
const TypeSize Size = MRI.getType(MO.getReg()).getSizeInBits();
if (MatcherOpcode == GIM_CheckMemorySizeEqualToLLT &&
MMO->getSizeInBits().getValue() != Size) {
MMO->getSizeInBits() != Size) {
if (handleReject() == RejectAndGiveUp)
return false;
} else if (MatcherOpcode == GIM_CheckMemorySizeLessThanLLT &&
MMO->getSizeInBits().getValue() >= Size) {
TypeSize::isKnownGE(MMO->getSizeInBits().getValue(), Size)) {
if (handleReject() == RejectAndGiveUp)
return false;
} else if (MatcherOpcode == GIM_CheckMemorySizeGreaterThanLLT &&
MMO->getSizeInBits().getValue() <= Size)
TypeSize::isKnownLE(MMO->getSizeInBits().getValue(), Size))
if (handleReject() == RejectAndGiveUp)
return false;

3 changes: 2 additions & 1 deletion llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -1080,7 +1080,8 @@ bool CombinerHelper::isIndexedLoadStoreLegal(GLoadStore &LdSt) const {
LLT Ty = MRI.getType(LdSt.getReg(0));
LLT MemTy = LdSt.getMMO().getMemoryType();
SmallVector<LegalityQuery::MemDesc, 2> MemDescrs(
{{MemTy, MemTy.getSizeInBits(), AtomicOrdering::NotAtomic}});
{{MemTy, MemTy.getSizeInBits().getKnownMinValue(),
AtomicOrdering::NotAtomic}});
unsigned IndexedOpc = getIndexedOpc(LdSt.getOpcode());
SmallVector<LLT> OpTys;
if (IndexedOpc == TargetOpcode::G_INDEXED_STORE)
2 changes: 1 addition & 1 deletion llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1413,7 +1413,7 @@ bool IRTranslator::translateLoad(const User &U, MachineIRBuilder &MIRBuilder) {

bool IRTranslator::translateStore(const User &U, MachineIRBuilder &MIRBuilder) {
const StoreInst &SI = cast<StoreInst>(U);
if (DL->getTypeStoreSize(SI.getValueOperand()->getType()) == 0)
if (DL->getTypeStoreSize(SI.getValueOperand()->getType()).isZero())
Contributor

This seems like an unrelated change, doesn't it?

Member Author

It's needed; otherwise we implicitly convert the scalable TypeSize to ScalarTy, which fails:

TypeSize::operator TypeSize::ScalarTy() const {
  if (isScalable()) {
    reportInvalidSizeRequest(
        "Cannot implicitly convert a scalable size to a fixed-width size in "
        "`TypeSize::operator ScalarTy()`");
    return getKnownMinValue();
  }
  return getFixedValue();
}
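
For illustration, here is a minimal sketch (not from this patch) of the comparison patterns this change switches to, using the helpers from llvm/Support/TypeSize.h; the function names are made up:

// Illustration only: comparing possibly-scalable sizes without triggering the
// implicit ScalarTy conversion shown above. Function names are hypothetical.
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/TypeSize.h"
using namespace llvm;

static bool sizesMatch(TypeSize MemSize, TypeSize ValSize) {
  // Equality is well defined even when one or both sizes are scalable.
  return MemSize == ValSize;
}

static bool memAtLeastAsWide(TypeSize MemSize, TypeSize ValSize) {
  // Ordered comparisons use the isKnown* helpers instead of relying on a
  // conversion to a fixed-width value.
  return TypeSize::isKnownGE(MemSize, ValSize);
}

static bool isEmptyStore(const DataLayout &DL, Type *Ty) {
  // Mirrors the translateStore change: use isZero() rather than == 0.
  return DL.getTypeStoreSize(Ty).isZero();
}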

return true;

ArrayRef<Register> Vals = getOrCreateVRegs(*SI.getValueOperand());
31 changes: 23 additions & 8 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -145,6 +145,15 @@ static cl::opt<bool> EnableExtToTBL("aarch64-enable-ext-to-tbl", cl::Hidden,
static cl::opt<unsigned> MaxXors("aarch64-max-xors", cl::init(16), cl::Hidden,
cl::desc("Maximum of xors"));

// When this option is enabled, we do not fall back to DAG ISel when
// encountering scalable vector types for any instruction, even if SVE is not
// yet supported for some instructions.
// See [AArch64TargetLowering::fallbackToDAGISel] for implementation details.
static cl::opt<bool> EnableSVEGISel(
"aarch64-enable-gisel-sve", cl::Hidden,
Contributor

Better would be -aarch64-enable-gisel-sve.

Member Author

Agreed; we already have aarch64-enable-gisel-ldst-prelegal, which follows the same pattern.

cl::desc("Enable / disable SVE scalable vectors in Global ISel"),
cl::init(false));

/// Value type used for condition codes.
static const MVT MVT_CC = MVT::i32;

@@ -26376,16 +26385,22 @@ bool AArch64TargetLowering::shouldLocalize(
}

Contributor
We also need to distinguish between GlobalISel and FastISel here. FastISel also calls this function, and it doesn't support scalable vectors. The patch you linked uses a command line flag to do this, but there is probably a better way than that.

Collaborator
Hi - please keep the command line option from #72976 for the moment. It helps to have a debug option we can use for testing, without being limited to implementing every single SVE optimization before GlobalISel becomes production ready.

This doesn't need to be dependent on the operations we currently "support"; the normal fallback mechanism for unsupported operations should handle that.

Member Author
Added the option aarch64-enable-sve-gisel, disabled by default, instead of aarch64-disable-sve-gisel, because other options are also phrased as "enable", like aarch64-enable-logical-imm.

bool AArch64TargetLowering::fallBackToDAGISel(const Instruction &Inst) const {
if (Inst.getType()->isScalableTy())
return true;

for (unsigned i = 0; i < Inst.getNumOperands(); ++i)
if (Inst.getOperand(i)->getType()->isScalableTy())
// Fallback for scalable vectors.
// Note that if EnableSVEGISel is true, we allow scalable vector types for
// all instructions, regardless of whether they are actually supported.
if (!EnableSVEGISel) {
if (Inst.getType()->isScalableTy()) {
return true;
}

if (const AllocaInst *AI = dyn_cast<AllocaInst>(&Inst)) {
if (AI->getAllocatedType()->isScalableTy())
return true;
for (unsigned i = 0; i < Inst.getNumOperands(); ++i)
if (Inst.getOperand(i)->getType()->isScalableTy())
return true;

if (const AllocaInst *AI = dyn_cast<AllocaInst>(&Inst)) {
if (AI->getAllocatedType()->isScalableTy())
return true;
}
}

// Checks to allow the use of SME instructions
4 changes: 2 additions & 2 deletions llvm/lib/Target/AArch64/AArch64RegisterBanks.td
@@ -12,8 +12,8 @@
/// General Purpose Registers: W, X.
def GPRRegBank : RegisterBank<"GPR", [XSeqPairsClass]>;

/// Floating Point/Vector Registers: B, H, S, D, Q.
def FPRRegBank : RegisterBank<"FPR", [QQQQ]>;
/// Floating Point, Vector, Scalable Vector Registers: B, H, S, D, Q, Z.
def FPRRegBank : RegisterBank<"FPR", [QQQQ, ZPR]>;
Contributor

The comment needs updating to include ZPR too.


/// Conditional register: NZCV.
def CCRegBank : RegisterBank<"CC", [CCR]>;
33 changes: 31 additions & 2 deletions llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -61,6 +61,11 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
const LLT v2s64 = LLT::fixed_vector(2, 64);
const LLT v2p0 = LLT::fixed_vector(2, p0);

const LLT nxv16s8 = LLT::scalable_vector(16, s8);
const LLT nxv8s16 = LLT::scalable_vector(8, s16);
const LLT nxv4s32 = LLT::scalable_vector(4, s32);
const LLT nxv2s64 = LLT::scalable_vector(2, s64);

std::initializer_list<LLT> PackedVectorAllTypeList = {/* Begin 128bit types */
v16s8, v8s16, v4s32,
v2s64, v2p0,
@@ -329,7 +334,31 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
return ValTy.isPointerVector() && ValTy.getAddressSpace() == 0;
};

getActionDefinitionsBuilder(G_LOAD)
auto &LoadActions = getActionDefinitionsBuilder(G_LOAD);
auto &StoreActions = getActionDefinitionsBuilder(G_STORE);

if (ST.hasSVE()) {
Collaborator
For most backends, there is no sensible way to lower scalable vectors without SVE, so they often just crash in some way. So the hasSVE check isn't necessarily as important as it could be, if the alternative is just to crash in some other way.

Having said that, I think it would be useful when dealing with optional architecture features to have a way to easily add legalization actions based on a condition. So adding something like legalForTypesWithMemDesc(hasSVE, {...}).
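
For illustration, a rough sketch of what such a condition-guarded helper could look like as a LegalizeRuleSet member; this overload does not exist today and is only an assumption about the suggested API:

// Hypothetical sketch only: a feature-gated overload of
// legalForTypesWithMemDesc. If the condition is false, no rules are added and
// the normal fallback for unsupported operations applies.
LegalizeRuleSet &legalForTypesWithMemDesc(
    bool Enable,
    std::initializer_list<LegalityPredicates::TypePairAndMemDesc>
        TypesAndMemDesc) {
  if (!Enable)
    return *this;
  return legalForTypesWithMemDesc(TypesAndMemDesc);
}

With something like that in place, the SVE rules below could be written as LoadActions.legalForTypesWithMemDesc(ST.hasSVE(), {...}) without the surrounding if.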


For this PR, I vote for the G_ABS pattern and removing the verbose loop. We should open a ticket to discuss making feature-based legalization more economical. The X86 folks have similar issues.

Member Author (@Him188, May 22, 2024)

If we remove the loop then we have to duplicate the rules for both G_LOAD and G_STORE?

Collaborator

I think that might be OK - they might end up being different eventually anyway, after all the truncstores/extloads and type legalization get added.

LoadActions.legalForTypesWithMemDesc({
// 128 bit base sizes
{nxv16s8, p0, nxv16s8, 8},
{nxv8s16, p0, nxv8s16, 8},
{nxv4s32, p0, nxv4s32, 8},
{nxv2s64, p0, nxv2s64, 8},
});

// TODO: Add nxv2p0. Consider bitcastIf.
// See #92130
// https://github.com/llvm/llvm-project/pull/92130#discussion_r1616888461
StoreActions.legalForTypesWithMemDesc({
// 128 bit base sizes
{nxv16s8, p0, nxv16s8, 8},
{nxv8s16, p0, nxv8s16, 8},
{nxv4s32, p0, nxv4s32, 8},
{nxv2s64, p0, nxv2s64, 8},
});
}

LoadActions
.customIf([=](const LegalityQuery &Query) {
return HasRCPC3 && Query.Types[0] == s128 &&
Query.MMODescrs[0].Ordering == AtomicOrdering::Acquire;
@@ -379,7 +408,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
.customIf(IsPtrVecPred)
.scalarizeIf(typeInSet(0, {v2s16, v2s8}), 0);

getActionDefinitionsBuilder(G_STORE)
StoreActions
.customIf([=](const LegalityQuery &Query) {
return HasRCPC3 && Query.Types[0] == s128 &&
Query.MMODescrs[0].Ordering == AtomicOrdering::Release;
@@ -309,6 +309,8 @@ bool matchSplitStoreZero128(MachineInstr &MI, MachineRegisterInfo &MRI) {
if (!Store.isSimple())
return false;
LLT ValTy = MRI.getType(Store.getValueReg());
if (ValTy.isScalableVector())
return false;
if (!ValTy.isVector() || ValTy.getSizeInBits() != 128)
return false;
if (Store.getMemSizeInBits() != ValTy.getSizeInBits())
@@ -653,6 +655,11 @@ bool AArch64PostLegalizerCombiner::optimizeConsecutiveMemOpAddressing(
// should only be in a single block.
resetState();
for (auto &MI : MBB) {
// Skip for scalable vectors
if (auto *LdSt = dyn_cast<GLoadStore>(&MI);
LdSt && MRI.getType(LdSt->getOperand(0).getReg()).isScalableVector())
continue;

if (auto *St = dyn_cast<GStore>(&MI)) {
Register PtrBaseReg;
APInt Offset;
12 changes: 8 additions & 4 deletions llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
@@ -257,6 +257,7 @@ AArch64RegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
case AArch64::QQRegClassID:
case AArch64::QQQRegClassID:
case AArch64::QQQQRegClassID:
case AArch64::ZPRRegClassID:
return getRegBank(AArch64::FPRRegBankID);
case AArch64::GPR32commonRegClassID:
case AArch64::GPR32RegClassID:
@@ -740,12 +741,15 @@ AArch64RegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
LLT Ty = MRI.getType(MO.getReg());
if (!Ty.isValid())
continue;
OpSize[Idx] = Ty.getSizeInBits();
OpSize[Idx] = Ty.getSizeInBits().getKnownMinValue();

// As a top-level guess, vectors go in FPRs, scalars and pointers in GPRs.
// As a top-level guess, vectors including both scalable and non-scalable
// ones go in FPRs, scalars and pointers in GPRs.
// For floating-point instructions, scalars go in FPRs.
if (Ty.isVector() || isPreISelGenericFloatingPointOpcode(Opc) ||
Ty.getSizeInBits() > 64)
if (Ty.isVector())
OpRegBankIdx[Idx] = PMI_FirstFPR;
else if (isPreISelGenericFloatingPointOpcode(Opc) ||
Ty.getSizeInBits() > 64)
OpRegBankIdx[Idx] = PMI_FirstFPR;
else
OpRegBankIdx[Idx] = PMI_FirstGPR;
50 changes: 50 additions & 0 deletions llvm/test/CodeGen/AArch64/GlobalISel/sve-load-store.ll
@@ -0,0 +1,50 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -global-isel -aarch64-enable-gisel-sve=true < %s | FileCheck %s

define void @scalable_v16i8(ptr %l0, ptr %l1) {
; CHECK-LABEL: scalable_v16i8:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
; CHECK-NEXT: st1b { z0.b }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 16 x i8>, ptr %l0, align 16
store <vscale x 16 x i8> %l3, ptr %l1, align 16
ret void
}

define void @scalable_v8i16(ptr %l0, ptr %l1) {
; CHECK-LABEL: scalable_v8i16:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
; CHECK-NEXT: st1h { z0.h }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 8 x i16>, ptr %l0, align 16
store <vscale x 8 x i16> %l3, ptr %l1, align 16
ret void
}

define void @scalable_v4i32(ptr %l0, ptr %l1) {
; CHECK-LABEL: scalable_v4i32:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
; CHECK-NEXT: st1w { z0.s }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 4 x i32>, ptr %l0, align 16
store <vscale x 4 x i32> %l3, ptr %l1, align 16
ret void
}

define void @scalable_v2i64(ptr %l0, ptr %l1) {
Collaborator

Add a test for ptrs too?

Member Author

It turns out ptrs do not work after adding the test.
It failed to match an existing TableGen pattern during instruction selection. SDAG seems to handle ptrs as s64, but I'm not sure how GISel handles ptrs. Does it also convert ptrs to s64 at some stage, or does it need a special instruction selection pattern for that?

BTW, in RISC-V they don't define nxvNp0 types.

Contributor

GlobalISel uses p<n> (only p0 is used right now?) throughout the pipeline; AFAIK it's not converted to integers anywhere.


AArch64 only uses p0. It is a pointer in address space 0. If we cannot select nxv2p0, then we should not legalize it.

Collaborator

There are two options for ptrs that are stored:

  • Scalars are copied to integer in AArch64InstructionSelector::preISelLower.
  • Vectors are bitcast to integer vectors in AArch64LegalizerInfo::legalizeLoadStore, by marking them as custom.

We only really need one method though. The vector stores should be using bitcastIf, not customIf, and the scalars could hopefully use the same method. This might mean that G_BITCAST needs to be supported for scalable vectors, if it used the same method (see the sketch below).
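
A rough sketch of the bitcastIf direction, assuming G_BITCAST were made legal for the scalable vector types involved; this is not code from the patch, only an illustration against the StoreActions builder introduced above:

// Sketch only: bitcast stores of scalable pointer vectors (e.g. nxv2p0) to
// same-width integer vectors (nxv2s64) instead of custom-legalizing them.
StoreActions.bitcastIf(
    [=](const LegalityQuery &Query) {
      const LLT Ty = Query.Types[0];
      return Ty.isScalableVector() && Ty.getElementType().isPointer();
    },
    [=](const LegalityQuery &Query) {
      // Keep the element count; swap pointer elements for integer scalars of
      // the same width.
      const LLT Ty = Query.Types[0];
      return std::pair(0u, Ty.changeElementType(
                               LLT::scalar(Ty.getScalarSizeInBits())));
    });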


I vote for postponing support for nxv2p0. The patch is large enough. Bitcasting seems to be preferable.

Collaborator

Yeah that sounds OK to me. Perhaps add a TODO about scalable vector ptr stores, so we can look into it later.

Member Author (@Him188, May 28, 2024)

Removed the legalizer rules for nxv2p0 and added a TODO.

; CHECK-LABEL: scalable_v2i64:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
; CHECK-NEXT: st1d { z0.d }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 2 x i64>, ptr %l0, align 16
store <vscale x 2 x i64> %l3, ptr %l1, align 16
ret void
}