Commit 2ddf795

Reland "[CodeGen][AArch64] Support arm_sve_vector_bits attribute"
This relands D85743 with a fix for the test CodeGen/attr-arm-sve-vector-bits-call.c
that disables the new pass manager with '-fno-experimental-new-pass-manager'.
The test was failing due to IR differences with the new pass manager, which
broke the Fuchsia builder [1]. Reverted in 2e7041f.

[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375

Original summary:

This patch implements codegen for the 'arm_sve_vector_bits' type attribute,
defined by the Arm C Language Extensions (ACLE) for SVE [1]. The purpose of
this attribute is to define vector-length-specific (VLS) versions of existing
vector-length-agnostic (VLA) types.

VLSTs are represented as VectorType in the AST and fixed-length vectors in the
IR everywhere except in function args/return. Implemented in this patch is
codegen support for the following:

  * Implicit casting between VLA <-> VLS types.
  * Coercion of VLS types in function args/return.
  * Mangling of VLS types.

Casting is handled by the CK_BitCast operation, which has been extended to
support the two new vector kinds for fixed-length SVE predicate and data
vectors, where the cast is implemented through memory rather than a bitcast,
which is unsupported. Implementing this as a normal bitcast would require
relaxing checks in LLVM to allow bitcasting between scalable and fixed types.
Another option was adding target-specific intrinsics, although codegen support
would need to be added for these intrinsics. Given this, casting through
memory seemed like the best approach as it's supported today and existing
optimisations may remove unnecessary loads/stores, although there is room for
improvement here.

Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.

The VLA and VLS types are defined by the ACLE to map to the same machine-level
SVE vectors. VLS types are mangled in the same way as:

  __SVE_VLS<typename, unsigned>

where the first argument is the underlying variable-length type and the second
argument is the SVE vector length in bits. For example:

  #if __ARM_FEATURE_SVE_BITS==512
  // Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
  typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
  // Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
  typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
  #endif

The latest ACLE specification (00bet5) does not contain details of this
mangling scheme; it will be specified in the next revision. The mangling
scheme is otherwise defined in the appendices to the Procedure Call Standard
for the Arm Architecture; see [2] for more information.

[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85743
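
To illustrate the user-facing behaviour the commit enables, here is a minimal
sketch (not part of the commit; it assumes SVE is enabled and the TU is built
with -msve-vector-bits=512 so that __ARM_FEATURE_SVE_BITS is 512; the function
names are purely illustrative):

  #include <arm_sve.h>

  #if __ARM_FEATURE_SVE_BITS == 512
  typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));

  // Implicit VLA <-> VLS conversions in both directions; the cast is
  // lowered through memory rather than a vector bitcast.
  fixed_int32_t to_fixed(svint32_t v) { return v; }
  svint32_t to_scalable(fixed_int32_t v) { return v; }
  #endif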
1 parent bfc7636 commit 2ddf795

12 files changed (+2066, -40 lines)

clang/lib/AST/ItaniumMangle.cpp

Lines changed: 107 additions & 0 deletions
@@ -531,6 +531,8 @@ class CXXNameMangler {
   void mangleNeonVectorType(const DependentVectorType *T);
   void mangleAArch64NeonVectorType(const VectorType *T);
   void mangleAArch64NeonVectorType(const DependentVectorType *T);
+  void mangleAArch64FixedSveVectorType(const VectorType *T);
+  void mangleAArch64FixedSveVectorType(const DependentVectorType *T);
 
   void mangleIntegerLiteral(QualType T, const llvm::APSInt &Value);
   void mangleMemberExprBase(const Expr *base, bool isArrow);
@@ -3323,6 +3325,103 @@ void CXXNameMangler::mangleAArch64NeonVectorType(const DependentVectorType *T) {
   Diags.Report(T->getAttributeLoc(), DiagID);
 }
 
+// The AArch64 ACLE specifies that fixed-length SVE vector and predicate types
+// defined with the 'arm_sve_vector_bits' attribute map to the same AAPCS64
+// type as the sizeless variants.
+//
+// The mangling scheme for VLS types is implemented as a "pseudo" template:
+//
+//   '__SVE_VLS<<type>, <vector length>>'
+//
+// Combining the existing SVE type and a specific vector length (in bits).
+// For example:
+//
+//   typedef __SVInt32_t foo __attribute__((arm_sve_vector_bits(512)));
+//
+// is described as '__SVE_VLS<__SVInt32_t, 512u>' and mangled as:
+//
+//   "9__SVE_VLSI" + base type mangling + "Lj" + __ARM_FEATURE_SVE_BITS + "EE"
+//
+// i.e. 9__SVE_VLSIu11__SVInt32_tLj512EE
+//
+// The latest ACLE specification (00bet5) does not contain details of this
+// mangling scheme, it will be specified in the next revision. The mangling
+// scheme is otherwise defined in the appendices to the Procedure Call Standard
+// for the Arm Architecture, see
+// https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling
+void CXXNameMangler::mangleAArch64FixedSveVectorType(const VectorType *T) {
+  assert((T->getVectorKind() == VectorType::SveFixedLengthDataVector ||
+          T->getVectorKind() == VectorType::SveFixedLengthPredicateVector) &&
+         "expected fixed-length SVE vector!");
+
+  QualType EltType = T->getElementType();
+  assert(EltType->isBuiltinType() &&
+         "expected builtin type for fixed-length SVE vector!");
+
+  StringRef TypeName;
+  switch (cast<BuiltinType>(EltType)->getKind()) {
+  case BuiltinType::SChar:
+    TypeName = "__SVInt8_t";
+    break;
+  case BuiltinType::UChar: {
+    if (T->getVectorKind() == VectorType::SveFixedLengthDataVector)
+      TypeName = "__SVUint8_t";
+    else
+      TypeName = "__SVBool_t";
+    break;
+  }
+  case BuiltinType::Short:
+    TypeName = "__SVInt16_t";
+    break;
+  case BuiltinType::UShort:
+    TypeName = "__SVUint16_t";
+    break;
+  case BuiltinType::Int:
+    TypeName = "__SVInt32_t";
+    break;
+  case BuiltinType::UInt:
+    TypeName = "__SVUint32_t";
+    break;
+  case BuiltinType::Long:
+    TypeName = "__SVInt64_t";
+    break;
+  case BuiltinType::ULong:
+    TypeName = "__SVUint64_t";
+    break;
+  case BuiltinType::Float16:
+    TypeName = "__SVFloat16_t";
+    break;
+  case BuiltinType::Float:
+    TypeName = "__SVFloat32_t";
+    break;
+  case BuiltinType::Double:
+    TypeName = "__SVFloat64_t";
+    break;
+  case BuiltinType::BFloat16:
+    TypeName = "__SVBfloat16_t";
+    break;
+  default:
+    llvm_unreachable("unexpected element type for fixed-length SVE vector!");
+  }
+
+  unsigned VecSizeInBits = getASTContext().getTypeInfo(T).Width;
+
+  if (T->getVectorKind() == VectorType::SveFixedLengthPredicateVector)
+    VecSizeInBits *= 8;
+
+  Out << "9__SVE_VLSI" << 'u' << TypeName.size() << TypeName << "Lj"
+      << VecSizeInBits << "EE";
+}
+
+void CXXNameMangler::mangleAArch64FixedSveVectorType(
+    const DependentVectorType *T) {
+  DiagnosticsEngine &Diags = Context.getDiags();
+  unsigned DiagID = Diags.getCustomDiagID(
+      DiagnosticsEngine::Error,
+      "cannot mangle this dependent fixed-length SVE vector type yet");
+  Diags.Report(T->getAttributeLoc(), DiagID);
+}
+
 // GNU extension: vector types
 // <type> ::= <vector-type>
 // <vector-type> ::= Dv <positive dimension number> _
@@ -3343,6 +3442,10 @@ void CXXNameMangler::mangleType(const VectorType *T) {
     else
       mangleNeonVectorType(T);
     return;
+  } else if (T->getVectorKind() == VectorType::SveFixedLengthDataVector ||
+             T->getVectorKind() == VectorType::SveFixedLengthPredicateVector) {
+    mangleAArch64FixedSveVectorType(T);
+    return;
   }
   Out << "Dv" << T->getNumElements() << '_';
   if (T->getVectorKind() == VectorType::AltiVecPixel)
@@ -3365,6 +3468,10 @@ void CXXNameMangler::mangleType(const DependentVectorType *T) {
     else
       mangleNeonVectorType(T);
     return;
+  } else if (T->getVectorKind() == VectorType::SveFixedLengthDataVector ||
+             T->getVectorKind() == VectorType::SveFixedLengthPredicateVector) {
+    mangleAArch64FixedSveVectorType(T);
+    return;
   }
 
   Out << "Dv";
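
As a hedged illustration of the scheme implemented above (assuming <arm_sve.h>
is included and the TU is built with -msve-vector-bits=512; the function names
and the expected symbols are derived from the comment above, not copied from
this commit's tests):

  typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));
  typedef svbool_t  fixed_bool_t  __attribute__((arm_sve_vector_bits(512)));

  // Data vector: expected to mangle as _Z1f9__SVE_VLSIu11__SVInt32_tLj512EE.
  void f(fixed_int32_t) {}

  // Predicate vector: getTypeInfo() reports 512/8 = 64 bits, which the
  // mangler scales by 8, giving _Z1g9__SVE_VLSIu10__SVBool_tLj512EE.
  void g(fixed_bool_t) {}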

clang/lib/CodeGen/CGCall.cpp

Lines changed: 26 additions & 17 deletions
@@ -1119,12 +1119,13 @@ void CodeGenFunction::ExpandTypeToArgs(
 
 /// Create a temporary allocation for the purposes of coercion.
 static Address CreateTempAllocaForCoercion(CodeGenFunction &CGF, llvm::Type *Ty,
-                                           CharUnits MinAlign) {
+                                           CharUnits MinAlign,
+                                           const Twine &Name = "tmp") {
   // Don't use an alignment that's worse than what LLVM would prefer.
   auto PrefAlign = CGF.CGM.getDataLayout().getPrefTypeAlignment(Ty);
   CharUnits Align = std::max(MinAlign, CharUnits::fromQuantity(PrefAlign));
 
-  return CGF.CreateTempAlloca(Ty, Align);
+  return CGF.CreateTempAlloca(Ty, Align, Name + ".coerce");
 }
 
 /// EnterStructPointerForCoercedAccess - Given a struct pointer that we are
@@ -1230,14 +1231,15 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
   if (SrcTy == Ty)
     return CGF.Builder.CreateLoad(Src);
 
-  uint64_t DstSize = CGF.CGM.getDataLayout().getTypeAllocSize(Ty);
+  llvm::TypeSize DstSize = CGF.CGM.getDataLayout().getTypeAllocSize(Ty);
 
   if (llvm::StructType *SrcSTy = dyn_cast<llvm::StructType>(SrcTy)) {
-    Src = EnterStructPointerForCoercedAccess(Src, SrcSTy, DstSize, CGF);
+    Src = EnterStructPointerForCoercedAccess(Src, SrcSTy,
+                                             DstSize.getFixedSize(), CGF);
     SrcTy = Src.getElementType();
   }
 
-  uint64_t SrcSize = CGF.CGM.getDataLayout().getTypeAllocSize(SrcTy);
+  llvm::TypeSize SrcSize = CGF.CGM.getDataLayout().getTypeAllocSize(SrcTy);
 
   // If the source and destination are integer or pointer types, just do an
   // extension or truncation to the desired type.
@@ -1248,7 +1250,8 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
   }
 
   // If load is legal, just bitcast the src pointer.
-  if (SrcSize >= DstSize) {
+  if (!SrcSize.isScalable() && !DstSize.isScalable() &&
+      SrcSize.getFixedSize() >= DstSize.getFixedSize()) {
     // Generally SrcSize is never greater than DstSize, since this means we are
     // losing bits. However, this can happen in cases where the structure has
     // additional padding, for example due to a user specified alignment.
@@ -1261,10 +1264,12 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
   }
 
   // Otherwise do coercion through memory. This is stupid, but simple.
-  Address Tmp = CreateTempAllocaForCoercion(CGF, Ty, Src.getAlignment());
-  CGF.Builder.CreateMemCpy(Tmp.getPointer(), Tmp.getAlignment().getAsAlign(),
-                           Src.getPointer(), Src.getAlignment().getAsAlign(),
-                           llvm::ConstantInt::get(CGF.IntPtrTy, SrcSize));
+  Address Tmp =
+      CreateTempAllocaForCoercion(CGF, Ty, Src.getAlignment(), Src.getName());
+  CGF.Builder.CreateMemCpy(
+      Tmp.getPointer(), Tmp.getAlignment().getAsAlign(), Src.getPointer(),
+      Src.getAlignment().getAsAlign(),
+      llvm::ConstantInt::get(CGF.IntPtrTy, SrcSize.getKnownMinSize()));
   return CGF.Builder.CreateLoad(Tmp);
 }

@@ -1303,10 +1308,11 @@ static void CreateCoercedStore(llvm::Value *Src,
     return;
   }
 
-  uint64_t SrcSize = CGF.CGM.getDataLayout().getTypeAllocSize(SrcTy);
+  llvm::TypeSize SrcSize = CGF.CGM.getDataLayout().getTypeAllocSize(SrcTy);
 
   if (llvm::StructType *DstSTy = dyn_cast<llvm::StructType>(DstTy)) {
-    Dst = EnterStructPointerForCoercedAccess(Dst, DstSTy, SrcSize, CGF);
+    Dst = EnterStructPointerForCoercedAccess(Dst, DstSTy,
+                                             SrcSize.getFixedSize(), CGF);
     DstTy = Dst.getElementType();
   }
 
@@ -1328,10 +1334,12 @@ static void CreateCoercedStore(llvm::Value *Src,
     return;
   }
 
-  uint64_t DstSize = CGF.CGM.getDataLayout().getTypeAllocSize(DstTy);
+  llvm::TypeSize DstSize = CGF.CGM.getDataLayout().getTypeAllocSize(DstTy);
 
   // If store is legal, just bitcast the src pointer.
-  if (SrcSize <= DstSize) {
+  if (isa<llvm::ScalableVectorType>(SrcTy) ||
+      isa<llvm::ScalableVectorType>(DstTy) ||
+      SrcSize.getFixedSize() <= DstSize.getFixedSize()) {
     Dst = CGF.Builder.CreateElementBitCast(Dst, SrcTy);
     CGF.EmitAggregateStore(Src, Dst, DstIsVolatile);
   } else {
@@ -1346,9 +1354,10 @@ static void CreateCoercedStore(llvm::Value *Src,
     // to that information.
     Address Tmp = CreateTempAllocaForCoercion(CGF, SrcTy, Dst.getAlignment());
     CGF.Builder.CreateStore(Src, Tmp);
-    CGF.Builder.CreateMemCpy(Dst.getPointer(), Dst.getAlignment().getAsAlign(),
-                             Tmp.getPointer(), Tmp.getAlignment().getAsAlign(),
-                             llvm::ConstantInt::get(CGF.IntPtrTy, DstSize));
+    CGF.Builder.CreateMemCpy(
+        Dst.getPointer(), Dst.getAlignment().getAsAlign(), Tmp.getPointer(),
+        Tmp.getAlignment().getAsAlign(),
+        llvm::ConstantInt::get(CGF.IntPtrTy, DstSize.getFixedSize()));
   }
 }
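
The TypeSize-aware checks above come into play when a VLS argument or return
value is coerced to its scalable ABI type. A minimal sketch of C code that
exercises this path (assuming <arm_sve.h> is included and -msve-vector-bits=512;
the IR signature in the comment is approximate, not copied from a test):

  typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));

  // Both the parameter and the return value are coerced to the scalable
  // container type at the IR level, roughly:
  //   define <vscale x 4 x i32> @inc(<vscale x 4 x i32> %x.coerce)
  fixed_int32_t inc(fixed_int32_t x) {
    return svadd_n_s32_x(svptrue_b32(), x, 1);
  }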

clang/lib/CodeGen/CGExprScalar.cpp

Lines changed: 28 additions & 0 deletions
@@ -2003,6 +2003,34 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
       }
     }
 
+    // Perform VLAT <-> VLST bitcast through memory.
+    if ((isa<llvm::FixedVectorType>(SrcTy) &&
+         isa<llvm::ScalableVectorType>(DstTy)) ||
+        (isa<llvm::ScalableVectorType>(SrcTy) &&
+         isa<llvm::FixedVectorType>(DstTy))) {
+      if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {
+        // Call expressions can't have a scalar return unless the return type
+        // is a reference type so an lvalue can't be emitted. Create a temp
+        // alloca to store the call, bitcast the address then load.
+        QualType RetTy = CE->getCallReturnType(CGF.getContext());
+        Address Addr =
+            CGF.CreateDefaultAlignTempAlloca(SrcTy, "saved-call-rvalue");
+        LValue LV = CGF.MakeAddrLValue(Addr, RetTy);
+        CGF.EmitStoreOfScalar(Src, LV);
+        Addr = Builder.CreateElementBitCast(Addr, CGF.ConvertTypeForMem(DestTy),
+                                            "castFixedSve");
+        LValue DestLV = CGF.MakeAddrLValue(Addr, DestTy);
+        DestLV.setTBAAInfo(TBAAAccessInfo::getMayAliasInfo());
+        return EmitLoadOfLValue(DestLV, CE->getExprLoc());
+      }
+
+      Address Addr = EmitLValue(E).getAddress(CGF);
+      Addr = Builder.CreateElementBitCast(Addr, CGF.ConvertTypeForMem(DestTy));
+      LValue DestLV = CGF.MakeAddrLValue(Addr, DestTy);
+      DestLV.setTBAAInfo(TBAAAccessInfo::getMayAliasInfo());
+      return EmitLoadOfLValue(DestLV, CE->getExprLoc());
+    }
+
     return Builder.CreateBitCast(Src, DstTy);
   }
   case CK_AddressSpaceConversion: {
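
A hedged sketch of source that takes the CallExpr branch above (assuming
<arm_sve.h> is included and -msve-vector-bits=512; svdup_n_s32 is the ACLE
duplicate intrinsic, the function name is illustrative): the scalable rvalue
returned by the call has no lvalue to reuse, so it is stored to the
"saved-call-rvalue" temporary, the address is element-bitcast, and the
fixed-length value is reloaded.

  typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));

  fixed_int32_t splat(int32_t x) {
    // VLA -> VLS cast of a call result goes through memory, not a bitcast.
    return svdup_n_s32(x);
  }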
