Commit 961530f

[ARM,MVE] Fix vreinterpretq in big-endian mode.
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e. both contiguous and non-widening) don't all store the bytes of a register to memory in the same order: it matters whether you did a VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register formats of different vector types relate to each other in a different way from the in-memory formats.

So, if you want to 'bitcast' or 'reinterpret' one vector type as another, you have to carefully specify which you mean: did you want to reinterpret the //register// format of one type as that of the other, or the //memory// format?

The ACLE `vreinterpretq` intrinsics are specified to reinterpret the register format. But I had implemented them as LLVM IR bitcast, which is specified for all types as a reinterpretation of the memory format. So a `vreinterpretq` intrinsic, applied to values already in registers, would code-generate incorrectly if compiled big-endian: instead of emitting no code, it would emit a `vrev`.

To fix this, I've introduced a new IR intrinsic to perform a register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's implemented by a trivial isel pattern that expects the input in an MQPR register, and just returns it unchanged.

In the clang codegen, I only emit this new intrinsic where it's actually needed: I prefer a bitcast wherever it will have the right effect, because LLVM understands bitcasts better. So we still generate bitcasts in little-endian mode, and even in big-endian when you're casting between two vector types with the same lane size.

For testing, I've moved all the codegen tests of vreinterpretq out into their own file, so that they can have a different set of RUN lines to check both big- and little-endian.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73786
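
The difference between the two kinds of reinterpretation can be sketched with a small byte-level model. This is only an illustration under stated assumptions, not code from the patch: it assumes a 128-bit register is 16 bytes with index 0 holding bits 0-7, and that a big-endian contiguous store writes lanes to ascending addresses with each lane's bytes most significant first. All names here (`Bytes`, `storeBigEndian`, `bitcastViaMemory`, and so on) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 128-bit vector register modelled as 16 bytes; index 0 holds bits 0-7.
using Bytes = std::array<std::uint8_t, 16>;

// Assumed model of a big-endian contiguous store with the given lane size in
// bytes (1 for VSTRB.8, 2 for VSTRH.16, 4 for VSTRW.32). Lanes go to ascending
// addresses; the bytes inside each lane are written most significant first.
Bytes storeBigEndian(const Bytes &reg, std::size_t laneBytes) {
  assert(laneBytes == 1 || laneBytes == 2 || laneBytes == 4);
  Bytes mem{};
  for (std::size_t i = 0; i < reg.size(); ++i) {
    std::size_t lane = i / laneBytes;
    std::size_t byteInLane = i % laneBytes;
    mem[lane * laneBytes + (laneBytes - 1 - byteInLane)] = reg[i];
  }
  return mem;
}

// The matching load applies the same permutation, because reversing the bytes
// within each lane is its own inverse.
Bytes loadBigEndian(const Bytes &mem, std::size_t laneBytes) {
  return storeBigEndian(mem, laneBytes);
}

// Memory-format reinterpretation (the meaning of an IR bitcast): store with
// the source lane size, then reload with the destination lane size.
Bytes bitcastViaMemory(const Bytes &reg, std::size_t srcLaneBytes,
                       std::size_t dstLaneBytes) {
  return loadBigEndian(storeBigEndian(reg, srcLaneBytes), dstLaneBytes);
}

// Register-format reinterpretation (the meaning of vreinterpretq): the
// register contents are left untouched.
Bytes reinterpretInRegister(const Bytes &reg) { return reg; }

// A register whose byte i contains the value i, convenient for inspection.
Bytes sampleRegister() {
  Bytes r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint8_t>(i);
  return r;
}
```

In this model, `bitcastViaMemory(r, 4, 4)` returns `r` unchanged, while `bitcastViaMemory(r, 4, 1)` reverses the bytes inside each 32-bit group, which is the kind of shuffle a `vrev` performs and which `reinterpretInRegister` avoids.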
1 parent f8d4afc commit 961530f

File tree

10 files changed

+1748
-1254
lines changed

clang/include/clang/Basic/arm_mve.td

Lines changed: 1 addition & 1 deletion
@@ -1063,7 +1063,7 @@ foreach desttype = T.All in {
       !if(!eq(!cast<string>(desttype),!cast<string>(srctype)),[],[srctype])))
       in {
     def "vreinterpretq_" # desttype: Intrinsic<
-        VecOf<desttype>, (args Vector:$x), (bitcast $x, VecOf<desttype>)>;
+        VecOf<desttype>, (args Vector:$x), (vreinterpret $x, VecOf<desttype>)>;
   }
 }
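
The `!if(!eq(...), [], [srctype])` filter in this hunk keeps every source type except the destination type itself, so one intrinsic exists per ordered pair of distinct types. A minimal sketch of that enumeration follows. The function name `reinterpretNames` is hypothetical, and the `vreinterpretq_<dest>_<src>` spelling is assumed from the usual ACLE naming rather than taken from this diff.

```cpp
#include <string>
#include <vector>

// Build one intrinsic name for every ordered (dest, src) pair of distinct
// type suffixes. The name layout "vreinterpretq_<dest>_<src>" is an assumption
// based on ACLE naming, not something stated in the hunk above.
std::vector<std::string> reinterpretNames(const std::vector<std::string> &types) {
  std::vector<std::string> names;
  for (const std::string &dest : types) {
    for (const std::string &src : types) {
      // Mirrors the TableGen filter: skip the pair where src equals dest.
      if (dest != src)
        names.push_back("vreinterpretq_" + dest + "_" + src);
    }
  }
  return names;
}
```

For a list of n type suffixes this yields n * (n - 1) names, since each type is paired with every other type in both directions.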

clang/include/clang/Basic/arm_mve_defs.td

Lines changed: 5 additions & 0 deletions
@@ -57,6 +57,10 @@ class CGHelperFn<string func> : IRBuilderBase {
   // an argument.
   let prefix = func # "(Builder, ";
 }
+class CGFHelperFn<string func> : IRBuilderBase {
+  // Like CGHelperFn, but also takes the CodeGenFunction itself.
+  let prefix = func # "(Builder, this, ";
+}
 def add: IRBuilder<"CreateAdd">;
 def mul: IRBuilder<"CreateMul">;
 def not: IRBuilder<"CreateNot">;
@@ -89,6 +93,7 @@ def ielt_var: IRBuilder<"CreateInsertElement">;
 def xelt_var: IRBuilder<"CreateExtractElement">;
 def trunc: IRBuilder<"CreateTrunc">;
 def bitcast: IRBuilder<"CreateBitCast">;
+def vreinterpret: CGFHelperFn<"ARMMVEVectorReinterpret">;
 def extend: CGHelperFn<"SignOrZeroExtend"> {
   let special_params = [IRBuilderIntParam<2, "bool">];
 }

clang/lib/CodeGen/CGBuiltin.cpp

Lines changed: 26 additions & 0 deletions
@@ -7019,6 +7019,32 @@ static llvm::Value *ARMMVEVectorSplat(CGBuilderTy &Builder, llvm::Value *V) {
   return Builder.CreateVectorSplat(Elements, V);
 }

+static llvm::Value *ARMMVEVectorReinterpret(CGBuilderTy &Builder,
+                                            CodeGenFunction *CGF,
+                                            llvm::Value *V,
+                                            llvm::Type *DestType) {
+  // Convert one MVE vector type into another by reinterpreting its in-register
+  // format.
+  //
+  // Little-endian, this is identical to a bitcast (which reinterprets the
+  // memory format). But big-endian, they're not necessarily the same, because
+  // the register and memory formats map to each other differently depending on
+  // the lane size.
+  //
+  // We generate a bitcast whenever we can (if we're little-endian, or if the
+  // lane sizes are the same anyway). Otherwise we fall back to an IR intrinsic
+  // that performs the different kind of reinterpretation.
+  if (CGF->getTarget().isBigEndian() &&
+      V->getType()->getScalarSizeInBits() != DestType->getScalarSizeInBits()) {
+    return Builder.CreateCall(
+        CGF->CGM.getIntrinsic(Intrinsic::arm_mve_vreinterpretq,
+                              {DestType, V->getType()}),
+        V);
+  } else {
+    return Builder.CreateBitCast(V, DestType);
+  }
+}
+
 Value *CodeGenFunction::EmitARMMVEBuiltinExpr(unsigned BuiltinID,
                                               const CallExpr *E,
                                               ReturnValueSlot ReturnValue,
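
The choice made by `ARMMVEVectorReinterpret` can be isolated as a pure predicate, which makes the three cases easy to check. This is a standalone sketch, not part of the patch: `ReinterpretKind` and `chooseReinterpret` are hypothetical names, and the lane sizes are passed as plain integers instead of being read from LLVM types.

```cpp
// Hypothetical stand-in for the two IR constructs the helper can emit.
enum class ReinterpretKind { Bitcast, RegisterIntrinsic };

// Same condition as in the patch: the register-format intrinsic is needed only
// when the target is big-endian AND the lane sizes differ. Every other case
// can use a plain bitcast.
constexpr ReinterpretKind chooseReinterpret(bool bigEndian,
                                            unsigned srcLaneBits,
                                            unsigned dstLaneBits) {
  return (bigEndian && srcLaneBits != dstLaneBits)
             ? ReinterpretKind::RegisterIntrinsic
             : ReinterpretKind::Bitcast;
}

// Little-endian: always a bitcast, whatever the lane sizes.
static_assert(chooseReinterpret(false, 32, 8) == ReinterpretKind::Bitcast, "");
// Big-endian with equal lane sizes: still a bitcast.
static_assert(chooseReinterpret(true, 32, 32) == ReinterpretKind::Bitcast, "");
// Big-endian with different lane sizes: the intrinsic is required.
static_assert(chooseReinterpret(true, 32, 8) ==
                  ReinterpretKind::RegisterIntrinsic, "");
```

Keeping the bitcast as the default matches the rationale in the commit message: the intrinsic is opaque to most LLVM optimizations, so it is used only where a bitcast would give the wrong result.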
