Skip to content

[RISCV] Improve casting between i1 scalable vectors and i8 fixed vectors for -mrvv-vector-bits #139190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 14, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 15 additions & 5 deletions clang/lib/CodeGen/CGCall.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1366,19 +1366,23 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
// If we are casting a fixed i8 vector to a scalable i1 predicate
// vector, use a vector insert and bitcast the result.
if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
FixedSrcTy->getElementType()->isIntegerTy(8)) {
ScalableDstTy = llvm::ScalableVectorType::get(
FixedSrcTy->getElementType(),
ScalableDstTy->getElementCount().getKnownMinValue() / 8);
llvm::divideCeil(
ScalableDstTy->getElementCount().getKnownMinValue(), 8));
}
if (ScalableDstTy->getElementType() == FixedSrcTy->getElementType()) {
auto *Load = CGF.Builder.CreateLoad(Src);
auto *PoisonVec = llvm::PoisonValue::get(ScalableDstTy);
llvm::Value *Result = CGF.Builder.CreateInsertVector(
ScalableDstTy, PoisonVec, Load, uint64_t(0), "cast.scalable");
if (ScalableDstTy != Ty)
Result = CGF.Builder.CreateBitCast(Result, Ty);
ScalableDstTy = cast<llvm::ScalableVectorType>(
llvm::VectorType::getWithSizeAndScalar(ScalableDstTy, Ty));
if (Result->getType() != ScalableDstTy)
Result = CGF.Builder.CreateBitCast(Result, ScalableDstTy);
if (Result->getType() != Ty)
Result = CGF.Builder.CreateExtractVector(Ty, Result, uint64_t(0));
return Result;
}
}
Expand Down Expand Up @@ -1476,8 +1480,14 @@ CoerceScalableToFixed(CodeGenFunction &CGF, llvm::FixedVectorType *ToTy,
// If we are casting a scalable i1 predicate vector to a fixed i8
// vector, first bitcast the source.
if (FromTy->getElementType()->isIntegerTy(1) &&
FromTy->getElementCount().isKnownMultipleOf(8) &&
ToTy->getElementType() == CGF.Builder.getInt8Ty()) {
if (!FromTy->getElementCount().isKnownMultipleOf(8)) {
FromTy = llvm::ScalableVectorType::get(
FromTy->getElementType(),
llvm::alignTo<8>(FromTy->getElementCount().getKnownMinValue()));
llvm::Value *ZeroVec = llvm::Constant::getNullValue(FromTy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be poison instead of zero?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure of the semantics of bitcasting poison elements to a larger element type. Does it poison just the bits or the whole element?

Copy link
Collaborator

@paulwalker-arm paulwalker-arm May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LangRef for bitcast says "It is always a no-op cast because no bits change with this conversion.", which suggests any non-poison bits must be preserved.

Copy link
Collaborator Author

@topperc topperc May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the original RFC for poison had this text https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html

 - bitcast between types of different bitwidth can return poison if when
concatenating adjacent values one of these values is poison. For example:
%x = bitcast <3 x i2>  <2, poison, 2> to <2 x i3>
  ->
%x = <poison, poison>

%x = bitcast <6 x i2>  <2, poison, 2, 2, 2, 2> to <4 x i3>
  ->
%x = <poison, poison, 5, 2>

I'm not sure if that's the semantics we ended up with. If those are the semantics then, i1 poison elements that get added to the upper bits would poison the entire i8 element after the bitcast.

@nikic @nunoplopes what are the semantics of bitcasting poison elements to a larger element type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the semantics. A single poison bits taints the whole value.
So when you bitcast a vector to elements with different sizes, you need to account for the lanes that share the original lane.
In summary, you can't combine poison elements into a lane that is meaningful for you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a choice between the LangRef and an old mailing list post then I think the LangRef should win. At least the bitcast part of the LangRef suggests this is not true, which is backed up by https://godbolt.org/z/n67WTcf1v that shows effort is put in to preserving the known bits?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is just a missed optimization. Offering bit-level poison semantics is too complicated. No one managed to come up with a proposal for that. So, we have to go with value-wise poison semantics for the foreseeable future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You say "a missed optimization", I say "it has implemented the LangRef" :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nunoplopes is correct, there is no bitwise poison in LLVM. That particular bit of phrasing in the bitcast docs probably predates the introduction of poison.

V = CGF.Builder.CreateInsertVector(FromTy, ZeroVec, V, uint64_t(0));
}
FromTy = llvm::ScalableVectorType::get(
ToTy->getElementType(),
FromTy->getElementCount().getKnownMinValue() / 8);
Expand Down
21 changes: 17 additions & 4 deletions clang/lib/CodeGen/CGExprScalar.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2492,18 +2492,22 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
// If we are casting a fixed i8 vector to a scalable i1 predicate
// vector, use a vector insert and bitcast the result.
if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
FixedSrcTy->getElementType()->isIntegerTy(8)) {
ScalableDstTy = llvm::ScalableVectorType::get(
FixedSrcTy->getElementType(),
ScalableDstTy->getElementCount().getKnownMinValue() / 8);
llvm::divideCeil(
ScalableDstTy->getElementCount().getKnownMinValue(), 8));
}
if (FixedSrcTy->getElementType() == ScalableDstTy->getElementType()) {
llvm::Value *PoisonVec = llvm::PoisonValue::get(ScalableDstTy);
llvm::Value *Result = Builder.CreateInsertVector(
ScalableDstTy, PoisonVec, Src, uint64_t(0), "cast.scalable");
ScalableDstTy = cast<llvm::ScalableVectorType>(
llvm::VectorType::getWithSizeAndScalar(ScalableDstTy, DstTy));
if (Result->getType() != ScalableDstTy)
Result = Builder.CreateBitCast(Result, ScalableDstTy);
if (Result->getType() != DstTy)
Result = Builder.CreateBitCast(Result, DstTy);
Result = Builder.CreateExtractVector(DstTy, Result, uint64_t(0));
return Result;
}
}
Expand All @@ -2517,8 +2521,17 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
// If we are casting a scalable i1 predicate vector to a fixed i8
// vector, bitcast the source and use a vector extract.
if (ScalableSrcTy->getElementType()->isIntegerTy(1) &&
ScalableSrcTy->getElementCount().isKnownMultipleOf(8) &&
FixedDstTy->getElementType()->isIntegerTy(8)) {
if (!ScalableSrcTy->getElementCount().isKnownMultipleOf(8)) {
ScalableSrcTy = llvm::ScalableVectorType::get(
ScalableSrcTy->getElementType(),
llvm::alignTo<8>(
ScalableSrcTy->getElementCount().getKnownMinValue()));
llvm::Value *ZeroVec = llvm::Constant::getNullValue(ScalableSrcTy);
Src = Builder.CreateInsertVector(ScalableSrcTy, ZeroVec, Src,
uint64_t(0));
}

ScalableSrcTy = llvm::ScalableVectorType::get(
FixedDstTy->getElementType(),
ScalableSrcTy->getElementCount().getKnownMinValue() / 8);
Expand Down
104 changes: 16 additions & 88 deletions clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
Original file line number Diff line number Diff line change
Expand Up @@ -15,24 +15,12 @@ typedef vbool64_t fixed_bool64_t __attribute__((riscv_rvv_vector_bits(__riscv_v_

// CHECK-64-LABEL: @call_bool32_ff(
// CHECK-64-NEXT: entry:
// CHECK-64-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 2)
// CHECK-64-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 2)
// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP2]]
//
// CHECK-128-LABEL: @call_bool32_ff(
// CHECK-128-NEXT: entry:
// CHECK-128-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 4)
// CHECK-128-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 4)
// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP2]]
//
fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {
Expand All @@ -41,24 +29,12 @@ fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {

// CHECK-64-LABEL: @call_bool64_ff(
// CHECK-64-NEXT: entry:
// CHECK-64-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 1)
// CHECK-64-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 1)
// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP2]]
//
// CHECK-128-LABEL: @call_bool64_ff(
// CHECK-128-NEXT: entry:
// CHECK-128-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 2)
// CHECK-128-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 2)
// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP2]]
//
fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {
Expand All @@ -71,51 +47,27 @@ fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {

// CHECK-64-LABEL: @call_bool32_fs(
// CHECK-64-NEXT: entry:
// CHECK-64-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
// CHECK-64-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP2]]
// CHECK-64-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP1]]
//
// CHECK-128-LABEL: @call_bool32_fs(
// CHECK-128-NEXT: entry:
// CHECK-128-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
// CHECK-128-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP2]]
// CHECK-128-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP1]]
//
fixed_bool32_t call_bool32_fs(fixed_bool32_t op1, vbool32_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
}

// CHECK-64-LABEL: @call_bool64_fs(
// CHECK-64-NEXT: entry:
// CHECK-64-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
// CHECK-64-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP2]]
// CHECK-64-NEXT: [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP1]]
//
// CHECK-128-LABEL: @call_bool64_fs(
// CHECK-128-NEXT: entry:
// CHECK-128-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
// CHECK-128-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP2]]
// CHECK-128-NEXT: [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP1]]
//
fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 64);
Expand All @@ -127,51 +79,27 @@ fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {

// CHECK-64-LABEL: @call_bool32_ss(
// CHECK-64-NEXT: entry:
// CHECK-64-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
// CHECK-64-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP2]]
// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP0]]
//
// CHECK-128-LABEL: @call_bool32_ss(
// CHECK-128-NEXT: entry:
// CHECK-128-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
// CHECK-128-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP2]]
// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP0]]
//
fixed_bool32_t call_bool32_ss(vbool32_t op1, vbool32_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
}

// CHECK-64-LABEL: @call_bool64_ss(
// CHECK-64-NEXT: entry:
// CHECK-64-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
// CHECK-64-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP2]]
// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP0]]
//
// CHECK-128-LABEL: @call_bool64_ss(
// CHECK-128-NEXT: entry:
// CHECK-128-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
// CHECK-128-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP2]]
// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP0]]
//
fixed_bool64_t call_bool64_ss(vbool64_t op1, vbool64_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 64);
Expand Down
Loading