Skip to content

[AArch64][GlobalISel] Adopt some Ld1Lane* patterns for GlobalISel to reduce codegen regressions #69607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions llvm/lib/CodeGen/GlobalISel/GIMatchTableExecutor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"

#define DEBUG_TYPE "gi-match-table-executor"

Expand Down Expand Up @@ -62,9 +63,16 @@ bool GIMatchTableExecutor::isBaseWithConstantOffset(
bool GIMatchTableExecutor::isObviouslySafeToFold(MachineInstr &MI,
MachineInstr &IntoMI) const {
// Immediate neighbours are already folded.
if (MI.getParent() == IntoMI.getParent() &&
std::next(MI.getIterator()) == IntoMI.getIterator())
return true;
// Any G_CONSTANT between immediate neighbours can be ignored.
if (MI.getParent() == IntoMI.getParent()) {
auto IntoIt = IntoMI.getIterator();
auto NextIt = std::next(MI.getIterator());
while (!NextIt.isEnd() && NextIt != IntoIt &&
NextIt->getOpcode() == TargetOpcode::G_CONSTANT)
++NextIt;
if (NextIt == IntoIt)
return true;
}
Comment on lines +66 to +75
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example of what this does? Like a code pattern that would be rejected without this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before InstructionSelect, test_v4i16_post_reg_ld1lane function from llvm/test/CodeGen/AArch64/arm64-indexed-vector-ldst.ll contains this fragment:

  %4:fpr(s16) = G_LOAD %0:gpr(p0) :: (load (s16) from %ir.bar)
  %6:gpr(s32) = G_CONSTANT i32 1
  %5:fpr(<4 x s16>) = G_INSERT_VECTOR_ELT %3:fpr, %4:fpr(s16), %6:gpr(s32)

G_CONSTANT between instructions prevented Ld1Lane64PatGISel from being applied to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there's a more general solution somewhere but I can't think of it at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there's a more general solution somewhere but I can't think of it at the moment.

Do you consider this credible? #70830


// Convergent instructions cannot be moved in the CFG.
if (MI.isConvergent() && MI.getParent() != IntoMI.getParent())
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/Target/AArch64/AArch64InstrFormats.td
Original file line number Diff line number Diff line change
Expand Up @@ -484,6 +484,9 @@ def UImmS8XForm : SDNodeXForm<imm, [{
return CurDAG->getTargetConstant(N->getZExtValue() / 8, SDLoc(N), MVT::i64);
}]>;

def gi_UImmS1XForm : GICustomOperandRenderer<"renderUImmS1">,
GISDNodeXFormEquiv<UImmS1XForm>;

// uimm5sN predicate - True if the immediate is a multiple of N in the range
// [0 * N, 32 * N].
def UImm5s2Operand : UImmScaledMemoryIndexed<5, 2>;
Expand Down
36 changes: 36 additions & 0 deletions llvm/lib/Target/AArch64/AArch64InstrGISel.td
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,7 @@ def : GINodeEquiv<G_UMULL, AArch64umull>;
def : GINodeEquiv<G_SMULL, AArch64smull>;

def : GINodeEquiv<G_EXTRACT_VECTOR_ELT, vector_extract>;
def : GINodeEquiv<G_INSERT_VECTOR_ELT, insertelt>;

def : GINodeEquiv<G_PREFETCH, AArch64Prefetch>;

Expand Down Expand Up @@ -528,3 +529,38 @@ def : Pat<(v2i64 (AArch64dup (i64 (load GPR64sp:$Rn)))),
(LD1Rv2d GPR64sp:$Rn)>;
def : Pat<(v1i64 (AArch64dup (i64 (load GPR64sp:$Rn)))),
(LD1Rv1d GPR64sp:$Rn)>;

class Ld1Lane64PatGISel<SDPatternOperator scalar_load, Operand VecIndex,
ValueType VTy, ValueType STy, Instruction LD1>
: Pat<(insertelt (VTy VecListOne64:$Rd),
(STy (scalar_load GPR64sp:$Rn)), VecIndex:$idx),
(EXTRACT_SUBREG
(LD1 (SUBREG_TO_REG (i32 0), VecListOne64:$Rd, dsub),
(UImmS1XForm VecIndex:$idx), GPR64sp:$Rn),
dsub)>;

class Ld1Lane128PatGISel<Operand VecIndex, ValueType VTy,
ValueType STy, Instruction LD1>
: Pat<(insertelt (VTy VecListOne128:$Rd),
(STy (load GPR64sp:$Rn)), VecIndex:$idx),
(LD1 VecListOne128:$Rd, (UImmS1XForm VecIndex:$idx), GPR64sp:$Rn)>;

// Enable these patterns only for GlobalISel, since
// SelectionDAG analogues only select insertelt with i32 indices.
let Predicates = [OnlyGISel] in {
def : Ld1Lane64PatGISel<load, VectorIndexB, v8i8, i8, LD1i8>;
def : Ld1Lane64PatGISel<load, VectorIndexB32b, v8i8, i8, LD1i8>;
def : Ld1Lane64PatGISel<load, VectorIndexH, v4i16, i16, LD1i16>;
def : Ld1Lane64PatGISel<load, VectorIndexH32b, v4i16, i16, LD1i16>;
def : Ld1Lane64PatGISel<load, VectorIndexS, v2i32, i32, LD1i32>;
def : Ld1Lane64PatGISel<load, VectorIndexS32b, v2i32, i32, LD1i32>;

def : Ld1Lane128PatGISel<VectorIndexB, v16i8, i8, LD1i8>;
def : Ld1Lane128PatGISel<VectorIndexB32b, v16i8, i8, LD1i8>;
def : Ld1Lane128PatGISel<VectorIndexH, v8i16, i16, LD1i16>;
def : Ld1Lane128PatGISel<VectorIndexH32b, v8i16, i16, LD1i16>;
def : Ld1Lane128PatGISel<VectorIndexH, v4i32, i32, LD1i32>;
def : Ld1Lane128PatGISel<VectorIndexH32b, v4i32, i32, LD1i32>;
def : Ld1Lane128PatGISel<VectorIndexH, v2i64, i64, LD1i64>;
def : Ld1Lane128PatGISel<VectorIndexH32b, v2i64, i64, LD1i64>;
}
2 changes: 2 additions & 0 deletions llvm/lib/Target/AArch64/AArch64InstrInfo.td
Original file line number Diff line number Diff line change
Expand Up @@ -872,6 +872,8 @@ let RecomputePerFunction = 1 in {

def SLSBLRMitigation : Predicate<[{ MF->getSubtarget<AArch64Subtarget>().hardenSlsBlr() }]>;
def NoSLSBLRMitigation : Predicate<[{ !MF->getSubtarget<AArch64Subtarget>().hardenSlsBlr() }]>;

def OnlyGISel : Predicate<"!MF->getProperties().hasProperty(MachineFunctionProperties::Property::FailedISel) && MF->getProperties().hasProperty(MachineFunctionProperties::Property::Legalized)">;
// Toggles patterns which aren't beneficial in GlobalISel when we aren't
// optimizing. This allows us to selectively use patterns without impacting
// SelectionDAG's behaviour.
Expand Down
11 changes: 11 additions & 0 deletions llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -479,6 +479,9 @@ class AArch64InstructionSelector : public InstructionSelector {
const MachineInstr &MI,
int OpIdx = -1) const;

void renderUImmS1(MachineInstrBuilder &MIB, const MachineInstr &MI,
int OpIdx = -1) const;

// Materialize a GlobalValue or BlockAddress using a movz+movk sequence.
void materializeLargeCMVal(MachineInstr &I, const Value *V, unsigned OpFlags);

Expand Down Expand Up @@ -7593,6 +7596,14 @@ void AArch64InstructionSelector::renderFPImm32SIMDModImmType4(
.getZExtValue()));
}

void AArch64InstructionSelector::renderUImmS1(MachineInstrBuilder &MIB,
const MachineInstr &MI,
int OpIdx) const {
assert(MI.getOpcode() == TargetOpcode::G_CONSTANT && OpIdx == -1 &&
"Expected G_CONSTANT");
MIB.addImm(MI.getOperand(1).getCImm()->getZExtValue());
}

bool AArch64InstructionSelector::isLoadStoreOfNumBytes(
const MachineInstr &MI, unsigned NumBytes) const {
if (!MI.mayLoadOrStore())
Expand Down
80 changes: 29 additions & 51 deletions llvm/test/CodeGen/AArch64/arm64-indexed-vector-ldst.ll
Original file line number Diff line number Diff line change
Expand Up @@ -14283,10 +14283,9 @@ define <16 x i8> @test_v16i8_post_imm_ld1lane(ptr %bar, ptr %ptr, <16 x i8> %A)
;
; CHECK-GISEL-LABEL: test_v16i8_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr b1, [x0]
; CHECK-GISEL-NEXT: ld1.b { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, #1
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.b v0[1], v1[0]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i8, ptr %bar
%tmp2 = insertelement <16 x i8> %A, i8 %tmp1, i32 1
Expand All @@ -14304,10 +14303,9 @@ define <16 x i8> @test_v16i8_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <16
;
; CHECK-GISEL-LABEL: test_v16i8_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr b1, [x0]
; CHECK-GISEL-NEXT: ld1.b { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, x2
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.b v0[1], v1[0]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i8, ptr %bar
%tmp2 = insertelement <16 x i8> %A, i8 %tmp1, i32 1
Expand All @@ -14327,11 +14325,10 @@ define <8 x i8> @test_v8i8_post_imm_ld1lane(ptr %bar, ptr %ptr, <8 x i8> %A) {
;
; CHECK-GISEL-LABEL: test_v8i8_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr b1, [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 def $q0
; CHECK-GISEL-NEXT: add x8, x0, #1
; CHECK-GISEL-NEXT: ld1.b { v0 }[1], [x0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.b v0[1], v1[0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: ret
%tmp1 = load i8, ptr %bar
Expand All @@ -14352,11 +14349,10 @@ define <8 x i8> @test_v8i8_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <8 x i
;
; CHECK-GISEL-LABEL: test_v8i8_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr b1, [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 def $q0
; CHECK-GISEL-NEXT: add x8, x0, x2
; CHECK-GISEL-NEXT: ld1.b { v0 }[1], [x0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.b v0[1], v1[0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: ret
%tmp1 = load i8, ptr %bar
Expand All @@ -14375,10 +14371,9 @@ define <8 x i16> @test_v8i16_post_imm_ld1lane(ptr %bar, ptr %ptr, <8 x i16> %A)
;
; CHECK-GISEL-LABEL: test_v8i16_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr h1, [x0]
; CHECK-GISEL-NEXT: ld1.h { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, #2
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.h v0[1], v1[0]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i16, ptr %bar
%tmp2 = insertelement <8 x i16> %A, i16 %tmp1, i32 1
Expand All @@ -14397,9 +14392,8 @@ define <8 x i16> @test_v8i16_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <8 x
;
; CHECK-GISEL-LABEL: test_v8i16_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr h1, [x0]
; CHECK-GISEL-NEXT: ld1.h { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, x2, lsl #1
; CHECK-GISEL-NEXT: mov.h v0[1], v1[0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i16, ptr %bar
Expand All @@ -14420,11 +14414,10 @@ define <4 x i16> @test_v4i16_post_imm_ld1lane(ptr %bar, ptr %ptr, <4 x i16> %A)
;
; CHECK-GISEL-LABEL: test_v4i16_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr h1, [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 def $q0
; CHECK-GISEL-NEXT: add x8, x0, #2
; CHECK-GISEL-NEXT: ld1.h { v0 }[1], [x0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.h v0[1], v1[0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: ret
%tmp1 = load i16, ptr %bar
Expand All @@ -14446,12 +14439,11 @@ define <4 x i16> @test_v4i16_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <4 x
;
; CHECK-GISEL-LABEL: test_v4i16_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr h1, [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 def $q0
; CHECK-GISEL-NEXT: add x8, x0, x2, lsl #1
; CHECK-GISEL-NEXT: mov.h v0[1], v1[0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ld1.h { v0 }[1], [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i16, ptr %bar
%tmp2 = insertelement <4 x i16> %A, i16 %tmp1, i32 1
Expand All @@ -14469,10 +14461,9 @@ define <4 x i32> @test_v4i32_post_imm_ld1lane(ptr %bar, ptr %ptr, <4 x i32> %A)
;
; CHECK-GISEL-LABEL: test_v4i32_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr s1, [x0]
; CHECK-GISEL-NEXT: ld1.s { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, #4
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.s v0[1], v1[0]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i32, ptr %bar
%tmp2 = insertelement <4 x i32> %A, i32 %tmp1, i32 1
Expand All @@ -14491,9 +14482,8 @@ define <4 x i32> @test_v4i32_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <4 x
;
; CHECK-GISEL-LABEL: test_v4i32_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr s1, [x0]
; CHECK-GISEL-NEXT: ld1.s { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, x2, lsl #2
; CHECK-GISEL-NEXT: mov.s v0[1], v1[0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i32, ptr %bar
Expand All @@ -14514,11 +14504,10 @@ define <2 x i32> @test_v2i32_post_imm_ld1lane(ptr %bar, ptr %ptr, <2 x i32> %A)
;
; CHECK-GISEL-LABEL: test_v2i32_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr s1, [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 def $q0
; CHECK-GISEL-NEXT: add x8, x0, #4
; CHECK-GISEL-NEXT: ld1.s { v0 }[1], [x0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.s v0[1], v1[0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: ret
%tmp1 = load i32, ptr %bar
Expand All @@ -14540,12 +14529,11 @@ define <2 x i32> @test_v2i32_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <2 x
;
; CHECK-GISEL-LABEL: test_v2i32_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr s1, [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 def $q0
; CHECK-GISEL-NEXT: add x8, x0, x2, lsl #2
; CHECK-GISEL-NEXT: mov.s v0[1], v1[0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ld1.s { v0 }[1], [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i32, ptr %bar
%tmp2 = insertelement <2 x i32> %A, i32 %tmp1, i32 1
Expand All @@ -14563,10 +14551,9 @@ define <2 x i64> @test_v2i64_post_imm_ld1lane(ptr %bar, ptr %ptr, <2 x i64> %A)
;
; CHECK-GISEL-LABEL: test_v2i64_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr d1, [x0]
; CHECK-GISEL-NEXT: ld1.d { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, #8
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.d v0[1], v1[0]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i64, ptr %bar
%tmp2 = insertelement <2 x i64> %A, i64 %tmp1, i32 1
Expand All @@ -14585,9 +14572,8 @@ define <2 x i64> @test_v2i64_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <2 x
;
; CHECK-GISEL-LABEL: test_v2i64_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr d1, [x0]
; CHECK-GISEL-NEXT: ld1.d { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, x2, lsl #3
; CHECK-GISEL-NEXT: mov.d v0[1], v1[0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i64, ptr %bar
Expand All @@ -14606,10 +14592,9 @@ define <4 x float> @test_v4f32_post_imm_ld1lane(ptr %bar, ptr %ptr, <4 x float>
;
; CHECK-GISEL-LABEL: test_v4f32_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr s1, [x0]
; CHECK-GISEL-NEXT: ld1.s { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, #4
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.s v0[1], v1[0]
; CHECK-GISEL-NEXT: ret
%tmp1 = load float, ptr %bar
%tmp2 = insertelement <4 x float> %A, float %tmp1, i32 1
Expand All @@ -14628,9 +14613,8 @@ define <4 x float> @test_v4f32_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <4
;
; CHECK-GISEL-LABEL: test_v4f32_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr s1, [x0]
; CHECK-GISEL-NEXT: ld1.s { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, x2, lsl #2
; CHECK-GISEL-NEXT: mov.s v0[1], v1[0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ret
%tmp1 = load float, ptr %bar
Expand All @@ -14651,11 +14635,10 @@ define <2 x float> @test_v2f32_post_imm_ld1lane(ptr %bar, ptr %ptr, <2 x float>
;
; CHECK-GISEL-LABEL: test_v2f32_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr s1, [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 def $q0
; CHECK-GISEL-NEXT: add x8, x0, #4
; CHECK-GISEL-NEXT: ld1.s { v0 }[1], [x0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.s v0[1], v1[0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: ret
%tmp1 = load float, ptr %bar
Expand All @@ -14677,12 +14660,11 @@ define <2 x float> @test_v2f32_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <2
;
; CHECK-GISEL-LABEL: test_v2f32_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr s1, [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 def $q0
; CHECK-GISEL-NEXT: add x8, x0, x2, lsl #2
; CHECK-GISEL-NEXT: mov.s v0[1], v1[0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ld1.s { v0 }[1], [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ret
%tmp1 = load float, ptr %bar
%tmp2 = insertelement <2 x float> %A, float %tmp1, i32 1
Expand All @@ -14700,10 +14682,9 @@ define <2 x double> @test_v2f64_post_imm_ld1lane(ptr %bar, ptr %ptr, <2 x double
;
; CHECK-GISEL-LABEL: test_v2f64_post_imm_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr d1, [x0]
; CHECK-GISEL-NEXT: ld1.d { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, #8
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.d v0[1], v1[0]
; CHECK-GISEL-NEXT: ret
%tmp1 = load double, ptr %bar
%tmp2 = insertelement <2 x double> %A, double %tmp1, i32 1
Expand All @@ -14722,9 +14703,8 @@ define <2 x double> @test_v2f64_post_reg_ld1lane(ptr %bar, ptr %ptr, i64 %inc, <
;
; CHECK-GISEL-LABEL: test_v2f64_post_reg_ld1lane:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr d1, [x0]
; CHECK-GISEL-NEXT: ld1.d { v0 }[1], [x0]
; CHECK-GISEL-NEXT: add x8, x0, x2, lsl #3
; CHECK-GISEL-NEXT: mov.d v0[1], v1[0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: ret
%tmp1 = load double, ptr %bar
Expand Down Expand Up @@ -14779,15 +14759,14 @@ define <4 x i16> @test_v4i16_post_reg_ld1lane_forced_narrow(ptr %bar, ptr %ptr,
; CHECK-GISEL-LABEL: test_v4i16_post_reg_ld1lane_forced_narrow:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: add x8, x0, x2, lsl #1
; CHECK-GISEL-NEXT: ldr h1, [x0]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 def $q0
; CHECK-GISEL-NEXT: ld1.h { v0 }[1], [x0]
; CHECK-GISEL-NEXT: str x8, [x1]
; CHECK-GISEL-NEXT: mov.h v0[1], v1[0]
; CHECK-GISEL-NEXT: ldr d2, [x3]
; CHECK-GISEL-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: cnt.8b v2, v2
; CHECK-GISEL-NEXT: uaddlp.4h v2, v2
; CHECK-GISEL-NEXT: uaddlp.2s v1, v2
; CHECK-GISEL-NEXT: ldr d1, [x3]
; CHECK-GISEL-NEXT: cnt.8b v1, v1
; CHECK-GISEL-NEXT: uaddlp.4h v1, v1
; CHECK-GISEL-NEXT: uaddlp.2s v1, v1
; CHECK-GISEL-NEXT: str d1, [x3]
; CHECK-GISEL-NEXT: ret
%tmp1 = load i16, ptr %bar
Expand Down Expand Up @@ -14987,9 +14966,8 @@ define <4 x i32> @test_inc_cycle(<4 x i32> %vec, ptr %in) {
;
; CHECK-GISEL-LABEL: test_inc_cycle:
; CHECK-GISEL: ; %bb.0:
; CHECK-GISEL-NEXT: ldr s1, [x0]
; CHECK-GISEL-NEXT: ld1.s { v0 }[0], [x0]
; CHECK-GISEL-NEXT: adrp x9, _var@PAGE
; CHECK-GISEL-NEXT: mov.s v0[0], v1[0]
; CHECK-GISEL-NEXT: fmov x8, d0
; CHECK-GISEL-NEXT: add x8, x0, x8, lsl #2
; CHECK-GISEL-NEXT: str x8, [x9, _var@PAGEOFF]
Expand Down
Loading