[RISCV] Use VP strided load in concat_vectors combine #98131
Conversation
After #98112 and #98111 this should be the last use of riscv_masked_strided_load. The diff is due to vp_load not having the same generic combine for bitcasts, where `(conv (load x)) -> (load (conv*) x)`. I don't think this makes much of a difference on RVV, and it doesn't seem to affect other patterns.
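For context, a minimal, hypothetical sketch of the kind of fold being referenced, i.e. how `(conv (load x)) -> (load (conv*) x)` works for plain ISD::LOAD nodes (the real DAGCombiner code carries many more legality checks; no equivalent exists for ISD::VP_LOAD):

```cpp
// Hypothetical, heavily simplified sketch of the generic
// (bitcast (load x)) -> (load x) fold that plain loads get:
// instead of loading in the source type and bitcasting, load
// directly in the destination type over the same memory operand.
static SDValue combineBitcastOfLoad(SDNode *N, SelectionDAG &DAG) {
  // N is an ISD::BITCAST whose operand may be a load.
  SDValue Src = N->getOperand(0);
  EVT DstVT = N->getValueType(0);
  auto *Ld = dyn_cast<LoadSDNode>(Src);
  if (!Ld || !Ld->isSimple() || !Src.hasOneUse() ||
      DstVT.getSizeInBits() != Src.getValueSizeInBits())
    return SDValue();
  // Load directly in the destination type, reusing the original
  // chain, pointer and MachineMemOperand.
  SDValue NewLd = DAG.getLoad(DstVT, SDLoc(N), Ld->getChain(),
                              Ld->getBasePtr(), Ld->getMemOperand());
  // Rewire chain users of the old load to the new one.
  DAG.ReplaceAllUsesOfValueWith(SDValue(Ld, 1), NewLd.getValue(1));
  return NewLd;
}
```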
@llvm/pr-subscribers-backend-risc-v
Author: Luke Lau (lukel97)
Full diff: https://github.com/llvm/llvm-project/pull/98131.diff — 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 7972b9abc456c..a6f2641b48004 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -16205,18 +16205,10 @@ static SDValue performCONCAT_VECTORSCombine(SDNode *N, SelectionDAG &DAG,
if (MustNegateStride)
Stride = DAG.getNegative(Stride, DL, Stride.getValueType());
- SDVTList VTs = DAG.getVTList({WideVecVT, MVT::Other});
- SDValue IntID =
- DAG.getTargetConstant(Intrinsic::riscv_masked_strided_load, DL,
- Subtarget.getXLenVT());
-
SDValue AllOneMask =
DAG.getSplat(WideVecVT.changeVectorElementType(MVT::i1), DL,
DAG.getConstant(1, DL, MVT::i1));
- SDValue Ops[] = {BaseLd->getChain(), IntID, DAG.getUNDEF(WideVecVT),
- BaseLd->getBasePtr(), Stride, AllOneMask};
-
uint64_t MemSize;
if (auto *ConstStride = dyn_cast<ConstantSDNode>(Stride);
ConstStride && ConstStride->getSExtValue() >= 0)
@@ -16232,8 +16224,11 @@ static SDValue performCONCAT_VECTORSCombine(SDNode *N, SelectionDAG &DAG,
BaseLd->getPointerInfo(), BaseLd->getMemOperand()->getFlags(), MemSize,
Align);
- SDValue StridedLoad = DAG.getMemIntrinsicNode(ISD::INTRINSIC_W_CHAIN, DL, VTs,
- Ops, WideVecVT, MMO);
+ SDValue StridedLoad = DAG.getStridedLoadVP(
+ WideVecVT, DL, BaseLd->getChain(), BaseLd->getBasePtr(), Stride,
+ AllOneMask,
+ DAG.getConstant(N->getNumOperands(), DL, Subtarget.getXLenVT()), MMO);
+
for (SDValue Ld : N->ops())
DAG.makeEquivalentMemoryOrdering(cast<LoadSDNode>(Ld), StridedLoad);
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll
index 0e1105848440a..cdf0d35843620 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll
@@ -9,9 +9,9 @@
define void @widen_2xv4i16(ptr %x, ptr %z) {
; CHECK-LABEL: widen_2xv4i16:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; CHECK-NEXT: vle16.v v8, (a0)
-; CHECK-NEXT: vse16.v v8, (a1)
+; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; CHECK-NEXT: vle64.v v8, (a0)
+; CHECK-NEXT: vse64.v v8, (a1)
; CHECK-NEXT: ret
%a = load <4 x i16>, ptr %x
%b.gep = getelementptr i8, ptr %x, i64 8
@@ -52,9 +52,9 @@ define void @widen_3xv4i16(ptr %x, ptr %z) {
define void @widen_4xv4i16(ptr %x, ptr %z) {
; CHECK-LABEL: widen_4xv4i16:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 16, e16, m2, ta, ma
-; CHECK-NEXT: vle16.v v8, (a0)
-; CHECK-NEXT: vse16.v v8, (a1)
+; CHECK-NEXT: vsetivli zero, 4, e64, m2, ta, ma
+; CHECK-NEXT: vle64.v v8, (a0)
+; CHECK-NEXT: vse64.v v8, (a1)
; CHECK-NEXT: ret
%a = load <4 x i16>, ptr %x
%b.gep = getelementptr i8, ptr %x, i64 8
@@ -90,9 +90,9 @@ define void @widen_4xv4i16_unaligned(ptr %x, ptr %z) {
;
; RV64-MISALIGN-LABEL: widen_4xv4i16_unaligned:
; RV64-MISALIGN: # %bb.0:
-; RV64-MISALIGN-NEXT: vsetivli zero, 16, e16, m2, ta, ma
-; RV64-MISALIGN-NEXT: vle16.v v8, (a0)
-; RV64-MISALIGN-NEXT: vse16.v v8, (a1)
+; RV64-MISALIGN-NEXT: vsetivli zero, 4, e64, m2, ta, ma
+; RV64-MISALIGN-NEXT: vle64.v v8, (a0)
+; RV64-MISALIGN-NEXT: vse64.v v8, (a1)
; RV64-MISALIGN-NEXT: ret
%a = load <4 x i16>, ptr %x, align 1
%b.gep = getelementptr i8, ptr %x, i64 8
; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vle16.v v8, (a0)
; CHECK-NEXT: vse16.v v8, (a1)
; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
This may cause an unaligned access error?
The combine checks the alignment with TLI.isLegalStridedLoadStore(WideVecVT, Align), so I think this should be fine. This is what it was originally combining to anyway; the generic DAG combine was just re-combining it to a smaller SEW.
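For readers following the thread, the guard sits earlier in performCONCAT_VECTORSCombine; roughly this shape (a paraphrased sketch using variable names from the diff above, not the verbatim code):

```cpp
// Simplified sketch of the legality check: bail out of the combine
// unless the target reports that a strided load/store of the wide
// vector type at this alignment is legal.
Align Alignment = BaseLd->getAlign();
if (!TLI.isLegalStridedLoadStore(WideVecVT, Alignment))
  return SDValue();
```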
So the alignment of `<4 x i16>` is 64 bits, IIUC? Then this makes sense to me.
LGTM.
…98111) RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after #98131 and #98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path. riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes the fixed or scalable element count in as the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after #97800. For loads we also use a vp_select to get passthru (mask undisturbed) behaviour.
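A rough sketch of what that IR-level lowering could look like, assuming placeholder names (emitStridedVPLoad, InsertPt, BasePtr, Stride, Mask, PassThru are illustrative, not the patch's actual code):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Hedged sketch, not the actual patch: emit
// llvm.experimental.vp.strided.load with the full element count as
// the EVL (the removed intrinsic was implicitly VLMAX), then select
// on the mask to recover the passthru (mask-undisturbed) semantics.
static Value *emitStridedVPLoad(Instruction *InsertPt, Value *BasePtr,
                                Value *Stride, Value *Mask,
                                Value *PassThru) {
  IRBuilder<> Builder(InsertPt);
  auto *VecTy = cast<VectorType>(PassThru->getType());
  // Fixed or scalable element count as the EVL; RISCVVectorPeephole
  // can later fold this back to VLMAX.
  Value *EVL = Builder.CreateElementCount(Builder.getInt32Ty(),
                                          VecTy->getElementCount());
  CallInst *VPLoad = Builder.CreateIntrinsic(
      Intrinsic::experimental_vp_strided_load,
      {VecTy, BasePtr->getType(), Stride->getType()},
      {BasePtr, Stride, Mask, EVL});
  // Masked-off lanes take the passthru value (mask undisturbed).
  return Builder.CreateSelect(Mask, VPLoad, PassThru);
}
```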