
[RISCV] Use VP strided load in concat_vectors combine #98131


Merged
lukel97 merged 1 commit into llvm:main on Jul 9, 2024

Conversation

lukel97 (Contributor) commented Jul 9, 2024

After #98112 and #98111, this should be the last use of riscv_masked_strided_load.

The diff is due to vp_load not having the same generic combine for bitcasts, where `(conv (load x)) -> (load (conv*)x)`. I don't think this makes much of a difference on RVV, and it doesn't seem to affect other patterns.
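For illustration, here is a minimal sketch of the kind of generic fold being described, with the legality and alignment checks that the real DAGCombiner code performs omitted; the names `N` and `DAG` follow common SelectionDAG conventions and are not taken from this patch:

```cpp
// Sketch only: fold (conv (load x)) -> (load (conv*)x), i.e. absorb a
// bitconvert into the one-use load it wraps. vp_load nodes are not
// LoadSDNodes, so a fold written like this never fires on them.
if (N->getOpcode() == ISD::BITCAST) {
  SDValue Src = N->getOperand(0);
  if (auto *LD = dyn_cast<LoadSDNode>(Src); LD && LD->hasOneUse()) {
    SDValue NewLoad =
        DAG.getLoad(N->getValueType(0), SDLoc(N), LD->getChain(),
                    LD->getBasePtr(), LD->getMemOperand());
    // Re-point chain users of the old load at the new one.
    DAG.ReplaceAllUsesOfValueWith(SDValue(LD, 1), NewLoad.getValue(1));
    return NewLoad;
  }
}
```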

@lukel97 lukel97 requested review from topperc, wangpc-pp and yetingk July 9, 2024 08:45
@lukel97 lukel97 requested a review from preames July 9, 2024 08:45
llvmbot (Member) commented Jul 9, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)


Full diff: https://github.com/llvm/llvm-project/pull/98131.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+5-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll (+9-9)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 7972b9abc456c..a6f2641b48004 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -16205,18 +16205,10 @@ static SDValue performCONCAT_VECTORSCombine(SDNode *N, SelectionDAG &DAG,
   if (MustNegateStride)
     Stride = DAG.getNegative(Stride, DL, Stride.getValueType());
 
-  SDVTList VTs = DAG.getVTList({WideVecVT, MVT::Other});
-  SDValue IntID =
-    DAG.getTargetConstant(Intrinsic::riscv_masked_strided_load, DL,
-                          Subtarget.getXLenVT());
-
   SDValue AllOneMask =
     DAG.getSplat(WideVecVT.changeVectorElementType(MVT::i1), DL,
                  DAG.getConstant(1, DL, MVT::i1));
 
-  SDValue Ops[] = {BaseLd->getChain(),   IntID,  DAG.getUNDEF(WideVecVT),
-                   BaseLd->getBasePtr(), Stride, AllOneMask};
-
   uint64_t MemSize;
   if (auto *ConstStride = dyn_cast<ConstantSDNode>(Stride);
       ConstStride && ConstStride->getSExtValue() >= 0)
@@ -16232,8 +16224,11 @@ static SDValue performCONCAT_VECTORSCombine(SDNode *N, SelectionDAG &DAG,
       BaseLd->getPointerInfo(), BaseLd->getMemOperand()->getFlags(), MemSize,
       Align);
 
-  SDValue StridedLoad = DAG.getMemIntrinsicNode(ISD::INTRINSIC_W_CHAIN, DL, VTs,
-                                                Ops, WideVecVT, MMO);
+  SDValue StridedLoad = DAG.getStridedLoadVP(
+      WideVecVT, DL, BaseLd->getChain(), BaseLd->getBasePtr(), Stride,
+      AllOneMask,
+      DAG.getConstant(N->getNumOperands(), DL, Subtarget.getXLenVT()), MMO);
+
   for (SDValue Ld : N->ops())
     DAG.makeEquivalentMemoryOrdering(cast<LoadSDNode>(Ld), StridedLoad);
 
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll
index 0e1105848440a..cdf0d35843620 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll
@@ -9,9 +9,9 @@
 define void @widen_2xv4i16(ptr %x, ptr %z) {
 ; CHECK-LABEL: widen_2xv4i16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; CHECK-NEXT:    vle16.v v8, (a0)
-; CHECK-NEXT:    vse16.v v8, (a1)
+; CHECK-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; CHECK-NEXT:    vle64.v v8, (a0)
+; CHECK-NEXT:    vse64.v v8, (a1)
 ; CHECK-NEXT:    ret
   %a = load <4 x i16>, ptr %x
   %b.gep = getelementptr i8, ptr %x, i64 8
@@ -52,9 +52,9 @@ define void @widen_3xv4i16(ptr %x, ptr %z) {
 define void @widen_4xv4i16(ptr %x, ptr %z) {
 ; CHECK-LABEL: widen_4xv4i16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
-; CHECK-NEXT:    vle16.v v8, (a0)
-; CHECK-NEXT:    vse16.v v8, (a1)
+; CHECK-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
+; CHECK-NEXT:    vle64.v v8, (a0)
+; CHECK-NEXT:    vse64.v v8, (a1)
 ; CHECK-NEXT:    ret
   %a = load <4 x i16>, ptr %x
   %b.gep = getelementptr i8, ptr %x, i64 8
@@ -90,9 +90,9 @@ define void @widen_4xv4i16_unaligned(ptr %x, ptr %z) {
 ;
 ; RV64-MISALIGN-LABEL: widen_4xv4i16_unaligned:
 ; RV64-MISALIGN:       # %bb.0:
-; RV64-MISALIGN-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
-; RV64-MISALIGN-NEXT:    vle16.v v8, (a0)
-; RV64-MISALIGN-NEXT:    vse16.v v8, (a1)
+; RV64-MISALIGN-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
+; RV64-MISALIGN-NEXT:    vle64.v v8, (a0)
+; RV64-MISALIGN-NEXT:    vse64.v v8, (a1)
 ; RV64-MISALIGN-NEXT:    ret
   %a = load <4 x i16>, ptr %x, align 1
   %b.gep = getelementptr i8, ptr %x, i64 8

; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vle16.v v8, (a0)
; CHECK-NEXT: vse16.v v8, (a1)
; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
Contributor
Could this cause an unaligned access error?

lukel97 (Contributor, Author) commented Jul 9, 2024

The combine checks the alignment with `TLI.isLegalStridedLoadStore(WideVecVT, Align)`, so I think this should be fine. This is what it was originally combining to anyway; the generic DAG combine was just re-combining it to a smaller SEW.
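For reference, a minimal sketch of the guard lukel97 refers to, condensed and with variable names matching the diff above (`BaseLd`, `WideVecVT`); the exact placement inside performCONCAT_VECTORSCombine may differ:

```cpp
// Sketch only: bail out of the combine unless the target reports a
// strided load/store of the widened type as legal at the base load's
// alignment, so no unaligned wide access is ever formed.
Align Alignment = BaseLd->getAlign();
if (!TLI.isLegalStridedLoadStore(WideVecVT, Alignment))
  return SDValue();
```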

Contributor
So the alignment of <4 x i16> is 64 bits, IIUC? Then this makes sense to me.

wangpc-pp (Contributor) left a comment

LGTM.


@lukel97 lukel97 merged commit 19cc461 into llvm:main Jul 9, 2024
7 of 9 checks passed
lukel97 added a commit that referenced this pull request Jul 24, 2024

…98111)

RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after #98131 and #98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path.

riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes the fixed or scalable element count in as the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after #97800. For loads we also use a vp_select to get passthru (mask-undisturbed) behaviour.
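A minimal sketch of the EVL construction described above (assumed names such as `VT` and `DL`; not the literal patch):

```cpp
// Sketch only: VP strided ops take an explicit EVL operand, so the old
// implicit-VLMAX behaviour is expressed by passing the type's full
// element count.
SDValue EVL = DAG.getElementCount(DL, Subtarget.getXLenVT(),
                                  VT.getVectorElementCount());
// For a fixed-length VT this folds to a plain constant; for a scalable
// VT it is a vscale-based value that RISCVVectorPeephole can later
// rewrite as VLMAX (vsetvli with x0).
```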