-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[RISCV] Fix incorrect slide offset when using vnsrl to de-interleave #132123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-risc-v Author: Min-Yih Hsu (mshockwave) ChangesGiven this shuffle:
#127272 lowers it with a bunch of vnsrl. If we describe the result in terms of the shuffle mask, we expect:
but we actually got:
for factor larger than 2. This is caused by Fix #132071 Full diff: https://github.com/llvm/llvm-project/pull/132123.diff 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 48d8fc23dc1bb..4a0de99428605 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -5545,12 +5545,25 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
})) {
// Narrow each source and concatenate them.
// FIXME: For small LMUL it is better to concatenate first.
- MVT HalfVT = VT.getHalfNumVectorElementsVT();
+ MVT EltVT = VT.getVectorElementType();
+ auto EltCnt = VT.getVectorElementCount();
+ MVT SubVT =
+ MVT::getVectorVT(EltVT, EltCnt.divideCoefficientBy(Factor));
+
SDValue Lo =
- getDeinterleaveShiftAndTrunc(DL, HalfVT, V1, Factor, Index, DAG);
+ getDeinterleaveShiftAndTrunc(DL, SubVT, V1, Factor, Index, DAG);
SDValue Hi =
- getDeinterleaveShiftAndTrunc(DL, HalfVT, V2, Factor, Index, DAG);
- return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Lo, Hi);
+ getDeinterleaveShiftAndTrunc(DL, SubVT, V2, Factor, Index, DAG);
+
+ MVT NewVT = SubVT.getDoubleNumVectorElementsVT();
+ SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, DL, NewVT, Lo, Hi);
+ for (unsigned F = Factor; F > 2; F >>= 1) {
+ SDValue Undef = DAG.getUNDEF(NewVT);
+ NewVT = NewVT.getDoubleNumVectorElementsVT();
+ Concat = DAG.getNode(ISD::CONCAT_VECTORS, DL, NewVT, Concat, Undef);
+ }
+
+ return Concat;
}
}
}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll
index 5e6d7c1eedb76..0b4231cedcab5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll
@@ -378,8 +378,9 @@ define void @deinterleave4_0_i8_two_source(ptr %in0, ptr %in1, ptr %out) {
; CHECK-NEXT: vsetvli zero, zero, e8, mf8, ta, ma
; CHECK-NEXT: vnsrl.wi v8, v8, 0
; CHECK-NEXT: vnsrl.wi v9, v9, 0
+; CHECK-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT: vslideup.vi v9, v8, 2
; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vslideup.vi v9, v8, 4
; CHECK-NEXT: vse8.v v9, (a2)
; CHECK-NEXT: ret
entry:
@@ -402,8 +403,9 @@ define void @deinterleave4_8_i8_two_source(ptr %in0, ptr %in1, ptr %out) {
; CHECK-NEXT: vsetvli zero, zero, e8, mf8, ta, ma
; CHECK-NEXT: vnsrl.wi v8, v8, 0
; CHECK-NEXT: vnsrl.wi v9, v9, 0
+; CHECK-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT: vslideup.vi v9, v8, 2
; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vslideup.vi v9, v8, 4
; CHECK-NEXT: vse8.v v9, (a2)
; CHECK-NEXT: ret
entry:
|
for (unsigned F = Factor; F > 2; F >>= 1) { | ||
SDValue Undef = DAG.getUNDEF(NewVT); | ||
NewVT = NewVT.getDoubleNumVectorElementsVT(); | ||
Concat = DAG.getNode(ISD::CONCAT_VECTORS, DL, NewVT, Concat, Undef); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we just slowly building an INSERT_SUBVECTOR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we just slowly building an INSERT_SUBVECTOR?
The CONCAT_VECTORS lowering code does a similar thing on CONCAT_VECTORS with more than 2 operand which turns it into a tree of CONCAT_VECTORS. But I don't think this tree of CONCAT_VECTORS will become a single INSERT_SUBVECTOR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use INSERT_SUBVECTOR here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use INSERT_SUBVECTOR here?
Using INSERT_SUBVECTOR is indeed simpler. It's done know.
@@ -378,7 +378,7 @@ define void @deinterleave4_0_i8_two_source(ptr %in0, ptr %in1, ptr %out) { | |||
; CHECK-NEXT: vsetvli zero, zero, e8, mf8, ta, ma | |||
; CHECK-NEXT: vnsrl.wi v8, v8, 0 | |||
; CHECK-NEXT: vnsrl.wi v9, v9, 0 | |||
; CHECK-NEXT: vsetivli zero, 4, e8, mf4, ta, ma | |||
; CHECK-NEXT: vsetivli zero, 4, e8, mf2, tu, ma |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to CONCAT_VECTORS the Lo and Hi using a 2x type, then widen the rest of the way with INSERT_SUBVECTOR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's better (no tu). It's fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Given this shuffle:
#127272 lowers it with a bunch of vnsrl. If we describe the result in terms of the shuffle mask, we expect:
but we actually got:
for factor larger than 2. This is caused by
CONCAT_VECTORS
on incorrect (sub) vector types. This patch fixes the said issue by building an aggregate vector with the correct sub vector types.Fix #132071