Skip to content

[RISCV] Fix incorrect slide offset when using vnsrl to de-interleave #132123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 20, 2025

Conversation

mshockwave
Copy link
Member

@mshockwave mshockwave commented Mar 20, 2025

Given this shuffle:

shufflevector <8 x i8> %0, <8 x i8> %1, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 undef, i32 undef, i32 undef, i32 undef>

#127272 lowers it with a bunch of vnsrl. If we describe the result in terms of the shuffle mask, we expect:

<0, 4, 8, 12, u, u, u, u>

but we actually got:

<0, 4, u, u, 8, 12, u, u>

for factor larger than 2. This is caused by CONCAT_VECTORS on incorrect (sub) vector types. This patch fixes the said issue by building an aggregate vector with the correct sub vector types.

Fix #132071

@llvmbot
Copy link
Member

llvmbot commented Mar 20, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

Given this shuffle:

shufflevector &lt;8 x i8&gt; %0, &lt;8 x i8&gt; %1, &lt;8 x i32&gt; &lt;i32 0, i32 4, i32 8, i32 12, i32 undef, i32 undef, i32 undef, i32 undef&gt;

#127272 lowers it with a bunch of vnsrl. If we describe the result in terms of the shuffle mask, we expect:

&lt;0, 4, 8, 12, u, u, u, u&gt;

but we actually got:

&lt;0, 4, u, u, 8, 12, u, u&gt;

for factor larger than 2. This is caused by CONCAT_VECTORS on incorrect (sub) vector types. This patch fixes the said issue by using the correct sub vector types and recursively concat them with UNDEF when needed.

Fix #132071


Full diff: https://github.com/llvm/llvm-project/pull/132123.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+17-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll (+4-2)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 48d8fc23dc1bb..4a0de99428605 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -5545,12 +5545,25 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
             })) {
           // Narrow each source and concatenate them.
           // FIXME: For small LMUL it is better to concatenate first.
-          MVT HalfVT = VT.getHalfNumVectorElementsVT();
+          MVT EltVT = VT.getVectorElementType();
+          auto EltCnt = VT.getVectorElementCount();
+          MVT SubVT =
+              MVT::getVectorVT(EltVT, EltCnt.divideCoefficientBy(Factor));
+
           SDValue Lo =
-              getDeinterleaveShiftAndTrunc(DL, HalfVT, V1, Factor, Index, DAG);
+              getDeinterleaveShiftAndTrunc(DL, SubVT, V1, Factor, Index, DAG);
           SDValue Hi =
-              getDeinterleaveShiftAndTrunc(DL, HalfVT, V2, Factor, Index, DAG);
-          return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Lo, Hi);
+              getDeinterleaveShiftAndTrunc(DL, SubVT, V2, Factor, Index, DAG);
+
+          MVT NewVT = SubVT.getDoubleNumVectorElementsVT();
+          SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, DL, NewVT, Lo, Hi);
+          for (unsigned F = Factor; F > 2; F >>= 1) {
+            SDValue Undef = DAG.getUNDEF(NewVT);
+            NewVT = NewVT.getDoubleNumVectorElementsVT();
+            Concat = DAG.getNode(ISD::CONCAT_VECTORS, DL, NewVT, Concat, Undef);
+          }
+
+          return Concat;
         }
       }
     }
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll
index 5e6d7c1eedb76..0b4231cedcab5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll
@@ -378,8 +378,9 @@ define void @deinterleave4_0_i8_two_source(ptr %in0, ptr %in1, ptr %out) {
 ; CHECK-NEXT:    vsetvli zero, zero, e8, mf8, ta, ma
 ; CHECK-NEXT:    vnsrl.wi v8, v8, 0
 ; CHECK-NEXT:    vnsrl.wi v9, v9, 0
+; CHECK-NEXT:    vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:    vslideup.vi v9, v8, 2
 ; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT:    vslideup.vi v9, v8, 4
 ; CHECK-NEXT:    vse8.v v9, (a2)
 ; CHECK-NEXT:    ret
 entry:
@@ -402,8 +403,9 @@ define void @deinterleave4_8_i8_two_source(ptr %in0, ptr %in1, ptr %out) {
 ; CHECK-NEXT:    vsetvli zero, zero, e8, mf8, ta, ma
 ; CHECK-NEXT:    vnsrl.wi v8, v8, 0
 ; CHECK-NEXT:    vnsrl.wi v9, v9, 0
+; CHECK-NEXT:    vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:    vslideup.vi v9, v8, 2
 ; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT:    vslideup.vi v9, v8, 4
 ; CHECK-NEXT:    vse8.v v9, (a2)
 ; CHECK-NEXT:    ret
 entry:

for (unsigned F = Factor; F > 2; F >>= 1) {
SDValue Undef = DAG.getUNDEF(NewVT);
NewVT = NewVT.getDoubleNumVectorElementsVT();
Concat = DAG.getNode(ISD::CONCAT_VECTORS, DL, NewVT, Concat, Undef);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we just slowly building an INSERT_SUBVECTOR?

Copy link
Member Author

@mshockwave mshockwave Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we just slowly building an INSERT_SUBVECTOR?

The CONCAT_VECTORS lowering code does a similar thing on CONCAT_VECTORS with more than 2 operand which turns it into a tree of CONCAT_VECTORS. But I don't think this tree of CONCAT_VECTORS will become a single INSERT_SUBVECTOR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use INSERT_SUBVECTOR here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use INSERT_SUBVECTOR here?

Using INSERT_SUBVECTOR is indeed simpler. It's done know.

@@ -378,7 +378,7 @@ define void @deinterleave4_0_i8_two_source(ptr %in0, ptr %in1, ptr %out) {
; CHECK-NEXT: vsetvli zero, zero, e8, mf8, ta, ma
; CHECK-NEXT: vnsrl.wi v8, v8, 0
; CHECK-NEXT: vnsrl.wi v9, v9, 0
; CHECK-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
; CHECK-NEXT: vsetivli zero, 4, e8, mf2, tu, ma
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to CONCAT_VECTORS the Lo and Hi using a 2x type, then widen the rest of the way with INSERT_SUBVECTOR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's better (no tu). It's fixed now.

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mshockwave mshockwave merged commit 03ceb26 into llvm:main Mar 20, 2025
11 checks passed
@mshockwave mshockwave deleted the patch/fix-132071 branch March 20, 2025 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RISC-V] Miscompile on rv64gcv with -O[23]
4 participants