Skip to content

[RISCV] Collect shuffle mask for the lane not by createSequentialMask #129830

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 5, 2025

Conversation

tclin914
Copy link
Contributor

@tclin914 tclin914 commented Mar 5, 2025

If there are the shuffle mask <1, u, u, u, 2, u, u, u> with factor 4. we should have the shuffle mask <1, 2> for lane 0 and <u, u> for lane 1, and so on. Since we use createSequentialMask to create the shuffle mask, the shuffle mask for lane 1 would be <u, 0>(dervied from <u, u+1>). This leads to poor code generation.

tclin914 added 2 commits March 5, 2025 10:06
If there are the shuffle mask <1, u, u, u, 2, u, u, u> with factor 4. we
should have the shuffle mask <1, 2> for lane 0 and <u, u> for lane 1,
and so on. Since we use createSequentialMask to create the shuffle mask, the
shuffle mask for lane 1 would be <u, 0>(dervied from <u, u+1>). This leads to
poor code generation.
@llvmbot
Copy link
Member

llvmbot commented Mar 5, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Jim Lin (tclin914)

Changes

If there are the shuffle mask <1, u, u, u, 2, u, u, u> with factor 4. we should have the shuffle mask <1, 2> for lane 0 and <u, u> for lane 1, and so on. Since we use createSequentialMask to create the shuffle mask, the shuffle mask for lane 1 would be <u, 0>(dervied from <u, u+1>). This leads to poor code generation.


Full diff: https://github.com/llvm/llvm-project/pull/129830.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+8-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll (+2-6)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 4e6b3a224b79b..54206aba01e05 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -23056,12 +23056,19 @@ bool RISCVTargetLowering::lowerInterleavedStore(StoreInst *SI,
       {VTy, SI->getPointerOperandType(), XLenTy});
 
   SmallVector<Value *, 10> Ops;
+  SmallVector<int, 16> NewShuffleMask;
 
   for (unsigned i = 0; i < Factor; i++) {
+    // Collect shuffle mask for this lane.
+    for (unsigned j = 0; j < VTy->getNumElements(); j++)
+      NewShuffleMask.push_back(Mask[i + Factor * j]);
+
     Value *Shuffle = Builder.CreateShuffleVector(
         SVI->getOperand(0), SVI->getOperand(1),
-        createSequentialMask(Mask[i], VTy->getNumElements(), 0));
+        NewShuffleMask);
     Ops.push_back(Shuffle);
+
+    NewShuffleMask.clear();
   }
   // This VL should be OK (should be executable in one vsseg instruction,
   // potentially under larger LMULs) because we checked that the fixed vector
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
index 4200837227899..7cc8c0c3f2d89 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
@@ -1394,16 +1394,12 @@ define void @store_factor4_one_active_fullwidth(ptr %ptr, <16 x i32> %v) {
   ret void
 }
 
-; TODO: This could be a vslidedown followed by a strided store
 define void @store_factor4_one_active_slidedown(ptr %ptr, <4 x i32> %v) {
 ; CHECK-LABEL: store_factor4_one_active_slidedown:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
-; CHECK-NEXT:    vslidedown.vi v9, v8, 1
-; CHECK-NEXT:    vslideup.vi v10, v8, 1
-; CHECK-NEXT:    vmv.v.v v11, v10
-; CHECK-NEXT:    vmv.v.v v12, v10
-; CHECK-NEXT:    vsseg4e32.v v9, (a0)
+; CHECK-NEXT:    vslidedown.vi v8, v8, 1
+; CHECK-NEXT:    vsseg4e32.v v8, (a0)
 ; CHECK-NEXT:    ret
   %v0 = shufflevector <4 x i32> %v, <4 x i32> poison, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 2, i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 undef, i32 4,  i32 undef, i32 undef, i32 undef>
   store <16 x i32> %v0, ptr %ptr

Copy link

github-actions bot commented Mar 5, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice catch

@tclin914 tclin914 merged commit a0a904e into llvm:main Mar 5, 2025
11 checks passed
@tclin914 tclin914 deleted the shufflemasknotbycreateSequentialMask branch March 5, 2025 07:31
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
…llvm#129830)

If there are the shuffle mask <1, u, u, u, 2, u, u, u> with factor 4. we
should have the shuffle mask <1, 2> for lane 0 and <u, u> for lane 1,
and so on. Since we use createSequentialMask to create the shuffle mask,
the shuffle mask for lane 1 would be <u, 0>(dervied from <u, u+1>). This
leads to poor code generation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants