Skip to content

[RISCV] Improve legalization of e8 m8 VL>256 shuffles #79330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

preames
Copy link
Collaborator

@preames preames commented Jan 24, 2024

If we can't produce a large enough index vector in i8, we may need to legalize
the shuffle (via scalarization - which in turn gets lowered into stack usage).
This change makes two related changes:

  • Deferring legalization until we actually need to generate the vrgather
    instruction. With the new recursive structure, this only happens when
    doing the fallback for one of the arms.
  • Check the actual mask values for something outside of the representable
    range.

Both are covered by recently added tests.

@llvmbot
Copy link
Member

llvmbot commented Jan 24, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

If we can't produce a large enough index vector in i8, we may need to legalize the shuffle (via splitting or scalarization). We were doing this before sub-dividing, but the actual vselect doesn't have this legality property. If our two arms can be generated without resorting to the vrgather path, we were failing for no reason.

This is a functional change, but I didn't include a test (mostly because the existing logic didn't seem to have a test either.)


Full diff: https://github.com/llvm/llvm-project/pull/79330.diff

1 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+8-8)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index a38352e8e87f21..46d75529a2a151 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -4965,14 +4965,6 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
   if (SDValue V = lowerVECTOR_SHUFFLEAsRotate(SVN, DAG, Subtarget))
     return V;
 
-  if (VT.getScalarSizeInBits() == 8 && VT.getVectorNumElements() > 256) {
-    // On such a large vector we're unable to use i8 as the index type.
-    // FIXME: We could promote the index to i16 and use vrgatherei16, but that
-    // may involve vector splitting if we're already at LMUL=8, or our
-    // user-supplied maximum fixed-length LMUL.
-    return SDValue();
-  }
-
   // As a backup, shuffles can be lowered via a vrgather instruction, possibly
   // merged with a second vrgather.
   SmallVector<int> ShuffleMaskLHS, ShuffleMaskRHS;
@@ -5002,6 +4994,14 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
   // single source permutation.  Note that all the splat variants
   // are handled above.
   if (V2.isUndef()) {
+    if (VT.getScalarSizeInBits() == 8 && VT.getVectorNumElements() > 256) {
+      // On such a large vector we're unable to use i8 as the index type.
+      // FIXME: We could promote the index to i16 and use vrgatherei16, but that
+      // may involve vector splitting if we're already at LMUL=8, or our
+      // user-supplied maximum fixed-length LMUL.
+      return SDValue();
+    }
+
     unsigned GatherVVOpc = RISCVISD::VRGATHER_VV_VL;
     MVT IndexVT = VT.changeTypeToInteger();
     // Since we can't introduce illegal index types at this stage, use i16 and

@topperc
Copy link
Collaborator

topperc commented Jan 24, 2024

The description here is misleading. I believe returning SDValue() here causes the shuffle to get expanded through memory. It will go through the Expand path in LegalizeDAG.

@topperc
Copy link
Collaborator

topperc commented Jan 24, 2024

The description here is misleading. I believe returning SDValue() here causes the shuffle to get expanded through memory. It will go through the Expand path in LegalizeDAG.

It doesn't go through memory. I guess I was thinking of build_vector. I'm not sure what shuffle expansion does. I guess it uses extracts and_vector?

preames added a commit that referenced this pull request Jan 25, 2024
Triggered by discussion on #79330.  In the process of writing this, realized one of my recent refactorings appears to have broken the legalization for the single source case here.  Fix to follow in separate patch.
preames added a commit that referenced this pull request Jan 25, 2024
If we're lowering an e8 m8 shuffle and we have an index value greater than
255, we have no available space to generate an e16 index vector.  The
code had originally handled this correctly, but in a recent refactoring
I had moved the single source code above the check, and thus broke the
single source by accident.

I have a change on review to rework this (#79330), but for now, go with the most obvious fix.
If we can't produce a large enough index vector in i8, we may need to legalize
the shuffle (via scalarization - which in turn gets lowered into stack usage).
This change makes two related changes:
* Defering legalization until we actually need to generate the vrgather
  instruction.  With the new recursive structure, this only happens when
  doing the fallback for one of the arms.
* Check the actual mask values for something outside of the representable
  range.

Both are covered by recently added tests.
@preames preames force-pushed the pr-riscv-shuffle-legalize-index-after-split branch from fdf385c to dde80b7 Compare January 25, 2024 01:35
Copy link

github-actions bot commented Jan 25, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@preames
Copy link
Collaborator Author

preames commented Jan 25, 2024

I went ahead and added test coverage for both the original code and the new case. In the process, I discovered that one my earlier refactorings actually broke this logic - I think my mental state was assuming the prior version of this patch despite it not having landed - and I landed a separate change to get us back to a working baseline.

Pushed a reworked change, please take a look at the description from scratch.

@preames preames changed the title [RISCV] Legalize shuffle index after splitting two argument shuffles [RISCV] Improve legalization of e8 m8 VL>256 shuffles Jan 25, 2024
Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if (VT.getScalarSizeInBits() == 8 && VT.getVectorNumElements() > 256) {
// On such a large vector we're unable to use i8 as the index type.
if (VT.getScalarSizeInBits() == 8 &&
any_of(enumerate(Mask), [&](const auto &Idx) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need enumerate?

@preames preames merged commit ff53d50 into llvm:main Jan 31, 2024
@preames preames deleted the pr-riscv-shuffle-legalize-index-after-split branch January 31, 2024 22:41
carlosgalvezp pushed a commit to carlosgalvezp/llvm-project that referenced this pull request Feb 1, 2024
If we can't produce a large enough index vector in i8, we may need to legalize
the shuffle (via scalarization - which in turn gets lowered into stack usage).
This change makes two related changes:
* Deferring legalization until we actually need to generate the vrgather
  instruction.  With the new recursive structure, this only happens when
  doing the fallback for one of the arms.
* Check the actual mask values for something outside of the representable
  range.

Both are covered by recently added tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants