[X86] Relax VPERMV3 to VPERMV combine for more types #97206

phoebewang · 2024-06-30T10:44:02Z

This is a follow-up to #96414.


llvmbot · 2024-06-30T10:44:32Z

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes

This is a follow-up to #96414.

Full diff: https://github.com/llvm/llvm-project/pull/97206.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+3-5)
(modified) llvm/test/CodeGen/X86/avx512vl-intrinsics.ll (+24)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 1d4af62c3227d..8eadf079d4f2f 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -41334,15 +41334,13 @@ static SDValue combineTargetShuffle(SDValue N, const SDLoc &DL,
     return SDValue();
   }
   case X86ISD::VPERMV3: {
-    // VPERM[I,T]2[B,W] are 3 uops on Skylake and Icelake so we try to use
-    // VPERMV.
+    // Combine VPERMV3 to widened VPERMV if the two source operands are split
+    // from the same vector.
     SDValue V1 = peekThroughBitcasts(N.getOperand(0));
     SDValue V2 = peekThroughBitcasts(N.getOperand(2));
     MVT SVT = V1.getSimpleValueType();
-    MVT EVT = VT.getVectorElementType();
     MVT NVT = VT.getDoubleNumVectorElementsVT();
-    if ((EVT == MVT::i8 || EVT == MVT::i16) &&
-        (NVT.is256BitVector() ||
+    if ((NVT.is256BitVector() ||
          (NVT.is512BitVector() && Subtarget.hasEVEX512())) &&
         V1.getOpcode() == ISD::EXTRACT_SUBVECTOR &&
         V1.getConstantOperandVal(1) == 0 &&
diff --git a/llvm/test/CodeGen/X86/avx512vl-intrinsics.ll b/llvm/test/CodeGen/X86/avx512vl-intrinsics.ll
index fc7c8facb9d5e..f1c70378b1eb3 100644
--- a/llvm/test/CodeGen/X86/avx512vl-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/avx512vl-intrinsics.ll
@@ -7008,6 +7008,30 @@ define <4 x double> @test_mask_vfmadd256_pd_rmkz(<4 x double> %a0, <4 x double>
   ret <4 x double> %1
 }
 
+define <8 x i32> @combine_vpermi2d_vpermps(<16 x i32> noundef %a) {
+; X86-LABEL: combine_vpermi2d_vpermps:
+; X86:       # %bb.0:
+; X86-NEXT:    vmovaps {{.*#+}} ymm1 = [14,13,6,3,5,15,0,1]
+; X86-NEXT:    # EVEX TO VEX Compression encoding: [0xc5,0xfc,0x28,0x0d,A,A,A,A]
+; X86-NEXT:    # fixup A - offset: 4, value: {{\.?LCPI[0-9]+_[0-9]+}}, kind: FK_Data_4
+; X86-NEXT:    vpermps %zmm0, %zmm1, %zmm0 # encoding: [0x62,0xf2,0x75,0x48,0x16,0xc0]
+; X86-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
+; X86-NEXT:    retl # encoding: [0xc3]
+;
+; X64-LABEL: combine_vpermi2d_vpermps:
+; X64:       # %bb.0:
+; X64-NEXT:    vmovaps {{.*#+}} ymm1 = [14,13,6,3,5,15,0,1]
+; X64-NEXT:    # EVEX TO VEX Compression encoding: [0xc5,0xfc,0x28,0x0d,A,A,A,A]
+; X64-NEXT:    # fixup A - offset: 4, value: {{\.?LCPI[0-9]+_[0-9]+}}-4, kind: reloc_riprel_4byte
+; X64-NEXT:    vpermps %zmm0, %zmm1, %zmm0 # encoding: [0x62,0xf2,0x75,0x48,0x16,0xc0]
+; X64-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
+; X64-NEXT:    retq # encoding: [0xc3]
+  %1 = shufflevector <16 x i32> %a, <16 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+  %2 = shufflevector <16 x i32> %a, <16 x i32> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+  %3 = tail call <8 x i32> @llvm.x86.avx512.vpermi2var.d.256(<8 x i32> %1, <8 x i32> <i32 14, i32 13, i32 6, i32 3, i32 5, i32 15, i32 0, i32 1>, <8 x i32> %2)
+  ret <8 x i32> %3
+}
+
 declare <8 x float> @llvm.fma.v8f32(<8 x float>, <8 x float>, <8 x float>)
 declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)
 declare <4 x double> @llvm.fma.v4f64(<4 x double>, <4 x double>, <4 x double>)
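The combine relies on a simple identity. A two-source permute (VPERMV3) indexes into the concatenation V1:V2. If V1 and V2 are the low and high halves of one vector, that concatenation is the vector itself, so the same indices work in a single-source permute (VPERMV) of the wider vector, and only the low half of the result is kept. Below is a minimal C++ model of that identity for the 8 x i32 case in the test. It assumes the documented index semantics of `vpermi2d` (low 4 index bits used, bit 3 selects the source) and `vpermd`/`vpermps` on 16 elements (low 4 bits used). The names `permute2`, `permute1`, and `widenedPermuteMatches` are hypothetical and are not LLVM code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8 = std::array<int32_t, 8>;
using V16 = std::array<int32_t, 16>;
using M8 = std::array<uint32_t, 8>;
using M16 = std::array<uint32_t, 16>;

// Two-source permute (VPERMV3, e.g. vpermi2d on 8 x i32): each index selects
// from the 16-element concatenation A:B; bit 3 of the index picks the source.
V8 permute2(const V8 &a, const M8 &mask, const V8 &b) {
  V8 r{};
  for (int i = 0; i < 8; ++i) {
    uint32_t idx = mask[i] & 15;  // only the low 4 index bits are used
    r[i] = idx < 8 ? a[idx] : b[idx - 8];
  }
  return r;
}

// One-source permute (VPERMV, e.g. vpermd/vpermps on 16 x i32).
V16 permute1(const M16 &mask, const V16 &src) {
  V16 r{};
  for (int i = 0; i < 16; ++i)
    r[i] = src[mask[i] & 15];  // only the low 4 index bits are used
  return r;
}

// Returns true if permuting the two halves of src with the two-source form
// yields the same low 8 results as one single-source permute over all of src.
bool widenedPermuteMatches(const V16 &src, const M8 &mask) {
  V8 lo{}, hi{};
  for (int i = 0; i < 8; ++i) {
    lo[i] = src[i];      // like EXTRACT_SUBVECTOR at index 0
    hi[i] = src[i + 8];  // like EXTRACT_SUBVECTOR at index 8
  }
  V8 narrow = permute2(lo, mask, hi);

  // Widen the mask; the upper 8 lanes are don't-care because only the low
  // half of the result is kept ("kill: def $ymm0 ... killed $zmm0" above).
  M16 wideMask{};
  for (int i = 0; i < 8; ++i)
    wideMask[i] = mask[i];
  V16 wide = permute1(wideMask, src);

  for (int i = 0; i < 8; ++i)
    if (narrow[i] != wide[i])
      return false;
  return true;
}
```

Because the element values never influence which index is read, a source with distinct elements is enough to confirm that both forms select the same positions for a given mask, such as the `[14,13,6,3,5,15,0,1]` mask from the test.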

RKSimon

LGTM


#96414 and #97206 didn't ensure that the subvectors were extracted from a vector twice the width of the destination. We can relax this in a future patch, but fix the #97968 crash first. Fixes #97968
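The missing condition that message describes can be modeled as a predicate over the operands of VPERMV3. Below is a minimal C++ sketch, assuming a hypothetical `SrcOperand` struct in place of LLVM's `SDValue` and a hypothetical `canWidenToVPERMV` function; neither is LLVM API. The width check mirrors the visible part of the diff (`getDoubleNumVectorElementsVT()`, 256-bit or 512-bit with EVEX512), and the V1 extract-at-0 check mirrors `V1.getConstantOperandVal(1) == 0`. The V2 and same-source conditions are assumptions, since the diff hunk is cut off after the V1 checks. The final line is the extra check the message calls for: the shared source must be exactly twice the result width.

```cpp
#include <cassert>

// Hypothetical, simplified model of one VPERMV3 source operand after
// peekThroughBitcasts(); these fields are illustrative, not LLVM API.
struct SrcOperand {
  bool isExtractSubvector;  // opcode == ISD::EXTRACT_SUBVECTOR
  int sourceId;             // identity of the vector being extracted from
  unsigned sourceBits;      // total width of that vector in bits
  unsigned extractIdx;      // constant element index of the extract
};

// Hypothetical predicate: may a VPERMV3 whose result is resultBits wide with
// numElts elements be rewritten as a VPERMV on a vector twice as wide?
bool canWidenToVPERMV(unsigned resultBits, unsigned numElts,
                      const SrcOperand &v1, const SrcOperand &v2,
                      bool hasEVEX512) {
  // Widened type, analogous to VT.getDoubleNumVectorElementsVT().
  const unsigned wideBits = 2 * resultBits;
  if (!(wideBits == 256 || (wideBits == 512 && hasEVEX512)))
    return false;
  // V1 must be the low half (extract at element 0).
  if (!v1.isExtractSubvector || v1.extractIdx != 0)
    return false;
  // Assumed: V2 must be the high half of the same vector.
  if (!v2.isExtractSubvector || v2.extractIdx != numElts)
    return false;
  if (v1.sourceId != v2.sourceId)
    return false;
  // The check described for the #97968 fix: the shared source must be exactly
  // twice the result width, so lo:hi really is the whole source vector.
  return v1.sourceBits == wideBits;
}
```

Without the last check, two halves taken from the bottom of a 1024-bit vector pass every other test even though the source cannot be used directly as the 512-bit VPERMV operand.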

[X86] Relax VPERMV3 to VPERMV combine for more types

449e74e


phoebewang requested review from RKSimon and goldsteinn June 30, 2024 10:44

llvmbot added the backend:X86 label Jun 30, 2024

phoebewang mentioned this pull request Jun 30, 2024

[X86] Combine VPERMV3 to VPERMV for i8/i16 #96414

Merged

RKSimon approved these changes Jul 1, 2024


phoebewang merged commit 9b94056 into llvm:main Jul 1, 2024
7 of 9 checks passed

phoebewang deleted the vperm branch July 1, 2024 00:41

lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024

[X86] Relax VPERMV3 to VPERMV combine for more types (llvm#97206)

608d1c1

