[DAG] Fold insert_subvector(N0, extract_subvector(N0, N2), N2) --> N0 #86487

Merged
RKSimon merged 1 commit into llvm:main from identity_subvector on Mar 26, 2024

Conversation

RKSimon (Collaborator) commented Mar 25, 2024

Handle the case where we've ended up inserting back into the source vector we extracted the subvector from.

RKSimon requested a review from topperc on March 25, 2024 at 10:42
RKSimon added the llvm:SelectionDAG label on Mar 25, 2024
llvmbot (Member) commented Mar 25, 2024

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

Handle the case where we've ended up inserting back into the source vector we extracted the subvector from.
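
As a rough illustration of the redundancy being folded (a hypothetical sketch, not one of the patch's test cases; the function name and the fixed-width types are invented), the IR below extracts a subvector from %v and re-inserts it at the same index. At the DAG level that round trip is exactly insert_subvector(N0, extract_subvector(N0, N2), N2), which can now be replaced with N0. In the affected tests the equivalent pattern appears to arise during legalization and combining rather than in the input IR.

; Hypothetical sketch: extracting a subvector and re-inserting it at the
; same index leaves %v unchanged, so the DAG combine folds the result to %v.
define <8 x i32> @insert_extract_roundtrip(<8 x i32> %v) {
  %sub = call <4 x i32> @llvm.vector.extract.v4i32.v8i32(<8 x i32> %v, i64 4)
  %res = call <8 x i32> @llvm.vector.insert.v8i32.v4i32(<8 x i32> %v, <4 x i32> %sub, i64 4)
  ret <8 x i32> %res
}

declare <4 x i32> @llvm.vector.extract.v4i32.v8i32(<8 x i32>, i64 immarg)
declare <8 x i32> @llvm.vector.insert.v8i32.v4i32(<8 x i32>, <4 x i32>, i64 immarg)

The fold is guarded so it only fires when the inserted value is an extract from the same source vector (N1.getOperand(0) == N0) at the same index (N1.getOperand(1) == N2); inserts of subvectors taken from a different vector or at a different offset are left alone.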


Patch is 80.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/86487.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll (+13-17)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll (+360-376)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 05b4ce3aaa2cae..6f1b1080cc8801 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -26137,6 +26137,13 @@ SDValue DAGCombiner::visitINSERT_SUBVECTOR(SDNode *N) {
     }
   }
 
+  // Handle case where we've ended up inserting back into the source vector
+  // we extracted the subvector from.
+  // insert_subvector(N0, extract_subvector(N0, N2), N2) --> N0
+  if (N1.getOpcode() == ISD::EXTRACT_SUBVECTOR && N1.getOperand(0) == N0 &&
+      N1.getOperand(1) == N2)
+    return N0;
+
   // Simplify scalar inserts into an undef vector:
   // insert_subvector undef, (splat X), N2 -> splat X
   if (N0.isUndef() && N1.getOpcode() == ISD::SPLAT_VECTOR)
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll b/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll
index 4aae8b8bd1dc02..9a5e86d61c265e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll
@@ -101,40 +101,36 @@ define void @vector_interleave_store_nxv16i64_nxv8i64(<vscale x 8 x i64> %a, <vs
 ; CHECK-NEXT:    slli a1, a1, 4
 ; CHECK-NEXT:    sub sp, sp, a1
 ; CHECK-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
-; CHECK-NEXT:    csrr a1, vlenb
-; CHECK-NEXT:    slli a1, a1, 3
-; CHECK-NEXT:    add a1, sp, a1
-; CHECK-NEXT:    addi a1, a1, 16
+; CHECK-NEXT:    addi a1, sp, 16
 ; CHECK-NEXT:    vs8r.v v8, (a1) # Unknown-size Folded Spill
 ; CHECK-NEXT:    csrr a1, vlenb
 ; CHECK-NEXT:    srli a2, a1, 1
 ; CHECK-NEXT:    vsetvli a3, zero, e16, m2, ta, mu
 ; CHECK-NEXT:    vid.v v24
 ; CHECK-NEXT:    vand.vi v26, v24, 1
-; CHECK-NEXT:    vmsne.vi v0, v26, 0
-; CHECK-NEXT:    vsrl.vi v6, v24, 1
-; CHECK-NEXT:    csrr a3, vlenb
-; CHECK-NEXT:    slli a3, a3, 3
-; CHECK-NEXT:    add a3, sp, a3
-; CHECK-NEXT:    addi a3, a3, 16
-; CHECK-NEXT:    vl8r.v v8, (a3) # Unknown-size Folded Reload
-; CHECK-NEXT:    vadd.vx v6, v6, a2, v0.t
+; CHECK-NEXT:    vmsne.vi v28, v26, 0
+; CHECK-NEXT:    vsrl.vi v24, v24, 1
+; CHECK-NEXT:    vmv1r.v v0, v28
+; CHECK-NEXT:    vadd.vx v24, v24, a2, v0.t
 ; CHECK-NEXT:    vmv4r.v v12, v16
 ; CHECK-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
-; CHECK-NEXT:    vrgatherei16.vv v24, v8, v6
-; CHECK-NEXT:    addi a2, sp, 16
-; CHECK-NEXT:    vs8r.v v24, (a2) # Unknown-size Folded Spill
+; CHECK-NEXT:    vrgatherei16.vv v0, v8, v24
 ; CHECK-NEXT:    csrr a2, vlenb
 ; CHECK-NEXT:    slli a2, a2, 3
 ; CHECK-NEXT:    add a2, sp, a2
 ; CHECK-NEXT:    addi a2, a2, 16
+; CHECK-NEXT:    vs8r.v v0, (a2) # Unknown-size Folded Spill
+; CHECK-NEXT:    addi a2, sp, 16
 ; CHECK-NEXT:    vl8r.v v8, (a2) # Unknown-size Folded Reload
 ; CHECK-NEXT:    vmv4r.v v16, v12
-; CHECK-NEXT:    vrgatherei16.vv v8, v16, v6
+; CHECK-NEXT:    vrgatherei16.vv v8, v16, v24
 ; CHECK-NEXT:    slli a1, a1, 3
 ; CHECK-NEXT:    add a1, a0, a1
 ; CHECK-NEXT:    vs8r.v v8, (a1)
-; CHECK-NEXT:    addi a1, sp, 16
+; CHECK-NEXT:    csrr a1, vlenb
+; CHECK-NEXT:    slli a1, a1, 3
+; CHECK-NEXT:    add a1, sp, a1
+; CHECK-NEXT:    addi a1, a1, 16
 ; CHECK-NEXT:    vl8r.v v8, (a1) # Unknown-size Folded Reload
 ; CHECK-NEXT:    vs8r.v v8, (a0)
 ; CHECK-NEXT:    csrr a0, vlenb
diff --git a/llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll b/llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
index 88144e7880e357..de34e48c01d7d5 100644
--- a/llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
+++ b/llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
@@ -3004,428 +3004,412 @@ define void @store_i8_stride6_vf32(ptr %in.vecptr0, ptr %in.vecptr1, ptr %in.vec
 ; AVX512BW-LABEL: store_i8_stride6_vf32:
 ; AVX512BW:       # %bb.0:
 ; AVX512BW-NEXT:    movq {{[0-9]+}}(%rsp), %rax
-; AVX512BW-NEXT:    vmovdqa (%rdi), %ymm3
-; AVX512BW-NEXT:    vmovdqa (%rsi), %ymm5
-; AVX512BW-NEXT:    vmovdqa (%rdx), %ymm2
-; AVX512BW-NEXT:    vmovdqa (%rcx), %ymm4
-; AVX512BW-NEXT:    vmovdqa (%r8), %ymm0
-; AVX512BW-NEXT:    vmovdqa (%r9), %ymm1
-; AVX512BW-NEXT:    vmovdqa (%rsi), %xmm9
-; AVX512BW-NEXT:    vpbroadcastq {{.*#+}} xmm6 = [8,7,6,9,0,0,10,0,8,7,6,9,0,0,10,0]
-; AVX512BW-NEXT:    vpshufb %xmm6, %xmm9, %xmm7
-; AVX512BW-NEXT:    vmovdqa (%rdi), %xmm10
-; AVX512BW-NEXT:    vpshufb %xmm6, %xmm10, %xmm6
-; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} xmm6 = xmm6[8],xmm7[8],xmm6[9],xmm7[9],xmm6[10],xmm7[10],xmm6[11],xmm7[11],xmm6[12],xmm7[12],xmm6[13],xmm7[13],xmm6[14],xmm7[14],xmm6[15],xmm7[15]
-; AVX512BW-NEXT:    vpermq {{.*#+}} ymm7 = ymm6[0,0,0,1]
-; AVX512BW-NEXT:    vpbroadcastq {{.*#+}} ymm6 = [5,8,7,6,9,0,0,10,5,8,7,6,9,0,0,10,5,8,7,6,9,0,0,10,5,8,7,6,9,0,0,10]
-; AVX512BW-NEXT:    vmovdqa (%rcx), %xmm11
-; AVX512BW-NEXT:    vpshufb %xmm6, %xmm11, %xmm8
-; AVX512BW-NEXT:    vmovdqa (%rdx), %xmm12
-; AVX512BW-NEXT:    vpshufb %xmm6, %xmm12, %xmm13
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} xmm8 = xmm13[0],xmm8[0],xmm13[1],xmm8[1],xmm13[2],xmm8[2],xmm13[3],xmm8[3],xmm13[4],xmm8[4],xmm13[5],xmm8[5],xmm13[6],xmm8[6],xmm13[7],xmm8[7]
-; AVX512BW-NEXT:    vpermq {{.*#+}} ymm8 = ymm8[0,0,0,1]
-; AVX512BW-NEXT:    movw $18724, %cx # imm = 0x4924
+; AVX512BW-NEXT:    vmovdqa (%rdi), %ymm9
+; AVX512BW-NEXT:    vmovdqa (%rsi), %ymm10
+; AVX512BW-NEXT:    vmovdqa (%rdx), %ymm11
+; AVX512BW-NEXT:    vmovdqa (%rcx), %ymm12
+; AVX512BW-NEXT:    vmovdqa (%r8), %ymm7
+; AVX512BW-NEXT:    vmovdqa (%r9), %ymm8
+; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm0 = ymm9[0],ymm10[0],ymm9[1],ymm10[1],ymm9[2],ymm10[2],ymm9[3],ymm10[3],ymm9[4],ymm10[4],ymm9[5],ymm10[5],ymm9[6],ymm10[6],ymm9[7],ymm10[7],ymm9[16],ymm10[16],ymm9[17],ymm10[17],ymm9[18],ymm10[18],ymm9[19],ymm10[19],ymm9[20],ymm10[20],ymm9[21],ymm10[21],ymm9[22],ymm10[22],ymm9[23],ymm10[23]
+; AVX512BW-NEXT:    vmovdqa (%rsi), %xmm1
+; AVX512BW-NEXT:    vmovdqa (%rdi), %xmm2
+; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} xmm3 = xmm2[8],xmm1[8],xmm2[9],xmm1[9],xmm2[10],xmm1[10],xmm2[11],xmm1[11],xmm2[12],xmm1[12],xmm2[13],xmm1[13],xmm2[14],xmm1[14],xmm2[15],xmm1[15]
+; AVX512BW-NEXT:    vinserti64x4 $1, %ymm0, %zmm3, %zmm0
+; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} zmm3 = [4,3,6,5,4,3,6,5,4,3,6,5,7,7,7,7,24,27,26,25,24,27,26,25,24,27,26,25,28,29,30,29]
+; AVX512BW-NEXT:    vpermw %zmm0, %zmm3, %zmm0
+; AVX512BW-NEXT:    vmovdqa (%rcx), %xmm3
+; AVX512BW-NEXT:    vmovdqa (%rdx), %xmm4
+; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} xmm5 = xmm4[8],xmm3[8],xmm4[9],xmm3[9],xmm4[10],xmm3[10],xmm4[11],xmm3[11],xmm4[12],xmm3[12],xmm4[13],xmm3[13],xmm4[14],xmm3[14],xmm4[15],xmm3[15]
+; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm6 = [5,4,3,6,5,4,3,6,5,4,3,6,7,7,7,7]
+; AVX512BW-NEXT:    vpermw %ymm5, %ymm6, %ymm5
+; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm6 = ymm11[0],ymm12[0],ymm11[1],ymm12[1],ymm11[2],ymm12[2],ymm11[3],ymm12[3],ymm11[4],ymm12[4],ymm11[5],ymm12[5],ymm11[6],ymm12[6],ymm11[7],ymm12[7],ymm11[16],ymm12[16],ymm11[17],ymm12[17],ymm11[18],ymm12[18],ymm11[19],ymm12[19],ymm11[20],ymm12[20],ymm11[21],ymm12[21],ymm11[22],ymm12[22],ymm11[23],ymm12[23]
+; AVX512BW-NEXT:    vprold $16, %ymm6, %ymm6
+; AVX512BW-NEXT:    vpermq {{.*#+}} ymm6 = ymm6[2,2,2,3]
+; AVX512BW-NEXT:    vinserti64x4 $1, %ymm6, %zmm5, %zmm5
+; AVX512BW-NEXT:    movl $613566756, %ecx # imm = 0x24924924
 ; AVX512BW-NEXT:    kmovd %ecx, %k1
-; AVX512BW-NEXT:    vmovdqu16 %ymm7, %ymm8 {%k1}
-; AVX512BW-NEXT:    vinserti64x4 $1, %ymm8, %zmm0, %zmm7
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} xmm8 = xmm10[0],xmm9[0],xmm10[1],xmm9[1],xmm10[2],xmm9[2],xmm10[3],xmm9[3],xmm10[4],xmm9[4],xmm10[5],xmm9[5],xmm10[6],xmm9[6],xmm10[7],xmm9[7]
-; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm13 = [0,3,2,1,0,3,2,1,0,3,2,1,4,5,6,5]
-; AVX512BW-NEXT:    vpermw %ymm8, %ymm13, %ymm8
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} xmm13 = xmm12[0],xmm11[0],xmm12[1],xmm11[1],xmm12[2],xmm11[2],xmm12[3],xmm11[3],xmm12[4],xmm11[4],xmm12[5],xmm11[5],xmm12[6],xmm11[6],xmm12[7],xmm11[7]
-; AVX512BW-NEXT:    vprold $16, %xmm13, %xmm13
-; AVX512BW-NEXT:    vpermq {{.*#+}} ymm13 = ymm13[0,0,0,1]
-; AVX512BW-NEXT:    movw $9362, %cx # imm = 0x2492
+; AVX512BW-NEXT:    vmovdqu16 %zmm5, %zmm0 {%k1}
+; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm13 = ymm7[0],ymm8[0],ymm7[1],ymm8[1],ymm7[2],ymm8[2],ymm7[3],ymm8[3],ymm7[4],ymm8[4],ymm7[5],ymm8[5],ymm7[6],ymm8[6],ymm7[7],ymm8[7],ymm7[16],ymm8[16],ymm7[17],ymm8[17],ymm7[18],ymm8[18],ymm7[19],ymm8[19],ymm7[20],ymm8[20],ymm7[21],ymm8[21],ymm7[22],ymm8[22],ymm7[23],ymm8[23]
+; AVX512BW-NEXT:    vmovdqa (%r9), %xmm5
+; AVX512BW-NEXT:    vmovdqa (%r8), %xmm6
+; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} xmm14 = xmm6[8],xmm5[8],xmm6[9],xmm5[9],xmm6[10],xmm5[10],xmm6[11],xmm5[11],xmm6[12],xmm5[12],xmm6[13],xmm5[13],xmm6[14],xmm5[14],xmm6[15],xmm5[15]
+; AVX512BW-NEXT:    vinserti64x4 $1, %ymm13, %zmm14, %zmm13
+; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} zmm14 = [2,5,4,3,2,5,4,3,2,5,4,3,6,5,6,7,26,25,24,27,26,25,24,27,26,25,24,27,28,28,28,28]
+; AVX512BW-NEXT:    movl $1227133513, %ecx # imm = 0x49249249
 ; AVX512BW-NEXT:    kmovd %ecx, %k2
-; AVX512BW-NEXT:    vmovdqu16 %ymm13, %ymm8 {%k2}
-; AVX512BW-NEXT:    vshufi64x2 {{.*#+}} zmm7 = zmm8[0,1,2,3],zmm7[4,5,6,7]
-; AVX512BW-NEXT:    vpbroadcastq {{.*#+}} ymm8 = [6,5,8,7,0,9,0,0,6,5,8,7,0,9,0,0,6,5,8,7,0,9,0,0,6,5,8,7,0,9,0,0]
-; AVX512BW-NEXT:    vmovdqa (%r9), %xmm13
-; AVX512BW-NEXT:    vpshufb %xmm8, %xmm13, %xmm14
-; AVX512BW-NEXT:    vmovdqa (%r8), %xmm15
-; AVX512BW-NEXT:    vpshufb %xmm8, %xmm15, %xmm16
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} xmm14 = xmm16[0],xmm14[0],xmm16[1],xmm14[1],xmm16[2],xmm14[2],xmm16[3],xmm14[3],xmm16[4],xmm14[4],xmm16[5],xmm14[5],xmm16[6],xmm14[6],xmm16[7],xmm14[7]
-; AVX512BW-NEXT:    vpermq {{.*#+}} ymm14 = ymm14[0,0,0,1]
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} xmm16 = xmm15[0],xmm13[0],xmm15[1],xmm13[1],xmm15[2],xmm13[2],xmm15[3],xmm13[3],xmm15[4],xmm13[4],xmm15[5],xmm13[5],xmm15[6],xmm13[6],xmm15[7],xmm13[7]
-; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm17 = [2,1,0,3,2,1,0,3,2,1,0,3,4,4,4,4]
-; AVX512BW-NEXT:    vpermw %ymm16, %ymm17, %ymm16
-; AVX512BW-NEXT:    vinserti64x4 $1, %ymm14, %zmm16, %zmm14
-; AVX512BW-NEXT:    movl $613566756, %ecx # imm = 0x24924924
-; AVX512BW-NEXT:    kmovd %ecx, %k3
-; AVX512BW-NEXT:    vmovdqu16 %zmm14, %zmm7 {%k3}
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm14 = ymm3[0],ymm5[0],ymm3[1],ymm5[1],ymm3[2],ymm5[2],ymm3[3],ymm5[3],ymm3[4],ymm5[4],ymm3[5],ymm5[5],ymm3[6],ymm5[6],ymm3[7],ymm5[7],ymm3[16],ymm5[16],ymm3[17],ymm5[17],ymm3[18],ymm5[18],ymm3[19],ymm5[19],ymm3[20],ymm5[20],ymm3[21],ymm5[21],ymm3[22],ymm5[22],ymm3[23],ymm5[23]
-; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm16 = [8,11,10,9,8,11,10,9,8,11,10,9,12,13,14,13]
-; AVX512BW-NEXT:    vpermw %ymm14, %ymm16, %ymm14
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm16 = ymm2[0],ymm4[0],ymm2[1],ymm4[1],ymm2[2],ymm4[2],ymm2[3],ymm4[3],ymm2[4],ymm4[4],ymm2[5],ymm4[5],ymm2[6],ymm4[6],ymm2[7],ymm4[7],ymm2[16],ymm4[16],ymm2[17],ymm4[17],ymm2[18],ymm4[18],ymm2[19],ymm4[19],ymm2[20],ymm4[20],ymm2[21],ymm4[21],ymm2[22],ymm4[22],ymm2[23],ymm4[23]
-; AVX512BW-NEXT:    vprold $16, %ymm16, %ymm16
-; AVX512BW-NEXT:    vpermq {{.*#+}} ymm16 = ymm16[2,2,2,3]
-; AVX512BW-NEXT:    vmovdqu16 %ymm16, %ymm14 {%k2}
-; AVX512BW-NEXT:    vinserti64x4 $1, %ymm14, %zmm0, %zmm14
-; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} xmm11 = xmm12[8],xmm11[8],xmm12[9],xmm11[9],xmm12[10],xmm11[10],xmm12[11],xmm11[11],xmm12[12],xmm11[12],xmm12[13],xmm11[13],xmm12[14],xmm11[14],xmm12[15],xmm11[15]
-; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} xmm9 = xmm10[8],xmm9[8],xmm10[9],xmm9[9],xmm10[10],xmm9[10],xmm10[11],xmm9[11],xmm10[12],xmm9[12],xmm10[13],xmm9[13],xmm10[14],xmm9[14],xmm10[15],xmm9[15]
-; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm10 = [4,3,6,5,4,3,6,5,4,3,6,5,7,7,7,7]
+; AVX512BW-NEXT:    vpermw %zmm13, %zmm14, %zmm0 {%k2}
+; AVX512BW-NEXT:    vpbroadcastq {{.*#+}} ymm13 = [8,7,6,9,0,0,10,0,8,7,6,9,0,0,10,0,8,7,6,9,0,0,10,0,8,7,6,9,0,0,10,0]
+; AVX512BW-NEXT:    vpshufb %ymm13, %ymm10, %ymm14
+; AVX512BW-NEXT:    vpshufb %ymm13, %ymm9, %ymm13
+; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm13 = ymm13[0],ymm14[0],ymm13[1],ymm14[1],ymm13[2],ymm14[2],ymm13[3],ymm14[3],ymm13[4],ymm14[4],ymm13[5],ymm14[5],ymm13[6],ymm14[6],ymm13[7],ymm14[7],ymm13[16],ymm14[16],ymm13[17],ymm14[17],ymm13[18],ymm14[18],ymm13[19],ymm14[19],ymm13[20],ymm14[20],ymm13[21],ymm14[21],ymm13[22],ymm14[22],ymm13[23],ymm14[23]
+; AVX512BW-NEXT:    vpermq {{.*#+}} ymm13 = ymm13[2,2,2,3]
+; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} ymm14 = ymm11[8],ymm12[8],ymm11[9],ymm12[9],ymm11[10],ymm12[10],ymm11[11],ymm12[11],ymm11[12],ymm12[12],ymm11[13],ymm12[13],ymm11[14],ymm12[14],ymm11[15],ymm12[15],ymm11[24],ymm12[24],ymm11[25],ymm12[25],ymm11[26],ymm12[26],ymm11[27],ymm12[27],ymm11[28],ymm12[28],ymm11[29],ymm12[29],ymm11[30],ymm12[30],ymm11[31],ymm12[31]
+; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm15 = [13,12,11,14,13,12,11,14,13,12,11,14,15,15,15,15]
+; AVX512BW-NEXT:    vpermw %ymm14, %ymm15, %ymm14
+; AVX512BW-NEXT:    vinserti64x4 $1, %ymm14, %zmm13, %zmm13
+; AVX512BW-NEXT:    vpbroadcastq {{.*#+}} ymm14 = [5,8,7,6,9,0,0,10,5,8,7,6,9,0,0,10,5,8,7,6,9,0,0,10,5,8,7,6,9,0,0,10]
+; AVX512BW-NEXT:    vpshufb %ymm14, %ymm12, %ymm12
+; AVX512BW-NEXT:    vpshufb %ymm14, %ymm11, %ymm11
+; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm11 = ymm11[0],ymm12[0],ymm11[1],ymm12[1],ymm11[2],ymm12[2],ymm11[3],ymm12[3],ymm11[4],ymm12[4],ymm11[5],ymm12[5],ymm11[6],ymm12[6],ymm11[7],ymm12[7],ymm11[16],ymm12[16],ymm11[17],ymm12[17],ymm11[18],ymm12[18],ymm11[19],ymm12[19],ymm11[20],ymm12[20],ymm11[21],ymm12[21],ymm11[22],ymm12[22],ymm11[23],ymm12[23]
+; AVX512BW-NEXT:    vpermq {{.*#+}} ymm11 = ymm11[2,2,2,3]
+; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} ymm9 = ymm9[8],ymm10[8],ymm9[9],ymm10[9],ymm9[10],ymm10[10],ymm9[11],ymm10[11],ymm9[12],ymm10[12],ymm9[13],ymm10[13],ymm9[14],ymm10[14],ymm9[15],ymm10[15],ymm9[24],ymm10[24],ymm9[25],ymm10[25],ymm9[26],ymm10[26],ymm9[27],ymm10[27],ymm9[28],ymm10[28],ymm9[29],ymm10[29],ymm9[30],ymm10[30],ymm9[31],ymm10[31]
+; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm10 = [12,11,14,13,12,11,14,13,12,11,14,13,15,15,15,15]
 ; AVX512BW-NEXT:    vpermw %ymm9, %ymm10, %ymm9
-; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm10 = [5,4,3,6,5,4,3,6,5,4,3,6,7,7,7,7]
-; AVX512BW-NEXT:    vpermw %ymm11, %ymm10, %ymm9 {%k1}
-; AVX512BW-NEXT:    vshufi64x2 {{.*#+}} zmm9 = zmm9[0,1,2,3],zmm14[4,5,6,7]
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm10 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[2],ymm1[2],ymm0[3],ymm1[3],ymm0[4],ymm1[4],ymm0[5],ymm1[5],ymm0[6],ymm1[6],ymm0[7],ymm1[7],ymm0[16],ymm1[16],ymm0[17],ymm1[17],ymm0[18],ymm1[18],ymm0[19],ymm1[19],ymm0[20],ymm1[20],ymm0[21],ymm1[21],ymm0[22],ymm1[22],ymm0[23],ymm1[23]
-; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} xmm11 = xmm15[8],xmm13[8],xmm15[9],xmm13[9],xmm15[10],xmm13[10],xmm15[11],xmm13[11],xmm15[12],xmm13[12],xmm15[13],xmm13[13],xmm15[14],xmm13[14],xmm15[15],xmm13[15]
-; AVX512BW-NEXT:    vinserti64x4 $1, %ymm10, %zmm11, %zmm10
-; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} zmm11 = [2,5,4,3,2,5,4,3,2,5,4,3,6,5,6,7,26,25,24,27,26,25,24,27,26,25,24,27,28,28,28,28]
-; AVX512BW-NEXT:    movl $1227133513, %ecx # imm = 0x49249249
+; AVX512BW-NEXT:    vinserti64x4 $1, %ymm9, %zmm11, %zmm9
+; AVX512BW-NEXT:    movl $1227114788, %ecx # imm = 0x49244924
 ; AVX512BW-NEXT:    kmovd %ecx, %k2
-; AVX512BW-NEXT:    vpermw %zmm10, %zmm11, %zmm9 {%k2}
-; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} ymm10 = ymm2[8],ymm4[8],ymm2[9],ymm4[9],ymm2[10],ymm4[10],ymm2[11],ymm4[11],ymm2[12],ymm4[12],ymm2[13],ymm4[13],ymm2[14],ymm4[14],ymm2[15],ymm4[15],ymm2[24],ymm4[24],ymm2[25],ymm4[25],ymm2[26],ymm4[26],ymm2[27],ymm4[27],ymm2[28],ymm4[28],ymm2[29],ymm4[29],ymm2[30],ymm4[30],ymm2[31],ymm4[31]
-; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} ymm11 = ymm3[8],ymm5[8],ymm3[9],ymm5[9],ymm3[10],ymm5[10],ymm3[11],ymm5[11],ymm3[12],ymm5[12],ymm3[13],ymm5[13],ymm3[14],ymm5[14],ymm3[15],ymm5[15],ymm3[24],ymm5[24],ymm3[25],ymm5[25],ymm3[26],ymm5[26],ymm3[27],ymm5[27],ymm3[28],ymm5[28],ymm3[29],ymm5[29],ymm3[30],ymm5[30],ymm3[31],ymm5[31]
-; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm12 = [12,11,14,13,12,11,14,13,12,11,14,13,15,15,15,15]
-; AVX512BW-NEXT:    vpermw %ymm11, %ymm12, %ymm11
-; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm12 = [13,12,11,14,13,12,11,14,13,12,11,14,15,15,15,15]
-; AVX512BW-NEXT:    vpermw %ymm10, %ymm12, %ymm11 {%k1}
-; AVX512BW-NEXT:    vinserti64x4 $1, %ymm11, %zmm0, %zmm10
-; AVX512BW-NEXT:    vpbroadcastq {{.*#+}} ymm11 = [8,7,6,9,0,0,10,0,8,7,6,9,0,0,10,0,8,7,6,9,0,0,10,0,8,7,6,9,0,0,10,0]
-; AVX512BW-NEXT:    vpshufb %ymm11, %ymm5, %ymm5
-; AVX512BW-NEXT:    vpshufb %ymm11, %ymm3, %ymm3
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm3 = ymm3[0],ymm5[0],ymm3[1],ymm5[1],ymm3[2],ymm5[2],ymm3[3],ymm5[3],ymm3[4],ymm5[4],ymm3[5],ymm5[5],ymm3[6],ymm5[6],ymm3[7],ymm5[7],ymm3[16],ymm5[16],ymm3[17],ymm5[17],ymm3[18],ymm5[18],ymm3[19],ymm5[19],ymm3[20],ymm5[20],ymm3[21],ymm5[21],ymm3[22],ymm5[22],ymm3[23],ymm5[23]
-; AVX512BW-NEXT:    vpermq {{.*#+}} ymm3 = ymm3[2,2,2,3]
-; AVX512BW-NEXT:    vpshufb %ymm6, %ymm4, %ymm4
-; AVX512BW-NEXT:    vpshufb %ymm6, %ymm2, %ymm2
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm2 = ymm2[0],ymm4[0],ymm2[1],ymm4[1],ymm2[2],ymm4[2],ymm2[3],ymm4[3],ymm2[4],ymm4[4],ymm2[5],ymm4[5],ymm2[6],ymm4[6],ymm2[7],ymm4[7],ymm2[16],ymm4[16],ymm2[17],ymm4[17],ymm2[18],ymm4[18],ymm2[19],ymm4[19],ymm2[20],ymm4[20],ymm2[21],ymm4[21],ymm2[22],ymm4[22],ymm2[23],ymm4[23]
-; AVX512BW-NEXT:    vpermq {{.*#+}} ymm2 = ymm2[2,2,2,3]
-; AVX512BW-NEXT:    vmovdqu16 %ymm3, %ymm2 {%k1}
-; AVX512BW-NEXT:    vshufi64x2 {{.*#+}} zmm2 = zmm2[0,1,2,3],zmm10[4,5,6,7]
-; AVX512BW-NEXT:    vpshufb %ymm8, %ymm1, %ymm3
-; AVX512BW-NEXT:    vpshufb %ymm8, %ymm0, %ymm4
-; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm3 = ymm4[0],ymm3[0],ymm4[1],ymm3[1],ymm4[2],ymm3[2],ymm4[3],ymm3[3],ymm4[4],ymm3[4],ymm4[5],ymm3[5],ymm4[6],ymm3[6],ymm4[7],ymm3[7],ymm4[16],ymm3[16],ymm4[17],ymm3[17],ymm4[18],ymm3[18],ymm4[19],ymm3[19],ymm4[20],ymm3[20],ymm4[21],ymm3[21],ymm4[22],ymm3[22],ymm4[23],ymm3[23]
-; AVX512BW-NEXT:    vpermq {{.*#+}} ymm3 = ymm3[2,2,2,3]
-; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} ymm0 = ymm0[8],ymm1[8],ymm0[9],ymm1[9],ymm0[10],ymm1[10],ymm0[11],ymm1[11],ymm0[12],ymm1[12],ymm0[13],ymm1[13],ymm0[14],ymm1[14],ymm0[15],ymm1[15],ymm0[24],ymm1[24],ymm0[25],ymm1[25],ymm0[26],ymm1[26],ymm0[27],ymm1[27],ymm0[28],ymm1[28],ymm0[29],ymm1[29],ymm0[30],ymm1[30],ymm0[31],ymm1[31]
-; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm1 = [10,13,12,11,10,13,12,11,10,13,12,11,14,13,14,15]
-; AVX512BW-NEXT:    vpermw %ymm0, %ymm1, %ymm0
-; AVX512BW-NEXT:    vinserti64x4 $1, %ymm0, %zmm3, %zmm0
+; AVX512BW-NEXT:    vmovdqu16 %zmm13, %zmm9 {%k2}
+; AVX512BW-NEXT:    vpbroadcastq {{.*#+}} ymm10 = [6,5,8,7,0,9,0,0,6,5,8,7,0,9,0,0,6,5,8,7,0,9,0,0,6,5,8,7,0,9,0,0]
+; AVX512BW-NEXT:    vpshufb %ymm10, %ymm8, %ymm11
+; AVX512BW-NEXT:    vpshufb %ymm10, %ymm7, %ymm12
+; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} ymm11 = ymm12[0],ymm11[0],ymm12[1],ymm11[1],ymm12[2],ymm11[2],ymm12[3],ymm11[3],ymm12[4],ymm11[4],ymm12[5],ymm11[5],ymm12[6],ymm11[6],ymm12[7],ymm11[7],ymm12[16],ymm11[16],ymm12[17],ymm11[17],ymm12[18],ymm11[18],ymm12[19],ymm11[19],ymm12[20],ymm11[20],ymm12[21],ymm11[21],ymm12[22],ymm11[22],ymm12[23],ymm11[23]
+; AVX512BW-NEXT:    vpermq {{.*#+}} ymm11 = ymm11[2,2,2,3]
+; AVX512BW-NEXT:    vpunpckhbw {{.*#+}} ymm7 = ymm7[8],ymm8[8],ymm7[9],ymm8[9],ymm7[10],ymm8[10],ymm7[11],ymm8[11],ymm7[12],ymm8[12],ymm7[13],ymm8[13],ymm7[14],ymm8[14],ymm7[15],ymm8[15],ymm7[24],ymm8[24],ymm7[25],ymm8[25],ymm7[26],ymm8[26],ymm7[27],ymm8[27],ymm7[28],ymm8[28],ymm7[29],ymm8[29],ymm7[30],ymm8[30],ymm7[31],ymm8[31]
+; AVX512BW-NEXT:    vpmovsxbw {{.*#+}} ymm8 = [10,13,12,11,10,13,12,11,10,13,12,11,14,13,14,15]
+; AVX512BW-NEXT:    vpermw %ymm7, %ymm8, %ymm7
+; AVX512BW-NEXT:    vinserti64x4 $1, %ymm7, %zmm11, %zmm7
 ; AVX512BW-NEXT:    movl $-1840700270, %ecx # imm = 0x92492492
-; AVX512BW-NEXT:    kmovd %ecx, %k1
-; AVX512BW-NEXT:    vmovdqu16 %zmm0, %zmm2 {%k1}
-; AVX512BW-NEXT:    vmovdqa64 %zmm2, 128(%rax)
-; AVX512BW-NEXT:    vmovdqa64 %zmm9, 64(%rax)
+; AVX512BW-NEXT:    kmovd %ecx, %k2
+; AVX512BW-NEXT:    vmovdqu16 %zmm7, %zmm9 {%k2}
+; AVX512BW-NEXT:    vpshufb %xmm14, %xmm3, %xmm7
+; AVX512BW-NEXT:    vpshufb %xmm14, %xmm4, %xmm8
+; AVX512BW-NEXT:    vpunpcklbw {{.*#+}} xmm7 = xmm8[0],xmm7[0],xmm8[1],xmm7[1],xmm8[2],xmm7[2],xmm8[3],xmm...
[truncated]


✅ With the latest revision this PR passed the Python code formatter.


✅ With the latest revision this PR passed the C/C++ code formatter.

RKSimon merged commit c7198e0 into llvm:main on Mar 26, 2024
RKSimon deleted the identity_subvector branch on March 26, 2024 at 10:03
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request Aug 19, 2024
[DAG] Fold insert_subvector(N0, extract_subvector(N0, N2), N2) --> N0 (llvm#86487)

Handle the case where we've ended up inserting back into the source vector we extracted the subvector from.

Change-Id: If875da06e87381972df34a4cc43dd456917f8d87
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request Sep 5, 2024
[DAG] Fold insert_subvector(N0, extract_subvector(N0, N2), N2) --> N0 (llvm#86487)

Handle the case where we've ended up inserting back into the source vector we extracted the subvector from.

Change-Id: If875da06e87381972df34a4cc43dd456917f8d87
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 16, 2024
[DAG] Fold insert_subvector(N0, extract_subvector(N0, N2), N2) --> N0 (llvm#86487)

Handle the case where we've ended up inserting back into the source vector we extracted the subvector from.

Change-Id: I18d358544a93d211c554d8f782328eca0d4bb07f
Labels: backend:X86, llvm:SelectionDAG