Skip to content

[X86] Complete AMD znver4 AVX512 zeroing idioms #108740

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

boomanaiden154
Copy link
Contributor

This patch completes scheduling information for the AVX512 zeroing idioms according to the znver4 software optimization guide.

This patch completes scheduling information for the AVX512 zeroing
idioms according to the znver4 software optimization guide.
@llvmbot
Copy link
Member

llvmbot commented Sep 15, 2024

@llvm/pr-subscribers-backend-x86

Author: Aiden Grossman (boomanaiden154)

Changes

This patch completes scheduling information for the AVX512 zeroing idioms according to the znver4 software optimization guide.


Patch is 115.51 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108740.diff

4 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ScheduleZnver4.td (+68-15)
  • (modified) llvm/test/tools/llvm-mca/X86/Znver4/resources-avx512.s (+13-13)
  • (modified) llvm/test/tools/llvm-mca/X86/Znver4/resources-avx512bw.s (+9-9)
  • (modified) llvm/test/tools/llvm-mca/X86/Znver4/zero-idioms.s (+355-355)
diff --git a/llvm/lib/Target/X86/X86ScheduleZnver4.td b/llvm/lib/Target/X86/X86ScheduleZnver4.td
index 6181ee841dd411..9763b651ff0cae 100644
--- a/llvm/lib/Target/X86/X86ScheduleZnver4.td
+++ b/llvm/lib/Target/X86/X86ScheduleZnver4.td
@@ -1839,35 +1839,59 @@ def Zn4WriteFZeroIdiom : SchedWriteVariant<[
 ]>;
 // NOTE: XORPSrr, XORPDrr are not zero-cycle!
 def : InstRW<[Zn4WriteFZeroIdiom], (instrs VXORPSrr, VXORPDrr,
-                                           VANDNPSrr, VANDNPDrr)>;
+                                           VXORPSZ128rr,
+                                           VXORPDZ128rr,
+                                           VANDNPSrr, VANDNPDrr,
+                                           VANDNPSZ128rr,
+                                           VANDNPDZ128rr)>;
 
 def Zn4WriteFZeroIdiomY : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
     SchedVar<NoSchedPred,                          [WriteFLogicY]>
 ]>;
 def : InstRW<[Zn4WriteFZeroIdiomY], (instrs VXORPSYrr, VXORPDYrr,
-                                            VANDNPSYrr, VANDNPDYrr)>;
+                                            VXORPSZ256rr,
+                                            VXORPDZ256rr,
+                                            VANDNPSYrr, VANDNPDYrr,
+                                            VANDNPSZ256rr,
+                                            VANDNPDZ256rr)>;
+
+def Zn4WriteFZeroIdiomZ : SchedWriteVariant<[
+    SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
+    SchedVar<NoSchedPred,                          [WriteFLogicZ]>
+]>;
+def : InstRW<[Zn4WriteFZeroIdiomZ], (instrs VXORPSZrr, VXORPDZrr,
+                                            VANDNPSZrr, VANDNPDZrr)>;
 
 def Zn4WriteVZeroIdiomLogicX : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
     SchedVar<NoSchedPred,                          [WriteVecLogicX]>
 ]>;
 // NOTE: PXORrr,PANDNrr are not zero-cycle!
-def : InstRW<[Zn4WriteVZeroIdiomLogicX], (instrs VPXORrr, VPANDNrr)>;
+def : InstRW<[Zn4WriteVZeroIdiomLogicX], (instrs VPXORrr,
+                                                 VPXORDZ128rr,
+                                                 VPXORQZ128rr,
+                                                 VPANDNrr,
+                                                 VPANDNDZ128rr,
+                                                 VPANDNQZ128rr)>;
 
-// TODO: This should be extended to incorporate all of the AVX512 zeroing
-// idioms that can be executed by the renamer.
-def Zn4WriteVZeroIdiomLogicZ : SchedWriteVariant<[
+def Zn4WriteVZeroIdiomLogicY : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
-    SchedVar<NoSchedPred,                          [WriteVecLogicZ]>
+    SchedVar<NoSchedPred,                          [WriteVecLogicY]>
 ]>;
-def : InstRW<[Zn4WriteVZeroIdiomLogicZ], (instrs VPXORDZrr)>;
+def : InstRW<[Zn4WriteVZeroIdiomLogicY], (instrs VPXORYrr,
+                                                 VPXORDZ256rr,
+                                                 VPXORQZ256rr,
+                                                 VPANDNYrr,
+                                                 VPANDNDZ256rr,
+                                                 VPANDNQZ256rr)>;
 
-def Zn4WriteVZeroIdiomLogicY : SchedWriteVariant<[
+def Zn4WriteVZeroIdiomLogicZ : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
-    SchedVar<NoSchedPred,                          [WriteVecLogicY]>
+    SchedVar<NoSchedPred,                          [WriteVecLogicZ]>
 ]>;
-def : InstRW<[Zn4WriteVZeroIdiomLogicY], (instrs VPXORYrr, VPANDNYrr)>;
+def : InstRW<[Zn4WriteVZeroIdiomLogicZ], (instrs VPXORDZrr, VPXORQZrr,
+                                                 VPANDNDZrr, VPANDNQZrr)>;
 
 def Zn4WriteVZeroIdiomALUX : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
@@ -1877,7 +1901,10 @@ def Zn4WriteVZeroIdiomALUX : SchedWriteVariant<[
 //       PCMPGTBrr, PCMPGTWrr, PCMPGTDrr, PCMPGTQrr are not zero-cycle!
 def : InstRW<[Zn4WriteVZeroIdiomALUX],
              (instrs VPSUBBrr, VPSUBWrr, VPSUBDrr, VPSUBQrr,
-                     VPCMPGTBrr, VPCMPGTWrr, VPCMPGTDrr, VPCMPGTQrr)>;
+                     VPSUBBZ128rr, VPSUBWZ128rr, VPSUBDZ128rr, VPSUBQZ128rr,
+                     VPCMPGTBrr, VPCMPGTWrr, VPCMPGTDrr, VPCMPGTQrr,
+                     VPCMPGTBZ128rr, VPCMPGTWZ128rr,
+                     VPCMPGTDZ128rr, VPCMPGTQZ128rr)>;
 
 def Zn4WriteVZeroIdiomALUY : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
@@ -1885,7 +1912,18 @@ def Zn4WriteVZeroIdiomALUY : SchedWriteVariant<[
 ]>;
 def : InstRW<[Zn4WriteVZeroIdiomALUY],
              (instrs VPSUBBYrr, VPSUBWYrr, VPSUBDYrr, VPSUBQYrr,
-                     VPCMPGTBYrr, VPCMPGTWYrr, VPCMPGTDYrr, VPCMPGTQYrr)>;
+                     VPSUBBZ256rr, VPSUBWZ256rr, VPSUBDZ256rr, VPSUBQZ256rr,
+                     VPCMPGTBYrr, VPCMPGTWYrr, VPCMPGTDYrr, VPCMPGTQYrr,
+                     VPCMPGTBZ256rr, VPCMPGTWZ256rr,
+                     VPCMPGTDZ256rr, VPCMPGTQZ256rr)>;
+
+def Zn4WriteVZeroIdiomALUZ : SchedWriteVariant<[
+    SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
+    SchedVar<NoSchedPred,                          [WriteVecALUZ]>
+]>;
+def : InstRW<[Zn4WriteVZeroIdiomALUY],
+             (instrs VPSUBBZrr, VPSUBWZrr, VPSUBDZrr, VPSUBQZrr,
+                     VPCMPGTBZrr, VPCMPGTWZrr, VPCMPGTDZrr, VPCMPGTQZrr)>;
 
 def : IsZeroIdiomFunction<[
   // GPR Zero-idioms.
@@ -1940,9 +1978,24 @@ def : IsZeroIdiomFunction<[
   ], ZeroIdiomPredicate>,
 
   // AVX ZMM Zero-idioms.
-  // TODO: This should be expanded to incorporate all AVX512 zeroing idioms.
   DepBreakingClass<[
-    VPXORDZrr
+    // fp variants.
+    VXORPSZrr, VXORPDZrr,
+    VXORPSZ128rr, VXORPDZ128rr, VXORPSZ256rr, VXORPDZ256rr,
+    VANDNPSZrr, VANDNPDZrr,
+    VANDNPSZ128rr, VANDNPDZ128rr, VANDNPSZ256rr, VANDNPDZ256rr,
+
+    // int variants.
+    VPCMPGTBZrr, VPCMPGTWZrr, VPCMPGTDZrr, VPCMPGTQZrr,
+    VPCMPGTBZ128rr, VPCMPGTWZ128rr, VPCMPGTDZ128rr, VPCMPGTQZ128rr,
+    VPCMPGTBZ256rr, VPCMPGTWZ256rr, VPCMPGTDZ256rr, VPCMPGTQZ256rr,
+    VPANDNDZrr, VPANDNQZrr,
+    VPANDNDZ128rr, VPANDNQZ128rr, VPANDNDZ256rr, VPANDNQZ256rr,
+    VPXORDZrr, VPXORQZrr,
+    VPXORDZ128rr, VPXORQZ128rr, VPXORDZ256rr, VPXORQZ256rr,
+    VPSUBBZrr, VPSUBWZrr, VPSUBDZrr, VPSUBQZrr,
+    VPSUBBZ128rr, VPSUBWZ128rr, VPSUBDZ128rr, VPSUBQZ128rr,
+    VPSUBBZ256rr, VPSUBWZ256rr, VPSUBDZ256rr, VPSUBQZ256rr,
   ], ZeroIdiomPredicate>,
 ]>;
 
diff --git a/llvm/test/tools/llvm-mca/X86/Znver4/resources-avx512.s b/llvm/test/tools/llvm-mca/X86/Znver4/resources-avx512.s
index 6e52eddd9a8f5e..0c4c6567680017 100644
--- a/llvm/test/tools/llvm-mca/X86/Znver4/resources-avx512.s
+++ b/llvm/test/tools/llvm-mca/X86/Znver4/resources-avx512.s
@@ -1609,13 +1609,13 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 # CHECK-NEXT:  1      1     0.50                        vpcmpeqq	%zmm0, %zmm1, %k2 {%k3}
 # CHECK-NEXT:  1      8     0.50    *                   vpcmpeqq	(%rax), %zmm1, %k2 {%k3}
 # CHECK-NEXT:  1      8     0.50    *                   vpcmpeqq	(%rax){1to8}, %zmm1, %k2 {%k3}
-# CHECK-NEXT:  1      1     0.50                        vpcmpgtd	%zmm0, %zmm1, %k2
+# CHECK-NEXT:  1      1     0.25                        vpcmpgtd	%zmm0, %zmm1, %k2
 # CHECK-NEXT:  1      8     0.50    *                   vpcmpgtd	(%rax), %zmm1, %k2
 # CHECK-NEXT:  1      8     0.50    *                   vpcmpgtd	(%rax){1to16}, %zmm1, %k2
 # CHECK-NEXT:  1      1     0.50                        vpcmpgtd	%zmm0, %zmm1, %k2 {%k3}
 # CHECK-NEXT:  1      8     0.50    *                   vpcmpgtd	(%rax), %zmm1, %k2 {%k3}
 # CHECK-NEXT:  1      8     0.50    *                   vpcmpgtd	(%rax){1to16}, %zmm1, %k2 {%k3}
-# CHECK-NEXT:  1      1     0.50                        vpcmpgtq	%zmm0, %zmm1, %k2
+# CHECK-NEXT:  1      1     0.25                        vpcmpgtq	%zmm0, %zmm1, %k2
 # CHECK-NEXT:  1      8     0.50    *                   vpcmpgtq	(%rax), %zmm1, %k2
 # CHECK-NEXT:  1      8     0.50    *                   vpcmpgtq	(%rax){1to8}, %zmm1, %k2
 # CHECK-NEXT:  1      1     0.50                        vpcmpgtq	%zmm0, %zmm1, %k2 {%k3}
@@ -1815,7 +1815,7 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 # CHECK-NEXT:  1      1     1.00                        vpshufd	$0, %zmm16, %zmm19 {%k1} {z}
 # CHECK-NEXT:  1      8     1.00    *                   vpshufd	$0, (%rax), %zmm19 {%k1} {z}
 # CHECK-NEXT:  1      8     1.00    *                   vpshufd	$0, (%rax){1to16}, %zmm19 {%k1} {z}
-# CHECK-NEXT:  1      1     0.50                        vpsubd	%zmm16, %zmm17, %zmm19
+# CHECK-NEXT:  1      1     0.25                        vpsubd	%zmm16, %zmm17, %zmm19
 # CHECK-NEXT:  1      8     0.50    *                   vpsubd	(%rax), %zmm17, %zmm19
 # CHECK-NEXT:  1      8     0.50    *                   vpsubd	(%rax){1to16}, %zmm17, %zmm19
 # CHECK-NEXT:  1      1     0.50                        vpsubd	%zmm16, %zmm17, %zmm19 {%k1}
@@ -1824,7 +1824,7 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 # CHECK-NEXT:  1      1     0.50                        vpsubd	%zmm16, %zmm17, %zmm19 {%k1} {z}
 # CHECK-NEXT:  1      8     0.50    *                   vpsubd	(%rax), %zmm17, %zmm19 {%k1} {z}
 # CHECK-NEXT:  1      8     0.50    *                   vpsubd	(%rax){1to16}, %zmm17, %zmm19 {%k1} {z}
-# CHECK-NEXT:  1      1     0.50                        vpsubq	%zmm16, %zmm17, %zmm19
+# CHECK-NEXT:  1      1     0.25                        vpsubq	%zmm16, %zmm17, %zmm19
 # CHECK-NEXT:  1      8     0.50    *                   vpsubq	(%rax), %zmm17, %zmm19
 # CHECK-NEXT:  1      8     0.50    *                   vpsubq	(%rax){1to8}, %zmm17, %zmm19
 # CHECK-NEXT:  1      1     0.50                        vpsubq	%zmm16, %zmm17, %zmm19 {%k1}
@@ -1939,7 +1939,7 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 # CHECK-NEXT:  1      22    5.00    *                   vsqrtss	(%rax), %xmm17, %xmm19 {%k1}
 # CHECK-NEXT:  1      15    5.00                        vsqrtss	%xmm16, %xmm17, %xmm19 {%k1} {z}
 # CHECK-NEXT:  1      22    5.00    *                   vsqrtss	(%rax), %xmm17, %xmm19 {%k1} {z}
-# CHECK-NEXT:  1      1     0.50                        vpsubd	%zmm16, %zmm17, %zmm19
+# CHECK-NEXT:  1      1     0.25                        vpsubd	%zmm16, %zmm17, %zmm19
 # CHECK-NEXT:  1      8     0.50    *                   vpsubd	(%rax), %zmm17, %zmm19
 # CHECK-NEXT:  1      8     0.50    *                   vpsubd	(%rax){1to16}, %zmm17, %zmm19
 # CHECK-NEXT:  1      1     0.50                        vpsubd	%zmm16, %zmm17, %zmm19 {%k1}
@@ -1948,7 +1948,7 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 # CHECK-NEXT:  1      1     0.50                        vpsubd	%zmm16, %zmm17, %zmm19 {%k1} {z}
 # CHECK-NEXT:  1      8     0.50    *                   vpsubd	(%rax), %zmm17, %zmm19 {%k1} {z}
 # CHECK-NEXT:  1      8     0.50    *                   vpsubd	(%rax){1to16}, %zmm17, %zmm19 {%k1} {z}
-# CHECK-NEXT:  1      1     0.50                        vpsubq	%zmm16, %zmm17, %zmm19
+# CHECK-NEXT:  1      1     0.25                        vpsubq	%zmm16, %zmm17, %zmm19
 # CHECK-NEXT:  1      8     0.50    *                   vpsubq	(%rax), %zmm17, %zmm19
 # CHECK-NEXT:  1      8     0.50    *                   vpsubq	(%rax){1to8}, %zmm17, %zmm19
 # CHECK-NEXT:  1      1     0.50                        vpsubq	%zmm16, %zmm17, %zmm19 {%k1}
@@ -2065,7 +2065,7 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 
 # CHECK:      Resource pressure per iteration:
 # CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12.0] [12.1] [13]   [14.0] [14.1] [14.2] [15.0] [15.1] [15.2] [16.0] [16.1]
-# CHECK-NEXT: 5.33   5.33   5.33    -      -      -      -      -     221.00 1060.50 618.00 352.50 297.00 297.00 17.00 205.33 205.33 205.33 194.33 194.33 194.33 16.50  16.50
+# CHECK-NEXT: 5.33   5.33   5.33    -      -      -      -      -     219.50 1059.00 616.50 351.00 297.00 297.00 17.00 205.33 205.33 205.33 194.33 194.33 194.33 16.50  16.50
 
 # CHECK:      Resource pressure by instruction:
 # CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12.0] [12.1] [13]   [14.0] [14.1] [14.2] [15.0] [15.1] [15.2] [16.0] [16.1] Instructions:
@@ -2590,13 +2590,13 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpcmpeqq	%zmm0, %zmm1, %k2 {%k3}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpcmpeqq	(%rax), %zmm1, %k2 {%k3}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpcmpeqq	(%rax){1to8}, %zmm1, %k2 {%k3}
-# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpcmpgtd	%zmm0, %zmm1, %k2
+# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -      -      -      -      -      -     vpcmpgtd	%zmm0, %zmm1, %k2
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpcmpgtd	(%rax), %zmm1, %k2
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpcmpgtd	(%rax){1to16}, %zmm1, %k2
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpcmpgtd	%zmm0, %zmm1, %k2 {%k3}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpcmpgtd	(%rax), %zmm1, %k2 {%k3}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpcmpgtd	(%rax){1to16}, %zmm1, %k2 {%k3}
-# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpcmpgtq	%zmm0, %zmm1, %k2
+# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -      -      -      -      -      -     vpcmpgtq	%zmm0, %zmm1, %k2
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpcmpgtq	(%rax), %zmm1, %k2
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpcmpgtq	(%rax){1to8}, %zmm1, %k2
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpcmpgtq	%zmm0, %zmm1, %k2 {%k3}
@@ -2796,7 +2796,7 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -      -     1.00   1.00    -      -      -      -      -      -      -      -      -      -      -      -     vpshufd	$0, %zmm16, %zmm19 {%k1} {z}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -      -     1.00   1.00    -     0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpshufd	$0, (%rax), %zmm19 {%k1} {z}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -      -     1.00   1.00    -     0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpshufd	$0, (%rax){1to16}, %zmm19 {%k1} {z}
-# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpsubd	%zmm16, %zmm17, %zmm19
+# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -      -      -      -      -      -     vpsubd	%zmm16, %zmm17, %zmm19
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpsubd	(%rax), %zmm17, %zmm19
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpsubd	(%rax){1to16}, %zmm17, %zmm19
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpsubd	%zmm16, %zmm17, %zmm19 {%k1}
@@ -2805,7 +2805,7 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpsubd	%zmm16, %zmm17, %zmm19 {%k1} {z}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpsubd	(%rax), %zmm17, %zmm19 {%k1} {z}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpsubd	(%rax){1to16}, %zmm17, %zmm19 {%k1} {z}
-# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpsubq	%zmm16, %zmm17, %zmm19
+# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -      -      -      -      -      -     vpsubq	%zmm16, %zmm17, %zmm19
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpsubq	(%rax), %zmm17, %zmm19
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vpsubq	(%rax){1to8}, %zmm17, %zmm19
 # CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpsubq	%zmm16, %zmm17, %zmm19 {%k1}
@@ -2920,7 +2920,7 @@ vunpcklps         (%rax){1to16}, %zmm17, %zmm19 {z}{k1}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -      -     5.00    -      -     0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vsqrtss	(%rax), %xmm17, %xmm19 {%k1}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -      -     5.00    -      -      -      -      -      -      -      -      -      -      -      -      -     vsqrtss	%xmm16, %xmm17, %xmm19 {%k1} {z}
 # CHECK-NEXT:  -      -      -      -      -      -      -      -      -     5.00    -      -     0.50   0.50    -     0.33   0.33   0.33   0.33   0.33   0.33    -      -     vsqrtss	(%rax), %xmm17, %xmm19 {%k1} {z}
-# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.50   0.50   0.50   0.50    -      -      -      -      -      -      -      -      -      -      -     vpsubd	%zmm16, %zmm17, %zmm19
+# CHECK-NEXT:  -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -      -      -      -      -      -     vpsubd	%zmm16, %...
[truncated]

VPSUBBZ128rr, VPSUBWZ128rr, VPSUBDZ128rr, VPSUBQZ128rr,
VPCMPGTBrr, VPCMPGTWrr, VPCMPGTDrr, VPCMPGTQrr,
VPCMPGTBZ128rr, VPCMPGTWZ128rr,
VPCMPGTDZ128rr, VPCMPGTQZ128rr)>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ganeshgit Please can you confirm that AVX512 VPCMPGTZ128/Z256/Z style compares (which write to k-reg) are zero-idioms? It says so in the SoG but I'm concerned its a cut+paste typo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ganeshgit Please can you confirm that AVX512 VPCMPGTZ128/Z256/Z style compares (which write to k-reg) are zero-idioms? It says so in the SoG but I'm concerned its a cut+paste typo.

Yes they are zero-idioms. In AVX these would write a YMM register, and in AVX512, they write a K register. So, they are okay.

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@boomanaiden154 boomanaiden154 merged commit 6b78ea8 into llvm:main Sep 17, 2024
10 checks passed
@boomanaiden154 boomanaiden154 deleted the znver-avx512-scheduling-zero-idioms branch September 17, 2024 18:07
hamphet pushed a commit to hamphet/llvm-project that referenced this pull request Sep 18, 2024
This patch completes scheduling information for the AVX512 zeroing
idioms according to the znver4 software optimization guide.
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
This patch completes scheduling information for the AVX512 zeroing
idioms according to the znver4 software optimization guide.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants