[AArch64] Fix scheduling information for arithmetic and logical instructions. #113542

Merged (3 commits) on Dec 4, 2024


Asher8118

This patch corrects scheduling information relating to Neoverse-V2 arithmetic and logical instructions.

@llvmbot
Member

llvmbot commented Oct 24, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Rin Dobrescu (Rin18)

Changes

This patch corrects scheduling information relating to Neoverse-V2 arithmetic and logical instructions.


Patch is 68.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/113542.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td (+16-9)
  • (modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-basic-instructions.s (+189-189)
diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td
index f884d20ab7eda9..054fd2a382e720 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td
@@ -56,6 +56,7 @@ def V2UnitD   : ProcResource<2>;  // Store data 0/1
 def V2UnitR   : ProcResGroup<[V2UnitS0, V2UnitS1]>;  // Integer single-cycle 0/1
 def V2UnitS   : ProcResGroup<[V2UnitS0, V2UnitS1, V2UnitS2, V2UnitS3]>;  // Integer single-cycle 0/1/2/3
 def V2UnitF   : ProcResGroup<[V2UnitS0, V2UnitS1, V2UnitM0, V2UnitM1]>;  // Integer single-cycle 0/1 and single/multicycle 0/1
+def V2UnitG   : ProcResGroup<[V2UnitS0, V2UnitS1, V2UnitM0]>;  // Integer single-cycle 0/1 and single/multicycle 0
 def V2UnitI   : ProcResGroup<[V2UnitS0, V2UnitS1, V2UnitS2, V2UnitS3, V2UnitM0, V2UnitM1]>;  // Integer single-cycle 0/1/2/3 and single/multicycle 0/1
 def V2UnitM   : ProcResGroup<[V2UnitM0, V2UnitM1]>;  // Integer single/multicycle 0/1
 def V2UnitL   : ProcResGroup<[V2UnitL01, V2UnitL2]>; // Load/Store 0/1 and Load 2
@@ -97,6 +98,7 @@ def V2Write_0c : SchedWriteRes<[]> { let Latency = 0; }
 
 def V2Write_1c_1B    : SchedWriteRes<[V2UnitB]>   { let Latency = 1; }
 def V2Write_1c_1F    : SchedWriteRes<[V2UnitF]>   { let Latency = 1; }
+def V2Write_1c_1G    : SchedWriteRes<[V2UnitG]>   { let Latency = 1; }
 def V2Write_1c_1I    : SchedWriteRes<[V2UnitI]>   { let Latency = 1; }
 def V2Write_1c_1M    : SchedWriteRes<[V2UnitM]>   { let Latency = 1; }
 def V2Write_1c_1M0   : SchedWriteRes<[V2UnitM0]>  { let Latency = 1; }
@@ -886,11 +888,11 @@ def V2Write_ArithI : SchedWriteVariant<[
                        SchedVar<NoSchedPred, [V2Write_2c_1M]>]>;
 
 def V2Write_ArithF : SchedWriteVariant<[
-                       SchedVar<IsCheapLSL,  [V2Write_1c_1F]>,
+                       SchedVar<IsCheapLSL,  [V2Write_1c_1G]>,
                        SchedVar<NoSchedPred, [V2Write_2c_1M]>]>;
 
 def V2Write_Logical : SchedWriteVariant<[
-                        SchedVar<NeoverseNoLSL, [V2Write_1c_1F]>,
+                        SchedVar<NeoverseNoLSL, [V2Write_1c_1G]>,
                         SchedVar<NoSchedPred,   [V2Write_2c_1M]>]>;
 
 def V2Write_Extr : SchedWriteVariant<[
@@ -1106,19 +1108,19 @@ def : InstRW<[V2Write_1c_1B_1R], (instrs BL, BLR)>;
 // -----------------------------------------------------------------------------
 
 // ALU, basic
-// ALU, basic, flagset
 def : SchedAlias<WriteI, V2Write_1c_1I>;
-def : InstRW<[V2Write_1c_1F], (instregex "^(ADD|SUB)S[WX]r[ir]$",
+
+// ALU, basic, flagset
+def : InstRW<[V2Write_1c_1G],
+             (instregex "^(ADD|SUB)S[WX]r[ir]$",
                         "^(ADC|SBC)S[WX]r$",
-                        "^ANDS[WX]ri$")>;
+                        "^ANDS[WX]ri$",
+                        "^(AND|BIC)S[WX]rr$")>;
 def : InstRW<[V2Write_0or1c_1I], (instregex "^MOVZ[WX]i$")>;
 
 // ALU, extend and shift
 def : SchedAlias<WriteIEReg, V2Write_2c_1M>;
 
-// Conditional compare
-def : InstRW<[V2Write_1c_1F], (instregex "^CCM[NP][WX][ir]")>;
-
 // Arithmetic, LSL shift, shift <= 4
 // Arithmetic, flagset, LSL shift, shift <= 4
 // Arithmetic, LSR/ASR/ROR shift or LSL shift > 4
@@ -1129,6 +1131,9 @@ def : InstRW<[V2Write_ArithF],
 // Arithmetic, immediate to logical address tag
 def : InstRW<[V2Write_2c_1M], (instrs ADDG, SUBG)>;
 
+// Conditional compare
+def : InstRW<[V2Write_1c_1G], (instregex "^CCM[NP][WX][ir]")>;
+
 // Convert floating-point condition flags
 // Flag manipulation instructions
 def : WriteRes<WriteSys, []> { let Latency = 1; }
@@ -1138,8 +1143,10 @@ def : InstRW<[V2Write_2c_1M], (instrs IRG, IRGstack)>;
 
 // Insert Tag Mask
 // Subtract Pointer
+def : InstRW<[V2Write_1c_1I], (instrs GMI, SUBP)>;
+
 // Subtract Pointer, flagset
-def : InstRW<[V2Write_1c_1I], (instrs GMI, SUBP, SUBPS)>;
+def : InstRW<[V2Write_1c_1G], (instrs SUBPS)>;
 
 // Logical, shift, no flagset
 def : InstRW<[V2Write_1c_1I],    (instregex "^(AND|BIC|EON|EOR|ORN)[WX]rs$")>;
diff --git a/llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-basic-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-basic-instructions.s
index 9a3bcac69582ac..987a41d8b08eed 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-basic-instructions.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-basic-instructions.s
@@ -1406,16 +1406,16 @@ drps
 # CHECK-NEXT:  1      1     0.17                        sub	w4, w20, #546, lsl #12
 # CHECK-NEXT:  1      1     0.17                        sub	sp, sp, #288
 # CHECK-NEXT:  1      1     0.17                        sub	wsp, w19, #16
-# CHECK-NEXT:  1      1     0.25                        adds	w13, w23, #291, lsl #12
-# CHECK-NEXT:  1      1     0.25                        cmn	w2, #4095
-# CHECK-NEXT:  1      1     0.25                        adds	w20, wsp, #0
-# CHECK-NEXT:  1      1     0.25                        cmn	x3, #1, lsl #12
-# CHECK-NEXT:  1      1     0.25                        cmp	sp, #20, lsl #12
-# CHECK-NEXT:  1      1     0.25                        cmp	x30, #4095
-# CHECK-NEXT:  1      1     0.25                        subs	x4, sp, #3822
-# CHECK-NEXT:  1      1     0.25                        cmn	w3, #291, lsl #12
-# CHECK-NEXT:  1      1     0.25                        cmn	wsp, #1365
-# CHECK-NEXT:  1      1     0.25                        cmn	sp, #1092, lsl #12
+# CHECK-NEXT:  1      1     0.33                        adds	w13, w23, #291, lsl #12
+# CHECK-NEXT:  1      1     0.33                        cmn	w2, #4095
+# CHECK-NEXT:  1      1     0.33                        adds	w20, wsp, #0
+# CHECK-NEXT:  1      1     0.33                        cmn	x3, #1, lsl #12
+# CHECK-NEXT:  1      1     0.33                        cmp	sp, #20, lsl #12
+# CHECK-NEXT:  1      1     0.33                        cmp	x30, #4095
+# CHECK-NEXT:  1      1     0.33                        subs	x4, sp, #3822
+# CHECK-NEXT:  1      1     0.33                        cmn	w3, #291, lsl #12
+# CHECK-NEXT:  1      1     0.33                        cmn	wsp, #1365
+# CHECK-NEXT:  1      1     0.33                        cmn	sp, #1092, lsl #12
 # CHECK-NEXT:  1      1     0.17                        mov	sp, x30
 # CHECK-NEXT:  1      1     0.17                        mov	wsp, w20
 # CHECK-NEXT:  1      1     0.17                        mov	x11, sp
@@ -1446,11 +1446,11 @@ drps
 # CHECK-NEXT:  1      2     0.50                        add	x2, x3, x4, asr #0
 # CHECK-NEXT:  1      2     0.50                        add	x5, x6, x7, asr #21
 # CHECK-NEXT:  1      2     0.50                        add	x8, x9, x10, asr #63
-# CHECK-NEXT:  1      1     0.25                        adds	w3, w5, w7
-# CHECK-NEXT:  1      1     0.25                        cmn	w3, w5
-# CHECK-NEXT:  1      1     0.25                        adds	w20, wzr, w4
-# CHECK-NEXT:  1      1     0.25                        adds	w4, w6, wzr
-# CHECK-NEXT:  1      1     0.25                        adds	w11, w13, w15
+# CHECK-NEXT:  1      1     0.33                        adds	w3, w5, w7
+# CHECK-NEXT:  1      1     0.33                        cmn	w3, w5
+# CHECK-NEXT:  1      1     0.33                        adds	w20, wzr, w4
+# CHECK-NEXT:  1      1     0.33                        adds	w4, w6, wzr
+# CHECK-NEXT:  1      1     0.33                        adds	w11, w13, w15
 # CHECK-NEXT:  1      2     0.50                        adds	w9, w3, wzr, lsl #10
 # CHECK-NEXT:  1      2     0.50                        adds	w17, w29, w20, lsl #31
 # CHECK-NEXT:  1      2     0.50                        adds	w21, w22, w23, lsr #0
@@ -1459,11 +1459,11 @@ drps
 # CHECK-NEXT:  1      2     0.50                        adds	w2, w3, w4, asr #0
 # CHECK-NEXT:  1      2     0.50                        adds	w5, w6, w7, asr #21
 # CHECK-NEXT:  1      2     0.50                        adds	w8, w9, w10, asr #31
-# CHECK-NEXT:  1      1     0.25                        adds	x3, x5, x7
-# CHECK-NEXT:  1      1     0.25                        cmn	x3, x5
-# CHECK-NEXT:  1      1     0.25                        adds	x20, xzr, x4
-# CHECK-NEXT:  1      1     0.25                        adds	x4, x6, xzr
-# CHECK-NEXT:  1      1     0.25                        adds	x11, x13, x15
+# CHECK-NEXT:  1      1     0.33                        adds	x3, x5, x7
+# CHECK-NEXT:  1      1     0.33                        cmn	x3, x5
+# CHECK-NEXT:  1      1     0.33                        adds	x20, xzr, x4
+# CHECK-NEXT:  1      1     0.33                        adds	x4, x6, xzr
+# CHECK-NEXT:  1      1     0.33                        adds	x11, x13, x15
 # CHECK-NEXT:  1      2     0.50                        adds	x9, x3, xzr, lsl #10
 # CHECK-NEXT:  1      2     0.50                        adds	x17, x29, x20, lsl #63
 # CHECK-NEXT:  1      2     0.50                        adds	x21, x22, x23, lsr #0
@@ -1496,10 +1496,10 @@ drps
 # CHECK-NEXT:  1      2     0.50                        sub	x2, x3, x4, asr #0
 # CHECK-NEXT:  1      2     0.50                        sub	x5, x6, x7, asr #21
 # CHECK-NEXT:  1      2     0.50                        sub	x8, x9, x10, asr #63
-# CHECK-NEXT:  1      1     0.25                        subs	w3, w5, w7
-# CHECK-NEXT:  1      1     0.25                        cmp	w3, w5
-# CHECK-NEXT:  1      1     0.25                        subs	w4, w6, wzr
-# CHECK-NEXT:  1      1     0.25                        subs	w11, w13, w15
+# CHECK-NEXT:  1      1     0.33                        subs	w3, w5, w7
+# CHECK-NEXT:  1      1     0.33                        cmp	w3, w5
+# CHECK-NEXT:  1      1     0.33                        subs	w4, w6, wzr
+# CHECK-NEXT:  1      1     0.33                        subs	w11, w13, w15
 # CHECK-NEXT:  1      2     0.50                        subs	w9, w3, wzr, lsl #10
 # CHECK-NEXT:  1      2     0.50                        subs	w17, w29, w20, lsl #31
 # CHECK-NEXT:  1      2     0.50                        subs	w21, w22, w23, lsr #0
@@ -1508,10 +1508,10 @@ drps
 # CHECK-NEXT:  1      2     0.50                        subs	w2, w3, w4, asr #0
 # CHECK-NEXT:  1      2     0.50                        subs	w5, w6, w7, asr #21
 # CHECK-NEXT:  1      2     0.50                        subs	w8, w9, w10, asr #31
-# CHECK-NEXT:  1      1     0.25                        subs	x3, x5, x7
-# CHECK-NEXT:  1      1     0.25                        cmp	x3, x5
-# CHECK-NEXT:  1      1     0.25                        subs	x4, x6, xzr
-# CHECK-NEXT:  1      1     0.25                        subs	x11, x13, x15
+# CHECK-NEXT:  1      1     0.33                        subs	x3, x5, x7
+# CHECK-NEXT:  1      1     0.33                        cmp	x3, x5
+# CHECK-NEXT:  1      1     0.33                        subs	x4, x6, xzr
+# CHECK-NEXT:  1      1     0.33                        subs	x11, x13, x15
 # CHECK-NEXT:  1      2     0.50                        subs	x9, x3, xzr, lsl #10
 # CHECK-NEXT:  1      2     0.50                        subs	x17, x29, x20, lsl #63
 # CHECK-NEXT:  1      2     0.50                        subs	x21, x22, x23, lsr #0
@@ -1520,9 +1520,9 @@ drps
 # CHECK-NEXT:  1      2     0.50                        subs	x2, x3, x4, asr #0
 # CHECK-NEXT:  1      2     0.50                        subs	x5, x6, x7, asr #21
 # CHECK-NEXT:  1      2     0.50                        subs	x8, x9, x10, asr #63
-# CHECK-NEXT:  1      1     0.25                        cmn	wzr, w4
-# CHECK-NEXT:  1      1     0.25                        cmn	w5, wzr
-# CHECK-NEXT:  1      1     0.25                        cmn	w6, w7
+# CHECK-NEXT:  1      1     0.33                        cmn	wzr, w4
+# CHECK-NEXT:  1      1     0.33                        cmn	w5, wzr
+# CHECK-NEXT:  1      1     0.33                        cmn	w6, w7
 # CHECK-NEXT:  1      2     0.50                        cmn	w8, w9, lsl #15
 # CHECK-NEXT:  1      2     0.50                        cmn	w10, w11, lsl #31
 # CHECK-NEXT:  1      2     0.50                        cmn	w12, w13, lsr #0
@@ -1531,10 +1531,10 @@ drps
 # CHECK-NEXT:  1      2     0.50                        cmn	w18, w19, asr #0
 # CHECK-NEXT:  1      2     0.50                        cmn	w20, w21, asr #22
 # CHECK-NEXT:  1      2     0.50                        cmn	w22, w23, asr #31
-# CHECK-NEXT:  1      1     0.25                        cmn	x0, x3
-# CHECK-NEXT:  1      1     0.25                        cmn	xzr, x4
-# CHECK-NEXT:  1      1     0.25                        cmn	x5, xzr
-# CHECK-NEXT:  1      1     0.25                        cmn	x6, x7
+# CHECK-NEXT:  1      1     0.33                        cmn	x0, x3
+# CHECK-NEXT:  1      1     0.33                        cmn	xzr, x4
+# CHECK-NEXT:  1      1     0.33                        cmn	x5, xzr
+# CHECK-NEXT:  1      1     0.33                        cmn	x6, x7
 # CHECK-NEXT:  1      2     0.50                        cmn	x8, x9, lsl #15
 # CHECK-NEXT:  1      2     0.50                        cmn	x10, x11, lsl #63
 # CHECK-NEXT:  1      2     0.50                        cmn	x12, x13, lsr #0
@@ -1543,10 +1543,10 @@ drps
 # CHECK-NEXT:  1      2     0.50                        cmn	x18, x19, asr #0
 # CHECK-NEXT:  1      2     0.50                        cmn	x20, x21, asr #55
 # CHECK-NEXT:  1      2     0.50                        cmn	x22, x23, asr #63
-# CHECK-NEXT:  1      1     0.25                        cmp	w0, w3
-# CHECK-NEXT:  1      1     0.25                        cmp	wzr, w4
-# CHECK-NEXT:  1      1     0.25                        cmp	w5, wzr
-# CHECK-NEXT:  1      1     0.25                        cmp	w6, w7
+# CHECK-NEXT:  1      1     0.33                        cmp	w0, w3
+# CHECK-NEXT:  1      1     0.33                        cmp	wzr, w4
+# CHECK-NEXT:  1      1     0.33                        cmp	w5, wzr
+# CHECK-NEXT:  1      1     0.33                        cmp	w6, w7
 # CHECK-NEXT:  1      2     0.50                        cmp	w8, w9, lsl #15
 # CHECK-NEXT:  1      2     0.50                        cmp	w10, w11, lsl #31
 # CHECK-NEXT:  1      2     0.50                        cmp	w12, w13, lsr #0
@@ -1554,10 +1554,10 @@ drps
 # CHECK-NEXT:  1      2     0.50                        cmp	w18, w19, asr #0
 # CHECK-NEXT:  1      2     0.50                        cmp	w20, w21, asr #22
 # CHECK-NEXT:  1      2     0.50                        cmp	w22, w23, asr #31
-# CHECK-NEXT:  1      1     0.25                        cmp	x0, x3
-# CHECK-NEXT:  1      1     0.25                        cmp	xzr, x4
-# CHECK-NEXT:  1      1     0.25                        cmp	x5, xzr
-# CHECK-NEXT:  1      1     0.25                        cmp	x6, x7
+# CHECK-NEXT:  1      1     0.33                        cmp	x0, x3
+# CHECK-NEXT:  1      1     0.33                        cmp	xzr, x4
+# CHECK-NEXT:  1      1     0.33                        cmp	x5, xzr
+# CHECK-NEXT:  1      1     0.33                        cmp	x6, x7
 # CHECK-NEXT:  1      2     0.50                        cmp	x8, x9, lsl #15
 # CHECK-NEXT:  1      2     0.50                        cmp	x10, x11, lsl #63
 # CHECK-NEXT:  1      2     0.50                        cmp	x12, x13, lsr #0
@@ -1566,8 +1566,8 @@ drps
 # CHECK-NEXT:  1      2     0.50                        cmp	x18, x19, asr #0
 # CHECK-NEXT:  1      2     0.50                        cmp	x20, x21, asr #55
 # CHECK-NEXT:  1      2     0.50                        cmp	x22, x23, asr #63
-# CHECK-NEXT:  1      1     0.25                        cmp	wzr, w0
-# CHECK-NEXT:  1      1     0.25                        cmp	xzr, x0
+# CHECK-NEXT:  1      1     0.33                        cmp	wzr, w0
+# CHECK-NEXT:  1      1     0.33                        cmp	xzr, x0
 # CHECK-NEXT:  1      1     0.17                        adc	w29, w27, w25
 # CHECK-NEXT:  1      1     0.17                        adc	wzr, w3, w4
 # CHECK-NEXT:  1      1     0.17                        adc	w9, wzr, w10
@@ -1576,14 +1576,14 @@ drps
 # CHECK-NEXT:  1      1     0.17                        adc	xzr, x3, x4
 # CHECK-NEXT:  1      1     0.17                        adc	x9, xzr, x10
 # CHECK-NEXT:  1      1     0.17                        adc	x20, x0, xzr
-# CHECK-NEXT:  1      1     0.25                        adcs	w29, w27, w25
-# CHECK-NEXT:  1      1     0.25                        adcs	wzr, w3, w4
-# CHECK-NEXT:  1      1     0.25                        adcs	w9, wzr, w10
-# CHECK-NEXT:  1      1     0.25                        adcs	w20, w0, wzr
-# CHECK-NEXT:  1      1     0.25                        adcs	x29, x27, x25
-# CHECK-NEXT:  1      1     0.25                        adcs	xzr, x3, x4
-# CHECK-NEXT:  1      1     0.25                        adcs	x9, xzr, x10
-# CHECK-NEXT:  1      1     0.25                        adcs	x20, x0, xzr
+# CHECK-NEXT:  1      1     0.33                        adcs	w29, w27, w25
+# CHECK-NEXT:  1      1     0.33                        adcs	wzr, w3, w4
+# CHECK-NEXT:  1      1     0.33                        adcs	w9, wzr, w10
+# CHECK-NEXT:  1      1     0.33                        adcs	w20, w0, wzr
+# CHECK-NEXT:  1      1     0.33                        adcs	x29, x27, x25
+# CHECK-NEXT:  1      1     0.33                        adcs	xzr, x3, x4
+# CHECK-NEXT:  1      1     0.33                        adcs	x9, xzr, x10
+# CHECK-NEXT:  1      1     0.33                        adcs	x20, x0, xzr
 # CHECK-NEXT:  1      1     0.17                        sbc	w29, w27, w25
 # CHECK-NEXT:  1      1     0.17                        sbc	wzr, w3, w4
 # CHECK-NEXT:  1      1     0.17                        ngc	w9, w10
@@ -1592,26 +1592,26 @@ drps
 # CHECK-NEXT:  1      1     0.17                        sbc	xzr, x3, x4
 # CHECK-NEXT:  1      1     0.17                        ngc	x9, x10
 # CHECK-NEXT:  1      1     0.17                        sbc	x20, x0, xzr
-# CHECK-NEXT:  1      1     0.25                        sbcs	w29, w27, w25
-# CHECK-NEXT:  1      1     0.25                        sbcs	wzr, w3, w4
-# CHECK-NEXT:  1      1     0.25                        ngcs	w9, w10
-# CHECK-NEXT:  1      1     0.25                        sbcs	w20, w0, wzr
-# CHECK-NEXT:  1      1     0.25                        sbcs	x29, x27, x25
-# CHECK-NEXT:  1      1     0.25                        sbcs	xzr, x3, x4
-# CHECK-NEXT:  1      1     0.25                        ngcs	x9, x10
-# CHECK-NEXT:  1      1     0.25                        sbcs	x20, x0, xzr
+# CHECK-NEXT:  1      1     0.33                        sbcs	w29, w27, w25
+# CHECK-NEXT:  1      1     0.33                        sbcs	wzr, w3, w4
+# CHECK-NEXT:  1      1     0.33                        ngcs	w9, w10
+# CHECK-NEXT:  1      1     0.33                        sbcs	w20, w0, wzr
+# CHECK-NEXT:  1      1     0.33                        sbcs	x29, x27, x25
+# CHECK-NEXT:  1      1     0.33                        sbcs	xzr, x3, x4
+# CHECK-NEXT:  1      1     0.33                        ngcs	x9, x10
+# CHECK-NEXT:  1      1     0.33                        sbcs	x20, x0, xzr
 # CHECK-NEXT:  1      1     0.17                        ngc	w3, w12
 # CHECK-NEXT:  1      1     0.17                        ngc	wzr, w9
 # CHECK-NEXT:  1      1     0.17                        ngc	w23, wzr
 # CHECK-NEXT:  1      1     0.17                        ngc	x29, x30
 # CHECK-NEXT:  1      1     0.17                        ngc	xzr, x0
 # CHECK-NEXT:  1      1     0.17                        ngc	x0, xzr
-# CHECK-NEXT:  1      1     0.25                        ngcs	w3, w12
-# CHECK-NEXT:  1      1     0.25                        ngcs	wzr, w9
-# CHECK-NEXT:  1      1     0.25                        ngcs	w23, wzr
-# CHECK-NEXT:  1      1     0.25                        ngcs	x29, x30
-# CHECK-NEXT:  1      1     0.25                        ngcs	xzr, x0
-# CHECK-NEXT:  1      1     0.25                        ngcs	x0, xzr
+# CHECK-NEXT:  1      1     0.33                        ngcs	w3, w12
+# CHECK-NEXT:  1      1     0.33                        ngcs	wzr, w9
+# CHECK-NEXT:  1      1     0.33                        ngcs	w23, wzr
+# CHECK-NEXT:  1      1     0.33                        ngcs	x29, x30
+# CHECK-NEXT:  1      1     0.33                        ngcs	xzr, x0
+# CHECK-NEXT:  1      1     0.33                        ngcs	x0, xzr
 # CHECK-NEXT:  1      1     0.17                        sbfx	x1, x2, #3, #2
 # CHECK-NEXT:  1      1     0.17                        asr	x3, x4, #63
 # CHECK-NEXT:  1      1     0.17                        asr	wzr, wzr, ...
[truncated]

Contributor

@rj-jesus rj-jesus left a comment

Hi, thanks for these changes. In general this LGTM; just a couple of questions:

  1. The SOG doesn't seem to mention a G group for integer single-cycle 0/1 and single/multicycle 0 pipes. Are there any plans to update the SOG to reflect this?
  2. Except for the address generation instructions (ADR and ADRP), which have a throughput of 4, all instructions that use the F group seem to have a throughput of at best 3 according to the SOG. If there's going to be a discrepancy between the model and the SOG re the F group, would it be better to make the ADR* instructions the exception? In any case, if there's going to be a discrepancy, it would be good to have this documented in a comment, for example.

What do you think?

@Asher8118
Author

Thanks for the comments. This patch does not reflect a SOG update, since the resources in group G are all part of F as well.
The G unit I created in the patch models the behaviour of instructions that use pipeline F but have a throughput of 3 rather than 4. The instructions in this patch still use resources that are part of pipeline F, so this is not a new group that needs to be added to the SOG. Similar behaviour is represented in the V1 scheduling model with the J unit.
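
The throughput figures in the updated llvm-mca test can be reproduced with simple arithmetic. The sketch below assumes that a single-uop instruction occupying one unit of a group for one cycle has a reciprocal throughput of 1 divided by the number of units in that group. The group membership is copied from the diff above; the `GROUPS` table and the `rthroughput` helper are illustrative names, not LLVM APIs.

```python
# Illustrative arithmetic only; the names below are hypothetical, not LLVM APIs.
# Assumption: an instruction that uses one unit of a group for one cycle has
# reciprocal throughput 1 / (number of units in the group).

GROUPS = {
    "V2UnitI": ["S0", "S1", "S2", "S3", "M0", "M1"],
    "V2UnitF": ["S0", "S1", "M0", "M1"],
    "V2UnitG": ["S0", "S1", "M0"],  # group added by this patch
    "V2UnitM": ["M0", "M1"],
}


def rthroughput(group: str) -> float:
    """Reciprocal throughput, rounded to two decimals as in the test file."""
    return round(1.0 / len(GROUPS[group]), 2)


for name in GROUPS:
    print(name, rthroughput(name))
```

Under that assumption, moving the flag-setting instructions from `V2UnitF` to `V2UnitG` changes the reported value from 0.25 to 0.33, which matches the updated CHECK lines.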

@rj-jesus
Contributor

Thanks, I get your point and I agree: it's good to match the throughput reported in the SOG. I'm just trying to understand whether the instructions in this patch can really only go through M0, or whether they can go through either the M0 or the M1 pipe (in addition to the single-cycle 0/1 pipes). M0 already has quite a few instructions specific to it, including some predicate instructions, so it would be good to get this right to model its backend pressure a bit more accurately.
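
The concern about M0 pressure can be made concrete with a rough calculation. The sketch below assumes that each instruction is spread evenly across the pipes of its group, which is a simplification and not how the hardware or the LLVM scheduler necessarily behaves. The `per_pipe_load` helper is a hypothetical name used only for illustration.

```python
from fractions import Fraction


def per_pipe_load(pipes):
    """Expected per-instruction load on each pipe, assuming an even spread."""
    share = Fraction(1, len(pipes))
    return {pipe: share for pipe in pipes}


# Pipes of the F group (S0, S1, M0, M1) versus the G group (S0, S1, M0).
f_load = per_pipe_load(["S0", "S1", "M0", "M1"])
g_load = per_pipe_load(["S0", "S1", "M0"])

print("M0 load with F:", f_load["M0"])
print("M0 load with G:", g_load["M0"])
print("M1 load with G:", g_load.get("M1", Fraction(0)))
```

In this simplified model the modelled load on M0 rises from 1/4 to 1/3 per instruction while M1 receives none of these instructions, which is the asymmetry the question is about.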

@Asher8118
Author

Asher8118 commented Oct 24, 2024

I see your point. My understanding is that, since the SOG points to pipeline F for those instructions, they can go through either the M0 or the M1 pipe. I selected M0 only to keep the model consistent with V1. I'm not sure which of M0 and M1 is used more often on average. If there is evidence to suggest which pipe is used less, I'll be happy to reflect that behaviour in this patch.

@davemgreen
Collaborator

I believe that the throughput restrictions in these cases don't come from the pipelines that they can issue into but from some other stage, like rename or decode. Limiting the number of execution units they can use is one alternative, but in reality they can issue into any F pipeline. I have suggested in other reviews (with a different kind of instruction) that we get the pipeline features right and ignore getting the exact throughput correct. In reality it is difficult to hit this many flag-setting instructions that actually do something useful in a single cycle.

One way to fix it might be to add a new resource, of which there are 3, that represents the number of flag-setting instructions per cycle. It might not be exactly correct, as it will use that resource in the same stage as issue, not in the decode/rename stage. If we add too many of those the scheduler could start to get confused too, making sub-optimal decisions because it is caring about the wrong thing. It might be worth a try though, to see if it can solve the issue that way.
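
The idea of a separate 3-wide resource can be sketched numerically. The example below assumes that an instruction's reciprocal throughput is set by its most contended resource, computed as cycles consumed divided by units available. The resource names and the `bottleneck_rthroughput` helper are hypothetical and are not taken from the LLVM sources.

```python
from fractions import Fraction


def bottleneck_rthroughput(uses):
    """Reciprocal throughput limited by the most contended resource.

    `uses` maps a resource name to (units available, cycles consumed).
    """
    return max(Fraction(cycles, units) for units, cycles in uses.values())


# Flag-setting ALU op issuing to any of the four F pipes.
pipes_only = {"F pipes": (4, 1)}

# Same op, additionally consuming one of three hypothetical flag slots.
with_flag_limit = {"F pipes": (4, 1), "flag resource": (3, 1)}

print(bottleneck_rthroughput(pipes_only))
print(bottleneck_rthroughput(with_flag_limit))
```

With the extra resource the limit becomes 1/3 per instruction without removing M1 from the set of pipes the instruction may use, so pipeline pressure stays spread over all four F pipes.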

@Asher8118
Author

I've implemented this for both V1 and V2. Most of the changes in the tests come from adding the new flag resource to the MCA output.

Collaborator

@davemgreen davemgreen left a comment

Thanks for doing V1 too. LGTM

@Asher8118 Asher8118 merged commit 69b4784 into llvm:main Dec 4, 2024
8 checks passed