[RISCV] Add SiFiveP600Model SchedModel that is used by sifive-p670 #84962

Merged: 3 commits merged into llvm:main on Mar 18, 2024

Conversation

michaelmaitland
Contributor

This PR adds an initial scheduler model for sifive-p670 that shows improvement on multiple workloads over both NoSchedModel and SiFive7Model. We plan on making significant changes to this model in the future so that it is more accurate. This patch would close #80612.

@llvmbot
Member

llvmbot commented Mar 12, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Changes

This PR adds an initial scheduler model for sifive-p670 that shows improvement on multiple workloads over both NoSchedModel and SiFive7Model. We plan on making significant changes to this model in the future so that it is more accurate. This patch would close #80612.


Patch is 45.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84962.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCV.td (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+1-1)
  • (added) llvm/lib/Target/RISCV/RISCVSchedSiFiveP600.td (+1016)
diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td
index 27d52c16a4f39d..9fcb092417d175 100644
--- a/llvm/lib/Target/RISCV/RISCV.td
+++ b/llvm/lib/Target/RISCV/RISCV.td
@@ -43,6 +43,7 @@ include "RISCVMacroFusion.td"
 include "RISCVSchedRocket.td"
 include "RISCVSchedSiFive7.td"
 include "RISCVSchedSiFiveP400.td"
+include "RISCVSchedSiFiveP600.td"
 include "RISCVSchedSyntacoreSCR1.td"
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 59bb811058d488..a3a56a3fbd1161 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -245,7 +245,7 @@ def SIFIVE_P450 : RISCVProcessorModel<"sifive-p450", SiFiveP400Model,
                                        TuneLUIADDIFusion,
                                        TuneAUIPCADDIFusion]>;
 
-def SIFIVE_P670 : RISCVProcessorModel<"sifive-p670", NoSchedModel,
+def SIFIVE_P670 : RISCVProcessorModel<"sifive-p670", SiFiveP600Model,
                                       [Feature64Bit,
                                        FeatureStdExtZifencei,
                                        FeatureStdExtM,
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSiFiveP600.td b/llvm/lib/Target/RISCV/RISCVSchedSiFiveP600.td
new file mode 100644
index 00000000000000..b271daa7ae699a
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVSchedSiFiveP600.td
@@ -0,0 +1,1016 @@
+//==- RISCVSchedSiFiveP600.td - SiFiveP600 Scheduling Defs ---*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+
+/// c is true if mx has the worst case behavior compared to LMULs in MxList.
+/// On the SiFiveP600, the worst case LMUL is the Largest LMUL
+/// and the worst case sew is the smallest SEW for that LMUL.
+class SiFiveP600IsWorstCaseMX<string mx, list<string> MxList> {
+  string LLMUL = LargestLMUL<MxList>.r;
+  bit c = !eq(mx, LLMUL);
+}
+
+class SiFiveP600IsWorstCaseMXSEW<string mx, int sew, list<string> MxList, bit isF = 0> {
+  string LLMUL = LargestLMUL<MxList>.r;
+  int SSEW = SmallestSEW<mx, isF>.r;
+  bit c = !and(!eq(mx, LLMUL), !eq(sew, SSEW));
+}
+
+// 1 Micro-Op per cycle.
+class SiFiveP600GetLMulCycles<string mx> {
+  int c = !cond(
+    !eq(mx, "M1") : 1,
+    !eq(mx, "M2") : 2,
+    !eq(mx, "M4") : 4,
+    !eq(mx, "M8") : 8,
+    !eq(mx, "MF2") : 1,
+    !eq(mx, "MF4") : 1,
+    !eq(mx, "MF8") : 1
+  );
+}
+
+// Latency for segmented loads and stores are calculated as vl * nf.
+class SiFiveP600GetCyclesSegmented<string mx, int sew, int nf> {
+  defvar VLEN = 128;
+  defvar VLUpperBound = !cond(
+    !eq(mx, "M1") : !div(VLEN, sew),
+    !eq(mx, "M2") : !div(!mul(VLEN, 2), sew),
+    !eq(mx, "M4") : !div(!mul(VLEN, 4), sew),
+    !eq(mx, "M8") : !div(!mul(VLEN, 8), sew),
+    !eq(mx, "MF2") : !div(!div(VLEN, 2), sew),
+    !eq(mx, "MF4") : !div(!div(VLEN, 4), sew),
+    !eq(mx, "MF8") : !div(!div(VLEN, 8), sew),
+  );
+  int c = !mul(VLUpperBound, nf);
+}
+
+// SiFiveP600 machine model for scheduling and other instruction cost heuristics.
+def SiFiveP600Model : SchedMachineModel {
+  let IssueWidth = 4;         // 4 micro-ops are dispatched per cycle.
+  let MicroOpBufferSize = 160; // Max micro-ops that can be buffered.
+  let LoadLatency = 4;        // Cycles for loads to access the cache.
+  let MispredictPenalty = 9;  // Extra cycles for a mispredicted branch.
+  let PostRAScheduler = true;
+  let UnsupportedFeatures = [HasStdExtZbkb, HasStdExtZbkc, HasStdExtZbkx,
+                             HasStdExtZknd, HasStdExtZkne, HasStdExtZknh,
+                             HasStdExtZksed, HasStdExtZksh, HasStdExtZkr,
+                             HasVendorXSfvqmaccqoq];
+  let CompleteModel = false;
+}
+
+let SchedModel = SiFiveP600Model in {
+
+def SiFiveP600IEXQ0       : ProcResource<1>;
+def SiFiveP600IEXQ1       : ProcResource<1>;
+def SiFiveP600IEXQ2       : ProcResource<1>;
+def SiFiveP600IEXQ3       : ProcResource<1>;
+def SiFiveP600FEXQ0       : ProcResource<1>;
+def SiFiveP600FEXQ1       : ProcResource<1>;
+
+// Two Load/Store ports that can issue either two loads, two stores, or one load
+// and one store (P550 has one load and one separate store pipe).
+def SiFiveP600LDST       : ProcResource<2>;
+
+// 4-wide pipeline with 4 ALU pipes.
+def SiFiveP600IntArith    : ProcResGroup<[SiFiveP600IEXQ0, SiFiveP600IEXQ1, SiFiveP600IEXQ2, SiFiveP600IEXQ3]>;
+defvar SiFiveP600SYS      = SiFiveP600IEXQ0;
+defvar SiFiveP600CMOV     = SiFiveP600IEXQ0;
+defvar SiFiveP600MulI2F   = SiFiveP600IEXQ1;
+def SiFiveP600Branch      : ProcResGroup<[SiFiveP600IEXQ2, SiFiveP600IEXQ3]>;
+def SiFiveP600Div         : ProcResource<1>;
+
+def SiFiveP600FloatArith  : ProcResGroup<[SiFiveP600FEXQ0, SiFiveP600FEXQ1]>;
+defvar SiFiveP600F2I      = SiFiveP600FEXQ0;
+def SiFiveP600FloatDiv    : ProcResource<1>;
+
+// Vector pipeline
+// VEXQ0 handle Mask, Simple Slide instructions,
+// VEXQ1 handle Complex Slide, Permutation, Reductions, Divide instructions.
+// Other vector instructions can be done in VEXQ0 and VEXQ1.
+def SiFiveP600VEXQ0        : ProcResource<1>;
+def SiFiveP600VEXQ1        : ProcResource<1>;
+def SiFiveP600VectorArith  : ProcResGroup<[SiFiveP600VEXQ0, SiFiveP600VEXQ1]>;
+
+// In Baler has 2 pipeline for Load and Store.
+def SiFiveP600VLD          : ProcResource<1>;
+def SiFiveP600VST          : ProcResource<1>;
+def SiFiveP600VDiv         : ProcResource<1>;
+def SiFiveP600VFloatDiv    : ProcResource<1>;
+
+let Latency = 1 in {
+// Integer arithmetic and logic
+def : WriteRes<WriteIALU, [SiFiveP600IntArith]>;
+def : WriteRes<WriteIALU32, [SiFiveP600IntArith]>;
+def : WriteRes<WriteShiftImm, [SiFiveP600IntArith]>;
+def : WriteRes<WriteShiftImm32, [SiFiveP600IntArith]>;
+def : WriteRes<WriteShiftReg, [SiFiveP600IntArith]>;
+def : WriteRes<WriteShiftReg32, [SiFiveP600IntArith]>;
+// Branching
+def : WriteRes<WriteJmp, [SiFiveP600Branch]>;
+def : WriteRes<WriteJal, [SiFiveP600Branch]>;
+def : WriteRes<WriteJalr, [SiFiveP600Branch]>;
+}
+
+// CMOV
+def P600WriteCMOV : SchedWriteRes<[SiFiveP600Branch, SiFiveP600CMOV]> {
+  let Latency = 2;
+  let NumMicroOps = 2;
+}
+def : InstRW<[P600WriteCMOV], (instrs PseudoCCMOVGPRNoX0)>;
+
+let Latency = 3 in {
+// Integer multiplication
+def : WriteRes<WriteIMul, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteIMul32, [SiFiveP600MulI2F]>;
+// cpop[w] look exactly like multiply.
+def : WriteRes<WriteCPOP, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteCPOP32, [SiFiveP600MulI2F]>;
+}
+
+// Integer division
+def : WriteRes<WriteIDiv, [SiFiveP600MulI2F, SiFiveP600Div]> {
+  let Latency = 35;
+  let ReleaseAtCycles = [1, 34];
+}
+def : WriteRes<WriteIDiv32,  [SiFiveP600MulI2F, SiFiveP600Div]> {
+  let Latency = 20;
+  let ReleaseAtCycles = [1, 19];
+}
+
+let Latency = 1 in {
+// Bitmanip
+def : WriteRes<WriteRotateImm, [SiFiveP600IntArith]>;
+def : WriteRes<WriteRotateImm32, [SiFiveP600IntArith]>;
+def : WriteRes<WriteRotateReg, [SiFiveP600IntArith]>;
+def : WriteRes<WriteRotateReg32, [SiFiveP600IntArith]>;
+
+def : WriteRes<WriteCLZ, [SiFiveP600IntArith]>;
+def : WriteRes<WriteCLZ32, [SiFiveP600IntArith]>;
+def : WriteRes<WriteCTZ, [SiFiveP600IntArith]>;
+def : WriteRes<WriteCTZ32, [SiFiveP600IntArith]>;
+
+def : WriteRes<WriteORCB, [SiFiveP600IntArith]>;
+
+def : WriteRes<WriteREV8, [SiFiveP600IntArith]>;
+
+def : WriteRes<WriteSHXADD, [SiFiveP600IntArith]>;
+def : WriteRes<WriteSHXADD32, [SiFiveP600IntArith]>;
+
+def : WriteRes<WriteSingleBit, [SiFiveP600IntArith]>;
+def : WriteRes<WriteSingleBitImm, [SiFiveP600IntArith]>;
+def : WriteRes<WriteBEXT, [SiFiveP600IntArith]>;
+def : WriteRes<WriteBEXTI, [SiFiveP600IntArith]>;
+}
+
+// Memory
+let Latency = 1 in {
+def : WriteRes<WriteSTB, [SiFiveP600LDST]>;
+def : WriteRes<WriteSTH, [SiFiveP600LDST]>;
+def : WriteRes<WriteSTW, [SiFiveP600LDST]>;
+def : WriteRes<WriteSTD, [SiFiveP600LDST]>;
+def : WriteRes<WriteFST16, [SiFiveP600LDST]>;
+def : WriteRes<WriteFST32, [SiFiveP600LDST]>;
+def : WriteRes<WriteFST64, [SiFiveP600LDST]>;
+}
+let Latency = 4 in {
+def : WriteRes<WriteLDB, [SiFiveP600LDST]>;
+def : WriteRes<WriteLDH, [SiFiveP600LDST]>;
+}
+let Latency = 4 in {
+def : WriteRes<WriteLDW, [SiFiveP600LDST]>;
+def : WriteRes<WriteLDD, [SiFiveP600LDST]>;
+}
+
+let Latency = 6 in {
+def : WriteRes<WriteFLD16, [SiFiveP600LDST]>;
+def : WriteRes<WriteFLD32, [SiFiveP600LDST]>;
+def : WriteRes<WriteFLD64, [SiFiveP600LDST]>;
+}
+
+// Atomic memory
+let Latency = 3 in {
+def : WriteRes<WriteAtomicSTW, [SiFiveP600LDST]>;
+def : WriteRes<WriteAtomicSTD, [SiFiveP600LDST]>;
+def : WriteRes<WriteAtomicW, [SiFiveP600LDST]>;
+def : WriteRes<WriteAtomicD, [SiFiveP600LDST]>;
+def : WriteRes<WriteAtomicLDW, [SiFiveP600LDST]>;
+def : WriteRes<WriteAtomicLDD, [SiFiveP600LDST]>;
+}
+
+// Floating point
+let Latency = 2 in {
+def : WriteRes<WriteFAdd16, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFAdd32, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFAdd64, [SiFiveP600FloatArith]>;
+}
+let Latency = 3 in {
+def : WriteRes<WriteFMul16, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFMul32, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFMul64, [SiFiveP600FloatArith]>;
+}
+let Latency = 4 in {
+def : WriteRes<WriteFMA16, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFMA32, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFMA64, [SiFiveP600FloatArith]>;
+}
+
+let Latency = 2 in {
+def : WriteRes<WriteFSGNJ16, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFSGNJ32, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFSGNJ64, [SiFiveP600FloatArith]>;
+
+def : WriteRes<WriteFMinMax16, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFMinMax32, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFMinMax64, [SiFiveP600FloatArith]>;
+}
+
+// Half precision.
+def : WriteRes<WriteFDiv16, [SiFiveP600FEXQ1, SiFiveP600FloatDiv]> {
+  let Latency = 4;
+  let ReleaseAtCycles = [1, 4];
+}
+def : WriteRes<WriteFSqrt16, [SiFiveP600FEXQ1, SiFiveP600FloatDiv]> {
+  let Latency = 18;
+  let ReleaseAtCycles = [1, 17];
+}
+
+// Single precision.
+def : WriteRes<WriteFDiv32, [SiFiveP600FEXQ1, SiFiveP600FloatDiv]> {
+  let Latency = 6;
+  let ReleaseAtCycles = [1, 6];
+}
+def : WriteRes<WriteFSqrt32, [SiFiveP600FEXQ1, SiFiveP600FloatDiv]> {
+  let Latency = 18;
+  let ReleaseAtCycles = [1, 17];
+}
+
+// Double precision
+def : WriteRes<WriteFDiv64, [SiFiveP600FEXQ1, SiFiveP600FloatDiv]> {
+  let Latency = 11;
+  let ReleaseAtCycles = [1, 11];
+}
+def : WriteRes<WriteFSqrt64, [SiFiveP600FEXQ1, SiFiveP600FloatDiv]> {
+  let Latency = 33;
+  let ReleaseAtCycles = [1, 32];
+}
+
+// Conversions
+let Latency = 2 in {
+def : WriteRes<WriteFCvtI32ToF16, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteFCvtI32ToF32, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteFCvtI32ToF64, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteFCvtI64ToF16, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteFCvtI64ToF32, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteFCvtI64ToF64, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteFCvtF16ToI32, [SiFiveP600F2I]>;
+def : WriteRes<WriteFCvtF16ToI64, [SiFiveP600F2I]>;
+def : WriteRes<WriteFCvtF16ToF32, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFCvtF16ToF64, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFCvtF32ToI32, [SiFiveP600F2I]>;
+def : WriteRes<WriteFCvtF32ToI64, [SiFiveP600F2I]>;
+def : WriteRes<WriteFCvtF32ToF16, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFCvtF32ToF64, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFCvtF64ToI32, [SiFiveP600F2I]>;
+def : WriteRes<WriteFCvtF64ToI64, [SiFiveP600F2I]>;
+def : WriteRes<WriteFCvtF64ToF16, [SiFiveP600FloatArith]>;
+def : WriteRes<WriteFCvtF64ToF32, [SiFiveP600FloatArith]>;
+
+def : WriteRes<WriteFClass16, [SiFiveP600F2I]>;
+def : WriteRes<WriteFClass32, [SiFiveP600F2I]>;
+def : WriteRes<WriteFClass64, [SiFiveP600F2I]>;
+def : WriteRes<WriteFCmp16, [SiFiveP600F2I]>;
+def : WriteRes<WriteFCmp32, [SiFiveP600F2I]>;
+def : WriteRes<WriteFCmp64, [SiFiveP600F2I]>;
+def : WriteRes<WriteFMovI16ToF16, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteFMovF16ToI16, [SiFiveP600F2I]>;
+def : WriteRes<WriteFMovI32ToF32, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteFMovF32ToI32, [SiFiveP600F2I]>;
+def : WriteRes<WriteFMovI64ToF64, [SiFiveP600MulI2F]>;
+def : WriteRes<WriteFMovF64ToI64, [SiFiveP600F2I]>;
+}
+
+// 6. Configuration-Setting Instructions
+def : WriteRes<WriteVSETVLI, [SiFiveP600SYS]>;
+def : WriteRes<WriteVSETIVLI, [SiFiveP600SYS]>;
+def : WriteRes<WriteVSETVL, [SiFiveP600SYS]>;
+
+// 7. Vector Loads and Stores
+// FIXME: This unit is still being improved, currently
+// it is based on stage numbers. Estimates are optimistic,
+// latency may be longer.
+foreach mx = SchedMxList in {
+  defvar LMulLat = SiFiveP600GetLMulCycles<mx>.c;
+  defvar IsWorstCase = SiFiveP600IsWorstCaseMX<mx, SchedMxList>.c;
+  let Latency = 8, ReleaseAtCycles = [LMulLat] in {
+    defm "" : LMULWriteResMX<"WriteVLDE",    [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDM",    [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDFF",   [SiFiveP600VLD], mx, IsWorstCase>;
+  }
+  let Latency = 12, ReleaseAtCycles = [LMulLat] in {
+    defm "" : LMULWriteResMX<"WriteVLDS8",   [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDS16",  [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDS32",  [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDS64",  [SiFiveP600VLD], mx, IsWorstCase>;
+  }
+  let Latency = 12, ReleaseAtCycles = [LMulLat] in {
+    defm "" : LMULWriteResMX<"WriteVLDUX8",  [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDUX16", [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDUX32", [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDUX64", [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDOX8",  [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDOX16", [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDOX32", [SiFiveP600VLD], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDOX64", [SiFiveP600VLD], mx, IsWorstCase>;
+  }
+}
+
+foreach mx = SchedMxList in {
+  defvar LMulLat = SiFiveP600GetLMulCycles<mx>.c;
+  defvar IsWorstCase = SiFiveP600IsWorstCaseMX<mx, SchedMxList>.c;
+  let Latency = 8, ReleaseAtCycles = [LMulLat] in {
+    defm "" : LMULWriteResMX<"WriteVSTE",    [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTM",    [SiFiveP600VST], mx, IsWorstCase>;
+  }
+  let Latency = 12, ReleaseAtCycles = [LMulLat] in {
+    defm "" : LMULWriteResMX<"WriteVSTS8",   [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTS16",  [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTS32",  [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTS64",  [SiFiveP600VST], mx, IsWorstCase>;
+  }
+  let Latency = 12, ReleaseAtCycles = [LMulLat] in {
+    defm "" : LMULWriteResMX<"WriteVSTUX8",  [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTUX16", [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTUX32", [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTUX64", [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTOX8",  [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTOX16", [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTOX32", [SiFiveP600VST], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTOX64", [SiFiveP600VST], mx, IsWorstCase>;
+  }
+}
+
+foreach mx = SchedMxList in {
+  foreach nf=2-8 in {
+    foreach eew = [8, 16, 32, 64] in {
+      defvar LMulLat = SiFiveP600GetCyclesSegmented<mx, eew, nf>.c;
+      defvar IsWorstCase = SiFiveP600IsWorstCaseMX<mx, SchedMxList>.c;
+      let Latency = !add(12, LMulLat), ReleaseAtCycles = [!add(12, LMulLat)] in {
+        defm "" : LMULWriteResMX<"WriteVLSEG" # nf # "e" # eew,   [SiFiveP600VLD], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVLSEGFF" # nf # "e" # eew, [SiFiveP600VLD], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVLSSEG" # nf # "e" # eew,  [SiFiveP600VLD], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVLUXSEG" # nf # "e" # eew, [SiFiveP600VLD], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVLOXSEG" # nf # "e" # eew, [SiFiveP600VLD], mx, IsWorstCase>;
+      }
+      let Latency = !add(1, LMulLat), ReleaseAtCycles = [!add(12, LMulLat)] in {
+        defm "" : LMULWriteResMX<"WriteVSSEG" # nf # "e" # eew,   [SiFiveP600VST], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVSSSEG" # nf # "e" # eew,  [SiFiveP600VST], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVSUXSEG" # nf # "e" # eew, [SiFiveP600VST], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVSOXSEG" # nf # "e" # eew, [SiFiveP600VST], mx, IsWorstCase>;
+      }
+    }
+  }
+}
+
+// Whole register move/load/store
+foreach LMul = [1, 2, 4, 8] in {
+  let Latency = 8, ReleaseAtCycles = [LMul] in {
+    def : WriteRes<!cast<SchedWrite>("WriteVLD" # LMul # "R"), [SiFiveP600VLD]>;
+    def : WriteRes<!cast<SchedWrite>("WriteVST" # LMul # "R"), [SiFiveP600VST]>;
+  }
+  let Latency = LMul, ReleaseAtCycles = [LMul] in {
+    def : WriteRes<!cast<SchedWrite>("WriteVMov" # LMul # "V"), [SiFiveP600VectorArith]>;
+  }
+}
+
+// 11. Vector Integer Arithmetic Instructions
+foreach mx = SchedMxList in {
+  defvar LMulLat = SiFiveP600GetLMulCycles<mx>.c;
+  defvar IsWorstCase = SiFiveP600IsWorstCaseMX<mx, SchedMxList>.c;
+  let Latency = 1, ReleaseAtCycles = [LMulLat] in {
+    defm "" : LMULWriteResMX<"WriteVIALUV",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIALUX",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIALUI",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVExtV",    [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVICALUV",  [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVICALUX",  [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVICALUI",  [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVICmpV",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVICmpX",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVICmpI",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMergeV", [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMergeX", [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMergeI", [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMovV",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMovX",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMovI",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+  }
+  let Latency = 6, ReleaseAtCycles = [LMulLat] in {
+    defm "" : LMULWriteResMX<"WriteVShiftV",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVShiftX",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVShiftI",   [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMinMaxV", [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMinMaxX", [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMulV",    [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMulX",    [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMulAddV", [SiFiveP600VectorArith], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIMulAddX", [SiFiveP600VectorArith], mx, IsWorstCase>;
+  }
+}
+// Widening
+foreach mx = SchedMxListW in {
+  defvar LMulLat = SiFiveP600GetLMulCycles<mx>.c;
+  defvar IsWorstCase = SiFiveP600IsWorstCaseM...
[truncated]
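
For readers skimming the diff, the LMUL-based pipe occupancy used throughout the vector sections can be sketched in Python. This is only an illustrative model and not part of the patch: the function names are hypothetical, and the assumption that the largest LMUL is the last entry of the list (which is how `LargestLMUL` appears to be used here) is not shown in the diff itself.

```python
# Illustrative Python mirror of SiFiveP600GetLMulCycles and
# SiFiveP600IsWorstCaseMX from the diff above. All names are hypothetical.

# Cycles a vector pipe stays occupied: one micro-op per cycle, so whole
# LMULs take LMUL cycles and fractional LMULs (MF2/MF4/MF8) take one cycle.
LMUL_CYCLES = {"M1": 1, "M2": 2, "M4": 4, "M8": 8, "MF2": 1, "MF4": 1, "MF8": 1}


def lmul_cycles(mx: str) -> int:
    """Value the patch feeds into ReleaseAtCycles for a given LMUL."""
    return LMUL_CYCLES[mx]


def is_worst_case_mx(mx: str, mx_list: list[str]) -> bool:
    """True when mx is the largest LMUL in mx_list.

    Assumption: mx_list is ordered from smallest to largest LMUL, so the
    largest LMUL is simply its last element.
    """
    return mx == mx_list[-1]


if __name__ == "__main__":
    # Assumed ordering, for illustration only.
    mx_list = ["MF8", "MF4", "MF2", "M1", "M2", "M4", "M8"]
    for mx in mx_list:
        print(mx, lmul_cycles(mx), is_worst_case_mx(mx, mx_list))
```

Under these assumptions, a `WriteVIALUV` at `M4` would keep a vector pipe busy for 4 cycles while still reporting a latency of 1, which matches the `Latency = 1, ReleaseAtCycles = [LMulLat]` pattern in the diff.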

This changeset includes an initial scheduler model that shows improvement on spec2017
over NoSchedModel for sifive-p670. We plan on making significant changes to
this model in the future.
Collaborator

@preames preames left a comment

LGTM w/minor comments

// Latency for segmented loads and stores are calculated as vl * nf.
class SiFiveP600GetCyclesSegmented<string mx, int sew, int nf> {
defvar VLEN = 128;
defvar VLUpperBound = !cond(
Collaborator

This looks to simply be computing VLMAX. If we don't already have a shared utility for that, we should extract one.

Contributor Author

I will extract this in an NFC follow-up, since I think we use this in other SchedModels too.
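
To make the VLMAX observation concrete, here is a minimal Python sketch of the arithmetic in `SiFiveP600GetCyclesSegmented`. It is not the shared utility proposed above: the function names are hypothetical, and the only values taken from the patch are `VLEN = 128`, the per-LMUL scaling, and the `vl * nf` product.

```python
# Illustrative Python mirror of SiFiveP600GetCyclesSegmented. Function
# names are hypothetical; VLEN = 128 is the value hard-coded in the patch.

# (numerator, denominator) of each LMUL setting.
LMUL_RATIO = {
    "M1": (1, 1), "M2": (2, 1), "M4": (4, 1), "M8": (8, 1),
    "MF2": (1, 2), "MF4": (1, 4), "MF8": (1, 8),
}


def vlmax(mx: str, sew: int, vlen: int = 128) -> int:
    """Upper bound on vl: VLEN * LMUL / SEW, using integer division
    in the same order as the !mul/!div calls in the patch."""
    num, den = LMUL_RATIO[mx]
    return (vlen * num // den) // sew


def segmented_cycles(mx: str, sew: int, nf: int, vlen: int = 128) -> int:
    """Cycle estimate for a segmented access: VLMAX * nf."""
    return vlmax(mx, sew, vlen) * nf
```

For example, `segmented_cycles("M1", 32, 4)` is 4 * 4 = 16, and the patch adds 12 to that value to form the segmented load latency. Note that integer division makes some combinations collapse to zero, for instance `MF8` with a 64-bit element width.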

def SiFiveP600VEXQ1 : ProcResource<1>;
def SiFiveP600VectorArith : ProcResGroup<[SiFiveP600VEXQ0, SiFiveP600VEXQ1]>;

// In Baler has 2 pipeline for Load and Store.
Contributor

Sorry for my bad English, but I don't know what In Baler means...

Contributor

Baler handles vector loads and stores: it classifies the index/mask/segment/stride instruction types and then transfers the instructions to the LD/ST pipe. It also collects the vector load/store results and bypasses them to the vector unit.

Contributor

So it's an internal name? Then I think maybe we should make it clear in comments?

Contributor

It's an internal name. I agree we should remove the comment.

Contributor Author

I will remove the comment.

Contributor

@wangpc-pp wangpc-pp left a comment

LGTM.

@michaelmaitland michaelmaitland merged commit c48d818 into llvm:main Mar 18, 2024
michaelmaitland added a commit that referenced this pull request Mar 18, 2024
CI checks were passing in #84962 (c48d818), but that commit caused failures
once merged because the two changes landed independently: the PR had not been
rebased on #85131. This commit fixes the problem by adding sched resources for
the integer min/max instructions from Zbb to the P600 model.