Skip to content

[MVT][TableGen] Extend Machine Value Type to uint16_t #99657

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 31, 2024

Conversation

4vtomat
Copy link
Member

@4vtomat 4vtomat commented Jul 19, 2024

RFC: https://discourse.llvm.org/t/rfc-extend-machine-value-type-from-uint8-t-to-uint16-t/80274
compile-time-tracker: https://llvm-compile-time-tracker.com/compare.php?from=4b9fab591916eec9fd1942f37afe3b137b564089&to=177d28247efe5a4d59a8d8150b4daf01e4f57d74&stat=wall-time

Currently 208 out of 256 MVTs are used, it will be run out soon, so
ultimately we need to extend the original MVT::SimpleValueType from
uint8_t to uint16_t to accomodate more types.
The MatcherTable uses unsigned char for encoding the matcher code,
so the extended MVTs are no longer fit into the table, thus we need to
use VBR to encode them as we do on others that are wider than 8 bits.

The statistics below shows the difference of "Total Array size" of the
matcher table that appears in every files:

Table                       Before     After     Change(%)
WebAssemblyGenDAGISel.inc   23576      23775     0.844
NVPTXGenDAGISel.inc         173498     173498    0
RISCVGenDAGISel.inc         2179121    2369929   8.756
AVRGenDAGISel.inc           2754       2754      0
PPCGenDAGISel.inc           163315     163617    0.185
MipsGenDAGISel.inc          47280      47447     0.353
SystemZGenDAGISel.inc       56243      56461     0.388
AArch64GenDAGISel.inc       467893     487830    4.261
MSP430GenDAGISel.inc        8069       8069      0
LoongArchGenDAGISel.inc     78928      79131     0.257
XCoreGenDAGISel.inc         3432       3432      0
BPFGenDAGISel.inc           3733       3733      0
VEGenDAGISel.inc            65174      66456     1.967
LanaiGenDAGISel.inc         2067       2067      0
X86GenDAGISel.inc           628787     636987    1.304
ARMGenDAGISel.inc           170968     171036    0.040
HexagonGenDAGISel.inc       155764     155764    0
SparcGenDAGISel.inc         5762       5798      0.625
AMDGPUGenDAGISel.inc        504356     504463    0.021
R600GenDAGISel.inc          29785      29785     0

The statistics below shows the runtime peak memory usage by compiling a
simple C program:
/bin/time -v clang -target $TARGET -O3 -c test.c

  int test(int a) {
    return a * 3;
  }
Target        Before(kbytes)    After(kbytes)    Change(%)
wasm64        110172            110088           -0.076
nvptx64       109784            109980            0.179
riscv64       114020            113656           -0.319
avr           110352            110068           -0.257
ppc64         112612            112476           -0.120
mips64        113588            113668            0.070
systemz       110860            110760           -0.090
aarch64       113704            113432           -0.239
msp430        110284            110200           -0.076
loongarch64   111052            110756           -0.267
xcore         108340            108020           -0.295
bpf           110620            110708            0.080
ve            110960            110920           -0.036
lanai         110180            109960           -0.200
x86_64        113640            113304           -0.296
arm64         113540            113172           -0.324
hexagon       114620            114684            0.056
sparc         110412            110136           -0.250
amdgcn        118164            117144           -0.863
r600          111200            110508           -0.622

@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Jul 19, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 19, 2024

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-llvm-selectiondag

Author: Brandon Wu (4vtomat)

Changes

Currently 208 out of 256 MVTs are used, it will be run out soon, so
ultimately we need to extend the original MVT::SimpleValueType from
uint8_t to uint16_t to accomodate more types.
The MatcherTable uses unsigned char for encoding the matcher code,
so the extended MVTs are no longer fit into the table, thus we need to
use VBR to encode them as we do on others that are wider than 8 bits.

The statistics below shows the difference of the generated matcher table
file in terms of "number of characters":
Table Before After Change(%)
WebAssemblyGenDAGISel.inc 393561 379244 -3.64
NVPTXGenDAGISel.inc 2559591 2430267 -5.05
RISCVGenDAGISel.inc 34077228 33338013 -2.17
AVRGenDAGISel.inc 50151 47965 -4.36
PPCGenDAGISel.inc 2504435 2392041 -4.49
MipsGenDAGISel.inc 778353 751718 -3.42
SystemZGenDAGISel.inc 896893 863890 -3.68
AArch64GenDAGISel.inc 8107608 7841939 -3.28
MSP430GenDAGISel.inc 135129 130357 -3.53
LoongArchGenDAGISel.inc 1270101 1220317 -3.92
XCoreGenDAGISel.inc 61883 60740 -1.85
BPFGenDAGISel.inc 70406 69036 -1.95
VEGenDAGISel.inc 939901 909025 -3.29
LanaiGenDAGISel.inc 39537 38571 -2.44
X86GenDAGISel.inc 10144147 9789910 -3.49
ARMGenDAGISel.inc 2538230 2431278 -4.21
HexagonGenDAGISel.inc 2553975 2470275 -3.28
SparcGenDAGISel.inc 96671 93963 -2.80
AMDGPUGenDAGISel.inc 8546226 8391452 -1.81
R600GenDAGISel.inc 397075 390725 -1.60

The statistics below shows the runtime peak memory usage by compiling a
simple C program:
/bin/time -v clang -target $TARGET -O3 -c test.c

  int test(int a) {
    return a * 3;
  }

Target Before(kbytes) After(kbytes) Change(%)
wasm64 109708 108748 -0.88
nvptx64 109892 108616 -1.16
riscv64 113492 112624 -0.76
avr 110440 109080 -1.23
ppc64 112384 111676 -0.63
mips64 113524 112568 -0.84
systemz 110864 109964 -0.81
aarch64 113736 112484 -1.10
msp430 110136 108956 -1.07
loongarch64 110784 109784 -0.90
xcore 108004 106776 -1.14
bpf 110612 109520 -0.99
ve 110976 109756 -1.10
lanai 109880 108620 -1.15
x86_64 113652 112540 -0.98
arm64 113576 112588 -0.87
hexagon 115264 113556 -1.48
sparc 110336 108792 -1.40
amdgcn 117248 116088 -0.99
r600 111060 109788 -1.15


Full diff: https://github.com/llvm/llvm-project/pull/99657.diff

6 Files Affected:

  • (modified) llvm/include/llvm/CodeGenTypes/MachineValueType.h (+1-1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp (+50-19)
  • (modified) llvm/test/TableGen/dag-isel-regclass-emit-enum.td (+2-2)
  • (modified) llvm/utils/TableGen/Common/CodeGenDAGPatterns.h (+4-4)
  • (modified) llvm/utils/TableGen/DAGISelMatcherEmitter.cpp (+40-26)
  • (modified) llvm/utils/TableGen/VTEmitter.cpp (+2-2)
diff --git a/llvm/include/llvm/CodeGenTypes/MachineValueType.h b/llvm/include/llvm/CodeGenTypes/MachineValueType.h
index 7e7608c34fb70..556531c3147e5 100644
--- a/llvm/include/llvm/CodeGenTypes/MachineValueType.h
+++ b/llvm/include/llvm/CodeGenTypes/MachineValueType.h
@@ -33,7 +33,7 @@ namespace llvm {
   /// type can be represented by an MVT.
   class MVT {
   public:
-    enum SimpleValueType : uint8_t {
+    enum SimpleValueType : uint16_t {
       // Simple value types that aren't explicitly part of this enumeration
       // are considered extended value types.
       INVALID_SIMPLE_VALUE_TYPE = 0,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
index df3d207d85d35..63612ffdc9397 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
@@ -2896,8 +2896,10 @@ CheckChild2CondCode(const unsigned char *MatcherTable, unsigned &MatcherIndex,
 LLVM_ATTRIBUTE_ALWAYS_INLINE static bool
 CheckValueType(const unsigned char *MatcherTable, unsigned &MatcherIndex,
                SDValue N, const TargetLowering *TLI, const DataLayout &DL) {
-  MVT::SimpleValueType VT =
-      static_cast<MVT::SimpleValueType>(MatcherTable[MatcherIndex++]);
+  unsigned SimpleVT = MatcherTable[MatcherIndex++];
+  if (SimpleVT & 128)
+    SimpleVT = GetVBR(SimpleVT, MatcherTable, MatcherIndex);
+  MVT::SimpleValueType VT = static_cast<MVT::SimpleValueType>(SimpleVT);
   if (cast<VTSDNode>(N)->getVT() == VT)
     return true;
 
@@ -3027,7 +3029,10 @@ static unsigned IsPredicateKnownToFail(const unsigned char *Table,
       VT = MVT::i64;
       break;
     default:
-      VT = static_cast<MVT::SimpleValueType>(Table[Index++]);
+      unsigned SimpleVT = Table[Index++];
+      if (SimpleVT & 128)
+        SimpleVT = GetVBR(SimpleVT, Table, Index);
+      VT = static_cast<MVT::SimpleValueType>(SimpleVT);
       break;
     }
     Result = !::CheckType(VT, N, SDISel.TLI, SDISel.CurDAG->getDataLayout());
@@ -3035,7 +3040,10 @@ static unsigned IsPredicateKnownToFail(const unsigned char *Table,
   }
   case SelectionDAGISel::OPC_CheckTypeRes: {
     unsigned Res = Table[Index++];
-    Result = !::CheckType(static_cast<MVT::SimpleValueType>(Table[Index++]),
+    unsigned SimpleVT = Table[Index++];
+    if (SimpleVT & 128)
+      SimpleVT = GetVBR(SimpleVT, Table, Index);
+    Result = !::CheckType(static_cast<MVT::SimpleValueType>(SimpleVT),
                           N.getValue(Res), SDISel.TLI,
                           SDISel.CurDAG->getDataLayout());
     return Index;
@@ -3075,7 +3083,10 @@ static unsigned IsPredicateKnownToFail(const unsigned char *Table,
       VT = MVT::i64;
       ChildNo = Opcode - SelectionDAGISel::OPC_CheckChild0TypeI64;
     } else {
-      VT = static_cast<MVT::SimpleValueType>(Table[Index++]);
+      unsigned SimpleVT = Table[Index++];
+      if (SimpleVT & 128)
+        SimpleVT = GetVBR(SimpleVT, Table, Index);
+      VT = static_cast<MVT::SimpleValueType>(SimpleVT);
       ChildNo = Opcode - SelectionDAGISel::OPC_CheckChild0Type;
     }
     Result = !::CheckChildType(VT, N, SDISel.TLI,
@@ -3579,7 +3590,10 @@ void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch,
         VT = MVT::i64;
         break;
       default:
-        VT = static_cast<MVT::SimpleValueType>(MatcherTable[MatcherIndex++]);
+        unsigned SimpleVT = MatcherTable[MatcherIndex++];
+        if (SimpleVT & 128)
+          SimpleVT = GetVBR(SimpleVT, MatcherTable, MatcherIndex);
+        VT = static_cast<MVT::SimpleValueType>(SimpleVT);
         break;
       }
       if (!::CheckType(VT, N, TLI, CurDAG->getDataLayout()))
@@ -3588,9 +3602,11 @@ void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch,
 
     case OPC_CheckTypeRes: {
       unsigned Res = MatcherTable[MatcherIndex++];
-      if (!::CheckType(
-              static_cast<MVT::SimpleValueType>(MatcherTable[MatcherIndex++]),
-              N.getValue(Res), TLI, CurDAG->getDataLayout()))
+      unsigned SimpleVT = MatcherTable[MatcherIndex++];
+      if (SimpleVT & 128)
+        SimpleVT = GetVBR(SimpleVT, MatcherTable, MatcherIndex);
+      if (!::CheckType(static_cast<MVT::SimpleValueType>(SimpleVT),
+                       N.getValue(Res), TLI, CurDAG->getDataLayout()))
         break;
       continue;
     }
@@ -3629,7 +3645,7 @@ void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch,
     case OPC_SwitchType: {
       MVT CurNodeVT = N.getSimpleValueType();
       unsigned SwitchStart = MatcherIndex-1; (void)SwitchStart;
-      unsigned CaseSize;
+      unsigned CaseSize, CaseSimpleVT;
       while (true) {
         // Get the size of this case.
         CaseSize = MatcherTable[MatcherIndex++];
@@ -3637,8 +3653,10 @@ void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch,
           CaseSize = GetVBR(CaseSize, MatcherTable, MatcherIndex);
         if (CaseSize == 0) break;
 
-        MVT CaseVT =
-            static_cast<MVT::SimpleValueType>(MatcherTable[MatcherIndex++]);
+        CaseSimpleVT = MatcherTable[MatcherIndex++];
+        if (CaseSimpleVT & 128)
+          CaseSimpleVT = GetVBR(CaseSimpleVT, MatcherTable, MatcherIndex);
+        MVT CaseVT = static_cast<MVT::SimpleValueType>(CaseSimpleVT);
         if (CaseVT == MVT::iPTR)
           CaseVT = TLI->getPointerTy(CurDAG->getDataLayout());
 
@@ -3694,7 +3712,10 @@ void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch,
         VT = MVT::i64;
         ChildNo = Opcode - SelectionDAGISel::OPC_CheckChild0TypeI64;
       } else {
-        VT = static_cast<MVT::SimpleValueType>(MatcherTable[MatcherIndex++]);
+        unsigned SimpleVT = MatcherTable[MatcherIndex++];
+        if (SimpleVT & 128)
+          SimpleVT = GetVBR(SimpleVT, MatcherTable, MatcherIndex);
+        VT = static_cast<MVT::SimpleValueType>(SimpleVT);
         ChildNo = Opcode - SelectionDAGISel::OPC_CheckChild0Type;
       }
       if (!::CheckChildType(VT, N, TLI, CurDAG->getDataLayout(), ChildNo))
@@ -3788,7 +3809,10 @@ void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch,
         VT = MVT::i64;
         break;
       default:
-        VT = static_cast<MVT::SimpleValueType>(MatcherTable[MatcherIndex++]);
+        unsigned SimpleVT = MatcherTable[MatcherIndex++];
+        if (SimpleVT & 128)
+          SimpleVT = GetVBR(SimpleVT, MatcherTable, MatcherIndex);
+        VT = static_cast<MVT::SimpleValueType>(SimpleVT);
         break;
       }
       int64_t Val = MatcherTable[MatcherIndex++];
@@ -3812,7 +3836,10 @@ void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch,
         VT = MVT::i64;
         break;
       default:
-        VT = static_cast<MVT::SimpleValueType>(MatcherTable[MatcherIndex++]);
+        unsigned SimpleVT = MatcherTable[MatcherIndex++];
+        if (SimpleVT & 128)
+          SimpleVT = GetVBR(SimpleVT, MatcherTable, MatcherIndex);
+        VT = static_cast<MVT::SimpleValueType>(SimpleVT);
         break;
       }
       unsigned RegNo = MatcherTable[MatcherIndex++];
@@ -3824,8 +3851,10 @@ void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch,
       // For targets w/ more than 256 register names, the register enum
       // values are stored in two bytes in the matcher table (just like
       // opcodes).
-      MVT::SimpleValueType VT =
-          static_cast<MVT::SimpleValueType>(MatcherTable[MatcherIndex++]);
+      unsigned SimpleVT = MatcherTable[MatcherIndex++];
+      if (SimpleVT & 128)
+        SimpleVT = GetVBR(SimpleVT, MatcherTable, MatcherIndex);
+      MVT::SimpleValueType VT = static_cast<MVT::SimpleValueType>(SimpleVT);
       unsigned RegNo = MatcherTable[MatcherIndex++];
       RegNo |= MatcherTable[MatcherIndex++] << 8;
       RecordedNodes.push_back(std::pair<SDValue, SDNode*>(
@@ -4063,8 +4092,10 @@ void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch,
         NumVTs = MatcherTable[MatcherIndex++];
       SmallVector<EVT, 4> VTs;
       for (unsigned i = 0; i != NumVTs; ++i) {
-        MVT::SimpleValueType VT =
-            static_cast<MVT::SimpleValueType>(MatcherTable[MatcherIndex++]);
+        unsigned SimpleVT = MatcherTable[MatcherIndex++];
+        if (SimpleVT & 128)
+          SimpleVT = GetVBR(SimpleVT, MatcherTable, MatcherIndex);
+        MVT::SimpleValueType VT = static_cast<MVT::SimpleValueType>(SimpleVT);
         if (VT == MVT::iPTR)
           VT = TLI->getPointerTy(CurDAG->getDataLayout()).SimpleTy;
         VTs.push_back(VT);
diff --git a/llvm/test/TableGen/dag-isel-regclass-emit-enum.td b/llvm/test/TableGen/dag-isel-regclass-emit-enum.td
index ba60df2bbcf6e..0fd7fa02f064c 100644
--- a/llvm/test/TableGen/dag-isel-regclass-emit-enum.td
+++ b/llvm/test/TableGen/dag-isel-regclass-emit-enum.td
@@ -27,13 +27,13 @@ def GPRAbove127 : RegisterClass<"TestTarget", [i32], 32,
 // CHECK-NEXT: OPC_CheckChild1Integer, 0,
 // CHECK-NEXT: OPC_EmitInteger32, 0|128,2/*256*/,
 // CHECK-NEXT: OPC_MorphNodeTo1None, TARGET_VAL(TargetOpcode::COPY_TO_REGCLASS),
-// CHECK-NEXT:     MVT::i32, 2/*#Ops*/, 1, 0,
+// CHECK-NEXT:     7, 2/*#Ops*/, 1, 0,
 def : Pat<(i32 (add i32:$src, (i32 0))),
           (COPY_TO_REGCLASS GPRAbove127, GPR0:$src)>;
 
 // CHECK:      OPC_CheckChild1Integer, 2,
 // CHECK-NEXT: OPC_EmitStringInteger32, TestNamespace::GPR127RegClassID,
 // CHECK-NEXT: OPC_MorphNodeTo1None, TARGET_VAL(TargetOpcode::COPY_TO_REGCLASS),
-// CHECK-NEXT:     MVT::i32, 2/*#Ops*/, 1, 0,
+// CHECK-NEXT:     7, 2/*#Ops*/, 1, 0,
 def : Pat<(i32 (add i32:$src, (i32 1))),
           (COPY_TO_REGCLASS GPR127, GPR0:$src)>;
diff --git a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h
index 1f4d45d81fd33..bac213a356d84 100644
--- a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h
+++ b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.h
@@ -48,15 +48,15 @@ class CodeGenDAGPatterns;
 using TreePatternNodePtr = IntrusiveRefCntPtr<TreePatternNode>;
 
 /// This represents a set of MVTs. Since the underlying type for the MVT
-/// is uint8_t, there are at most 256 values. To reduce the number of memory
+/// is uint16_t, there are at most 65536 values. To reduce the number of memory
 /// allocations and deallocations, represent the set as a sequence of bits.
 /// To reduce the allocations even further, make MachineValueTypeSet own
 /// the storage and use std::array as the bit container.
 struct MachineValueTypeSet {
   static_assert(std::is_same<std::underlying_type_t<MVT::SimpleValueType>,
-                             uint8_t>::value,
-                "Change uint8_t here to the SimpleValueType's type");
-  static unsigned constexpr Capacity = std::numeric_limits<uint8_t>::max() + 1;
+                             uint16_t>::value,
+                "Change uint16_t here to the SimpleValueType's type");
+  static unsigned constexpr Capacity = std::numeric_limits<uint16_t>::max() + 1;
   using WordType = uint64_t;
   static unsigned constexpr WordWidth = CHAR_BIT * sizeof(WordType);
   static unsigned constexpr NumWords = Capacity / WordWidth;
diff --git a/llvm/utils/TableGen/DAGISelMatcherEmitter.cpp b/llvm/utils/TableGen/DAGISelMatcherEmitter.cpp
index 229245ff40504..178ceb1443335 100644
--- a/llvm/utils/TableGen/DAGISelMatcherEmitter.cpp
+++ b/llvm/utils/TableGen/DAGISelMatcherEmitter.cpp
@@ -339,7 +339,8 @@ unsigned MatcherTableEmitter::SizeMatcher(Matcher *N, raw_ostream &OS) {
         Size += 2; // Count the child's opcode.
       } else {
         Child = cast<SwitchTypeMatcher>(N)->getCaseMatcher(i);
-        ++Size; // Count the child's type.
+        Size += GetVBRSize(cast<SwitchTypeMatcher>(N)->getCaseType(
+            i)); // Count the child's type.
       }
       const unsigned ChildSize = SizeMatcherList(Child, OS);
       assert(ChildSize != 0 && "Matcher cannot have child of size 0");
@@ -599,7 +600,8 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
         IdxSize = 2; // size of opcode in table is 2 bytes.
       } else {
         Child = cast<SwitchTypeMatcher>(N)->getCaseMatcher(i);
-        IdxSize = 1; // size of type in table is 1 byte.
+        IdxSize = GetVBRSize(cast<SwitchTypeMatcher>(N)->getCaseType(
+            i)); // size of type in table is sizeof(MVT) byte.
       }
 
       if (i != 0) {
@@ -616,7 +618,8 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
       if (const SwitchOpcodeMatcher *SOM = dyn_cast<SwitchOpcodeMatcher>(N))
         OS << "TARGET_VAL(" << SOM->getCaseOpcode(i).getEnumName() << "),";
       else
-        OS << getEnumName(cast<SwitchTypeMatcher>(N)->getCaseType(i)) << ',';
+        EmitVBRValue(cast<SwitchTypeMatcher>(N)->getCaseType(i),
+                     OS); // size of type in table is sizeof(MVT) byte.
       if (!OmitComments)
         OS << "// ->" << CurrentIdx + ChildSize;
       OS << '\n';
@@ -639,7 +642,7 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
     return CurrentIdx - StartIdx + 1;
   }
 
-  case Matcher::CheckType:
+  case Matcher::CheckType: {
     if (cast<CheckTypeMatcher>(N)->getResNo() == 0) {
       MVT::SimpleValueType VT = cast<CheckTypeMatcher>(N)->getType();
       switch (VT) {
@@ -648,13 +651,17 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
         OS << "OPC_CheckTypeI" << MVT(VT).getSizeInBits() << ",\n";
         return 1;
       default:
-        OS << "OPC_CheckType, " << getEnumName(VT) << ",\n";
-        return 2;
+        OS << "OPC_CheckType, ";
+        unsigned NumBytes = EmitVBRValue(VT, OS);
+        OS << "\n";
+        return NumBytes + 1;
       }
     }
-    OS << "OPC_CheckTypeRes, " << cast<CheckTypeMatcher>(N)->getResNo() << ", "
-       << getEnumName(cast<CheckTypeMatcher>(N)->getType()) << ",\n";
-    return 3;
+    OS << "OPC_CheckTypeRes, " << cast<CheckTypeMatcher>(N)->getResNo() << ", ";
+    unsigned NumBytes = EmitVBRValue(cast<CheckTypeMatcher>(N)->getType(), OS);
+    OS << "\n";
+    return NumBytes + 2;
+  }
 
   case Matcher::CheckChildType: {
     MVT::SimpleValueType VT = cast<CheckChildTypeMatcher>(N)->getType();
@@ -666,8 +673,10 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
       return 1;
     default:
       OS << "OPC_CheckChild" << cast<CheckChildTypeMatcher>(N)->getChildNo()
-         << "Type, " << getEnumName(VT) << ",\n";
-      return 2;
+         << "Type, ";
+      unsigned NumBytes = EmitVBRValue(VT, OS);
+      OS << "\n";
+      return NumBytes + 1;
     }
   }
 
@@ -696,10 +705,13 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
        << cast<CheckChild2CondCodeMatcher>(N)->getCondCodeName() << ",\n";
     return 2;
 
-  case Matcher::CheckValueType:
-    OS << "OPC_CheckValueType, "
-       << getEnumName(cast<CheckValueTypeMatcher>(N)->getVT()) << ",\n";
-    return 2;
+  case Matcher::CheckValueType: {
+    OS << "OPC_CheckValueType, ";
+    unsigned NumBytes =
+        EmitVBRValue(cast<CheckValueTypeMatcher>(N)->getVT(), OS);
+    OS << "\n";
+    return NumBytes + 1;
+  }
 
   case Matcher::CheckComplexPat: {
     const CheckComplexPatMatcher *CCPM = cast<CheckComplexPatMatcher>(N);
@@ -766,8 +778,8 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
       OS << "OPC_EmitInteger" << MVT(VT).getSizeInBits() << ", ";
       break;
     default:
-      OpBytes = 2;
-      OS << "OPC_EmitInteger, " << getEnumName(VT) << ", ";
+      OS << "OPC_EmitInteger, ";
+      OpBytes = EmitVBRValue(VT, OS) + 1;
       break;
     }
     unsigned Bytes = OpBytes + EmitSignedVBRValue(Val, OS);
@@ -785,8 +797,8 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
       OS << "OPC_EmitStringInteger" << MVT(VT).getSizeInBits() << ", ";
       break;
     default:
-      OpBytes = 2;
-      OS << "OPC_EmitStringInteger, " << getEnumName(VT) << ", ";
+      OS << "OPC_EmitStringInteger, ";
+      OpBytes = EmitVBRValue(VT, OS) + 1;
       break;
     }
     OS << Val << ",\n";
@@ -797,14 +809,15 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
     const EmitRegisterMatcher *Matcher = cast<EmitRegisterMatcher>(N);
     const CodeGenRegister *Reg = Matcher->getReg();
     MVT::SimpleValueType VT = Matcher->getVT();
+    unsigned OpBytes;
     // If the enum value of the register is larger than one byte can handle,
     // use EmitRegister2.
     if (Reg && Reg->EnumValue > 255) {
-      OS << "OPC_EmitRegister2, " << getEnumName(VT) << ", ";
+      OS << "OPC_EmitRegister2, ";
+      OpBytes = EmitVBRValue(VT, OS);
       OS << "TARGET_VAL(" << getQualifiedName(Reg->TheDef) << "),\n";
-      return 4;
+      return OpBytes + 3;
     }
-    unsigned OpBytes;
     switch (VT) {
     case MVT::i32:
     case MVT::i64:
@@ -812,8 +825,8 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
       OS << "OPC_EmitRegisterI" << MVT(VT).getSizeInBits() << ", ";
       break;
     default:
-      OpBytes = 2;
-      OS << "OPC_EmitRegister, " << getEnumName(VT) << ", ";
+      OS << "OPC_EmitRegister, ";
+      OpBytes = EmitVBRValue(VT, OS) + 1;
       break;
     }
     if (Reg) {
@@ -958,8 +971,9 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
         OS << "/*#VTs*/";
       OS << ", ";
     }
+    unsigned NumTypeBytes = 0;
     for (unsigned i = 0, e = EN->getNumVTs(); i != e; ++i)
-      OS << getEnumName(EN->getVT(i)) << ", ";
+      NumTypeBytes += EmitVBRValue(EN->getVT(i), OS);
 
     OS << EN->getNumOperands();
     if (!OmitComments)
@@ -992,7 +1006,7 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
     } else
       OS << '\n';
 
-    return 4 + !CompressVTs + !CompressNodeInfo + EN->getNumVTs() +
+    return 4 + !CompressVTs + !CompressNodeInfo + NumTypeBytes +
            NumOperandBytes + NumCoveredBytes;
   }
   case Matcher::CompleteMatch: {
diff --git a/llvm/utils/TableGen/VTEmitter.cpp b/llvm/utils/TableGen/VTEmitter.cpp
index 851a525cf48c0..77d3514b4c22c 100644
--- a/llvm/utils/TableGen/VTEmitter.cpp
+++ b/llvm/utils/TableGen/VTEmitter.cpp
@@ -79,12 +79,12 @@ static void VTtoGetLLVMTyString(raw_ostream &OS, const Record *VT) {
 void VTEmitter::run(raw_ostream &OS) {
   emitSourceFileHeader("ValueTypes Source Fragment", OS, Records);
 
-  std::array<const Record *, 256> VTsByNumber = {};
+  std::array<const Record *, 65536> VTsByNumber = {};
   auto ValueTypes = Records.getAllDerivedDefinitions("ValueType");
   for (auto *VT : ValueTypes) {
     auto Number = VT->getValueAsInt("Value");
     assert(0 <= Number && Number < (int)VTsByNumber.size() &&
-           "ValueType should be uint8_t");
+           "ValueType should be uint16_t");
     assert(!VTsByNumber[Number] && "Duplicate ValueType");
     VTsByNumber[Number] = VT;
   }

@4vtomat 4vtomat requested review from arsenm, kito-cheng and topperc July 19, 2024 15:06
@topperc
Copy link
Collaborator

topperc commented Jul 19, 2024

The statistics below shows the difference of the generated matcher table
file in terms of "number of characters":

Please use the "Total Array size is X bytes" that appears in every file.

@4vtomat
Copy link
Member Author

4vtomat commented Jul 19, 2024

The statistics below shows the difference of the generated matcher table
file in terms of "number of characters":

Please use the "Total Array size is X bytes" that appears in every file.

Oh actually it's the "total size of the file", I just notice that there's "total array size" in each file, let me change it.

@wangpc-pp
Copy link
Contributor

Why are there decreases of total bytes and memory usage? I suppose it should increase?
And please print a comment about the real MVT name since we miss it after VBR encoding.

@4vtomat 4vtomat force-pushed the extend_mvt_to_16_bits branch from 7db6e8c to b470e33 Compare July 19, 2024 16:34
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Jul 19, 2024
@4vtomat 4vtomat force-pushed the extend_mvt_to_16_bits branch from b470e33 to f944eef Compare July 19, 2024 16:36
@4vtomat
Copy link
Member Author

4vtomat commented Jul 19, 2024

Why are there decreases of total bytes and memory usage? I suppose it should increase?
And please print a comment about the real MVT name since we miss it after VBR encoding.

Total bytes of array actually increases especially for RISCV. For runtime memory, I think it's maybe because SelectionDAG only exists for very short period of time and the memory usage of SelectionDAG is not the bottleneck in the first place?

Currently 208 out of 256 MVTs are used, it will be run out soon, so
ultimately we need to extend the original `MVT::SimpleValueType` from
`uint8_t` to `uint16_t` to accomodate more types.
The `MatcherTable` uses `unsigned char` for encoding the matcher code,
so the extended MVTs are no longer fit into the table, thus we need to
use VBR to encode them as we do on others that are wider than 8 bits.

The statistics below shows the difference of "Total Array size" of the
matcher table that appears in every files:
Table                       Before     After     Change(%)
WebAssemblyGenDAGISel.inc   23576      23775     0.844
NVPTXGenDAGISel.inc         173498     173498    0
RISCVGenDAGISel.inc         2179121    2369929   8.756
AVRGenDAGISel.inc           2754       2754      0
PPCGenDAGISel.inc           163315     163617    0.185
MipsGenDAGISel.inc          47280      47447     0.353
SystemZGenDAGISel.inc       56243      56461     0.388
AArch64GenDAGISel.inc       467893     487830    4.261
MSP430GenDAGISel.inc        8069       8069      0
LoongArchGenDAGISel.inc     78928      79131     0.257
XCoreGenDAGISel.inc         3432       3432      0
BPFGenDAGISel.inc           3733       3733      0
VEGenDAGISel.inc            65174      66456     1.967
LanaiGenDAGISel.inc         2067       2067      0
X86GenDAGISel.inc           628787     636987    1.304
ARMGenDAGISel.inc           170968     171036    0.040
HexagonGenDAGISel.inc       155764     155764    0
SparcGenDAGISel.inc         5762       5798      0.625
AMDGPUGenDAGISel.inc        504356     504463    0.021
R600GenDAGISel.inc          29785      29785     0

The statistics below shows the runtime peak memory usage by compiling a
simple C program:
`/bin/time -v clang -target $TARGET -O3 -c test.c`
```
  int test(int a) {
    return a * 3;
  }
```
Target        Before(kbytes)    After(kbytes)    Change(%)
wasm64        110172            110088           -0.076
nvptx64       109784            109980            0.179
riscv64       114020            113656           -0.319
avr           110352            110068           -0.257
ppc64         112612            112476           -0.120
mips64        113588            113668            0.070
systemz       110860            110760           -0.090
aarch64       113704            113432           -0.239
msp430        110284            110200           -0.076
loongarch64   111052            110756           -0.267
xcore         108340            108020           -0.295
bpf           110620            110708            0.080
ve            110960            110920           -0.036
lanai         110180            109960           -0.200
x86_64        113640            113304           -0.296
arm64         113540            113172           -0.324
hexagon       114620            114684            0.056
sparc         110412            110136           -0.250
amdgcn        118164            117144           -0.863
r600          111200            110508           -0.622
@4vtomat 4vtomat force-pushed the extend_mvt_to_16_bits branch from f944eef to 177d282 Compare July 19, 2024 18:34
@clementval clementval removed flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Jul 19, 2024
@wangpc-pp
Copy link
Contributor

I think the community needs to be notified by this change (the impact is big), please file a RFC on the discourse forum. :-)

@4vtomat
Copy link
Member Author

4vtomat commented Jul 22, 2024

I think the community needs to be notified by this change (the impact is big), please file a RFC on the discourse forum. :-)

Sure, I'm cooking on this!

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@4vtomat Please can you get a https://llvm-compile-time-tracker.com/ report on the effect on compile time as well please?

@wangpc-pp
Copy link
Contributor

wangpc-pp commented Jul 22, 2024

Another NOTE: after this patch, we should reorder the MVT values and move top-128-uses MVTs ahead to reduce MatcherTable size.

@4vtomat
Copy link
Member Author

4vtomat commented Jul 22, 2024

@4vtomat Please can you get a https://llvm-compile-time-tracker.com/ report on the effect on compile time as well please?

Sure, here it is: https://llvm-compile-time-tracker.com/compare.php?from=4b9fab591916eec9fd1942f37afe3b137b564089&to=177d28247efe5a4d59a8d8150b4daf01e4f57d74&stat=wall-time.

@4vtomat
Copy link
Member Author

4vtomat commented Jul 22, 2024

Another NOTE: after this patch, we should reorder the MVT values and move top-128-uses MVTs ahead to reduce MatcherTable size.

You are right, maybe we need to collect some data and see what the order should be.

@topperc
Copy link
Collaborator

topperc commented Jul 23, 2024

Should we move these to the end of the 16-bit space which will change the sizes you measured. Or maybe put them near 14-bits so they stay as 2 byte VBRs?

def token      : ValueType<0, 248>;  // TokenTy                                  
def MetadataVT : ValueType<0, 249> { // Metadata                                 
  let LLVMName = "Metadata";                                                     
}                                                                                
                                                                                 
// Pseudo valuetype mapped to the current pointer size to any address space.     
// Should only be used in TableGen.                                              
def iPTRAny    : VTAny<250>;                                                     
                                                                                 
// Pseudo valuetype to represent "vector of any size"                            
// Should only be used in TableGen.                                              
def vAny       : VTAny<251>;                                                     
                                                                                 
// Pseudo valuetype to represent "float of any format"                           
// Should only be used in TableGen.                                              
def fAny       : VTAny<252>;                                                     
                                                                                 
// Pseudo valuetype to represent "integer of any bit width"                      
// Should only be used in TableGen.                                              
def iAny       : VTAny<253>;                                                     
                                                                                 
// Pseudo valuetype mapped to the current pointer size.                          
// Should only be used in TableGen.                                              
def iPTR       : ValueType<0, 254>;                                              
                                                                                 
// Pseudo valuetype to represent "any type of any size".                         
// Should only be used in TableGen.                                              
def Any        : VTAny<255>;

@topperc
Copy link
Collaborator

topperc commented Jul 23, 2024

Another NOTE: after this patch, we should reorder the MVT values and move top-128-uses MVTs ahead to reduce MatcherTable size.

Aren't they grouped in ranges?

@4vtomat
Copy link
Member Author

4vtomat commented Jul 23, 2024

Should we move these to the end of the 16-bit space which will change the sizes you measured. Or maybe put them near 14-bits so they stay as 2 byte VBRs?

def token      : ValueType<0, 248>;  // TokenTy                                  
def MetadataVT : ValueType<0, 249> { // Metadata                                 
  let LLVMName = "Metadata";                                                     
}                                                                                
                                                                                 
// Pseudo valuetype mapped to the current pointer size to any address space.     
// Should only be used in TableGen.                                              
def iPTRAny    : VTAny<250>;                                                     
                                                                                 
// Pseudo valuetype to represent "vector of any size"                            
// Should only be used in TableGen.                                              
def vAny       : VTAny<251>;                                                     
                                                                                 
// Pseudo valuetype to represent "float of any format"                           
// Should only be used in TableGen.                                              
def fAny       : VTAny<252>;                                                     
                                                                                 
// Pseudo valuetype to represent "integer of any bit width"                      
// Should only be used in TableGen.                                              
def iAny       : VTAny<253>;                                                     
                                                                                 
// Pseudo valuetype mapped to the current pointer size.                          
// Should only be used in TableGen.                                              
def iPTR       : ValueType<0, 254>;                                              
                                                                                 
// Pseudo valuetype to represent "any type of any size".                         
// Should only be used in TableGen.                                              
def Any        : VTAny<255>;

Yea, I guess I can move it to align 2^14 so that the size won't change.

@4vtomat
Copy link
Member Author

4vtomat commented Jul 23, 2024

Another NOTE: after this patch, we should reorder the MVT values and move top-128-uses MVTs ahead to reduce MatcherTable size.

Aren't they grouped in ranges?

I think @wangpc-pp means reorder the group?

@topperc
Copy link
Collaborator

topperc commented Jul 23, 2024

Why are there decreases of total bytes and memory usage? I suppose it should increase?
And please print a comment about the real MVT name since we miss it after VBR encoding.

Total bytes of array actually increases especially for RISCV. For runtime memory, I think it's maybe because SelectionDAG only exists for very short period of time and the memory usage of SelectionDAG is not the bottleneck in the first place?

I don't think patch will have much if any effect on the runtime memory usage SelectionDAG. SelectionDAG runtime memory usage is more directly affected by MVT::VALUETYPE_SIZE which is not changed by this patch.

@topperc
Copy link
Collaborator

topperc commented Jul 23, 2024

Another NOTE: after this patch, we should reorder the MVT values and move top-128-uses MVTs ahead to reduce MatcherTable size.

Aren't they grouped in ranges?

I think @wangpc-pp means reorder the group?

Yes, but I don't think that will be easy. There are quite a few places that loop over all integer types or all vector types by assuming they are in continguous ranges.

@4vtomat
Copy link
Member Author

4vtomat commented Jul 23, 2024

Why are there decreases of total bytes and memory usage? I suppose it should increase?
And please print a comment about the real MVT name since we miss it after VBR encoding.

Total bytes of array actually increases especially for RISCV. For runtime memory, I think it's maybe because SelectionDAG only exists for very short period of time and the memory usage of SelectionDAG is not the bottleneck in the first place?

I don't think patch will have much if any effect on the runtime memory usage SelectionDAG. SelectionDAG runtime memory usage is more directly affected by MVT::VALUETYPE_SIZE which is not changed by this patch.

Correct, I just changed MVT::VALUETYPE_SIZE to 2^16 - 1 and measure the memory again, it increase to over 100 times bigger.

@wangpc-pp
Copy link
Contributor

Another NOTE: after this patch, we should reorder the MVT values and move top-128-uses MVTs ahead to reduce MatcherTable size.

Aren't they grouped in ranges?

I think @wangpc-pp means reorder the group?

Yes! I was thinking about reordering these groups. And further, we can reorder all MVTs one by one.

There are quite a few places that loop over all integer types or all vector types by assuming they are in continguous ranges.

This assumption is weak, we should change it. We can remove these MVT::FIRST_XXX and MVT::LAST_XXX, and generate arrays for these MVTs with same attributes.

Yes, but I don't think that will be easy.

Agree! But for better maintainability and extendibility, we should address this weakness now.

@4vtomat
Copy link
Member Author

4vtomat commented Jul 30, 2024

ping~

@@ -599,7 +600,8 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
IdxSize = 2; // size of opcode in table is 2 bytes.
} else {
Child = cast<SwitchTypeMatcher>(N)->getCaseMatcher(i);
IdxSize = 1; // size of type in table is 1 byte.
IdxSize = GetVBRSize(cast<SwitchTypeMatcher>(N)->getCaseType(
i)); // size of type in table is sizeof(MVT) byte.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is wrong for VBR isn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be VBR(MVT).

@@ -79,12 +79,12 @@ static void VTtoGetLLVMTyString(raw_ostream &OS, const Record *VT) {
void VTEmitter::run(raw_ostream &OS) {
emitSourceFileHeader("ValueTypes Source Fragment", OS, Records);

std::array<const Record *, 256> VTsByNumber = {};
std::array<const Record *, 65536> VTsByNumber = {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

16384? And should probably change to std::vector to avoid a large stack allocation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure~

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 4b9fab591916eec9fd1942f37afe3b137b564089 72367a05a0651193be0bf56d6abe29438c914d66 --extensions cpp,h -- llvm/include/llvm/CodeGenTypes/MachineValueType.h llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp llvm/utils/TableGen/Common/CodeGenDAGPatterns.h llvm/utils/TableGen/DAGISelMatcherEmitter.cpp llvm/utils/TableGen/VTEmitter.cpp
View the diff from clang-format here.
diff --git a/llvm/utils/TableGen/DAGISelMatcherEmitter.cpp b/llvm/utils/TableGen/DAGISelMatcherEmitter.cpp
index 1b93e3d5e3..2dcf86ad93 100644
--- a/llvm/utils/TableGen/DAGISelMatcherEmitter.cpp
+++ b/llvm/utils/TableGen/DAGISelMatcherEmitter.cpp
@@ -621,8 +621,7 @@ unsigned MatcherTableEmitter::EmitMatcher(const Matcher *N,
         if (!OmitComments)
           OS << "/*" << getEnumName(cast<SwitchTypeMatcher>(N)->getCaseType(i))
              << "*/";
-        EmitVBRValue(cast<SwitchTypeMatcher>(N)->getCaseType(i),
-                     OS);
+        EmitVBRValue(cast<SwitchTypeMatcher>(N)->getCaseType(i), OS);
       }
       if (!OmitComments)
         OS << "// ->" << CurrentIdx + ChildSize;

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@4vtomat 4vtomat merged commit a4c6ebe into llvm:main Jul 31, 2024
6 of 7 checks passed
@4vtomat 4vtomat deleted the extend_mvt_to_16_bits branch July 31, 2024 17:19
@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 31, 2024

This patch seems to cause llvm-tblgen to be killed due to OOM/hang :(
good commit: a847b0f
bad commit: 9fe455f
Compiler: gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0

@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 31, 2024

This patch seems to cause llvm-tblgen to be killed due to OOM/hang :( good commit: a847b0f bad commit: 9fe455f Compiler: gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0

Update: Reverting this commit fixes the issue.

@topperc
Copy link
Collaborator

topperc commented Jul 31, 2024

This patch seems to cause llvm-tblgen to be killed due to OOM/hang :( good commit: a847b0f bad commit: 9fe455f Compiler: gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0

Update: Reverting this commit fixes the issue.

My best guess is its related to the increase in size of MachineValueTypeSet

@topperc
Copy link
Collaborator

topperc commented Jul 31, 2024

I'm going to pull the maximum VT down from 16384 to 512 to hopefully bring memory usage under control.

@topperc
Copy link
Collaborator

topperc commented Aug 1, 2024

@MaskRay
Copy link
Member

MaskRay commented Aug 1, 2024

This is showing a compile time regression llvm-compile-time-tracker.com/compare.php?from=9718f3dec1c366809b0c2508ba77cc5dd3cf82ec&to=a4c6ebeb20168bbedbb1a3aff9f5096416e5914a&stat=instructions:u

I reproduced huge time regression when invoking -gen-global-isel to generate RISCVGenGlobalISel.inc. The time went up from 4.8s to 16.6s

@topperc
Copy link
Collaborator

topperc commented Aug 1, 2024

This is showing a compile time regression llvm-compile-time-tracker.com/compare.php?from=9718f3dec1c366809b0c2508ba77cc5dd3cf82ec&to=a4c6ebeb20168bbedbb1a3aff9f5096416e5914a&stat=instructions:u

I reproduced huge time regression when invoking -gen-global-isel to generate RISCVGenGlobalISel.inc. The time went up from 4.8s to 16.6s

Does #101401 help?

@chapuni
Copy link
Contributor

chapuni commented Aug 1, 2024

@topperc Thanks! This was memory exhausted in my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants