Skip to content

[TableGen] Optimize intrinsic info type signature encoding #106809

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

jurahul
Copy link
Contributor

@jurahul jurahul commented Aug 30, 2024

Change the "fixed encoding" table used for encoding intrinsic
type signature to use 16-bit encoding as opposed to 32-bit.

This results in both space and time improvements. For space,
the total static storage size (in bytes) of this info reduces by 50%:

  • Current = 14193*4 (Fixed table) + 16058 + 3 (Long Table) = 72833
  • New size = 14193*2 (Fixed table) + 19879 + 3 (Long Table) = 48268.
  • Reduction = 50.9%

For time, with the added benchmark, we see a 7.3% speedup in
GetIntrinsicInfoTableEntries benchmark. Actual output of the
benchmark in included in the GitHub MR.

@jurahul jurahul force-pushed the reduce_iit_type_encoding_size branch 3 times, most recently from 3c97e88 to 5ccc29f Compare September 1, 2024 15:30
@jurahul
Copy link
Contributor Author

jurahul commented Sep 2, 2024

Actual benchmark runs:

Old:

~/upstream_clean$ ./build/benchmarks/GetIntrinsicInfoTableEntriesBM --benchmark_repetitions=40 --benchmark_display_aggregates_only
2024-09-01T08:16:40-07:00
Running ./build/benchmarks/GetIntrinsicInfoTableEntriesBM
Run on (20 X 2918.4 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 24576 KiB (x1)
Load Average: 0.43, 0.29, 0.40
---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
BM_GetIntrinsicInfoTableEntries_mean       511908 ns       511877 ns           40
BM_GetIntrinsicInfoTableEntries_median     510461 ns       510461 ns           40
BM_GetIntrinsicInfoTableEntries_stddev       9411 ns         9428 ns           40
BM_GetIntrinsicInfoTableEntries_cv           1.84 %          1.84 %            40


new:

~/upstream_llvm/llvm-project$ ./build/benchmarks/GetIntrinsicInfoTableEntriesBM --benchmark_repetitions=40 --benchmark_display_aggregates_only
2024-09-01T08:15:59-07:00
Running ./build/benchmarks/GetIntrinsicInfoTableEntriesBM
Run on (20 X 2918.4 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 24576 KiB (x1)
Load Average: 0.19, 0.23, 0.39
---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
BM_GetIntrinsicInfoTableEntries_mean       474298 ns       474293 ns           40
BM_GetIntrinsicInfoTableEntries_median     474271 ns       474270 ns           40
BM_GetIntrinsicInfoTableEntries_stddev       5285 ns         5287 ns           40
BM_GetIntrinsicInfoTableEntries_cv           1.11 %          1.11 %            40

Speedup = 511877 -> 474293 = 7.3%

@jurahul jurahul force-pushed the reduce_iit_type_encoding_size branch from 5ccc29f to a974951 Compare September 2, 2024 13:47
@jurahul jurahul marked this pull request as ready for review September 2, 2024 13:47
@llvmbot llvmbot added the llvm:ir label Sep 2, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 2, 2024

@llvm/pr-subscribers-llvm-ir

Author: Rahul Joshi (jurahul)

Changes

Change the "fixed encoding" table used for encoding intrinsic type signature to use 16-bit encoding as opposed to 32-bit.
This results in both space and time improvements. For space, the total static storage size of this info reduces by 50%.

Currently measured data is as follows:

  • Current size = 14193*4 (Fixed table) + 16058 + 3 (Long Encoding Table) = 72833 bytes.
  • New size = 14193*2 (Fixed table) + 19879 + 3 (Long Encoding Table) = 48268 bytes.
  • Reduction = 50.9%

For time, with the attached benchmark, we see a 7.3% speedup in GetIntrinsicInfoTableEntries benchmark. Actual output of the benchmark in included in the GitHub MR.


Full diff: https://github.com/llvm/llvm-project/pull/106809.diff

4 Files Affected:

  • (modified) llvm/benchmarks/CMakeLists.txt (+1)
  • (added) llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp (+31)
  • (modified) llvm/lib/IR/Function.cpp (+8-6)
  • (modified) llvm/utils/TableGen/IntrinsicEmitter.cpp (+46-32)
diff --git a/llvm/benchmarks/CMakeLists.txt b/llvm/benchmarks/CMakeLists.txt
index e3366e6f3ffe19..aa0cb777733441 100644
--- a/llvm/benchmarks/CMakeLists.txt
+++ b/llvm/benchmarks/CMakeLists.txt
@@ -6,3 +6,4 @@ add_benchmark(DummyYAML DummyYAML.cpp PARTIAL_SOURCES_INTENDED)
 add_benchmark(xxhash xxhash.cpp PARTIAL_SOURCES_INTENDED)
 add_benchmark(GetIntrinsicForClangBuiltin GetIntrinsicForClangBuiltin.cpp PARTIAL_SOURCES_INTENDED)
 add_benchmark(FormatVariadicBM FormatVariadicBM.cpp PARTIAL_SOURCES_INTENDED)
+add_benchmark(GetIntrinsicInfoTableEntriesBM GetIntrinsicInfoTableEntriesBM.cpp PARTIAL_SOURCES_INTENDED)
diff --git a/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp b/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp
new file mode 100644
index 00000000000000..2854bacc5ab094
--- /dev/null
+++ b/llvm/benchmarks/GetIntrinsicInfoTableEntriesBM.cpp
@@ -0,0 +1,31 @@
+//===- GetIntrinsicInfoTableEntries.cpp - IIT signature benchmark ---------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "benchmark/benchmark.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/IR/Intrinsics.h"
+#include <variant>
+
+using namespace llvm;
+using namespace Intrinsic;
+
+static void BM_GetIntrinsicInfoTableEntries(benchmark::State &state) {
+  SmallVector<IITDescriptor> Table;
+  for (auto _ : state) {
+    for (ID ID = 1; ID < num_intrinsics; ++ID) {
+      // This makes sure the vector does not keep growing, as well as after the
+      // first iteration does not result in additional allocations.
+      Table.clear();
+      getIntrinsicInfoTableEntries(ID, Table);
+    }
+  }
+}
+
+BENCHMARK(BM_GetIntrinsicInfoTableEntries);
+
+BENCHMARK_MAIN();
diff --git a/llvm/lib/IR/Function.cpp b/llvm/lib/IR/Function.cpp
index 69520fdb03dc7c..afef8930669e84 100644
--- a/llvm/lib/IR/Function.cpp
+++ b/llvm/lib/IR/Function.cpp
@@ -1381,22 +1381,24 @@ static void DecodeIITType(unsigned &NextElt, ArrayRef<unsigned char> Infos,
 
 void Intrinsic::getIntrinsicInfoTableEntries(ID id,
                                              SmallVectorImpl<IITDescriptor> &T){
+  static_assert(sizeof(IIT_Table[0]) == 2,
+                "Expect 16-bit entries in IIT_Table");
   // Check to see if the intrinsic's type was expressible by the table.
-  unsigned TableVal = IIT_Table[id-1];
+  uint16_t TableVal = IIT_Table[id - 1];
 
   // Decode the TableVal into an array of IITValues.
-  SmallVector<unsigned char, 8> IITValues;
+  SmallVector<unsigned char> IITValues;
   ArrayRef<unsigned char> IITEntries;
   unsigned NextElt = 0;
-  if ((TableVal >> 31) != 0) {
+  if (TableVal >> 15) {
     // This is an offset into the IIT_LongEncodingTable.
     IITEntries = IIT_LongEncodingTable;
 
     // Strip sentinel bit.
-    NextElt = (TableVal << 1) >> 1;
+    NextElt = TableVal & 0x7fff;
   } else {
-    // Decode the TableVal into an array of IITValues.  If the entry was encoded
-    // into a single word in the table itself, decode it now.
+    // If the entry was encoded into a single word in the table itself, decode
+    // it from an array of nibbles to an array of bytes.
     do {
       IITValues.push_back(TableVal & 0xF);
       TableVal >>= 4;
diff --git a/llvm/utils/TableGen/IntrinsicEmitter.cpp b/llvm/utils/TableGen/IntrinsicEmitter.cpp
index 09eb1ed5e1863b..3dd4767e965364 100644
--- a/llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ b/llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -61,6 +61,7 @@ class IntrinsicEmitter {
   void EmitIntrinsicToOverloadTable(const CodeGenIntrinsicTable &Ints,
                                     raw_ostream &OS);
   void EmitGenerator(const CodeGenIntrinsicTable &Ints, raw_ostream &OS);
+
   void EmitAttributes(const CodeGenIntrinsicTable &Ints, raw_ostream &OS);
   void EmitIntrinsicToBuiltinMap(const CodeGenIntrinsicTable &Ints,
                                  bool IsClang, raw_ostream &OS);
@@ -282,11 +283,34 @@ static TypeSigTy ComputeTypeSignature(const CodeGenIntrinsic &Int) {
   return TypeSig;
 }
 
+// Pack the type signature into 32-bit fixed encoding word.
+std::optional<unsigned> encodePacked(const TypeSigTy &TypeSig) {
+  if (TypeSig.size() > 8)
+    return std::nullopt;
+
+  unsigned Result = 0;
+  for (unsigned char C : reverse(TypeSig)) {
+    if (C > 15)
+      return std::nullopt;
+    Result = (Result << 4) | C;
+  }
+  return Result;
+}
+
 void IntrinsicEmitter::EmitGenerator(const CodeGenIntrinsicTable &Ints,
                                      raw_ostream &OS) {
-  // If we can compute a 32-bit fixed encoding for this intrinsic, do so and
+  // Note: the code below can be switched to use 32-bit fixed encoding by
+  // flipping the flag below.
+  constexpr bool Use16BitFixedEncoding = true;
+  using EncodingTy =
+      std::conditional_t<Use16BitFixedEncoding, uint16_t, unsigned>;
+  const unsigned Mask = Use16BitFixedEncoding ? 0x7FFF : 0x7FFFFFFF;
+  const unsigned MSBPostion = Use16BitFixedEncoding ? 15 : 31;
+  StringRef TypeName = Use16BitFixedEncoding ? "uint16_t" : "unsigned";
+
+  // If we can compute a 16/32-bit fixed encoding for this intrinsic, do so and
   // capture it in this vector, otherwise store a ~0U.
-  std::vector<unsigned> FixedEncodings;
+  std::vector<EncodingTy> FixedEncodings;
   SequenceToOffsetTable<TypeSigTy> LongEncodingTable;
 
   FixedEncodings.reserve(Ints.size());
@@ -296,69 +320,59 @@ void IntrinsicEmitter::EmitGenerator(const CodeGenIntrinsicTable &Ints,
     // Get the signature for the intrinsic.
     TypeSigTy TypeSig = ComputeTypeSignature(Int);
 
-    // Check to see if we can encode it into a 32-bit word. We can only encode
-    // 8 nibbles into a 32-bit word.
-    if (TypeSig.size() <= 8) {
-      // Attempt to pack elements of TypeSig into a 32-bit word, starting from
-      // the most significant nibble.
-      unsigned Result = 0;
-      bool Failed = false;
-      for (unsigned char C : reverse(TypeSig)) {
-        if (C > 15) {
-          Failed = true;
-          break;
-        }
-        Result = (Result << 4) | C;
-      }
-
-      // If this could be encoded into a 31-bit word, return it.
-      if (!Failed && (Result >> 31) == 0) {
-        FixedEncodings.push_back(Result);
-        continue;
-      }
+    // Check to see if we can encode it into a 16/32 bit word.
+    std::optional<unsigned> Result = encodePacked(TypeSig);
+    if (Result && (*Result & Mask) == Result) {
+      FixedEncodings.push_back(static_cast<EncodingTy>(*Result));
+      continue;
     }
 
-    // Otherwise, we're going to unique the sequence into the
-    // LongEncodingTable, and use its offset in the 32-bit table instead.
     LongEncodingTable.add(TypeSig);
 
     // This is a placehold that we'll replace after the table is laid out.
-    FixedEncodings.push_back(~0U);
+    FixedEncodings.push_back(static_cast<EncodingTy>(~0U));
   }
 
   LongEncodingTable.layout();
 
-  OS << R"(// Global intrinsic function declaration type table.
+  OS << formatv(R"(// Global intrinsic function declaration type table.
 #ifdef GET_INTRINSIC_GENERATOR_GLOBAL
-static constexpr unsigned IIT_Table[] = {
-  )";
+static constexpr {0} IIT_Table[] = {{
+  )",
+                TypeName);
 
+  unsigned MaxOffset = 0;
   for (auto [Idx, FixedEncoding, Int] : enumerate(FixedEncodings, Ints)) {
     if ((Idx & 7) == 7)
       OS << "\n  ";
 
     // If the entry fit in the table, just emit it.
-    if (FixedEncoding != ~0U) {
+    if ((FixedEncoding & Mask) == FixedEncoding) {
       OS << "0x" << Twine::utohexstr(FixedEncoding) << ", ";
       continue;
     }
 
     TypeSigTy TypeSig = ComputeTypeSignature(Int);
+    unsigned Offset = LongEncodingTable.get(TypeSig);
+    MaxOffset = std::max(MaxOffset, Offset);
 
     // Otherwise, emit the offset into the long encoding table.  We emit it this
     // way so that it is easier to read the offset in the .def file.
-    OS << "(1U<<31) | " << LongEncodingTable.get(TypeSig) << ", ";
+    OS << formatv("(1U<<{0}) | {1}, ", MSBPostion, Offset);
   }
 
   OS << "0\n};\n\n";
 
+  // verify that all offsets will fit in 16/32 bits.
+  if ((MaxOffset & Mask) != MaxOffset)
+    PrintFatalError("Offset of long encoding table exceeds encoding bits");
+
   // Emit the shared table of register lists.
   OS << "static constexpr unsigned char IIT_LongEncodingTable[] = {\n";
   if (!LongEncodingTable.empty())
     LongEncodingTable.emit(
         OS, [](raw_ostream &OS, unsigned char C) { OS << (unsigned)C; });
-  OS << "  255\n};\n\n";
-
+  OS << "  255\n};\n";
   OS << "#endif\n\n"; // End of GET_INTRINSIC_GENERATOR_GLOBAL
 }
 

@jurahul jurahul force-pushed the reduce_iit_type_encoding_size branch from a974951 to 5d3b486 Compare September 3, 2024 13:31
@jurahul jurahul force-pushed the reduce_iit_type_encoding_size branch from 5d3b486 to 0d0abc2 Compare September 4, 2024 12:59
@jurahul jurahul force-pushed the reduce_iit_type_encoding_size branch from 0d0abc2 to 36f1760 Compare September 4, 2024 14:40
Add intrinsic emitter option to change the "fixed encoding" table used
for encoding intrinsic type signature to use 16-bit encoding as opposed
to 32-bit.

To better segragate LLVM Core from this encoding detail, add a function
`decodeIITFixedEncoding` to the emitted code to decode the fixed
encoding. This allows TableGen intrinsic emitter to choose 16 or 32-bit
fixed encoding withot changing the LLVM code.

When using 16-bit encoding, we seem to reduce the total static storage
size of this info by 50%. Currently measure data is as follows

- Current size = 14193*4 + 16058 + 3 = 72833 bytes.
- New size = 14193*2 + 19879 + 3 = 48268 bytes.
- Reduction = 50.9%
@jurahul jurahul force-pushed the reduce_iit_type_encoding_size branch from 36f1760 to 801af60 Compare September 4, 2024 14:42
@jurahul jurahul merged commit dcf0160 into llvm:main Sep 4, 2024
8 checks passed
@jurahul jurahul deleted the reduce_iit_type_encoding_size branch September 4, 2024 21:46
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 4, 2024

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/2271

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: 1200 seconds without output running [b'ninja', b'-j 4', b'check-openmp'], attempting to kill
...
PASS: ompd-test :: openmp_examples/example_3.c (439 of 449)
PASS: ompd-test :: openmp_examples/example_4.c (440 of 449)
PASS: ompd-test :: openmp_examples/example_5.c (441 of 449)
PASS: ompd-test :: openmp_examples/example_task.c (442 of 449)
UNSUPPORTED: ompd-test :: openmp_examples/ompd_bt.c (443 of 449)
PASS: ompd-test :: openmp_examples/fibonacci.c (444 of 449)
UNSUPPORTED: ompd-test :: openmp_examples/ompd_parallel.c (445 of 449)
PASS: ompd-test :: openmp_examples/parallel.c (446 of 449)
PASS: ompd-test :: openmp_examples/ompd_icvs.c (447 of 449)
PASS: ompd-test :: openmp_examples/nested.c (448 of 449)
command timed out: 1200 seconds without output running [b'ninja', b'-j 4', b'check-openmp'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1312.136230

@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 4, 2024

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-nfc running on bolt-worker while building llvm at step 8 "test-build-bolt-check-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/92/builds/5816

Here is the relevant piece of the build log for the reference
Step 8 (test-build-bolt-check-bolt) failure: test (failure)
******************** TEST 'BOLT :: perf2bolt/perf_test.test' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 5: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
RUN: at line 6: perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
Lowering default frequency rate from 4000 to 2000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 (19 samples) ]
RUN: at line 7: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
RUN: at line 12: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -no-pie -fuse-ld=lld -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -no-pie -fuse-ld=lld -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
RUN: at line 13: perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
+ perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
Lowering default frequency rate from 4000 to 2000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 (9 samples) ]
RUN: at line 14: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4 -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp6 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test --check-prefix=CHECK-NO-PIE
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test --check-prefix=CHECK-NO-PIE
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4 -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp6 -nl -ignore-build-id
/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test:17:19: error: CHECK-NO-PIE-NOT: excluded string found in input
CHECK-NO-PIE-NOT: !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection.
                  ^
<stdin>:26:2: note: found here
 !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance.
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        .
        .
        .
       21: BOLT-WARNING: Running parallel work of 0 estimated cost, will switch to trivial scheduling. 
       22: PERF2BOLT: processing basic events (without LBR)... 
       23: PERF2BOLT: read 9 samples 
       24: PERF2BOLT: out of range samples recorded in unknown regions: 9 (100.0%) 
       25:  
       26:  !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance. 
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants