Skip to content

[SampleProf] Templatize longestCommonSequence (NFC) #114633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 4, 2024

Conversation

kazutakahirata
Copy link
Contributor

This patch moves the implementation of longestCommonSequence to a new
header file.

I'm planning to implement a profile undrifting algorithm for MemProf
so that the compiler can ingest somewhat stale MemProf profile and
still deliver most of the benefits that would be delivered if the
profile were completely up to date (with no line number or column
number differences).

Since the core undrifting algorithm is the same between MemProf and
AutoFDO, this patch turns longestCommonSequence into a template. The
original longestCommonSequence implementation is repurposed and now
serves as a wrapper around a template specialization.

Note that the usage differences between MemProf and AutoFDO are minor.
For example, I'm planning to use line-column number pair instead of
LineLocation, which uses a discriminator. To identify a function, I'm
planning to use uint64_t GUID instead of FunctionId.

For now, I'm returning matches via a function object InsertMatching
because it's impossible to infer the map type from LineLocation alone.
Specifically:

std::unordered_map<LineLocation, LineLocation>

does not work because we cannot infer the hash functor
LineLocationHash. I could define std::hash.
Alternatively, in the future, I might switch to DenseMap and define
DenseMapInfo. This way:

DenseMap<LineLocation, LineLocation>

automatically picks up DenseMapInfo.

This patch moves the implementation of longestCommonSequence to a new
header file.

I'm planning to implement a profile undrifting algorithm for MemProf
so that the compiler can ingest somewhat stale MemProf profile and
still deliver most of the benefits that would be delivered if the
profile were completely up to date (with no line number or column
number differences).

Since the core undrifting algorithm is the same between MemProf and
AutoFDO, this patch turns longestCommonSequence into a template.  The
original longestCommonSequence implementation is repurposed and now
serves as a wrapper around a template specialization.

Note that the usage differences between MemProf and AutoFDO are minor.
For example, I'm planning to use line-column number pair instead of
LineLocation, which uses a discriminator.  To identify a function, I'm
planning to use uint64_t GUID instead of FunctionId.

For now, I'm returning matches via a function object InsertMatching
because it's impossible to infer the map type from LineLocation alone.
Specifically:

  std::unordered_map<LineLocation, LineLocation>

does not work because we cannot infer the hash functor
LineLocationHash.  I could define std::hash<LineLocation>.
Alternatively, in the future, I might switch to DenseMap and define
DenseMapInfo<LineLocation>.  This way:

  DenseMap<LineLocation, LineLocation>

automatically picks up DenseMapInfo<LineLocation>.
@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels Nov 2, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 2, 2024

@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-llvm-transforms

Author: Kazu Hirata (kazutakahirata)

Changes

This patch moves the implementation of longestCommonSequence to a new
header file.

I'm planning to implement a profile undrifting algorithm for MemProf
so that the compiler can ingest somewhat stale MemProf profile and
still deliver most of the benefits that would be delivered if the
profile were completely up to date (with no line number or column
number differences).

Since the core undrifting algorithm is the same between MemProf and
AutoFDO, this patch turns longestCommonSequence into a template. The
original longestCommonSequence implementation is repurposed and now
serves as a wrapper around a template specialization.

Note that the usage differences between MemProf and AutoFDO are minor.
For example, I'm planning to use line-column number pair instead of
LineLocation, which uses a discriminator. To identify a function, I'm
planning to use uint64_t GUID instead of FunctionId.

For now, I'm returning matches via a function object InsertMatching
because it's impossible to infer the map type from LineLocation alone.
Specifically:

std::unordered_map<LineLocation, LineLocation>

does not work because we cannot infer the hash functor
LineLocationHash. I could define std::hash<LineLocation>.
Alternatively, in the future, I might switch to DenseMap and define
DenseMapInfo<LineLocation>. This way:

DenseMap<LineLocation, LineLocation>

automatically picks up DenseMapInfo<LineLocation>.


Full diff: https://github.com/llvm/llvm-project/pull/114633.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h (-10)
  • (added) llvm/include/llvm/Transforms/Utils/LongestCommonSequence.h (+116)
  • (modified) llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp (+11-76)
diff --git a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
index 4e757b29961817..3bdcf9a18fe408 100644
--- a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
+++ b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
@@ -205,16 +205,6 @@ class SampleProfileMatcher {
   }
   void distributeIRToProfileLocationMap();
   void distributeIRToProfileLocationMap(FunctionSamples &FS);
-  // This function implements the Myers diff algorithm used for stale profile
-  // matching. The algorithm provides a simple and efficient way to find the
-  // Longest Common Subsequence(LCS) or the Shortest Edit Script(SES) of two
-  // sequences. For more details, refer to the paper 'An O(ND) Difference
-  // Algorithm and Its Variations' by Eugene W. Myers.
-  // In the scenario of profile fuzzy matching, the two sequences are the IR
-  // callsite anchors and profile callsite anchors. The subsequence equivalent
-  // parts from the resulting SES are used to remap the IR locations to the
-  // profile locations. As the number of function callsite is usually not big,
-  // we currently just implements the basic greedy version(page 6 of the paper).
   LocToLocMap longestCommonSequence(const AnchorList &IRCallsiteAnchors,
                                     const AnchorList &ProfileCallsiteAnchors,
                                     bool MatchUnusedFunction);
diff --git a/llvm/include/llvm/Transforms/Utils/LongestCommonSequence.h b/llvm/include/llvm/Transforms/Utils/LongestCommonSequence.h
new file mode 100644
index 00000000000000..5c94bda6abd2be
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Utils/LongestCommonSequence.h
@@ -0,0 +1,116 @@
+//===- LongestCommonSequence.h - Compute LCS --------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements longestCommonSequence, useful for finding matches
+// between two sequences, such as lists of profiling points.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_UTILS_LONGESTCOMMONSEQEUNCE_H
+#define LLVM_TRANSFORMS_UTILS_LONGESTCOMMONSEQEUNCE_H
+
+#include "llvm/ADT/ArrayRef.h"
+
+#include <cstdint>
+#include <vector>
+
+namespace llvm {
+
+// This function implements the Myers diff algorithm used for stale profile
+// matching. The algorithm provides a simple and efficient way to find the
+// Longest Common Subsequence(LCS) or the Shortest Edit Script(SES) of two
+// sequences. For more details, refer to the paper 'An O(ND) Difference
+// Algorithm and Its Variations' by Eugene W. Myers.
+// In the scenario of profile fuzzy matching, the two sequences are the IR
+// callsite anchors and profile callsite anchors. The subsequence equivalent
+// parts from the resulting SES are used to remap the IR locations to the
+// profile locations. As the number of function callsite is usually not big,
+// we currently just implements the basic greedy version(page 6 of the paper).
+template <typename Loc, typename Function,
+          typename AnchorList = ArrayRef<std::pair<Loc, Function>>>
+void longestCommonSequence(
+    AnchorList AnchorList1, AnchorList AnchorList2,
+    llvm::function_ref<bool(const Function &, const Function &)>
+        FunctionMatchesProfile,
+    llvm::function_ref<void(Loc, Loc)> InsertMatching) {
+  int32_t Size1 = AnchorList1.size(), Size2 = AnchorList2.size(),
+          MaxDepth = Size1 + Size2;
+  auto Index = [&](int32_t I) { return I + MaxDepth; };
+
+  if (MaxDepth == 0)
+    return;
+
+  // Backtrack the SES result.
+  auto Backtrack = [&](ArrayRef<std::vector<int32_t>> Trace,
+                       AnchorList AnchorList1, AnchorList AnchorList2) {
+    int32_t X = Size1, Y = Size2;
+    for (int32_t Depth = Trace.size() - 1; X > 0 || Y > 0; Depth--) {
+      const auto &P = Trace[Depth];
+      int32_t K = X - Y;
+      int32_t PrevK = K;
+      if (K == -Depth || (K != Depth && P[Index(K - 1)] < P[Index(K + 1)]))
+        PrevK = K + 1;
+      else
+        PrevK = K - 1;
+
+      int32_t PrevX = P[Index(PrevK)];
+      int32_t PrevY = PrevX - PrevK;
+      while (X > PrevX && Y > PrevY) {
+        X--;
+        Y--;
+        InsertMatching(AnchorList1[X].first, AnchorList2[Y].first);
+      }
+
+      if (Depth == 0)
+        break;
+
+      if (Y == PrevY)
+        X--;
+      else if (X == PrevX)
+        Y--;
+      X = PrevX;
+      Y = PrevY;
+    }
+  };
+
+  // The greedy LCS/SES algorithm.
+
+  // An array contains the endpoints of the furthest reaching D-paths.
+  std::vector<int32_t> V(2 * MaxDepth + 1, -1);
+  V[Index(1)] = 0;
+  // Trace is used to backtrack the SES result.
+  std::vector<std::vector<int32_t>> Trace;
+  for (int32_t Depth = 0; Depth <= MaxDepth; Depth++) {
+    Trace.push_back(V);
+    for (int32_t K = -Depth; K <= Depth; K += 2) {
+      int32_t X = 0, Y = 0;
+      if (K == -Depth || (K != Depth && V[Index(K - 1)] < V[Index(K + 1)]))
+        X = V[Index(K + 1)];
+      else
+        X = V[Index(K - 1)] + 1;
+      Y = X - K;
+      while (
+          X < Size1 && Y < Size2 &&
+          FunctionMatchesProfile(AnchorList1[X].second, AnchorList2[Y].second))
+        X++, Y++;
+
+      V[Index(K)] = X;
+
+      if (X >= Size1 && Y >= Size2) {
+        // Length of an SES is D.
+        Backtrack(Trace, AnchorList1, AnchorList2);
+        return;
+      }
+    }
+  }
+  // Length of an SES is greater than MaxDepth.
+}
+
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_UTILS_LONGESTCOMMONSEQEUNCE_H
diff --git a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
index 0c676e8fb95fdb..846f173c19a450 100644
--- a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
@@ -15,6 +15,7 @@
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/Support/CommandLine.h"
+#include "llvm/Transforms/Utils/LongestCommonSequence.h"
 
 using namespace llvm;
 using namespace sampleprof;
@@ -194,82 +195,16 @@ LocToLocMap
 SampleProfileMatcher::longestCommonSequence(const AnchorList &AnchorList1,
                                             const AnchorList &AnchorList2,
                                             bool MatchUnusedFunction) {
-  int32_t Size1 = AnchorList1.size(), Size2 = AnchorList2.size(),
-          MaxDepth = Size1 + Size2;
-  auto Index = [&](int32_t I) { return I + MaxDepth; };
-
-  LocToLocMap EqualLocations;
-  if (MaxDepth == 0)
-    return EqualLocations;
-
-  // Backtrack the SES result.
-  auto Backtrack = [&](const std::vector<std::vector<int32_t>> &Trace,
-                       const AnchorList &AnchorList1,
-                       const AnchorList &AnchorList2,
-                       LocToLocMap &EqualLocations) {
-    int32_t X = Size1, Y = Size2;
-    for (int32_t Depth = Trace.size() - 1; X > 0 || Y > 0; Depth--) {
-      const auto &P = Trace[Depth];
-      int32_t K = X - Y;
-      int32_t PrevK = K;
-      if (K == -Depth || (K != Depth && P[Index(K - 1)] < P[Index(K + 1)]))
-        PrevK = K + 1;
-      else
-        PrevK = K - 1;
-
-      int32_t PrevX = P[Index(PrevK)];
-      int32_t PrevY = PrevX - PrevK;
-      while (X > PrevX && Y > PrevY) {
-        X--;
-        Y--;
-        EqualLocations.insert({AnchorList1[X].first, AnchorList2[Y].first});
-      }
-
-      if (Depth == 0)
-        break;
-
-      if (Y == PrevY)
-        X--;
-      else if (X == PrevX)
-        Y--;
-      X = PrevX;
-      Y = PrevY;
-    }
-  };
-
-  // The greedy LCS/SES algorithm.
-
-  // An array contains the endpoints of the furthest reaching D-paths.
-  std::vector<int32_t> V(2 * MaxDepth + 1, -1);
-  V[Index(1)] = 0;
-  // Trace is used to backtrack the SES result.
-  std::vector<std::vector<int32_t>> Trace;
-  for (int32_t Depth = 0; Depth <= MaxDepth; Depth++) {
-    Trace.push_back(V);
-    for (int32_t K = -Depth; K <= Depth; K += 2) {
-      int32_t X = 0, Y = 0;
-      if (K == -Depth || (K != Depth && V[Index(K - 1)] < V[Index(K + 1)]))
-        X = V[Index(K + 1)];
-      else
-        X = V[Index(K - 1)] + 1;
-      Y = X - K;
-      while (X < Size1 && Y < Size2 &&
-             functionMatchesProfile(
-                 AnchorList1[X].second, AnchorList2[Y].second,
-                 !MatchUnusedFunction /* Find matched function only */))
-        X++, Y++;
-
-      V[Index(K)] = X;
-
-      if (X >= Size1 && Y >= Size2) {
-        // Length of an SES is D.
-        Backtrack(Trace, AnchorList1, AnchorList2, EqualLocations);
-        return EqualLocations;
-      }
-    }
-  }
-  // Length of an SES is greater than MaxDepth.
-  return EqualLocations;
+  LocToLocMap MatchedAnchors;
+  llvm::longestCommonSequence<LineLocation, FunctionId>(
+      AnchorList1, AnchorList2,
+      [&](const FunctionId &A, const FunctionId &B) {
+        return functionMatchesProfile(A, B, !MatchUnusedFunction);
+      },
+      [&](LineLocation A, LineLocation B) {
+        MatchedAnchors.try_emplace(A, B);
+      });
+  return MatchedAnchors;
 }
 
 void SampleProfileMatcher::matchNonCallsiteLocs(

@wlei-llvm wlei-llvm requested a review from WenleiHe November 2, 2024 03:23
@WenleiHe
Copy link
Member

WenleiHe commented Nov 3, 2024

Good to know this can be used to help memprof as well :)

Copy link
Contributor

@wlei-llvm wlei-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kazutakahirata kazutakahirata merged commit 3a5791e into llvm:main Nov 4, 2024
8 checks passed
@kazutakahirata kazutakahirata deleted the memprof_lcs branch November 4, 2024 21:20
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
This patch moves the implementation of longestCommonSequence to a new
header file.

I'm planning to implement a profile undrifting algorithm for MemProf
so that the compiler can ingest somewhat stale MemProf profile and
still deliver most of the benefits that would be delivered if the
profile were completely up to date (with no line number or column
number differences).

Since the core undrifting algorithm is the same between MemProf and
AutoFDO, this patch turns longestCommonSequence into a template.  The
original longestCommonSequence implementation is repurposed and now
serves as a wrapper around a template specialization.

Note that the usage differences between MemProf and AutoFDO are minor.
For example, I'm planning to use line-column number pair instead of
LineLocation, which uses a discriminator.  To identify a function, I'm
planning to use uint64_t GUID instead of FunctionId.

For now, I'm returning matches via a function object InsertMatching
because it's impossible to infer the map type from LineLocation alone.
Specifically:

  std::unordered_map<LineLocation, LineLocation>

does not work because we cannot infer the hash functor
LineLocationHash.  I could define std::hash<LineLocation>.
Alternatively, in the future, I might switch to DenseMap and define
DenseMapInfo<LineLocation>.  This way:

  DenseMap<LineLocation, LineLocation>

automatically picks up DenseMapInfo<LineLocation>.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants