Skip to content

Implements PGOBBAddrMap in Object and ObjectYAML with tests [1/5] #71750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Dec 12, 2023

Conversation

red1bluelost
Copy link
Member

@red1bluelost red1bluelost commented Nov 9, 2023

A part of RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis.

This PR adds the PGOBBAddrMap data structure and implements encoding and decoding through Object and ObjectYAML along with associated tests.

Design Alternative

The current implementation makes an effort to avoid modifying the original BBAddrMap data structure. This keeps the API the same as before for all pre-existing code and avoids unnecessary fields for regular BBAddrMap usage. It comes with the cost of some complexity in sharing encoding/decoding code.

Alternatively, we could just place all new fields into BBAddrMap. This makes the encoding/decoding code simpler. It has the cost of changing some behavior in the original API. It also adds unnecessary fields for anyone wanting to just use the regular BBAddrMap.

Please let us know if you feel that the alternate design of placing new fields into BBAddrMap is preferable.

PR Series

The current code for PGOBBAddrMap to be upstreamed is split into five PRs:

  1. Object and ObjectYAML - (this one) https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--object
  2. AsmPrinter - Implements PGOBBAddrMap in AsmPrinter with tests [2/5] red1bluelost/llvm-project#1 or https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--asm-printer
  3. llvm-readobj - Implements llvm-readobj with tests. [3/5] red1bluelost/llvm-project#2 or https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-readobj
  4. llvm-objdump - https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-objdump
  5. llvm obj2yaml - https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-obj2yaml

If you would like to try testing PGOBBAddrMap locally on a program, PR-3 llvm-readobj is likely the minimum code needed to meaningfully use this feature.

@llvmbot
Copy link
Member

llvmbot commented Nov 9, 2023

@llvm/pr-subscribers-llvm-binary-utilities
@llvm/pr-subscribers-objectyaml

@llvm/pr-subscribers-mc

Author: Micah Weston (red1bluelost)

Changes

A part of RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis.

This PR adds the PGOBBAddrMap data structure and implements encoding and decoding through Object and ObjectYAML along with associated tests.

Design Alternative

The current implementation makes an effort to avoid modifying the original BBAddrMap data structure. This keeps the API the same as before for all pre-existing code and avoids unnecessary fields for regular BBAddrMap usage. It comes with the cost of some complexity in sharing encoding/decoding code.

Alternatively, we could just place all new fields into BBAddrMap. This makes the encoding/decoding code simpler. It has the cost of changing some behavior in the original API. It also adds unnecessary fields for anyone wanting to just use the regular BBAddrMap.

Please let us know if you feel that the alternate design of placing new fields into BBAddrMap is preferable.

PR Series

The current code for PGOBBAddrMap to be upstreamed is split into five PRs:

  1. Object and ObjectYAML - (this one) https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--object
  2. AsmPrinter - Implements PGOBBAddrMap in AsmPrinter with tests [2/5] red1bluelost/llvm-project#1 or https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--asm-printer
  3. llvm-readobj - Implements llvm-readobj with tests. [3/5] red1bluelost/llvm-project#2 or https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-readobj
  4. llvm-objdump - https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-objdump
  5. llvm obj2yaml - https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-obj2yaml

If you would like to try testing PGOBBAddrMap locally on a program, PR-3 llvm-readobj is likely the minimum code needed to meaningfully use this feature.


Patch is 60.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71750.diff

15 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/ELF.h (+1)
  • (modified) llvm/include/llvm/Object/ELF.h (+6)
  • (modified) llvm/include/llvm/Object/ELFObjectFile.h (+3)
  • (modified) llvm/include/llvm/Object/ELFTypes.h (+108)
  • (modified) llvm/include/llvm/ObjectYAML/ELFYAML.h (+56-4)
  • (modified) llvm/lib/MC/MCParser/ELFAsmParser.cpp (+2)
  • (modified) llvm/lib/MC/MCSectionELF.cpp (+2)
  • (modified) llvm/lib/Object/ELF.cpp (+211-52)
  • (modified) llvm/lib/Object/ELFObjectFile.cpp (+44-8)
  • (modified) llvm/lib/ObjectYAML/ELFEmitter.cpp (+108-22)
  • (modified) llvm/lib/ObjectYAML/ELFYAML.cpp (+42-3)
  • (modified) llvm/test/MC/AsmParser/llvm_section_types.s (+4)
  • (modified) llvm/tools/obj2yaml/elf2yaml.cpp (+1-1)
  • (modified) llvm/unittests/Object/ELFObjectFileTest.cpp (+440)
  • (modified) llvm/unittests/Object/ELFTypesTest.cpp (+52-1)
diff --git a/llvm/include/llvm/BinaryFormat/ELF.h b/llvm/include/llvm/BinaryFormat/ELF.h
index 3596174f74dde80..299d7e8265317b3 100644
--- a/llvm/include/llvm/BinaryFormat/ELF.h
+++ b/llvm/include/llvm/BinaryFormat/ELF.h
@@ -1037,6 +1037,7 @@ enum : unsigned {
   SHT_LLVM_BB_ADDR_MAP = 0x6fff4c0a,        // LLVM Basic Block Address Map.
   SHT_LLVM_OFFLOADING = 0x6fff4c0b,         // LLVM device offloading data.
   SHT_LLVM_LTO = 0x6fff4c0c,                // .llvm.lto for fat LTO.
+  SHT_LLVM_PGO_BB_ADDR_MAP = 0x6fff4c0d,    // LLVM PGO extended BB Addr Map.
   // Android's experimental support for SHT_RELR sections.
   // https://android.googlesource.com/platform/bionic/+/b7feec74547f84559a1467aca02708ff61346d2a/libc/include/elf.h#512
   SHT_ANDROID_RELR = 0x6fffff00,   // Relocation entries; only offsets.
diff --git a/llvm/include/llvm/Object/ELF.h b/llvm/include/llvm/Object/ELF.h
index a1cf47a1c4a6173..3768347ee59582b 100644
--- a/llvm/include/llvm/Object/ELF.h
+++ b/llvm/include/llvm/Object/ELF.h
@@ -417,6 +417,12 @@ class ELFFile {
   Expected<std::vector<BBAddrMap>>
   decodeBBAddrMap(const Elf_Shdr &Sec, const Elf_Shdr *RelaSec = nullptr) const;
 
+  /// Decodes same as decodeBBAddrMap but also decodes extra information from
+  /// features which are enabled.
+  Expected<std::vector<PGOBBAddrMap>>
+  decodePGOBBAddrMap(const Elf_Shdr &Sec,
+                     const Elf_Shdr *RelaSec = nullptr) const;
+
   /// Returns a map from every section matching \p IsMatch to its relocation
   /// section, or \p nullptr if it has no relocation section. This function
   /// returns an error if any of the \p IsMatch calls fail or if it fails to
diff --git a/llvm/include/llvm/Object/ELFObjectFile.h b/llvm/include/llvm/Object/ELFObjectFile.h
index d7947d85739eb3b..d6dd62d487affbe 100644
--- a/llvm/include/llvm/Object/ELFObjectFile.h
+++ b/llvm/include/llvm/Object/ELFObjectFile.h
@@ -114,6 +114,9 @@ class ELFObjectFileBase : public ObjectFile {
   // corresponding to the section with that index.
   Expected<std::vector<BBAddrMap>>
   readBBAddrMap(std::optional<unsigned> TextSectionIndex = std::nullopt) const;
+
+  Expected<std::vector<PGOBBAddrMap>> readPGOBBAddrMap(
+      std::optional<unsigned> TextSectionIndex = std::nullopt) const;
 };
 
 class ELFSectionRef : public SectionRef {
diff --git a/llvm/include/llvm/Object/ELFTypes.h b/llvm/include/llvm/Object/ELFTypes.h
index 45fc52288bdd4c0..66880bdfac99581 100644
--- a/llvm/include/llvm/Object/ELFTypes.h
+++ b/llvm/include/llvm/Object/ELFTypes.h
@@ -10,9 +10,12 @@
 #define LLVM_OBJECT_ELFTYPES_H
 
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/BitmaskEnum.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/BinaryFormat/ELF.h"
 #include "llvm/Object/Error.h"
+#include "llvm/Support/BlockFrequency.h"
+#include "llvm/Support/BranchProbability.h"
 #include "llvm/Support/Endian.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/MathExtras.h"
@@ -806,6 +809,11 @@ struct BBAddrMap {
       bool HasIndirectBranch : 1; // If this block ends with an indirect branch
                                   // (branch via a register).
 
+      // Number of bits used when encoding Metadata, that way an extension
+      // can pack into the extra space if possible. This must be updated when
+      // new bits are added here.
+      static constexpr uint32_t NumberOfBits = 5;
+
       bool operator==(const Metadata &Other) const {
         return HasReturn == Other.HasReturn &&
                HasTailCall == Other.HasTailCall && IsEHPad == Other.IsEHPad &&
@@ -865,6 +873,106 @@ struct BBAddrMap {
   }
 };
 
+/// An extension of BBAddrMap that holds information relevant to PGO.
+struct PGOBBAddrMap {
+  /// Bitmask of optional features to include in the PGO extended map.
+  enum class Features {
+    None = 0,
+    FuncEntryCnt = (1 << 0),
+    BBFreq = (1 << 1),
+    BrProb = (1 << 2),
+    LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/BrProb),
+  };
+
+  /// Super-set of BBAddrMap::BBEntry with additional fields for block frequency
+  /// and branch probability.
+  struct BBEntry {
+    using BaseMetadata = BBAddrMap::BBEntry::Metadata;
+
+    /// Enum indicating the how many successors a block has. This enum must fit
+    /// into two bits.
+    enum class SuccessorsType {
+      /// None should be present if PGOBBAddrMap has disabled branch
+      /// probability.
+      None = 0,
+      /// Single successor blocks are not present in the successor entries.
+      One = 1,
+      /// Common case for conditional branches to avoid encoding size.
+      Two = 2,
+      /// Uncommon case which needs successor size to be encoded.
+      Multiple = 3,
+    };
+
+    /// Single successor of a given basic block that contains the tag and branch
+    /// probability associated with it.
+    struct SuccessorEntry {
+      /// Unique ID of this successor basic block.
+      uint32_t ID;
+      /// Branch Probability of the edge to this successor taken from MBPI
+      BranchProbability Prob;
+
+      bool operator==(const SuccessorEntry &Other) const {
+        return std::tie(ID, Prob) == std::tie(Other.ID, Other.Prob);
+      }
+    };
+
+    /// Reuse of the fields provided by regular BBAddrMap
+    BBAddrMap::BBEntry Base;
+    /// Block frequency taken from MBFI
+    BlockFrequency BlockFreq;
+    /// List of successors of the current block
+    llvm::SmallVector<SuccessorEntry, 2> Successors;
+
+    /// Converts number of successors into a SuccessorsType.
+    static SuccessorsType getSuccessorsType(unsigned SuccessorsCount) {
+      return SuccessorsCount == 0   ? SuccessorsType::None
+             : SuccessorsCount == 1 ? SuccessorsType::One
+             : SuccessorsCount == 2 ? SuccessorsType::Two
+                                    : SuccessorsType::Multiple;
+    }
+
+    /// Encodes extra information in the free bits of the base metadata
+    static uint32_t encodeMD(BaseMetadata MD, SuccessorsType SuccType) {
+      return MD.encode() | static_cast<uint32_t>(SuccType)
+                               << BaseMetadata::NumberOfBits;
+    }
+
+    /// Extracts successors type then defers all errors to the base metadata
+    static Expected<std::pair<BaseMetadata, SuccessorsType>>
+    decodeMD(uint32_t V) {
+      auto SuccType = SuccessorsType((V >> BaseMetadata::NumberOfBits) & 0b11);
+      V &= ~(0b11 << BaseMetadata::NumberOfBits); // Clear extra bits
+      BaseMetadata MD;
+      if (llvm::Error E = BaseMetadata::decode(V).moveInto(MD))
+        return std::move(E);
+      return std::make_pair(MD, SuccType);
+    }
+
+    bool operator==(const BBEntry &Other) const {
+      return std::tie(Base, BlockFreq, Successors) ==
+             std::tie(Other.Base, Other.BlockFreq, Other.Successors);
+    }
+  };
+  // This field is duplicated from BBAddrMap since this class needs a different
+  // type for the vector of entries.
+  uint64_t Addr;                  // Function address
+  std::vector<BBEntry> BBEntries; // Extended basic block entries
+  uint64_t FuncEntryCount;        // Prof count from IR function
+
+  // Flags to indicate if each PGO related info was enabled in this function
+  bool FuncEntryCountEnabled : 1;
+  bool BBFreqEnabled : 1;
+  bool BBSuccProbEnabled : 1;
+
+  bool operator==(const PGOBBAddrMap &Other) const {
+    return std::tie(Addr, FuncEntryCount, BBEntries, FuncEntryCountEnabled,
+                    BBFreqEnabled, BBSuccProbEnabled) ==
+           std::tie(Other.Addr, Other.FuncEntryCount, Other.BBEntries,
+                    Other.FuncEntryCountEnabled, Other.BBFreqEnabled,
+                    Other.BBSuccProbEnabled);
+  }
+};
+
 } // end namespace object.
 } // end namespace llvm.
 
diff --git a/llvm/include/llvm/ObjectYAML/ELFYAML.h b/llvm/include/llvm/ObjectYAML/ELFYAML.h
index 1ba41232f552e3a..3f18d189b7dc870 100644
--- a/llvm/include/llvm/ObjectYAML/ELFYAML.h
+++ b/llvm/include/llvm/ObjectYAML/ELFYAML.h
@@ -156,6 +156,13 @@ struct DynamicEntry {
   llvm::yaml::Hex64 Val;
 };
 
+struct BBAddrMapCommonBase {
+  uint8_t Version;
+  llvm::yaml::Hex8 Feature;
+  llvm::yaml::Hex64 Address;
+  std::optional<uint64_t> NumBlocks;
+};
+
 struct BBAddrMapEntry {
   struct BBEntry {
     uint32_t ID;
@@ -163,10 +170,22 @@ struct BBAddrMapEntry {
     llvm::yaml::Hex64 Size;
     llvm::yaml::Hex64 Metadata;
   };
-  uint8_t Version;
-  llvm::yaml::Hex8 Feature;
-  llvm::yaml::Hex64 Address;
-  std::optional<uint64_t> NumBlocks;
+  BBAddrMapCommonBase Common;
+  std::optional<std::vector<BBEntry>> BBEntries;
+};
+
+struct PGOBBAddrMapEntry {
+  struct BBEntry {
+    struct SuccessorEntry {
+      uint32_t ID;
+      llvm::yaml::Hex32 BrProb;
+    };
+    BBAddrMapEntry::BBEntry Base;
+    std::optional<uint64_t> BBFreq;
+    std::optional<std::vector<SuccessorEntry>> Successors;
+  };
+  BBAddrMapCommonBase Common;
+  std::optional<uint64_t> FuncEntryCount;
   std::optional<std::vector<BBEntry>> BBEntries;
 };
 
@@ -204,6 +223,7 @@ struct Chunk {
     DependentLibraries,
     CallGraphProfile,
     BBAddrMap,
+    PGOBBAddrMap,
 
     // Special chunks.
     SpecialChunksStart,
@@ -329,6 +349,20 @@ struct BBAddrMapSection : Section {
   }
 };
 
+struct PGOBBAddrMapSection : Section {
+  std::optional<std::vector<PGOBBAddrMapEntry>> Entries;
+
+  PGOBBAddrMapSection() : Section(ChunkKind::PGOBBAddrMap) {}
+
+  std::vector<std::pair<StringRef, bool>> getEntries() const override {
+    return {{"Entries", Entries.has_value()}};
+  };
+
+  static bool classof(const Chunk *S) {
+    return S->Kind == ChunkKind::PGOBBAddrMap;
+  }
+};
+
 struct StackSizesSection : Section {
   std::optional<std::vector<StackSizeEntry>> Entries;
 
@@ -737,6 +771,10 @@ bool shouldAllocateFileSpace(ArrayRef<ProgramHeader> Phdrs,
 LLVM_YAML_IS_SEQUENCE_VECTOR(llvm::ELFYAML::StackSizeEntry)
 LLVM_YAML_IS_SEQUENCE_VECTOR(llvm::ELFYAML::BBAddrMapEntry)
 LLVM_YAML_IS_SEQUENCE_VECTOR(llvm::ELFYAML::BBAddrMapEntry::BBEntry)
+LLVM_YAML_IS_SEQUENCE_VECTOR(llvm::ELFYAML::PGOBBAddrMapEntry)
+LLVM_YAML_IS_SEQUENCE_VECTOR(llvm::ELFYAML::PGOBBAddrMapEntry::BBEntry)
+LLVM_YAML_IS_SEQUENCE_VECTOR(
+    llvm::ELFYAML::PGOBBAddrMapEntry::BBEntry::SuccessorEntry)
 LLVM_YAML_IS_SEQUENCE_VECTOR(llvm::ELFYAML::DynamicEntry)
 LLVM_YAML_IS_SEQUENCE_VECTOR(llvm::ELFYAML::LinkerOption)
 LLVM_YAML_IS_SEQUENCE_VECTOR(llvm::ELFYAML::CallGraphEntryWeight)
@@ -905,6 +943,20 @@ template <> struct MappingTraits<ELFYAML::BBAddrMapEntry::BBEntry> {
   static void mapping(IO &IO, ELFYAML::BBAddrMapEntry::BBEntry &Rel);
 };
 
+template <> struct MappingTraits<ELFYAML::PGOBBAddrMapEntry> {
+  static void mapping(IO &IO, ELFYAML::PGOBBAddrMapEntry &Rel);
+};
+
+template <> struct MappingTraits<ELFYAML::PGOBBAddrMapEntry::BBEntry> {
+  static void mapping(IO &IO, ELFYAML::PGOBBAddrMapEntry::BBEntry &Rel);
+};
+
+template <>
+struct MappingTraits<ELFYAML::PGOBBAddrMapEntry::BBEntry::SuccessorEntry> {
+  static void mapping(IO &IO,
+                      ELFYAML::PGOBBAddrMapEntry::BBEntry::SuccessorEntry &Rel);
+};
+
 template <> struct MappingTraits<ELFYAML::GnuHashHeader> {
   static void mapping(IO &IO, ELFYAML::GnuHashHeader &Rel);
 };
diff --git a/llvm/lib/MC/MCParser/ELFAsmParser.cpp b/llvm/lib/MC/MCParser/ELFAsmParser.cpp
index dbfe0d83e1b29b1..e4c1cc7763f7804 100644
--- a/llvm/lib/MC/MCParser/ELFAsmParser.cpp
+++ b/llvm/lib/MC/MCParser/ELFAsmParser.cpp
@@ -673,6 +673,8 @@ bool ELFAsmParser::ParseSectionArguments(bool IsPush, SMLoc loc) {
       Type = ELF::SHT_LLVM_SYMPART;
     else if (TypeName == "llvm_bb_addr_map")
       Type = ELF::SHT_LLVM_BB_ADDR_MAP;
+    else if (TypeName == "llvm_pgo_bb_addr_map")
+      Type = ELF::SHT_LLVM_PGO_BB_ADDR_MAP;
     else if (TypeName == "llvm_offloading")
       Type = ELF::SHT_LLVM_OFFLOADING;
     else if (TypeName == "llvm_lto")
diff --git a/llvm/lib/MC/MCSectionELF.cpp b/llvm/lib/MC/MCSectionELF.cpp
index 95fdf33522076e4..02e63f7b03919a8 100644
--- a/llvm/lib/MC/MCSectionELF.cpp
+++ b/llvm/lib/MC/MCSectionELF.cpp
@@ -170,6 +170,8 @@ void MCSectionELF::printSwitchToSection(const MCAsmInfo &MAI, const Triple &T,
     OS << "llvm_bb_addr_map";
   else if (Type == ELF::SHT_LLVM_BB_ADDR_MAP_V0)
     OS << "llvm_bb_addr_map_v0";
+  else if (Type == ELF::SHT_LLVM_PGO_BB_ADDR_MAP)
+    OS << "llvm_pgo_bb_addr_map";
   else if (Type == ELF::SHT_LLVM_OFFLOADING)
     OS << "llvm_offloading";
   else if (Type == ELF::SHT_LLVM_LTO)
diff --git a/llvm/lib/Object/ELF.cpp b/llvm/lib/Object/ELF.cpp
index 0d1862e573712f5..e5b2b38048b9382 100644
--- a/llvm/lib/Object/ELF.cpp
+++ b/llvm/lib/Object/ELF.cpp
@@ -312,6 +312,7 @@ StringRef llvm::object::getELFSectionTypeName(uint32_t Machine, unsigned Type) {
     STRINGIFY_ENUM_CASE(ELF, SHT_LLVM_PART_PHDR);
     STRINGIFY_ENUM_CASE(ELF, SHT_LLVM_BB_ADDR_MAP_V0);
     STRINGIFY_ENUM_CASE(ELF, SHT_LLVM_BB_ADDR_MAP);
+    STRINGIFY_ENUM_CASE(ELF, SHT_LLVM_PGO_BB_ADDR_MAP);
     STRINGIFY_ENUM_CASE(ELF, SHT_LLVM_OFFLOADING);
     STRINGIFY_ENUM_CASE(ELF, SHT_LLVM_LTO);
     STRINGIFY_ENUM_CASE(ELF, SHT_GNU_ATTRIBUTES);
@@ -645,11 +646,179 @@ ELFFile<ELFT>::toMappedAddr(uint64_t VAddr, WarningHandler WarnHandler) const {
   return base() + Offset;
 }
 
-template <class ELFT>
-Expected<std::vector<BBAddrMap>>
-ELFFile<ELFT>::decodeBBAddrMap(const Elf_Shdr &Sec,
-                               const Elf_Shdr *RelaSec) const {
-  bool IsRelocatable = getHeader().e_type == ELF::ET_REL;
+// Helper to extract and decode the next ULEB128 value as unsigned int.
+// Returns zero and sets ULEBSizeErr if the ULEB128 value exceeds the unsigned
+// int limit.
+// Also returns zero if ULEBSizeErr is already in an error state.
+// ULEBSizeErr is an out variable if an error occurs.
+template <typename IntTy>
+static IntTy readULEB128As(DataExtractor &Data, DataExtractor::Cursor &Cur,
+                           Error &ULEBSizeErr) {
+  static_assert(std::is_unsigned_v<IntTy> &&
+                    (std::numeric_limits<IntTy>::radix == 2),
+                "only use unsigned radix 2");
+  // Bail out and do not extract data if ULEBSizeErr is already set.
+  if (ULEBSizeErr)
+    return 0;
+  uint64_t Offset = Cur.tell();
+  uint64_t Value = Data.getULEB128(Cur);
+  if (Value > std::numeric_limits<IntTy>::max()) {
+    ULEBSizeErr = createError("ULEB128 value at offset 0x" +
+                              Twine::utohexstr(Offset) + " exceeds UINT" +
+                              Twine(std::numeric_limits<IntTy>::digits) +
+                              "_MAX (0x" + Twine::utohexstr(Value) + ")");
+    return 0;
+  }
+  return static_cast<IntTy>(Value);
+}
+
+namespace {
+struct BasicAddrMapBBEntry {
+  uint32_t ID;
+  uint32_t Offset;
+  uint32_t Size;
+  uint32_t MD;
+};
+
+template <typename ELFT, typename AddrMap> struct AddrMapDecodeTrait;
+
+template <typename ELFT> struct AddrMapDecodeTrait<ELFT, BBAddrMap> {
+  // Base addr map has no extra data
+  struct ExtraData {};
+
+  static constexpr unsigned SectionID = ELF::SHT_LLVM_BB_ADDR_MAP;
+
+  DataExtractor &Data;
+  DataExtractor::Cursor &Cur;
+  Error &ULEBSizeErr;
+  Error &MetadataDecodeErr;
+
+  uint32_t PrevBBEndOffset = 0;
+
+  AddrMapDecodeTrait(DataExtractor &Data, DataExtractor::Cursor &Cur,
+                     Error &ULEBSizeErr, Error &MetadataDecodeErr)
+      : Data(Data), Cur(Cur), ULEBSizeErr(ULEBSizeErr),
+        MetadataDecodeErr(MetadataDecodeErr) {}
+
+  ExtraData decodeExtraDataHeader(uint8_t, uint8_t) {
+    PrevBBEndOffset = 0;
+    return {};
+  }
+
+  // This helper method avoids decoding the metadata so other AddrMaps can use
+  BasicAddrMapBBEntry decodeBasicInfo(uint8_t Version, uint8_t Feature,
+                                      uint32_t BlockIndex) {
+    uint32_t ID = Version >= 2 ? readULEB128As<uint32_t>(Data, Cur, ULEBSizeErr)
+                               : BlockIndex;
+    uint32_t Offset = readULEB128As<uint32_t>(Data, Cur, ULEBSizeErr);
+    uint32_t Size = readULEB128As<uint32_t>(Data, Cur, ULEBSizeErr);
+    uint32_t MD = readULEB128As<uint32_t>(Data, Cur, ULEBSizeErr);
+    if (Version >= 1) {
+      // Offset is calculated relative to the end of the previous BB.
+      Offset += PrevBBEndOffset;
+      PrevBBEndOffset = Offset + Size;
+    }
+    return {ID, Offset, Size, MD};
+  }
+
+  BBAddrMap::BBEntry decodeBBEntry(uint8_t Version, uint8_t Feature,
+                                   uint32_t BlockIndex) {
+    auto [ID, Offset, Size, MD] = decodeBasicInfo(Version, Feature, BlockIndex);
+    auto MetadataOrErr = BBAddrMap::BBEntry::Metadata::decode(MD);
+    if (Error E = MetadataOrErr.takeError()) {
+      MetadataDecodeErr = std::move(E);
+      return {{}, {}, {}, {}};
+    }
+    return {ID, Offset, Size, *MetadataOrErr};
+  }
+
+  BBAddrMap makeAddrMap(ExtraData, uint8_t Feature, uint64_t Address,
+                        std::vector<BBAddrMap::BBEntry> BBEntries) {
+    return {Address, std::move(BBEntries)};
+  }
+};
+
+template <typename ELFT> struct AddrMapDecodeTrait<ELFT, PGOBBAddrMap> {
+  using Features = PGOBBAddrMap::Features;
+
+  struct ExtraData {
+    uint64_t FuncEntryCount;
+  };
+
+  static constexpr unsigned SectionID = ELF::SHT_LLVM_PGO_BB_ADDR_MAP;
+
+  AddrMapDecodeTrait<ELFT, BBAddrMap> Base;
+
+  AddrMapDecodeTrait(DataExtractor &Data, DataExtractor::Cursor &Cur,
+                     Error &ULEBSizeErr, Error &MetadataDecodeErr)
+      : Base(Data, Cur, ULEBSizeErr, MetadataDecodeErr) {}
+
+  ExtraData decodeExtraDataHeader(uint8_t Version, uint8_t Feature) {
+    Base.decodeExtraDataHeader(Version, Feature);
+    if (Version < 2) {
+      // hijack the metadata error if version is too low
+      Base.MetadataDecodeErr =
+          createError("unsupported SHT_LLVM_PGO_BB_ADDR_MAP version: " +
+                      Twine(static_cast<int>(Version)));
+      return {};
+    }
+    return {Feature & uint8_t(PGOBBAddrMap::Features::FuncEntryCnt)
+                ? readULEB128As<uint64_t>(Base.Data, Base.Cur, Base.ULEBSizeErr)
+                : 0};
+  }
+
+  PGOBBAddrMap::BBEntry decodeBBEntry(uint8_t Version, uint8_t Feature,
+                                      uint32_t BlockIndex) {
+    auto [ID, Offset, Size, MD] =
+        Base.decodeBasicInfo(Version, Feature, BlockIndex);
+    auto MetadataOrErr = PGOBBAddrMap::BBEntry::decodeMD(MD);
+    if (Error E = MetadataOrErr.takeError()) {
+      Base.MetadataDecodeErr = std::move(E);
+      return {{{}, {}, {}, {}}, {}, {}};
+    }
+    auto [MetaData, SuccsType] = *MetadataOrErr;
+
+    uint64_t BBF =
+        Feature & uint8_t(PGOBBAddrMap::Features::BBFreq)
+            ? readULEB128As<uint64_t>(Base.Data, Base.Cur, Base.ULEBSizeErr)
+            : 0;
+
+    llvm::SmallVector<PGOBBAddrMap::BBEntry::SuccessorEntry, 2> Successors;
+    if (Feature & uint8_t(PGOBBAddrMap::Features::BrProb)) {
+      auto SuccCount =
+          SuccsType == PGOBBAddrMap::BBEntry::SuccessorsType::Multiple
+              ? readULEB128As<uint64_t>(Base.Data, Base.Cur, Base.ULEBSizeErr)
+              : uint64_t(SuccsType);
+      for (uint64_t I = 0; I < SuccCount; ++I) {
+        uint32_t BBID =
+            readULEB128As<uint32_t>(Base.Data, Base.Cur, Base.ULEBSizeErr);
+        uint32_t BrProb =
+            readULEB128As<uint32_t>(Base.Data, Base.Cur, Base.ULEBSizeErr);
+        Successors.push_back({BBID, BranchProbability::getRaw(BrProb)});
+      }
+    }
+    return PGOBBAddrMap::BBEntry{
+        {ID, Offset, Size, MetaData}, BlockFrequency(BBF), Successors};
+  }
+
+  PGOBBAddrMap makeAddrMap(ExtraData ED, uint8_t Feature, uint64_t Address,
+                           std::vector<PGOBBAddrMap::BBEntry> BBEntries) {
+    return {Address,
+            std::move(BBEntries),
+            ED.FuncEntryCount,
+            bool(Feature & uint8_t(Features::FuncEntryCnt)),
+            bool(Feature & uint8_t(Features::BBFreq)),
+            bool(Feature & uint8_t(Features::BrProb))};
+  }
+};
+} // namespace
+
+template <typename AddrMap, typename ELFT>
+static Expected<std::vector<AddrMap>>
+decodeBBAddrMapCommonImpl(const ELFFile<ELFT> &EF,
+                          const typename ELFFile<ELFT>::Elf_Shdr &Sec,
+                          const typename ELFFile<ELFT>::Elf_Shdr *RelaSec) {
+  bool IsRelocatable = EF.getHeader().e_type == ELF::ET_REL;
 
   // This DenseMap maps the offset of each function (the location of the
   // reference to the function in the SHT_LLVM_BB_ADDR_MAP section) to the
@@ -659,57 +828,44 @@ ELFFile<ELFT>::decodeBBAddrMap(const...
[truncated]

@jh7370 jh7370 requested review from rlavaee and MaskRay November 9, 2023 08:00
@jh7370
Copy link
Collaborator

jh7370 commented Nov 9, 2023

I've added @rlavaee as a reviewer, as they've done a lot of the work for the BB support in the various tools. I think it makes most sense for him to do the high-level review of this.

@rlavaee
Copy link
Contributor

rlavaee commented Nov 10, 2023

Haven't read the code yet, but this is my high-level comment:

Alternatively, we could just place all new fields into BBAddrMap. This makes the encoding/decoding code simpler. It has the cost of changing some behavior in the original API. It also adds unnecessary fields for anyone wanting to just use the regular BBAddrMap.

We can distinguish between the ELF representation (SHT_LLVM_BB_ADDR_MAP) and the in-memory representation (BBAddrMap which is returned by readBBAddrMap). It's perfectly fine to add new fields to the ELF representation. This is why the feature and version attributes exist. This means we don't need to define a new section type and duplicate the encoding/decoding logic. However, it's a sensible idea to define a new PGOBBAddrMap class so we don't clobber the existing BBAddrMap with unnecessary fields. WDYT?

@red1bluelost
Copy link
Member Author

Haven't read the code yet, but this is my high-level comment:

Alternatively, we could just place all new fields into BBAddrMap. This makes the encoding/decoding code simpler. It has the cost of changing some behavior in the original API. It also adds unnecessary fields for anyone wanting to just use the regular BBAddrMap.

We can distinguish between the ELF representation (SHT_LLVM_BB_ADDR_MAP) and the in-memory representation (BBAddrMap which is returned by readBBAddrMap). It's perfectly fine to add new fields to the ELF representation. This is why the feature and version attributes exist. This means we don't need to define a new section type and duplicate the encoding/decoding logic. However, it's a sensible idea to define a new PGOBBAddrMap class so we don't clobber the existing BBAddrMap with unnecessary fields. WDYT?

I see what you mean by sharing ELF section (SHT_LLVM_BB_ADDR_MAP) rather than having it separate (SHT_PGO_LLVM_BB_ADDR_MAP). We would use feature byte to indicate enabled analyses. I would not be opposed to doing this.

If we keep the two classes PGOBBAddrMap and BBAddrMap, I don't think that this would simplify the encoding/decoding logic that much. PGOBBAddrMap would be the same while BBAddrMap would either need to error if Feature is not zero or decode but ignore the extra data. I personally would lean towards having BBAddrMap ignore extra data since PGO... is a superset and I suspect ignoring could simplify 'trait design' complexity.

I'll give a try to combine and have BBAddrMap ignore extra data in the mean time.

@rlavaee
Copy link
Contributor

rlavaee commented Nov 14, 2023

If we keep the two classes PGOBBAddrMap and BBAddrMap, I don't think that this would simplify the encoding/decoding logic that much. PGOBBAddrMap would be the same while BBAddrMap would either need to error if Feature is not zero or decode but ignore the extra data. I personally would lean towards having BBAddrMap ignore extra data since PGO... is a superset and I suspect ignoring could simplify 'trait design' complexity.

One way to do this is to have the decoding function return two structures std::vector<BBAddrMap> and std::vector<PGOFrequencyMap>. PGOFrequencyMap does not store any address information so we don't need to duplicate the base fields. The only drawback is that we need to define the mapping between the two data structures.

For decoding, we can inspect the feature field and populate PGOFrequencyMap if we need to. WDYT?

@red1bluelost
Copy link
Member Author

If we keep the two classes PGOBBAddrMap and BBAddrMap, I don't think that this would simplify the encoding/decoding logic that much. PGOBBAddrMap would be the same while BBAddrMap would either need to error if Feature is not zero or decode but ignore the extra data. I personally would lean towards having BBAddrMap ignore extra data since PGO... is a superset and I suspect ignoring could simplify 'trait design' complexity.

One way to do this is to have the decoding function return two structures std::vector<BBAddrMap> and std::vector<PGOFrequencyMap>. PGOFrequencyMap does not store any address information so we don't need to duplicate the base fields. The only drawback is that we need to define the mapping between the two data structures.

For decoding, we can inspect the feature field and populate PGOFrequencyMap if we need to. WDYT?

I see. I think that should be better. Using the "PGOFrequencyMap" will definitely help simplify Object's decode/encode. ObjectYAML is the harder area to tackle. I have one more idea to try so that we don't need new fields in yaml::BBAddrMapEntry, but if it fails, it might be cleanest to duplicate the fields just in yaml.

I think mapping should be fine. I see two designs.

  1. We could just keep the sizes identical to have one-to-one mapping. The PGOFrequencyMap would contain FuncEntryCount and a vector of BBFreq and BrProbs. In the case where a function is cold or empty, we could just leave the block vector empty.
  2. We could include the function address in PGOFrequencyMap and not make cold or empty function. I'm not sure if there on concerns with unique addresses.

I'll try building the first designs.

@red1bluelost red1bluelost force-pushed the pgo-bb-addr-map--object branch from 746ac2e to 989c084 Compare November 21, 2023 00:36
@red1bluelost
Copy link
Member Author

I went ahead and merged into this PR the design changes suggested @rlavaee. (I called it PGOAnalysisMap instead of PGOFrequencyMap). Using the same ELF section with the feature guard does help to simplify and shorten the code. If the user does not request the PGOAnalysisMap when decoding the BBAddrMap section then the code just ignores any extra data.

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments - I haven't done a particularly detailed review.

Comment on lines 658 to 660
static_assert(std::is_unsigned_v<IntTy> &&
(std::numeric_limits<IntTy>::radix == 2),
"only use unsigned radix 2");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you use enable_if or similar to prevent this function being used by an incorrect type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work but it looks a bit ugly since formats weird and you need bool_constant. Since this method is just a static helper, the static_assert is cleaner and avoids accidental misuse.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if you're doing the same thing I imagined. I played around with your code, and the template signature just becomes:

template <typename IntTy,
          typename = std::enable_if_t<std::is_unsigned_v<IntTy> &&
                                      (std::numeric_limits<IntTy>::radix == 2)>>

Which doesn't seem weirdly formatted to me and doesn't use bool_constant? I think enable_if (or any other SFINAE approach) is the more canonical way in C++ of preventing use of a template function rather than a static_assert.

But anyway, is the radix part actually needed? What case would it suppress that you think is potentially going to happen? What would be the issue if the function were to be used with such an integer type (if the static_assert/enable_if weren't present)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using std conjunction and forgot I can just do &&. Either way, I do agree checking radix is probably overkill. The guard is primarily there just to make the next person stop and check if it is fine since we only used it with unsigned int values so far.

I switched it to SFINAE with just checking for unsigned.

Comment on lines +1844 to +1845
IO.mapRequired("ID", E.ID);
IO.mapRequired("BrProb", E.BrProb);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these fields both REALLY required, or could reasonable defaults (including empty strings, 0 etc) be used?

Copy link
Member Author

@red1bluelost red1bluelost Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are listing the successors in the YAML, these two fields are essential. It's optional to have the list of successor but not optional to have these fields in a successor.
Maybe in the future if SuccessorEntry gets more fields then optional would become worth it.

Copy link
Contributor

@rlavaee rlavaee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my concerns.

I wonder if there is any benefit in emitting the PGOAnalysis BB entries next to their BBAddrMap counterparts in the section. I see we are using the Metadata field to emit the successor type (used primarily for succinct decoding/encoding), but we probably shouldn't introduce this kind of coupling.

Here is an alternative:
Emit the entire PGOAnalysisMap entry of each function right after its BBAddrMap (NBlocks and Feature are shared). Define and use your own fancy encoding for successor type and count.

uint64_t Offset = Cur.tell();
uint64_t Value = Data.getULEB128(Cur);
if (Value > std::numeric_limits<IntTy>::max()) {
ULEBSizeErr = createError("ULEB128 value at offset 0x" +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any testing of this error? I couldn't immediately find any in this patch, but I suppose it could exist already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I this is already tested in the regular BBAddrMap tests. Found at ELFObjectFileTest.cpp:564

Comment on lines 658 to 660
static_assert(std::is_unsigned_v<IntTy> &&
(std::numeric_limits<IntTy>::radix == 2),
"only use unsigned radix 2");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if you're doing the same thing I imagined. I played around with your code, and the template signature just becomes:

template <typename IntTy,
          typename = std::enable_if_t<std::is_unsigned_v<IntTy> &&
                                      (std::numeric_limits<IntTy>::radix == 2)>>

Which doesn't seem weirdly formatted to me and doesn't use bool_constant? I think enable_if (or any other SFINAE approach) is the more canonical way in C++ of preventing use of a template function rather than a static_assert.

But anyway, is the radix part actually needed? What case would it suppress that you think is potentially going to happen? What would be the issue if the function were to be used with such an integer type (if the static_assert/enable_if weren't present)?

@red1bluelost
Copy link
Member Author

Thanks for addressing my concerns.

I wonder if there is any benefit in emitting the PGOAnalysis BB entries next to their BBAddrMap counterparts in the section. I see we are using the Metadata field to emit the successor type (used primarily for succinct decoding/encoding), but we probably shouldn't introduce this kind of coupling.

Here is an alternative: Emit the entire PGOAnalysisMap entry of each function right after its BBAddrMap (NBlocks and Feature are shared). Define and use your own fancy encoding for successor type and count.

I'll try that out. Without packing into Metadata there is no space saved for Branch Probability. But that probably doesn't matter much.

@red1bluelost
Copy link
Member Author

Thanks for addressing my concerns.

I wonder if there is any benefit in emitting the PGOAnalysis BB entries next to their BBAddrMap counterparts in the section. I see we are using the Metadata field to emit the successor type (used primarily for succinct decoding/encoding), but we probably shouldn't introduce this kind of coupling.

Here is an alternative: Emit the entire PGOAnalysisMap entry of each function right after its BBAddrMap (NBlocks and Feature are shared). Define and use your own fancy encoding for successor type and count.

Done. It now encoded PGOAnalysisMap after each BBAddrMap function if it was enabled in Features.

Copy link

github-actions bot commented Nov 22, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. No more comments from me, barring one possible nit.

@red1bluelost red1bluelost requested a review from rlavaee November 28, 2023 02:59
Copy link
Contributor

@rlavaee rlavaee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates. Only one comment from me.

@rlavaee
Copy link
Contributor

rlavaee commented Nov 30, 2023

Also, please add the "[SHT_LLVM_BB_ADDR_MAP]" tag to the commit message.

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two final nitpicks from me (I didn't check to see if there were other cases of the same issue, but if there are, please fix them.

@rlavaee
Copy link
Contributor

rlavaee commented Dec 1, 2023

Would you please wait until I merge #74107? This should allow removing some boilerplate code from your PR as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants