Skip to content

[BOLT][Instrumentation] Initial instrumentation support for RISCV64 #133882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 17, 2025

Conversation

WangJee
Copy link
Contributor

@WangJee WangJee commented Apr 1, 2025

This patch adds code generation for RISCV64 instrumentation.The work
involved includes the following three points:

a) Implements support for instrumenting direct function call and jump
on RISC-V which relies on , Atomic instructions
(used to increment counters) are only available on RISC-V when the A
extension is used.

b) Implements support for instrumenting direct function inderect call
by implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
need to accurately record the target address and IndCallID to ensure
the correct recording of the indirect call counters.

c)Implemented the RISCV64 Bolt runtime library, implemented some system
call interfaces through embedded assembly. Get the difference between
runtime addrress of .text section andstatic address in section header
table, which in turn can be used to search for indirect call description.

However, the community code currently has problems with relocation in
some scenarios, but this has nothing to do with instrumentation. We
may continue to submit patches to fix the related bugs.

    This patch adds code generation for RISCV64 instrumentation.The work
    involved includes the following three points:

    a) Implements support for instrumenting direct function call and jump
    on RISC-V which relies on , Atomic instructions
    (used to increment counters) are only available on RISC-V when the A
    extension is used.

    b) Implements support for instrumenting direct function inderect call
    by implementing the createInstrumentedIndCallHandlerEntryBB and
    createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
    need to accurately record the target address and IndCallID to ensure
    the correct recording of the indirect call counters.

    c)Implemented the RISCV64 Bolt runtime library, implemented some system
    call interfaces through embedded assembly. Get the difference between
    runtime addrress of .text section andstatic address in section header
    table, which in turn can be used to search for indirect call description.

    However, the community code currently has problems with relocation in
    some scenarios, but this has nothing to do with instrumentation. We
    may continue to submit patches to fix the related bugs.
Copy link

github-actions bot commented Apr 1, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@itsmanisk
Copy link

@WangJee , i had few doubts regarding bolt for RISCV , can you please help me out?

TIA

@WangJee
Copy link
Contributor Author

WangJee commented Apr 8, 2025

@maksfb Could you please review this PR?

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @WangJee, thank you for your contribution! The patch looks clean to me, but I'm not much familiar with RISC-V. Perhaps @dinyy can also chime in? I'm preliminary approving the patch to unblock your progress.

@@ -2901,7 +2901,7 @@ void RewriteInstance::handleRelocation(const SectionRef &RelocatedSection,
// of sections and whose symbol address is not really what should be
// encoded in the instruction). So we essentially disabled this check
// for AArch64 and live with bogus names for objects.
assert((IsAArch64 || IsSectionRelocation ||
assert((IsAArch64 || BC->isRISCV() || IsSectionRelocation ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please update the comment above and substitute AArch64 with either RISC or non-x86?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it has been modified

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please help me merge it, because I don't have the permission to merge yet?

@maksfb maksfb changed the title [Bolt][Instrumentation] Initial instrumentation support for RISCV64 [BOLT][Instrumentation] Initial instrumentation support for RISCV64 Apr 15, 2025
@llvmbot llvmbot added the BOLT label Apr 15, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-bolt

Author: wangjue (WangJee)

Changes
This patch adds code generation for RISCV64 instrumentation.The work
involved includes the following three points:

a) Implements support for instrumenting direct function call and jump
on RISC-V which relies on , Atomic instructions
(used to increment counters) are only available on RISC-V when the A
extension is used.

b) Implements support for instrumenting direct function inderect call
by implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
need to accurately record the target address and IndCallID to ensure
the correct recording of the indirect call counters.

c)Implemented the RISCV64 Bolt runtime library, implemented some system
call interfaces through embedded assembly. Get the difference between
runtime addrress of .text section andstatic address in section header
table, which in turn can be used to search for indirect call description.

However, the community code currently has problems with relocation in
some scenarios, but this has nothing to do with instrumentation. We
may continue to submit patches to fix the related bugs.

Patch is 45.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133882.diff

11 Files Affected:

  • (modified) bolt/CMakeLists.txt (+2-1)
  • (modified) bolt/lib/Core/Relocation.cpp (+4)
  • (modified) bolt/lib/Rewrite/RewriteInstance.cpp (+1-1)
  • (modified) bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp (+378)
  • (modified) bolt/runtime/CMakeLists.txt (+10-4)
  • (modified) bolt/runtime/common.h (+3-1)
  • (modified) bolt/runtime/instr.cpp (+48)
  • (added) bolt/runtime/sys_riscv64.h (+460)
  • (added) bolt/test/runtime/RISCV/basic-instrumentation.s (+33)
  • (added) bolt/test/runtime/RISCV/instrumentation-ind-call.c (+39)
  • (added) bolt/test/runtime/RISCV/lit.local.cfg (+2)
diff --git a/bolt/CMakeLists.txt b/bolt/CMakeLists.txt
index 73c3ab54722e9..52c796518ac05 100644
--- a/bolt/CMakeLists.txt
+++ b/bolt/CMakeLists.txt
@@ -82,7 +82,8 @@ endforeach()
 
 set(BOLT_ENABLE_RUNTIME_default OFF)
 if ((CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64"
-    OR CMAKE_SYSTEM_PROCESSOR MATCHES "^(arm64|aarch64)$")
+    OR CMAKE_SYSTEM_PROCESSOR MATCHES "^(arm64|aarch64)$"
+    OR CMAKE_SYSTEM_PROCESSOR STREQUAL "riscv64")
     AND (CMAKE_SYSTEM_NAME STREQUAL "Linux"
       OR CMAKE_SYSTEM_NAME STREQUAL "Darwin")
     AND (NOT CMAKE_CROSSCOMPILING))
diff --git a/bolt/lib/Core/Relocation.cpp b/bolt/lib/Core/Relocation.cpp
index ff1681f823987..f099dfa46f3d4 100644
--- a/bolt/lib/Core/Relocation.cpp
+++ b/bolt/lib/Core/Relocation.cpp
@@ -123,6 +123,7 @@ static bool isSupportedRISCV(uint32_t Type) {
   case ELF::R_RISCV_LO12_S:
   case ELF::R_RISCV_64:
   case ELF::R_RISCV_TLS_GOT_HI20:
+  case ELF::R_RISCV_TLS_GD_HI20:
   case ELF::R_RISCV_TPREL_HI20:
   case ELF::R_RISCV_TPREL_ADD:
   case ELF::R_RISCV_TPREL_LO12_I:
@@ -236,6 +237,7 @@ static size_t getSizeForTypeRISCV(uint32_t Type) {
   case ELF::R_RISCV_64:
   case ELF::R_RISCV_GOT_HI20:
   case ELF::R_RISCV_TLS_GOT_HI20:
+  case ELF::R_RISCV_TLS_GD_HI20:
     // See extractValueRISCV for why this is necessary.
     return 8;
   }
@@ -491,6 +493,7 @@ static uint64_t extractValueRISCV(uint32_t Type, uint64_t Contents,
     return extractBImmRISCV(Contents);
   case ELF::R_RISCV_GOT_HI20:
   case ELF::R_RISCV_TLS_GOT_HI20:
+  case ELF::R_RISCV_TLS_GD_HI20:
     // We need to know the exact address of the GOT entry so we extract the
     // value from both the AUIPC and L[D|W]. We cannot rely on the symbol in the
     // relocation for this since it simply refers to the object that is stored
@@ -707,6 +710,7 @@ static bool isPCRelativeRISCV(uint32_t Type) {
   case ELF::R_RISCV_RVC_BRANCH:
   case ELF::R_RISCV_32_PCREL:
   case ELF::R_RISCV_TLS_GOT_HI20:
+  case ELF::R_RISCV_TLS_GD_HI20:
     return true;
   }
 }
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index a03df3de39e83..ab7685fd24783 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -2931,7 +2931,7 @@ void RewriteInstance::handleRelocation(const SectionRef &RelocatedSection,
       // of sections and whose symbol address is not really what should be
       // encoded in the instruction). So we essentially disabled this check
       // for AArch64 and live with bogus names for objects.
-      assert((IsAArch64 || IsSectionRelocation ||
+      assert((IsAArch64 || BC->isRISCV() || IsSectionRelocation ||
               BD->nameStartsWith(SymbolName) ||
               BD->nameStartsWith("PG" + SymbolName) ||
               (BD->nameStartsWith("ANONYMOUS") &&
diff --git a/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp b/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
index 4320c679acd54..0e27d29019e95 100644
--- a/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
+++ b/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
@@ -14,7 +14,9 @@
 #include "MCTargetDesc/RISCVMCTargetDesc.h"
 #include "bolt/Core/MCPlusBuilder.h"
 #include "llvm/BinaryFormat/ELF.h"
+#include "llvm/MC/MCContext.h"
 #include "llvm/MC/MCInst.h"
+#include "llvm/MC/MCInstBuilder.h"
 #include "llvm/MC/MCSubtargetInfo.h"
 #include "llvm/Support/ErrorHandling.h"
 
@@ -72,6 +74,7 @@ class RISCVMCPlusBuilder : public MCPlusBuilder {
     case ELF::R_RISCV_LO12_I:
     case ELF::R_RISCV_LO12_S:
     case ELF::R_RISCV_TLS_GOT_HI20:
+    case ELF::R_RISCV_TLS_GD_HI20:
       return true;
     default:
       llvm_unreachable("Unexpected RISCV relocation type in code");
@@ -252,6 +255,11 @@ class RISCVMCPlusBuilder : public MCPlusBuilder {
     return createCall(RISCV::PseudoCALL, Inst, Target, Ctx);
   }
 
+  void createLongTailCall(InstructionListType &Seq, const MCSymbol *Target,
+                          MCContext *Ctx) override {
+    createShortJmp(Seq, Target, Ctx, /*IsTailCall*/ true);
+  }
+
   void createTailCall(MCInst &Inst, const MCSymbol *Target,
                       MCContext *Ctx) override {
     return createCall(RISCV::PseudoTAIL, Inst, Target, Ctx);
@@ -424,6 +432,7 @@ class RISCVMCPlusBuilder : public MCPlusBuilder {
       return Expr;
     case ELF::R_RISCV_GOT_HI20:
     case ELF::R_RISCV_TLS_GOT_HI20:
+    case ELF::R_RISCV_TLS_GD_HI20:
       // The GOT is reused so no need to create GOT relocations
     case ELF::R_RISCV_PCREL_HI20:
       return RISCVMCExpr::create(Expr, RISCVMCExpr::VK_PCREL_HI, Ctx);
@@ -483,6 +492,375 @@ class RISCVMCPlusBuilder : public MCPlusBuilder {
       return 2;
     return 4;
   }
+
+  void createStackPointerIncrement(
+      MCInst &Inst, int imm,
+      bool NoFlagsClobber = false /*unused for RISCV*/) const override {
+    Inst = MCInstBuilder(RISCV::ADDI)
+               .addReg(RISCV::X2)
+               .addReg(RISCV::X2)
+               .addImm(-imm);
+  }
+
+  void createStackPointerDecrement(
+      MCInst &Inst, int imm,
+      bool NoFlagsClobber = false /*unused for RISCV*/) const override {
+    Inst = MCInstBuilder(RISCV::ADDI)
+               .addReg(RISCV::X2)
+               .addReg(RISCV::X2)
+               .addImm(imm);
+  }
+
+  void loadReg(MCInst &Inst, MCPhysReg To, MCPhysReg From,
+               int64_t offset) const {
+    Inst = MCInstBuilder(RISCV::LD).addReg(To).addReg(From).addImm(offset);
+  }
+
+  void storeReg(MCInst &Inst, MCPhysReg From, MCPhysReg To,
+                int64_t offset) const {
+    Inst = MCInstBuilder(RISCV::SD).addReg(From).addReg(To).addImm(offset);
+  }
+
+  void spillRegs(InstructionListType &Insts,
+                 const SmallVector<unsigned> &Regs) const {
+    Insts.emplace_back();
+    createStackPointerIncrement(Insts.back(), Regs.size() * 8);
+
+    int64_t Offset = 0;
+    for (auto Reg : Regs) {
+      Insts.emplace_back();
+      storeReg(Insts.back(), Reg, RISCV::X2, Offset);
+      Offset += 8;
+    }
+  }
+
+  void reloadRegs(InstructionListType &Insts,
+                  const SmallVector<unsigned> &Regs) const {
+    int64_t Offset = 0;
+    for (auto Reg : Regs) {
+      Insts.emplace_back();
+      loadReg(Insts.back(), Reg, RISCV::X2, Offset);
+      Offset += 8;
+    }
+
+    Insts.emplace_back();
+    createStackPointerDecrement(Insts.back(), Regs.size() * 8);
+  }
+
+  void atomicAdd(MCInst &Inst, MCPhysReg RegAtomic, MCPhysReg RegTo,
+                 MCPhysReg RegCnt) const {
+    Inst = MCInstBuilder(RISCV::AMOADD_D)
+               .addReg(RegAtomic)
+               .addReg(RegTo)
+               .addReg(RegCnt);
+  }
+
+  InstructionListType createCmpJE(MCPhysReg RegNo, MCPhysReg RegTmp,
+                                  const MCSymbol *Target,
+                                  MCContext *Ctx) const {
+    InstructionListType Insts;
+    Insts.emplace_back(
+        MCInstBuilder(RISCV::SUB).addReg(RegTmp).addReg(RegNo).addReg(RegNo));
+    Insts.emplace_back(MCInstBuilder(RISCV::BEQ)
+                           .addReg(RegNo)
+                           .addReg(RegTmp)
+                           .addExpr(MCSymbolRefExpr::create(
+                               Target, MCSymbolRefExpr::VK_None, *Ctx)));
+    return Insts;
+  }
+
+  void createTrap(MCInst &Inst) const override {
+    Inst.clear();
+    Inst.setOpcode(RISCV::EBREAK);
+  }
+
+  void createShortJmp(InstructionListType &Seq, const MCSymbol *Target,
+                      MCContext *Ctx, bool IsTailCall) override {
+    // The sequence of instructions we create here is the following:
+    //  auipc   a5, hi20(Target)
+    //  addi    a5, a5, low12(Target)
+    //  jr x5 => jalr x0, x5, 0
+    MCPhysReg Reg = RISCV::X5;
+    InstructionListType Insts = materializeAddress(Target, Ctx, Reg);
+    Insts.emplace_back();
+    MCInst &Inst = Insts.back();
+    Inst.clear();
+    Inst = MCInstBuilder(RISCV::JALR).addReg(RISCV::X0).addReg(Reg).addImm(0);
+    if (IsTailCall)
+      setTailCall(Inst);
+    Seq.swap(Insts);
+  }
+
+  InstructionListType createGetter(MCContext *Ctx, const char *name) const {
+    InstructionListType Insts(4);
+    MCSymbol *Locs = Ctx->getOrCreateSymbol(name);
+    InstructionListType Addr = materializeAddress(Locs, Ctx, RISCV::X10);
+    std::copy(Addr.begin(), Addr.end(), Insts.begin());
+    loadReg(Insts[2], RISCV::X10, RISCV::X10, 0);
+    createReturn(Insts[3]);
+    return Insts;
+  }
+
+  InstructionListType createIncMemory(MCPhysReg RegTo, MCPhysReg RegCnt,
+                                      MCPhysReg RegAtomic) const {
+    InstructionListType Insts;
+    Insts.emplace_back();
+    Insts.back() =
+        MCInstBuilder(RISCV::ADDI).addReg(RegCnt).addReg(RegAtomic).addImm(1);
+    Insts.emplace_back();
+    atomicAdd(Insts.back(), RegAtomic, RegTo, RegCnt);
+    return Insts;
+  }
+
+  InstructionListType materializeAddress(const MCSymbol *Target, MCContext *Ctx,
+                                         MCPhysReg RegName,
+                                         int64_t Addend = 0) const override {
+    // Get the symbol address by auipc + addi
+    InstructionListType Insts(2);
+    MCSymbol *AuipcLabel = Ctx->createNamedTempSymbol("pcrel_hi");
+    Insts[0] = MCInstBuilder(RISCV::AUIPC).addReg(RegName).addImm(0);
+    setOperandToSymbolRef(Insts[0], /* OpNum */ 1, Target, Addend, Ctx,
+                          ELF::R_RISCV_PCREL_HI20);
+    setInstLabel(Insts[0], AuipcLabel);
+
+    Insts[1] =
+        MCInstBuilder(RISCV::ADDI).addReg(RegName).addReg(RegName).addImm(0);
+    setOperandToSymbolRef(Insts[1], /* OpNum */ 2, AuipcLabel, Addend, Ctx,
+                          ELF::R_RISCV_PCREL_LO12_I);
+    return Insts;
+  }
+
+  InstructionListType
+  createInstrIncMemory(const MCSymbol *Target, MCContext *Ctx, bool IsLeaf,
+                       unsigned CodePointerSize) const override {
+    // We need 2 scratch registers: one for the target address (x10), and one
+    // for the increment value (x11).
+    // addi sp, sp, -16
+    // sd x10, 0(sp)
+    // sd x11, 8(sp)
+    // la x10, target         # 1: auipc x10, %pcrel_hi(target)
+    //                        # addi x10, x10, %pcrel_lo(1b)
+    // li x11, 1              # addi x11, zero, 1
+    // amoadd.d zero, x10, x11
+    // ld x10, 0(sp)
+    // ld x11, 8(sp)
+    // addi sp, sp, 16
+
+    InstructionListType Insts;
+    spillRegs(Insts, {RISCV::X10, RISCV::X11});
+    InstructionListType Addr = materializeAddress(Target, Ctx, RISCV::X10);
+    Insts.insert(Insts.end(), Addr.begin(), Addr.end());
+    InstructionListType IncInsts =
+        createIncMemory(RISCV::X10, RISCV::X11, RISCV::X0);
+    Insts.insert(Insts.end(), IncInsts.begin(), IncInsts.end());
+    reloadRegs(Insts, {RISCV::X10, RISCV::X11});
+    return Insts;
+  }
+
+  void createDirectCall(MCInst &Inst, const MCSymbol *Target, MCContext *Ctx,
+                        bool IsTailCall) override {
+    Inst.setOpcode(RISCV::JAL);
+    Inst.clear();
+    if (IsTailCall) {
+      Inst.addOperand(MCOperand::createReg(RISCV::X0));
+      Inst.addOperand(MCOperand::createExpr(getTargetExprFor(
+          Inst, MCSymbolRefExpr::create(Target, MCSymbolRefExpr::VK_None, *Ctx),
+          *Ctx, 0)));
+      convertJmpToTailCall(Inst);
+    } else {
+      Inst.addOperand(MCOperand::createReg(RISCV::X1));
+      Inst.addOperand(MCOperand::createExpr(getTargetExprFor(
+          Inst, MCSymbolRefExpr::create(Target, MCSymbolRefExpr::VK_None, *Ctx),
+          *Ctx, 0)));
+    }
+  }
+
+  void createIndirectCallInst(MCInst &Inst, bool IsTailCall, MCPhysReg Reg,
+                              int64_t Disp) const {
+    Inst.clear();
+    Inst.setOpcode(RISCV::JALR);
+    Inst.clear();
+    if (IsTailCall) {
+      Inst.addOperand(MCOperand::createReg(RISCV::X0));
+      Inst.addOperand(MCOperand::createReg(Reg));
+      Inst.addOperand(MCOperand::createImm(Disp));
+    } else {
+      Inst.addOperand(MCOperand::createReg(RISCV::X1));
+      Inst.addOperand(MCOperand::createReg(Reg));
+      Inst.addOperand(MCOperand::createImm(Disp));
+    }
+  }
+
+  InstructionListType
+  createInstrumentedIndCallHandlerEntryBB(const MCSymbol *InstrTrampoline,
+                                          const MCSymbol *IndCallHandler,
+                                          MCContext *Ctx) override {
+    // Code sequence used to check whether InstrTampoline was initialized
+    // and call it if so, returns via IndCallHandler
+    //   sp      -16(sp)
+    //   sd      x10, 0(sp)
+    //   sd      x11, 0(sp)
+    //   la      x10, InstrTrampoline -> auipc + addi
+    //   ld      x10, [x10]
+    //   beq     x10, x11, IndCallHandler
+    //   sp      -16(sp)
+    //   sd      x1, 0(sp)
+    //   jalr    x1,x10,0
+    //   ld      x1, [sp], #16
+    //   sp      16(sp)
+    //   jal     x0, IndCallHandler
+
+    InstructionListType Insts;
+    spillRegs(Insts, {RISCV::X10, RISCV::X11});
+    InstructionListType Addr =
+        materializeAddress(InstrTrampoline, Ctx, RISCV::X10);
+    Insts.insert(Insts.end(), Addr.begin(), Addr.end());
+    Insts.emplace_back();
+    loadReg(Insts.back(), RISCV::X10, RISCV::X10, 0);
+    InstructionListType cmpJmp =
+        createCmpJE(RISCV::X10, RISCV::X11, IndCallHandler, Ctx);
+    Insts.insert(Insts.end(), cmpJmp.begin(), cmpJmp.end());
+    Insts.emplace_back();
+    createStackPointerIncrement(Insts.back(), 16);
+    Insts.emplace_back();
+    storeReg(Insts.back(), RISCV::X1, RISCV::X2, 0);
+    Insts.emplace_back();
+    createIndirectCallInst(Insts.back(), /*IsTailCall*/ false, RISCV::X10, 0);
+    Insts.emplace_back();
+    loadReg(Insts.back(), RISCV::X1, RISCV::X2, 0);
+    Insts.emplace_back();
+    createStackPointerDecrement(Insts.back(), 16);
+    Insts.emplace_back();
+    createDirectCall(Insts.back(), IndCallHandler, Ctx, /*IsTailCall*/ true);
+    return Insts;
+  }
+
+  InstructionListType createInstrumentedIndCallHandlerExitBB() const override {
+    InstructionListType Insts;
+    reloadRegs(Insts, {RISCV::X10, RISCV::X11});
+    Insts.emplace_back();
+    loadReg(Insts.back(), RISCV::X5, RISCV::X2, 0);
+    Insts.emplace_back();
+    createStackPointerDecrement(Insts.back(), 16);
+    reloadRegs(Insts, {RISCV::X10, RISCV::X11});
+    Insts.emplace_back();
+    createIndirectCallInst(Insts.back(), /*IsTailCall*/ true, RISCV::X5, 0);
+    return Insts;
+  }
+
+  InstructionListType
+  createInstrumentedIndTailCallHandlerExitBB() const override {
+    return createInstrumentedIndCallHandlerExitBB();
+  }
+
+  std::vector<MCInst> createSymbolTrampoline(const MCSymbol *TgtSym,
+                                             MCContext *Ctx) override {
+    std::vector<MCInst> Insts;
+    createShortJmp(Insts, TgtSym, Ctx, /*IsTailCall*/ true);
+    return Insts;
+  }
+
+  InstructionListType createNumCountersGetter(MCContext *Ctx) const override {
+    return createGetter(Ctx, "__bolt_num_counters");
+  }
+
+  InstructionListType
+  createInstrLocationsGetter(MCContext *Ctx) const override {
+    return createGetter(Ctx, "__bolt_instr_locations");
+  }
+
+  InstructionListType createInstrTablesGetter(MCContext *Ctx) const override {
+    return createGetter(Ctx, "__bolt_instr_tables");
+  }
+
+  InstructionListType createInstrNumFuncsGetter(MCContext *Ctx) const override {
+    return createGetter(Ctx, "__bolt_instr_num_funcs");
+  }
+
+  void convertIndirectCallToLoad(MCInst &Inst, MCPhysReg Reg,
+                                 MCPhysReg ZeroReg) const {
+    bool IsTailCall = isTailCall(Inst);
+    if (IsTailCall)
+      removeAnnotation(Inst, MCPlus::MCAnnotation::kTailCall);
+    Inst.setOpcode(RISCV::ADD);
+    Inst.insert(Inst.begin(), MCOperand::createReg(Reg));
+    Inst.insert(Inst.begin() + 1, MCOperand::createReg(ZeroReg));
+    return;
+  }
+
+  InstructionListType createLoadImmediate(const MCPhysReg Dest,
+                                          uint64_t Imm) const override {
+    InstructionListType Insts;
+    // get IMM higher 32bit
+    Insts.emplace_back(
+        MCInstBuilder(RISCV::LUI).addReg(Dest).addImm((Imm >> 44) & 0xFFFFF));
+    Insts.emplace_back(MCInstBuilder(RISCV::LUI)
+                           .addReg(RISCV::X5)
+                           .addImm((Imm >> 32) & 0xFFF));
+    Insts.emplace_back(MCInstBuilder(RISCV::SRLI)
+                           .addReg(RISCV::X5)
+                           .addReg(RISCV::X5)
+                           .addImm(12));
+    Insts.emplace_back(
+        MCInstBuilder(RISCV::OR).addReg(Dest).addReg(Dest).addReg(RISCV::X5));
+    Insts.emplace_back(
+        MCInstBuilder(RISCV::SLLI).addReg(Dest).addReg(Dest).addImm(32));
+
+    // get IMM lower 32bit
+    Insts.emplace_back(MCInstBuilder(RISCV::LUI)
+                           .addReg(RISCV::X5)
+                           .addImm((Imm >> 12) & 0xFFFFF));
+    Insts.emplace_back(
+        MCInstBuilder(RISCV::LUI).addReg(RISCV::X6).addImm((Imm)&0xFFF));
+    Insts.emplace_back(MCInstBuilder(RISCV::SRLI)
+                           .addReg(RISCV::X6)
+                           .addReg(RISCV::X6)
+                           .addImm(12));
+    Insts.emplace_back(
+        MCInstBuilder(RISCV::OR).addReg(RISCV::X5).addReg(RISCV::X5).addReg(
+            RISCV::X6));
+
+    // get 64bit IMM
+    Insts.emplace_back(
+        MCInstBuilder(RISCV::OR).addReg(Dest).addReg(Dest).addReg(RISCV::X5));
+    return Insts;
+  }
+
+  InstructionListType createInstrumentedIndirectCall(MCInst &&CallInst,
+                                                     MCSymbol *HandlerFuncAddr,
+                                                     int CallSiteID,
+                                                     MCContext *Ctx) override {
+    // Code sequence used to enter indirect call instrumentation helper:
+    //   addi  sp, sp, -0x10
+    //   sd  a0, 0x0(sp)
+    //   sd  a1, 0x8(sp)
+    //   mov target x0  convertIndirectCallToLoad -> add a0, zero, target
+    //   mov x1 CallSiteID createLoadImmediate
+    //   addi  sp, sp, -0x10
+    //   sd  a0, 0x0(sp)
+    //   sd  a1, 0x8(sp)
+    //   la x0 *HandlerFuncAddr -> auipc + addi
+    //   jalr x0
+
+    InstructionListType Insts;
+    spillRegs(Insts, {RISCV::X10, RISCV::X11});
+    Insts.emplace_back(CallInst);
+    convertIndirectCallToLoad(Insts.back(), RISCV::X10, RISCV::X0);
+    InstructionListType LoadImm = createLoadImmediate(RISCV::X11, CallSiteID);
+    Insts.insert(Insts.end(), LoadImm.begin(), LoadImm.end());
+    spillRegs(Insts, {RISCV::X10, RISCV::X11});
+    InstructionListType Addr =
+        materializeAddress(HandlerFuncAddr, Ctx, RISCV::X5);
+    Insts.insert(Insts.end(), Addr.begin(), Addr.end());
+    Insts.emplace_back();
+    createIndirectCallInst(Insts.back(), isTailCall(CallInst), RISCV::X5, 0);
+
+    // // Carry over metadata including tail call marker if present.
+    stripAnnotations(Insts.back());
+    moveAnnotations(std::move(CallInst), Insts.back());
+
+    return Insts;
+  }
 };
 
 } // end anonymous namespace
diff --git a/bolt/runtime/CMakeLists.txt b/bolt/runtime/CMakeLists.txt
index 0deb69a27d435..87cc44812da11 100644
--- a/bolt/runtime/CMakeLists.txt
+++ b/bolt/runtime/CMakeLists.txt
@@ -35,15 +35,21 @@ set(BOLT_RT_FLAGS
   -fno-exceptions
   -fno-rtti
   -fno-stack-protector
-  -fPIC
-  -mgeneral-regs-only)
+  -fPIC)
 if (CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64")
-  set(BOLT_RT_FLAGS ${BOLT_RT_FLAGS} "-mno-sse")
+  set(BOLT_RT_FLAGS ${BOLT_RT_FLAGS} 
+    -mno-sse 
+    -mgeneral-regs-only)
+endif()
+if (CMAKE_SYSTEM_PROCESSOR MATCHES "riscv64")
+  set(BOLT_RT_FLAGS ${BOLT_RT_FLAGS})
 endif()
 if (CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
   check_cxx_compiler_flag("-mno-outline-atomics" CXX_SUPPORTS_OUTLINE_ATOMICS)
   if (CXX_SUPPORTS_OUTLINE_ATOMICS)
-    set(BOLT_RT_FLAGS ${BOLT_RT_FLAGS} "-mno-outline-atomics")
+    set(BOLT_RT_FLAGS ${BOLT_RT_FLAGS} 
+      -mno-outline-atomics 
+      -mgeneral-regs-only)
   endif()
 endif()
 
diff --git a/bolt/runtime/common.h b/bolt/runtime/common.h
index 27d0830071067..3461f8c545f9f 100644
--- a/bolt/runtime/common.h
+++ b/bolt/runtime/common.h
@@ -153,10 +153,12 @@ struct timespec {
 
 #if defined(__aarch64__) || defined(__arm64__)
 #include "sys_aarch64.h"
+#elif defined(__riscv)
+#include "sys_riscv64.h"
 #elif defined(__x86_64__)
 #include "sys_x86_64.h"
 #else
-#error "For AArch64/ARM64 and X86_64 only."
+#error "For AArch64/ARM64,X86_64 AND RISCV64 only."
 #endif
 
 constexpr uint32_t BufSize = 10240;
...
[truncated]

    This patch adds code generation for RISCV64 instrumentation.The work
    involved includes the following three points:

    a) Implements support for instrumenting direct function call and jump
    on RISC-V which relies on , Atomic instructions
    (used to increment counters) are only available on RISC-V when the A
    extension is used.

    b) Implements support for instrumenting direct function inderect call
    by implementing the createInstrumentedIndCallHandlerEntryBB and
    createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
    need to accurately record the target address and IndCallID to ensure
    the correct recording of the indirect call counters.

    c)Implemented the RISCV64 Bolt runtime library, implemented some system
    call interfaces through embedded assembly. Get the difference between
    runtime addrress of .text section andstatic address in section header
    table, which in turn can be used to search for indirect call description.

    However, the community code currently has problems with relocation in
    some scenarios, but this has nothing to do with instrumentation. We
    may continue to submit patches to fix the related bugs.
@aaupov aaupov merged commit dbb79c3 into llvm:main Apr 17, 2025
7 of 10 checks passed
Copy link

@WangJee Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp,h,c -- bolt/runtime/sys_riscv64.h bolt/test/runtime/RISCV/instrumentation-ind-call.c bolt/lib/Core/Relocation.cpp bolt/lib/Rewrite/RewriteInstance.cpp bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp bolt/runtime/common.h bolt/runtime/instr.cpp
View the diff from clang-format here.
diff --git a/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp b/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
index 0e27d2901..c8a747bee 100644
--- a/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
+++ b/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
@@ -811,7 +811,7 @@ public:
                            .addReg(RISCV::X5)
                            .addImm((Imm >> 12) & 0xFFFFF));
     Insts.emplace_back(
-        MCInstBuilder(RISCV::LUI).addReg(RISCV::X6).addImm((Imm)&0xFFF));
+        MCInstBuilder(RISCV::LUI).addReg(RISCV::X6).addImm((Imm) & 0xFFF));
     Insts.emplace_back(MCInstBuilder(RISCV::SRLI)
                            .addReg(RISCV::X6)
                            .addReg(RISCV::X6)
diff --git a/bolt/test/runtime/RISCV/instrumentation-ind-call.c b/bolt/test/runtime/RISCV/instrumentation-ind-call.c
index 1fd49a774..318858dec 100644
--- a/bolt/test/runtime/RISCV/instrumentation-ind-call.c
+++ b/bolt/test/runtime/RISCV/instrumentation-ind-call.c
@@ -1,7 +1,7 @@
 /*
 REQUIRES: system-linux,bolt-runtime
 
-RUN: %clang %cflags %s -o %t.exe -Wl,-q 
+RUN: %clang %cflags %s -o %t.exe -Wl,-q
 
 RUN: llvm-bolt %t.exe --instrument --instrumentation-file=%t.fdata \
 RUN:   -o %t.instrumented

@kazutakahirata
Copy link
Contributor

@aaupov @WangJee I'm seeing warnings like:

bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp:558:23: error: '(anonymous namespace)::RISCVMCPlusBuilder::createCmpJE' hides overloaded virtual function [-Werror,-Woverloaded-virtual]
  InstructionListType createCmpJE(MCPhysReg RegNo, MCPhysReg RegTmp,
                      ^

and:

bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp:780:8: error: '(anonymous namespace)::RISCVMCPlusBuilder::convertIndirectCallToLoad' hides overloaded virtual function [-Werror,-Woverloaded-virtual]
  void convertIndirectCallToLoad(MCInst &Inst, MCPhysReg Reg,
       ^

Would you mind taking a look? Thanks!

var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…lvm#133882)

This patch adds code generation for RISCV64 instrumentation.The work
    involved includes the following three points:

a) Implements support for instrumenting direct function call and jump
    on RISC-V which relies on , Atomic instructions
    (used to increment counters) are only available on RISC-V when the A
    extension is used.

b) Implements support for instrumenting direct function inderect call
    by implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
    need to accurately record the target address and IndCallID to ensure
    the correct recording of the indirect call counters.

c)Implemented the RISCV64 Bolt runtime library, implemented some system
call interfaces through embedded assembly. Get the difference between
runtime addrress of .text section andstatic address in section header
table, which in turn can be used to search for indirect call
description.

However, the community code currently has problems with relocation in
    some scenarios, but this has nothing to do with instrumentation. We
    may continue to submit patches to fix the related bugs.
@WangJee
Copy link
Contributor Author

WangJee commented Apr 17, 2025

@kazutakahirata Thanks for your reminder, I submitted a new PR to fix these warnings #136129

@dinyy
Copy link

dinyy commented Apr 17, 2025

Sorry, I didn't see this message yesterday and missed joining in. @maksfb I spent some time trying to use his code and found some problen. I fount that it couldn't pass the test in bolt/test/runtime/RISCV/basic-instrumentation.s.

1 __do_global_dtors_aux/1 8 1 __do_global_dtors_aux/1 a 0 1
1 __do_global_dtors_aux/1 16 1 __do_global_dtors_aux/1 18 0 1
1 __do_global_dtors_aux/1 20 1 __do_global_dtors_aux/1 22 0 1
1 __do_global_dtors_aux/1 20 0 [unknown] 0 0 1
0 [unknown] 0 1 _start 0 0 1
0 [unknown] 0 1 load_gp/1 0 0 1
0 [unknown] 0 1 __do_global_dtors_aux/1 0 0 1
0 [unknown] 0 1 frame_dummy/1 0 0 1
0 [unknown] 0 1 main 0 0 1
0 [unknown] 0 1 f 0 0 1

but the correct on is that:

1 _start 0 1 load_gp/1 0 0 1
1 _start 1c 1 __libc_start_main@PLT 0 0 1
1 __do_global_dtors_aux/1 8 1 __do_global_dtors_aux/1 a 0 1
1 __do_global_dtors_aux/1 16 1 __do_global_dtors_aux/1 18 0 1
1 __do_global_dtors_aux/1 20 1 __do_global_dtors_aux/1 22 0 1
1 __do_global_dtors_aux/1 22 1 deregister_tm_clones/1 0 0 1
1 frame_dummy/1 0 1 register_tm_clones/1 0 0 1
1 main 4 1 f 0 0 1
1 __do_global_dtors_aux/1 20 0 [unknown] 0 0 1
0 [unknown] 0 1 _start 0 0 1
0 [unknown] 0 1 __do_global_dtors_aux/1 0 0 1
0 [unknown] 0 1 frame_dummy/1 0 0 1
0 [unknown] 0 1 main 0 0 1

and here is my basic-instr.exe . Could you fix this problem? @WangJee

@kazutakahirata
Copy link
Contributor

@WangJee @svs-quic @aaupov I've uploaded #136172.

@kazutakahirata
Copy link
Contributor

@WangJee @svs-quic @aaupov I've uploaded #136172.

I didn't realize #136129 was already up. Thanks for the fix!

@dinyy
Copy link

dinyy commented Apr 18, 2025

Sorry, I didn't see this message yesterday and missed joining in. @maksfb I spent some time trying to use his code and found some problen. I fount that it couldn't pass the test in bolt/test/runtime/RISCV/basic-instrumentation.s.

1 __do_global_dtors_aux/1 8 1 __do_global_dtors_aux/1 a 0 1
1 __do_global_dtors_aux/1 16 1 __do_global_dtors_aux/1 18 0 1
1 __do_global_dtors_aux/1 20 1 __do_global_dtors_aux/1 22 0 1
1 __do_global_dtors_aux/1 20 0 [unknown] 0 0 1
0 [unknown] 0 1 _start 0 0 1
0 [unknown] 0 1 load_gp/1 0 0 1
0 [unknown] 0 1 __do_global_dtors_aux/1 0 0 1
0 [unknown] 0 1 frame_dummy/1 0 0 1
0 [unknown] 0 1 main 0 0 1
0 [unknown] 0 1 f 0 0 1

but the correct on is that:

1 _start 0 1 load_gp/1 0 0 1
1 _start 1c 1 __libc_start_main@PLT 0 0 1
1 __do_global_dtors_aux/1 8 1 __do_global_dtors_aux/1 a 0 1
1 __do_global_dtors_aux/1 16 1 __do_global_dtors_aux/1 18 0 1
1 __do_global_dtors_aux/1 20 1 __do_global_dtors_aux/1 22 0 1
1 __do_global_dtors_aux/1 22 1 deregister_tm_clones/1 0 0 1
1 frame_dummy/1 0 1 register_tm_clones/1 0 0 1
1 main 4 1 f 0 0 1
1 __do_global_dtors_aux/1 20 0 [unknown] 0 0 1
0 [unknown] 0 1 _start 0 0 1
0 [unknown] 0 1 __do_global_dtors_aux/1 0 0 1
0 [unknown] 0 1 frame_dummy/1 0 0 1
0 [unknown] 0 1 main 0 0 1

and here is my basic-instr.exe . Could you fix this problem? @WangJee

I have fix this problem , and submit PR here. @WangJee @maksfb

@dinyy
Copy link

dinyy commented Apr 18, 2025

As we’ve previously discussed, there may be overlap in our work on this feature. To avoid duplication of effort, I think we should disscuss about how to collaborate with each other. @WangJee

@WangJee
Copy link
Contributor Author

WangJee commented Apr 18, 2025

Sorry, I didn't see this message yesterday and missed joining in. @maksfb I spent some time trying to use his code and found some problen. I fount that it couldn't pass the test in bolt/test/runtime/RISCV/basic-instrumentation.s.

1 __do_global_dtors_aux/1 8 1 __do_global_dtors_aux/1 a 0 1
1 __do_global_dtors_aux/1 16 1 __do_global_dtors_aux/1 18 0 1
1 __do_global_dtors_aux/1 20 1 __do_global_dtors_aux/1 22 0 1
1 __do_global_dtors_aux/1 20 0 [unknown] 0 0 1
0 [unknown] 0 1 _start 0 0 1
0 [unknown] 0 1 load_gp/1 0 0 1
0 [unknown] 0 1 __do_global_dtors_aux/1 0 0 1
0 [unknown] 0 1 frame_dummy/1 0 0 1
0 [unknown] 0 1 main 0 0 1
0 [unknown] 0 1 f 0 0 1

but the correct on is that:

1 _start 0 1 load_gp/1 0 0 1
1 _start 1c 1 __libc_start_main@PLT 0 0 1
1 __do_global_dtors_aux/1 8 1 __do_global_dtors_aux/1 a 0 1
1 __do_global_dtors_aux/1 16 1 __do_global_dtors_aux/1 18 0 1
1 __do_global_dtors_aux/1 20 1 __do_global_dtors_aux/1 22 0 1
1 __do_global_dtors_aux/1 22 1 deregister_tm_clones/1 0 0 1
1 frame_dummy/1 0 1 register_tm_clones/1 0 0 1
1 main 4 1 f 0 0 1
1 __do_global_dtors_aux/1 20 0 [unknown] 0 0 1
0 [unknown] 0 1 _start 0 0 1
0 [unknown] 0 1 __do_global_dtors_aux/1 0 0 1
0 [unknown] 0 1 frame_dummy/1 0 0 1
0 [unknown] 0 1 main 0 0 1

and here is my basic-instr.exe . Could you fix this problem? @WangJee

I have fix this problem , and submit PR here. @WangJee @maksfb

@dinyy I'm sorry I just saw your message, thank you very much for the fix

@WangJee
Copy link
Contributor Author

WangJee commented Apr 18, 2025

As we’ve previously discussed, there may be overlap in our work on this feature. To avoid duplication of effort, I think we should disscuss about how to collaborate with each other. @WangJee

Sure, I was trying to use llvm-bolt to optimize MySQL, there are still some bugs,such as #135711. I have made temporary fixes locally. Maybe we can work together to solve these problems.

@dinyy
Copy link

dinyy commented Apr 18, 2025

As we’ve previously discussed, there may be overlap in our work on this feature. To avoid duplication of effort, I think we should disscuss about how to collaborate with each other. @WangJee

Sure, I was trying to use llvm-bolt to optimize MySQL, there are still some bugs,such as #135711. I have made temporary fixes locally. Maybe we can work together to solve these problems.

I was thinking maybe creating a to-do list could help us move faster by breaking this work into smaller, independent tasks. What do you think? I believe this might make our collaboration on this feature smoother. @WangJee

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants