
[AArch64][PAC] Combine signing with address materialization #130809

Open · wants to merge 7 commits into main

Conversation

atrosinenko
Contributor

@atrosinenko atrosinenko commented Mar 11, 2025

In the pauthtest ABI, it is common to store a pointer signed with address diversity into a heap-allocated object (most notably, storing a signed pointer to a vtable as part of new-object construction). This patch tries to prevent introducing a signing oracle by combining pointer materialization and its (re)signing into a single pseudo instruction which is not expanded until AsmPrinter, when possible. Furthermore, if the discriminator is computed as blend(addr, imm), these components are detected and passed via separate operands of that pseudo instruction, to prevent substitution of the immediate modifier.

One of the typical patterns is materializing an unsigned pointer with a MOVaddr pseudo and then signing it with a PAC[ID][AB] instruction, which can be moved far away from the MOVaddr by one of the passes in the machine pipeline. As the storage address is not a Constant value, one cannot simply emit a ptrauth constant in the frontend that would be selected into a MOVaddrPAC pseudo.

Another pattern is fetching a pointer to a vtable from a signed GOT entry using the LOADgotAUTH pseudo, authenticating and checking it, and then re-signing it after adding an offset.

This commit adds a custom instruction insertion hook for PAC[ID][AB] which detects the above patterns and replaces the instruction with either a MOVaddrPAC or a LOADgotPAC instruction.


Even though the description of ISD::PtrAuthGlobalAddress mentions non-constant address discriminators, this seems to be DAGISel-specific and currently unused, so I post-process PAC* instructions in AArch64TargetLowering::EmitInstrWithCustomInserter to avoid implementing this logic twice, in both DAGISel and GlobalISel.

@llvmbot
Member

llvmbot commented Mar 11, 2025

@llvm/pr-subscribers-lld-elf
@llvm/pr-subscribers-lld
@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: Anatoly Trosinenko (atrosinenko)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/130809.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+82)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+2)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.h (+33)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+2)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/ptrauth-constant-in-code.ll (+78)
  • (modified) llvm/test/CodeGen/AArch64/ptrauth-constant-in-code.ll (+73)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index b99b3df57b162..2b721ce47b71d 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -3339,6 +3339,78 @@ AArch64TargetLowering::EmitGetSMESaveSize(MachineInstr &MI,
             MI.getOperand(0).getReg())
         .addReg(AArch64::XZR);
   BB->remove_instr(&MI);
+
+  return BB;
+}
+
+MachineBasicBlock *
+AArch64TargetLowering::tryRewritingPAC(MachineInstr &MI,
+                                       MachineBasicBlock *BB) const {
+  const TargetInstrInfo *TII = Subtarget->getInstrInfo();
+  MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
+  const DebugLoc &DL = MI.getDebugLoc();
+
+  MachineInstr *AddrInstr = nullptr;
+  int64_t Offset = 0;
+  // Try to find the address-setting instruction, accumulating the offset
+  // along the way. If any unknown pattern is found, keep everything as-is.
+  MachineOperand *CurOp = &MI.getOperand(1);
+  while (CurOp) {
+    MachineOperand *Def = MRI.getOneDef(CurOp->getReg());
+    if (!Def)
+      return BB;
+    MachineInstr *DefMI = Def->getParent();
+    assert(DefMI != nullptr);
+
+    switch (DefMI->getOpcode()) {
+    case AArch64::COPY:
+      CurOp = &DefMI->getOperand(1);
+      break;
+    case AArch64::ADDXri:
+      if (DefMI->getOperand(3).getImm() != 0) // shifts are not handled
+        return BB;
+      CurOp = &DefMI->getOperand(1);
+      Offset += DefMI->getOperand(2).getImm();
+      break;
+    case AArch64::MOVaddr:
+    case AArch64::LOADgotAUTH:
+      AddrInstr = DefMI;
+      CurOp = nullptr;
+      break;
+    default:
+      return BB;
+    }
+  }
+
+  unsigned NewOpcode = AddrInstr->getOpcode() == AArch64::LOADgotAUTH
+                           ? AArch64::LOADgotPAC
+                           : AArch64::MOVaddrPAC;
+  MachineOperand &AddrOp = AddrInstr->getOperand(1);
+  unsigned TargetFlags = AddrOp.getTargetFlags() & ~AArch64II::MO_PAGE;
+  Offset += AddrOp.getOffset();
+
+  // MOVaddrPAC and LOADgotPAC pseudos are expanded so that they use X16/X17
+  // internally, thus their restrictions on the register class of $AddrDisc
+  // operand are stricter than those of real PAC* instructions.
+  // If the original instruction accepts a discriminator operand, make sure
+  // it is moved out of X16/X17.
+  Register DiscReg = AArch64::XZR;
+  if (!isPACWithZeroDisc(MI.getOpcode())) {
+    DiscReg = MRI.createVirtualRegister(&AArch64::GPR64noipRegClass);
+    BuildMI(*BB, MI, DL, TII->get(AArch64::COPY), DiscReg)
+        .addReg(MI.getOperand(2).getReg());
+  }
+
+  BuildMI(*BB, MI, DL, TII->get(NewOpcode))
+      .addGlobalAddress(AddrOp.getGlobal(), Offset, TargetFlags)
+      .addImm(getKeyForPACOpcode(MI.getOpcode()))
+      .addReg(DiscReg)
+      .addImm(0);
+
+  BuildMI(*BB, MI, DL, TII->get(AArch64::COPY), MI.getOperand(0).getReg())
+      .addReg(AArch64::X16);
+
+  MI.removeFromParent();
   return BB;
 }
 
@@ -3440,6 +3512,16 @@ MachineBasicBlock *AArch64TargetLowering::EmitInstrWithCustomInserter(
     return EmitZTInstr(MI, BB, AArch64::ZERO_T, /*Op0IsDef=*/true);
   case AArch64::MOVT_TIZ_PSEUDO:
     return EmitZTInstr(MI, BB, AArch64::MOVT_TIZ, /*Op0IsDef=*/true);
+
+  case AArch64::PACDA:
+  case AArch64::PACDB:
+  case AArch64::PACIA:
+  case AArch64::PACIB:
+  case AArch64::PACDZA:
+  case AArch64::PACDZB:
+  case AArch64::PACIZA:
+  case AArch64::PACIZB:
+    return tryRewritingPAC(MI, BB);
   }
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 1987c892ac080..1c5d8ca4f83d4 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -691,6 +691,8 @@ class AArch64TargetLowering : public TargetLowering {
                                                MachineBasicBlock *BB) const;
   MachineBasicBlock *EmitGetSMESaveSize(MachineInstr &MI,
                                         MachineBasicBlock *BB) const;
+  MachineBasicBlock *tryRewritingPAC(MachineInstr &MI,
+                                     MachineBasicBlock *BB) const;
 
   MachineBasicBlock *
   EmitInstrWithCustomInserter(MachineInstr &MI,
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.h b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
index 3eab98139fa7c..3bd83b0af50af 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
@@ -765,6 +765,39 @@ static inline unsigned getPACOpcodeForKey(AArch64PACKey::ID K, bool Zero) {
   llvm_unreachable("Unhandled AArch64PACKey::ID enum");
 }
 
+static inline AArch64PACKey::ID getKeyForPACOpcode(unsigned Opcode) {
+  switch (Opcode) {
+  case AArch64::PACDA:
+  case AArch64::PACDZA:
+    return AArch64PACKey::DA;
+  case AArch64::PACDB:
+  case AArch64::PACDZB:
+    return AArch64PACKey::DB;
+  case AArch64::PACIA:
+  case AArch64::PACIZA:
+    return AArch64PACKey::IA;
+  case AArch64::PACIB:
+  case AArch64::PACIZB:
+    return AArch64PACKey::IB;
+  }
+  llvm_unreachable("Unhandled PAC opcode");
+}
+
+static inline bool isPACWithZeroDisc(unsigned Opcode) {
+  switch (Opcode) {
+  case AArch64::PACDA:
+  case AArch64::PACDB:
+  case AArch64::PACIA:
+  case AArch64::PACIB:
+    return false;
+  case AArch64::PACDZA:
+  case AArch64::PACDZB:
+  case AArch64::PACIZA:
+  case AArch64::PACIZB:
+    return true;
+  }
+  llvm_unreachable("Unhandled PAC opcode");
+}
 // struct TSFlags {
 #define TSFLAG_ELEMENT_SIZE_TYPE(X)      (X)        // 3-bits
 #define TSFLAG_DESTRUCTIVE_INST_TYPE(X) ((X) << 3)  // 4-bits
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index ee109277dfbba..9f6e91743196c 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -1821,7 +1821,9 @@ let Predicates = [HasPAuth] in {
     def DZB  : SignAuthZero<prefix_z,  0b11, !strconcat(asm, "dzb"), op>;
   }
 
+  let usesCustomInserter = true in
   defm PAC : SignAuth<0b000, 0b010, "pac", int_ptrauth_sign>;
+
   defm AUT : SignAuth<0b001, 0b011, "aut", null_frag>;
 
   def XPACI : ClearAuth<0, "xpaci">;
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/ptrauth-constant-in-code.ll b/llvm/test/CodeGen/AArch64/GlobalISel/ptrauth-constant-in-code.ll
index 12a3448111fcb..4ef5cc7981924 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/ptrauth-constant-in-code.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/ptrauth-constant-in-code.ll
@@ -78,6 +78,84 @@ define ptr @foo() {
   ret ptr ptrauth (ptr @g, i32 0)
 }
 
+;--- finalize-isel.ll
+
+; RUN: llc < finalize-isel.ll -mtriple aarch64-elf -mattr=+pauth -global-isel=1 \
+; RUN:   -verify-machineinstrs -global-isel-abort=1 -stop-after=finalize-isel | \
+; RUN:   FileCheck --check-prefixes=ISEL,ISEL-ELF %s
+; RUN: llc < finalize-isel.ll -mtriple arm64-apple-ios -mattr=+pauth -global-isel=1 \
+; RUN:   -verify-machineinstrs -global-isel-abort=1 -stop-after=finalize-isel | \
+; RUN:   FileCheck --check-prefixes=ISEL %s
+
+@const_table_local = dso_local constant [3 x ptr] [ptr null, ptr null, ptr null]
+@const_table_got = constant [3 x ptr] [ptr null, ptr null, ptr null]
+
+define void @store_signed_const_local(ptr %dest) {
+; ISEL-LABEL: name: store_signed_const_local
+; ISEL:       body:
+; ISEL:         %0:gpr64common = COPY $x0
+; ISEL-NEXT:    %10:gpr64common = MOVaddr target-flags(aarch64-page) @const_table_local + 8, target-flags(aarch64-pageoff, aarch64-nc) @const_table_local + 8
+; ISEL-NEXT:    %2:gpr64common = PAUTH_BLEND %0, 1234
+; ISEL-NEXT:    %15:gpr64noip = COPY %2
+; ISEL-NEXT:    MOVaddrPAC @const_table_local + 8, 2, %15, 0, implicit-def $x16, implicit-def $x17
+; ISEL-NEXT:    %4:gpr64 = COPY $x16
+; ISEL-NEXT:    %14:gpr64 = COPY %4
+; ISEL-NEXT:    STRXui %14, %0, 0 :: (store (p0) into %ir.dest)
+; ISEL-NEXT:    RET_ReallyLR
+  %dest.i = ptrtoint ptr %dest to i64
+  %discr = call i64 @llvm.ptrauth.blend(i64 %dest.i, i64 1234)
+  %signed.i = call i64 @llvm.ptrauth.sign(i64 ptrtoint (ptr getelementptr ([2 x ptr], ptr @const_table_local, i32 0, i32 1) to i64), i32 2, i64 %discr)
+  %signed.ptr = inttoptr i64 %signed.i to ptr
+  store ptr %signed.ptr, ptr %dest
+  ret void
+}
+
+define void @store_signed_const_got(ptr %dest) {
+; ISEL-ELF-LABEL: name: store_signed_const_got
+; ISEL-ELF:       body:
+; ISEL-ELF:         %0:gpr64common = COPY $x0
+; ISEL-ELF-NEXT:    %7:gpr64common = LOADgotAUTH target-flags(aarch64-got) @const_table_got
+; ISEL-ELF-NEXT:    %6:gpr64common = ADDXri %7, 8, 0
+; ISEL-ELF-NEXT:    %2:gpr64common = PAUTH_BLEND %0, 1234
+; ISEL-ELF-NEXT:    %12:gpr64noip = COPY %2
+; ISEL-ELF-NEXT:    LOADgotPAC target-flags(aarch64-got) @const_table_got + 8, 2, %12, 0, implicit-def $x16, implicit-def $x17, implicit-def $nzcv
+; ISEL-ELF-NEXT:    %4:gpr64 = COPY $x16
+; ISEL-ELF-NEXT:    %10:gpr64 = COPY %4
+; ISEL-ELF-NEXT:    STRXui %10, %0, 0 :: (store (p0) into %ir.dest)
+; ISEL-ELF-NEXT:    RET_ReallyLR
+  %dest.i = ptrtoint ptr %dest to i64
+  %discr = call i64 @llvm.ptrauth.blend(i64 %dest.i, i64 1234)
+  %signed.i = call i64 @llvm.ptrauth.sign(i64 ptrtoint (ptr getelementptr ([2 x ptr], ptr @const_table_got, i32 0, i32 1) to i64), i32 2, i64 %discr)
+  %signed.ptr = inttoptr i64 %signed.i to ptr
+  store ptr %signed.ptr, ptr %dest
+  ret void
+}
+
+define void @store_signed_arg(ptr %dest, ptr %p) {
+; ISEL-LABEL: name: store_signed_arg
+; ISEL:       body:
+; ISEL:         %0:gpr64common = COPY $x0
+; ISEL-NEXT:    %1:gpr64common = COPY $x1
+; ISEL-NEXT:    %3:gpr64common = PAUTH_BLEND %0, 1234
+; ISEL-NEXT:    %6:gpr64common = ADDXri %1, 8, 0
+; Check that no implicit defs are added to PACDA instruction.
+; ISEL-NEXT:    %8:gpr64 = PACDA %6, %3{{$}}
+; ISEL-NEXT:    %10:gpr64 = COPY %8
+; ISEL-NEXT:    STRXui %10, %0, 0 :: (store (p0) into %ir.dest)
+; ISEL-NEXT:    RET_ReallyLR
+  %dest.i = ptrtoint ptr %dest to i64
+  %discr = call i64 @llvm.ptrauth.blend(i64 %dest.i, i64 1234)
+  %p.offset = getelementptr [2 x ptr], ptr %p, i32 0, i32 1
+  %p.offset.i = ptrtoint ptr %p.offset to i64
+  %signed.i = call i64 @llvm.ptrauth.sign(i64 %p.offset.i, i32 2, i64 %discr)
+  %signed.ptr = inttoptr i64 %signed.i to ptr
+  store ptr %signed.ptr, ptr %dest
+  ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 8, !"ptrauth-elf-got", i32 1}
+
 ;--- ok.ll
 
 ; RUN: llc < ok.ll -mtriple aarch64-elf -mattr=+pauth -global-isel=1 \
diff --git a/llvm/test/CodeGen/AArch64/ptrauth-constant-in-code.ll b/llvm/test/CodeGen/AArch64/ptrauth-constant-in-code.ll
index 76339a7cc5791..e59f7925b7e02 100644
--- a/llvm/test/CodeGen/AArch64/ptrauth-constant-in-code.ll
+++ b/llvm/test/CodeGen/AArch64/ptrauth-constant-in-code.ll
@@ -69,6 +69,79 @@ define ptr @foo() {
   ret ptr ptrauth (ptr @g, i32 0)
 }
 
+;--- finalize-isel.ll
+
+; RUN: llc < finalize-isel.ll -mtriple aarch64-elf -mattr=+pauth -global-isel=0 \
+; RUN:   -verify-machineinstrs -stop-after=finalize-isel | FileCheck --check-prefixes=ISEL,ISEL-ELF %s
+; RUN: llc < finalize-isel.ll -mtriple arm64-apple-ios -mattr=+pauth -global-isel=0 \
+; RUN:   -verify-machineinstrs -stop-after=finalize-isel | FileCheck --check-prefixes=ISEL %s
+
+@const_table_local = dso_local constant [3 x ptr] [ptr null, ptr null, ptr null]
+@const_table_got = constant [3 x ptr] [ptr null, ptr null, ptr null]
+
+define void @store_signed_const_local(ptr %dest) {
+; ISEL-LABEL: name: store_signed_const_local
+; ISEL:       body:
+; ISEL:         %0:gpr64common = COPY $x0
+; ISEL-NEXT:    %1:gpr64common = PAUTH_BLEND %0, 1234
+; ISEL-NEXT:    %2:gpr64common = MOVaddr target-flags(aarch64-page) @const_table_local + 8, target-flags(aarch64-pageoff, aarch64-nc) @const_table_local + 8
+; ISEL-NEXT:    %4:gpr64noip = COPY %1
+; ISEL-NEXT:    MOVaddrPAC @const_table_local + 8, 2, %4, 0, implicit-def $x16, implicit-def $x17
+; ISEL-NEXT:    %3:gpr64 = COPY $x16
+; ISEL-NEXT:    STRXui killed %3, %0, 0 :: (store (s64) into %ir.dest)
+; ISEL-NEXT:    RET_ReallyLR
+  %dest.i = ptrtoint ptr %dest to i64
+  %discr = call i64 @llvm.ptrauth.blend(i64 %dest.i, i64 1234)
+  %signed.i = call i64 @llvm.ptrauth.sign(i64 ptrtoint (ptr getelementptr ([2 x ptr], ptr @const_table_local, i32 0, i32 1) to i64), i32 2, i64 %discr)
+  %signed.ptr = inttoptr i64 %signed.i to ptr
+  store ptr %signed.ptr, ptr %dest
+  ret void
+}
+
+define void @store_signed_const_got(ptr %dest) {
+; ISEL-ELF-LABEL: name: store_signed_const_got
+; ISEL-ELF:       body:
+; ISEL-ELF:         %0:gpr64common = COPY $x0
+; ISEL-ELF-NEXT:    %1:gpr64common = PAUTH_BLEND %0, 1234
+; ISEL-ELF-NEXT:    %2:gpr64common = LOADgotAUTH target-flags(aarch64-got) @const_table_got, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
+; ISEL-ELF-NEXT:    %3:gpr64common = ADDXri killed %2, 8, 0
+; ISEL-ELF-NEXT:    %5:gpr64noip = COPY %1
+; ISEL-ELF-NEXT:    LOADgotPAC target-flags(aarch64-got) @const_table_got + 8, 2, %5, 0, implicit-def $x16, implicit-def $x17, implicit-def $nzcv
+; ISEL-ELF-NEXT:    %4:gpr64 = COPY $x16
+; ISEL-ELF-NEXT:    STRXui killed %4, %0, 0 :: (store (s64) into %ir.dest)
+; ISEL-ELF-NEXT:    RET_ReallyLR
+  %dest.i = ptrtoint ptr %dest to i64
+  %discr = call i64 @llvm.ptrauth.blend(i64 %dest.i, i64 1234)
+  %signed.i = call i64 @llvm.ptrauth.sign(i64 ptrtoint (ptr getelementptr ([2 x ptr], ptr @const_table_got, i32 0, i32 1) to i64), i32 2, i64 %discr)
+  %signed.ptr = inttoptr i64 %signed.i to ptr
+  store ptr %signed.ptr, ptr %dest
+  ret void
+}
+
+define void @store_signed_arg(ptr %dest, ptr %p) {
+; ISEL-LABEL: name: store_signed_arg
+; ISEL:       body:
+; ISEL:         %1:gpr64common = COPY $x1
+; ISEL-NEXT:    %0:gpr64common = COPY $x0
+; ISEL-NEXT:    %2:gpr64common = PAUTH_BLEND %0, 1234
+; ISEL-NEXT:    %3:gpr64common = ADDXri %1, 8, 0
+; Check that no implicit defs are added to PACDA instruction.
+; ISEL-NEXT:    %4:gpr64 = PACDA %3, killed %2{{$}}
+; ISEL-NEXT:    STRXui killed %4, %0, 0 :: (store (s64) into %ir.dest)
+; ISEL-NEXT:    RET_ReallyLR
+  %dest.i = ptrtoint ptr %dest to i64
+  %discr = call i64 @llvm.ptrauth.blend(i64 %dest.i, i64 1234)
+  %p.offset = getelementptr [2 x ptr], ptr %p, i32 0, i32 1
+  %p.offset.i = ptrtoint ptr %p.offset to i64
+  %signed.i = call i64 @llvm.ptrauth.sign(i64 %p.offset.i, i32 2, i64 %discr)
+  %signed.ptr = inttoptr i64 %signed.i to ptr
+  store ptr %signed.ptr, ptr %dest
+  ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 8, !"ptrauth-elf-got", i32 1}
+
 ;--- ok.ll
 
 ; RUN: llc < ok.ll -mtriple aarch64-elf -mattr=+pauth -global-isel=0 \

@atrosinenko
Contributor Author

Here is an example of a signing oracle observed in llvm-test-suite (for the MOVaddr -> MOVaddrPAC case). When I compile the MultiSource/Benchmarks/Prolangs-C++/primes/primes.cpp source with the following command:

/path/to/bin/clang++ -target aarch64-linux-pauthtest -march=armv8.3-a --sysroot=... -stdlib=libc++ -O2 ../MultiSource/Benchmarks/Prolangs-C++/primes/primes.cpp -S -o- -mllvm -print-after=finalize-isel

I get the following (with CFI directives and most comments omitted for brevity):

`main` function (without this patch)
main:
// %bb.0:                               // %entry
	pacibsp
	sub	sp, sp, #96
	stp	x29, x30, [sp, #48]
	stp	x22, x21, [sp, #64]
	stp	x20, x19, [sp, #80]
	add	x29, sp, #48
	add	x8, sp, #24
	adrp	x9, _ZTV7counter+16
	add	x9, x9, :lo12:_ZTV7counter+16
	mov	x10, x8
	movk	x10, #53906, lsl #48
	add	x11, sp, #8
	mov	w22, #34465
	pacda	x9, x10
	adrp	x10, _ZTV5sieve+16
	add	x10, x10, :lo12:_ZTV5sieve+16
	movk	x11, #53906, lsl #48
	adrp	x21, _ZTV6filter+16
	add	x21, x21, :lo12:_ZTV6filter+16
	pacda	x10, x11
	mov	w11, #2
	adrp	x19, .L.str
	add	x19, x19, :lo12:.L.str
	movk	w22, #1, lsl #16
	str	w11, [sp, #40]
	stp	x9, xzr, [sp, #24]
	stp	x10, x8, [sp, #8]
.LBB0_1:                                // %do.body
	ldr	x0, [sp, #16]
	ldr	x16, [x0]
	mov	x17, x0
	movk	x17, #53906, lsl #48
	autda	x16, x17
	mov	x17, x16
	xpacd	x17
	cmp	x16, x17
	b.eq	.Lauth_success_0
	brk	#0xc472
.Lauth_success_0:
	ldr	x8, [x16]
	movk	x16, #35396, lsl #48
	blraa	x8, x16
	mov	w20, w0
	mov	w0, #24
	bl	_Znwm                           // x21 can be tampered with by an attacker during the call
	mov	x8, x21                         // x21 contains _ZTV6filter+16 under normal circumstances
	mov	x9, x0
	movk	x9, #53906, lsl #48
	str	w20, [x0, #16]
	pacda	x8, x9                          // <-- signing oracle
	ldr	x9, [sp, #16]
	str	x0, [sp, #16]
	mov	w1, w20
	stp	x8, x9, [x0]
	mov	x0, x19
	bl	printf
	cmp	w20, w22
	b.lt	.LBB0_1
// %bb.2:                               // %do.end
	mov	w0, #10
	bl	putchar
	mov	w0, wzr
	ldp	x20, x19, [sp, #80]
	ldp	x22, x21, [sp, #64]
	ldp	x29, x30, [sp, #48]
	add	sp, sp, #96
	retab

Under normal circumstances, the value of the _ZTV6filter+16 expression is stored to the callee-saved x21 register at the beginning of the function and then signed with address diversity multiple times. The problem is that a function call is performed between computing the address and signing it, so x21 can be spilled to memory and tampered with by an attacker.

At the Machine IR level, after the finalize-isel pass, this looks like:

  %20:gpr64common = MOVaddr target-flags(aarch64-page) @_ZTV6filter + 16, target-flags(aarch64-pageoff, aarch64-nc) @_ZTV6filter + 16
  %21:gpr64 = PACDA %20:gpr64common(tied-def 0), killed %19:gpr64common

With this patch applied, it gets rewritten like this (the MOVaddr is expected to be dead-code-eliminated later, if unused):

  %20:gpr64common = MOVaddr target-flags(aarch64-page) @_ZTV6filter + 16, target-flags(aarch64-pageoff, aarch64-nc) @_ZTV6filter + 16
  %31:gpr64noip = COPY %19:gpr64common
  MOVaddrPAC @_ZTV6filter + 16, 2, %31:gpr64noip, 0, implicit-def $x16, implicit-def $x17
  %21:gpr64 = COPY $x16

This prevents the signing oracle in the main function:

`main` function (with this patch)
main:
// %bb.0:                               // %entry
	pacibsp
	sub	sp, sp, #96
	stp	x29, x30, [sp, #48]
	str	x21, [sp, #64]
	stp	x20, x19, [sp, #80]
	add	x29, sp, #48
	add	x8, sp, #24
	mov	w9, #2
	mov	w21, #34465
	mov	x10, x8
	movk	x10, #53906, lsl #48
	adrp	x19, .L.str
	add	x19, x19, :lo12:.L.str
	adrp	x16, _ZTV7counter
	add	x16, x16, :lo12:_ZTV7counter
	add	x16, x16, #16
	pacda	x16, x10
	str	w9, [sp, #40]
	add	x9, sp, #8
	movk	x9, #53906, lsl #48
	stp	x16, xzr, [sp, #24]
	movk	w21, #1, lsl #16
	adrp	x16, _ZTV5sieve
	add	x16, x16, :lo12:_ZTV5sieve
	add	x16, x16, #16
	pacda	x16, x9
	stp	x16, x8, [sp, #8]
.LBB0_1:                                // %do.body
	ldr	x0, [sp, #16]
	ldr	x16, [x0]
	mov	x17, x0
	movk	x17, #53906, lsl #48
	autda	x16, x17
	mov	x17, x16
	xpacd	x17
	cmp	x16, x17
	b.eq	.Lauth_success_0
	brk	#0xc472
.Lauth_success_0:
	ldr	x8, [x16]
	movk	x16, #35396, lsl #48
	blraa	x8, x16
	mov	w20, w0
	mov	w0, #24
	bl	_Znwm
	ldr	x8, [sp, #16]
	mov	x9, x0
	movk	x9, #53906, lsl #48
	mov	w1, w20
	adrp	x16, _ZTV6filter
	add	x16, x16, :lo12:_ZTV6filter
	add	x16, x16, #16
	pacda	x16, x9                     // <-- the address to sign cannot be spilled to memory
	str	w20, [x0, #16]
	stp	x16, x8, [x0]
	str	x0, [sp, #16]
	mov	x0, x19
	bl	printf
	cmp	w20, w21
	b.lt	.LBB0_1
// %bb.2:                               // %do.end
	mov	w0, #10
	bl	putchar
	mov	w0, wzr
	ldp	x20, x19, [sp, #80]
	ldr	x21, [sp, #64]
	ldp	x29, x30, [sp, #48]
	add	sp, sp, #96
	retab

Contributor

@ojhunt ojhunt left a comment


Semantically, this looks reasonable to me, but I'm not really a backend/ISel expert, so I think @ahmedbougacha needs to look at this.

One thing I'd like to see is a test case that tries to induce an oracle using the intrinsics in C, something akin to

   void* f1(void *p) {
      void* authed = __builtin_ptrauth_auth(p, ....);
      __asm__(""); // may be a sufficient barrier?
      return __builtin_ptrauth_sign_unauthenticated(authed, ...);
   }
   
   void* f2(void *p) {
      void* authed = __builtin_ptrauth_auth(p, ....);
      __asm__(""::"r"(authed)); // theoretically does not spill, so no oracle
      return __builtin_ptrauth_sign_unauthenticated(authed, ...);
   }
   
   void* f3(void *p) {
      void* authed = __builtin_ptrauth_auth(p, ....);
      __asm__(""::"m"(authed)); // theoretically does spill, so would be an oracle; what should happen?
      return __builtin_ptrauth_sign_unauthenticated(authed, ...);
   }

In an ideal world (out of scope for this change), I think we'd like to error out whenever we see a read from memory feeding an unauthenticated sign, or a spill and subsequent reload of an authenticated value. Doing this robustly probably requires recording that a value was authenticated; then, in Clang, producing a warning or error if a developer does something that forces a post-auth store to memory; for implicitly authenticated values in LLVM, increasing the keep-in-register priority of those values; and finally, if they still end up being spilled, giving them a custom auth schema and signing them on spill.

@atrosinenko
Contributor Author

One thing I'd like to see is a test case that tries to induce an oracle using the intrinsics in C

@ojhunt Do you suggest adding a real C source file to clang/test, or just adding the LLVM IR emitted for C functions like these to the LLVM tests introduced by this PR (checking both the MIR and the final asm listing; among other things, this would verify that dead instructions left by EmitInstrWithCustomInserter are eliminated later in the pipeline)?

In an ideal world - out of scope for this change - I think we'd like to error out if we ever see a read from memory into an unauthenticated sign, or a spill and subsequent read of an authenticated value.

IIUC there are a few cases where the former is valid (such as in a dynamic loader), so some opt-out machinery would be needed (maybe a function attribute).

@ojhunt
Contributor

ojhunt commented Mar 26, 2025

One thing I'd like to see is a test case that tries to induce an oracle using the intrinsics in C

@ojhunt Do you suggest adding a real C source to clang/test or just adding LLVM IR emitted for C functions like these to LLVM tests introduced by this PR (and checking both MIR and final asm listing - among other things, this would check that dead instructions left by EmitInstrWithCustomInserter are eliminated down the pipeline)?

Yes, I was thinking of a clang/test test, but maybe IR with an intervening asm node would be sufficient? I don't know the IR well enough to know how hard it would be to write tests similar to my pseudo-C above :D

In an ideal world - out of scope for this change - I think we'd like to error out if we ever see a read from memory into an unauthenticated sign, or a spill and subsequent read of an authenticated value.

IIUC there are a few cases when the former is valid - such as in a dynamic loader, so an opt-out machinery will be needed (such as maybe some function attribute).

Yeah, if we were to do something like this for generic loads (not just restores from spill slots), we'd probably want a new "trust me, I know what I'm doing" signing intrinsic that is explicitly permitted to bypass such a guard. Note I'm not suggesting anything like this for this PR; it's very much a future-work idea :D

@atrosinenko
Contributor Author

atrosinenko commented Mar 27, 2025

@ojhunt Updated the test cases; it turned out there was actually an issue with LOADgotAUTH not being eliminated.

Regarding your test cases, I just realized that they should probably be handled by #130807, not by this patch. As for converting them to LLVM IR, f1 and f2 are translated by Clang in a rather straightforward manner, and f3 is a bit more tricky (this output could be cleaned up further):

LLVM IR for f1, f2, f3
define dso_local ptr @f1(ptr noundef %p) local_unnamed_addr #0 {
entry:
  %0 = ptrtoint ptr %p to i64
  %1 = tail call i64 @llvm.ptrauth.auth(i64 %0, i32 2, i64 1234)
  tail call void asm sideeffect "", ""() #4, !srcloc !9
  %2 = tail call i64 @llvm.ptrauth.sign(i64 %1, i32 3, i64 42)
  %3 = inttoptr i64 %2 to ptr
  ret ptr %3
}

define dso_local ptr @f2(ptr noundef %p) local_unnamed_addr #0 {
entry:
  %0 = ptrtoint ptr %p to i64
  %1 = tail call i64 @llvm.ptrauth.auth(i64 %0, i32 2, i64 1234)
  %2 = inttoptr i64 %1 to ptr
  tail call void asm sideeffect "", "r"(ptr %2) #4, !srcloc !10
  %3 = tail call i64 @llvm.ptrauth.sign(i64 %1, i32 3, i64 42)
  %4 = inttoptr i64 %3 to ptr
  ret ptr %4
}

define dso_local ptr @f3(ptr noundef %p) local_unnamed_addr #0 {
entry:
  %authed = alloca ptr, align 8
  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %authed) #4
  %0 = ptrtoint ptr %p to i64
  %1 = tail call i64 @llvm.ptrauth.auth(i64 %0, i32 2, i64 1234)
  %2 = inttoptr i64 %1 to ptr
  store ptr %2, ptr %authed, align 8, !tbaa !11
  call void asm sideeffect "", "*m"(ptr nonnull elementtype(ptr) %authed) #4, !srcloc !15
  %3 = load ptr, ptr %authed, align 8, !tbaa !11
  %4 = ptrtoint ptr %3 to i64
  %5 = call i64 @llvm.ptrauth.sign(i64 %4, i32 3, i64 42)
  %6 = inttoptr i64 %5 to ptr
  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %authed) #4
  ret ptr %6
}

As this PR does not handle folding a "standalone" auth into a sign, the following code is emitted for f3:

f3:
        sub     sp, sp, #16
        mov     x16, x0
        mov     x17, #1234
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_2
        brk     #0xc472
.Lauth_success_2:
        add     x8, sp, #8
        str     x16, [sp, #8]
        //APP
        //NO_APP
        ldr     x0, [sp, #8]
        mov     w8, #42
        pacdb   x0, x8
        add     sp, sp, #16
        ret

On the branch of #130807 the result is similar, which is quite surprising; moreover, x0 is not reloaded after the inline asm, meaning there should be no reason not to merge the auth and the sign.

On the other hand, if I replace the inline asm with a function call like this:

  void callee(void *);
  void* f3(void *p) {
     void* authed = __builtin_ptrauth_auth(p, 2, 1234);
     callee(authed);
     return __builtin_ptrauth_sign_unauthenticated(authed, 3, 42);
  }

... both patches behave as expected. This patch calls the function and then signs a potentially overwritten register:

// ...
        bl      callee
        mov     w8, #42
        pacdb   x19, x8
// ...

and with PR #130807, authentication is redone after the function call:

f3:
        pacibsp
        stp     x29, x30, [sp, #-32]!
        str     x19, [sp, #16]
        mov     x29, sp
        mov     x16, x0
        mov     x19, x0
        mov     x17, #1234
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_0
        brk     #0xc472
.Lauth_success_0:
        mov     x0, x16
        bl      callee
        mov     x16, x19
        mov     x17, #1234
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_1
        brk     #0xc472
.Lauth_success_1:
        mov     x17, #42
        pacdb   x16, x17
        mov     x0, x16
        ldr     x19, [sp, #16]
        ldp     x29, x30, [sp], #32
        retab

@ahmedbougacha
Member

ahmedbougacha commented Mar 31, 2025

Regarding this specific problem, I realized we have a downstream change that I must have lost in the upstreaming branch (my bad), that tries to address a similar problem in a way we can reason about more easily. Specifically: if, when lowering IR before ISel, we see an llvm.ptrauth.sign intrinsic with a constant pointer operand, we should lower that to the equivalent of a ptrauth constant. I tried to extract the most basic patch I could in:
[AArch64] Lower ptrauth.sign of constant as PtrAuthGlobalAddress.
Please take a look and let me know if that addresses the original problem! I haven't looked too closely at this PR yet; I'll do that next.

In general, we've tried to only do this sort of opportunistic late hardening after we tried to do all we could to provide guarantees in the frontend and in IR. One good reason is that it makes these weak spots trivial to find and to generalize from, whereas doing something opportunistic will often hide problems in the common case and makes perfectly-accurate analysis tooling load-bearing. That said, the time may have come to do this as a catch-all, since we realistically won't be able to do much about the remaining representational problems (like adding an offset to a ptrauth-qualified pointer). So it's a good idea to consider something like this patch.

@atrosinenko
Contributor Author

@ahmedbougacha I performed some testing of #133788 compared to this PR; it seems to eliminate mostly the same set of problems. Though, if I understand it correctly, it does not touch GlobalISel at all (I don't know how widely GlobalISel is used, though).

The code base built by the various LLVM versions is llvm-test-suite; it was built with the same set of compilation options (optimization level -O2). The resulting binaries were scanned with a prototype of a BOLT-based analyzer implemented by @kbeyls (see #122304 for the upstreamed pac-ret scanner), extended to detect other kinds of gadgets (the logic behind the detection of signing oracles is mostly the same as in #134146).

A number of signing oracles were detected that can be resolved by this PR together with an InstCombine patch sent as #130807, so the LLVM versions being tested are:

  1. b739a3c (the parent commit of [AArch64] Lower ptrauth.sign of constant as PtrAuthGlobalAddress. #133788)
  2. 40e0181 (the [AArch64] Lower ptrauth.sign of constant as PtrAuthGlobalAddress. #133788 itself)
  3. this PR rebased to the same base commit (1)
  4. the patch for inst-combine cherry-picked onto (2)
  5. the patch for inst-combine cherry-picked onto (3)

For the base commit (1), 1481 signing oracles are reported (note that the same utility code is linked into many different executables, so a lot of duplicated reports are possible).

The LLVM version from #133788 (2) emits only 355 signing oracles, and the version from this PR (3) emits only 349.

With #130807 cherry-picked on top of both, 6 and 0 signing oracles are reported, respectively.

I will provide more information on these 6 remaining oracles after a bit of debugging.

@atrosinenko
Contributor Author

@ahmedbougacha Regarding the remaining signing oracles in #133788, there are six:

MultiSource/Benchmarks/Prolangs-C++/employ: Materialization of _ZTV11PieceWorker+16 and _ZTV20HourlyWorkerOvertime+16 in function main:

Command line
/path/to/bin/clang++ -DNDEBUG -target aarch64-linux-pauthtest \
    -march=armv8.3-a --sysroot=... -rtlib=compiler-rt -stdlib=libc++ \
    -mllvm --aarch64-authenticated-lr-check-method=xpac-hint \
    -O2 -c /path/to/llvm-test-suite/MultiSource/Benchmarks/Prolangs-C++/employ/driver.cpp -o /tmp/employ.o

MultiSource/Benchmarks/Prolangs-C++/ocean: Materialization of _ZTV8Obstacle+16 in function _ZN5Ocean12addObstaclesEv and of _ZTV8Predator+16 in function _ZN5Ocean12addPredatorsEv:

Command line
/path/to/bin/clang++ -DNDEBUG -target aarch64-linux-pauthtest \
    -march=armv8.3-a --sysroot=... -rtlib=compiler-rt -stdlib=libc++ \
    -mllvm --aarch64-authenticated-lr-check-method=xpac-hint \
    -O2 /path/to/llvm-test-suite/MultiSource/Benchmarks/Prolangs-C++/ocean/ocean.cpp -c -o /tmp/ocean.o

MultiSource/Benchmarks/Prolangs-C++/simul: Materialization of _ZTV10pop_around+16 in function main:

Command line
/path/to/bin/clang++ -DNDEBUG -target aarch64-linux-pauthtest \
    -march=armv8.3-a --sysroot=... -rtlib=compiler-rt -stdlib=libc++ \
    -mllvm --aarch64-authenticated-lr-check-method=xpac-hint \
    -O2 -c /path/to/llvm-test-suite/MultiSource/Benchmarks/Prolangs-C++/simul/simulate.cpp -o /tmp/simul.o

MultiSource/Benchmarks/Prolangs-C++/family: Materialization of _ZTV5Child+16 in function main:

Command line
/path/to/bin/clang++ -DNDEBUG -target aarch64-linux-pauthtest \
    -march=armv8.3-a --sysroot=... -rtlib=compiler-rt -stdlib=libc++ \
    -mllvm --aarch64-authenticated-lr-check-method=xpac-hint \
    -O2 -c /path/to/llvm-test-suite/MultiSource/Benchmarks/Prolangs-C++/family/family.cpp -o /tmp/family.o

I added some debug printing, and it turned out that while several of the affected symbols occur multiple times in calls to the @llvm.ptrauth.sign intrinsic, at least one such call was always ignored due to the AddrDiscInst->getParent() != I.getParent() condition.


github-actions bot commented May 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@atrosinenko atrosinenko force-pushed the pauth-isel-merge-pac branch from a6ac921 to 7af97c8 Compare May 21, 2025 19:46
@asl asl added this to the LLVM 21.x Release milestone Jun 2, 2025
@atrosinenko atrosinenko force-pushed the pauth-isel-merge-pac branch from f0714a6 to 6631434 Compare June 3, 2025 16:29
@atrosinenko
Contributor Author

Ping.

I have updated this PR to account for the possibility of immediate-modifier substitution:

  movk    x10, #1234, lsl #48
  bl      callee
  adrp    x17, :got:sym
  add     x17, x17, :got_lo12:sym
  pacda   x17, x10

In the above example, the discriminator is computed as blend(addr, 1234). While it is impossible to prevent substitution of addr in general, it is definitely possible to at least make sure that 1234 is always blended into whatever is stored in x10 after callee returns (in the code below, x10 is used for clarity; with this PR applied, x16 would be used instead):

  bl      callee
  adrp    x17, :got:sym
  add     x17, x17, :got_lo12:sym
  movk    x10, #1234, lsl #48
  pacda   x17, x10

In their current state, both this PR and #133788 protect both the pointer to be signed and the immediate modifier from tampering, but #133788 still has to be updated to handle GlobalISel in addition to DAGISel.

Hopefully, having unified post-processing at the MachineInstr level instead of two separate implementations in DAGISel and GlobalISel could improve maintainability, though this may be my bias towards MachineFunction passes vs. instruction selectors. Furthermore, it feels like with custom inserters it should be possible to extend #132857 to completely get rid of the DAGISel- and GlobalISel-specific selectors for AUT and PAC and use the selectors tablegen-erated from declarative patterns.

Another advantage of custom inserters seems to be better handling of cross-basic-block analysis. If I understand it right, at the finalize-isel stage virtual registers are in SSA form, so it should be safe to ignore basic blocks while following the chains of copy and increment/decrement instructions, as long as only virtual registers are traversed and no PHI instructions occur.

@var = global i32 0
@discvar = global i64 0

define i64 @foo(i1 %cond) {
entry:
  %addrdisc = load i64, ptr @discvar
  %disc = call i64 @llvm.ptrauth.blend(i64 %addrdisc, i64 42)
  %var.ptr.i = ptrtoint ptr @var to i64
  br i1 %cond, label %next, label %exit

next:
  call void asm sideeffect "nop", "r"(i64 %disc)
  br label %exit

exit:
  %signed = call i64 @llvm.ptrauth.sign(i64 %var.ptr.i, i32 2, i64 %disc)
  ret i64 %signed
}

!llvm.module.flags = !{!0}
!0 = !{i32 8, !"ptrauth-elf-got", i32 1}

Here the DAGISel-based patch is unable to produce a PtrAuthGlobalAddress node because the discriminator computation is in another basic block (note the bb.2.exit basic block in the dump after the finalize-isel pass):

DAGISel-based patch
# ./bin/clang -target aarch64-linux-pauthtest -march=armv8.3-a /tmp/repro.ll -S -o- -O2 -mllvm -print-after=codegenprepare,finalize-isel
warning: overriding the module target triple with aarch64-unknown-linux-pauthtest [-Woverride-module]
        .file   "repro.ll"
*** IR Dump After CodeGen Prepare (codegenprepare) ***
define i64 @foo(i1 %cond) local_unnamed_addr {
entry:
  %addrdisc = load i64, ptr @discvar, align 8
  %disc = tail call i64 @llvm.ptrauth.blend(i64 %addrdisc, i64 42)
  br i1 %cond, label %next, label %exit

next:                                             ; preds = %entry
  tail call void asm sideeffect "nop", "r"(i64 %disc) #1
  br label %exit

exit:                                             ; preds = %next, %entry
  %signed = tail call i64 @llvm.ptrauth.sign(i64 ptrtoint (ptr @var to i64), i32 2, i64 %disc)
  ret i64 %signed
}
# *** IR Dump After Finalize ISel and expand pseudo-instructions (finalize-isel) ***:
# Machine code for function foo: IsSSA, TracksLiveness
Function Live Ins: $w0 in %1

bb.0.entry:
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
  liveins: $w0
  %1:gpr32 = COPY $w0
  %2:gpr64common = LOADgotAUTH target-flags(aarch64-got) @discvar, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  %3:gpr64 = LDRXui killed %2:gpr64common, 0 :: (dereferenceable load (s64) from @discvar)
  %4:gpr64 = MOVKXi %3:gpr64(tied-def 0), 42, 48
  %0:gpr64sp = COPY %4:gpr64
  TBZW %1:gpr32, 0, %bb.2
  B %bb.1

bb.1.next:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  %5:gpr64common = COPY %0:gpr64sp
  INLINEASM &nop [sideeffect] [attdialect], $0:[reguse:GPR64common], %5:gpr64common

bb.2.exit:
; predecessors: %bb.0, %bb.1

  %6:gpr64common = LOADgotAUTH target-flags(aarch64-got) @var, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  %7:gpr64 = PACDA %6:gpr64common(tied-def 0), %0:gpr64sp
  $x0 = COPY %7:gpr64
  RET_ReallyLR implicit $x0

# End machine code for function foo.

        .text
        .globl  foo                             // -- Begin function foo
        .p2align        2
        .type   foo,@function
foo:                                    // @foo
        .cfi_startproc
// %bb.0:                               // %entry
        adrp    x17, :got_auth:discvar
        add     x17, x17, :got_auth_lo12:discvar
        ldr     x16, [x17]
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_0
        brk     #0xc472
.Lauth_success_0:
        mov     x8, x16
        ldr     x8, [x8]
        movk    x8, #42, lsl #48
        tbz     w0, #0, .LBB0_2
// %bb.1:                               // %next
        //APP
        nop
        //NO_APP
.LBB0_2:                                // %exit
        adrp    x17, :got_auth:var
        add     x17, x17, :got_auth_lo12:var
        ldr     x16, [x17]
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_1
        brk     #0xc472
.Lauth_success_1:
        mov     x0, x16
        pacda   x0, x8
        ret
.Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc
                                        // -- End function
        .type   var,@object                     // @var
        .bss
        .globl  var
        .p2align        2, 0x0
var:
        .word   0                               // 0x0
        .size   var, 4

        .type   discvar,@object                 // @discvar
        .globl  discvar
        .p2align        3, 0x0
discvar:
        .xword  0                               // 0x0
        .size   discvar, 8

        .section        ".note.GNU-stack","",@progbits
        .addrsig
        .addrsig_sym var
1 warning generated.
Custom inserter-based patch
# ./bin/clang -target aarch64-linux-pauthtest -march=armv8.3-a /tmp/repro.ll -S -o- -O2 -mllvm -print-after=codegenprepare,finalize-isel
warning: overriding the module target triple with aarch64-unknown-linux-pauthtest [-Woverride-module]
        .file   "repro.ll"
*** IR Dump After CodeGen Prepare (codegenprepare) ***
define i64 @foo(i1 %cond) local_unnamed_addr {
entry:
  %addrdisc = load i64, ptr @discvar, align 8
  %disc = tail call i64 @llvm.ptrauth.blend(i64 %addrdisc, i64 42)
  br i1 %cond, label %next, label %exit

next:                                             ; preds = %entry
  tail call void asm sideeffect "nop", "r"(i64 %disc) #1
  br label %exit

exit:                                             ; preds = %next, %entry
  %signed = tail call i64 @llvm.ptrauth.sign(i64 ptrtoint (ptr @var to i64), i32 2, i64 %disc)
  ret i64 %signed
}
# *** IR Dump After Finalize ISel and expand pseudo-instructions (finalize-isel) ***:
# Machine code for function foo: IsSSA, TracksLiveness
Function Live Ins: $w0 in %1

bb.0.entry:
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
  liveins: $w0
  %1:gpr32 = COPY $w0
  %2:gpr64common = LOADgotAUTH target-flags(aarch64-got) @discvar, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  %3:gpr64 = LDRXui killed %2:gpr64common, 0 :: (dereferenceable load (s64) from @discvar)
  %4:gpr64 = MOVKXi %3:gpr64(tied-def 0), 42, 48
  %0:gpr64sp = COPY %4:gpr64
  TBZW %1:gpr32, 0, %bb.2
  B %bb.1

bb.1.next:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  %5:gpr64common = COPY %0:gpr64sp
  INLINEASM &nop [sideeffect] [attdialect], $0:[reguse:GPR64common], %5:gpr64common

bb.2.exit:
; predecessors: %bb.0, %bb.1

  %6:gpr64common = LOADgotAUTH target-flags(aarch64-got) @var, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  %8:gpr64noip = COPY %3:gpr64
  LOADgotPAC target-flags(aarch64-got) @var, 2, %8:gpr64noip, 42, implicit-def $x16, implicit-def $x17, implicit-def $nzcv
  %7:gpr64 = COPY $x16
  $x0 = COPY %7:gpr64
  RET_ReallyLR implicit $x0

# End machine code for function foo.

        .text
        .globl  foo                             // -- Begin function foo
        .p2align        2
        .type   foo,@function
foo:                                    // @foo
        .cfi_startproc
// %bb.0:                               // %entry
        adrp    x17, :got_auth:discvar
        add     x17, x17, :got_auth_lo12:discvar
        ldr     x16, [x17]
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_0
        brk     #0xc472
.Lauth_success_0:
        mov     x8, x16
        ldr     x8, [x8]
        tbz     w0, #0, .LBB0_2
// %bb.1:                               // %next
        mov     x9, x8
        movk    x9, #42, lsl #48
        //APP
        nop
        //NO_APP
.LBB0_2:                                // %exit
        adrp    x17, :got_auth:var
        add     x17, x17, :got_auth_lo12:var
        ldr     x16, [x17]
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_1
        brk     #0xc472
.Lauth_success_1:
        mov     x17, x8
        movk    x17, #42, lsl #48
        pacda   x16, x17
        mov     x0, x16
        ret
.Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc
                                        // -- End function
        .type   var,@object                     // @var
        .bss
        .globl  var
        .p2align        2, 0x0
var:
        .word   0                               // 0x0
        .size   var, 4

        .type   discvar,@object                 // @discvar
        .globl  discvar
        .p2align        3, 0x0
discvar:
        .xword  0                               // 0x0
        .size   discvar, 8

        .section        ".note.GNU-stack","",@progbits
        .addrsig
        .addrsig_sym var
1 warning generated.

kovdan01 added a commit to kovdan01/llvm-project that referenced this pull request Jun 9, 2025
When changing codegen (e.g. in llvm#130809), offsets in binaries produced by
LTO tests might change. We do not need to match concrete offset values, it's
enough to ensure that hex values in particular places are identical.
@atrosinenko
Contributor Author

It turned out that this PR crashes the MultiSource/Benchmarks/Ptrdist/ks/ks test from llvm-test-suite: I specified LOADgotAUTH as hasSideEffects = 0 so that DCE would be able to remove it if needed, but then MachineLICM hoisted LOADgotAUTH like this:

    renamable $x16 = COPY renamable $x12
    renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @cost, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
    B %bb.4

This patch seems to fix the issue: atrosinenko@ca1b92f.

kovdan01 added a commit that referenced this pull request Jun 17, 2025
When changing codegen (e.g. in #130809), offsets in binaries produced by
LTO tests might change. We do not need to match concrete offset values,
it's enough to ensure that hex values in particular places are
identical.

---------

Co-authored-by: Anatoly Trosinenko <[email protected]>
ajaden-codes pushed a commit to Jaddyen/llvm-project that referenced this pull request Jun 17, 2025
…43358)

When changing codegen (e.g. in llvm#130809), offsets in binaries produced by
LTO tests might change. We do not need to match concrete offset values,
it's enough to ensure that hex values in particular places are
identical.

---------

Co-authored-by: Anatoly Trosinenko <[email protected]>
fschlimb pushed a commit to fschlimb/llvm-project that referenced this pull request Jun 18, 2025
…43358)

When changing codegen (e.g. in llvm#130809), offsets in binaries produced by
LTO tests might change. We do not need to match concrete offset values,
it's enough to ensure that hex values in particular places are
identical.

---------

Co-authored-by: Anatoly Trosinenko <[email protected]>
In pauthtest ABI, it is common to store a pointer signed with address
diversity to a heap-allocated object (namely, storing a signed pointer
to VTable as part of new object construction). This patch tries to
prevent introducing a signing oracle by combining pointer
materialization and its (re)signing into a single pseudo instruction
which is not expanded until AsmPrinter, if possible.

One of the typical patterns is materializing an unsigned pointer with
`MOVaddr` pseudo and then signing it with `PAC[ID][AB]` instruction,
which can be moved far away from `MOVaddr` by one of the passes in the
machine pipeline. As the storage address is not a `Constant` value, one
cannot simply emit a `ptrauth` constant in the frontend, which would be
selected into `MOVaddrPAC` pseudo.

Another pattern is fetching a pointer to VTable from a signed GOT entry
using `LOADgotAUTH` pseudo, authenticating and checking it, and then
re-signing after adding an offset.

This commit adds an instruction insertion hook for `PAC[ID][AB]` which
detects the above patterns and replaces it either with `MOVaddrPAC` or
`LOADgotPAC` instruction.
@atrosinenko atrosinenko force-pushed the pauth-isel-merge-pac branch from ced57f0 to b4c02ae Compare June 19, 2025 16:03
@atrosinenko
Contributor Author

atrosinenko commented Jun 19, 2025

Rebased to make use of the fix in #143358 and drop ad-hoc changes to lld/test/ELF/lto/aarch64-pac-got-func.ll (removed lld and lld:ELF labels, as they were added due to that test file).
