-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[BOLT][binary-analysis] Add initial pac-ret gadget scanner #122304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This adds an initial pac-ret gadget scanner to the llvm-bolt-binary-analysis-tool. The scanner is taken from the prototype that was published last year at llvm/llvm-project@main...kbeyls:llvm-project:bolt-gadget-scanner-prototype, and has been discussed in RFC https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148 and in the EuroLLVM 2024 keynote "Does LLVM implement security hardenings correctly? A BOLT-based static analyzer to the rescue?" [Video](https://youtu.be/Sn_Fxa0tdpY) [Slides](https://llvm.org/devmtg/2024-04/slides/Keynote/Beyls_EuroLLVM2024_security_hardening_keynote.pdf) In the spirit of incremental development, this PR aims to add a minimal implementation that is "fully working" on its own, but has major limitations, as described in the bolt/docs/BinaryAnalysis.md documentation in this proposed commit. These and other limitations will be fixed in follow-on PRs, mostly based on code already existing in the prototype branch. I hope incrementally upstreaming will make it easier to review the code. That being said, this patch isn't really small. I believe that this patch needs 2 "kinds" of reviewers: 1. People familiar with BOLT internals. 2. People familiar with the pac-ret hardening scheme and AArch64. I expect people familiar with the pac-ret hardening scheme might need to focus on reviewing mostly what is in file `NonPacProtectedRetAnalysis.cpp`, `AArch64MCPlusBuilder.cpp` and the unit tests. I expect people familiar with BOLT might mainly need to focus on the other parts, and the high-level structure of `NonPacProtectedRetAnalysis.cpp`. The patch is not very polished currently, but I think it's in the right shape to start getting it reviewed. Note that I believe that this could also form the basis of a scanner to analyze correct implementation of PAuthABI.
@llvm/pr-subscribers-bolt Author: Kristof Beyls (kbeyls) ChangesThis adds an initial pac-ret gadget scanner to the llvm-bolt-binary-analysis-tool. The scanner is taken from the prototype that was published last year at main...kbeyls:llvm-project:bolt-gadget-scanner-prototype, and has been discussed in RFC In the spirit of incremental development, this PR aims to add a minimal implementation that is "fully working" on its own, but has major limitations, as described in the bolt/docs/BinaryAnalysis.md documentation in this proposed commit. These and other limitations will be fixed in follow-on PRs, mostly based on code already existing in the prototype branch. I hope incrementally upstreaming will make it easier to review the code. That being said, this patch isn't really small. I believe that this patch needs 2 "kinds" of reviewers:
I expect people familiar with the pac-ret hardening scheme might need to focus on reviewing mostly what is in file I expect people familiar with BOLT might mainly need to focus on the other parts, and the high-level structure of The patch is not very polished currently, but I think it's in the right shape to start getting it reviewed. Note that I believe that this could also form the basis of a scanner to analyze correct implementation of PAuthABI. Patch is 69.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122304.diff 12 Files Affected:
diff --git a/bolt/docs/BinaryAnalysis.md b/bolt/docs/BinaryAnalysis.md
index f91b77d046de8f..e247ce20b4625c 100644
--- a/bolt/docs/BinaryAnalysis.md
+++ b/bolt/docs/BinaryAnalysis.md
@@ -9,9 +9,168 @@ analyses implemented in the BOLT libraries.
## Which binary analyses are implemented?
-At the moment, no binary analyses are implemented.
+* [Security scanners](#security-scanners)
+ * [pac-ret analysis](#pac-ret-analysis)
-The goal is to make it easy using a plug-in framework to add your own analyses.
+### Security scanners
+
+For the past 25 years, a large numbers of exploits have been built and used in
+the wild to undermine computer security. The majority of these exploits abuse
+memory vulnerabilities in programs, see evidence from
+[Microsoft](https://youtu.be/PjbGojjnBZQ?si=oCHCa0SHgaSNr6Gr&t=836),
+[Chromium](https://www.chromium.org/Home/chromium-security/memory-safety/) and
+[Android](https://security.googleblog.com/2021/01/data-driven-security-hardening-in.html).
+
+It is not surprising therefore, that a large number of mitigations have been
+added to instruction sets and toolchains to make it harder to build an exploit
+using a memory vulnerability. Examples are: stack canaries, stack clash,
+pac-ret, shadow stacks, arm64e, and many more.
+
+These mitigations guarantee a so-called "security property" on the binaries they
+produce. For example, for stack canaries, the security property is roughly that
+a canary is located on the stack between the set of saved variables and set of
+local variables. For pac-ret, it is roughly that there are no writes to the
+register containing the return address after either an authenticating
+instruction or a Branch-and-link instruction.
+
+From time to time, however, a bug gets found in the implementation of such
+mitigations in toolchains. Also, code that is written in assembler by hand
+requires the developer to ensure these security properties by hand.
+
+In short, it is sometimes found that a few places in the binary code are not
+protected as expected given the requested mitigations. Attackers could make use
+of those places (sometimes called gadgets) to circumvent to protection that the
+mitigation should give.
+
+One of the reasons that such gadgets, or holes in the mitigation implementation,
+exist is that typically the amount of testing and verification for these
+security properties is pretty limited.
+
+In comparison, for testing functional correctness, or for testing performance,
+toolchain and software in general typically get tested with large test suites
+and benchmarks and testing and benchmarking plans. In contrast, this typically
+does not get done for testing the security properties of binary code.
+
+The security scanners implemented in `llvm-bolt-binary-analysis` aim to enable
+better testing of security hardening implementations that require compilers to
+somehow generate assembly code with specific security properties.
+
+#### pac-ret analysis
+
+`pac-ret` protection is a security hardening scheme implemented in compilers
+such as gcc and clang, using the command line option
+`-mbranch-protection=pac-ret`. This option is enabled by default on most widely
+used distributions.
+
+The hardening scheme mitigates
+[Return-Oriented Programming (ROP)](https://llsoftsec.github.io/llsoftsecbook/#return-oriented-programming)
+attacks by making sure that return addresses are only ever stored to memory with
+a cryptographic hash, called a
+["Pointer Authentication Code" (PAC)](https://llsoftsec.github.io/llsoftsecbook/#pointer-authentication),
+in the upper bits of the pointer. This makes it substantially harder for
+attackers to divert control flow by overwriting a return address with a
+different value.
+
+The hardening scheme relies on compilers producing different code sequences when
+processing return addresses, especially when these are stored to and retrieved
+from memory.
+
+The 'pac-ret' binary analysis can be invoked using the command line option
+`--scanners=pac-ret`. It makes `llvm-bolt-binary-analysis` scan through the
+provided binary and check the following property:
+
+The security property that is checked is:
+
+When a register is used as the address to jump to in a return instruction,
+that register must either:
+
+1. never be changed within this function, i.e. have the same value as when the
+ function started, or
+2. the last write to the register must be by an authenticating instruction.
+
+##### Example 1
+
+For example, a typical non-pac-ret-protected function looks as follows:
+
+```
+stp x29, x30, [sp, #-0x10]!
+mov x29, sp
+bl g@PLT
+add x0, x0, #0x3
+ldp x29, x30, [sp], #0x10
+ret
+```
+
+The return instruction `ret` uses register `x30` as containing the address to
+return to. Register `x30` was last written by instruction `ldp`, which is not an
+authenticating instruction. `llvm-bolt-binary-analysis --scanners=pac-ret` will
+report this as follows:
+
+```
+GS-PACRET: non-protected ret found in function f1, basic block .LBB00, at address 10310
+ The return instruction is 00010310: ret # pacret-gadget: pac-ret-gadget<Ret:MCInstBBRef<BB:.LBB00:6>, Overwriting:[MCInstBBRef<BB:.LBB00:5> ]>
+ The 1 instructions that write to the return register after any authentication are:
+ 1. 0001030c: ldp x29, x30, [sp], #0x10
+ This happens in the following basic block:
+ 000102fc: stp x29, x30, [sp, #-0x10]!
+ 00010300: mov x29, sp
+ 00010304: bl g@PLT
+ 00010308: add x0, x0, #0x3
+ 0001030c: ldp x29, x30, [sp], #0x10
+ 00010310: ret # pacret-gadget: pac-ret-gadget<Ret:MCInstBBRef<BB:.LBB00:6>, Overwriting:[MCInstBBRef<BB:.LBB00:5> ]>
+```
+
+The exact format of how `llvm-bolt-binary-analysis` reports this is expected to
+evolve over time.
+
+##### Example 2: multiple "last-overwriting" instructions
+
+A simple example that show how there can be a set of "last overwriting"
+instructions of a register is the following:
+
+```
+ paciasp
+ stp x29, x30, [sp, #-16]!
+ ldp x29, x30, [sp], #16
+ cbnz x0, 1f
+ autiasp
+1:
+ ret
+```
+
+This will produce the following diagnostic:
+
+```
+GS-PACRET: non-protected ret found in function f_crossbb1, basic block .Ltmp0, at address 102dc
+ The return instruction is 000102dc: ret # pacret-gadget: pac-ret-gadget<Ret:MCInstBBRef<BB:.Ltmp0:0>, Overwriting:[MCInstBBRef<BB:.LFT0:0> MCInstBBRef<BB:.LBB00:2> ]>
+ The 2 instructions that write to the return register after any authentication are:
+ 1. 000102d0: ldp x29, x30, [sp], #0x10
+ 2. 000102d8: autiasp
+```
+
+(Yes, this diagnostic could be improved because the second "overwriting"
+instruction, `autiasp`, is an authenticating instruction...)
+
+##### Known false positives or negatives
+
+The following are current known cases of false positives:
+
+1. Not handling "no-return" functions. See issue
+ [#115154](https://github.com/llvm/llvm-project/issues/115154) for details and
+ pointers to open PRs to fix this.
+
+The folowing are current known cases of false negatives:
+
+1. Not handling functions for which the CFG cannot be reconstructed by BOLT. The
+ plan is to implement support for this, picking up the implementation from the
+ [prototype branch](
+ https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype).
+
+BOLT cannot currently handle functions with `cfi_negate_ra_state` correctly,
+i.e. any binaries built with `-mbranch-protection=pac-ret`. The scanner is meant
+to be used on specifically such binaries, so this is a major limitation! Work is
+going on in PR [#120064](https://github.com/llvm/llvm-project/pull/120064) to
+fix this.
## How to add your own binary analysis
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 3634fed9757ceb..9351107e66bdf6 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -550,6 +550,22 @@ class MCPlusBuilder {
return Analysis->isReturn(Inst);
}
+ virtual MCPhysReg getAuthenticatedReg(const MCInst &Inst) const {
+ llvm_unreachable("not implemented");
+ return false;
+ }
+
+ virtual bool isAuthenticationOfReg(const MCInst &Inst,
+ const unsigned RegAuthenticated) const {
+ llvm_unreachable("not implemented");
+ return false;
+ }
+
+ virtual llvm::MCPhysReg getRegUsedAsRetDest(const MCInst &Inst) const {
+ llvm_unreachable("not implemented");
+ return getNoRegister();
+ }
+
virtual bool isTerminator(const MCInst &Inst) const;
virtual bool isNoop(const MCInst &Inst) const {
diff --git a/bolt/include/bolt/Passes/NonPacProtectedRetAnalysis.h b/bolt/include/bolt/Passes/NonPacProtectedRetAnalysis.h
new file mode 100644
index 00000000000000..a9740896369fa2
--- /dev/null
+++ b/bolt/include/bolt/Passes/NonPacProtectedRetAnalysis.h
@@ -0,0 +1,209 @@
+//===- bolt/Passes/NonPacProtectedRetAnalysis.h -----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef BOLT_PASSES_NONPACPROTECTEDRETANALYSIS_H
+#define BOLT_PASSES_NONPACPROTECTEDRETANALYSIS_H
+
+#include "bolt/Core/BinaryContext.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Passes/BinaryPasses.h"
+#include "llvm/ADT/SmallSet.h"
+
+namespace llvm {
+namespace bolt {
+
+/// @brief MCInstReference represents a reference to an MCInst as stored either
+/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock
+/// (after a CFG is created). It aims to store the necessary information to be
+/// able to find the specific MCInst in either the BinaryFunction or
+/// BinaryBasicBlock data structures later, so that e.g. the InputAddress of
+/// the corresponding instruction can be computed.
+
+struct MCInstInBBReference {
+ BinaryBasicBlock *BB;
+ int64_t BBIndex;
+ MCInstInBBReference(BinaryBasicBlock *BB, int64_t BBIndex)
+ : BB(BB), BBIndex(BBIndex) {}
+ MCInstInBBReference() : BB(nullptr), BBIndex(0) {}
+ static MCInstInBBReference get(const MCInst *Inst, BinaryFunction &BF) {
+ for (BinaryBasicBlock& BB : BF)
+ for (size_t I = 0; I < BB.size(); ++I)
+ if (Inst == &(BB.getInstructionAtIndex(I)))
+ return MCInstInBBReference(&BB, I);
+ return {};
+ }
+ bool operator==(const MCInstInBBReference &RHS) const {
+ return BB == RHS.BB && BBIndex == RHS.BBIndex;
+ }
+ bool operator<(const MCInstInBBReference &RHS) const {
+ if (BB != RHS.BB)
+ return BB < RHS.BB;
+ return BBIndex < RHS.BBIndex;
+ }
+ operator MCInst &() const {
+ assert(BB != nullptr);
+ return BB->getInstructionAtIndex(BBIndex);
+ }
+ uint64_t getAddress() const {
+ // 4 bytes per instruction on AArch64;
+ return BB->getFunction()->getAddress() + BB->getOffset() + BBIndex * 4;
+ }
+};
+
+raw_ostream &operator<<(raw_ostream &OS, const MCInstInBBReference &);
+
+struct MCInstInBFReference {
+ BinaryFunction *BF;
+ uint32_t Offset;
+ MCInstInBFReference(BinaryFunction *BF, uint32_t Offset)
+ : BF(BF), Offset(Offset) {}
+ MCInstInBFReference() : BF(nullptr) {}
+ bool operator==(const MCInstInBFReference &RHS) const {
+ return BF == RHS.BF && Offset == RHS.Offset;
+ }
+ bool operator<(const MCInstInBFReference &RHS) const {
+ if (BF != RHS.BF)
+ return BF < RHS.BF;
+ return Offset < RHS.Offset;
+ }
+ operator MCInst &() const {
+ assert(BF != nullptr);
+ return *(BF->getInstructionAtOffset(Offset));
+ }
+
+ uint64_t getOffset() const { return Offset; }
+
+ uint64_t getAddress() const {
+ return BF->getAddress() + getOffset();
+ }
+};
+
+raw_ostream &operator<<(raw_ostream &OS, const MCInstInBFReference &);
+
+struct MCInstReference {
+ enum StoredIn { _BinaryFunction, _BinaryBasicBlock };
+ StoredIn CurrentLocation;
+ union U {
+ MCInstInBBReference BBRef;
+ MCInstInBFReference BFRef;
+ U(MCInstInBBReference BBRef) : BBRef(BBRef) {}
+ U(MCInstInBFReference BFRef) : BFRef(BFRef) {}
+ } U;
+ MCInstReference(MCInstInBBReference BBRef)
+ : CurrentLocation(_BinaryBasicBlock), U(BBRef) {}
+ MCInstReference(MCInstInBFReference BFRef)
+ : CurrentLocation(_BinaryFunction), U(BFRef) {}
+ MCInstReference(BinaryBasicBlock *BB, int64_t BBIndex)
+ : MCInstReference(MCInstInBBReference(BB, BBIndex)) {}
+ MCInstReference(BinaryFunction *BF, uint32_t Offset)
+ : MCInstReference(MCInstInBFReference(BF, Offset)) {}
+
+ bool operator<(const MCInstReference &RHS) const {
+ if (CurrentLocation != RHS.CurrentLocation)
+ return CurrentLocation < RHS.CurrentLocation;
+ switch (CurrentLocation) {
+ case _BinaryBasicBlock:
+ return U.BBRef < RHS.U.BBRef;
+ case _BinaryFunction:
+ return U.BFRef < RHS.U.BFRef;
+ }
+ llvm_unreachable("");
+ }
+
+ bool operator==(const MCInstReference &RHS) const {
+ if (CurrentLocation != RHS.CurrentLocation)
+ return false;
+ switch (CurrentLocation) {
+ case _BinaryBasicBlock:
+ return U.BBRef == RHS.U.BBRef;
+ case _BinaryFunction:
+ return U.BFRef == RHS.U.BFRef;
+ }
+ llvm_unreachable("");
+ }
+
+ operator MCInst &() const {
+ switch (CurrentLocation) {
+ case _BinaryBasicBlock:
+ return U.BBRef;
+ case _BinaryFunction:
+ return U.BFRef;
+ }
+ llvm_unreachable("");
+ }
+
+ uint64_t getAddress() const {
+ switch (CurrentLocation) {
+ case _BinaryBasicBlock:
+ return U.BBRef.getAddress();
+ case _BinaryFunction:
+ return U.BFRef.getAddress();
+ }
+ llvm_unreachable("");
+ }
+
+ BinaryFunction *getFunction() const {
+ switch (CurrentLocation) {
+ case _BinaryFunction:
+ return U.BFRef.BF;
+ case _BinaryBasicBlock:
+ return U.BBRef.BB->getFunction();
+ }
+ llvm_unreachable("");
+ }
+
+ BinaryBasicBlock *getBasicBlock() const {
+ switch (CurrentLocation) {
+ case _BinaryFunction:
+ return nullptr;
+ case _BinaryBasicBlock:
+ return U.BBRef.BB;
+ }
+ llvm_unreachable("");
+ }
+};
+
+raw_ostream &operator<<(raw_ostream &OS, const MCInstReference &);
+
+struct NonPacProtectedRetGadget {
+ MCInstReference RetInst;
+ std::vector<MCInstReference> OverwritingRetRegInst;
+ bool operator==(const NonPacProtectedRetGadget &RHS) const {
+ return RetInst == RHS.RetInst &&
+ OverwritingRetRegInst == RHS.OverwritingRetRegInst;
+ }
+ NonPacProtectedRetGadget(
+ MCInstReference RetInst,
+ const std::vector<MCInstReference>& OverwritingRetRegInst)
+ : RetInst(RetInst), OverwritingRetRegInst(OverwritingRetRegInst) {}
+};
+
+raw_ostream &operator<<(raw_ostream &OS, const NonPacProtectedRetGadget &NPPRG);
+class PacRetAnalysis;
+
+class NonPacProtectedRetAnalysis : public BinaryFunctionPass {
+ void runOnFunction(BinaryFunction &Function,
+ MCPlusBuilder::AllocatorIdTy AllocatorId);
+ SmallSet<MCPhysReg, 1>
+ computeDfState(PacRetAnalysis &PRA, BinaryFunction &BF,
+ MCPlusBuilder::AllocatorIdTy AllocatorId);
+ unsigned GadgetAnnotationIndex;
+
+public:
+ explicit NonPacProtectedRetAnalysis() : BinaryFunctionPass(false) {}
+
+ const char *getName() const override { return "non-pac-protected-rets"; }
+
+ /// Pass entry point
+ Error runOnFunctions(BinaryContext &BC) override;
+};
+
+} // namespace bolt
+} // namespace llvm
+
+#endif
\ No newline at end of file
diff --git a/bolt/include/bolt/Utils/CommandLineOpts.h b/bolt/include/bolt/Utils/CommandLineOpts.h
index 111eb650c37465..fefd7969ef6f4f 100644
--- a/bolt/include/bolt/Utils/CommandLineOpts.h
+++ b/bolt/include/bolt/Utils/CommandLineOpts.h
@@ -80,6 +80,10 @@ extern llvm::cl::opt<unsigned> Verbosity;
/// Return true if we should process all functions in the binary.
bool processAllFunctions();
+enum GadgetScannerKind { GS_PACRET, GS_ALL };
+
+extern llvm::cl::list<GadgetScannerKind> GadgetScannersToRun;
+
} // namespace opts
namespace llvm {
diff --git a/bolt/lib/Passes/CMakeLists.txt b/bolt/lib/Passes/CMakeLists.txt
index 1c1273b3d2420d..b3fbb0ba0108f4 100644
--- a/bolt/lib/Passes/CMakeLists.txt
+++ b/bolt/lib/Passes/CMakeLists.txt
@@ -23,6 +23,7 @@ add_llvm_library(LLVMBOLTPasses
LoopInversionPass.cpp
LivenessAnalysis.cpp
MCF.cpp
+ NonPacProtectedRetAnalysis.cpp
PatchEntries.cpp
PettisAndHansen.cpp
PLTCall.cpp
diff --git a/bolt/lib/Passes/NonPacProtectedRetAnalysis.cpp b/bolt/lib/Passes/NonPacProtectedRetAnalysis.cpp
new file mode 100644
index 00000000000000..d6bc902f663fc9
--- /dev/null
+++ b/bolt/lib/Passes/NonPacProtectedRetAnalysis.cpp
@@ -0,0 +1,520 @@
+//===- bolt/Passes/NonPacProtectedRetAnalysis.cpp -------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements a pass that looks for any AArch64 return instructions
+// that may not be protected by PAuth authentication instructions when needed.
+//
+// When needed = the register used to return (almost always X30), is potentially
+// written to between the AUThentication instruction and the RETurn instruction.
+//
+//===----------------------------------------------------------------------===//
+
+#include "bolt/Passes/NonPacProtectedRetAnalysis.h"
+#include "bolt/Core/ParallelUtilities.h"
+#include "bolt/Passes/DataflowAnalysis.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/MC/MCInst.h"
+#include "llvm/Support/Format.h"
+
+#define DEBUG_TYPE "bolt-nonpacprotectedret"
+
+namespace llvm {
+namespace bolt {
+
+raw_ostream &operator<<(raw_ostream &OS, const MCInstInBBReference &Ref) {
+ OS << "MCInstBBRef<";
+ if (Ref.BB == nullptr)
+ OS << "BB:(null)";
+ else
+ OS << "BB:" << Ref.BB->getName() << ":" << Ref.BBIndex;
+ OS << ">";
+ return OS;
+}
+
+raw_ostream &operator<<(raw_ostream &OS, const MCInstInBFReference &Ref) {
+ OS << "MCInstBFRef<";
+ if (Ref.BF == nullptr)
+ OS << "BF:(null)";
+ else
+ OS << "BF:" << Ref.BF->getPrintName() << ":" << Ref.getOffset();
+ OS << ">";
+ return OS;
+}
+
+raw_ostream &operator<<(raw_ostream &OS, const MCInstReference &Ref) {
+ switch (Ref.CurrentLocation) {
+ case MCInstReference::_BinaryBasicBlock:
+ OS << Ref.U.BBRef;
+ return OS;
+ case MCInstReference::_BinaryFunction:
+ OS << Ref.U.BFRef;
+ return OS;
+ }
+ llvm_unreachable("");
+}
+
+raw_ostream &operator<<(raw_ostream &OS,
+ const NonPacProtectedRetGadget &NPPRG) {
+ OS << "pac-ret-gadget<";
+ OS << "Ret:" << NPPRG.RetInst << ", ";
+ OS << "Overwriting:[";
+ for (auto Ref : NPPRG.OverwritingRetRegInst)
+ OS << Ref << " ";
+ OS << "]>";
+ return OS;
+}
+
+// The security property that is checked is:
+// When a register is used as the address to jump to in a return instruction,
+// that register must either:
+// (a) never be changed within this function, i.e. have the same value as when
+// the function started, or
+// (b) the last write to the register must be by an authentication instruction.
+
+// This property is checked by using data flow analysis to keep track of which
+// registers have been written (def-ed), since last authenticated. Those are
+// exactly the registers containing values that should not be trusted (as they
+// could have changed since the last time they were authenticated). For pac-ret,
+// any return instruction using such a register is a gadget to be reported. For
+// PAuthABI, any indirect control flow using such a register should be reported?
+
+// This security property is verified using a dataflow analysis.
+
+// Furthermore, when producing a diagnostic for a found non-pac-ret protected
+// return, the analysis also lists the last instructions that wrote to the
+// register used in the return instruction.
+// The total set of registers used in return instructions in a given function is
+// small. It almost always is just `X30`.
+// In order to reduce the memory consumption of s...
[truncated]
|
@asl FYI: since you mentioned the usefulness of a scanner for checking whether PAuthABI was applied correctly to a binary in your presentation on PAuthABI at the US LLVM dev meeting: The scanner for pac-ret in this PR could be the basis of a scanner that also checks PAuthABI security properties. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Tagging @atrosinenko |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first pass through the comments/documentation. I've not had a chance to go through the code yet.
Although out of scope for this patch, I'm thinking it will be worth working on a mechanism to exclude certain functions from the analysis. For example if someone has used the branch protection attribute to explicitly exclude a function from branch-protection. An obvious starting point would be some kind of exclude list file, possibly using the same format as the sanitizers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR! Added a few polishing comments.
Thank you, I took all your suggestions.
That's a good call! I hadn't thought about that. I'll consider adding that in a later patch. One of the other later patches planned is to introduce support for passing from the command line which functions should be considered "no-return", to reduce false positive rate. Going with the same format as the sanitizers would probably also be a good idea for that. |
Thank you very much! I followed most of your suggestions and resolved the comments where I don't think there's any further discussion. I left a few comments unresolved where might have anything more to add... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kbeyls!
This is just a first pass of the implementation, mostly just nits. I'll have a look through the unit tests next week, but I'm already fairly confident after having used it a few times on real code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kbeyls! Here are a few more comments, most of them are trivial nit-picking.
…m correctly. ERETAA/ERETAB/ERET instructions return to the address stored in one of the system registers ELR_EL1, ELR_EL2 or ELR_EL3, depending on the current "Exception Level" state. I'm not aware of any examples of use of ERETA{A|B} in open source software currently. My understanding is that could be used as follows to create a pac-ret-like hardening for ERET: exception_entry: MRS x0, elr_el1 // Could be elr_el2, elr_el3 depending on whether // this code is designed to run at EL1, EL2 or EL3. PACIA x0, SP STR x0, [some place] ... LDR x0, [some place] MSR elr_el1, x0 ERETAA Assuming my understanding above is correct, this makes it impossible to write a fully static "pac-ret"-like checker for exception returns, because the static analyzer doesn't know whether the code will run at EL1, EL2 or EL3. Therefore, the static analyzer can't know whether to check for writes to elr_el1, elr_el2 or elr_el3. Therefore, in this commit, changes are made to produce a warning when any of the ERET* instructions are encountered, stating that the analyzer can't analyze them. This in turn results in the non-pac-ret-protected-return analyzer now needing to produce 2 different kinds of diagnostics. I found that this was most nicely modelled by creating a class hierarchy for these diagnostics. However, it seems that MCAnnotations currently do not work (well or at all) when trying to store any object from a class hierarchy, rather than an object with fixed type. At which point I realized that it was a bit strange that the diagnostics that need to be produced are stored using MCAnnotations at all... So, this patch redesigns that so that the diagnostics that need to be produced are not stored as MCAnnotations on an MCInst at all... Overall, it seems like this is a cleaner design and a better example for any future binary analyses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kbeyls ! I added a few more comments covering all the files except NonPacProtectedRetAnalysis.cpp
- in other respects these files look good to me in their current state.
…ster values across functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, added several comments on NonPacProtectedRetAnalysis.cpp
file, in other respects the PR looks good to me as to a reviewer of the second kind ("People familiar with the pac-ret hardening scheme and AArch64").
I think we're converging on having this PR ready to be merged. |
The overall structure of the PR looks good to me. Thanks! |
I merged this PR a few hours ago. Thank you for the extensive reviews @atrosinenko , @jacobbramley and @smithp35 ! |
FYI I bisected a test failure to this change:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/23691 Here is the relevant piece of the build log for the reference
|
Thanks for reporting this. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/119/builds/4685 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/151/builds/4846 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/92/builds/14125 Here is the relevant piece of the build log for the reference
|
by making the regex to match basic block names more general. See failing test case that was reported on some system in comment #122304 (comment) These test cases were introduced in PR #122304, commit 850b492 .
by making the regex to match basic block names more general. See failing test case that was reported on some system in comment llvm/llvm-project#122304 (comment) These test cases were introduced in PR #122304, commit 850b492 .
This adds an initial pac-ret gadget scanner to the llvm-bolt-binary-analysis-tool.
The scanner is taken from the prototype that was published last year at main...kbeyls:llvm-project:bolt-gadget-scanner-prototype, and has been discussed in RFC
https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148 and in the EuroLLVM 2024 keynote "Does LLVM implement security hardenings correctly? A BOLT-based static analyzer to the rescue?" Video Slides
In the spirit of incremental development, this PR aims to add a minimal implementation that is "fully working" on its own, but has major limitations, as described in the bolt/docs/BinaryAnalysis.md documentation in this proposed commit. These and other limitations will be fixed in follow-on PRs, mostly based on code already existing in the prototype branch. I hope incrementally upstreaming will make it easier to review the code.
That being said, this patch isn't really small.
I believe that this patch needs 2 "kinds" of reviewers:
I expect people familiar with the pac-ret hardening scheme might need to focus on reviewing mostly what is in file
NonPacProtectedRetAnalysis.cpp
,AArch64MCPlusBuilder.cpp
and the unit tests.I expect people familiar with BOLT might mainly need to focus on the other parts, and the high-level structure of
NonPacProtectedRetAnalysis.cpp
.The patch is not very polished currently, but I think it's in the right shape to start getting it reviewed.
Note that I believe that this could also form the basis of a scanner to analyze correct implementation of PAuthABI.