Commit 850b492

[BOLT][binary-analysis] Add initial pac-ret gadget scanner (#122304)
This adds an initial pac-ret gadget scanner to the llvm-bolt-binary-analysis-tool.

The scanner is taken from the prototype that was published last year at main...kbeyls:llvm-project:bolt-gadget-scanner-prototype, and has been discussed in RFC https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148 and in the EuroLLVM 2024 keynote "Does LLVM implement security hardenings correctly? A BOLT-based static analyzer to the rescue?" ([Video](https://youtu.be/Sn_Fxa0tdpY), [Slides](https://llvm.org/devmtg/2024-04/slides/Keynote/Beyls_EuroLLVM2024_security_hardening_keynote.pdf)).

In the spirit of incremental development, this PR aims to add a minimal implementation that is "fully working" on its own, but has major limitations, as described in the bolt/docs/BinaryAnalysis.md documentation in this proposed commit. These and other limitations will be fixed in follow-on PRs, mostly based on code already existing in the prototype branch. I hope incrementally upstreaming will make it easier to review the code.

Note that I believe that this could also form the basis of a scanner to analyze correct implementation of PAuthABI.
1 parent 86cb0bd commit 850b492

File tree

12 files changed

+2126
-5
lines changed

bolt/docs/BinaryAnalysis.md

Lines changed: 175 additions & 2 deletions
@@ -9,9 +9,182 @@ analyses implemented in the BOLT libraries.
99

1010
## Which binary analyses are implemented?
1111

12-
At the moment, no binary analyses are implemented.
12+
* [Security scanners](#security-scanners)
13+
* [pac-ret analysis](#pac-ret-analysis)
1314

14-
The goal is to make it easy using a plug-in framework to add your own analyses.
15+
### Security scanners
16+
17+
For the past 25 years, a large number of exploits have been built and used in
18+
the wild to undermine computer security. The majority of these exploits abuse
19+
memory vulnerabilities in programs; see evidence from
20+
[Microsoft](https://youtu.be/PjbGojjnBZQ?si=oCHCa0SHgaSNr6Gr&t=836),
21+
[Chromium](https://www.chromium.org/Home/chromium-security/memory-safety/) and
22+
[Android](https://security.googleblog.com/2021/01/data-driven-security-hardening-in.html).
23+
24+
It is not surprising, therefore, that a large number of mitigations have been
25+
added to instruction sets and toolchains to make it harder to build an exploit
26+
using a memory vulnerability. Examples include stack canaries, stack clash protection,
27+
pac-ret, shadow stacks, arm64e, and many more.
28+
29+
These mitigations guarantee a so-called "security property" on the binaries they
30+
produce. For example, for stack canaries, the security property is roughly that
31+
a canary is located on the stack between the set of saved registers and the set
32+
of local variables. For pac-ret, it is roughly that either the return address is
33+
never stored/retrieved to/from memory; or, there are no writes to the register
34+
containing the return address between an instruction authenticating it and a
35+
return instruction using it.
36+
37+
From time to time, however, a bug gets found in the implementation of such
38+
mitigations in toolchains. Also, hand-written assembly code requires the
39+
developer to ensure these security properties manually.
40+
41+
In short, it is sometimes found that a few places in the binary code are not
42+
protected as well as expected given the requested mitigations. Attackers could
43+
make use of those places (sometimes called gadgets) to circumvent the protection
44+
that the mitigation should give.
45+
46+
One of the reasons that such gadgets, or holes in the mitigation implementation,
47+
exist is that typically the amount of testing and verification for these
48+
security properties is limited to checking results on specific examples.
49+
50+
In comparison, for testing functional correctness, or for testing performance,
51+
toolchains and software in general typically get tested with large test suites
52+
and benchmarks. This typically does not get done for testing the
53+
security properties of binary code.
54+
55+
Unlike functional correctness where compilation errors result in test failures,
56+
and performance where speed and size differences are measurable, broken security
57+
properties cannot be easily observed using existing testing and benchmarking
58+
tools.
59+
60+
The security scanners implemented in `llvm-bolt-binary-analysis` aim to enable
61+
the testing of security hardening in arbitrary programs and not just specific
62+
examples.
63+
64+
65+
#### pac-ret analysis
66+
67+
`pac-ret` protection is a security hardening scheme implemented in compilers
68+
such as GCC and Clang, using the command line option
69+
`-mbranch-protection=pac-ret`. This option is enabled by default on most widely
70+
used Linux distributions.
71+
72+
The hardening scheme mitigates
73+
[Return-Oriented Programming (ROP)](https://llsoftsec.github.io/llsoftsecbook/#return-oriented-programming)
74+
attacks by making sure that return addresses are only ever stored to memory with
75+
a cryptographic hash, called a
76+
["Pointer Authentication Code" (PAC)](https://llsoftsec.github.io/llsoftsecbook/#pointer-authentication),
77+
in the upper bits of the pointer. This makes it substantially harder for
78+
attackers to divert control flow by overwriting a return address with a
79+
different value.
80+
81+
The hardening scheme relies on compilers producing appropriate code sequences when
82+
processing return addresses, especially when these are stored to and retrieved
83+
from memory.
84+
85+
The `pac-ret` binary analysis can be invoked using the command line option
86+
`--scanners=pac-ret`. It makes `llvm-bolt-binary-analysis` scan through the
87+
provided binary, checking each function for the following security property:
88+
89+
> For each procedure and exception return instruction, the destination register
90+
> must have one of the following properties:
91+
>
92+
> 1. be immutable within the function, or
93+
> 2. the last write to the register must be by an authenticating instruction. This
94+
> includes combined authentication and return instructions such as `RETAA`.
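
To make the property concrete, here is a minimal sketch of the check for a single straight-line instruction sequence. It is not the scanner's implementation: the `Inst` struct, the `NoReg` constant and the `findUnsafeWrite` function are hypothetical names introduced for illustration, each instruction is assumed to write at most one register, and control flow is ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model for illustration only. This is
// not BOLT's MCInst, and it tracks at most one written register per instruction.
constexpr int NoReg = -1;

struct Inst {
  std::string Text;   // Disassembly text, kept only for reporting.
  int Written;        // Register written by the instruction, or NoReg.
  bool Authenticates; // True for AUTIASP-like and RETAA-like instructions.
  int RetDest;        // For returns: register holding the target, else NoReg.
};

// Returns the index of the last non-authenticating write to the return
// register before the return at RetIdx, or -1 if the return is protected.
int findUnsafeWrite(const std::vector<Inst> &Insts, std::size_t RetIdx) {
  const Inst &Ret = Insts[RetIdx];
  // Combined authenticate-and-return instructions satisfy the property.
  if (Ret.Authenticates)
    return -1;
  // Walk backwards to the most recent write of the return register.
  for (std::size_t I = RetIdx; I-- > 0;) {
    if (Insts[I].Written != Ret.RetDest)
      continue;
    return Insts[I].Authenticates ? -1 : static_cast<int>(I);
  }
  // Never written in this sequence: treated as immutable.
  return -1;
}
```

In this model, a sequence ending in `ldp x29, x30, [sp], #0x10` followed by `ret` yields the index of the `ldp`, while inserting an `autiasp` between the two yields `-1`.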
95+
96+
##### Example 1
97+
98+
For example, a typical non-pac-ret-protected function looks as follows:
99+
100+
```
101+
stp x29, x30, [sp, #-0x10]!
102+
mov x29, sp
103+
bl g@PLT
104+
add x0, x0, #0x3
105+
ldp x29, x30, [sp], #0x10
106+
ret
107+
```
108+
109+
The return instruction `ret` implicitly uses register `x30` as the address to
110+
return to. Register `x30` was last written by instruction `ldp`, which is not an
111+
authenticating instruction. `llvm-bolt-binary-analysis --scanners=pac-ret` will
112+
report this as follows:
113+
114+
```
115+
GS-PACRET: non-protected ret found in function f1, basic block .LBB00, at address 10310
116+
The return instruction is 00010310: ret # pacret-gadget: pac-ret-gadget<Ret:MCInstBBRef<BB:.LBB00:6>, Overwriting:[MCInstBBRef<BB:.LBB00:5> ]>
117+
The 1 instructions that write to the return register after any authentication are:
118+
1. 0001030c: ldp x29, x30, [sp], #0x10
119+
This happens in the following basic block:
120+
000102fc: stp x29, x30, [sp, #-0x10]!
121+
00010300: mov x29, sp
122+
00010304: bl g@PLT
123+
00010308: add x0, x0, #0x3
124+
0001030c: ldp x29, x30, [sp], #0x10
125+
00010310: ret # pacret-gadget: pac-ret-gadget<Ret:MCInstBBRef<BB:.LBB00:6>, Overwriting:[MCInstBBRef<BB:.LBB00:5> ]>
126+
```
127+
128+
The exact format of how `llvm-bolt-binary-analysis` reports this is expected to
129+
evolve over time.
130+
131+
##### Example 2: multiple "last-overwriting" instructions
132+
133+
A simple example that shows how there can be a set of "last overwriting"
134+
instructions of a register follows:
135+
136+
```
137+
paciasp
138+
stp x29, x30, [sp, #-0x10]!
139+
ldp x29, x30, [sp], #0x10
140+
cbnz x0, 1f
141+
autiasp
142+
1:
143+
ret
144+
```
145+
146+
This will produce the following diagnostic:
147+
148+
```
149+
GS-PACRET: non-protected ret found in function f_crossbb1, basic block .Ltmp0, at address 102dc
150+
The return instruction is 000102dc: ret # pacret-gadget: pac-ret-gadget<Ret:MCInstBBRef<BB:.Ltmp0:0>, Overwriting:[MCInstBBRef<BB:.LFT0:0> MCInstBBRef<BB:.LBB00:2> ]>
151+
The 2 instructions that write to the return register after any authentication are:
152+
1. 000102d0: ldp x29, x30, [sp], #0x10
153+
2. 000102d8: autiasp
154+
```
155+
156+
(Yes, this diagnostic could be improved because the second "overwriting"
157+
instruction, `autiasp`, is an authenticating instruction...)
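
One way such a set of "last overwriting" instructions can arise is through a forward dataflow analysis that takes the union of states where control-flow paths join. The sketch below is an assumption about how this could be modelled, not a description of the scanner's code: the `Inst`, `Block` and `State` types and the `transfer` and `analyze` functions are hypothetical, only a single register is tracked, and instructions are identified by their text.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy CFG model for illustration; not BOLT's data structures.
struct Inst {
  std::string Text;   // Disassembly text, also used as the instruction's identity.
  bool WritesRetReg;  // Writes the single tracked return register (x30 here).
  bool Authenticates; // The write is performed by an authenticating instruction.
};

struct Block {
  std::vector<Inst> Insts;
  std::vector<std::size_t> Succs; // Indices of successor blocks.
};

// Dataflow state for the tracked register at a program point.
struct State {
  bool Clobbered = false;            // A non-authenticated write may reach here.
  std::set<std::string> LastWriters; // Possible last writers on some path.
  bool operator==(const State &O) const {
    return Clobbered == O.Clobbered && LastWriters == O.LastWriters;
  }
};

// Applies the effect of one basic block to the incoming state.
State transfer(State S, const Block &B) {
  for (const Inst &I : B.Insts) {
    if (!I.WritesRetReg)
      continue;
    S.Clobbered = !I.Authenticates;
    S.LastWriters = {I.Text};
  }
  return S;
}

// Iterates to a fixed point; returns the state at the entry of each block.
std::vector<State> analyze(const std::vector<Block> &CFG) {
  std::vector<State> In(CFG.size());
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 0; B < CFG.size(); ++B) {
      const State Out = transfer(In[B], CFG[B]);
      for (std::size_t S : CFG[B].Succs) {
        // Join: a register is clobbered if it is clobbered on any path, and
        // the writer sets from all incoming paths are merged.
        State Merged = In[S];
        Merged.Clobbered = Merged.Clobbered || Out.Clobbered;
        Merged.LastWriters.insert(Out.LastWriters.begin(),
                                  Out.LastWriters.end());
        if (!(Merged == In[S])) {
          In[S] = Merged;
          Changed = true;
        }
      }
    }
  }
  return In;
}
```

With the three blocks of Example 2 (entry, the `autiasp` block, and the `ret` block), the state at the entry of the `ret` block in this model is clobbered and lists both `ldp` and `autiasp` as last writers. This matches the shape of the diagnostic above, including the imprecision of listing `autiasp`.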
158+
159+
##### Known false positives or negatives
160+
161+
The following are current known cases of false positives:
162+
163+
1. Not handling "no-return" functions. See issue
164+
[#115154](https://github.com/llvm/llvm-project/issues/115154) for details and
165+
pointers to open PRs to fix this.
166+
2. Not recognizing that a move of a properly authenticated value between registers
167+
results in the destination register having a properly authenticated value.
168+
For example, the scanner currently produces a false positive for the following
169+
code sequence:
170+
```
171+
autiasp
172+
mov x16, x30
173+
ret x16
174+
```
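
The register-move case can be illustrated with a small model that compares a check without and with propagation of the "authenticated" state across moves. This is only a sketch of the idea under stated assumptions, not a proposed or planned fix: the `Op` and `Inst` types and the `allReturnsAccepted` function are hypothetical, and only straight-line code is handled.

```cpp
#include <set>
#include <vector>

// Hypothetical toy instruction model for illustration; not BOLT's MCInst.
enum class Op { Authenticate, Move, OtherWrite, Return };

struct Inst {
  Op Kind;
  int Dst; // Register written, or the return target for Op::Return.
  int Src; // Source register for Op::Move; ignored otherwise.
};

// Checks a straight-line sequence. A return is accepted when its target
// register was never written, or currently holds an authenticated value.
bool allReturnsAccepted(const std::vector<Inst> &Insts, bool PropagateMoves) {
  std::set<int> Written;       // Registers written so far.
  std::set<int> Authenticated; // Registers holding an authenticated value.
  for (const Inst &I : Insts) {
    switch (I.Kind) {
    case Op::Authenticate:
      Written.insert(I.Dst);
      Authenticated.insert(I.Dst);
      break;
    case Op::Move: {
      // Read the source's state before updating the destination.
      const bool SrcOk = PropagateMoves && Authenticated.count(I.Src) != 0;
      Written.insert(I.Dst);
      if (SrcOk)
        Authenticated.insert(I.Dst);
      else
        Authenticated.erase(I.Dst);
      break;
    }
    case Op::OtherWrite:
      Written.insert(I.Dst);
      Authenticated.erase(I.Dst);
      break;
    case Op::Return:
      if (Written.count(I.Dst) != 0 && Authenticated.count(I.Dst) == 0)
        return false;
      break;
    }
  }
  return true;
}
```

For the `autiasp; mov x16, x30; ret x16` sequence, this model rejects the return when `PropagateMoves` is `false`, which corresponds to the false positive described above, and accepts it when `PropagateMoves` is `true`.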
175+
176+
The following are current known cases of false negatives:
177+
178+
1. Not handling functions for which the CFG cannot be reconstructed by BOLT. The
179+
plan is to implement support for this, picking up the implementation from the
180+
[prototype branch](
181+
https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype).
182+
183+
BOLT cannot currently handle functions with `cfi_negate_ra_state` correctly,
184+
i.e. any binaries built with `-mbranch-protection=pac-ret`. The scanner is meant
185+
to be used on exactly such binaries, so this is a major limitation! Work is
186+
ongoing in PR [#120064](https://github.com/llvm/llvm-project/pull/120064) to
187+
fix this.
15188

16189
## How to add your own binary analysis
17190

bolt/include/bolt/Core/MCPlusBuilder.h

Lines changed: 17 additions & 0 deletions
@@ -27,6 +27,7 @@
2727
#include "llvm/MC/MCInstrAnalysis.h"
2828
#include "llvm/MC/MCInstrDesc.h"
2929
#include "llvm/MC/MCInstrInfo.h"
30+
#include "llvm/MC/MCRegister.h"
3031
#include "llvm/Support/Allocator.h"
3132
#include "llvm/Support/Casting.h"
3233
#include "llvm/Support/ErrorHandling.h"
@@ -550,6 +551,22 @@ class MCPlusBuilder {
550551
return Analysis->isReturn(Inst);
551552
}
552553

554+
virtual ErrorOr<MCPhysReg> getAuthenticatedReg(const MCInst &Inst) const {
555+
llvm_unreachable("not implemented");
556+
return getNoRegister();
557+
}
558+
559+
virtual bool isAuthenticationOfReg(const MCInst &Inst,
560+
MCPhysReg AuthenticatedReg) const {
561+
llvm_unreachable("not implemented");
562+
return false;
563+
}
564+
565+
virtual ErrorOr<MCPhysReg> getRegUsedAsRetDest(const MCInst &Inst) const {
566+
llvm_unreachable("not implemented");
567+
return getNoRegister();
568+
}
569+
553570
virtual bool isTerminator(const MCInst &Inst) const;
554571

555572
virtual bool isNoop(const MCInst &Inst) const {
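
The three hooks added to `MCPlusBuilder` in this hunk default to "not implemented", so a target is expected to override them. The self-contained mock below sketches that pattern without depending on LLVM. Everything in it is a stand-in: `ToyInst`, `ToyBuilder`, `ToyAArch64Builder` and the string-based mnemonics are hypothetical, whereas the real hooks operate on `MCInst` and `MCPhysReg` and return `ErrorOr<MCPhysReg>`.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-ins for illustration; real code uses llvm::MCInst and
// MCPhysReg rather than strings and plain integers.
constexpr int NoRegister = -1;
constexpr int LR = 30; // x30, the AArch64 link register.

struct ToyInst {
  std::string Mnemonic;
  int Operand; // Explicit register operand, or NoRegister if none.
};

// Generic interface: the defaults abort, mirroring hooks that a target must
// implement before the analysis can use them.
class ToyBuilder {
public:
  virtual ~ToyBuilder() = default;
  virtual int getAuthenticatedReg(const ToyInst &) const { std::abort(); }
  virtual bool isAuthenticationOfReg(const ToyInst &, int) const {
    std::abort();
  }
  virtual int getRegUsedAsRetDest(const ToyInst &) const { std::abort(); }
};

// Toy target that knows a handful of AArch64 mnemonics.
class ToyAArch64Builder : public ToyBuilder {
public:
  // AUTIASP and RETAA both authenticate the link register.
  int getAuthenticatedReg(const ToyInst &I) const override {
    if (I.Mnemonic == "autiasp" || I.Mnemonic == "retaa")
      return LR;
    return NoRegister;
  }

  bool isAuthenticationOfReg(const ToyInst &I, int Reg) const override {
    return Reg != NoRegister && getAuthenticatedReg(I) == Reg;
  }

  // A plain `ret` uses x30 unless an explicit register is given.
  int getRegUsedAsRetDest(const ToyInst &I) const override {
    if (I.Mnemonic == "ret")
      return I.Operand == NoRegister ? LR : I.Operand;
    if (I.Mnemonic == "retaa")
      return LR;
    return NoRegister;
  }
};
```

Keeping the queries behind virtual hooks lets the analysis itself stay target-independent: it asks which register a return uses and whether an instruction authenticates that register, without knowing any opcodes.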
