
Commit db53dce

Merge branch 'bpf-rewrite-value-tracking-in-verifier'
Edward Cree says:

====================
bpf: rewrite value tracking in verifier

This series simplifies alignment tracking, generalises bounds tracking and fixes some bounds-tracking bugs in the BPF verifier. Pointer arithmetic on packet pointers, stack pointers, map value pointers and context pointers has been unified, and bounds on these pointers are only checked when the pointer is dereferenced.

Operations on pointers which destroy all relation to the original pointer (such as multiplies and shifts) are disallowed if !env->allow_ptr_leaks, otherwise they convert the pointer to an unknown scalar and feed it to the normal scalar arithmetic handling.

Pointer types have been unified with the corresponding adjusted-pointer types where those existed (e.g. PTR_TO_MAP_VALUE[_ADJ] or FRAME_PTR vs PTR_TO_STACK); similarly, CONST_IMM and UNKNOWN_VALUE have been unified into SCALAR_VALUE.

Pointer types (except CONST_PTR_TO_MAP, PTR_TO_MAP_VALUE_OR_NULL and PTR_TO_PACKET_END, which do not allow arithmetic) have a 'fixed offset' and a 'variable offset'; the former is used when e.g. adding an immediate or a known-constant register, as long as it does not overflow. Otherwise the latter is used, and any operation creating a new variable offset creates a new 'id' (and, for PTR_TO_PACKET, clears the 'range').

SCALAR_VALUEs use the 'variable offset' fields to track the range of possible values; the 'fixed offset' should never be set on a scalar.
====================

Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 parents e1cb90f + 8e17c1b commit db53dce


10 files changed: +2217 -1281 lines changed


Documentation/networking/filter.txt

Lines changed: 104 additions & 18 deletions
@@ -793,7 +793,7 @@ Some core changes of the new internal format:
 bpf_exit
 
 After the call the registers R1-R5 contain junk values and cannot be read.
-In the future an eBPF verifier can be used to validate internal BPF programs.
+An in-kernel eBPF verifier is used to validate internal BPF programs.
 
 Also in the new design, eBPF is limited to 4096 insns, which means that any
 program will terminate quickly and will only call a fixed number of kernel
@@ -1017,7 +1017,7 @@ At the start of the program the register R1 contains a pointer to context
 and has type PTR_TO_CTX.
 If verifier sees an insn that does R2=R1, then R2 has now type
 PTR_TO_CTX as well and can be used on the right hand side of expression.
-If R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=UNKNOWN_VALUE,
+If R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=SCALAR_VALUE,
 since addition of two valid pointers makes invalid pointer.
 (In 'secure' mode verifier will reject any type of pointer arithmetic to make
 sure that kernel addresses don't leak to unprivileged users)
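To make the R2=R1+R1 rule concrete, here is a minimal sketch of how the result type of an addition might be decided. The enum, `add_result_kind()` and the `allow_ptr_leaks` parameter are hypothetical simplifications: the ptr + ptr case follows the merge description above (an unknown scalar if allow_ptr_leaks, otherwise rejected), while offsets, bounds and any privilege checks on ptr + scalar are left out.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down register kinds (not the kernel's enum). */
enum reg_kind { KIND_SCALAR, KIND_PTR_CTX, KIND_PTR_STACK, KIND_PTR_PACKET };

#define ADD_REJECT (-1)	/* the program must be rejected */

static bool is_pointer(enum reg_kind k)
{
	return k != KIND_SCALAR;
}

/* Kind of 'dst' after 'dst += src', or ADD_REJECT. */
static int add_result_kind(enum reg_kind dst, enum reg_kind src,
			   bool allow_ptr_leaks)
{
	if (is_pointer(dst) && is_pointer(src))
		/* ptr + ptr has no useful relation to either base: privileged
		 * loaders get an unknown scalar, others are refused */
		return allow_ptr_leaks ? (int)KIND_SCALAR : ADD_REJECT;
	if (is_pointer(dst))
		return dst;		/* ptr + scalar keeps the pointer type */
	if (is_pointer(src))
		return src;		/* scalar + ptr likewise */
	return KIND_SCALAR;		/* scalar + scalar */
}
```

For R1=PTR_TO_CTX, `add_result_kind(KIND_PTR_CTX, KIND_PTR_CTX, true)` yields KIND_SCALAR, the sketch's counterpart of R2=SCALAR_VALUE.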
@@ -1039,7 +1039,7 @@ is a correct program. If there was R1 instead of R6, it would have
 been rejected.
 
 load/store instructions are allowed only with registers of valid types, which
-are PTR_TO_CTX, PTR_TO_MAP, FRAME_PTR. They are bounds and alignment checked.
+are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked.
 For example:
 bpf_mov R1 = 1
 bpf_mov R2 = 2
@@ -1058,7 +1058,7 @@ intends to load a word from address R6 + 8 and store it into R0
 If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
 that offset 8 of size 4 bytes can be accessed for reading, otherwise
 the verifier will reject the program.
-If R6=FRAME_PTR, then access should be aligned and be within
+If R6=PTR_TO_STACK, then access should be aligned and be within
 stack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8,
 so it will fail verification, since it's out of bounds.
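The stack rule in this hunk (aligned, and inside [-MAX_BPF_STACK, 0)) can be sketched as a single predicate. The function name is hypothetical, natural alignment (offset a multiple of the access size) is assumed as the alignment rule, and MAX_BPF_STACK is defined locally as 512, assumed to match linux/filter.h at the time of this commit.

```c
#include <stdbool.h>

#define MAX_BPF_STACK 512	/* assumed value, as in linux/filter.h */

/* Hypothetical check for a 'size'-byte access at frame pointer + 'off'. */
static bool stack_access_ok(int off, int size)
{
	if (size <= 0)
		return false;
	if (off % size != 0)		/* not naturally aligned */
		return false;
	/* every byte must lie in [-MAX_BPF_STACK, 0) */
	return off >= -MAX_BPF_STACK && off + size <= 0;
}
```

With R6=PTR_TO_STACK, the load from R6 + 8 corresponds to `stack_access_ok(8, 4)`, which is false because the access lies above the frame pointer.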

@@ -1069,7 +1069,7 @@ For example:
 bpf_ld R0 = *(u32 *)(R10 - 4)
 bpf_exit
 is invalid program.
-Though R10 is correct read-only register and has type FRAME_PTR
+Though R10 is correct read-only register and has type PTR_TO_STACK
 and R10 - 4 is within stack bounds, there were no stores into that location.
 
 Pointer register spill/fill is tracked as well, since four (R6-R9)
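Rejecting a read of never-written stack bytes implies per-byte bookkeeping. Below is a minimal sketch with one written/not-written flag per byte; `struct stack_state`, `stack_store()` and `stack_read_ok()` are hypothetical names, callers are assumed to have bounds-checked the offset already, and MAX_BPF_STACK is assumed to be 512. The verifier itself records a richer per-byte type (enum bpf_stack_slot_type, visible in the bpf_verifier.h hunk) so that it can also recognise spilled pointer registers.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_BPF_STACK 512	/* assumed value, as in linux/filter.h */

/* written[i] covers the byte at frame pointer - MAX_BPF_STACK + i */
struct stack_state {
	bool written[MAX_BPF_STACK];
};

static void stack_init(struct stack_state *s)
{
	memset(s->written, 0, sizeof(s->written));
}

/* Record a store of 'size' bytes at R10 + off (off already bounds-checked). */
static void stack_store(struct stack_state *s, int off, int size)
{
	for (int i = 0; i < size; i++)
		s->written[MAX_BPF_STACK + off + i] = true;
}

/* A load is allowed only if every byte it covers was stored to first. */
static bool stack_read_ok(const struct stack_state *s, int off, int size)
{
	for (int i = 0; i < size; i++)
		if (!s->written[MAX_BPF_STACK + off + i])
			return false;
	return true;
}
```

On a fresh state, the 'bpf_ld R0 = *(u32 *)(R10 - 4)' example corresponds to `stack_read_ok(&s, -4, 4)`, which is false until a matching store has been recorded.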
@@ -1094,6 +1094,71 @@ all use cases.
 
 See details of eBPF verifier in kernel/bpf/verifier.c
 
+Register value tracking
+-----------------------
+In order to determine the safety of an eBPF program, the verifier must track
+the range of possible values in each register and also in each stack slot.
+This is done with 'struct bpf_reg_state', defined in include/linux/
+bpf_verifier.h, which unifies tracking of scalar and pointer values. Each
+register state has a type, which is either NOT_INIT (the register has not been
+written to), SCALAR_VALUE (some value which is not usable as a pointer), or a
+pointer type. The types of pointers describe their base, as follows:
+    PTR_TO_CTX          Pointer to bpf_context.
+    CONST_PTR_TO_MAP    Pointer to struct bpf_map. "Const" because arithmetic
+                        on these pointers is forbidden.
+    PTR_TO_MAP_VALUE    Pointer to the value stored in a map element.
+    PTR_TO_MAP_VALUE_OR_NULL
+                        Either a pointer to a map value, or NULL; map accesses
+                        (see section 'eBPF maps', below) return this type,
+                        which becomes a PTR_TO_MAP_VALUE when checked != NULL.
+                        Arithmetic on these pointers is forbidden.
+    PTR_TO_STACK        Frame pointer.
+    PTR_TO_PACKET       skb->data.
+    PTR_TO_PACKET_END   skb->data + headlen; arithmetic forbidden.
+However, a pointer may be offset from this base (as a result of pointer
+arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
+offset'. The former is used when an exactly-known value (e.g. an immediate
+operand) is added to a pointer, while the latter is used for values which are
+not exactly known. The variable offset is also used in SCALAR_VALUEs, to track
+the range of possible values in the register.
+The verifier's knowledge about the variable offset consists of:
+* minimum and maximum values as unsigned
+* minimum and maximum values as signed
+* knowledge of the values of individual bits, in the form of a 'tnum': a u64
+'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown;
+1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both
+mask and value; no bit should ever be 1 in both. For example, if a byte is read
+into a register from memory, the register's top 56 bits are known zero, while
+the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
+then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
+0x1ff), because of potential carries.
+Besides arithmetic, the register state can also be updated by conditional
+branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
+it will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
+branch it will have a umax_value of 8. A signed compare (with BPF_JSGT or
+BPF_JSGE) would instead update the signed minimum/maximum values. Information
+from the signed and unsigned bounds can be combined; for instance if a value is
+first tested < 8 and then tested s> 4, the verifier will conclude that the value
+is also > 4 and s< 8, since the bounds prevent crossing the sign boundary.
+PTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
+pointers sharing that same variable offset. This is important for packet range
+checks: after adding some variable to a packet pointer, if you then copy it to
+another register and (say) add a constant 4, both registers will share the same
+'id' but one will have a fixed offset of +4. Then if it is bounds-checked and
+found to be less than a PTR_TO_PACKET_END, the other register is now known to
+have a safe range of at least 4 bytes. See 'Direct packet access', below, for
+more on PTR_TO_PACKET ranges.
+The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
+the pointer returned from a map lookup. This means that when one copy is
+checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.
+As well as range-checking, the tracked information is also used for enforcing
+alignment of pointer accesses. For instance, on most systems the packet pointer
+is 2 bytes after a 4-byte alignment. If a program adds 14 bytes to that to jump
+over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
+pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
+bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
+that pointer are safe.
+
 Direct packet access
 --------------------
 In cls_bpf and act_bpf programs the verifier allows direct access to the packet
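The conditional-branch rules described under 'Register value tracking' in this hunk can be sketched with just the four bounds. `struct bounds`, `sync_bounds()`, `branch_jgt()` and `branch_jsgt()` are hypothetical names; the sign-boundary logic is a simplified restatement rather than the kernel's code, and tnum knowledge is ignored.

```c
#include <stdint.h>

/* Hypothetical stand-in for the four bound fields of struct bpf_reg_state. */
struct bounds {
	int64_t smin, smax;	/* signed minimum / maximum */
	uint64_t umin, umax;	/* unsigned minimum / maximum */
};

static const struct bounds BOUNDS_UNKNOWN = {
	INT64_MIN, INT64_MAX, 0, UINT64_MAX
};

/* Let the signed and unsigned bounds tighten each other (simplified). */
static void sync_bounds(struct bounds *b)
{
	/* Signed range does not cross zero: it is also a valid unsigned
	 * range, so intersect the unsigned bounds with it. */
	if (b->smin >= 0 || b->smax < 0) {
		if ((uint64_t)b->smin > b->umin)
			b->umin = (uint64_t)b->smin;
		if ((uint64_t)b->smax < b->umax)
			b->umax = (uint64_t)b->smax;
	}
	/* Unsigned range lies in one signed half: intersect the other way. */
	if ((b->umin > (uint64_t)INT64_MAX) == (b->umax > (uint64_t)INT64_MAX)) {
		if ((int64_t)b->umin > b->smin)
			b->smin = (int64_t)b->umin;
		if ((int64_t)b->umax < b->smax)
			b->smax = (int64_t)b->umax;
	}
}

/* 'if x > k goto' (unsigned, like BPF_JGT); assumes k < UINT64_MAX. */
static void branch_jgt(struct bounds x, uint64_t k,
		       struct bounds *taken, struct bounds *fall)
{
	*taken = x;
	*fall = x;
	if (taken->umin < k + 1)
		taken->umin = k + 1;	/* true branch: x >= k + 1 */
	if (fall->umax > k)
		fall->umax = k;		/* false branch: x <= k */
	sync_bounds(taken);
	sync_bounds(fall);
}

/* 'if x s> k goto' (signed, like BPF_JSGT); assumes k < INT64_MAX. */
static void branch_jsgt(struct bounds x, int64_t k,
			struct bounds *taken, struct bounds *fall)
{
	*taken = x;
	*fall = x;
	if (taken->smin < k + 1)
		taken->smin = k + 1;
	if (fall->smax > k)
		fall->smax = k;
	sync_bounds(taken);
	sync_bounds(fall);
}
```

Taking the fall-through of 'if x > 7' (that is, x < 8) and then the true side of 'if x s> 4' leaves umin = smin = 5 and umax = smax = 7: the value is > 4 and s< 8, as the text describes.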
@@ -1121,7 +1186,7 @@ it now points to 'skb->data + 14' and accessible range is [R5, R5 + 14 - 14)
 which is zero bytes.
 
 More complex packet access may look like:
-R0=imm1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
+R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
 6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
 7: r4 = *(u8 *)(r3 +12)
 8: r4 *= 14
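The pkt(id=...,off=...,r=...) states in this example follow the id-sharing rule from 'Register value tracking'. Below is a minimal sketch of both halves: propagating a new range to every packet pointer with the same id, then checking an access against it. `struct pkt_reg`, `mark_pkt_range()` and `pkt_access_ok()` are hypothetical names, and only the fall-through of a 'p > pkt_end' test is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down PTR_TO_PACKET state. */
struct pkt_reg {
	uint32_t id;	/* shared by pointers with the same variable offset */
	int32_t off;	/* fixed offset added on top of that variable offset */
	uint16_t range;	/* bytes known to be in the packet, from the same base */
};

/* Fall-through of 'if (p > pkt_end) goto ...': the bytes from p's base up
 * to p itself (p->off of them) are inside the packet, so every pointer
 * sharing p's id learns that range. Existing ranges are never shrunk.
 * Assumes 0 <= p->off <= 0xffff. */
static void mark_pkt_range(struct pkt_reg *regs, int n, const struct pkt_reg *p)
{
	uint32_t id = p->id;
	uint16_t new_range = (uint16_t)p->off;

	for (int i = 0; i < n; i++)
		if (regs[i].id == id && regs[i].range < new_range)
			regs[i].range = new_range;
}

/* A 'size'-byte access at reg + insn_off is allowed only if it ends
 * within the known range. */
static bool pkt_access_ok(const struct pkt_reg *reg, int insn_off, int size)
{
	int64_t start = (int64_t)reg->off + insn_off;

	return start >= 0 && size > 0 && start + size <= reg->range;
}
```

In the example, R3 (id=2, off=0) gets r=8 from the check on R2 (id=2, off=8), so insn 19's 'r1 = *(u8 *)(r3 +4)' passes, while R5's window [R5, R5 + 14 - 14) stays empty.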
@@ -1135,26 +1200,31 @@ More complex packet access may look like:
 16: r2 += 8
 17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
 18: if r2 > r1 goto pc+2
-R0=inv56 R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv52 R5=pkt(id=0,off=14,r=14) R10=fp
+R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
 19: r1 = *(u8 *)(r3 +4)
 The state of the register R3 is R3=pkt(id=2,off=0,r=8)
 id=2 means that two 'r3 += rX' instructions were seen, so r3 points to some
 offset within a packet and since the program author did
 'if (r3 + 8 > r1) goto err' at insn #18, the safe range is [R3, R3 + 8).
-The verifier only allows 'add' operation on packet registers. Any other
-operation will set the register state to 'unknown_value' and it won't be
+The verifier only allows 'add'/'sub' operations on packet registers. Any other
+operation will set the register state to 'SCALAR_VALUE' and it won't be
 available for direct packet access.
 Operation 'r3 += rX' may overflow and become less than original skb->data,
-therefore the verifier has to prevent that. So it tracks the number of
-upper zero bits in all 'uknown_value' registers, so when it sees
-'r3 += rX' instruction and rX is more than 16-bit value, it will error as:
-"cannot add integer value with N upper zero bits to ptr_to_packet"
+therefore the verifier has to prevent that. So when it sees 'r3 += rX'
+instruction and rX is more than 16-bit value, any subsequent bounds-check of r3
+against skb->data_end will not give us 'range' information, so attempts to read
+through the pointer will give "invalid access to packet" error.
 Ex. after insn 'r4 = *(u8 *)(r3 +12)' (insn #7 above) the state of r4 is
-R4=inv56 which means that upper 56 bits on the register are guaranteed
-to be zero. After insn 'r4 *= 14' the state becomes R4=inv52, since
-multiplying 8-bit value by constant 14 will keep upper 52 bits as zero.
-Similarly 'r2 >>= 48' will make R2=inv48, since the shift is not sign
-extending. This logic is implemented in evaluate_reg_alu() function.
+R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that upper 56 bits
+of the register are guaranteed to be zero, and nothing is known about the lower
+8 bits. After insn 'r4 *= 14' the state becomes
+R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
+value by constant 14 will keep upper 52 bits as zero, also the least significant
+bit will be zero as 14 is even. Similarly 'r2 >>= 48' will make
+R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is not sign
+extending. This logic is implemented in adjust_reg_min_max_vals() function,
+which calls adjust_ptr_min_max_vals() for adding pointer to scalar (or vice
+versa) and adjust_scalar_min_max_vals() for operations on two scalars.
 
 The end result is that bpf program author can access packet directly
 using normal C code as:
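The var_off=(value; mask) pairs in these register dumps are tnums, as defined under 'Register value tracking'. A minimal userspace sketch of the operations this document walks through is below. The helper names echo the tnum helpers declared in linux/tnum.h, but these bodies are an illustrative restatement written from the definitions in the text, not the kernel source; `tnum_is_aligned()` checks that every possible value is a multiple of a power-of-two size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tracked number: bits set in 'mask' are unknown; bits set in 'value' are
 * known to be 1; no bit is set in both. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tnum_make(uint64_t value, uint64_t mask)
{
	struct tnum t = { value, mask };
	return t;
}

/* A fully known constant. */
static struct tnum tnum_const(uint64_t v)
{
	return tnum_make(v, 0);
}

static struct tnum tnum_or(struct tnum a, struct tnum b)
{
	uint64_t v = a.value | b.value;

	/* a bit known to be 1 on either side is known to be 1 in the result */
	return tnum_make(v, (a.mask | b.mask) & ~v);
}

static struct tnum tnum_add(struct tnum a, struct tnum b)
{
	uint64_t sv = a.value + b.value;		/* unknown bits taken as 0 */
	uint64_t sigma = sv + a.mask + b.mask;	/* unknown bits taken as 1 */
	uint64_t chi = sigma ^ sv;		/* bits a carry may change */
	uint64_t mu = chi | a.mask | b.mask;

	return tnum_make(sv & ~mu, mu);
}

/* Logical (not sign-extending) right shift; requires shift < 64. */
static struct tnum tnum_rshift(struct tnum a, unsigned int shift)
{
	return tnum_make(a.value >> shift, a.mask >> shift);
}

/* True if every value the tnum can hold is a multiple of 'size' (a power of 2). */
static bool tnum_is_aligned(struct tnum a, uint64_t size)
{
	if (!size)
		return true;
	return !((a.value | a.mask) & (size - 1));
}
```

With these, a loaded byte (0x0; 0xff) OR 0x40 is (0x40; 0xbf), adding 1 gives (0x0; 0x1ff), and a fully unknown value shifted right by 48 is (0x0; 0xffff). For the alignment example, 14 + IHL*4 is (0x2; 0x7c), i.e. 4n+2, which is not 4-byte aligned, whereas adding NET_IP_ALIGN (2) yields a tnum that is.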
@@ -1214,6 +1284,22 @@ The map is defined by:
 . key size in bytes
 . value size in bytes
 
+Pruning
+-------
+The verifier does not actually walk all possible paths through the program. For
+each new branch to analyse, the verifier looks at all the states it's previously
+been in when at this instruction. If any of them contain the current state as a
+subset, the branch is 'pruned' - that is, the fact that the previous state was
+accepted implies the current state would be as well. For instance, if in the
+previous state, r1 held a packet-pointer, and in the current state, r1 holds a
+packet-pointer with a range as long or longer and at least as strict an
+alignment, then r1 is safe. Similarly, if r2 was NOT_INIT before then it can't
+have been used by any path from that point, so any value in r2 (including
+another NOT_INIT) is safe. The implementation is in the function regsafe().
+Pruning considers not only the registers but also the stack (and any spilled
+registers it may hold). They must all be safe for the branch to be pruned.
+This is implemented in states_equal().
+
 Understanding eBPF verifier messages
 ------------------------------------
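A minimal sketch of the register comparison described in 'Pruning' follows. The reduced `struct reg`, `reg_safe()` and `regs_safe()` are hypothetical: they cover only NOT_INIT, scalars with unsigned bounds, and packet pointers with a range and a known alignment (assumed to be at least 1), and they ignore ids and the stack, which regsafe() and states_equal() also have to handle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down register state for illustrating pruning. */
enum reg_kind { REG_NOT_INIT, REG_SCALAR, REG_PKT };

struct reg {
	enum reg_kind kind;
	uint64_t umin, umax;	/* REG_SCALAR: unsigned bounds */
	uint16_t range;		/* REG_PKT: bytes known safe */
	uint32_t align;		/* REG_PKT: known alignment in bytes, >= 1 */
};

/* Is 'cur' covered by 'old', a state already verified at this insn? */
static bool reg_safe(const struct reg *old, const struct reg *cur)
{
	switch (old->kind) {
	case REG_NOT_INIT:
		/* no path from here read it, so any value is fine */
		return true;
	case REG_SCALAR:
		/* cur's possible values must be a subset of old's */
		return cur->kind == REG_SCALAR &&
		       cur->umin >= old->umin && cur->umax <= old->umax;
	case REG_PKT:
		/* range as long or longer, alignment at least as strict */
		return cur->kind == REG_PKT && cur->range >= old->range &&
		       cur->align % old->align == 0;
	}
	return false;
}

/* Prune only if every register is safe. */
static bool regs_safe(const struct reg *old, const struct reg *cur, int n)
{
	for (int i = 0; i < n; i++)
		if (!reg_safe(&old[i], &cur[i]))
			return false;
	return true;
}
```

Note the asymmetry: a state whose scalar lies in [0, 255] is covered by one already verified with [0, 3570], but not the other way round.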

drivers/net/ethernet/netronome/nfp/bpf/verifier.c

Lines changed: 14 additions & 10 deletions
@@ -79,28 +79,32 @@ nfp_bpf_check_exit(struct nfp_prog *nfp_prog,
 		   const struct bpf_verifier_env *env)
 {
 	const struct bpf_reg_state *reg0 = &env->cur_state.regs[0];
+	u64 imm;
 
 	if (nfp_prog->act == NN_ACT_XDP)
 		return 0;
 
-	if (reg0->type != CONST_IMM) {
-		pr_info("unsupported exit state: %d, imm: %llx\n",
-			reg0->type, reg0->imm);
+	if (!(reg0->type == SCALAR_VALUE && tnum_is_const(reg0->var_off))) {
+		char tn_buf[48];
+
+		tnum_strn(tn_buf, sizeof(tn_buf), reg0->var_off);
+		pr_info("unsupported exit state: %d, var_off: %s\n",
+			reg0->type, tn_buf);
 		return -EINVAL;
 	}
 
-	if (nfp_prog->act != NN_ACT_DIRECT &&
-	    reg0->imm != 0 && (reg0->imm & ~0U) != ~0U) {
+	imm = reg0->var_off.value;
+	if (nfp_prog->act != NN_ACT_DIRECT && imm != 0 && (imm & ~0U) != ~0U) {
 		pr_info("unsupported exit state: %d, imm: %llx\n",
-			reg0->type, reg0->imm);
+			reg0->type, imm);
 		return -EINVAL;
 	}
 
-	if (nfp_prog->act == NN_ACT_DIRECT && reg0->imm <= TC_ACT_REDIRECT &&
-	    reg0->imm != TC_ACT_SHOT && reg0->imm != TC_ACT_STOLEN &&
-	    reg0->imm != TC_ACT_QUEUED) {
+	if (nfp_prog->act == NN_ACT_DIRECT && imm <= TC_ACT_REDIRECT &&
+	    imm != TC_ACT_SHOT && imm != TC_ACT_STOLEN &&
+	    imm != TC_ACT_QUEUED) {
 		pr_info("unsupported exit state: %d, imm: %llx\n",
-			reg0->type, reg0->imm);
+			reg0->type, imm);
 		return -EINVAL;
 	}
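The rewritten nfp_bpf_check_exit() accepts R0 only when it is a SCALAR_VALUE whose var_off is constant (a tnum with no unknown bits), and then applies the same return-code rules to that constant. A minimal userspace sketch of those rules is below. `exit_state_ok()` and its `direct` flag (standing for nfp_prog->act == NN_ACT_DIRECT) are hypothetical, the register-type test and the XDP early return are left out, and the TC_ACT_* values are copied from linux/pkt_cls.h so the sketch builds without kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in <linux/pkt_cls.h>, repeated so no kernel headers are needed. */
#define TC_ACT_SHOT	2
#define TC_ACT_STOLEN	4
#define TC_ACT_QUEUED	5
#define TC_ACT_REDIRECT	7

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* A tnum with no unknown bits describes exactly one value. */
static bool tnum_is_const(struct tnum a)
{
	return a.mask == 0;
}

/* Restatement of the exit checks above, applied to R0's var_off only. */
static bool exit_state_ok(struct tnum r0, bool direct)
{
	uint64_t imm;

	if (!tnum_is_const(r0))
		return false;		/* the value is not known exactly */
	imm = r0.value;
	/* non-direct action: only 0 or a value whose low 32 bits are all 1s */
	if (!direct && imm != 0 && (imm & ~0U) != ~0U)
		return false;
	/* direct action: small codes other than SHOT/STOLEN/QUEUED refused */
	if (direct && imm <= TC_ACT_REDIRECT &&
	    imm != TC_ACT_SHOT && imm != TC_ACT_STOLEN && imm != TC_ACT_QUEUED)
		return false;
	return true;
}
```

An R0 loaded from a packet byte, var_off (0x0; 0xff), fails the constant test even if every value it could hold would be acceptable on its own.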

include/linux/bpf.h

Lines changed: 12 additions & 22 deletions
@@ -117,35 +117,25 @@ enum bpf_access_type {
 };
 
 /* types of values stored in eBPF registers */
+/* Pointer types represent:
+ * pointer
+ * pointer + imm
+ * pointer + (u16) var
+ * pointer + (u16) var + imm
+ * if (range > 0) then [ptr, ptr + range - off) is safe to access
+ * if (id > 0) means that some 'var' was added
+ * if (off > 0) means that 'imm' was added
+ */
 enum bpf_reg_type {
 	NOT_INIT = 0, /* nothing was written into register */
-	UNKNOWN_VALUE, /* reg doesn't contain a valid pointer */
+	SCALAR_VALUE, /* reg doesn't contain a valid pointer */
 	PTR_TO_CTX, /* reg points to bpf_context */
 	CONST_PTR_TO_MAP, /* reg points to struct bpf_map */
 	PTR_TO_MAP_VALUE, /* reg points to map element value */
 	PTR_TO_MAP_VALUE_OR_NULL,/* points to map elem value or NULL */
-	FRAME_PTR, /* reg == frame_pointer */
-	PTR_TO_STACK, /* reg == frame_pointer + imm */
-	CONST_IMM, /* constant integer value */
-
-	/* PTR_TO_PACKET represents:
-	 * skb->data
-	 * skb->data + imm
-	 * skb->data + (u16) var
-	 * skb->data + (u16) var + imm
-	 * if (range > 0) then [ptr, ptr + range - off) is safe to access
-	 * if (id > 0) means that some 'var' was added
-	 * if (off > 0) menas that 'imm' was added
-	 */
-	PTR_TO_PACKET,
+	PTR_TO_STACK, /* reg == frame_pointer + offset */
+	PTR_TO_PACKET, /* reg points to skb->data */
 	PTR_TO_PACKET_END, /* skb->data + headlen */
-
-	/* PTR_TO_MAP_VALUE_ADJ is used for doing pointer math inside of a map
-	 * elem value. We only allow this if we can statically verify that
-	 * access from this register are going to fall within the size of the
-	 * map element.
-	 */
-	PTR_TO_MAP_VALUE_ADJ,
 };
 
 struct bpf_prog;
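The comment above describes a pointer as pointer + (u16) var + imm. A minimal sketch of how the two kinds of addition might update such a state follows, based on the merge description: constants go into the fixed offset unless that would overflow, while anything else gets a new 'id' and, for PTR_TO_PACKET, loses its 'range'. `struct pkt_ptr`, `next_id`, `ptr_add_const()` and `ptr_add_var()` are hypothetical, and the variable offset itself (var_off and the min/max bounds) is not modelled.

```c
#include <stdint.h>

/* Hypothetical packet-pointer state: base + (variable part) + off. */
struct pkt_ptr {
	uint32_t id;	/* > 0 once a variable has been added */
	int32_t off;	/* fixed part: sum of known constants */
	uint16_t range;	/* safe bytes, measured from the variable-offset base */
};

static uint32_t next_id = 1;	/* hypothetical id allocator */

/* Adding a value that is not exactly known: the pointer now has a new
 * variable offset, so it gets a fresh id and no known-safe range. */
static struct pkt_ptr ptr_add_var(struct pkt_ptr p)
{
	p.id = next_id++;
	p.range = 0;
	return p;
}

/* Adding a known constant: fold it into the fixed offset unless the sum
 * would leave the s32 range, in which case treat it like an unknown value
 * (off is left unchanged because var_off is not modelled here). */
static struct pkt_ptr ptr_add_const(struct pkt_ptr p, int32_t imm)
{
	int64_t sum = (int64_t)p.off + imm;

	if (sum < INT32_MIN || sum > INT32_MAX)
		return ptr_add_var(p);
	p.off = (int32_t)sum;	/* id and range are unchanged */
	return p;
}
```

Starting from R3=pkt(id=0,off=0,r=14), adding 14 gives id=0, off=14, r=14, matching the R5 state shown in the filter.txt example.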

include/linux/bpf_verifier.h

Lines changed: 34 additions & 21 deletions
@@ -9,41 +9,54 @@
 
 #include <linux/bpf.h> /* for enum bpf_reg_type */
 #include <linux/filter.h> /* for MAX_BPF_STACK */
+#include <linux/tnum.h>
 
-/* Just some arbitrary values so we can safely do math without overflowing and
- * are obviously wrong for any sort of memory access.
- */
-#define BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024)
-#define BPF_REGISTER_MIN_RANGE -1
+/* Maximum variable offset umax_value permitted when resolving memory accesses.
+ * In practice this is far bigger than any realistic pointer offset; this limit
+ * ensures that umax_value + (int)off + (int)size cannot overflow a u64.
+ */
+#define BPF_MAX_VAR_OFF (1ULL << 31)
+/* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO]. This ensures
+ * that converting umax_value to int cannot overflow.
+ */
+#define BPF_MAX_VAR_SIZ INT_MAX
 
 struct bpf_reg_state {
 	enum bpf_reg_type type;
 	union {
-		/* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */
-		s64 imm;
-
-		/* valid when type == PTR_TO_PACKET* */
-		struct {
-			u16 off;
-			u16 range;
-		};
+		/* valid when type == PTR_TO_PACKET */
+		u16 range;
 
 		/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
 		 * PTR_TO_MAP_VALUE_OR_NULL
 		 */
 		struct bpf_map *map_ptr;
 	};
+	/* Fixed part of pointer offset, pointer types only */
+	s32 off;
+	/* For PTR_TO_PACKET, used to find other pointers with the same variable
+	 * offset, so they can share range knowledge.
+	 * For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
+	 * came from, when one is tested for != NULL.
+	 */
 	u32 id;
+	/* These five fields must be last. See states_equal() */
+	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
+	 * the actual value.
+	 * For pointer types, this represents the variable part of the offset
+	 * from the pointed-to object, and is shared with all bpf_reg_states
+	 * with the same id as us.
+	 */
+	struct tnum var_off;
 	/* Used to determine if any memory access using this register will
-	 * result in a bad access. These two fields must be last.
-	 * See states_equal()
+	 * result in a bad access.
+	 * These refer to the same value as var_off, not necessarily the actual
+	 * contents of the register.
 	 */
-	s64 min_value;
-	u64 max_value;
-	u32 min_align;
-	u32 aux_off;
-	u32 aux_off_align;
-	bool value_from_signed;
+	s64 smin_value; /* minimum possible (s64)value */
+	s64 smax_value; /* maximum possible (s64)value */
+	u64 umin_value; /* minimum possible (u64)value */
+	u64 umax_value; /* maximum possible (u64)value */
 };
 
 enum bpf_stack_slot_type {
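The BPF_MAX_VAR_OFF comment argues that capping umax_value keeps umax_value + (int)off + (int)size from overflowing. A minimal sketch of a bounds check built on that cap is below. `var_access_ok()` and its parameters are hypothetical; it checks both ends of the variable range against an object size and uses signed 64-bit arithmetic so that a negative fixed offset is handled naturally.

```c
#include <stdbool.h>
#include <stdint.h>

#define BPF_MAX_VAR_OFF (1ULL << 31)	/* as defined in the hunk above */

/* Hypothetical bounds check for a 'size'-byte access to an object of
 * 'obj_size' bytes, through a pointer whose variable offset lies in
 * [umin, umax] and whose fixed plus instruction offset is 'off'.
 * Assumes umin <= umax. */
static bool var_access_ok(uint64_t umin, uint64_t umax, int32_t off,
			  int32_t size, uint32_t obj_size)
{
	int64_t lo, hi;

	if (umax >= BPF_MAX_VAR_OFF || size <= 0)
		return false;
	/* umax < 2^31, and |off| and size are below 2^31 too, so neither
	 * sum can get anywhere near overflowing 64 bits */
	lo = (int64_t)umin + off;
	hi = (int64_t)umax + off + size;
	return lo >= 0 && hi <= (int64_t)obj_size;
}
```

For a 64-byte map value, a 4-byte access with a variable offset up to 60 fits exactly, while one with umax_value of 61 runs one byte past the end.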
