
Commit 1e0bd5a

anakryiko authored and borkmann committed
bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails
Commit 92117d8 ("bpf: fix refcnt overflow") turned refcounting of bpf_map into a potentially failing operation once the refcount reaches the BPF_MAX_REFCNT limit (32k). Due to the 32-bit counter, it is possible in practice to overflow the refcounter and make it wrap around to 0, causing an erroneous map free while there are still references to it, leading to use-after-free problems.

But a failing refcounting operation is problematic in some cases. One example is the mmap() interface. After establishing an initial memory mapping, the user is allowed to arbitrarily map/remap/unmap parts of the mapped memory, arbitrarily splitting it into multiple non-contiguous regions. All of this happens without any control from the users of the mmap subsystem. Instead, the mmap subsystem sends notifications to the original creator of the memory mapping through open/close callbacks, which can optionally be specified when the memory mapping is created. These callbacks are used to maintain an accurate refcount for the bpf_map (see the next patch in this series). The problem is that the open() callback is not supposed to fail, because the memory-mapped resource is already set up and properly referenced. This poses a problem for using memory-mapping with BPF maps.

One solution is to maintain a separate refcount for just the memory mappings and do a single bpf_map_inc/bpf_map_put when it goes from/to zero, respectively. There are similar use cases in the current work on tcp-bpf, necessitating an extra counter as well. This seems like a rather unfortunate and ugly solution that doesn't scale well to new use cases.

Another approach is to use the non-failing refcount_t type, which uses a 32-bit counter internally but, once it reaches the overflow state at UINT_MAX, stays there. This ultimately causes a memory leak, but prevents use-after-free.

But given that refcounting is not the most performance-critical operation with BPF maps (it is not used from running BPF program code), we can also just switch to a 64-bit counter that can't overflow in practice, potentially disadvantaging 32-bit platforms a tiny bit. This simplifies the semantics and lets the scenarios described above not worry about a failing refcount increment operation.
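To make the mmap() scenario concrete, here is a minimal sketch; it is not part of this commit (the actual mmap support for BPF maps arrives in the next patch of the series), and the callback and structure names below are illustrative only. It shows why the refcount bump taken from the vm_operations_struct ->open() callback must be infallible:

	/* Illustrative sketch only: vm_ops callbacks for a memory-mapped bpf_map.
	 * Assumes <linux/mm.h> and <linux/bpf.h>; the names bpf_map_mmap_open/
	 * bpf_map_mmap_close/bpf_map_default_vmops are hypothetical.
	 *
	 * ->open() runs whenever an existing VMA is duplicated or split and has
	 * no way to report an error, so the reference taken here must not fail.
	 */
	static void bpf_map_mmap_open(struct vm_area_struct *vma)
	{
		struct bpf_map *map = vma->vm_file->private_data;

		bpf_map_inc(map);	/* cannot fail with the atomic64_t counter */
	}

	static void bpf_map_mmap_close(struct vm_area_struct *vma)
	{
		struct bpf_map *map = vma->vm_file->private_data;

		bpf_map_put(map);
	}

	static const struct vm_operations_struct bpf_map_default_vmops = {
		.open	= bpf_map_mmap_open,
		.close	= bpf_map_mmap_close,
	};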
In terms of struct bpf_map size, we are still good and use the same amount of space:

BEFORE (3 cache lines, 8 bytes of padding at the end):

struct bpf_map {
	const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /*     0     8 */
	struct bpf_map *           inner_map_meta;       /*     8     8 */
	void *                     security;             /*    16     8 */
	enum bpf_map_type          map_type;             /*    24     4 */
	u32                        key_size;             /*    28     4 */
	u32                        value_size;           /*    32     4 */
	u32                        max_entries;          /*    36     4 */
	u32                        map_flags;            /*    40     4 */
	int                        spin_lock_off;        /*    44     4 */
	u32                        id;                   /*    48     4 */
	int                        numa_node;            /*    52     4 */
	u32                        btf_key_type_id;      /*    56     4 */
	u32                        btf_value_type_id;    /*    60     4 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct btf *               btf;                  /*    64     8 */
	struct bpf_map_memory      memory;               /*    72    16 */
	bool                       unpriv_array;         /*    88     1 */
	bool                       frozen;               /*    89     1 */

	/* XXX 38 bytes hole, try to pack */

	/* --- cacheline 2 boundary (128 bytes) --- */
	atomic_t                   refcnt __attribute__((__aligned__(64))); /*   128     4 */
	atomic_t                   usercnt;              /*   132     4 */
	struct work_struct         work;                 /*   136    32 */
	char                       name[16];             /*   168    16 */

	/* size: 192, cachelines: 3, members: 21 */
	/* sum members: 146, holes: 1, sum holes: 38 */
	/* padding: 8 */
	/* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));

AFTER (same 3 cache lines, no extra padding now):

struct bpf_map {
	const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /*     0     8 */
	struct bpf_map *           inner_map_meta;       /*     8     8 */
	void *                     security;             /*    16     8 */
	enum bpf_map_type          map_type;             /*    24     4 */
	u32                        key_size;             /*    28     4 */
	u32                        value_size;           /*    32     4 */
	u32                        max_entries;          /*    36     4 */
	u32                        map_flags;            /*    40     4 */
	int                        spin_lock_off;        /*    44     4 */
	u32                        id;                   /*    48     4 */
	int                        numa_node;            /*    52     4 */
	u32                        btf_key_type_id;      /*    56     4 */
	u32                        btf_value_type_id;    /*    60     4 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct btf *               btf;                  /*    64     8 */
	struct bpf_map_memory      memory;               /*    72    16 */
	bool                       unpriv_array;         /*    88     1 */
	bool                       frozen;               /*    89     1 */

	/* XXX 38 bytes hole, try to pack */

	/* --- cacheline 2 boundary (128 bytes) --- */
	atomic64_t                 refcnt __attribute__((__aligned__(64))); /*   128     8 */
	atomic64_t                 usercnt;              /*   136     8 */
	struct work_struct         work;                 /*   144    32 */
	char                       name[16];             /*   176    16 */

	/* size: 192, cachelines: 3, members: 21 */
	/* sum members: 154, holes: 1, sum holes: 38 */
	/* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));

This patch, while modifying all users of bpf_map_inc, also cleans up its interface to mirror the put side: bpf_map_inc() and bpf_map_inc_with_uref() now correspond to bpf_map_put() and bpf_map_put_with_uref(), respectively. Also, given there are no users of bpf_map_inc_not_zero specifying uref=true, the uref flag is removed and uref=false is used internally by default.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
1 parent 2893c99 · commit 1e0bd5a

8 files changed: +34 / -49 lines

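For callers, the net effect of the interface change visible throughout the diffs below is that taking a map reference no longer needs an error path. Schematically (a sketch of the before/after calling convention, not a compilable unit):

	/* Before: taking a reference could fail at BPF_MAX_REFCNT. */
	map = bpf_map_inc(map, false /* uref */);
	if (IS_ERR(map))
		return PTR_ERR(map);

	/* After: the 64-bit counter cannot realistically overflow, so the
	 * increment is unconditional and returns nothing.
	 */
	bpf_map_inc(map);		/* bumps refcnt only */
	bpf_map_inc_with_uref(map);	/* bumps refcnt and usercnt */

	/* Releases stay symmetric with the put side: */
	bpf_map_put(map);
	bpf_map_put_with_uref(map);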

drivers/net/ethernet/netronome/nfp/bpf/offload.c

Lines changed: 1 addition & 3 deletions
@@ -46,9 +46,7 @@ nfp_map_ptr_record(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog,
 	/* Grab a single ref to the map for our record.  The prog destroy ndo
 	 * happens after free_used_maps().
 	 */
-	map = bpf_map_inc(map, false);
-	if (IS_ERR(map))
-		return PTR_ERR(map);
+	bpf_map_inc(map);
 
 	record = kmalloc(sizeof(*record), GFP_KERNEL);
 	if (!record) {

include/linux/bpf.h

Lines changed: 5 additions & 5 deletions
@@ -103,8 +103,8 @@ struct bpf_map {
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
 	 */
-	atomic_t refcnt ____cacheline_aligned;
-	atomic_t usercnt;
+	atomic64_t refcnt ____cacheline_aligned;
+	atomic64_t usercnt;
 	struct work_struct work;
 	char name[BPF_OBJ_NAME_LEN];
 };
@@ -783,9 +783,9 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
 
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
 struct bpf_map *__bpf_map_get(struct fd f);
-struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
-struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map,
-						   bool uref);
+void bpf_map_inc(struct bpf_map *map);
+void bpf_map_inc_with_uref(struct bpf_map *map);
+struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
 int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);

kernel/bpf/inode.c

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ static void *bpf_any_get(void *raw, enum bpf_type type)
 		raw = bpf_prog_inc(raw);
 		break;
 	case BPF_TYPE_MAP:
-		raw = bpf_map_inc(raw, true);
+		bpf_map_inc_with_uref(raw);
 		break;
 	default:
 		WARN_ON_ONCE(1);

kernel/bpf/map_in_map.c

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
 		return inner_map;
 
 	if (bpf_map_meta_equal(map->inner_map_meta, inner_map))
-		inner_map = bpf_map_inc(inner_map, false);
+		bpf_map_inc(inner_map);
 	else
 		inner_map = ERR_PTR(-EINVAL);
 
kernel/bpf/syscall.c

Lines changed: 22 additions & 29 deletions
@@ -311,7 +311,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 
 static void bpf_map_put_uref(struct bpf_map *map)
 {
-	if (atomic_dec_and_test(&map->usercnt)) {
+	if (atomic64_dec_and_test(&map->usercnt)) {
 		if (map->ops->map_release_uref)
 			map->ops->map_release_uref(map);
 	}
@@ -322,7 +322,7 @@ static void bpf_map_put_uref(struct bpf_map *map)
  */
 static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 {
-	if (atomic_dec_and_test(&map->refcnt)) {
+	if (atomic64_dec_and_test(&map->refcnt)) {
 		/* bpf_map_free_id() must be called first */
 		bpf_map_free_id(map, do_idr_lock);
 		btf_put(map->btf);
@@ -575,8 +575,8 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map;
 
-	atomic_set(&map->refcnt, 1);
-	atomic_set(&map->usercnt, 1);
+	atomic64_set(&map->refcnt, 1);
+	atomic64_set(&map->usercnt, 1);
 
 	if (attr->btf_key_type_id || attr->btf_value_type_id) {
 		struct btf *btf;
@@ -653,21 +653,19 @@ struct bpf_map *__bpf_map_get(struct fd f)
 	return f.file->private_data;
 }
 
-/* prog's and map's refcnt limit */
-#define BPF_MAX_REFCNT 32768
-
-struct bpf_map *bpf_map_inc(struct bpf_map *map, bool uref)
+void bpf_map_inc(struct bpf_map *map)
 {
-	if (atomic_inc_return(&map->refcnt) > BPF_MAX_REFCNT) {
-		atomic_dec(&map->refcnt);
-		return ERR_PTR(-EBUSY);
-	}
-	if (uref)
-		atomic_inc(&map->usercnt);
-	return map;
+	atomic64_inc(&map->refcnt);
 }
 EXPORT_SYMBOL_GPL(bpf_map_inc);
 
+void bpf_map_inc_with_uref(struct bpf_map *map)
+{
+	atomic64_inc(&map->refcnt);
+	atomic64_inc(&map->usercnt);
+}
+EXPORT_SYMBOL_GPL(bpf_map_inc_with_uref);
+
 struct bpf_map *bpf_map_get_with_uref(u32 ufd)
 {
 	struct fd f = fdget(ufd);
@@ -677,38 +675,30 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
 	if (IS_ERR(map))
 		return map;
 
-	map = bpf_map_inc(map, true);
+	bpf_map_inc_with_uref(map);
 	fdput(f);
 
 	return map;
 }
 
 /* map_idr_lock should have been held */
-static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map,
-					      bool uref)
+static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
 {
 	int refold;
 
-	refold = atomic_fetch_add_unless(&map->refcnt, 1, 0);
-
-	if (refold >= BPF_MAX_REFCNT) {
-		__bpf_map_put(map, false);
-		return ERR_PTR(-EBUSY);
-	}
-
+	refold = atomic64_fetch_add_unless(&map->refcnt, 1, 0);
 	if (!refold)
 		return ERR_PTR(-ENOENT);
-
 	if (uref)
-		atomic_inc(&map->usercnt);
+		atomic64_inc(&map->usercnt);
 
 	return map;
 }
 
-struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
+struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map)
 {
 	spin_lock_bh(&map_idr_lock);
-	map = __bpf_map_inc_not_zero(map, uref);
+	map = __bpf_map_inc_not_zero(map, false);
 	spin_unlock_bh(&map_idr_lock);
 
 	return map;
@@ -1455,6 +1445,9 @@ static struct bpf_prog *____bpf_prog_get(struct fd f)
 	return f.file->private_data;
 }
 
+/* prog's refcnt limit */
+#define BPF_MAX_REFCNT 32768
+
 struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
 {
 	if (atomic_add_return(i, &prog->aux->refcnt) > BPF_MAX_REFCNT) {
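The one increment that remains conditional is bpf_map_inc_not_zero(), used when a map is looked up by ID under map_idr_lock: it must not revive a map whose refcount has already dropped to zero. A minimal sketch of the pattern the __bpf_map_inc_not_zero() hunk above relies on (the helper name here is made up for illustration):

	/* atomic64_fetch_add_unless(v, a, u) returns the old value of *v and
	 * adds 'a' only if *v was not equal to 'u' -- here, not zero.
	 */
	static struct bpf_map *try_take_map_ref(struct bpf_map *map)
	{
		s64 refold = atomic64_fetch_add_unless(&map->refcnt, 1, 0);

		if (!refold)		/* map is already on its way to being freed */
			return ERR_PTR(-ENOENT);
		return map;		/* reference successfully taken */
	}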

kernel/bpf/verifier.c

Lines changed: 1 addition & 5 deletions
@@ -8179,11 +8179,7 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 			 * will be used by the valid program until it's unloaded
 			 * and all maps are released in free_used_maps()
 			 */
-			map = bpf_map_inc(map, false);
-			if (IS_ERR(map)) {
-				fdput(f);
-				return PTR_ERR(map);
-			}
+			bpf_map_inc(map);
 
 			aux->map_index = env->used_map_cnt;
 			env->used_maps[env->used_map_cnt++] = map;

kernel/bpf/xskmap.c

Lines changed: 2 additions & 4 deletions
@@ -11,10 +11,8 @@
 
 int xsk_map_inc(struct xsk_map *map)
 {
-	struct bpf_map *m = &map->map;
-
-	m = bpf_map_inc(m, false);
-	return PTR_ERR_OR_ZERO(m);
+	bpf_map_inc(&map->map);
+	return 0;
 }
 
 void xsk_map_put(struct xsk_map *map)

net/core/bpf_sk_storage.c

Lines changed: 1 addition & 1 deletion
@@ -798,7 +798,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 		 * Try to grab map refcnt to make sure that it's still
 		 * alive and prevent concurrent removal.
 		 */
-		map = bpf_map_inc_not_zero(&smap->map, false);
+		map = bpf_map_inc_not_zero(&smap->map);
 		if (IS_ERR(map))
 			continue;
 
