Commit 226bc6a

Author: Martin KaFai Lau
Merge branch 'Transit between BPF TCP congestion controls.'
Kui-Feng Lee says:

====================
Major changes:

 - Create bpf_links in the kernel for BPF struct_ops to register and
   unregister them.

 - Enable switching between implementations of bpf-tcp-cc under a name
   instantly by replacing the backing struct_ops map of a bpf_link.

Previously, a BPF struct_ops stayed registered even after the user
program that created it had terminated, although none of them were ever
pinned. For instance, the TCP congestion control subsystem indirectly
maintains a reference count on the struct_ops of any registered
BPF-implemented algorithm. Thus, the algorithm won't be deactivated
until someone deliberately unregisters it. For compatibility with other
BPF programs, bpf_links have been created to work in coordination with
struct_ops maps. This ensures that the registration and unregistration
of these maps are carried out at the start and end of the bpf_link.

We also faced complications when attempting to replace an existing TCP
congestion control algorithm with a new implementation on the fly. A
struct_ops map was used to register a TCP congestion control algorithm
with a unique name. We had to either register the alternative
implementation under a new name and move over, or unregister the
current one before being able to reregister with the same name. To fix
this problem, we add an option to migrate the registration of the
algorithm from struct_ops maps to bpf_links. By modifying the backing
map of a bpf_link, it becomes possible to replace an existing TCP
congestion control algorithm with ease.

---

The major differences from v11:

 - Fix incorrectly setting both old_prog_fd and old_map_fd.

The major differences from v10:

 - Add old_map_fd as an additional field instead of a union in
   bpf_link_update_opts.

The major differences from v9:

 - Add test case for BPF_F_LINK. Includes adding old_map_fd to struct
   bpf_link_update_opts in patch 6.

 - Return -EPERM instead of -EINVAL when the old map fd doesn't match
   with BPF_F_LINK.

 - Fix -EBUSY case in bpf_map__attach_struct_ops().

The major differences from v8:

 - Check bpf_struct_ops::{validate,update} in
   bpf_struct_ops_map_alloc().

The major differences from v7:

 - Use synchronize_rcu_mult(call_rcu, call_rcu_tasks) to replace
   synchronize_rcu() and synchronize_rcu_tasks().

 - Call synchronize_rcu() in tcp_update_congestion_control().

 - Handle -EBUSY in bpf_map__attach_struct_ops() to allow a struct_ops
   to be used to create links more than once. Include a test case.

 - Add old_map_fd to bpf_attr and handle BPF_F_REPLACE in
   bpf_struct_ops_map_link_update().

 - Remove changes in bpf_dummy_struct_ops.c and add a check of the
   .update function pointer of bpf_struct_ops.

The major differences from v6:

 - Reword commit logs of patches 1, 2, and 8.

 - Call synchronize_rcu_tasks() as well in bpf_struct_ops_map_free().

 - Refactor bpf_struct_ops_map_free() so that
   bpf_struct_ops_map_alloc() can free a struct_ops without waiting
   for an RCU grace period.

The major differences from v5:

 - Add a new step to bpf_object__load() to prepare vdata.

 - Accept BPF_F_REPLACE.

 - Check section IDs in find_struct_ops_map_by_offset().

 - Add a test case to check mixing w/ and w/o link struct_ops.

 - Add a test case of using struct_ops w/o link to update a link.

 - Improve bpf_link__detach_struct_ops() to handle the w/ link case.

The major differences from v4:

 - Rebase.

 - Reorder patches and merge part 4 into part 2 of v4.

The major differences from v3:

 - Remove bpf_struct_ops_map_free_rcu(), and use synchronize_rcu().

 - Improve the commit log of part 1.

 - Before transitioning to the READY state, we conduct a value check
   to ensure that struct_ops can be successfully utilized and links
   created later.

The major differences from v2:

 - Simplify states

   - Remove TOBEUNREG.

   - Rename UNREG to READY.

 - Stop using the refcnt of the kvalue of a struct_ops. Explicitly
   increase and decrease the refcount of struct_ops.

 - Prepare kernel vdata during the load phase of libbpf.

The major differences from v1:

 - Added bpf_struct_ops_link to replace the previous union-based
   approach.

 - Added UNREG and TOBEUNREG to the state of bpf_struct_ops_map.

 - bpf_struct_ops_transit_state() maintains state transitions.

 - Fixed synchronization issue.

 - Prepare kernel vdata of struct_ops during the loading phase of
   bpf_object.

 - Merged previous patch 3 into patch 1.

v11: https://lore.kernel.org/all/[email protected]/
v10: https://lore.kernel.org/all/[email protected]/
v9: https://lore.kernel.org/all/[email protected]/
v8: https://lore.kernel.org/all/[email protected]/
v7: https://lore.kernel.org/all/[email protected]/
v6: https://lore.kernel.org/all/[email protected]/
v5: https://lore.kernel.org/all/[email protected]/
v4: https://lore.kernel.org/all/[email protected]/
v3: https://lore.kernel.org/all/[email protected]/
v2: https://lore.kernel.org/bpf/[email protected]/
v1: https://lore.kernel.org/bpf/[email protected]/
====================

Signed-off-by: Martin KaFai Lau <[email protected]>
2 parents b63cbc4 + 06da9f3 commit 226bc6a
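
The lifecycle described in the cover letter (register when the link is created, swap the backing map in place, unregister when the link goes away) can be pictured with a small userspace model. This is only a sketch: `model_map`, `model_link`, `MODEL_F_REPLACE` and the three functions are hypothetical names invented for illustration, and the error codes other than the -EPERM mentioned in the v9 notes are assumptions, not the kernel's behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these types are hypothetical and are not the kernel's
 * bpf_link or struct_ops map implementation. */
#define MODEL_F_REPLACE 1u		/* stands in for BPF_F_REPLACE */

struct model_map {			/* stands in for a struct_ops map */
	const char *cc_name;		/* algorithm name, e.g. "bpf_cubic" */
	int registered;			/* 1 while a link keeps it registered */
};

struct model_link {			/* stands in for a struct_ops bpf_link */
	struct model_map *map;		/* backing map, NULL once destroyed */
};

/* Creating the link registers the map. */
static int model_link_create(struct model_link *link, struct model_map *map)
{
	assert(link && map);
	if (map->registered)
		return -EBUSY;		/* assumed error code for this model */
	map->registered = 1;
	link->map = map;
	return 0;
}

/* Swap the backing map. With MODEL_F_REPLACE the caller names the map it
 * expects to replace, and a mismatch is refused with -EPERM. */
static int model_link_update(struct model_link *link, struct model_map *new_map,
			     struct model_map *expected_old, unsigned int flags)
{
	assert(link && new_map);
	if (!link->map)
		return -ENOLINK;	/* assumed error code for this model */
	if ((flags & MODEL_F_REPLACE) && link->map != expected_old)
		return -EPERM;
	if (strcmp(link->map->cc_name, new_map->cc_name) != 0)
		return -EINVAL;		/* the name must stay the same */
	if (new_map->registered)
		return -EBUSY;
	new_map->registered = 1;	/* new implementation goes live first */
	link->map->registered = 0;	/* then the old one is retired */
	link->map = new_map;
	return 0;
}

/* Destroying the link unregisters whatever map currently backs it. */
static void model_link_destroy(struct model_link *link)
{
	assert(link);
	if (link->map)
		link->map->registered = 0;
	link->map = NULL;
}
```

In this model, two maps that both carry the name "bpf_cubic" can be exchanged with a single `model_link_update()` call, so the name never has to be unregistered and registered again.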

15 files changed, +817 -103 lines changed

include/linux/bpf.h

Lines changed: 11 additions & 0 deletions
@@ -1476,6 +1476,8 @@ struct bpf_link_ops {
 	void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq);
 	int (*fill_link_info)(const struct bpf_link *link,
 			      struct bpf_link_info *info);
+	int (*update_map)(struct bpf_link *link, struct bpf_map *new_map,
+			  struct bpf_map *old_map);
 };
 
 struct bpf_tramp_link {
@@ -1518,6 +1520,8 @@ struct bpf_struct_ops {
 			   void *kdata, const void *udata);
 	int (*reg)(void *kdata);
 	void (*unreg)(void *kdata);
+	int (*update)(void *kdata, void *old_kdata);
+	int (*validate)(void *kdata);
 	const struct btf_type *type;
 	const struct btf_type *value_type;
 	const char *name;
@@ -1552,6 +1556,7 @@ static inline void bpf_module_put(const void *data, struct module *owner)
 	else
 		module_put(owner);
 }
+int bpf_struct_ops_link_create(union bpf_attr *attr);
 
 #ifdef CONFIG_NET
 /* Define it here to avoid the use of forward declaration */
@@ -1592,6 +1597,11 @@ static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map,
 {
 	return -EINVAL;
 }
+static inline int bpf_struct_ops_link_create(union bpf_attr *attr)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
@@ -1945,6 +1955,7 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd);
 struct bpf_map *__bpf_map_get(struct fd f);
 void bpf_map_inc(struct bpf_map *map);
 void bpf_map_inc_with_uref(struct bpf_map *map);
+struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref);
 struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
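
The two new callbacks, `update` and `validate`, are optional members of the callback table, and the v8 notes say they are checked when a map is allocated. A minimal userspace sketch of that kind of capability check might look as follows. `model_struct_ops`, `MODEL_F_LINK`, the no-op callbacks and `model_map_alloc_check()` are hypothetical names, and returning -EOPNOTSUPP is an assumption made for this sketch rather than something stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the callback table extended by the hunk above;
 * it is not the kernel's struct bpf_struct_ops. */
struct model_struct_ops {
	int  (*reg)(void *kdata);
	void (*unreg)(void *kdata);
	int  (*update)(void *kdata, void *old_kdata);	/* new, optional */
	int  (*validate)(void *kdata);			/* new, optional */
};

#define MODEL_F_LINK (1u << 13)		/* same bit position as BPF_F_LINK */

/* No-op callbacks so the two tables below can be filled in. */
static int noop_reg(void *kdata) { (void)kdata; return 0; }
static void noop_unreg(void *kdata) { (void)kdata; }
static int noop_update(void *kdata, void *old_kdata)
{
	(void)kdata;
	(void)old_kdata;
	return 0;
}
static int noop_validate(void *kdata) { (void)kdata; return 0; }

/* A subsystem that only knows the original reg/unreg pair. */
static const struct model_struct_ops legacy_ops = {
	.reg = noop_reg,
	.unreg = noop_unreg,
};

/* A subsystem that also supports in-place replacement. */
static const struct model_struct_ops link_ready_ops = {
	.reg = noop_reg,
	.unreg = noop_unreg,
	.update = noop_update,
	.validate = noop_validate,
};

/* Refuse link-managed maps for subsystems lacking update or validate. */
static int model_map_alloc_check(const struct model_struct_ops *ops,
				 unsigned int map_flags)
{
	assert(ops);
	if ((map_flags & MODEL_F_LINK) && (!ops->update || !ops->validate))
		return -EOPNOTSUPP;	/* assumed error code */
	return 0;
}
```

Checking at allocation time means a map that could never back a link is rejected early, before any link is created from it.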

include/net/tcp.h

Lines changed: 3 additions & 0 deletions
@@ -1117,6 +1117,9 @@ struct tcp_congestion_ops {
 
 int tcp_register_congestion_control(struct tcp_congestion_ops *type);
 void tcp_unregister_congestion_control(struct tcp_congestion_ops *type);
+int tcp_update_congestion_control(struct tcp_congestion_ops *type,
+				  struct tcp_congestion_ops *old_type);
+int tcp_validate_congestion_control(struct tcp_congestion_ops *ca);
 
 void tcp_assign_congestion_control(struct sock *sk);
 void tcp_init_congestion_control(struct sock *sk);
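
To illustrate what a validate-then-update pair buys over unregister-then-register, here is a toy name-keyed registry in plain C. It is a sketch under stated assumptions: `model_cc_ops`, the table, the `demo_*` hooks and every function below are hypothetical, the set of mandatory hooks is chosen for the example, and the locking and RCU synchronization mentioned in the v7 notes are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_CC 8

/* Hypothetical, trimmed-down stand-in for struct tcp_congestion_ops. */
struct model_cc_ops {
	const char *name;
	unsigned int (*ssthresh)(void);
	unsigned int (*undo_cwnd)(void);
	void (*cong_avoid)(void);
};

static const struct model_cc_ops *model_cc_table[MODEL_MAX_CC];

/* Placeholder hooks so the sketch can be exercised. */
static unsigned int demo_ssthresh(void) { return 2; }
static unsigned int demo_undo_cwnd(void) { return 10; }
static void demo_cong_avoid(void) { }

/* Assumed rule for this sketch: a name and all three hooks are required. */
static int model_validate_cc(const struct model_cc_ops *ca)
{
	assert(ca);
	if (!ca->name || !ca->ssthresh || !ca->undo_cwnd || !ca->cong_avoid)
		return -EINVAL;
	return 0;
}

/* Return the slot holding the algorithm called name, or -1. */
static int model_find_cc(const char *name)
{
	for (int i = 0; i < MODEL_MAX_CC; i++)
		if (model_cc_table[i] &&
		    strcmp(model_cc_table[i]->name, name) == 0)
			return i;
	return -1;
}

static int model_register_cc(const struct model_cc_ops *ca)
{
	int err = model_validate_cc(ca);

	if (err)
		return err;
	if (model_find_cc(ca->name) >= 0)
		return -EEXIST;		/* names are unique */
	for (int i = 0; i < MODEL_MAX_CC; i++) {
		if (!model_cc_table[i]) {
			model_cc_table[i] = ca;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Replace old_ca with ca under the same name. */
static int model_update_cc(const struct model_cc_ops *ca,
			   const struct model_cc_ops *old_ca)
{
	int err, slot;

	assert(old_ca);
	err = model_validate_cc(ca);
	if (err)
		return err;
	slot = model_find_cc(ca->name);
	if (slot < 0 || model_cc_table[slot] != old_ca)
		return -EINVAL;
	/* One store: the name is never left without an implementation. */
	model_cc_table[slot] = ca;
	return 0;
}
```

The point of the single store in `model_update_cc()` is that a lookup by name always finds either the old or the new implementation, which is the property the cover letter was missing when the only option was to unregister first.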

include/uapi/linux/bpf.h

Lines changed: 27 additions & 6 deletions
@@ -1033,6 +1033,7 @@ enum bpf_attach_type {
 	BPF_PERF_EVENT,
 	BPF_TRACE_KPROBE_MULTI,
 	BPF_LSM_CGROUP,
+	BPF_STRUCT_OPS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1266,6 +1267,9 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP = (1U << 12),
+
+/* Create a map that will be registered/unregesitered by the backed bpf_link */
+	BPF_F_LINK = (1U << 13),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -1507,7 +1511,10 @@ union bpf_attr {
 	} task_fd_query;
 
 	struct { /* struct used by BPF_LINK_CREATE command */
-		__u32 prog_fd; /* eBPF program to attach */
+		union {
+			__u32 prog_fd; /* eBPF program to attach */
+			__u32 map_fd; /* struct_ops to attach */
+		};
 		union {
 			__u32 target_fd; /* object to attach to */
 			__u32 target_ifindex; /* target ifindex */
@@ -1548,12 +1555,23 @@ union bpf_attr {
 
 	struct { /* struct used by BPF_LINK_UPDATE command */
 		__u32 link_fd; /* link fd */
-		/* new program fd to update link with */
-		__u32 new_prog_fd;
+		union {
+			/* new program fd to update link with */
+			__u32 new_prog_fd;
+			/* new struct_ops map fd to update link with */
+			__u32 new_map_fd;
+		};
 		__u32 flags; /* extra flags */
-		/* expected link's program fd; is specified only if
-		 * BPF_F_REPLACE flag is set in flags */
-		__u32 old_prog_fd;
+		union {
+			/* expected link's program fd; is specified only if
+			 * BPF_F_REPLACE flag is set in flags.
+			 */
+			__u32 old_prog_fd;
+			/* expected link's map fd; is specified only
+			 * if BPF_F_REPLACE flag is set.
+			 */
+			__u32 old_map_fd;
+		};
 	} link_update;
 
 	struct {
@@ -6379,6 +6397,9 @@ struct bpf_link_info {
 		struct {
 			__u32 ifindex;
 		} xdp;
+		struct {
+			__u32 map_id;
+		} struct_ops;
 	};
 } __attribute__((aligned(8)));
 
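
Wrapping the existing fd fields in anonymous unions adds the map variants without moving anything, which is what keeps the attribute layout stable for existing callers. The following sketch checks that property on stand-in structs. `link_update_before`, `link_update_after` and `model_map_update_req()` are hypothetical mirrors written for this illustration with `uint32_t` in place of `__u32`. They are not the UAPI header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the BPF_LINK_UPDATE fields before the change. */
struct link_update_before {
	uint32_t link_fd;
	uint32_t new_prog_fd;
	uint32_t flags;
	uint32_t old_prog_fd;
};

/* Hypothetical mirror of the same fields after the change. */
struct link_update_after {
	uint32_t link_fd;
	union {
		uint32_t new_prog_fd;
		uint32_t new_map_fd;
	};
	uint32_t flags;
	union {
		uint32_t old_prog_fd;
		uint32_t old_map_fd;
	};
};

/* The unions alias the old fields, so size and offsets are unchanged. */
static_assert(sizeof(struct link_update_before) ==
	      sizeof(struct link_update_after), "size must not change");
static_assert(offsetof(struct link_update_before, flags) ==
	      offsetof(struct link_update_after, flags), "flags moved");
static_assert(offsetof(struct link_update_before, new_prog_fd) ==
	      offsetof(struct link_update_after, new_map_fd), "new fd moved");
static_assert(offsetof(struct link_update_before, old_prog_fd) ==
	      offsetof(struct link_update_after, old_map_fd), "old fd moved");

/* Build a request that swaps in a struct_ops map without BPF_F_REPLACE. */
static struct link_update_after model_map_update_req(uint32_t link_fd,
						     uint32_t new_map_fd)
{
	struct link_update_after req = { 0 };

	req.link_fd = link_fd;
	req.new_map_fd = new_map_fd;	/* shares storage with new_prog_fd */
	return req;
}
```

Because the prog and map variants share storage, a caller fills in only one member of each union, which is consistent with the v11 note about not setting both old_prog_fd and old_map_fd.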
