Commit 116e7db

Author: Alexei Starovoitov

Merge branch 'gen-syn-cookie'

Petar Penkov says:

====================
This patch series introduces a BPF helper function that allows generating SYN cookies from BPF. Currently, this helper is enabled at both the TC hook and the XDP hook.

The first two patches in the series add/modify several TCP helper functions to allow for SKB-less operation, as is the case at the XDP hook.

The third patch introduces the bpf_tcp_gen_syncookie helper function which generates a SYN cookie for either XDP or TC programs. The return value of this function contains both the MSS value, encoded in the cookie, and the cookie itself.

The last three patches sync tools/ and add a test.

Performance evaluation:
I sent 10Mpps to a fixed port on a host with 2 10G bonded Mellanox 4 NICs from random IPv6 source addresses. Without XDP I observed 7.2Mpps (syn-acks) being sent out if the IPv6 packets carry 20 bytes of TCP options or 7.6Mpps if they carry no options. If I attached a simple program that checks if a packet is IPv6/TCP/SYN, looks up the socket, issues a cookie, and sends it back out after swapping src/dest, recomputing the checksum, and setting the ACK flag, I observed 10Mpps being sent back out.

Changes since v1:
1/ Added performance numbers to the cover letter
2/ Patch 2: Refactored a bit to fix compilation issues
3/ Patch 3: Changed ENOTSUPP to EOPNOTSUPP at Toke's suggestion

Changes since RFC:
1/ Cookie is returned in host order at Alexei's suggestion
2/ If cookies are not enabled via a sysctl, the helper function returns -ENOENT instead of -EINVAL at Lorenz's suggestion
3/ Fixed documentation to properly reflect that MSS is 16 bits at Lorenz's suggestion
4/ BPF helper requires TCP length to match ->doff field, rather than to simply be no more than 20 bytes at Eric and Alexei's suggestion
5/ Packet type is looked up from the packet version field, rather than from the socket. v4 packets are rejected on v6-only sockets but should work with dual stack listeners at Eric's suggestion
6/ Removed unnecessary `net` argument from helper function in patch 2 at Lorenz's suggestion
7/ Changed test to only pass MSS option so we can convince the verifier that the memory access is not out of bounds

Note that 7/ below illustrates the verifier might need to be extended to allow passing a variable tcph->doff to the helper function like below:

    __u32 thlen = tcph->doff * 4;
    if (thlen < sizeof(*tcph))
            return;
    __s64 cookie = bpf_tcp_gen_syncookie(sk, ipv4h, 20, tcph, thlen);
====================

Signed-off-by: Alexei Starovoitov <[email protected]>

2 parents d340691 + 91bc357 commit 116e7db

11 files changed, +354 -22 lines changed

include/net/tcp.h

Lines changed: 10 additions & 0 deletions
@@ -414,6 +414,16 @@ void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
 		       int estab, struct tcp_fastopen_cookie *foc);
 const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);

+/*
+ *	BPF SKB-less helpers
+ */
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+			 struct tcphdr *th, u32 *cookie);
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+			 struct tcphdr *th, u32 *cookie);
+u16 tcp_get_syncookie_mss(struct request_sock_ops *rsk_ops,
+			  const struct tcp_request_sock_ops *af_ops,
+			  struct sock *sk, struct tcphdr *th);
 /*
  *	TCP v4 functions exported for the inet6 API
  */

include/uapi/linux/bpf.h

Lines changed: 29 additions & 1 deletion
@@ -2714,6 +2714,33 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *	Description
+ *		Try to issue a SYN cookie for the packet with corresponding
+ *		IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ *		*iph* points to the start of the IPv4 or IPv6 header, while
+ *		*iph_len* contains **sizeof**\ (**struct iphdr**) or
+ *		**sizeof**\ (**struct ip6hdr**).
+ *
+ *		*th* points to the start of the TCP header, while *th_len*
+ *		contains the length of the TCP header.
+ *
+ *	Return
+ *		On success, lower 32 bits hold the generated SYN cookie in
+ *		followed by 16 bits which hold the MSS value for that cookie,
+ *		and the top 16 bits are unused.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** SYN cookie cannot be issued due to error
+ *
+ *		**-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2825,7 +2852,8 @@ union bpf_attr {
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
-	FN(send_signal),
+	FN(send_signal),		\
+	FN(tcp_gen_syncookie),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
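
The Return section of the helper documentation packs two values into one s64: the cookie in the lower 32 bits and the MSS in the next 16 bits, with negative values reserved for errors. A minimal userspace sketch of how a caller might unpack that value follows. The names `syncookie_result` and `decode_syncookie_ret` are hypothetical and are not part of this patch series; the sketch only assumes the bit layout stated in the documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two fields packed into the return value. */
struct syncookie_result {
	uint32_t cookie;	/* lower 32 bits of the return value */
	uint16_t mss;		/* bits 32..47 of the return value */
};

/*
 * Split a bpf_tcp_gen_syncookie()-style return value.
 * Returns 0 and fills *out on success; on failure returns the
 * negative errno unchanged and leaves *out untouched.
 */
static int decode_syncookie_ret(int64_t ret, struct syncookie_result *out)
{
	assert(out != 0);

	if (ret < 0)
		return (int)ret;

	out->cookie = (uint32_t)ret;
	out->mss = (uint16_t)((uint64_t)ret >> 32);
	return 0;
}
```

Because the top 16 bits are documented as unused, a successful result is never negative, so the sign test is enough to tell an error code from a packed cookie.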

net/core/filter.c

Lines changed: 73 additions & 0 deletions
@@ -5855,6 +5855,75 @@ static const struct bpf_func_proto bpf_tcp_check_syncookie_proto = {
 	.arg5_type	= ARG_CONST_SIZE,
 };

+BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
+	   struct tcphdr *, th, u32, th_len)
+{
+#ifdef CONFIG_SYN_COOKIES
+	u32 cookie;
+	u16 mss;
+
+	if (unlikely(th_len < sizeof(*th) || th_len != th->doff * 4))
+		return -EINVAL;
+
+	if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
+		return -EINVAL;
+
+	if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
+		return -ENOENT;
+
+	if (!th->syn || th->ack || th->fin || th->rst)
+		return -EINVAL;
+
+	if (unlikely(iph_len < sizeof(struct iphdr)))
+		return -EINVAL;
+
+	/* Both struct iphdr and struct ipv6hdr have the version field at the
+	 * same offset so we can cast to the shorter header (struct iphdr).
+	 */
+	switch (((struct iphdr *)iph)->version) {
+	case 4:
+		if (sk->sk_family == AF_INET6 && sk->sk_ipv6only)
+			return -EINVAL;
+
+		mss = tcp_v4_get_syncookie(sk, iph, th, &cookie);
+		break;
+
+#if IS_BUILTIN(CONFIG_IPV6)
+	case 6:
+		if (unlikely(iph_len < sizeof(struct ipv6hdr)))
+			return -EINVAL;
+
+		if (sk->sk_family != AF_INET6)
+			return -EINVAL;
+
+		mss = tcp_v6_get_syncookie(sk, iph, th, &cookie);
+		break;
+#endif /* CONFIG_IPV6 */
+
+	default:
+		return -EPROTONOSUPPORT;
+	}
+	if (mss <= 0)
+		return -ENOENT;
+
+	return cookie | ((u64)mss << 32);
+#else
+	return -EOPNOTSUPP;
+#endif /* CONFIG_SYN_COOKIES */
+}
+
+static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
+	.func		= bpf_tcp_gen_syncookie,
+	.gpl_only	= true, /* __cookie_v*_init_sequence() is GPL */
+	.pkt_access	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_SOCK_COMMON,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
 #endif /* CONFIG_INET */

 bool bpf_helper_changes_pkt_data(void *func)
@@ -6144,6 +6213,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_check_syncookie_proto;
 	case BPF_FUNC_skb_ecn_set_ce:
 		return &bpf_skb_ecn_set_ce_proto;
+	case BPF_FUNC_tcp_gen_syncookie:
+		return &bpf_tcp_gen_syncookie_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
@@ -6183,6 +6254,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_skc_lookup_tcp_proto;
 	case BPF_FUNC_tcp_check_syncookie:
 		return &bpf_tcp_check_syncookie_proto;
+	case BPF_FUNC_tcp_gen_syncookie:
+		return &bpf_tcp_gen_syncookie_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
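
The TCP header checks at the top of bpf_tcp_gen_syncookie (length must equal doff * 4, SYN set, ACK/FIN/RST clear) can be tried out in userspace on a raw header buffer. The sketch below is an illustration under stated assumptions, not kernel code: `syn_header_ok` is a hypothetical name, and it reads the data offset and flag bits directly from bytes 12 and 13 of the header as laid out in RFC 793 instead of using struct tcphdr bitfields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_MIN_HDR_LEN	20	/* header without options */
#define TCP_FLAG_FIN	0x01
#define TCP_FLAG_SYN	0x02
#define TCP_FLAG_RST	0x04
#define TCP_FLAG_ACK	0x10

/*
 * Mirror of the helper's TCP header sanity checks (hypothetical
 * userspace version): the caller-supplied length must match the
 * data offset field, and the segment must be a pure SYN.
 */
static bool syn_header_ok(const uint8_t *th, size_t th_len)
{
	assert(th != NULL);

	if (th_len < TCP_MIN_HDR_LEN)
		return false;

	/* Data offset is the high nibble of byte 12, in 32-bit words. */
	size_t doff_bytes = (size_t)(th[12] >> 4) * 4;
	if (th_len != doff_bytes)
		return false;

	uint8_t flags = th[13];
	if (!(flags & TCP_FLAG_SYN))
		return false;
	if (flags & (TCP_FLAG_ACK | TCP_FLAG_FIN | TCP_FLAG_RST))
		return false;

	return true;
}
```

The strict equality between the supplied length and doff * 4 reflects item 4/ of the changes since RFC in the cover letter.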

net/ipv4/tcp_input.c

Lines changed: 76 additions & 5 deletions
@@ -3782,6 +3782,49 @@ static void smc_parse_options(const struct tcphdr *th,
 #endif
 }

+/* Try to parse the MSS option from the TCP header. Return 0 on failure, clamped
+ * value on success.
+ */
+static u16 tcp_parse_mss_option(const struct tcphdr *th, u16 user_mss)
+{
+	const unsigned char *ptr = (const unsigned char *)(th + 1);
+	int length = (th->doff * 4) - sizeof(struct tcphdr);
+	u16 mss = 0;
+
+	while (length > 0) {
+		int opcode = *ptr++;
+		int opsize;
+
+		switch (opcode) {
+		case TCPOPT_EOL:
+			return mss;
+		case TCPOPT_NOP:	/* Ref: RFC 793 section 3.1 */
+			length--;
+			continue;
+		default:
+			if (length < 2)
+				return mss;
+			opsize = *ptr++;
+			if (opsize < 2) /* "silly options" */
+				return mss;
+			if (opsize > length)
+				return mss;	/* fail on partial options */
+			if (opcode == TCPOPT_MSS && opsize == TCPOLEN_MSS) {
+				u16 in_mss = get_unaligned_be16(ptr);
+
+				if (in_mss) {
+					if (user_mss && user_mss < in_mss)
+						in_mss = user_mss;
+					mss = in_mss;
+				}
+			}
+			ptr += opsize - 2;
+			length -= opsize;
+		}
+	}
+	return mss;
+}
+
 /* Look for tcp options. Normally only called on SYN and SYNACK packets.
  * But, this can also be called on packets in the established flow when
  * the fast version below fails.
@@ -6422,9 +6465,7 @@ EXPORT_SYMBOL(inet_reqsk_alloc);
 /*
  * Return true if a syncookie should be sent
  */
-static bool tcp_syn_flood_action(const struct sock *sk,
-				 const struct sk_buff *skb,
-				 const char *proto)
+static bool tcp_syn_flood_action(const struct sock *sk, const char *proto)
 {
 	struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;
 	const char *msg = "Dropping request";
@@ -6444,7 +6485,7 @@ static bool tcp_syn_flood_action(const struct sock *sk,
 	    net->ipv4.sysctl_tcp_syncookies != 2 &&
 	    xchg(&queue->synflood_warned, 1) == 0)
 		net_info_ratelimited("%s: Possible SYN flooding on port %d. %s. Check SNMP counters.\n",
-				     proto, ntohs(tcp_hdr(skb)->dest), msg);
+				     proto, sk->sk_num, msg);

 	return want_cookie;
 }
@@ -6466,6 +6507,36 @@ static void tcp_reqsk_record_syn(const struct sock *sk,
 	}
 }

+/* If a SYN cookie is required and supported, returns a clamped MSS value to be
+ * used for SYN cookie generation.
+ */
+u16 tcp_get_syncookie_mss(struct request_sock_ops *rsk_ops,
+			  const struct tcp_request_sock_ops *af_ops,
+			  struct sock *sk, struct tcphdr *th)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u16 mss;
+
+	if (sock_net(sk)->ipv4.sysctl_tcp_syncookies != 2 &&
+	    !inet_csk_reqsk_queue_is_full(sk))
+		return 0;
+
+	if (!tcp_syn_flood_action(sk, rsk_ops->slab_name))
+		return 0;
+
+	if (sk_acceptq_is_full(sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		return 0;
+	}
+
+	mss = tcp_parse_mss_option(th, tp->rx_opt.user_mss);
+	if (!mss)
+		mss = af_ops->mss_clamp;
+
+	return mss;
+}
+EXPORT_SYMBOL_GPL(tcp_get_syncookie_mss);
+
 int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		     const struct tcp_request_sock_ops *af_ops,
 		     struct sock *sk, struct sk_buff *skb)
@@ -6487,7 +6558,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	 */
 	if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
 	     inet_csk_reqsk_queue_is_full(sk)) && !isn) {
-		want_cookie = tcp_syn_flood_action(sk, skb, rsk_ops->slab_name);
+		want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
 		if (!want_cookie)
 			goto drop;
 	}
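
The option walk in tcp_parse_mss_option can be exercised outside the kernel on a plain byte buffer. The following is a userspace sketch under stated assumptions rather than the kernel function itself: `parse_mss_option` is a hypothetical name, it takes the options area and its length directly instead of a struct tcphdr, and the option constants are redefined locally with the standard values (EOL 0, NOP 1, MSS kind 2, MSS length 4).

```c
#include <assert.h>
#include <stdint.h>

#define OPT_EOL		0	/* end of option list */
#define OPT_NOP		1	/* one-byte padding */
#define OPT_MSS		2	/* maximum segment size */
#define OPT_MSS_LEN	4	/* kind + length + 16-bit value */

/*
 * Walk the TCP options in opts[0..length) and return the MSS option
 * value, clamped to user_mss when user_mss is non-zero and smaller.
 * Returns 0 if no valid MSS option is found.
 */
static uint16_t parse_mss_option(const uint8_t *opts, int length,
				 uint16_t user_mss)
{
	const uint8_t *ptr = opts;
	uint16_t mss = 0;

	assert(opts != 0 || length <= 0);

	while (length > 0) {
		int opcode = *ptr++;
		int opsize;

		if (opcode == OPT_EOL)
			return mss;
		if (opcode == OPT_NOP) {
			length--;
			continue;
		}
		if (length < 2)
			return mss;
		opsize = *ptr++;
		if (opsize < 2 || opsize > length)
			return mss;	/* malformed or truncated option */
		if (opcode == OPT_MSS && opsize == OPT_MSS_LEN) {
			/* MSS is carried in network byte order. */
			uint16_t in_mss = (uint16_t)((ptr[0] << 8) | ptr[1]);

			if (in_mss) {
				if (user_mss && user_mss < in_mss)
					in_mss = user_mss;
				mss = in_mss;
			}
		}
		ptr += opsize - 2;
		length -= opsize;
	}
	return mss;
}
```

For example, the option bytes 01 01 02 04 05 b4 (two NOPs followed by an MSS option) yield 1460 with no user limit, and 1000 when user_mss is 1000. In the patch, a zero result makes tcp_get_syncookie_mss fall back to af_ops->mss_clamp.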

net/ipv4/tcp_ipv4.c

Lines changed: 15 additions & 0 deletions
@@ -1515,6 +1515,21 @@ static struct sock *tcp_v4_cookie_check(struct sock *sk, struct sk_buff *skb)
 	return sk;
 }

+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+			 struct tcphdr *th, u32 *cookie)
+{
+	u16 mss = 0;
+#ifdef CONFIG_SYN_COOKIES
+	mss = tcp_get_syncookie_mss(&tcp_request_sock_ops,
+				    &tcp_request_sock_ipv4_ops, sk, th);
+	if (mss) {
+		*cookie = __cookie_v4_init_sequence(iph, th, &mss);
+		tcp_synq_overflow(sk);
+	}
+#endif
+	return mss;
+}
+
 /* The socket must have it's spinlock held when we get
  * here, unless it is a TCP_LISTEN socket.
  *

net/ipv6/tcp_ipv6.c

Lines changed: 15 additions & 0 deletions
@@ -1063,6 +1063,21 @@ static struct sock *tcp_v6_cookie_check(struct sock *sk, struct sk_buff *skb)
 	return sk;
 }

+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+			 struct tcphdr *th, u32 *cookie)
+{
+	u16 mss = 0;
+#ifdef CONFIG_SYN_COOKIES
+	mss = tcp_get_syncookie_mss(&tcp6_request_sock_ops,
+				    &tcp_request_sock_ipv6_ops, sk, th);
+	if (mss) {
+		*cookie = __cookie_v6_init_sequence(iph, th, &mss);
+		tcp_synq_overflow(sk);
+	}
+#endif
+	return mss;
+}
+
 static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 {
 	if (skb->protocol == htons(ETH_P_IP))

tools/include/uapi/linux/bpf.h

Lines changed: 34 additions & 3 deletions
@@ -1572,8 +1572,11 @@ union bpf_attr {
  *		but this is only implemented for native XDP (with driver
  *		support) as of this writing).
  *
- *		All values for *flags* are reserved for future usage, and must
- *		be left at zero.
+ *		The lower two bits of *flags* are used as the return code if
+ *		the map lookup fails. This is so that the return value can be
+ *		one of the XDP program return codes up to XDP_TX, as chosen by
+ *		the caller. Any higher bits in the *flags* argument must be
+ *		unset.
  *
  *		When used to redirect packets to net devices, this helper
  *		provides a high performance increase over **bpf_redirect**\ ().
@@ -2711,6 +2714,33 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *	Description
+ *		Try to issue a SYN cookie for the packet with corresponding
+ *		IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ *		*iph* points to the start of the IPv4 or IPv6 header, while
+ *		*iph_len* contains **sizeof**\ (**struct iphdr**) or
+ *		**sizeof**\ (**struct ip6hdr**).
+ *
+ *		*th* points to the start of the TCP header, while *th_len*
+ *		contains the length of the TCP header.
+ *
+ *	Return
+ *		On success, lower 32 bits hold the generated SYN cookie in
+ *		followed by 16 bits which hold the MSS value for that cookie,
+ *		and the top 16 bits are unused.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** SYN cookie cannot be issued due to error
+ *
+ *		**-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2822,7 +2852,8 @@ union bpf_attr {
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
-	FN(send_signal),
+	FN(send_signal),		\
+	FN(tcp_gen_syncookie),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call

tools/testing/selftests/bpf/bpf_helpers.h

Lines changed: 3 additions & 0 deletions
@@ -228,6 +228,9 @@ static void *(*bpf_sk_storage_get)(void *map, struct bpf_sock *sk,
 static int (*bpf_sk_storage_delete)(void *map, struct bpf_sock *sk) =
 	(void *)BPF_FUNC_sk_storage_delete;
 static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
+static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
+					  int ip_len, void *tcp, int tcp_len) =
+	(void *) BPF_FUNC_tcp_gen_syncookie;

 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
