Commit 40a1227
iamkafai authored and borkmann committed
tcp: Avoid TCP syncookie rejected by SO_REUSEPORT socket
Although the actual cookie check "__cookie_v[46]_check()" does not involve sk-specific info, the syncookie path also checks whether the sk has had a recent synq overflow event in "tcp_synq_no_recent_overflow()". The tcp_sk(sk)->rx_opt.ts_recent_stamp is updated at most once per second while the sk is sending out syncookies (through "tcp_synq_overflow()").

The above per-sk "recent synq overflow event timestamp" works well for the non-SO_REUSEPORT use case. However, it may cause random connection requests to be rejected/discarded when SO_REUSEPORT is used with syncookies, because the ACK fails the "tcp_synq_no_recent_overflow()" test.

When SO_REUSEPORT is used, there are usually multiple listening socks serving TCP connection requests destined for the same local IP:PORT. There are cases in which the TCP-ACK-COOKIE is not received by the same sk that sent out the syncookie. For example, if reuse->socks[] began with {sk0, sk1}:

1) sk1 sent out syncookies and tcp_sk(sk1)->rx_opt.ts_recent_stamp was updated.
2) reuse->socks[] later became {sk1, sk2}, e.g. sk0 was first closed and then sk2 was added. Here, sk2 does not have ts_recent_stamp set. Other orderings trigger a similar situation, but the idea is the same.
3) When the TCP-ACK-COOKIE comes back, sk2 is selected. "tcp_synq_no_recent_overflow(sk2)" returns true. In this case, all syncookies sent by sk1 will be handled (and rejected) by sk2 while sk1 is still alive.

Userspace may create and remove listening SO_REUSEPORT sockets as it sees fit, e.g. adding a new thread (and SO_REUSEPORT sock) to handle incoming requests, or an old process stopping and a new process starting. With or without SO_ATTACH_REUSEPORT_[CE]BPF, sockets leaving and joining a reuseport group make picking the same sk to check the syncookie very difficult (if not impossible).

Later patches will give the bpf prog more flexibility in deciding where a sk should be located in a bpf map and in selecting a particular SO_REUSEPORT sock as it sees fit, e.g. without closing any sock, replacing the whole bpf reuseport_array in one map_update() by using map-in-map. Getting the syncookie check to work smoothly across socks in the same "reuse->socks[]" is important.

A partial solution is to set the newly added sk's ts_recent_stamp to the max ts_recent_stamp of the reuseport group, but that requires iterating through reuse->socks[], OR to pessimistically set it to "now - TCP_SYNCOOKIE_VALID" when a sk joins a reuseport group. However, neither of them solves the case of an existing sk getting moved around in reuse->socks[] when that sk does not have ts_recent_stamp updated. That is unlikely under a continuous synflood, but not impossible.

This patch opts to treat the reuseport group as a whole when considering the last synq overflow timestamp, since the socks are serving the same IP:PORT from the userspace (and BPF program) perspective.

"synq_overflow_ts" is added to "struct sock_reuseport". tcp_synq_overflow() and tcp_synq_no_recent_overflow() will update/check reuse->synq_overflow_ts if the sk is in a reuseport group. Similar to the reuseport decision in __inet_lookup_listener(), both sk->sk_reuseport and sk->sk_reuseport_cb are tested for SO_REUSEPORT usage. "synq_overflow_ts" is updated roughly once every second.

A synflood test was done with 16 rx-queues and 16 reuseport sockets. No meaningful performance change was observed: throughput is ~9Mpps in IPv4 both before and after the change.

Cc: Eric Dumazet <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
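The {sk1, sk2} scenario above can be modelled in userspace. The following is a minimal sketch, not kernel code: the `model_*` names, `MODEL_HZ`, and the 120-second validity window are assumptions chosen for illustration, and plain integers stand in for jiffies. It compares the per-sk timestamp with a group-wide one when the ACK lands on a sock that never sent a syncookie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this model; not taken from kernel headers. */
#define MODEL_HZ              1000u
#define MODEL_SYNCOOKIE_VALID (120u * MODEL_HZ)

struct model_group { uint32_t synq_overflow_ts; };
struct model_sock  { uint32_t ts_recent_stamp; struct model_group *group; };

/* Wraparound-safe "a is after b" on 32-bit tick counters. */
static bool model_time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(b - a) < 0;
}

/* Record an overflow event, at most once per MODEL_HZ ticks. */
static void model_synq_overflow(struct model_sock *sk, uint32_t now,
				bool use_group)
{
	uint32_t *ts = (use_group && sk->group) ?
		       &sk->group->synq_overflow_ts : &sk->ts_recent_stamp;

	if (model_time_after32(now, *ts + MODEL_HZ))
		*ts = now;
}

/* True means "no recent overflow", i.e. the ACK cookie is rejected. */
static bool model_no_recent_overflow(const struct model_sock *sk,
				     uint32_t now, bool use_group)
{
	uint32_t ts = (use_group && sk->group) ?
		      sk->group->synq_overflow_ts : sk->ts_recent_stamp;

	return model_time_after32(now, ts + MODEL_SYNCOOKIE_VALID);
}

/* Replays steps 1) to 3); returns true if sk2 would reject the ACK. */
static bool model_ack_rejected(bool use_group)
{
	struct model_group grp = { .synq_overflow_ts = 0 };
	struct model_sock sk1 = { 0, &grp };
	struct model_sock sk2 = { 0, &grp };	/* joined later, stamp unset */
	uint32_t now = 500000;			/* arbitrary tick value */

	model_synq_overflow(&sk1, now, use_group);	/* 1) sk1 sends cookie */
	now += 10;					/* 2) group changes */
	return model_no_recent_overflow(&sk2, now, use_group);	/* 3) */
}
```

In this model, `model_ack_rejected(false)` returns true (the per-sk stamp on sk2 is stale) and `model_ack_rejected(true)` returns false (the group stamp was refreshed by sk1).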
1 parent 74b247f commit 40a1227

3 files changed: +33 -2 lines changed

include/net/sock_reuseport.h

Lines changed: 4 additions & 0 deletions
@@ -12,6 +12,10 @@ struct sock_reuseport {

 	u16			max_socks;	/* length of socks */
 	u16			num_socks;	/* elements in socks */
+	/* The last synq overflow event timestamp of this
+	 * reuse->socks[] group.
+	 */
+	unsigned int		synq_overflow_ts;
 	struct bpf_prog __rcu	*prog;		/* optional BPF sock selector */
 	struct sock		*socks[0];	/* array of sock pointers */
 };

include/net/tcp.h

Lines changed: 28 additions & 2 deletions
@@ -36,6 +36,7 @@
 #include <net/inet_hashtables.h>
 #include <net/checksum.h>
 #include <net/request_sock.h>
+#include <net/sock_reuseport.h>
 #include <net/sock.h>
 #include <net/snmp.h>
 #include <net/ip.h>
@@ -473,19 +474,44 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb);
  */
 static inline void tcp_synq_overflow(const struct sock *sk)
 {
-	unsigned int last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
+	unsigned int last_overflow;
 	unsigned int now = jiffies;

+	if (sk->sk_reuseport) {
+		struct sock_reuseport *reuse;
+
+		reuse = rcu_dereference(sk->sk_reuseport_cb);
+		if (likely(reuse)) {
+			last_overflow = READ_ONCE(reuse->synq_overflow_ts);
+			if (time_after32(now, last_overflow + HZ))
+				WRITE_ONCE(reuse->synq_overflow_ts, now);
+			return;
+		}
+	}
+
+	last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
 	if (time_after32(now, last_overflow + HZ))
 		tcp_sk(sk)->rx_opt.ts_recent_stamp = now;
 }

 /* syncookies: no recent synqueue overflow on this listening socket? */
 static inline bool tcp_synq_no_recent_overflow(const struct sock *sk)
 {
-	unsigned int last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
+	unsigned int last_overflow;
 	unsigned int now = jiffies;

+	if (sk->sk_reuseport) {
+		struct sock_reuseport *reuse;
+
+		reuse = rcu_dereference(sk->sk_reuseport_cb);
+		if (likely(reuse)) {
+			last_overflow = READ_ONCE(reuse->synq_overflow_ts);
+			return time_after32(now, last_overflow +
+					    TCP_SYNCOOKIE_VALID);
+		}
+	}
+
+	last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
 	return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID);
 }
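
Both helpers in this hunk rely on a wraparound-safe 32-bit comparison, and tcp_synq_overflow() only writes the timestamp when more than HZ ticks have passed since the last write. The sketch below illustrates those two properties in userspace. It assumes a tick rate of 1000 (`SKETCH_HZ`), and the `sketch_*` names are hypothetical stand-ins rather than kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HZ 1000u	/* assumed tick rate for this illustration */

/* "a is after b" even when the 32-bit counter has wrapped between them. */
static bool sketch_time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(b - a) < 0;
}

/*
 * Simulate one overflow event per tick for `ticks` ticks starting at
 * `start`, and count how often the shared timestamp is actually written.
 */
static unsigned int sketch_count_writes(uint32_t start, uint32_t ticks)
{
	uint32_t ts = 0;
	unsigned int writes = 0;

	for (uint32_t i = 0; i < ticks; i++) {
		uint32_t now = start + i;

		/* Same throttle shape as the hunk: write only if > HZ old. */
		if (sketch_time_after32(now, ts + SKETCH_HZ)) {
			ts = now;
			writes++;
		}
	}
	return writes;
}
```

With these definitions, `sketch_time_after32(5, 0xFFFFFFF0)` is true even though 5 is numerically smaller, and `sketch_count_writes(10000, 5000)` performs 5 writes for 5000 simulated events. The throttle is what keeps the shared field from being written on every SYN, which fits the commit message's report of no meaningful performance change.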

net/core/sock_reuseport.c

Lines changed: 1 addition & 0 deletions
@@ -81,6 +81,7 @@ static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse)

 	memcpy(more_reuse->socks, reuse->socks,
 	       reuse->num_socks * sizeof(struct sock *));
+	more_reuse->synq_overflow_ts = READ_ONCE(reuse->synq_overflow_ts);

 	for (i = 0; i < reuse->num_socks; ++i)
 		rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
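
The one-line change in this hunk carries the group timestamp across a reallocation of the group structure. A rough userspace sketch of why that matters follows. It is not the kernel implementation: `model_reuseport`, the doubling policy, and the use of `void *` in place of `struct sock *` are assumptions for illustration, and no RCU is modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sock_reuseport. */
struct model_reuseport {
	unsigned short max_socks;	/* capacity of socks[] */
	unsigned short num_socks;	/* entries in use */
	unsigned int synq_overflow_ts;	/* group-wide overflow timestamp */
	void *socks[];			/* flexible array of sock pointers */
};

static struct model_reuseport *model_alloc(unsigned short max_socks)
{
	struct model_reuseport *r;

	r = calloc(1, sizeof(*r) + max_socks * sizeof(void *));
	if (r)
		r->max_socks = max_socks;
	return r;
}

/* Grow by doubling (assumed policy); on failure the old group is kept. */
static struct model_reuseport *model_grow(struct model_reuseport *old)
{
	struct model_reuseport *more = model_alloc(old->max_socks * 2);

	if (!more)
		return NULL;

	more->num_socks = old->num_socks;
	memcpy(more->socks, old->socks, old->num_socks * sizeof(void *));
	/* Without this copy the new group would start with a zero stamp. */
	more->synq_overflow_ts = old->synq_overflow_ts;

	free(old);
	return more;
}

/* Returns the timestamp visible after a grow, or 0 on allocation failure. */
static unsigned int model_grow_demo(void)
{
	struct model_reuseport *r = model_alloc(2);
	struct model_reuseport *bigger;
	unsigned int ts;

	if (!r)
		return 0;
	r->synq_overflow_ts = 12345u;
	bigger = model_grow(r);
	if (!bigger) {
		free(r);
		return 0;
	}
	ts = bigger->synq_overflow_ts;
	free(bigger);
	return ts;
}
```

In this model, `model_grow_demo()` returns 12345, so syncookies issued just before the group grew would still pass the recent-overflow check afterwards.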
