Commit d41a69f

edumazet authored and davem330 committed
tcp: make tcp_sendmsg() aware of socket backlog
Large sendmsg()/write() hold socket lock for the duration of the call, unless sk->sk_sndbuf limit is hit. This is bad because incoming packets are parked into socket backlog for a long time. Critical decisions like fast retransmit might be delayed. Receivers have to maintain a big out of order queue with additional cpu overhead, and also possible stalls in TX once windows are full.

Bidirectional flows are particularly hurt since the backlog can become quite big if the copy from user space triggers IO (page faults).

Some applications learnt to use sendmsg() (or sendmmsg()) with small chunks to avoid this issue.

Kernel should know better, right?

Add a generic sk_flush_backlog() helper and use it right before a new skb is allocated. Typically we put 64KB of payload per skb (unless MSG_EOR is requested) and checking socket backlog every 64KB gives good results.

As a matter of fact, tests with TSO/GSO disabled give very nice results, as we manage to keep a small write queue and smaller perceived rtt.

Note that sk_flush_backlog() maintains socket ownership, so is not equivalent to a {release_sock(sk); lock_sock(sk);}, to ensure implicit atomicity rules that sendmsg() was giving to (possibly buggy) applications.

In this simple implementation, I chose to not call tcp_release_cb(), but we might consider this later.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
1 parent 5413d1b commit d41a69f
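For context, the small-chunk workaround mentioned in the commit message looks roughly like the sketch below. This is an illustrative userspace fragment, not code from this commit; the send_all() helper name and the 16KB chunk size are assumptions. It helps because every sendmsg() return goes through release_sock(), which drains the socket backlog between chunks.

/* Hypothetical userspace workaround: split one large write into small
 * sendmsg() calls so the kernel drops the socket lock (and processes
 * the backlog in release_sock()) between chunks.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <errno.h>

#define CHUNK	(16 * 1024)	/* assumed chunk size, tuned per app */

static ssize_t send_all(int fd, const char *buf, size_t len)
{
	size_t sent = 0;

	while (sent < len) {
		size_t n = len - sent < CHUNK ? len - sent : CHUNK;
		struct iovec iov = {
			.iov_base = (void *)(buf + sent),
			.iov_len = n,
		};
		struct msghdr msg = {
			.msg_iov = &iov,
			.msg_iovlen = 1,
		};
		ssize_t ret = sendmsg(fd, &msg, 0);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* retry the same chunk */
			return -1;
		}
		sent += ret;
	}
	return sent;
}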

File tree: 3 files changed, 24 additions & 2 deletions


include/net/sock.h

Lines changed: 11 additions & 0 deletions
@@ -926,6 +926,17 @@ void sk_stream_kill_queues(struct sock *sk);
 void sk_set_memalloc(struct sock *sk);
 void sk_clear_memalloc(struct sock *sk);
 
+void __sk_flush_backlog(struct sock *sk);
+
+static inline bool sk_flush_backlog(struct sock *sk)
+{
+	if (unlikely(READ_ONCE(sk->sk_backlog.tail))) {
+		__sk_flush_backlog(sk);
+		return true;
+	}
+	return false;
+}
+
 int sk_wait_data(struct sock *sk, long *timeo, const struct sk_buff *skb);
 
 struct request_sock_ops;
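The split into an inline check plus an out-of-line helper keeps the common case cheap: when the backlog is empty, sk_flush_backlog() costs a single READ_ONCE() load and takes no lock; only a non-empty backlog falls through to the spinlock section added in net/core/sock.c below.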

net/core/sock.c

Lines changed: 7 additions & 0 deletions
@@ -2048,6 +2048,13 @@ static void __release_sock(struct sock *sk)
 	sk->sk_backlog.len = 0;
 }
 
+void __sk_flush_backlog(struct sock *sk)
+{
+	spin_lock_bh(&sk->sk_lock.slock);
+	__release_sock(sk);
+	spin_unlock_bh(&sk->sk_lock.slock);
+}
+
 /**
  * sk_wait_data - wait for data to arrive at sk_receive_queue
  * @sk: sock to wait on
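As the changelog stresses, this is not equivalent to {release_sock(sk); lock_sock(sk);}. A rough contrast, sketched from this kernel's socket-locking scheme:

/*
 * release_sock(sk);	// clears sk->sk_lock.owned, so another thread
 * lock_sock(sk);	// may acquire the socket in the gap and
 *			// interleave its own sendmsg(), breaking the
 *			// implicit atomicity of a single sendmsg() call
 *
 * __sk_flush_backlog() instead keeps sk->sk_lock.owned set and holds
 * sk_lock.slock only long enough to run __release_sock().  It also
 * skips release_sock()'s call to sk->sk_prot->release_cb()
 * (tcp_release_cb() for TCP), matching the changelog's note.
 */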

net/ipv4/tcp.c

Lines changed: 6 additions & 2 deletions
@@ -1136,11 +1136,12 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	/* This should be in poll */
 	sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
 
-	mss_now = tcp_send_mss(sk, &size_goal, flags);
-
 	/* Ok commence sending. */
 	copied = 0;
 
+restart:
+	mss_now = tcp_send_mss(sk, &size_goal, flags);
+
 	err = -EPIPE;
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto out_err;
@@ -1166,6 +1167,9 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	if (!sk_stream_memory_free(sk))
 		goto wait_for_sndbuf;
 
+	if (sk_flush_backlog(sk))
+		goto restart;
+
 	skb = sk_stream_alloc_skb(sk,
 				  select_size(sk, sg),
 				  sk->sk_allocation,
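Two details worth noting in the tcp.c change. First, tcp_send_mss() moves under the new restart label because draining the backlog can process incoming ACKs that change the MSS or size goal, so both must be recomputed after a flush. Second, since each allocated skb typically aggregates up to size_goal of payload (about 64KB with GSO), placing the sk_flush_backlog() check just before skb allocation yields the roughly once-per-64KB cadence the changelog describes.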
