Commit 36a6503

Eric Dumazet authored and davem330 committed
tcp: refine tcp_prune_ofo_queue() to not drop all packets
Over the years, TCP BDP has increased a lot, and is typically in the order of ~10 Mbytes with help of clever Congestion Control modules.

In presence of packet losses, TCP stores incoming packets into an out of order queue, and number of skbs sitting there waiting for the missing packets to be received can match the BDP (~10 Mbytes).

In some cases, TCP needs to make room for incoming skbs, and current strategy can simply remove all skbs in the out of order queue as a last resort, incurring a huge penalty, both for receiver and sender.

Unfortunately these 'last resort events' are quite frequent, forcing sender to send all packets again, stalling the flow and wasting a lot of resources.

This patch cleans only a part of the out of order queue in order to meet the memory constraints.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Soheil Hassas Yeganeh <[email protected]>
Cc: C. Stephen Gun <[email protected]>
Cc: Van Jacobson <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
1 parent e2d8f64 commit 36a6503
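
The difference between the old and new strategies can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions, not kernel code: `struct ofo_model`, `ofo_purge_all()` and `ofo_prune_tail()` are hypothetical names, the queue is a plain array of segment sizes ordered by sequence number, and global memory pressure is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the out-of-order queue: per-segment byte
 * costs, ordered by sequence number (index 0 = lowest sequence).
 */
struct ofo_model {
	size_t seg_bytes[64];	/* cost charged for each queued segment */
	size_t nsegs;		/* number of queued segments */
	size_t rmem_alloc;	/* bytes currently charged to the socket */
	size_t rcvbuf;		/* receive buffer limit */
};

/* Old strategy (modelled): drop everything that is queued.
 * Returns the number of segments dropped.
 */
size_t ofo_purge_all(struct ofo_model *q)
{
	size_t dropped = q->nsegs;

	while (q->nsegs > 0) {
		size_t bytes = q->seg_bytes[--q->nsegs];

		assert(q->rmem_alloc >= bytes);
		q->rmem_alloc -= bytes;
	}
	return dropped;
}

/* New strategy (modelled): drop from the tail (highest sequence) and
 * stop as soon as the charge is back within the limit. At least one
 * segment is dropped whenever the queue is non-empty.
 */
size_t ofo_prune_tail(struct ofo_model *q)
{
	size_t dropped = 0;

	while (q->nsegs > 0) {
		size_t bytes = q->seg_bytes[--q->nsegs];

		assert(q->rmem_alloc >= bytes);
		q->rmem_alloc -= bytes;
		dropped++;
		if (q->rmem_alloc <= q->rcvbuf)
			break;
	}
	return dropped;
}
```

With four 1000-byte segments queued, 4500 bytes charged and a 3000-byte limit, the tail-pruning model drops two segments and keeps the two lowest-sequence ones, whereas the purge model drops all four and would force the sender to retransmit everything.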

1 file changed: +28 -19 lines changed
net/ipv4/tcp_input.c

Lines changed: 28 additions & 19 deletions
@@ -4392,12 +4392,9 @@ static int tcp_try_rmem_schedule(struct sock *sk, struct sk_buff *skb,
 		if (tcp_prune_queue(sk) < 0)
 			return -1;
 
-		if (!sk_rmem_schedule(sk, skb, size)) {
+		while (!sk_rmem_schedule(sk, skb, size)) {
 			if (!tcp_prune_ofo_queue(sk))
 				return -1;
-
-			if (!sk_rmem_schedule(sk, skb, size))
-				return -1;
 		}
 	}
 	return 0;
@@ -4874,29 +4871,41 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
 }
 
 /*
- * Purge the out-of-order queue.
- * Return true if queue was pruned.
+ * Clean the out-of-order queue to make room.
+ * We drop high sequences packets to :
+ * 1) Let a chance for holes to be filled.
+ * 2) not add too big latencies if thousands of packets sit there.
+ *    (But if application shrinks SO_RCVBUF, we could still end up
+ *     freeing whole queue here)
+ *
+ * Return true if queue has shrunk.
  */
 static bool tcp_prune_ofo_queue(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	bool res = false;
+	struct sk_buff *skb;
 
-	if (!skb_queue_empty(&tp->out_of_order_queue)) {
-		NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
-		__skb_queue_purge(&tp->out_of_order_queue);
+	if (skb_queue_empty(&tp->out_of_order_queue))
+		return false;
 
-		/* Reset SACK state. A conforming SACK implementation will
-		 * do the same at a timeout based retransmit. When a connection
-		 * is in a sad state like this, we care only about integrity
-		 * of the connection not performance.
-		 */
-		if (tp->rx_opt.sack_ok)
-			tcp_sack_reset(&tp->rx_opt);
+	NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
+
+	while ((skb = __skb_dequeue_tail(&tp->out_of_order_queue)) != NULL) {
+		tcp_drop(sk, skb);
 		sk_mem_reclaim(sk);
-		res = true;
+		if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
+		    !tcp_under_memory_pressure(sk))
+			break;
 	}
-	return res;
+
+	/* Reset SACK state. A conforming SACK implementation will
+	 * do the same at a timeout based retransmit. When a connection
+	 * is in a sad state like this, we care only about integrity
+	 * of the connection not performance.
+	 */
+	if (tp->rx_opt.sack_ok)
+		tcp_sack_reset(&tp->rx_opt);
+	return true;
 }
 
 /* Reduce allocated memory if we can, trying to get
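
The first hunk turns the single retry in tcp_try_rmem_schedule() into a loop, which matters once a pruning call may free only part of the queue. The userspace sketch below models that control flow only. `struct rx_model` and the `model_*` functions are hypothetical stand-ins, the admission check is a plain byte budget rather than the accounting that sk_rmem_schedule() really performs, and each pruning step is assumed to drop exactly one tail segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a byte budget plus queued out-of-order segments. */
struct rx_model {
	size_t used;	/* bytes charged so far */
	size_t limit;	/* byte budget (stand-in for the receive limit) */
	size_t ofo[32];	/* sizes of queued segments, tail = highest index */
	size_t nofo;	/* number of queued segments */
};

/* Stand-in for the admission check: can `size` more bytes be charged? */
bool model_can_schedule(const struct rx_model *m, size_t size)
{
	return m->used + size <= m->limit;
}

/* Stand-in for one pruning step: drop one tail segment.
 * Returns false when nothing is left to drop.
 */
bool model_prune_one(struct rx_model *m)
{
	size_t bytes;

	if (m->nofo == 0)
		return false;
	bytes = m->ofo[--m->nofo];
	assert(m->used >= bytes);
	m->used -= bytes;
	return true;
}

/* Same shape as the patched caller: keep pruning until the new segment
 * fits, and fail only when the queue has nothing more to give.
 * Returns 0 on success, -1 on failure.
 */
int model_try_schedule(struct rx_model *m, size_t size)
{
	while (!model_can_schedule(m, size)) {
		if (!model_prune_one(m))
			return -1;
	}
	return 0;
}
```

In this model, a single prune-then-recheck (the shape of the pre-patch code) would fail whenever one step does not free enough room, even though further pruning could succeed; the loop removes that failure mode and still terminates because each step shrinks the queue.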
