Commit bcefe17

Cong Wang authored and davem330 committed
tcp: introduce a per-route knob for quick ack
In previous discussions, I tried to find some reasonable heuristics for delayed ACK; however, this does not seem possible, according to Eric:

    "ACKS might also be delayed because of bidirectional traffic, and is more controlled by the application response time. TCP stack can not easily estimate it."

    "ACK can be incredibly useful to recover from losses in a short time. The vast majority of TCP sessions are small lived, and we send one ACK per received segment anyway at beginning or retransmits to let the sender smoothly increase its cwnd, so an auto-tuning facility wont help them that much."

and according to David:

    "ACKs are the only information we have to detect loss. And, for the same reasons that TCP VEGAS is fundamentally broken, we cannot measure the pipe or some other receiver-side-visible piece of information to determine when it's "safe" to stretch ACK.

    And even if it's "safe", we should not do it so that losses are accurately detected and we don't spuriously retransmit.

    The only way to know when the bandwidth increases is to "test" it, by sending more and more packets until drops happen. That's why all successful congestion control algorithms must operate on explicited tested pieces of information.

    Similarly, it's not really possible to universally know if it's safe to stretch ACK or not."

It still makes sense to enable or disable quick ack mode, as the TCP_QUICKACK socket option does.

This knob is similar to the TCP_QUICKACK option, but is meant for people who can't modify the application source code and still want to control TCP delayed ACK behavior. As David suggested, it belongs to per-path scope, since different paths may want different behaviors.

Cc: Eric Dumazet <[email protected]>
Cc: Rick Jones <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Graf <[email protected]>
CC: David Laight <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
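
For comparison, the per-socket control that the message refers to is the `TCP_QUICKACK` socket option, which can only be used by changing application code. The following is a minimal userspace sketch of that per-socket approach, not part of this commit; the helper names `set_quickack` and `open_quickack_socket` are made up for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative helper (not from the commit): switch quick-ACK mode on or
 * off for one TCP socket. Returns 0 on success, -1 on error (errno set).
 * Per tcp(7) the flag is not permanent; the stack may leave quick-ACK
 * mode again later, so long-lived callers typically re-apply it. */
static int set_quickack(int fd, int on)
{
	/* This sketch treats the option as a boolean flag. */
	assert(on == 0 || on == 1);
	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}

/* Illustrative helper: create an IPv4 TCP socket with quick-ACK requested.
 * Returns the descriptor, or -1 if either step fails. */
static int open_quickack_socket(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (set_quickack(fd, 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The per-route knob moves the same decision out of the application and onto the route, so an administrator can set it without touching the program; iproute2 exposes the metric as the `quickack` option of `ip route`.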
1 parent 2c0740e commit bcefe17

3 files changed: +10 -3 lines changed

include/uapi/linux/rtnetlink.h

Lines changed: 2 additions & 0 deletions
@@ -386,6 +386,8 @@ enum {
 #define RTAX_RTO_MIN RTAX_RTO_MIN
 	RTAX_INITRWND,
 #define RTAX_INITRWND RTAX_INITRWND
+	RTAX_QUICKACK,
+#define RTAX_QUICKACK RTAX_QUICKACK
 	__RTAX_MAX
 };
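
The hunk above follows the header's existing pattern: each metric is an enum value paired with a self-referential `#define`, and new metrics are appended just before the `__RTAX_MAX` sentinel. A small stand-alone sketch of why that pattern is useful is shown here. All `DEMO_` names are hypothetical replicas rather than the real UAPI definitions, and the `metric - 1` slot calculation assumes the kernel's metric accessors index their array that way.

```c
#include <assert.h>

/* Hypothetical miniature of the rtnetlink metric enum. */
enum {
	DEMO_RTAX_UNSPEC,
#define DEMO_RTAX_UNSPEC DEMO_RTAX_UNSPEC
	DEMO_RTAX_INITRWND,
#define DEMO_RTAX_INITRWND DEMO_RTAX_INITRWND
	DEMO_RTAX_QUICKACK,
#define DEMO_RTAX_QUICKACK DEMO_RTAX_QUICKACK
	DEMO_RTAX_SENTINEL	/* plays the role of __RTAX_MAX */
};

/* Highest valid metric; grows automatically when a metric is appended. */
#define DEMO_RTAX_MAX (DEMO_RTAX_SENTINEL - 1)

/* Enum values are invisible to the preprocessor, but the paired macro
 * lets code detect at compile time whether a metric exists. */
static int demo_has_quickack(void)
{
#ifdef DEMO_RTAX_QUICKACK
	return 1;
#else
	return 0;
#endif
}

/* Array slot for a metric, assuming metrics are stored at metric - 1
 * because the UNSPEC value needs no slot. */
static int demo_metric_slot(int metric)
{
	assert(metric > DEMO_RTAX_UNSPEC && metric <= DEMO_RTAX_MAX);
	return metric - 1;
}
```

Because the new value is added at the end, the numeric values of the existing metrics stay unchanged for userspace.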
391393

net/ipv4/tcp_input.c

Lines changed: 4 additions & 1 deletion
@@ -3717,6 +3717,7 @@ void tcp_reset(struct sock *sk)
 static void tcp_fin(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	const struct dst_entry *dst;
 
 	inet_csk_schedule_ack(sk);
 
@@ -3728,7 +3729,9 @@ static void tcp_fin(struct sock *sk)
 	case TCP_ESTABLISHED:
 		/* Move to CLOSE_WAIT */
 		tcp_set_state(sk, TCP_CLOSE_WAIT);
-		inet_csk(sk)->icsk_ack.pingpong = 1;
+		dst = __sk_dst_get(sk);
+		if (!dst || !dst_metric(dst, RTAX_QUICKACK))
+			inet_csk(sk)->icsk_ack.pingpong = 1;
 		break;
 
 	case TCP_CLOSE_WAIT:
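
The gating added to `tcp_fin()` can be read as a small predicate: enter pingpong (delayed-ACK) mode unless the socket has a cached route whose quickack metric is non-zero. Below is a userspace model of that decision, a sketch only. The `model_` types and functions are hypothetical stand-ins for `struct sock`, `struct dst_entry`, `__sk_dst_get()` and `dst_metric()`, and do not reproduce the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types. */
enum { MODEL_RTAX_QUICKACK = 0, MODEL_RTAX_COUNT };

struct model_dst {
	uint32_t metrics[MODEL_RTAX_COUNT];
};

struct model_sock {
	const struct model_dst *dst;	/* NULL when no route is cached */
	int pingpong;			/* 1 = delayed-ACK (pingpong) mode */
};

/* Models dst_metric(): read one metric from a route entry. */
static uint32_t model_dst_metric(const struct model_dst *dst, int metric)
{
	assert(dst != NULL && metric >= 0 && metric < MODEL_RTAX_COUNT);
	return dst->metrics[metric];
}

/* The condition introduced by the patch: pingpong mode is allowed when
 * there is no route, or the route does not request quick ACKs. */
static bool model_may_enter_pingpong(const struct model_sock *sk)
{
	const struct model_dst *dst = sk->dst;

	return !dst || !model_dst_metric(dst, MODEL_RTAX_QUICKACK);
}

/* Models the TCP_ESTABLISHED branch of tcp_fin() after the patch. */
static void model_fin_established(struct model_sock *sk)
{
	if (model_may_enter_pingpong(sk))
		sk->pingpong = 1;
}
```

Note that a missing route falls back to the old behavior, so sockets without a cached `dst` are unaffected by the new metric.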

net/ipv4/tcp_output.c

Lines changed: 4 additions & 2 deletions
@@ -160,6 +160,7 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const u32 now = tcp_time_stamp;
+	const struct dst_entry *dst = __sk_dst_get(sk);
 
 	if (sysctl_tcp_slow_start_after_idle &&
 	    (!tp->packets_out && (s32)(now - tp->lsndtime) > icsk->icsk_rto))
@@ -170,8 +171,9 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	/* If it is a reply for ato after last received
 	 * packet, enter pingpong mode.
 	 */
-	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
-		icsk->icsk_ack.pingpong = 1;
+	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato &&
+	    (!dst || !dst_metric(dst, RTAX_QUICKACK)))
+		icsk->icsk_ack.pingpong = 1;
 }
 
 /* Account for an ACK we sent. */
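
The condition in `tcp_event_data_sent()` combines two tests: the data being sent is a reply within `ato` ticks of the last received packet, and the route does not request quick ACKs. The following sketch models both parts in userspace, with hypothetical `model_` function names and plain integers in place of the kernel structures. It also shows why the unsigned subtraction stays correct when the 32-bit timestamp wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a send at time `now` happens less than `ato` ticks after the
 * last receive. Unsigned subtraction is modulo 2^32, so the elapsed time
 * is still right if the tick counter wrapped between the two events. */
static bool model_reply_within_ato(uint32_t now, uint32_t lrcvtime,
				   uint32_t ato)
{
	return (uint32_t)(now - lrcvtime) < ato;
}

/* Models the patched condition: enter pingpong mode only for a quick
 * reply, and only if no route requests quick ACKs. */
static bool model_pingpong_on_send(uint32_t now, uint32_t lrcvtime,
				   uint32_t ato, bool have_dst,
				   uint32_t quickack_metric)
{
	/* A metric value is only meaningful when a route exists. */
	assert(have_dst || quickack_metric == 0);

	return model_reply_within_ato(now, lrcvtime, ato) &&
	       (!have_dst || !quickack_metric);
}
```

For example, with `lrcvtime = 0xFFFFFFFB` and `now = 5` the elapsed time is 10 ticks, so a 40-tick `ato` still counts the send as a quick reply.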
