Commit b534dc4

wdebruij authored and kuba-moo committed
net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP
Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP sockets from write_seq instead of snd_una.

This should have been the behavior from the start. Because processes may now exist that rely on the established behavior, do not change the behavior of the existing option; instead, add the right behavior with a new flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID.

Intuitively, the contract is that the counter is zero after the setsockopt, so that the next write of N bytes results in a notification for its last byte, N - 1.

On idle sockets snd_una == write_seq, and this holds for both. But on sockets with data in transmission, snd_una records the oldest unacknowledged offset in the stream. This depends on the ACK response from the peer. A process cannot learn this in a race-free manner (ioctl SIOCOUTQ is one racy approach).

write_seq records the stream offset just past the last byte written by the process. This is a better starting point. It matches the intuitive contract in all circumstances, unaffected by external behavior.

The new timestamp flag necessitates increasing sk_tsflags to 32 bits. Move the field in struct sock to avoid growing the socket (for some common CONFIG variants). The UAPI interface so_timestamping.flags is already an int, so it is 32 bits wide.

Reported-by: Sotirios Delimanolis <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
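The counter contract in the commit message can be checked with a small userspace model. This is a sketch rather than kernel code: it assumes, as the message describes, that the reported counter is the stream offset of the last byte of a write minus the key saved when the option was enabled, computed in wrapping 32-bit arithmetic. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the key saved in sk_tskey when
 * SOF_TIMESTAMPING_OPT_ID is first enabled on a TCP socket. */
static uint32_t model_saved_key(int opt_id_tcp, uint32_t snd_una,
				uint32_t write_seq)
{
	/* Existing behavior starts from snd_una; OPT_ID_TCP from write_seq. */
	return opt_id_tcp ? write_seq : snd_una;
}

/* Counter reported for the last byte of the next write of len bytes
 * (len >= 1). That write starts at write_seq, so its last byte sits at
 * write_seq + len - 1. uint32_t arithmetic wraps modulo 2^32, like TCP
 * sequence numbers. */
static uint32_t model_reported_id(uint32_t key, uint32_t write_seq,
				  uint32_t len)
{
	return write_seq + len - 1 - key;
}
```

With snd_una = 5000 and write_seq = 6000 (1000 bytes in flight, the value SIOCOUTQ would report), a following 100-byte write is reported as 1099 under the existing behavior and as 99 with SOF_TIMESTAMPING_OPT_ID_TCP. On an idle socket, where both offsets are equal, both settings give 99.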
1 parent ecd6df3 commit b534dc4

5 files changed (+45, -6 lines)

Documentation/networking/timestamping.rst

Lines changed: 31 additions & 1 deletion
@@ -179,7 +179,8 @@ SOF_TIMESTAMPING_OPT_ID:
 identifier and returns that along with the timestamp. The identifier
 is derived from a per-socket u32 counter (that wraps). For datagram
 sockets, the counter increments with each sent packet. For stream
-sockets, it increments with every byte.
+sockets, it increments with every byte. For stream sockets, also set
+SOF_TIMESTAMPING_OPT_ID_TCP, see the section below.
 
 The counter starts at zero. It is initialized the first time that
 the socket option is enabled. It is reset each time the option is
@@ -192,6 +193,35 @@ SOF_TIMESTAMPING_OPT_ID:
 among all possibly concurrently outstanding timestamp requests for
 that socket.
 
+SOF_TIMESTAMPING_OPT_ID_TCP:
+Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
+timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
+counter increments for stream sockets, but its starting point is
+not entirely trivial. This option fixes that.
+
+For stream sockets, if SOF_TIMESTAMPING_OPT_ID is set, this should
+always be set too. On datagram sockets the option has no effect.
+
+A reasonable expectation is that the counter is reset to zero with
+the system call, so that a subsequent write() of N bytes generates
+a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
+implements this behavior under all conditions.
+
+SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
+especially when the socket option is set when no data is in
+transmission. If data is being transmitted, it may be off by the
+length of the output queue (SIOCOUTQ).
+
+The difference is due to being based on snd_una versus write_seq.
+snd_una is the offset in the stream acknowledged by the peer. This
+depends on factors outside of process control, such as network RTT.
+write_seq is the last byte written by the process. This offset is
+not affected by external inputs.
+
+The difference is subtle and unlikely to be noticed when configured
+at initial socket creation, when no data is queued or sent. But
+SOF_TIMESTAMPING_OPT_ID_TCP behavior is more robust regardless of
+when the socket option is set.
 
 SOF_TIMESTAMPING_OPT_CMSG:
 Support recv() cmsg for all timestamped packets. Control messages
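A minimal userspace sketch of what the documentation asks new TCP applications to do: set SOF_TIMESTAMPING_OPT_ID_TCP together with SOF_TIMESTAMPING_OPT_ID. The helper name enable_tcp_tx_timestamps and the choice of software TX timestamps are assumptions for illustration; DEMO_OPT_ID_TCP hardcodes the value (1 << 16) from this commit so the code also builds against UAPI headers that predate it.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Value of SOF_TIMESTAMPING_OPT_ID_TCP, hardcoded for older headers. */
#define DEMO_OPT_ID_TCP (1 << 16)

/* Hypothetical helper: ask for software TX timestamps on a connected TCP
 * socket, numbered from the next byte the process writes.
 * Returns 0 on success or a negative errno value. */
static int enable_tcp_tx_timestamps(int fd)
{
	int flags = SOF_TIMESTAMPING_TX_SOFTWARE |  /* generate at send time */
		    SOF_TIMESTAMPING_SOFTWARE |     /* report software stamps */
		    SOF_TIMESTAMPING_OPT_ID |       /* attach a byte counter */
		    DEMO_OPT_ID_TCP;                /* start it at write_seq */

	/* Fails with EINVAL if the kernel predates the flag, or if the socket
	 * is unconnected or listening (TCP CLOSE or LISTEN state). */
	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags)) < 0)
		return -errno;
	return 0;
}
```

Call it after connect() succeeds and before the writes to be tracked. Timestamps are then read from the socket error queue with recvmsg() and MSG_ERRQUEUE, and the counter arrives in the ee_data field of struct sock_extended_err.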

include/net/sock.h

Lines changed: 3 additions & 3 deletions
@@ -503,10 +503,10 @@ struct sock {
 #if BITS_PER_LONG==32
 	seqlock_t sk_stamp_seq;
 #endif
-	u16 sk_tsflags;
-	u8 sk_shutdown;
 	atomic_t sk_tskey;
 	atomic_t sk_zckey;
+	u32 sk_tsflags;
+	u8 sk_shutdown;
 
 	u8 sk_clockid;
 	u8 sk_txtime_deadline_mode : 1,
@@ -1899,7 +1899,7 @@ static inline void sock_replace_proto(struct sock *sk, struct proto *proto)
 struct sockcm_cookie {
 	u64 transmit_time;
 	u32 mark;
-	u16 tsflags;
+	u32 tsflags;
 };
 
 static inline void sockcm_init(struct sockcm_cookie *sockc,
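Both sk_tsflags and sockcm_cookie.tsflags grow to u32 because the new flag is bit 16, which a 16-bit field cannot hold. The sketch below makes the truncation concrete; the structs and DEMO_ constants are hypothetical stand-ins for the old and new field widths, not the kernel's types.

```c
#include <stdint.h>

#define DEMO_OPT_ID     (1u << 7)   /* value of SOF_TIMESTAMPING_OPT_ID */
#define DEMO_OPT_ID_TCP (1u << 16)  /* value of SOF_TIMESTAMPING_OPT_ID_TCP */

/* Hypothetical stand-ins for the field before and after the change. */
struct demo_cookie_old { uint16_t tsflags; };
struct demo_cookie_new { uint32_t tsflags; };

/* Store flags in the old layout and report whether OPT_ID_TCP survived. */
static int old_keeps_id_tcp(uint32_t flags)
{
	/* Conversion to uint16_t keeps only bits 0..15, so bit 16 is lost. */
	struct demo_cookie_old c = { .tsflags = (uint16_t)flags };
	return (c.tsflags & DEMO_OPT_ID_TCP) != 0;
}

/* Same check with the widened 32-bit field. */
static int new_keeps_id_tcp(uint32_t flags)
{
	struct demo_cookie_new c = { .tsflags = flags };
	return (c.tsflags & DEMO_OPT_ID_TCP) != 0;
}
```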

include/uapi/linux/net_tstamp.h

Lines changed: 2 additions & 1 deletion
@@ -31,8 +31,9 @@ enum {
 	SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13),
 	SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
 	SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
+	SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_BIND_PHC,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_TCP,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				SOF_TIMESTAMPING_LAST
 };
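SOF_TIMESTAMPING_MASK is derived from SOF_TIMESTAMPING_LAST, so moving LAST to the new flag is what lets bit 16 pass the `val & ~SOF_TIMESTAMPING_MASK` check in sock_set_timestamping(). The sketch below recomputes the mask with renamed copies of the enum values; the DEMO_ names and demo_flags_valid() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical copies of the UAPI values after this commit, renamed so
 * they do not clash with <linux/net_tstamp.h>. */
enum {
	DEMO_BIND_PHC   = (1 << 15),
	DEMO_OPT_ID_TCP = (1 << 16),

	DEMO_LAST = DEMO_OPT_ID_TCP,
	/* LAST - 1 sets every bit below LAST; OR-ing in LAST adds its own bit. */
	DEMO_MASK = (DEMO_LAST - 1) | DEMO_LAST,
};

/* Same test as the first check in sock_set_timestamping(). */
static int demo_flags_valid(uint32_t val)
{
	return (val & ~(uint32_t)DEMO_MASK) == 0;
}
```

With LAST = SOF_TIMESTAMPING_BIND_PHC, as before this commit, the same expression yields 0xFFFF, so a kernel without this change rejects the new bit with EINVAL.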

net/core/sock.c

Lines changed: 8 additions & 1 deletion
@@ -901,13 +901,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
 	if (val & ~SOF_TIMESTAMPING_MASK)
 		return -EINVAL;
 
+	if (val & SOF_TIMESTAMPING_OPT_ID_TCP &&
+	    !(val & SOF_TIMESTAMPING_OPT_ID))
+		return -EINVAL;
+
 	if (val & SOF_TIMESTAMPING_OPT_ID &&
 	    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
 		if (sk_is_tcp(sk)) {
 			if ((1 << sk->sk_state) &
 			    (TCPF_CLOSE | TCPF_LISTEN))
 				return -EINVAL;
-			atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
+			if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
+				atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq);
+			else
+				atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
 		} else {
 			atomic_set(&sk->sk_tskey, 0);
 		}
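The control flow of the sock.c hunk can be exercised outside the kernel with a simplified model. struct model_sock, model_set_timestamping() and the MODEL_ constants are hypothetical; the model covers only the checks and key initialization shown in the hunk and then stores the flags directly, whereas the real function does further work that is not shown here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_OPT_ID     (1u << 7)   /* SOF_TIMESTAMPING_OPT_ID */
#define MODEL_OPT_ID_TCP (1u << 16)  /* SOF_TIMESTAMPING_OPT_ID_TCP */
#define MODEL_MASK       0x1FFFFu    /* SOF_TIMESTAMPING_MASK after this commit */

/* Hypothetical, simplified view of the socket fields the hunk touches. */
struct model_sock {
	uint32_t tsflags;       /* sk_tsflags */
	bool is_tcp;            /* sk_is_tcp(sk) */
	bool close_or_listen;   /* sk_state is TCP_CLOSE or TCP_LISTEN */
	uint32_t snd_una;
	uint32_t write_seq;
	uint32_t tskey;         /* sk_tskey */
};

/* Model of the checks and key initialization in the hunk. */
static int model_set_timestamping(struct model_sock *sk, uint32_t val)
{
	if (val & ~MODEL_MASK)
		return -EINVAL;

	/* Added by this commit: OPT_ID_TCP is only valid together with OPT_ID. */
	if ((val & MODEL_OPT_ID_TCP) && !(val & MODEL_OPT_ID))
		return -EINVAL;

	/* The key is initialized only when OPT_ID goes from off to on. */
	if ((val & MODEL_OPT_ID) && !(sk->tsflags & MODEL_OPT_ID)) {
		if (sk->is_tcp) {
			if (sk->close_or_listen)
				return -EINVAL;
			sk->tskey = (val & MODEL_OPT_ID_TCP) ? sk->write_seq
							     : sk->snd_una;
		} else {
			sk->tskey = 0;
		}
	}

	sk->tsflags = val;
	return 0;
}
```

One consequence visible in the hunk: adding SOF_TIMESTAMPING_OPT_ID_TCP to a socket that already has SOF_TIMESTAMPING_OPT_ID enabled does not move the key, because the initialization block runs only when OPT_ID was previously off.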

net/ethtool/common.c

Lines changed: 1 addition & 0 deletions
@@ -417,6 +417,7 @@ const char sof_timestamping_names[][ETH_GSTRING_LEN] = {
 	[const_ilog2(SOF_TIMESTAMPING_OPT_PKTINFO)] = "option-pktinfo",
 	[const_ilog2(SOF_TIMESTAMPING_OPT_TX_SWHW)] = "option-tx-swhw",
 	[const_ilog2(SOF_TIMESTAMPING_BIND_PHC)] = "bind-phc",
+	[const_ilog2(SOF_TIMESTAMPING_OPT_ID_TCP)] = "option-id-tcp",
 };
 static_assert(ARRAY_SIZE(sof_timestamping_names) == __SOF_TIMESTAMPING_CNT);
 
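sof_timestamping_names is indexed by the bit number of each flag, so every new SOF_TIMESTAMPING_* bit needs an entry; because the kernel array has no explicit length, a missing trailing entry shrinks it, and the static_assert presumably catches the mismatch with the flag count. The sketch below mimics the lookup with a partial, hypothetical copy holding only the four names visible in the hunk; __builtin_ctz (a GCC/Clang builtin) stands in for const_ilog2.

```c
#include <stddef.h>
#include <stdint.h>

/* Partial, hypothetical copy of sof_timestamping_names: only the entries
 * visible in the hunk, indexed by bit number (what const_ilog2() yields
 * for a single-bit flag). All other slots are NULL here. */
static const char *const demo_names[17] = {
	[13] = "option-pktinfo",
	[14] = "option-tx-swhw",
	[15] = "bind-phc",
	[16] = "option-id-tcp",
};

/* Name of a single-bit flag, or NULL if flag is not exactly one bit or
 * has no entry in this partial table. */
static const char *demo_flag_name(uint32_t flag)
{
	if (flag == 0 || (flag & (flag - 1)) != 0)
		return NULL;
	/* Index of the only set bit, i.e. log2(flag). */
	unsigned int bit = (unsigned int)__builtin_ctz(flag);
	if (bit >= sizeof(demo_names) / sizeof(demo_names[0]))
		return NULL;
	return demo_names[bit];
}
```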
