Commit 2b96547

yupeng0921 authored and davem330 committed
add document for TCP OFO, PAWS and skip ACK counters
add document and examples for below counters:

TcpExtTCPOFOQueue
TcpExtTCPOFODrop
TcpExtTCPOFOMerge
TcpExtPAWSActive
TcpExtPAWSEstab
TcpExtTCPACKSkippedSynRecv
TcpExtTCPACKSkippedPAWS
TcpExtTCPACKSkippedSeq
TcpExtTCPACKSkippedFinWait2
TcpExtTCPACKSkippedTimeWait
TcpExtTCPACKSkippedChallenge

Signed-off-by: yupeng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
1 parent 3a0ed3e commit 2b96547

1 file changed

+239
-1
lines changed

Documentation/networking/snmp_counter.rst

Lines changed: 239 additions & 1 deletion
@@ -571,7 +571,97 @@ duplicate packet is received.
571571

572572
* TcpExtTCPDSACKOfoRecv
573573
The TCP stack receives a DSACK, which indicates that an out of order
574-
duplciate packet is received.
574+
duplicate packet is received.
575+
576+
TCP out of order
577+
================
578+
* TcpExtTCPOFOQueue
579+
The TCP layer receives an out of order packet and has enough memory
580+
to queue it.
581+
582+
* TcpExtTCPOFODrop
583+
The TCP layer receives an out of order packet but doesn't have enough
584+
memory, so it drops the packet. Such packets won't be counted into
585+
TcpExtTCPOFOQueue.
586+
587+
* TcpExtTCPOFOMerge
588+
The received out of order packet overlaps with the previous
589+
packet. The overlapping part will be dropped. All TcpExtTCPOFOMerge
590+
packets will also be counted into TcpExtTCPOFOQueue.
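
The relationship between the three counters can be sketched with a toy model. This is only an illustration of the descriptions above, not the kernel implementation: the ``OfoQueue`` class, its byte budget and the way overlaps are detected are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class OfoQueue:
    """Toy out of order queue with a byte budget (hypothetical model)."""
    budget: int  # bytes the queue is allowed to hold
    segments: list = field(default_factory=list)  # sorted, disjoint (start, end)
    counters: dict = field(default_factory=lambda: {
        "TcpExtTCPOFOQueue": 0,
        "TcpExtTCPOFODrop": 0,
        "TcpExtTCPOFOMerge": 0,
    })

    def used(self):
        return sum(end - start for start, end in self.segments)

    def receive(self, start, end):
        """Handle an out of order segment covering sequence range [start, end)."""
        if self.used() + (end - start) > self.budget:
            # Not enough memory: dropped, and not counted in TcpExtTCPOFOQueue.
            self.counters["TcpExtTCPOFODrop"] += 1
            return
        self.counters["TcpExtTCPOFOQueue"] += 1
        overlapped = False
        kept = []
        for s, e in self.segments:
            if s < end and start < e:  # the two ranges overlap
                overlapped = True
                start, end = min(s, start), max(e, end)
            else:
                kept.append((s, e))
        if overlapped:
            # The overlapping bytes are stored only once.
            self.counters["TcpExtTCPOFOMerge"] += 1
        kept.append((start, end))
        self.segments = sorted(kept)


q = OfoQueue(budget=3000)
q.receive(2000, 3000)  # queued
q.receive(2500, 3500)  # queued and merged with the previous segment
q.receive(5000, 7000)  # exceeds the budget, dropped
print(q.counters)
```

In this model every merged packet is also counted as queued, while a dropped packet touches only the drop counter, which matches the wording of the three entries above.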
591+
592+
TCP PAWS
593+
========
594+
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
595+
which is used to drop old packets. It depends on the TCP
596+
timestamps. For detailed information, please refer to the `timestamp wiki`_
597+
and the `RFC of PAWS`_.
598+
599+
.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17
600+
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps
601+
602+
* TcpExtPAWSActive
603+
Packets are dropped by PAWS in Syn-Sent status.
604+
605+
* TcpExtPAWSEstab
606+
Packets are dropped by PAWS in any status other than Syn-Sent.
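
As a rough illustration of the PAWS idea, the check compares the timestamp of the incoming segment with the most recent timestamp recorded for the connection, using 32-bit wrap-around arithmetic. The sketch below is a simplification under stated assumptions: the function names and the state strings are hypothetical, and details such as the tolerance for long-idle connections are left out.

```python
def to_s32(value):
    """Interpret an integer as a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def paws_reject(ts_recent, seg_tsval):
    """Return True if the segment timestamp is older than ts_recent.

    The comparison is done modulo 2**32, so a timestamp that has wrapped
    around is still treated as newer.
    """
    return to_s32(ts_recent - seg_tsval) > 0


def paws_counter(state):
    """Pick the counter described above for a packet dropped by PAWS."""
    return "TcpExtPAWSActive" if state == "SYN_SENT" else "TcpExtPAWSEstab"


print(paws_reject(ts_recent=1000, seg_tsval=999))     # older timestamp
print(paws_reject(ts_recent=0xFFFFFFF0, seg_tsval=5))  # wrapped, still newer
print(paws_counter("ESTABLISHED"))
```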
607+
608+
TCP ACK skip
609+
============
610+
In some scenarios, the kernel avoids sending duplicate ACKs too
611+
frequently. Please find more details in the tcp_invalid_ratelimit
612+
section of the `sysctl document`_. When the kernel decides to skip an ACK
613+
due to tcp_invalid_ratelimit, it updates one of the following
614+
counters to indicate the scenario in which the ACK is skipped. The ACK
615+
is only skipped if the received packet is either a SYN packet or
616+
it has no data.
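
The rate limiting described above can be sketched as follows. This is a minimal model, not the kernel code: the ``OowAckLimiter`` class and its method names are hypothetical, and the 500 millisecond value is assumed here as the tcp_invalid_ratelimit setting (check the sysctl document for the value on your system).

```python
class OowAckLimiter:
    """Toy per-connection limiter for duplicate ACKs (hypothetical model)."""

    def __init__(self, ratelimit_ms=500):
        self.ratelimit_ms = ratelimit_ms  # assumed tcp_invalid_ratelimit value
        self.last_ack_ms = None           # time of the last duplicate ACK sent
        self.skipped = 0                  # stands in for a TcpExtTCPACKSkipped* counter

    def should_send(self, now_ms, is_syn, payload_len):
        """Return True if the ACK is sent, False if it is skipped."""
        if not is_syn and payload_len > 0:
            # Only SYN packets or packets without data are rate limited.
            return True
        if (self.last_ack_ms is not None
                and now_ms - self.last_ack_ms < self.ratelimit_ms):
            self.skipped += 1
            return False
        self.last_ack_ms = now_ms
        return True


limiter = OowAckLimiter()
print(limiter.should_send(0, is_syn=True, payload_len=0))    # ACK sent, time recorded
print(limiter.should_send(100, is_syn=True, payload_len=0))  # too soon, skipped
print(limiter.should_send(600, is_syn=True, payload_len=0))  # limit elapsed, sent
print(limiter.skipped)
```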
617+
618+
.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
619+
620+
* TcpExtTCPACKSkippedSynRecv
621+
The ACK is skipped in Syn-Recv status. The Syn-Recv status means the
622+
TCP stack has received a SYN and replied with a SYN+ACK. Now the TCP stack is
623+
waiting for an ACK. Generally, the TCP stack doesn't need to send an ACK
624+
in the Syn-Recv status. But in several scenarios, the TCP stack needs
625+
to send an ACK. E.g., the TCP stack receives the same SYN packet
626+
repeatedly, the received packet does not pass the PAWS check, or the
627+
received packet sequence number is out of window. In these scenarios,
628+
the TCP stack needs to send an ACK. If the ACK sending frequency is higher than
629+
tcp_invalid_ratelimit allows, the TCP stack will skip sending the ACK and
630+
increase TcpExtTCPACKSkippedSynRecv.
631+
632+
633+
* TcpExtTCPACKSkippedPAWS
634+
The ACK is skipped because the PAWS (Protection Against Wrapped Sequence
635+
numbers) check fails. If the PAWS check fails in Syn-Recv, Fin-Wait-2
636+
or Time-Wait statuses, the skipped ACK would be counted to
637+
TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or
638+
TcpExtTCPACKSkippedTimeWait. In all other statuses, the skipped ACK
639+
would be counted to TcpExtTCPACKSkippedPAWS.
640+
641+
* TcpExtTCPACKSkippedSeq
642+
The sequence number is out of window and the timestamp passes the PAWS
643+
check and the TCP status is not Syn-Recv, Fin-Wait-2, or Time-Wait.
644+
645+
* TcpExtTCPACKSkippedFinWait2
646+
The ACK is skipped in Fin-Wait-2 status. The reason is either that the
647+
PAWS check fails or the received sequence number is out of window.
648+
649+
* TcpExtTCPACKSkippedTimeWait
650+
The ACK is skipped in Time-Wait status. The reason is either that the
651+
PAWS check fails or the received sequence number is out of window.
652+
653+
* TcpExtTCPACKSkippedChallenge
654+
The ACK is skipped if the ACK is a challenge ACK. RFC 5961 defines
655+
3 kinds of challenge ACK; please refer to `RFC 5961 section 3.2`_,
656+
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
657+
three scenarios, in some TCP statuses the Linux TCP stack would also
658+
send challenge ACKs if the ACK number is before the first
659+
unacknowledged number (stricter than `RFC 5961 section 5.2`_).
660+
661+
.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7
662+
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
663+
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
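
The way the PAWS, sequence number and per-status counters divide the cases can be summarised as a small decision function. This is a sketch derived only from the descriptions above; the function name and the status strings are hypothetical, and the challenge ACK case is deliberately left out.

```python
PER_STATE_COUNTERS = {
    "SYN_RECV": "TcpExtTCPACKSkippedSynRecv",
    "FIN_WAIT2": "TcpExtTCPACKSkippedFinWait2",
    "TIME_WAIT": "TcpExtTCPACKSkippedTimeWait",
}


def skipped_ack_counter(state, paws_failed, seq_out_of_window):
    """Return the counter updated when a rate limited ACK is skipped.

    Returns None when neither condition applies, i.e. no duplicate ACK
    would have been sent for this packet in the first place.
    """
    if not (paws_failed or seq_out_of_window):
        return None
    if state in PER_STATE_COUNTERS:
        # These three statuses have their own counter, whatever the reason.
        return PER_STATE_COUNTERS[state]
    if paws_failed:
        return "TcpExtTCPACKSkippedPAWS"
    return "TcpExtTCPACKSkippedSeq"


print(skipped_ack_counter("ESTABLISHED", paws_failed=True, seq_out_of_window=False))
print(skipped_ack_counter("ESTABLISHED", paws_failed=False, seq_out_of_window=True))
print(skipped_ack_counter("TIME_WAIT", paws_failed=True, seq_out_of_window=False))
```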
664+
575665

576666
examples
577667
========
@@ -1188,3 +1278,151 @@ Run nstat on server B::
11881278
We have deleted the default route on server B. Server B couldn't find
11891279
a route for the 8.8.8.8 IP address, so server B increased
11901280
IpOutNoRoutes.
1281+
1282+
TcpExtTCPACKSkippedSynRecv
1283+
--------------------------
1284+
In this test, we send 3 identical SYN packets from client to server. The
1285+
first SYN makes the server create a socket, set it to Syn-Recv status,
1286+
and reply with a SYN/ACK. The second SYN makes the server send the SYN/ACK
1287+
again, and record the reply time (the duplicate ACK reply time). The
1288+
third SYN makes the server check the previous duplicate ACK reply time,
1289+
and decide to skip the duplicate ACK, then increase the
1290+
TcpExtTCPACKSkippedSynRecv counter.
1291+
1292+
Run tcpdump to capture a SYN packet::
1293+
1294+
nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000
1295+
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
1296+
1297+
Open another terminal, run nc command::
1298+
1299+
nstatuser@nstat-a:~$ nc nstat-b 9000
1300+
1301+
As nstat-b didn't listen on port 9000, it should reply with a RST, and
1302+
the nc command exited immediately. It was enough for the tcpdump
1303+
command to capture a SYN packet. A Linux server might use hardware
1304+
offload for the TCP checksum, so the checksum in the /tmp/syn.pcap
1305+
might not be correct. We call tcprewrite to fix it::
1306+
1307+
nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum
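
For readers unfamiliar with what "fixing the checksum" involves, the arithmetic behind the TCP checksum is the 16-bit ones' complement sum used by the Internet protocols. The sketch below shows only that arithmetic; the ``internet_checksum`` helper is hypothetical, and a real TCP checksum also covers a pseudo-header built from the IP addresses, which is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit ones' complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carry bits back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(payload)
print(hex(csum))
# Appending the checksum makes the result over the whole buffer zero,
# which is how a receiver verifies it.
print(internet_checksum(payload + csum.to_bytes(2, "big")))
```

With checksum offload, the capture is taken before the network card fills in this field, so the value stored in the pcap file may not match the packet contents until it is recomputed.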
1308+
1309+
On nstat-b, we run nc to listen on port 9000::
1310+
1311+
nstatuser@nstat-b:~$ nc -lkv 9000
1312+
Listening on [0.0.0.0] (family 0, port 9000)
1313+
1314+
On nstat-a, we block packets from port 9000; otherwise nstat-a would send
1315+
RST to nstat-b::
1316+
1317+
nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP
1318+
1319+
Send the SYN to nstat-b 3 times::
1320+
1321+
nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done
1322+
1323+
Check the snmp counter on nstat-b::
1324+
1325+
nstatuser@nstat-b:~$ nstat | grep -i skip
1326+
TcpExtTCPACKSkippedSynRecv 1 0.0
1327+
1328+
As we expected, TcpExtTCPACKSkippedSynRecv is 1.
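
When such a check is repeated in a script, the nstat output can be turned into a dictionary instead of being read by eye. The helper below is a hypothetical convenience, assuming the three-column "name value rate" layout shown in the output above; lines that do not fit that layout are ignored.

```python
def parse_nstat(text):
    """Parse 'name value rate' lines of nstat output into {name: value}."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # header or unexpected line
        counters[fields[0]] = int(fields[1])
    return counters


sample = """#kernel
TcpExtTCPACKSkippedSynRecv      1                  0.0
"""
print(parse_nstat(sample))
```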
1329+
1330+
TcpExtTCPACKSkippedPAWS
1331+
-----------------------
1332+
To trigger PAWS, we could send an old SYN.
1333+
1334+
On nstat-b, let nc listen on port 9000::
1335+
1336+
nstatuser@nstat-b:~$ nc -lkv 9000
1337+
Listening on [0.0.0.0] (family 0, port 9000)
1338+
1339+
On nstat-a, run tcpdump to capture a SYN::
1340+
1341+
nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000
1342+
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
1343+
1344+
On nstat-a, run nc as a client to connect to nstat-b::
1345+
1346+
nstatuser@nstat-a:~$ nc -v nstat-b 9000
1347+
Connection to nstat-b 9000 port [tcp/*] succeeded!
1348+
1349+
Now tcpdump has captured the SYN and exited. We should fix the
1350+
checksum::
1351+
1352+
nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum
1353+
1354+
Send the SYN packet twice::
1355+
1356+
nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done
1357+
1358+
On nstat-b, check the snmp counter::
1359+
1360+
nstatuser@nstat-b:~$ nstat | grep -i skip
1361+
TcpExtTCPACKSkippedPAWS 1 0.0
1362+
1363+
We sent two SYNs via tcpreplay. Both of them made the PAWS check
1364+
fail. nstat-b replied with an ACK for the first SYN, skipped the ACK
1365+
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.
1366+
1367+
TcpExtTCPACKSkippedSeq
1368+
----------------------
1369+
To trigger TcpExtTCPACKSkippedSeq, we send packets which have valid
1370+
timestamp (to pass PAWS check) but the sequence number is out of
1371+
window. The Linux TCP stack does not skip the ACK if the packet has
1372+
data, so we need a pure ACK packet. To generate such a packet, we
1373+
could create two sockets: one on port 9000, another on port 9001. Then
1374+
we capture an ACK on port 9001, change the source/destination port
1375+
numbers to match the port 9000 socket. Then we could trigger
1376+
TcpExtTCPACKSkippedSeq via this packet.
1377+
1378+
On nstat-b, open two terminals, run two nc commands to listen on both
1379+
port 9000 and port 9001::
1380+
1381+
nstatuser@nstat-b:~$ nc -lkv 9000
1382+
Listening on [0.0.0.0] (family 0, port 9000)
1383+
1384+
nstatuser@nstat-b:~$ nc -lkv 9001
1385+
Listening on [0.0.0.0] (family 0, port 9001)
1386+
1387+
On nstat-a, run two nc clients::
1388+
1389+
nstatuser@nstat-a:~$ nc -v nstat-b 9000
1390+
Connection to nstat-b 9000 port [tcp/*] succeeded!
1391+
1392+
nstatuser@nstat-a:~$ nc -v nstat-b 9001
1393+
Connection to nstat-b 9001 port [tcp/*] succeeded!
1394+
1395+
On nstat-a, run tcpdump to capture an ACK::
1396+
1397+
nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001
1398+
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
1399+
1400+
On nstat-b, send a packet via the port 9001 socket. E.g. we sent a
1401+
string 'foo' in our example::
1402+
1403+
nstatuser@nstat-b:~$ nc -lkv 9001
1404+
Listening on [0.0.0.0] (family 0, port 9001)
1405+
Connection from nstat-a 42132 received!
1406+
foo
1407+
1408+
On nstat-a, the tcpdump should have captured the ACK. We should check
1409+
the source port numbers of the two nc clients::
1410+
1411+
nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
1412+
State Recv-Q Send-Q Local Address:Port Peer Address:Port
1413+
ESTAB 0 0 192.168.122.250:50208 192.168.122.251:9000
1414+
ESTAB 0 0 192.168.122.250:42132 192.168.122.251:9001
1415+
1416+
Run tcprewrite, change port 9001 to port 9000, change port 42132 to
1417+
port 50208::
1418+
1419+
nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum
1420+
1421+
Now the /tmp/seq.pcap is the packet we need. Send it to nstat-b::
1422+
1423+
nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done
1424+
1425+
Check TcpExtTCPACKSkippedSeq on nstat-b::
1426+
1427+
nstatuser@nstat-b:~$ nstat | grep -i skip
1428+
TcpExtTCPACKSkippedSeq 1 0.0
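
The replayed packet carries a sequence number taken from the port 9001 connection, so on the port 9000 connection it almost certainly falls outside the receive window. A simplified form of that window test for a segment without data is sketched below. The function name is hypothetical, a non-zero window is assumed, and the test actually applied by the Linux stack differs in detail.

```python
def seq_in_window(seq, rcv_nxt, rcv_wnd):
    """Return True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32.

    Simplified acceptance test for a zero-length segment with rcv_wnd > 0.
    """
    return ((seq - rcv_nxt) & 0xFFFFFFFF) < rcv_wnd


print(seq_in_window(1000, rcv_nxt=1000, rcv_wnd=65535))        # next expected byte
print(seq_in_window(0x80000000, rcv_nxt=1000, rcv_wnd=65535))  # unrelated connection
print(seq_in_window(5, rcv_nxt=0xFFFFFFF0, rcv_wnd=100))       # window wraps around
```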
