
Commit 19b351f

netfilter: add flowtable documentation
This patch adds initial documentation for the Netfilter flowtable infrastructure.

Reviewed-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
1 parent 1be3ac9 commit 19b351f

1 file changed, 112 additions and 0 deletions

@@ -0,0 +1,112 @@
Netfilter's flowtable infrastructure
====================================

This documentation describes the software flowtable infrastructure available in
Netfilter since Linux kernel 4.16.

Overview
--------

Initial packets follow the classic forwarding path. Once the flow enters the
established state according to conntrack semantics (i.e. traffic has been seen
in both directions), you can offload the flow to the flowtable from the forward
chain via the 'flow offload' action available in nftables.

Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to
the output netdevice via neigh_xmit() and therefore bypass the classic
forwarding path. The visible effect is that these packets are not seen by any
of the Netfilter hooks that come after ingress. On a flowtable miss, the packet
follows the classic forwarding path.

The flowtable uses a resizable hashtable. Lookups are based on the following
7-tuple selectors: source and destination addresses, layer 3 and layer 4
protocols, source and destination ports, and the input interface (useful when
several conntrack zones are in place).

Flowtables are populated via the 'flow offload' nftables action, so the user
can select which flows are placed into the flowtable. Packets therefore follow
the classic forwarding path unless the user explicitly instructs them to use
this alternative forwarding path via nftables policy.

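As a minimal sketch of such a selective policy, the ruleset below offloads only
TCP flows towards ports 80 and 443 on one destination network and leaves every
other flow on the classic forwarding path. The table, chain and flowtable
names, the devices, the 192.0.2.0/24 prefix and the port list are placeholders
chosen for illustration only.

    table inet x {
        flowtable f {
            hook ingress priority 0; devices = { eth0, eth1 };
        }
        chain y {
            type filter hook forward priority 0; policy accept;
            # only flows matching this rule are candidates for the flowtable
            ip daddr 192.0.2.0/24 tcp dport { 80, 443 } flow offload @f
        }
    }

Flows that match no 'flow offload' rule are never added to the flowtable, so
they keep traversing all the hooks.
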
Fig.1 shows the classic forwarding path, including the Netfilter hooks,
together with the flowtable fastpath bypass.

                                         userspace process
                                         ^              |
                                         |              |
                                    _____|____      ____\/___
                                   /          \    /         \
                                   |   input   |   |  output  |
                                   \__________/    \_________/
                                         ^               |
                                         |               |
      _________      __________      ---------     _____\/_____
     /         \    /          \     |Routing |   /            \
  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
     \_________/    \__________/     ----------   \____________/          ^
       |      ^          |               |               ^                |
   flowtable  |          |          ____\/___            |                |
       |      |          |         /         \           |                |
    __\/___   |          --------->| forward |------------                |
    |-----|   |                    \_________/                            |
    |-----|   |                'flow offload' rule                        |
    |-----|   |                   adds entry to                           |
    |_____|   |                     flowtable                             |
       |      |                                                           |
      / \     |                                                           |
     /hit\_no_|                                                           |
     \ ? /                                                                |
      \ /                                                                 |
       |__yes_________________fastpath bypass ____________________________|

               Fig.1 Netfilter hooks and flowtable interactions

The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that matched the initial packets that went
through the classic forwarding path. The TTL is decremented before calling
neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
path: the transport selectors are missing, so a flowtable lookup is not
possible.

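For instance, NAT and the flowtable might be combined as sketched below. This
ruleset is illustrative only: the table and chain names, the choice of eth1 as
the uplink and the use of masquerade are assumptions. It uses the 'ip' family
on the assumption that NAT chains are not available in the 'inet' family on
the kernel in use, and it registers an empty prerouting NAT chain on the
assumption that older kernels need both NAT hooks for reply traffic.

    table ip x {
        flowtable f {
            hook ingress priority 0; devices = { eth0, eth1 };
        }
        chain y {
            type filter hook forward priority 0; policy accept;
            ip protocol tcp flow offload @f
        }
        chain prerouting {
            # intentionally empty, see the assumption stated above
            type nat hook prerouting priority -100; policy accept;
        }
        chain postrouting {
            type nat hook postrouting priority 100; policy accept;
            oifname "eth1" masquerade
        }
    }

The first packets of each flow are masqueraded by the postrouting chain. Going
by the description above, the same translation is then expected to be applied
on the fastpath once the flow has been offloaded.
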
Example configuration
---------------------

Enabling the flowtable bypass is relatively easy: you only need to create a
flowtable and add one rule to your forward chain.

    table inet x {
        flowtable f {
            hook ingress priority 0; devices = { eth0, eth1 };
        }
        chain y {
            type filter hook forward priority 0; policy accept;
            ip protocol tcp flow offload @f
            counter packets 0 bytes 0
        }
    }

This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
perform resource partitioning. The flowtable priority defines the order in which
hooks are run in the pipeline. This is convenient if you already have an
nftables ingress chain: make sure the flowtable priority is smaller than that
of the ingress chain so that the flowtable runs first in the pipeline.

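The sketch below illustrates both points. It is only an assumption of how such
a setup might look: the flowtable names 'f1' and 'f2', the eth2 and eth3
devices, the interface matches and the netdev table 'z' with its chain 'c' are
hypothetical. Two flowtables split the devices into two groups, and a separate
ingress chain on eth0 uses priority 10, which is larger than the flowtable
priority 0, so the flowtable hook runs first.

    table inet x {
        flowtable f1 {
            hook ingress priority 0; devices = { eth0, eth1 };
        }
        flowtable f2 {
            hook ingress priority 0; devices = { eth2, eth3 };
        }
        chain y {
            type filter hook forward priority 0; policy accept;
            iifname { "eth0", "eth1" } oifname { "eth0", "eth1" } ip protocol tcp flow offload @f1
            iifname { "eth2", "eth3" } oifname { "eth2", "eth3" } ip protocol tcp flow offload @f2
        }
    }
    table netdev z {
        chain c {
            # priority 10 > flowtable priority 0: this chain runs after the flowtable
            type filter hook ingress device eth0 priority 10; policy accept;
            counter
        }
    }

With this ordering, packets that hit a flowtable are not expected to reach
chain 'c', since they leave through the fastpath; packets that miss continue
to it.
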
The 'flow offload' action from the forward chain 'y' adds an entry to the
flowtable for the TCP syn-ack packet coming in the reply direction. Once the
flow is offloaded, you will observe that the counter rule placed after it in
chain 'y' is no longer updated for packets that are forwarded through the
forwarding bypass.

More reading
------------

This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also
made a very complete and comprehensive summary called "A state of network
acceleration" that describes how things were before this infrastructure was
mainlined [3], and he also wrote a rough summary of this work [4].

[1] https://lwn.net/Articles/738214/
[2] https://lwn.net/Articles/742164/
[3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
[4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html
