|
| 1 | +Netfilter's flowtable infrastructure |
| 2 | +==================================== |
| 3 | + |
| 4 | +This documentation describes the software flowtable infrastructure available in |
| 5 | +Netfilter since Linux kernel 4.16. |
| 6 | + |
| 7 | +Overview |
| 8 | +-------- |
| 9 | + |
| 10 | +Initial packets follow the classic forwarding path. Once the flow enters the |
| 11 | +established state according to the conntrack semantics (i.e. we have seen traffic |
| 12 | +in both directions), you can decide to offload the flow to the flowtable |
| 13 | +from the forward chain via the 'flow offload' action available in nftables. |
| 14 | + |
| 15 | +Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to |
| 16 | +the output netdevice via neigh_xmit(); hence, they bypass the classic forwarding |
| 17 | +path (the visible effect is that you do not see these packets from any of the |
| 18 | +Netfilter hooks coming after the ingress). On a flowtable miss, the packet |
| 19 | +follows the classic forwarding path. |
| 20 | + |
| 21 | +The flowtable uses a resizable hashtable. Lookups are based on the following |
| 22 | +7-tuple selectors: source and destination addresses, layer 3 and layer 4 |
| 23 | +protocols, source and destination ports, and the input interface (useful in case |
| 24 | +there are several conntrack zones in place). |
| 25 | + |
| 26 | +Flowtables are populated via the 'flow offload' nftables action, so the user can |
| 27 | +selectively specify which flows are placed into the flowtable. Hence, packets |
| 28 | +follow the classic forwarding path unless the user explicitly instructs them |
| 29 | +to use this new alternative forwarding path via nftables policy. |
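
As a minimal sketch of such selective offloading, the ruleset below places only
flows matched by a TCP destination port rule into the flowtable; everything else
stays on the classic forwarding path. The table, chain and flowtable names, the
device names and the port numbers are illustrative assumptions, not
requirements.

```
table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1 };
        }
        chain y {
                type filter hook forward priority 0; policy accept;
                # Only flows matched by this rule are added to flowtable 'f'.
                tcp dport { 80, 443 } flow offload @f
        }
}
```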
| 30 | + |
| 31 | +Fig.1 shows the classic forwarding path, including the Netfilter hooks, and the |
| 32 | +flowtable fastpath bypass. |
| 33 | + |
| 34 | +                                         userspace process |
| 35 | +                                          ^             | |
| 36 | +                                          |             | |
| 37 | +                                     _____|____     ____\/___ |
| 38 | +                                    /          \   /         \ |
| 39 | +                                    |  input   |   | output  | |
| 40 | +                                    \__________/   \_________/ |
| 41 | +                                          ^             | |
| 42 | +                                          |             | |
| 43 | +      _________      __________      ---------     _____\/_____ |
| 44 | +     /         \    /          \     |Routing |   /            \ |
| 45 | +  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit |
| 46 | +     \_________/    \__________/     ----------   \____________/          ^ |
| 47 | +       |      ^           |              |                ^               | |
| 48 | +   flowtable  |           |          ____\/___            |               | |
| 49 | +       |      |           |         /         \           |               | |
| 50 | +    __\/___   |           --------->| forward |------------               | |
| 51 | +    |-----|   |                     \_________/                           | |
| 52 | +    |-----|   |                 'flow offload' rule                       | |
| 53 | +    |-----|   |                    adds entry to                          | |
| 54 | +    |_____|   |                      flowtable                            | |
| 55 | +       |      |                                                           | |
| 56 | +      / \     |                                                           | |
| 57 | +     /hit\_no_|                                                           | |
| 58 | +     \ ? /                                                                | |
| 59 | +      \ /                                                                 | |
| 60 | +       |__yes_________________fastpath bypass ____________________________| |
| 61 | + |
| 62 | +        Fig.1 Netfilter hooks and flowtable interactions |
| 63 | + |
| 64 | +The flowtable entry also stores the NAT configuration, so all packets are |
| 65 | +mangled according to the NAT policy that matched the initial packets that went |
| 66 | +through the classic forwarding path. The TTL is decremented before calling |
| 67 | +neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding |
| 68 | +path because the transport selectors are missing, so a flowtable lookup is not |
| 69 | +possible. |
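
To illustrate how NAT and the flowtable can coexist, the sketch below pairs a
flowtable with a masquerade rule. It assumes that eth1 is the outgoing
(WAN-facing) interface; the table and chain names are made up for this example.
Initial packets traverse the nat chain as usual, and the resulting translation
is what the offloaded packets of the same flow would then be mangled with.

```
table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1 };
        }
        chain y {
                type filter hook forward priority 0; policy accept;
                ip protocol tcp flow offload @f
        }
}
table ip nat {
        chain postrouting {
                type nat hook postrouting priority 100; policy accept;
                # Assumption: eth1 faces the external network.
                oifname "eth1" masquerade
        }
}
```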
| 70 | + |
| 71 | +Example configuration |
| 72 | +--------------------- |
| 73 | + |
| 74 | +Enabling the flowtable bypass is relatively easy: you only need to create a |
| 75 | +flowtable and add one rule to your forward chain. |
| 76 | + |
| 77 | +        table inet x { |
| 78 | +                flowtable f { |
| 79 | +                        hook ingress priority 0; devices = { eth0, eth1 }; |
| 80 | +                } |
| 81 | +                chain y { |
| 82 | +                        type filter hook forward priority 0; policy accept; |
| 83 | +                        ip protocol tcp flow offload @f |
| 84 | +                        counter packets 0 bytes 0 |
| 85 | +                } |
| 86 | +        } |
| 87 | + |
| 88 | +This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1 |
| 89 | +netdevices. You can create as many flowtables as you want in case you need to |
| 90 | +perform resource partitioning. The flowtable priority defines the order in which |
| 91 | +hooks are run in the pipeline. This is convenient in case you already have an |
| 92 | +nftables ingress chain: make sure the flowtable priority is smaller than that of |
| 93 | +the nftables ingress chain, so that the flowtable runs first in the pipeline. |
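
The priority ordering can be sketched as follows, assuming an existing netdev
ingress chain on eth0. The names 'z' and 'ingress_eth0' and the priority value
10 are hypothetical; the only point is that the flowtable priority (0) is
smaller than the chain priority (10), so the flowtable is consulted first.

```
table inet x {
        flowtable f {
                # Smaller priority value: runs earlier in the ingress pipeline.
                hook ingress priority 0; devices = { eth0, eth1 };
        }
}
table netdev z {
        chain ingress_eth0 {
                # Larger priority value: runs after the flowtable lookup.
                type filter hook ingress device eth0 priority 10; policy accept;
                counter
        }
}
```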
| 94 | + |
| 95 | +The 'flow offload' action from the forward chain 'y' adds an entry to the |
| 96 | +flowtable for the TCP syn-ack packet coming in the reply direction. Once the |
| 97 | +flow is offloaded, you will observe that the counter rule in the example above |
| 98 | +does not get updated for the packets that are being forwarded through the |
| 99 | +forwarding bypass. |
| 100 | + |
| 101 | +More reading |
| 102 | +------------ |
| 103 | + |
| 104 | +This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also |
| 105 | +wrote a comprehensive summary called "A state of network acceleration" that |
| 106 | +describes how things were before this infrastructure was mainlined [3], and he |
| 107 | +also wrote a rough summary of this work [4]. |
| 108 | + |
| 109 | +[1] https://lwn.net/Articles/738214/ |
| 110 | +[2] https://lwn.net/Articles/742164/ |
| 111 | +[3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html |
| 112 | +[4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html |