Commit d162190

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree. This batch comes with more input sanitization for xtables to
address bug reports from fuzzers, preparation works to the flowtable
infrastructure and assorted updates. In no particular order, they are:

1) Make sure userspace provides a valid standard target verdict, from
   Florian Westphal.

2) Sanitize error target size, also from Florian.

3) Validate that last rule in basechain matches underflow/policy since
   userspace assumes this when decoding the ruleset blob that comes
   from the kernel, from Florian.

4) Consolidate hook entry checks through xt_check_table_hooks(),
   patch from Florian.

5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
   very large compat offset arrays, so we have a reasonable upper limit
   and fuzzers don't exercise the oom-killer. Patches from Florian.

6) Several WARN_ON checks on xtables mutex helper, from Florian.

7) xt_rateest now has a hashtable per net, from Cong Wang.

8) Consolidate counter allocation in xt_counters_alloc(), from Florian.

9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
   from Xin Long.

10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
    Felix Fietkau.

11) Consolidate code through flow_offload_fill_dir(), also from Felix.

12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
    to remove a dependency with flowtable and ipv6.ko, from Felix.

13) Cache mtu size in flow_offload_tuple object, this is safe for
    forwarding as f87c10a describes, from Felix.

14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too
    modular infrastructure, from Felix.

15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
    Ahmed Abdelsalam.

16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.

17) Support for counting only to nf_conncount infrastructure, patch
    from Yi-Hung Wei.

18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
    to nft_ct.

19) Use boolean as return value from ipt_ah and from IPVS too, patch
    from Gustavo A. R. Silva.

20) Remove useless parameters in nfnl_acct_overquota() and
    nf_conntrack_broadcast_help(), from Taehee Yoo.

21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.

22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.

23) Fix typo in xt_limit, from Geert Uytterhoeven.

24) Do not use VLAs in Netfilter code, again from Gustavo.

25) Use ADD_COUNTER from ebtables, from Taehee Yoo.

26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.

27) Use pr_*() and add pr_fmt(), from Arushi Singhal.

28) Add synproxy support to ctnetlink.

29) ICMP type and IGMP matching support for ebtables, patches from
    Matthias Schiffer.

30) Support for the revision infrastructure to ebtables, from
    Bernie Harris.

31) String match support for ebtables, also from Bernie.

32) Documentation for the new flowtable infrastructure.

33) Use generic comparison functions in ebt_stp, from Joe Perches.

34) Demodularize filter chains in nftables.

35) Register conntrack hooks in case nftables NAT chain is added.

36) Merge assignments with return in a couple of spots in the
    Netfilter codebase, also from Arushi.

37) Document that xtables percpu counters are stored in the same
    memory area, from Ben Hutchings.

38) Revert mark_source_chains() sanity checks that break existing
    rulesets, from Florian Westphal.

39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
====================

Signed-off-by: David S. Miller <[email protected]>
2 parents b9a1260 + 26c97c5 commit d162190

75 files changed, +1383 -858 lines changed
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+Netfilter's flowtable infrastructure
+====================================
+
+This documentation describes the software flowtable infrastructure available in
+Netfilter since Linux kernel 4.16.
+
+Overview
+--------
+
+Initial packets follow the classic forwarding path. Once the flow enters the
+established state according to the conntrack semantics (i.e. we have seen traffic
+in both directions), you can decide to offload the flow to the flowtable
+from the forward chain via the 'flow offload' action available in nftables.
+
+Packets that find an entry in the flowtable (i.e. flowtable hit) are sent to the
+output netdevice via neigh_xmit(), hence, they bypass the classic forwarding
+path (the visible effect is that you do not see these packets from any of the
+netfilter hooks coming after the ingress). In case of flowtable miss, the packet
+follows the classic forward path.
+
+The flowtable uses a resizable hashtable. Lookups are based on the following
+7-tuple selectors: source and destination addresses, layer 3 and layer 4
+protocols, source and destination ports and the input interface (useful in case
+there are several conntrack zones in place).
+
+Flowtables are populated via the 'flow offload' nftables action, so the user can
+selectively specify what flows are placed into the flow table. Hence, packets
+follow the classic forwarding path unless the user explicitly instructs packets
+to use this new alternative forwarding path via nftables policy.
+
+This is represented in Fig.1, which describes the classic forwarding path
+including the Netfilter hooks and the flowtable fastpath bypass.
+
+                                         userspace process
+                                          ^              |
+                                          |              |
+                                     _____|____     ____\/___
+                                    /          \   /         \
+                                    |   input   |  |  output  |
+                                    \__________/   \_________/
+                                         ^               |
+                                         |               |
+      _________      __________      ---------     _____\/_____
+     /         \    /          \     |Routing |   /            \
+  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
+     \_________/    \__________/     ----------   \____________/          ^
+       |      ^          |               |               ^                |
+   flowtable  |          |          ____\/___            |                |
+       |      |          |         /         \           |                |
+    __\/___   |          --------->| forward |------------                |
+    |-----|   |                    \_________/                            |
+    |-----|   |                 'flow offload' rule                       |
+    |-----|   |                   adds entry to                           |
+    |_____|   |                     flowtable                             |
+       |      |                                                           |
+      / \     |                                                           |
+     /hit\_no_|                                                           |
+     \ ? /                                                                |
+      \ /                                                                 |
+       |__yes_________________fastpath bypass ____________________________|
+
+               Fig.1 Netfilter hooks and flowtable interactions
+
+The flowtable entry also stores the NAT configuration, so all packets are
+mangled according to the NAT policy that matches the initial packets that went
+through the classic forwarding path. The TTL is decremented before calling
+neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
+path given that the transport selectors are missing, therefore flowtable lookup
+is not possible.
+
+Example configuration
+---------------------
+
+Enabling the flowtable bypass is relatively easy: you only need to create a
+flowtable and add one rule to your forward chain.
+
+	table inet x {
+		flowtable f {
+			hook ingress priority 0 devices = { eth0, eth1 };
+		}
+		chain y {
+			type filter hook forward priority 0; policy accept;
+			ip protocol tcp flow offload @f
+			counter packets 0 bytes 0
+		}
+	}
+
+This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
+netdevices. You can create as many flowtables as you want in case you need to
+perform resource partitioning. The flowtable priority defines the order in which
+hooks are run in the pipeline. This is convenient in case you already have a
+nftables ingress chain (make sure the flowtable priority is smaller than the
+nftables ingress chain so that the flowtable runs before it in the pipeline).
+
+The 'flow offload' action from the forward chain 'y' adds an entry to the
+flowtable for the TCP syn-ack packet coming in the reply direction. Once the
+flow is offloaded, you will observe that the counter rule in the example above
+does not get updated for the packets that are being forwarded through the
+forwarding bypass.
+
+More reading
+------------
+
+This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also
+made a very complete and comprehensive summary called "A state of network
+acceleration" that describes how things were before this infrastructure was
+mainlined [3] and it also makes a rough summary of this work [4].
+
+[1] https://lwn.net/Articles/738214/
+[2] https://lwn.net/Articles/742164/
+[3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
+[4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html
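
The 7-tuple lookup described in the documentation above can be pictured with a small userspace sketch. Everything here is an assumption made for illustration: the `demo_` names, the IPv4-only field layout, and the FNV-1a hash are not taken from this commit, and the kernel's own tuple structure and hashing differ. The sketch only shows that a flow is identified by all seven selectors together, so the same addresses and ports arriving on a different input interface form a different key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4-only key holding the seven selectors named in the doc. */
struct demo_flow_key {
	uint32_t src_addr;   /* source address */
	uint32_t dst_addr;   /* destination address */
	uint16_t src_port;   /* layer 4 source port */
	uint16_t dst_port;   /* layer 4 destination port */
	uint8_t  l3proto;    /* layer 3 protocol */
	uint8_t  l4proto;    /* layer 4 protocol */
	int      iifidx;     /* input interface index */
};

/* FNV-1a mixing step, used here only as a stand-in hash function. */
static uint32_t demo_fnv1a(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash field by field so that struct padding never enters the hash. */
static uint32_t demo_flow_key_hash(const struct demo_flow_key *k)
{
	uint32_t h = 2166136261u;

	assert(k != NULL);
	h = demo_fnv1a(h, &k->src_addr, sizeof(k->src_addr));
	h = demo_fnv1a(h, &k->dst_addr, sizeof(k->dst_addr));
	h = demo_fnv1a(h, &k->src_port, sizeof(k->src_port));
	h = demo_fnv1a(h, &k->dst_port, sizeof(k->dst_port));
	h = demo_fnv1a(h, &k->l3proto, sizeof(k->l3proto));
	h = demo_fnv1a(h, &k->l4proto, sizeof(k->l4proto));
	h = demo_fnv1a(h, &k->iifidx, sizeof(k->iifidx));
	return h;
}

/* Two keys describe the same flow only if all seven selectors match. */
static bool demo_flow_key_equal(const struct demo_flow_key *a,
				const struct demo_flow_key *b)
{
	return a->src_addr == b->src_addr && a->dst_addr == b->dst_addr &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->l3proto == b->l3proto && a->l4proto == b->l4proto &&
	       a->iifidx == b->iifidx;
}
```

A hashtable built on these two helpers would use `demo_flow_key_hash()` to pick a bucket and `demo_flow_key_equal()` to confirm the hit. A fragment without transport ports cannot fill in such a key, which is consistent with the documentation's note that fragmented traffic takes the classic path.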

include/linux/netfilter/nfnetlink_acct.h

Lines changed: 1 addition & 2 deletions
@@ -16,6 +16,5 @@ struct nf_acct;
 struct nf_acct *nfnl_acct_find_get(struct net *net, const char *filter_name);
 void nfnl_acct_put(struct nf_acct *acct);
 void nfnl_acct_update(const struct sk_buff *skb, struct nf_acct *nfacct);
-int nfnl_acct_overquota(struct net *net, const struct sk_buff *skb,
-			struct nf_acct *nfacct);
+int nfnl_acct_overquota(struct net *net, struct nf_acct *nfacct);
 #endif /* _NFNL_ACCT_H */

include/linux/netfilter/x_tables.h

Lines changed: 4 additions & 1 deletion
@@ -281,6 +281,8 @@ int xt_check_entry_offsets(const void *base, const char *elems,
 			   unsigned int target_offset,
 			   unsigned int next_offset);
 
+int xt_check_table_hooks(const struct xt_table_info *info, unsigned int valid_hooks);
+
 unsigned int *xt_alloc_entry_offsets(unsigned int size);
 bool xt_find_jump_offset(const unsigned int *offsets,
 			 unsigned int target, unsigned int size);
@@ -301,6 +303,7 @@ int xt_data_to_user(void __user *dst, const void *src,
 
 void *xt_copy_counters_from_user(const void __user *user, unsigned int len,
 				 struct xt_counters_info *info, bool compat);
+struct xt_counters *xt_counters_alloc(unsigned int counters);
 
 struct xt_table *xt_register_table(struct net *net,
 				   const struct xt_table *table,
@@ -509,7 +512,7 @@ void xt_compat_unlock(u_int8_t af);
 
 int xt_compat_add_offset(u_int8_t af, unsigned int offset, int delta);
 void xt_compat_flush_offsets(u_int8_t af);
-void xt_compat_init_offsets(u_int8_t af, unsigned int number);
+int xt_compat_init_offsets(u8 af, unsigned int number);
 int xt_compat_calc_jump(u_int8_t af, unsigned int offset);
 
 int xt_compat_match_offset(const struct xt_match *match);
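
The change of `xt_compat_init_offsets()` from `void` to `int` fits item 5 of the commit message, which mentions rejecting very large compat offset arrays. The function body is not part of this header diff, so the following is only a userspace sketch of the general pattern: an initializer that validates a caller-supplied element count against a cap and reports failure through its return value. The `demo_` names and the cap value are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical upper bound; the kernel's actual limit is not shown here. */
#define DEMO_OFFSETS_MAX (1u << 20)

struct demo_offset_delta {
	unsigned int offset;
	int delta;
};

struct demo_offset_table {
	struct demo_offset_delta *tab;
	unsigned int number;   /* capacity requested by the caller */
	unsigned int cur;      /* entries filled so far */
};

/*
 * Returns 0 on success, -EINVAL if the requested size is zero or above
 * the cap, and -ENOMEM if the allocation itself fails.
 */
static int demo_offsets_init(struct demo_offset_table *t, unsigned int number)
{
	assert(t != NULL);

	if (number == 0 || number > DEMO_OFFSETS_MAX)
		return -EINVAL;

	t->tab = calloc(number, sizeof(*t->tab));
	if (!t->tab)
		return -ENOMEM;

	t->number = number;
	t->cur = 0;
	return 0;
}

/* Release the table and reset it to an empty state. */
static void demo_offsets_flush(struct demo_offset_table *t)
{
	free(t->tab);
	t->tab = NULL;
	t->number = 0;
	t->cur = 0;
}
```

With a `void` initializer the caller has no way to learn that an oversized request was refused; returning an error code lets the caller abort before touching the table.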

include/net/netfilter/nf_conntrack_count.h

Lines changed: 0 additions & 1 deletion
@@ -11,7 +11,6 @@ void nf_conncount_destroy(struct net *net, unsigned int family,
 unsigned int nf_conncount_count(struct net *net,
 				struct nf_conncount_data *data,
 				const u32 *key,
-				unsigned int family,
 				const struct nf_conntrack_tuple *tuple,
 				const struct nf_conntrack_zone *zone);
 #endif

include/net/netfilter/nf_conntrack_helper.h

Lines changed: 1 addition & 2 deletions
@@ -132,8 +132,7 @@ void nf_conntrack_helper_pernet_fini(struct net *net);
 int nf_conntrack_helper_init(void);
 void nf_conntrack_helper_fini(void);
 
-int nf_conntrack_broadcast_help(struct sk_buff *skb, unsigned int protoff,
-				struct nf_conn *ct,
+int nf_conntrack_broadcast_help(struct sk_buff *skb, struct nf_conn *ct,
 				enum ip_conntrack_info ctinfo,
 				unsigned int timeout);
 

include/net/netfilter/nf_tables.h

Lines changed: 20 additions & 13 deletions
@@ -434,11 +434,11 @@ static inline struct nft_set *nft_set_container_of(const void *priv)
 	return (void *)priv - offsetof(struct nft_set, data);
 }
 
-struct nft_set *nft_set_lookup(const struct net *net,
-			       const struct nft_table *table,
-			       const struct nlattr *nla_set_name,
-			       const struct nlattr *nla_set_id,
-			       u8 genmask);
+struct nft_set *nft_set_lookup_global(const struct net *net,
+				      const struct nft_table *table,
+				      const struct nlattr *nla_set_name,
+				      const struct nlattr *nla_set_id,
+				      u8 genmask);
 
 static inline unsigned long nft_set_gc_interval(const struct nft_set *set)
 {
@@ -868,34 +868,38 @@ struct nft_chain {
 	char *name;
 };
 
-enum nft_chain_type {
+enum nft_chain_types {
 	NFT_CHAIN_T_DEFAULT = 0,
 	NFT_CHAIN_T_ROUTE,
 	NFT_CHAIN_T_NAT,
 	NFT_CHAIN_T_MAX
 };
 
 /**
- * struct nf_chain_type - nf_tables chain type info
+ * struct nft_chain_type - nf_tables chain type info
  *
  * @name: name of the type
  * @type: numeric identifier
  * @family: address family
  * @owner: module owner
  * @hook_mask: mask of valid hooks
  * @hooks: array of hook functions
+ * @init: chain initialization function
+ * @free: chain release function
  */
-struct nf_chain_type {
+struct nft_chain_type {
 	const char *name;
-	enum nft_chain_type type;
+	enum nft_chain_types type;
 	int family;
 	struct module *owner;
 	unsigned int hook_mask;
 	nf_hookfn *hooks[NF_MAX_HOOKS];
+	int (*init)(struct nft_ctx *ctx);
+	void (*free)(struct nft_ctx *ctx);
 };
 
 int nft_chain_validate_dependency(const struct nft_chain *chain,
-				  enum nft_chain_type type);
+				  enum nft_chain_types type);
 int nft_chain_validate_hooks(const struct nft_chain *chain,
 			     unsigned int hook_flags);
 
@@ -917,7 +921,7 @@ struct nft_stats {
  */
 struct nft_base_chain {
 	struct nf_hook_ops ops;
-	const struct nf_chain_type *type;
+	const struct nft_chain_type *type;
 	u8 policy;
 	u8 flags;
 	struct nft_stats __percpu *stats;
@@ -970,8 +974,8 @@ struct nft_table {
 	char *name;
 };
 
-int nft_register_chain_type(const struct nf_chain_type *);
-void nft_unregister_chain_type(const struct nf_chain_type *);
+void nft_register_chain_type(const struct nft_chain_type *);
+void nft_unregister_chain_type(const struct nft_chain_type *);
 
 int nft_register_expr(struct nft_expr_type *);
 void nft_unregister_expr(struct nft_expr_type *);
@@ -1345,4 +1349,7 @@ struct nft_trans_flowtable {
 #define nft_trans_flowtable(trans) \
 	(((struct nft_trans_flowtable *)trans->data)->flowtable)
 
+int __init nft_chain_filter_init(void);
+void __exit nft_chain_filter_fini(void);
+
 #endif /* _NET_NF_TABLES_H */

include/net/netfilter/xt_rateest.h

Lines changed: 2 additions & 2 deletions
@@ -21,7 +21,7 @@ struct xt_rateest {
 	struct net_rate_estimator __rcu *rate_est;
 };
 
-struct xt_rateest *xt_rateest_lookup(const char *name);
-void xt_rateest_put(struct xt_rateest *est);
+struct xt_rateest *xt_rateest_lookup(struct net *net, const char *name);
+void xt_rateest_put(struct net *net, struct xt_rateest *est);
 
 #endif /* _XT_RATEEST_H */

include/uapi/linux/netfilter/nf_conntrack_common.h

Lines changed: 1 addition & 0 deletions
@@ -129,6 +129,7 @@ enum ip_conntrack_events {
 	IPCT_NATSEQADJ = IPCT_SEQADJ,
 	IPCT_SECMARK,		/* new security mark has been set */
 	IPCT_LABEL,		/* new connlabel has been set */
+	IPCT_SYNPROXY,		/* synproxy has been set */
 #ifdef __KERNEL__
 	__IPCT_MAX
 #endif

include/uapi/linux/netfilter/nf_tables.h

Lines changed: 10 additions & 2 deletions
@@ -909,8 +909,8 @@ enum nft_rt_attributes {
  * @NFT_CT_EXPIRATION: relative conntrack expiration time in ms
  * @NFT_CT_HELPER: connection tracking helper assigned to conntrack
  * @NFT_CT_L3PROTOCOL: conntrack layer 3 protocol
- * @NFT_CT_SRC: conntrack layer 3 protocol source (IPv4/IPv6 address)
- * @NFT_CT_DST: conntrack layer 3 protocol destination (IPv4/IPv6 address)
+ * @NFT_CT_SRC: conntrack layer 3 protocol source (IPv4/IPv6 address, deprecated)
+ * @NFT_CT_DST: conntrack layer 3 protocol destination (IPv4/IPv6 address, deprecated)
  * @NFT_CT_PROTOCOL: conntrack layer 4 protocol
  * @NFT_CT_PROTO_SRC: conntrack layer 4 protocol source
  * @NFT_CT_PROTO_DST: conntrack layer 4 protocol destination
@@ -920,6 +920,10 @@ enum nft_rt_attributes {
  * @NFT_CT_AVGPKT: conntrack average bytes per packet
  * @NFT_CT_ZONE: conntrack zone
  * @NFT_CT_EVENTMASK: ctnetlink events to be generated for this conntrack
+ * @NFT_CT_SRC_IP: conntrack layer 3 protocol source (IPv4 address)
+ * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address)
+ * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
+ * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
  */
 enum nft_ct_keys {
 	NFT_CT_STATE,
@@ -941,6 +945,10 @@ enum nft_ct_keys {
 	NFT_CT_AVGPKT,
 	NFT_CT_ZONE,
 	NFT_CT_EVENTMASK,
+	NFT_CT_SRC_IP,
+	NFT_CT_DST_IP,
+	NFT_CT_SRC_IP6,
+	NFT_CT_DST_IP6,
 };
 
 /**

include/uapi/linux/netfilter/nfnetlink_conntrack.h

Lines changed: 10 additions & 0 deletions
@@ -54,6 +54,7 @@ enum ctattr_type {
 	CTA_MARK_MASK,
 	CTA_LABELS,
 	CTA_LABELS_MASK,
+	CTA_SYNPROXY,
 	__CTA_MAX
 };
 #define CTA_MAX (__CTA_MAX - 1)
@@ -190,6 +191,15 @@ enum ctattr_natseq {
 };
 #define CTA_NAT_SEQ_MAX (__CTA_NAT_SEQ_MAX - 1)
 
+enum ctattr_synproxy {
+	CTA_SYNPROXY_UNSPEC,
+	CTA_SYNPROXY_ISN,
+	CTA_SYNPROXY_ITS,
+	CTA_SYNPROXY_TSOFF,
+	__CTA_SYNPROXY_MAX,
+};
+#define CTA_SYNPROXY_MAX (__CTA_SYNPROXY_MAX - 1)
+
 enum ctattr_expect {
 	CTA_EXPECT_UNSPEC,
 	CTA_EXPECT_MASTER,
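
The new `ctattr_synproxy` enum follows the same pattern as the enums around it: an `_UNSPEC` entry at zero, a trailing `__..._MAX` sentinel, and a `..._MAX` macro defined as the sentinel minus one. A minimal sketch of why that pattern is convenient, using hypothetical `DEMO_` names rather than the real uapi identifiers: the macro tracks the highest valid attribute automatically, so range checks and attribute tables sized `MAX + 1` stay correct when a new attribute is added before the sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the enum layout used in the diff above. */
enum demo_synproxy_attr {
	DEMO_SYNPROXY_UNSPEC,
	DEMO_SYNPROXY_ISN,
	DEMO_SYNPROXY_ITS,
	DEMO_SYNPROXY_TSOFF,
	__DEMO_SYNPROXY_MAX,
};
#define DEMO_SYNPROXY_MAX (__DEMO_SYNPROXY_MAX - 1)

/* One slot per attribute type, indexed directly by the enum value. */
struct demo_attr_table {
	const void *slot[DEMO_SYNPROXY_MAX + 1];
};

/* An attribute type is usable if it lies in (UNSPEC, MAX]. */
static bool demo_attr_valid(int type)
{
	return type > DEMO_SYNPROXY_UNSPEC && type <= DEMO_SYNPROXY_MAX;
}

/* Store a payload pointer, refusing unknown attribute types. */
static bool demo_attr_store(struct demo_attr_table *tb, int type,
			    const void *payload)
{
	assert(tb != NULL);
	if (!demo_attr_valid(type))
		return false;
	tb->slot[type] = payload;
	return true;
}
```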

include/uapi/linux/netfilter/xt_connmark.h

Lines changed: 10 additions & 0 deletions
@@ -19,11 +19,21 @@ enum {
 	XT_CONNMARK_RESTORE
 };
 
+enum {
+	D_SHIFT_LEFT = 0,
+	D_SHIFT_RIGHT,
+};
+
 struct xt_connmark_tginfo1 {
 	__u32 ctmark, ctmask, nfmask;
 	__u8 mode;
 };
 
+struct xt_connmark_tginfo2 {
+	__u32 ctmark, ctmask, nfmask;
+	__u8 shift_dir, shift_bits, mode;
+};
+
 struct xt_connmark_mtinfo1 {
 	__u32 mark, mask;
 	__u8 invert;
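
The new `xt_connmark_tginfo2` structure adds `shift_dir` and `shift_bits` next to the existing mark and mask fields, matching item 26 of the commit message (bitshift support for CONNMARK). The target code that consumes these fields is not part of this header diff, so the sketch below only illustrates what a direction-plus-count shift on a 32-bit mark looks like. The `demo_` names are hypothetical, and the handling of a shift count of 32 or more is an assumption made to keep the C well defined, not a statement about kernel behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the direction values introduced in the diff. */
enum {
	DEMO_SHIFT_LEFT = 0,
	DEMO_SHIFT_RIGHT,
};

/*
 * Shift a 32-bit mark by 'bits' in the given direction.
 * Shifting a 32-bit value by 32 or more is undefined in C, so this
 * sketch returns 0 in that case (an assumption for illustration).
 */
static uint32_t demo_shift_mark(uint32_t mark, uint8_t dir, uint8_t bits)
{
	if (bits == 0)
		return mark;
	if (bits >= 32)
		return 0;
	if (dir == DEMO_SHIFT_RIGHT)
		return mark >> bits;
	return mark << bits;
}
```

For example, a value held in the low byte of one mark can be moved into the second byte of another with a left shift of 8, and recovered later with a right shift of 8.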

include/uapi/linux/netfilter_bridge/ebt_ip.h

Lines changed: 12 additions & 3 deletions
@@ -24,8 +24,10 @@
 #define EBT_IP_PROTO 0x08
 #define EBT_IP_SPORT 0x10
 #define EBT_IP_DPORT 0x20
+#define EBT_IP_ICMP 0x40
+#define EBT_IP_IGMP 0x80
 #define EBT_IP_MASK (EBT_IP_SOURCE | EBT_IP_DEST | EBT_IP_TOS | EBT_IP_PROTO |\
-	EBT_IP_SPORT | EBT_IP_DPORT )
+	EBT_IP_SPORT | EBT_IP_DPORT | EBT_IP_ICMP | EBT_IP_IGMP)
 #define EBT_IP_MATCH "ip"
 
 /* the same values are used for the invflags */
@@ -38,8 +40,15 @@ struct ebt_ip_info {
 	__u8 protocol;
 	__u8 bitmask;
 	__u8 invflags;
-	__u16 sport[2];
-	__u16 dport[2];
+	union {
+		__u16 sport[2];
+		__u8 icmp_type[2];
+		__u8 igmp_type[2];
+	};
+	union {
+		__u16 dport[2];
+		__u8 icmp_code[2];
+	};
 };
 
 #endif
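
Wrapping `sport` and `dport` in anonymous unions lets the ICMP and IGMP fields reuse the storage of the port ranges, so the structure shared with userspace keeps its size and field offsets. The sketch below checks that property on a cut-down, hypothetical copy of the structure that contains only the members visible in this hunk (the real `struct ebt_ip_info` has further members before `protocol`). Treating the two-element arrays as an inclusive [min, max] range is an assumption made by analogy with port ranges; the matching code is not shown in this diff. Anonymous unions require C11 or a compiler extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down layout before the change. */
struct demo_ip_info_old {
	uint8_t protocol;
	uint8_t bitmask;
	uint8_t invflags;
	uint16_t sport[2];
	uint16_t dport[2];
};

/* Hypothetical cut-down layout after the change. */
struct demo_ip_info_new {
	uint8_t protocol;
	uint8_t bitmask;
	uint8_t invflags;
	union {
		uint16_t sport[2];
		uint8_t icmp_type[2];
		uint8_t igmp_type[2];
	};
	union {
		uint16_t dport[2];
		uint8_t icmp_code[2];
	};
};

/* True if the unions left size and port offsets untouched. */
static bool demo_layout_unchanged(void)
{
	return sizeof(struct demo_ip_info_old) ==
		       sizeof(struct demo_ip_info_new) &&
	       offsetof(struct demo_ip_info_old, sport) ==
		       offsetof(struct demo_ip_info_new, sport) &&
	       offsetof(struct demo_ip_info_old, dport) ==
		       offsetof(struct demo_ip_info_new, dport);
}

/* Assumed semantics: icmp_type[0]..icmp_type[1] is an inclusive range. */
static bool demo_icmp_type_in_range(const struct demo_ip_info_new *info,
				    uint8_t type)
{
	assert(info != NULL);
	return type >= info->icmp_type[0] && type <= info->icmp_type[1];
}
```

Because the ICMP fields alias the port fields, a rule can meaningfully use one interpretation or the other, which is why the diff also adds separate `EBT_IP_ICMP` and `EBT_IP_IGMP` bits to the bitmask.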
