Commit 7d38484
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains a second batch of Netfilter updates for
your net-next tree. This includes a rework of the core hook
infrastructure that improves Netfilter performance by ~15% according to
synthetic benchmarks. Then, a large batch with ipset updates, including
a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
couple of assorted updates.

Regarding the core hook infrastructure rework to improve performance,
using this simple drop-all-packets ruleset from ingress:

        nft add table netdev x
        nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
        nft add rule netdev x y drop

And generating traffic through Jesper Brouer's
samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using the
-i option, perf report shows nf_tables calls in its top 10:

        17.30%  kpktgend_0   [nf_tables]           [k] nft_do_chain
        15.75%  kpktgend_0   [kernel.vmlinux]      [k] __netif_receive_skb_core
        10.39%  kpktgend_0   [nf_tables_netdev]    [k] nft_do_chain_netdev

I'm measuring here an improvement of ~15% in performance with this
patchset, so we got +2.5Mpps more. I have used my old laptop: Intel(R)
Core(TM) i5-3320M CPU @ 2.60GHz, 4 cores.

This rework contains, more specifically, in strict order, these patches:

1) Remove compile-time debugging from the core.

2) Remove obsolete comments that predate the rcu era. These days it is
   well known that a Netfilter hook always runs under rcu_read_lock().

3) Remove threshold handling; this is only used by br_netfilter. We
   already have specific code to handle this from br_netfilter, so
   remove this code from the core path.

4) Deprecate NF_STOP, as this is only used by br_netfilter.

5) Place the nf_hook_state pointer into the xt_action_param structure,
   so this structure fits into one single cacheline according to
   pahole. This also implicitly affects nftables, since it also relies
   on the xt_action_param structure.

6) Move state->hook_entries into the nf_queue entry. The hook_entries
   pointer is only required by nf_queue(), so we can store this in the
   queue entry instead.

7) Use a switch () statement to handle verdict cases.

8) Remove the hook_entries field from the nf_hook_state structure; this
   is only required by nf_queue, so store it in the nf_queue_entry
   structure.

9) Merge nf_iterate() into nf_hook_slow(), which results in a much
   simpler and more readable function.

10) Handle NF_REPEAT away from the core; so far the only client is
    nf_conntrack_in(), and we can restart the packet processing using a
    simple goto to jump back there when the TCP protocol requires it.
    This update required a second pass to fix fallout; fix from Arnd
    Bergmann.

11) Set a random seed from nft_hash when no seed is specified from
    userspace.

12) Simplify nf_tables expression registration, in a much smarter way
    that saves lots of boilerplate code, by Liping Zhang.

13) Simplify layer 4 protocol conntrack tracker registration, from
    Davide Caratti.

14) Add the missing CONFIG_NF_SOCKET_IPV4 dependency for
    udp4_lib_lookup, due to the recent generalization of the socket
    infrastructure, from Arnd Bergmann.

15) Then, the ipset batch from Jozsef; he describes it as follows:

   * Cleanup: Remove extra whitespaces in ip_set.h
   * Cleanup: Mark some of the helpers' arguments as const in ip_set.h
   * Cleanup: Group counter helper functions together in ip_set.h
   * struct ip_set_skbinfo is introduced instead of open-coded fields
     in the skbinfo get/init helper functions.
   * Use kmalloc() in the comment extension helper instead of kzalloc()
     because it is unnecessary to zero out the area just before
     explicit initialization.
   * Cleanup: Split extensions into separate files.
   * Cleanup: Separate memsize calculation code into a dedicated
     function.
   * Cleanup: Group ip_set_put_extensions() and ip_set_get_extensions()
     together.
   * Add element count to hash headers, by Eric B Munson.
   * Add element count to all set types' headers for uniform output
     across all set types.
   * Count non-static extension memory into the memsize calculation
     for userspace.
   * Cleanup: Remove redundant mtype_expire() arguments, because they
     can be obtained from other parameters.
   * Cleanup: Simplify mtype_expire() for hash types by removing one
     level of indentation.
   * Make NLEN a compile-time constant for hash types.
   * Make sure element data size is a multiple of u32 for the hash set
     types.
   * Optimize the hash creation routine: exit as early as possible.
   * Make struct htype per ipset family so the nets array becomes fixed
     size and thus simplifies the struct htype allocation.
   * Collapse same-condition bodies into a single one.
   * Fix reported memory size for hash:* types; the base hash bucket
     structure was not taken into account.
   * hash:ipmac type support added to ipset, by Tomasz Chilinski.
   * Use setup_timer() and mod_timer() instead of init_timer(), by
     Muhammad Falak R Wani, individually for the set type families.

16) Remove the useless connlabel field in struct netns_ct, patch from
    Florian Westphal.

17) xt_find_table_lock() doesn't return ERR_PTR() anymore, so simplify
    the {ip,ip6,arp}tables code that uses this.
====================

Signed-off-by: David S. Miller <[email protected]>
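The switch-based verdict handling (item 7) and the merge of nf_iterate() into nf_hook_slow() (item 9) can be sketched in plain user-space C. Everything below is illustrative: the names, the simplified verdict values, and the two sample hooks are stand-ins, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified verdict names mirroring the kernel's; values illustrative */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2, NF_QUEUE = 3 };

struct pkt { int mark; };

/* One hook entry in a singly linked list, as in the reworked core */
struct hook_entry {
	unsigned int (*hook)(struct pkt *p);
	struct hook_entry *next;
};

/* Sketch of nf_hook_slow() after nf_iterate() was folded in: one loop,
 * one switch over the verdicts. Returns 1 when every hook accepted
 * (the caller then invokes okfn), -1 on drop, 0 when the packet was
 * queued or stolen. */
static int hook_slow(struct pkt *p, struct hook_entry *entry)
{
	for (; entry; entry = entry->next) {
		switch (entry->hook(p)) {
		case NF_ACCEPT:
			break;		/* on to the next hook */
		case NF_DROP:
			return -1;	/* caller frees the packet */
		case NF_QUEUE:
		case NF_STOLEN:
			return 0;	/* packet is no longer ours */
		}
	}
	return 1;
}

/* Sample hooks for demonstration only */
static unsigned int count_hook(struct pkt *p) { p->mark++; return NF_ACCEPT; }
static unsigned int drop_all(struct pkt *p) { (void)p; return NF_DROP; }
```

The point of the rework is visible even in the sketch: iteration and verdict handling live in one function, with no threshold comparison on the path.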
2 parents: 8d41932 + eb1a6bd

File tree

123 files changed: +1351 additions, -1171 deletions


include/linux/netfilter.h

Lines changed: 16 additions & 42 deletions
@@ -49,13 +49,11 @@ struct sock;
 
 struct nf_hook_state {
 	unsigned int hook;
-	int thresh;
 	u_int8_t pf;
 	struct net_device *in;
 	struct net_device *out;
 	struct sock *sk;
 	struct net *net;
-	struct nf_hook_entry __rcu *hook_entries;
 	int (*okfn)(struct net *, struct sock *, struct sk_buff *);
 };
 
@@ -82,23 +80,20 @@ struct nf_hook_entry {
 };
 
 static inline void nf_hook_state_init(struct nf_hook_state *p,
-				      struct nf_hook_entry *hook_entry,
 				      unsigned int hook,
-				      int thresh, u_int8_t pf,
+				      u_int8_t pf,
 				      struct net_device *indev,
 				      struct net_device *outdev,
 				      struct sock *sk,
 				      struct net *net,
 				      int (*okfn)(struct net *, struct sock *, struct sk_buff *))
 {
 	p->hook = hook;
-	p->thresh = thresh;
 	p->pf = pf;
 	p->in = indev;
 	p->out = outdev;
 	p->sk = sk;
 	p->net = net;
-	RCU_INIT_POINTER(p->hook_entries, hook_entry);
 	p->okfn = okfn;
 }
 
@@ -152,23 +147,20 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg);
 extern struct static_key nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS];
 #endif
 
-int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state);
+int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
+		 struct nf_hook_entry *entry);
 
 /**
- *	nf_hook_thresh - call a netfilter hook
+ *	nf_hook - call a netfilter hook
  *
  *	Returns 1 if the hook has allowed the packet to pass.  The function
  *	okfn must be invoked by the caller in this case.  Any other return
  *	value indicates the packet has been consumed by the hook.
  */
-static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
-				 struct net *net,
-				 struct sock *sk,
-				 struct sk_buff *skb,
-				 struct net_device *indev,
-				 struct net_device *outdev,
-				 int (*okfn)(struct net *, struct sock *, struct sk_buff *),
-				 int thresh)
+static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
+			  struct sock *sk, struct sk_buff *skb,
+			  struct net_device *indev, struct net_device *outdev,
+			  int (*okfn)(struct net *, struct sock *, struct sk_buff *))
 {
 	struct nf_hook_entry *hook_head;
 	int ret = 1;
@@ -185,24 +177,16 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
 	if (hook_head) {
 		struct nf_hook_state state;
 
-		nf_hook_state_init(&state, hook_head, hook, thresh,
-				   pf, indev, outdev, sk, net, okfn);
+		nf_hook_state_init(&state, hook, pf, indev, outdev,
+				   sk, net, okfn);
 
-		ret = nf_hook_slow(skb, &state);
+		ret = nf_hook_slow(skb, &state, hook_head);
 	}
 	rcu_read_unlock();
 
 	return ret;
 }
 
-static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
-			  struct sock *sk, struct sk_buff *skb,
-			  struct net_device *indev, struct net_device *outdev,
-			  int (*okfn)(struct net *, struct sock *, struct sk_buff *))
-{
-	return nf_hook_thresh(pf, hook, net, sk, skb, indev, outdev, okfn, INT_MIN);
-}
-
 /* Activate hook; either okfn or kfree_skb called, unless a hook
    returns NF_STOLEN (in which case, it's up to the hook to deal with
    the consequences).
@@ -220,19 +204,6 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
    coders :)
 */
 
-static inline int
-NF_HOOK_THRESH(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
-	       struct sk_buff *skb, struct net_device *in,
-	       struct net_device *out,
-	       int (*okfn)(struct net *, struct sock *, struct sk_buff *),
-	       int thresh)
-{
-	int ret = nf_hook_thresh(pf, hook, net, sk, skb, in, out, okfn, thresh);
-	if (ret == 1)
-		ret = okfn(net, sk, skb);
-	return ret;
-}
-
 static inline int
 NF_HOOK_COND(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 	     struct sk_buff *skb, struct net_device *in, struct net_device *out,
@@ -242,7 +213,7 @@ NF_HOOK_COND(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 	int ret;
 
 	if (!cond ||
-	    ((ret = nf_hook_thresh(pf, hook, net, sk, skb, in, out, okfn, INT_MIN)) == 1))
+	    ((ret = nf_hook(pf, hook, net, sk, skb, in, out, okfn)) == 1))
 		ret = okfn(net, sk, skb);
 	return ret;
 }
@@ -252,7 +223,10 @@ NF_HOOK(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk, struct
 	struct net_device *in, struct net_device *out,
 	int (*okfn)(struct net *, struct sock *, struct sk_buff *))
 {
-	return NF_HOOK_THRESH(pf, hook, net, sk, skb, in, out, okfn, INT_MIN);
+	int ret = nf_hook(pf, hook, net, sk, skb, in, out, okfn);
+	if (ret == 1)
+		ret = okfn(net, sk, skb);
+	return ret;
 }
 
 /* Call setsockopt() */
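With NF_HOOK_THRESH gone, NF_HOOK() itself carries the "call okfn() only on verdict 1" logic shown in the last hunk. A user-space sketch of that control flow follows; all names here are stand-ins for the demo, not the kernel's types.

```c
#include <assert.h>

struct pkt { int delivered; };

/* okfn(): the "continue processing" callback the caller supplies,
 * playing the role of e.g. ip_rcv_finish() */
static int okfn(struct pkt *p) { p->delivered = 1; return 0; }

/* Stand-in for nf_hook(); the verdict is injected for the demo:
 * 1 = hooks passed the packet, anything else = consumed */
static int nf_hook_stub(struct pkt *p, int verdict)
{
	(void)p;
	return verdict;
}

/* The NF_HOOK() pattern: invoke okfn() only when the hooks pass */
static int nf_hook_then_okfn(struct pkt *p, int verdict)
{
	int ret = nf_hook_stub(p, verdict);
	if (ret == 1)
		ret = okfn(p);
	return ret;
}
```

The same shape appears twice in the diff: once inlined into NF_HOOK() and once in NF_HOOK_COND() behind the cond check.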

include/linux/netfilter/ipset/ip_set.h

Lines changed: 21 additions & 115 deletions
@@ -79,10 +79,12 @@ enum ip_set_ext_id {
 	IPSET_EXT_ID_MAX,
 };
 
+struct ip_set;
+
 /* Extension type */
 struct ip_set_ext_type {
 	/* Destroy extension private data (can be NULL) */
-	void (*destroy)(void *ext);
+	void (*destroy)(struct ip_set *set, void *ext);
 	enum ip_set_extension type;
 	enum ipset_cadt_flags flag;
 	/* Size and minimal alignment */
@@ -92,17 +94,6 @@ struct ip_set_ext_type {
 
 extern const struct ip_set_ext_type ip_set_extensions[];
 
-struct ip_set_ext {
-	u64 packets;
-	u64 bytes;
-	u32 timeout;
-	u32 skbmark;
-	u32 skbmarkmask;
-	u32 skbprio;
-	u16 skbqueue;
-	char *comment;
-};
-
 struct ip_set_counter {
 	atomic64_t bytes;
 	atomic64_t packets;
@@ -122,6 +113,15 @@ struct ip_set_skbinfo {
 	u32 skbmarkmask;
 	u32 skbprio;
 	u16 skbqueue;
+	u16 __pad;
+};
+
+struct ip_set_ext {
+	struct ip_set_skbinfo skbinfo;
+	u64 packets;
+	u64 bytes;
+	char *comment;
+	u32 timeout;
 };
 
 struct ip_set;
@@ -252,6 +252,10 @@ struct ip_set {
 	u8 flags;
 	/* Default timeout value, if enabled */
 	u32 timeout;
+	/* Number of elements (vs timeout) */
+	u32 elements;
+	/* Size of the dynamic extensions (vs timeout) */
+	size_t ext_size;
 	/* Element data size */
 	size_t dsize;
 	/* Offsets to extensions in elements */
@@ -268,7 +272,7 @@ ip_set_ext_destroy(struct ip_set *set, void *data)
 	 */
 	if (SET_WITH_COMMENT(set))
 		ip_set_extensions[IPSET_EXT_ID_COMMENT].destroy(
-			ext_comment(data, set));
+			set, ext_comment(data, set));
 }
 
 static inline int
@@ -294,104 +298,6 @@ ip_set_put_flags(struct sk_buff *skb, struct ip_set *set)
 	return nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(cadt_flags));
 }
 
-static inline void
-ip_set_add_bytes(u64 bytes, struct ip_set_counter *counter)
-{
-	atomic64_add((long long)bytes, &(counter)->bytes);
-}
-
-static inline void
-ip_set_add_packets(u64 packets, struct ip_set_counter *counter)
-{
-	atomic64_add((long long)packets, &(counter)->packets);
-}
-
-static inline u64
-ip_set_get_bytes(const struct ip_set_counter *counter)
-{
-	return (u64)atomic64_read(&(counter)->bytes);
-}
-
-static inline u64
-ip_set_get_packets(const struct ip_set_counter *counter)
-{
-	return (u64)atomic64_read(&(counter)->packets);
-}
-
-static inline void
-ip_set_update_counter(struct ip_set_counter *counter,
-		      const struct ip_set_ext *ext,
-		      struct ip_set_ext *mext, u32 flags)
-{
-	if (ext->packets != ULLONG_MAX &&
-	    !(flags & IPSET_FLAG_SKIP_COUNTER_UPDATE)) {
-		ip_set_add_bytes(ext->bytes, counter);
-		ip_set_add_packets(ext->packets, counter);
-	}
-	if (flags & IPSET_FLAG_MATCH_COUNTERS) {
-		mext->packets = ip_set_get_packets(counter);
-		mext->bytes = ip_set_get_bytes(counter);
-	}
-}
-
-static inline void
-ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
-		   const struct ip_set_ext *ext,
-		   struct ip_set_ext *mext, u32 flags)
-{
-	mext->skbmark = skbinfo->skbmark;
-	mext->skbmarkmask = skbinfo->skbmarkmask;
-	mext->skbprio = skbinfo->skbprio;
-	mext->skbqueue = skbinfo->skbqueue;
-}
-static inline bool
-ip_set_put_skbinfo(struct sk_buff *skb, struct ip_set_skbinfo *skbinfo)
-{
-	/* Send nonzero parameters only */
-	return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
-		nla_put_net64(skb, IPSET_ATTR_SKBMARK,
-			      cpu_to_be64((u64)skbinfo->skbmark << 32 |
-					  skbinfo->skbmarkmask),
-			      IPSET_ATTR_PAD)) ||
-	       (skbinfo->skbprio &&
-		nla_put_net32(skb, IPSET_ATTR_SKBPRIO,
-			      cpu_to_be32(skbinfo->skbprio))) ||
-	       (skbinfo->skbqueue &&
-		nla_put_net16(skb, IPSET_ATTR_SKBQUEUE,
-			      cpu_to_be16(skbinfo->skbqueue)));
-}
-
-static inline void
-ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
-		    const struct ip_set_ext *ext)
-{
-	skbinfo->skbmark = ext->skbmark;
-	skbinfo->skbmarkmask = ext->skbmarkmask;
-	skbinfo->skbprio = ext->skbprio;
-	skbinfo->skbqueue = ext->skbqueue;
-}
-
-static inline bool
-ip_set_put_counter(struct sk_buff *skb, struct ip_set_counter *counter)
-{
-	return nla_put_net64(skb, IPSET_ATTR_BYTES,
-			     cpu_to_be64(ip_set_get_bytes(counter)),
-			     IPSET_ATTR_PAD) ||
-	       nla_put_net64(skb, IPSET_ATTR_PACKETS,
-			     cpu_to_be64(ip_set_get_packets(counter)),
-			     IPSET_ATTR_PAD);
-}
-
-static inline void
-ip_set_init_counter(struct ip_set_counter *counter,
-		    const struct ip_set_ext *ext)
-{
-	if (ext->bytes != ULLONG_MAX)
-		atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
-	if (ext->packets != ULLONG_MAX)
-		atomic64_set(&(counter)->packets, (long long)(ext->packets));
-}
-
 /* Netlink CB args */
 enum {
 	IPSET_CB_NET = 0,	/* net namespace */
@@ -431,6 +337,8 @@ extern size_t ip_set_elem_len(struct ip_set *set, struct nlattr *tb[],
 			      size_t len, size_t align);
 extern int ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[],
 				 struct ip_set_ext *ext);
+extern int ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
+				 const void *e, bool active);
 
 static inline int
 ip_set_get_hostipaddr4(struct nlattr *nla, u32 *ipaddr)
@@ -546,10 +454,8 @@ bitmap_bytes(u32 a, u32 b)
 
 #include <linux/netfilter/ipset/ip_set_timeout.h>
 #include <linux/netfilter/ipset/ip_set_comment.h>
-
-int
-ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
-		      const void *e, bool active);
+#include <linux/netfilter/ipset/ip_set_counter.h>
+#include <linux/netfilter/ipset/ip_set_skbinfo.h>
 
 #define IP_SET_INIT_KEXT(skb, opt, set)			\
 	{ .bytes = (skb)->len, .packets = 1,		\
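The new elements and ext_size members in struct ip_set back the "count non-static extension memory into the memsize calculation" and "fix reported memory size" items from the ipset batch. Below is a hedged user-space sketch of the kind of accounting these fields make possible; only the field names follow the diff, while the struct, the memsize() helper, and the base parameter are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down illustration of the bookkeeping fields from the diff */
struct set_sketch {
	uint32_t elements;	/* number of elements in the set */
	size_t ext_size;	/* bytes held by dynamic extensions */
	size_t dsize;		/* fixed per-element data size */
};

/* Hypothetical memsize reporter: with element count and extension size
 * tracked up front, the reported figure needs no per-element walk.
 * base stands for the static part (e.g. hash bucket structures, which
 * this batch now also counts for hash:* types). */
static size_t memsize(const struct set_sketch *set, size_t base)
{
	return base + (size_t)set->elements * set->dsize + set->ext_size;
}
```

The actual kernel calculation differs per set type; the sketch only shows why keeping elements and ext_size current makes the dump path cheap.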

include/linux/netfilter/ipset/ip_set_bitmap.h

Lines changed: 1 addition & 1 deletion
@@ -6,8 +6,8 @@
 #define IPSET_BITMAP_MAX_RANGE	0x0000FFFF
 
 enum {
+	IPSET_ADD_STORE_PLAIN_TIMEOUT = -1,
 	IPSET_ADD_FAILED = 1,
-	IPSET_ADD_STORE_PLAIN_TIMEOUT,
 	IPSET_ADD_START_STORED_TIMEOUT,
 };
 
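The one-line reorder above changes the constants' values, which is easy to miss in diff form. A standalone copy of the new enum makes the resulting values explicit; the before/after numbers follow from C's enum auto-increment rules.

```c
#include <assert.h>

/* The enum as it reads after the change in ip_set_bitmap.h.
 * Before: IPSET_ADD_FAILED = 1, STORE_PLAIN_TIMEOUT = 2,
 *         START_STORED_TIMEOUT = 3.
 * After:  STORE_PLAIN_TIMEOUT moves to -1, and the value that used to
 *         be 3 becomes 2. */
enum {
	IPSET_ADD_STORE_PLAIN_TIMEOUT = -1,
	IPSET_ADD_FAILED = 1,
	IPSET_ADD_START_STORED_TIMEOUT,	/* auto-assigned: 2 */
};
```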
include/linux/netfilter/ipset/ip_set_comment.h

Lines changed: 7 additions & 4 deletions
@@ -20,30 +20,32 @@ ip_set_comment_uget(struct nlattr *tb)
  * The kadt functions don't use the comment extensions in any way.
  */
 static inline void
-ip_set_init_comment(struct ip_set_comment *comment,
+ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
 		    const struct ip_set_ext *ext)
 {
 	struct ip_set_comment_rcu *c = rcu_dereference_protected(comment->c, 1);
 	size_t len = ext->comment ? strlen(ext->comment) : 0;
 
 	if (unlikely(c)) {
+		set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
 		kfree_rcu(c, rcu);
 		rcu_assign_pointer(comment->c, NULL);
 	}
 	if (!len)
 		return;
 	if (unlikely(len > IPSET_MAX_COMMENT_SIZE))
 		len = IPSET_MAX_COMMENT_SIZE;
-	c = kzalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
+	c = kmalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
 	if (unlikely(!c))
 		return;
 	strlcpy(c->str, ext->comment, len + 1);
+	set->ext_size += sizeof(*c) + strlen(c->str) + 1;
 	rcu_assign_pointer(comment->c, c);
 }
 
 /* Used only when dumping a set, protected by rcu_read_lock_bh() */
 static inline int
-ip_set_put_comment(struct sk_buff *skb, struct ip_set_comment *comment)
+ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
 {
 	struct ip_set_comment_rcu *c = rcu_dereference_bh(comment->c);
 
@@ -58,13 +60,14 @@ ip_set_put_comment(struct sk_buff *skb, struct ip_set_comment *comment)
  * of the set data anymore.
  */
 static inline void
-ip_set_comment_free(struct ip_set_comment *comment)
+ip_set_comment_free(struct ip_set *set, struct ip_set_comment *comment)
 {
 	struct ip_set_comment_rcu *c;
 
 	c = rcu_dereference_protected(comment->c, 1);
 	if (unlikely(!c))
 		return;
+	set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
 	kfree_rcu(c, rcu);
 	rcu_assign_pointer(comment->c, NULL);
 }