Commit 3948b05

Eric Dumazet authored and kuba-moo committed
net: introduce a config option to tweak MAX_SKB_FRAGS
Currently, MAX_SKB_FRAGS value is 17.

For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
attempts order-3 allocations, stuffing 32768 bytes per frag.

But with zero copy, we use order-0 pages.

For BIG TCP to show its full potential, we add a config option
to be able to fit up to 45 segments per skb.

This is also needed for BIG TCP rx zerocopy, as zerocopy currently
does not support skbs with frag list.

We have used MAX_SKB_FRAGS=45 value for years at Google before
we deployed 4K MTU, with no adverse effect, other than
a recent issue in mlx4, fixed in commit 26782aa
("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS")

Back then, goal was to be able to receive full size (64KB) GRO
packets without the frag_list overhead.

Note that /proc/sys/net/core/max_skb_frags can also be used to limit
the number of fragments TCP can use in tx packets.

By default we keep the old/legacy value of 17 until we get
more coverage for the updated values.

Sizes of struct skb_shared_info on 64bit arches

MAX_SKB_FRAGS | sizeof(struct skb_shared_info):
==============================================
         17     320
         21     320+64  = 384
         25     320+128 = 448
         29     320+192 = 512
         33     320+256 = 576
         37     320+320 = 640
         41     320+384 = 704
         45     320+448 = 768

This inflation might cause problems for drivers assuming they could pack
both the incoming packet (for MTU=1500) and skb_shared_info in half a page,
using build_skb().

v3: fix build error when CONFIG_NET=n
v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long"

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: Jason Xing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
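The size table in the commit message grows by 64 bytes for every 4 extra fragments, i.e. 16 bytes per fragment on 64-bit arches. The following is a minimal user-space sketch of that arithmetic, not kernel code. The constants are copied from the table rather than read from kernel headers, and the helper names `shinfo_size_for` and `frag_capacity` are hypothetical.

```c
#include <stddef.h>

/* Illustrative constants taken from the commit message's table
 * (64-bit arches). They are not derived from kernel headers. */
enum {
	LEGACY_MAX_SKB_FRAGS = 17,	/* default value kept by this commit */
	LEGACY_SHINFO_SIZE   = 320,	/* table entry for 17 fragments */
	BYTES_PER_EXTRA_FRAG = 16,	/* table grows 64 bytes per 4 fragments */
};

/* Hypothetical helper: size the table predicts for a given setting.
 * Only meaningful for the Kconfig range 17..45. */
static size_t shinfo_size_for(unsigned int max_frags)
{
	return LEGACY_SHINFO_SIZE +
	       (size_t)(max_frags - LEGACY_MAX_SKB_FRAGS) * BYTES_PER_EXTRA_FRAG;
}

/* Hypothetical helper: bytes that fit in the frags array when every
 * fragment holds frag_bytes. Use 32768 for the order-3 case from the
 * message, or 4096 for order-0 pages (assuming 4 KiB pages). */
static size_t frag_capacity(unsigned int max_frags, size_t frag_bytes)
{
	return (size_t)max_frags * frag_bytes;
}
```

Under the 4 KiB page assumption, `frag_capacity(17, 4096)` is 69632 bytes and `frag_capacity(45, 4096)` is 184320 bytes, which illustrates why order-0 zero-copy traffic benefits from the larger limit while order-3 sendmsg() traffic does not need it.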
1 parent e5b4248 commit 3948b05

4 files changed, +21 -15 lines changed

drivers/scsi/cxgbi/libcxgbi.c

Lines changed: 2 additions & 2 deletions
@@ -2314,9 +2314,9 @@ static int cxgbi_sock_tx_queue_up(struct cxgbi_sock *csk, struct sk_buff *skb)
 		frags++;

 	if (frags >= SKB_WR_LIST_SIZE) {
-		pr_err("csk 0x%p, frags %u, %u,%u >%lu.\n",
+		pr_err("csk 0x%p, frags %u, %u,%u >%u.\n",
 		       csk, skb_shinfo(skb)->nr_frags, skb->len,
-		       skb->data_len, SKB_WR_LIST_SIZE);
+		       skb->data_len, (unsigned int)SKB_WR_LIST_SIZE);
 		return -EINVAL;
 	}

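The hunk above swaps `%lu` for `%u` and adds an explicit cast, which matches the "v2" note about code that assumed MAX_SKB_FRAGS was "unsigned long". As a rough user-space illustration (using `snprintf` in place of `pr_err`, with hypothetical macro names), casting to `unsigned int` keeps the argument and the format specifier in agreement whichever way the limit is defined:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the limit before and after the commit. */
#define LIMIT_AS_ULONG 17UL	/* old style: unsigned long constant */
#define LIMIT_AS_INT   17	/* new style: plain int from Kconfig */

/* Format a limit the way the patched calls do: cast, then print with %u. */
#define FORMAT_LIMIT(buf, len, limit) \
	snprintf((buf), (len), "frags >%u.", (unsigned int)(limit))

/* Returns 1 when both definitions produce the same text. */
static int formats_match(void)
{
	char a[32], b[32];

	FORMAT_LIMIT(a, sizeof(a), LIMIT_AS_ULONG);
	FORMAT_LIMIT(b, sizeof(b), LIMIT_AS_INT);
	return strcmp(a, b) == 0 && strcmp(a, "frags >17.") == 0;
}
```

Without the cast, passing an `int` where `%lu` is expected (or an `unsigned long` where `%u` is expected) is a format mismatch that compilers typically flag.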

include/linux/skbuff.h

Lines changed: 5 additions & 11 deletions
@@ -345,18 +345,12 @@ struct sk_buff_head {

 struct sk_buff;

-/* To allow 64K frame to be packed as single skb without frag_list we
- * require 64K/PAGE_SIZE pages plus 1 additional page to allow for
- * buffers which do not start on a page boundary.
- *
- * Since GRO uses frags we allocate at least 16 regardless of page
- * size.
- */
-#if (65536/PAGE_SIZE + 1) < 16
-#define MAX_SKB_FRAGS 16UL
-#else
-#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
+#ifndef CONFIG_MAX_SKB_FRAGS
+# define CONFIG_MAX_SKB_FRAGS 17
 #endif
+
+#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
+
 extern int sysctl_max_skb_frags;

 /* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to
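To make the effect of this hunk concrete, here is a small user-space sketch that evaluates the removed preprocessor rule for a given page size and compares it with the new fallback. The function names are hypothetical, and the page size is passed as a parameter purely for illustration (in the kernel it is the compile-time PAGE_SIZE).

```c
/* Old rule from the removed lines, parameterised on page size:
 * 64K/PAGE_SIZE pages plus one, but never fewer than 16. */
static unsigned long old_max_skb_frags(unsigned long page_size)
{
	unsigned long n = 65536 / page_size + 1;

	return n < 16 ? 16 : n;
}

/* New rule from the added lines: take the Kconfig value, or 17 when
 * CONFIG_MAX_SKB_FRAGS is not defined at all. */
#ifndef CONFIG_MAX_SKB_FRAGS
# define CONFIG_MAX_SKB_FRAGS 17
#endif

#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS

static int new_max_skb_frags(void)
{
	return MAX_SKB_FRAGS;
}
```

With 4 KiB pages the old rule yields 17, which matches the new default. With larger pages (for example 64 KiB) the old rule yields 16, so under this model the new default of 17 no longer depends on page size. The result type also changes from `unsigned long` to `int`, which is what the format-specifier fixes elsewhere in the commit account for.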

net/Kconfig

Lines changed: 12 additions & 0 deletions
@@ -251,6 +251,18 @@ config PCPU_DEV_REFCNT
 	  network device refcount are using per cpu variables if this option is set.
 	  This can be forced to N to detect underflows (with a performance drop).

+config MAX_SKB_FRAGS
+	int "Maximum number of fragments per skb_shared_info"
+	range 17 45
+	default 17
+	help
+	  Having more fragments per skb_shared_info can help GRO efficiency.
+	  This helps BIG TCP workloads, but might expose bugs in some
+	  legacy drivers.
+	  This also increases memory overhead of small packets,
+	  and in drivers using build_skb().
+	  If unsure, say 17.
+
 config RPS
 	bool
 	depends on SMP && SYSFS
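The help text's warning about build_skb() refers to the concern in the commit message: a driver that packs a received MTU=1500 frame and skb_shared_info into half a page may run out of room as the structure grows. The sketch below is a simplified budget check, not any driver's actual layout. The 64-byte headroom, the 64-byte alignment, the 1514-byte frame (1500 plus an Ethernet header) and the 4 KiB page used in the usage note are all assumptions, and the helper names are hypothetical.

```c
#include <stddef.h>

/* Round n up to a multiple of a (a must be non-zero). */
static size_t align_up(size_t n, size_t a)
{
	return (n + a - 1) / a * a;
}

/* Hypothetical check: do headroom + frame and the shared info, each
 * rounded up to an assumed 64-byte alignment, fit in half a page? */
static int fits_in_half_page(size_t headroom, size_t frame_len,
			     size_t shinfo_size, size_t page_size)
{
	size_t need = align_up(headroom + frame_len, 64) +
		      align_up(shinfo_size, 64);

	return need <= page_size / 2;
}
```

Under these assumptions, `fits_in_half_page(64, 1514, 320, 4096)` holds for the default of 17 fragments, while the 768-byte structure for 45 fragments does not fit. Real drivers use different headroom and alignment, so the exact crossover point varies.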

net/packet/af_packet.c

Lines changed: 2 additions & 2 deletions
@@ -2622,8 +2622,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		nr_frags = skb_shinfo(skb)->nr_frags;

 		if (unlikely(nr_frags >= MAX_SKB_FRAGS)) {
-			pr_err("Packet exceed the number of skb frags(%lu)\n",
-			       MAX_SKB_FRAGS);
+			pr_err("Packet exceed the number of skb frags(%u)\n",
+			       (unsigned int)MAX_SKB_FRAGS);
 			return -EFAULT;
 		}

