Commit 23f347e

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Fix inaccuracies in network driver interface documentation, from
    Ben Hutchings.

 2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.

 3) Compile warning, locking, and refcounting fixes in netfilter's
    xt_CT, from Pablo Neira Ayuso.

 4) phonet sendmsg needs to validate user length just like any other
    datagram protocol, fix from Sasha Levin.

 5) IPv6 multicast code uses wrong loop index, from RongQing Li.

 6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
    and Yuval Mintz.

 7) mlx4 erroneously allocates 4 pages at a time, regardless of page
    size, fix from Thadeu Lima de Souza Cascardo.

 8) SCTP socket option wasn't extended in a backwards compatible way,
    fix from Thomas Graf.

 9) Add missing address change event emissions to bonding, from Shlomo
    Pongratz.

10) /proc/net/dev regressed because it uses a private offset to track
    where we are in the hash table, but this doesn't track the offset
    pullback that the seq_file code does, resulting in some entries
    being missed in large dumps. Fix from Eric Dumazet.

11) do_tcp_sendpage() unloads the send queue way too fast, because it
    invokes tcp_push() when it shouldn't. Let the natural sequence
    generated by the splice paths, and the associated MSG_MORE
    settings, guide the tcp_push() calls. Otherwise what goes out of
    TCP is spaghetti and doesn't batch effectively into GSO/TSO
    clusters. From Eric Dumazet.

12) Once we put a SKB into either the netlink receiver's queue or a
    socket error queue, it can be consumed and freed up, therefore we
    cannot touch it after queueing it like that. Fixes from Eric
    Dumazet.

13) PPP has this annoying behavior in that for every transmit call it
    immediately stops the TX queue, then calls down into the next
    layer to transmit the PPP frame. But if that next layer can take
    it immediately, it just un-stops the TX queue right before
    returning from the transmit method.

    Besides being useless work, it makes several facilities unusable,
    in particular things like the equalizers. Well-behaved devices
    should only stop the TX queue when they really are full, and in
    PPP's case when it gets backlogged to the downstream device.

    David Woodhouse therefore fixed PPP to not stop the TX queue
    until its downstream can't take data any more.

14) IFF_UNICAST_FLT got accidentally lost in some recent stmmac driver
    changes, re-add. From Marc Kleine-Budde.

15) Fix link flaps in ixgbe, from Eric W. Multanen.

16) Descriptor writeback fixes in e1000e from Matthew Vick.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  net: fix a race in sock_queue_err_skb()
  netlink: fix races after skb queueing
  doc, net: Update ndo_start_xmit return type and values
  doc, net: Remove instruction to set net_device::trans_start
  doc, net: Update netdev operation names
  doc, net: Update documentation of synchronisation for TX multiqueue
  doc, net: Remove obsolete reference to dev->poll
  ethtool: Remove exception to the requirement of holding RTNL lock
  MAINTAINERS: update for Marvell Ethernet drivers
  bonding: properly unset current_arp_slave on slave link up
  phonet: Check input from user before allocating
  tcp: tcp_sendpages() should call tcp_push() once
  ipv6: fix array index in ip6_mc_add_src()
  mlx4: allocate just enough pages instead of always 4 pages
  stmmac: re-add IFF_UNICAST_FLT for dwmac1000
  bnx2x: Clear MDC/MDIO warning message
  bnx2x: Fix BCM57711+BCM84823 link issue
  bnx2x: Clear BCM84833 LED after fan failure
  bnx2x: Fix BCM84833 PHY FW version presentation
  bnx2x: Fix link issue for BCM8727 boards.
  ...

2 parents 314489b + 110c433 commit 23f347e

38 files changed: +602 -376 lines changed

Documentation/networking/driver.txt

Lines changed: 15 additions & 16 deletions
@@ -2,16 +2,16 @@ Document about softnet driver issues
 
 Transmit path guidelines:
 
-1) The hard_start_xmit method must never return '1' under any
-   normal circumstances. It is considered a hard error unless
+1) The ndo_start_xmit method must not return NETDEV_TX_BUSY under
+   any normal circumstances. It is considered a hard error unless
    there is no way your device can tell ahead of time when it's
    transmit function will become busy.
 
    Instead it must maintain the queue properly. For example,
    for a driver implementing scatter-gather this means:
 
-	static int drv_hard_start_xmit(struct sk_buff *skb,
-				       struct net_device *dev)
+	static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
+					       struct net_device *dev)
 	{
 		struct drv *dp = netdev_priv(dev);
 
@@ -23,7 +23,7 @@ Transmit path guidelines:
 			unlock_tx(dp);
 			printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 			       dev->name);
-			return 1;
+			return NETDEV_TX_BUSY;
 		}
 
 		... queue packet to card ...
@@ -35,6 +35,7 @@ Transmit path guidelines:
 		...
 		unlock_tx(dp);
 		...
+		return NETDEV_TX_OK;
 	}
 
    And then at the end of your TX reclamation event handling:
@@ -58,24 +59,22 @@ Transmit path guidelines:
 	    TX_BUFFS_AVAIL(dp) > 0)
 		netif_wake_queue(dp->dev);
 
-2) Do not forget to update netdev->trans_start to jiffies after
-   each new tx packet is given to the hardware.
-
-3) A hard_start_xmit method must not modify the shared parts of a
+2) An ndo_start_xmit method must not modify the shared parts of a
    cloned SKB.
 
-4) Do not forget that once you return 0 from your hard_start_xmit
-   method, it is your driver's responsibility to free up the SKB
-   and in some finite amount of time.
+3) Do not forget that once you return NETDEV_TX_OK from your
+   ndo_start_xmit method, it is your driver's responsibility to free
+   up the SKB and in some finite amount of time.
 
    For example, this means that it is not allowed for your TX
    mitigation scheme to let TX packets "hang out" in the TX
    ring unreclaimed forever if no new TX packets are sent.
    This error can deadlock sockets waiting for send buffer room
    to be freed up.
 
-   If you return 1 from the hard_start_xmit method, you must not keep
-   any reference to that SKB and you must not attempt to free it up.
+   If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
+   must not keep any reference to that SKB and you must not attempt
+   to free it up.
 
 Probing guidelines:
 
@@ -85,10 +84,10 @@ Probing guidelines:
 
 Close/stop guidelines:
 
-1) After the dev->stop routine has been called, the hardware must
+1) After the ndo_stop routine has been called, the hardware must
    not receive or transmit any data. All in flight packets must
    be aborted. If necessary, poll or wait for completion of
    any reset commands.
 
-2) The dev->stop routine will be called by unregister_netdevice
+2) The ndo_stop routine will be called by unregister_netdevice
    if device is still UP.
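
The queue discipline that driver.txt describes (stop the queue when the ring fills, wake it on reclaim, and treat NETDEV_TX_BUSY as a bug indicator) can be modelled in user space. The following is only a minimal sketch under stated assumptions: `struct tx_ring`, `ring_init`, `ring_xmit`, `ring_reclaim` and the `TX_OK`/`TX_BUSY` values are hypothetical names invented for illustration, not kernel APIs, and there is no locking or real hardware involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum tx_status { TX_OK, TX_BUSY };

struct tx_ring {
	unsigned int capacity;  /* total descriptors in the ring */
	unsigned int in_flight; /* descriptors currently owned by the "hardware" */
	bool stopped;           /* stands in for the stopped state of the TX queue */
};

static void ring_init(struct tx_ring *r, unsigned int capacity)
{
	assert(capacity > 0);
	r->capacity = capacity;
	r->in_flight = 0;
	r->stopped = false;
}

/* Models the transmit method: accept a packet if there is room. */
static enum tx_status ring_xmit(struct tx_ring *r)
{
	if (r->in_flight == r->capacity) {
		/* Ring full while the caller still thought it could send:
		 * this is the "hard error" case from the guidelines. */
		r->stopped = true;
		return TX_BUSY;
	}

	r->in_flight++;

	/* Stop the queue as soon as the ring becomes full, so the
	 * next call never has to report TX_BUSY. */
	if (r->in_flight == r->capacity)
		r->stopped = true;

	return TX_OK;
}

/* Models TX reclamation: free completed descriptors, then wake the queue. */
static void ring_reclaim(struct tx_ring *r, unsigned int done)
{
	if (done > r->in_flight)
		done = r->in_flight;
	r->in_flight -= done;

	if (r->stopped && r->in_flight < r->capacity)
		r->stopped = false;
}
```

With a two-entry ring, two calls to `ring_xmit()` succeed and leave the queue stopped; a caller that respects the stopped flag never sees `TX_BUSY`, and one call to `ring_reclaim()` wakes the queue again.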

Documentation/networking/ip-sysctl.txt

Lines changed: 2 additions & 9 deletions
@@ -604,15 +604,8 @@ IP Variables:
 ip_local_port_range - 2 INTEGERS
 	Defines the local port range that is used by TCP and UDP to
 	choose the local port. The first number is the first, the
-	second the last local port number. Default value depends on
-	amount of memory available on the system:
-	> 128Mb 32768-61000
-	< 128Mb 1024-4999 or even less.
-	This number defines number of active connections, which this
-	system can issue simultaneously to systems not supporting
-	TCP extensions (timestamps). With tcp_tw_recycle enabled
-	(i.e. by default) range 1024-4999 is enough to issue up to
-	2000 connections per second to systems supporting timestamps.
+	second the last local port number. The default values are
+	32768 and 61000 respectively.
 
 ip_local_reserved_ports - list of comma separated ranges
 	Specify the ports which are reserved for known third-party
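
As an illustration of the "2 INTEGERS" format documented above, here is a small sketch that parses a first/last port pair from a string and computes how many ports the range covers. The helper names `parse_port_range` and `port_range_size` are hypothetical, and the sketch assumes the two numbers are separated by whitespace; it does not read the sysctl itself.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse "first last" as two unsigned integers.
 * Returns 0 on success, -1 if the text is malformed or the range is
 * not a valid, ordered pair of 16-bit port numbers.
 */
static int parse_port_range(const char *text,
			    unsigned int *first, unsigned int *last)
{
	unsigned int lo, hi;

	if (sscanf(text, "%u %u", &lo, &hi) != 2)
		return -1;
	if (lo > hi || hi > 65535)
		return -1;

	*first = lo;
	*last = hi;
	return 0;
}

/* Number of ports in the inclusive range [first, last]. */
static unsigned int port_range_size(unsigned int first, unsigned int last)
{
	return last - first + 1;
}
```

For the documented defaults of 32768 and 61000, the inclusive range holds 61000 - 32768 + 1 = 28233 ports.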

Documentation/networking/netdevices.txt

Lines changed: 12 additions & 13 deletions
@@ -47,26 +47,25 @@ packets is preferred.
 
 struct net_device synchronization rules
 =======================================
-dev->open:
+ndo_open:
 	Synchronization: rtnl_lock() semaphore.
 	Context: process
 
-dev->stop:
+ndo_stop:
 	Synchronization: rtnl_lock() semaphore.
 	Context: process
-	Note1: netif_running() is guaranteed false
-	Note2: dev->poll() is guaranteed to be stopped
+	Note: netif_running() is guaranteed false
 
-dev->do_ioctl:
+ndo_do_ioctl:
 	Synchronization: rtnl_lock() semaphore.
 	Context: process
 
-dev->get_stats:
+ndo_get_stats:
 	Synchronization: dev_base_lock rwlock.
 	Context: nominally process, but don't sleep inside an rwlock
 
-dev->hard_start_xmit:
-	Synchronization: netif_tx_lock spinlock.
+ndo_start_xmit:
+	Synchronization: __netif_tx_lock spinlock.
 
 	When the driver sets NETIF_F_LLTX in dev->features this will be
 	called without holding netif_tx_lock. In this case the driver
@@ -87,20 +86,20 @@ dev->hard_start_xmit:
 	o NETDEV_TX_LOCKED Locking failed, please retry quickly.
 	  Only valid when NETIF_F_LLTX is set.
 
-dev->tx_timeout:
-	Synchronization: netif_tx_lock spinlock.
+ndo_tx_timeout:
+	Synchronization: netif_tx_lock spinlock; all TX queues frozen.
 	Context: BHs disabled
 	Notes: netif_queue_stopped() is guaranteed true
 
-dev->set_rx_mode:
-	Synchronization: netif_tx_lock spinlock.
+ndo_set_rx_mode:
+	Synchronization: netif_addr_lock spinlock.
 	Context: BHs disabled
 
 struct napi_struct synchronization rules
 ========================================
 napi->poll:
 	Synchronization: NAPI_STATE_SCHED bit in napi->state. Device
-		driver's dev->close method will invoke napi_disable() on
+		driver's ndo_stop method will invoke napi_disable() on
 		all NAPI instances which will do a sleeping poll on the
 		NAPI_STATE_SCHED napi->state bit, waiting for all pending
 		NAPI activity to cease.

MAINTAINERS

Lines changed: 7 additions & 12 deletions
@@ -4309,6 +4309,13 @@ W: http://www.kernel.org/doc/man-pages
 
 S: Maintained
 
+MARVELL GIGABIT ETHERNET DRIVERS (skge/sky2)
+M: Mirko Lindner <[email protected]>
+M: Stephen Hemminger <[email protected]>
+
+S: Maintained
+F: drivers/net/ethernet/marvell/sk*
+
 MARVELL LIBERTAS WIRELESS DRIVER
 M: Dan Williams <[email protected]>
 
@@ -4339,12 +4346,6 @@ M: Nicolas Pitre <[email protected]>
 S: Odd Fixes
 F: drivers/mmc/host/mvsdio.*
 
-MARVELL YUKON / SYSKONNECT DRIVER
-M: Mirko Lindner <[email protected]>
-M: Ralph Roesler <[email protected]>
-W: http://www.syskonnect.com
-S: Supported
-
 MATROX FRAMEBUFFER DRIVER
 
 S: Orphan
@@ -6116,12 +6117,6 @@ W: http://www.winischhofer.at/linuxsisusbvga.shtml
 S: Maintained
 F: drivers/usb/misc/sisusbvga/
 
-SKGE, SKY2 10/100/1000 GIGABIT ETHERNET DRIVERS
-M: Stephen Hemminger <[email protected]>
-
-S: Maintained
-F: drivers/net/ethernet/marvell/sk*
-
 SLAB ALLOCATOR
 M: Christoph Lameter <[email protected]>
 M: Pekka Enberg <[email protected]>

arch/x86/net/bpf_jit.S

Lines changed: 91 additions & 31 deletions
@@ -18,17 +18,17 @@
  * r9d : hlen = skb->len - skb->data_len
  */
 #define SKBDATA	%r8
-
-sk_load_word_ind:
-	.globl	sk_load_word_ind
-
-	add	%ebx,%esi	/* offset += X */
-#	test    %esi,%esi	/* if (offset < 0) goto bpf_error; */
-	js	bpf_error
+#define SKF_MAX_NEG_OFF    $(-0x200000) /* SKF_LL_OFF from filter.h */
 
 sk_load_word:
 	.globl	sk_load_word
 
+	test	%esi,%esi
+	js	bpf_slow_path_word_neg
+
+sk_load_word_positive_offset:
+	.globl	sk_load_word_positive_offset
+
 	mov	%r9d,%eax		# hlen
 	sub	%esi,%eax		# hlen - offset
 	cmp	$3,%eax
@@ -37,16 +37,15 @@ sk_load_word:
 	bswap   %eax  			/* ntohl() */
 	ret
 
-
-sk_load_half_ind:
-	.globl sk_load_half_ind
-
-	add	%ebx,%esi	/* offset += X */
-	js	bpf_error
-
 sk_load_half:
 	.globl	sk_load_half
 
+	test	%esi,%esi
+	js	bpf_slow_path_half_neg
+
+sk_load_half_positive_offset:
+	.globl	sk_load_half_positive_offset
+
 	mov	%r9d,%eax
 	sub	%esi,%eax		#	hlen - offset
 	cmp	$1,%eax
@@ -55,14 +54,15 @@ sk_load_half:
 	rol	$8,%ax			# ntohs()
 	ret
 
-sk_load_byte_ind:
-	.globl sk_load_byte_ind
-	add	%ebx,%esi	/* offset += X */
-	js	bpf_error
-
 sk_load_byte:
 	.globl	sk_load_byte
 
+	test	%esi,%esi
+	js	bpf_slow_path_byte_neg
+
+sk_load_byte_positive_offset:
+	.globl	sk_load_byte_positive_offset
+
 	cmp	%esi,%r9d   /* if (offset >= hlen) goto bpf_slow_path_byte */
 	jle	bpf_slow_path_byte
 	movzbl	(SKBDATA,%rsi),%eax
@@ -73,25 +73,21 @@ sk_load_byte:
  *
  * Implements BPF_S_LDX_B_MSH : ldxb  4*([offset]&0xf)
  * Must preserve A accumulator (%eax)
- * Inputs : %esi is the offset value, already known positive
+ * Inputs : %esi is the offset value
  */
-ENTRY(sk_load_byte_msh)
-	CFI_STARTPROC
+sk_load_byte_msh:
+	.globl	sk_load_byte_msh
+	test	%esi,%esi
+	js	bpf_slow_path_byte_msh_neg
+
+sk_load_byte_msh_positive_offset:
+	.globl	sk_load_byte_msh_positive_offset
 	cmp	%esi,%r9d      /* if (offset >= hlen) goto bpf_slow_path_byte_msh */
 	jle	bpf_slow_path_byte_msh
 	movzbl	(SKBDATA,%rsi),%ebx
 	and	$15,%bl
 	shl	$2,%bl
 	ret
-	CFI_ENDPROC
-ENDPROC(sk_load_byte_msh)
-
-bpf_error:
-# force a return 0 from jit handler
-	xor		%eax,%eax
-	mov		-8(%rbp),%rbx
-	leaveq
-	ret
 
 /* rsi contains offset and can be scratched */
 #define bpf_slow_path_common(LEN)		\
@@ -138,3 +134,67 @@ bpf_slow_path_byte_msh:
 	shl	$2,%al
 	xchg	%eax,%ebx
 	ret
+
+#define sk_negative_common(SIZE)				\
+	push	%rdi;	/* save skb */				\
+	push	%r9;						\
+	push	SKBDATA;					\
+/* rsi already has offset */					\
+	mov	$SIZE,%ecx;	/* size */			\
+	call	bpf_internal_load_pointer_neg_helper;		\
+	test	%rax,%rax;					\
+	pop	SKBDATA;					\
+	pop	%r9;						\
+	pop	%rdi;						\
+	jz	bpf_error
+
+
+bpf_slow_path_word_neg:
+	cmp	SKF_MAX_NEG_OFF, %esi	/* test range */
+	jl	bpf_error	/* offset lower -> error  */
+sk_load_word_negative_offset:
+	.globl	sk_load_word_negative_offset
+	sk_negative_common(4)
+	mov	(%rax), %eax
+	bswap	%eax
+	ret
+
+bpf_slow_path_half_neg:
+	cmp	SKF_MAX_NEG_OFF, %esi
+	jl	bpf_error
+sk_load_half_negative_offset:
+	.globl	sk_load_half_negative_offset
+	sk_negative_common(2)
+	mov	(%rax),%ax
+	rol	$8,%ax
+	movzwl	%ax,%eax
+	ret
+
+bpf_slow_path_byte_neg:
+	cmp	SKF_MAX_NEG_OFF, %esi
+	jl	bpf_error
+sk_load_byte_negative_offset:
+	.globl	sk_load_byte_negative_offset
+	sk_negative_common(1)
+	movzbl	(%rax), %eax
+	ret
+
+bpf_slow_path_byte_msh_neg:
+	cmp	SKF_MAX_NEG_OFF, %esi
+	jl	bpf_error
+sk_load_byte_msh_negative_offset:
+	.globl	sk_load_byte_msh_negative_offset
+	xchg	%eax,%ebx /* dont lose A , X is about to be scratched */
+	sk_negative_common(1)
+	movzbl	(%rax),%eax
+	and	$15,%al
+	shl	$2,%al
+	xchg	%eax,%ebx
+	ret
+
+bpf_error:
+# force a return 0 from jit handler
+	xor		%eax,%eax
+	mov		-8(%rbp),%rbx
+	leaveq
+	ret
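
The dispatch that the new assembly entry points perform on the offset can be summarised in C. This is a rough sketch, not the JIT's code: `classify_load` and `enum load_path` are hypothetical names, the `SKF_LL_OFF` value is the -0x200000 constant shown in the diff, and the fast-path bound (`hlen - offset >= size`) is an assumption generalised from the byte-load comparison visible above. The sketch also ignores the later failure case where the negative-offset helper returns NULL.

```c
#include <assert.h>
#include <stdint.h>

#define SKF_LL_OFF (-0x200000) /* lowest accepted negative offset, per the diff */

/* Hypothetical labels for the four outcomes of the offset check. */
enum load_path {
	LOAD_FAST,     /* non-negative offset inside the linear header */
	LOAD_SLOW,     /* non-negative offset beyond hlen: slow path */
	LOAD_NEGATIVE, /* negative offset in range: negative-offset helper */
	LOAD_ERROR     /* negative offset below SKF_LL_OFF: filter returns 0 */
};

static enum load_path classify_load(int32_t offset, uint32_t hlen, uint32_t size)
{
	assert(size == 1 || size == 2 || size == 4);

	if (offset < 0) {
		/* Mirrors "cmp SKF_MAX_NEG_OFF, %esi; jl bpf_error":
		 * only offsets strictly below the limit are rejected. */
		return offset < SKF_LL_OFF ? LOAD_ERROR : LOAD_NEGATIVE;
	}

	/* Assumed fast-path condition: the whole access fits in the
	 * linear part of the packet. */
	if ((int64_t)hlen - (int64_t)offset >= (int64_t)size)
		return LOAD_FAST;

	return LOAD_SLOW;
}
```

For example, a 4-byte load at offset 0 with `hlen` 14 takes the fast path, the same load at offset 12 falls to the slow path, offset -1 goes to the negative-offset helper, and any offset below -0x200000 is an error.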
