Commit 144748e

jrfastab authored and borkmann committed
bpf, sockmap: Fix incorrect fwd_alloc accounting
Incorrect accounting of fwd_alloc can result in a warning when the socket
is torn down,

 [18455.319240] WARNING: CPU: 0 PID: 24075 at net/core/stream.c:208 sk_stream_kill_queues+0x21f/0x230
 [...]
 [18455.319543] Call Trace:
 [18455.319556]  inet_csk_destroy_sock+0xba/0x1f0
 [18455.319577]  tcp_rcv_state_process+0x1b4e/0x2380
 [18455.319593]  ? lock_downgrade+0x3a0/0x3a0
 [18455.319617]  ? tcp_finish_connect+0x1e0/0x1e0
 [18455.319631]  ? sk_reset_timer+0x15/0x70
 [18455.319646]  ? tcp_schedule_loss_probe+0x1b2/0x240
 [18455.319663]  ? lock_release+0xb2/0x3f0
 [18455.319676]  ? __release_sock+0x8a/0x1b0
 [18455.319690]  ? lock_downgrade+0x3a0/0x3a0
 [18455.319704]  ? lock_release+0x3f0/0x3f0
 [18455.319717]  ? __tcp_close+0x2c6/0x790
 [18455.319736]  ? tcp_v4_do_rcv+0x168/0x370
 [18455.319750]  tcp_v4_do_rcv+0x168/0x370
 [18455.319767]  __release_sock+0xbc/0x1b0
 [18455.319785]  __tcp_close+0x2ee/0x790
 [18455.319805]  tcp_close+0x20/0x80

This currently happens because in the redirect case we do skb_set_owner_r()
with the original sock. This increments the fwd_alloc memory accounting
on the original sock. Then on redirect we may push this into the queue
of the psock we are redirecting to. When the skb is flushed from the
queue we give the memory back to the original sock. The problem is that if
the original sock is destroyed/closed with skbs on another psock's queue,
then the original sock will not have a way to reclaim the memory before
being destroyed. Then the above warning will be thrown.

 sockA                          sockB

 sk_psock_strp_read()
  sk_psock_verdict_apply()
    -- SK_REDIRECT --
    sk_psock_skb_redirect()
                                skb_queue_tail(psock_other->ingress_skb..)

 sk_close()
  sock_map_unref()
   sk_psock_put()
    sk_psock_drop()
     sk_psock_zap_ingress()

At this point we have torn down our own psock, but have the outstanding
skb in psock_other. Note that SK_PASS doesn't have this problem because
the sk_psock_drop() logic releases the skb; it's still associated with
our psock.

To resolve, let's only account for sockets on the ingress queue that are
still associated with the current socket. In the redirect case we will
check memory limits per 6fa9201, but will omit fwd_alloc accounting
until the skb is actually enqueued. When the skb is sent via
skb_send_sock_locked or received with sk_psock_skb_ingress, memory will
be claimed on psock_other.

Fixes: 6fa9201 ("bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self")
Reported-by: Andrii Nakryiko <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/161731444013.68884.4021114312848535993.stgit@john-XPS-13-9370
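The accounting problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions: `toy_sock`, `toy_skb`, and every `toy_*` function are hypothetical stand-ins rather than kernel structures or APIs, and a single `charged` counter replaces the kernel's real receive-memory accounting. It redirects one buffer from sock A to sock B, closes A while the buffer is still queued on B, and reports how many bytes remain charged to A at teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the kernel structures. */
struct toy_sock {
	int charged;		/* bytes currently charged to this socket */
};

struct toy_skb {
	struct toy_sock *owner;	/* socket charged for this buffer, or NULL */
	int len;
};

/* Loosely models skb_set_owner_r(): charge the buffer to sk. */
static void toy_set_owner(struct toy_skb *skb, struct toy_sock *sk)
{
	skb->owner = sk;
	sk->charged += skb->len;
}

/* Loosely models freeing the buffer: return the memory to the owner. */
static void toy_free(struct toy_skb *skb)
{
	if (skb->owner) {
		assert(skb->owner->charged >= skb->len);
		skb->owner->charged -= skb->len;
		skb->owner = NULL;
	}
}

/*
 * Redirect one buffer from sock A to sock B, tear down A while the buffer
 * still sits on B's queue, and return the bytes still charged to A at that
 * moment. A nonzero result corresponds to the state behind the warning.
 */
static int toy_redirect_then_close(bool charge_before_redirect, int len)
{
	struct toy_sock a = { 0 }, b = { 0 };
	struct toy_skb skb = { NULL, len };
	int leaked;

	if (charge_before_redirect)
		toy_set_owner(&skb, &a);	/* old behavior: charge A up front */

	/* The buffer is now queued on B; A has no way to reach it. */

	leaked = a.charged;			/* A is torn down at this point */

	/* Later B dequeues the buffer and, if unowned, charges itself. */
	if (!skb.owner)
		toy_set_owner(&skb, &b);
	toy_free(&skb);

	return leaked;
}
```

In this model, `toy_redirect_then_close(true, 512)` returns 512 (A is destroyed with memory still charged), while `toy_redirect_then_close(false, 512)` returns 0 because the charge is deferred until the receiving side takes the buffer.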
1 parent 1c84b33 commit 144748e

1 file changed: +5 -7 lines changed

net/core/skmsg.c

Lines changed: 5 additions & 7 deletions
@@ -488,6 +488,7 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb
 	if (unlikely(!msg))
 		return -EAGAIN;
 	sk_msg_init(msg);
+	skb_set_owner_r(skb, sk);
 	return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
 }
 
@@ -790,7 +791,6 @@ static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int
 {
 	switch (verdict) {
 	case __SK_REDIRECT:
-		skb_set_owner_r(skb, sk);
 		sk_psock_skb_redirect(skb);
 		break;
 	case __SK_PASS:
@@ -808,10 +808,6 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
-		/* We skip full set_owner_r here because if we do a SK_PASS
-		 * or SK_DROP we can skip skb memory accounting and use the
-		 * TLS context.
-		 */
 		skb->sk = psock->sk;
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
@@ -880,12 +876,13 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 		kfree_skb(skb);
 		goto out;
 	}
-	skb_set_owner_r(skb, sk);
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
+		skb->sk = sk;
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		skb->sk = NULL;
 	}
 	sk_psock_verdict_apply(psock, skb, ret);
 out:
@@ -956,12 +953,13 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 		kfree_skb(skb);
 		goto out;
 	}
-	skb_set_owner_r(skb, sk);
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
+		skb->sk = sk;
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		skb->sk = NULL;
 	}
 	sk_psock_verdict_apply(psock, skb, ret);
 out:
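
In sk_psock_strp_read() and sk_psock_verdict_recv(), the patch replaces the up-front skb_set_owner_r() with a plain `skb->sk = sk` assignment that is cleared again once the verdict program has run. The userspace sketch below shows that "borrow the pointer, do not charge" pattern in isolation. All `toy_*` names are hypothetical and the verdict callback is an invented placeholder, not a BPF program; it is meant only to show the shape of the pattern, not the kernel's behavior in full.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the kernel structures. */
struct toy_sock {
	int id;
	int charged;		/* bytes charged to this socket */
};

struct toy_skb {
	struct toy_sock *sk;	/* associated socket, or NULL */
	int len;
};

typedef int (*toy_verdict_fn)(const struct toy_skb *skb);

/* Hypothetical verdict callback: reads skb->sk but keeps no reference. */
static int toy_verdict_by_sock_id(const struct toy_skb *skb)
{
	return skb->sk ? skb->sk->id : -1;
}

/*
 * Associate the buffer with sk only while the verdict callback runs.
 * No memory is charged to sk, so nothing has to be returned to it later,
 * whichever queue the buffer ends up on.
 */
static int toy_run_verdict(struct toy_skb *skb, struct toy_sock *sk,
			   toy_verdict_fn prog)
{
	int ret;

	assert(skb->sk == NULL);	/* buffer must not already be associated */

	skb->sk = sk;			/* borrow the pointer; no accounting */
	ret = prog(skb);
	skb->sk = NULL;			/* drop it before the verdict is applied */

	return ret;
}
```

Because the association ends before the verdict is applied, whichever path later takes the buffer starts from an unowned buffer and can decide where to charge it, which matches the added skb_set_owner_r() call in sk_psock_skb_ingress_self().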
