
Commit c3874bb

Merge tag 'rxrpc-iothread-20240305' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:

====================
Here are some changes to AF_RXRPC:

(1) Cache the transmission serial number of ACK and DATA packets in the rxrpc_txbuf struct and log this in the retransmit tracepoint.

(2) Don't use atomics on rxrpc_txbuf::flags[*] and cache the intended wire header flags there too to avoid duplication.

(3) Cache the wire checksum in rxrpc_txbuf to make it easier to create jumbo packets in future (which will require altering the wire header to a jumbo header and restoring it back again for retransmission).

(4) Fix the protocol names in the wire ACK trailer struct.

(5) Strip all the barriers and atomics out of the call timer tracking[*].

(6) Remove atomic handling from call->tx_transmitted and call->acks_prev_seq[*].

(7) Don't bother resetting the DF flag after UDP packet transmission. To change it, we now call directly into UDP code, so it's quick just to set it every time.

(8) Merge together the DF/non-DF branches of the DATA transmission code to reduce duplication.

(9) Add a kvec array into rxrpc_txbuf and start moving things over to it. This paves the way for using page frags.

(10) Split (sub)packet preparation and timestamping out of the DATA transmission function. This helps pave the way for future jumbo packet generation.

(11) In rxkad, don't pick values out of the wire header stored in rxrpc_txbuf, but rather find them elsewhere so we can remove the wire header from there.

(12) Move rxrpc_send_ACK() to output.c so that it can be merged with rxrpc_send_ack_packet().

(13) Use rxrpc_txbuf::kvec[0] to access the wire header for the packet rather than directly accessing the copy in rxrpc_txbuf. This will allow that copy to be moved into a page frag.

(14) Switch from keeping the transmission buffers in rxrpc_txbuf allocated in the slab to allocating them using page fragment allocators. There are separate allocators for DATA packets (which persist for a while) and control packets (which are discarded immediately). We can then turn on MSG_SPLICE_PAGES when transmitting DATA and ACK packets. We can also get rid of the RCU cleanup on rxrpc_txbufs, preferring instead to release the page frags as soon as possible.

(15) Parse received packets before handling timeouts, as the former may reset the latter.

(16) Make sure we don't retransmit DATA packets after all the packets have been ACK'd.

(17) Differentiate traces for PING ACK transmission.

(18) Switch to keeping timeouts as ktime_t rather than a number of jiffies, as the latter is too coarse a granularity. Only set the call timer at the end of the call event function, from the aggregate of all the timeouts, thereby reducing the number of timer calls made. In future, it might be possible to reduce the number of timers from one per call to one per I/O thread and to use a high-precision timer.

(19) Record RTT probes after successful transmission rather than recording a probe beforehand and cancelling it afterwards if transmission fails[*]. This allows a number of calls to get the current time to be removed.

(20) Clean up the resend algorithm, as there's now no need to walk the transmission buffer under lock[*]. DATA packets can be retransmitted as soon as they're found rather than being queued up and transmitted once the lock is dropped.

(21) When initially parsing a received ACK packet, extract some of the fields from the ack info to the skbuff private data. This makes it easier to do path MTU discovery in future when the call to which a PING RESPONSE ACK refers has been deallocated.

[*] Made possible by the move of almost all code from softirq context to the I/O thread.

Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2

* tag 'rxrpc-iothread-20240305' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (21 commits)
  rxrpc: Extract useful fields from a received ACK to skb priv data
  rxrpc: Clean up the resend algorithm
  rxrpc: Record probes after transmission and reduce number of time-gets
  rxrpc: Use ktimes for call timeout tracking and set the timer lazily
  rxrpc: Differentiate PING ACK transmission traces.
  rxrpc: Don't permit resending after all Tx packets acked
  rxrpc: Parse received packets before dealing with timeouts
  rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags
  rxrpc: Use rxrpc_txbuf::kvec[0] instead of rxrpc_txbuf::wire
  rxrpc: Move rxrpc_send_ACK() to output.c with rxrpc_send_ack_packet()
  rxrpc: Don't pick values out of the wire header when setting up security
  rxrpc: Split up the DATA packet transmission function
  rxrpc: Add a kvec[] to the rxrpc_txbuf struct
  rxrpc: Merge together DF/non-DF branches of data Tx function
  rxrpc: Do lazy DF flag resetting
  rxrpc: Remove atomic handling on some fields only used in I/O thread
  rxrpc: Strip barriers and atomics off of timer tracking
  rxrpc: Fix the names of the fields in the ACK trailer struct
  rxrpc: Note cksum in txbuf
  rxrpc: Convert rxrpc_txbuf::flags into a mask and don't use atomics
  ...
====================

Link: https://lore.kernel.org/r/
Signed-off-by: Jakub Kicinski <[email protected]>
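Item (18) above moves the call's timeout bookkeeping from jiffies to ktime_t and arms the timer once, lazily, from the aggregate of the outstanding deadlines. Below is a minimal sketch of that pattern; struct example_call, its fields and example_set_call_timer() are hypothetical stand-ins for illustration, not the actual net/rxrpc structures or code.

#include <linux/jiffies.h>
#include <linux/ktime.h>
#include <linux/timer.h>

/*
 * Hypothetical, simplified per-call state; the real struct rxrpc_call differs.
 * Deadlines are absolute times, held at KTIME_MAX when inactive.
 */
struct example_call {
        struct timer_list timer;
        ktime_t delay_ack_at;   /* when a deferred ACK should go out */
        ktime_t resend_at;      /* when unacked DATA should be retransmitted */
        ktime_t expect_term_by; /* hard lifetime limit */
};

/* Pick the earliest absolute deadline and arm one timer from it. */
static void example_set_call_timer(struct example_call *call)
{
        ktime_t now = ktime_get_real();
        ktime_t next = call->expect_term_by;

        if (ktime_before(call->delay_ack_at, next))
                next = call->delay_ack_at;
        if (ktime_before(call->resend_at, next))
                next = call->resend_at;
        if (ktime_before(next, now))
                next = now;     /* already due: fire as soon as possible */

        /* Convert the remaining delay to jiffies and only ever shorten the timer. */
        timer_reduce(&call->timer,
                     jiffies + nsecs_to_jiffies(ktime_to_ns(ktime_sub(next, now))));
}

The actual series tracks more deadlines than shown here and evaluates them from the I/O thread; this sketch only illustrates the aggregate-then-arm-once shape that item (18) describes.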
2 parents: 9cb3d52 + 4b68137

21 files changed (+853 −804 lines)

include/trace/events/rxrpc.h

Lines changed: 107 additions & 91 deletions
@@ -83,7 +83,7 @@
 EM(rxrpc_badmsg_bad_abort, "bad-abort") \
 EM(rxrpc_badmsg_bad_jumbo, "bad-jumbo") \
 EM(rxrpc_badmsg_short_ack, "short-ack") \
-EM(rxrpc_badmsg_short_ack_info, "short-ack-info") \
+EM(rxrpc_badmsg_short_ack_trailer, "short-ack-trailer") \
 EM(rxrpc_badmsg_short_hdr, "short-hdr") \
 EM(rxrpc_badmsg_unsupported_packet, "unsup-pkt") \
 EM(rxrpc_badmsg_zero_call, "zero-call") \
@@ -119,6 +119,7 @@
 EM(rxrpc_call_poke_complete, "Compl") \
 EM(rxrpc_call_poke_error, "Error") \
 EM(rxrpc_call_poke_idle, "Idle") \
+EM(rxrpc_call_poke_set_timeout, "Set-timo") \
 EM(rxrpc_call_poke_start, "Start") \
 EM(rxrpc_call_poke_timer, "Timer") \
 E_(rxrpc_call_poke_timer_now, "Timer-now")
@@ -340,35 +341,26 @@
 E_(rxrpc_rtt_rx_requested_ack, "RACK")

 #define rxrpc_timer_traces \
-EM(rxrpc_timer_begin, "Begin ") \
-EM(rxrpc_timer_exp_ack, "ExpAck") \
-EM(rxrpc_timer_exp_hard, "ExpHrd") \
-EM(rxrpc_timer_exp_idle, "ExpIdl") \
-EM(rxrpc_timer_exp_keepalive, "ExpKA ") \
-EM(rxrpc_timer_exp_lost_ack, "ExpLoA") \
-EM(rxrpc_timer_exp_normal, "ExpNml") \
-EM(rxrpc_timer_exp_ping, "ExpPng") \
-EM(rxrpc_timer_exp_resend, "ExpRsn") \
-EM(rxrpc_timer_init_for_reply, "IniRpl") \
-EM(rxrpc_timer_init_for_send_reply, "SndRpl") \
-EM(rxrpc_timer_restart, "Restrt") \
-EM(rxrpc_timer_set_for_ack, "SetAck") \
-EM(rxrpc_timer_set_for_hard, "SetHrd") \
-EM(rxrpc_timer_set_for_idle, "SetIdl") \
-EM(rxrpc_timer_set_for_keepalive, "KeepAl") \
-EM(rxrpc_timer_set_for_lost_ack, "SetLoA") \
-EM(rxrpc_timer_set_for_normal, "SetNml") \
-EM(rxrpc_timer_set_for_ping, "SetPng") \
-EM(rxrpc_timer_set_for_resend, "SetRTx") \
-E_(rxrpc_timer_set_for_send, "SetSnd")
+EM(rxrpc_timer_trace_delayed_ack, "DelayAck ") \
+EM(rxrpc_timer_trace_expect_rx, "ExpectRx ") \
+EM(rxrpc_timer_trace_hard, "HardLimit") \
+EM(rxrpc_timer_trace_idle, "IdleLimit") \
+EM(rxrpc_timer_trace_keepalive, "KeepAlive") \
+EM(rxrpc_timer_trace_lost_ack, "LostAck ") \
+EM(rxrpc_timer_trace_ping, "DelayPing") \
+EM(rxrpc_timer_trace_resend, "Resend ") \
+EM(rxrpc_timer_trace_resend_reset, "ResendRst") \
+E_(rxrpc_timer_trace_resend_tx, "ResendTx ")

 #define rxrpc_propose_ack_traces \
 EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \
+EM(rxrpc_propose_ack_delayed_ack, "DlydAck") \
 EM(rxrpc_propose_ack_input_data, "DataIn ") \
 EM(rxrpc_propose_ack_input_data_hole, "DataInH") \
 EM(rxrpc_propose_ack_ping_for_keepalive, "KeepAlv") \
 EM(rxrpc_propose_ack_ping_for_lost_ack, "LostAck") \
 EM(rxrpc_propose_ack_ping_for_lost_reply, "LostRpl") \
+EM(rxrpc_propose_ack_ping_for_0_retrans, "0-Retrn") \
 EM(rxrpc_propose_ack_ping_for_old_rtt, "OldRtt ") \
 EM(rxrpc_propose_ack_ping_for_params, "Params ") \
 EM(rxrpc_propose_ack_ping_for_rtt, "Rtt ") \
@@ -1084,18 +1076,17 @@ TRACE_EVENT(rxrpc_tx_packet,

 TRACE_EVENT(rxrpc_tx_data,
 TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq,
-rxrpc_serial_t serial, u8 flags, bool retrans, bool lose),
+rxrpc_serial_t serial, unsigned int flags, bool lose),

-TP_ARGS(call, seq, serial, flags, retrans, lose),
+TP_ARGS(call, seq, serial, flags, lose),

 TP_STRUCT__entry(
 __field(unsigned int, call)
 __field(rxrpc_seq_t, seq)
 __field(rxrpc_serial_t, serial)
 __field(u32, cid)
 __field(u32, call_id)
-__field(u8, flags)
-__field(bool, retrans)
+__field(u16, flags)
 __field(bool, lose)
 ),

@@ -1106,7 +1097,6 @@ TRACE_EVENT(rxrpc_tx_data,
 __entry->seq = seq;
 __entry->serial = serial;
 __entry->flags = flags;
-__entry->retrans = retrans;
 __entry->lose = lose;
 ),

@@ -1116,8 +1106,8 @@ TRACE_EVENT(rxrpc_tx_data,
 __entry->call_id,
 __entry->serial,
 __entry->seq,
-__entry->flags,
-__entry->retrans ? " *RETRANS*" : "",
+__entry->flags & RXRPC_TXBUF_WIRE_FLAGS,
+__entry->flags & RXRPC_TXBUF_RESENT ? " *RETRANS*" : "",
 __entry->lose ? " *LOSE*" : "")
 );

@@ -1314,90 +1304,112 @@ TRACE_EVENT(rxrpc_rtt_rx,
 __entry->rto)
 );

-TRACE_EVENT(rxrpc_timer,
-TP_PROTO(struct rxrpc_call *call, enum rxrpc_timer_trace why,
-unsigned long now),
+TRACE_EVENT(rxrpc_timer_set,
+TP_PROTO(struct rxrpc_call *call, ktime_t delay,
+enum rxrpc_timer_trace why),

-TP_ARGS(call, why, now),
+TP_ARGS(call, delay, why),

 TP_STRUCT__entry(
 __field(unsigned int, call)
 __field(enum rxrpc_timer_trace, why)
-__field(long, now)
-__field(long, ack_at)
-__field(long, ack_lost_at)
-__field(long, resend_at)
-__field(long, ping_at)
-__field(long, expect_rx_by)
-__field(long, expect_req_by)
-__field(long, expect_term_by)
-__field(long, timer)
+__field(ktime_t, delay)
 ),

 TP_fast_assign(
 __entry->call = call->debug_id;
 __entry->why = why;
-__entry->now = now;
-__entry->ack_at = call->delay_ack_at;
-__entry->ack_lost_at = call->ack_lost_at;
-__entry->resend_at = call->resend_at;
-__entry->expect_rx_by = call->expect_rx_by;
-__entry->expect_req_by = call->expect_req_by;
-__entry->expect_term_by = call->expect_term_by;
-__entry->timer = call->timer.expires;
+__entry->delay = delay;
 ),

-TP_printk("c=%08x %s a=%ld la=%ld r=%ld xr=%ld xq=%ld xt=%ld t=%ld",
+TP_printk("c=%08x %s to=%lld",
 __entry->call,
 __print_symbolic(__entry->why, rxrpc_timer_traces),
-__entry->ack_at - __entry->now,
-__entry->ack_lost_at - __entry->now,
-__entry->resend_at - __entry->now,
-__entry->expect_rx_by - __entry->now,
-__entry->expect_req_by - __entry->now,
-__entry->expect_term_by - __entry->now,
-__entry->timer - __entry->now)
+ktime_to_us(__entry->delay))
+);
+
+TRACE_EVENT(rxrpc_timer_exp,
+TP_PROTO(struct rxrpc_call *call, ktime_t delay,
+enum rxrpc_timer_trace why),
+
+TP_ARGS(call, delay, why),
+
+TP_STRUCT__entry(
+__field(unsigned int, call)
+__field(enum rxrpc_timer_trace, why)
+__field(ktime_t, delay)
+),
+
+TP_fast_assign(
+__entry->call = call->debug_id;
+__entry->why = why;
+__entry->delay = delay;
+),
+
+TP_printk("c=%08x %s to=%lld",
+__entry->call,
+__print_symbolic(__entry->why, rxrpc_timer_traces),
+ktime_to_us(__entry->delay))
+);
+
+TRACE_EVENT(rxrpc_timer_can,
+TP_PROTO(struct rxrpc_call *call, enum rxrpc_timer_trace why),
+
+TP_ARGS(call, why),
+
+TP_STRUCT__entry(
+__field(unsigned int, call)
+__field(enum rxrpc_timer_trace, why)
+),
+
+TP_fast_assign(
+__entry->call = call->debug_id;
+__entry->why = why;
+),
+
+TP_printk("c=%08x %s",
+__entry->call,
+__print_symbolic(__entry->why, rxrpc_timer_traces))
+);
+
+TRACE_EVENT(rxrpc_timer_restart,
+TP_PROTO(struct rxrpc_call *call, ktime_t delay, unsigned long delayj),
+
+TP_ARGS(call, delay, delayj),
+
+TP_STRUCT__entry(
+__field(unsigned int, call)
+__field(unsigned long, delayj)
+__field(ktime_t, delay)
+),
+
+TP_fast_assign(
+__entry->call = call->debug_id;
+__entry->delayj = delayj;
+__entry->delay = delay;
+),
+
+TP_printk("c=%08x to=%lld j=%ld",
+__entry->call,
+ktime_to_us(__entry->delay),
+__entry->delayj)
 );

 TRACE_EVENT(rxrpc_timer_expired,
-TP_PROTO(struct rxrpc_call *call, unsigned long now),
+TP_PROTO(struct rxrpc_call *call),

-TP_ARGS(call, now),
+TP_ARGS(call),

 TP_STRUCT__entry(
 __field(unsigned int, call)
-__field(long, now)
-__field(long, ack_at)
-__field(long, ack_lost_at)
-__field(long, resend_at)
-__field(long, ping_at)
-__field(long, expect_rx_by)
-__field(long, expect_req_by)
-__field(long, expect_term_by)
-__field(long, timer)
 ),

 TP_fast_assign(
 __entry->call = call->debug_id;
-__entry->now = now;
-__entry->ack_at = call->delay_ack_at;
-__entry->ack_lost_at = call->ack_lost_at;
-__entry->resend_at = call->resend_at;
-__entry->expect_rx_by = call->expect_rx_by;
-__entry->expect_req_by = call->expect_req_by;
-__entry->expect_term_by = call->expect_term_by;
-__entry->timer = call->timer.expires;
 ),

-TP_printk("c=%08x EXPIRED a=%ld la=%ld r=%ld xr=%ld xq=%ld xt=%ld t=%ld",
-__entry->call,
-__entry->ack_at - __entry->now,
-__entry->ack_lost_at - __entry->now,
-__entry->resend_at - __entry->now,
-__entry->expect_rx_by - __entry->now,
-__entry->expect_req_by - __entry->now,
-__entry->expect_term_by - __entry->now,
-__entry->timer - __entry->now)
+TP_printk("c=%08x EXPIRED",
+__entry->call)
 );

 TRACE_EVENT(rxrpc_rx_lose,
@@ -1506,26 +1518,30 @@ TRACE_EVENT(rxrpc_drop_ack,
 );

 TRACE_EVENT(rxrpc_retransmit,
-TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq, s64 expiry),
+TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq,
+rxrpc_serial_t serial, ktime_t expiry),

-TP_ARGS(call, seq, expiry),
+TP_ARGS(call, seq, serial, expiry),

 TP_STRUCT__entry(
 __field(unsigned int, call)
 __field(rxrpc_seq_t, seq)
-__field(s64, expiry)
+__field(rxrpc_serial_t, serial)
+__field(ktime_t, expiry)
 ),

 TP_fast_assign(
 __entry->call = call->debug_id;
 __entry->seq = seq;
+__entry->serial = serial;
 __entry->expiry = expiry;
 ),

-TP_printk("c=%08x q=%x xp=%lld",
+TP_printk("c=%08x q=%x r=%x xp=%lld",
 __entry->call,
 __entry->seq,
-__entry->expiry)
+__entry->serial,
+ktime_to_us(__entry->expiry))
 );

 TRACE_EVENT(rxrpc_congest,

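The EM()/E_() tables touched above (for example the new rxrpc_timer_traces) follow the usual multiple-expansion pattern of this header: the same list is expanded once to export the enum values and once to build the symbol table consumed by __print_symbolic() in TP_printk(). Below is a condensed sketch of that pattern, not the verbatim header text.

/* First expansion: make each enum value visible to user-space tooling. */
#undef EM
#undef E_
#define EM(a, b) TRACE_DEFINE_ENUM(a);
#define E_(a, b) TRACE_DEFINE_ENUM(a);

rxrpc_timer_traces;

/*
 * Second expansion: turn the same list into { value, "label" } pairs so that
 * __print_symbolic(__entry->why, rxrpc_timer_traces) can print "HardLimit",
 * "Resend   ", etc. in the trace output.
 */
#undef EM
#undef E_
#define EM(a, b)        { a, b },
#define E_(a, b)        { a, b }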
net/rxrpc/af_rxrpc.c

Lines changed: 6 additions & 6 deletions
@@ -487,22 +487,22 @@ EXPORT_SYMBOL(rxrpc_kernel_new_call_notification);
 * rxrpc_kernel_set_max_life - Set maximum lifespan on a call
 * @sock: The socket the call is on
 * @call: The call to configure
-* @hard_timeout: The maximum lifespan of the call in jiffies
+* @hard_timeout: The maximum lifespan of the call in ms
 *
 * Set the maximum lifespan of a call. The call will end with ETIME or
 * ETIMEDOUT if it takes longer than this.
 */
 void rxrpc_kernel_set_max_life(struct socket *sock, struct rxrpc_call *call,
 unsigned long hard_timeout)
 {
-unsigned long now;
+ktime_t delay = ms_to_ktime(hard_timeout), expect_term_by;

 mutex_lock(&call->user_mutex);

-now = jiffies;
-hard_timeout += now;
-WRITE_ONCE(call->expect_term_by, hard_timeout);
-rxrpc_reduce_call_timer(call, hard_timeout, now, rxrpc_timer_set_for_hard);
+expect_term_by = ktime_add(ktime_get_real(), delay);
+WRITE_ONCE(call->expect_term_by, expect_term_by);
+trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_hard);
+rxrpc_poke_call(call, rxrpc_call_poke_set_timeout);

 mutex_unlock(&call->user_mutex);
 }

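With the hunk above, rxrpc_kernel_set_max_life() takes its limit in milliseconds rather than jiffies and pokes the call's I/O thread instead of adjusting the timer directly. A hypothetical in-kernel caller (illustrative only; the socket and call are assumed to have been set up elsewhere) would now look like:

#include <linux/time64.h>
#include <net/af_rxrpc.h>

/* Illustrative: cap a call's total lifetime at two seconds (value now in ms). */
static void example_cap_call_life(struct socket *sock, struct rxrpc_call *call)
{
        rxrpc_kernel_set_max_life(sock, call, 2 * MSEC_PER_SEC);
}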