Commit adeee1b

pkannoju authored and jfvogel committed
RDS: avoid queueing delayed work on an offlined cpu
During cpu scaling operations, when an rds delayed_work with a non-zero delay is scheduled on an offlined cpu, we've seen that the work gets stuck and resides in the send queue without getting transmitted. Only when other traffic on that connection path is submitted in a non-worker context is the earlier stuck work flushed out. This situation causes latency in rds traffic, especially visible in the rds-ping data.

We've reproduced this issue in-house with simple cpu scale-down activity. Corresponding details are shown below.

-----------------------------------------
[Tue Dec 24 06:47:33 2024] Unregister pv shared memory for cpu 52
[Tue Dec 24 06:47:33 2024] smpboot: CPU 52 is now offline
[Tue Dec 24 06:47:35 2024] <::ffff:192.168.10.15,::ffff:192.168.10.17,0> work scheduled on offine cpu: 52, delay: 1, raw_smp_processor_id: 22 PID: 53903 Comm: ora_dia0_c219cd
[Tue Dec 24 06:47:35 2024] CPU: 22 PID: 53903 Comm: ora_dia0_c219cd Kdump: loaded Tainted: P OE 5.4.17-2136.322.6.5.el8uek.x86_64 #2
[Tue Dec 24 06:47:35 2024] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.10.0+90413+d8f5961d 04/01/2014
[Tue Dec 24 06:47:35 2024] Call Trace:
[Tue Dec 24 06:47:35 2024] dump_stack+0x6d/0x8f
[Tue Dec 24 06:47:35 2024] rds_queue_delayed_work_on+0x131/0x140 [ksplice_4nnxk5aq_rds_new]
[Tue Dec 24 06:47:35 2024] rds_sendmsg+0x1339/0x1499 [rds]
[Tue Dec 24 06:47:35 2024] ? __check_object_size+0x51/0x1c7
[Tue Dec 24 06:47:35 2024] ? _copy_from_user+0x34/0x64
[Tue Dec 24 06:47:35 2024] ? rw_copy_check_uvector+0x61/0x13f
[Tue Dec 24 06:47:35 2024] sock_sendmsg+0x67/0x69
[Tue Dec 24 06:47:35 2024] ____sys_sendmsg+0x1fe/0x266
[Tue Dec 24 06:47:35 2024] ? copy_msghdr_from_user+0x60/0x8f
[Tue Dec 24 06:47:35 2024] ___sys_sendmsg+0x7c/0xb9
[Tue Dec 24 06:47:35 2024] ? ___sys_recvmsg+0x89/0xb8
[Tue Dec 24 06:47:35 2024] __sys_sendmsg+0x5c/0xa2
[Tue Dec 24 06:47:35 2024] __x64_sys_sendmsg+0x1f/0x25
[Tue Dec 24 06:47:35 2024] do_syscall_64+0x60/0x1cf
[Tue Dec 24 06:47:35 2024] entry_SYSCALL_64_after_hwframe+0x175/0x0
[Tue Dec 24 06:47:35 2024] RIP: 0033:0x7f4bebd1aa85
-----------------------------------------

The above stack indicates that the oracle db process "ora_dia0_c219cd" issued rds-related work on the connection between 192.168.10.15 and 192.168.10.17 on lane0. At 06:47:35 that work was scheduled to run on CPU 52, which had just been offlined at 06:47:33. This started the increase in rds-ping latencies on the same connection.

-----------------------------------------
[INFO:2024-12-24-06:42:20] numactl --cpunodebind=0 --membind=0 rds-ping -c 1 -i 5 -Q 0 -I 192.168.10.17 192.168.10.15: 1: 75 usec
[INFO:2024-12-24-06:43:21] numactl --cpunodebind=0 --membind=0 rds-ping -c 1 -i 5 -Q 0 -I 192.168.10.17 192.168.10.15: 1: 90 usec
[INFO:2024-12-24-06:44:41] numactl --cpunodebind=0 --membind=0 rds-ping -c 1 -i 5 -Q 0 -I 192.168.10.17 192.168.10.15: 1: 103 usec
[INFO:2024-12-24-06:45:41] numactl --cpunodebind=0 --membind=0 rds-ping -c 1 -i 5 -Q 0 -I 192.168.10.17 192.168.10.15: 1: 97 usec
[INFO:2024-12-24-06:46:41] numactl --cpunodebind=0 --membind=0 rds-ping -c 1 -i 5 -Q 0 -I 192.168.10.17 192.168.10.15: 1: 99 usec
[INFO:2024-12-24-06:47:48] numactl --cpunodebind=0 --membind=0 rds-ping -c 1 -i 5 -Q 0 -I 192.168.10.17 192.168.10.15: 1: 1101878 usec
[INFO:2024-12-24-06:48:48] numactl --cpunodebind=0 --membind=0 rds-ping -c 1 -i 5 -Q 0 -I 192.168.10.17 192.168.10.15: 1: 70558 usec
[INFO:2024-12-24-06:49:50] numactl --cpunodebind=0 --membind=0 rds-ping -c 1 -i 5 -Q 0 -I 192.168.10.17 192.168.10.15: 1: 717324 usec
-----------------------------------------

The patch we're proposing to fix this issue ensures that the delayed work is queued on a cpu which is online at that moment. If the cpu goes offline afterwards, the timer migrates to an available cpu and the job gets executed instead of remaining stuck.

We've verified the performance through rds-stress tests to ensure there is no significant performance impact with this patch. QA tests for this patch are in progress.

Orabug: 37260584

Signed-off-by: Praveen Kumar Kannoju <[email protected]>
Reviewed-by: Imran Khan <[email protected]>
Acked-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Arumugam Kolappan <[email protected]>
Signed-off-by: Alok Tiwari <[email protected]>
(cherry picked from commit dfcbc82)

Orabug: 37551308
Signed-off-by: Arumugam Kolappan <[email protected]>
Reviewed-by: Håkon Bugge <[email protected]>
1 parent fac6178 commit adeee1b

3 files changed, +29 -3 lines changed

net/rds/ib_rdma.c

Lines changed: 1 addition & 1 deletion
@@ -645,7 +645,7 @@ static void rds_ib_queue_delayed_work_on(struct rds_ib_device *rds_ibdev,
 					 char *reason)
 {
 	trace_rds_ib_queue_work(rds_ibdev, wq, &dwork->work, delay, reason);
-	queue_delayed_work_on(cpu, wq, dwork, delay);
+	__rds_queue_delayed_work_on(cpu, wq, dwork, delay);
 }
 
 static void rds_ib_queue_cancel_work(struct rds_ib_device *rds_ibdev,

net/rds/rds.h

Lines changed: 4 additions & 0 deletions
@@ -1244,6 +1244,10 @@ void rds_queue_delayed_work_on(struct rds_conn_path *cp, int cpu,
 			       struct workqueue_struct *wq,
 			       struct delayed_work *dwork,
 			       unsigned long delay, char *reason);
+void __rds_queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+				 struct delayed_work *dwork,
+				 unsigned long delay);
+
 void rds_mod_delayed_work(struct rds_conn_path *cp,
 			  struct workqueue_struct *wq,
 			  struct delayed_work *dwork,

net/rds/threads.c

Lines changed: 24 additions & 2 deletions
@@ -112,6 +112,28 @@ void rds_queue_work(struct rds_conn_path *cp,
 }
 EXPORT_SYMBOL_GPL(rds_queue_work);
 
+void __rds_queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+				 struct delayed_work *dwork,
+				 unsigned long delay)
+{
+	if (!delay || cpu == WORK_CPU_UNBOUND) {
+		queue_delayed_work_on(cpu, wq, dwork, delay);
+		return;
+	}
+
+	if (cpus_read_trylock()) {
+		if (cpu_online(cpu)) {
+			queue_delayed_work_on(cpu, wq, dwork, delay);
+			cpus_read_unlock();
+			return;
+		}
+		cpus_read_unlock();
+	}
+
+	queue_delayed_work(wq, dwork, delay);
+}
+EXPORT_SYMBOL_GPL(__rds_queue_delayed_work_on);
+
 void rds_queue_delayed_work(struct rds_conn_path *cp,
 			    struct workqueue_struct *wq,
 			    struct delayed_work *dwork,
@@ -125,7 +147,7 @@ void rds_queue_delayed_work(struct rds_conn_path *cp,
 
 	if (cp && cp->cp_conn->c_trans->conn_preferred_cpu) {
 		cpu = cp->cp_conn->c_trans->conn_preferred_cpu(cp->cp_conn, false);
-		queue_delayed_work_on(cpu, wq, dwork, delay);
+		__rds_queue_delayed_work_on(cpu, wq, dwork, delay);
 	} else
 		queue_delayed_work(wq, dwork, delay);
 }
@@ -140,7 +162,7 @@ void rds_queue_delayed_work_on(struct rds_conn_path *cp,
 {
 	trace_rds_queue_work(cp ? cp->cp_conn : NULL, cp, wq, &dwork->work,
 			     delay, reason);
-	queue_delayed_work_on(cpu, wq, dwork, delay);
+	__rds_queue_delayed_work_on(cpu, wq, dwork, delay);
 }
 EXPORT_SYMBOL_GPL(rds_queue_delayed_work_on);
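The decision made by `__rds_queue_delayed_work_on()` can be sketched as a small userspace model, which may help when reasoning about which branch a given call takes. This is not kernel code and not part of the patch: `struct model_state`, `model_pick_cpu()`, `MODEL_NR_CPUS` and `MODEL_CPU_UNBOUND` are hypothetical names invented for illustration. The model assumes that `queue_delayed_work()` is equivalent to queueing with `WORK_CPU_UNBOUND`, and it represents a failed `cpus_read_trylock()` with a simple flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel concepts; none of these names are in the patch. */
#define MODEL_NR_CPUS     64
#define MODEL_CPU_UNBOUND MODEL_NR_CPUS	/* plays the role of WORK_CPU_UNBOUND */

struct model_state {
	bool online[MODEL_NR_CPUS];	/* models cpu_online(cpu) */
	bool hotplug_busy;		/* true: the modeled cpus_read_trylock() fails */
};

/*
 * Returns the cpu the delayed work would be queued on, or
 * MODEL_CPU_UNBOUND when the model falls back to the unpinned
 * queue_delayed_work() path.
 */
static int model_pick_cpu(const struct model_state *s, int cpu,
			  unsigned long delay)
{
	assert(cpu >= 0 && cpu <= MODEL_NR_CPUS);

	/* Zero delay or no pinned cpu: pass the request through unchanged. */
	if (delay == 0 || cpu == MODEL_CPU_UNBOUND)
		return cpu;

	/* The online check is only trusted while hotplug is held off. */
	if (!s->hotplug_busy && s->online[cpu])
		return cpu;

	/* Trylock failed or the cpu is offline: do not pin the work. */
	return MODEL_CPU_UNBOUND;
}
```

With cpu 22 online and cpu 52 offline, as in the log above, `model_pick_cpu(&s, 52, 1)` yields `MODEL_CPU_UNBOUND`, while `model_pick_cpu(&s, 22, 1)` keeps cpu 22. Note that the fallback is also taken when the trylock fails, so under this reading a request for an online cpu can still end up unpinned while a hotplug operation is in flight.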
