Commit 958ede1
RDMA/uverbs: restrack shared PDs
An SRQ inherits its parent PD's resource name in ib_create_srq_user():
	rdma_restrack_new(&srq->res, RDMA_RESTRACK_SRQ);
	rdma_restrack_parent_name(&srq->res, &pd->res);
However, user PDs created via ib_uverbs_share_pd() are not added to restrack, so an SRQ created on a shared PD inherits no resource name from it. Running "rdma res show srq" then triggers the following NULL pointer dereference. This patch adds the shared PD (shpd) to restrack.
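The inheritance and the failure mode can be modelled in user space. This is a simplified sketch, assuming our reading that rdma_restrack_parent_name() copies kern_name when the parent looks like a kernel resource (restrack's user flag unset) and attaches the parent's task otherwise. struct res_entry, model_parent_name() and model_res_name() are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct rdma_restrack_entry:
 * only the fields needed to show how an owner name is inherited. */
struct res_entry {
	bool user;             /* set once tracked as a user-space resource */
	const char *task_comm; /* owning process name for user resources */
	const char *kern_name; /* owner name for kernel resources */
};

/* Models parent-name inheritance (assumed behaviour): a parent that
 * looks like a kernel resource passes on kern_name, otherwise the
 * child is attached to the parent's task. */
static void model_parent_name(struct res_entry *dst,
			      const struct res_entry *parent)
{
	assert(dst != NULL && parent != NULL);
	dst->user = parent->user;
	if (!parent->user)
		dst->kern_name = parent->kern_name;
	else
		dst->task_comm = parent->task_comm;
}

/* Returns the name a resource dump would report. A NULL result is the
 * case where, per the trace above, the dump reaches strlen(NULL). */
static const char *model_res_name(const struct res_entry *res)
{
	return res->user ? res->task_comm : res->kern_name;
}
```

A shared PD that never went through restrack is zero-initialised in this model, so it looks like a kernel resource with no kern_name, and the SRQ inherits a NULL name. Once the PD is tracked as a user resource, the SRQ inherits the owning task instead.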
[ 189.099669] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 189.100707] #PF: supervisor read access in kernel mode
[ 189.101504] #PF: error_code(0x0000) - not-present page
[ 189.102357] PGD 0 P4D 0
[ 189.102801] Oops: 0000 [#1] SMP NOPTI
[ 189.103413] CPU: 26 PID: 69041 Comm: rdma Kdump: loaded Not tainted 5.15.0-5.76.3.el8uek.x86_64 #2
[ 189.104758] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-2.module+el8.6.0+20659+3dcf7c70 04/01/2014
[ 189.106359] RIP: 0010:strlen+0x0/0x24
[ 189.106994] Code: 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee 31 d2 89 d1 89 d6 89 d7 41 89 d0 c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 <80> 3f 00 74 16 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 31 ff
[ 189.109828] RSP: 0018:ffffa2f2b409b808 EFLAGS: 00010246
[ 189.110684] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
[ 189.111790] RDX: 0000000000000000 RSI: ffff93dca8f46448 RDI: 0000000000000000
[ 189.112943] RBP: ffff93f8091b2500 R08: 0000000000000000 R09: ffff93f8090750b4
[ 189.114102] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 189.115279] R13: ffff93f809075088 R14: ffff93f8067e46a8 R15: 0000000000000000
[ 189.116434] FS: 00007fe7c9707540(0000) GS:ffff9416c2800000(0000) knlGS:0000000000000000
[ 189.117753] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 189.118683] CR2: 0000000000000000 CR3: 000000240eebc004 CR4: 0000000000770ee0
[ 189.119857] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 189.121029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 189.122198] PKRU: 55555554
[ 189.122676] Call Trace:
[ 189.123114] <TASK>
[ 189.123474] fill_res_name_pid+0x31/0xb0 [ib_core]
[ 189.124217] res_get_common_dumpit+0x38f/0x540 [ib_core]
[ 189.125045] ? fill_res_srq_qps+0x210/0x210 [ib_core]
[ 189.125930] netlink_dump+0x18b/0x307
[ 189.126511] __netlink_dump_start+0x1f2/0x2d9
[ 189.127145] rdma_nl_rcv_msg+0x1d4/0x210 [ib_core]
[ 189.127954] ? res_get_common_dumpit+0x540/0x540 [ib_core]
[ 189.128871] rdma_nl_rcv+0xaa/0x100 [ib_core]
[ 189.129616] netlink_unicast+0x213/0x2ce
[ 189.130284] netlink_sendmsg+0x24f/0x4d9
[ 189.130941] sock_sendmsg+0x65/0x6a
[ 189.131547] __sys_sendto+0x128/0x19b
[ 189.132189] __x64_sys_sendto+0x20/0x35
[ 189.132832] do_syscall_64+0x38/0x8d
[ 189.133451] entry_SYSCALL_64_after_hwframe+0x63/0x0
[ 189.134292] RIP: 0033:0x7fe7c87bc3ab
[ 189.134906] Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 f5 41 29 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
[ 189.137790] RSP: 002b:00007fffc9e324a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 189.139019] RAX: ffffffffffffffda RBX: 00007fffc9e32750 RCX: 00007fe7c87bc3ab
[ 189.140153] RDX: 0000000000000018 RSI: 0000558d21de1920 RDI: 0000000000000004
[ 189.141332] RBP: 0000000000000017 R08: 00007fe7c8c5c480 R09: 000000000000000c
[ 189.142470] R10: 0000000000000000 R11: 0000000000000246 R12: 0000558d2120e850
[ 189.143631] R13: 00007fffc9e32770 R14: 0000000000000000 R15: 0000000000000000
[ 189.144785] </TASK>
With the fix applied:
# rdma res show pd
...
dev mlx5_0 pdn 42 local_dma_lkey 0x0 users 12 ctxn 36 pid 87599 comm ora_ipc0_dbm051
dev mlx5_0 pdn 43 local_dma_lkey 0x0 users 4 ctxn 36 pid 87599 comm ora_ipc0_dbm051
...
the SRQs now report the correct pdn and process name, and the kernel no longer crashes:
# rdma res show srq
dev mlx5_0 srqn 1 type BASIC lqpn 2448 pdn 42 pid 87599 comm ora_ipc0_dbm051
dev mlx5_0 srqn 3 type XRC pdn 42 cqn 2081 pid 87599 comm ora_ipc0_dbm051
dev mlx5_0 srqn 4 type XRC pdn 42 cqn 2081 pid 87599 comm ora_ipc0_dbm051
dev mlx5_0 srqn 5 type XRC pdn 43 cqn 2083 pid 87599 comm ora_ipc0_dbm051
dev mlx5_0 srqn 6 type XRC pdn 43 cqn 2083 pid 87599 comm ora_ipc0_dbm051
...
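To check output like the above mechanically, each SRQ line can be scanned for its pdn and for the presence of pid/comm owner fields. This is a minimal sketch that assumes the space-separated "key value" layout shown in these examples; res_field() and srq_has_owner() are hypothetical helpers, not part of iproute2's rdma tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns the numeric value following "key " in one line of
 * "rdma res show" output, or -1 if the key is absent. Assumes the
 * key/value layout of the sample output. */
static long res_field(const char *line, const char *key)
{
	assert(line != NULL && key != NULL);
	size_t klen = strlen(key);
	const char *p = line;

	while ((p = strstr(p, key)) != NULL) {
		/* Whole-word match: preceded by line start or a space,
		 * followed by a space before the value. */
		if ((p == line || p[-1] == ' ') && p[klen] == ' ')
			return strtol(p + klen + 1, NULL, 0);
		p += klen;
	}
	return -1;
}

/* True if an SRQ line reports both a pid and a comm field. */
static bool srq_has_owner(const char *line)
{
	return res_field(line, "pid") >= 0 && strstr(line, " comm ") != NULL;
}
```

Cross-referencing res_field(line, "pdn") against the pdn values from "rdma res show pd" then confirms that each SRQ points at a PD owned by the same process.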
Orabug: 34812519
Fixes: b09c4d7 ("RDMA/restrack: Improve readability in task name management")
Fixes: 86133a24cbd8 ("IB/Shared PD support from Oracle")
Signed-off-by: Sharath Srinivasan <[email protected]>
Reviewed-by: Gerd Rausch <[email protected]>
Reviewed-by: Qing Huang <[email protected]>

Parent: 726fd8f
drivers/infiniband/core/uverbs_cmd.c | 4 ++++
1 file changed, 4 insertions(+), 0 deletions(-)
Hunk @@ -629,6 +629,10 @@: four lines added after original line 631 (new lines 632-635).