
Commit 0544f54

Browse files
mrybczynska authored and Christoph Hellwig committed
nvme-rdma: support devices with queue size < 32
In the case of a small NVMe-oF queue size (< 32) we may enter a deadlock: IB completions are only signaled every 32 sends, so the send queue fills up before any completion is generated. The error is seen as (using mlx5):

[ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
[ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12

This patch changes the signaling so that it now depends on the queue depth. The magic define has been removed completely.

Cc: [email protected]
Signed-off-by: Marta Rybczynska <[email protected]>
Signed-off-by: Samuel Jones <[email protected]>
Acked-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
1 parent 0833289 commit 0544f54

File tree

1 file changed: 14 additions, 4 deletions


drivers/nvme/host/rdma.c

Lines changed: 14 additions & 4 deletions
@@ -1038,6 +1038,19 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
 		nvme_rdma_wr_error(cq, wc, "SEND");
 }
 
+static inline int nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
+{
+	int sig_limit;
+
+	/*
+	 * We signal completion every queue depth/2 and also handle the
+	 * degenerated case of a device with queue_depth=1, where we
+	 * would need to signal every message.
+	 */
+	sig_limit = max(queue->queue_size / 2, 1);
+	return (++queue->sig_count % sig_limit) == 0;
+}
+
 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge,
 		struct ib_send_wr *first, bool flush)
@@ -1065,17 +1078,14 @@ static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 	 * Would have been way to obvious to handle this in hardware or
 	 * at least the RDMA stack..
 	 *
-	 * This messy and racy code sniplet is copy and pasted from the iSER
-	 * initiator, and the magic '32' comes from there as well.
-	 *
 	 * Always signal the flushes. The magic request used for the flush
 	 * sequencer is not allocated in our driver's tagset and it's
 	 * triggered to be freed by blk_cleanup_queue(). So we need to
 	 * always mark it as signaled to ensure that the "wr_cqe", which is
 	 * embedded in request's payload, is not freed when __ib_process_cq()
 	 * calls wr_cqe->done().
 	 */
-	if ((++queue->sig_count % 32) == 0 || flush)
+	if (nvme_rdma_queue_sig_limit(queue) || flush)
 		wr.send_flags |= IB_SEND_SIGNALED;
 
 	if (first)
