Commit de10506

Chunguang Xu authored and keithbusch committed
nvme: fix reconnection fail due to reserved tag allocation
We found an issue in a production environment while using NVMe over RDMA: admin_q reconnection failed forever even though the remote target and the network were fine. After digging into it, we found it may be caused by an ABBA deadlock due to tag allocation. In my case, the tag was held by a keep alive request waiting inside admin_q. Because we quiesce admin_q while resetting the ctrl, the request is marked as idle and will not be processed until the reset succeeds. As fabric_q shares its tagset with admin_q, reconnecting to the remote target needs a tag for the connect command, but the only reserved tag was held by the keep alive command waiting inside admin_q. As a result, we failed to reconnect admin_q forever.

In order to fix this issue, I think we should keep two reserved tags for the admin queue.

Fixes: ed01fee ("nvme-fabrics: only reserve a single tag")
Signed-off-by: Chunguang Xu <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
1 parent 2bc9174 commit de10506

2 files changed, +4 -9 lines changed

drivers/nvme/host/core.c

Lines changed: 4 additions & 2 deletions
@@ -4385,7 +4385,8 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 	set->ops = ops;
 	set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
 	if (ctrl->ops->flags & NVME_F_FABRICS)
-		set->reserved_tags = NVMF_RESERVED_TAGS;
+		/* Reserved for fabric connect and keep alive */
+		set->reserved_tags = 2;
 	set->numa_node = ctrl->numa_node;
 	set->flags = BLK_MQ_F_NO_SCHED;
 	if (ctrl->ops->flags & NVME_F_BLOCKING)
@@ -4454,7 +4455,8 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 	if (ctrl->quirks & NVME_QUIRK_SHARED_TAGS)
 		set->reserved_tags = NVME_AQ_DEPTH;
 	else if (ctrl->ops->flags & NVME_F_FABRICS)
-		set->reserved_tags = NVMF_RESERVED_TAGS;
+		/* Reserved for fabric connect */
+		set->reserved_tags = 1;
 	set->numa_node = ctrl->numa_node;
 	set->flags = BLK_MQ_F_SHOULD_MERGE;
 	if (ctrl->ops->flags & NVME_F_BLOCKING)

drivers/nvme/host/fabrics.h

Lines changed: 0 additions & 7 deletions
@@ -18,13 +18,6 @@
 /* default is -1: the fail fast mechanism is disabled */
 #define NVMF_DEF_FAIL_FAST_TMO		-1
 
-/*
- * Reserved one command for internal usage. This command is used for sending
- * the connect command, as well as for the keep alive command on the admin
- * queue once live.
- */
-#define NVMF_RESERVED_TAGS	1
-
 /*
  * Define a host as seen by the target. We allocate one at boot, but also
  * allow the override it when creating controllers. This is both to provide
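
The deadlock described in the commit message can be illustrated with a toy model. The sketch below is not the blk-mq tag allocator or any kernel code; `toy_tag_pool`, `toy_get_reserved_tag` and `toy_reconnect_can_proceed` are hypothetical names invented for this illustration. It assumes only what the commit message states: the keep alive request holds a reserved tag while stuck in the quiesced admin queue, and the connect command needs a reserved tag from the same tag set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a reserved-tag pool; NOT the real blk-mq implementation. */
struct toy_tag_pool {
	unsigned int reserved_total;	/* tags set aside for internal commands */
	unsigned int reserved_in_use;	/* reserved tags currently held */
};

/* Try to take one reserved tag; returns false when none are free. */
static bool toy_get_reserved_tag(struct toy_tag_pool *pool)
{
	assert(pool != NULL);

	if (pool->reserved_in_use >= pool->reserved_total)
		return false;
	pool->reserved_in_use++;
	return true;
}

/*
 * Model the reconnect scenario from the commit message:
 *  1. A keep alive request takes a reserved tag, then sits in the
 *     quiesced admin queue, so it never completes and never frees it.
 *  2. The connect command asks the same tag set for a reserved tag.
 * Returns true when the connect command obtains a tag.
 */
static bool toy_reconnect_can_proceed(unsigned int reserved_tags)
{
	struct toy_tag_pool pool = {
		.reserved_total = reserved_tags,
		.reserved_in_use = 0,
	};

	/* Step 1: keep alive grabs a tag and is stuck behind the quiesce. */
	if (!toy_get_reserved_tag(&pool))
		return false;

	/* Step 2: connect needs its own tag while keep alive still holds one. */
	return toy_get_reserved_tag(&pool);
}
```

Under this model, `toy_reconnect_can_proceed(1)` returns false because the single reserved tag is already held by the stalled keep alive, while `toy_reconnect_can_proceed(2)` returns true. That matches the reasoning for reserving two tags on the fabrics admin tag set and one on the I/O tag set, where no keep alive is sent.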
