Commit e7104a2

chucklever authored and amschuma-ntap committed
xprtrdma: Cap req_cqinit
Recent work made FRMR registration and invalidation completions unsignaled. This greatly reduces the adapter interrupt rate.

Every so often, however, a posted send Work Request is allowed to signal. Otherwise, the provider's Work Queue will wrap and the workload will hang.

The number of Work Requests that are allowed to remain unsignaled is determined by the value of rep_cqinit. Currently, this is set to the size of the send Work Queue divided by two, minus 1.

For FRMR, the size of the send Work Queue is the maximum number of concurrent RPCs (currently 32) times the maximum number of Work Requests an RPC might use (currently 7, though some adapters may need more).

For mlx4, this is 224 entries. This leaves completion signaling disabled for 111 send Work Requests.

Some providers hold back dispatching Work Requests until a CQE is generated. If completions are disabled, then no CQEs are generated for quite some time, and that can stall the Work Queue.

I've seen this occur running xfstests generic/113 over NFSv4, where eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM because the Work Queue has overflowed. The connection is dropped and re-established.

Cap the rep_cqinit setting so completions are not left turned off for too long.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
1 parent 92b9836 commit e7104a2

2 files changed, +9 -1 lines changed

net/sunrpc/xprtrdma/verbs.c

Lines changed: 3 additions & 1 deletion
@@ -733,7 +733,9 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 
 	/* set trigger for requesting send completion */
 	ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 - 1;
-	if (ep->rep_cqinit <= 2)
+	if (ep->rep_cqinit > RPCRDMA_MAX_UNSIGNALED_SENDS)
+		ep->rep_cqinit = RPCRDMA_MAX_UNSIGNALED_SENDS;
+	else if (ep->rep_cqinit <= 2)
 		ep->rep_cqinit = 0;
 	INIT_CQCOUNT(ep);
 	ep->rep_ia = ia;
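
To make the effect of this hunk concrete, here is a minimal user-space sketch of the same clamping arithmetic. The helper name `cqinit_for()` is hypothetical and not part of the kernel source; only the constant and the three-way branch are taken from the patch.

```c
/* Value taken from the patch (xprt_rdma.h). */
#define RPCRDMA_MAX_UNSIGNALED_SENDS (32)

/*
 * Hypothetical helper, for illustration only: applies the same
 * arithmetic as the patched code in rpcrdma_ep_create() to a
 * given send Work Queue depth.
 */
static int cqinit_for(int max_send_wr)
{
	int cqinit = max_send_wr / 2 - 1;

	if (cqinit > RPCRDMA_MAX_UNSIGNALED_SENDS)
		cqinit = RPCRDMA_MAX_UNSIGNALED_SENDS;	/* new upper bound */
	else if (cqinit <= 2)
		cqinit = 0;	/* tiny queues: pre-existing behavior */
	return cqinit;
}
```

For the 224-entry queue mentioned in the commit message (32 RPCs times 7 Work Requests), the uncapped value would be 224/2 - 1 = 111, and this sketch returns 32 instead.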

net/sunrpc/xprtrdma/xprt_rdma.h

Lines changed: 6 additions & 0 deletions
@@ -97,6 +97,12 @@ struct rpcrdma_ep {
 	struct ib_wc		rep_recv_wcs[RPCRDMA_POLLSIZE];
 };
 
+/*
+ * Force a signaled SEND Work Request every so often,
+ * in case the provider needs to do some housekeeping.
+ */
+#define RPCRDMA_MAX_UNSIGNALED_SENDS	(32)
+
 #define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)
 #define DECR_CQCOUNT(ep) atomic_sub_return(1, &(ep)->rep_cqcount)
 
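
The two macros above suggest a countdown: the counter is loaded from rep_cqinit and decremented once per posted send. The following is a simplified, single-threaded model of that idea, not the kernel code. It assumes (this is not shown in the diff) that the posting path requests a signaled completion and reloads the counter whenever the decremented value is no longer positive. All names prefixed `model_`, and `count_signaled()`, are hypothetical, and a plain `int` stands in for the kernel's `atomic_t`.

```c
#include <stdbool.h>

/* Simplified stand-in for the relevant fields of struct rpcrdma_ep. */
struct model_ep {
	int cqinit;	/* models rep_cqinit */
	int cqcount;	/* models rep_cqcount (atomic_t in the kernel) */
};

/* Models INIT_CQCOUNT(): reload the countdown. */
static void model_init_cqcount(struct model_ep *ep)
{
	ep->cqcount = ep->cqinit;
}

/*
 * Models one posted send. Returns true when this send would be
 * posted signaled. ASSUMPTION: the caller signals when the
 * decremented count is <= 0 and then reloads the counter.
 */
static bool model_post_send(struct model_ep *ep)
{
	if (--ep->cqcount > 0)	/* models DECR_CQCOUNT(ep) > 0 */
		return false;
	model_init_cqcount(ep);
	return true;
}

/* Count how many of nsends posted sends would be signaled. */
static int count_signaled(int cqinit, int nsends)
{
	struct model_ep ep = { .cqinit = cqinit, .cqcount = 0 };
	int i, signaled = 0;

	model_init_cqcount(&ep);
	for (i = 0; i < nsends; i++)
		if (model_post_send(&ep))
			signaled++;
	return signaled;
}
```

Under these assumptions, filling a 224-entry queue once produces 2 signaled sends with the old setting of 111 and 7 with the capped setting of 32, which is the sense in which the cap keeps completions from being "left turned off for too long".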
