
Commit 52ef83b

venkatxvenkatsubra authored and vijay-suman committed
RDS/IB: VRPC DELAY / OSS RECONNECT CAUSES 5 MINUTE STALL ON PORT FAILURE
This problem occurs when the user is notified of a successful completion
for an rdma write + bcopy message, but the peer application never receives
the bcopy message. It happens during a port down/up test. What seems to
happen is that the rdma write succeeds but the bcopy message fails.
RDS should not return a successful completion status to the user
in this case.

When RDS does an rdma followed by a bcopy message, the user notification
is supposed to be implemented by method #3 below.

    /* If the user asked for a completion notification on this
     * message, we can implement three different semantics:
     *  1. Notify when we received the ACK on the RDS message
     *     that was queued with the RDMA. This provides reliable
     *     notification of RDMA status at the expense of a one-way
     *     packet delay.
     *  2. Notify when the IB stack gives us the completion event for
     *     the RDMA operation.
     *  3. Notify when the IB stack gives us the completion event for
     *     the accompanying RDS messages.
     * Here, we implement approach #3. To implement approach #2,
     * we would need to take an event for the rdma WR. To implement #1,
     * don't call rds_rdma_send_complete at all, and fall back to the notify
     * handling in the ACK processing code.

Unfortunately, the user is notified before the bcopy send status is known.
Right after the rdma write completes, the user is notified, even though
the subsequent bcopy eventually fails.

The fix is to delay signaling completions of the rdma op until the bcopy
send completes.
Orabug: 22847528
Signed-off-by: Venkat Venkatsubra <[email protected]>
Acked-by: Rama Nichanamatlu <[email protected]>

Orabug: 27364391
(cherry picked from commit 804df7a)
cherry-pick-repo=linux-uek.git
Signed-off-by: Gerd Rausch <[email protected]>
Signed-off-by: Somasundaram Krishnasamy <[email protected]>

Orabug: 33590097
UEK6 => UEK7
(cherry picked from commit 9bca09b)
cherry-pick-repo=UEK/production/linux-uek.git
Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087
UEK7 => LUCI
(cherry picked from commit 9dc52eb)
cherry-pick-repo=UEK/production/linux-uek.git
Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
1 parent 331332e commit 52ef83b

1 file changed: +1 -1 lines changed

net/rds/ib_send.c

Lines changed: 1 addition & 1 deletion
@@ -970,7 +970,7 @@ int rds_ib_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
 		send->s_queued = jiffies;
 		send->s_op = NULL;
 
-		if (!op->op_remote_complete)
+		if (!op->op_remote_complete && !op->op_notify)
 			nr_sig += rds_ib_set_wr_signal_state(ic, send, op->op_notify);
 
 		send->s_wr.opcode = op->op_write ? IB_WR_RDMA_WRITE : IB_WR_RDMA_READ;
