net/rds: Avoid stalled connection due to CM REQ retries
RDS drops a connection and destroys its cm_id once a CM REJ is sent. In a
congested fabric, there is a race where a remote node receives the CM REJ
only after the CM has already retried the CM REQ. In this scenario, the
cm_id that sent the CM REQ no longer exists, even though the remote end
might respond to the retried REQ with a CM REP and wait for an incoming
CM RTU that never arrives. RDS connection establishment then stalls until
the connection is destroyed after the CM timeout, which leads to a very
long brownout time. Thus, this patch adds a mechanism to detect a rejected
CM REQ and to reject all subsequent retries of that CM REQ sent by the CM.
Orabug: 25521901
Signed-off-by: Wei Lin Guay <[email protected]>
Tested-by: Dib Chatterjee <[email protected]>
(cherry picked from commit c5c4f1472bc788ddc69af713f975ad92bdefe206
repo https://linux-git.us.oracle.com/UEK/linux-wguay-public)
Conflict:
net/rds/ib_cm.c
Made it checkpatch clean.
v1->v2:
Added Shannon's recommendations
Signed-off-by: Håkon Bugge <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>