If a remote peer has moved its IP address from one port to the other,
the local node may have an incorrect ARP entry in its cache. During
connection management, we will then get back a route-error-event from
the CM.
Current code attempts to flush the ARP entry from the cache. However,
1) it does not check for return values, 2) it does not supply the
device name, 3) it does not iterate over all possible device names,
and 4) it doesn't supply the correct flags.
Due to 2-4 above, the flushing doesn't work.
This commit fixes this.
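
For illustration only, the four points above can be sketched in user space
with the SIOCDARP ioctl. This is not the code from this commit, which runs
inside the kernel. The helper names fill_arp_delete_req() and
flush_arp_entries(), and the choice to treat ENXIO/ENODEV as "nothing to
flush on this device", are assumptions made for the sketch. The flags are
passed in by the caller, since the exact flags the fix uses are not spelled
out here, and the netmask handling mentioned under V1 -> V2 is not
reproduced.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <net/if_arp.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper: build a delete request for `ip` (network byte
 * order) that names the device and carries the caller's flags. */
void fill_arp_delete_req(struct arpreq *req, in_addr_t ip,
                         const char *dev, int flags)
{
    struct sockaddr_in *sin;

    memset(req, 0, sizeof(*req));
    sin = (struct sockaddr_in *)&req->arp_pa;
    sin->sin_family = AF_INET;
    sin->sin_addr.s_addr = ip;
    /* Point 2: supply the device name (kept NUL-terminated by memset). */
    strncpy(req->arp_dev, dev, sizeof(req->arp_dev) - 1);
    /* Point 4: flags are chosen by the caller. */
    req->arp_flags = flags;
}

/* Hypothetical helper: try the delete on every candidate device.
 * Returns the number of devices where an entry was deleted, or -1 on
 * the first unexpected ioctl error (errno is left set). */
int flush_arp_entries(int fd, in_addr_t ip, const char *const *devs,
                      size_t ndevs, int flags)
{
    int flushed = 0;
    size_t i;

    /* Point 3: iterate over all possible device names. */
    for (i = 0; i < ndevs; i++) {
        struct arpreq req;

        fill_arp_delete_req(&req, ip, devs[i], flags);
        /* Point 1: check the return value. */
        if (ioctl(fd, SIOCDARP, &req) == 0) {
            flushed++;
            continue;
        }
        /* Assumption: no entry / no such device is not a failure. */
        if (errno != ENXIO && errno != ENODEV)
            return -1;
    }
    return flushed;
}
```

A caller would open an AF_INET datagram socket, pass a list such as
{"ib0", "eth0"} (example names only), and needs CAP_NET_ADMIN for the
ioctl to succeed.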
On a system with a single CX-3 and 16 VFs, the time for a fail-over
just after a fail-back is reduced from ~60 seconds to ~10 seconds with
the fix (1156 RDS connections).
The fix for UEK5 is slightly more complicated than the UEK4 variants,
because functionality has moved out of the rds_rdma module into rdmaip
and because of RoCE. Hence, this commit detects the possible IB
link-layers and flushes the ARP cache for the possible devices
accordingly.
This is a temporary fix and should be moved out of the rds_rdma module
and into the rdmaip module, as tracked by ER 28341928 - Move ARP
flushing logic from rds_rdma to rdmaip.
V1 -> V2:
* Added correct use of netmask for the ATF_PUBL flag (Ka-Cheong)
* Moved the link-layer detected flags into the rds_ib_transport
struct (Ka-Cheong)
V2 -> V3:
* Added to commit message that this is a temporary fix (Santosh)
* Added Santosh's r-b
Orabug: 28219823
Signed-off-by: Håkon Bugge <[email protected]>
Reviewed-by: [email protected]