Commit b049453

libceph: request a new osdmap if lingering request maps to no osd
This commit does two things.  First, if there are any homeless
lingering requests, we now request a new osdmap even if the osdmap that
is being processed brought no changes, i.e. if a given lingering
request turned homeless in one of the previous epochs and remained
homeless in the current epoch.  Not doing so leaves us with a stale
osdmap and as a result we may miss our window for reestablishing the
watch and lose notifies.

MON=1 OSD=1:

    # cat linger-needmap.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    rbd map dne/dne # obtain a new osdmap as a side effect (!)
    sleep 1
    ceph osd in 0
    rbd resize --size 2 test
    # rbd info test | grep size -> 2M
    # blockdev --getsize $DEV -> 1M

N.B.: Not obtaining a new osdmap in between "osd out" and "osd in"
above is enough to make it miss that resize notify, but that is a
bug^Wlimitation of ceph watch/notify v1.

Second, homeless lingering requests are now kicked just like those
lingering requests whose mapping has changed.  This is mainly to
recognize that a homeless lingering request makes no sense and to
preserve the invariant that a registered lingering request is not
sitting on any of r_req_lru_item lists.  This spares us a WARN_ON,
which commit ba9d114 ("libceph: clear r_req_lru_item in
__unregister_linger_request()") tried to fix the _wrong_ way.

Cc: [email protected] # 3.10+
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
1 parent e260818 commit b049453

1 file changed: 20 additions, 11 deletions
net/ceph/osd_client.c

Lines changed: 20 additions & 11 deletions
@@ -2017,20 +2017,29 @@ static void kick_requests(struct ceph_osd_client *osdc, bool force_resend,
 		err = __map_request(osdc, req,
 				    force_resend || force_resend_writes);
 		dout("__map_request returned %d\n", err);
-		if (err == 0)
-			continue;  /* no change and no osd was specified */
 		if (err < 0)
 			continue;  /* hrm! */
-		if (req->r_osd == NULL) {
-			dout("tid %llu maps to no valid osd\n", req->r_tid);
-			needmap++;  /* request a newer map */
-			continue;
-		}
+		if (req->r_osd == NULL || err > 0) {
+			if (req->r_osd == NULL) {
+				dout("lingering %p tid %llu maps to no osd\n",
+				     req, req->r_tid);
+				/*
+				 * A homeless lingering request makes
+				 * no sense, as it's job is to keep
+				 * a particular OSD connection open.
+				 * Request a newer map and kick the
+				 * request, knowing that it won't be
+				 * resent until we actually get a map
+				 * that can tell us where to send it.
+				 */
+				needmap++;
+			}

-		dout("kicking lingering %p tid %llu osd%d\n", req, req->r_tid,
-		     req->r_osd ? req->r_osd->o_osd : -1);
-		__register_request(osdc, req);
-		__unregister_linger_request(osdc, req);
+			dout("kicking lingering %p tid %llu osd%d\n", req,
+			     req->r_tid, req->r_osd ? req->r_osd->o_osd : -1);
+			__register_request(osdc, req);
+			__unregister_linger_request(osdc, req);
+		}
 	}
 	reset_changed_osds(osdc);
 	mutex_unlock(&osdc->request_mutex);