Commit 133d452

md: flush writes before starting a recovery.
When we write to a degraded array which has a bitmap, we make sure the relevant bit in the bitmap remains set when the write completes (so a 're-add' can quickly rebuild a temporarily-missing device).

If, immediately after such a write starts, we incorporate a spare, commence recovery, and skip over the region where the write is happening (because the 'needs recovery' flag isn't set yet), then that write will not get to the new device.

Once the recovery finishes the new device will be trusted, but will have incorrect data, leading to possible corruption.

We cannot set the 'needs recovery' flag when we start the write as we do not know easily if the write will be "degraded" or not. That depends on details of the particular raid level and particular write request.

This patch fixes a corruption issue of long standing and so is suitable for any -stable kernel. It applied correctly to 3.0 at least and will apply with minor editing to earlier kernels.

Reported-by: Bill <[email protected]>
Tested-by: Bill <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: NeilBrown <[email protected]>
1 parent 9bd3592 commit 133d452

1 file changed: +13 -0 lines changed
drivers/md/md.c

Lines changed: 13 additions & 0 deletions
@@ -7501,6 +7501,19 @@ void md_do_sync(struct md_thread *thread)
 				    rdev->recovery_offset < j)
 					j = rdev->recovery_offset;
 			rcu_read_unlock();
+
+			/* If there is a bitmap, we need to make sure all
+			 * writes that started before we added a spare
+			 * complete before we start doing a recovery.
+			 * Otherwise the write might complete and (via
+			 * bitmap_endwrite) set a bit in the bitmap after the
+			 * recovery has checked that bit and skipped that
+			 * region.
+			 */
+			if (mddev->bitmap) {
+				mddev->pers->quiesce(mddev, 1);
+				mddev->pers->quiesce(mddev, 0);
+			}
 		}
 
 	printk(KERN_INFO "md: %s of RAID array %s\n", desc, mdname(mddev));
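
The ordering problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under loud assumptions: `struct toy_array` and every `toy_*` function are hypothetical names invented for illustration, not kernel APIs, and the model collapses the bitmap logic to a single 'needs recovery' bit per region that is set only when a degraded write completes. `toy_flush_writes` stands in for the `quiesce(mddev, 1); quiesce(mddev, 0);` pair added by the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NREGIONS 4

/* Hypothetical toy model, NOT the kernel's data structures. */
struct toy_array {
	bool needs_recovery[TOY_NREGIONS]; /* one bitmap bit per region */
	bool write_pending[TOY_NREGIONS];  /* degraded write still in flight */
	int good_data[TOY_NREGIONS];       /* contents of the surviving devices */
	int spare_data[TOY_NREGIONS];      /* contents of the newly added spare */
};

/* A write that started before the spare was added: it reaches only the
 * surviving devices, and its bitmap bit is not yet set. */
void toy_write_start(struct toy_array *a, size_t r, int value)
{
	a->good_data[r] = value;
	a->write_pending[r] = true;
}

/* Write completion: only now is the region marked as needing recovery. */
void toy_write_end(struct toy_array *a, size_t r)
{
	if (a->write_pending[r]) {
		a->needs_recovery[r] = true;
		a->write_pending[r] = false;
	}
}

/* Stand-in for quiesce(1) followed by quiesce(0): let every in-flight
 * write complete before recovery looks at the bitmap. */
void toy_flush_writes(struct toy_array *a)
{
	for (size_t r = 0; r < TOY_NREGIONS; r++)
		toy_write_end(a, r);
}

/* Recovery copies only regions whose bit is set and skips the rest.
 * Returns the number of regions copied to the spare. */
size_t toy_recover(struct toy_array *a)
{
	size_t copied = 0;

	for (size_t r = 0; r < TOY_NREGIONS; r++) {
		if (a->needs_recovery[r]) {
			a->spare_data[r] = a->good_data[r];
			a->needs_recovery[r] = false;
			copied++;
		}
	}
	return copied;
}

/* True if the spare holds the same data as the surviving devices. */
bool toy_in_sync(const struct toy_array *a)
{
	for (size_t r = 0; r < TOY_NREGIONS; r++)
		if (a->spare_data[r] != a->good_data[r])
			return false;
	return true;
}
```

In this model, calling `toy_write_start`, then `toy_recover`, then `toy_write_end` leaves the spare out of sync, because recovery skipped the region before its bit was set. Calling `toy_flush_writes` between `toy_write_start` and `toy_recover` makes recovery copy the region, which is the ordering the patch enforces.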
