Commit 0e9cebe

Josef Bacik authored and snitm committed
dm: add log writes target
Introduce a new target that is meant for file system developers to test file system integrity at particular points in the life of a file system. We capture all write requests and associated data and log them to a separate device for later replay. There is a userspace utility to do this replay. The idea behind this is to give file system developers a tool to verify that the file system is always consistent.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: Zach Brown <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
1 parent 7f61f5a commit 0e9cebe

4 files changed: +982 -0 lines changed

4 files changed

+982
-0
lines changed
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
dm-log-writes
=============

This target takes two devices: one that all IO is passed to normally, and one
that every write operation is logged to. It is intended for file system
developers who want to verify the integrity of metadata or data as the file
system is written to. A log_write_entry is written for every WRITE request, and
the target can also take arbitrary data from userspace and insert it into the
log. The data carried by each WRITE request is copied into the log so that a
replay reproduces the writes exactly as they originally happened.

Log Ordering
============

We log things in order of completion, once we are sure the write is no longer
in cache. This means that normal WRITE requests are not actually logged until
the next REQ_FLUSH request. This makes it easier for userspace to replay the
log in a way that correlates to what is on disk rather than what is in cache,
which in turn makes it easier to detect improper waiting/flushing.

This works by attaching every WRITE request to a list once the write completes.
When we see a REQ_FLUSH request we splice this list onto the request, and once
the FLUSH request completes we log all of the WRITEs and then the FLUSH. Only
WRITEs that have completed at the time the REQ_FLUSH is issued are added, in
order to simulate the worst case scenario with regard to power failures.
Consider the following example (W means write, C means complete):

    W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following:

    W3,W2,flush,W1....

Again, this is to simulate what is actually on disk. It allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete, because those requests bypass the device cache.
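
The ordering rules above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not the kernel implementation: the function name, the trace syntax (taken from the W/C example, with names starting with "flush" or "fua" treated as REQ_FLUSH and REQ_FUA), and the returned tuple are all assumptions made for illustration.

```python
def simulate_log(trace):
    """Model the documented log ordering for a comma-separated trace.

    Each token is 'W<name>' (request issued) or 'C<name>' (request completed).
    Returns (log, pending): the entries logged so far, and the completed
    WRITEs still waiting for a later flush.
    """
    log = []
    completed = []  # WRITEs completed since the last flush was issued
    spliced = {}    # flush name -> WRITEs spliced onto it when it was issued

    for token in trace.split(","):
        op, name = token[0], token[1:]
        is_flush = name.startswith("flush")
        is_fua = name.startswith("fua")
        if op == "W":
            if is_flush:
                # Only WRITEs already completed are attached to the flush.
                spliced[name] = completed
                completed = []
        elif op == "C":
            if is_flush:
                # The flush is done: log its WRITEs, then the flush itself.
                log.extend(spliced.pop(name))
                log.append(name)
            elif is_fua:
                # FUA requests are logged as soon as they complete.
                log.append(name)
            else:
                completed.append("W" + name)
        else:
            raise ValueError(f"unknown event {token!r}")
    return log, completed


if __name__ == "__main__":
    print(simulate_log("W1,W2,W3,C3,C2,Wflush,C1,Cflush"))
```

For the trace in the example, the model logs W3, W2 and the flush, and leaves W1 pending until a later flush, which matches the log shown above.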

Any REQ_DISCARD requests are treated like WRITE requests. Otherwise we would
have all the DISCARD requests, then the WRITE requests, and then the FLUSH
request. Consider the following example:

    WRITE block 1, DISCARD block 1, FLUSH

If we logged the DISCARD when it completed, the replay would look like this:

    DISCARD 1, WRITE 1, FLUSH

which isn't quite what happened and wouldn't be caught during the log replay.
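
The difference between the two policies can be seen in a small Python model. This is an illustrative sketch only: the function name and the `discard_like_write` switch are hypothetical, and the input is simply the list of requests in completion order.

```python
def replay_order(requests, discard_like_write=True):
    """Return the order in which requests would appear in the log.

    `requests` lists requests in completion order, e.g. "WRITE 1",
    "DISCARD 1", "FLUSH". With discard_like_write=False the DISCARD is
    logged the moment it completes, which is the behaviour the target avoids.
    """
    log = []
    pending = []  # requests held back until the next FLUSH
    for req in requests:
        kind = req.split()[0]
        if kind == "FLUSH":
            log.extend(pending)
            pending = []
            log.append(req)
        elif kind == "DISCARD" and not discard_like_write:
            log.append(req)
        else:
            pending.append(req)
    return log


if __name__ == "__main__":
    ops = ["WRITE 1", "DISCARD 1", "FLUSH"]
    print(replay_order(ops))                            # documented policy
    print(replay_order(ops, discard_like_write=False))  # the reordering problem
```

Holding DISCARDs on the same list as WRITEs keeps their relative order intact, at the cost of not logging them until the next flush.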

Target interface
================

i) Constructor

    log-writes <dev_path> <log_dev_path>

    dev_path     : Device that all of the IO will go to normally.
    log_dev_path : Device where the log entries are written to.

ii) Status

    <#logged entries> <highest allocated sector>

    #logged entries          : Number of logged entries
    highest allocated sector : Highest allocated sector

iii) Messages

    mark <description>

        You can use a dmsetup message to set an arbitrary mark in a log.
        For example, say you want to fsck a file system after every
        write, but first you need to replay up to the mkfs to make sure
        you are fsck'ing something reasonable. You would do something
        like this:

          mkfs.btrfs -f /dev/mapper/log
          dmsetup message log 0 mark mkfs
          <run test>

        This would allow you to replay the log up to the mkfs mark and
        then replay from that point on, doing the fsck check at the
        interval that you want.

        Every log has a mark at the end labeled "dm-log-writes-end".
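
A test harness that drives this interface from a script might wrap the three pieces as below. This is a sketch under stated assumptions: all function and class names are hypothetical, `parse_status` assumes that a full `dmsetup status` line carries the usual `<start> <length> <target>` prefix before the two target fields, and rejecting multi-word descriptions and the reserved end mark is a precaution of this sketch rather than documented behaviour.

```python
from typing import NamedTuple

END_MARK = "dm-log-writes-end"  # the mark the documentation says ends every log


class LogWritesStatus(NamedTuple):
    logged_entries: int
    highest_allocated_sector: int


def table_line(num_sectors, dev_path, log_dev_path):
    """Build a table line: '0 <length> log-writes <dev_path> <log_dev_path>'."""
    if num_sectors <= 0:
        raise ValueError("num_sectors must be positive")
    return f"0 {num_sectors} log-writes {dev_path} {log_dev_path}"


def parse_status(line):
    """Parse '<#logged entries> <highest allocated sector>'.

    Also accepts a full status line by skipping everything up to and
    including the 'log-writes' target name (an assumption of this sketch).
    """
    tokens = line.split()
    if "log-writes" in tokens:
        tokens = tokens[tokens.index("log-writes") + 1:]
    if len(tokens) != 2:
        raise ValueError(f"unexpected status fields: {line!r}")
    return LogWritesStatus(int(tokens[0]), int(tokens[1]))


def mark_command(dm_name, description):
    """Return the argv for 'dmsetup message <dm_name> 0 mark <description>'."""
    if description.split() != [description]:
        raise ValueError("description must be a single non-empty word")
    if description == END_MARK:
        raise ValueError(f"{END_MARK!r} is already used for the end of the log")
    return ["dmsetup", "message", dm_name, "0", "mark", description]


if __name__ == "__main__":
    print(table_line(2097152, "/dev/sdb", "/dev/sdc"))
    print(parse_status("12 4096"))
    print(mark_command("log", "mkfs"))
```

The argv returned by `mark_command` can be passed to `subprocess.run(..., check=True)` on a machine where the target is loaded; building it separately keeps the helpers usable without root.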

Userspace component
===================

There is a userspace tool that will replay the log for you in various ways.
It can be found here: https://github.com/josefbacik/log-writes

Example usage
=============

Say you want to test fsync on your file system. You would do something like
this:

    TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
    dmsetup create log --table "$TABLE"
    mkfs.btrfs -f /dev/mapper/log
    dmsetup message log 0 mark mkfs

    mount /dev/mapper/log /mnt/btrfs-test
    <some test that does fsync at the end>
    dmsetup message log 0 mark fsync
    md5sum /mnt/btrfs-test/foo
    umount /mnt/btrfs-test

    dmsetup remove log
    replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
    mount /dev/sdb /mnt/btrfs-test
    md5sum /mnt/btrfs-test/foo
    <verify md5sums are correct>
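
The final verification step can be scripted rather than checked by eye. The helper below is a minimal sketch using Python's `hashlib`; the function name and chunk size are arbitrary choices, and it assumes the caller records the digest before the unmount and compares it with the digest taken after the replay.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A harness would call `md5_of("/mnt/btrfs-test/foo")` once after the fsync mark and once after mounting the replayed device, and fail the test if the two digests differ.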

Another option is to do a complicated file system operation and verify the file
system is consistent during the entire operation. You could do this with:

    TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
    dmsetup create log --table "$TABLE"
    mkfs.btrfs -f /dev/mapper/log
    dmsetup message log 0 mark mkfs

    mount /dev/mapper/log /mnt/btrfs-test
    <fsstress to dirty the fs>
    btrfs filesystem balance /mnt/btrfs-test
    umount /mnt/btrfs-test
    dmsetup remove log

    replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
    btrfsck /dev/sdb
    replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
        --fsck "btrfsck /dev/sdb" --check fua

This replays the log until it sees a FUA request and runs the fsck command. If
the fsck passes, it replays to the next FUA, and so on until the log is
exhausted or the fsck command exits abnormally.
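
The replay-then-check loop described above can be modelled as follows. This is not how replay-log is implemented; it is a hypothetical Python sketch in which log entries are `(flags, payload)` pairs, `apply` stands in for writing an entry to the replay device, and `check` stands in for running the fsck command.

```python
def replay_with_checks(entries, apply, check):
    """Replay entries in order, calling check() after every FUA entry.

    Returns (entries_replayed, ok). Replay stops at the first failed check,
    so entries_replayed identifies the point where the inconsistency appeared.
    """
    replayed = 0
    for flags, payload in entries:
        apply(payload)
        replayed += 1
        if "fua" in flags and not check():
            return replayed, False
    return replayed, True


if __name__ == "__main__":
    disk = []
    log = [({"write"}, "a"), ({"fua"}, "b"), ({"write"}, "c"), ({"fua"}, "d")]
    # Pretend the file system becomes inconsistent once "c" is on disk.
    print(replay_with_checks(log, disk.append, lambda: "c" not in disk))
```

Checking only at FUA boundaries keeps the number of fsck runs manageable while still testing the states that the file system itself treats as durable.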

drivers/md/Kconfig

Lines changed: 16 additions & 0 deletions
@@ -443,4 +443,20 @@ config DM_SWITCH
 
 	  If unsure, say N.
 
+config DM_LOG_WRITES
+	tristate "Log writes target support"
+	depends on BLK_DEV_DM
+	---help---
+	  This device-mapper target takes two devices, one device to use
+	  normally, one to log all write operations done to the first device.
+	  This is for use by file system developers wishing to verify that
+	  their fs is writing a consistent file system at all times by allowing
+	  them to replay the log in a variety of ways and to check the
+	  contents.
+
+	  To compile this code as a module, choose M here: the module will
+	  be called dm-log-writes.
+
+	  If unsure, say N.
+
 endif # MD

drivers/md/Makefile

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ obj-$(CONFIG_DM_CACHE) += dm-cache.o
 obj-$(CONFIG_DM_CACHE_MQ) += dm-cache-mq.o
 obj-$(CONFIG_DM_CACHE_CLEANER) += dm-cache-cleaner.o
 obj-$(CONFIG_DM_ERA) += dm-era.o
+obj-$(CONFIG_DM_LOG_WRITES) += dm-log-writes.o
 
 ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs += dm-uevent.o
