| 1 | +dm-log-writes |
| 2 | +============= |
| 3 | + |
| 4 | +This target takes 2 devices, one to pass all IO to normally, and one to log all |
| 5 | +of the write operations to. This is intended for file system developers wishing |
| 6 | +to verify the integrity of metadata or data as the file system is written to. |
| 7 | +There is a log_write_entry written for every WRITE request and the target is |
| 8 | +able to take arbitrary data from userspace to insert into the log. The data |
| 9 | +that is in the WRITE requests is copied into the log to make the replay happen |
| 10 | +exactly as it happened originally. |
| 11 | + |
| 12 | +Log Ordering |
| 13 | +============ |
| 14 | + |
| 15 | +We log things in order of completion once we are sure the write is no longer in |
| 16 | +cache. This means that normal WRITE requests are not actually logged until the |
| 17 | +next REQ_FLUSH request. This is to make it easier for userspace to replay the |
| 18 | +log in a way that correlates to what is on disk and not what is in cache, to |
| 19 | +make it easier to detect improper waiting/flushing. |
| 20 | + |
| 21 | +This works by attaching all WRITE requests to a list once the write completes. |
| 22 | +Once we see a REQ_FLUSH request we splice this list onto the request and once |
| 23 | +the FLUSH request completes we log all of the WRITEs and then the FLUSH. Only |
| 24 | +completed WRITEs, at the time the REQ_FLUSH is issued, are added in order to |
| 25 | +simulate the worst case scenario with regard to power failures. Consider the |
| 26 | +following example (W means write, C means complete): |
| 27 | + |
| 28 | +W1,W2,W3,C3,C2,Wflush,C1,Cflush |
| 29 | + |
| 30 | +The log would show the following: |
| 31 | + |
| 32 | +W3,W2,flush,W1.... |
| 33 | + |
| 34 | +Again, this is to simulate what is actually on disk, which allows us to detect |
| 35 | +cases where a power failure at a particular point in time would create an |
| 36 | +inconsistent file system. |
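
The ordering rule above can be modelled with a few lines of shell. This is only a sketch of the rule as described here, not the kernel implementation: `simulate_log` is a hypothetical helper, the event names follow the W/C notation of the example, and it assumes at most one flush is in flight at a time.

```shell
# Hypothetical model of the log ordering rule (not the kernel code).
# Events: Wn = write n issued, Cn = write n completed,
#         Wflush = flush issued, Cflush = flush completed.
# Note: uses plain global variables to stay portable across POSIX shells.
simulate_log() {
    pending=""   # completed writes not yet covered by a flush
    spliced=""   # writes attached to the in-flight flush
    logged=""    # what has been written to the log so far
    for ev in "$@"; do
        case "$ev" in
            Wflush) spliced=$pending; pending="" ;;   # splice list onto the flush
            Cflush) logged="${logged:+$logged,}${spliced:+$spliced,}flush"
                    spliced="" ;;                     # log the writes, then the flush
            C*)     pending="${pending:+$pending,}W${ev#C}" ;;  # queue completed write
            W*)     ;;                                # issuing a write logs nothing
        esac
    done
    echo "logged: $logged"
    echo "pending: $pending"
}

simulate_log W1 W2 W3 C3 C2 Wflush C1 Cflush
# logged: W3,W2,flush
# pending: W1
```

W1 stays pending because it completed after the flush was issued, so it would only reach the log with the next flush.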
| 37 | + |
| 38 | +Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as |
| 39 | +they complete, since those requests bypass the device cache. |
| 40 | + |
| 41 | +Any REQ_DISCARD requests are treated like WRITE requests. Otherwise the log |
| 42 | +would contain all the DISCARD requests first, then the WRITE requests, and then |
| 43 | +the FLUSH request. Consider the following example: |
| 44 | + |
| 45 | +WRITE block 1, DISCARD block 1, FLUSH |
| 46 | + |
| 47 | +If we logged DISCARD when it completed, the replay would look like this: |
| 48 | + |
| 49 | +DISCARD 1, WRITE 1, FLUSH |
| 50 | + |
| 51 | +which isn't quite what happened and wouldn't be caught during the log replay. |
| 52 | + |
| 53 | +Target interface |
| 54 | +================ |
| 55 | + |
| 56 | +i) Constructor |
| 57 | + |
| 58 | + log-writes <dev_path> <log_dev_path> |
| 59 | + |
| 60 | + dev_path : Device that all of the IO will go to normally. |
| 61 | + log_dev_path : Device where the log entries are written to. |
| 62 | + |
| 63 | +ii) Status |
| 64 | + |
| 65 | + <#logged entries> <highest allocated sector> |
| 66 | + |
| 67 | + #logged entries : Number of logged entries |
| 68 | + highest allocated sector : Highest allocated sector |
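
The two status fields can be read with `dmsetup status`. A minimal sketch, assuming the device is named `log` as in the examples in this document and that dmsetup prints its usual `<start> <length> <target>` prefix before the target's own fields; `parse_log_writes_status` is a hypothetical helper name:

```shell
# Hypothetical helper: extract the two log-writes status fields from one line
# of `dmsetup status` output, assumed to have the form
#   <start> <length> log-writes <#logged entries> <highest allocated sector>
parse_log_writes_status() {
    awk '$3 == "log-writes" { print "entries=" $4, "highest_sector=" $5 }'
}

# Usage (needs root and an existing device named "log"):
#   dmsetup status log | parse_log_writes_status
```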
| 69 | + |
| 70 | +iii) Messages |
| 71 | + |
| 72 | + mark <description> |
| 73 | + |
| 74 | + You can use a dmsetup message to set an arbitrary mark in a log. |
| 75 | + For example, say you want to fsck a file system after every |
| 76 | + write, but first you need to replay up to the mkfs to make sure |
| 77 | + we're fsck'ing something reasonable. You would do something like |
| 78 | + this: |
| 79 | + |
| 80 | + mkfs.btrfs -f /dev/mapper/log |
| 81 | + dmsetup message log 0 mark mkfs |
| 82 | + <run test> |
| 83 | + |
| 84 | + This would allow you to replay the log up to the mkfs mark and |
| 85 | + then replay from that point on doing the fsck check in the |
| 86 | + interval that you want. |
| 87 | + |
| 88 | + Every log has a mark at the end labeled "dm-log-writes-end". |
| 89 | + |
| 90 | +Userspace component |
| 91 | +=================== |
| 92 | + |
| 93 | +There is a userspace tool that will replay the log for you in various ways. |
| 94 | +It can be found here: https://github.com/josefbacik/log-writes |
| 95 | + |
| 96 | +Example usage |
| 97 | +============= |
| 98 | + |
| 99 | +Say you want to test fsync on your file system. You would do something like |
| 100 | +this: |
| 101 | + |
| 102 | +TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" |
| 103 | +dmsetup create log --table "$TABLE" |
| 104 | +mkfs.btrfs -f /dev/mapper/log |
| 105 | +dmsetup message log 0 mark mkfs |
| 106 | + |
| 107 | +mount /dev/mapper/log /mnt/btrfs-test |
| 108 | +<some test that does fsync at the end> |
| 109 | +dmsetup message log 0 mark fsync |
| 110 | +md5sum /mnt/btrfs-test/foo |
| 111 | +umount /mnt/btrfs-test |
| 112 | + |
| 113 | +dmsetup remove log |
| 114 | +replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync |
| 115 | +mount /dev/sdb /mnt/btrfs-test |
| 116 | +md5sum /mnt/btrfs-test/foo |
| 117 | +<verify md5sums are correct> |
| 118 | + |
| 119 | +Another option is to do a complicated file system operation and verify the file |
| 120 | +system is consistent during the entire operation. You could do this with: |
| 121 | + |
| 122 | +TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" |
| 123 | +dmsetup create log --table "$TABLE" |
| 124 | +mkfs.btrfs -f /dev/mapper/log |
| 125 | +dmsetup message log 0 mark mkfs |
| 126 | + |
| 127 | +mount /dev/mapper/log /mnt/btrfs-test |
| 128 | +<fsstress to dirty the fs> |
| 129 | +btrfs filesystem balance /mnt/btrfs-test |
| 130 | +umount /mnt/btrfs-test |
| 131 | +dmsetup remove log |
| 132 | + |
| 133 | +replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs |
| 134 | +btrfsck /dev/sdb |
| 135 | +replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \ |
| 136 | + --fsck "btrfsck /dev/sdb" --check fua |
| 137 | + |
| 138 | +This will replay the log until it sees a FUA request and run the fsck command. |
| 139 | +If the fsck passes, it will replay to the next FUA, and so on until the replay |
| 140 | +is complete or the fsck command exits abnormally. |