Commit 4b0228f

dax: for truncate/hole-punch, do zeroing through the driver if possible

In the truncate or hole-punch path in dax, we clear out sub-page ranges.
If these sub-page ranges are sector aligned and sized, we can do the
zeroing through the driver instead so that error-clearing is handled
automatically.

For sub-sector ranges, we still have to rely on clear_pmem and have the
possibility of tripping over errors.

Cc: Dan Williams <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jan Kara <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>

1 parent 679c8bd commit 4b0228f

2 files changed, 57 additions and 5 deletions

Documentation/filesystems/dax.txt

Lines changed: 32 additions & 0 deletions
@@ -79,6 +79,38 @@ These filesystems may be used for inspiration:
 - ext4: the fourth extended filesystem, see Documentation/filesystems/ext4.txt
 
 
+Handling Media Errors
+---------------------
+
+The libnvdimm subsystem stores a record of known media error locations for
+each pmem block device (in gendisk->badblocks). If we fault at such location,
+or one with a latent error not yet discovered, the application can expect
+to receive a SIGBUS. Libnvdimm also allows clearing of these errors by simply
+writing the affected sectors (through the pmem driver, and if the underlying
+NVDIMM supports the clear_poison DSM defined by ACPI).
+
+Since DAX IO normally doesn't go through the driver/bio path, applications or
+sysadmins have an option to restore the lost data from a prior backup/inbuilt
+redundancy in the following ways:
+
+1. Delete the affected file, and restore from a backup (sysadmin route):
+   This will free the file system blocks that were being used by the file,
+   and the next time they're allocated, they will be zeroed first, which
+   happens through the driver, and will clear bad sectors.
+
+2. Truncate or hole-punch the part of the file that has a bad-block (at least
+   an entire aligned sector has to be hole-punched, but not necessarily an
+   entire filesystem block).
+
+These are the two basic paths that allow DAX filesystems to continue operating
+in the presence of media errors. More robust error recovery mechanisms can be
+built on top of this in the future, for example, involving redundancy/mirroring
+provided at the block layer through DM, or additionally, at the filesystem
+level. These would have to rely on the above two tenets, that error clearing
+can happen either by sending an IO through the driver, or zeroing (also through
+the driver).
+
+
 Shortcomings
 ------------
 
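
Option 2 in the documentation hunk above can be exercised from userspace with fallocate(2). The following is a minimal sketch, not part of this commit: the helper names sector_span() and punch_bad_range() are hypothetical, and the caller is assumed to already know the byte range of the bad block within the file and the device's logical sector size (which could be queried from the block device with the BLKSSZGET ioctl). It widens the bad range outward to sector boundaries, so that at least one whole aligned sector is punched, as the text requires.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* fallocate() */
#include <linux/falloc.h> /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/types.h>

/*
 * Hypothetical helper: widen the byte range [off, off + len) outward so
 * that it starts and ends on sector boundaries.
 */
void sector_span(off_t off, off_t len, off_t sector_size,
		 off_t *start, off_t *span)
{
	off_t end;

	/* The mask arithmetic below relies on a power-of-two sector size. */
	assert(sector_size > 0 && (sector_size & (sector_size - 1)) == 0);
	assert(off >= 0 && len > 0);

	end = (off + len + sector_size - 1) & ~(sector_size - 1);
	*start = off & ~(sector_size - 1);
	*span = end - *start;
}

/*
 * Hypothetical helper: punch out every sector that overlaps the bad range
 * while keeping the file size unchanged. Returns 0 on success, or -1 with
 * errno set by fallocate().
 */
int punch_bad_range(int fd, off_t bad_off, off_t bad_len, off_t sector_size)
{
	off_t start, span;

	sector_span(bad_off, bad_len, sector_size, &start, &span);
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 start, span);
}
```

For example, with 512-byte sectors a bad range of 100 bytes at file offset 700 is widened to the single sector covering bytes 512 to 1023. The punched range reads back as zeroes afterwards, so the application still has to rewrite the lost data from a backup or other redundancy.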

fs/dax.c

Lines changed: 25 additions & 5 deletions
@@ -947,6 +947,19 @@ int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 }
 EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
 
+static bool dax_range_is_aligned(struct block_device *bdev,
+				 unsigned int offset, unsigned int length)
+{
+	unsigned short sector_size = bdev_logical_block_size(bdev);
+
+	if (!IS_ALIGNED(offset, sector_size))
+		return false;
+	if (!IS_ALIGNED(length, sector_size))
+		return false;
+
+	return true;
+}
+
 int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		unsigned int offset, unsigned int length)
 {
@@ -955,11 +968,18 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		.size = PAGE_SIZE,
 	};
 
-	if (dax_map_atomic(bdev, &dax) < 0)
-		return PTR_ERR(dax.addr);
-	clear_pmem(dax.addr + offset, length);
-	wmb_pmem();
-	dax_unmap_atomic(bdev, &dax);
+	if (dax_range_is_aligned(bdev, offset, length)) {
+		sector_t start_sector = dax.sector + (offset >> 9);
+
+		return blkdev_issue_zeroout(bdev, start_sector,
+				length >> 9, GFP_NOFS, true);
+	} else {
+		if (dax_map_atomic(bdev, &dax) < 0)
+			return PTR_ERR(dax.addr);
+		clear_pmem(dax.addr + offset, length);
+		wmb_pmem();
+		dax_unmap_atomic(bdev, &dax);
+	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__dax_zero_page_range);
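
The path selection and sector arithmetic in the hunk above can be modelled in userspace to see which ranges go through the driver. This is an illustrative sketch only: plan_zeroing(), struct zero_plan and IS_ALIGNED_POW2() are hypothetical stand-ins rather than kernel interfaces, and the sector size is passed in as a parameter instead of being read with bdev_logical_block_size(). The shift by 9 mirrors the patch and reflects that kernel sector numbers are counted in 512-byte units regardless of the device's logical block size.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's IS_ALIGNED(); 'a' must be a power of two. */
#define IS_ALIGNED_POW2(x, a) (((x) & ((a) - 1)) == 0)

enum zero_path {
	ZERO_VIA_DRIVER,     /* models the blkdev_issue_zeroout() branch */
	ZERO_VIA_CLEAR_PMEM  /* models the clear_pmem() fallback branch */
};

struct zero_plan {
	enum zero_path path;
	uint64_t start_sector;   /* meaningful only for ZERO_VIA_DRIVER */
	unsigned int nr_sectors; /* meaningful only for ZERO_VIA_DRIVER */
};

/*
 * Hypothetical model of the decision in __dax_zero_page_range():
 * page_sector is the 512-byte sector at which the page starts, offset and
 * length describe the sub-page range to zero, in bytes.
 */
struct zero_plan plan_zeroing(uint64_t page_sector, unsigned int offset,
			      unsigned int length, unsigned int sector_size)
{
	struct zero_plan plan = { ZERO_VIA_CLEAR_PMEM, 0, 0 };

	assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);

	if (IS_ALIGNED_POW2(offset, sector_size) &&
	    IS_ALIGNED_POW2(length, sector_size)) {
		plan.path = ZERO_VIA_DRIVER;
		plan.start_sector = page_sector + (offset >> 9);
		plan.nr_sectors = length >> 9;
	}
	return plan;
}
```

With 512-byte sectors, zeroing 1024 bytes at offset 512 of a page that starts at sector 8 maps to two sectors starting at sector 9 and takes the driver path. The same range on a device with 4096-byte logical blocks is not aligned and falls back to the clear_pmem path, where errors can still be tripped over.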
