
Commit 213e8c5

fdmanana authored and kdave committed
Btrfs: skip writeback of last page when truncating file to same size
When we truncate a file to the same size and that size is not aligned with the sector size, we end up triggering writeback (and waiting for it to complete) of the last page. This is unnecessary, as we cannot have delayed allocation beyond the inode's i_size, and the goal of truncating a file to its own size is to discard prealloc extents (allocated via the fallocate(2) system call). Besides the unnecessary IO start and wait, it also reduces the opportunity for larger contiguous extents on disk, since there might be other dirty pages before the last one that could otherwise be written out together with it.

This scenario is probably not very common in general. However, it is common for btrfs receive implementations, because currently the send stream always issues a truncate operation for each processed inode as the last operation for that inode (this truncate operation is not always needed, and the send implementation will be changed to avoid it).

So improve this by not starting and waiting for writeback of the inode's last page when we are truncating to exactly the same size.

The following script was used to quickly measure the time a receive operation takes:

  $ cat test_send.sh
  #!/bin/bash

  SRC_DEV=/dev/sdc
  DST_DEV=/dev/sdd
  SRC_MNT=/mnt/sdc
  DST_MNT=/mnt/sdd

  mkfs.btrfs -f $SRC_DEV >/dev/null
  mkfs.btrfs -f $DST_DEV >/dev/null
  mount $SRC_DEV $SRC_MNT
  mount $DST_DEV $DST_MNT

  echo "Creating source filesystem"
  for ((t = 0; t < 10; t++)); do
      (
          for ((i = 1; i <= 20000; i++)); do
              xfs_io -f -c "pwrite -S 0xab 0 5000" \
                  $SRC_MNT/file_$i > /dev/null
          done
      ) &
      worker_pids[$t]=$!
  done
  wait ${worker_pids[@]}

  echo "Creating and sending snapshot"
  btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
  /usr/bin/time -f "send took %e seconds" \
      btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
  /usr/bin/time -f "receive took %e seconds" \
      btrfs receive -f $SRC_MNT/send_file $DST_MNT

  umount $SRC_MNT
  umount $DST_MNT

The results for 5 runs were the following:

* Without this change
    average receive time was 26.49 seconds
    standard deviation of 2.53 seconds

* With this change
    average receive time was 12.51 seconds
    standard deviation of 0.32 seconds

Reported-by: Robbie Ko <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
1 parent ed5d5f3 commit 213e8c5


1 file changed, +10 -8 lines changed


fs/btrfs/inode.c

Lines changed: 10 additions & 8 deletions
@@ -101,7 +101,7 @@ static const unsigned char btrfs_type_by_mode[S_IFMT >> S_SHIFT] = {
 };
 
 static int btrfs_setsize(struct inode *inode, struct iattr *attr);
-static int btrfs_truncate(struct inode *inode);
+static int btrfs_truncate(struct inode *inode, bool skip_writeback);
 static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent);
 static noinline int cow_file_range(struct inode *inode,
 				   struct page *locked_page,
@@ -3668,7 +3668,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 				goto out;
 			}
 
-			ret = btrfs_truncate(inode);
+			ret = btrfs_truncate(inode, false);
 			if (ret)
 				btrfs_orphan_del(NULL, BTRFS_I(inode));
 		} else {
@@ -5154,7 +5154,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		inode_dio_wait(inode);
 		btrfs_inode_resume_unlocked_dio(BTRFS_I(inode));
 
-		ret = btrfs_truncate(inode);
+		ret = btrfs_truncate(inode, newsize == oldsize);
 		if (ret && inode->i_nlink) {
 			int err;
 
@@ -9136,7 +9136,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	return ret;
 }
 
-static int btrfs_truncate(struct inode *inode)
+static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -9147,10 +9147,12 @@ static int btrfs_truncate(struct inode *inode)
 	u64 mask = fs_info->sectorsize - 1;
 	u64 min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);
 
-	ret = btrfs_wait_ordered_range(inode, inode->i_size & (~mask),
-				       (u64)-1);
-	if (ret)
-		return ret;
+	if (!skip_writeback) {
+		ret = btrfs_wait_ordered_range(inode, inode->i_size & (~mask),
+					       (u64)-1);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Yes ladies and gentlemen, this is indeed ugly. The fact is we have
