
Commit fe816d0

lorddoskias authored and kdave committed
btrfs: Fix delalloc inodes invalidation during transaction abort
When a transaction is aborted, btrfs_cleanup_transaction is called to clean up all the various in-flight bits and pieces which might be active. One of those is delalloc inodes: inodes which have dirty pages that haven't been persisted yet. Currently, freeing such delalloc inodes in exceptional circumstances, such as a transaction abort, boils down to calling btrfs_invalidate_inodes, whose sole job is to invalidate the dentries of all inodes related to a root.

This is in fact wrong and insufficient, since such delalloc inodes will likely have pending pages or ordered extents and will be linked to the sb->s_inode_list. This means that unmounting a btrfs instance with an aborted transaction could leave these inodes and their pages visible to the system long after their superblock has been freed. This in turn leads to a use-after-free once page shrinking is triggered. The situation can be reproduced by running generic/019, which leaves such inodes hanging, followed by generic/176, which causes memory pressure and page eviction that end up touching the freed superblock instance.

The VFS unmount code also detects this situation and prints the following message: "VFS: Busy inodes after unmount of Self-destruct in 5 seconds. Have a nice day..." Additionally, btrfs hits WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); in free_fs_root for the same reason.

This patch aims to rectify the situation by doing the following:

1. Change btrfs_destroy_delalloc_inodes so that it calls invalidate_inode_pages2 for every inode on the delalloc list; this ensures that all the pages of the inode are released. This function boils down to calling btrfs_releasepage. During testing I observed cases where inodes on the delalloc list had an i_count of 0, so igrab is needed to be sure we are working on a non-freed inode.

2. Since calling btrfs_releasepage might queue delayed iputs, move the call to btrfs_cleanup_transaction in btrfs_error_commit_super so that it happens before run_delayed_iputs is called for the last time. This is necessary to ensure that those delayed iputs are actually run.

Note: this patch is tagged for 4.14 stable, but the fix applies to older versions too; those need to be backported manually due to conflicts.

CC: [email protected] # 4.14.x: 2b87733: btrfs: Split btrfs_del_delalloc_inode into 2 functions
CC: [email protected] # 4.14.x
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ add comment to igrab ]
Signed-off-by: David Sterba <[email protected]>
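The ordering argument in point 2 can be illustrated with a small userspace model. Everything in it is hypothetical: struct fake_fs, queue_delayed_iput, run_delayed_iputs, cleanup_transaction and the two error_commit_* helpers are simplified stand-ins, not btrfs code. The only property modeled is the one the commit message relies on: the abort cleanup may queue deferred puts, so a final drain that runs before the cleanup leaves work behind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the delayed-iput list: a fixed-size queue
 * of pending deferred puts. This is NOT the btrfs data structure. */
#define MAX_DELAYED 16

struct fake_fs {
	int delayed[MAX_DELAYED]; /* ids of inodes awaiting a deferred put */
	size_t nr_delayed;        /* puts queued but not yet run */
	size_t nr_released;       /* deferred puts that have actually run */
};

/* Models a release path that cannot drop the reference directly and
 * defers it instead (as btrfs_releasepage may, per the commit message). */
static void queue_delayed_iput(struct fake_fs *fs, int ino)
{
	assert(fs->nr_delayed < MAX_DELAYED);
	fs->delayed[fs->nr_delayed++] = ino;
}

/* Models the final drain: run every deferred put queued so far. */
static void run_delayed_iputs(struct fake_fs *fs)
{
	fs->nr_released += fs->nr_delayed;
	fs->nr_delayed = 0;
}

/* Models the abort cleanup: releasing the pages of each delalloc inode
 * queues one deferred put. */
static void cleanup_transaction(struct fake_fs *fs, int nr_delalloc)
{
	for (int ino = 0; ino < nr_delalloc; ino++)
		queue_delayed_iput(fs, ino);
}

/* Order before the patch: final drain first, cleanup afterwards.
 * Returns the number of puts still pending at "unmount". */
static size_t error_commit_old(struct fake_fs *fs, int nr_delalloc)
{
	run_delayed_iputs(fs);
	cleanup_transaction(fs, nr_delalloc);
	return fs->nr_delayed;
}

/* Order after the patch: cleanup first, then the final drain. */
static size_t error_commit_new(struct fake_fs *fs, int nr_delalloc)
{
	cleanup_transaction(fs, nr_delalloc);
	run_delayed_iputs(fs);
	return fs->nr_delayed;
}
```

With three delalloc inodes, error_commit_old() returns 3 (three deferred puts never run), while error_commit_new() returns 0 and all three puts are accounted in nr_released.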
1 parent 2b87733 commit fe816d0


1 file changed: +15, -11 lines changed

fs/btrfs/disk-io.c

Lines changed: 15 additions & 11 deletions
@@ -3818,6 +3818,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 	set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags);
 
 	btrfs_free_qgroup_config(fs_info);
+	ASSERT(list_empty(&fs_info->delalloc_roots));
 
 	if (percpu_counter_sum(&fs_info->delalloc_bytes)) {
 		btrfs_info(fs_info, "at unmount delalloc count %lld",
@@ -4125,15 +4126,15 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
 
 static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info)
 {
+	/* cleanup FS via transaction */
+	btrfs_cleanup_transaction(fs_info);
+
 	mutex_lock(&fs_info->cleaner_mutex);
 	btrfs_run_delayed_iputs(fs_info);
 	mutex_unlock(&fs_info->cleaner_mutex);
 
 	down_write(&fs_info->cleanup_work_sem);
 	up_write(&fs_info->cleanup_work_sem);
-
-	/* cleanup FS via transaction */
-	btrfs_cleanup_transaction(fs_info);
 }
 
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root)
@@ -4258,19 +4259,23 @@ static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
 	list_splice_init(&root->delalloc_inodes, &splice);
 
 	while (!list_empty(&splice)) {
+		struct inode *inode = NULL;
 		btrfs_inode = list_first_entry(&splice, struct btrfs_inode,
 					       delalloc_inodes);
-
-		list_del_init(&btrfs_inode->delalloc_inodes);
-		clear_bit(BTRFS_INODE_IN_DELALLOC_LIST,
-			  &btrfs_inode->runtime_flags);
+		__btrfs_del_delalloc_inode(root, btrfs_inode);
 		spin_unlock(&root->delalloc_lock);
 
-		btrfs_invalidate_inodes(btrfs_inode->root);
-
+		/*
+		 * Make sure we get a live inode and that it'll not disappear
+		 * meanwhile.
+		 */
+		inode = igrab(&btrfs_inode->vfs_inode);
+		if (inode) {
+			invalidate_inode_pages2(inode->i_mapping);
+			iput(inode);
+		}
 		spin_lock(&root->delalloc_lock);
 	}
-
 	spin_unlock(&root->delalloc_lock);
 }
 
@@ -4286,7 +4291,6 @@ static void btrfs_destroy_all_delalloc_inodes(struct btrfs_fs_info *fs_info)
 	while (!list_empty(&splice)) {
 		root = list_first_entry(&splice, struct btrfs_root,
 					delalloc_root);
-		list_del_init(&root->delalloc_root);
 		root = btrfs_grab_fs_root(root);
 		BUG_ON(!root);
 		spin_unlock(&fs_info->delalloc_root_lock);
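The new loop in btrfs_destroy_delalloc_inodes follows a common kernel pattern: splice the shared list onto a private one under the lock, unlink one entry at a time, drop the lock for work that may sleep, and take a reference with igrab so that an inode already being freed is skipped. Below is a userspace sketch of that pattern, assuming a pthread mutex in place of the spinlock and a singly linked list in place of list_head. The names fake_inode, fake_root, fake_igrab, fake_iput, fake_invalidate_pages and destroy_delalloc_inodes are hypothetical and only mimic the behavior described in the commit message; they are not the kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not kernel types. */
struct fake_inode {
	struct fake_inode *next; /* singly linked "delalloc list" */
	int i_count;             /* reference count */
	bool freeing;            /* models I_FREEING / I_WILL_FREE */
	int nr_pages;            /* cached pages still attached */
};

struct fake_root {
	pthread_mutex_t delalloc_lock;       /* stands in for the spinlock */
	struct fake_inode *delalloc_inodes;  /* head of the shared list */
};

/* igrab()-like: take a reference only if the inode is not being freed.
 * An i_count of 0 by itself does not prevent the grab. */
static struct fake_inode *fake_igrab(struct fake_inode *inode)
{
	if (inode->freeing)
		return NULL;
	inode->i_count++;
	return inode;
}

/* iput()-like: drop the reference taken by fake_igrab(). */
static void fake_iput(struct fake_inode *inode)
{
	assert(inode->i_count > 0);
	inode->i_count--;
}

/* Models invalidate_inode_pages2(): drop all cached pages. */
static void fake_invalidate_pages(struct fake_inode *inode)
{
	inode->nr_pages = 0;
}

static void destroy_delalloc_inodes(struct fake_root *root)
{
	struct fake_inode *splice;

	pthread_mutex_lock(&root->delalloc_lock);
	/* Like list_splice_init(): move the whole list to a private head. */
	splice = root->delalloc_inodes;
	root->delalloc_inodes = NULL;

	while (splice) {
		struct fake_inode *inode = splice;
		struct fake_inode *grabbed;

		/* Unlink the entry while still holding the lock. */
		splice = inode->next;
		inode->next = NULL;
		pthread_mutex_unlock(&root->delalloc_lock);

		/* Page invalidation may sleep in the kernel, so run it unlocked. */
		grabbed = fake_igrab(inode);
		if (grabbed) {
			fake_invalidate_pages(grabbed);
			fake_iput(grabbed);
		}

		pthread_mutex_lock(&root->delalloc_lock);
	}
	pthread_mutex_unlock(&root->delalloc_lock);
}
```

In this model an inode with i_count == 0 is still grabbed and has its pages dropped, mirroring igrab, which refuses only inodes marked I_FREEING or I_WILL_FREE; an inode flagged as freeing is skipped and its pages are left untouched.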
