Commit 4dbd80f

Qu Wenruo authored and fdmanana committed
btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error
[BUG]
When btrfs_reloc_clone_csum() reports an error, it can underflow metadata
and lead to a kernel assertion on outstanding extents in
run_delalloc_nocow() and cow_file_range().

 BTRFS info (device vdb5): relocating block group 12582912 flags data
 BTRFS info (device vdb5): found 1 extents
 assertion failed: inode->outstanding_extents >= num_extents, file: fs/btrfs//extent-tree.c, line: 5858

Currently, due to another bug blocking ordered extents, the bug is only
reproducible under a certain block group layout and with error injection.

a) Create one data block group with one 4K extent in it.
   This avoids the bug that hangs btrfs due to an ordered extent which
   never finishes
b) Make btrfs_reloc_clone_csum() always fail
c) Relocate that block group

[CAUSE]
run_delalloc_nocow() and cow_file_range() handle the error from
btrfs_reloc_clone_csum() wrongly:

(The ascii chart shows a more generic case of this bug than the one
mentioned above)

|<------------------ delalloc range --------------------------->|
| OE 1 | OE 2 | ... | OE n |
                    |<----------- cleanup range --------------->|
|<-----------  ----------->|
             \/
 btrfs_finish_ordered_io() range

So the error handler, which calls extent_clear_unlock_delalloc() with the
EXTENT_DELALLOC and EXTENT_DO_ACCOUNT bits, and btrfs_finish_ordered_io()
will both cover OE n and free its metadata, causing metadata underflow.

[Fix]
The fix is to ensure that after calling btrfs_add_ordered_extent(), we only
call the error handler after increasing the iteration offset, so that the
cleanup range won't cover any created ordered extent.

|<------------------ delalloc range --------------------------->|
| OE 1 | OE 2 | ... | OE n |
|<-----------  ----------->|<---------- cleanup range --------->|
             \/
 btrfs_finish_ordered_io() range

Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Reviewed-by: Liu Bo <[email protected]>
1 parent a967efb commit 4dbd80f

1 file changed, +39 -12 lines changed

fs/btrfs/inode.c

Lines changed: 39 additions & 12 deletions
@@ -998,15 +998,24 @@ static noinline int cow_file_range(struct inode *inode,
 			    BTRFS_DATA_RELOC_TREE_OBJECTID) {
 			ret = btrfs_reloc_clone_csums(inode, start,
 						      cur_alloc_size);
+			/*
+			 * Only drop cache here, and process as normal.
+			 *
+			 * We must not allow extent_clear_unlock_delalloc()
+			 * at out_unlock label to free meta of this ordered
+			 * extent, as its meta should be freed by
+			 * btrfs_finish_ordered_io().
+			 *
+			 * So we must continue until @start is increased to
+			 * skip current ordered extent.
+			 */
 			if (ret)
-				goto out_drop_extent_cache;
+				btrfs_drop_extent_cache(BTRFS_I(inode), start,
+						start + ram_size - 1, 0);
 		}
 
 		btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 
-		if (disk_num_bytes < cur_alloc_size)
-			break;
-
 		/* we're not doing compressed IO, don't unlock the first
 		 * page (which the caller expects to stay locked), don't
 		 * clear any dirty bits and don't set any writeback bits
@@ -1022,10 +1031,21 @@ static noinline int cow_file_range(struct inode *inode,
 					     delalloc_end, locked_page,
 					     EXTENT_LOCKED | EXTENT_DELALLOC,
 					     op);
-		disk_num_bytes -= cur_alloc_size;
+		if (disk_num_bytes < cur_alloc_size)
+			disk_num_bytes = 0;
+		else
+			disk_num_bytes -= cur_alloc_size;
 		num_bytes -= cur_alloc_size;
 		alloc_hint = ins.objectid + ins.offset;
 		start += cur_alloc_size;
+
+		/*
+		 * btrfs_reloc_clone_csums() error, since start is increased
+		 * extent_clear_unlock_delalloc() at out_unlock label won't
+		 * free metadata of current ordered extent, we're OK to exit.
+		 */
+		if (ret)
+			goto out_unlock;
 	}
 out:
 	return ret;
@@ -1414,15 +1434,14 @@ static noinline int run_delalloc_nocow(struct inode *inode,
 		BUG_ON(ret); /* -ENOMEM */
 
 		if (root->root_key.objectid ==
-		    BTRFS_DATA_RELOC_TREE_OBJECTID) {
+		    BTRFS_DATA_RELOC_TREE_OBJECTID)
+			/*
+			 * Error handled later, as we must prevent
+			 * extent_clear_unlock_delalloc() in error handler
+			 * from freeing metadata of created ordered extent.
+			 */
 			ret = btrfs_reloc_clone_csums(inode, cur_offset,
 						      num_bytes);
-			if (ret) {
-				if (!nolock && nocow)
-					btrfs_end_write_no_snapshoting(root);
-				goto error;
-			}
-		}
 
 		extent_clear_unlock_delalloc(inode, cur_offset,
 					     cur_offset + num_bytes - 1, end,
@@ -1434,6 +1453,14 @@ static noinline int run_delalloc_nocow(struct inode *inode,
 		if (!nolock && nocow)
 			btrfs_end_write_no_snapshoting(root);
 		cur_offset = extent_end;
+
+		/*
+		 * btrfs_reloc_clone_csums() error, now we're OK to call error
+		 * handler, as metadata for created ordered extent will only
+		 * be freed by btrfs_finish_ordered_io().
+		 */
+		if (ret)
+			goto error;
 		if (cur_offset > end)
 			break;
 	}
