
Commit 38e3eeb

josefbacik authored and kdave committed
btrfs: honor path->skip_locking in backref code
Qgroups will do the old roots lookup at delayed ref time, which could be
while walking down the extent root while running a delayed ref.  This
should be fine, except we specifically lock eb's in the backref walking
code irrespective of path->skip_locking, which deadlocks the system.
Fix up the backref code to honor path->skip_locking; nobody will be
modifying the commit_root while we're searching, so it is completely
safe to do.

This has been happening since fb235dc ("btrfs: qgroup: Move half of the
qgroup accounting time out of commit trans"); with quota enabled the
kernel may lock up.  There is a backref walk triggered by snapshot
dropping along with a write operation in the source subvolume.  The
example can be reliably reproduced:

  btrfs-cleaner   D    0  4062      2 0x80000000
  Call Trace:
   schedule+0x32/0x90
   btrfs_tree_read_lock+0x93/0x130 [btrfs]
   find_parent_nodes+0x29b/0x1170 [btrfs]
   btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
   btrfs_find_all_roots+0x57/0x70 [btrfs]
   btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
   btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
   btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
   do_walk_down+0x541/0x5e3 [btrfs]
   walk_down_tree+0xab/0xe7 [btrfs]
   btrfs_drop_snapshot+0x356/0x71a [btrfs]
   btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
   cleaner_kthread+0x12b/0x160 [btrfs]
   kthread+0x112/0x130
   ret_from_fork+0x27/0x50

Dropping a snapshot with qgroups enabled triggers a backref walk.
Such a backref walk at that point is dangerous: if one of the parent
nodes gets write-locked by another thread, we can deadlock.  For
example:

           FS 260     FS 261 (Dropped)
            node A        node B
           /      \      /      \
       node C      node D      node E
      /     \     /  \        /     \
  leaf F   leaf G leaf H leaf I leaf J leaf K

The lock sequence would be:

      Thread A (cleaner)             |       Thread B (other writer)
-----------------------------------------------------------------------
write_lock(B)                        |
write_lock(D)                        |
^^^ called by walk_down_tree()       |
                                     | write_lock(A)
                                     | write_lock(D) << Stall
read_lock(H) << for backref walk     |
read_lock(D) << lock owner is        |
                the same thread A,   |
                so read lock is OK   |
read_lock(A) << Stall                |

So thread A holds write lock D and needs read lock A to continue, while
thread B holds write lock A and needs write lock D.  Neither can make
progress: a deadlock.

This is not limited to the snapshot dropping case.  The backref walk,
even though it only happens on commit trees, breaks the normal top-down
locking order and is therefore deadlock prone.

Fixes: fb235dc ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
CC: [email protected] # 4.14+
Reported-and-tested-by: David Sterba <[email protected]>
Reported-by: Filipe Manana <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
[ rebase to latest branch and fix lock assert bug in btrfs/007 ]
Signed-off-by: Qu Wenruo <[email protected]>
[ copy logs and deadlock analysis from Qu's patch ]
Signed-off-by: David Sterba <[email protected]>
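The core of the fix is a conditional-locking pattern: the walker takes a
read lock on each block only when the caller has not set
path->skip_locking.  Below is a minimal, runnable userspace sketch of
that same pattern using plain pthreads; all names here (struct eb,
struct path, read_key) are hypothetical stand-ins for illustration, not
btrfs code.

  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>

  /* Hypothetical stand-ins for an extent buffer and a search path. */
  struct eb {
          pthread_rwlock_t lock;
          int key;
  };

  struct path {
          bool skip_locking;      /* caller promises the tree is immutable */
  };

  /* Analogous to add_missing_keys() after the patch: lock only on request. */
  static int read_key(struct path *path, struct eb *eb, int *key_out)
  {
          if (!path->skip_locking)
                  pthread_rwlock_rdlock(&eb->lock);

          *key_out = eb->key;     /* the actual read work */

          if (!path->skip_locking)
                  pthread_rwlock_unlock(&eb->lock);
          return 0;
  }

  int main(void)
  {
          struct eb eb = { .key = 42 };
          struct path path = { .skip_locking = true };
          int key;

          pthread_rwlock_init(&eb.lock, NULL);

          /*
           * The caller already holds locks higher up (as the delayed-ref
           * path does), so it opts out; read_key() must honor that rather
           * than unconditionally taking the lock and risking the ABBA
           * deadlock described above.
           */
          read_key(&path, &eb, &key);
          printf("key = %d\n", key);

          pthread_rwlock_destroy(&eb.lock);
          return 0;
  }

The design point is that the caller, not the walker, knows whether the
structure can change underneath the search; for a commit root it cannot,
so locking there is pure overhead and, as the trace above shows, a
deadlock risk.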
1 parent f5fef45 · commit 38e3eeb


fs/btrfs/backref.c

Lines changed: 13 additions & 7 deletions
@@ -712,7 +712,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
  * read tree blocks and add keys where required.
  */
 static int add_missing_keys(struct btrfs_fs_info *fs_info,
-                            struct preftrees *preftrees)
+                            struct preftrees *preftrees, bool lock)
 {
         struct prelim_ref *ref;
         struct extent_buffer *eb;
@@ -737,12 +737,14 @@ static int add_missing_keys(struct btrfs_fs_info *fs_info,
                         free_extent_buffer(eb);
                         return -EIO;
                 }
-                btrfs_tree_read_lock(eb);
+                if (lock)
+                        btrfs_tree_read_lock(eb);
                 if (btrfs_header_level(eb) == 0)
                         btrfs_item_key_to_cpu(eb, &ref->key_for_search, 0);
                 else
                         btrfs_node_key_to_cpu(eb, &ref->key_for_search, 0);
-                btrfs_tree_read_unlock(eb);
+                if (lock)
+                        btrfs_tree_read_unlock(eb);
                 free_extent_buffer(eb);
                 prelim_ref_insert(fs_info, &preftrees->indirect, ref, NULL);
                 cond_resched();
@@ -1227,7 +1229,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,

         btrfs_release_path(path);

-        ret = add_missing_keys(fs_info, &preftrees);
+        ret = add_missing_keys(fs_info, &preftrees, path->skip_locking == 0);
         if (ret)
                 goto out;

@@ -1288,11 +1290,15 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
                         ret = -EIO;
                         goto out;
                 }
-                btrfs_tree_read_lock(eb);
-                btrfs_set_lock_blocking_read(eb);
+
+                if (!path->skip_locking) {
+                        btrfs_tree_read_lock(eb);
+                        btrfs_set_lock_blocking_read(eb);
+                }
                 ret = find_extent_in_eb(eb, bytenr,
                                         *extent_item_pos, &eie, ignore_offset);
-                btrfs_tree_read_unlock_blocking(eb);
+                if (!path->skip_locking)
+                        btrfs_tree_read_unlock_blocking(eb);
                 free_extent_buffer(eb);
                 if (ret < 0)
                         goto out;
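For context on how the flag gets set: in backref.c of this era,
find_parent_nodes() allocates its own path and, when called without a
transaction handle, marks the search as commit-root only, roughly as in
the sketch below (an illustration of the caller side, not part of this
diff):

          path = btrfs_alloc_path();
          if (!path)
                  return -ENOMEM;
          if (!trans) {
                  /* no running transaction: only the frozen commit root
                   * is read, so eb locking can be skipped safely */
                  path->search_commit_root = 1;
                  path->skip_locking = 1;
          }

With this commit that promise is honored by add_missing_keys() and by
the extent_item_pos lookup as well, not only by the btrfs_search_slot()
machinery.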