Skip to content

Commit d7cd4dd

Browse files
fdmananakdave
authored andcommitted
Btrfs: fix sysfs warning and missing raid sysfs directories
In the 5.3 merge window, commit 7c7e301 ("btrfs: sysfs: Replace default_attrs in ktypes with groups"), we started using the member "defaults_groups" for the kobject type "btrfs_raid_ktype". That leads to a series of warnings when running some test cases of fstests, such as btrfs/027, btrfs/124 and btrfs/176. The traces produced by those warnings are like the following: [116648.059212] kernfs: can not remove 'total_bytes', no directory [116648.060112] WARNING: CPU: 3 PID: 28500 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x75/0x80 (...) [116648.066482] CPU: 3 PID: 28500 Comm: umount Tainted: G W 5.3.0-rc3-btrfs-next-54 #1 (...) [116648.069376] RIP: 0010:kernfs_remove_by_name_ns+0x75/0x80 (...) [116648.072385] RSP: 0018:ffffabfd0090bd08 EFLAGS: 00010282 [116648.073437] RAX: 0000000000000000 RBX: ffffffffc0c11998 RCX: 0000000000000000 [116648.074201] RDX: ffff9fff603a7a00 RSI: ffff9fff603978a8 RDI: ffff9fff603978a8 [116648.074956] RBP: ffffffffc0b9ca2f R08: 0000000000000000 R09: 0000000000000001 [116648.075708] R10: ffff9ffe1f72e1c0 R11: 0000000000000000 R12: ffffffffc0b94120 [116648.076434] R13: ffffffffb3d9b4e0 R14: 0000000000000000 R15: dead000000000100 [116648.077143] FS: 00007f9cdc78a2c0(0000) GS:ffff9fff60380000(0000) knlGS:0000000000000000 [116648.077852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [116648.078546] CR2: 00007f9fc4747ab4 CR3: 00000005c7832003 CR4: 00000000003606e0 [116648.079235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [116648.079907] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [116648.080585] Call Trace: [116648.081262] remove_files+0x31/0x70 [116648.081929] sysfs_remove_group+0x38/0x80 [116648.082596] sysfs_remove_groups+0x34/0x70 [116648.083258] kobject_del+0x20/0x60 [116648.083933] btrfs_free_block_groups+0x405/0x430 [btrfs] [116648.084608] close_ctree+0x19a/0x380 [btrfs] [116648.085278] generic_shutdown_super+0x6c/0x110 [116648.085951] kill_anon_super+0xe/0x30 [116648.086621] btrfs_kill_super+0x12/0xa0 [btrfs] [116648.087289] deactivate_locked_super+0x3a/0x70 [116648.087956] cleanup_mnt+0xb4/0x160 [116648.088620] task_work_run+0x7e/0xc0 [116648.089285] exit_to_usermode_loop+0xfa/0x100 [116648.089933] do_syscall_64+0x1cb/0x220 [116648.090567] entry_SYSCALL_64_after_hwframe+0x49/0xbe [116648.091197] RIP: 0033:0x7f9cdc073b37 (...) [116648.100046] ---[ end trace 22e24db328ccadf8 ]--- [116648.100618] ------------[ cut here ]------------ [116648.101175] kernfs: can not remove 'used_bytes', no directory [116648.101731] WARNING: CPU: 3 PID: 28500 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x75/0x80 (...) [116648.105649] CPU: 3 PID: 28500 Comm: umount Tainted: G W 5.3.0-rc3-btrfs-next-54 #1 (...) [116648.107461] RIP: 0010:kernfs_remove_by_name_ns+0x75/0x80 (...) [116648.109336] RSP: 0018:ffffabfd0090bd08 EFLAGS: 00010282 [116648.109979] RAX: 0000000000000000 RBX: ffffffffc0c119a0 RCX: 0000000000000000 [116648.110625] RDX: ffff9fff603a7a00 RSI: ffff9fff603978a8 RDI: ffff9fff603978a8 [116648.111283] RBP: ffffffffc0b9ca41 R08: 0000000000000000 R09: 0000000000000001 [116648.111940] R10: ffff9ffe1f72e1c0 R11: 0000000000000000 R12: ffffffffc0b94120 [116648.112603] R13: ffffffffb3d9b4e0 R14: 0000000000000000 R15: dead000000000100 [116648.113268] FS: 00007f9cdc78a2c0(0000) GS:ffff9fff60380000(0000) knlGS:0000000000000000 [116648.113939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [116648.114607] CR2: 00007f9fc4747ab4 CR3: 00000005c7832003 CR4: 00000000003606e0 [116648.115286] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [116648.115966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [116648.116649] Call Trace: [116648.117326] remove_files+0x31/0x70 [116648.117997] sysfs_remove_group+0x38/0x80 [116648.118671] sysfs_remove_groups+0x34/0x70 [116648.119342] kobject_del+0x20/0x60 [116648.120022] btrfs_free_block_groups+0x405/0x430 [btrfs] [116648.120707] close_ctree+0x19a/0x380 [btrfs] [116648.121396] generic_shutdown_super+0x6c/0x110 [116648.122057] kill_anon_super+0xe/0x30 [116648.122702] btrfs_kill_super+0x12/0xa0 [btrfs] [116648.123335] deactivate_locked_super+0x3a/0x70 [116648.123961] cleanup_mnt+0xb4/0x160 [116648.124586] task_work_run+0x7e/0xc0 [116648.125210] exit_to_usermode_loop+0xfa/0x100 [116648.125830] do_syscall_64+0x1cb/0x220 [116648.126463] entry_SYSCALL_64_after_hwframe+0x49/0xbe [116648.127080] RIP: 0033:0x7f9cdc073b37 (...) [116648.135923] ---[ end trace 22e24db328ccadf9 ]--- These happen because, during the unmount path, we call kobject_del() for raid kobjects that are not fully initialized, meaning that we set their ktype (as btrfs_raid_ktype) through link_block_group() but we didn't set their parent kobject, which is done through btrfs_add_raid_kobjects(). We have this split raid kobject setup since commit 75cb379 ("btrfs: defer adding raid type kobject until after chunk relocation") in order to avoid triggering reclaim during contextes where we can not (either we are holding a transaction handle or some lock required by the transaction commit path), so that we do the calls to kobject_add(), which triggers GFP_KERNEL allocations, through btrfs_add_raid_kobjects() in contextes where it is safe to trigger reclaim. That change expected that a new raid kobject can only be created either when mounting the filesystem or after raid profile conversion through the relocation path. However, we can have new raid kobject created in other two cases at least: 1) During device replace (or scrub) after adding a device a to the filesystem. The replace procedure (and scrub) do calls to btrfs_inc_block_group_ro() which can allocate a new block group with a new raid profile (because we now have more devices). This can be triggered by test cases btrfs/027 and btrfs/176. 2) During a degraded mount trough any write path. This can be triggered by test case btrfs/124. Fixing this by adding extra calls to btrfs_add_raid_kobjects(), not only makes things more complex and fragile, can also introduce deadlocks with reclaim the following way: 1) Calling btrfs_add_raid_kobjects() at btrfs_inc_block_group_ro() or anywhere in the replace/scrub path will cause a deadlock with reclaim because if reclaim happens and a transaction commit is triggered, the transaction commit path will block at btrfs_scrub_pause(). 2) During degraded mounts it is essentially impossible to figure out where to add extra calls to btrfs_add_raid_kobjects(), because allocation of a block group with a new raid profile can happen anywhere, which means we can't safely figure out which contextes are safe for reclaim, as we can either hold a transaction handle or some lock needed by the transaction commit path. So it is too complex and error prone to have this split setup of raid kobjects. So fix the issue by consolidating the setup of the kobjects in a single place, at link_block_group(), and setup a nofs context there in order to prevent reclaim being triggered by the memory allocations done through the call chain of kobject_add(). Besides fixing the sysfs warnings during kobject_del(), this also ensures the sysfs directories for the new raid profiles end up created and visible to users (a bug that existed before the 5.3 commit 7c7e301 ("btrfs: sysfs: Replace default_attrs in ktypes with groups")). Fixes: 75cb379 ("btrfs: defer adding raid type kobject until after chunk relocation") Fixes: 7c7e301 ("btrfs: sysfs: Replace default_attrs in ktypes with groups") Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent a6d155d commit d7cd4dd

File tree

4 files changed

+24
-52
lines changed

4 files changed

+24
-52
lines changed

fs/btrfs/ctree.h

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -401,7 +401,6 @@ struct btrfs_dev_replace {
401401
struct raid_kobject {
402402
u64 flags;
403403
struct kobject kobj;
404-
struct list_head list;
405404
};
406405

407406
/*
@@ -915,8 +914,6 @@ struct btrfs_fs_info {
915914
u32 thread_pool_size;
916915

917916
struct kobject *space_info_kobj;
918-
struct list_head pending_raid_kobjs;
919-
spinlock_t pending_raid_kobjs_lock; /* uncontended */
920917

921918
u64 total_pinned;
922919

@@ -2698,7 +2695,6 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr);
26982695
int btrfs_make_block_group(struct btrfs_trans_handle *trans,
26992696
u64 bytes_used, u64 type, u64 chunk_offset,
27002697
u64 size);
2701-
void btrfs_add_raid_kobjects(struct btrfs_fs_info *fs_info);
27022698
struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
27032699
struct btrfs_fs_info *fs_info,
27042700
const u64 chunk_offset);

fs/btrfs/disk-io.c

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2683,8 +2683,6 @@ int open_ctree(struct super_block *sb,
26832683
INIT_LIST_HEAD(&fs_info->delayed_iputs);
26842684
INIT_LIST_HEAD(&fs_info->delalloc_roots);
26852685
INIT_LIST_HEAD(&fs_info->caching_block_groups);
2686-
INIT_LIST_HEAD(&fs_info->pending_raid_kobjs);
2687-
spin_lock_init(&fs_info->pending_raid_kobjs_lock);
26882686
spin_lock_init(&fs_info->delalloc_root_lock);
26892687
spin_lock_init(&fs_info->trans_lock);
26902688
spin_lock_init(&fs_info->fs_roots_radix_lock);

fs/btrfs/extent-tree.c

Lines changed: 24 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44
*/
55

66
#include <linux/sched.h>
7+
#include <linux/sched/mm.h>
78
#include <linux/sched/signal.h>
89
#include <linux/pagemap.h>
910
#include <linux/writeback.h>
@@ -7888,33 +7889,6 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
78887889
return 0;
78897890
}
78907891

7891-
/* link_block_group will queue up kobjects to add when we're reclaim-safe */
7892-
void btrfs_add_raid_kobjects(struct btrfs_fs_info *fs_info)
7893-
{
7894-
struct btrfs_space_info *space_info;
7895-
struct raid_kobject *rkobj;
7896-
LIST_HEAD(list);
7897-
int ret = 0;
7898-
7899-
spin_lock(&fs_info->pending_raid_kobjs_lock);
7900-
list_splice_init(&fs_info->pending_raid_kobjs, &list);
7901-
spin_unlock(&fs_info->pending_raid_kobjs_lock);
7902-
7903-
list_for_each_entry(rkobj, &list, list) {
7904-
space_info = btrfs_find_space_info(fs_info, rkobj->flags);
7905-
7906-
ret = kobject_add(&rkobj->kobj, &space_info->kobj,
7907-
"%s", btrfs_bg_type_to_raid_name(rkobj->flags));
7908-
if (ret) {
7909-
kobject_put(&rkobj->kobj);
7910-
break;
7911-
}
7912-
}
7913-
if (ret)
7914-
btrfs_warn(fs_info,
7915-
"failed to add kobject for block cache, ignoring");
7916-
}
7917-
79187892
static void link_block_group(struct btrfs_block_group_cache *cache)
79197893
{
79207894
struct btrfs_space_info *space_info = cache->space_info;
@@ -7929,18 +7903,36 @@ static void link_block_group(struct btrfs_block_group_cache *cache)
79297903
up_write(&space_info->groups_sem);
79307904

79317905
if (first) {
7932-
struct raid_kobject *rkobj = kzalloc(sizeof(*rkobj), GFP_NOFS);
7906+
struct raid_kobject *rkobj;
7907+
unsigned int nofs_flag;
7908+
int ret;
7909+
7910+
/*
7911+
* Setup a NOFS context because kobject_add(), deep in its call
7912+
* chain, does GFP_KERNEL allocations, and we are often called
7913+
* in a context where if reclaim is triggered we can deadlock
7914+
* (we are either holding a transaction handle or some lock
7915+
* required for a transaction commit).
7916+
*/
7917+
nofs_flag = memalloc_nofs_save();
7918+
rkobj = kzalloc(sizeof(*rkobj), GFP_KERNEL);
79337919
if (!rkobj) {
7920+
memalloc_nofs_restore(nofs_flag);
79347921
btrfs_warn(cache->fs_info,
79357922
"couldn't alloc memory for raid level kobject");
79367923
return;
79377924
}
79387925
rkobj->flags = cache->flags;
79397926
kobject_init(&rkobj->kobj, &btrfs_raid_ktype);
7940-
7941-
spin_lock(&fs_info->pending_raid_kobjs_lock);
7942-
list_add_tail(&rkobj->list, &fs_info->pending_raid_kobjs);
7943-
spin_unlock(&fs_info->pending_raid_kobjs_lock);
7927+
ret = kobject_add(&rkobj->kobj, &space_info->kobj, "%s",
7928+
btrfs_bg_type_to_raid_name(rkobj->flags));
7929+
memalloc_nofs_restore(nofs_flag);
7930+
if (ret) {
7931+
kobject_put(&rkobj->kobj);
7932+
btrfs_warn(fs_info,
7933+
"failed to add kobject for block cache, ignoring");
7934+
return;
7935+
}
79447936
space_info->block_group_kobjs[index] = &rkobj->kobj;
79457937
}
79467938
}
@@ -8206,7 +8198,6 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
82068198
inc_block_group_ro(cache, 1);
82078199
}
82088200

8209-
btrfs_add_raid_kobjects(info);
82108201
btrfs_init_global_block_rsv(info);
82118202
ret = check_chunk_block_group_mappings(info);
82128203
error:

fs/btrfs/volumes.c

Lines changed: 0 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -3087,16 +3087,6 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
30873087
if (ret)
30883088
return ret;
30893089

3090-
/*
3091-
* We add the kobjects here (and after forcing data chunk creation)
3092-
* since relocation is the only place we'll create chunks of a new
3093-
* type at runtime. The only place where we'll remove the last
3094-
* chunk of a type is the call immediately below this one. Even
3095-
* so, we're protected against races with the cleaner thread since
3096-
* we're covered by the delete_unused_bgs_mutex.
3097-
*/
3098-
btrfs_add_raid_kobjects(fs_info);
3099-
31003090
trans = btrfs_start_trans_remove_block_group(root->fs_info,
31013091
chunk_offset);
31023092
if (IS_ERR(trans)) {
@@ -3223,9 +3213,6 @@ static int btrfs_may_alloc_data_chunk(struct btrfs_fs_info *fs_info,
32233213
btrfs_end_transaction(trans);
32243214
if (ret < 0)
32253215
return ret;
3226-
3227-
btrfs_add_raid_kobjects(fs_info);
3228-
32293216
return 1;
32303217
}
32313218
}

0 commit comments

Comments
 (0)