
Commit ae1e766

fdmanana authored and kdave committed
btrfs: only run the extent map shrinker from kswapd tasks
Currently the extent map shrinker can be run by any task when attempting to allocate memory and there's enough memory pressure to trigger it.

To avoid too much latency, we stop iterating over extent maps and removing them once the task needs to reschedule. This logic was introduced in commit b3ebb9b ("btrfs: stop extent map shrinker if reschedule is needed").

While that solved high latency problems for some use cases, it's still not enough: with too many tasks entering the extent map shrinker code, either due to memory allocations or because they are kswapd tasks, we end up with a very high level of contention on some spin locks, namely:

1) The fs_info->fs_roots_radix_lock spin lock, which we need to find roots to iterate over their inodes;

2) The spin lock of the xarray used to track open inodes for a root (struct btrfs_root::inodes) - on 6.10 kernels and below, it used to be a red-black tree and the spin lock was root->inode_lock;

3) The fs_info->delayed_iput_lock spin lock, since the shrinker adds delayed iputs (calls btrfs_add_delayed_iput()).

Instead of allowing the extent map shrinker to be run by any task, make it run only by kswapd tasks. This still solves the problem of running into OOM situations due to unbounded extent map creation, which is simple to trigger by direct IO writes, as described in the changelog of commit 956a17d ("btrfs: add a shrinker for extent maps"), and by a similar case when doing buffered IO on files with a very large number of holes (keeping the file open and creating many holes, whose extent maps are only released when the file is closed).

Reported-by: kzd <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219121
Reported-by: Octavia Togami <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CAHPNGSSt-a4ZZWrtJdVyYnJFscFjP9S7rMcvEMaNSpR556DdLA@mail.gmail.com/
Fixes: 956a17d ("btrfs: add a shrinker for extent maps")
CC: [email protected] # 6.10+
Tested-by: kzd <[email protected]>
Tested-by: Octavia Togami <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
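For reference, the kswapd check this patch relies on, current_is_kswapd(), is a trivial helper from include/linux/swap.h (hence the new include added to fs/btrfs/super.c below). As of current kernels it is just a task-flag test:

/* From include/linux/swap.h: kswapd threads run with PF_KSWAPD set in
 * their task flags, so identifying them is a simple flag test on the
 * current task.
 */
static inline int current_is_kswapd(void)
{
	return current->flags & PF_KSWAPD;
}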
1 parent 31723c9 commit ae1e766

File tree

2 files changed: +16, -16 lines


fs/btrfs/extent_map.c

Lines changed: 6 additions & 16 deletions
@@ -1147,8 +1147,7 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, struct btrfs_em_shrink_c
 		return 0;
 
 	/*
-	 * We want to be fast because we can be called from any path trying to
-	 * allocate memory, so if the lock is busy we don't want to spend time
+	 * We want to be fast so if the lock is busy we don't want to spend time
 	 * waiting for it - either some task is about to do IO for the inode or
 	 * we may have another task shrinking extent maps, here in this code, so
 	 * skip this inode.
@@ -1191,9 +1190,7 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, struct btrfs_em_shrink_c
 		/*
 		 * Stop if we need to reschedule or there's contention on the
 		 * lock. This is to avoid slowing other tasks trying to take the
-		 * lock and because the shrinker might be called during a memory
-		 * allocation path and we want to avoid taking a very long time
-		 * and slowing down all sorts of tasks.
+		 * lock.
 		 */
 		if (need_resched() || rwlock_needbreak(&tree->lock))
 			break;
@@ -1222,12 +1219,7 @@ static long btrfs_scan_root(struct btrfs_root *root, struct btrfs_em_shrink_ctx
 		if (ctx->scanned >= ctx->nr_to_scan)
 			break;
 
-		/*
-		 * We may be called from memory allocation paths, so we don't
-		 * want to take too much time and slowdown tasks.
-		 */
-		if (need_resched())
-			break;
+		cond_resched();
 
 		inode = btrfs_find_first_inode(root, min_ino);
 	}
@@ -1285,14 +1277,12 @@ long btrfs_free_extent_maps(struct btrfs_fs_info *fs_info, long nr_to_scan)
 			   ctx.last_ino);
 	}
 
-	/*
-	 * We may be called from memory allocation paths, so we don't want to
-	 * take too much time and slowdown tasks, so stop if we need reschedule.
-	 */
-	while (ctx.scanned < ctx.nr_to_scan && !need_resched()) {
+	while (ctx.scanned < ctx.nr_to_scan) {
 		struct btrfs_root *root;
 		unsigned long count;
 
+		cond_resched();
+
 		spin_lock(&fs_info->fs_roots_radix_lock);
 		count = radix_tree_gang_lookup(&fs_info->fs_roots_radix,
 					       (void **)&root,
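
The recurring pattern in these hunks: since only kswapd reaches this code after the change, the scan no longer has to abort as soon as a reschedule is pending; it yields with cond_resched() and keeps going until nr_to_scan entries have been scanned. A minimal sketch of the behavioral difference (scan_ctx, scan_one_item and scan_items are hypothetical stand-ins, not btrfs symbols):

/* Hypothetical sketch of the scheduling change; none of these symbols
 * exist in btrfs.
 */
static long scan_items(struct scan_ctx *ctx)
{
	long freed = 0;

	while (ctx->scanned < ctx->nr_to_scan) {
		/* Before: bail out early so an arbitrary allocating task
		 * stuck in reclaim does not incur long latency, leaving
		 * the scan incomplete:
		 *
		 *	if (need_resched())
		 *		break;
		 */

		/* After: only kswapd gets here, so just yield the CPU
		 * when needed and keep scanning to the target.
		 */
		cond_resched();

		freed += scan_one_item(ctx);
	}
	return freed;
}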

fs/btrfs/super.c

Lines changed: 10 additions & 0 deletions
@@ -28,6 +28,7 @@
 #include <linux/btrfs.h>
 #include <linux/security.h>
 #include <linux/fs_parser.h>
+#include <linux/swap.h>
 #include "messages.h"
 #include "delayed-inode.h"
 #include "ctree.h"
@@ -2409,6 +2410,15 @@ static long btrfs_free_cached_objects(struct super_block *sb, struct shrink_cont
 	const long nr_to_scan = min_t(unsigned long, LONG_MAX, sc->nr_to_scan);
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 
+	/*
+	 * We may be called from any task trying to allocate memory and we don't
+	 * want to slow it down with scanning and dropping extent maps. It would
+	 * also cause heavy lock contention if many tasks concurrently enter
+	 * here. Therefore only allow kswapd tasks to scan and drop extent maps.
+	 */
+	if (!current_is_kswapd())
+		return 0;
+
 	return btrfs_free_extent_maps(fs_info, nr_to_scan);
 }
 
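For context on how the hunk above is reached: btrfs_free_cached_objects() matches the ->free_cached_objects hook of struct super_operations (declared in include/linux/fs.h), which the VFS superblock shrinker invokes after sizing the workload through the ->nr_cached_objects hook. A sketch of the wiring; the btrfs_nr_cached_objects name is assumed from the file's naming convention, not taken from this diff:

/* Sketch: the hook names are the real super_operations fields from
 * include/linux/fs.h; btrfs_nr_cached_objects is an assumed name.
 */
static const struct super_operations btrfs_super_ops = {
	/* ... other callbacks ... */
	.nr_cached_objects	= btrfs_nr_cached_objects,
	.free_cached_objects	= btrfs_free_cached_objects,
};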
