
Commit b3ebb9b

fdmanana authored and kdave committed
btrfs: stop extent map shrinker if reschedule is needed
The extent map shrinker can be called in a variety of contexts where we are under memory pressure, and one of them is when a task is trying to allocate memory. For this reason the shrinker is typically called with a value of struct shrink_control::nr_to_scan that is much smaller than what we return in the nr_cached_objects callback of struct super_operations (fs/btrfs/super.c:btrfs_nr_cached_objects()), so that the shrinker does not take a long time and cause high latencies. However we can still take a lot of time in the shrinker even for a limited amount of nr_to_scan:

1) When traversing the red black tree that tracks open inodes in a root, as for example with millions of open inodes we get a deep tree which takes time searching for an inode;

2) Iterating over the extent map tree of an inode, which is a red black tree, when doing the rb_next() calls and when removing an extent map from the tree, since often that requires rebalancing the red black tree;

3) When trying to write lock an inode's extent map tree we may wait for a significant amount of time, because there's either another task about to do IO and searching for an extent map in the tree or inserting an extent map in the tree, and we can have thousands or even millions of extent maps for an inode. Furthermore, there can be concurrent calls to the shrinker so the lock might be busy simply because there is already another task shrinking extent maps for the same inode;

4) We often reschedule if we need to, which further increases latency.

So improve on this by stopping the extent map shrinking code whenever we need to reschedule and by making it skip an inode if we can't immediately lock its extent map tree.
Reported-by: Mikhail Gavrilov <[email protected]>
Reported-by: Andrea Gelmini <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CABXGCsMmmb36ym8hVNGTiU8yfUS_cGvoUmGCcBrGWq9OxTrs+A@mail.gmail.com/
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
1 parent 68a3ebd commit b3ebb9b

File tree

1 file changed: +31 −8 lines changed


fs/btrfs/extent_map.c

Lines changed: 31 additions & 8 deletions
@@ -1057,7 +1057,18 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t
 	if (!down_read_trylock(&inode->i_mmap_lock))
 		return 0;
 
-	write_lock(&tree->lock);
+	/*
+	 * We want to be fast because we can be called from any path trying to
+	 * allocate memory, so if the lock is busy we don't want to spend time
+	 * waiting for it - either some task is about to do IO for the inode or
+	 * we may have another task shrinking extent maps, here in this code, so
+	 * skip this inode.
+	 */
+	if (!write_trylock(&tree->lock)) {
+		up_read(&inode->i_mmap_lock);
+		return 0;
+	}
+
 	node = rb_first_cached(&tree->map);
 	while (node) {
 		struct extent_map *em;
@@ -1089,12 +1100,14 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t
 			break;
 
 		/*
-		 * Restart if we had to reschedule, and any extent maps that were
-		 * pinned before may have become unpinned after we released the
-		 * lock and took it again.
+		 * Stop if we need to reschedule or there's contention on the
+		 * lock. This is to avoid slowing other tasks trying to take the
+		 * lock and because the shrinker might be called during a memory
+		 * allocation path and we want to avoid taking a very long time
+		 * and slowing down all sorts of tasks.
 		 */
-		if (cond_resched_rwlock_write(&tree->lock))
-			node = rb_first_cached(&tree->map);
+		if (need_resched() || rwlock_needbreak(&tree->lock))
+			break;
 	}
 	write_unlock(&tree->lock);
 	up_read(&inode->i_mmap_lock);
@@ -1120,7 +1133,13 @@ static long btrfs_scan_root(struct btrfs_root *root, long *scanned, long nr_to_s
 		if (*scanned >= nr_to_scan)
 			break;
 
-		cond_resched();
+		/*
+		 * We may be called from memory allocation paths, so we don't
+		 * want to take too much time and slowdown tasks.
+		 */
+		if (need_resched())
+			break;
+
 		inode = btrfs_find_first_inode(root, min_ino);
 	}
 
@@ -1159,7 +1178,11 @@ long btrfs_free_extent_maps(struct btrfs_fs_info *fs_info, long nr_to_scan)
 		trace_btrfs_extent_map_shrinker_scan_enter(fs_info, nr_to_scan, nr);
 	}
 
-	while (scanned < nr_to_scan) {
+	/*
+	 * We may be called from memory allocation paths, so we don't want to
+	 * take too much time and slowdown tasks, so stop if we need reschedule.
+	 */
+	while (scanned < nr_to_scan && !need_resched()) {
 		struct btrfs_root *root;
 		unsigned long count;
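The other half of the patch is the cooperative bail-out loop: scan at most nr_to_scan objects, but stop early the moment a yield condition becomes true (need_resched() in the kernel). The following is a hedged userspace sketch of that loop shape only; `scan_until_yield` and its yield callbacks are invented names, and the callback stands in for need_resched(), which has no direct userspace equivalent.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch (NOT btrfs code) of the patch's loop shape:
 * "while (scanned < nr_to_scan && !need_resched())".
 * A callback replaces need_resched() for this userspace demo.
 */
static long scan_until_yield(long nr_to_scan, bool (*should_yield)(void))
{
	long scanned = 0;

	while (scanned < nr_to_scan && !should_yield())
		scanned++;	/* stand-in for shrinking one object */
	return scanned;
}

/* Yield conditions for demonstration. */
static bool never_yield(void)
{
	return false;
}

static long calls;
static bool yield_after_three(void)
{
	return ++calls > 3;
}
```

With no pressure the loop consumes the whole budget; once the yield condition fires, it returns immediately with a partial count, which is exactly the latency/completeness trade-off the commit message describes.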
