Commit 196d59a

josefbacik authored and kdave committed
btrfs: switch extent buffer tree lock to rw_semaphore
Historically we've implemented our own locking because we wanted to be able to selectively spin or sleep based on what we were doing in the tree. For instance, if all of our nodes were in cache then there's rarely a reason to need to sleep waiting for node locks, as they'll likely become available soon. At the time this code was written the rw_semaphore didn't do adaptive spinning, and thus was orders of magnitude slower than our home grown locking. However now the opposite is the case. There are a few problems with how we implement blocking locks, namely that we use a normal waitqueue and simply wake everybody up in reverse sleep order. This leads to some suboptimal performance behavior, and a lot of context switches in highly contended cases. The rw_semaphores actually do this properly, and also have adaptive spinning that works relatively well. The locking code is also a bit of a bear to understand, and we lose the benefit of lockdep for the most part because the blocking states of the lock are simply ad-hoc and not mapped into lockdep. So rework the locking code to drop all of this custom locking stuff, and simply use a rw_semaphore for everything. This makes the locking much simpler for everything, as we can now drop a lot of cruft and blocking transitions. The performance numbers vary depending on the workload, because generally speaking there doesn't tend to be a lot of contention on the btree. However, on my test system which is an 80 core single socket system with 256GiB of RAM and a 2TiB NVMe drive I get the following results (with all debug options off): dbench 200 baseline Throughput 216.056 MB/sec 200 clients 200 procs max_latency=1471.197 ms dbench 200 with patch Throughput 737.188 MB/sec 200 clients 200 procs max_latency=714.346 ms Previously we also used fs_mark to test this sort of contention, and those results are far less impressive, mostly because there's not enough tasks to really stress the locking fs_mark -d /d[0-15] -S 0 -L 20 -n 100000 -s 0 -t 16 baseline Average Files/sec: 160166.7 p50 Files/sec: 165832 p90 Files/sec: 123886 p99 Files/sec: 123495 real 3m26.527s user 2m19.223s sys 48m21.856s patched Average Files/sec: 164135.7 p50 Files/sec: 171095 p90 Files/sec: 122889 p99 Files/sec: 113819 real 3m29.660s user 2m19.990s sys 44m12.259s Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent ecdcf3c commit 196d59a

File tree: 5 files changed, +70 −351 lines changed


fs/btrfs/extent_io.c

Lines changed: 1 addition & 12 deletions
@@ -4946,12 +4946,8 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	eb->len = len;
 	eb->fs_info = fs_info;
 	eb->bflags = 0;
-	rwlock_init(&eb->lock);
-	atomic_set(&eb->blocking_readers, 0);
-	eb->blocking_writers = 0;
+	init_rwsem(&eb->lock);
 	eb->lock_recursed = false;
-	init_waitqueue_head(&eb->write_lock_wq);
-	init_waitqueue_head(&eb->read_lock_wq);
 
 	btrfs_leak_debug_add(&fs_info->eb_leak_lock, &eb->leak_list,
 			     &fs_info->allocated_ebs);
@@ -4967,13 +4963,6 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	       > MAX_INLINE_EXTENT_BUFFER_SIZE);
 	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
 
-#ifdef CONFIG_BTRFS_DEBUG
-	eb->spinning_writers = 0;
-	atomic_set(&eb->spinning_readers, 0);
-	atomic_set(&eb->read_locks, 0);
-	eb->write_locks = 0;
-#endif
-
 	return eb;
 }
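As an aside on why a single init_rwsem() can stand in for the five initializations deleted above: the semaphore object carries its own internal wait list, so the separate read/write waitqueues and blocking counters have no equivalent to set up. A minimal sketch, with a hypothetical container struct:

#include <linux/rwsem.h>

struct node {			/* hypothetical container struct */
	struct rw_semaphore lock;
};

static void node_init(struct node *n)
{
	/*
	 * One initializer sets up the semaphore's internal count, owner
	 * word, wait list, and wait-list spinlock, and (via the
	 * init_rwsem() macro) registers a lock class with lockdep.
	 */
	init_rwsem(&n->lock);
}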

fs/btrfs/extent_io.h

Lines changed: 2 additions & 19 deletions
@@ -87,31 +87,14 @@ struct extent_buffer {
 	int read_mirror;
 	struct rcu_head rcu_head;
 	pid_t lock_owner;
-
-	int blocking_writers;
-	atomic_t blocking_readers;
 	bool lock_recursed;
+	struct rw_semaphore lock;
+
 	/* >= 0 if eb belongs to a log tree, -1 otherwise */
 	short log_index;
 
-	/* protects write locks */
-	rwlock_t lock;
-
-	/* readers use lock_wq while they wait for the write
-	 * lock holders to unlock
-	 */
-	wait_queue_head_t write_lock_wq;
-
-	/* writers use read_lock_wq while they wait for readers
-	 * to unlock
-	 */
-	wait_queue_head_t read_lock_wq;
 	struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
 #ifdef CONFIG_BTRFS_DEBUG
-	int spinning_writers;
-	atomic_t spinning_readers;
-	atomic_t read_locks;
-	int write_locks;
 	struct list_head leak_list;
 #endif
 };
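With the lock now an ordinary rw_semaphore embedded in the struct, lockdep can model it, which the commit message calls out as a benefit of the rework. One wrinkle: a tree walk holds a parent node's lock while taking a child's, which looks self-deadlocking to lockdep's single-class view. The kernel's _nested() variants exist for exactly this case. The sketch below is a hypothetical illustration of that mechanism, not the annotation btrfs actually ships (btrfs added its own nesting annotations in later patches):

#include <linux/lockdep.h>
#include <linux/rwsem.h>

/* Hypothetical minimal node; the real extent_buffer has many more fields. */
struct node {
	struct rw_semaphore lock;
};

/*
 * Hold a parent and a child of the same lock class at once. Without
 * the _nested() annotation lockdep would flag this as recursive
 * locking; SINGLE_DEPTH_NESTING tells it the second acquisition is
 * one level deeper by design.
 */
static void read_parent_and_child(struct node *parent, struct node *child)
{
	down_read(&parent->lock);
	down_read_nested(&child->lock, SINGLE_DEPTH_NESTING);

	/* ... examine both nodes ... */

	up_read(&child->lock);
	up_read(&parent->lock);
}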
