Skip to content

Commit b01acfe

Browse files
Liu BoSomasundaram Krishnasamy
authored andcommitted
Btrfs: fix unexpected EEXIST from btrfs_get_extent
Orabug: 27446653 This fixes a corner case that is caused by a race of dio write vs dio read/write. Here is how the race could happen. Suppose that no extent map has been loaded into memory yet. There is a file extent [0, 32K), two jobs are running concurrently against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio read from [0, 4K) or [4K, 8K). t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K). ------------------------------------------------------ t1 t2 btrfs_get_blocks_direct() btrfs_get_blocks_direct() -> btrfs_get_extent() -> btrfs_get_extent() -> lookup_extent_mapping() -> add_extent_mapping() -> lookup_extent_mapping() # load [0, 32K) -> btrfs_new_extent_direct() -> btrfs_drop_extent_cache() # split [0, 32K) -> add_extent_mapping() # add [8K, 32K) -> add_extent_mapping() # handle -EEXIST when adding # [0, 32K) ------------------------------------------------------ More details about how t2(dio read/write) runs into -EEXIST: When add_extent_mapping() gets -EEXIST for adding em [0, 32k), search_extent_mapping() would return [0, 8k) as existing em, even though start == existing->start, em is [0, 32k) and extent_map_end(em) > extent_map_end(existing), ie. 32k > 8k, then it goes thru merge_extent_mapping() which tries to add a [8k, 8k) (with a length 0), and btrfs_get_extent() ends up returning -EEXIST, and dio read/write will get -EEXIST which is confusing applications. Here I also concluded all possible situations, 1) start < existing->start +-----------+em+-----------+ +--prev---+ | +-------------+ | | | | | | | +---------+ + +---+existing++ ++ + | + start 2) start == existing->start +------------em------------+ | +-------------+ | | | | | + +----existing-+ + | | + start 3) start > existing->start && start < (existing->start + existing->len) +------------em------------+ | +-------------+ | | | | | + +----existing-+ + | | + start 4) start >= (existing->start + existing->len) +-----------+em+-----------+ | +-------------+ | +--next---+ | | | | | | + +---+existing++ + +---------+ + | + start After going thru the above case by case, it turns out that if start is within existing em (front inclusive), then the existing em should be returned, otherwise, we try our best to merge candidate em with sibling ems to form a larger em. Reviewed-by: Anand Jain <[email protected]> Reported-by: David Vallender <[email protected]> Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Somasundaram Krishnasamy <[email protected]>
1 parent eebeb88 commit b01acfe

File tree

1 file changed

+3
-14
lines changed

1 file changed

+3
-14
lines changed

fs/btrfs/inode.c

Lines changed: 3 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -7208,19 +7208,12 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
72087208
* existing will always be non-NULL, since there must be
72097209
* extent causing the -EEXIST.
72107210
*/
7211-
if (existing->start == em->start &&
7212-
extent_map_end(existing) >= extent_map_end(em) &&
7213-
em->block_start == existing->block_start) {
7214-
/*
7215-
* The existing extent map already encompasses the
7216-
* entire extent map we tried to add.
7217-
*/
7211+
if (start >= existing->start &&
7212+
start < extent_map_end(existing)) {
72187213
free_extent_map(em);
72197214
em = existing;
72207215
err = 0;
7221-
7222-
} else if (start >= extent_map_end(existing) ||
7223-
start <= existing->start) {
7216+
} else {
72247217
/*
72257218
* The existing extent map is the one nearest to
72267219
* the [start, start + len) range which overlaps
@@ -7235,10 +7228,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
72357228
em = NULL;
72367229
}
72377230
free_extent_map(existing);
7238-
} else {
7239-
free_extent_map(em);
7240-
em = existing;
7241-
err = 0;
72427231
}
72437232
}
72447233
write_unlock(&em_tree->lock);

0 commit comments

Comments
 (0)