Btrfs: fix unexpected EEXIST from btrfs_get_extent
Orabug: 27446653
This fixes a corner case that is caused by a race of dio write vs dio
read/write.
Here is how the race could happen.
Suppose that no extent map has been loaded into memory yet.
There is a file extent [0, 32K) and two jobs are running concurrently
against it: t1 is doing a dio write to [8K, 32K) and t2 is doing a dio
read from [0, 4K) or [4K, 8K).
t1 goes ahead of t2 and splits em [0, 32K) into em [0, 8K) and [8K, 32K).
------------------------------------------------------
     t1                                t2
 btrfs_get_blocks_direct()         btrfs_get_blocks_direct()
  -> btrfs_get_extent()             -> btrfs_get_extent()
      -> lookup_extent_mapping()
      -> add_extent_mapping()           -> lookup_extent_mapping()
         # load [0, 32K)
  -> btrfs_new_extent_direct()
      -> btrfs_drop_extent_cache()
         # split [0, 32K)
      -> add_extent_mapping()
         # add [8K, 32K)
                                        -> add_extent_mapping()
                                           # handle -EEXIST when adding
                                           # [0, 32K)
------------------------------------------------------
More details about how t2 (dio read/write) runs into -EEXIST:
When add_extent_mapping() gets -EEXIST for adding em [0, 32K),
search_extent_mapping() returns [0, 8K) as the existing em. Although
start == existing->start, em is [0, 32K) and
extent_map_end(em) > extent_map_end(existing), i.e. 32K > 8K, so
it goes through merge_extent_mapping(), which tries to add [8K, 8K)
(an em of length 0). btrfs_get_extent() then ends up returning -EEXIST,
and the dio read/write gets -EEXIST, which confuses applications.
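The zero-length result can be reproduced with a small standalone model. This is only a sketch of the trimming arithmetic described above, not the kernel's merge_extent_mapping(): struct range, range_end() and trim_between() are hypothetical names, and the assumption is that the candidate em is clamped to the gap between the end of the previous em and the start of the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent map: the range [start, start + len). */
struct range {
	uint64_t start;
	uint64_t len;
};

static uint64_t range_end(const struct range *r)
{
	return r->start + r->len;
}

/*
 * Simplified model (an assumption, not kernel code): clamp the candidate
 * em so that it begins where prev ends and stops where next begins.
 * prev and next may be NULL when there is no such neighbour.
 */
static struct range trim_between(const struct range *em,
				 const struct range *prev,
				 const struct range *next)
{
	struct range out;
	uint64_t start, end;

	assert(em != NULL);

	start = prev ? range_end(prev) : em->start;
	end = next ? next->start : range_end(em);

	if (start < em->start)
		start = em->start;
	if (end > range_end(em))
		end = range_end(em);

	out.start = start;
	out.len = end > start ? end - start : 0;
	return out;
}
```

With em = [0, 32K), prev = [0, 8K) and next = [8K, 32K), the clamped range starts and ends at 8K, so its length is 0 and there is nothing valid left to insert.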
All possible situations are listed below:
1) start < existing->start
            +-----------+em+-----------+
+--prev---+ |     +-------------+      |
|         | |     |             |      |
+---------+ +     +---+existing++      ++
               +
               |
               +
             start
2) start == existing->start
+------------em------------+
|     +-------------+      |
|     |             |      |
+     +----existing-+      +
      |
      |
      +
    start
3) start > existing->start && start < (existing->start + existing->len)
+------------em------------+
|     +-------------+      |
|     |             |      |
+     +----existing-+      +
         |
         |
         +
       start
4) start >= (existing->start + existing->len)
+-----------+em+-----------+
|     +-------------+      | +--next---+
|     |             |      | |         |
+     +---+existing++      + +---------+
                       +
                       |
                       +
                     start
After going through the above case by case, it turns out that if start is
within the existing em (front inclusive), the existing em should be
returned; otherwise, we try our best to merge the candidate em with
sibling ems to form a larger em.
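The rule above can be written down as a small decision function. This is a minimal sketch under stated assumptions rather than the actual patch: struct em_range, start_within() and resolve_eexist() are hypothetical names, and "front inclusive" is taken to mean the half-open check existing->start <= start < existing->start + existing->len.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent map: [start, start + len). */
struct em_range {
	uint64_t start;
	uint64_t len;
};

/* What to do after add_extent_mapping() reports -EEXIST (illustrative). */
enum eexist_action {
	USE_EXISTING,        /* cases 2 and 3: start falls inside existing */
	MERGE_WITH_SIBLINGS  /* cases 1 and 4: start falls outside existing */
};

/* True when start lies in [existing->start, existing->start + len). */
static bool start_within(const struct em_range *existing, uint64_t start)
{
	assert(existing != NULL);
	return start >= existing->start &&
	       start < existing->start + existing->len;
}

static enum eexist_action resolve_eexist(const struct em_range *existing,
					 uint64_t start)
{
	return start_within(existing, start) ? USE_EXISTING
					     : MERGE_WITH_SIBLINGS;
}
```

For the race described earlier, existing is [0, 8K) and start is 0 or 4K, so the sketch selects USE_EXISTING and the zero-length merge is never attempted.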
Reviewed-by: Anand Jain <[email protected]>
Reported-by: David Vallender <[email protected]>
Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: Somasundaram Krishnasamy <[email protected]>