Skip to content

Commit 78748f2

Browse files
fdmananagregkh
authored andcommitted
Btrfs: fix infinite loop during fsync after rename operations
commit b5e4ff9 upstream. Recently fsstress (from fstests) sporadically started to trigger an infinite loop during fsync operations. This turned out to be because support for the rename exchange and whiteout operations was added to fsstress in fstests. These operations, unlike any others in fsstress, cause file names to be reused, whence triggering this issue. However it's not necessary to use rename exchange and rename whiteout operations trigger this issue, simple rename operations and file creations are enough to trigger the issue. The issue boils down to when we are logging inodes that conflict (that had the name of any inode we need to log during the fsync operation), we keep logging them even if they were already logged before, and after that we check if there's any other inode that conflicts with them and then add it again to the list of inodes to log. Skipping already logged inodes fixes the issue. Consider the following example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/testdir # inode 257 $ touch /mnt/testdir/zz # inode 258 $ ln /mnt/testdir/zz /mnt/testdir/zz_link $ touch /mnt/testdir/a # inode 259 $ sync # The following 3 renames achieve the same result as a rename exchange # operation (<rename_exchange> /mnt/testdir/zz_link to /mnt/testdir/a). $ mv /mnt/testdir/a /mnt/testdir/a/tmp $ mv /mnt/testdir/zz_link /mnt/testdir/a $ mv /mnt/testdir/a/tmp /mnt/testdir/zz_link # The following rename and file creation give the same result as a # rename whiteout operation (<rename_whiteout> zz to a2). $ mv /mnt/testdir/zz /mnt/testdir/a2 $ touch /mnt/testdir/zz # inode 260 $ xfs_io -c fsync /mnt/testdir/zz --> results in the infinite loop The following steps happen: 1) When logging inode 260, we find that its reference named "zz" was used by inode 258 in the previous transaction (through the commit root), so inode 258 is added to the list of conflicting indoes that need to be logged; 2) After logging inode 258, we find that its reference named "a" was used by inode 259 in the previous transaction, and therefore we add inode 259 to the list of conflicting inodes to be logged; 3) After logging inode 259, we find that its reference named "zz_link" was used by inode 258 in the previous transaction - we add inode 258 to the list of conflicting inodes to log, again - we had already logged it before at step 3. After logging it again, we find again that inode 259 conflicts with him, and we add again 259 to the list, etc - we end up repeating all the previous steps. So fix this by skipping logging of conflicting inodes that were already logged. Fixes: 6b5fc43 ("Btrfs: fix fsync after succession of renames of different files") CC: [email protected] # 5.1+ Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
1 parent 79a29de commit 78748f2

File tree

1 file changed

+44
-0
lines changed

1 file changed

+44
-0
lines changed

fs/btrfs/tree-log.c

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4854,6 +4854,50 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
48544854
}
48554855
continue;
48564856
}
4857+
/*
4858+
* If the inode was already logged skip it - otherwise we can
4859+
* hit an infinite loop. Example:
4860+
*
4861+
* From the commit root (previous transaction) we have the
4862+
* following inodes:
4863+
*
4864+
* inode 257 a directory
4865+
* inode 258 with references "zz" and "zz_link" on inode 257
4866+
* inode 259 with reference "a" on inode 257
4867+
*
4868+
* And in the current (uncommitted) transaction we have:
4869+
*
4870+
* inode 257 a directory, unchanged
4871+
* inode 258 with references "a" and "a2" on inode 257
4872+
* inode 259 with reference "zz_link" on inode 257
4873+
* inode 261 with reference "zz" on inode 257
4874+
*
4875+
* When logging inode 261 the following infinite loop could
4876+
* happen if we don't skip already logged inodes:
4877+
*
4878+
* - we detect inode 258 as a conflicting inode, with inode 261
4879+
* on reference "zz", and log it;
4880+
*
4881+
* - we detect inode 259 as a conflicting inode, with inode 258
4882+
* on reference "a", and log it;
4883+
*
4884+
* - we detect inode 258 as a conflicting inode, with inode 259
4885+
* on reference "zz_link", and log it - again! After this we
4886+
* repeat the above steps forever.
4887+
*/
4888+
spin_lock(&BTRFS_I(inode)->lock);
4889+
/*
4890+
* Check the inode's logged_trans only instead of
4891+
* btrfs_inode_in_log(). This is because the last_log_commit of
4892+
* the inode is not updated when we only log that it exists and
4893+
* and it has the full sync bit set (see btrfs_log_inode()).
4894+
*/
4895+
if (BTRFS_I(inode)->logged_trans == trans->transid) {
4896+
spin_unlock(&BTRFS_I(inode)->lock);
4897+
btrfs_add_delayed_iput(inode);
4898+
continue;
4899+
}
4900+
spin_unlock(&BTRFS_I(inode)->lock);
48574901
/*
48584902
* We are safe logging the other inode without acquiring its
48594903
* lock as long as we log with the LOG_INODE_EXISTS mode. We

0 commit comments

Comments
 (0)