Skip to content

Commit b5e4ff9

Browse files
fdmananakdave
authored andcommitted
Btrfs: fix infinite loop during fsync after rename operations
Recently fsstress (from fstests) sporadically started to trigger an infinite loop during fsync operations. This turned out to be because support for the rename exchange and whiteout operations was added to fsstress in fstests. These operations, unlike any others in fsstress, cause file names to be reused, whence triggering this issue. However it's not necessary to use rename exchange and rename whiteout operations trigger this issue, simple rename operations and file creations are enough to trigger the issue. The issue boils down to when we are logging inodes that conflict (that had the name of any inode we need to log during the fsync operation), we keep logging them even if they were already logged before, and after that we check if there's any other inode that conflicts with them and then add it again to the list of inodes to log. Skipping already logged inodes fixes the issue. Consider the following example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/testdir # inode 257 $ touch /mnt/testdir/zz # inode 258 $ ln /mnt/testdir/zz /mnt/testdir/zz_link $ touch /mnt/testdir/a # inode 259 $ sync # The following 3 renames achieve the same result as a rename exchange # operation (<rename_exchange> /mnt/testdir/zz_link to /mnt/testdir/a). $ mv /mnt/testdir/a /mnt/testdir/a/tmp $ mv /mnt/testdir/zz_link /mnt/testdir/a $ mv /mnt/testdir/a/tmp /mnt/testdir/zz_link # The following rename and file creation give the same result as a # rename whiteout operation (<rename_whiteout> zz to a2). $ mv /mnt/testdir/zz /mnt/testdir/a2 $ touch /mnt/testdir/zz # inode 260 $ xfs_io -c fsync /mnt/testdir/zz --> results in the infinite loop The following steps happen: 1) When logging inode 260, we find that its reference named "zz" was used by inode 258 in the previous transaction (through the commit root), so inode 258 is added to the list of conflicting indoes that need to be logged; 2) After logging inode 258, we find that its reference named "a" was used by inode 259 in the previous transaction, and therefore we add inode 259 to the list of conflicting inodes to be logged; 3) After logging inode 259, we find that its reference named "zz_link" was used by inode 258 in the previous transaction - we add inode 258 to the list of conflicting inodes to log, again - we had already logged it before at step 3. After logging it again, we find again that inode 259 conflicts with him, and we add again 259 to the list, etc - we end up repeating all the previous steps. So fix this by skipping logging of conflicting inodes that were already logged. Fixes: 6b5fc43 ("Btrfs: fix fsync after succession of renames of different files") CC: [email protected] # 5.1+ Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent d62b23c commit b5e4ff9

File tree

1 file changed

+44
-0
lines changed

1 file changed

+44
-0
lines changed

fs/btrfs/tree-log.c

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4822,6 +4822,50 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
48224822
}
48234823
continue;
48244824
}
4825+
/*
4826+
* If the inode was already logged skip it - otherwise we can
4827+
* hit an infinite loop. Example:
4828+
*
4829+
* From the commit root (previous transaction) we have the
4830+
* following inodes:
4831+
*
4832+
* inode 257 a directory
4833+
* inode 258 with references "zz" and "zz_link" on inode 257
4834+
* inode 259 with reference "a" on inode 257
4835+
*
4836+
* And in the current (uncommitted) transaction we have:
4837+
*
4838+
* inode 257 a directory, unchanged
4839+
* inode 258 with references "a" and "a2" on inode 257
4840+
* inode 259 with reference "zz_link" on inode 257
4841+
* inode 261 with reference "zz" on inode 257
4842+
*
4843+
* When logging inode 261 the following infinite loop could
4844+
* happen if we don't skip already logged inodes:
4845+
*
4846+
* - we detect inode 258 as a conflicting inode, with inode 261
4847+
* on reference "zz", and log it;
4848+
*
4849+
* - we detect inode 259 as a conflicting inode, with inode 258
4850+
* on reference "a", and log it;
4851+
*
4852+
* - we detect inode 258 as a conflicting inode, with inode 259
4853+
* on reference "zz_link", and log it - again! After this we
4854+
* repeat the above steps forever.
4855+
*/
4856+
spin_lock(&BTRFS_I(inode)->lock);
4857+
/*
4858+
* Check the inode's logged_trans only instead of
4859+
* btrfs_inode_in_log(). This is because the last_log_commit of
4860+
* the inode is not updated when we only log that it exists and
4861+
* and it has the full sync bit set (see btrfs_log_inode()).
4862+
*/
4863+
if (BTRFS_I(inode)->logged_trans == trans->transid) {
4864+
spin_unlock(&BTRFS_I(inode)->lock);
4865+
btrfs_add_delayed_iput(inode);
4866+
continue;
4867+
}
4868+
spin_unlock(&BTRFS_I(inode)->lock);
48254869
/*
48264870
* We are safe logging the other inode without acquiring its
48274871
* lock as long as we log with the LOG_INODE_EXISTS mode. We

0 commit comments

Comments
 (0)