Btrfs: fix race between cleaner kthread and space cache writeout

fdmanana · masoncl · commit 036a9348dcd0 · 2015-11-25T05:22:08.000-08:00

When a block group becomes unused and the cleaner kthread is currently
running, we can end up getting the current transaction aborted with error
-ENOENT when we try to commit the transaction, leading to the following
trace:

  [59779.258768] WARNING: CPU: 3 PID: 5990 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]()
  [59779.272594] BTRFS: Transaction aborted (error -2)
  (...)
  [59779.291137] Call Trace:
  [59779.291621]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
  [59779.292543]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
  [59779.293435]  [<ffffffffa04cb81f>] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
  [59779.295000]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
  [59779.296138]  [<ffffffffa04c2721>] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs]
  [59779.297663]  [<ffffffffa04cb81f>] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
  [59779.299141]  [<ffffffffa0549b0d>] commit_cowonly_roots+0x1de/0x261 [btrfs]
  [59779.300359]  [<ffffffffa04dd5b6>] btrfs_commit_transaction+0x4c4/0x99c [btrfs]
  [59779.301805]  [<ffffffffa04b5df4>] btrfs_sync_fs+0x145/0x1ad [btrfs]
  [59779.302893]  [<ffffffff81196634>] sync_filesystem+0x7f/0x93
  (...)
  [59779.318186] ---[ end trace 577e2daff90da33a ]---

The following diagram illustrates a sequence of steps leading to this
problem:

       CPU 1                                             CPU 2

                           <at transaction N>

                                                        adds bg A to list
                                                        fs_info->unused_bgs

                                                        adds bg B to list
                                                        fs_info->unused_bgs

                           <transaction kthread
                            commits transaction N
                            and wakes up the
                            cleaner kthread>

  cleaner kthread
    delete_unused_bgs()

      sees bg A in list
      fs_info->unused_bgs

      btrfs_start_transaction()

                           <transaction N + 1 starts>

      deletes bg A

                                                        update_block_group(bg C)

                                                          --> adds bg C to list
                                                              fs_info->unused_bgs

      deletes bg B

      sees bg C in the list
      fs_info->unused_bgs

      btrfs_remove_chunk(bg C)
        btrfs_remove_block_group(bg C)

          --> checks if the block group
              is in a dirty list, and
              because it isn't now, it
              does nothing

          --> the block group item
              is deleted from the
              extent tree

                                                          --> adds bg C to list
                                                              transaction->dirty_bgs

                                                         some task calls
                                                         btrfs_commit_transaction(t N + 1)
                                                           commit_cowonly_roots()
                                                             btrfs_write_dirty_block_groups()
                                                               --> sees bg C in cur_trans->dirty_bgs
                                                               --> calls write_one_cache_group()
                                                                   which returns -ENOENT because
                                                                   it did not find the block group
                                                                   item in the extent tree
                                                               --> transaction aborted with -ENOENT
                                                                   because write_one_cache_group()
                                                                   returned that error

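So fix this by adding a block group to the list of dirty block groups
before adding it to the list of unused block groups. That way, by the
time the cleaner kthread can see the block group in fs_info->unused_bgs,
the block group is already in the transaction's list of dirty block
groups, so btrfs_remove_block_group() finds it there instead of doing
nothing.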
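
The effect of the ordering can be replayed in a single-threaded toy
model. This is only a sketch under stated assumptions: struct toy_bg,
toy_cleaner_remove(), toy_commit() and toy_replay() are hypothetical
names and not kernel code, the three booleans merely stand in for the
two lists and the block group item, and the model assumes that removal
takes a block group off the dirty list when it finds it there.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block group; NOT the kernel's structure. */
struct toy_bg {
	bool on_dirty_list;	/* models transaction->dirty_bgs membership */
	bool on_unused_list;	/* models fs_info->unused_bgs membership */
	bool item_in_tree;	/* models the item in the extent tree */
};

/* Models the cleaner: it only acts on groups it can see as unused. */
static void toy_cleaner_remove(struct toy_bg *bg)
{
	assert(bg != NULL);
	if (!bg->on_unused_list)
		return;
	bg->on_unused_list = false;
	/* Assumption: removal drops the group from the dirty list if present. */
	bg->on_dirty_list = false;
	bg->item_in_tree = false;
}

/* Models the commit: writing a dirty group needs its item in the tree. */
static int toy_commit(struct toy_bg *bg)
{
	if (!bg->on_dirty_list)
		return 0;
	bg->on_dirty_list = false;
	return bg->item_in_tree ? 0 : -ENOENT;
}

/* Replays the earliest point at which the cleaner can run, then commits. */
static int toy_replay(bool dirty_first)
{
	struct toy_bg bg = { .item_in_tree = true };

	if (dirty_first) {
		/* New order: dirty list first, then unused list. */
		bg.on_dirty_list = true;
		bg.on_unused_list = true;
		toy_cleaner_remove(&bg);
	} else {
		/* Old order: the cleaner slips in between the two steps. */
		bg.on_unused_list = true;
		toy_cleaner_remove(&bg);
		bg.on_dirty_list = true;
	}
	return toy_commit(&bg);
}
```

In this model toy_replay(false) returns -ENOENT, matching the abort in
the diagram, while toy_replay(true) returns 0 because the group is no
longer on the dirty list when the commit runs.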

This happened on a stress test using fsstress plus concurrent calls to
fallocate 20G and truncate (releasing part of the space allocated with
fallocate).
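
The fallocate and truncate half of that workload could look roughly like
the sketch below. It is not the test that was actually run: the helper
name toy_falloc_truncate(), the scratch file name and the byte counts a
caller passes are all hypothetical, the fsstress part is not reproduced,
and a single call like this is not expected to trigger the race on its
own.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Preallocate alloc_bytes in a scratch file under dir, then truncate the
 * file down to keep_bytes so that part of the preallocated space is
 * released. Returns 0 on success and -1 on any failure.
 */
static int toy_falloc_truncate(const char *dir, off_t alloc_bytes,
			       off_t keep_bytes)
{
	char path[4096];
	int len, fd, ret = -1;

	assert(keep_bytes >= 0 && keep_bytes <= alloc_bytes);

	len = snprintf(path, sizeof(path), "%s/falloc-XXXXXX", dir);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	/* posix_fallocate() returns 0 on success, an error number otherwise. */
	if (posix_fallocate(fd, 0, alloc_bytes) == 0 &&
	    ftruncate(fd, keep_bytes) == 0)
		ret = 0;

	close(fd);
	unlink(path);
	return ret;
}
```

A stress run would call such a helper from several processes at once, on
a btrfs mount and with much larger sizes, while fsstress runs alongside.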

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
@@ -5915,19 +5915,6 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 			set_extent_dirty(info->pinned_extents,
 					 bytenr, bytenr + num_bytes - 1,
 					 GFP_NOFS | __GFP_NOFAIL);
-			/*
-			 * No longer have used bytes in this block group, queue
-			 * it for deletion.
-			 */
-			if (old_val == 0) {
-				spin_lock(&info->unused_bgs_lock);
-				if (list_empty(&cache->bg_list)) {
-					btrfs_get_block_group(cache);
-					list_add_tail(&cache->bg_list,
-						      &info->unused_bgs);
-				}
-				spin_unlock(&info->unused_bgs_lock);
-			}
 		}
 
 		spin_lock(&trans->transaction->dirty_bgs_lock);
@@ -5939,6 +5926,22 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 		}
 		spin_unlock(&trans->transaction->dirty_bgs_lock);
 
+		/*
+		 * No longer have used bytes in this block group, queue it for
+		 * deletion. We do this after adding the block group to the
+		 * dirty list to avoid races between cleaner kthread and space
+		 * cache writeout.
+		 */
+		if (!alloc && old_val == 0) {
+			spin_lock(&info->unused_bgs_lock);
+			if (list_empty(&cache->bg_list)) {
+				btrfs_get_block_group(cache);
+				list_add_tail(&cache->bg_list,
+					      &info->unused_bgs);
+			}
+			spin_unlock(&info->unused_bgs_lock);
+		}
+
 		btrfs_put_block_group(cache);
 		total -= num_bytes;
 		bytenr += num_bytes;