
Commit e66f318

Hugh Dickins authored and akpm00 committed
mm/thp: fix deferred split queue not partially_mapped
Recent changes are putting more pressure on THP deferred split queues:
under load revealing long-standing races, causing list_del corruptions,
"Bad page state"s and worse (I keep BUGs in both of those, so usually
don't get to see how badly they end up without).  The relevant recent
changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin,
improved swap allocation, and underused THP splitting.

The new unlocked list_del_init() in deferred_split_scan() is buggy.  I
gave bad advice, it looks plausible since that's a local on-stack list,
but the fact is that it can race with a third party freeing or migrating
the preceding folio (properly unqueueing it with refcount 0 while holding
split_queue_lock), thereby corrupting the list linkage.

The obvious answer would be to take split_queue_lock there: but it has a
long history of contention, so I'm reluctant to add to that.  Instead,
make sure that there is always one safe (raised refcount) folio before,
by delaying its folio_put().  (And of course I was wrong to suggest
updating split_queue_len without the lock: leave that until the splice.)

And remove two over-eager partially_mapped checks, restoring those tests
to how they were before: if uncharge_folio() or free_tail_page_prepare()
finds _deferred_list non-empty, it's in trouble whether or not that folio
is partially_mapped (and the flag was already cleared in the latter case).

Link: https://lkml.kernel.org/r/[email protected]
Fixes: dafff3f ("mm: split underused THPs")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Usama Arif <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Acked-by: Zi Yan <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Chris Li <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 59b723c commit e66f318
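
The core of the fix is the swap(folio, prev) in deferred_split_scan() (see
the mm/huge_memory.c hunk below): the walker always keeps one earlier
folio's refcount raised, dropping it only once a newer safe folio takes its
place, so the unlocked list_del_init() never rewrites the links of a
predecessor that a third party could be concurrently unqueueing and
freeing.  What follows is a minimal userspace sketch of that delayed-put
pattern, not the kernel code; the names (node, node_put, keep_on_list) are
illustrative stand-ins for the folio APIs, and there is no real concurrency
here, only the reference ordering the fix relies on.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

struct node {
        struct node *prev, *next;       /* intrusive list links */
        int refcount;
        int id;
};

static void node_put(struct node *n)
{
        if (--n->refcount == 0) {
                printf("freeing node %d\n", n->id);
                free(n);
        }
}

/*
 * Unlinking n writes into n->prev and n->next: this is why the walker must
 * make sure the predecessor cannot be freed or unqueued underneath it.
 */
static void list_del_init(struct node *n)
{
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->prev = n->next = n;
}

static bool keep_on_list(struct node *n)
{
        return n->id % 2;       /* arbitrary stand-in for "keep it queued" */
}

int main(void)
{
        struct node head = { &head, &head, 0, -1 };
        struct node *n, *next, *prev = NULL;

        /* Build a list of five nodes, each holding one reference for the walker. */
        for (int i = 0; i < 5; i++) {
                n = malloc(sizeof(*n));
                n->id = i;
                n->refcount = 1;
                n->prev = head.prev;
                n->next = &head;
                head.prev->next = n;
                head.prev = n;
        }

        /* Walk the list, delaying each put until the next entry is handled. */
        for (n = head.next, next = n->next; n != &head; n = next, next = n->next) {
                if (!keep_on_list(n)) {
                        /* The node's predecessor is either the list head or
                         * the one node whose reference we still hold. */
                        list_del_init(n);
                } else {
                        /* n becomes the new safe barrier; retire the old one. */
                        struct node *tmp = prev;
                        prev = n;
                        n = tmp;
                }
                if (n)
                        node_put(n);    /* delayed put, as in the fix */
        }
        if (prev)
                node_put(prev);         /* final delayed put */
        return 0;
}

In the kernel, the kept folios are later spliced back onto the deferred
split queue under split_queue_lock; the sketch only demonstrates why the
delayed puts keep the unlocked unlink safe.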

3 files changed: 20 additions, 9 deletions

mm/huge_memory.c

Lines changed: 17 additions & 4 deletions
@@ -3718,8 +3718,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
         struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
         unsigned long flags;
         LIST_HEAD(list);
-        struct folio *folio, *next;
-        int split = 0;
+        struct folio *folio, *next, *prev = NULL;
+        int split = 0, removed = 0;
 
 #ifdef CONFIG_MEMCG
         if (sc->memcg)
@@ -3775,15 +3775,28 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
                  */
                 if (!did_split && !folio_test_partially_mapped(folio)) {
                         list_del_init(&folio->_deferred_list);
-                        ds_queue->split_queue_len--;
+                        removed++;
+                } else {
+                        /*
+                         * That unlocked list_del_init() above would be unsafe,
+                         * unless its folio is separated from any earlier folios
+                         * left on the list (which may be concurrently unqueued)
+                         * by one safe folio with refcount still raised.
+                         */
+                        swap(folio, prev);
                 }
-                folio_put(folio);
+                if (folio)
+                        folio_put(folio);
         }
 
         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
         list_splice_tail(&list, &ds_queue->split_queue);
+        ds_queue->split_queue_len -= removed;
         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
+        if (prev)
+                folio_put(prev);
+
         /*
          * Stop shrinker if we didn't split any page, but the queue is empty.
          * This can happen if pages were freed under us.

mm/memcontrol.c

Lines changed: 1 addition & 2 deletions
@@ -4631,8 +4631,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
         VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
         VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
                         !folio_test_hugetlb(folio) &&
-                        !list_empty(&folio->_deferred_list) &&
-                        folio_test_partially_mapped(folio), folio);
+                        !list_empty(&folio->_deferred_list), folio);
 
         /*
          * Nobody should be changing or seriously looking at

mm/page_alloc.c

Lines changed: 2 additions & 3 deletions
@@ -961,9 +961,8 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
                 break;
         case 2:
                 /* the second tail page: deferred_list overlaps ->mapping */
-                if (unlikely(!list_empty(&folio->_deferred_list) &&
-                             folio_test_partially_mapped(folio))) {
-                        bad_page(page, "partially mapped folio on deferred list");
+                if (unlikely(!list_empty(&folio->_deferred_list))) {
+                        bad_page(page, "on deferred list");
                         goto out;
                 }
                 break;
