Commit 73bc328

Barry Song authored and akpm00 committed
mm: hold PTL from the first PTE while reclaiming a large folio
Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE modifications that clear the PTE first. While iterating over the PTEs of a large folio, it only starts acquiring the PTL at the first valid (present) PTE. PTE modifications can temporarily set PTEs to pte_none, so the initial PTEs of a large folio might be skipped by try_to_unmap_one().

For example, for an anon folio, if we skip PTE0 we may end up with PTE0 still present while PTE1 ~ PTE(nr_pages - 1) have become swap entries after try_to_unmap_one(). The folio is then still mapped, fails to be reclaimed, and is put back on the LRU in this round.

This also breaks PTE optimizations such as CONT-PTE on this large folio and may lead to an accidental folio_split() afterwards. And since some of the PTEs are now swap entries, accessing those parts incurs the overhead of do_swap_page(). Although the kernel can tolerate all of the above, the situation is awkward and worth improving.

The same race also occurs with small folios, but since they have only one PTE they cannot be partially unmapped.

This patch holds the PTL from PTE0, allowing us to avoid reading PTE values that are in the process of being changed. With stable PTE values we can ensure that this large folio is either completely reclaimed or that all of its PTEs remain untouched in this round.

A corner case: if we hold the PTL from PTE0 while most of the initial PTEs have already been unmapped, we lengthen the time the PTL is held. Thus we only apply this optimization to folios that are still entirely mapped (not on the deferred_split list).

[[email protected]: rewrap comment, per Matthew]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Chris Li <[email protected]>
Cc: Chuanhua Han <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Huang, Ying <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 4b68a77 commit 73bc328

1 file changed: 14 additions, 0 deletions


mm/vmscan.c

Lines changed: 14 additions & 0 deletions
@@ -1257,6 +1257,20 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 
                 if (folio_test_pmd_mappable(folio))
                         flags |= TTU_SPLIT_HUGE_PMD;
+                /*
+                 * Without TTU_SYNC, try_to_unmap will only begin to
+                 * hold PTL from the first present PTE within a large
+                 * folio. Some initial PTEs might be skipped due to
+                 * races with parallel PTE writes in which PTEs can be
+                 * cleared temporarily before being written new present
+                 * values. This will lead to a large folio is still
+                 * mapped while some subpages have been partially
+                 * unmapped after try_to_unmap; TTU_SYNC helps
+                 * try_to_unmap acquire PTL from the first PTE,
+                 * eliminating the influence of temporary PTE values.
+                 */
+                if (folio_test_large(folio) && list_empty(&folio->_deferred_list))
+                        flags |= TTU_SYNC;
 
                 try_to_unmap(folio, flags);
                 if (folio_mapped(folio)) {
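
The race and the effect of TTU_SYNC can be illustrated with a small user-space model. This is a sketch, not kernel code: the array pte[], the mutex ptl, and the helpers pte_writer() and unmap_walker() are all names invented here as stand-ins for the PTEs of one large folio, the PTL, a racing PTE update, and try_to_unmap()'s walk; the sync argument plays the role of TTU_SYNC. Without sync the walker picks its starting PTE from an unlocked scan and can begin past a transiently cleared PTE0, leaving the folio partially unmapped; with sync it takes the lock before reading PTE0, so the outcome is all-or-nothing. Build with cc -O2 -pthread ptl_race_model.c; exact counts depend on scheduling.

/* ptl_race_model.c - user-space model of the race described above.
 * Not kernel code: every name below is a stand-in for illustration only.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define NR_PAGES 4                      /* "PTEs" backing one large folio */

static atomic_int pte[NR_PAGES];        /* 1 = present, 0 = pte_none / unmapped */
static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;

/* Racing PTE update: clears PTE0 and writes a new present value under the
 * lock, leaving the PTE transiently pte_none for unlocked readers. */
static void *pte_writer(void *arg)
{
        struct timespec window = { .tv_nsec = 200000 }; /* 200us window */

        (void)arg;
        pthread_mutex_lock(&ptl);
        if (atomic_load(&pte[0])) {             /* only touch a still-mapped PTE */
                atomic_store(&pte[0], 0);       /* transient pte_none */
                nanosleep(&window, NULL);
                atomic_store(&pte[0], 1);       /* new present value */
        }
        pthread_mutex_unlock(&ptl);
        return NULL;
}

/* Reclaim walk: without sync, pick the first present PTE by an unlocked scan
 * (so a transiently cleared PTE0 may be skipped); with sync, take the lock
 * first and walk from PTE0. */
static void unmap_walker(bool sync)
{
        int start = 0;

        if (!sync)
                while (start < NR_PAGES && !atomic_load(&pte[start]))
                        start++;
        pthread_mutex_lock(&ptl);
        for (int i = start; i < NR_PAGES; i++)
                atomic_store(&pte[i], 0);       /* "unmap" this subpage */
        pthread_mutex_unlock(&ptl);
}

/* Returns true if the folio ended up partially mapped after one round. */
static bool run_once(bool sync)
{
        struct timespec head_start = { .tv_nsec = 50000 };
        pthread_t writer;
        int mapped = 0;

        for (int i = 0; i < NR_PAGES; i++)
                atomic_store(&pte[i], 1);

        pthread_create(&writer, NULL, pte_writer, NULL);
        nanosleep(&head_start, NULL);           /* let the writer reach its window */
        unmap_walker(sync);
        pthread_join(writer, NULL);

        for (int i = 0; i < NR_PAGES; i++)
                mapped += atomic_load(&pte[i]);
        return mapped != 0 && mapped != NR_PAGES;
}

int main(void)
{
        int partial_nosync = 0, partial_sync = 0;

        for (int i = 0; i < 1000; i++) {
                partial_nosync += run_once(false);
                partial_sync += run_once(true);
        }
        printf("partially unmapped: %d/1000 without sync, %d/1000 with sync\n",
               partial_nosync, partial_sync);
        return 0;
}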
