Commit 732ed55

Hugh Dickins authored and torvalds committed
mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE(!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff i.e. mapcount 0.

And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently.

I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount.

Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page().

When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: fec89c1 ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent 3b77e8c commit 732ed55

File tree

4 files changed: +29 additions, −2 deletions


include/linux/rmap.h

Lines changed: 1 addition & 0 deletions
@@ -91,6 +91,7 @@ enum ttu_flags {
 
 	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
 	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
+	TTU_SYNC		= 0x10,	/* avoid racy checks with PVMW_SYNC */
 	TTU_IGNORE_HWPOISON	= 0x20,	/* corrupted page is recoverable */
 	TTU_BATCH_FLUSH		= 0x40,	/* Batch TLB flushes where possible
 					 * and caller guarantees they will

mm/huge_memory.c

Lines changed: 1 addition & 1 deletion
@@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 
 static void unmap_page(struct page *page)
 {
-	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
+	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
 		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
 	bool unmap_success;
 

mm/page_vma_mapped.c

Lines changed: 11 additions & 0 deletions
@@ -212,6 +212,17 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 				pvmw->ptl = NULL;
 			}
 		} else if (!pmd_present(pmde)) {
+			/*
+			 * If PVMW_SYNC, take and drop THP pmd lock so that we
+			 * cannot return prematurely, while zap_huge_pmd() has
+			 * cleared *pmd but not decremented compound_mapcount().
+			 */
+			if ((pvmw->flags & PVMW_SYNC) &&
+			    PageTransCompound(pvmw->page)) {
+				spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+				spin_unlock(ptl);
+			}
 			return false;
 		}
 		if (!map_pte(pvmw))

mm/rmap.c

Lines changed: 16 additions & 1 deletion
@@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	if (flags & TTU_SYNC)
+		pvmw.flags = PVMW_SYNC;
+
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		return true;
@@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
 	else
 		rmap_walk(page, &rwc);
 
-	return !page_mapcount(page) ? true : false;
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	return !page_mapcount(page);
 }
 
 /**
