Commit df7a6d1

leitao authored and akpm00 committed
mm/hugetlb: restore the reservation if needed
Patch series "mm/hugetlb: Restore the reservation", v2.

This is a fix for a case where a backing huge page can be stolen after
madvise(MADV_DONTNEED). A full reproducer is in the selftest. See
https://lore.kernel.org/all/[email protected]/

In order to test this patch, I instrumented the kernel with LOCKDEP and
KASAN and ran the following tests, without any regression:

 * The selftest that reproduces the problem
 * All mm hugetlb selftests
   SUMMARY: PASS=9 SKIP=0 FAIL=0
 * All libhugetlbfs tests
   PASS: 0 86
   FAIL: 0 0

This patch (of 2):

Currently there is a bug whereby a huge page can be stolen, and when the
original owner later tries to fault it in, the fault fails with SIGBUS
because no huge page is left to back it. You can achieve that by:

1) Creating a single page
   echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

2) mmap() the page above with MAP_HUGETLB into (void *ptr1).
   * This will mark the page as reserved.

3) Touching the page, which causes a page fault and allocates the page.
   * This will move the page out of the free list.
   * It will also unreserve the page, since there are no more free pages.

4) madvise(MADV_DONTNEED) the page.
   * This will free the page, but not mark it as reserved.

5) Allocating a secondary page with mmap(MAP_HUGETLB) into (void *ptr2).
   * It should fail, since there is no available page left.
   * But, since the page above is not reserved, this mmap() succeeds.

6) Faulting at ptr1 will cause a SIGBUS.
   * It will try to allocate a huge page, but there is none available.

A full reproducer is in the selftest. See
https://lore.kernel.org/all/[email protected]/

Fix this by restoring the reserved page if necessary. These are the
conditions for restoring the reservation:

 * The system is not using surplus pages. The goal is to reduce the
   surplus usage for this case.
 * The VMA has the HPAGE_RESV_OWNER flag set, and is PRIVATE. This is
   safely checked using __vma_private_lock().
 * The page is anonymous.

Once this scenario is found, set the `hugetlb_restore_reserve` bit in the
folio. Then check whether the resv reservations need to be adjusted; this
is done later, after the spinlock is released, since the
vma_xxxx_reservation() helpers might take the file system lock.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Breno Leitao <[email protected]>
Suggested-by: Rik van Riel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
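For illustration, the six steps above can be condensed into a small userspace
program. This is only a sketch of the scenario described in this message, not
the selftest from the series; it assumes 2 MiB huge pages and that exactly one
page was set aside via nr_hugepages as in step 1.

/*
 * Hypothetical reproducer sketch; the selftest in patch 2 of this series is
 * the authoritative version. Assumes 2 MiB huge pages and nr_hugepages=1.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)  /* assumed huge page size */

int main(void)
{
        /* Step 2: map one huge page; this marks the single page reserved. */
        char *ptr1 = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (ptr1 == MAP_FAILED) {
                perror("mmap ptr1");
                return 1;
        }

        /* Step 3: fault the page in, taking it off the free list and
         * consuming the reservation. */
        memset(ptr1, 1, HPAGE_SIZE);

        /* Step 4: free the page; without the fix it is not re-reserved. */
        if (madvise(ptr1, HPAGE_SIZE, MADV_DONTNEED)) {
                perror("madvise");
                return 1;
        }

        /* Step 5: this should fail (no page is left for a new reservation),
         * but on an unfixed kernel it succeeds and steals the page. */
        char *ptr2 = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (ptr2 == MAP_FAILED)
                printf("second mmap() failed: reservation was restored\n");
        else
                printf("second mmap() succeeded: the huge page was stolen\n");

        /* Step 6: on an unfixed kernel this fault finds no huge page left
         * and the process is killed with SIGBUS. */
        memset(ptr1, 2, HPAGE_SIZE);
        printf("no SIGBUS: ptr1 is still backed by a huge page\n");
        return 0;
}

Compiled with a plain cc and run after step 1, this should print the "failed"
line and exit cleanly on a patched kernel, and die with SIGBUS on an unpatched
one.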
1 parent 4e76c8c commit df7a6d1

File tree

1 file changed (+25, -0 lines)

mm/hugetlb.c

Lines changed: 25 additions & 0 deletions
@@ -5585,6 +5585,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
        struct page *page;
        struct hstate *h = hstate_vma(vma);
        unsigned long sz = huge_page_size(h);
+       bool adjust_reservation = false;
        unsigned long last_addr_mask;
        bool force_flush = false;

@@ -5677,7 +5678,31 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
                hugetlb_count_sub(pages_per_huge_page(h), mm);
                hugetlb_remove_rmap(page_folio(page));

+               /*
+                * Restore the reservation for an anonymous page, otherwise the
+                * backing page could be stolen by someone.
+                * If we are freeing a surplus page, do not set the restore
+                * reservation bit.
+                */
+               if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
+                   folio_test_anon(page_folio(page))) {
+                       folio_set_hugetlb_restore_reserve(page_folio(page));
+                       /* Reservation to be adjusted after the spin lock */
+                       adjust_reservation = true;
+               }
+
                spin_unlock(ptl);
+
+               /*
+                * Adjust the reservation for the region that will have the
+                * reserve restored. Keep in mind that vma_needs_reservation() changes
+                * resv->adds_in_progress if it succeeds. If this is not done,
+                * do_exit() will not see it, and will keep the reservation
+                * forever.
+                */
+               if (adjust_reservation && vma_needs_reservation(h, vma, address))
+                       vma_add_reservation(h, vma, address);
+
                tlb_remove_page_size(tlb, page, huge_page_size(h));
                /*
                 * Bail out after unmapping reference page if supplied
