
Commit d042035

xzpeter authored and torvalds committed
mm/thp: Split huge pmds/puds if they're pinned when fork()
Pinned pages shouldn't be write-protected when fork() happens, because a follow-up copy-on-write on these pages could cause the pinned pages to be replaced by random newly allocated pages.

For huge PMDs, we split the huge pmd if pinning is detected, so that future handling is done at the PTE level (with our latest changes, each of the small pages will be copied). We achieve this by letting copy_huge_pmd() return -EAGAIN for pinned pages, so that we fall through in copy_pmd_range() and finally land in the next copy_pte_range() call.

Huge PUDs are even more special: so far they do not support anonymous pages. But they can be handled the same way as huge PMDs, even though splitting a huge PUD means erasing the PUD entries; this guarantees that the follow-up fault-ins will remap the same pages in both parent and child later.

This might not be the most efficient way, but it should be easy and clean enough. It should be fine, since we're tackling a very rare case, just to make sure userspace that pinned some THPs will keep working even without MADV_DONTFORK and after it has fork()ed.

Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
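For context, the fallthrough described in the message lives in copy_pmd_range() in mm/memory.c. Below is a minimal sketch of that control flow, not the verbatim kernel code: the helper name copy_one_pmd_sketch and the trimmed signature are illustrative, and the swap/devmap handling and the pmd walk loop are omitted.

/*
 * Sketch only: simplified from copy_pmd_range() in mm/memory.c.
 * Assumes the usual kernel context (struct mm_struct, pmd_t,
 * copy_huge_pmd(), copy_pte_range() as in this tree).
 */
static int copy_one_pmd_sketch(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                               pmd_t *dst_pmd, pmd_t *src_pmd,
                               struct vm_area_struct *vma,
                               unsigned long addr, unsigned long end)
{
        if (pmd_trans_huge(*src_pmd)) {
                int err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
                                        addr, vma);
                if (err == -ENOMEM)
                        return -ENOMEM;
                if (!err)
                        return 0;       /* huge pmd was copied as a whole */
                /*
                 * err == -EAGAIN: copy_huge_pmd() found a potentially pinned
                 * page and split the huge pmd, so fall through and copy the
                 * range at the PTE level instead.
                 */
        }
        return copy_pte_range(dst_mm, src_mm, dst_pmd, src_pmd, vma, addr, end);
}

The key point is that -EAGAIN is not treated as an error here: it simply means the huge mapping no longer exists and the range must be copied page by page.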
1 parent 70e806e commit d042035


mm/huge_memory.c

Lines changed: 28 additions & 0 deletions
@@ -1074,6 +1074,24 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
         src_page = pmd_page(pmd);
         VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+
+        /*
+         * If this page is a potentially pinned page, split and retry the fault
+         * with smaller page size. Normally this should not happen because the
+         * userspace should use MADV_DONTFORK upon pinned regions. This is a
+         * best effort that the pinned pages won't be replaced by another
+         * random page during the coming copy-on-write.
+         */
+        if (unlikely(is_cow_mapping(vma->vm_flags) &&
+                     atomic_read(&src_mm->has_pinned) &&
+                     page_maybe_dma_pinned(src_page))) {
+                pte_free(dst_mm, pgtable);
+                spin_unlock(src_ptl);
+                spin_unlock(dst_ptl);
+                __split_huge_pmd(vma, src_pmd, addr, false, NULL);
+                return -EAGAIN;
+        }
+
         get_page(src_page);
         page_dup_rmap(src_page, true);
         add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
@@ -1177,6 +1195,16 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                 /* No huge zero pud yet */
         }
 
+        /* Please refer to comments in copy_huge_pmd() */
+        if (unlikely(is_cow_mapping(vma->vm_flags) &&
+                     atomic_read(&src_mm->has_pinned) &&
+                     page_maybe_dma_pinned(pud_page(pud)))) {
+                spin_unlock(src_ptl);
+                spin_unlock(dst_ptl);
+                __split_huge_pud(vma, src_pud, addr);
+                return -EAGAIN;
+        }
+
         pudp_set_wrprotect(src_mm, addr, src_pud);
         pud = pud_mkold(pud_wrprotect(pud));
         set_pud_at(dst_mm, addr, dst_pud, pud);
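As background for the MADV_DONTFORK remark in the commit message, here is a minimal, hypothetical userspace sketch of the conventional workaround: mark a to-be-pinned buffer MADV_DONTFORK so the region is simply not duplicated into the child at fork(). The 2MB size/alignment and the DMA-registration step are illustrative assumptions, not part of this commit.

#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 2UL << 20;                 /* one 2MB (THP-sized) buffer */
        void *buf = aligned_alloc(2UL << 20, len);

        if (!buf)
                return 1;
        memset(buf, 0, len);                    /* fault the pages in */

        /*
         * Ask the kernel not to copy this range into children at fork().
         * Without this, a long-term pinned buffer relies on the kernel-side
         * split-on-fork handling added by this commit.
         */
        if (madvise(buf, len, MADV_DONTFORK))
                return 1;

        /* ... pin buf for DMA/RDMA here, then fork() worker processes ... */
        return 0;
}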
