
Commit 17396e3

thejh authored and gregkh committed
mm/mremap: fix move_normal_pmd/retract_page_tables race
commit 6fa1066 upstream.

In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of page
table entries should be moved.

At that point, the mmap_lock is held in write mode, but no rmap locks are held
yet. For PMD entries that point to page tables and are fully covered by the
source address range, move_pgt_entry(NORMAL_PMD, ...) is called, which first
takes rmap locks, then does move_normal_pmd(). move_normal_pmd() takes the
necessary page table locks at source and destination, then moves an entire
page table from the source to the destination.

The problem is: The rmap locks, which protect against concurrent page table
removal by retract_page_tables() in the THP code, are only taken after the PMD
entry has been read and it has been decided how to move it. So we can race as
follows (with two processes that have mappings of the same tmpfs file that is
stored on a tmpfs mount with huge=advise); note that process A accesses page
tables through the MM while process B does it through the file rmap:

process A                      process B
=========                      =========
mremap
  mremap_to
    move_vma
      move_page_tables
        get_old_pmd
        alloc_new_pmd
                      *** PREEMPT ***
                               madvise(MADV_COLLAPSE)
                                 do_madvise
                                   madvise_walk_vmas
                                     madvise_vma_behavior
                                       madvise_collapse
                                         hpage_collapse_scan_file
                                           collapse_file
                                             retract_page_tables
                                               i_mmap_lock_read(mapping)
                                               pmdp_collapse_flush
                                               i_mmap_unlock_read(mapping)
        move_pgt_entry(NORMAL_PMD, ...)
          take_rmap_locks
          move_normal_pmd
          drop_rmap_locks

When this happens, move_normal_pmd() can end up creating bogus PMD entries in
the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`. The effect depends on
arch-specific and machine-specific details; on x86, you can end up with
physical page 0 mapped as a page table, which is likely exploitable for
user->kernel privilege escalation.

Fix the race by letting process A (the mremap side) recheck that the PMD still
points to a page table after the rmap locks have been taken. Otherwise, we
bail and let the caller fall back to the PTE-level copying path, which will
then bail immediately at the pmd_none() check.

Bug reachability: Reaching this bug requires that you can create shmem/file
THP mappings - anonymous THP uses different code that doesn't zap stuff under
rmap locks. File THP is gated on an experimental config flag
(CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need shmem THP
to hit this bug. As far as I know, getting shmem THP normally requires that
you can mount your own tmpfs with the right mount flags, which would require
creating your own user+mount namespace; though I don't know if some distros
maybe enable shmem THP by default or something like that.

Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.

Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5ead9631f2ea@google.com
Fixes: 1d65b77 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Co-developed-by: David Hildenbrand <[email protected]>
Closes: https://project-zero.issues.chromium.org/371047675
Acked-by: Qi Zheng <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
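
The reachability note above boils down to two ordinary operations on a shared shmem mapping: madvise(MADV_COLLAPSE) on the collapse side and mremap() on the move side. The sketch below is not part of the commit; it is a minimal userspace illustration, run sequentially in a single process, of the syscalls the two racing processes issue. The mount point /mnt/thp-tmpfs is an assumption (a tmpfs mounted with the huge=advise option), and MADV_COLLAPSE requires Linux 6.1 or newer.

/*
 * Illustration only (not from this commit): the two operations from the
 * race diagram, run back to back in one process.  Assumes a tmpfs mounted
 * with "huge=advise" at the hypothetical path /mnt/thp-tmpfs and a kernel
 * that knows MADV_COLLAPSE (Linux 6.1+).
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from <asm-generic/mman-common.h> */
#endif

#define LEN (4UL << 20)		/* 4 MiB: two PMD-sized (2 MiB) chunks on x86-64 */

int main(void)
{
	int fd = open("/mnt/thp-tmpfs/testfile", O_RDWR | O_CREAT, 0600);
	if (fd < 0 || ftruncate(fd, LEN)) {
		perror("open/ftruncate");
		return 1;
	}

	char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0xaa, LEN);	/* fault in the pages, building page tables */

	/* Process B's side of the diagram: collapse the shmem pages to THP. */
	if (madvise(p, LEN, MADV_COLLAPSE))
		perror("madvise(MADV_COLLAPSE)");	/* non-fatal: THP may be unavailable */

	/* Process A's side of the diagram: move the mapping with mremap(). */
	void *q = mremap(p, LEN, LEN, MREMAP_MAYMOVE);
	if (q == MAP_FAILED) {
		perror("mremap");
		return 1;
	}

	munmap(q, LEN);
	close(fd);
	return 0;
}

In the actual race the two calls run concurrently in separate processes mapping the same file; the recheck added by the diff below is what lets the mremap side notice that the collapse side has already detached the page table, so it bails out to the PTE-level path instead of populating a bogus PMD.
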
1 parent 6b91fd6 commit 17396e3

File tree

1 file changed: +9 -2 lines changed


mm/mremap.c

Lines changed: 9 additions & 2 deletions
@@ -238,6 +238,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 {
 	spinlock_t *old_ptl, *new_ptl;
 	struct mm_struct *mm = vma->vm_mm;
+	bool res = false;
 	pmd_t pmd;
 
 	if (!arch_supports_page_table_move())
@@ -277,19 +278,25 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	if (new_ptl != old_ptl)
 		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
 
-	/* Clear the pmd */
 	pmd = *old_pmd;
+
+	/* Racing with collapse? */
+	if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
+		goto out_unlock;
+	/* Clear the pmd */
 	pmd_clear(old_pmd);
+	res = true;
 
 	VM_BUG_ON(!pmd_none(*new_pmd));
 
 	pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
 	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+out_unlock:
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
 
-	return true;
+	return res;
 }
 #else
 static inline bool move_normal_pmd(struct vm_area_struct *vma,
