
Commit 8430557

xzpeter authored and akpm00 committed
mm/page_table_check: support userfault wr-protect entries
Allow page_table_check hooks to check userfaultfd wr-protect criteria upon pgtable updates. The rule is that no writable flag may co-exist with the userfault wr-protect flag.

This should be better than c2da319, where we used to only sanitize such issues during a pgtable walk, but when hitting such an issue we had no good way to know where that writable bit came from [1], so even though the pgtable walk exposes a kernel bug (which is still helpful for triaging), it is not easy to track and debug. Now we switch to tracking the source. It's much easier too with the recent introduction of page table check.

There are some limitations with using page table check here for the userfaultfd wr-protect purpose:

- It is only enabled with explicit enablement of page table check configs and/or boot parameters, but that should be good enough to track at least syzbot issues, as syzbot should enable PAGE_TABLE_CHECK[_ENFORCED] for x86 [1]. We used to rely on DEBUG_VM, but it's now off for most distros, while distros also normally do not enable PAGE_TABLE_CHECK[_ENFORCED], which is similar.

- It conditionally works with the ptep_modify_prot API. It will be bypassed when e.g. XEN PV is enabled, but it still works for most of the remaining scenarios, which should be the common cases, so it should be good enough.

- The hugetlb check is a bit hairy, as the page table check cannot identify a hugetlb pte versus a normal pte by trapping set_pte_at(), because of the current design where hugetlb maps every level to pte_t... For example, the default set_huge_pte_at() can invoke set_pte_at() directly and lose the hugetlb context, treating it the same as a normal pte_t. So far this is fine because huge_pte_uffd_wp() is always equal to pte_uffd_wp() as long as it is supported (x86 only). It will become a bigger problem when we define _PAGE_UFFD_WP differently at various pgtable levels, because then one huge_pte_uffd_wp() per arch will stop making sense first... but as of now we can leave this for later too.

This patch also removes commit c2da319 altogether, as we have something better now.

[1] https://lore.kernel.org/all/[email protected]/

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Pasha Tatashin <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Nadav Amit <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 3ccae1d commit 8430557
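For context, here is an illustrative userspace sketch (not part of this commit) of the state whose invariant the new hooks enforce: a range registered for userfaultfd write-protect, whose present ptes carry the uffd-wp bit and must never also be writable. Error handling and feature negotiation are omitted; main() and its local names are purely for illustration.

	#include <fcntl.h>
	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 4096;
		long uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
		char *area = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		area[0] = 1;	/* populate the page so a present pte exists */

		struct uffdio_api api = { .api = UFFD_API };
		ioctl(uffd, UFFDIO_API, &api);

		struct uffdio_register reg = {
			.range = { .start = (unsigned long)area, .len = len },
			.mode = UFFDIO_REGISTER_MODE_WP,
		};
		ioctl(uffd, UFFDIO_REGISTER, &reg);

		/* Mark the range wr-protected: its present ptes gain the uffd-wp bit. */
		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)area, .len = len },
			.mode = UFFDIO_WRITEPROTECT_MODE_WP,
		};
		ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

		/*
		 * From this point on, if any kernel path installed a pte for this
		 * range carrying both the uffd-wp bit and a writable bit, the new
		 * page_table_check hooks would WARN_ON_ONCE() at installation time.
		 */
		munmap(area, len);
		close(uffd);
		return 0;
	}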

3 files changed: 39 additions & 18 deletions

Documentation/mm/page_table_check.rst

Lines changed: 8 additions & 1 deletion
@@ -14,14 +14,21 @@ Page table check performs extra verifications at the time when new pages become
 accessible from the userspace by getting their page table entries (PTEs PMDs
 etc.) added into the table.
 
-In case of detected corruption, the kernel is crashed. There is a small
+In case of most detected corruption, the kernel is crashed. There is a small
 performance and memory overhead associated with the page table check. Therefore,
 it is disabled by default, but can be optionally enabled on systems where the
 extra hardening outweighs the performance costs. Also, because page table check
 is synchronous, it can help with debugging double map memory corruption issues,
 by crashing kernel at the time wrong mapping occurs instead of later which is
 often the case with memory corruptions bugs.
 
+It can also be used to do page table entry checks over various flags, dump
+warnings when illegal combinations of entry flags are detected. Currently,
+userfaultfd is the only user of such to sanity check wr-protect bit against
+any writable flags. Illegal flag combinations will not directly cause data
+corruption in this case immediately, but that will cause read-only data to
+be writable, leading to corrupt when the page content is later modified.
+
 Double mapping detection logic
 ==============================
 
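As the documentation notes, the check only runs when enabled explicitly. A hedged reminder of the knobs involved (names per the page table check documentation; exact behavior may vary by kernel version):

	# Build time (Kconfig):
	CONFIG_PAGE_TABLE_CHECK=y
	CONFIG_PAGE_TABLE_CHECK_ENFORCED=y   # always on, ignores the boot parameter

	# Or, when not enforced, on the kernel command line:
	page_table_check=on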

arch/x86/include/asm/pgtable.h

Lines changed: 1 addition & 17 deletions
@@ -388,23 +388,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pte_uffd_wp(pte_t pte)
 {
-	bool wp = pte_flags(pte) & _PAGE_UFFD_WP;
-
-#ifdef CONFIG_DEBUG_VM
-	/*
-	 * Having write bit for wr-protect-marked present ptes is fatal,
-	 * because it means the uffd-wp bit will be ignored and write will
-	 * just go through.
-	 *
-	 * Use any chance of pgtable walking to verify this (e.g., when
-	 * page swapped out or being migrated for all purposes). It means
-	 * something is already wrong. Tell the admin even before the
-	 * process crashes. We also nail it with wrong pgtable setup.
-	 */
-	WARN_ON_ONCE(wp && pte_write(pte));
-#endif
-
-	return wp;
+	return pte_flags(pte) & _PAGE_UFFD_WP;
 }
 
 static inline pte_t pte_mkuffd_wp(pte_t pte)
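The new mm/page_table_check.c hooks below also consume the pmd and swap-pte variants of this helper. For reference, a hedged sketch of what those look like on x86 (paraphrased from the same header, not quoted from this commit):

	#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
	/* Roughly: the uffd-wp bit at pmd level, same _PAGE_UFFD_WP flag. */
	static inline int pmd_uffd_wp(pmd_t pmd)
	{
		return pmd_flags(pmd) & _PAGE_UFFD_WP;
	}

	/* Roughly: the uffd-wp marker preserved in swap/non-present ptes. */
	static inline int pte_swp_uffd_wp(pte_t pte)
	{
		return pte_flags(pte) & _PAGE_SWP_UFFD_WP;
	}
	#endif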

mm/page_table_check.c

Lines changed: 30 additions & 0 deletions
@@ -7,6 +7,8 @@
 #include <linux/kstrtox.h>
 #include <linux/mm.h>
 #include <linux/page_table_check.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt) "page_table_check: " fmt
@@ -182,6 +184,22 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
+/* Whether the swap entry cached writable information */
+static inline bool swap_cached_writable(swp_entry_t entry)
+{
+	return is_writable_device_exclusive_entry(entry) ||
+	       is_writable_device_private_entry(entry) ||
+	       is_writable_migration_entry(entry);
+}
+
+static inline void page_table_check_pte_flags(pte_t pte)
+{
+	if (pte_present(pte) && pte_uffd_wp(pte))
+		WARN_ON_ONCE(pte_write(pte));
+	else if (is_swap_pte(pte) && pte_swp_uffd_wp(pte))
+		WARN_ON_ONCE(swap_cached_writable(pte_to_swp_entry(pte)));
+}
+
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 		unsigned int nr)
 {
@@ -190,18 +208,30 @@ void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 	if (&init_mm == mm)
 		return;
 
+	page_table_check_pte_flags(pte);
+
 	for (i = 0; i < nr; i++)
 		__page_table_check_pte_clear(mm, ptep_get(ptep + i));
 	if (pte_user_accessible_page(pte))
 		page_table_check_set(pte_pfn(pte), nr, pte_write(pte));
 }
 EXPORT_SYMBOL(__page_table_check_ptes_set);
 
+static inline void page_table_check_pmd_flags(pmd_t pmd)
+{
+	if (pmd_present(pmd) && pmd_uffd_wp(pmd))
+		WARN_ON_ONCE(pmd_write(pmd));
+	else if (is_swap_pmd(pmd) && pmd_swp_uffd_wp(pmd))
+		WARN_ON_ONCE(swap_cached_writable(pmd_to_swp_entry(pmd)));
+}
+
 void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd)
 {
 	if (&init_mm == mm)
 		return;
 
+	page_table_check_pmd_flags(pmd);
+
 	__page_table_check_pmd_clear(mm, *pmdp);
 	if (pmd_user_accessible_page(pmd)) {
 		page_table_check_set(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT,
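For orientation, these __page_table_check_*_set() routines are reached through static-key wrappers that architectures call from their pte/pmd installation paths, so the new flag checks run exactly when an entry is installed. A hedged sketch of that wiring (simplified from the page table check header and an arch set_ptes() path; not part of this diff):

	/* Roughly the wrapper shape in include/linux/page_table_check.h. */
	static inline void page_table_check_ptes_set(struct mm_struct *mm,
						     pte_t *ptep, pte_t pte,
						     unsigned int nr)
	{
		/* A static key keeps the hook free when the check is disabled. */
		if (static_branch_likely(&page_table_check_disabled))
			return;

		__page_table_check_ptes_set(mm, ptep, pte, nr);
	}

	/* And, roughly, how an architecture's set_ptes() reaches it. */
	static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
				    pte_t *ptep, pte_t pte, unsigned int nr)
	{
		page_table_check_ptes_set(mm, ptep, pte, nr);
		/* ... install nr contiguous ptes starting at ptep ... */
	}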
