
Commit 26273f5

Yang Shi authored and gregkh committed
mm: gup: stop abusing try_grab_folio
commit f442fa6 upstream.

A kernel warning was reported when pinning a folio in CMA memory while
launching an SEV virtual machine. The splat looks like:

[ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
[ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
[ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520
[ 464.325515] Call Trace:
[ 464.325520]  <TASK>
[ 464.325523]  ? __get_user_pages+0x423/0x520
[ 464.325528]  ? __warn+0x81/0x130
[ 464.325536]  ? __get_user_pages+0x423/0x520
[ 464.325541]  ? report_bug+0x171/0x1a0
[ 464.325549]  ? handle_bug+0x3c/0x70
[ 464.325554]  ? exc_invalid_op+0x17/0x70
[ 464.325558]  ? asm_exc_invalid_op+0x1a/0x20
[ 464.325567]  ? __get_user_pages+0x423/0x520
[ 464.325575]  __gup_longterm_locked+0x212/0x7a0
[ 464.325583]  internal_get_user_pages_fast+0xfb/0x190
[ 464.325590]  pin_user_pages_fast+0x47/0x60
[ 464.325598]  sev_pin_memory+0xca/0x170 [kvm_amd]
[ 464.325616]  sev_mem_enc_register_region+0x81/0x130 [kvm_amd]

Per the analysis done by yangge, when starting the SEV virtual machine,
it calls pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory.
But the page is in a CMA area, so fast GUP fails and falls back to the
slow path due to the longterm-pinnable check in try_grab_folio().

The slow path then tries to pin the pages and migrate them out of the
CMA area. But the slow path also uses try_grab_folio() to pin the page,
so it fails the same check, and the above warning is triggered.

In addition, try_grab_folio() is meant for the fast path: it elevates
the folio refcount with an add-ref-unless-zero operation. In the slow
path we are guaranteed at least one stable reference, so a simple
atomic add could be used instead. The performance difference should be
trivial, but the misuse is confusing and misleading.

Rename try_grab_folio() to try_grab_folio_fast(), and try_grab_page()
to try_grab_folio(), and use each in the proper path. This solves both
the abuse and the kernel warning, and the clearer naming should prevent
such abuse in the future.

peterx said:

: The user will see the pin fails, for gup-slow it further triggers the WARN
: right below that failure (as in the original report):
:
:         folio = try_grab_folio(page, page_increm - 1,
:                                foll_flags);
:         if (WARN_ON_ONCE(!folio)) { <------------------------ here
:                 /*
:                  * Release the 1st page ref if the
:                  * folio is problematic, fail hard.
:                  */
:                 gup_put_folio(page_folio(page), 1,
:                               foll_flags);
:                 ret = -EFAULT;
:                 goto out;
:         }

[1] https://lore.kernel.org/linux-mm/[email protected]/

[[email protected]: fix implicit declaration of function try_grab_folio_fast]
Link: https://lkml.kernel.org/r/CAHbLzkowMSso-4Nufc9hcMehQsK9PNz3OSu-+eniU-2Mm-xjhA@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 57edfcf ("mm/gup: accelerate thp gup even for "pages != NULL"")
Signed-off-by: Yang Shi <[email protected]>
Reported-by: yangge <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]> [6.6+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
1 parent 9eae190 commit 26273f5
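
The key distinction the commit message draws (add-ref-unless-zero in the fast path, a plain atomic add in the slow path) can be illustrated with a minimal user-space sketch. Everything below is hypothetical illustration, not kernel code: folio_stub, grab_fast() and grab_slow() stand in for struct folio, try_get_folio() and folio_ref_add() respectively, using C11 atomics.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a folio's reference counter. */
struct folio_stub {
	atomic_int refcount;
};

/*
 * Fast-path style ("add ref unless zero"): the folio may be freed
 * concurrently, so the count may only be raised while it is still
 * nonzero. Modelled on what try_get_folio() does via folio_try_get().
 */
static bool grab_fast(struct folio_stub *folio, int refs)
{
	int old = atomic_load(&folio->refcount);

	do {
		if (old == 0)
			return false;	/* folio is being freed: fail */
	} while (!atomic_compare_exchange_weak(&folio->refcount,
					       &old, old + refs));
	return true;
}

/*
 * Slow-path style: the caller already holds a stable reference, so
 * the count cannot hit zero underneath us and a plain add suffices.
 * Modelled on folio_ref_add() as used by the renamed try_grab_folio().
 */
static void grab_slow(struct folio_stub *folio, int refs)
{
	atomic_fetch_add(&folio->refcount, refs);
}

int main(void)
{
	struct folio_stub folio = { .refcount = 1 };

	grab_slow(&folio, 1);
	printf("fast grab ok: %d, refcount now: %d\n",
	       grab_fast(&folio, 1), atomic_load(&folio.refcount));
	return 0;
}

The compare-and-swap loop is what allows the fast-path helper to fail, which is why its callers must be able to fall back; the slow-path add never fails.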

File tree

4 files changed: +135 -128 lines changed


mm/gup.c

Lines changed: 129 additions & 122 deletions
@@ -97,95 +97,6 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
 	return folio;
 }
 
-/**
- * try_grab_folio() - Attempt to get or pin a folio.
- * @page:  pointer to page to be grabbed
- * @refs:  the value to (effectively) add to the folio's refcount
- * @flags: gup flags: these are the FOLL_* flag values.
- *
- * "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
- *
- * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
- * same time. (That's true throughout the get_user_pages*() and
- * pin_user_pages*() APIs.) Cases:
- *
- *    FOLL_GET: folio's refcount will be incremented by @refs.
- *
- *    FOLL_PIN on large folios: folio's refcount will be incremented by
- *    @refs, and its pincount will be incremented by @refs.
- *
- *    FOLL_PIN on single-page folios: folio's refcount will be incremented by
- *    @refs * GUP_PIN_COUNTING_BIAS.
- *
- * Return: The folio containing @page (with refcount appropriately
- * incremented) for success, or NULL upon failure. If neither FOLL_GET
- * nor FOLL_PIN was set, that's considered failure, and furthermore,
- * a likely bug in the caller, so a warning is also emitted.
- */
-struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
-{
-	struct folio *folio;
-
-	if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
-		return NULL;
-
-	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
-		return NULL;
-
-	if (flags & FOLL_GET)
-		return try_get_folio(page, refs);
-
-	/* FOLL_PIN is set */
-
-	/*
-	 * Don't take a pin on the zero page - it's not going anywhere
-	 * and it is used in a *lot* of places.
-	 */
-	if (is_zero_page(page))
-		return page_folio(page);
-
-	folio = try_get_folio(page, refs);
-	if (!folio)
-		return NULL;
-
-	/*
-	 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
-	 * right zone, so fail and let the caller fall back to the slow
-	 * path.
-	 */
-	if (unlikely((flags & FOLL_LONGTERM) &&
-		     !folio_is_longterm_pinnable(folio))) {
-		if (!put_devmap_managed_page_refs(&folio->page, refs))
-			folio_put_refs(folio, refs);
-		return NULL;
-	}
-
-	/*
-	 * When pinning a large folio, use an exact count to track it.
-	 *
-	 * However, be sure to *also* increment the normal folio
-	 * refcount field at least once, so that the folio really
-	 * is pinned.  That's why the refcount from the earlier
-	 * try_get_folio() is left intact.
-	 */
-	if (folio_test_large(folio))
-		atomic_add(refs, &folio->_pincount);
-	else
-		folio_ref_add(folio,
-				refs * (GUP_PIN_COUNTING_BIAS - 1));
-	/*
-	 * Adjust the pincount before re-checking the PTE for changes.
-	 * This is essentially a smp_mb() and is paired with a memory
-	 * barrier in page_try_share_anon_rmap().
-	 */
-	smp_mb__after_atomic();
-
-	node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
-
-	return folio;
-}
-
 static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
@@ -203,58 +114,59 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 }
 
 /**
- * try_grab_page() - elevate a page's refcount by a flag-dependent amount
- * @page:    pointer to page to be grabbed
- * @flags:   gup flags: these are the FOLL_* flag values.
+ * try_grab_folio() - add a folio's refcount by a flag-dependent amount
+ * @folio:    pointer to folio to be grabbed
+ * @refs:     the value to (effectively) add to the folio's refcount
+ * @flags:    gup flags: these are the FOLL_* flag values
  *
  * This might not do anything at all, depending on the flags argument.
  *
  * "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount.
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
  *
  * Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
- * time. Cases: please see the try_grab_folio() documentation, with
- * "refs=1".
+ * time.
  *
  * Return: 0 for success, or if no action was required (if neither FOLL_PIN
  * nor FOLL_GET was set, nothing is done). A negative error code for failure:
  *
- *   -ENOMEM		FOLL_GET or FOLL_PIN was set, but the page could not
+ *   -ENOMEM		FOLL_GET or FOLL_PIN was set, but the folio could not
  *			be grabbed.
+ *
+ * It is called when we have a stable reference for the folio, typically in
+ * GUP slow path.
  */
-int __must_check try_grab_page(struct page *page, unsigned int flags)
+int __must_check try_grab_folio(struct folio *folio, int refs,
+				unsigned int flags)
 {
-	struct folio *folio = page_folio(page);
-
 	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
 		return -ENOMEM;
 
-	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(&folio->page)))
 		return -EREMOTEIO;
 
 	if (flags & FOLL_GET)
-		folio_ref_inc(folio);
+		folio_ref_add(folio, refs);
 	else if (flags & FOLL_PIN) {
 		/*
 		 * Don't take a pin on the zero page - it's not going anywhere
 		 * and it is used in a *lot* of places.
 		 */
-		if (is_zero_page(page))
+		if (is_zero_folio(folio))
 			return 0;
 
 		/*
-		 * Similar to try_grab_folio(): be sure to *also*
-		 * increment the normal page refcount field at least once,
+		 * Increment the normal page refcount field at least once,
 		 * so that the page really is pinned.
 		 */
 		if (folio_test_large(folio)) {
-			folio_ref_add(folio, 1);
-			atomic_add(1, &folio->_pincount);
+			folio_ref_add(folio, refs);
+			atomic_add(refs, &folio->_pincount);
 		} else {
-			folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+			folio_ref_add(folio, refs * GUP_PIN_COUNTING_BIAS);
 		}
 
-		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
 	}
 
 	return 0;
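
As a worked example of the pin accounting in the hunk above: a large folio records pins exactly in _pincount (plus @refs plain references), while a single-page folio folds pins into its refcount scaled by GUP_PIN_COUNTING_BIAS, which is 1024 in this kernel. A minimal sketch of the single-page arithmetic, assuming only that bias value from include/linux/mm.h:

#include <stdio.h>

#define GUP_PIN_COUNTING_BIAS	1024	/* as in include/linux/mm.h */

int main(void)
{
	int refcount = 1;	/* one stable reference already held */
	int refs = 2;		/* two pins requested */

	/* FOLL_PIN on a single-page folio: refcount grows by refs * bias. */
	refcount += refs * GUP_PIN_COUNTING_BIAS;
	printf("after pin:   %d\n", refcount);	/* prints 2049 */

	/* gup_put_folio() subtracts the same amount on unpin. */
	refcount -= refs * GUP_PIN_COUNTING_BIAS;
	printf("after unpin: %d\n", refcount);	/* prints 1 */
	return 0;
}

The large bias is what lets page_maybe_dma_pinned()-style checks distinguish pins from ordinary references without a second counter on small folios.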
@@ -647,8 +559,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
 		       !PageAnonExclusive(page), page);
 
-	/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
-	ret = try_grab_page(page, flags);
+	/* try_grab_folio() does nothing unless FOLL_GET or FOLL_PIN is set. */
+	ret = try_grab_folio(page_folio(page), 1, flags);
 	if (unlikely(ret)) {
 		page = ERR_PTR(ret);
 		goto out;
@@ -899,7 +811,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 			goto unmap;
 		*page = pte_page(entry);
 	}
-	ret = try_grab_page(*page, gup_flags);
+	ret = try_grab_folio(page_folio(*page), 1, gup_flags);
 	if (unlikely(ret))
 		goto unmap;
 out:
@@ -1302,20 +1214,19 @@ static long __get_user_pages(struct mm_struct *mm,
 			 * pages.
 			 */
 			if (page_increm > 1) {
-				struct folio *folio;
+				struct folio *folio = page_folio(page);
 
 				/*
 				 * Since we already hold refcount on the
 				 * large folio, this should never fail.
 				 */
-				folio = try_grab_folio(page, page_increm - 1,
-						       foll_flags);
-				if (WARN_ON_ONCE(!folio)) {
+				if (try_grab_folio(folio, page_increm - 1,
+						   foll_flags)) {
 					/*
 					 * Release the 1st page ref if the
 					 * folio is problematic, fail hard.
 					 */
-					gup_put_folio(page_folio(page), 1,
+					gup_put_folio(folio, 1,
 						      foll_flags);
 					ret = -EFAULT;
 					goto out;
@@ -2541,6 +2452,102 @@ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
 	}
 }
 
+/**
+ * try_grab_folio_fast() - Attempt to get or pin a folio in fast path.
+ * @page:  pointer to page to be grabbed
+ * @refs:  the value to (effectively) add to the folio's refcount
+ * @flags: gup flags: these are the FOLL_* flag values.
+ *
+ * "grab" names in this file mean, "look at flags to decide whether to use
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
+ *
+ * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
+ * same time. (That's true throughout the get_user_pages*() and
+ * pin_user_pages*() APIs.) Cases:
+ *
+ *    FOLL_GET: folio's refcount will be incremented by @refs.
+ *
+ *    FOLL_PIN on large folios: folio's refcount will be incremented by
+ *    @refs, and its pincount will be incremented by @refs.
+ *
+ *    FOLL_PIN on single-page folios: folio's refcount will be incremented by
+ *    @refs * GUP_PIN_COUNTING_BIAS.
+ *
+ * Return: The folio containing @page (with refcount appropriately
+ * incremented) for success, or NULL upon failure. If neither FOLL_GET
+ * nor FOLL_PIN was set, that's considered failure, and furthermore,
+ * a likely bug in the caller, so a warning is also emitted.
+ *
+ * It uses add ref unless zero to elevate the folio refcount and must be
+ * called in fast path only.
+ */
+static struct folio *try_grab_folio_fast(struct page *page, int refs,
+					 unsigned int flags)
+{
+	struct folio *folio;
+
+	/* Raise warn if it is not called in fast GUP */
+	VM_WARN_ON_ONCE(!irqs_disabled());
+
+	if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
+		return NULL;
+
+	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+		return NULL;
+
+	if (flags & FOLL_GET)
+		return try_get_folio(page, refs);
+
+	/* FOLL_PIN is set */
+
+	/*
+	 * Don't take a pin on the zero page - it's not going anywhere
+	 * and it is used in a *lot* of places.
+	 */
+	if (is_zero_page(page))
+		return page_folio(page);
+
+	folio = try_get_folio(page, refs);
+	if (!folio)
+		return NULL;
+
+	/*
+	 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
+	 * right zone, so fail and let the caller fall back to the slow
+	 * path.
+	 */
+	if (unlikely((flags & FOLL_LONGTERM) &&
+		     !folio_is_longterm_pinnable(folio))) {
+		if (!put_devmap_managed_page_refs(&folio->page, refs))
+			folio_put_refs(folio, refs);
+		return NULL;
+	}
+
+	/*
+	 * When pinning a large folio, use an exact count to track it.
+	 *
+	 * However, be sure to *also* increment the normal folio
+	 * refcount field at least once, so that the folio really
+	 * is pinned.  That's why the refcount from the earlier
+	 * try_get_folio() is left intact.
+	 */
+	if (folio_test_large(folio))
+		atomic_add(refs, &folio->_pincount);
+	else
+		folio_ref_add(folio,
+				refs * (GUP_PIN_COUNTING_BIAS - 1));
+	/*
+	 * Adjust the pincount before re-checking the PTE for changes.
+	 * This is essentially a smp_mb() and is paired with a memory
+	 * barrier in folio_try_share_anon_rmap_*().
+	 */
+	smp_mb__after_atomic();
+
+	node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
+
+	return folio;
+}
+
 #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
 /*
  * Fast-gup relies on pte change detection to avoid concurrent pgtable
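
The VM_WARN_ON_ONCE(!irqs_disabled()) check above ties the helper to fast GUP, which runs with interrupts disabled, and the NULL return on the longterm-pinnable check is what pushes a FOLL_LONGTERM pin of CMA memory over to the slow path. A rough user-space model of that fallback; pin_fast(), pin_slow(), the in_cma field and the FOLL_LONGTERM value here are all hypothetical stand-ins:

#include <stdbool.h>
#include <stdio.h>

#define FOLL_LONGTERM	0x1	/* illustrative flag value only */

struct folio_stub {
	bool in_cma;	/* stands in for !folio_is_longterm_pinnable() */
};

/* Fast path: refuses long-term pins of CMA memory, like try_grab_folio_fast(). */
static bool pin_fast(struct folio_stub *folio, unsigned int flags)
{
	if ((flags & FOLL_LONGTERM) && folio->in_cma)
		return false;	/* caller falls back to the slow path */
	return true;
}

/*
 * Slow path: pins first, then migrates the folio out of CMA (modelled
 * here by clearing in_cma). Before this fix it reused the fast-path
 * helper and tripped over the same CMA check, producing the warning.
 */
static bool pin_slow(struct folio_stub *folio, unsigned int flags)
{
	(void)flags;
	folio->in_cma = false;	/* stand-in for migration out of CMA */
	return true;
}

int main(void)
{
	struct folio_stub folio = { .in_cma = true };

	if (!pin_fast(&folio, FOLL_LONGTERM))
		printf("fast path refused, slow path %s\n",
		       pin_slow(&folio, FOLL_LONGTERM) ? "pinned" : "failed");
	return 0;
}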
@@ -2605,7 +2612,7 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
 
-		folio = try_grab_folio(page, 1, flags);
+		folio = try_grab_folio_fast(page, 1, flags);
 		if (!folio)
 			goto pte_unmap;

@@ -2699,7 +2706,7 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 
 		SetPageReferenced(page);
 		pages[*nr] = page;
-		if (unlikely(try_grab_page(page, flags))) {
+		if (unlikely(try_grab_folio(page_folio(page), 1, flags))) {
 			undo_dev_pagemap(nr, nr_start, flags, pages);
 			break;
 		}
@@ -2808,7 +2815,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	folio = try_grab_folio(page, refs, flags);
+	folio = try_grab_folio_fast(page, refs, flags);
 	if (!folio)
 		return 0;

@@ -2879,7 +2886,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	folio = try_grab_folio(page, refs, flags);
+	folio = try_grab_folio_fast(page, refs, flags);
 	if (!folio)
 		return 0;

@@ -2923,7 +2930,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	folio = try_grab_folio(page, refs, flags);
+	folio = try_grab_folio_fast(page, refs, flags);
 	if (!folio)
 		return 0;

@@ -2963,7 +2970,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	folio = try_grab_folio(page, refs, flags);
+	folio = try_grab_folio_fast(page, refs, flags);
 	if (!folio)
 		return 0;
