
Commit b98072a

yuzhaogoogle authored and akpm00 committed
mm/hugetlb_vmemmap: fix memory loads ordering
Using x86_64 as an example, for a 32KB struct page[] area describing a 2MB hugeTLB page, HVO reduces the area to 4KB by the following steps:

1. Split the (r/w vmemmap) PMD mapping the area into 512 (r/w) PTEs;
2. For the 8 PTEs mapping the area, remap PTE 1-7 to the page mapped by PTE 0, and at the same time change the permission from r/w to r/o;
3. Free the pages PTE 1-7 used to map, hence the reduction from 32KB to 4KB.

However, the following race can happen due to improper ordering of memory loads:

  CPU 1 (HVO)                     CPU 2 (speculative PFN walker)

  page_ref_freeze()
  synchronize_rcu()
                                  rcu_read_lock()
                                  page_is_fake_head() is false
  vmemmap_remap_pte()
  XXX: struct page[] becomes r/o

  page_ref_unfreeze()
                                  page_ref_count() is not zero

                                  atomic_add_unless(&page->_refcount)
                                  XXX: try to modify r/o struct page[]

Specifically, page_is_fake_head() must be ordered after page_ref_count() on CPU 2 so that it can only return true for this case, avoiding the later attempt to modify the r/o struct page[].

This patch adds the missing memory barrier and performs the page_ref_count() and page_is_fake_head() checks in the proper order.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: bd22553 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
Signed-off-by: Yu Zhao <[email protected]>
Reported-by: Will Deacon <[email protected]>
Closes: https://lore.kernel.org/20241128142028.GA3506@willie-the-truck/
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Acked-by: Will Deacon <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent fe4cdc2 commit b98072a
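
For reference, a back-of-the-envelope check of the figures in the commit message. This is a standalone userspace sketch, not kernel code; it assumes the x86_64 defaults the message relies on: 4KB base pages and a 64-byte struct page.

/*
 * Sketch: recompute the 32KB -> 4KB reduction described above.
 * Assumes 4KB base pages and sizeof(struct page) == 64 (x86_64 defaults).
 */
#include <stdio.h>

int main(void)
{
	unsigned long hugepage_size = 2UL << 20;  /* 2MB hugeTLB page */
	unsigned long base_page     = 4UL << 10;  /* 4KB base page */
	unsigned long struct_page   = 64;         /* assumed sizeof(struct page) */

	unsigned long nr_pages = hugepage_size / base_page;  /* 512 struct pages */
	unsigned long vmemmap  = nr_pages * struct_page;     /* 32KB of metadata */
	unsigned long nr_ptes  = vmemmap / base_page;        /* 8 vmemmap PTEs */

	printf("struct page[]: %lu entries, %luKB, %lu PTEs\n",
	       nr_pages, vmemmap >> 10, nr_ptes);
	printf("after HVO (PTE 0 kept, PTEs 1-7 remapped r/o): %luKB\n",
	       base_page >> 10);
	return 0;
}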

2 files changed: +38 -1 lines changed


include/linux/page-flags.h

Lines changed: 37 additions & 0 deletions
@@ -226,11 +226,48 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
 	}
 	return page;
 }
+
+static __always_inline bool page_count_writable(const struct page *page, int u)
+{
+	if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
+		return true;
+
+	/*
+	 * The refcount check is ordered before the fake-head check to prevent
+	 * the following race:
+	 *   CPU 1 (HVO)                     CPU 2 (speculative PFN walker)
+	 *
+	 *   page_ref_freeze()
+	 *   synchronize_rcu()
+	 *                                   rcu_read_lock()
+	 *                                   page_is_fake_head() is false
+	 *   vmemmap_remap_pte()
+	 *   XXX: struct page[] becomes r/o
+	 *
+	 *   page_ref_unfreeze()
+	 *                                   page_ref_count() is not zero
+	 *
+	 *                                   atomic_add_unless(&page->_refcount)
+	 *                                   XXX: try to modify r/o struct page[]
+	 *
+	 * The refcount check also prevents modification attempts to other (r/o)
+	 * tail pages that are not fake heads.
+	 */
+	if (atomic_read_acquire(&page->_refcount) == u)
+		return false;
+
+	return page_fixed_fake_head(page) == page;
+}
 #else
 static inline const struct page *page_fixed_fake_head(const struct page *page)
 {
 	return page;
 }
+
+static inline bool page_count_writable(const struct page *page, int u)
+{
+	return true;
+}
 #endif
 
 static __always_inline int page_is_fake_head(const struct page *page)
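
The key change is the atomic_read_acquire(): it orders the _refcount load before the loads done by the later fake-head check. The following userspace analogy in C11 atomics illustrates the pattern only; the variable and thread names are hypothetical stand-ins, the HVO side's unfreeze is modeled here as a release store, and the RCU grace period between the freeze and the remap is not modeled.

/*
 * Userspace analogy of the ordering fix, not kernel code. "refcount" and
 * "remapped" stand in for page->_refcount and the r/o vmemmap state.
 */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_int refcount = 1;   /* stand-in for page->_refcount */
static atomic_int remapped = 0;   /* stand-in for "struct page[] is r/o" */

static void *hvo_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&refcount, 0, memory_order_relaxed);  /* freeze */
	/* (synchronize_rcu() would sit here in the kernel) */
	atomic_store_explicit(&remapped, 1, memory_order_relaxed);  /* remap, now r/o */
	atomic_store_explicit(&refcount, 1, memory_order_release);  /* unfreeze */
	return NULL;
}

static void *walker_side(void *arg)
{
	(void)arg;
	/*
	 * The acquire load orders the "remapped" load after the "refcount"
	 * load: if the walker sees the unfrozen count published by the
	 * release store above, it must also see remapped == 1 and back off.
	 */
	if (atomic_load_explicit(&refcount, memory_order_acquire) != 0 &&
	    !atomic_load_explicit(&remapped, memory_order_relaxed))
		printf("walker: would take a reference\n");
	else
		printf("walker: backs off\n");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, hvo_side, NULL);
	pthread_create(&b, NULL, walker_side, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
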

include/linux/page_ref.h

Lines changed: 1 addition & 1 deletion
@@ -234,7 +234,7 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
 
 	rcu_read_lock();
 	/* avoid writing to the vmemmap area being remapped */
-	if (!page_is_fake_head(page) && page_ref_count(page) != u)
+	if (page_count_writable(page, u))
 		ret = atomic_add_unless(&page->_refcount, nr, u);
 	rcu_read_unlock();
 
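For context on the caller side, u is the refcount value that must not be raced past (usually 0). Speculative PFN walkers typically reach page_ref_add_unless() through helpers such as get_page_unless_zero(); the sketch below shows roughly how that helper is defined in current kernels, so the acquire check above refuses a page whose refcount is still frozen at zero.

/* Sketch of a typical caller, roughly as in include/linux/mm.h. */
static inline bool get_page_unless_zero(struct page *page)
{
	return page_ref_add_unless(page, 1, 0);
}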