
Commit 9039b90

luiz-cap authored and akpm00 committed
mm: page_ext: add an iteration API for page extensions
Patch series "mm: page_ext: Introduce new iteration API", v3.

Introduction
============

[ Thanks to David Hildenbrand for identifying the root cause of this
  issue and providing guidance on how to fix it. The new API idea, bugs
  and misconceptions are all mine though ]

Currently, trying to reserve 1G pages with page_owner=on and sparsemem
causes a crash. The reproducer is very simple:

 1. Build the kernel with CONFIG_SPARSEMEM=y and the page extensions enabled
 2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
 3. Reserve one 1G page at run-time, this should crash (see patch 1 for
    backtrace)

[ A crash with page_table_check is also possible, but harder to trigger ]

Apparently, starting with commit cf54f31 ("mm/hugetlb: use __GFP_COMP
for gigantic folios") we now pass the full allocation order to page
extension clients, and the page extension implementation assumes that
all PFNs of an allocation range will be stored in the same memory
section (which is not true for 1G pages).

To fix this, this series introduces a new iteration API for page
extension objects. The API checks whether the next page extension
object can be retrieved from the current section or whether it needs to
be looked up in another section. Please find all details in patch 1.

I tested this series on arm64 and x86 by reserving 1G pages at run-time
and doing kernel builds (always with page_owner=on and
page_table_check=on).

This patch (of 3):

The page extension implementation assumes that all page extensions of a
given page order are stored in the same memory section. The function
page_ext_next() relies on this assumption by adding an offset to the
current object to return the next adjacent page extension.

This behavior works as expected for flatmem but fails for sparsemem
when using 1G pages. Commit cf54f31 ("mm/hugetlb: use __GFP_COMP for
gigantic folios") exposed this issue, making it possible to crash when
using the page_owner or page_table_check page extensions.

The problem is that for 1G pages the page extensions may span memory
section boundaries and be stored in different memory sections. This
issue was not visible before commit cf54f31 ("mm/hugetlb: use
__GFP_COMP for gigantic folios") because alloc_contig_pages() never
passed more than MAX_PAGE_ORDER to post_alloc_hook(). However, the
series that introduced that commit changed this behavior, allowing the
full 1G page order to be passed.

Reproducer:

 1. Build the kernel with CONFIG_SPARSEMEM=y and the page extensions enabled
 2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
 3. Reserve one 1G page at run-time, this should crash (backtrace below)

To address this issue, this commit introduces a new API for iterating
through page extensions. The main iteration macro is for_each_page_ext()
and it must be called with the RCU read lock taken. Here's a usage
example:

"""
struct page_ext_iter iter;
struct page_ext *page_ext;

...

rcu_read_lock();
for_each_page_ext(page, 1 << order, page_ext, iter) {
	struct my_page_ext *obj = get_my_page_ext_obj(page_ext);
	...
}
rcu_read_unlock();
"""

The loop construct uses page_ext_iter_next(), which checks whether we
have crossed sections in the iteration. In that case,
page_ext_iter_next() retrieves the next page_ext object from another
section.

Thanks to David Hildenbrand for helping identify the root cause and
providing suggestions on how to fix and optimize the solution (final
implementation and bugs are all mine though).
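To make the before/after concrete, here is a sketch (not from this
commit; my_ext, my_ext_ops and my_ext_update() are hypothetical
stand-ins for a client such as page_owner) of an old-style
page_ext_next() loop and its for_each_page_ext() replacement:

"""
/* Old pattern: page_ext_next() alone assumes all 1 << order page
 * extensions live in the same memory section; this is what breaks
 * for 1G pages with sparsemem. */
static void my_ext_update_old(struct page *page, unsigned int order)
{
	struct page_ext *page_ext = page_ext_get(page);
	struct page_ext *cur = page_ext;
	int i;

	if (unlikely(!page_ext))
		return;

	for (i = 0; i < (1 << order); i++) {
		my_ext_update(page_ext_data(cur, &my_ext_ops));
		cur = page_ext_next(cur);	/* may walk past the section */
	}
	page_ext_put(page_ext);
}

/* New pattern: the iterator re-looks up page_ext whenever the PFN
 * crosses a section boundary. */
static void my_ext_update_new(struct page *page, unsigned int order)
{
	struct page_ext_iter iter;
	struct page_ext *page_ext;

	rcu_read_lock();
	for_each_page_ext(page, 1 << order, page_ext, iter)
		my_ext_update(page_ext_data(page_ext, &my_ext_ops));
	rcu_read_unlock();
}
"""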
Lastly, here's the backtrace; without KASAN you can get random crashes:

[ 76.052526] BUG: KASAN: slab-out-of-bounds in __update_page_owner_handle+0x238/0x298
[ 76.060283] Write of size 4 at addr ffff07ff96240038 by task tee/3598
[ 76.066714]
[ 76.068203] CPU: 88 UID: 0 PID: 3598 Comm: tee Kdump: loaded Not tainted 6.13.0-rep1 #3
[ 76.076202] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220810 (SCP: 2.10.20220810) 2022/08/10
[ 76.088972] Call trace:
[ 76.091411]  show_stack+0x20/0x38 (C)
[ 76.095073]  dump_stack_lvl+0x80/0xf8
[ 76.098733]  print_address_description.constprop.0+0x88/0x398
[ 76.104476]  print_report+0xa8/0x278
[ 76.108041]  kasan_report+0xa8/0xf8
[ 76.111520]  __asan_report_store4_noabort+0x20/0x30
[ 76.116391]  __update_page_owner_handle+0x238/0x298
[ 76.121259]  __set_page_owner+0xdc/0x140
[ 76.125173]  post_alloc_hook+0x190/0x1d8
[ 76.129090]  alloc_contig_range_noprof+0x54c/0x890
[ 76.133874]  alloc_contig_pages_noprof+0x35c/0x4a8
[ 76.138656]  alloc_gigantic_folio.isra.0+0x2c0/0x368
[ 76.143616]  only_alloc_fresh_hugetlb_folio.isra.0+0x24/0x150
[ 76.149353]  alloc_pool_huge_folio+0x11c/0x1f8
[ 76.153787]  set_max_huge_pages+0x364/0xca8
[ 76.157961]  __nr_hugepages_store_common+0xb0/0x1a0
[ 76.162829]  nr_hugepages_store+0x108/0x118
[ 76.167003]  kobj_attr_store+0x3c/0x70
[ 76.170745]  sysfs_kf_write+0xfc/0x188
[ 76.174492]  kernfs_fop_write_iter+0x274/0x3e0
[ 76.178927]  vfs_write+0x64c/0x8e0
[ 76.182323]  ksys_write+0xf8/0x1f0
[ 76.185716]  __arm64_sys_write+0x74/0xb0
[ 76.189630]  invoke_syscall.constprop.0+0xd8/0x1e0
[ 76.194412]  do_el0_svc+0x164/0x1e0
[ 76.197891]  el0_svc+0x40/0xe0
[ 76.200939]  el0t_64_sync_handler+0x144/0x168
[ 76.205287]  el0t_64_sync+0x1ac/0x1b0

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/a45893880b7e1601082d39d2c5c8b50bcc096305.1741301089.git.luizcap@redhat.com
Fixes: cf54f31 ("mm/hugetlb: use __GFP_COMP for gigantic folios")
Signed-off-by: Luiz Capitulino <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2 files changed, +106 −0 lines changed

include/linux/page_ext.h

Lines changed: 93 additions & 0 deletions

@@ -3,6 +3,7 @@
 #define __LINUX_PAGE_EXT_H
 
 #include <linux/types.h>
+#include <linux/mmzone.h>
 #include <linux/stacktrace.h>
 
 struct pglist_data;
@@ -69,16 +70,31 @@ extern void page_ext_init(void);
 static inline void page_ext_init_flatmem_late(void)
 {
 }
+
+static inline bool page_ext_iter_next_fast_possible(unsigned long next_pfn)
+{
+	/*
+	 * page_ext is allocated per memory section. Once we cross a
+	 * memory section, we have to fetch the new pointer.
+	 */
+	return next_pfn % PAGES_PER_SECTION;
+}
 #else
 extern void page_ext_init_flatmem(void);
 extern void page_ext_init_flatmem_late(void);
 static inline void page_ext_init(void)
 {
 }
+
+static inline bool page_ext_iter_next_fast_possible(unsigned long next_pfn)
+{
+	return true;
+}
 #endif
 
 extern struct page_ext *page_ext_get(const struct page *page);
 extern void page_ext_put(struct page_ext *page_ext);
+extern struct page_ext *page_ext_lookup(unsigned long pfn);
 
 static inline void *page_ext_data(struct page_ext *page_ext,
 				  struct page_ext_operations *ops)
@@ -93,6 +109,83 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr)
 	return next;
 }
 
+struct page_ext_iter {
+	unsigned long index;
+	unsigned long start_pfn;
+	struct page_ext *page_ext;
+};
+
+/**
+ * page_ext_iter_begin() - Prepare for iterating through page extensions.
+ * @iter: page extension iterator.
+ * @pfn: PFN of the page we're interested in.
+ *
+ * Must be called with RCU read lock taken.
+ *
+ * Return: NULL if no page_ext exists for this page.
+ */
+static inline struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter,
+						   unsigned long pfn)
+{
+	iter->index = 0;
+	iter->start_pfn = pfn;
+	iter->page_ext = page_ext_lookup(pfn);
+
+	return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_next() - Get next page extension
+ * @iter: page extension iterator.
+ *
+ * Must be called with RCU read lock taken.
+ *
+ * Return: NULL if no next page_ext exists.
+ */
+static inline struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
+{
+	unsigned long pfn;
+
+	if (WARN_ON_ONCE(!iter->page_ext))
+		return NULL;
+
+	iter->index++;
+	pfn = iter->start_pfn + iter->index;
+
+	if (page_ext_iter_next_fast_possible(pfn))
+		iter->page_ext = page_ext_next(iter->page_ext);
+	else
+		iter->page_ext = page_ext_lookup(pfn);
+
+	return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_get() - Get current page extension
+ * @iter: page extension iterator.
+ *
+ * Return: NULL if no page_ext exists for this iterator.
+ */
+static inline struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
+{
+	return iter->page_ext;
+}
+
+/**
+ * for_each_page_ext(): iterate through page_ext objects.
+ * @__page: the page we're interested in
+ * @__pgcount: how many pages to iterate through
+ * @__page_ext: struct page_ext pointer where the current page_ext
+ *              object is returned
+ * @__iter: struct page_ext_iter object (defined in the stack)
+ *
+ * IMPORTANT: must be called with RCU read lock taken.
+ */
+#define for_each_page_ext(__page, __pgcount, __page_ext, __iter)	\
+	for (__page_ext = page_ext_iter_begin(&__iter, page_to_pfn(__page));\
+	     __page_ext && __iter.index < __pgcount;			\
+	     __page_ext = page_ext_iter_next(&__iter))
+
 #else /* !CONFIG_PAGE_EXTENSION */
 struct page_ext;
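For a sense of the scale behind page_ext_iter_next_fast_possible():
page_ext is allocated per memory section, so the slow lookup path only
triggers when the next PFN lands on a section boundary. A minimal
userspace sketch (SECTION_SIZE_BITS = 27 and PAGE_SHIFT = 12 are
assumptions matching common x86_64 sparsemem defaults, and the start PFN
is taken as section-aligned, as a 1G page is) counts the crossings for a
single 1G allocation:

"""
#include <stdio.h>

/* Assumed values; the real ones are architecture-dependent. */
#define SECTION_SIZE_BITS	27	/* 128 MiB memory sections */
#define PAGE_SHIFT		12	/* 4 KiB pages */
#define PAGES_PER_SECTION	(1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))

int main(void)
{
	unsigned long nr_pages = 1UL << 18;	/* 1G page = 2^18 4 KiB pages */
	unsigned long pfn, crossings = 0;

	/* Mirror the iterator's fast-path test at each step. */
	for (pfn = 1; pfn < nr_pages; pfn++)
		if (!(pfn % PAGES_PER_SECTION))	/* slow path: new section */
			crossings++;

	printf("pages: %lu, pages per section: %lu, crossings: %lu\n",
	       nr_pages, PAGES_PER_SECTION, crossings);
	/* prints: pages: 262144, pages per section: 32768, crossings: 7 */
	return 0;
}
"""

Under these assumptions the 1G range spans eight sections, so only seven
of the 262143 iterator steps take the slow page_ext_lookup() path; every
other step reuses the adjacent-object fast path via page_ext_next().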
mm/page_ext.c

Lines changed: 13 additions & 0 deletions

@@ -507,6 +507,19 @@ void __meminit pgdat_page_ext_init(struct pglist_data *pgdat)
 
 #endif
 
+/**
+ * page_ext_lookup() - Lookup a page extension for a PFN.
+ * @pfn: PFN of the page we're interested in.
+ *
+ * Must be called with RCU read lock taken and @pfn must be valid.
+ *
+ * Return: NULL if no page_ext exists for this page.
+ */
+struct page_ext *page_ext_lookup(unsigned long pfn)
+{
+	return lookup_page_ext(pfn_to_page(pfn));
+}
+
 /**
  * page_ext_get() - Get the extended information for a page.
  * @page: The page we're interested in.
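Note how page_ext_lookup() differs from the existing page_ext_get()/
page_ext_put() pair: it does no locking of its own, so the caller can
hold one RCU read-side section across many lookups, which is exactly
what the iterator needs. A usage sketch (my_ext and my_ext_ops are
hypothetical client names):

"""
static void my_ext_dump(unsigned long pfn)
{
	struct page_ext *page_ext;
	struct my_ext *obj;

	rcu_read_lock();			/* caller provides RCU protection */
	page_ext = page_ext_lookup(pfn);	/* pfn must be valid */
	if (page_ext) {
		obj = page_ext_data(page_ext, &my_ext_ops);
		pr_info("my_ext value for pfn %lu: %d\n", pfn, obj->value);
	}
	rcu_read_unlock();
}
"""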