
Commit bda807d

minchan authored and torvalds committed
mm: migrate: support non-lru movable page migration
We have allowed migration for only LRU pages until now, and it was enough to make high-order pages. But recently, embedded systems (e.g., webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory), so we have seen several reports about trouble with small high-order allocations. To fix the problem, there were several efforts (e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages, reserved memory, vmalloc and so on), but if there are lots of non-movable pages in the system, those solutions are void in the long run.

So, this patch adds a facility to turn non-movable pages into movable ones. For the feature, this patch introduces migration-related functions in address_space_operations as well as some page flags.

If a driver wants to make its own pages movable, it should define three functions, which are function pointers of struct address_space_operations:

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What VM expects from a driver's isolate_page function is to return *true* if the driver isolates the page successfully. On returning true, VM marks the page as PG_isolated so concurrent isolation on several CPUs skips the page. If a driver cannot isolate the page, it should return *false*.

Once the page is successfully isolated, VM uses the page.lru fields, so the driver shouldn't expect to preserve values in those fields.

2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, VM calls the driver's migratepage with the isolated page. The job of migratepage is to move the content of the old page to the new page and set up the fields of struct page newpage. Keep in mind that you should indicate to the VM that the oldpage is no longer movable via __ClearPageMovable() under page_lock if you migrated the oldpage successfully and return 0. If the driver cannot migrate the page at the moment, it can return -EAGAIN. On -EAGAIN, VM will retry page migration in a short time because VM interprets -EAGAIN as "temporary migration failure". On returning any error other than -EAGAIN, VM will give up the page migration without retrying this time.

The driver shouldn't touch the page.lru field VM is using in these functions.

3. void (*putback_page)(struct page *);

If migration fails on an isolated page, VM should return the isolated page to the driver, so VM calls the driver's putback_page with the migration-failed page. In this function, the driver should put the isolated page back into its own data structure.

4. non-LRU movable page flags

There are two page flags for supporting non-LRU movable pages.

* PG_movable

The driver should use the function below to make a page movable under page_lock:

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It takes an address_space argument for registering the migration family functions which will be called by VM. Strictly speaking, PG_movable is not a real flag of struct page. Rather, VM reuses the lower bits of page->mapping to represent it:

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so the driver shouldn't access page->mapping directly. Instead, the driver should use page_mapping, which masks off the low two bits of page->mapping so it can get the right struct address_space.

For testing a non-LRU movable page, VM supports the __PageMovable function. However, it doesn't guarantee to identify a non-LRU movable page, because the page->mapping field is unified with other variables in struct page. As well, if the driver releases the page after isolation by VM, page->mapping doesn't have a stable value although it has PAGE_MAPPING_MOVABLE set (look at __ClearPageMovable). But __PageMovable is a cheap way to tell whether a page is LRU or non-LRU movable once the page has been isolated, because LRU pages can never have PAGE_MAPPING_MOVABLE in page->mapping. It is also good for just peeking to test for non-LRU movable pages before the more expensive check with lock_page in pfn scanning to select a victim.

For guaranteeing a non-LRU movable page, VM provides the PageMovable function. Unlike __PageMovable, PageMovable validates page->mapping and mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden destruction of page->mapping.

A driver using __SetPageMovable should clear the flag via __ClearPageMovable under page_lock before releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, VM marks an isolated page as PG_isolated under lock_page, so if a CPU encounters a PG_isolated non-LRU movable page, it can skip it. The driver doesn't need to manipulate the flag because VM will set/clear it automatically. Keep in mind that if the driver sees a PG_isolated page, it means the page has been isolated by VM, so it shouldn't touch the page.lru field. PG_isolated is aliased with the PG_reclaim flag, so the driver shouldn't use the flag for its own purpose.

[[email protected]: mm/compaction: remove local variable is_lru]
Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Gioh Kim <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Signed-off-by: Ganesh Mahendran <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Rafael Aquini <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: John Einar Reitan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent c6c919e commit bda807d

File tree

14 files changed: +416 −53 lines

Documentation/filesystems/Locking

Lines changed: 4 additions & 0 deletions

@@ -195,7 +195,9 @@ prototypes:
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	int (*migratepage)(struct address_space *, struct page *, struct page *);
+	void (*putback_page) (struct page *);
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage:	yes
 releasepage:		yes
 freepage:		yes
 direct_IO:
+isolate_page:		yes
 migratepage:		yes (both)
+putback_page:		yes
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes

Documentation/filesystems/vfs.txt

Lines changed: 11 additions & 0 deletions

@@ -592,9 +592,14 @@ struct address_space_operations {
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	/* isolate a page for migration */
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	/* migrate the contents of a page to the specified target */
 	int (*migratepage) (struct page *, struct page *);
+	/* put migration-failed page back to right list */
+	void (*putback_page) (struct page *);
 	int (*launder_page) (struct page *);
+
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
@@ -747,13 +752,19 @@ struct address_space_operations {
 	and transfer data directly between the storage and the
 	application's address space.

+  isolate_page: Called by the VM when isolating a movable non-lru page.
+	If page is successfully isolated, VM marks the page as PG_isolated
+	via __SetPageIsolated.
+
   migrate_page: This is used to compact the physical memory usage.
 	If the VM wants to relocate a page (maybe off a memory card
 	that is signalling imminent failure) it will pass a new page
 	and an old page to this function. migrate_page should
 	transfer any private data across and update any references
 	that it has to the page.

+  putback_page: Called by the VM when isolated page's migration fails.
+
   launder_page: Called before freeing a page - it writes back the dirty page. To
 	prevent redirtying the page, it is kept locked during the whole
 	operation.

Documentation/vm/page_migration

Lines changed: 106 additions & 1 deletion

@@ -142,5 +142,110 @@ Steps:
 20. The new page is moved to the LRU and can be scanned by the swapper
     etc again.

+C. Non-LRU page migration
+-------------------------
+
+Although migration originally aimed to reduce the latency of memory access
+for NUMA, compaction, which wants to create high-order pages, is also a
+main customer.
+
+The current problem of the implementation is that it is designed to migrate
+only *LRU* pages. However, there are potential non-LRU pages which can be
+migrated in drivers, for example, zsmalloc and virtio-balloon pages.
+
+For virtio-balloon pages, some parts of the migration code path have been
+hooked up and virtio-balloon specific functions added to intercept the
+migration logic. It's too specific to one driver, so other drivers that
+want to make their pages movable would have to add their own specific hooks
+in the migration path.
+
+To overcome the problem, VM supports non-LRU page migration, which provides
+generic functions for non-LRU movable pages without driver-specific hooks
+in the migration path.
+
+If a driver wants to make its own pages movable, it should define three
+functions, which are function pointers of struct address_space_operations.
+
+1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
+
+What VM expects from a driver's isolate_page function is to return *true*
+if the driver isolates the page successfully. On returning true, VM marks
+the page as PG_isolated so concurrent isolation on several CPUs skips the
+page. If a driver cannot isolate the page, it should return *false*.
+
+Once the page is successfully isolated, VM uses the page.lru fields, so the
+driver shouldn't expect to preserve values in those fields.
+
+2. int (*migratepage) (struct address_space *mapping,
+		struct page *newpage, struct page *oldpage, enum migrate_mode);
+
+After isolation, VM calls the driver's migratepage with the isolated page.
+The job of migratepage is to move the content of the old page to the new
+page and set up the fields of struct page newpage. Keep in mind that you
+should indicate to the VM that the oldpage is no longer movable via
+__ClearPageMovable() under page_lock if you migrated the oldpage
+successfully and return 0. If the driver cannot migrate the page at the
+moment, it can return -EAGAIN. On -EAGAIN, VM will retry page migration in
+a short time because VM interprets -EAGAIN as "temporary migration
+failure". On returning any error other than -EAGAIN, VM will give up the
+page migration without retrying this time.
+
+The driver shouldn't touch the page.lru field VM is using in these
+functions.
+
+3. void (*putback_page)(struct page *);
+
+If migration fails on an isolated page, VM should return the isolated page
+to the driver, so VM calls the driver's putback_page with the
+migration-failed page. In this function, the driver should put the
+isolated page back into its own data structure.
+
+4. non-LRU movable page flags
+
+There are two page flags for supporting non-LRU movable pages.
+
+* PG_movable
+
+The driver should use the function below to make a page movable under
+page_lock:
+
+	void __SetPageMovable(struct page *page, struct address_space *mapping)
+
+It takes an address_space argument for registering the migration family
+functions which will be called by VM. Strictly speaking, PG_movable is not
+a real flag of struct page. Rather, VM reuses the lower bits of
+page->mapping to represent it:
+
+	#define PAGE_MAPPING_MOVABLE 0x2
+	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
+
+so the driver shouldn't access page->mapping directly. Instead, the driver
+should use page_mapping, which masks off the low two bits of page->mapping
+under the page lock so it can get the right struct address_space.
+
+For testing a non-LRU movable page, VM supports the __PageMovable function.
+However, it doesn't guarantee to identify a non-LRU movable page, because
+the page->mapping field is unified with other variables in struct page. As
+well, if the driver releases the page after isolation by VM, page->mapping
+doesn't have a stable value although it has PAGE_MAPPING_MOVABLE set (look
+at __ClearPageMovable). But __PageMovable is a cheap way to tell whether a
+page is LRU or non-LRU movable once the page has been isolated, because
+LRU pages can never have PAGE_MAPPING_MOVABLE in page->mapping. It is also
+good for just peeking to test for non-LRU movable pages before the more
+expensive check with lock_page in pfn scanning to select a victim.
+
+For guaranteeing a non-LRU movable page, VM provides the PageMovable
+function. Unlike __PageMovable, PageMovable validates page->mapping and
+mapping->a_ops->isolate_page under lock_page. The lock_page prevents
+sudden destruction of page->mapping.
+
+A driver using __SetPageMovable should clear the flag via
+__ClearPageMovable under page_lock before releasing the page.
+
+* PG_isolated
+
+To prevent concurrent isolation among several CPUs, VM marks an isolated
+page as PG_isolated under lock_page, so if a CPU encounters a PG_isolated
+non-LRU movable page, it can skip it. The driver doesn't need to
+manipulate the flag because VM will set/clear it automatically. Keep in
+mind that if the driver sees a PG_isolated page, it means the page has
+been isolated by VM, so it shouldn't touch the page.lru field.
+PG_isolated is aliased with the PG_reclaim flag, so the driver shouldn't
+use the flag for its own purpose.
+
 Christoph Lameter, May 8, 2006.
+Minchan Kim, Mar 28, 2016.

include/linux/compaction.h

Lines changed: 17 additions & 0 deletions

@@ -54,6 +54,9 @@ enum compact_result {
 struct alloc_context; /* in mm/internal.h */

 #ifdef CONFIG_COMPACTION
+extern int PageMovable(struct page *page);
+extern void __SetPageMovable(struct page *page, struct address_space *mapping);
+extern void __ClearPageMovable(struct page *page);
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
@@ -151,6 +154,19 @@ extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);

 #else
+static inline int PageMovable(struct page *page)
+{
+	return 0;
+}
+static inline void __SetPageMovable(struct page *page,
+				struct address_space *mapping)
+{
+}
+
+static inline void __ClearPageMovable(struct page *page)
+{
+}
+
 static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 			unsigned int order, int alloc_flags,
 			const struct alloc_context *ac,
@@ -212,6 +228,7 @@ static inline void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_i
 #endif /* CONFIG_COMPACTION */

 #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+struct node;
 extern int compaction_register_node(struct node *node);
 extern void compaction_unregister_node(struct node *node);

include/linux/fs.h

Lines changed: 2 additions & 0 deletions

@@ -402,6 +402,8 @@ struct address_space_operations {
 	 */
 	int (*migratepage) (struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
+	bool (*isolate_page)(struct page *, isolate_mode_t);
+	void (*putback_page)(struct page *);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);

include/linux/ksm.h

Lines changed: 1 addition & 2 deletions

@@ -43,8 +43,7 @@ static inline struct stable_node *page_stable_node(struct page *page)
 static inline void set_page_stable_node(struct page *page,
 					struct stable_node *stable_node)
 {
-	page->mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM);
 }

 /*

include/linux/migrate.h

Lines changed: 2 additions & 0 deletions

@@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
 		unsigned long private, enum migrate_mode mode, int reason);
+extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
+extern void putback_movable_page(struct page *page);

 extern int migrate_prep(void);
 extern int migrate_prep_local(void);

include/linux/mm.h

Lines changed: 1 addition & 0 deletions

@@ -1035,6 +1035,7 @@ static inline pgoff_t page_file_index(struct page *page)
 }

 bool page_mapped(struct page *page);
+struct address_space *page_mapping(struct page *page);

 /*
  * Return true only if the page has been allocated with

include/linux/page-flags.h

Lines changed: 23 additions & 10 deletions

@@ -129,6 +129,9 @@ enum pageflags {

 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_private_2,
+
+	/* non-lru isolated movable page */
+	PG_isolated = PG_reclaim,
 };

 #ifndef __GENERATING_BOUNDS_H
@@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
  * with the PAGE_MAPPING_ANON bit set to distinguish it. See rmap.h.
  *
  * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
- * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
- * and then page->mapping points, not to an anon_vma, but to a private
+ * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
+ * bit; and then page->mapping points, not to an anon_vma, but to a private
  * structure which KSM associates with that merged page. See ksm.h.
  *
- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
+ * page and then page->mapping points a struct address_space.
  *
  * Please note that, confusingly, "page_mapping" refers to the inode
  * address_space which maps the page from disk; whereas "page_mapped"
  * refers to user virtual address space into which the page is mapped.
  */
-#define PAGE_MAPPING_ANON	1
-#define PAGE_MAPPING_KSM	2
-#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
+#define PAGE_MAPPING_ANON	0x1
+#define PAGE_MAPPING_MOVABLE	0x2
+#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

-static __always_inline int PageAnonHead(struct page *page)
+static __always_inline int PageMappingFlags(struct page *page)
 {
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
 }

 static __always_inline int PageAnon(struct page *page)
 {
 	page = compound_head(page);
-	return PageAnonHead(page);
+	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+static __always_inline int __PageMovable(struct page *page)
+{
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+				PAGE_MAPPING_MOVABLE;
 }

 #ifdef CONFIG_KSM
@@ -393,7 +404,7 @@ static __always_inline int PageKsm(struct page *page)
 {
 	page = compound_head(page);
 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+				PAGE_MAPPING_KSM;
 }
 #else
 TESTPAGEFLAG_FALSE(Ksm)
@@ -641,6 +652,8 @@ static inline void __ClearPageBalloon(struct page *page)
 	atomic_set(&page->_mapcount, -1);
 }

+__PAGEFLAG(Isolated, isolated, PF_ANY);
+
 /*
  * If network-based swap is enabled, sl*b must keep track of whether pages
  * were allocated from pfmemalloc reserves.
