Commit 5bbe354

Eric B Munson authored and torvalds committed
mm: allow compaction of unevictable pages
Currently, pages which are marked as unevictable are protected from compaction, but not from other types of migration. The POSIX real time extension explicitly states that mlock() will prevent a major page fault, but the spirit of this is that mlock() should give a process the ability to control sources of latency, including minor page faults. However, the mlock manpage only explicitly says that a locked page will not be written to swap and this can cause some confusion. The compaction code today does not give a developer who wants to avoid swap but wants to have large contiguous areas available any method to achieve this state.

This patch introduces a sysctl for controlling compaction behavior with respect to the unevictable lru. Users who demand no page faults after a page is present can set compact_unevictable_allowed to 0 and users who need the large contiguous areas can enable compaction on locked memory by leaving the default value of 1.

To illustrate this problem I wrote a quick test program that mmaps a large number of 1MB files filled with random data. These maps are created locked and read only. Then every other mmap is unmapped and I attempt to allocate huge pages to the static huge page pool. When the compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages after fragmenting memory. When the value is set to 1, allocations succeed.

Signed-off-by: Eric B Munson <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent a4bb3ec commit 5bbe354

4 files changed, 28 additions and 0 deletions

Documentation/sysctl/vm.txt

Lines changed: 11 additions & 0 deletions
@@ -21,6 +21,7 @@ Currently, these files are in /proc/sys/vm:
 - admin_reserve_kbytes
 - block_dump
 - compact_memory
+- compact_unevictable_allowed
 - dirty_background_bytes
 - dirty_background_ratio
 - dirty_bytes
@@ -106,6 +107,16 @@ huge pages although processes will also directly compact memory as required.
 
 ==============================================================
 
+compact_unevictable_allowed
+
+Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
+allowed to examine the unevictable lru (mlocked pages) for pages to compact.
+This should be used on systems where stalls for minor page faults are an
+acceptable trade for large contiguous free memory. Set to 0 to prevent
+compaction from moving pages that are unevictable. Default value is 1.
+
+==============================================================
+
 dirty_background_bytes
 
 Contains the amount of dirty memory at which the background kernel
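As a rough illustration of how userspace could inspect or change the documented knob, the sketch below reads and writes it through procfs. The helper names `read_compact_unevictable` and `write_compact_unevictable` are hypothetical and not part of this patch; the path follows the /proc/sys/vm location given in vm.txt, and writing the real file is assumed to need root.

```c
#include <stdio.h>

#define COMPACT_UNEVICTABLE_PATH "/proc/sys/vm/compact_unevictable_allowed"

/*
 * Return the integer stored at `path`, or -1 if the file cannot be opened
 * or parsed (for example on a kernel built without CONFIG_COMPACTION).
 */
int read_compact_unevictable(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/*
 * Write 0 or 1 to `path`. Returns 0 on success and -1 on failure.
 * Values other than 0 and 1 are rejected here, before touching the file.
 */
int write_compact_unevictable(const char *path, int value)
{
	FILE *f;
	int ok;

	if (value != 0 && value != 1)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A latency-sensitive setup would call `write_compact_unevictable(COMPACT_UNEVICTABLE_PATH, 0)` once at startup; taking the path as a parameter keeps the helpers easy to exercise against an ordinary file.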

include/linux/compaction.h

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 extern int sysctl_extfrag_threshold;
 extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
+extern int sysctl_compact_unevictable_allowed;
 
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,

kernel/sysctl.c

Lines changed: 9 additions & 0 deletions
@@ -1335,6 +1335,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &min_extfrag_threshold,
 		.extra2		= &max_extfrag_threshold,
 	},
+	{
+		.procname	= "compact_unevictable_allowed",
+		.data		= &sysctl_compact_unevictable_allowed,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 
 #endif /* CONFIG_COMPACTION */
 	{

mm/compaction.c

Lines changed: 7 additions & 0 deletions
@@ -1046,6 +1046,12 @@ typedef enum {
 	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
 } isolate_migrate_t;
 
+/*
+ * Allow userspace to control policy on scanning the unevictable LRU for
+ * compactable pages.
+ */
+int sysctl_compact_unevictable_allowed __read_mostly = 1;
+
 /*
  * Isolate all pages that can be migrated from the first suitable block,
  * starting at the block pointed to by the migrate scanner pfn within
@@ -1057,6 +1063,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	unsigned long low_pfn, end_pfn;
 	struct page *page;
 	const isolate_mode_t isolate_mode =
+		(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
 		(cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
 
 	/*
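The effect of the one-line change in isolate_migratepages() can be modelled in userspace as below. This is only a sketch: the function `compaction_isolate_mode` is hypothetical, and the numeric flag values are illustrative stand-ins chosen for the example, not taken from this diff.

```c
#include <stdbool.h>

/* ASSUMPTION: illustrative bit values standing in for the kernel's flags. */
typedef unsigned int isolate_mode_t;
#define ISOLATE_ASYNC_MIGRATE	0x4u
#define ISOLATE_UNEVICTABLE	0x8u

/*
 * Mirror the expression added by the patch: the unevictable bit is set only
 * when the sysctl is non-zero, and the async bit only for async migration.
 */
isolate_mode_t compaction_isolate_mode(int compact_unevictable_allowed,
				       bool async)
{
	return (compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0u) |
	       (async ? ISOLATE_ASYNC_MIGRATE : 0u);
}
```

With the sysctl at its default of 1 the scanner is allowed to isolate pages from the unevictable lru; with 0 the bit is absent and mlocked pages are left where they are.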
