
Commit a128ca7

shli authored and torvalds committed
mm: delete unnecessary TTU_* flags
Patch series "mm: fix some MADV_FREE issues", v5. We are trying to use MADV_FREE in jemalloc. Several issues are found. Without solving the issues, jemalloc can't use the MADV_FREE feature. - Doesn't support system without swap enabled. Because if swap is off, we can't or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed with other anonymous pages, we can't reclaim MADV_FREE pages. In current implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled. But in our environment, a lot of machines don't enable swap. This will prevent our setup using MADV_FREE. - Increases memory pressure. page reclaim bias file pages reclaim against anonymous pages. This doesn't make sense for MADV_FREE pages, because those pages could be freed easily and refilled with very slight penality. Even page reclaim doesn't bias file pages, there is still an issue, because MADV_FREE pages and other anonymous pages are mixed together. To reclaim a MADV_FREE page, we probably must scan a lot of other anonymous pages, which is inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing without it. - Accounting. There are two accounting problems. We don't have a global accounting. If the system is abnormal, we don't know if it's a problem from MADV_FREE side. The other problem is RSS accounting. MADV_FREE pages are accounted as normal anon pages and reclaimed lazily, so application's RSS becomes bigger. This confuses our workloads. We have monitoring daemon running and if it finds applications' RSS becomes abnormal, the daemon will kill the applications even kernel can reclaim the memory easily. To address the first the two issues, we can either put MADV_FREE pages into a separate LRU list (Minchan's previous patches and V1 patches), or put them into LRU_INACTIVE_FILE list (suggested by Johannes). The patchset use the second idea. The reason is LRU_INACTIVE_FILE list is tiny nowadays and should be full of used once file pages. So we can still efficiently reclaim MADV_FREE pages there without interference with other anon and active file pages. Putting the pages into inactive file list also has an advantage which allows page reclaim to prioritize MADV_FREE pages and used once file pages. MADV_FREE pages are put into the lru list and clear SwapBacked flag, so PageAnon(page) && !PageSwapBacked(page) will indicate a MADV_FREE pages. These pages will directly freed without pageout if they are clean, otherwise normal swap will reclaim them. For the third issue, the previous post adds global accounting and a separate RSS count for MADV_FREE pages. The problem is we never get accurate accounting for MADV_FREE pages. The pages are mapped to userspace, can be dirtied without notice from kernel side. To get accurate accounting, we could write protect the page, but then there is extra page fault overhead, which people don't want to pay. Jemalloc guys have concerns about the inaccurate accounting, so this post drops the accounting patches temporarily. The info exported to /proc/pid/smaps for MADV_FREE pages are kept, which is the only place we can get accurate accounting right now. This patch (of 6): Johannes pointed out TTU_LZFREE is unnecessary. It's true because we always have the flag set if we want to do an unmap. For cases we don't do an unmap, the TTU_LZFREE part of code should never run. Also the TTU_UNMAP is unnecessary. If no other flags set (for example, TTU_MIGRATION), an unmap is implied. 
The patch includes Johannes's cleanup and the removal of the dead TTU_ACTION macro.

Link: http://lkml.kernel.org/r/4be3ea1bc56b26fd98a54d0a6f70bec63f6d8980.1487965799.git.shli@fb.com
Signed-off-by: Shaohua Li <[email protected]>
Suggested-by: Johannes Weiner <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
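For context, here is a minimal userspace sketch of the MADV_FREE usage pattern the series is aimed at. This is an illustration only, not part of the patch; it assumes a kernel >= 4.5 and glibc headers that define MADV_FREE.

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 1 << 20;
            char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (buf == MAP_FAILED)
                    return 1;

            buf[0] = 1;     /* dirty the range so there is something to lazily free */

            /*
             * Mark the contents disposable: the pages stay mapped and are only
             * reclaimed under memory pressure; reading them after reclaim
             * returns zeroes.
             */
            if (madvise(buf, len, MADV_FREE) != 0)
                    perror("madvise(MADV_FREE)");   /* e.g. EINVAL on kernels < 4.5 */

            munmap(buf, len);
            return 0;
    }

An allocator such as jemalloc would call madvise(MADV_FREE) on freed arena ranges instead of MADV_DONTNEED, avoiding the immediate unmap/zero cost while still letting the kernel reclaim the memory when needed.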
1 parent 0a372d0 commit a128ca7

4 files changed: +15 −22 lines

include/linux/rmap.h

Lines changed: 9 additions & 13 deletions
@@ -83,19 +83,17 @@ struct anon_vma_chain {
 };
 
 enum ttu_flags {
-        TTU_UNMAP = 1,                  /* unmap mode */
-        TTU_MIGRATION = 2,              /* migration mode */
-        TTU_MUNLOCK = 4,                /* munlock mode */
-        TTU_LZFREE = 8,                 /* lazy free mode */
-        TTU_SPLIT_HUGE_PMD = 16,        /* split huge PMD if any */
-
-        TTU_IGNORE_MLOCK = (1 << 8),    /* ignore mlock */
-        TTU_IGNORE_ACCESS = (1 << 9),   /* don't age */
-        TTU_IGNORE_HWPOISON = (1 << 10),/* corrupted page is recoverable */
-        TTU_BATCH_FLUSH = (1 << 11),    /* Batch TLB flushes where possible
+        TTU_MIGRATION = 0x1,            /* migration mode */
+        TTU_MUNLOCK = 0x2,              /* munlock mode */
+
+        TTU_SPLIT_HUGE_PMD = 0x4,       /* split huge PMD if any */
+        TTU_IGNORE_MLOCK = 0x8,         /* ignore mlock */
+        TTU_IGNORE_ACCESS = 0x10,       /* don't age */
+        TTU_IGNORE_HWPOISON = 0x20,     /* corrupted page is recoverable */
+        TTU_BATCH_FLUSH = 0x40,         /* Batch TLB flushes where possible
                                          * and caller guarantees they will
                                          * do a final flush if necessary */
-        TTU_RMAP_LOCKED = (1 << 12)     /* do not grab rmap lock:
+        TTU_RMAP_LOCKED = 0x80          /* do not grab rmap lock:
                                          * caller holds it */
 };
 
@@ -193,8 +191,6 @@ static inline void page_dup_rmap(struct page *page, bool compound)
 int page_referenced(struct page *, int is_locked,
                         struct mem_cgroup *memcg, unsigned long *vm_flags);
 
-#define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
-
 int try_to_unmap(struct page *, enum ttu_flags flags);
 
 /* Avoid racy checks */
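As a quick illustration of the new encoding, the compact hex values above still behave as independent bits, and with TTU_UNMAP gone a "plain unmap" request is simply the absence of any mode bit. The following is a hypothetical userspace mirror of the enum, for illustration only; the real definitions live in include/linux/rmap.h.

    #include <stdio.h>

    /* Userspace mirror of the new flag values from the hunk above. */
    enum ttu_flags {
            TTU_MIGRATION       = 0x1,
            TTU_MUNLOCK         = 0x2,
            TTU_SPLIT_HUGE_PMD  = 0x4,
            TTU_IGNORE_MLOCK    = 0x8,
            TTU_IGNORE_ACCESS   = 0x10,
            TTU_IGNORE_HWPOISON = 0x20,
            TTU_BATCH_FLUSH     = 0x40,
            TTU_RMAP_LOCKED     = 0x80,
    };

    int main(void)
    {
            /* What the hwpoison caller passes after this patch: modifiers only. */
            enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;

            /* No dedicated "unmap" bit is needed: a plain unmap is simply the
             * absence of the remaining mode bits (migration/munlock). */
            int plain_unmap = !(ttu & (TTU_MIGRATION | TTU_MUNLOCK));

            printf("flags=0x%x, plain unmap=%d\n", (unsigned int)ttu, plain_unmap);
            return 0;
    }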

mm/memory-failure.c

Lines changed: 1 addition & 1 deletion
@@ -907,7 +907,7 @@ EXPORT_SYMBOL_GPL(get_hwpoison_page);
 static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
                                   int trapno, int flags, struct page **hpagep)
 {
-        enum ttu_flags ttu = TTU_UNMAP | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
+        enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
         struct address_space *mapping;
         LIST_HEAD(tokill);
         int ret;

mm/rmap.c

Lines changed: 1 addition & 1 deletion
@@ -1426,7 +1426,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
                          */
                         VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 
-                        if (!PageDirty(page) && (flags & TTU_LZFREE)) {
+                        if (!PageDirty(page)) {
                                 /* It's a freeable page by MADV_FREE */
                                 dec_mm_counter(mm, MM_ANONPAGES);
                                 rp->lazyfreed++;

mm/vmscan.c

Lines changed: 4 additions & 7 deletions
@@ -966,7 +966,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                 int may_enter_fs;
                 enum page_references references = PAGEREF_RECLAIM_CLEAN;
                 bool dirty, writeback;
-                bool lazyfree = false;
                 int ret = SWAP_SUCCESS;
 
                 cond_resched();
@@ -1120,7 +1119,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                                 goto keep_locked;
                         if (!add_to_swap(page, page_list))
                                 goto activate_locked;
-                        lazyfree = true;
                         may_enter_fs = 1;
 
                         /* Adding to swap updated mapping */
@@ -1138,9 +1136,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                  * processes. Try to unmap it here.
                  */
                 if (page_mapped(page) && mapping) {
-                        switch (ret = try_to_unmap(page, lazyfree ?
-                                (ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
-                                (ttu_flags | TTU_BATCH_FLUSH))) {
+                        switch (ret = try_to_unmap(page,
+                                ttu_flags | TTU_BATCH_FLUSH)) {
                         case SWAP_FAIL:
                                 nr_unmap_fail++;
                                 goto activate_locked;
@@ -1348,7 +1345,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
         }
 
         ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
-                        TTU_UNMAP|TTU_IGNORE_ACCESS, NULL, true);
+                        TTU_IGNORE_ACCESS, NULL, true);
         list_splice(&clean_pages, page_list);
         mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
         return ret;
@@ -1740,7 +1737,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
         if (nr_taken == 0)
                 return 0;
 
-        nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, TTU_UNMAP,
+        nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0,
                 &stat, false);
 
         spin_lock_irq(&pgdat->lru_lock);
