
Commit bad8c6c

JoonsooKim authored and torvalds committed
mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE
Patch series "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE", v2.

0. History

This patchset is the follow-up of the discussion about "Introduce ZONE_CMA (v7)" [1]. Please refer to it if more information is needed.

1. What does this patch do?

This patch changes the way the memory of the CMA area is managed in the MM subsystem. Currently, the memory of the CMA area is managed by the zone that its pfn belongs to. However, this approach has some problems, since the MM subsystem doesn't have enough logic to handle the situation where memories with different characteristics coexist in a single zone. To solve this issue, this patch tries to manage all the memory of the CMA area by using the MOVABLE zone. From the MM subsystem's point of view, the characteristics of the memory in the MOVABLE zone and of the memory of the CMA area are the same, so managing the memory of the CMA area by using the MOVABLE zone will not cause any problem.

2. Motivation

There are some problems with the current approach; see the following. Although these problems are not inherent and could be fixed without this conceptual change, the fix would require adding many hooks in various code paths and would be intrusive to core MM and really error-prone. Therefore, I try to solve them with this new approach. The problems of the current implementation are:

o CMA memory utilization

First, the following is the freepage calculation logic in MM:

- For movable allocation: freepage = total freepage
- For unmovable allocation: freepage = total freepage - CMA freepage

Freepages in the CMA area are used only after the normal freepages of the zone that the CMA area belongs to are exhausted. At the moment the number of normal freepages reaches zero:

- For movable allocation: freepage = total freepage = CMA freepage
- For unmovable allocation: freepage = 0

If an unmovable allocation comes at this moment, the allocation request fails the watermark check and reclaim is started. After reclaim, normal freepages exist again, so freepages in the CMA area are still not used. FYI, there is another attempt [2] at solving this problem on lkml, and, as far as I know, Qualcomm also has an out-of-tree solution for it. (A simplified sketch of this watermark accounting is given at the end of this section.)

o Useless reclaim

There is no logic to distinguish CMA pages in the reclaim path, so CMA pages are reclaimed even if the system just needs pages that are usable for kernel allocations.

o Atomic allocation failure

This is also related to the fallback allocation policy for the memory of the CMA area. Consider the situation where the number of normal freepages is *zero* because a bunch of movable allocation requests have come in. Kswapd is not woken up, due to the following freepage calculation logic:

- For movable allocation: freepage = total freepage = CMA freepage

If an atomic unmovable allocation request comes at this moment, it fails due to the following logic:

- For unmovable allocation: freepage = total freepage - CMA freepage = 0

This was reported by Aneesh [3].

o Useless compaction

The usual high-order allocation request is an unmovable allocation request, and it cannot be served from the memory of the CMA area. In compaction, the migration scanner tries to migrate pages in the CMA area and make high-order pages there. As mentioned above, such pages cannot be used for unmovable allocation requests, so the work is simply wasted.
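For readers less familiar with the freepage accounting described above, here is a small standalone model of it. This is only an illustrative sketch: struct zone_model, watermark_ok() and the numbers are invented for the example, and it is not the kernel's actual __zone_watermark_ok() (which, roughly, discounts NR_FREE_CMA_PAGES from the free count when the caller is not allowed to use CMA pageblocks).

/*
 * Illustrative userspace model of the freepage accounting above.
 * Not kernel code: all names and numbers here are made up.
 */
#include <stdbool.h>
#include <stdio.h>

struct zone_model {
        long total_free;        /* all free pages in the zone */
        long cma_free;          /* free pages sitting in CMA pageblocks */
        long watermark;         /* min watermark of the zone */
};

/* Movable allocations may use CMA pageblocks; unmovable ones may not. */
static bool watermark_ok(const struct zone_model *z, bool movable)
{
        long free = z->total_free;

        if (!movable)
                free -= z->cma_free;    /* discount CMA freepages */

        return free > z->watermark;
}

int main(void)
{
        /* Every remaining freepage lives in the CMA area. */
        struct zone_model z = {
                .total_free = 4096,
                .cma_free = 4096,
                .watermark = 128,
        };

        printf("movable allocation passes watermark:   %d\n",
               watermark_ok(&z, true));         /* 1 */
        printf("unmovable allocation passes watermark: %d\n",
               watermark_ok(&z, false));        /* 0 -> reclaim is triggered */
        return 0;
}

Running the model shows a movable request passing the watermark while an unmovable request sees zero usable freepages, which is exactly the situation that leads to the needless reclaim and atomic allocation failures described above.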
3. Current approach and new approach

The current approach is that the memory of the CMA area is managed by the zone that its pfn belongs to. However, this memory should be distinguishable since it has a strong limitation, so it is marked as MIGRATE_CMA in the pageblock flags and handled specially. However, as mentioned in section 2, the MM subsystem doesn't have enough logic to deal with this special kind of pageblock, so many problems arise.

The new approach is that the memory of the CMA area is managed by the MOVABLE zone. MM already has enough logic to deal with special zones such as the HIGHMEM and MOVABLE zones, so managing the memory of the CMA area with the MOVABLE zone just naturally works well, because the constraint on the memory of the CMA area, that it must always be migratable, is the same as the constraint on the MOVABLE zone.

There is one side-effect for the usability of the memory of the CMA area. Use of the MOVABLE zone is only allowed for requests with GFP_HIGHMEM && GFP_MOVABLE, so the memory of the CMA area is now also only usable with these gfp flags. Before this patchset, a request with just GFP_MOVABLE could use it. IMO, this is not a big issue, since most GFP_MOVABLE requests also carry the GFP_HIGHMEM flag, for example file cache pages and anonymous pages. However, file cache pages for blockdev files are an exception: requests for them have no GFP_HIGHMEM flag. There are pros and cons to this exception. In my experience, blockdev file cache pages are one of the top reasons that cause cma_alloc() to fail temporarily, so we get a better guarantee of cma_alloc() success by discarding this case. (A simplified sketch of this gfp-flag-to-zone mapping is given below, after the tags.)

Note that there is no change from the admin's point of view, since this patchset is just an internal implementation change in the MM subsystem. The only minor difference for admins is that the memory stats for the CMA area will be printed under the MOVABLE zone. That's all.

4. Result

Following is the experimental result related to the utilization problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

<Before>
CMA area:       0 MB            512 MB
Elapsed-time:   92.4            186.5
pswpin:         82              18647
pswpout:        160             69839

<After>
CMA area:       0 MB            512 MB
Elapsed-time:   93.1            93.4
pswpin:         84              46
pswpout:        183             92

akpm: "kernel test robot" reported a 26% improvement in vm-scalability.throughput:
http://lkml.kernel.org/r/20180330012721.GA3845@yexl-desktop

[1]: lkml.kernel.org/r/[email protected]
[2]: https://lkml.org/lkml/2014/10/15/623
[3]: http://www.spinics.net/lists/linux-mm/msg100562.html

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Joonsoo Kim <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Tested-by: Tony Lindgren <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Russell King <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
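As a companion to the side-effect discussed in section 3, the following standalone sketch models how a request reaches ZONE_MOVABLE (and therefore, after this series, CMA memory) only when it carries both the highmem and movable flags. It is an illustrative model only: the MODEL_* flags and model_gfp_zone() are invented for this example and are not the kernel's real __GFP_* values or its gfp_zone() implementation in include/linux/gfp.h.

/*
 * Illustrative model of the gfp-to-zone mapping discussed in section 3.
 * Not kernel code: flag values and names are made up for the example.
 */
#include <stdio.h>

#define MODEL_GFP_HIGHMEM  (1u << 0)
#define MODEL_GFP_MOVABLE  (1u << 1)

enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_HIGHMEM, MODEL_ZONE_MOVABLE };

static enum model_zone model_gfp_zone(unsigned int flags)
{
        unsigned int both = MODEL_GFP_HIGHMEM | MODEL_GFP_MOVABLE;

        /* Only requests carrying both flags may be placed in ZONE_MOVABLE,
         * and hence (after this series) in CMA memory. */
        if ((flags & both) == both)
                return MODEL_ZONE_MOVABLE;
        if (flags & MODEL_GFP_HIGHMEM)
                return MODEL_ZONE_HIGHMEM;
        return MODEL_ZONE_NORMAL;
}

int main(void)
{
        /* anonymous pages / regular file cache: highmem + movable -> 2 */
        printf("%d\n", model_gfp_zone(MODEL_GFP_HIGHMEM | MODEL_GFP_MOVABLE));

        /* blockdev file cache: movable but not highmem -> 0, no CMA memory */
        printf("%d\n", model_gfp_zone(MODEL_GFP_MOVABLE));
        return 0;
}

The second request is the blockdev file cache case called out above: it stays out of ZONE_MOVABLE, which is the trade-off the series accepts in exchange for more reliable cma_alloc().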
1 parent d3cda23 commit bad8c6c

File tree

5 files changed (+126, -19 lines)

include/linux/memory_hotplug.h

Lines changed: 0 additions & 3 deletions
@@ -216,9 +216,6 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone *zone);
-
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 #define pfn_to_online_page(pfn) \
 ({ \

include/linux/mm.h

Lines changed: 1 addition & 0 deletions
@@ -2108,6 +2108,7 @@ extern void setup_per_cpu_pageset(void);
 
 extern void zone_pcp_update(struct zone *zone);
 extern void zone_pcp_reset(struct zone *zone);
+extern void setup_zone_pageset(struct zone *zone);
 
 /* page_alloc.c */
 extern int min_free_kbytes;

mm/cma.c

Lines changed: 72 additions & 11 deletions
@@ -39,6 +39,7 @@
 #include <trace/events/cma.h>
 
 #include "cma.h"
+#include "internal.h"
 
 struct cma cma_areas[MAX_CMA_AREAS];
 unsigned cma_area_count;
@@ -109,23 +110,25 @@ static int __init cma_activate_area(struct cma *cma)
        if (!cma->bitmap)
                return -ENOMEM;
 
-       WARN_ON_ONCE(!pfn_valid(pfn));
-       zone = page_zone(pfn_to_page(pfn));
-
        do {
                unsigned j;
 
                base_pfn = pfn;
+               if (!pfn_valid(base_pfn))
+                       goto err;
+
+               zone = page_zone(pfn_to_page(base_pfn));
                for (j = pageblock_nr_pages; j; --j, pfn++) {
-                       WARN_ON_ONCE(!pfn_valid(pfn));
+                       if (!pfn_valid(pfn))
+                               goto err;
+
                        /*
-                        * alloc_contig_range requires the pfn range
-                        * specified to be in the same zone. Make this
-                        * simple by forcing the entire CMA resv range
-                        * to be in the same zone.
+                        * In init_cma_reserved_pageblock(), present_pages
+                        * is adjusted with assumption that all pages in
+                        * the pageblock come from a single zone.
                         */
                        if (page_zone(pfn_to_page(pfn)) != zone)
-                               goto not_in_zone;
+                               goto err;
                }
                init_cma_reserved_pageblock(pfn_to_page(base_pfn));
        } while (--i);
@@ -139,7 +142,7 @@ static int __init cma_activate_area(struct cma *cma)
 
        return 0;
 
-not_in_zone:
+err:
        pr_err("CMA area %s could not be activated\n", cma->name);
        kfree(cma->bitmap);
        cma->count = 0;
@@ -149,6 +152,41 @@ static int __init cma_activate_area(struct cma *cma)
 static int __init cma_init_reserved_areas(void)
 {
        int i;
+       struct zone *zone;
+       pg_data_t *pgdat;
+
+       if (!cma_area_count)
+               return 0;
+
+       for_each_online_pgdat(pgdat) {
+               unsigned long start_pfn = UINT_MAX, end_pfn = 0;
+
+               zone = &pgdat->node_zones[ZONE_MOVABLE];
+
+               /*
+                * In this case, we cannot adjust the zone range
+                * since it is now maximum node span and we don't
+                * know original zone range.
+                */
+               if (populated_zone(zone))
+                       continue;
+
+               for (i = 0; i < cma_area_count; i++) {
+                       if (pfn_to_nid(cma_areas[i].base_pfn) !=
+                               pgdat->node_id)
+                               continue;
+
+                       start_pfn = min(start_pfn, cma_areas[i].base_pfn);
+                       end_pfn = max(end_pfn, cma_areas[i].base_pfn +
+                                               cma_areas[i].count);
+               }
+
+               if (!end_pfn)
+                       continue;
+
+               zone->zone_start_pfn = start_pfn;
+               zone->spanned_pages = end_pfn - start_pfn;
+       }
 
        for (i = 0; i < cma_area_count; i++) {
                int ret = cma_activate_area(&cma_areas[i]);
@@ -157,9 +195,32 @@ static int __init cma_init_reserved_areas(void)
                        return ret;
        }
 
+       /*
+        * Reserved pages for ZONE_MOVABLE are now activated and
+        * this would change ZONE_MOVABLE's managed page counter and
+        * the other zones' present counter. We need to re-calculate
+        * various zone information that depends on this initialization.
+        */
+       build_all_zonelists(NULL);
+       for_each_populated_zone(zone) {
+               if (zone_idx(zone) == ZONE_MOVABLE) {
+                       zone_pcp_reset(zone);
+                       setup_zone_pageset(zone);
+               } else
+                       zone_pcp_update(zone);
+
+               set_zone_contiguous(zone);
+       }
+
+       /*
+        * We need to re-init per zone wmark by calling
+        * init_per_zone_wmark_min() but doesn't call here because it is
+        * registered on core_initcall and it will be called later than us.
+        */
+
        return 0;
 }
-core_initcall(cma_init_reserved_areas);
+pure_initcall(cma_init_reserved_areas);
 
 /**
  * cma_init_reserved_mem() - create custom contiguous area from reserved memory

mm/internal.h

Lines changed: 3 additions & 0 deletions
@@ -168,6 +168,9 @@ extern void post_alloc_hook(struct page *page, unsigned int order,
                                        gfp_t gfp_flags);
 extern int user_min_free_kbytes;
 
+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone *zone);
+
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
 /*

mm/page_alloc.c

Lines changed: 50 additions & 5 deletions
@@ -1747,16 +1747,38 @@ void __init page_alloc_init_late(void)
 }
 
 #ifdef CONFIG_CMA
+static void __init adjust_present_page_count(struct page *page, long count)
+{
+       struct zone *zone = page_zone(page);
+
+       /* We don't need to hold a lock since it is boot-up process */
+       zone->present_pages += count;
+}
+
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
        unsigned i = pageblock_nr_pages;
+       unsigned long pfn = page_to_pfn(page);
        struct page *p = page;
+       int nid = page_to_nid(page);
+
+       /*
+        * ZONE_MOVABLE will steal present pages from other zones by
+        * changing page links so page_zone() is changed. Before that,
+        * we need to adjust previous zone's page count first.
+        */
+       adjust_present_page_count(page, -pageblock_nr_pages);
 
        do {
                __ClearPageReserved(p);
                set_page_count(p, 0);
-       } while (++p, --i);
+
+               /* Steal pages from other zones */
+               set_page_links(p, ZONE_MOVABLE, nid, pfn);
+       } while (++p, ++pfn, --i);
+
+       adjust_present_page_count(page, pageblock_nr_pages);
 
        set_pageblock_migratetype(page, MIGRATE_CMA);
 
@@ -6208,6 +6230,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 {
        enum zone_type j;
        int nid = pgdat->node_id;
+       unsigned long node_end_pfn = 0;
 
        pgdat_resize_init(pgdat);
 #ifdef CONFIG_NUMA_BALANCING
@@ -6235,9 +6258,13 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
                struct zone *zone = pgdat->node_zones + j;
                unsigned long size, realsize, freesize, memmap_pages;
                unsigned long zone_start_pfn = zone->zone_start_pfn;
+               unsigned long movable_size = 0;
 
                size = zone->spanned_pages;
                realsize = freesize = zone->present_pages;
+               if (zone_end_pfn(zone) > node_end_pfn)
+                       node_end_pfn = zone_end_pfn(zone);
+
 
                /*
                 * Adjust freesize so that it accounts for how much memory
@@ -6286,12 +6313,30 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
                zone_seqlock_init(zone);
                zone_pcp_init(zone);
 
-               if (!size)
+               /*
+                * The size of the CMA area is unknown now so we need to
+                * prepare the memory for the usemap at maximum.
+                */
+               if (IS_ENABLED(CONFIG_CMA) && j == ZONE_MOVABLE &&
+                       pgdat->node_spanned_pages) {
+                       movable_size = node_end_pfn - pgdat->node_start_pfn;
+               }
+
+               if (!size && !movable_size)
                        continue;
 
                set_pageblock_order();
-               setup_usemap(pgdat, zone, zone_start_pfn, size);
-               init_currently_empty_zone(zone, zone_start_pfn, size);
+               if (movable_size) {
+                       zone->zone_start_pfn = pgdat->node_start_pfn;
+                       zone->spanned_pages = movable_size;
+                       setup_usemap(pgdat, zone,
+                                    pgdat->node_start_pfn, movable_size);
+                       init_currently_empty_zone(zone,
+                               pgdat->node_start_pfn, movable_size);
+               } else {
+                       setup_usemap(pgdat, zone, zone_start_pfn, size);
+                       init_currently_empty_zone(zone, zone_start_pfn, size);
+               }
                memmap_init(size, nid, j, zone_start_pfn);
        }
 }
@@ -7932,7 +7977,7 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 }
 #endif
 
-#ifdef CONFIG_MEMORY_HOTPLUG
+#if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
  * page high values need to be recalulated.
